Introduction to dMRI data#

Diffusion imaging probes the random, microscopic movement of water molecules by using MRI sequences that are sensitive to the geometry and environmental organization surrounding these protons. This is a popular technique for studying the white matter of the brain. The diffusion within biological structures, such as the brain, are often restricted due to barriers (e.g., cell membranes), resulting in a preferred direction of diffusion (anisotropy). A typical dMRI scan will acquire multiple volumes (or angular samples), each sensitive to a particular diffusion direction.

Sourced from Dr. A. Rokem, DIPY Workshop 2021

These diffusion directions (or orientations) are a fundamental piece of metadata to interpret dMRI data, as models need to know the exact orientation of each angular sample.

Main elements of a dMRI dataset

  • A 4D data array, where the last dimension encodes the reconstructed diffusion direction maps.

  • Tabular data or a 2D array, listing the diffusion directions (.bvec) and the encoding gradient strength (.bval).

In summary, dMRI involves complex data types that, as programmers, we want to access, query and manipulate with ease.

Python and object oriented programming#

Python is an object oriented programming language. It allows us to represent and encapsulate data types and corresponding behaviors into programming structures called objects.

Data structures

How you feed in data into your algorithm will impose constraints that might completely hinder the implementation of nonfunctional requirements down the line. Therefore, a careful plan must also be thought out for the data structures we are going to handle.

Therefore, let’s leverage Python to create objects that contain dMRI data. In Python, objects can be specified by defining a class. In the example code below, we’ve created a class with the name DWI. To simplify class creation, we’ve also used the magic of a Python library called attrs.

"""Representing data in hard-disk and memory."""
import attr

def _data_repr(value):
    if value is None:
        return "None"
    return f"<{'x'.join(str(v) for v in value.shape)} ({value.dtype})>"


@attr.s(slots=True)
class DWI:
    """Data representation structure for dMRI data."""

    dataobj = attr.ib(default=None, repr=_data_repr)
    """A numpy ndarray object for the data array, without *b=0* volumes."""
    brainmask = attr.ib(default=None, repr=_data_repr)
    """A boolean ndarray object containing a corresponding brainmask."""
    bzero = attr.ib(default=None, repr=_data_repr)
    """A *b=0* reference map, preferably obtained by some smart averaging."""
    gradients = attr.ib(default=None, repr=_data_repr)
    """A 2D numpy array of the gradient table in RAS+B format."""
    em_affines = attr.ib(default=None)
    """
    List of :obj:`nitransforms.linear.Affine` objects that bring
    DWIs (i.e., no b=0) into alignment.
    """

    def __len__(self):
        """Obtain the number of high-*b* orientations."""
        return self.gradients.shape[-1]

This code implements several attributes as well as a behavior - the __len__ method. The __len__ method is special in Python, as it will be executed when we call the built-in function len() on our object.

Let’s test out the DWI data structure with some simulated data:

# NumPy is a fundamental Python library for working with arrays
import numpy as np

# create a new DWI object, with only gradient information that is random
dmri_dataset = DWI(gradients=np.random.normal(size=(4, 64)))

# call Python's built-in len() function
print(len(dmri_dataset))
64

The output of this print() statement is telling us that this (simulated) dataset has 64 diffusion-weighted samples.

Using the new data representation object#

The code shown above was just a snippet of the DWI class. For simplicity, we will be using the full implementation of this class from our eddymotion package Under the data/ folder of this book’s distribution, we have stored a sample DWI dataset with filename dwi.h5. Please note that the file has been minimized by zeroing all but two diffusion-weighted orientation maps.

Let’s get some insights from it:

# import the class from the library
from eddymotion.data.dmri import DWI

# load the sample file
dmri_dataset = DWI.from_filename("../../data/dwi.h5")
print(len(dmri_dataset))
102

In this case, the dataset is reporting to have 102 diffusion-weighted samples.

Python will automatically generate a summary of this object if we just type the name of our new object. This pretty-printing of the object informs us about the data and metadata that, together, compose this particular DWI dataset:

dmri_dataset
DWI(dataobj=<118x118x78x102 (int16)>, affine=<4x4 (float64)>, brainmask=None, bzero=<118x118x78 (int16)>, gradients=<4x102 (float32)>, em_affines=None, fieldmap=None)

We’ll go over some of the components of dmri_dataset through this lesson.

Visualizing the data#

Exercise

Let’s start out by seeing what the data looks like. The fully-fledged DWI object has a convenience function to plot the dataset.

Hint

To see all of the instances and behaviors available to an object, try typing the object name, followed by . and Tab

Solution

Hide code cell content
dmri_dataset.plot_mosaic();
../_images/399c5ec60f7735d3b8113b93a1685264a382990c0567f5293eb05934c28a1663.png

When calling plot_mosaic() without any arguments, the b=0 reference is plotted. This b=0 reference is a map of the signal measured without gradient sensitization, or in other words, when we are not measuring diffusion in any direction. The b=0 map can be used by diffusion modeling as the reference to quantify the signal drop at every voxel and given a particular orientation gradient.

We can also get some insight into how a particular diffusion-weighted orientation looks like by selecting them with the argument index.

Exercise

Try calling plot_mosaic with an index of 10 or 100.

Solution

Hide code cell content
dmri_dataset.plot_mosaic(index=10, vmax=5000);
../_images/e0a96d6f52575f61a54293618c4e5d2e021210d1b3ecb4e43d19d429af2c09fd.png
Hide code cell content
dmri_dataset.plot_mosaic(index=100, vmax=5000);
../_images/998b77b9a08ee47c07fdca877148d71b21c097fe428b9d273a5df1f6699512d0.png

Diffusion that exhibits directionality in the same direction as the gradient results in a loss of signal. As we can see, diffusion-weighted images consistently drop almost all signal in voxels filled with cerebrospinal fluid because there, water diffusion is free (isotropic) regardless of the direction that is being measured.

We can also see that the images at index=10 and index=100 have different gradient strength (”b-value”). The higher the magnitude of the gradient, the more diffusion that is allowed to occur, indicated by the overall decrease in signal intensity. Stronger gradients yield diffusion maps with substantially lower SNR (signal-to-noise ratio), as well as larger distortions derived from the so-called “Eddy-currents”.

Visualizing the gradient information#

Our DWI object stores the gradient information in the gradients attribute.

Exercise

Let’s see the shape of the gradient information.

Solution

Hide code cell content
dmri_dataset.gradients.shape
(4, 102)

We get a \(4\times102\) – three spatial coordinates (\(b_x\), \(b_y\), \(b_z\)) of the unit-norm “b-vector”, plus the gradient sensitization magnitude (the “b-value”), with a total of 102 different orientations for the case at hand.

Exercise

Try printing the gradient information to see what it contains. Remember to transpose (.T) the array.

Solution

Hide code cell content
print(dmri_dataset.gradients.T)
[[-6.04914188e-01 -6.79875135e-01 -4.14546251e-01  3.00000000e+02]
 [ 7.51762651e-03  9.17056799e-01 -3.98685753e-01  3.00000000e+02]
 [-8.86579394e-01  3.21218759e-01 -3.32859576e-01  3.00000000e+02]
 [ 8.16364467e-01  2.68300444e-01 -5.11433184e-01  3.00000000e+02]
 [ 4.34362024e-01 -7.33946085e-01 -5.22161603e-01  3.00000000e+02]
 [ 9.60559174e-02 -7.31587410e-02  9.92683768e-01  3.05000000e+02]
 [ 5.85605621e-01  7.62755051e-02  8.06999445e-01  1.00500000e+03]
 [ 1.32749856e-01  3.81094962e-01 -9.14955795e-01  9.95000000e+02]
 [-9.10080016e-01  3.49084914e-01  2.23369867e-01  1.00000000e+03]
 [-1.42672375e-01 -9.87846315e-01 -6.16788231e-02  1.00000000e+03]
 [-3.14908028e-01  4.70339447e-01  8.24386895e-01  1.00500000e+03]
 [ 9.26234543e-01  3.46899480e-01 -1.47479758e-01  1.00000000e+03]
 [ 6.95218086e-01  6.17206156e-01 -3.68413389e-01  9.95000000e+02]
 [ 1.79834515e-01 -9.48279917e-01  2.61581242e-01  1.00000000e+03]
 [-5.72267652e-01  8.19448650e-01 -3.18384580e-02  1.00000000e+03]
 [ 6.62053525e-01 -3.44300926e-01  6.65689111e-01  1.00500000e+03]
 [-1.81866080e-01  7.04759300e-01 -6.85739756e-01  9.95000000e+02]
 [-6.36256397e-01 -7.67920136e-01 -7.40036741e-02  1.00000000e+03]
 [-2.50486493e-01 -7.25317061e-01  6.41226709e-01  1.00500000e+03]
 [-1.81372240e-01 -5.50868332e-01 -8.14646065e-01  9.95000000e+02]
 [ 2.51025885e-01 -9.41051543e-01 -2.26733208e-01  1.00000000e+03]
 [ 8.59194577e-01  4.16565776e-01  2.97081828e-01  1.00000000e+03]
 [-5.87923348e-01 -4.98144925e-01 -6.37336493e-01  9.95000000e+02]
 [-2.08524555e-01  6.58749491e-02  9.75796103e-01  1.00500000e+03]
 [-8.81649256e-01 -4.38893167e-03 -4.71884906e-01  9.95000000e+02]
 [-6.17551386e-01  6.92788363e-01  3.72390330e-01  1.00000000e+03]
 [-2.11339086e-01 -1.57031611e-01 -9.64715958e-01  9.95000000e+02]
 [ 5.63584268e-01  3.84537101e-01 -7.31097817e-01  9.95000000e+02]
 [-1.04902856e-01  7.80609608e-01  6.16152585e-01  1.00500000e+03]
 [-3.27980161e-01 -8.34259689e-01 -4.43215251e-01  1.00000000e+03]
 [ 9.97081280e-01 -3.60786468e-02  6.72850981e-02  1.00000000e+03]
 [-3.64686340e-01 -8.92125309e-01  2.66676456e-01  1.00000000e+03]
 [ 2.97879040e-01 -3.06824297e-01  9.03950751e-01  1.00500000e+03]
 [ 7.01884270e-01 -3.75511825e-01 -6.05267942e-01  9.95000000e+02]
 [ 5.85624397e-01 -2.57630497e-02 -8.10173035e-01  9.95000000e+02]
 [ 8.74492407e-01  2.43720673e-02 -4.84426469e-01  9.95000000e+02]
 [ 5.51209927e-01 -7.06570625e-01  4.43762958e-01  1.00500000e+03]
 [ 8.55460048e-01 -4.86520320e-01  1.77443221e-01  1.00000000e+03]
 [-6.43364310e-01 -2.89421439e-01 -7.08743691e-01  1.99500000e+03]
 [ 1.21839322e-01  3.67268413e-01  9.22100365e-01  2.01000000e+03]
 [ 8.67053568e-01 -2.20949143e-01 -4.46541786e-01  1.99500000e+03]
 [-7.46472776e-01  8.38321110e-04 -6.65415406e-01  1.99500000e+03]
 [ 3.49373311e-01  4.06477451e-01 -8.44224155e-01  1.99000000e+03]
 [-5.60097814e-01  1.76819220e-01 -8.09336364e-01  1.99500000e+03]
 [-4.30994742e-02  9.74978864e-01  2.18079522e-01  2.00000000e+03]
 [-2.40879446e-01  8.55759144e-01  4.57879245e-01  2.00500000e+03]
 [-7.79660225e-01 -6.15284383e-01  1.16426118e-01  2.00000000e+03]
 [ 4.48004901e-01 -8.49978268e-01  2.77179599e-01  2.00500000e+03]
 [ 3.74817163e-01 -9.27097678e-01 -1.41696935e-03  2.00000000e+03]
 [ 6.10155761e-01 -5.36033213e-01  5.83419502e-01  2.00500000e+03]
 [ 9.50321257e-01 -1.07752405e-01  2.92025506e-01  2.00000000e+03]
 [-8.09516788e-01  2.45182857e-01 -5.33449113e-01  1.99500000e+03]
 [ 9.56617057e-01 -2.84275055e-01  6.38084337e-02  2.00000000e+03]
 [ 4.44206297e-01  6.86384022e-01  5.75810552e-01  2.00500000e+03]
 [ 9.54805911e-01  1.77383393e-01  2.38497064e-01  2.00000000e+03]
 [ 6.78448856e-01  4.65783864e-01 -5.68113148e-01  1.99500000e+03]
 [-1.22661926e-01 -5.87784886e-01  7.99664319e-01  2.00500000e+03]
 [ 4.10763621e-01 -1.32623598e-01 -9.02044475e-01  1.99000000e+03]
 [-9.28761899e-01 -1.47203013e-01  3.40195030e-01  2.00500000e+03]
 [-4.32663143e-01  4.41911817e-01 -7.85822213e-01  1.99500000e+03]
 [-1.73938900e-01 -7.22244203e-01 -6.69409096e-01  1.99500000e+03]
 [ 3.93927962e-01  4.76803184e-01  7.85798609e-01  2.00500000e+03]
 [ 2.16246262e-01 -8.66339266e-01  4.50215280e-01  2.00500000e+03]
 [ 5.89937627e-01 -5.72140634e-01 -5.69761992e-01  1.99500000e+03]
 [ 3.72014821e-01  9.28178847e-01 -9.43302456e-03  2.00000000e+03]
 [ 8.53485584e-01 -5.17996848e-01 -5.69352657e-02  2.00000000e+03]
 [ 2.51451343e-01  9.32365298e-01  2.59744376e-01  2.00000000e+03]
 [ 6.35405779e-02 -5.95680475e-01 -8.00704300e-01  1.99500000e+03]
 [-7.51151621e-01 -6.28546119e-01 -2.01744899e-01  2.00000000e+03]
 [-1.09398387e-01  9.92227495e-01 -5.93008772e-02  2.00000000e+03]
 [-5.18000782e-01  3.63550693e-01  7.74277806e-01  2.00500000e+03]
 [-8.37094843e-01 -4.31998879e-01  3.35632533e-01  2.00500000e+03]
 [ 7.27129698e-01  1.49743423e-01 -6.69969618e-01  1.99500000e+03]
 [-5.17419696e-01 -7.40754366e-01  4.28438783e-01  2.00500000e+03]
 [ 7.27875590e-01  4.97373462e-01  4.72034663e-01  2.00500000e+03]
 [-3.26542199e-01  7.36291781e-02 -9.42310452e-01  1.99000000e+03]
 [-6.14776909e-02 -8.78977180e-01 -4.72884387e-01  1.99500000e+03]
 [-7.73990631e-01  5.15239596e-01 -3.68057936e-01  1.99500000e+03]
 [-3.58383171e-03  1.07271522e-01  9.94223297e-01  2.01000000e+03]
 [-1.60011083e-01  5.34122169e-01 -8.30126464e-01  1.99000000e+03]
 [ 9.88182724e-01  1.43299058e-01 -5.44080175e-02  2.00000000e+03]
 [-9.05943990e-01 -4.18793827e-01 -6.22674413e-02  2.00000000e+03]
 [ 2.03286484e-01  1.53597519e-01 -9.66996610e-01  1.99000000e+03]
 [ 4.36532289e-01  6.08863235e-01 -6.62363291e-01  1.99500000e+03]
 [ 6.82894647e-01 -7.17134774e-01 -1.39185578e-01  2.00000000e+03]
 [ 7.86479354e-01 -4.98139858e-01 -3.65112156e-01  1.99500000e+03]
 [-4.65987802e-01  8.39519799e-01  2.79395491e-01  2.00000000e+03]
 [ 7.21606970e-01 -1.46813065e-01 -6.76556945e-01  1.99500000e+03]
 [ 5.12317955e-01  8.17944586e-01  2.61719227e-01  2.00000000e+03]
 [ 3.80590975e-01 -6.98121727e-01  6.06445849e-01  2.00500000e+03]
 [ 3.18718821e-01 -6.50580406e-01 -6.89320982e-01  1.99500000e+03]
 [ 4.12478864e-01  2.02624902e-01  8.88146579e-01  2.00500000e+03]
 [ 1.32435739e-01  9.71343756e-01 -1.97362855e-01  1.99500000e+03]
 [-9.72415447e-01  1.24751076e-01  1.97092354e-01  2.00000000e+03]
 [-2.45995522e-01 -8.56899619e-01  4.53000307e-01  2.00500000e+03]
 [ 1.21794799e-02  7.88615406e-01 -6.14766181e-01  1.99500000e+03]
 [-8.40300739e-01 -2.33427882e-01 -4.89291459e-01  1.99500000e+03]
 [ 4.91548866e-01  1.66130796e-01 -8.54856849e-01  1.99000000e+03]
 [ 6.91248178e-01 -6.99574709e-01  1.81028187e-01  2.00000000e+03]
 [-5.76671958e-01 -8.02036345e-01  1.55522212e-01  2.00000000e+03]
 [-7.08378106e-02  2.88473666e-01 -9.54863846e-01  1.99000000e+03]
 [-1.85595229e-01  3.32799673e-01  9.24553275e-01  2.01000000e+03]]

Later, we’ll refer to this array as the gradient table.

It consists of one row per diffusion-weighted image, with each row consisting of 4 values corresponding to [ R A S+ b ].

[ R A S+ ] are the components of the gradient direction. Note that the directions have been re-oriented with respect to world space coordinates. For more information on this, refer to the Affine section in The extra mile.

The last column, b, reflects the timing and strength of the gradients in units of s/mm².

To get a better sense of which gradient directions were sampled, let’s plot them!

dmri_dataset.plot_gradients();
../_images/8d6b247e26b2bfebc80d5b824813532419103e3785c59f168b7768873b2ee18e.png

We’ve projected all of the gradient directions onto the surface of a sphere, with each unique gradient strength colour-coded. Darkest hues correspond to the lowest b-values and brighter to the highest.

The LOGO (leave-one-gradient-out) splitter#

One final behavior that will make our endeavor easier in the long run is a convenience method for data splitting. In particular, we are implementing some sort of cross-validation scheme where we will iterate over different data splits. In this case, the splitting strategy is a simple leave-one-out. Because one “datapoint” in our DWI dataset corresponds to one gradient, we will refer to this partitioning of the dataset as leave-one-gradient-out (LOGO):

def logo_split(self, index, with_b0=False):
    """
    Produce one fold of LOGO (leave-one-gradient-out).

    Parameters
    ----------
    index : :obj:`int`
        Index of the DWI orientation to be left out in this fold.
    with_b0 : :obj:`bool`
        Insert the *b=0* reference at the beginning of the training dataset.

    Return
    ------
    (train_data, train_gradients) : :obj:`tuple`
        Training DWI and corresponding gradients.
        Training data/gradients come **from the updated dataset**.
    (test_data, test_gradients) :obj:`tuple`
        Test 3D map (one DWI orientation) and corresponding b-vector/value.
        The test data/gradient come **from the original dataset**.

    """
    dwframe = self.dataobj[..., index]
    bframe = self.gradients[..., index]

    # if the size of the mask does not match data, cache is stale
    mask = np.zeros(len(self), dtype=bool)
    mask[index] = True

    train_data = self.dataobj[..., ~mask]
    train_gradients = self.gradients[..., ~mask]

    if with_b0:
        train_data = np.concatenate(
            (np.asanyarray(self.bzero)[..., np.newaxis], train_data),
            axis=-1,
        )
        b0vec = np.zeros((4, 1))
        b0vec[0, 0] = 1
        train_gradients = np.concatenate(
            (b0vec, train_gradients),
            axis=-1,
        )

    return (
        (train_data, train_gradients),
        (dwframe, bframe),
    )

This function is contained in the DWI class shown earlier and will allow us to easily partition the dataset as follows:

from eddymotion.data.splitting import lovo_split as logo_split
from eddymotion.viz import plot_dwi

data_train, data_test = logo_split(dmri_dataset, 10)
plot_dwi(np.squeeze(data_test[0]), dmri_dataset.affine, gradient=data_test[1]);
../_images/9ab3f12b0129d7fc6ffbaada7ea931a1b0bb41fc89a09e65519a04e1548397aa.png

data_train is a tuple containing all diffusion-weighted volumes and the corresponding gradient table, excluding the left-out, which is stored in data_test (the 11th gradient indexed by 10, in this example). data_test[0] contains the held-out diffusion-weighted volume and data_test[1], the corresponding gradient table.

Exercise

Try printing the shapes of elements in the data_train tuple.

Solution

Hide code cell content
print(f"data_train[0] is the DW maps dataset and has {data_train[0].shape} dimensions")
print(f"data_train[1] is a gradient table and has {data_train[1].shape} dimensions")
data_train[0] is the DW maps dataset and has (118, 118, 78, 101) dimensions
data_train[1] is a gradient table and has (4, 101) dimensions

Exercise

Likewise for the left-out gradient, try printing the shapes of elements in the data_test tuple.

Solution

Hide code cell content
print(f"data_test[0] is left-out DW map and has {data_test[0].shape} dimensions")
print(f"data_test[1] is the corresponding DW gradient and has {data_test[1].shape} dimensions")
data_test[0] is left-out DW map and has (118, 118, 78, 1) dimensions
data_test[1] is the corresponding DW gradient and has (4, 1) dimensions

Next steps: diffusion modeling#

By modeling the diffusion signal, the acquired images can provide measurements which are related to the microscopic changes and estimate white matter trajectories.