NeuroImaging PREProcessing toolS (NiPreps)#

Pre-processing MRI data is a necessary task before doing any kind of data analysis.

Different kinds of artifacts can occur during a scan due to:

  • the subject

    • head motion

    • breathing, heart beating, blood vessels

    • metal items

  • scanner hardware limitations

    • distortions due to B0 and B1 inhomogeneities

    • eddy currents

    • signal drift

  • image reconstruction

    • Gibbs ringing

These physiological and acquisition artifacts can lower the accuracy, precision, and robustness of our analyses, and confound the interpretation of the results. Thus, pre-processing is necessary to minimize their influence and to promote more sensitive analyses.

Pre-processing can also help prepare the data for analysis in other ways. Some examples include:

  • image registration between acquisitions (e.g., sessions, runs, modalities, etc.)

  • image registration to standard spaces

  • identifying spurious sources of signal

  • automated segmentation (e.g., brain masking, tissue classification)
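To make the last item concrete, the sketch below illustrates the basic idea behind automated brain masking: binarize a volume by intensity and keep the largest connected component. This is a deliberately simplified toy on synthetic data, not the algorithm any NiPreps tool actually uses (fMRIPrep, for instance, delegates masking to established tools such as ANTs).

```python
# Toy illustration of automated brain masking on a synthetic volume:
# threshold by intensity, then keep only the largest connected blob.
import numpy as np
from scipy import ndimage

def compute_mask(volume, threshold=0.5):
    """Binarize by intensity, then retain the largest connected component."""
    binary = volume > threshold
    labels, n_labels = ndimage.label(binary)
    if n_labels == 0:
        return binary
    # Count voxels per label; label 0 is background, so exclude it.
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    return labels == sizes.argmax()

rng = np.random.default_rng(0)
vol = rng.random((32, 32, 32)) * 0.3   # dim background noise
vol[8:24, 8:24, 8:24] += 0.7           # bright "brain" cube
mask = compute_mask(vol)
print(mask.sum())                      # number of voxels inside the mask
```

Real brain extraction must cope with intensity inhomogeneity and anatomy, which is why the pipelines rely on well-validated external tools for this step.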

The problem of methodological variability#

Generally, researchers create ad-hoc pre-processing workflows for each dataset, building upon a large inventory of available tools. The complexity of these workflows has snowballed with rapid advances in acquisition parameters and processing steps.

In Botvinik-Nezer et al., 2020 [1], 70 independent teams were tasked with analyzing the same fMRI dataset and testing 9 hypotheses. The study demonstrated substantial variability in analytic approaches: no two teams chose identical workflows. One encouraging finding was that 48% of teams chose to pre-process the data using fMRIPrep [2], a standardized pipeline for fMRI data.

A similar predicament exists in the field of dMRI analysis. Considerable effort has gone into comparing the influence of various pre-processing steps on tractography and structural connectivity [3] [4] and into harmonizing different datasets [5].

Differences in methods or parameters chosen [6] [7], implementations across software [8], and even operating systems or software versions [9] [10] all contribute to variability.

Doing reproducible neuroimaging research is hard. All of this points to a need for creating standardized pipelines for pre-processing MRI data that will reduce methodological variability and enable comparisons between different datasets and downstream analysis decisions.

Augmenting the scanner to produce “analysis grade” data#

[Figure: ../_images/sashimi.jpg]

NiPreps are a collection of tools that work as an extension of the scanner to produce “analysis-grade” data. By analysis-grade we mean something like sushi-grade fish: NiPreps produce minimally preprocessed data that are nonetheless safe to consume (meaning, ready for modeling and statistical analysis). From the reversed perspective, NiPreps are designed to be agnostic to downstream analysis: they are carefully designed not to limit the potential analyses that can be performed on the preprocessed data. For instance, because spatial smoothing is a processing step tightly linked to the assumptions of your statistical model, fMRIPrep does not perform any spatial smoothing.

Below is a depiction of the projects currently maintained by the NiPreps community. These tools arose out of the need to extend fMRIPrep to new imaging modalities and populations.

They can be organized into 3 layers:

  • Software infrastructure: deliver low-level interfaces and utilities

  • Middleware: contain functions that generalize across the end-user tools

  • End-user tools: perform pre-processing or quality control

[Figure: ../_images/nipreps-chart.png — projects maintained by the NiPreps community]

NiPreps driving principles#

NiPreps are driven by three main principles, which are summarized below. These principles distill some design and organizational foundations.

1. Robust with very diverse data#

NiPreps are meant to be robust to different datasets and attempt to provide the best possible results independent of scanner manufacturer, acquisition parameters, or the presence of additional correction scans (such as field maps). The end-user tools impose only a single constraint on the input dataset: compliance with the Brain Imaging Data Structure (BIDS) [11]. BIDS enables consistency in how neuroimaging data are structured and ensures that the necessary metadata are complete. This minimizes human intervention in running the pipelines, as they are able to adapt to the unique features of the input data and decide whether a particular processing step is appropriate.
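The following sketch shows, in plain Python, what BIDS compliance buys a pipeline: entities are encoded in file names and metadata lives in JSON sidecars, so a tool can discover its inputs and their parameters without manual configuration. The file names and the entity-parsing regex are illustrative only; real datasets are checked with the BIDS Validator, and real tools use libraries such as PyBIDS.

```python
# Build a toy BIDS-like dataset and read back entities and metadata.
import json
import re
import tempfile
from pathlib import Path

# Simplified entity pattern (illustrative, not the full BIDS grammar).
ENTITY_RE = re.compile(r"(sub|ses|task|run)-([a-zA-Z0-9]+)")

def parse_entities(filename):
    """Extract key-value entities from a BIDS-style file name."""
    return dict(ENTITY_RE.findall(filename))

# One subject, one BOLD run, one JSON sidecar with acquisition metadata.
root = Path(tempfile.mkdtemp())
func = root / "sub-01" / "func"
func.mkdir(parents=True)
bold = func / "sub-01_task-rest_run-1_bold.nii.gz"
bold.touch()
sidecar = func / "sub-01_task-rest_run-1_bold.json"
sidecar.write_text(json.dumps({"RepetitionTime": 2.0}))

entities = parse_entities(bold.name)
metadata = json.loads(sidecar.read_text())
print(entities, metadata["RepetitionTime"])
```

Because the structure and metadata are predictable, a pipeline can, for example, detect whether field maps are present and enable or skip distortion correction accordingly.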

The scope of these tools is strictly limited to pre-processing tasks. This eases the burden of maintaining these tools but also helps focus on standardizing each processing step and reducing the amount of methodological variability. NiPreps only support BIDS-Derivatives as output.

NiPreps also aim to be robust in their codebase. The pipelines are modular, rely on widely-used tools such as AFNI, ANTs, FreeSurfer, FSL, Nilearn, or DIPY, and are extensible via plug-ins. This modularity in the code allows each step of the pipeline to be thoroughly tested.

2. Easy to use#

NiPreps are packaged as fully-compliant BIDS Apps [12]. All of the software is containerized, and the pipelines all share a common command-line interface:

<pipeline_name> <bids_dir> <output_dir> <analysis_level> [--options]

Thanks to limiting the input dataset to BIDS, manual parameter input is reduced to a minimum, allowing the pipelines to run in an automated fashion.
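The shared interface can be sketched with `argparse`: two positional paths, an analysis level, and tool-specific options. The option shown (`--participant-label`) exists in fMRIPrep, but the exact flag set varies by tool; treat this as an approximation of the pattern, not any tool's actual parser.

```python
# Sketch of the common BIDS-Apps command-line interface.
import argparse

parser = argparse.ArgumentParser(prog="fmriprep")
parser.add_argument("bids_dir", help="root of the BIDS input dataset")
parser.add_argument("output_dir", help="where derivatives are written")
parser.add_argument("analysis_level", choices=["participant", "group"],
                    help="BIDS-Apps processing level")
parser.add_argument("--participant-label", nargs="+",
                    help="subset of subjects to process")

# Simulate an invocation instead of reading sys.argv.
args = parser.parse_args(
    ["/data/bids", "/data/derivatives", "participant",
     "--participant-label", "01", "02"]
)
print(args.analysis_level, args.participant_label)
```

Because every pipeline accepts the same positional arguments, switching from one NiPreps tool to another mostly means changing the executable name.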

3. “Glass box” philosophy#

NiPreps are thoroughly and transparently documented (including the generation of individual subject-level visual reports with a consistent format that serve as scaffolds for understanding the quality of each pre-processing step and any design decisions). Below is an example report:

[Figure: ../_images/dwi_reportlet.gif — example subject-level visual report]

NiPreps are also community-driven. The success of these tools has largely been driven by their strong uptake in the neuroimaging community. This has allowed them to be exercised on diverse datasets and has brought the interest of a variety of domain experts to contribute their knowledge towards improving the tools. The tools are “open source” and all of the code and ideas are visible on GitHub.

References#