Transparency of workflows
NiPreps adopt fMRIPrep's foundations and resonate particularly with its transparency principles. As discussed in (Esteban et al., 2019, preprint):
The rapid increase in the volume and diversity of data, as well as the evolution of available techniques for processing and analysis, presents an opportunity for considerable advancement of research in neuroscience. The drawback resides in the need for progressively more complex analysis workflows that rely on decreasingly interpretable models of the data. Such context encourages ‘black-box’ solutions that efficiently perform a valuable service but do not provide insights into how the tool has transformed the data into the expected outputs. Black boxes obscure important steps in the inductive process mediating between experimental measurements and reported findings. This way of moving forward risks producing a future generation of cognitive neuroscientists who have become experts in sophisticated computational methods but have little to no working knowledge of how their data were transformed through processing. Transparency is often identified as a remedy for these problems. fMRIPrep ascribes to ‘glass-box’ principles, which are defined in opposition to the many different facets or levels at which black-box solutions are opaque. The visual reports that fMRIPrep generates are a crucial aspect of the glass-box approach. Their quality control checkpoints represent the logical flow of preprocessing, allowing scientists to critically inspect and better understand the underlying mechanisms of the workflow. A second transparency element is the citation boilerplate that formalizes all details of the workflow and provides the versions of all involved tools along with references to the corresponding scientific literature. A third asset for transparency is thorough documentation that delivers additional details on each of the building blocks represented in the visual reports and described in the boilerplate. 
Further, fMRIPrep has been open-source since its inception: users have access to all of the incremental additions to the tool through the history of the version-control system. The use of GitHub grants access to the discussions held during development, allowing one to see how and why the main design decisions were made. The modular design of fMRIPrep enhances its flexibility and improves transparency, as the main features of the software are more easily accessible to potential collaborators. In combination with some coding style and contribution guidelines, this modularity has enabled multiple contributions by peers and the creation of a rapidly growing community that would be difficult to nurture behind closed doors.
Visual reports beyond quality control
One foundational component of the NiPreps framework is the Visual Report System. End-user applications such as fMRIPrep or dMRIPrep generate individual reports after their preprocessing. Those visual reports have two fundamental purposes:
- assessing the quality of the generated outputs, so that the user can take quality control actions to eliminate biases originating from inadequate processing; and
- understanding the workflow: by sequentially presenting the main steps of processing, the reports let the user see why this particular tool took each step and, more generally, why standard preprocessing involves that step.
Citation boilerplates
NiPreps leverage the wealth of existing neuroimaging software that is available to researchers. To give back for standing on the shoulders of giants, NiPreps aim for the most thorough reporting possible, crediting every piece of prior knowledge they leverage. When a particular NiPrep is executed, the application runs introspection code to formalize the computational graph that the workflow executed, and iterates over all of its nodes to extract the articles and communications that should be cited, as well as all the software tools involved and their versions. Similarly, ancillary materials such as neuroimaging templates and atlases are reported and cited.
All these references and citations are finally collated into a natural-language description of the workflow. This description is generated automatically and contains all the details necessary to replicate the processing, as well as the above-mentioned references. The text is appended to the visual report and provided in three formats (Markdown, LaTeX, and HTML/plain text) with an index of citations, so that the user only needs to copy and paste it into the Methods section of their papers.
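The collation step described above can be illustrated with a minimal sketch. It is not the code that any NiPrep actually runs: the `Node` class, the `build_boilerplate` function, the tool names, and the placeholder references are all hypothetical, and the executed workflow is simplified to an ordered list of nodes. The sketch only shows the general idea of walking the executed steps, gathering tool versions and citations, and emitting a description together with a citation index.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one executed step of a workflow."""
    name: str
    tool: str
    version: str
    description: str
    references: Tuple[str, ...] = ()


def build_boilerplate(nodes: List[Node]) -> Tuple[str, List[str]]:
    """Collate descriptions, tool versions and citations from executed nodes.

    Returns the natural-language text and a numbered citation index.
    """
    sentences = []
    citations = []  # ordered by first appearance, without duplicates
    for node in nodes:
        for ref in node.references:
            if ref not in citations:
                citations.append(ref)
        # Citation marks point into the shared, de-duplicated index.
        marks = ", ".join(f"[{citations.index(ref) + 1}]" for ref in node.references)
        sentence = f"{node.description} using {node.tool} {node.version}"
        if marks:
            sentence += f" {marks}"
        sentences.append(sentence + ".")
    index = [f"[{i}] {ref}" for i, ref in enumerate(citations, start=1)]
    return " ".join(sentences), index


# Toy workflow with made-up tools and placeholder references (assumptions).
nodes = [
    Node("skullstrip", "ToolA", "1.0.0",
         "The T1w image was skull-stripped",
         ("Placeholder reference A",)),
    Node("register", "ToolB", "2.3",
         "The result was registered to a template",
         ("Placeholder reference B", "Placeholder reference A")),
]

text, index = build_boilerplate(nodes)
print(text)
print("\n".join(index))
```

Keeping a single de-duplicated citation list means that a reference shared by several steps appears once in the index, while each sentence still points to it by number.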
Note for reviewers and editors
The boilerplate text generated by some NiPreps is intended to allow for clear, consistent description of the preprocessing steps used, in order to improve the reproducibility of studies. We fully intend for it to be copied verbatim, and have released it under the CC0 license, dedicating it to the public domain in jurisdictions that recognize the concept, and assert that we will take no action to enforce copyright in jurisdictions where we cannot disclaim it.
We firmly believe that requiring authors to modify this passage will serve no legitimate scientific or literary purpose and can, in fact, serve only to reduce the replicability of the analysis being described by making the preprocessing steps less clear.
We recognize that there may be automated plagiarism detection software that will flag the boilerplate text. We would be happy to discuss potential solutions for annotating boilerplate sections of documents to indicate automatic generation, and can update our software to make this annotation simpler for authors.