Git-Annex and DataLad within containers
Apps may be able to identify if the input dataset is handled with DataLad or Git-Annex, and pull down linked data that has not been fetched yet. One example of one such application is MRIQC, and all the examples on this documentation page will refer to it.
Summary
Executing BIDS-Apps leveraging DataLad-controlled datasets within containers can be tricky. In particular, one of our general recommendations involves mounting or binding folders into the container in read-only mode, which will disallow DataLad from writing to the dataset tree. Similarly, and depending on the specific runtime settings of the container framework, DataLad may encounter issues with file ownership too. This section guides users through ensuring smooth execution of BIDS-Apps on DataLad/Git-annex-managed datasets.
DataLad and Docker¶
When executing MRIQC within Docker on a DataLad dataset (for instance, installed from OpenNeuro), we will need to ensure the following settings are observed:
- the user id (uid) who installed the DataLad dataset must match the uid who is executing MRIQC within the container runtime
- the uid who is executing MRIQC within the container must have sufficient permissions to write in the tree.
Setting execution uid¶
If the uid is not correct, we will likely encounter the following error:
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.
To add an exception for this directory, call:
git config --global --add safe.directory /data
git-annex: automatic initialization failed due to above problems']
Confusingly, following the suggestion from DataLad directly on the host
(git config --global --add safe.directory /data
) will not work in this
case, because this line must be executed within the container.
Instead, we can override the default user executing within the container
(which is root
, or uid = 0).
This can be achieved with
Docker's -u
/--user
option:
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
We can combine this option with Bash's id
command to ensure the current user's uid and group id (gid) are being set.
Let's update the last example in the previous
Docker execution section:
$ docker run -ti --rm \
-v $HOME/ds002785:/data:ro \
-v $HOME/ds002785/derivatives:/out \
-v $HOME/tmp/ds002785-workdir:/work \
-u $(id -u):$(id -g) \ # set execution uid:gid
nipreps/mriqc:<latest-version> \
\
/data /out/mriqc-<latest-version> \
participant \
-w /work
The above command line will ensure MRIQC to be executed with the current uid and gid, which will match the filesystem's permissions if the dataset was installed with the same user.
Match uid and gid with those corresponding to the user who installed the dataset
When different users are to install the dataset and
execute the application, Docker must be executed with the
uid and gid corresponding to the user who installed the dataset.
The uid corresponding to a given username (for instance janedoe
)
can be obtained as follows:
getent passwd "janedoe" | cut -f 3 -d ":"
and her gid:
getent passwd "janedoe" | cut -f 4 -d ":"
Mounting the dataset folder without read-only permissions¶
If the dataset is protected with read-only permissions, then MRIQC
will hit the following error
(see nipreps/mriqc#1363
):
get(error): sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz (file) [git-annex: .git/annex/tmp: createDirectory: permission denied (Read-only file system)]
action summary:
get (error: 1)
Traceback (most recent call last):
File "/opt/conda/bin/mriqc", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/run.py", line 43, in main
parse_args(argv)
File "/opt/conda/lib/python3.11/site-packages/mriqc/cli/parser.py", line 658, in parse_args
initialize_meta_and_data()
File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 447, in initialize_meta_and_data
_datalad_get(dataset)
File "/opt/conda/lib/python3.11/site-packages/mriqc/utils/misc.py", line 282, in _datalad_get
return get(
^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 773, in eval_func
return return_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad/interface/base.py", line 763, in return_func
results = list(results)
^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/datalad_next/patches/interface_utils.py", line 287, in _execute_command_
raise IncompleteResultsError(
datalad.support.exceptions.IncompleteResultsError: Command did not complete successfully. 1 failed:
[{'action': 'get',
'annexkey': 'MD5E-s76037251--344f061a3165c71e36b98ad1649c3c8c.nii.gz',
'error_message': 'git-annex: .git/annex/tmp: createDirectory: permission '
'denied (Read-only file system)',
'path': '/data/sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz',
'refds': '/data',
'status': 'error',
'type': 'file'}]
This error indicates that the container is executed with the appropriate uid and gid pair. In this case, we will need to ensure DataLad can write to the dataset installation when obtaining new data. This is easily achieved by removing the read-only parameters of the mount option:
$ docker run -ti --rm \
-v $HOME/ds002785:/data \ # mount data WITHOUT :ro
-v $HOME/ds002785/derivatives:/out \
-v $HOME/tmp/ds002785-workdir:/work \
-u $(id -u):$(id -g) \ # set execution uid:gid
nipreps/mriqc:<latest-version> \
\
/data /out/mriqc-<latest-version> \
participant \
-w /work
DataLad and Singularity/Apptainer¶
In the case of Singularity and Apptainer, ensuring the uid that executes the container involves using user namespace mappings. Therefore, you will need to contact your system administrator to figure out a convenient solution to the problem.
Since most of Singularity/Apptainer deployments automatically bind
the user's $HOME
directory, DataLad's suggested direction may
work:
git config --global --add safe.directory <path-to-dataset-in-host>
Allowing the container to write on the dataset's tree is straightforward
and homologous to Docker, by removing the :ro
setting in the binding
option (-B
).