
Adding sourcedata filename to a column in the scans.tsv file #905

Closed
adam2392 opened this issue Oct 20, 2021 · 10 comments

@adam2392
Member

Problem

I've been converting sourcedata files to BIDS to i) speed up my analysis workflows and ii) speed up dataset sharing. However, new datasets often keep coming in, and you may want to check whether a file you uploaded under some filename (e.g., subject001_eeg_001.edf) has already been converted.

Moreover, many of my collaborators (e.g., clinicians) only remember their original file naming scheme, not the organized BIDS filenames. Unless there is a backwards trace of which BIDS file corresponds to which source file, this leads to a lot of back and forth about which file is which, and right now there is no easy way to check this.

Suggestion

My proposal is to add a SHOULD recommendation to scans.tsv suggesting that users add an original_filename column containing the filename of the source file. This way, one can easily backtrack what was converted. To be honest, I think it should be a MUST, unless the source filename might embed some sort of PHI (protected health information)?
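A minimal Python sketch of the proposal, assuming a session-less dataset where scans.tsv paths are relative to the subject directory. original_filename is the column name proposed here (it is not defined by BIDS), and the EEG filename and the helper names write_scans_tsv and bids_name_for are hypothetical. It writes one row and then backtracks from the source filename to the BIDS file.

```python
import csv
import io

# Column proposed in this issue; it is NOT defined by the BIDS specification.
EXTRA_COLUMN = "original_filename"


def write_scans_tsv(pairs, stream):
    """Write scans.tsv rows as (BIDS filename, original source filename)."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    # "filename" is the standard scans.tsv column; EXTRA_COLUMN is the proposal.
    writer.writerow(["filename", EXTRA_COLUMN])
    for bids_name, source_name in pairs:
        writer.writerow([bids_name, source_name])


def bids_name_for(stream, source_name):
    """Backtrack: return the BIDS filename converted from source_name, or None."""
    for row in csv.DictReader(stream, delimiter="\t"):
        if row.get(EXTRA_COLUMN) == source_name:
            return row["filename"]
    return None


buf = io.StringIO()
write_scans_tsv([("eeg/sub-001_task-rest_eeg.edf", "subject001_eeg_001.edf")], buf)
buf.seek(0)  # rewind so the same buffer can be read back
print(bids_name_for(buf, "subject001_eeg_001.edf"))
```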

@effigies
Collaborator

effigies commented Oct 20, 2021

We definitely can't make this a MUST, and for PHI reasons (and because scans.tsv is itself optional) I think SHOULD is too strong. This does seem okay to add as a MAY. An alternative could be to promote the derivatives Sources metadata to raw files as well.

@adam2392
Member Author

You mean https://bids-specification.readthedocs.io/en/stable/05-derivatives/02-common-data-types.html here?

I suppose that serves the same purpose as adding original_filename in scans.tsv. However, where would these go? Would they go in the sidecar JSON?

I'm okay with either option as long as it's specified in BIDS; then we can support it in mne-bids.

@tsalo
Member

tsalo commented Oct 20, 2021

+1 to using Sources in raw datasets. It fits in with other applications of derivatives rules to raw datasets, like #440.

@effigies
Collaborator

> You mean https://bids-specification.readthedocs.io/en/stable/05-derivatives/02-common-data-types.html here?

Yes.

> I suppose that serves the same purpose as adding original_filename in scans.tsv. However, where would these go? Would they go in the sidecar JSON?

Yes.
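A minimal sketch of what a raw EEG sidecar carrying Sources might look like, and how a tool could search it to answer "which BIDS file came from this source file?". Using Sources in raw data is only what is proposed in this thread; the path format (relative to the dataset root, pointing into sourcedata/) and the helper names make_sidecar and find_bids_files are assumptions for illustration.

```python
import json


def make_sidecar(task_name, source_paths):
    """Build a minimal EEG sidecar dict with a Sources list.

    Assumption: Sources in a raw sidecar is the proposal discussed here,
    and the values are paths relative to the dataset root.
    """
    return {"TaskName": task_name, "Sources": list(source_paths)}


def find_bids_files(sidecars, source_name):
    """Return sidecar paths whose Sources entries end with source_name."""
    hits = []
    for bids_path, text in sidecars.items():
        meta = json.loads(text)
        for src in meta.get("Sources", []):
            # Compare only the final path component (the original filename).
            if src.split("/")[-1] == source_name:
                hits.append(bids_path)
                break
    return sorted(hits)


# In-memory stand-in for sidecar files on disk (hypothetical content).
sidecars = {
    "sub-001/eeg/sub-001_task-rest_eeg.json": json.dumps(
        make_sidecar("rest", ["sourcedata/subject001_eeg_001.edf"]), indent=2
    ),
}
print(find_bids_files(sidecars, "subject001_eeg_001.edf"))
```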


Another approach to this could just be a table in sourcedata/ or code/ with source/destination columns. It could serve as a log or an input to a tool that performs the conversions.
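Such a table can be sketched as a tab-separated log with source and destination columns. The log content, its location, and the helpers read_log and pending are hypothetical; nothing here is defined by BIDS. The sketch shows both uses: looking up where a source file went, and listing source files that still need converting.

```python
import csv
import io

# Hypothetical log, e.g. stored somewhere under sourcedata/ or code/,
# with the source/destination columns suggested above (not a BIDS file).
LOG_TSV = (
    "source\tdestination\n"
    "subject001_eeg_001.edf\tsub-001/eeg/sub-001_task-rest_eeg.edf\n"
)


def read_log(stream):
    """Return {source: destination} from a tab-separated conversion log."""
    reader = csv.DictReader(stream, delimiter="\t")
    return {row["source"]: row["destination"] for row in reader}


def pending(source_files, log):
    """Return source files that have no recorded BIDS destination yet."""
    return sorted(f for f in source_files if f not in log)


log = read_log(io.StringIO(LOG_TSV))
print(log["subject001_eeg_001.edf"])
print(pending(["subject001_eeg_001.edf", "subject002_eeg_001.edf"], log))
```

Because the mapping lives in one standalone file rather than in every sidecar, it can be deleted in one step before the dataset is shared.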

@sappelhoff
Member

Also +1 to make Sources available for Raw.

@adam2392
Member Author

Should this just go in the sidecar JSON section for each of MEG, EEG, and iEEG?

@Remi-Gau
Collaborator

> Also +1 to make Sources available for Raw.

Same for me

@guiomar
Collaborator

guiomar commented Oct 21, 2021

I would be careful about including original filenames inside the BIDS dataset, since they often contain sensitive data (e.g., surnames, real dates, diseases). This is not essential information for understanding the dataset itself; it is lab-management logistics. I would lean more towards a log outside the dataset proper (e.g., in /sourcedata) that one can easily delete before the dataset is shared. A field inside a JSON or TSV file might be more difficult to delete.

@Remi-Gau
Collaborator

Yup, we raised that concern in the PR: #906 (comment)

Though technically nothing in BIDS prevents one from naming a file sub-JohnDoe_T1w.nii, I see your point.

@sappelhoff
Member

Closed, because we discussed in #906 that implementing this on the tooling side would suffice; see mne-tools/mne-bids#890.


6 participants