BEP for audio/video capture of behaving subjects #1771
cc @yarikoptic who is providing guidance on this concept.
An alternative idea is to name the files "_video.mp4|avi|mkv|..." and "_audio.mp3|wav|...". The advantage is that it may be clearer what these files are. The disadvantages are that it does not make clear that this is a recording of the subject as opposed to a stimulus, and that it is not clear what to do if you have a combined audio/video recording.
Another alternative is to call the files "_beh.mp3|.wav|.mp4|.mkv|.avi|...", though this conflicts with the current beh modality. If there is a beh.tsv file in the beh/ directory, then it will have an accompanying beh.json file, which would conflict with the json file that corresponds to the data file (e.g. beh.mp3).
Some of this may already be covered by the BIDS support for motion data; also look at the eyetracking BEP (PR and HTML).
tagging @gdevenyi who I think mentioned wanting to work on something like this last time I saw him.
The ideas for allowing annotations of movies and audio as expressed in issue #153 could be expanded to allow annotation of participant video/audio, but in the imaging directories themselves, with an appropriate file structure to distinguish them.
I like how those different initiatives are syncing up. Wouldn't those annotations of videos using HED, made when experimenters "code" their video, be more appropriate as a derivative, though?
Not necessarily... In one group I worked with on experiments on stuttering, the speech pathologist's annotations were definitely considered part of the original data. Most markers that you see in typical event files didn't come from the imaging equipment but are extracted from the control software or external devices. Eye trackers have algorithms to mark saccades and blinks, and these are written as original data. In my mind, if the annotations pertain to data that has been "calculated" from the original experimental data, they should go into the derivatives folder. Annotations pertaining to data acquired during the experiment itself should probably go in the main folder.
I see. I was more thinking of cases where videos of animal behavior have to be annotated to code when certain behaviors happened. Given this is not automated and can happen a long time after data acquisition, I would have seen this as derivatives. But your examples show that the answer, as in many cases, will be "it depends".
We have potential animal applications in both domains:
also I guess a:
Would non-contiguous recordings (using the same setup) end up in the same or distinct files? As an example, there could be cases where video recording has been stopped while taking care of a crying baby and resumed later on. Should BIDS try to enforce anything here, or leave it to end users (and data providers)? What about other types of "time-series" data? Not sure about MEG, for EEG I know the EDF+ format allows discontinuous recordings:
@DimitriPapadopoulos I believe this would be different runs. You would specify the start time of each run in the scans file.
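For example (filenames and timestamps hypothetical, not from any dataset), a scans.tsv could record the onset of each resumed run using the existing acq_time column:

```tsv
filename	acq_time
beh/sub-01_task-play_run-01_beh.mp4	2024-05-01T10:00:00
beh/sub-01_task-play_run-02_beh.mp4	2024-05-01T10:23:41
```

The gap between the two acq_time values then encodes the pause in the recording, without requiring a discontinuity-aware container format.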
I think there might be multiple scenarios (entities) for how it could be handled:
From the annotation perspective in #153, an
Any of them seems great; I currently suggested
This is similar to having a stimulus with multiple tracks (left or right video streams, multiple audio channels, or separate video and audio), but they are not
Also, as @bendichter mentioned, this proposal will very soon find its audience in human neuroscience, especially as DeepLabCut adds subject-masking capabilities and newer modalities such as LiDAR and wifi motion capture come into play. It might be useful to have the Motion-BIDS maintainers' (@sjeung and @JuliusWelzel) opinions as well.
How do we feel about this naming convention?
I'm not 100% on it myself but I can't think of anything better. Other options:
Is there any precedent from other standards we could use here?
Technically, mkv is a container format; it could hold different kinds of video/audio streams. Should we specify non-patent-encumbered video compression formats?
@bendichter it would be good to have your input on the proposed entities here. A specific point of discussion is how open the description of the proposed -annot entity would be: for stimuli only, or also for other types of annotations as discussed above?
@dorahermes I like the idea of a general text annotation file that annotates a media file, and I think that could certainly be relevant downstream of these behavioral capture files. I think the needs of stimulus storage and behavioral capture storage are different. With stimuli, you often have a single file that you play many times across different subjects, sessions, and trials, so it makes sense to have a root folder where they can be referenced repeatedly. With behavioral captures, every capture is unique, so it makes more sense to store them alongside other types of recordings. So I like what is going on with stimuli, but I don't want that to engulf these ideas about how to represent behavioral capture. I am also trying to keep this to an MVP, so I'd like to push off discussion of annotations, though I will say I think the general approach you link to will probably work for behcapture as well with minimal adjustments.
The most likely culprit here would be H.264, which is used in mpeg files; however, it seems that would be a non-issue, since this would be covered under the "Free Internet Broadcasting" consideration (source).
FWIW, I also think that we should have "audio", "video" in the suffix (ref elsewhere), but I do not think we should collapse an "intent" (beh) into it, moreover since we do have datatype
I think there is a good amount of overlap (datatypes, extensions) with "stimuli" BEP044. @bendichter when you get a chance, have a look at that BEP google doc.
@satra pointed to https://docs.b2ai-voice.org/ where BIDS-inspired organization was also used and looked like
Very cool initiative, I think we really need support for such modalities in BIDS, to keep up with recent trends in neuroscience. Regarding @bendichter's question on naming:
How about using
Welcome to the thread, @niksirbi! This is a good time to start pushing on this again. I like _videocapture / _audiocapture! I like "capture" over "beh", as it is a bit more broad: we are not necessarily capturing "behavior." Maybe there would be situations where we want to capture video or audio that isn't a result of behavior but also isn't a stimulus? Then maybe we could have some kind of metadata categories that give a rough idea of what we are capturing. Maybe HED could help with this? But is it important for us to differentiate between videos with and without audio streams in the filenames? How would we do that?
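To make the suffix idea concrete, here is an illustrative sketch of how such filenames could be parsed into entities. The _videocapture / _audiocapture suffixes are proposals under discussion, not finalized BIDS terms, and the parser itself is hypothetical:

```python
import re

# Illustrative BIDS-style filename pattern for the proposed capture suffixes.
# Entity order follows BIDS conventions (sub, ses, task, run).
PATTERN = re.compile(
    r"sub-(?P<sub>[0-9A-Za-z]+)"
    r"(?:_ses-(?P<ses>[0-9A-Za-z]+))?"
    r"_task-(?P<task>[0-9A-Za-z]+)"
    r"(?:_run-(?P<run>[0-9]+))?"
    r"_(?P<suffix>videocapture|audiocapture)"
    r"\.(?P<ext>mp4|mkv|avi|wav|mp3)$"
)

def parse(name):
    """Return an entity dict for a capture filename, or None if it doesn't match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

info = parse("sub-01_ses-1_task-LinearTrack_videocapture.mkv")
print(info["suffix"], info["task"])
```

A validator along these lines would also make naming collisions with stimulus files mechanically detectable, whichever suffix scheme ends up being adopted.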
Thanks for the pointer, @yarikoptic! Maybe we could take some inspiration from this. I think "voice" would not work for us; I definitely want to make this broader than speech studies. I would also consider features and transcript to be derived data, so out of scope for this initial draft, but good to think about roughly how that would look.
The naming of audio and video streams is also being discussed in PR#2022 with respect to BEP044 on stimuli. It might be useful to have some consistency in naming, if possible, as these initiatives move forward.
thanks for the pointer to BEP044, @VisLab and @yarikoptic. Things we can carry over from that:
Things we should not carry over:
IMHO, these suffixes indicate what the file is, such as being a video or audiovideo, irrespective of the purpose of the file (being a stimulus or a behavior recording).
Sorry, I did not mean to close. It appears this new GitHub web UI is a bit buggy.
For me that's not so important, but I'm definitely biased, as I primarily work with the image content of videos. In theory, we could offer a third option. EDIT: Just saw that in #2022 they propose
Are stimuli always in a directory named "stimuli"? If so, we could just use the same names:
Yes, according to the proposed specifications.
OK, in that case I would be happy to consider using the
Given this piece of information, I'm also fine with dropping
@niksirbi I'm not bothered by the length of audiovideocapture. Brevity of filenames has clearly not been a high priority for BIDS 🤣 While I agree the "stimuli" folder captures intent, the lack of a stimulus folder does not capture the intent of it not being a stimulus, unless the user is already aware of the stimulus rule. In that way, I think it falls just a bit short of the BIDS goal of being self-describing.
I kind of see your point, but we also have an additional differentiator. The BEP044 filenames start with
I'm not so sure that it would. I know I said that earlier in the thread, but I think I was confused. The filename would be something like:
That raises a different question (apologies in advance for derailing this conversation on suffixes, which is closing in on some consensus): how would we store video/audio files capturing multiple subjects at once? It's quite common to acquire videos with multiple subjects. It will be tricky to do given BIDS' subject-first approach. But perhaps this is out of scope, in which case ignore my question.
some of those derivatives files are an artifact of the past; those are no longer really there. in terms of the broader space for audio-video capture, in the BBQS consortium, where some of this is highly relevant, there are projects on dyadic interaction, navigation in the wild and in specific settings (e.g. hospital rooms and houses), multiple cameras/devices, etc., so capturing context will be as important as storing the streams.
@niksirbi, this is tricky. BIDS' file hierarchy assumes a subject -> session structure where a session has a single subject. My first thought would be to create two different session folders, one for each subject, and use a softlink to link them together. Fortunately, in DANDI (and maybe OpenNeuro?), files that are exactly identical are de-duplicated, so it would be no problem to have multiple sessions with the same video capture file, even if that file takes different names.
Something like that seems like a good compromise.
softlink/symlink is a filesystem-specific solution, thus not encouraged/used anywhere in BIDS.
That is exactly why I would also prefer to stay away from using the suffix to depict the intention/purpose of the file. So far we have mostly avoided that in BIDS, as suffixes describe content, not intention per se. E.g. someone could potentially use
Also, I dislike "capture", since it is too generic -- all data is "captured". Even movie videos from Hollywood are "captured" by video cameras. But I do confirm that we have a potential conflict/ambiguity ATM! E.g., how do we organize and name files for a session where the subject was presented with a movie stimulus particular to that subject/session? Hence -- we have two
[*] hence it does not make sense for placing into the top level
I have said that but forgot about BIDS URIs, https://bids-specification.readthedocs.io/en/stable/common-principles.html#bids-uri , so those could potentially be used I guess (ATM we have those + ad-hoc pointers like
I believe neither this BEP nor BEP044 proposes a solution for this case (stimulus files per subject in the subject/session directories). I agree that, with some of the use cases this BEP will accommodate, it is only natural to have an entity to determine the scope of the recording. An example that comes to my mind is the STRUM task/dataset, in which two subjects collaborate in a first-person game environment, with recordings from (among many data streams) EEG, eye-tracking, and videos of the participants' faces (behavior), screens (stimulus), and eye gaze (both behavior and stimulus). Sample videos are here (I removed the face-camera video because it is not anonymized).
I think I may be a bit confused. My reading of BEP044 is that all stimuli go in a stimuli directory at the root of the dataset, even if they are subject- or session-specific stimuli. If we wanted to modify it, we could allow a stimuli directory at the subject or session level. Then we wouldn't ever have a naming collision with these captured videos.
If I understand correctly, you are talking about differentiating between multiple simultaneous video recordings, right? I proposed this in the initial comment
would that handle this, or am I missing something about your example?
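As a sketch of that idea (labels hypothetical), two simultaneous cameras in one session could be distinguished with a _recording- entity, each with its own sidecar:

```text
sub-01/beh/sub-01_task-play_recording-top_beh.mp4
sub-01/beh/sub-01_task-play_recording-top_beh.json
sub-01/beh/sub-01_task-play_recording-side_beh.mp4
sub-01/beh/sub-01_task-play_recording-side_beh.json
```

This keeps the .json-to-data mapping unambiguous even when the two files share an extension.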
AFAIK, the
Yes, but as in the example, and the case @yarikoptic raised, some videos may not have any behavior in them (being stimulus presentation). It might be beneficial to have a way to differentiate them, either with the
FTR, some of the openneuro datasets with videos under per-subj folders, likely with beh recordings:

```shell
$> for ds in ds*; do find $ds/sub-* -iname *.avi -o -iname *.mp4 -o -iname *.mkv | head; done
find: ‘ds001107/sub-*’: No such file or directory
find: ‘ds003676/sub-*’: No such file or directory
ds004505/sub-06/video/sub-06_trial-01.mp4
ds004505/sub-06/video/sub-06_trial-02.mp4
ds004505/sub-06/video/sub-06_trial-03.mp4
ds004505/sub-06/video/sub-06_trial-04.mp4
ds004505/sub-06/video/sub-06_trial-05.mp4
ds004505/sub-06/video/sub-06_trial-06.mp4
ds004505/sub-06/video/sub-06_trial-07.mp4
ds004505/sub-06/video/sub-06_trial-08.mp4
ds004505/sub-06/video/sub-06_trial-09.mp4
ds004505/sub-06/video/sub-06_trial-10.mp4
ds004598/sub-01/ses-1/eeg/sub-01_ses-1_task-LinearTrack_video.avi
ds004598/sub-02/ses-1/eeg/sub-02_ses-1_task-LinearTrack_video.avi
ds004598/sub-02/ses-2/eeg/sub-02_ses-2_task-LinearTrack_video.avi
ds004598/sub-02/ses-3/eeg/sub-02_ses-3_task-LinearTrack_video.avi
ds004598/sub-03/ses-1/eeg/sub-03_ses-1_task-LinearTrack_video.avi
ds004598/sub-03/ses-2/eeg/sub-03_ses-2_task-LinearTrack_video.avi
ds004598/sub-04/ses-1/eeg/sub-04_ses-1_task-LinearTrack_video.avi
ds004598/sub-04/ses-2/eeg/sub-04_ses-2_task-LinearTrack_video.avi
ds004598/sub-05/ses-1/eeg/sub-05_ses-1_task-LinearTrack_video.avi
ds004598/sub-05/ses-2/eeg/sub-05_ses-2_task-LinearTrack_video.avi
find: ‘ds004643/sub-*’: No such file or directory
ds005127/sub-00002/ses-1/video/sub-00002_ses-1_task-sleep_run-20160531_2257.avi
ds005127/sub-00002/ses-1/video/sub-00002_ses-1_task-sleep_run-20160531_2330.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160712_2255.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160712_2350.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0013.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0142.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0222.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0310.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0338.avi
ds005127/sub-00003/ses-1/video/sub-00003_ses-1_task-sleep_run-20160713_0459.avi
find: ‘ds005443/sub-*’: No such file or directory
find: ‘ds005590/sub-*’: No such file or directory
```

Example(s) in dandi (not yet BIDS):
I would like to create a BEP to store the audio and/or video recordings of behaving subjects.
While this would obviously be problematic for sharing human data, it would be useful for internal human data, and for both internal and shared data of non-human subjects.
Following the structure of Task Events, we will define types of files that can be placed in various data_type directories.
This schema will follow the standard principles of BIDS, listed here for clarity:
beh/
We could use the _split- entity and the _recording- entity to differentiate. We will need to modify the definition of this term to generalize it a bit to accommodate this usage. This entity would also be used to differentiate if a video and audio were recorded simultaneously but from different devices. Note that simply using the file extension to differentiate would not work, because it would not be clear which file the .json maps to. Run start times would be recorded in the scans.tsv file. The JSON would define "streams", which would describe each stream in the file.
The *_beh.json would look like this:
To be specific, it would follow this JSON Schema structure:
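As a rough sketch of the "streams" idea (all field names here are illustrative assumptions, not finalized BEP terms), a sidecar for a combined audio/video file might carry per-stream metadata like this, built and serialized in Python:

```python
import json

# Hypothetical sidecar content for a file such as sub-01_task-rest_beh.mp4.
# Field names are illustrative, not from the BEP text.
sidecar = {
    "Streams": [
        {
            "Type": "Video",
            "Codec": "h264",              # codec inside the container
            "FrameRate": 30.0,            # frames per second
            "Resolution": [1920, 1080],   # width x height in pixels
        },
        {
            "Type": "Audio",
            "Codec": "aac",
            "SamplingFrequency": 48000,   # Hz
            "ChannelCount": 2,
        },
    ],
}

# Serialize as it would appear on disk next to the media file.
text = json.dumps(sidecar, indent=2)
print(sorted(s["Type"] for s in sidecar["Streams"]))
```

Keeping stream metadata in the sidecar rather than relying on the container means tools can reason about the recording without a media decoder.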
This BEP would be specifically for audio and/or video, and would not include related data like eye tracking, point tracking, pose estimation, or behavioral segmentation. All of these would be considered derived and are reserved for another BEP.