Added a TODO to start implementation of HED support in annotations #13059
base: main
Changes from 3 commits
8f0c018
6f6ccdc
c31e837
a593841
b3183d3
4065433
5240041
40311f0
1485356
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,6 +52,7 @@ | |
verbose, | ||
warn, | ||
) | ||
from .utils.check import _soft_import | ||
|
||
# For testing windows_like_datetime, we monkeypatch "datetime" in this module. | ||
# Keep the true datetime object around for _validate_type use. | ||
|
@@ -151,6 +152,7 @@ class Annotations: | |
-------- | ||
mne.annotations_from_events | ||
mne.events_from_annotations | ||
mne.HEDAnnotations | ||
|
||
Notes | ||
----- | ||
|
@@ -288,7 +290,7 @@ def orig_time(self): | |
|
||
def __eq__(self, other): | ||
"""Compare to another Annotations instance.""" | ||
if not isinstance(other, Annotations): | ||
if not isinstance(other, type(self)): | ||
return False | ||
return ( | ||
np.array_equal(self.onset, other.onset) | ||
|
@@ -567,6 +569,8 @@ def _sort(self): | |
self.duration = self.duration[order] | ||
self.description = self.description[order] | ||
self.ch_names = self.ch_names[order] | ||
if hasattr(self, "hed_tags"): | ||
self.hed_tags = self.hed_tags[order] | ||
|
||
@verbose | ||
def crop( | ||
|
@@ -758,6 +762,140 @@ def rename(self, mapping, verbose=None): | |
return self | ||
|
||
|
||
class HEDAnnotations(Annotations): | ||
"""Annotations object for annotating segments of raw data with HED tags. | ||
|
||
Parameters | ||
---------- | ||
onset : array of float, shape (n_annotations,) | ||
The starting time of annotations in seconds after ``orig_time``. | ||
duration : array of float, shape (n_annotations,) | float | ||
Durations of the annotations in seconds. If a float, all the | ||
annotations are given the same duration. | ||
description : array of str, shape (n_annotations,) | str | ||
Array of strings containing description for each annotation. If a | ||
string, all the annotations are given the same description. To reject | ||
epochs, use description starting with keyword 'bad'. See example above. | ||
hed_tags : array of str, shape (n_annotations,) | str | ||
Array of strings containing a HED tag for each annotation. If a single string | ||
is provided, all annotations are given the same HED tag. | ||
hed_version : str | ||
The HED schema version against which to validate the HED tags. | ||
orig_time : float | str | datetime | tuple of int | None | ||
A POSIX Timestamp, datetime or a tuple containing the timestamp as the | ||
first element and microseconds as the second element. Determines the | ||
starting time of annotation acquisition. If None (default), | ||
starting time is determined from beginning of raw data acquisition. | ||
In general, ``raw.info['meas_date']`` (or None) can be used for syncing | ||
the annotations with raw data if their acquisition is started at the | ||
same time. If it is a string, it should conform to the ISO8601 format. | ||
More precisely to this '%%Y-%%m-%%d %%H:%%M:%%S.%%f' particular case of | ||
the ISO8601 format where the delimiter between date and time is ' '. | ||
%(ch_names_annot)s | ||
|
||
See Also | ||
-------- | ||
mne.Annotations | ||
|
||
Notes | ||
----- | ||
|
||
.. versionadded:: 1.10 | ||
""" | ||
|
||
def __init__( | ||
self, | ||
onset, | ||
duration, | ||
description, | ||
hed_tags, | ||
hed_version="latest", # TODO @VisLab what is a sensible default here? | ||
orig_time=None, | ||
ch_names=None, | ||
): | ||
hed = _soft_import("hed", "validation of HED tags in annotations") # noqa | ||
# TODO is some sort of initialization of the HED cache directory necessary? | ||
super().__init__( | ||
onset=onset, | ||
duration=duration, | ||
description=description, | ||
orig_time=orig_time, | ||
ch_names=ch_names, | ||
) | ||
# TODO validate the HED version the user claims to be using. | ||
When the user tries to load the schema version, the load fails if the version can't be found. This happens at the beginning of validation. For now I think that is sufficient.
so in other words, when the |
||
self.hed_version = hed_version | ||
self._update_hed_tags(hed_tags=hed_tags) | ||
|
||
def _update_hed_tags(self, hed_tags): | ||
if len(hed_tags) != len(self): | ||
raise ValueError( | ||
f"Number of HED tags ({len(hed_tags)}) must match the number of " | ||
f"annotations ({len(self)})." | ||
) | ||
# TODO insert validation of HED tags here | ||
I might need clarification here. My understanding is that this is at the stage of annotations for the continuous data before epoching. We would have the message "Number of HED strings..." since HED tags refer to the individual tags rather than the comma-separated list. Yes this is where it would occur. The validator would take strings in. If there are validation errors, how would they be reported?
The variable
The validator returns a list of dicts with the issues. There is a function to get printable strings out of this. For your use case, we would probably want to validate the entire list. The validators take a We could wrap this to raise an error. For the current situation are you validating each sublist individually or are you doing it by event?
(Answering more thoroughly now that I'm at my desk instead of my phone):
OK, we should change the variable name from
yes. Here I'm assuming that each annotated segment will have a single string (containing comma-separated tags) associated with it (i.e., it's neither necessary nor allowed to associate a list of HED strings with a single annotation). Correct me if I'm wrong about that please.
OK, so maybe something like:
    hed_results = func_that_validates_list_of_HED_strings(hed_strings)
    # or if the validator takes in single strings instead of a list, then maybe
    # hed_results = list(map(func_that_validates_one_HED_string, hed_strings))
    if any(map(len, hed_results)):
        err_strings = list(map(func_that_gets_printable_strings, hed_results))
        raise ValueError(
            "Some HED strings in your annotations failed to validate:\n - "
            + "\n - ".join(err_strings)
        )
not sure what "sublist" is here, do you mean "list of HED strings, of same length as the number of annotated events"? We can structure it so that each
Yes there will be a single string but I need clarification on what an annotated segment is. A HED annotation would ordinarily be associated with a single time marker. Assuming that you are not using the NOTE: If you have a table of events (with onset and HED annotations), you can also compute the (
For individual strings it is the
    error_handler = ErrorHandler(check_for_warnings=False)
    validator = HedValidator(schema)
    hed_obj = HedString(mystring, schema)
    issues = validator.validate(hed_obj, False, error_handler=error_handler)
    issue_str = get_printable_issue_string(issues)
If you want to validate an entire column (with header "HED"), you would create a
in MNE
I think that is something we can handle later in a separate PR, once the |
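The error-reporting idea discussed in this thread can be sketched without hedtools installed. In the sketch below, `validate_hed_string` is a hypothetical stand-in for the real validator (it only flags empty tags) and `raise_on_invalid_hed` is an illustrative name; neither is part of MNE-Python or hedtools. The point is only how per-string issue lists might be gathered into a single `ValueError`, assuming one HED string per annotation.

```python
def validate_hed_string(hed_string):
    """Hypothetical stand-in for a HED validator.

    Returns a list of issue messages (empty if the string passes). The only
    check here is for empty tags; a real validator checks against a schema.
    """
    issues = []
    for position, tag in enumerate(hed_string.split(",")):
        if not tag.strip():
            issues.append(f"empty tag at position {position} in {hed_string!r}")
    return issues


def raise_on_invalid_hed(hed_strings):
    """Validate one HED string per annotation; raise if any has issues."""
    results = [validate_hed_string(s) for s in hed_strings]
    # Flatten the per-string issue lists into one list of messages.
    messages = [msg for issues in results for msg in issues]
    if messages:
        raise ValueError(
            "Some HED strings in your annotations failed to validate:\n - "
            + "\n - ".join(messages)
        )
```

Building the header and the bullet list as two separate strings keeps `"\n - "` as the only separator passed to `join`.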
||
self.hed_tags = hed_tags | ||
|
||
def __eq__(self, other): | ||
"""Compare to another HEDAnnotations instance.""" | ||
return ( | ||
super().__eq__(other) | ||
and np.array_equal(self.hed_tags, other.hed_tags) | ||
and self.hed_version == other.hed_version | ||
@VisLab if we want to compare equality of two
The tags are the same, but it should be re-validated using the latest version of the schema. RE: Although once a tag is in the schema, it is always there (unless there is a major version change, which we don't anticipate, and even then every effort would be made to keep tags). This being said, the schema tags have attributes which may affect how they are validated, and they might also have a different path in the hierarchy as upper-level tags are added. (That is why the annotations should use the short form as much as possible and use tools to expand if needed.) In other words, if you have two datasets and they have different versions of the schema, then I think it should work if you revalidate using the latest of the two versions of the schema. (Am I correctly understanding that within a given dataset the files would use a single version of HED?)
Maybe rephrasing my question will help: if I have 2 HED Strings, and as strings they are identical (i.e., they both say
(this might also help: MNE-Python does not deal with datasets. That is the job of MNE-BIDS. Within MNE-BIDS, I think it is safe to assert/require that only one schema version is used to do all validation of annotations within that dataset.
So the question I'm asking is really about the collection of HED strings attached to a single recording and what counts as "the same" when talking about those HED strings)
All of the tools require a single non-conflicting schema version specification. So we are talking about whether two
not in this context! I think that is the root of our misunderstanding. The code (in MNE-Python) that this question is attached to checks equality of So the question is, how should we assess "equality" of two
My original question of "should we care about schema version when testing equality of HEDAnnotations objects?" could be rephrased as "should we just do (1) or should we also do (2)?" but I'm now adding option (3) for clarity, since you've explained how equality is tested in your library. I'll note that it's not (yet) obvious to me that there's added value from the extra computations involved in (3) in the context of testing equality of HEDAnnotations objects, so if you think (3) is the best choice, could you explain why you think so (perhaps by giving an example where 2 identical strings would be parsed as meaningfully different under different schema versions)?
The only option is to compare as HedString
OK! that's pretty clear motivation for (3). One more question: by the time we're checking equality of |
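As a rough illustration of the equality options in this thread, here is a sketch under one reading of the discussion: compare raw strings plus schema version, or compare parsed objects. The names `hed_equal`, `parse`, and `toy_parse` are hypothetical. `parse` stands in for building a parsed HED object under a schema version, and `toy_parse` does not implement HED semantics; it only shows that a parsed comparison can differ from a string comparison.

```python
def hed_equal(tags_a, version_a, tags_b, version_b, parse=None):
    """Compare two collections of HED strings (illustrative only).

    parse : callable | None
        Hypothetical hook, called as parse(hed_string, version), returning a
        comparable object. If None, raw strings and versions are compared.
    """
    if len(tags_a) != len(tags_b):
        return False
    if parse is None:
        # Plain comparison: identical strings and identical schema version.
        return list(tags_a) == list(tags_b) and version_a == version_b
    # Parsed comparison: delegate equality to whatever parse() returns.
    return all(
        parse(a, version_a) == parse(b, version_b)
        for a, b in zip(tags_a, tags_b)
    )


def toy_parse(hed_string, version):
    """Toy parser (NOT HED semantics): ignore whitespace, order, and version."""
    return frozenset(tag.strip() for tag in hed_string.split(","))
```

With the toy parser, `hed_equal(["A, B"], "x", ["B,A"], "y", parse=toy_parse)` is true even though the raw strings and version labels differ, which is the kind of gap between string equality and parsed equality the thread is weighing.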
||
) | ||
|
||
def __repr__(self): | ||
"""Show a textual summary of the object.""" | ||
counter = Counter(self.hed_tags) | ||
kinds = ", ".join(["{} ({})".format(*k) for k in sorted(counter.items())]) | ||
kinds = (": " if len(kinds) > 0 else "") + kinds | ||
ch_specific = ", channel-specific" if self._any_ch_names() else "" | ||
s = ( | ||
f"HEDAnnotations | {len(self.onset)} segment" | ||
f"{_pl(len(self.onset))}{ch_specific}{kinds}" | ||
) | ||
return "<" + shorten(s, width=77, placeholder=" ...") + ">" | ||
|
||
def __getitem__(self, key, *, with_ch_names=None): | ||
"""Propagate indexing and slicing to the underlying numpy structure.""" | ||
result = super().__getitem__(key, with_ch_names=with_ch_names) | ||
if isinstance(result, OrderedDict): | ||
result["hed_tags"] = self.hed_tags[key] | ||
else: | ||
key = list(key) if isinstance(key, tuple) else key | ||
hed_tags = self.hed_tags[key] | ||
return HEDAnnotations( | ||
result.onset, | ||
result.duration, | ||
result.description, | ||
hed_tags, | ||
hed_version=self.hed_version, | ||
orig_time=self.orig_time, | ||
ch_names=result.ch_names, | ||
) | ||
|
||
def append(self, onset, duration, description, ch_names=None): | ||
"""TODO.""" | ||
pass | ||
|
||
def count(self): | ||
"""TODO. Unlike Annotations.count, keys should be HED tags not descriptions.""" | ||
pass | ||
|
||
def crop( | ||
self, tmin=None, tmax=None, emit_warning=False, use_orig_time=True, verbose=None | ||
): | ||
"""TODO.""" | ||
pass | ||
|
||
def delete(self, idx): | ||
"""TODO.""" | ||
pass | ||
|
||
def to_data_frame(self, time_format="datetime"): | ||
"""TODO.""" | ||
pass | ||
Comment on lines +899 to +919
@VisLab these TODOs are for me. So as you can see some things aren't going to work yet, but we're already at least able to do:
    $ ipython
    In [1]: import mne
    In [2]: foo = mne.HEDAnnotations([0, 1], [0.5, 1.2], ['foo', 'bar'], ['hed/foo', 'hed/bar'])
    In [3]: foo
    Out[3]: <HEDAnnotations | 2 segments: hed/bar (1), hed/foo (1)>
Not completely sure what these do, but would be willing to help as needed. Would the Thanks @drammock |
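For the `count` TODO (keys should be HED tags, not descriptions), a minimal sketch using only the standard library might look like the following. `count_hed` and `summarize` are illustrative names, not MNE-Python API, and the input is assumed to be a plain list with one HED string per annotation.

```python
from collections import Counter


def count_hed(hed_tags):
    """Count annotations per HED string, with keys in sorted order."""
    return dict(sorted(Counter(hed_tags).items()))


def summarize(hed_tags):
    """Build the 'tag (n), tag (n)' summary used in the repr above."""
    return ", ".join(f"{tag} ({n})" for tag, n in count_hed(hed_tags).items())


print(summarize(["hed/foo", "hed/bar"]))  # hed/bar (1), hed/foo (1)
```

The printed summary matches the tail of the `Out[3]` repr shown in the comment above.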
||
|
||
|
||
class EpochAnnotationsMixin: | ||
"""Mixin class for Annotations in Epochs.""" | ||
|
||
|
@VisLab
By default, the user's cache directory is $USERHOME/.hedtools/hed_cache. The hedtools distribution has the most recent HED schemas in hed/schema/schema_data, and when the cache directory is created, it is initialized with this data. If the user accesses a version that is not cached, it goes to the web to fetch it and caches it. The default location can be overridden if this would be better for MNE-Python, but it isn't necessary for HED.
yeah, and I noticed that directory was empty for me after installing/importing/fiddling with the package, which is why I asked. Sounds like that wasn't a problem.
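To check by hand whether the cache has been populated, something like the sketch below could be used. It assumes `$USERHOME` corresponds to `Path.home()`, which is a guess about how the path in the comment maps onto Python; the helper names are made up for illustration and are not hedtools API.

```python
from pathlib import Path


def default_hed_cache_dir(home=None):
    """Return <home>/.hedtools/hed_cache (assumes $USERHOME is Path.home())."""
    home = Path.home() if home is None else Path(home)
    return home / ".hedtools" / "hed_cache"


def list_cached_files(cache_dir):
    """Return sorted file names in the cache dir, or [] if it doesn't exist."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    cache = default_hed_cache_dir()
    print(cache, list_cached_files(cache))
```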