Ambiguity in "regular expression for PEFF description line" #28
Comments
I think what was intended was to allow something that looks like this: However, there need to be several different ones now because different terms take a different number of elements. I will leave this issue open because this does still need to be fixed in the CV.
Should I go through the current draft of the spec in the repository and aggregate the feature type by regex?
Can you clarify what you mean by "aggregate the feature type by regex"? I don't understand.
I mean to go over each explicitly named header key in the specification and construct a regular expression that matches the full range of inputs described there, and then group header keys by shared regular expression.
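The grouping step described above can be sketched as inverting a key-to-pattern table. This is only an illustration: the key names `KeyA`, `KeyB`, `KeyC` and their patterns are hypothetical placeholders, not entries from the specification.

```python
from collections import defaultdict

# Hypothetical header keys and patterns, used only to show the grouping step.
KEY_PATTERNS = {
    "KeyA": r"[0-9]+\|[A-Z]+",
    "KeyB": r"[0-9]+\|[0-9]+",
    "KeyC": r"[0-9]+\|[A-Z]+",
}


def group_by_pattern(key_patterns):
    """Map each distinct pattern string to the list of keys that share it."""
    groups = defaultdict(list)
    for key, pattern in key_patterns.items():
        groups[pattern].append(key)
    return dict(groups)


groups = group_by_pattern(KEY_PATTERNS)
for pattern, keys in groups.items():
    print(pattern, "->", keys)
```

A pattern that maps to more than one key is a candidate for a single shared CV term; a pattern with one key needs its own.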
I hadn't realized there were so few controlled header keys. There are no duplicates. I've tested these regular expressions on each of the examples from
/[0-9]+\|[A-Z]+(?:\|[a-zA-Z0-9]+)?/
/[0-9]+\|[0-9]+\|(?:[A-Z]{2,})?(?:\|[a-zA-Z0-9]+)?/
/(?:[0-9,]+|\?)\|UNIMOD:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?/
/(?:[0-9,]+|\?)\|MOD:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?/
/(?:[0-9,]+|\?)\|[^\|]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?/
/[0-9]+\|[0-9]+\|PEFF:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?/
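A minimal sketch of how patterns like these could be applied in Python, using `re.fullmatch` so that the whole value must match. Two things here are assumptions rather than anything stated in this thread: the key names attached to each pattern are guessed from the pattern shapes, and the sample values are made up for illustration. The position alternative is written as `(?:[0-9,]+|\?)`, because an unescaped `|` outside a group would split the entire pattern into two top-level alternatives.

```python
import re

# Key names are guesses based on the shape of each pattern (assumption).
PATTERNS = {
    "VariantSimple": r"[0-9]+\|[A-Z]+(?:\|[a-zA-Z0-9]+)?",
    "VariantComplex": r"[0-9]+\|[0-9]+\|(?:[A-Z]{2,})?(?:\|[a-zA-Z0-9]+)?",
    "ModResUnimod": r"(?:[0-9,]+|\?)\|UNIMOD:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?",
    "ModResPsi": r"(?:[0-9,]+|\?)\|MOD:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?",
    "ModRes": r"(?:[0-9,]+|\?)\|[^\|]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?",
    "Processed": r"[0-9]+\|[0-9]+\|PEFF:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?",
}
COMPILED = {key: re.compile(pattern) for key, pattern in PATTERNS.items()}

# Same pattern with the position alternative left ungrouped, for comparison.
UNGROUPED = re.compile(r"(?:[0-9,]+)|\?\|UNIMOD:[0-9]+\|[^\|]+(?:\|[a-zA-Z0-9]+)?")


def is_valid(key, value):
    """Return True if the whole value matches the pattern for this key."""
    return COMPILED[key].fullmatch(value) is not None


# Made-up sample values.
print(is_valid("VariantSimple", "223|A"))                 # single position and residue
print(is_valid("ModResUnimod", "12|UNIMOD:21|Phospho"))   # numeric position
print(is_valid("ModResUnimod", "?|UNIMOD:21|Phospho"))    # unknown position
print(UNGROUPED.fullmatch("12|UNIMOD:21|Phospho") is not None)
```

With the ungrouped form, a bare position such as `12` is accepted on its own while a full numeric-position value is rejected, which is why the grouping matters.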
I'm attempting to implement a stricter PEFF parser in Python, but after consulting the controlled vocabulary, I'm not sure how to type-check annotations that are defined by the regex "regular expression for PEFF description line".
With syntax highlighting, the regex is:
/([0-9]+|[0-9]+|[a-zA-Z0-9]*)/
First, the expression translated into words seems partially redundant: "One or more digits between 0 and 9 OR One or more digits between 0 and 9 OR Zero or more alphanumeric characters". The first two alternatives are identical, which seems odd. The reduced regex would be
/([0-9]+|[a-zA-Z0-9]*)/
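A quick way to check that the reduction does not change what is accepted is to compare both patterns on a handful of strings. The sample strings below are made up, and whole-string matching via `re.fullmatch` is an assumption about how the CV regex is meant to be applied.

```python
import re

ORIGINAL = re.compile(r"([0-9]+|[0-9]+|[a-zA-Z0-9]*)")
REDUCED = re.compile(r"([0-9]+|[a-zA-Z0-9]*)")

# Made-up samples: empty, digits, letters, mixed, and three non-alphanumeric cases.
samples = ["", "123", "abc", "A1b2", "12|34", "a b", "?"]


def accepts(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


results = {s: (accepts(ORIGINAL, s), accepts(REDUCED, s)) for s in samples}
for sample, (orig_ok, reduced_ok) in results.items():
    print(repr(sample), orig_ok, reduced_ok)
```

Under whole-string matching, the `[0-9]+` alternative is itself covered by `[a-zA-Z0-9]*`, so neither form accepts a value containing a `|`.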
This reads as "One or more digits between 0 and 9 OR Zero or more alphanumeric characters". This seems to suggest that each element of a "|"-separated tuple is implicitly interpreted separately, and that the indices of the tuple are not governed by the CV. This information is described in the format specification's text. Is this interpretation consistent with the intentions of the authors?
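The per-element reading can be sketched as follows: split the value on "|" and test each element against the reduced regex on its own. The function name `check_elements` and the sample values are hypothetical, and whole-element matching is an assumption about the intended use.

```python
import re

# Reduced form of the CV regex, applied to one tuple element at a time.
ELEMENT = re.compile(r"[0-9]+|[a-zA-Z0-9]*")


def check_elements(value):
    """Split on '|' and report, per element, whether it matches on its own."""
    return [(element, ELEMENT.fullmatch(element) is not None)
            for element in value.split("|")]


# Made-up annotation values.
print(check_elements("223|A|tag1"))
print(check_elements("12|UNIMOD:21|Phospho"))
```

Under this reading the regex says nothing about how many elements there are or what each position means, and an element containing a colon, such as an accession, does not match it at all.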