Change term - identificationQualifier #244
I propose a different view: keep the qualifier independent of the actual name, but apply it to the rank. The definition would then become something like:
In that way, a reasonable controlled vocabulary could be built and used for this term (one that would cover the many combinations of doubt terms and ranks). @ianengelbrecht, do you see any conflicts with this view?
If I am misunderstanding your point, perhaps you could provide an example.
Looking at the examples, the qualifier does apply to ranks higher than the lowest name in some cases, e.g. 'aff. agrifolia var. oxyadenia'. My suggestion was only to clarify the definition to make it consistent with the examples provided. Your suggestion might work, @pzermoglio, but it means a different definition, and different data in the field from what people may have been providing until now, and it might be difficult to work with if only …
The example is misleading, because I imagine the intention is to say that the organism has affinity with the variety oxyadenia, even though the prefix is placed before the specific epithet. I'm not certain of normal taxonomic practice, but I doubt they mean that it has affinity only with Quercus agrifolia, because if that were the case, why would they mention the variety? I suggest changing the example.
Okay, yes, agreed. My feeling then is that if it always applies to the lowest identified taxonomic rank, then the names are not actually required and the field can just contain 'aff.' or 'cf.'. Issue #181 has a proposal for verbatimScientificName, which, if adopted, could capture the qualifier at the specific location in the name where the identifier put it.
I prefer the way the Darwin Core … I agree with @qgroom that qualifiers for anything other than the lowest taxonomic level to which something is identified do not make sense, but that does not mean that I have never seen it happen. In fact, in the discussions we had when we replaced our collections database ten years ago, our botanists insisted on having the option. I myself would never use identification qualifiers at all, but would rather write whole essays in identificationRemarks.
Apropos, there was also a discussion of an identificationQualifier vocabulary on the HISPID GitHub (also concluding against a controlled vocabulary):
@pzermoglio, you proposed an alternative treatment of identificationQualifier that isn't actually under consideration in this term change proposal, but for the sake of certainty about consensus, I would like to know whether the proposal as it stands is acceptable.
Usually, if there is doubt in the identification (e.g. Ficus cf. cuatrecasasiana), we recommend that the publisher put the doubt in identificationQualifier (cf. cuatrecasasiana) and keep only the part that has taxonomic certainty in scientificName (Ficus). Additionally, the use of "sp." in the examples doesn't seem a right use of this term, because "sp." could be documented in the field verbatimTaxonRank, so there is no need to put it here. We make this comment on behalf of @SiBColombia.
@EstebanMH-SiB Your usage 'cf. cuatrecasasiana' is in agreement with the examples. I agree with you about the appropriate treatment of 'sp.', but we commonly see 'sp.' anyway. In cases where the species has not yet been described but is recognized as a new species, we see the pattern 'Genus sp. nov. X'. In these cases, 'sp. nov. X' should go in identificationQualifier. Would this be a better example to add?
@tucotuco We agree to add 'sp. nov. X' instead of 'sp.' as an example. We think it will clarify when 'sp.' has to be documented in identificationQualifier versus verbatimTaxonRank.
I agree that the qualifier should always apply to the lowest taxonomic rank, in spite of how it is actually used in many cases. E.g.: … I don't see why we would want to be inconsistent; what would be the advantage of including any name in the field at all if we will always be referring to the lowest rank?
I did not propose this. It is what it is now, judging from the examples.
In ABCD:
This proposal has been labeled as 'Controversial'. It will remain open for public review in pursuit of a consensus solution for another 30 days, but will not be included in the release to be prepared from the public review of 2021-05-01/2021-05-31.
I'm not sure if this is helpful, but we wrote a paper on the use of open nomenclature terms in image-based identifications (where there are a lot of uncertainties). We gave examples of the use of the identification qualifier and how to enter it in Darwin Core, and this might be of some help here: Horton et al. 2021, "Recommendations for the Standardisation of Open Taxonomic Nomenclature for Image-Based Identifications" (https://doi.org/10.3389/fmars.2021.620702). We provide examples to standardise the terms used, indicate what the field should contain, and also recommend use of the identificationRemarks field to explain the qualifier.
Great to see another article proposing standardized use of Open Nomenclature terms. I hope there will be an opportunity to discuss identifications again at the upcoming TDWG working sessions after the conference; I feel this is something that needs to be adopted. I've been trying to advocate for standardized use of these terms in our community here in South Africa, but I'm still receiving quite substantial pushback...
Public review of this issue has now concluded with objections to the proposed change. The issue will remain open for discussion and potential resolution.
I have difficulty understanding the example above, as this is not how I understood it to work. After discussing with @albenson-usgs and @pieterprovoost, these are the points that we find confusing: The definition of scientificName states:
For … The definition of infraspecificEpithet states:
If the definition of scientificName states that it should contain the lowest level taxonomic rank that can be determined, then why is … Likewise, the definition of taxonRank states:
I am not sure I understand why … If it is me not understanding how taxonomy works, please help me understand this. Thanks a lot!
@ymgan, these are really good points. We wrestle with this all the time, and I would very much value insights from others. Stated in slightly different terms, given the source text string "Quercus agrifolia cf. var. oxyadenia", I could interpret it in two different ways. One is that the asserted taxon is Quercus agrifolia var. oxyadenia, and the … The other is that the asserted taxon is Quercus agrifolia, and the … I would be inclined to parse it according to the first interpretation, so that it shows up in a search for the variety Q. a. var. oxyadenia, but is then flagged with …
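A minimal Python sketch of the two readings described above, assuming the verbatim string contains exactly one qualifier token, "cf.", separated by spaces. The function names and the returned dictionaries are hypothetical; only the keys scientificName and identificationQualifier are Darwin Core terms.

```python
QUALIFIER = "cf."


def parse_option_1(verbatim: str) -> dict:
    """First interpretation: the asserted taxon is the full name, and only
    the qualifier token itself goes into identificationQualifier."""
    tokens = verbatim.split()
    i = tokens.index(QUALIFIER)  # raises ValueError if "cf." is absent
    return {
        "scientificName": " ".join(tokens[:i] + tokens[i + 1:]),
        "identificationQualifier": QUALIFIER,
    }


def parse_option_2(verbatim: str) -> dict:
    """Second interpretation: the asserted taxon is only what precedes the
    qualifier; the qualifier and everything after it go into
    identificationQualifier."""
    tokens = verbatim.split()
    i = tokens.index(QUALIFIER)  # raises ValueError if "cf." is absent
    return {
        "scientificName": " ".join(tokens[:i]),
        "identificationQualifier": " ".join(tokens[i:]),
    }


if __name__ == "__main__":
    label = "Quercus agrifolia cf. var. oxyadenia"
    print(parse_option_1(label))
    # {'scientificName': 'Quercus agrifolia var. oxyadenia', 'identificationQualifier': 'cf.'}
    print(parse_option_2(label))
    # {'scientificName': 'Quercus agrifolia', 'identificationQualifier': 'cf. var. oxyadenia'}
```

Real labels also use "aff.", "?", or abbreviations such as "Q.", so this only illustrates where the split falls under each reading; it is not a general name parser.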
Likewise, I myself think of … I think this falls within the 'application scheme' realm, as individual communities might be closer to the recorders and determiners in the community and can also set guidelines on when and how to use identification qualifiers. Notwithstanding the above, I think it is good to have this discussion, as it would be worthwhile to have a note about this somewhere in Darwin Core. Just noting that it would be much better to do this in a new issue, as this issue is about a proposal to change the definition of …
I agree it's impractical to get into the head of the person who asserted the identification, but my preference for option 1 is more practical. Dave Remsen often spoke of "recall" vs. "precision". Very briefly, "recall" represents the inclusion of records of possible interest within the result set (i.e., reducing the likelihood of missing records of interest). "Precision" is minimizing the inclusion of records outside the scope of interest in the result set. My feeling is that option 1 supports "recall", in that a search for the epithet "oxyadenia" is more likely to return this record if the value is included as … Of course, this depends entirely on how the query logic is programmed, but it still seems to me that erring on the side of parsing out the … I do think it's relevant to the discussion on …
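To make the recall/precision trade-off concrete, here is a small Python sketch. It assumes hypothetical records parsed under each option and a query that matches whole words in scientificName; the record IDs and the search function are invented for illustration, and real portals may implement query logic quite differently.

```python
RECORDS = [
    # Parsed with option 1: full name kept, only "cf." in the qualifier.
    {"id": "A", "scientificName": "Quercus agrifolia var. oxyadenia",
     "identificationQualifier": "cf."},
    # Parsed with option 2: name cut at the qualifier.
    {"id": "B", "scientificName": "Quercus agrifolia",
     "identificationQualifier": "cf. var. oxyadenia"},
    # An unqualified identification of the variety.
    {"id": "C", "scientificName": "Quercus agrifolia var. oxyadenia",
     "identificationQualifier": ""},
]


def search_scientific_name(records, word, include_qualified=True):
    """Return IDs of records whose scientificName contains `word` as a whole
    word. With include_qualified=False, any record with a non-empty
    identificationQualifier is dropped (favouring precision over recall)."""
    hits = [r for r in records if word in r["scientificName"].split()]
    if not include_qualified:
        hits = [r for r in hits if not r["identificationQualifier"].strip()]
    return [r["id"] for r in hits]


print(search_scientific_name(RECORDS, "oxyadenia"))                           # ['A', 'C']
print(search_scientific_name(RECORDS, "oxyadenia", include_qualified=False))  # ['C']
```

Under option 1 the qualified record A is found and can still be filtered out by a user who wants precision; under option 2, record B is never returned by a scientificName search for the variety.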
Thank you very much, @deepreef and @nielsklazenga! I appreciate the explanations! The reason I commented here is that, based on my understanding, the example seems to contradict the definitions of multiple Darwin Core terms. I hope something can be done to clarify it under this term change proposal. From the perspective of a thematic portal (the Antarctic biodiversity portal and the Antarctic GBIF/OBIS node here), where the data are important for modelling and for providing ocean statistics, we tend to take the conservative approach of only including information that is certain. I believe our approach is close to what is described as "precision", as well as to the approach in @tammyhorton's paper in this comment. Back to the example, "Quercus agrifolia cf. var. oxyadenia": to me, neither option is certain that it is "Quercus agrifolia var. oxyadenia", even though I have now learned that the two options in the comment carry different degrees of confidence. This is what our current approach looks like for this example, based on my understanding:
I acknowledge that there will be loss of information, and I do not know the best way to represent the information when different degrees of confidence in the identification could be important to some. I leave my comment here, hoping that at least there could be some clarification in the term definition or comment. Thanks again!
Thanks, @ymgan
Yes, I would agree with this. Indeed, I think in almost all cases, any non-null value for … This raises another issue: If we accept that values for … If any non-null value is provided for … If this is an appropriate logic, then it supports my first option for parsing values (i.e., going with only 'cf.' for the … My point is that we should craft the definition and associated descriptions for this term in such a way that they are explicit enough that data providers will be consistent, so that logic of the sort described above will work reliably. It seems to me that the example for "Quercus agrifolia cf. var. oxyadenia" should only include "cf." for the …
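The rule @deepreef describes, quoted in full later in the thread ("If any non-null value is provided for identificationQualifier, then disregard the name provided for the lowest taxonomic rank, and instead use the next-higher taxonomic rank as the confident identification"), can be sketched in Python as below. The ordering of name-part terms is a simplification assumed for illustration, and the function name is hypothetical.

```python
# Darwin Core name-part terms, ordered from higher to lower rank.
# This ordering is a simplification assumed for illustration.
NAME_PARTS = ("family", "genus", "specificEpithet", "infraspecificEpithet")


def confident_identification(record: dict) -> dict:
    """Apply the proposed rule: if identificationQualifier is non-empty,
    disregard the lowest name part present and keep the next-higher ones."""
    present = [term for term in NAME_PARTS if record.get(term)]
    qualifier = (record.get("identificationQualifier") or "").strip()
    if qualifier and present:
        present = present[:-1]  # drop the lowest rank that was provided
    return {term: record[term] for term in present}


quercus = {
    "genus": "Quercus",
    "specificEpithet": "agrifolia",
    "infraspecificEpithet": "oxyadenia",
    "taxonRank": "var.",
    "identificationQualifier": "cf.",
}
print(confident_identification(quercus))
# {'genus': 'Quercus', 'specificEpithet': 'agrifolia'}
```

Applied literally, the same rule reduces a record with only genus "Lycalopex" and qualifier "sp." to its family (or to nothing, if no family is supplied), so how "sp." is recorded matters for this logic.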
An interesting discussion! We have been interpreting this as I stated in the paper mentioned above, and as @ymgan indicates. The scientific name is used according to the definition of scientificName, that it should contain 'the name in lowest level taxonomic rank that can be determined', and any text following this scientificName is placed in the identificationQualifier field, so in this case "cf. var. oxyadenia". That is, we know this is Quercus agrifolia, but we need further information to confirm whether it is of the variety oxyadenia. Identification remarks are used to explain the reasoning, and we are encouraging this, although it is often lacking. In our work, the majority of names are accompanied by an identification qualifier of some sort, usually stet. or indet., but we can be confident in the scientificName value as the lowest level taxonomic rank that can be determined with certainty (as certain as ANY identification can be, as @deepreef indicates!). The examples given for the term in the Darwin Core Quick Reference Guide use "cf. var. oxyadenia" (for Quercus agrifolia cf. var. oxyadenia, with accompanying values Quercus in genus, agrifolia in specificEpithet, oxyadenia in infraspecificEpithet, and var. in taxonRank). To me this indicates that "cf. var. oxyadenia" should be placed in the identificationQualifier field, but I agree with @ymgan that it seems strange to then also put oxyadenia in infraspecificEpithet and var. in taxonRank, since the taxon has only actually been determined to the species level with any confidence. I think that for data usage we should be working with confident identifications, and we also need to think about the usage of open nomenclature terms beyond cf. and aff. and standardise the usage of all of these. The usage of stet. and indet. results in no identification below that level, but we still need to refer to the lowest taxonomic level determined in scientificName.
By including oxyadenia in infraspecificEpithet, we begin mixing identifications that are 'confident' with interpretations of identifications that are usually not known to the data user. @deepreef's logic of "If any non-null value is provided for identificationQualifier, then disregard the name provided for the lowest taxonomic rank, and instead use the next-higher taxonomic rank as the confident identification" therefore does not make the most sense to me.
Many thanks, @tammyhorton! This is really helpful to me, and has given me a new perspective on how to think about the best ways to represent content for … I had always thought of … So, if the determination label on the specimen said something like "Q. agrifolia cf. var. oxyadenia", and I was extremely confident that "Q." was an abbreviation of "Quercus", then I would populate … "Quercus agrifolia var. oxyadenia" [with "Q." expanded to "Quercus", and the "cf." extracted]. I would then present "cf." as the value for … Here, for reference, is the definition of
So, my reading of "full scientific name" combined with "lowest level taxonomic rank that can be determined" would yield "Quercus agrifolia var. oxyadenia", and the "identification qualification" of "cf." would be excluded from … But if I understand your perspective correctly, the existence of the "cf." qualifier excludes the "var. oxyadenia" bit from being part of the "can be determined" aspect of an identification, and only "Quercus agrifolia" meets the threshold for "can be determined". I think this perspective is entirely valid, especially when "precision" is favored over "recall", and it points to a potential ambiguity in the definition of
...seems to favor that only the "cf." part should be represented for this term. I think I slightly favor the interpretation that … I guess it comes down to the distinction I mentioned earlier, which could be simplified to:
I'm less concerned about which way DwC goes with this than I am about removing ambiguity and potential inconsistencies, which appear to exist in this example because different practitioners have interpreted these definitions in different ways. @mdoering or @timrobertson100: I wonder if you could do some analysis to see how often a taxonomic name epithet appears in …
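The check asked for above can be approximated locally. This Python sketch assumes occurrence records are already available as dictionaries keyed by Darwin Core term names (for example, rows read from an occurrence export); the helper names and the sample records, taken from the examples in this issue, are illustrative only and say nothing about the real distribution of values in GBIF.

```python
def epithet_in_qualifier(record: dict) -> bool:
    """True if specificEpithet or infraspecificEpithet appears as a whole
    word in identificationQualifier."""
    words = set((record.get("identificationQualifier") or "").split())
    epithets = {record.get("specificEpithet"), record.get("infraspecificEpithet")}
    epithets -= {None, ""}
    return bool(words & epithets)


def share_with_epithet(records) -> float:
    """Fraction of records with a non-empty identificationQualifier whose
    qualifier repeats one of the record's epithets."""
    qualified = [r for r in records
                 if (r.get("identificationQualifier") or "").strip()]
    if not qualified:
        return 0.0
    return sum(epithet_in_qualifier(r) for r in qualified) / len(qualified)


sample = [
    {"genus": "Quercus", "specificEpithet": "agrifolia",
     "infraspecificEpithet": "oxyadenia",
     "identificationQualifier": "cf. var. oxyadenia"},
    {"genus": "Meristina", "specificEpithet": "furcata",
     "identificationQualifier": "?"},
    {"genus": "Lycalopex", "identificationQualifier": "sp."},
]
print(share_with_epithet(sample))  # 0.3333333333333333
```

Whole-word matching is deliberately simple: a qualifier written without spaces, or an epithet that is only in scientificName (as in the "Ficus" / "cf. cuatrecasasiana" practice described earlier), would need more careful normalisation.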
Thanks, @deepreef, you have summarised the situation much better than I could. Yes, you have my perspective correct on the 'can be determined' part of dwc:scientificName, your number 2. I feel the same, in that this is how I've interpreted it and it makes sense to me, so it seems the most logical way of representing it. But yes, the current definition of dwc:identificationQualifier does indeed state 'a brief phrase or standard term', which does not exactly capture how I am interpreting it. I agree that we need to think about removing ambiguity and ensuring everyone understands how to use the field, but also about the ongoing use of the data and confidence in it. I think it will be useful to be able to search for a particular taxon and sort according to whether it has an identification qualifier or not, but also to be able to easily compare those entries that have identification qualifiers.
While I agree this is a goal, it also excludes from the data the messy, not-quite-identified things that someone might be interested in working with. I prefer @deepreef's scenario 1, as it gives me the ability to find all of the varieties and the flexibility to remove or modify data with qualifiers if I choose. In Arctos, all qualifiers are forced to the end of the scientific name, as our names are structured in a controlled vocabulary. From my perspective, the qualifiers should probably be accompanied by an explanation, because it is clear to me from this discussion that different groups have different ideas about what the qualifiers mean. Arctos is also in the process of adding an attribute to identifications, "identification confidence", which is meant to provide more detail about the determiner's confidence in their identification than these codes, which have many definitions depending on who is doing the interpreting. When we start migrating that information to Darwin Core, it will most likely end up in identificationQualifier, concatenated with whatever modifier was also applied to the scientificName. We also have information in our identifications that doesn't make it into a good place in DwC but probably should: identification method can also be important when considering how confident you are in an identification that someone else made, but DwC does not include a term for identification method, so I'm guessing it generally ends up in identificationRemarks or is not in DwC at all.
It seems like the time might be good for a Task Group to work on this? What happens when you have two determiners making conflicting identifications? Are you forced to choose one or can you provide both with methods used and let the users make their own decisions?
When auditing DwC datasets, I often see iQ used in odd ways. A recent job had a family name in scientificName and "indet." in iQ, and a genus name in sN and "sp." in iQ. However iQ is defined, there will be compliance problems, so it's a good idea to give lots of examples to reduce these. I also see what could be valid iQ entries in identificationRemarks, which is meant to hold "comments or notes about the identification". This is so broad that it entirely contains iQ as a subset.
@Jegelewicz writes "What happens when you have two determiners making conflicting identifications? Are you forced to choose one or can you provide both with methods used and let the users make their own decisions?" The current DwC allows you to pick the latest ID and put any other IDs in previousIdentifications (which, for reasons I don't understand, is in the Organism class, not the Identification class). A related problem is "Aus bus or Aus cus"; in other words, the identifier is not only confident about the genus, but also confident that the species is either bus OR cus. I'm currently recommending "Aus" in scientificName, "genus" in taxonRank, and "either Aus bus or Aus cus" in identificationRemarks for this case, but there might be better solutions.
Is there still interest in trying to resolve this controversial issue?
@tucotuco, the issue is still interesting, but I'm not sure what you mean by "resolve". Even if the DwC definitions and examples were changed or left unchanged, you will still have different practices in different data communities. I don't see a resolution that brings all community practices together. Do you? This morning I audited a DwC dataset (for a Pensoft data paper) in which "Achaeta danica Nielsen & Christensen, 1959" (one of several such examples) was in scientificName, "species" in taxonRank, and "Achaeta cf. danica" in taxonRemarks. The data compiler is quite competent and described taxonRemarks in the data paper as "Freeform remarks entered relevant to the taxonomy and characterisation of the documented species or taxon." For the record, I support Rich Pyle's scenario 1 because it allows a harvest of all ID confidence levels with a search on scientificName. For this morning's audit, I recommended that the compiler's "cf." entries in taxonRemarks be deleted and moved to identificationQualifier.
A year late, but I used the GBIF SQL API to download a list of aggregated occurrences that have some value in … There is a lot of noise in there, things like …
This is what is being evaluated
Change term
Current term definition: https://dwc.tdwg.org/terms/#dwc:identificationQualifier
Proposed new attributes of the term:
An identificationQualifier of "?" for "Meristina furcata?", with accompanying values "Meristina" in genus, "furcata" in specificEpithet, "species" in taxonRank, and "Meristina furcata" in scientificName.
An identificationQualifier of "sp." for "Lycalopex sp.", with accompanying values "Lycalopex" in genus and scientificName, and "genus" in taxonRank.
An identificationQualifier of "cf. var. oxyadenia" for "Quercus agrifolia cf. var. oxyadenia", with accompanying values "Quercus" in genus, "agrifolia" in specificEpithet, "oxyadenia" in infraspecificEpithet, "var." in taxonRank, and "Quercus agrifolia var. oxyadenia" in scientificName.
This is how it began
Original comment:
The current definition is 'A brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the Identification.' and the examples include the names following the qualifier. The suggestion is to update the definition to specify that the value provided should include the names following the qualifier, not just the qualifier itself.