We currently do not have any validation system in place to confirm that the node IDs used in each record use the correct names and labels. While our curators are very detail-oriented and careful, the lack of automated checks leaves open the risk that data errors are introduced. We should create a simple script that queries each ID against some authoritative resource(s) (e.g., mygene.info/mychem.info/mydisease.info, OLS, UMLS, NodeNormalizer). More details below...
The first record in the indication_paths.yaml file is here:
There are three IDs under nodes: MESH:D000068877, UniProt:P00519, and MESH:D015464. If we look up the first ID in the MeSH API here: https://id.nlm.nih.gov/mesh/lookup/details?descriptor=D000068877, we see that the preferred name for MESH:D000068877 is actually Imatinib Mesylate, and the preferred name for MESH:D015464 is Leukemia, Myelogenous, Chronic, BCR-ABL Positive. The script described here would output a version of the input YAML with the names replaced by the "preferred name" from the MeSH API.
The most common identifiers used in DMDB are shown here (with counts):
So let's start with MeSH as the most common identifier used. After that, we'll expand to the other identifier types.
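As a starting point for discussion, here is a rough sketch of what the MeSH-only check could look like. It works on nodes that have already been parsed from the YAML (loading and writing the file, e.g. with PyYAML, is left out). Several things here are assumptions rather than confirmed details: that each node is a mapping with `id` and `name` keys, that the MeSH details endpoint returns JSON with a `terms` list whose entries carry `label` and `preferred` fields, and the function names `fetch_mesh_preferred_name` and `check_nodes`, which are made up for illustration.

```python
import json
import urllib.request
from typing import Callable, Optional

# Endpoint taken from the lookup URL quoted in this issue.
MESH_DETAILS_URL = "https://id.nlm.nih.gov/mesh/lookup/details?descriptor={}"


def fetch_mesh_preferred_name(descriptor_id: str) -> Optional[str]:
    """Return the preferred name for a MeSH descriptor, or None if not found.

    ASSUMPTION: the response is JSON with a "terms" list, and the preferred
    term is the entry whose "preferred" field is true.
    """
    url = MESH_DETAILS_URL.format(descriptor_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        details = json.load(resp)
    for term in details.get("terms", []):
        if term.get("preferred"):
            return term.get("label")
    return None


def check_nodes(
    nodes: list,
    lookup: Callable[[str], Optional[str]] = fetch_mesh_preferred_name,
):
    """Compare each MESH node name against the preferred name.

    ASSUMPTION: each node is a dict with "id" (a CURIE such as
    "MESH:D000068877") and "name" keys.

    Returns (corrected_nodes, mismatches); the input list is not modified.
    Nodes with other prefixes (e.g. UniProt) are passed through unchanged.
    """
    corrected, mismatches = [], []
    for node in nodes:
        node = dict(node)  # copy so the caller's data stays untouched
        prefix, _, local_id = node["id"].partition(":")
        if prefix == "MESH":
            preferred = lookup(local_id)
            if preferred and preferred != node.get("name"):
                mismatches.append((node["id"], node.get("name"), preferred))
                node["name"] = preferred
        corrected.append(node)
    return corrected, mismatches


if __name__ == "__main__":
    # Offline demo: a stub lookup stands in for the live API, and the
    # current names below are placeholders, not the real record contents.
    stub = {
        "D000068877": "Imatinib Mesylate",
        "D015464": "Leukemia, Myelogenous, Chronic, BCR-ABL Positive",
    }
    demo_nodes = [
        {"id": "MESH:D000068877", "name": "placeholder drug name"},
        {"id": "UniProt:P00519", "name": "placeholder protein name"},
        {"id": "MESH:D015464", "name": "placeholder disease name"},
    ]
    _, found = check_nodes(demo_nodes, lookup=stub.get)
    for curie, current, preferred in found:
        print(f"{curie}: {current!r} -> {preferred!r}")
```

Passing the lookup in as a parameter keeps the comparison logic testable without network access, and should make it easier to plug in other resolvers (mygene.info, OLS, NodeNormalizer, etc.) when we expand beyond MeSH.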