Standardize static vocabulary JSON files between tools #216

alyssadai · 2025-01-06T20:10:14Z

The n-API and the annotation tool currently use slightly different formats for the JSON files that contain all terms from an external vocabulary or sub-vocabulary, e.g., SNOMED assessments or SNOMED disorders.

For example, for diagnoses:

API:

  {
    "sctid": "10007009",
    "preferred_name": "Coffin-Siris syndrome"
  }

Annotation tool:

  {
    "identifier": "snomed:10007009",
    "label": "Coffin-Siris syndrome"
  }
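The two formats carry the same information and differ only in their key names and in whether the namespace prefix is embedded in the identifier, so converting between them is mechanical. A minimal sketch, assuming the "snomed" prefix and the key names shown above; to_annotation_format and to_api_format are hypothetical helpers, not functions from either tool:

```python
import json


# Hypothetical helper: converts one entry from the API's format
# ({"sctid", "preferred_name"}) to the annotation tool's format
# ({"identifier", "label"}), baking in the namespace prefix.
def to_annotation_format(entry: dict, prefix: str = "snomed") -> dict:
    return {
        "identifier": f"{prefix}:{entry['sctid']}",
        "label": entry["preferred_name"],
    }


# Hypothetical inverse: strips the prefix to recover the API's format.
def to_api_format(entry: dict) -> dict:
    # Split only on the first colon so the term ID itself is kept intact
    _prefix, sctid = entry["identifier"].split(":", 1)
    return {"sctid": sctid, "preferred_name": entry["label"]}


api_entry = {"sctid": "10007009", "preferred_name": "Coffin-Siris syndrome"}
converted = to_annotation_format(api_entry)
print(json.dumps(converted))
# {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"}
assert to_api_format(converted) == api_entry
```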

We should pick one format and use it consistently across tools: ideally for all vocabularies, or at least for each individual vocabulary.

Considerations:

- The code that generates SNOMED vocabulary files from Athena currently produces the second format, i.e., Neurobagel-specific prefixes are baked into the terms.
- The annotation tool's vocab file for SNOMED disorders (second format) also contains the NCIT healthy control term. If we use the same file for the API, should we remove that term from the vocab endpoints to avoid confusion?
- The first format makes no assumptions about Neurobagel-specific namespace prefixes; instead, the code that uses the JSON files attaches them as needed.
- The downside of hardcoding namespace prefixes (which are arbitrary shorthands chosen by us and do not end up in any of the actual graphs) is that if we ever decide to change them, we also have to update all the vocab files.
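With the prefixed (second) format, excluding the healthy control term from a SNOMED vocab endpoint could be as simple as filtering on the prefix. A minimal sketch, assuming the prefixed format; terms_in_vocab is a hypothetical helper, and the NCIT entry below is illustrative (its exact identifier is an assumption):

```python
# Hypothetical helper: keeps only terms whose identifier uses the given
# vocabulary prefix, e.g. dropping the NCIT healthy control term from the
# SNOMED disorders list before it is served from a vocab endpoint.
def terms_in_vocab(terms: list[dict], prefix: str) -> list[dict]:
    return [t for t in terms if t["identifier"].startswith(f"{prefix}:")]


disorders = [
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
    {"identifier": "ncit:C94342", "label": "Healthy Control"},  # illustrative entry
]
print(terms_in_vocab(disorders, "snomed"))
# [{'identifier': 'snomed:10007009', 'label': 'Coffin-Siris syndrome'}]
```

With the unprefixed (first) format, the same filter would instead need a separate list of which IDs belong to which vocabulary.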

alyssadai · 2025-01-15T17:12:03Z

@neurobagel/dev, please take a look at the issue description (including the considerations) and comment with your preference. The key decision is whether to include the namespace prefixes in the JSON.

rmanaem · 2025-01-15T21:24:08Z

I prefer the annotation tool version, i.e., keeping the namespace prefix in the JSON.

surchs · 2025-01-15T21:28:53Z

Me too, I prefer the annotation tool version. I think having the prefix in there is a helpful sanity check if and when we switch vocabularies. It's likely a bit of extra work for some future (and past) vocabularies that only provide the term IDs, but I think that's worth the effort.
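One way to use the prefix as a sanity check is to validate every identifier in a vocab file against the expected pattern. A minimal sketch, assuming the prefixed format; invalid_identifiers is a hypothetical helper, and the digits-only check applies to SNOMED concept IDs specifically, so it would need adjusting for other vocabularies:

```python
import re


# Hypothetical sanity check: every identifier must look like "<prefix>:<digits>".
# SNOMED CT concept IDs are purely numeric, so a missing or wrong prefix
# (e.g. after switching vocabularies) stands out immediately.
def invalid_identifiers(terms: list[dict], prefix: str) -> list[str]:
    pattern = re.compile(rf"{re.escape(prefix)}:\d+")
    return [t["identifier"] for t in terms if not pattern.fullmatch(t["identifier"])]


terms = [
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
    {"identifier": "10007009", "label": "Coffin-Siris syndrome"},  # prefix missing
]
print(invalid_identifiers(terms, "snomed"))  # ['10007009']
```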

alyssadai · 2025-01-15T23:00:41Z

Okay, sounds good. To clarify, assuming we include prefixes, I think the two scenarios in which we would need to change the JSON file are:

- If we decide to rename a vocabulary prefix itself (e.g., ncit used to be called purl by our tools, even though the actual healthy control term URL we've used hasn't changed)
- If we change the vocabulary itself (but we would need to update the JSON anyway, since the terms would change)
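For the first scenario, a prefix rename could be applied to an existing vocab file with a small migration script. A minimal sketch, assuming the prefixed format and that only the prefix string changes while the part after the colon stays the same; rename_prefix and migrate_file are hypothetical helpers, and the "purl:EXAMPLE_ID" entry is a placeholder:

```python
import json
from pathlib import Path


# Hypothetical helper: replaces the prefix of every identifier that uses `old`
# with `new` (e.g. "purl" -> "ncit"). The part after the colon and the labels
# are left untouched; entries with other prefixes pass through unchanged.
def rename_prefix(terms: list[dict], old: str, new: str) -> list[dict]:
    renamed = []
    for term in terms:
        prefix, sep, local_id = term["identifier"].partition(":")
        if sep and prefix == old:
            # Copy the entry so the input list is not mutated
            term = {**term, "identifier": f"{new}:{local_id}"}
        renamed.append(term)
    return renamed


# Hypothetical wrapper: rewrites a JSON vocab file (a list of terms) in place.
def migrate_file(path: Path, old: str, new: str) -> None:
    terms = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(json.dumps(rename_prefix(terms, old, new), indent=2), encoding="utf-8")


terms = [
    {"identifier": "purl:EXAMPLE_ID", "label": "Example term"},  # placeholder
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
]
print(rename_prefix(terms, "purl", "ncit"))
```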

neurobagel-bot added this to Neurobagel Jan 6, 2025

alyssadai mentioned this issue Jan 7, 2025 in [MNT] Replaced Cognitive Atlas with SNOMED neurobagel/api#397 (merged)

rmanaem moved this to Backlog in Neurobagel Jan 7, 2025

rmanaem removed the flag:schedule label Jan 7, 2025

alyssadai mentioned this issue Feb 13, 2025 in Remove duplicates/keep only standard terms in SNOMED assessment list #236 (open)

rmanaem moved this from Backlog to Specify - Done in Neurobagel Feb 21, 2025
