Standardize static vocabulary JSON files between tools #216

Open
alyssadai opened this issue Jan 6, 2025 · 4 comments
Labels
flag:discuss Flag issue that needs to be discussed before it can be implemented. refactor Simplifying or restructuring existing code or documentation. usability Issue affecting user or developer experience.

Comments

@alyssadai
Contributor

alyssadai commented Jan 6, 2025

The n-API and annotation tool currently use slightly different formats for JSON files containing all terms from an external vocabulary/sub-vocab, e.g. SNOMED assessments or SNOMED disorders.

e.g., for diagnoses,

API:

  {
    "sctid": "10007009",
    "preferred_name": "Coffin-Siris syndrome"
  },

Annotation tool:

   {
      "identifier":"snomed:10007009",
      "label":"Coffin-Siris syndrome"
   },
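
For illustration, here is a minimal Python sketch of converting between the two formats. The function names `to_prefixed` and `to_unprefixed` are hypothetical and not taken from either tool; the `snomed` prefix is the one shown in the annotation tool example above.

```python
def to_prefixed(entries, prefix="snomed"):
    """Convert API-style entries to annotation-tool-style entries."""
    return [
        {"identifier": f"{prefix}:{e['sctid']}", "label": e["preferred_name"]}
        for e in entries
    ]


def to_unprefixed(entries):
    """Convert annotation-tool-style entries back to API-style entries."""
    converted = []
    for e in entries:
        # Split on the first colon only; the part before it is the prefix.
        _, _, term_id = e["identifier"].partition(":")
        converted.append({"sctid": term_id, "preferred_name": e["label"]})
    return converted


api_entries = [{"sctid": "10007009", "preferred_name": "Coffin-Siris syndrome"}]
tool_entries = to_prefixed(api_entries)
print(tool_entries)
```

The conversion is mechanical in both directions for SNOMED, so the choice of format is mostly about where the prefix knowledge should live.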

We should pick one format and use it consistently across tools: if not for all vocabularies, then at least for any given vocabulary.

Considerations:

  • The code to generate SNOMED vocabulary files from Athena currently produces the second format, i.e. where Neurobagel-specific prefixes are baked into the terms
  • The annotation tool vocab file for SNOMED disorders (second format) additionally has the NCIT healthy control term in it (if we use the same file for the API, we may want to remove it from vocab endpoints to avoid confusion?)
  • The first format does not assume anything about vocabulary namespace prefixes specific to Neurobagel, and instead relies on the code that uses the JSON files to attach those as needed
  • The downside of hardcoding namespace prefixes (which are arbitrary shorthands decided by us - they do not end up in any of the actual graphs) is that if we ever decide to change them, we have to also change all the vocab files
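
If the same prefixed file were shared with the API, one possible way to keep the healthy control term out of a SNOMED vocab endpoint is to filter on the prefix. This is only a sketch: `terms_with_prefix` is a hypothetical helper, and the NCIT entry below is an illustrative stand-in whose exact identifier is an assumption.

```python
def terms_with_prefix(entries, prefix):
    """Keep only the entries whose identifier uses the given namespace prefix."""
    wanted = f"{prefix}:"
    return [e for e in entries if e["identifier"].startswith(wanted)]


shared_vocab = [
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
    # Illustrative healthy control entry; the identifier is an assumption.
    {"identifier": "ncit:C94342", "label": "Healthy Control"},
]

snomed_only = terms_with_prefix(shared_vocab, "snomed")
print(snomed_only)
```

Note that this kind of filtering only works when the prefix is stored in the file, which is one practical argument for the second format.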

alyssadai added the usability, refactor, flag:schedule, and flag:discuss labels on Jan 6, 2025
rmanaem moved this to Backlog in Neurobagel on Jan 7, 2025
rmanaem removed the flag:schedule label on Jan 7, 2025

@alyssadai
Contributor Author

@neurobagel/dev, please take a look at the issue description (including the considerations) and comment your preference - the key decision is whether to include the namespace prefixes in the JSON.

@rmanaem
Contributor

rmanaem commented Jan 15, 2025

I prefer the Annotation tool version i.e., keeping the namespace prefix in the JSON.

@surchs
Contributor

surchs commented Jan 15, 2025

Me too, I prefer the annotation tool version. I think having the prefix in there is a helpful sanity check if and when we switch vocabularies. It's likely a bit of extra work for some future (and past) vocabularies that only provide the term IDs, but I think that's worth the effort.
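
As a rough sketch of what such a sanity check could look like (the helper name `prefix_counts` and the `<no prefix>` bucket are made up for this example), one could tally the prefixes found in a vocab file and flag anything unexpected:

```python
from collections import Counter


def prefix_counts(entries):
    """Count how many identifiers use each namespace prefix."""
    counts = Counter()
    for entry in entries:
        prefix, sep, _ = entry["identifier"].partition(":")
        # Identifiers without a colon carry no prefix at all.
        counts[prefix if sep else "<no prefix>"] += 1
    return counts


vocab = [
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
    # A bare term ID, as a vocabulary that only provides IDs might supply.
    {"identifier": "10007009", "label": "Coffin-Siris syndrome"},
]

counts = prefix_counts(vocab)
print(counts)
```

A file meant to hold a single vocabulary should then show exactly one prefix, and any `<no prefix>` entries point at terms that still need the extra work mentioned above.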

@alyssadai
Contributor Author

alyssadai commented Jan 15, 2025

Okay, sounds good. To clarify, assuming we include prefixes, I think the two scenarios in which we would need to change the JSON file are:

  1. If we decide to rename a vocabulary prefix itself (e.g. ncit used to be called purl by our tools, even though the actual healthy control term URL we’ve used hasn’t changed)
  2. If we change the vocabulary itself (but we would need to update the JSON anyway, since the terms would change)
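
For scenario 1, the update could be a small one-off script rather than manual edits. Below is a minimal sketch assuming the prefixed `identifier`/`label` format; `rename_prefix` is a hypothetical helper, and the sample entry uses a placeholder term ID rather than a real one.

```python
def rename_prefix(entries, old, new):
    """Return a copy of the entries with the `old` prefix replaced by `new`."""
    old_start = f"{old}:"
    renamed = []
    for entry in entries:
        identifier = entry["identifier"]
        if identifier.startswith(old_start):
            # Swap only the prefix; the term ID after the colon is untouched.
            identifier = f"{new}:{identifier[len(old_start):]}"
        renamed.append({**entry, "identifier": identifier})
    return renamed


old_vocab = [
    # Placeholder entry for illustration only.
    {"identifier": "purl:EXAMPLE_ID", "label": "Example term"},
    {"identifier": "snomed:10007009", "label": "Coffin-Siris syndrome"},
]

new_vocab = rename_prefix(old_vocab, old="purl", new="ncit")
print(new_vocab)
```

Entries with other prefixes pass through unchanged, so the same script could be run over every vocab file without checking first which ones are affected.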

Projects
Status: Specify - Done
Development

No branches or pull requests

3 participants