GREI HDV Task: Determine whether/how Dataverse can support hierarchical vocabularies #236

Open
3 tasks
Tracked by #174
cmbz opened this issue Apr 30, 2024 · 16 comments
Assignees
Labels
Dataverse Project Issues related to Dataverse Project software GREI Year 3 Year 3 GREI task GREI 2 Consistent Metadata Harvard Dataverse Issues related to Harvard Dataverse Repository Project: NIH GREI Tasks related to the NIH GREI project

Comments

@cmbz
Contributor

cmbz commented Apr 30, 2024

Overview

  • Determine whether/how Dataverse can support hierarchical vocabularies

Deliverable

  • Define what hierarchical support means for DV/HDV. What would this look like for DV? What are the goals, and how do we know that a solution meets those goals?

Then either:

  • Confirmation that Dataverse can currently provide hierarchical vocabulary support, or
  • Documentation describing how to implement hierarchical vocabulary support in Dataverse

Resources

@cmbz cmbz changed the title GREI HDV Task: Determine whether/how Dataverse can support hierarchical vocabularies (community need) GREI HDV Task: Determine whether/how Dataverse can support hierarchical vocabularies Apr 30, 2024
@cmbz cmbz added GREI 2 Consistent Metadata Project: NIH GREI Tasks related to the NIH GREI project labels Apr 30, 2024
@cmbz cmbz added GREI Year 3 Year 3 GREI task Harvard Dataverse Issues related to Harvard Dataverse Repository Dataverse Project Issues related to Dataverse Project software labels May 7, 2024
@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Jul 1, 2024
@sbarbosadataverse

sbarbosadataverse commented Jul 19, 2024

Sonia and Julian met and discussed additional steps for getting this task done. See the updates to "deliverables" above. Julian estimates he can devote time to this issue in mid-September.

@cmbz
Contributor Author

cmbz commented Aug 5, 2024

2024/08/05

  • Stefano reports that Jim indicated that the Dataverse script for external vocabularies can support hierarchical vocabularies, but work will be needed to store the data properly in Dataverse.
  • Gustavo indicates that the internal vocabulary does not currently support hierarchies. This was discussed, but implementation did not progress. Design work would be needed to implement support.
  • September: @qqmyers @jggautier and @scolapasta will meet to discuss options and possibilities.

@qqmyers
Member

qqmyers commented Aug 5, 2024

I think I said the opposite: with the external vocab mechanism, Dataverse just stores a term URI, so there are no changes needed to support a hierarchical vocabulary. All the changes would be in the JavaScript (to be developed for a given vocabulary/service), where it should be simple to find a widget or mirror what other sites do to handle hierarchy, graph relations, etc.
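
A minimal sketch of what that script-side handling could look like, assuming the vocabulary service returns flat records with a SKOS-style `broader` link. The record shape, the `buildTree` helper, and the `example.org` URIs are all hypothetical; none of them come from Dataverse or from a real vocabulary service:

```javascript
// Hypothetical flat records, as a vocabulary service might return them.
// `broader` is the URI of the parent term (null for a top-level term).
const records = [
  { uri: "https://example.org/vocab/politics", label: "Politics", broader: null },
  { uri: "https://example.org/vocab/european-politics", label: "European politics",
    broader: "https://example.org/vocab/politics" },
  { uri: "https://example.org/vocab/economics", label: "Economics", broader: null },
];

// Turn the flat list into a nested tree that a picker widget could render.
function buildTree(flatRecords) {
  const nodes = new Map();
  for (const r of flatRecords) {
    nodes.set(r.uri, { uri: r.uri, label: r.label, children: [] });
  }
  const roots = [];
  for (const r of flatRecords) {
    const node = nodes.get(r.uri);
    const parent = r.broader ? nodes.get(r.broader) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      // No known parent: treat the term as top-level.
      roots.push(node);
    }
  }
  return roots;
}

const tree = buildTree(records);

// Whatever the user picks in the tree, only the term URI would be saved
// in the metadata field (assumption based on the comment above).
const selectedUri = tree[0].children[0].uri;
```

The hierarchy exists only in the script's in-memory tree; the value handed back for storage is a single URI, which is why no storage changes would be needed under this reading.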

@cmbz
Contributor Author

cmbz commented Aug 5, 2024

Thanks @qqmyers for clarifying. @siacus and @scolapasta please see Jim's comment for an update to our understanding about hierarchical vocab support.

@cmbz cmbz removed their assignment Nov 18, 2024
@cmbz
Contributor Author

cmbz commented Nov 18, 2024

2024/11/18: Ask @qqmyers to take a look, recommend next steps (e.g., development work needed), create relevant development issues.

@qqmyers
Member

qqmyers commented Nov 18, 2024

I'd suggest some ~non-dev work to start:

  • pick a vocab to start with, UMLS, or MeSH
  • see if there is a service or repository where we can get the official values in the repository and/or where we can link to provide more info about a term (as we point to a user's ORCID profile)
  • find an example/do a mockup of what navigating the vocabulary should look like
  • ensure that all we want/need for now is hierarchical navigation (versus, for example, needing to navigate to related terms elsewhere in the hierarchy)
  • define how the term should be displayed - by itself? as part of a hierarchy of parent terms?
  • define where the term will be added to Dataverse, i.e. which field in which block (note - making it one type of thing that goes in the citation keyword field makes development somewhat more complex than if it's a separate field, e.g. in the HEAL block or elsewhere)
  • identify if/where the term should go in the various metadata exports we have. (OAI-ORE and our JSON are trivial, the question is more about DataCite, DDI, etc.)

With the answers above, I think it should be straightforward to scope the JavaScript work needed to support the input and display, identify whether there's work related to a new metadata block, whether updates are needed to exporters, etc.

@cmbz
Contributor Author

cmbz commented Nov 19, 2024

@jggautier and @sbarbosadataverse do you have suggestions for the first bullet points Jim suggested here: #236 (comment) ?

@jggautier

Hmm, I'll try to think about it and reply later today

@cmbz cmbz moved this from SPRINT- NEEDS SIZING to On Hold ⌛ in IQSS Dataverse Project Nov 21, 2024
@cmbz
Contributor Author

cmbz commented Nov 21, 2024

2024/11/21: Placing On Hold until @sbarbosadataverse and @jggautier figure out which vocabulary they want to investigate.

@bencomp

bencomp commented Jan 13, 2025

May I offer the idea that using a hierarchical vocabulary should help with finding data even without additional visuals? E.g., if I tag a dataset "European politics", I should be able to find it when I search for the broader term "politics" (assuming the use of a vocabulary that includes those terms in a hierarchical relationship).

Just chiming in since this issue replaces one of the oldest Dataverse issues, while it appears not to cover all of the old issue's contents.
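
One way to picture this search behavior is to expand each tagged term with its broader terms before indexing. The sketch below assumes a simple child-to-parent map; the map contents and the `withAncestors` helper are hypothetical, and this is not how Dataverse indexing currently works:

```javascript
// Hypothetical child -> parent ("broader") map for a small vocabulary.
const broader = new Map([
  ["European politics", "Politics"],
  ["Politics", "Social sciences"],
]);

// Collect a term plus all of its broader terms, guarding against cycles.
function withAncestors(term, broaderMap) {
  const result = [];
  const seen = new Set();
  let current = term;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current);
    result.push(current);
    current = broaderMap.get(current); // undefined once we reach a top-level term
  }
  return result;
}

// Terms that could be indexed for a dataset tagged "European politics".
const indexedTerms = withAncestors("European politics", broader);

// A search for the broader term then matches the dataset.
const matchesBroaderSearch = indexedTerms.includes("Politics");
```

Expanding at index time keeps queries simple; the alternative, expanding the query with narrower terms at search time, would avoid reindexing when the vocabulary changes.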

@jggautier

Thanks @bencomp. Could you write more about what additional visuals might mean?

@bencomp

bencomp commented Jan 13, 2025

For a moment I thought this issue was only/mostly about the visual navigation of a hierarchy, but on a second read I see that was my mistake.

@jggautier

Ah, okay, visuals like how depositors and curators might select terms from a hierarchical vocabulary. Thanks!

Yeah we definitely mean to consider all aspects of "support", like what was discussed in the older GitHub issues that this issue replaced.

@sbarbosadataverse

sbarbosadataverse commented Jan 21, 2025

Status: January 2025

@cmbz @qqmyers

Julian and I met and discussed some tasks associated with Jim's plan:

  1. It's still important to add to Jim's plan to keep in mind how people will use these vocabularies to search, per @bencomp comment

  2. Determine which vocabulary is more complex, which is used more often, how easy it is to access the terms, and which would interact better with controlled vocab functionality

  3. Are there already platforms that give users access to these terms, which we could use for our mockup examples?

  4. The vocabulary should be accessible to all users and not within blocks - as Jim pointed out this would make development more complicated but the goal is for HDV wide-use -- @qqmyers

In addition to what Jim outlined, and to happen in parallel:

  1. Learning about how people are already using these terms in DV (e.g. the MORU collection)

  2. Consider use cases for those wanting to use multiple controlled vocabularies in the same field (in the Keyword field, for example) - @qqmyers Would it be problematic to build support for one vocabulary, and then modify it to support multiple controlled vocabularies later? Should we consider a "multiple vocabulary support" model to start? @jggautier can share the community conversation on multiple vocab support (is someone in the community already supporting this? Could we email the installations and ask?)

@cmbz
Contributor Author

cmbz commented Jan 23, 2025

Thanks @sbarbosadataverse and @jggautier looks like a great plan to me. Curious about @qqmyers thoughts?

@qqmyers
Member

qqmyers commented Jan 23, 2025

Not sure what to comment on: re: 4 - not sure why an implementation in a new block can't be available site-wide, but, if the idea is to have this in the citation block Keywords field - it would be required to be on for everyone (so non-medical collections would have to see any medical vocab).

re the second #2: The way our ext. vocab service currently works is that there can be one script per field. That means that if you want to support one hierarchical vocab and free-text entries, the script has to support that (most of our current ones do) and if you want multiple vocabs, again the script has to support it (currently only our skosmos script does that and it requires both vocabs to be on the same server.) Same for multiple vocabs and free text - that would all be built into a single script.

There is interest in the community in allowing multiple scripts on a given field and even allowing different scripts to be turned on for a given field in different collections. If/when that is designed/implemented, individual scripts could probably stop doing anything to handle free text or multiple vocabs. Which ~means that starting with single vocab per field is fine/it's extra work to support multiple vocabs and, until there's a clear design, work towards multiple vocabs in one field could end up being one-off/have to be redone later. (I don't have a good guess as to when redesign might get going - probably faster if Harvard is also interested due to GREI).
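
To illustrate the "one script per field" constraint: a single script that wants several vocabularies plus free text has to do all of the dispatching itself. A rough sketch, assuming in-memory lookups in place of real service calls; the vocabulary names, the `resolveEntry` helper, and the `example.org` URIs are hypothetical and do not reflect the existing Dataverse scripts:

```javascript
// Hypothetical in-memory vocabularies; a real script would query a service.
const vocabularies = {
  vocabA: new Map([["asthma", "https://example.org/vocabA/asthma"]]),
  vocabB: new Map([["politics", "https://example.org/vocabB/politics"]]),
};

// Resolve what the user typed: a term from any configured vocabulary if
// one matches (the first matching vocabulary wins), otherwise free text.
function resolveEntry(input, vocabs) {
  const value = input.trim();
  const key = value.toLowerCase();
  for (const [vocabName, terms] of Object.entries(vocabs)) {
    if (terms.has(key)) {
      return { kind: "term", vocabulary: vocabName, uri: terms.get(key) };
    }
  }
  // No vocabulary matched: keep the entry as a plain keyword.
  return { kind: "free-text", value };
}

const fromVocab = resolveEntry(" Politics ", vocabularies);
const freeText = resolveEntry("my own keyword", vocabularies);
```

If multiple scripts per field were supported, each branch of this function could become its own script, which is the simplification described above.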

Projects
Status: On Hold ⌛
Development

No branches or pull requests

7 participants