Data portal - how manifest_set records should look in the UI #1463

Open
aclum opened this issue Nov 15, 2024 · 4 comments
@aclum
Contributor

aclum commented Nov 15, 2024

This ticket replaces #1365

@aclum aclum assigned aclum and kheal Nov 15, 2024
@kheal

kheal commented Nov 22, 2024

For the MS-related use case:
When the Manifest record referenced by a DataObject has a manifest_type of instrument_run, no special or alternate handling is needed on the upset plots, and no changes are needed to the existing UI.

@aclum
Contributor Author

aclum commented Jan 17, 2025

Business rules should be as follows:
If data_object_set records are the has_output of data_generation_set records and have a value for in_manifest, check the referenced manifest_set record. If the manifest_category on that manifest_set record is poolable_replicates, the corresponding data_generation_set records should be counted as 1 in the data portal visualizations (bar chart). No changes in logic are needed for other manifest_category permissible values.

Acceptance criteria:
A search for biosample ID nmdc:bsm-11-1by2a436 in the data portal data type bar chart returns a count of 1 metagenome. Current behavior is a count of 2.

@aclum aclum assigned naglepuff and unassigned aclum and kheal Jan 17, 2025
@aclum
Contributor Author

aclum commented Jan 17, 2025

@naglepuff would you please t-shirt size this task when you have a chance?

@aclum aclum changed the title from "Data portal - decide how manifest_set records should look in the UI" to "Data portal - how manifest_set records should look in the UI" Jan 17, 2025
@aclum aclum moved this to Todo in EMP 500 Jan 28, 2025
@aclum aclum added this to EMP 500 Jan 28, 2025
@naglepuff
Collaborator

naglepuff commented Feb 5, 2025

Ok, here are some initial thoughts on this task after some clarifying discussion on Slack with @aclum.

My understanding of manifest objects is that they are a way to group related data_objects that are outputs of different data_generations.

On the Data Portal, specifically the "Data Type" bar chart (and probably the counts in the flyout for the "Data Type" facet), different data_generation records that can be traced to the same manifest record should only count as one IF the value of manifest_category for that manifest record is "poolable_replicates".

For example, I have:

manifest_set = [
  { id: "m1", manifest_category: "poolable_replicates" }
]
data_generation_set = [
  { id: "dg1", has_output: "do1" ... },
  { id: "dg2", has_output: "do2" ... },
  { id: "dg3", has_output: "do3" ... },
]
data_object_set = [
  { id: "do1", was_generated_by: "dg1", in_manifest: ["m1"] ... },
  { id: "do2", was_generated_by: "dg2", in_manifest: ["m1"] ... },
  { id: "do3", was_generated_by: "dg3", in_manifest: ["m1"] ... },
]

Then for any query from the data portal whose results include some subset of {dg1, dg2, dg3}, that subset should only contribute 1 to the data type count on the bar chart.
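
As a rough sketch of that rule in plain Python (not the portal's actual code): the function name pooled_count is hypothetical, the records are the example ones above with the elided fields left out, and it assumes each data_generation belongs to at most one poolable manifest.

```python
# Example records from above; fields elided there are simply omitted here.
manifest_set = [{"id": "m1", "manifest_category": "poolable_replicates"}]
data_object_set = [
    {"id": "do1", "was_generated_by": "dg1", "in_manifest": ["m1"]},
    {"id": "do2", "was_generated_by": "dg2", "in_manifest": ["m1"]},
    {"id": "do3", "was_generated_by": "dg3", "in_manifest": ["m1"]},
]


def pooled_count(data_generation_ids, data_object_set, manifest_set):
    """Count data_generation ids, collapsing poolable replicates into one.

    Hypothetical helper: assumes a data_generation is tied to at most one
    manifest whose manifest_category is "poolable_replicates".
    """
    poolable = {
        m["id"]
        for m in manifest_set
        if m.get("manifest_category") == "poolable_replicates"
    }
    # Map each data_generation id to the poolable manifest of its outputs.
    group_of = {}
    for data_object in data_object_set:
        for manifest_id in data_object.get("in_manifest", []):
            if manifest_id in poolable:
                group_of.setdefault(data_object["was_generated_by"], manifest_id)
    # Pooled records share their manifest as a key; all others keep their own id.
    keys = {
        ("manifest", group_of[dg_id]) if dg_id in group_of else ("dg", dg_id)
        for dg_id in data_generation_ids
    }
    return len(keys)


full_result = pooled_count(["dg1", "dg2", "dg3"], data_object_set, manifest_set)
partial_result = pooled_count(["dg1", "dg3"], data_object_set, manifest_set)
with_unpooled = pooled_count(["dg1", "dg2", "dg4"], data_object_set, manifest_set)
```

Here "dg4" stands in for a data_generation with no manifest at all, so it is still counted on its own next to the pooled group.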

Currently, the counts are determined by the BaseQuerySchema.facet method. This essentially does a
SELECT <<column>>, COUNT(*) FROM <<table>> GROUP BY <<column>>.
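
To make the shape of that query concrete, here is a small stand-in using Python's built-in sqlite3 module. It is only an illustration of a per-value faceted count, not the BaseQuerySchema.facet implementation; the data_generation table, its data_type column, and the facet helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real schema is not shown in this issue.
conn.execute("CREATE TABLE data_generation (id TEXT PRIMARY KEY, data_type TEXT)")
conn.executemany(
    "INSERT INTO data_generation VALUES (?, ?)",
    [("dg1", "metagenome"), ("dg2", "metagenome"), ("dg3", "metabolomics")],
)


def facet(conn, table, column):
    """Return {value: row count} for one column of one table.

    Identifiers cannot be bound as SQL parameters, so only trusted,
    hard-coded table and column names should be passed in.
    """
    query = f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}"
    return dict(conn.execute(query).fetchall())


counts = facet(conn, "data_generation", "data_type")
```

Because every row counts once, two replicate data_generation records of the same type show up as 2, which is the current behavior this issue wants to change for poolable replicates.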

In order to implement the change that this issue is describing we'd need to:

  1. update the data model to capture which data_generation records can be counted together
  2. update ingest to accommodate that
  3. update the logic of the facet method (probably on the data_generation-specific subclass of BaseQuerySchema)

I will continue to dig into estimates for these, but both (2) and (3) depend heavily on the design of (1), which will likely be the hardest part of this.
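
One possible direction for (1) and (3), sketched with sqlite3 purely for discussion: a nullable column on the data_generation table that ingest would fill with the poolable manifest's id, and a facet query that counts distinct groups instead of rows. The column name poolable_manifest_id and the pooled_facet helper are hypothetical, and the sketch assumes manifest ids never collide with data_generation ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical model change: poolable_manifest_id is NULL unless the record's
# outputs belong to a manifest with manifest_category "poolable_replicates".
conn.execute(
    """
    CREATE TABLE data_generation (
        id TEXT PRIMARY KEY,
        data_type TEXT NOT NULL,
        poolable_manifest_id TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO data_generation VALUES (?, ?, ?)",
    [
        ("dg1", "metagenome", "m1"),
        ("dg2", "metagenome", "m1"),
        ("dg3", "metagenome", "m1"),
        ("dg4", "metagenome", None),  # not part of any poolable manifest
    ],
)


def pooled_facet(conn):
    """Count per data_type, treating each poolable manifest as one record."""
    query = """
        SELECT data_type,
               COUNT(DISTINCT COALESCE(poolable_manifest_id, id))
        FROM data_generation
        GROUP BY data_type
    """
    return dict(conn.execute(query).fetchall())


pooled = pooled_facet(conn)
```

With this layout dg1, dg2, and dg3 collapse into a single count and dg4 still counts on its own. A manifest that spans several data types would be counted once per type, which may or may not be the desired behavior.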

I don't know what the T-shirt size would be, but I would expect this to take me 1 to 2 sprints.
