QC Report branch with test refinement #1428

Merged
merged 60 commits into from
Jan 16, 2024
Commits
cce8856
works because everthing is commented out
turbomam Nov 28, 2023
5088245
everything workigne except rules and enums
turbomam Nov 28, 2023
7515b20
old enums working
turbomam Nov 28, 2023
708b836
new class-equivalent permissible values
turbomam Nov 28, 2023
fb58cbd
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Nov 28, 2023
2264ea1
Update workflow_execution_activity.yaml
aclum Dec 5, 2023
7ff8d20
Update Database-ReadQcAnalysisActivity-invalid.yaml
aclum Dec 5, 2023
e7130e6
Merge branch 'main' into 1427-try-to-fix-qc-report-pr-1334-in-new-branch
aclum Dec 5, 2023
236b14b
Update nmdc.yaml
aclum Dec 5, 2023
06f17b6
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Dec 5, 2023
13d0baf
Update Database-ReadQcAnalysisActivity-invalid.yaml
aclum Dec 5, 2023
ec568f6
Update core.yaml
aclum Dec 6, 2023
2ff098f
Update workflow_execution_activity.yaml
aclum Dec 6, 2023
1ae7714
Update basic_slots.yaml
aclum Dec 6, 2023
c6077a7
Update nmdc.yaml
aclum Dec 7, 2023
fa6f47b
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Dec 7, 2023
7132c4b
Update Database-ReadQcAnalysisActivity-invalid.yaml
aclum Dec 7, 2023
3610ecd
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Dec 7, 2023
d83fc3d
Update core.yaml
aclum Dec 7, 2023
7b0c3c8
Update workflow_execution_activity.yaml
aclum Dec 7, 2023
e035047
Update Database-nucleic-extraction.yaml
aclum Dec 7, 2023
a9b4344
Update Database-extraction_set-exhaustive.yaml
aclum Dec 7, 2023
cf02142
Update Database-nucleic-extraction.yaml
aclum Dec 7, 2023
9d7198c
Update Database-Extraction-invalid-sample_mass.yaml
aclum Dec 7, 2023
dc06229
Update Extraction-NEON.yaml
aclum Dec 7, 2023
1930e5a
add rules, kind of
Dec 8, 2023
ba61c61
fix rule definition
Dec 8, 2023
1709c11
Update nmdc.yaml
aclum Dec 19, 2023
2a6858f
Update workflow_execution_activity.yaml
aclum Dec 19, 2023
1e1bf0c
Update core.yaml
aclum Dec 19, 2023
4063d45
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Dec 19, 2023
7cda193
Update Database-ReadQcAnalysisActivity-quality_fail.yaml
aclum Dec 19, 2023
cfcb2a0
Update workflow_execution_activity.yaml
aclum Dec 19, 2023
c8d5320
Merge branch 'main' into 1427-try-to-fix-qc-report-pr-1334-in-new-branch
aclum Dec 19, 2023
6c92749
Update workflow_execution_activity.yaml
aclum Dec 19, 2023
ad52662
Update basic_slots.yaml
aclum Dec 19, 2023
9e9e233
Update basic_slots.yaml
aclum Dec 19, 2023
3ce617e
Update basic_slots.yaml
aclum Dec 19, 2023
35d3562
Update nmdc.yaml
aclum Dec 19, 2023
a733df8
Update nmdc.yaml
aclum Dec 19, 2023
145287b
Update workflow_execution_activity.yaml
aclum Dec 19, 2023
4c674fd
Update workflow_execution_activity.yaml
aclum Dec 19, 2023
3ac1ecc
Update workflow_execution_activity.yaml
aclum Dec 20, 2023
3e74a58
Create Database-MetagenomeAssembly_invalid_qc_status_rules.yaml
aclum Dec 20, 2023
59a2d4c
fix syntax of ifabsent
Dec 20, 2023
53c2a76
Update workflow_execution_activity.yaml
aclum Dec 20, 2023
f0c9943
Update pyproject.toml
aclum Dec 21, 2023
4f326b0
updating poetry.lock file after updating pyproject.toml
aclum Dec 21, 2023
e6a59c3
fixing format of invalid example to test WXA rules
aclum Dec 21, 2023
d5b54bf
Update Database-ReadQcAnalysisActivity-invalid.yaml
aclum Dec 21, 2023
4a81495
updating to linkml 1.6.8, invalid tests still pass which is not expected
aclum Jan 11, 2024
0db0a54
Merge branch 'main' into 1427-try-to-fix-qc-report-pr-1334-in-new-branch
aclum Jan 11, 2024
e7ab626
make pyproject.toml and poetry.lock file match
aclum Jan 11, 2024
08935b5
Create migrator from 9_3 to 9_4 with move extraction qc status
mbthornton-lbl Jan 12, 2024
eb5263e
update doctest
mbthornton-lbl Jan 12, 2024
2d901f9
Update nmdc.yaml
aclum Jan 12, 2024
cccbb3e
Update workflow_execution_activity.yaml
aclum Jan 12, 2024
d5c32de
Create Database-AsemblyAnalysisActivity-1.yaml
aclum Jan 13, 2024
a93bf27
fixing adding subdirs
aclum Jan 13, 2024
5e58230
Update workflow_execution_activity.yaml
aclum Jan 13, 2024
31 changes: 31 additions & 0 deletions nmdc_schema/migrators/migrator_from_9_3_to_9_4.py
@@ -0,0 +1,31 @@

from nmdc_schema.migrators.migrator_base import MigratorBase


class Migrator(MigratorBase):
    r"""Migrates data between two schema versions."""

    _from_version = "9.3"
    _to_version = "9.4"

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._agenda = dict(
            extraction_set=[self.move_extraction_qc_status],
        )

    def move_extraction_qc_status(self, extraction: dict) -> dict:
        r"""
        Move quality_control_report.status to qc_status

        >>> m = Migrator()
        >>> m.move_extraction_qc_status({'id': 123, 'quality_control_report': {'status': 'pass'}})
        {'id': 123, 'qc_status': 'pass'}
        """
        self.logger.info(f"Starting migration of {extraction['id']}")
        if 'quality_control_report' in extraction:
            if 'status' in extraction['quality_control_report']:
                extraction['qc_status'] = extraction['quality_control_report']['status']
                del extraction['quality_control_report']['status']
            del extraction['quality_control_report']
        return extraction
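The migration logic can also be exercised without the MigratorBase machinery. The sketch below restates move_extraction_qc_status as a plain function and adds a hypothetical apply_agenda helper that runs an _agenda-style mapping (collection name to a list of transformers) over a database-shaped dict. apply_agenda is not part of nmdc_schema, and the real base class may dispatch, log, and copy documents differently.

```python
import copy
from typing import Any, Callable, Dict, List

# Illustrative aliases: a document is a dict, a transformer maps a document to a document.
Record = Dict[str, Any]
Transformer = Callable[[Record], Record]


def move_extraction_qc_status(extraction: Record) -> Record:
    """Standalone restatement of Migrator.move_extraction_qc_status."""
    if "quality_control_report" in extraction:
        report = extraction["quality_control_report"]
        # Promote the nested status to the new top-level qc_status slot.
        if "status" in report:
            extraction["qc_status"] = report["status"]
        # QualityControlReport is removed in 9.4, so the whole slot is dropped.
        del extraction["quality_control_report"]
    return extraction


def apply_agenda(
    database: Dict[str, List[Record]],
    agenda: Dict[str, List[Transformer]],
) -> Dict[str, List[Record]]:
    """Hypothetical dispatcher: run every transformer listed for a collection
    over each document in it; collections not in the agenda are copied as-is."""
    migrated = copy.deepcopy(database)  # leave the caller's documents untouched
    for collection, transformers in agenda.items():
        docs = migrated.get(collection, [])
        for i, doc in enumerate(docs):
            for transform in transformers:
                doc = transform(doc)
            docs[i] = doc
    return migrated


if __name__ == "__main__":
    db = {"extraction_set": [{"id": 123, "quality_control_report": {"status": "pass"}}]}
    print(apply_agenda(db, {"extraction_set": [move_extraction_qc_status]}))
    # prints {'extraction_set': [{'id': 123, 'qc_status': 'pass'}]}
```

Deep-copying the input keeps the pre-migration documents available, which makes it easy to compare old and new records side by side.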
263 changes: 127 additions & 136 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -44,7 +44,7 @@ packages = [

[tool.poetry.dependencies]
click-log = "^0.4.0"
-linkml = "^1.6.3"
+linkml = "^1.6.8"
linkml-runtime = "^1.6.2"
python = "^3.9"

5 changes: 2 additions & 3 deletions src/data/invalid/Database-Extraction-invalid-sample_mass.yaml
@@ -15,8 +15,7 @@ extraction_set:
sample_mass:
has_numeric_value: 0.25
has_unit: gram
quality_control_report:
status: pass
qc_status: pass
extraction_method: phenol/chloroform extraction
extraction_target: DNA


@@ -0,0 +1,12 @@
#this should fail because qc_status is not specified, so has_output should be required.
metagenome_assembly_set:
- id: nmdc:wfmgas-99-B7Vogx
name: Metagenome assembly 1472_51277
was_informed_by: nmdc:omprc-12-124
started_at_time: '2020-03-24T00:00:00+00:00'
ended_at_time: '2020-03-25T00:00:00+00:00'
type: nmdc:MetagenomeAssembly
execution_resource: LANL B-div
git_url: https://github.com/microbiomedata/metaAssembly/releases/tag/1.0.0
has_input:
- nmdc:dobj-12-1243
20 changes: 20 additions & 0 deletions src/data/invalid/Database-ReadQcAnalysisActivity-invalid.yaml
@@ -0,0 +1,20 @@
read_qc_analysis_activity_set:
#test should fail because, with a 'qc_status' of 'pass', 'has_output' should exist and have at least 1 value
- id: nmdc:wfrqc-11-hemh0a87.1
name: Read QC Activity for nmdc:wfrqc-11-hemh0a87.1
qc_status: pass
has_failure_categorization:
- qc_failure_what: malformed_data
qc_failure_where: ReadQcAnalysisActivity
qc_comment: Failure during call-stage to interleave fastq files
type: nmdc:ReadQcAnalysisActivity
started_at_time: "2023-08-29T19:41:47.365957+00:00"
ended_at_time: "2023-08-30T13:26:02.892410+00:00"
execution_resource: NERSC-Perlmutter
git_url: https://github.com/microbiomedata/ReadsQC
version: v1.0.8
was_informed_by: nmdc:omprc-11-r0pjgp16
has_input:
- nmdc:dobj-11-1k62bt83
- nmdc:dobj-11-e8hs8y25

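The comments in the two invalid files state the intended constraint: a workflow record whose qc_status is 'pass', or is missing, needs at least one has_output value, while a 'fail' record may omit outputs. The LinkML rules that implement this are not part of the rendered diff, and commit 4a81495 notes that the invalid tests still passed under linkml 1.6.8. The following is a hypothetical plain-Python check of that constraint, handy for confirming which test records ought to be rejected; qc_output_errors and validate_collection are illustrative names, not part of nmdc_schema or LinkML.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def qc_output_errors(record: Record) -> List[str]:
    """Return reasons a workflow record breaks the assumed qc_status rule.

    Assumed rule, taken from the test-file comments rather than the schema:
    when qc_status is absent or 'pass', has_output must be a non-empty list.
    A 'fail' record may have outputs or none.
    """
    status = record.get("qc_status")
    outputs = record.get("has_output") or []
    if status in (None, "pass") and not outputs:
        label = "missing" if status is None else repr(status)
        return [f"{record.get('id')}: qc_status {label} requires at least one has_output value"]
    return []


def validate_collection(records: List[Record]) -> Dict[str, List[str]]:
    """Map each offending record id to its error messages."""
    failures: Dict[str, List[str]] = {}
    for record in records:
        errors = qc_output_errors(record)
        if errors:
            failures[record.get("id")] = errors
    return failures
```

Run against trimmed copies of the records in these files, validate_collection would flag the 'pass' ReadQC record without outputs and the MetagenomeAssembly record that has no qc_status, while a 'fail' record without outputs is accepted.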
21 changes: 21 additions & 0 deletions src/data/valid/Database-AsemblyAnalysisActivity-1.yaml
@@ -0,0 +1,21 @@
#multiple failure categorization values
metagenome_assembly_set:
- id: nmdc:wfmgas-99-B7Vogx
qc_status: fail
has_failure_categorization:
- qc_failure_what: assembly_size_too_small
qc_failure_where: MetagenomeAssembly
- qc_failure_what: other
qc_failure_where: MetagenomeAssembly
qc_comment: 15% human contamination and assembly size is below 5 MB
name: Metagenome assembly 1472_51277
was_informed_by: nmdc:omprc-12-124
started_at_time: '2020-03-24T00:00:00+00:00'
ended_at_time: '2020-03-25T00:00:00+00:00'
type: nmdc:MetagenomeAssembly
execution_resource: LANL B-div
git_url: https://github.com/microbiomedata/metaAssembly/releases/tag/1.0.0
has_input:
- nmdc:dobj-12-1243
has_output:
- nmdc:dobj-12-1247
64 changes: 64 additions & 0 deletions src/data/valid/Database-ReadQcAnalysisActivity-quality_fail.yaml
@@ -0,0 +1,64 @@
read_qc_analysis_activity_set:
#valid failed data, no output files
- id: nmdc:wfrqc-11-hemh0a87.1
name: Read QC Activity for nmdc:wfrqc-11-hemh0a87.1
qc_status: fail
#cv for what the failure was
#since there can be multiple values and what/where should be paired, we need this structure
has_failure_categorization:
- qc_failure_what: malformed_data
qc_failure_where: ReadQcAnalysisActivity
qc_comment: Failure during call-stage to interleave fastq files
type: nmdc:ReadQcAnalysisActivity
started_at_time: "2023-08-29T19:41:47.365957+00:00"
ended_at_time: "2023-08-30T13:26:02.892410+00:00"
execution_resource: NERSC-Perlmutter
git_url: https://github.com/microbiomedata/ReadsQC
version: v1.0.8
was_informed_by: nmdc:omprc-11-r0pjgp16
has_input:
- nmdc:dobj-11-1k62bt83
- nmdc:dobj-11-e8hs8y25

#valid passing data
- id: nmdc:wfrqc-11-hemh0a88.1
name: Read QC Activity for nmdc:wfrqc-11-hemh0a88.1
qc_status: pass
qc_comment: Number of output reads from readqc is above threshold (6000000 > 1000000)
type: nmdc:ReadQcAnalysisActivity
started_at_time: "2023-08-29T19:41:47.365957+00:00"
ended_at_time: "2023-08-30T13:26:02.892410+00:00"
execution_resource: NERSC-Perlmutter
git_url: https://github.com/microbiomedata/ReadsQC
version: v1.0.8
was_informed_by: nmdc:omprc-11-r0pjgp16
has_input:
- nmdc:dobj-11-1k62bt83
- nmdc:dobj-11-e8hs8y25
has_output:
- nmdc:dobj-11-e8hs8y26
- nmdc:dobj-11-e8hs8y27
- nmdc:dobj-11-e8hs8y28

#valid failed data with outputs
- id: nmdc:wfrqc-11-hemh0a90.1
name: Read QC Activity for nmdc:wfrqc-11-hemh0a87.1
qc_status: fail
has_failure_categorization:
- qc_failure_what: low_read_count
qc_failure_where: ReadQcAnalysisActivity
qc_comment: Most data removed for artifacts
type: nmdc:ReadQcAnalysisActivity
started_at_time: "2023-08-29T19:41:47.365957+00:00"
ended_at_time: "2023-08-30T13:26:02.892410+00:00"
execution_resource: NERSC-Perlmutter
git_url: https://github.com/microbiomedata/ReadsQC
version: v1.0.8
was_informed_by: nmdc:omprc-11-r0pjgp16
has_input:
- nmdc:dobj-11-1k62bt83
- nmdc:dobj-11-e8hs8y25
has_output:
- nmdc:dobj-11-e8hs8y26
- nmdc:dobj-11-e8hs8y27
- nmdc:dobj-11-e8hs8y28
3 changes: 1 addition & 2 deletions src/data/valid/Database-extraction_set-exhaustive.yaml
@@ -26,9 +26,8 @@ extraction_set:
has_numeric_value: 0.25
has_unit: gram #http://purl.obolibrary.org/obo/UO_0000021
#based on qaqc criteria for a given process does the output 'Pass' or 'Failed' https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_3914&lang=en&viewMode=All&siblings=false
quality_control_report:
#status range should be new status_enum with possible values of 'Pass' or Failed'
status: pass
qc_status: pass
#leaving as extraction_method to make the model generic, other option considered was 'Nucleic Acid Extraction Method' http://purl.obolibrary.org/obo/NCIT_C177560
extraction_method: phenol/chloroform extraction
#extraction_type should have range of dna_extraction_enum with possible values of 'DNA extraction', 'RNA extraction', 'protein extraction'
5 changes: 2 additions & 3 deletions src/data/valid/Database-nucleic-extraction.yaml
@@ -37,9 +37,8 @@ extraction_set:

#based on qaqc criteria for a given process does the output 'Pass' or 'Failed' https://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_3914&lang=en&viewMode=All&siblings=false

# quality_control_report:
# #status range should be new status_enum with possible values of 'Pass' or Failed'
# status: Pass

qc_status: pass

#leaving as extraction_method to make the model generic, other option considered was 'Nucleic Acid Extraction Method' http://purl.obolibrary.org/obo/NCIT_C177560
extraction_method: phenol/chloroform extraction
5 changes: 2 additions & 3 deletions src/data/valid/Extraction-NEON.yaml
@@ -6,9 +6,8 @@ has_output:
start_date: "2020-06-24T22:06Z"
end_date: "2021-08-19"
# processing_institution: "Battelle"
quality_control_report:
status: "pass"
qc_status: "pass"
extraction_target: "DNA"
input_mass:
has_numeric_value: 0.25
has_unit: "g"
has_unit: "g"
26 changes: 25 additions & 1 deletion src/schema/basic_slots.yaml
@@ -210,20 +210,43 @@ slots:
domain: PlannedProcess
range: Protocol


# nucleic_acid_concentration:
# range: InstrumentValue

biomaterial_purity:
domain: ProcessedSample
range: QuantityValue

qc_failure_what:
domain: FailureCategorization
range: FailureWhatEnum
description: >-
Provides a summary about what caused a lab or workflow process to fail
comments:
- For example, low read count from a sequencer, malformed fastq files, etc.

qc_failure_where:
domain: FailureCategorization
range: FailureWhereEnum
description: >-
Describes the nmdc schema class that corresponds to where the failure occurred.
Most commonly this would be the same as the class that generated the results.
comments:
- If the assembly size was too small to proceed to annotation, qc_failure_where would be MetagenomeAssembly.

qc_comment:
range: string
description: >-
Slot to store additional comments about laboratory or workflow output. For workflow output
it may describe the particular workflow stage that failed (e.g. failed at call-stage due to a malformed fastq file).

instrument_name:
domain: PlannedProcess
description: >-
The name of the instrument that was used for processing the sample.
# add this and write migration # range: instrument_name_enum


enums:
processing_institution_enum:
notes:
@@ -257,6 +280,7 @@ enums:
title: University of California, Davis Genome Center
meaning: https://genomecenter.ucdavis.edu/


# instrument_name_enum:
# todos:
# - remove "21T Agilent"
5 changes: 5 additions & 0 deletions src/schema/core.yaml
@@ -168,10 +168,15 @@ classes:
- protocol_link
- start_date
- instrument_name # or "used" ?
- qc_status
- qc_comment
- has_failure_categorization

slot_usage:
designated_class:
comments:
- required on all instances in a polymorphic Database slot like planned_process_set


OntologyClass:
is_a: NamedThing
63 changes: 51 additions & 12 deletions src/schema/nmdc.yaml
@@ -143,7 +143,10 @@ subsets:
- Assignment of the data_portal_subset is incomplete in the schema.

classes:

FailureCategorization:
slots:
- qc_failure_what
- qc_failure_where
FunctionalAnnotationAggMember:
slots:
- metagenome_annotation_id
@@ -223,7 +226,7 @@ classes:
- extraction_method
- extraction_target
- input_mass
- quality_control_report
- volume
slot_usage:
has_input:
required: true
@@ -266,10 +269,7 @@ classes:
- url
- name

QualityControlReport:
slots:
- status
- name


LibraryPreparation:
class_uri: nmdc:LibraryPreparation
@@ -2247,6 +2247,46 @@ enums:
V-bottom conical tube:
falcon tube:

FailureWhatEnum:
description: The permitted values for describing what caused a failure during lab processing or analysis workflows.
permissible_values:
low_read_count:
description: Number of output reads is not sufficient to continue to the next analysis step.
malformed_data:
description: Workflow failure reading input or writing the output file(s).
assembly_size_too_small:
description: The size of the metagenome or metatranscriptome assembly is too small to proceed to the next analysis workflow.
no_valid_data_generated:
description: A process ran but did not produce any output. For example, binning ran but did not produce any medium- or high-quality bins.
other:
description: A lab process or analysis workflow has failed in a way that has not been captured by the available values yet. Please use slot 'qc_comment' to specify details.

FailureWhereEnum:
description: The permitted values for describing where in the process, either a lab or analysis workflow step, the failure occurred.
comments:
- At Chris' recommendation permissible values for this enumeration are the same as Class names.
permissible_values:
OmicsProcessing:
description: A failure has occurred in omics processing, a lab process.
Pooling:
description: A failure has occurred in pooling, a lab process.
Extraction:
description: A failure has occurred in extraction, a lab process.
LibraryPreparation:
description: A failure has occurred in library preparation, a lab process.
MetagenomeAssembly:
description: A failure has occurred in metagenome assembly, a workflow process.
MetatranscriptomeActivity:
description: A failure has occurred in metatranscriptome analysis, a workflow process.
MagsAnalysisActivity:
description: A failure has occurred in binning, a workflow process to generate metagenome-assembled genomes (MAGs).
ReadQcAnalysisActivity:
description: A failure has occurred in read QC, a workflow process.
ReadBasedTaxonomyAnalysisActivity:
description: A failure has occurred in read-based taxonomy, a workflow process.
MetagenomeAnnotationActivity:
description: A failure has occurred in annotation, a workflow process.

SeparationMethodEnum:
description: The tool/substance used to separate or filter a solution or mixture.
contributors:
@@ -2317,6 +2357,9 @@ enums:
ZIC-cHILIC:

slots:
has_failure_categorization:
range: FailureCategorization
multivalued: true
model:
range: InstrumentModelEnum
vendor:
@@ -2355,8 +2398,8 @@ slots:
range: integer
sample_collection_month:

status:
domain: QualityControlReport
qc_status:
description: Stores information about the result of a process (e.g. the process of sequencing a library may have a qc_status of 'fail' if not enough data was generated)
range: StatusEnum

library_preparation_kit:
@@ -2373,10 +2416,6 @@ slots:
domain: Extraction
range: ExtractionTargetEnum

quality_control_report:
domain: PlannedProcess
range: QualityControlReport

pcr_cycles:
range: integer
exact_mappings:
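In the nmdc.yaml changes above, has_failure_categorization is multivalued with range FailureCategorization, and each entry pairs a qc_failure_what from FailureWhatEnum with a qc_failure_where from FailureWhereEnum; Database-AsemblyAnalysisActivity-1.yaml uses two such pairs on one record. A minimal Python mirror of those two enumerations, copied from the permissible values in this diff, can sanity-check the pairs in plain dict records. The class and function names are illustrative only; this does not use the Python classes LinkML generates for nmdc_schema, and rejecting an entry that lacks one half of the pair is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class FailureWhat(str, Enum):
    """Permissible values copied from FailureWhatEnum."""
    LOW_READ_COUNT = "low_read_count"
    MALFORMED_DATA = "malformed_data"
    ASSEMBLY_SIZE_TOO_SMALL = "assembly_size_too_small"
    NO_VALID_DATA_GENERATED = "no_valid_data_generated"
    OTHER = "other"


class FailureWhere(str, Enum):
    """Permissible values copied from FailureWhereEnum (they mirror class names)."""
    OMICS_PROCESSING = "OmicsProcessing"
    POOLING = "Pooling"
    EXTRACTION = "Extraction"
    LIBRARY_PREPARATION = "LibraryPreparation"
    METAGENOME_ASSEMBLY = "MetagenomeAssembly"
    METATRANSCRIPTOME_ACTIVITY = "MetatranscriptomeActivity"
    MAGS_ANALYSIS_ACTIVITY = "MagsAnalysisActivity"
    READ_QC_ANALYSIS_ACTIVITY = "ReadQcAnalysisActivity"
    READ_BASED_TAXONOMY_ANALYSIS_ACTIVITY = "ReadBasedTaxonomyAnalysisActivity"
    METAGENOME_ANNOTATION_ACTIVITY = "MetagenomeAnnotationActivity"


@dataclass(frozen=True)
class FailureCategorization:
    """One what/where pair; has_failure_categorization holds a list of these."""
    what: FailureWhat
    where: FailureWhere


def parse_categorizations(record: Dict[str, Any]) -> List[FailureCategorization]:
    """Convert a record's has_failure_categorization entries.

    Raises ValueError for a value outside either enum, or (an assumption, since
    the schema shown does not mark the slots required) for an entry missing
    one half of the what/where pair.
    """
    parsed: List[FailureCategorization] = []
    for entry in record.get("has_failure_categorization", []):
        if "qc_failure_what" not in entry or "qc_failure_where" not in entry:
            raise ValueError(f"{record.get('id')}: unpaired failure categorization {entry}")
        parsed.append(
            FailureCategorization(
                FailureWhat(entry["qc_failure_what"]),    # ValueError if not permitted
                FailureWhere(entry["qc_failure_where"]),  # ValueError if not permitted
            )
        )
    return parsed
```

Because both enums subclass str, members compare equal to the raw YAML strings, so FailureWhat.OTHER == "other" is True.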