Converting PHA4GE wastewater schema to Lectern #39

edsu7 · 2024-08-01T21:14:35Z

The input definitions and lists of values can be found at:
https://github.com/pha4ge/Wastewater_Contextual_Data_Specification/tree/main/Template

The following table combines the information above and adds columns, such as expected data types and dependencies, needed to parse the schema for Lectern:
~~https://docs.google.com/spreadsheets/d/1UYkDrkVvynWdfO97mlhQt1QZIsUXI2gKO7TRKeApJl0/edit?usp=sharing~~
https://docs.google.com/spreadsheets/d/1Z5kEZR0Fib_s4wk3hykjq_0fIpS9wImt/edit?usp=drive_link&ouid=104625230692579285705&rtpof=true&sd=true
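
As a rough illustration of the parsing step, the sketch below turns one row of the table into a Lectern-style field definition. The column names (`field_name`, `data_type`, `description`, `required`, `values`) and the `|` separator for picklist values are hypothetical and would need to match the actual spreadsheet. The output keys (`name`, `valueType`, `restrictions.required`, `restrictions.codeList`) reflect my understanding of the Lectern dictionary format and should be checked against the Lectern documentation.

```python
def to_lectern_field(row):
    """Build one Lectern-style field definition from a spreadsheet row (a dict).

    Column names used here are assumptions, not the real spreadsheet headers.
    """
    field = {
        "name": row["field_name"],
        # Fall back to string when no data type is given.
        "valueType": row.get("data_type") or "string",
        "description": row.get("description") or "",
    }
    restrictions = {}
    if str(row.get("required") or "").strip().lower() in {"true", "yes", "required"}:
        restrictions["required"] = True
    # Picklist values are assumed to sit in one cell, separated by "|".
    values = [v.strip() for v in (row.get("values") or "").split("|") if v.strip()]
    if values:
        restrictions["codeList"] = values
    if restrictions:
        field["restrictions"] = restrictions
    return field
```

Rows could be read with `csv.DictReader` from a TSV export of the sheet and passed through this function one at a time.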

Some considerations and follow-ups:

Equivalent fields: which name should be used?
- water catchment area human population bin == water catchment area human population range
- N50 value == N50
DataHarmonizer provenance was not found in the reference guide. Is DataHarmonizer provenance still wanted?
Confirm the dependencies columns listed in the TSV above.
If a field is an ID, should the null value menu be allowed?
Gene list, or leave as a string?
~~If require should null value menu be allowed?~~
~~definitions or examples for the following fields:~~
- experimental protocol field
- experimental specimen role type
- lineage/clade analysis report filename
- lineage/clade analysis software name
- lineage/clade analysis software version
- lineage/clade name
- geo loc name (county/region)
- geo loc name (site)
- sample collection end date
- sample collection end time
- sample collection start time
- DataHarmonizer provenance?
~~Cannot find equivalent~~
- sequencing assay type
data_type is listed as string; should it be integer instead?
- sample collection time duration value
- sample volume measurement value
- sample storage duration value
- water catchment area human population density value
- precipitation measurement value
- ambient temperature measurement value
- total daily flow rate measurement value
- instantaneous flow rate measurement value
- turbidity measurement value
- dissolved oxygen measurement value
- oxygen reduction potential (ORP) measurement value
- chemical oxygen demand (COD) measurement value
- carbonaceous biochemical oxygen demand (CBOD) measurement value
- total suspended solids (TSS) measurement value
- total dissolved solids (TDS) measurement value
- total solids (TS) measurement value
- alkalinity measurement value
- conductivity measurement value
- salinity measurement value
- total nitrogen (TN) measurement value
- total phosphorus (TP) measurement value
- fecal contamination value
- fecal coliform count value
- urinary contamination value
- sample temperature value (at collection)
- sample temperature value (when received)
- diagnostic measurement value
Regexes are needed for the following IDs:
- BioProject accession
- BioSample accession
- GenBank accession (versioned)
- SRA accession
- GISAID accession
- GISAID virus name
- ENA accession
- DRA accession
- GSA accession
- Enterobase accession
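
As a starting point for the regex question, here is a minimal sketch for a few of the IDs. The patterns are assumptions based on commonly documented accession formats (INSDC-style prefixes and the `EPI_ISL_` prefix for GISAID). They are not taken from the PHA4GE specification and should be confirmed before being placed in the schema. The SRA pattern covers run accessions only, and the remaining IDs in the list are left out because I am less sure of their formats.

```python
import re

# Assumed patterns; confirm against the relevant archive documentation before use.
ACCESSION_PATTERNS = {
    "BioProject accession": r"^PRJ[EDN][A-Z][0-9]+$",
    "BioSample accession": r"^SAM[EDN][A-Z]?[0-9]+$",
    "SRA accession": r"^[EDS]RR[0-9]{6,}$",  # run accessions only
    "GISAID accession": r"^EPI_ISL_[0-9]+$",
}


def check_accession(field, value):
    """Return True if value matches the assumed pattern for the given field."""
    pattern = ACCESSION_PATTERNS[field]
    return re.fullmatch(pattern, value) is not None
```

For example, `check_accession("BioProject accession", "PRJNA123456")` is accepted under these patterns, while a bare prefix such as `"PRJNA"` is rejected.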

edsu7 · 2024-08-07T19:03:09Z

Updated the schema to version 2.

Added versioning.
Populated missing info for:

"experimental protocol field",
"experimental specimen role type",
"lineage/clade analysis report filename",
"lineage/clade analysis software name",
"lineage/clade analysis software version",
"lineage/clade name",
"sequencing assay type",
"diagnostic measurement method",
"geo loc name (county/region)",
"geo loc name (site)",
"sample collection end date",
"sample collection end time",
"sample collection start time",
"sequenced by contact name",
"gene name"

Added new values found only in the reference:

"metagenome-assembled genome (MAG) ID",
"microbiological method",
"strain",
"isolate ID",
"alternative isolate ID",
"progeny isolate ID",
"isolated by",
"isolated by laboratory name",
"isolated by contact name",
"isolated by contact email",
"isolation date",
"isolate received date",
"serovar",
"serotyping method",
"phagetype",
"DNA fragment length",
"genomic target enrichment method",
"genomic target enrichment method details",
"sequence assembly software name",
"sequence assembly software version",
"sequence assembly length",
"read mapping software name",
"read mapping software version",
"taxonomic reference database name",
"taxonomic reference database version",
"taxonomic analysis report filename",
"taxonomic analysis date",
"read mapping criteria",
"AMR analysis software name",
"AMR analysis software version",
"AMR reference database name",
"AMR reference database version",
"AMR analysis report filename",
"diagnostic measurement method"

edsu7 self-assigned this Aug 1, 2024
