Converting PHA4GE wastewater schema to Lectern #39

Open
edsu7 opened this issue Aug 1, 2024 · 1 comment

edsu7 (Contributor) commented Aug 1, 2024

The input definitions and lists of values can be found here:
https://github.com/pha4ge/Wastewater_Contextual_Data_Specification/tree/main/Template

The following tables combine the information above and add columns, such as expected data types and dependencies, needed to parse the schema into Lectern:
https://docs.google.com/spreadsheets/d/1UYkDrkVvynWdfO97mlhQt1QZIsUXI2gKO7TRKeApJl0/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1Z5kEZR0Fib_s4wk3hykjq_0fIpS9wImt/edit?usp=drive_link&ouid=104625230692579285705&rtpof=true&sd=true
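
A minimal sketch of how one row of the combined table could be turned into a Lectern field definition. It assumes the table is exported as TSV with hypothetical column names `field_name`, `data_type`, `required`, `values` (pipe-separated allowed values) and `regex`, and a hypothetical mapping of `decimal` to `number`. The output keys (`name`, `valueType`, `restrictions` with `required`, `codeList`, `regex`) are assumed from the general shape of Lectern dictionary fields and should be checked against the Lectern documentation. Dependencies between fields are not handled here.

```python
import csv
import io

# Hypothetical mapping from the spreadsheet's data_type column to Lectern valueType names.
VALUE_TYPES = {"string": "string", "integer": "integer", "decimal": "number", "boolean": "boolean"}


def row_to_lectern_field(row: dict) -> dict:
    """Convert one spreadsheet row (hypothetical column names) into a Lectern-style field."""
    required = (row.get("required") or "").strip().lower() == "true"
    restrictions = {"required": required}
    # A pipe-separated list of allowed values becomes a code list.
    values = [v.strip() for v in (row.get("values") or "").split("|") if v.strip()]
    if values:
        restrictions["codeList"] = values
    regex = (row.get("regex") or "").strip()
    if regex:
        restrictions["regex"] = regex
    data_type = (row.get("data_type") or "").strip().lower()
    return {
        "name": row["field_name"].strip(),
        # Unknown or missing data types fall back to string.
        "valueType": VALUE_TYPES.get(data_type, "string"),
        "restrictions": restrictions,
    }


def tsv_to_fields(tsv_text: str) -> list:
    """Parse TSV text exported from the combined table into Lectern-style field definitions."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row_to_lectern_field(row) for row in reader]


example = (
    "field_name\tdata_type\trequired\tvalues\tregex\n"
    "sample volume measurement value\tdecimal\tfalse\t\t\n"
)
print(tsv_to_fields(example))
```

Empty cells simply produce no `codeList` or `regex` restriction, so optional columns can be left blank in the sheet.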

Some considerations and follow-ups:

  • Equivalent fields: which name should be used?
    • water catchment area human population bin == water catchment area human population range
    • N50 value == N50
  • DataHarmonizer provenance is not in the reference guide. Do they still want DataHarmonizer provenance?
  • Confirm the dependency columns listed in the TSV above.
  • If a field is an ID, should the null value menu be allowed?
  • Should the gene field be a list, or left as a string?
  • If a field is required, should the null value menu be allowed?
  • Definitions or examples are needed for the following fields:
    • experimental protocol field
    • experimental specimen role type
    • lineage/clade analysis report filename
    • lineage/clade analysis software name
    • lineage/clade analysis software version
    • lineage/clade name
    • geo loc name (county/region)
    • geo loc name (site)
    • sample collection end date
    • sample collection end time
    • sample collection start time
    • DataHarmonizer provenance?
  • Cannot find an equivalent for:
    • sequencing assay type
  • data_type is listed as string, but should it be integer instead?
    • sample collection time duration value
    • sample volume measurement value
    • sample storage duration value
    • water catchment area human population density value
    • precipitation measurement value
    • ambient temperature measurement value
    • total daily flow rate measurement value
    • instantaneous flow rate measurement value
    • turbidity measurement value
    • dissolved oxygen measurement value
    • oxygen reduction potential (ORP) measurement value
    • chemical oxygen demand (COD) measurement value
    • carbonaceous biochemical oxygen demand (CBOD) measurement value
    • total suspended solids (TSS) measurement value
    • total dissolved solids (TDS) measurement value
    • total solids (TS) measurement value
    • alkalinity measurement value
    • conductivity measurement value
    • salinity measurement value
    • total nitrogen (TN) measurement value
    • total phosphorus (TP) measurement value
    • fecal contamination value
    • fecal coliform count value
    • urinary contamination value
    • sample temperature value (at collection)
    • sample temperature value (when received)
    • diagnostic measurement value
  • Regex needed for the following IDs:
    • BioProject accession
    • BioSample accession
    • GenBank accession (versioned)
    • SRA accession
    • GISAID accession
    • GISAID virus name
    • ENA accession
    • DRA accession
    • GSA accession
    • Enterobase accession
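
For the data type and regex questions above, a hedged sketch of field-level checks may help frame the decision. The patterns below are illustrative assumptions based on commonly documented INSDC and GISAID accession formats, not values from the PHA4GE specification; only some of the listed IDs are covered, and the rest (GISAID virus name, ENA, DRA, GSA, Enterobase) are left out rather than guessed. The sketch also contrasts a strict integer check with a decimal check, since a measurement such as 12.5 would fail the former.

```python
import math
import re

# Illustrative patterns, assumed from commonly documented INSDC and GISAID formats;
# they are not taken from the PHA4GE specification and should be verified before use.
ACCESSION_PATTERNS = {
    "BioProject accession": r"PRJ[EDN][A-Z][0-9]+",
    "BioSample accession": r"SAM[EDN][A-Z]?[0-9]+",
    # Simplified: covers 1-2 letter prefixes only, not WGS-style accessions.
    "GenBank accession (versioned)": r"[A-Z]{1,2}[0-9]{5,6}\.[0-9]+",
    # Assumes the field holds a run accession (SRR/ERR/DRR).
    "SRA accession": r"[SED]RR[0-9]{6,}",
    "GISAID accession": r"EPI_ISL_[0-9]+",
}


def matches_accession(field: str, value: str) -> bool:
    """Return True if value fully matches the illustrative pattern for field."""
    return re.fullmatch(ACCESSION_PATTERNS[field], value.strip()) is not None


def is_integer_value(value: str) -> bool:
    """Return True if a raw string would pass a strict integer check."""
    return re.fullmatch(r"[+-]?[0-9]+", value.strip()) is not None


def is_number_value(value: str) -> bool:
    """Return True if a raw string parses as a finite decimal number."""
    try:
        return math.isfinite(float(value))
    except ValueError:
        return False


for raw in ["42", "12.5"]:
    print(raw, is_integer_value(raw), is_number_value(raw))
```

Here `is_integer_value("12.5")` returns False while `is_number_value("12.5")` returns True, and `matches_accession("GenBank accession (versioned)", "MN908947")` returns False because the version suffix is missing.
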
edsu7 self-assigned this Aug 1, 2024

edsu7 (Contributor, Author) commented Aug 7, 2024

Updated the schema to version 2:

  • Added versioning.
  • Populated missing information for:
"experimental protocol field",
"experimental specimen role type",
"lineage/clade analysis report filename",
"lineage/clade analysis software name",
"lineage/clade analysis software version",
"lineage/clade name",
"sequencing assay type",
"diagnostic measurement method",
"geo loc name (county/region)",
"geo loc name (site)",
"sample collection end date",
"sample collection end time",
"sample collection start time",
"sequenced by contact name",
"gene name"
  • Added new fields found only in the reference:
"metagenome-assembled genome (MAG) ID",
"microbiological method",
"strain",
"isolate ID",
"alternative isolate ID",
"progeny isolate ID",
"isolated by",
"isolated by laboratory name",
"isolated by contact name",
"isolated by contact email",
"isolation date",
"isolate received date",
"serovar",
"serotyping method",
"phagetype",
"DNA fragment length",
"genomic target enrichment method",
"genomic target enrichment method details",
"sequence assembly software name",
"sequence assembly software version",
"sequence assembly length",
"read mapping software name",
"read mapping software version",
"taxonomic reference database name",
"taxonomic reference database version",
"taxonomic analysis report filename",
"taxonomic analysis date",
"read mapping criteria",
"AMR analysis software name",
"AMR analysis software version",
"AMR reference database name",
"AMR reference database version",
"AMR analysis report filename",
"diagnostic measurement method"
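
Since version 2 adds versioning and a batch of new fields, a hedged sketch of a versioned Lectern-style dictionary, plus a helper listing the fields added between two versions, may help when reviewing updates. The top-level keys (`name`, `version`, `schemas`) are assumed from the general Lectern dictionary layout; the dictionary name `pha4ge_wastewater`, the schema name `sample`, the version strings, and the default of optional string fields are all hypothetical.

```python
def make_dictionary(name: str, version: str, schema_name: str, field_names: list) -> dict:
    """Build a minimal Lectern-style dictionary; every field defaults to an optional string."""
    return {
        "name": name,
        "version": version,
        "schemas": [
            {
                "name": schema_name,
                "fields": [
                    {"name": f, "valueType": "string", "restrictions": {"required": False}}
                    for f in field_names
                ],
            }
        ],
    }


def added_fields(old: dict, new: dict) -> list:
    """List field names present in new but not in old, preserving the order in new."""
    old_names = {f["name"] for s in old["schemas"] for f in s["fields"]}
    return [f["name"] for s in new["schemas"] for f in s["fields"] if f["name"] not in old_names]


v1 = make_dictionary("pha4ge_wastewater", "1.0", "sample", ["lineage/clade name"])
v2 = make_dictionary("pha4ge_wastewater", "2.0", "sample", ["lineage/clade name", "serovar", "phagetype"])
print(added_fields(v1, v2))  # ['serovar', 'phagetype']
```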
