auto-generate isolate field in BioSample metadata TSV for Flu (#3)
* add 4 optional fields to flu_biosample_metadata that are required for building the "isolate" output column

* add a boolean and code to automatically create the "isolate" column for Flu samples when the input Terra data table includes neither a "strain" nor an "isolate" column

* delete comment

* bump version to 1.0.6

* update AUTHORS file; update Dockerfile and README to list 1.0.6. The Dockerfile will not build successfully until the GitHub release is created
kapsakcj authored Jul 30, 2024
1 parent 8d1798c commit bcc947e
Showing 6 changed files with 23 additions and 8 deletions.
3 changes: 2 additions & 1 deletion AUTHORS
@@ -1 +1,2 @@
-Sage M. Wright
+Sage M. Wright
+Curtis J. Kapsak
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
-ARG MERCURY_VER="1.0.5"
+ARG MERCURY_VER="1.0.6"
 
 FROM google/cloud-sdk:480.0.0-slim
 
6 changes: 3 additions & 3 deletions README.md
@@ -27,16 +27,16 @@ Default databases by organism:
 We highly recommend using the following Docker image to run Mercury:
 
 ```bash
-docker pull us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.5
+docker pull us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.6
 ```
 
 The entrypoint for this Docker image is the Mercury help message. To run this container interactively, use the following command:
 
 ```bash
-docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.5
+docker run -it --entrypoint=/bin/bash us-docker.pkg.dev/general-theiagen/theiagen/mercury:1.0.6
 # Once inside the container interactively, you can run the tbp-parser tool
 python3 /mercury/mercury/mercury.py -v
-# v1.0.5
+# v1.0.6
 ```
 
 ### Locally with Python
2 changes: 1 addition & 1 deletion mercury/Metadata.py
@@ -30,7 +30,7 @@ def mpox_gisaid_metadata(self):
 
   def flu_biosample_metadata(self):
     biosample_required = ["submission_id", "organism", "collected_by", "collection_date", "geo_loc_name", "host", "host_disease", "isolation_source", "lat_lon"]
-    biosample_optional = ["sample_title", "isolation_type", "bioproject_accession", "attribute_package", "strain", "isolate", "culture_collection", "genotype", "host_age", "host_description", "host_disease_outcome", "host_disease_stage", "host_health_state", "host_sex", "host_subject_id", "host_tissue_sampled", "passage_history", "pathotype", "serotype", "serovar", "specimen_voucher", "subgroup", "subtype", "description"]
+    biosample_optional = ["sample_title", "isolation_type", "bioproject_accession", "attribute_package", "strain", "isolate", "culture_collection", "genotype", "host_age", "host_description", "host_disease_outcome", "host_disease_stage", "host_health_state", "host_sex", "host_subject_id", "host_tissue_sampled", "passage_history", "pathotype", "serotype", "serovar", "specimen_voucher", "subgroup", "subtype", "description", "abricate_flu_type" , "abricate_flu_subtype", "state", "year"]
     return biosample_required, biosample_optional
 
   def bankit_metadata(self):
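The four new optional entries are what make the isolate code workable. Judging from the `else:` branch in the Table.py hunk, `make_biosample_csv` appears to copy every listed field into `biosample_metadata` and to fill any field the input table lacks with an empty string, and the new flu branch then reads `abricate_flu_type`, `abricate_flu_subtype`, `state`, and `year` from that frame by name. The sketch below assumes that copy-or-blank behavior. It is dependency-free, uses a plain dict per row in place of the pandas DataFrame, abbreviates the optional list, and `select_biosample_columns` is a hypothetical name.

```python
from typing import Any, Dict, List

# Required fields and an abbreviated optional list from flu_biosample_metadata();
# the last four optional entries are the helper fields added in this commit.
BIOSAMPLE_REQUIRED: List[str] = [
    "submission_id", "organism", "collected_by", "collection_date",
    "geo_loc_name", "host", "host_disease", "isolation_source", "lat_lon",
]
BIOSAMPLE_OPTIONAL: List[str] = [
    "strain", "isolate",
    "abricate_flu_type", "abricate_flu_subtype", "state", "year",
]


def select_biosample_columns(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep every listed field in list order, copying the
    value when the input row has it and using "" when it does not (assumed to
    match the copy-or-blank loop in make_biosample_csv)."""
    return {column: row.get(column, "")
            for column in BIOSAMPLE_REQUIRED + BIOSAMPLE_OPTIONAL}


selected = select_biosample_columns({"submission_id": "sample01", "state": "CA"})
print(selected["state"], repr(selected["year"]))  # prints: CA ''
```

Under that assumption, a table without a `year` column still yields a `year` entry holding `""`, so the isolate expression would not fail on a missing key; it would instead produce an isolate with an empty year segment.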
16 changes: 15 additions & 1 deletion mercury/Table.py
@@ -179,6 +179,12 @@ def make_biosample_csv(self):
       else:
         biosample_metadata[column] = ""
 
+    # if either "isolate" or "strain" columns do not exist in input terra table, then set a boolean variable "user_supplied_isolate_or_strain" to False; otherwise set to True
+    if "isolate" not in self.table.columns and "strain" not in self.table.columns:
+      user_supplied_isolate_or_strain = False
+    else:
+      user_supplied_isolate_or_strain = True
+
     biosample_metadata.rename(columns={"submission_id" : "sample_name"}, inplace=True)
 
     if self.organism == "mpox" or self.organism == "sars-cov-2":
@@ -188,7 +194,15 @@ def make_biosample_csv(self):
       biosample_metadata.rename(columns={"collecting_lab" : "collected_by", "host_sci_name" : "host", "patient_gender" : "host_sex", "patient_age" : "host_age"}, inplace=True)
       if self.organism == "sars-cov-2":
         biosample_metadata.rename(columns={"treatment" : "antiviral_treatment_agent"}, inplace=True)
-
+    # Flu only: when user does not supply isolate or strain metadata columns, create "isolate" column using the syntax below
+    elif self.organism == "flu" and user_supplied_isolate_or_strain == False :
+      # type/state/submission_id/year (subtype)
+      # strip off "Type_" from beginning of Type, e.g. "Type_A" -> "A"
+      self.logger.debug("DEBUG:User did not supply isolate or strain metadata columns, creating isolate column for Flu samples now...")
+      biosample_metadata["isolate"] = (biosample_metadata["abricate_flu_type"].str.replace("Type_","") + "/" + biosample_metadata["state"] + "/" + biosample_metadata["sample_name"] + "/" + biosample_metadata["year"] + " (" + biosample_metadata["abricate_flu_subtype"] + ")")
+      # Remove 4 extra columns from the output table prior to creating TSV file (these are simply used to create the isolate column)
+      biosample_metadata.drop(["abricate_flu_type", "abricate_flu_subtype", "year", "state"], axis=1, inplace=True)
+
     biosample_metadata.to_csv(self.output_prefix + "_biosample_metadata.tsv", sep='\t', index=False)
     self.logger.debug("TABLE:BioSample metadata file created")
 
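The flu branch in the Table.py diff can be sketched without pandas to show the string it produces. This is an illustrative reconstruction, not Mercury's code: the function names are hypothetical, each row is a plain dict instead of a DataFrame row, and the organism is assumed to already be flu. As in the commit, the check looks for either an `isolate` or a `strain` column in the input table, the isolate is assembled as `type/state/sample_name/year (subtype)` (the column is already called `sample_name` at that point because of the earlier rename from `submission_id`), and the four helper fields are dropped afterwards.

```python
from typing import Any, Dict, List

# The four helper fields the commit drops once "isolate" has been built.
HELPER_FIELDS = ("abricate_flu_type", "abricate_flu_subtype", "year", "state")


def user_supplied_isolate_or_strain(table_columns: List[str]) -> bool:
    """Hypothetical helper mirroring the commit's check: False only when the
    input table has neither an "isolate" nor a "strain" column."""
    return "isolate" in table_columns or "strain" in table_columns


def build_flu_isolate(row: Dict[str, Any]) -> str:
    """Assemble "type/state/sample_name/year (subtype)", stripping the
    "Type_" prefix from the abricate type call (e.g. "Type_A" -> "A")."""
    flu_type = str(row["abricate_flu_type"]).replace("Type_", "")
    return (f"{flu_type}/{row['state']}/{row['sample_name']}/"
            f"{row['year']} ({row['abricate_flu_subtype']})")


def add_isolate_column(rows: List[Dict[str, Any]],
                       table_columns: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: when the user supplied neither isolate nor strain,
    set "isolate" on every row and drop the helper fields; otherwise return
    the rows unchanged."""
    if user_supplied_isolate_or_strain(table_columns):
        return rows
    result = []
    for row in rows:
        # Copy everything except the helper fields, then fill in "isolate".
        new_row = {k: v for k, v in row.items() if k not in HELPER_FIELDS}
        new_row["isolate"] = build_flu_isolate(row)
        result.append(new_row)
    return result


rows = [{"sample_name": "sample01", "isolate": "", "abricate_flu_type": "Type_A",
         "abricate_flu_subtype": "H3N2", "state": "CA", "year": "2024"}]
print(add_isolate_column(rows, ["submission_id", "state", "year"]))
# prints: [{'sample_name': 'sample01', 'isolate': 'A/CA/sample01/2024 (H3N2)'}]
```

One difference worth checking against real data: the f-string accepts a numeric `year`, whereas the committed pandas expression joins columns with `+`, which would likely raise a `TypeError` if `year` were parsed as integers rather than strings. Casting with `.astype(str)` before concatenating would guard against that.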
2 changes: 1 addition & 1 deletion mercury/__init__.py
@@ -1 +1 @@
-__VERSION__ = "v1.0.5"
+__VERSION__ = "v1.0.6"
