[merlin_magic] harden workflow against failures due to sonnei typing disagreement #600

Closed
AndrewLangvt opened this issue Aug 30, 2024 · 3 comments · Fixed by #747

@AndrewLangvt
Contributor

📌 Explain the Request

Currently, merlin_magic will fail if GAMBIT predicts Shigella sonnei but ShigaTyper "disagrees." Expand the logic for sonneityping to simply output an empty result in these instances. For more context, see the Slack messages: https://theiagen.slack.com/archives/C04GMT7TPTR/p1724974576036859
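
One way to picture the requested guard is a small agreement check that runs before sonneityping is launched. This is only a sketch in Python, not the actual merlin_magic logic (which lives in the workflow definition): the function name `should_run_sonneityping` is hypothetical, and the example prediction strings passed to it are assumptions about what the two tools report.

```python
def should_run_sonneityping(gambit_taxon: str, shigatyper_prediction: str) -> bool:
    """Return True only when both tools point at Shigella sonnei.

    Hypothetical helper: the argument names and the substring matching
    are illustrative assumptions, not the workflow's real interface.
    """
    gambit_says_sonnei = "shigella sonnei" in gambit_taxon.strip().lower()
    shigatyper_says_sonnei = "sonnei" in shigatyper_prediction.strip().lower()
    return gambit_says_sonnei and shigatyper_says_sonnei


# Agreement: sonneityping would be run.
print(should_run_sonneityping("Shigella sonnei", "Shigella sonnei form II"))
# Disagreement: skip sonneityping and report empty outputs instead of failing.
print(should_run_sonneityping("Shigella sonnei", "Not Shigella or EIEC"))
```

When the check returns False, the workflow would skip the sonneityping task and leave its outputs empty rather than erroring out.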

@kapsakcj
Copy link
Contributor

kapsakcj commented Nov 6, 2024

To add on to this: when this occurs, the sonneityping tool is run and fails on non-Shigella sonnei genomes. Here's the not-so-helpful error message:

[mykrobe 2024-11-05T23:39:22 INFO] Progress: finished making AMR predictions
[mykrobe 2024-11-05T23:39:22 INFO] Progress: writing output
[mykrobe 2024-11-05T23:39:22 INFO] Progress: finished
Traceback (most recent call last):
  File "/sonneityping/parse_mykrobe_predict.py", line 326, in <module>
    main()
  File "/sonneityping/parse_mykrobe_predict.py", line 323, in main
    final_results.to_csv(args.prefix + "_predictResults.tsv", index=False, sep="\t", columns=["genome", "species", "final genotype", "name", "confidence", "num QRDR", "parC_S80I", "gyrA_S83L", "gyrA_S83A", "gyrA_D87G", "gyrA_D87N", "gyrA_D87Y", "lowest support for genotype marker", "poorly supported markers", "max support for additional markers", "additional markers", "node support"])
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/generic.py", line 3720, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1170, in to_csv
    csv_formatter = CSVFormatter(
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 89, in __init__
    self.cols = self._initialize_columns(cols)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 156, in _initialize_columns
    self.obj = self.obj.loc[:, cols]
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 1067, in __getitem__
    return self._getitem_tuple(key)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 1256, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 924, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6111, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/opt/conda/envs/mykrobe/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6174, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['species', 'lowest support for genotype marker'] not in index"
mv: cannot stat 'SAMPLENAME.sonneityping_predictResults.tsv': No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
FileNotFoundError: [Errno 2] No such file or directory: './SAMPLENAME.sonneityping.tsv
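
The chain of failures in the log (KeyError, then `mv: cannot stat`, then FileNotFoundError) suggests that downstream steps assume the results TSV always exists. A minimal sketch of an "empty result" fallback follows, assuming the task can tolerate a placeholder file. The helper name `ensure_results_tsv` is hypothetical, and the column list is copied from the `to_csv` call in the traceback. It uses the standard-library `csv` module rather than pandas, so it is a stand-in for, not a patch to, parse_mykrobe_predict.py.

```python
import csv
from pathlib import Path

# Column names copied from the to_csv call shown in the traceback.
EXPECTED_COLUMNS = [
    "genome", "species", "final genotype", "name", "confidence", "num QRDR",
    "parC_S80I", "gyrA_S83L", "gyrA_S83A", "gyrA_D87G", "gyrA_D87N", "gyrA_D87Y",
    "lowest support for genotype marker", "poorly supported markers",
    "max support for additional markers", "additional markers", "node support",
]


def ensure_results_tsv(expected: Path, sample_name: str) -> bool:
    """Write a placeholder TSV (header plus one blank row) if `expected` is missing.

    Hypothetical helper. Returns True when a placeholder was written and
    False when the real results file already exists and is left untouched.
    """
    if expected.exists():
        return False
    row = {column: "" for column in EXPECTED_COLUMNS}
    row["genome"] = sample_name  # keep the sample identifiable in the empty row
    with expected.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=EXPECTED_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerow(row)
    return True
```

Called right after the parser step, for example `ensure_results_tsv(Path("SAMPLENAME.sonneityping_predictResults.tsv"), "SAMPLENAME")`, this would let the later `mv` and parsing steps find a file with the expected header and empty values.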

The workaround to get TheiaProk to succeed is to set the optional expected_taxon input (a workflow-level variable) to "Escherichia" and re-run the workflow. ✅

@kapsakcj
Contributor

kapsakcj commented Nov 6, 2024

Here's the offending genome in the GAMBIT database. It is labeled as Shigella sonnei by the submitter/NCBI, but should probably be corrected to E. coli:
[GCF_003985005.1] Shigella sonnei subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003985005.1/

NCBI's own ANI analysis shows it's closer to an E. coli than the closest Shigella sonnei genome:
[image: NCBI ANI analysis]

@aestradaapril

Here are some other Shigella genomes in the GAMBIT database that are worth looking into. These resulted in similar issues during our validation of TheiaProk. Thank you for looking into this!

[GCF_002249045.1] Shigella boydii subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002249045.1/
[GCF_003977375.1] Shigella sonnei subspecies 1
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003977375.1/
[GCF_003977345.1] Shigella boydii subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003977345.1/
[GCF_004402105.1] Shigella sonnei subspecies 1
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_004402105.1/
[GCF_002246205.1] Shigella sonnei subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002246205.1/
[GCF_003985005.1] Shigella sonnei subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003985005.1/
[GCF_003977245.1] Shigella sonnei subspecies 2
https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003977245.1/
