[TheiaProk wfs] upgrade StxTyper version and OPERON outputs #750

Draft: wants to merge 10 commits into base: main

@fraser-combe fraser-combe commented Feb 6, 2025

This PR closes #694

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR updates the StxTyper WDL task across all TheiaProk workflows: it upgrades the Docker image to version 1.0.40 and extends the parsing logic to capture two new OPERON output types, EXTENDED and AMBIGUOUS. The no-hits branch now creates placeholder files for these new outputs so that the workflow does not fail when no hits are found. The documentation has been updated to include the new outputs.

See the release notes for v1.0.40: https://github.com/ncbi/stxtyper/releases/tag/v1.0.40

⚡ Impacted Workflows/Tasks

TheiaProk workflows utilizing the StxTyper task within merlin_magic.wdl

  • stxtyper.wdl

This PR may lead to different results in pre-existing outputs: Yes
This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

  • Docker Image Upgrade: Updated the StxTyper Docker image from version 1.0.24 to 1.0.40.
  • Output Parsing Enhancements: Modified the task to parse and capture two new OPERON output types, EXTENDED and AMBIGUOUS, in addition to the previously parsed types.
  • No-Hits Handling: Revised the no-hits branch to explicitly create placeholder output files for all expected outputs, including the new ones (stxtyper_extended_operons and stxtyper_ambiguous_hits), so that the workflow passes even when no hits are found.
  • Aggregated Output Update: Adjusted the final aggregation step to include the new output types in the stxtyper_all_hits file.
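
For reviewers, here is a minimal Python sketch of the bucketing idea behind the parsing change. The task itself is written in WDL, so this is an illustration and not the code in this PR. The column names `stx_type` and `operon`, and the function name `bucket_by_operon`, are assumptions about the StxTyper report and should be checked against the real header.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumed column names in the tab-separated StxTyper report.
TYPE_COLUMN = "stx_type"
OPERON_COLUMN = "operon"


def bucket_by_operon(report_text: str) -> Dict[str, List[str]]:
    """Group stx types by the value found in the operon column."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    for row in reader:
        status = (row.get(OPERON_COLUMN) or "").strip().upper()
        if status:
            buckets[status].append((row.get(TYPE_COLUMN) or "").strip())
    return dict(buckets)


# Made-up rows, used only to show the shape of the result.
example = (
    "stx_type\toperon\n"
    "stx2a\tCOMPLETE\n"
    "stx2\tEXTENDED\n"
    "stx1\tAMBIGUOUS\n"
)
buckets = bucket_by_operon(example)
print(buckets.get("EXTENDED", []), buckets.get("AMBIGUOUS", []))
```

Keying on the operon value means a new status such as EXTENDED or AMBIGUOUS gets its own bucket without extra branching; a report with only a header row yields an empty dictionary, which is the no-hits case.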

⚙️ Algorithm

Only the parsing logic for the new OPERON types and the conditional logic for the no-hits and aggregated outputs were updated.

➡️ Inputs

NA

⬅️ Outputs

Added two new outputs:

  • stxtyper_extended_operons (captures operons with the EXTENDED status)
  • stxtyper_ambiguous_hits (captures ambiguous hits)

The aggregated output (stxtyper_all_hits) now includes these new values as well.
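
As a rough illustration of how an aggregated value could be derived from per-category results, the Python sketch below joins the stx types from every category into one comma-separated string. The function name `aggregate_hits` is hypothetical, and de-duplicating and sorting are choices made for this sketch; the WDL task may aggregate differently.

```python
from typing import Dict, List, Tuple


def aggregate_hits(buckets: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return (all_hits, num_hits) across every operon category."""
    hits: List[str] = []
    for types in buckets.values():
        for stx_type in types:
            # Skip empty values and keep each stx type only once.
            if stx_type and stx_type not in hits:
                hits.append(stx_type)
    all_hits = ",".join(sorted(hits)) if hits else "None"
    return all_hits, len(hits)


# Illustrative input: one category per key, including the two new ones.
all_hits, num_hits = aggregate_hits(
    {"COMPLETE": ["stx2j"], "EXTENDED": ["stx2"], "AMBIGUOUS": []}
)
print(all_hits, num_hits)
```

Iterating over whatever categories are present keeps the aggregate in step with the per-category outputs, so the EXTENDED and AMBIGUOUS values are picked up alongside the older types.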

🧪 Testing

Tested the updated task with a local test assembly through miniwdl to verify that:

  • The Docker image is correctly updated and reports the correct version (1.0.40).
  • The script correctly parses hits when present.
  • When no hits are found, all output files (including the new ones) are created with placeholder content ("None") to prevent delocalization failures.
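
The placeholder behaviour checked above can be sketched in Python as follows. This is an assumption-laden illustration, not the task's shell code: the `.txt` file names are hypothetical (modelled on the output names in this PR), and `write_outputs` is a made-up helper.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical output names, loosely based on the outputs listed in this PR.
EXPECTED_OUTPUTS = (
    "stxtyper_extended_operons",
    "stxtyper_ambiguous_hits",
    "stxtyper_all_hits",
)


def write_outputs(values: Dict[str, List[str]], out_dir: str) -> Dict[str, str]:
    """Write one file per expected output, using "None" when there is no value."""
    written: Dict[str, str] = {}
    for name in EXPECTED_OUTPUTS:
        content = ",".join(values.get(name, [])) or "None"
        (Path(out_dir) / f"{name}.txt").write_text(content + "\n")
        written[name] = content
    return written


# No-hits case: every expected file is still created.
with tempfile.TemporaryDirectory() as tmp:
    result = write_outputs({}, tmp)
    created = sorted(p.name for p in Path(tmp).iterdir())
print(result, created)
```

Writing every expected file unconditionally is what keeps the output step from failing on a missing file; the "None" content only signals that the category was empty.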

ILMN PE with E. coli samples known to be stx positive

Repeating the submissions from Curtis's testing in a previous PR that updated StxTyper showed that the workflows ran successfully. StxTyper outputs matched expectations for all but one of the 161 samples compared with the last validation run.
I re-ran that sample, M22F001452, and confirmed the same result.

| Sample ID | Old stxtyper_all_hits | New stxtyper_all_hits | Old stxtyper_num_hits | New stxtyper_num_hits |
|---|---|---|---|---|
| M22F001452 | stx2,stx2j | stx2j | 2 | 1 |

Ran the other TheiaProk workflows and received results identical to Curtis's last validation runs.
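
A comparison of this kind can be reproduced with a small Python helper like the sketch below. The helper `changed_samples` and the dictionary layout are assumptions made for illustration; the only values used are those reported for sample M22F001452 in this PR.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def changed_samples(
    old: Dict[str, Row], new: Dict[str, Row], fields: List[str]
) -> List[Tuple[str, str, object, object]]:
    """List (sample, field, old value, new value) for every differing field."""
    changes = []
    # Only samples present in both runs can be compared.
    for sample_id in sorted(old.keys() & new.keys()):
        for field in fields:
            before = old[sample_id].get(field)
            after = new[sample_id].get(field)
            if before != after:
                changes.append((sample_id, field, before, after))
    return changes


fields = ["stxtyper_all_hits", "stxtyper_num_hits"]
old_run = {"M22F001452": {"stxtyper_all_hits": "stx2,stx2j", "stxtyper_num_hits": 2}}
new_run = {"M22F001452": {"stxtyper_all_hits": "stx2j", "stxtyper_num_hits": 1}}
for change in changed_samples(old_run, new_run, fields):
    print(change)
```

In practice the two dictionaries would be loaded from the old and new validation tables, keyed by sample ID.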

ONT on some random Shigella that are likely stx negative

FASTA with same E. coli dataset that are stx positive

TheiaProk_Illumina_SE

Suggested Scenarios for Reviewer to Test

Test any of the TheiaProk workflows with E. coli and Shigella spp. samples.
Run the task on assemblies with no StxTyper hits to verify that all required outputs are still created.

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable
    • You have updated the "Last Known Changes" field for any affected workflows in the respective workflow documentation page and for every entry in the three workflows_overview tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

@fraser-combe fraser-combe marked this pull request as ready for review February 7, 2025 21:18
@fraser-combe fraser-combe requested a review from a team as a code owner February 7, 2025 21:18
@AndrewLangvt AndrewLangvt marked this pull request as draft February 11, 2025 15:21