Update to artic v1.5.3 (medaka to clair3) #78

Merged
merged 12 commits into from
Nov 15, 2024


sam-baird
Contributor

@sam-baird sam-baird commented Nov 15, 2024

This PR closes #71 and #75

Aim, context, and functionality 🎯

SC2_ont_assembly.wdl: Update artic from v1.2.4 to v1.5.3. This release of the artic pipeline comes with several significant updates (see artic-network/fieldbioinformatics#137), most notably the switch from Medaka to Clair3.

Workflow Changes ✅

Upstream Effects

None

Input Changes

medaka_model renamed to model
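
For anyone carrying a saved inputs JSON across this change, the rename could be applied with a small helper like the one below. This is a minimal sketch, not code from this PR: the workflow name sc2_ont_assembly, the sample_name input, and the PLACEHOLDER_MODEL value are hypothetical, and it assumes the common Cromwell-style "workflow.input" key layout.

```python
def rename_input(inputs: dict, old: str = "medaka_model", new: str = "model") -> dict:
    """Return a copy of an inputs mapping with the input named `old` renamed to `new`.

    Only the last dot-separated component of each key is compared, so any
    workflow or task prefix is preserved unchanged.
    """
    renamed = {}
    for key, value in inputs.items():
        prefix, _, name = key.rpartition(".")
        if name == old:
            key = f"{prefix}.{new}" if prefix else new
        renamed[key] = value
    return renamed


# Hypothetical inputs; the key prefix and values are illustrative only.
old_inputs = {
    "sc2_ont_assembly.medaka_model": "PLACEHOLDER_MODEL",
    "sc2_ont_assembly.sample_name": "sample_01",
}
new_inputs = rename_input(old_inputs)
```

Note that this only renames the key. The value would still need to be replaced with a model name that the new caller accepts, since Medaka model names are not expected to carry over.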

Output Changes

Version capture file: task name changed from Medaka to call_consensus_artic, and mentions of medaka removed.

Downstream Effects

None (will do a patch release so as not to cause problems with BigQuery transfers)

Testing 🛠️

test_cov_2205_grid

Test(s) performed:
Compared results between branch and current release.

Results (including if the results matched the expected results):
Theiavalidate and other results stored on GCP in validation/CDPHE-SARS-CoV-2/sb-clair3

  • The negative control had a longshot error (no assembly output) in the original results. With longshot removed, it now runs without error and an assembly is output.
  • Overlapping variants issue resolved in bcftools consensus for the three samples that failed for this reason in the original dataset. FASTA is now output for the three samples.
  • The frameshift QC failures on Nextclade, which affected 9 samples, are resolved.
  • 'ANNNNT vs 3 nt deletion' problem in spike protein resolved (except one sample).
  • 5 pango lineage call changes (excluding the 3 samples that now have a new pangolin call because a FASTA is now output). All of the lineage changes are within one 'suffix' of the previous lineage.
  • Most samples differed in the number of nucleotide and amino acid mutations between the two datasets, but manual inspection of the mutations on the Nextclade web tool indicates these mostly result from differences in base and gap calling at the very ends of the genome.
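
The "within one suffix" comparison of lineage calls can be sketched as a small check. This is an illustration under stated assumptions, not the method used for this validation: it reads "within one suffix" as meaning the two names are identical, are parent and child (one extra trailing dot-separated component), or are siblings (same parent, different last component). It compares names as written and does not resolve pango aliases. The lineage names in the usage lines are illustrative and are not taken from the test dataset.

```python
def within_one_suffix(old: str, new: str) -> bool:
    """Return True if two lineage names differ by at most one trailing component."""
    a = old.split(".")
    b = new.split(".")
    if a == b:
        return True
    # Parent/child: the longer name is the shorter one plus exactly one component.
    if abs(len(a) - len(b)) == 1:
        shorter, longer = sorted((a, b), key=len)
        return longer[: len(shorter)] == shorter
    # Siblings: same depth (at least two components), same parent, different last component.
    if len(a) == len(b) and len(a) > 1:
        return a[:-1] == b[:-1]
    return False


# Illustrative names only.
child = within_one_suffix("BA.5.2", "BA.5.2.1")      # parent and child
sibling = within_one_suffix("BA.5.2", "BA.5.1")      # same parent
distant = within_one_suffix("BA.5.2", "BA.2.12.1")   # different branch
```

Because aliases are ignored, two names that are close in the lineage tree but written under different alias prefixes would be reported as distant by this check.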

Developer Checklist 👷‍♀️

  • Prior to development, issues were discussed with the bioinformatics team members and approved
  • Code has been refactored to sufficiently address the issues this pull request closes
  • Testing was performed and the results from testing match the expected results
  • All code changes match our style guide
  • README has been updated to reflect changes
  • Workflow diagrams in READMEs have been updated to reflect changes

Reviewer Checklist 🔍

  • Met with developer to review all changes and testing performed
  • Code refactoring sufficiently addresses the issue(s) that this pull request closes
  • All code meets style guide criteria
  • Confirm testing was sufficient (e.g. correct dataset was used for testing, results match the expected results). If not, the developer will perform additional testing which should be documented in the testing section above.
  • New version release number has been decided upon (if applicable)

Contributor

@laura-bankers laura-bankers left a comment


reviewed and ready to merge

@sam-baird sam-baird marked this pull request as ready for review November 15, 2024 22:56
Development

Successfully merging this pull request may close these issues.

[BUG/REQUEST] Update Artic to 1.5.2 for variant calling improvement and minor bug fix