feat: add sex check #1516

mathiasbio · 2025-01-03T10:39:48Z

Description

Adds sex checks to all workflows. See issue: #1517
In that issue I have also posted results from tests of the 3 different methods implemented here:

- WGS TN: using Ascat
- WGS TO: using a fraction of the median per-base X and Y coverage
- TGA: using the CNVkit CNN target and antitarget files
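
As a rough illustration of the coverage-based idea behind the WGS TO method, the sketch below calls a sample male when the median per-base Y coverage reaches some fraction of the median per-base X coverage. This is one possible reading of the description, not the code in this PR: the function name, the input format (plain lists of per-base depths), and the 0.1 threshold are all assumptions made for illustration.

```python
from statistics import median
from typing import Sequence


def predict_sex_from_coverage(
    x_depths: Sequence[float],
    y_depths: Sequence[float],
    y_fraction_threshold: float = 0.1,  # hypothetical threshold, not taken from the PR
) -> str:
    """Predict sex from non-empty lists of per-base X and Y coverage depths."""
    x_median = median(x_depths)
    y_median = median(y_depths)
    if x_median == 0:
        # Without X coverage the Y/X ratio is undefined, so no call is made.
        return "unknown"
    # A Y median that is a sizeable fraction of the X median suggests a Y chromosome.
    return "male" if y_median / x_median >= y_fraction_threshold else "female"


print(predict_sex_from_coverage([30, 31, 29], [14, 15, 16]))
print(predict_sex_from_coverage([30, 30, 30], [0, 0, 1]))
```

In practice a suitable threshold has to be found empirically per assay, which is why the value here should be read as a placeholder only.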

Added

sex prediction tools and specified sex-verification for all workflows

Documentation

N/A, WILL UPDATE ATLAS THOUGH!
Updated Balsamic documentation to reflect the changes as needed for this PR.
- [Document Name]

Tests

Feature Tests

Test that the sex check is working!
More tests showing that the prediction works in all workflows are in this issue: #1517

After manually changing the sex in the config file to the wrong value and running metric validation, the error shows up in all.0.sh.123456.err

After running a switched-up case, the sex prediction is reported as "conflicting" and the metric validation error is shown in all.0.sh.123456.err

Both Somalier and the sex prediction fail as expected.

From predicted_sex.json:

Sex metric shows up in deliverables yaml file

Pipeline Integrity Tests

Report deliver (generation of the .hk file)
- N/A
- Verified
TGA T/O Workflow
- N/A
- Verified
TGA T/N Workflow
- N/A
- Verified
UMI T/O Workflow
- N/A
- Verified
UMI T/N Workflow
- N/A
- Verified
WGS T/O Workflow
- N/A
- Verified
WGS T/N Workflow
- N/A
- Verified
QC Workflow
- N/A
- Verified
PON Workflow
- N/A
- Verified

Clinical Genomics Stockholm

Documentation

Atlas documentation
- N/A
- Updated docs here: https://github.com/Clinical-Genomics/atlas/pull/3302
Web portal for Clinical Genomics
- N/A
- Updated: [Link]

Panel of Normal specific criteria

The PR includes the addition of a new Panel of Normals
The samples have been verified to adhere to the sample selection criteria on Atlas PoN creation instructions for Balsamic

User Changes

N/A
This PR affects the output files or results.
- User feedback is considered unnecessary because [Justification].
- Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

Stored files in Housekeeper
- N/A
- Updated: [Link]
CG (CLI and delivered/uploaded files)
- N/A
- Updated: [Link]
Servers (configuration files on Hasta)
- N/A
- Updated: [Link]
Scout interface
- N/A
- Updated: [Link]

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

PR Description
- Provided a comprehensive description of the PR.
- Linked relevant user stories or issues to the PR.
Documentation
- Verified and updated documentation if necessary.
Tests
- Described and tested the functionality addressed in the PR.
- Ensured integration of the new code with existing workflows.
- Confirmed that meaningful unit tests were added for the changes introduced.
- Checked that the PR has successfully passed all relevant code smells and coverage checks.
Review
- Addressed and resolved all the feedback provided during the code review process.
- Obtained final approval from designated reviewers.

For Reviewers

Code
- Code implements the intended features or fixes the reported issue.
- Code follows the project's coding standards and style guide.
Documentation
- Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
Tests
- The author provided a description of their manual testing, including consideration of edge cases and boundary
  conditions where applicable, with satisfactory results.
Review
- Confirmed that the developer has addressed all the comments during the code review.

codecov · 2025-01-07T09:51:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.50%. Comparing base (7d529e6) to head (fc59c9e).
Report is 36 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1516      +/-   ##
===========================================
+ Coverage    99.48%   99.50%   +0.01%     
===========================================
  Files           40       40              
  Lines         1932     2000      +68     
===========================================
+ Hits          1922     1990      +68     
  Misses          10       10

Flag	Coverage Δ
unittests	`99.50% <100.00%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

fevac

Nicely done! 🌟 It's a bit annoying that the sex check is not the same for all workflows, and that they rely on different files and methods. Since the most general approach seems to be the one using Sentieon outputs, would that work for all of them (using different thresholds if needed)? I know you haven't run those tests, but do you have a feeling for it?

The other thing that might be problematic is that the sex check is not written to MultiQC but only to the YAML file. If I'm not wrong, the point was to eventually stop using the YAML file and only feed MultiQC files into Janus. Would it make sense to make a local MultiQC module to output this value?

See other small comments below.

BALSAMIC/assets/scripts/collect_qc_metrics.py

fevac · 2025-01-10T08:08:47Z

BALSAMIC/snakemake_rules/quality_control/qc_metrics.rule

-python {params.collect_qc_metrics_script} {params.config_path} {output.yaml} {input.json} {input.bcftools_counts}
-        """
+
+if config["analysis"]["analysis_workflow"] != "balsamic-qc":


Sorry, why is this needed now? Why are we diverging between balsamic and balsamic-qc?

I could just limit this if-statement to WGS TN and TGA.

The reason is that sex_prediction.json relies on CNV tools, which are not run in the balsamic-qc workflow. I didn't want to bother too much about allowing the sex check specifically for WGS TO cases in balsamic-qc, because I don't think anyone is ever running balsamic-qc anyway, and the logic for starting it is also being removed from CG. So it feels like a dead workflow 🤷

But I guess if I relied only on the per-base coverage stats, as in WGS TO, rather than on the CNV tools, I wouldn't need to make this distinction. We're not generating this file for the TGA workflow, though, so it would need to be added, and at that point it starts to feel like extra work with little benefit. I could certainly use this file in WGS TN as in WGS TO, and have only 2 ways of getting the prediction instead of 3. It wouldn't be that difficult to make that change; it was just nice to rely on a more sophisticated tool when one was available. That said, I'm not actually sure how Ascat determines whether the Y chromosome is present, only that it works so far...

Aha, I see your point. Could the Sentieon file be generated for the TGA workflow? If not, I agree that it is extra work for not much benefit.

I think it can be added, but I would need to investigate what the file looks like for TGA and rerun a bunch of cases to get the files and find a suitable threshold, like I did with the CNVkit files. I agree that it would be cleaner, and I'm sure it could work! But for now it feels like we have more pressing things to add to release 17. I kind of just wanted to sneak this feature in since prodbioinfo has requested it for so long, but we didn't really plan for it to be included in the release 😬 It seems to be working, so I'm happy with this compromise.

But if I could start over I'd generate the per-base coverage files! Now it feels like I don't have time to make the change and still finish the remaining features 🥲

I see! That's fine then. Thanks for clarifying though

BALSAMIC/snakemake_rules/quality_control/sentieon_qc_metrics.rule

fevac · 2025-01-10T08:19:12Z

BALSAMIC/snakemake_rules/quality_control/sex_check.rule

It's a bit ugly to have this repeated and slightly different for the different workflows. I wonder if we could simplify it to a single rule and dynamically determine the input and the arguments for the scripts. Would that be cleaner?

I think that would force me to do a lot of Snakemake wildcard stuff, which would end up converting the input files to a list, so I couldn't pass them as named arguments to the script anymore. It would also move the logic of figuring out which files go where into the script, and that could get messy I think 🤔 I kind of think this repeated rule structure gives a nice overview of what's actually happening, but maybe that's just me 😂

Repeated structures are difficult to maintain, so generally I would argue against them. However, in this case it seems that it might help readability and understanding of how things are done, so I guess it's OK to leave it.
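
For reference, the single-rule alternative discussed in this thread would need some kind of per-workflow dispatch, roughly like the sketch below. It only illustrates the trade-off and is not code from sex_check.rule: the function name, the workflow keys, and the input labels are hypothetical, while the three methods are the ones listed in the PR description.

```python
from typing import Dict, List, Union


def select_sex_check(
    sequencing_type: str, analysis_type: str
) -> Dict[str, Union[str, List[str]]]:
    """Pick the sex-check method and its input labels for a workflow (hypothetical names)."""
    if sequencing_type == "targeted":
        # TGA: CNVkit CNN target and antitarget files.
        return {"method": "cnvkit_cnn", "inputs": ["target_cnn", "antitarget_cnn"]}
    if sequencing_type == "wgs" and analysis_type == "paired":
        # WGS TN: Ascat output.
        return {"method": "ascat", "inputs": ["ascat_output"]}
    if sequencing_type == "wgs" and analysis_type == "single":
        # WGS TO: per-base coverage over X and Y.
        return {"method": "per_base_coverage", "inputs": ["per_base_coverage"]}
    raise ValueError(f"No sex check defined for {sequencing_type}/{analysis_type}")


print(select_sex_check("wgs", "single")["method"])
```

The branching that the separate rules make visible in the rule file would then live inside a helper like this one, which is the readability cost raised above.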

tests/models/test_metric_models.py

tests/scripts/test_sex_prediction.py

mathiasbio · 2025-01-10T11:10:56Z

I've now made some changes: I removed Ascat for WGS TN and instead use the same method as for WGS TO, and I dropped case_sex in favour of simply comparing each sample's sex to the sex in the config. This should be a bit cleaner. Thanks for the suggestions, Eva!
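
A minimal sketch of the per-sample comparison described in this comment, assuming the predictions are available as a mapping from sample name to predicted sex. The function name, the data shapes, and the error message are assumptions for illustration and do not reflect the actual metric validation code in Balsamic.

```python
from typing import Dict


def validate_sample_sexes(predicted_sex: Dict[str, str], config_sex: str) -> None:
    """Raise ValueError if any sample's predicted sex differs from the config sex."""
    mismatches = {
        sample: sex for sample, sex in predicted_sex.items() if sex != config_sex
    }
    if mismatches:
        details = ", ".join(
            f"{sample}={sex}" for sample, sex in sorted(mismatches.items())
        )
        raise ValueError(
            f"Predicted sex differs from config sex '{config_sex}': {details}"
        )


# Passes silently when every sample agrees with the config.
validate_sample_sexes({"tumor": "female", "normal": "female"}, "female")
```

Checking each sample on its own means a swapped normal is reported by name, without needing a separate case-level value.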

sonarqubecloud · 2025-01-10T15:31:27Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mathiasbio added 13 commits December 20, 2024 14:20

begin adding sex check

978bf0b

update script

f9fdf84

update script

d7f00f2

refactoring

e15ac96

refactoring black

ce1ef2a

add sex check

68579a5

fix format

366f8f9

add wgs sex check

aeaf0c6

add sex check to qc metrics

902d45f

add sex check for wgs

803f099

add wgs tumor only

f57f940

black

8195cf1

fix bugs

29a3c1a

mathiasbio added 8 commits January 7, 2025 10:57

changelog

34c9663

fix code

f257381

black

dbd3159

refactor

50af315

black

f08533b

add pytests to new scripts

b68fff6

add additional pytest for final qc

7fb7e31

black

eaee329

mathiasbio linked an issue Jan 7, 2025 that may be closed by this pull request

[User Story] Add sex check for all workflows #1517

Open

5 tasks

mathiasbio added 7 commits January 7, 2025 17:17

fix bug

02ba985

fix

04e3959

make metrics model work with strings too

417685a

fix pytests

e29a79b

black

a312c69

Merge branch 'develop' into add_sex_check

fb8b4cf

fix issues

976dbec

mathiasbio added 5 commits January 9, 2025 12:04

Merge branch 'add_sex_check' of github.com:Clinical-Genomics/BALSAMIC…

55ebb77

… into add_sex_check

fix issues

8b67ee7

fix

5f24021

fix

ceceef2

add new pytest

4c226e7

mathiasbio marked this pull request as ready for review January 9, 2025 14:19

mathiasbio requested a review from a team as a code owner January 9, 2025 14:19

fevac requested changes Jan 10, 2025

mathiasbio added 2 commits January 10, 2025 11:07

fix

e95fccd

switch from ascat to per base coverage for wgs tn, and remove case_sex

5b4938d

mathiasbio added 6 commits January 10, 2025 12:11

black

f62f8aa

fix

41f7a38

replace sex prediction json files

b16557b

replace with tn female sex prediction

706e736

fix pytests

d50c524

black and docstring

fc59c9e
