[User Story] Apply strandbias filter for WGS TN workflow #1505
Comments
I have done some testing of this filter in this sheet: https://docs.google.com/spreadsheets/d/1jBSoqJR1IcSw1w5sDXeAnMSZJMrZpuxEkzW3DJeQYns/edit?gid=0#gid=0 but I'll summarise the relevant parts here. I looked at the effect on the number of variants of applying the same filter we use in the WGS TO analysis to 5 different cases, where the bottom one is the case from the deviation https://github.com/Clinical-Genomics/Deviations/issues/719. If we apply the filter as it is in WGS TO, a LOT of variants will be filtered from the WGS TN workflow, ranging from 82% to 7.8% of the final clinical-filtered variants. I also wanted to check how the SOR parameter behaves for low-AD variants, by checking whether the fraction of variants with an AD below 6 that also had the SOR > 3 filter set was higher than in the total set of variants. I didn't see any such clear link, which was promising.
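The low-AD check described above boils down to comparing two fractions. A minimal sketch, assuming each record has already been parsed into a hypothetical `Variant` holding the ALT allele depth (assumed to be what "AD below 6" refers to) and INFO/SOR; records without SOR are assumed not to be flagged. The demo values are toy numbers, not data from the cases in the sheet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Variant:
    """Hypothetical minimal stand-in for one VCF record."""
    alt_depth: int        # ALT allele depth (AD) in the tumor sample
    sor: Optional[float]  # INFO/SOR from the caller; None if the tag is absent


def sor_flagged(v: Variant, threshold: float = 3.0) -> bool:
    """True if the record would fail an 'INFO/SOR < threshold' include expression."""
    # Assumption: a record without SOR is counted as not flagged.
    return v.sor is not None and v.sor >= threshold


def flagged_fraction(variants: Iterable[Variant], threshold: float = 3.0) -> float:
    """Fraction of records flagged by the SOR filter (0.0 for an empty set)."""
    vs = list(variants)
    if not vs:
        return 0.0
    return sum(sor_flagged(v, threshold) for v in vs) / len(vs)


def compare_low_ad(variants: Iterable[Variant], ad_cutoff: int = 6,
                   threshold: float = 3.0) -> Tuple[float, float]:
    """Return (flagged fraction among AD < ad_cutoff, flagged fraction overall)."""
    vs = list(variants)
    low_ad = [v for v in vs if v.alt_depth < ad_cutoff]
    return flagged_fraction(low_ad, threshold), flagged_fraction(vs, threshold)


if __name__ == "__main__":
    # Toy records for illustration only.
    demo = [Variant(3, 4.0), Variant(4, 1.0), Variant(10, 3.5),
            Variant(20, 0.5), Variant(30, None)]
    print(compare_low_ad(demo))
```

If the first number is not clearly higher than the second across cases, SOR is not disproportionately hitting low-AD calls, which is the observation reported above.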
Then, to look more closely into the relationship between ALT_F1R2, ALT_F2R1 and the SOR parameter, I generated some plots. In this plot I merged the top 4 WGS cases (excluding the one from the deviation) and focused on the quality_filtered VCFs. As usual, it's quite frustrating to understand the parameters in TNscope. The SOR parameter is clearly linked to the ALT_F1R2 and ALT_F2R1 values, but that's not the whole story. It leaves us looking at variants like this, where, as in the 3rd row, both strands are equally represented in the variant. Should we just trust that TNscope is doing some clever math behind the scenes? Or should we try to create our own filter that makes sense given the variant data? I tested this method:
This would make more sense when looking at these variant read-strand data, and we could probably implement it in bcftools. But at the moment I don't know which method to prefer.
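The method tested above is not reproduced here. As a purely hypothetical illustration of what a filter built only on the ALT read-orientation counts could look like, the sketch below flags a variant when too few ALT reads fall in the minority orientation. The function names and the `min_fraction=0.1` cut-off are assumptions, not values from the sheet or from TNscope.

```python
from typing import Optional


def alt_minor_orientation_fraction(alt_f1r2: int, alt_f2r1: int) -> Optional[float]:
    """Share of ALT-supporting reads in the less common orientation (0.0 to 0.5).

    Returns None when there are no ALT reads, so the caller decides what to do.
    """
    total = alt_f1r2 + alt_f2r1
    if total == 0:
        return None
    return min(alt_f1r2, alt_f2r1) / total


def fails_alt_orientation_balance(alt_f1r2: int, alt_f2r1: int,
                                  min_fraction: float = 0.1) -> bool:
    """Hypothetical rule: flag if under min_fraction of ALT reads are in the minor orientation."""
    frac = alt_minor_orientation_fraction(alt_f1r2, alt_f2r1)
    # Records with no ALT reads are left unflagged here (an arbitrary choice).
    return frac is not None and frac < min_fraction


if __name__ == "__main__":
    # Toy (ALT_F1R2, ALT_F2R1) pairs for illustration only.
    for f1r2, f2r1 in [(12, 0), (6, 4), (9, 1)]:
        print(f1r2, f2r1, fails_alt_orientation_balance(f1r2, f2r1))
```

Unlike SOR, this ignores the reference reads entirely, which is the trade-off raised for the deviation case. With very low ALT depth the fraction is noisy, so a minimum-AD guard would probably be needed alongside it.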
I guess this SOR parameter is inspired by, or taken directly from, GATK StrandOddsRatio: https://gatk.broadinstitute.org/hc/en-us/articles/360036361772-StrandOddsRatio where the strand bias is also based on the reference allele. But I wonder whether that strand bias metric is optimal for cases such as the one in the deviation, where the reference allele appears to have no strand bias.

Update on this: I tested the method described on that page with one of these variants from TNscope, and the SOR did not agree at all, so I'm guessing TNscope calculates it in a slightly different way. It's also quite clear that some variants have significant strand bias based on the alt-allele strand values but still have acceptable SOR values, and are consequently not filtered out.

I have emailed Sentieon to see if they have any information on this. This script from SMD-Bioinformatics-Lund could be worth taking a look at: https://github.com/SMD-Bioinformatics-Lund/SomaticPanelPipeline/blob/01c5bd8ae916bc5a27b20353c0bc8e9b4fae3e3e/bin/filter_tnscope_somatic.pl#L4
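For comparison, here is a sketch of the SOR formula as described on the GATK StrandOddsRatio page: a 2x2 REF/ALT by forward/reverse table with a pseudocount of 1 per cell, SOR = ln(R + 1/R) + ln(refRatio) - ln(altRatio). Since TNscope's values did not match this formula, it should not be used to reproduce TNscope's SOR. Treating ALT_F1R2/ALT_F2R1 (and the corresponding REF counts) as "forward/reverse" would itself be an assumption.

```python
import math


def gatk_style_sor(ref_fw: int, ref_rv: int, alt_fw: int, alt_rv: int) -> float:
    """Strand odds ratio following the formula described for GATK StrandOddsRatio.

    2x2 table: rows = REF/ALT, columns = forward/reverse strand.
    A pseudocount of 1 is added to every cell before computing the ratios.
    """
    ref_f, ref_r = ref_fw + 1.0, ref_rv + 1.0
    alt_f, alt_r = alt_fw + 1.0, alt_rv + 1.0
    # Odds ratio R and its symmetrised form R + 1/R.
    ratio = (ref_f / ref_r) * (alt_r / alt_f)
    symmetrical = ratio + 1.0 / ratio
    # Strand balance of each allele on its own (always <= 1).
    ref_ratio = min(ref_f, ref_r) / max(ref_f, ref_r)
    alt_ratio = min(alt_f, alt_r) / max(alt_f, alt_r)
    return math.log(symmetrical) + math.log(ref_ratio) - math.log(alt_ratio)


if __name__ == "__main__":
    # Toy counts, not taken from the cases discussed above.
    print(gatk_style_sor(20, 20, 10, 0))  # balanced REF, one-sided ALT
    print(gatk_style_sor(20, 0, 10, 0))   # REF and ALT skewed the same way
```

The two toy calls illustrate the concern about the reference allele: the ALT reads are 100% one-sided in both, but the score is well above 3 only when the REF reads are balanced, and drops below 1 when the REF reads are skewed the same way.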
Need
As a clinician interpreting variants from balsamic, I want to avoid interpreting false-positive artefacts.
Currently no strand bias filtering is applied in the balsamic WGS T+N workflow, which has led to a lot of likely artefacts being called in at least a couple of cases, as can be seen in this deviation: https://github.com/Clinical-Genomics/Deviations/issues/719 where a very large number of variants were called with 100% strand bias.
These samples admittedly seem to have some technical lab issues, apparently related to the index pairs, but they highlighted the need for a SOR filter.
Suggested approach
Apply a bcftools filter similar to the one applied in the tumor-only WGS workflow:
```
bcftools filter --threads {threads} --include "INFO/SOR < {params.sor[0]}" --soft-filter '{params.sor[1]}' --mode '+' \
```
in the rule:
bcftools_quality_filter_tnscope_tumor_normal_wgs
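To make the effect on the FILTER column explicit, here is a rough per-record model of that command. `max_sor` stands in for `{params.sor[0]}` and `filter_name` for `{params.sor[1]}`; `SOR_flag` in the usage is a hypothetical name. The bcftools manual describes `--mode '+'` as appending the new FILTER string at failing sites instead of replacing existing ones. Dropping `PASS` when a filter is appended is our understanding of how htslib adds filters, and the handling of a missing INFO/SOR is an explicit assumption (`missing_passes`), not verified bcftools behaviour.

```python
from typing import List, Optional


def apply_sor_soft_filter(filters: List[str], sor: Optional[float], max_sor: float,
                          filter_name: str, missing_passes: bool = True) -> List[str]:
    """Return the FILTER list for one record after a SOR soft filter.

    Rough model of: bcftools filter --include "INFO/SOR < max_sor"
                    --soft-filter filter_name --mode '+'
    """
    if sor is None:
        # Assumption: how bcftools treats a missing INFO/SOR is not modelled here.
        passes = missing_passes
    else:
        passes = sor < max_sor
    if passes:
        # Records satisfying --include are returned unchanged (no PASS normalisation).
        return list(filters)
    # '+' mode: keep existing filter names, drop PASS, append the soft-filter name.
    updated = [f for f in filters if f != "PASS"]
    if filter_name not in updated:
        updated.append(filter_name)
    return updated


if __name__ == "__main__":
    print(apply_sor_soft_filter(["PASS"], 4.5, 3.0, "SOR_flag"))
    print(apply_sor_soft_filter(["low_AD"], 4.5, 3.0, "SOR_flag"))
```

Because of `--mode '+'`, a variant already carrying another soft filter keeps it and gains the SOR flag, so downstream clinical filtering can still see every reason a call was flagged.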
Considered alternatives
No response
Deviation
No response
System requirements assessed
Requirements affected by this story
No response
Risk assessment needed
Risk assessment
No response
SOUPs
No response
Can be closed when
No response
Blockers
No response
Anything else?
No response