Create tools for SV VCF cleaning #8996

Draft
wants to merge 62 commits into base: master

Changes from 1 commit

62 commits
d8fb88c
Initial commit
kjaisingh Oct 9, 2024
0c72bd2
Silenced SV type processing
kjaisingh Oct 11, 2024
05501ce
Created initial commit for 1b
kjaisingh Oct 16, 2024
40295e1
Reformatting per GATK style guide
kjaisingh Oct 16, 2024
5cd8ea3
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/S…
kjaisingh Oct 16, 2024
75f476b
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/S…
kjaisingh Oct 16, 2024
8d31a61
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/S…
kjaisingh Oct 16, 2024
12cc930
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/S…
kjaisingh Oct 16, 2024
4ecd525
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/S…
kjaisingh Oct 16, 2024
df375cc
PR feedback
kjaisingh Oct 17, 2024
6f34672
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Oct 17, 2024
2e28e13
WIP commit - prior to deprecating BED file input in 1b
kjaisingh Oct 17, 2024
f0c0e0f
Updated to no longer ingest BED file
kjaisingh Oct 18, 2024
50c4eaf
Cleaned up scripts...
kjaisingh Oct 18, 2024
74de1b0
SVCleanPt2 WIP
kjaisingh Oct 18, 2024
354857f
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Oct 18, 2024
74f0d73
Working version of SVCleanPt2
kjaisingh Oct 22, 2024
54f5f96
Code cleanup
kjaisingh Oct 22, 2024
18da350
Added sorting and better formatting of outputs
kjaisingh Oct 22, 2024
f7e14c6
Initial commit for CleanPt4
kjaisingh Oct 23, 2024
5f5a6cd
WIP
kjaisingh Oct 24, 2024
52a9049
WIP - up till CleanVcf5 (first task complete)
kjaisingh Oct 25, 2024
31f7032
Reformatting & restructuring
kjaisingh Oct 26, 2024
2b97b7b
Completed CleanVcf4 / implemented skeleton walker for CleanVcf5
kjaisingh Oct 28, 2024
45443f9
Revert SVCleanPt2 to use overlap buffer
kjaisingh Oct 30, 2024
7fbc3a2
Working implementation of SVCleanPt5
kjaisingh Oct 30, 2024
181b352
Modified param name for chrX/chrY
kjaisingh Oct 31, 2024
3eb5c3d
Changes to test
kjaisingh Oct 31, 2024
52daa21
Changes to test
kjaisingh Oct 31, 2024
d438bb9
CleanVcf4 added exit if not in range
kjaisingh Oct 31, 2024
9aaf2f1
Updated type of EV format field
kjaisingh Oct 31, 2024
1de3e2f
Minor changes - replaced EV type
kjaisingh Oct 31, 2024
4e353a0
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Nov 1, 2024
a57601f
Skip no-call genotypes
kjaisingh Nov 1, 2024
2f17eec
Changes post-debugging: modify .getEnd() to use SVLEN
kjaisingh Nov 4, 2024
4c60608
Undo use of getEnd()
kjaisingh Nov 5, 2024
56ecaeb
Further debugging: modified SVTYPE update & corresponding genotype as…
kjaisingh Nov 5, 2024
82ecea8
Handled no-call rewriting bug
kjaisingh Nov 5, 2024
d96d810
Modifications to large del/dup event logic
kjaisingh Nov 6, 2024
0fa7738
Modified conditions for biallelic filtering
kjaisingh Nov 6, 2024
a281d65
Modified filter to only use distinct pairs
kjaisingh Nov 6, 2024
f7215ee
Moved svtype modification to cleanpt4
kjaisingh Nov 7, 2024
383a01c
Reverted svtype changes
kjaisingh Nov 7, 2024
c9d1020
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Nov 12, 2024
ee16037
Overlap logic change to align with GATK-SV outputs
kjaisingh Nov 12, 2024
3f901f4
Modified overlap logic to only be unidirectional
kjaisingh Nov 12, 2024
e8ce4c5
Standardized variant overlap code
kjaisingh Nov 12, 2024
29dc96f
Modified overlap logic - minor bug
kjaisingh Nov 12, 2024
a68dcc6
Modified CleanPt4 imports
kjaisingh Nov 14, 2024
8ba660e
Minor change to remove redundant size/svtype check
kjaisingh Nov 18, 2024
657e601
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Nov 22, 2024
707ada6
Minor file creation
kjaisingh Nov 25, 2024
1e30b4b
New tools - modular implementations
kjaisingh Dec 11, 2024
726144d
Updated to include necessary imports only
kjaisingh Dec 11, 2024
0ab56eb
Updated header writing
kjaisingh Dec 13, 2024
8fbd27e
Added sex revisions for male GT
kjaisingh Jan 13, 2025
a40b193
Concatenated overlapping cnv tasks into one
kjaisingh Jan 17, 2025
63838fe
Merge branch 'master' into kj_sv_cleanvcf
kjaisingh Jan 17, 2025
e60fc9c
Bug-fixed to match results from 5-pass version
kjaisingh Jan 21, 2025
a4db400
Standardized overlap logic across OverlappingCnv task methods
kjaisingh Jan 21, 2025
7f9b22d
Used caching to improve runtime
kjaisingh Jan 29, 2025
f46e501
Minor structural changes to overlap code
kjaisingh Feb 6, 2025
New tools - modular implementations
kjaisingh committed Dec 11, 2024
commit 1e30b4b7d62c7db309e00fa668e15481335fe3e4
@@ -1,7 +1,15 @@
package org.broadinstitute.hellbender.tools.walkers.sv;

import htsjdk.variant.variantcontext.*;
import htsjdk.variant.variantcontext.Allele;

import htsjdk.variant.variantcontext.Genotype;
import htsjdk.variant.variantcontext.GenotypeBuilder;
import htsjdk.variant.variantcontext.VariantContext;
import htsjdk.variant.variantcontext.VariantContextBuilder;
import htsjdk.variant.variantcontext.writer.VariantContextWriter;
import htsjdk.variant.vcf.VCFHeader;
import htsjdk.variant.vcf.VCFHeaderLineType;
import htsjdk.variant.vcf.VCFInfoHeaderLine;

import htsjdk.variant.vcf.*;
import org.broadinstitute.barclay.argparser.Argument;
@@ -0,0 +1,106 @@
package org.broadinstitute.hellbender.tools.walkers.sv;

import htsjdk.variant.variantcontext.*;
import htsjdk.variant.variantcontext.writer.VariantContextWriter;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.BetaFeature;
import org.broadinstitute.barclay.argparser.CommandLineProgramProperties;
import org.broadinstitute.barclay.help.DocumentedFeature;
import org.broadinstitute.hellbender.cmdline.StandardArgumentDefinitions;
import org.broadinstitute.hellbender.cmdline.programgroups.StructuralVariantDiscoveryProgramGroup;
import org.broadinstitute.hellbender.engine.*;
import org.broadinstitute.hellbender.tools.spark.sv.utils.GATKSVVCFConstants;

import java.util.List;
import java.util.ArrayList;

/**
 * Completes an initial series of cleaning steps for a VCF produced by the GATK-SV pipeline.
 *
 * <h3>Inputs</h3>
 * <ul>
 *     <li>
 *         VCF containing structural variant (SV) records from the GATK-SV pipeline.
 *     </li>
 *     <li>
 *         TODO
 *     </li>
 * </ul>
 *
 * <h3>Output</h3>
 * <ul>
 *     <li>
 *         Cleansed VCF.
 *     </li>
 * </ul>
 *
 * <h3>Usage Example</h3>
 * <pre>
 *     TODO
 * </pre>
 *
 * <h3>Processing Steps</h3>
 * <ol>
 *     <li>
 *         TODO
 *     </li>
 * </ol>
 */
@CommandLineProgramProperties(
        summary = "Clean and format SV VCF",
        oneLineSummary = "Clean and format SV VCF",
        programGroup = StructuralVariantDiscoveryProgramGroup.class
)
@BetaFeature
@DocumentedFeature
public class SVReviseAbnormalAllosomes extends VariantWalker {
    @Argument(
            fullName = StandardArgumentDefinitions.OUTPUT_LONG_NAME,
            shortName = StandardArgumentDefinitions.OUTPUT_SHORT_NAME,
            doc = "Output VCF name"
    )
    private GATKPath outputVcf;

    private VariantContextWriter vcfWriter;

    @Override
    public void onTraversalStart() {
        vcfWriter = createVCFWriter(outputVcf);
        vcfWriter.writeHeader(getHeaderForVariants());
    }

    @Override
    public void closeTool() {
        if (vcfWriter != null) {
            vcfWriter.close();
        }
    }

    @Override
    public void apply(final VariantContext variant, final ReadsContext readsContext, final ReferenceContext referenceContext, final FeatureContext featureContext) {
        final VariantContextBuilder builder = new VariantContextBuilder(variant);
        if (!variant.getAttributeAsBoolean(GATKSVVCFConstants.REVISED_EVENT, false)) {
            processRevisedSex(variant, builder);
        }
        vcfWriter.add(builder.make());
    }

    private void processRevisedSex(final VariantContext variant, final VariantContextBuilder builder) {
        final List<Genotype> genotypes = variant.getGenotypes();
        final List<Genotype> updatedGenotypes = new ArrayList<>(genotypes.size());
        for (final Genotype genotype : genotypes) {
            final int rdCn = Integer.parseInt(genotype.getExtendedAttribute(GATKSVVCFConstants.RD_CN, 0).toString()); // a missing RD_CN is treated as 0
            if (rdCn > 0) {
                final GenotypeBuilder gb = new GenotypeBuilder(genotype);
                gb.attribute(GATKSVVCFConstants.RD_CN, rdCn - 1);
                if (genotype.hasExtendedAttribute(GATKSVVCFConstants.COPY_NUMBER_FORMAT)) {
                    gb.attribute(GATKSVVCFConstants.COPY_NUMBER_FORMAT, rdCn - 1);
                }
                updatedGenotypes.add(gb.make());
            } else {
                updatedGenotypes.add(genotype);
            }
        }
        builder.genotypes(updatedGenotypes); // was genotypes, which silently discarded the revised copy numbers
    }
}
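
Review note: the per-genotype revision in processRevisedSex can be checked in isolation from htsjdk. The following is a minimal sketch, not the tool's implementation. It models each genotype as a plain map of FORMAT fields, and it assumes the keys "RD_CN" and "CN" stand in for GATKSVVCFConstants.RD_CN and GATKSVVCFConstants.COPY_NUMBER_FORMAT. The class name RdCnRevisionSketch and the method reviseRdCn are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the genotype loop in processRevisedSex.
public class RdCnRevisionSketch {
    // Assumed FORMAT keys; the real tool reads them from GATKSVVCFConstants.
    static final String RD_CN = "RD_CN";
    static final String CN = "CN";

    // Returns a new list in which every genotype with RD_CN > 0 has RD_CN
    // (and CN, when present) decremented by one. Inputs are not mutated.
    static List<Map<String, Object>> reviseRdCn(final List<Map<String, Object>> genotypes) {
        final List<Map<String, Object>> updated = new ArrayList<>(genotypes.size());
        for (final Map<String, Object> genotype : genotypes) {
            // A missing RD_CN is treated as 0, so the genotype passes through unchanged.
            final int rdCn = Integer.parseInt(genotype.getOrDefault(RD_CN, 0).toString());
            if (rdCn > 0) {
                final Map<String, Object> copy = new HashMap<>(genotype);
                copy.put(RD_CN, rdCn - 1);
                if (genotype.containsKey(CN)) {
                    copy.put(CN, rdCn - 1);
                }
                updated.add(copy);
            } else {
                updated.add(genotype);
            }
        }
        // Return the revised list; handing back the input list instead would drop every revision.
        return updated;
    }

    public static void main(final String[] args) {
        final Map<String, Object> sample = new HashMap<>();
        sample.put(RD_CN, 2);
        sample.put(CN, 2);
        final List<Map<String, Object>> input = new ArrayList<>();
        input.add(sample);
        System.out.println(reviseRdCn(input));
    }
}
```

Returning the revised list rather than the input is the point of the sketch: it is the same distinction as passing updatedGenotypes rather than genotypes to builder.genotypes(...).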