Add xml workflow #636

Open · wants to merge 8 commits into main

1 change: 1 addition & 0 deletions docs/analysis-workflows/analysis-overview.md
@@ -13,6 +13,7 @@ The ARGO Data Platform will accept a wide range of datatypes, including:
- DNA Methylation data
- Transcriptomic data
- Proteomic data
- Variant calling data (in XML format)
- Slide images

Pipelines, and the individual analysis workflows that they are constructed from, have been developed by the DCC Bioinformatics team based on established, best community practices.
29 changes: 29 additions & 0 deletions docs/analysis-workflows/xml-ingestion.md
@@ -0,0 +1,29 @@
---
id: xml-variant-ingestion
title: XML Variant Ingestion
sidebar_label: XML Variant Ingestion
platform_key: DOCS_DNA_PIPELINE
---

The ARGO Data Platform accepts variant calling data in XML format, called against the hg19 reference genome. Each XML file is converted to VCF, and the resulting variants are then lifted over to the GRCh38 reference genome. For details, please see the latest version of the [ARGO xml_variant_ingestion workflow](https://github.com/icgc-argo-workflows/dna-seq-processing-wfs/releases).

## Inputs
* Submitted XML file(s)
* Mapping file
* [GRCh38](https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/) human reference genome
* hg19-to-GRCh38 liftover [chain file](https://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/)
* [hg19 genome reference](https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/) used to call the variants in the XML file

## Processing
* Submitted variant calls (XML) are converted into separate VCFs by variant type (copy number alteration, rearrangement, and short variant).
* [Picard LiftoverVcf](https://gatk.broadinstitute.org/hc/en-us/articles/27007978536219-LiftoverVcf-Picard) is used to lift the variant calls over to the GRCh38 reference genome.
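As a rough illustration of what the liftover step does, the Python sketch below maps hg19 positions to GRCh38 using the gap-free aligned blocks of a single UCSC chain record, the same format as the chain file listed under Inputs. It is not Picard's implementation: it assumes one chain on the `+` strand and uses a made-up toy chain, whereas Picard LiftoverVcf also handles many chains and reverse-strand alignments, and uses the GRCh38 reference to check reference alleles. Positions that fall in an alignment gap return `None`, meaning they cannot be lifted over.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (source_start, target_start, size): one gap-free aligned block, 0-based coordinates.
Block = Tuple[int, int, int]


def parse_chain(lines: List[str]) -> Tuple[str, str, List[Block]]:
    """Parse one UCSC chain record (header line plus alignment lines).

    Sketch only: assumes both strands are '+' and that the record holds a single chain.
    """
    # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    # For an hg19-to-hg38 chain, the "t" fields are hg19 (source), the "q" fields hg38.
    header = lines[0].split()
    src_name, src_strand, src_pos = header[2], header[4], int(header[5])
    dst_name, dst_strand, dst_pos = header[7], header[9], int(header[10])
    if src_strand != "+" or dst_strand != "+":
        raise ValueError("this sketch only handles '+' strand chains")

    blocks: List[Block] = []
    for line in lines[1:]:
        fields = [int(x) for x in line.split()]
        if not fields:
            continue  # a blank line ends a chain record
        size = fields[0]
        blocks.append((src_pos, dst_pos, size))
        if len(fields) == 3:
            # "size dt dq": skip the gap in the source (dt) and the destination (dq)
            src_pos += size + fields[1]
            dst_pos += size + fields[2]
    return src_name, dst_name, blocks


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based source position, or return None if it lies outside every block."""
    starts = [start for start, _, _ in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    src_start, dst_start, size = blocks[i]
    if pos < src_start + size:
        return dst_start + (pos - src_start)
    return None  # falls in a gap between blocks


def lift_vcf_pos(blocks: List[Block], vcf_pos: int) -> Optional[int]:
    """VCF POS is 1-based while chain coordinates are 0-based."""
    lifted = lift_position(blocks, vcf_pos - 1)
    return None if lifted is None else lifted + 1


# Toy chain with made-up coordinates, not taken from the real hg19ToHg38 chain file.
TOY_CHAIN = [
    "chain 1000 chr1 1000 + 100 200 chr1 1000 + 150 260 1",
    "50 10 20",
    "40",
]

if __name__ == "__main__":
    _, _, toy_blocks = parse_chain(TOY_CHAIN)
    print(lift_vcf_pos(toy_blocks, 121))  # 171
    print(lift_vcf_pos(toy_blocks, 156))  # None: the position falls in a gap
```

Binary search over block starts keeps each lookup logarithmic in the number of blocks, which matters because real chain files contain very many blocks per chromosome.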

## Outputs
* [Raw SNV Calls](https://docs.icgc-argo.org/docs/data/variant-calls#raw-snv-calls) and [VCF Index](https://docs.icgc-argo.org/docs/data/variant-calls#vcf-index)
* [Raw Indel Calls](https://docs.icgc-argo.org/docs/data/variant-calls#raw-indel-calls) and [VCF Index](https://docs.icgc-argo.org/docs/data/variant-calls#vcf-index)
* [Raw SV Calls](https://docs.icgc-argo.org/docs/data/variant-calls#raw-sv-calls) and [VCF Index](https://docs.icgc-argo.org/docs/data/variant-calls#vcf-index)
* [Raw CNV Calls](https://docs.icgc-argo.org/docs/data/variant-calls#raw-cnv-calls) and [VCF Index](https://docs.icgc-argo.org/docs/data/variant-calls#vcf-index)
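Conceptually, each converted call ends up in one of the four outputs above, whose filename patterns (`*.snv.vcf.gz`, `*.indel.vcf.gz`, `*.sv.vcf.gz`, `*.cnv.vcf.gz`) are listed in the Variant Calls documentation. The Python sketch below shows one way such routing could look. The `XmlVariant` record, the type labels, the `sample1` prefix and the allele-length rule that separates SNVs from InDels are all hypothetical, since neither the XML schema nor the workflow's actual splitting logic is described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labels for the three variant types named in the Processing section.
CNA, REARRANGEMENT, SHORT = "copy_number_alteration", "rearrangement", "short_variant"


@dataclass
class XmlVariant:
    """Hypothetical, already-parsed variant record (the real XML schema is not shown)."""
    variant_type: str
    ref: str = ""
    alt: str = ""


def output_kind(v: XmlVariant) -> Optional[str]:
    """Return the output key (snv, indel, sv, cnv), or None if the call is unassigned."""
    if v.variant_type == CNA:
        return "cnv"
    if v.variant_type == REARRANGEMENT:
        return "sv"
    if v.variant_type == SHORT:
        if len(v.ref) == 1 and len(v.alt) == 1:
            return "snv"  # single-base substitution
        if len(v.ref) != len(v.alt):
            return "indel"  # length change means an insertion or deletion
    return None  # e.g. multi-base substitutions or unknown types


def group_by_output(variants: List[XmlVariant], prefix: str) -> Dict[str, List[XmlVariant]]:
    """Group calls under output filenames that follow the documented patterns."""
    groups: Dict[str, List[XmlVariant]] = {}
    for v in variants:
        kind = output_kind(v)
        if kind is not None:
            groups.setdefault(f"{prefix}.{kind}.vcf.gz", []).append(v)
    return groups


if __name__ == "__main__":
    calls = [
        XmlVariant(SHORT, "A", "G"),
        XmlVariant(SHORT, "AT", "A"),
        XmlVariant(CNA),
        XmlVariant(REARRANGEMENT),
    ]
    print(sorted(group_by_output(calls, "sample1")))
    # ['sample1.cnv.vcf.gz', 'sample1.indel.vcf.gz', 'sample1.snv.vcf.gz', 'sample1.sv.vcf.gz']
```

Equal-length multi-base substitutions are left unassigned (`None`) in this sketch because the documentation does not say which output they belong to.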

## Workflow Diagram

https://docs.google.com/drawings/d/1EfyRtN1mtX-iPNDvbdqTQLfvZ8iNPneGJZr-wS6U-dM/edit
14 changes: 10 additions & 4 deletions docs/data/variant-calls.md
@@ -14,7 +14,7 @@ Data files containing Single Nucleotide Variations (SNV) called from aligned rea

| Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
| ---------------- | ------------------------------- | --------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| \*.snv.vcf.gz | SNV results in VCF File format. | variant_calling | Simple Nucleotide Variation | <ul><li>Sanger WGS Variant Calling </li><li>Sanger WXS Variant Calling </li><li> GATK Mutect2 Variant Calling</li></ul> |
| \*.snv.vcf.gz | SNV results in VCF File format. | variant_calling | Simple Nucleotide Variation | <ul><li>Sanger WGS Variant Calling </li><li>Sanger WXS Variant Calling </li><li>GATK Mutect2 Variant Calling </li><li>XML Variant Ingestion </li></ul> |

## SNV Supplement

@@ -34,7 +34,7 @@ Data files containing Structural Variations (SV) called from aligned reads, whic

| Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
| ---------------- | ------------------------------ | --------------- | -------------------- | -------------------------- |
| \*.sv.vcf.gz | SV results in VCF File format. | variant_calling | Structural Variation | Sanger WGS Variant Calling |
| \*.sv.vcf.gz | SV results in VCF File format. | variant_calling | Structural Variation | <ul><li>Sanger WGS Variant Calling </li><li>XML Variant Ingestion </li></ul> |

## SV Supplement

@@ -54,7 +54,7 @@ Data files containing the Copy Number Varations (CNV) called from aligned reads,

| Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
| ---------------- | ------------------------------- | --------------- | --------------------- | -------------------------- |
| \*.cnv.vcf.gz | CNV results in VCF file format. | variant_calling | Copy Number Variation | Sanger WGS Variant Calling |
| \*.cnv.vcf.gz | CNV results in VCF file format. | variant_calling | Copy Number Variation | <ul><li>Sanger WGS Variant Calling </li><li>XML Variant Ingestion </li></ul> |

## CNV Supplement

@@ -70,6 +70,12 @@ Data files containing CNV results and temporary files generated by variant calle

Data files containing the simple Insertions and Deletions (InDel) data called from aligned reads, which have not yet been annotated but do have some filtering flags added.

#### File Types

| Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
| ---------------- | ------------------------------- | --------------- | --------------------- | -------------------------- |
| \*.indel.vcf.gz | InDel results in VCF file format. | variant_calling | Simple Nucleotide Variation | XML Variant Ingestion |

## InDel Supplement

Data files containing InDel results and temporary files generated by variant callers used in the ARGO Analysis pipeline.
@@ -114,4 +120,4 @@ Secondary files that are external index files for VCF format files. TBI files fo

| Filename Pattern | Description | Analysis Type | Data Category | Generating Workflow(s) |
| ---------------- | --------------------------------------------------------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| \*.vcf.gz.tbi | VCF Index file format. Requires a corresponding VCF file. | <ul><li>variant_calling</li><li>variant_processing</li></ul> | <ul><li>Simple Nucleotide Variation</li><li>Structural Variation</li><li>Copy Number Variation</li></ul> | <ul><li>Sanger WGS Variant Calling </li><li>Sanger WXS Variant Calling </li><li> GATK Mutect2 Variant Calling</li><li>Open Access Variant Filtering</li></ul> |
| \*.vcf.gz.tbi | VCF Index file format. Requires a corresponding VCF file. | <ul><li>variant_calling</li><li>variant_processing</li></ul> | <ul><li>Simple Nucleotide Variation</li><li>Structural Variation</li><li>Copy Number Variation</li></ul> | <ul><li>Sanger WGS Variant Calling </li><li>Sanger WXS Variant Calling </li><li> GATK Mutect2 Variant Calling</li><li>Open Access Variant Filtering</li><li>XML Variant Ingestion </li></ul> |
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -92,6 +92,7 @@ module.exports = {
'analysis-workflows/dna-sanger-wgs-vc',
'analysis-workflows/dna-sanger-wxs-vc',
'analysis-workflows/dna-gatk-mutect2-vc',
'analysis-workflows/xml-variant-ingestion',
'analysis-workflows/dna-open-access-filtering',
],
},