Commit

Merge pull request #451 from NCI-GDC/bioinfo_note
Bioinfo note
wwysoc2 authored Jan 30, 2025
2 parents fa1c5af + 0970f09 commit 411c2be
Showing 2 changed files with 3 additions and 4 deletions.
1 change: 0 additions & 1 deletion docs/Data/Bioinformatics_Pipelines/DNA_Seq_WGS.md
@@ -7,7 +7,6 @@ Variant calls from Whole Genome Sequencing (WGS) data are produced using pipelin
* [Manta](https://github.com/Illumina/manta): Structural variants, which are available in both VCF and BEDPE format.
* Additionally, Manta generates a set of candidate indels, which are subsequently used as input for the [Strelka](https://github.com/Illumina/strelka) SSM calling pipeline to enhance Strelka's coverage across all indel sizes. Note that the Manta candidate indel file is not released on the GDC data portal.
* [Strelka](https://github.com/Illumina/strelka): Simple nucleotide variants, including both point mutations and Indels, which are available in VCF format.
* [VarScan2](https://dkoboldt.github.io/varscan/): Simple nucleotide variants, including both point mutations and Indels, which are available in VCF format. This is the same tool that is used in GDC WXS somatic variant calling.
* [SvABA](https://github.com/walaj/svaba): Indel variants only, which are available in VCF format, and structural variants, which are available in both VCF and BEDPE format.
* [GATK4 MuTect2](https://gatk.broadinstitute.org): Simple nucleotide variants, including both point mutations and Indels, which are available in VCF format.
* [GATK4 CNV](https://gatk.broadinstitute.org): Copy number segments, which are available in TXT format.
6 changes: 3 additions & 3 deletions docs/Data/Release_Notes/Data_Release_Notes.md
@@ -103,6 +103,7 @@ A complete list of files included in the GDC Data Portal can be found below:

### Known Issues and Workarounds

* Some VCF headers from SvABA list the names of the BAM files they originated from instead of "NORMAL" and "TUMOR", in that order.
* The slide image viewer does not display for any non-TCGA slides. At this time, these slides will need to be downloaded and viewed locally. Additionally, the slide image viewer does not display properly for 14 TCGA slides, which are identified [here](missing_tiling.txt).
* 397 alignments from the TCGA program were found to have contamination values over 0.04 ([alignment list](Contaminated_Alignments.dr32.tsv)). The ensemble MAFs produced by these alignments were removed from the Data Portal.
* One methylation aliquot from the TCGA-COAD project, TCGA-D5-6930-01A-11D-1926-05, was not added to the portal and will be added in a future release.
@@ -115,10 +116,9 @@ A complete list of files included in the GDC Data Portal can be found below:
* TCGA Projects
* 74 Diagnostic TCGA slides are attached to a portion rather than a sample like the rest of the diagnostic slides. This reflects how these original samples were handled. <!--SV-1111-->
* Two tissue slide images are unavailable for download from the GDC Data Portal <!--DAT-1439-->

* Some TCGA annotations are unavailable in the Data Portal<!--DAT-52-->. These annotations can be found [here](tcga-annotations-unavailable-20170315.json).
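
For the SvABA header issue noted in the list above, one possible local workaround is to rewrite the sample columns of the `#CHROM` header line after download. The following is a minimal sketch, not a GDC-provided tool: the function name `rename_sample_columns` is hypothetical, and it assumes an uncompressed VCF whose two sample columns appear in normal-then-tumor order, as the note states.

```python
def rename_sample_columns(lines, names=("NORMAL", "TUMOR")):
    """Yield VCF lines, renaming the sample columns on the #CHROM header line.

    Assumption: the sample columns are already in the order given by `names`
    (normal first, tumor second); only their labels are replaced.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns (CHROM through FORMAT) are fixed by the
            # VCF format; every column after them is a sample name.
            samples = fields[9:]
            if len(samples) != len(names):
                raise ValueError(
                    f"expected {len(names)} sample columns, found {len(samples)}"
                )
            line = "\t".join(fields[:9] + list(names)) + "\n"
        # Meta-information lines (##...) and variant records pass through unchanged.
        yield line


if __name__ == "__main__":
    header = (
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
        "\tnormal.bam\ttumor.bam\n"
    )
    for out_line in rename_sample_columns([header]):
        print(out_line, end="")
```

Because the function works on any iterable of lines, it can be applied to an open file handle and the result written to a new file, leaving the downloaded VCF untouched.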



## Data Release 41.0
