Skip to content

Commit

Permalink
Merge pull request #203 from nf-core/transcript_lengths
Browse files Browse the repository at this point in the history
Transcript lengths for DESeq2
  • Loading branch information
pinin4fjords authored Nov 22, 2023
2 parents 4587680 + 9bf5396 commit 0b9aeeb
Show file tree
Hide file tree
Showing 11 changed files with 113 additions and 40 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### `Added`

- [[#203](https://github.com/nf-core/differentialabundance/pull/203)] - Transcript lengths for DESeq2 ([@pinin4fjords](https://github.com/pinin4fjords), review by [@maxulysse](https://github.com/maxulysse))
- [[#193](https://github.com/nf-core/differentialabundance/pull/193)] - Add DESeq2 text to report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#192](https://github.com/nf-core/differentialabundance/pull/192)] - Add scree plot in report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
- [[#189](https://github.com/nf-core/differentialabundance/pull/189)] - Add DE models to report ([@WackerO](https://github.com/WackerO), review by [@pinin4fjords](https://github.com/pinin4fjords))
Expand Down
7 changes: 6 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,12 @@ RNA-seq:
```

:::note
If you are using the outputs of the nf-core rnaseq workflow as input here, please use the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices. See the [usage documentation](https://nf-co.re/differentialabundance/usage) for more information.
If you are using the outputs of the nf-core rnaseq workflow as input here **either**:

- supply the raw count matrices (file names like **gene_counts.tsv**) alongide the transcript length matrix via `--transcript_length_matrix` (rnaseq versions >=3.12.0, preferred)
- **or** supply the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices.

See the [usage documentation](https://nf-co.re/differentialabundance/usage) for more information.
:::

Affymetrix microarray:
Expand Down
5 changes: 3 additions & 2 deletions conf/test.config
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,9 @@ params {
// Input data

input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv'
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv'
contrasts = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.contrasts.csv'
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv'
transcript_length_matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.spoofed_lengths.tsv'
contrasts = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.contrasts.csv'

// To do: replace this with a cut-down mouse GTF matching the matrix for testing
gtf = 'https://ftp.ensembl.org/pub/release-81/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz'
Expand Down
21 changes: 15 additions & 6 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,17 +69,26 @@ This is a numeric square matrix file, comma or tab-separated, with a column for

#### Outputs from nf-core/rnaseq and other tximport-processed results

The nf-core rnaseq workflow uses [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) to generate its quantification matrices. It does not currently output sufficient information to allow modelling of transcript length biases in differential analysis by this workflow, so we must use matrices per the second recommended approach in the [documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):
The nf-core RNAseq workflow incorporates [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) for producing quantification matrices. From [version 3.12.2](https://github.com/nf-core/rnaseq/releases/tag/3.13.2), it additionally provides transcript length matrices which can be directly consumed by DESeq2 to model length bias across samples.

> "The second method is to use the tximport argument countsFromAbundance="lengthScaledTPM" or "scaledTPM", and then to use the gene-level count matrix txi$counts directly as you would a regular count matrix with these software. Let’s call this method “bias corrected counts without an offset”"
To use this approach, include the transcript lengths file with the **raw counts**:

This corresponds to the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices, respectively, from the rnaseq workflow.
```bash
--matrix 'salmon.merged.gene_counts.tsv' \
--transcript_length_matrix 'salmon.merged.gene_lengths.tsv'
```

Without the transcript lengths, for instance in earlier rnaseq workflow versions, follow the second recommendation in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):

> "Use the tximport argument countsFromAbundance='lengthScaledTPM' or 'scaledTPM', then employ the gene-level count matrix txi$counts directly in downstream software, a method we call 'bias corrected counts without an offset'"
This aligns with the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices in the rnaseq workflow.

Note that those documents also say:
Important to note, the documentation advises:

> "Note: Do not manually pass the original gene-level counts to downstream methods without an offset."
> "Do not manually pass the original gene-level counts to downstream methods without an offset."
This corresponds to the 'gene_counts.tsv' matrix, so we do not recomend this matrix is used as input for this workflow.
So we **do not recommend** raw counts files such as `salmon.merged.gene_counts.tsv` as input for this workflow **except** where the transcript lengths are also provided.

### MaxQuant intensities

Expand Down
2 changes: 1 addition & 1 deletion modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@
},
"deseq2/differential": {
"branch": "master",
"git_sha": "25047aa940979f64b31d19b94c483a9254263892",
"git_sha": "c992bd93b22a97feda5f04755511366c45423626",
"installed_by": ["modules"]
},
"geoquery/getgeo": {
Expand Down
3 changes: 2 additions & 1 deletion modules/nf-core/deseq2/differential/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 16 additions & 11 deletions modules/nf-core/deseq2/differential/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

59 changes: 50 additions & 9 deletions modules/nf-core/deseq2/differential/templates/deseq_de.R

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ params {
contrasts = null
querygse = null
matrix = null
transcript_length_matrix = null
control_features = null
sizefactors_from_controls = false

Expand Down
20 changes: 13 additions & 7 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,12 @@
"pattern": "^\\S+\\.(tsv|csv|txt)$",
"fa_icon": "fas fa-border-all"
},
"transcript_length_matrix": {
"type": "string",
"fa_icon": "fas fa-border-all",
"description": "(RNA-seq only): optional transcript length matrix with sample samples and genes as the abundance matrix",
"help_text": "if provided, this file willl be used to provide transcript lengths to DESeq2 to model lengh bias across samples"
},
"affy_cel_files_archive": {
"type": "string",
"default": "None",
Expand Down Expand Up @@ -294,7 +300,7 @@
},
"proteus_round_digits": {
"type": "number",
"default": -1,
"default": -1.0,
"description": "Number of decimals to round the MaxQuant intensities to; default: -1 (will not round)."
}
}
Expand All @@ -307,13 +313,13 @@
"properties": {
"filtering_min_abundance": {
"type": "number",
"default": 1,
"default": 1.0,
"description": "Minimum abundance value",
"fa_icon": "fas fa-compress-alt"
},
"filtering_min_samples": {
"type": "number",
"default": 1,
"default": 1.0,
"description": "Minimum observations that must pass the threshold to retain the row/ feature (e.g. gene).",
"fa_icon": "fas fa-compress-alt"
},
Expand Down Expand Up @@ -445,13 +451,13 @@
},
"differential_min_fold_change": {
"type": "number",
"default": 2,
"default": 2.0,
"description": "Minimum fold change used to calculate differential feature numbers",
"fa_icon": "fas fa-angle-double-down"
},
"differential_max_pval": {
"type": "number",
"default": 1,
"default": 1.0,
"description": "Maximum p value used to calculate differential feature numbers",
"fa_icon": "fas fa-angle-double-up"
},
Expand Down Expand Up @@ -709,7 +715,7 @@
},
"limma_p_value": {
"type": "number",
"default": 1,
"default": 1.0,
"description": "cutoff value for adjusted p-values. Only genes with lower p-values are listed.",
"fa_icon": "fas fa-angle-double-up"
}
Expand Down Expand Up @@ -955,7 +961,7 @@
},
"report_scree": {
"type": "boolean",
"default": "true",
"default": true,
"description": "Whether to generate a scree plot in the report"
}
},
Expand Down
7 changes: 5 additions & 2 deletions workflows/differentialabundance.nf
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ if (params.study_type == 'affy_array'){
}

// Check optional parameters
if (params.transcript_length_matrix) { ch_transcript_lengths = Channel.of([ exp_meta, file(params.transcript_length_matrix, checkIfExists: true)]).first() } else { ch_transcript_lengths = [[],[]] }
if (params.control_features) { ch_control_features = Channel.of([ exp_meta, file(params.control_features, checkIfExists: true)]).first() } else { ch_control_features = [[],[]] }
if (params.gsea_run) {
if (params.gsea_gene_sets){
Expand Down Expand Up @@ -356,7 +357,8 @@ workflow DIFFERENTIALABUNDANCE {
DESEQ2_NORM (
ch_contrasts.first(),
ch_samples_and_matrix,
ch_control_features
ch_control_features,
ch_transcript_lengths
)

// Run the DESeq differential module, which doesn't take the feature
Expand All @@ -365,7 +367,8 @@ workflow DIFFERENTIALABUNDANCE {
DESEQ2_DIFFERENTIAL (
ch_contrasts,
ch_samples_and_matrix,
ch_control_features
ch_control_features,
ch_transcript_lengths
)

// Let's make the simplifying assumption that the processed matrices from
Expand Down

0 comments on commit 0b9aeeb

Please sign in to comment.