update #228

Merged
merged 25 commits into from
Mar 13, 2023
25 commits
8e65b61
Merge pull request #216 from nf-core/update_3_1
yuukiiwa Feb 15, 2023
adc0832
Merge pull request #217 from nf-core/update_3_1
christopher-hakkaart Feb 17, 2023
2b824f5
fix custom config for multiqc
maxulysse Mar 3, 2023
75cc128
Merge pull request #219 from nf-core/maxulysse-patch-1
yuukiiwa Mar 7, 2023
d1666b5
Update CITATIONS.md
yuukiiwa Mar 8, 2023
0820143
Update CITATIONS.md
yuukiiwa Mar 8, 2023
5f4711e
Add back demultiplexing
yuukiiwa Mar 8, 2023
d939006
Merge pull request #221 from nf-core/update_3_1
yuukiiwa Mar 8, 2023
e8ad21b
update nanoseq version
yuukiiwa Mar 9, 2023
aa0f91f
Update CHANGELOG.md
yuukiiwa Mar 9, 2023
2fac88e
Update test.config
yuukiiwa Mar 9, 2023
6bb6694
Update CHANGELOG.md
yuukiiwa Mar 9, 2023
8bf6db4
Update CHANGELOG.md
yuukiiwa Mar 9, 2023
8f37ff5
Syntax formatting of run_deseq2.r
Mar 9, 2023
a49bd8c
Update CHANGELOG.md
yuukiiwa Mar 9, 2023
fb733b3
Add files via upload
yuukiiwa Mar 9, 2023
0748ea4
annotate .collate(12)
yuukiiwa Mar 9, 2023
7cd4211
Merge pull request #224 from nf-core/DSchreyer-patch-1
yuukiiwa Mar 9, 2023
2ef8d12
Merge pull request #223 from nf-core/update_3_1
yuukiiwa Mar 9, 2023
a1979c4
address suggestions from Daniel
yuukiiwa Mar 10, 2023
37293dc
leaving nf-core subworkflows as it is now
yuukiiwa Mar 10, 2023
5088a4b
Merge pull request #226 from nf-core/update_3_1
yuukiiwa Mar 10, 2023
3be2a78
small fix
yuukiiwa Mar 10, 2023
71fe49f
Merge pull request #227 from nf-core/update_3_1
yuukiiwa Mar 10, 2023
6e563e5
Merge pull request #222 from nf-core/dev
yuukiiwa Mar 10, 2023
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,34 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.1.0] - 2023-03-10

### Major enhancements

- Removed the `guppy` basecaller, as distributing it via a Docker image breaches its EULA
- Bump minimum Nextflow version from 21.10.3 -> 22.10.1
- Update pipeline template to nf-core/tools `2.7.2`
- Update `bambu` version from `2.0.0` to `3.0.8`

### Parameters

- Removed `--flowcell` as `nanoseq` no longer supports basecalling
- Removed `--kit` as `nanoseq` no longer supports basecalling
- Removed `--guppy_config` as `nanoseq` no longer supports basecalling
- Removed `--guppy_model` as `nanoseq` no longer supports basecalling
- Removed `--guppy_gpu` as `nanoseq` no longer supports basecalling
- Removed `--guppy_gpu_runners` as `nanoseq` no longer supports basecalling
- Removed `--guppy_cpu_threads` as `nanoseq` no longer supports basecalling
- Removed `--output_demultiplex_fast5` as `nanoseq` no longer supports basecalling
- Removed `--skip_basecalling` as `nanoseq` no longer supports basecalling
- Removed `--skip_pycoqc` as `nanoseq` no longer supports basecalling
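
A launch command carried over from 3.0.0 may still pass some of the flags removed above. The following is a minimal sketch, assuming a hypothetical helper `find_removed_flags` (it is not part of nanoseq or Nextflow), of how such a command line could be screened before upgrading. The flag names are copied from the list above; the example command and its values are made up.

```python
# Hypothetical pre-upgrade check; not part of nanoseq or Nextflow.
# The flag names are copied from the "Parameters" list above.
REMOVED_IN_3_1_0 = {
    "--flowcell",
    "--kit",
    "--guppy_config",
    "--guppy_model",
    "--guppy_gpu",
    "--guppy_gpu_runners",
    "--guppy_cpu_threads",
    "--output_demultiplex_fast5",
    "--skip_basecalling",
    "--skip_pycoqc",
}


def find_removed_flags(argv):
    """Return the removed flags present in argv, in order of appearance."""
    found = []
    for token in argv:
        # Accept both "--flag value" and "--flag=value" spellings.
        name = token.split("=", 1)[0]
        if name in REMOVED_IN_3_1_0 and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    # Made-up example command line from an older run.
    old_command = [
        "nextflow", "run", "nf-core/nanoseq",
        "--input", "samplesheet.csv",
        "--protocol", "DNA",
        "--flowcell", "FLO-MIN106",
        "--skip_basecalling",
    ]
    print(find_removed_flags(old_command))
```

Any flag reported by the helper would need to be dropped from the command before running 3.1.0.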

### Software dependencies

| Dependency | Old version | New version |
| -------------------- | ----------- | ----------- |
| `bioconductor-bambu` | 2.0.0 | 3.0.8 |

## [3.0.0] - 2022-06-21

### Major enhancements
12 changes: 10 additions & 2 deletions CITATIONS.md
@@ -4,12 +4,20 @@

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [SGNEx](https://www.biorxiv.org/content/10.1101/2021.04.21.440736v1.abstract)

> Chen, Y., Davidson, N. M., Wan, Y. K., Patel, H., Yao, F., Low, H. M., ... & SG-NEx consortium. (2021). A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. BioRxiv, 2021-04.

## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [bambu](https://www.biorxiv.org/content/10.1101/2022.11.14.516358v2.abstract)

> Chen, Y., Sim, A. D., Wan, Y. K., Yeo, K., Lee, J. J. X., Ling, M. H., ... & Göke, J. (2022). Context-aware transcript quantification from long read RNA-seq data with Bambu. bioRxiv, 2022-11.

- [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
@@ -34,9 +42,9 @@

> Davidson NM, Chen Y, Sadras T, Ryland GL, Blombery P, Ekert PG, Göke J, Oshlack A. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol. 2022 Jan 6;23(1):10. doi: 10.1186/s13059-021-02588-5. PMID: 34991664; PMCID: PMC8739696.

- - [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
+ - [m6anet](https://pubmed.ncbi.nlm.nih.gov/36357692/)

- > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRXiv (2021)
+ > Hendra, C., Pratanwanich, P. N., Wan, Y. K., Goh, W. S., Thiery, A., & Göke, J. (2022). Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nature Methods, 1-9. PMID: 36357692; PMCID: PMC9718678.

- [PEPPER-Margin-DeepVariant](https://pubmed.ncbi.nlm.nih.gov/34725481/)

15 changes: 8 additions & 7 deletions README.md
@@ -23,24 +23,25 @@ On release, automated continuous integration tests run the pipeline on a [full-s

## Pipeline Summary

- 1. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
- 2. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
- 3. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
+ 1. Demultiplexing ([`qcat`](https://github.com/nanoporetech/qcat); _optional_)
+ 2. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
+ 3. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+ 4. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
- Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
- Each sample can be mapped to its own reference genome if multiplexed in this way
- Convert SAM to co-ordinate sorted BAM and obtain mapping metrics ([`samtools`](http://www.htslib.org/doc/samtools.html))
- 4. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
- 5. DNA specific downstream analysis:
+ 5. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
+ 6. DNA specific downstream analysis:
- Short variant calling ([`medaka`](https://github.com/nanoporetech/medaka), [`deepvariant`](https://github.com/google/deepvariant), or [`pepper_margin_deepvariant`](https://github.com/kishwarshafin/pepper))
- Structural variant calling ([`sniffles`](https://github.com/fritzsedlazeck/Sniffles) or [`cutesv`](https://github.com/tjiangHIT/cuteSV))
- 6. RNA specific downstream analysis:
+ 7. RNA specific downstream analysis:
- Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/))
- bambu performs both transcript reconstruction and quantification
- When StringTie2 is chosen, each sample can be processed individually and combined. After which, [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) will be used for both gene and transcript quantification.
- Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
- RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
- RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
- 7. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))
+ 8. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))

### Functionality Overview

18 changes: 9 additions & 9 deletions bin/run_deseq2.r
@@ -42,17 +42,17 @@ path <-args[2]

#create a dataframe for all samples
if (transcriptquant == "stringtie2"){
- count.matrix <- data.frame(read.table(path,sep="\t",header=TRUE, skip = 1))
+ count.matrix <- data.frame(read.table(path, sep="\t", header=TRUE, skip = 1))
count.matrix$Chr <- count.matrix$Start <- count.matrix$End <- count.matrix$Length <- count.matrix$Strand <- NULL
- colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))],"\\."),"[[",1))
- count.matrix <- aggregate(count.matrix[,-1],count.matrix["Geneid"],sum)
- countTab <- count.matrix[,-1]
- rownames(countTab) <-count.matrix[,1]
+ colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))], "\\."), "[[", 1))
+ count.matrix <- aggregate(count.matrix[, -1],count.matrix["Geneid"],sum)
+ countTab <- count.matrix[, -1]
+ rownames(countTab) <-count.matrix[, 1]
}
if (transcriptquant == "bambu"){
- countTab <- data.frame(read.table(path,sep="\t",header=TRUE,row.names = 1))
- colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab),"\\."),"[[",1))
- countTab[,1:length(colnames(countTab))] <- sapply(countTab, as.integer)
+ countTab <- data.frame(read.table(path, sep="\t", header=TRUE, row.names = 1))
+ colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab), "\\."), "[[", 1))
+ countTab[, 1:length(colnames(countTab))] <- sapply(countTab, as.integer)
}


@@ -66,7 +66,7 @@ sample <- colnames(countTab)
group <- sub("(^[^-]+)_.*", "\\1", sample)
sampInfo <- data.frame(group, row.names = sample)
if (!all(rownames(sampInfo) == colnames(countTab))){
- sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)),]
+ sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)), ]
}

################################################
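
The way this script derives sample and group names is easy to miss in the one-liners. The sketch below mirrors the two steps in Python purely for illustration (the pipeline itself does this in R). The function names and the example column names are assumptions, not taken from the pipeline.

```python
import re


def sample_from_column(column):
    """Keep the text before the first '.', like strsplit(x, "\\.") then "[[", 1 in R."""
    return column.split(".")[0]


def group_from_sample(sample):
    """Mirror sub("(^[^-]+)_.*", "\\1", sample): drop the last '_suffix'.

    The greedy [^-]+ backtracks to the last underscore, so everything before
    it is kept. If the pattern does not match (for example the name has a
    hyphen before any underscore), the name is returned unchanged.
    """
    return re.sub(r"(^[^-]+)_.*", r"\1", sample, count=1)


if __name__ == "__main__":
    # Made-up column names standing in for the count table header.
    for column in ["MCF7_R1.sorted.bam", "K562_treated_R2.sorted.bam"]:
        sample = sample_from_column(column)
        print(column, sample, group_from_sample(sample))
```

One consequence worth noting: a sample name containing a hyphen before its first underscore is left as-is, so it would end up as its own group.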
2 changes: 1 addition & 1 deletion conf/test.config
@@ -11,7 +11,7 @@ params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'

- // Limit resources so that this can run on Travis
+ // Limit resources
max_cpus = 2
max_memory = 6.GB
max_time = 12.h
2 changes: 1 addition & 1 deletion conf/test_nodx_noaln.config
@@ -21,6 +21,6 @@ params {
protocol = 'directRNA'
skip_demultiplexing = true
skip_alignment = true
- skip_fusion_analysis= true
+ skip_fusion_analysis = true
skip_modification_analysis=true
}
2 changes: 1 addition & 1 deletion modules/local/bambu.nf
@@ -1,7 +1,7 @@
process BAMBU {
label 'process_medium'

- conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.6 bioconda::bioconductor-bsgenome=1.66.0"
+ conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-bambu:3.0.8--r42hc247a5b_0' :
'quay.io/biocontainers/bioconductor-bambu:3.0.8--r42hc247a5b_0' }"
2 changes: 1 addition & 1 deletion modules/local/multiqc.nf
@@ -27,7 +27,7 @@ process MULTIQC {

script:
def args = task.ext.args ?: ''
- def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : ''
+ def custom_config = params.multiqc_config ? "--config $ch_multiqc_custom_config" : ''
"""
multiqc \\
-f \\
2 changes: 1 addition & 1 deletion nextflow.config
@@ -244,7 +244,7 @@ manifest {
description = """A pipeline to demultiplex, QC and map Nanopore data"""
mainScript = 'main.nf'
nextflowVersion = '!>=22.10.1'
- version = '3.0.0'
+ version = '3.1.0'
doi = ''
}
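
The `!` prefix in `nextflowVersion = '!>=22.10.1'` makes Nextflow stop rather than only warn when the running version is too old. As a rough illustration of what that constraint accepts (Nextflow performs this check itself; `meets_minimum` is a hypothetical helper that handles plain `X.Y.Z` strings only and ignores suffixes such as `-edge`):

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(current, minimum="22.10.1"):
    """True when `current` is at least `minimum` (plain X.Y.Z only)."""
    return parse_version(current) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ["21.10.3", "22.04.0", "22.10.1", "23.04.0"]:
        print(version, meets_minimum(version))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes when a component has a different number of digits.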

2 changes: 1 addition & 1 deletion nextflow_schema.json
@@ -26,7 +26,7 @@
"protocol": {
"type": "string",
"description": "Input sample type. Valid options: 'DNA', 'cDNA', and 'directRNA'.",
- "format": "file-path",
+ "format": "sample-type",
"mimetype": "text/csv",
"schema": "assets/schema_input.json",
"help_text": "You will need to specify a protocol based on the sample input type. Valid options are 'DNA', 'cDNA', and 'directRNA'.",
2 changes: 1 addition & 1 deletion subworkflows/local/align_graphmap2.nf
@@ -21,7 +21,7 @@ workflow ALIGN_GRAPHMAP2 {
ch_index
.cross(ch_fastq) { it -> it[-1] }
.flatten()
- .collate(12)
+ .collate(12) // [fasta, fasta sizes, gtf, bed, fasta_index, annotation_string, meta, fastq, fasta, gtf, is_transcript, fasta_gtf_string]
.map { it -> [ it[6], it[7], it[0], it[1], it[2], it[3], it[10], it[4] ] } // [ sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index ]
.set { ch_index }
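
The flatten, collate and map steps in this chain can be pictured with plain lists. The sketch below is a Python analogy only, not Nextflow code: `collate` and `reorder` are stand-in functions, and the field labels follow the annotation comment on `.collate(12)`, with `fasta_2` and `gtf_2` as made-up names for the repeated fasta and gtf entries at positions 8 and 9.

```python
# Labels follow the annotation comment; "fasta_2" and "gtf_2" are invented
# names for the second fasta and gtf entries (positions 8 and 9).
FIELDS = [
    "fasta", "sizes", "gtf", "bed", "fasta_index", "annotation_string",
    "meta", "fastq", "fasta_2", "gtf_2", "is_transcript", "fasta_gtf_string",
]

# Indices used by the .map { ... } step in the diff.
KEEP = (6, 7, 0, 1, 2, 3, 10, 4)


def collate(items, size):
    """Split a flat list into consecutive chunks of `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def reorder(chunk):
    """Pick [sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index]."""
    return [chunk[i] for i in KEEP]


if __name__ == "__main__":
    # Two flattened records, one per sample, tagged A and B.
    flat = [f"{name}_{tag}" for tag in ("A", "B") for name in FIELDS]
    for chunk in collate(flat, 12):
        print(reorder(chunk))
```

Because the flattened stream carries no record boundaries, the chunk size has to match the number of fields per record exactly. That is why the annotation listing all 12 positions is useful.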
