diff --git a/CHANGELOG.md b/CHANGELOG.md
index be995280..ea4a0165 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,34 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.1.0] - 2023-03-10
+
+### Major enhancements
+
+- Removed the `guppy` basecaller as distributing it via a docker image is a breach of its EULA
+- Bump minimum Nextflow version from 21.10.3 -> 22.10.1
+- Update pipeline template to nf-core/tools `2.7.2`
+- Update `bambu` version from `1.0.2` to `2.0.0`
+
+### Parameters
+
+- Removed `--flowcell` as `nanoseq` no longer supports basecalling
+- Removed `--kit` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_config` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_model` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_gpu` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_gpu_runners` as `nanoseq` no longer supports basecalling
+- Removed `--guppy_cpu_threads` as `nanoseq` no longer supports basecalling
+- Removed `--output_demultiplex_fast5` as `nanoseq` no longer supports basecalling
+- Removed `--skip_basecalling` as `nanoseq` no longer supports basecalling
+- Removed `--skip_pycoqc` as `nanoseq` no longer supports basecalling
+
+### Software dependencies
+
+| Dependency           | Old version | New version |
+| -------------------- | ----------- | ----------- |
+| `bioconductor-bambu` | 2.0.0       | 3.0.8       |
+
 ## [3.0.0] - 2022-06-21
 
 ### Major enhancements
diff --git a/CITATIONS.md b/CITATIONS.md
index 78902f81..aa9274ad 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -4,12 +4,20 @@
 
 > Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
 
+## [SGNEx](https://www.biorxiv.org/content/10.1101/2021.04.21.440736v1.abstract)
+
+> Chen, Y., Davidson, N. M., Wan, Y. K., Patel, H., Yao, F., Low, H. M., ... & SG-NEx consortium. (2021). A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines. BioRxiv, 2021-04.
+
 ## [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)
 
 > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
 
 ## Pipeline tools
 
+- [bambu](https://www.biorxiv.org/content/10.1101/2022.11.14.516358v2.abstract)
+
+  > Chen, Y., Sim, A. D., Wan, Y. K., Yeo, K., Lee, J. J. X., Ling, M. H., ... & Göke, J. (2022). Context-aware transcript quantification from long read RNA-seq data with Bambu. bioRxiv, 2022-11.
+
 - [BEDTools](https://www.ncbi.nlm.nih.gov/pubmed/20110278/)
 
   > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PubMed PMID: 20110278; PubMed Central PMCID: PMC2832824.
@@ -34,9 +42,9 @@
 
   > Davidson NM, Chen Y, Sadras T, Ryland GL, Blombery P, Ekert PG, Göke J, Oshlack A. JAFFAL: detecting fusion genes with long-read transcriptome sequencing. Genome Biol. 2022 Jan 6;23(1):10. doi: 10.1186/s13059-021-02588-5. PMID: 34991664; PMCID: PMC8739696.
 
-- [m6anet](https://www.biorxiv.org/content/10.1101/2021.09.20.461055v1)
+- [m6anet](https://pubmed.ncbi.nlm.nih.gov/36357692/)
 
-  > Hendra C, et al., Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRXiv (2021)
+  > Hendra, C., Pratanwanich, P. N., Wan, Y. K., Goh, W. S., Thiery, A., & Göke, J. (2022). Detection of m6A from direct RNA sequencing using a multiple instance learning framework. Nature Methods, 1-9. PMID: 36357692; PMCID: PMC9718678.
 
 - [PEPPER-Margin-DeepVariant](https://pubmed.ncbi.nlm.nih.gov/34725481/)
 
diff --git a/README.md b/README.md
index 7cff5363..800f2e93 100644
--- a/README.md
+++ b/README.md
@@ -23,24 +23,25 @@ On release, automated continuous integration tests run the pipeline on a [full-s
 
 ## Pipeline Summary
 
-1. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
-2. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-3. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
+1. Demultiplexing ([`qcat`](https://github.com/nanoporetech/qcat); _optional_)
+2. Raw read cleaning ([NanoLyse](https://github.com/wdecoster/nanolyse); _optional_)
+3. Raw read QC ([`NanoPlot`](https://github.com/wdecoster/NanoPlot), [`FastQC`](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+4. Alignment ([`GraphMap2`](https://github.com/lbcb-sci/graphmap2) or [`minimap2`](https://github.com/lh3/minimap2))
    - Both aligners are capable of performing unspliced and spliced alignment. Sensible defaults will be applied automatically based on a combination of the input data and user-specified parameters
    - Each sample can be mapped to its own reference genome if multiplexed in this way
    - Convert SAM to co-ordinate sorted BAM and obtain mapping metrics ([`samtools`](http://www.htslib.org/doc/samtools.html))
-4. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
-5. DNA specific downstream analysis:
+5. Create bigWig ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/)) and bigBed ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedToBigBed`](http://hgdownload.soe.ucsc.edu/admin/exe/)) coverage tracks for visualisation
+6. DNA specific downstream analysis:
    - Short variant calling ([`medaka`](https://github.com/nanoporetech/medaka), [`deepvariant`](https://github.com/google/deepvariant), or [`pepper_margin_deepvariant`](https://github.com/kishwarshafin/pepper))
    - Structural variant calling ([`sniffles`](https://github.com/fritzsedlazeck/Sniffles) or [`cutesv`](https://github.com/tjiangHIT/cuteSV))
-6. RNA specific downstream analysis:
+7. RNA specific downstream analysis:
    - Transcript reconstruction and quantification ([`bambu`](https://bioconductor.org/packages/release/bioc/html/bambu.html) or [`StringTie2`](https://ccb.jhu.edu/software/stringtie/))
      - bambu performs both transcript reconstruction and quantification
      - When StringTie2 is chosen, each sample can be processed individually and combined. After which, [`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/) will be used for both gene and transcript quantification.
    - Differential expression analysis ([`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and/or [`DEXSeq`](https://bioconductor.org/packages/release/bioc/html/DEXSeq.html))
    - RNA modification detection ([`xpore`](https://github.com/GoekeLab/xpore) and/or [`m6anet`](https://github.com/GoekeLab/m6anet))
    - RNA fusion detection ([`JAFFAL`](https://github.com/Oshlack/JAFFA))
-7. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))
+8. Present QC for raw read and alignment results ([`MultiQC`](https://multiqc.info/docs/))
 
 ### Functionality Overview
 
diff --git a/bin/run_deseq2.r b/bin/run_deseq2.r
index f35904fd..7b04d9dd 100755
--- a/bin/run_deseq2.r
+++ b/bin/run_deseq2.r
@@ -42,17 +42,17 @@ path <-args[2]
 
 #create a dataframe for all samples
 if (transcriptquant == "stringtie2"){
-    count.matrix <- data.frame(read.table(path,sep="\t",header=TRUE, skip = 1))
+    count.matrix <- data.frame(read.table(path, sep="\t", header=TRUE, skip = 1))
     count.matrix$Chr <- count.matrix$Start <- count.matrix$End <- count.matrix$Length <- count.matrix$Strand <- NULL
-    colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))],"\\."),"[[",1))
-    count.matrix <- aggregate(count.matrix[,-1],count.matrix["Geneid"],sum)
-    countTab <- count.matrix[,-1]
-    rownames(countTab) <-count.matrix[,1]
+    colnames(count.matrix)[2:length(colnames(count.matrix))] <- unlist(lapply(strsplit(colnames(count.matrix)[2:length(colnames(count.matrix))], "\\."), "[[", 1))
+    count.matrix <- aggregate(count.matrix[, -1],count.matrix["Geneid"],sum)
+    countTab <- count.matrix[, -1]
+    rownames(countTab) <-count.matrix[, 1]
 }
 if (transcriptquant == "bambu"){
-    countTab <- data.frame(read.table(path,sep="\t",header=TRUE,row.names = 1))
-    colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab),"\\."),"[[",1))
-    countTab[,1:length(colnames(countTab))] <- sapply(countTab, as.integer)
+    countTab <- data.frame(read.table(path, sep="\t", header=TRUE, row.names = 1))
+    colnames(countTab) <- unlist(lapply(strsplit(colnames(countTab), "\\."), "[[", 1))
+    countTab[, 1:length(colnames(countTab))] <- sapply(countTab, as.integer)
 }
 
 
@@ -66,7 +66,7 @@ sample <- colnames(countTab)
 group <- sub("(^[^-]+)_.*", "\\1", sample)
 sampInfo <- data.frame(group, row.names = sample)
 if (!all(rownames(sampInfo) == colnames(countTab))){
-    sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)),]
+    sampInfo <- sampInfo[match(colnames(countTab), rownames(sampInfo)), ]
 }
 
 ################################################
diff --git a/conf/test.config b/conf/test.config
index 1e7d8f8c..83c5c1bb 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -11,7 +11,7 @@ params {
     config_profile_name = 'Test profile'
     config_profile_description = 'Minimal test dataset to check pipeline function'
 
-    // Limit resources so that this can run on Travis
+    // Limit resources
     max_cpus = 2
     max_memory = 6.GB
     max_time = 12.h
diff --git a/conf/test_nodx_noaln.config b/conf/test_nodx_noaln.config
index eed15dd4..c42e2309 100644
--- a/conf/test_nodx_noaln.config
+++ b/conf/test_nodx_noaln.config
@@ -21,6 +21,6 @@ params {
     protocol = 'directRNA'
     skip_demultiplexing = true
     skip_alignment = true
-    skip_fusion_analysis= true
+    skip_fusion_analysis = true
     skip_modification_analysis=true
 }
diff --git a/modules/local/bambu.nf b/modules/local/bambu.nf
index 95d46b9a..777b0e11 100644
--- a/modules/local/bambu.nf
+++ b/modules/local/bambu.nf
@@ -1,7 +1,7 @@
 process BAMBU {
     label 'process_medium'
 
-    conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.6 bioconda::bioconductor-bsgenome=1.66.0"
+    conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
         'https://depot.galaxyproject.org/singularity/bioconductor-bambu:3.0.8--r42hc247a5b_0' :
         'quay.io/biocontainers/bioconductor-bambu:3.0.8--r42hc247a5b_0' }"
diff --git a/modules/local/multiqc.nf b/modules/local/multiqc.nf
index ef2fb27d..c5ccad28 100644
--- a/modules/local/multiqc.nf
+++ b/modules/local/multiqc.nf
@@ -27,7 +27,7 @@ process MULTIQC {
 
     script:
     def args = task.ext.args ?: ''
-    def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : ''
+    def custom_config = params.multiqc_config ? "--config $ch_multiqc_custom_config" : ''
     """
     multiqc \\
         -f \\
diff --git a/nextflow.config b/nextflow.config
index 0341d8ed..11963c22 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -244,7 +244,7 @@ manifest {
     description = """A pipeline to demultiplex, QC and map Nanopore data"""
     mainScript = 'main.nf'
     nextflowVersion = '!>=22.10.1'
-    version = '3.0.0'
+    version = '3.1.0'
     doi = ''
 }
 
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 5e716045..6dbb8b4e 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -26,7 +26,7 @@
                 "protocol": {
                     "type": "string",
                     "description": "Input sample type. Valid options: 'DNA', 'cDNA', and 'directRNA'.",
-                    "format": "file-path",
+                    "format": "sample-type",
                     "mimetype": "text/csv",
                     "schema": "assets/schema_input.json",
                     "help_text": "You will need to specify a protocol based on the sample input type. Valid options are 'DNA', 'cDNA', and 'directRNA'.",
diff --git a/subworkflows/local/align_graphmap2.nf b/subworkflows/local/align_graphmap2.nf
index 75db5921..50909436 100644
--- a/subworkflows/local/align_graphmap2.nf
+++ b/subworkflows/local/align_graphmap2.nf
@@ -21,7 +21,7 @@ workflow ALIGN_GRAPHMAP2 {
     ch_index
         .cross(ch_fastq) { it -> it[-1] }
         .flatten()
-        .collate(12)
+        .collate(12) // [fasta, fasta sizes, gtf, bed, fasta_index, annotation_string, meta, fastq, fasta, gtf, is_transcript, fasta_gtf_string]
         .map { it -> [ it[6], it[7], it[0], it[1], it[2], it[3], it[10], it[4] ] } // [ sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index ]
         .set { ch_index }
 
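
Review note on the `align_graphmap2.nf` hunk: the new comment documents the 12-field layout that `.collate(12)` produces, and the `.map` then picks eight of those fields by position. The sketch below mimics that flatten, collate and reorder step in plain Python so the index mapping can be checked by eye. It is an illustration only, not pipeline code: the `collate` and `reorder` helpers, the placeholder field values, and the `_dup` suffix on the repeated `fasta` and `gtf` entries are all assumptions introduced here.

```python
from itertools import islice

# Field order as documented in the new `.collate(12)` comment.
# The second fasta/gtf entries get a hypothetical `_dup` suffix so they stay distinguishable.
FIELDS = [
    "fasta", "sizes", "gtf", "bed", "fasta_index", "annotation_string",
    "meta", "fastq", "fasta_dup", "gtf_dup", "is_transcript", "fasta_gtf_string",
]


def collate(items, size):
    """Group a flat iterable into lists of `size` items (a final short group is kept)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def reorder(chunk):
    """Apply the same positional selection as the `.map` closure in the diff."""
    # [ sample, fastq, fasta, sizes, gtf, bed, is_transcripts, index ]
    return [chunk[6], chunk[7], chunk[0], chunk[1], chunk[2], chunk[3], chunk[10], chunk[4]]


# A flattened stream for two made-up samples, 12 values each.
flat = [f"{name}_{sample}" for sample in ("s1", "s2") for name in FIELDS]
rows = [reorder(chunk) for chunk in collate(flat, 12)]

for row in rows:
    print(row)
```

Each printed row starts with the `meta` and `fastq` placeholders and ends with `fasta_index`, which matches the trailing comment on the `.map` line. Any change to the number of fields emitted upstream would require updating both the `12` and the indices together.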