Input
Because Nextflow is flexible, Cecret is flexible in how input files are specified.
Cecret can use a sample sheet for input, with the sample name and read files separated by commas. The header must be sample,fastq_1,fastq_2, even when using nanopore fastq files. Each row gives the identifier for the file(s), the file location(s), and, if the files are not paired-end fastq files, the file type. This is the recommended input method for use cases involving the cloud.
The format of each row tells Cecret how to process its files:
- paired-end reads:
sample,read1.fastq.gz,read2.fastq.gz
- single-end reads:
sample,sample.fastq.gz,single
- nanopore reads:
sample,sample.fastq.gz,ont
- fasta files:
sample,sample.fasta,fasta
Example sample sheet:
sample,fastq_1,fastq_2
SRR13957125,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_2.fastq.gz
SRR13957170,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_2.fastq.gz
SRR13957177S,/home/eriny/sandbox/test_files/cecret/single_reads/SRR13957177_1.fastq.gz,single
OQ255990.1,/home/eriny/sandbox/test_files/cecret/fastas/OQ255990.1.fasta,fasta
SRR22452244,/home/eriny/sandbox/test_files/cecret/nanopore/SRR22452244.fastq.gz,ont
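Before launching a run, it can help to check a sample sheet against the rules above. The sketch below is a hypothetical helper (read_sample_sheet is not part of Cecret) that assumes the third column holds either one of the keywords single, ont, or fasta, or otherwise the read 2 file of a paired-end sample.

```python
import csv
import io

# Header required on every sample sheet, per the rules above.
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]

# Keywords allowed in the third column when the row is not paired-end.
TYPE_KEYWORDS = {"single": "single-end", "ont": "nanopore", "fasta": "fasta"}


def read_sample_sheet(text):
    """Return (sample, input_type, files) tuples for a sample sheet string.

    Hypothetical pre-flight check; Cecret does its own parsing in Nextflow.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError("header must be sample,fastq_1,fastq_2")
    entries = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        sample, first, third = (field.strip() for field in row)
        if third in TYPE_KEYWORDS:
            # Single file whose type is named in the third column.
            entries.append((sample, TYPE_KEYWORDS[third], [first]))
        else:
            # Anything else in the third column is assumed to be read 2.
            entries.append((sample, "paired-end", [first, third]))
    return entries
```

For example, read_sample_sheet(open("SampleSheet.csv").read()) returns one tuple per row, so a misspelled type keyword shows up as an unexpected "paired-end" entry rather than failing later in the workflow.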
Example usage with a sample sheet, using Docker to manage containers:
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv
When running on local computational resources, Cecret can also read input files from directories. Paired-end Illumina files, single-end Illumina files, and nanopore reads (a single file per sample) should end with 'fastq', 'fastq.gz', 'fq', or 'fq.gz'. Fasta files must end with '.fasta', '.fna', or '.fa'.
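The extension rules above can be expressed as a small check, which is handy for spotting files that will be skipped. This is a sketch of the documented suffixes only; input_kind is a hypothetical name, and case-sensitive matching is an assumption rather than documented Cecret behavior.

```python
# Suffixes listed above. Matching is case-sensitive here (an assumption).
FASTQ_SUFFIXES = ("fastq", "fastq.gz", "fq", "fq.gz")
FASTA_SUFFIXES = (".fasta", ".fna", ".fa")


def input_kind(filename):
    """Return 'fastq', 'fasta', or None if the name matches neither rule."""
    if filename.endswith(FASTQ_SUFFIXES):
        return "fastq"
    if filename.endswith(FASTA_SUFFIXES):
        return "fasta"
    return None
```

A compressed fasta such as sample.fa.gz returns None under these rules, so it would need to be decompressed or renamed before Cecret picks it up from a directory.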
WARNING:
- Nextflow does not always recognize every naming pattern for paired-end fastq files. This workflow is meant to be fairly agnostic, but if paired-end fastq files are not being found, consider renaming them to a sample_1.fastq.gz-style format or using a sample sheet.
- Single-end and paired-end reads cannot be in the same directory.
- Nanopore reads are not single-end Illumina reads; specify them with their own directory or with the 'ont' type in a sample sheet.
- Wildcards do not work well with AWS buckets, so users with files in AWS buckets should use sample sheets.
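If paired-end files are not being found, one option is to check which files follow the sample_1.fastq.gz / sample_2.fastq.gz convention and write the complete pairs straight into a sample sheet. The sketch below assumes that convention; paired_rows and PAIR_PATTERN are hypothetical, and Cecret's own Nextflow glob may accept names this pattern rejects. Pass an absolute directory so the rows match the style of the example sample sheet.

```python
import re

# Hypothetical pattern for the sample_1 / sample_2 naming convention.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.(?:fastq|fq)(?:\.gz)?$")


def paired_rows(filenames, directory):
    """Build sample sheet lines for complete _1/_2 pairs.

    Returns (rows, unmatched): rows starts with the required header, and
    unmatched lists files that did not form a complete pair.
    """
    pairs = {}
    unmatched = []
    for name in sorted(filenames):
        match = PAIR_PATTERN.match(name)
        if match is None:
            unmatched.append(name)
            continue
        # Group read 1 and read 2 under the shared sample name.
        pairs.setdefault(match["sample"], {})[match["read"]] = name
    rows = ["sample,fastq_1,fastq_2"]
    for sample, reads in sorted(pairs.items()):
        if set(reads) == {"1", "2"}:
            rows.append(f"{sample},{directory}/{reads['1']},{directory}/{reads['2']}")
        else:
            unmatched.extend(reads.values())
    return rows, unmatched
```

Any names returned in unmatched are the ones worth renaming (or adding to the sample sheet by hand).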
Each of these directories can be specified with its corresponding param:
params.reads = <path to directory of paired-end Illumina reads>
params.single_reads = <path to directory of single-end Illumina reads>
params.nanopore = <path to directory of single-end nanopore reads>
params.fastas = <path to directory with fasta files>
More information about adjusting parameters can be found on the Params page of this wiki.
Cecret automatically looks for input files in the following default directories.
For paired-end fastq files
directory
└── reads
    └── *fastq.gz
For single-end fastq files
directory
└── single_reads
    └── *fastq.gz
For nanopore fastq files
directory
└── nanopore
    └── *fastq.gz
For fasta files
directory
└── fastas
    └── *fasta
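To check ahead of time what Cecret would find in the default layout, the trees above can be mirrored with a directory listing. This sketch assumes that "directory" in the trees is the directory the workflow is launched from; find_default_inputs is a hypothetical helper, and the real lookup happens inside Nextflow.

```python
from pathlib import Path

# Default subdirectories and glob patterns taken from the trees above.
DEFAULT_INPUTS = {
    "reads": "*fastq.gz",
    "single_reads": "*fastq.gz",
    "nanopore": "*fastq.gz",
    "fastas": "*fasta",
}


def find_default_inputs(launch_dir):
    """Map each existing default subdirectory to the sorted file names matching its pattern."""
    found = {}
    for subdir, pattern in DEFAULT_INPUTS.items():
        directory = Path(launch_dir) / subdir
        if directory.is_dir():
            found[subdir] = sorted(p.name for p in directory.glob(pattern))
    return found
```

Running find_default_inputs(".") from the launch directory lists the files per default directory; a subdirectory that exists but maps to an empty list usually means the files use a different extension (for example '.fna' in fastas).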