Young edited this page Oct 18, 2024 · 3 revisions

Inputs

Because Nextflow is flexible, this workflow is flexible in how input files are specified.

Using a sample sheet

Cecret can take a comma-separated sample sheet as input, with one row per sample giving the sample name and its file(s). The header must be sample,fastq_1,fastq_2, even if using nanopore fastq files. As a general rule, each row lists the identifier for the file(s), the file location(s), and, if the files are not paired-end fastq files, the input type. This is the recommended method for use cases involving the cloud.

Rows match files with their processing needs.

  • paired-end reads: sample,read1.fastq.gz,read2.fastq.gz
  • single-end reads: sample,sample.fastq.gz,single
  • nanopore reads: sample,sample.fastq.gz,ont
  • fasta files: sample,sample.fasta,fasta

Example sample sheet:

sample,fastq_1,fastq_2
SRR13957125,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957125_2.fastq.gz
SRR13957170,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_1.fastq.gz,/home/eriny/sandbox/test_files/cecret/reads/SRR13957170_2.fastq.gz
SRR13957177S,/home/eriny/sandbox/test_files/cecret/single_reads/SRR13957177_1.fastq.gz,single
OQ255990.1,/home/eriny/sandbox/test_files/cecret/fastas/OQ255990.1.fasta,fasta
SRR22452244,/home/eriny/sandbox/test_files/cecret/nanopore/SRR22452244.fastq.gz,ont
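
The row rules above can be checked before launching the workflow. The following is a minimal Python sketch, not part of Cecret itself: the function name classify_rows and the label "paired" are hypothetical, and it assumes the third column is either a second fastq file or one of the keywords single, ont, or fasta, as described above.

```python
import csv
import io

# Header required by the sample sheet format described above.
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]
# Keywords that may replace the second fastq file in the third column.
TYPE_KEYWORDS = {"single", "ont", "fasta"}


def classify_rows(sheet_text):
    """Return a list of (sample, input_type, files) tuples from sample sheet text.

    Hypothetical helper for illustration; "paired" is a label chosen here,
    not a keyword used by Cecret.
    """
    reader = csv.reader(io.StringIO(sheet_text.strip()))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError("header must be " + ",".join(EXPECTED_HEADER))

    rows = []
    for line in reader:
        if not line:
            continue  # ignore blank lines
        if len(line) != 3:
            raise ValueError(f"expected 3 columns, got {len(line)}: {line}")
        sample, first, third = (field.strip() for field in line)
        if third in TYPE_KEYWORDS:
            # Single file plus a type keyword (single, ont, or fasta).
            rows.append((sample, third, [first]))
        else:
            # Anything else is treated as the second file of a pair.
            rows.append((sample, "paired", [first, third]))
    return rows
```

Calling classify_rows(open("SampleSheet.csv").read()) returns one tuple per row, which makes it easy to spot a row whose type keyword is misspelled and was therefore treated as a second fastq file.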

Example usage with a sample sheet, using Docker to manage containers:

nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv

Files from directories

If using local computational resources, this workflow can read in files from directories. Paired-end Illumina files, single-end Illumina files, and nanopore files (a single fastq file per sample) should end with 'fastq', 'fastq.gz', 'fq', or 'fq.gz'. Fasta files must end with '.fasta', '.fna', or '.fa'.

WARNING:

  • Sometimes Nextflow does not catch every name of paired-end fastq files. This workflow is meant to be fairly agnostic, but if paired-end fastq files are not being found, it might be worth renaming them to a sample_1.fastq.gz and sample_2.fastq.gz style format or using a sample sheet.
  • Single-end and paired-end reads cannot be in the same directory.
  • Nanopore reads are not single-end Illumina reads and should not be treated as such.
  • Wildcards do not work well with AWS buckets, so it is recommended that those users use sample sheets.
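
When paired-end files are not being picked up, or when the files live in a bucket, one workaround suggested above is a sample sheet. The following is a minimal Python sketch for building one from a list of file names. It is not part of Cecret: the function name build_sample_sheet is hypothetical, and the pairing rule (names ending in _1/_2 or _R1/_R2 before the fastq extension) is an assumption made for this example.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_1.fastq.gz / <sample>_2.fastq.gz,
# optionally with an "R" (for example <sample>_R1.fq). Other layouts,
# such as names with a trailing _001, are skipped by this pattern.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+?)_R?(?P<read>[12])\.(fastq|fq)(\.gz)?$")


def build_sample_sheet(filenames):
    """Return sample sheet text for every complete read pair in filenames."""
    pairs = {}
    for name in sorted(filenames):
        match = PAIR_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a recognised paired-end file name
        pairs.setdefault(match.group("sample"), {})[match.group("read")] = str(name)

    lines = ["sample,fastq_1,fastq_2"]
    for sample in sorted(pairs):
        reads = pairs[sample]
        # Only write samples where both read 1 and read 2 were found.
        if "1" in reads and "2" in reads:
            lines.append(f"{sample},{reads['1']},{reads['2']}")
    return "\n".join(lines) + "\n"
```

The paths are written exactly as they are passed in, so passing full paths (as in the example sample sheet above) yields a sheet with full paths. Samples with only one of the two files are left out rather than guessed at.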

These directories can be specified with the corresponding parameter:

params.reads = <path to directory of paired-end Illumina reads>
params.single_reads = <path to directory of single-end Illumina reads>
params.nanopore = <path to directory of single-end nanopore reads>
params.fastas = <path to directory with fasta files>

More information about adjusting parameters can be found on the Params page of this wiki.

Default directories

Cecret automatically checks some default directories for input files.

For paired-end fastq files

directory
└── reads
     └── *fastq.gz

For single-end fastq files

directory
└── single_reads
     └── *fastq.gz

For nanopore fastq files

directory
└── nanopore
     └── *fastq.gz

For fasta files

directory
└── fastas
     └── *fasta
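
As a quick check before launching, the default layout above can be inspected with a short script. This is a minimal Python sketch and not how Cecret itself discovers files: the function name find_default_inputs is hypothetical, and the accepted endings are taken from the "Files from directories" section.

```python
from pathlib import Path

# File endings listed in the "Files from directories" section.
FASTQ_SUFFIXES = ("fastq", "fastq.gz", "fq", "fq.gz")
FASTA_SUFFIXES = (".fasta", ".fna", ".fa")

# Default subdirectory name -> accepted file endings.
DEFAULT_DIRS = {
    "reads": FASTQ_SUFFIXES,
    "single_reads": FASTQ_SUFFIXES,
    "nanopore": FASTQ_SUFFIXES,
    "fastas": FASTA_SUFFIXES,
}


def find_default_inputs(base_dir):
    """Map each default subdirectory that holds input files to its file names."""
    base = Path(base_dir)
    found = {}
    for subdir, suffixes in DEFAULT_DIRS.items():
        folder = base / subdir
        if not folder.is_dir():
            continue  # this default directory does not exist here
        files = sorted(
            p.name
            for p in folder.iterdir()
            if p.is_file() and p.name.endswith(suffixes)
        )
        if files:
            found[subdir] = files
    return found
```

Running find_default_inputs(".") from the directory where the workflow will be launched shows which of the four default directories contain usable files. An empty result suggests passing the directories explicitly or using a sample sheet.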