Young edited this page Dec 26, 2023 · 7 revisions

Usage

This workflow does not run without input files, and there are multiple ways to specify which input files should be used:

# using singularity on paired-end reads in a directory called 'reads'
nextflow run UPHL-BioNGS/Cecret -profile singularity --reads <directory to reads>

# using docker on samples specified in SampleSheet.csv
nextflow run UPHL-BioNGS/Cecret -profile docker --sample_sheet SampleSheet.csv

# using a config file containing all inputs
nextflow run UPHL-BioNGS/Cecret -c file.config

More information can be found on the input fastq files page of this wiki.
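As a minimal sketch of the config-file route, the snippet below writes a small config that points the workflow at a reads directory and an output directory. The file name 'cecret_inputs.config' and both directory values are placeholders chosen for illustration, not workflow defaults; 'params.reads' and 'params.outdir' are simply the config-file forms of the options named on this page.

```shell
# Write a minimal Nextflow config. The file name and both directory
# values are hypothetical placeholders, not defaults of the workflow.
cat > cecret_inputs.config <<'EOF'
params.reads  = 'reads'
params.outdir = 'cecret_out'
EOF

# Then launch with, for example:
#   nextflow run UPHL-BioNGS/Cecret -profile singularity -c cecret_inputs.config
```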

Results

Results are roughly organized into 'params.outdir'/< analysis >/sample.result

A file summarizing all results is found in 'params.outdir'/cecret_results.csv and 'params.outdir'/cecret_results.txt.

Consensus sequences can be found in 'params.outdir'/consensus and end with *.consensus.fa.

More information can be found on the results page of this wiki.

Determining primer and amplicon bedfiles

The default primer scheme of the 'Cecret' workflow is the 'V4' primer scheme developed by the ARTIC Network for SARS-CoV-2. Releases prior to and including '2.2.20211221' used the 'V3' primer scheme as the default. As many public health laboratories are still using 'V3', the 'V3' files are still in this repo, and 'V4', 'V4.1' ('V4' with a spike-in of additional primers), and 'V5.3.2' are now also included. The original primer and amplicon bedfiles can be found at ARTIC's GitHub repo.

Setting primers with a parameter on the command line (these can also be defined in a config file)

# using artic V3 primers
nextflow run UPHL-BioNGS/Cecret -profile singularity --primer_set 'ncov_V3'

# using artic V4 primers
nextflow run UPHL-BioNGS/Cecret -profile singularity --primer_set 'ncov_V4'

# using artic V4.1 primers
nextflow run UPHL-BioNGS/Cecret -profile singularity --primer_set 'ncov_V4.1'

# using artic V5.3.2 primers
nextflow run UPHL-BioNGS/Cecret -profile singularity --primer_set 'ncov_V5.3.2'

Some "Midnight" primer sets are also included and can be selected by setting 'params.primer_set' to midnight_idt_V1, midnight_ont_V1, midnight_ont_V2, or midnight_ont_V3.

It is still possible to set 'params.primer_bed' and 'params.amplicon_bed' via the command line or in a config file with the path to the corresponding file.
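A sketch of what supplying custom bedfiles might look like is shown below. Both file paths and the config file name are hypothetical; only 'params.primer_bed' and 'params.amplicon_bed' come from this page. The bedfiles need to use the coordinates of the reference the reads are aligned to.

```shell
# Write a config pointing at custom bedfiles. Both paths are hypothetical
# placeholders; replace them with the real primer and amplicon bedfiles.
cat > custom_primers.config <<'EOF'
params.primer_bed   = '/path/to/custom.primer.bed'
params.amplicon_bed = '/path/to/custom.amplicon.bed'
EOF

# Equivalent command-line form:
#   nextflow run UPHL-BioNGS/Cecret -profile singularity \
#     --primer_bed /path/to/custom.primer.bed \
#     --amplicon_bed /path/to/custom.amplicon.bed
```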

Using the included nextclade dataset

It has been requested by some of our more sophisticated colleagues to include a way to upload a nextclade dataset separately. We expect that this is mostly for cloud usage. To accommodate this, there is now a sars.zip file in data with a nextclade dataset. To use this included dataset, params.download_nextclade_dataset must be set to false either on the command line or in a config file.

nextflow run UPHL-BioNGS/Cecret -profile singularity --sample_sheet input.csv --download_nextclade_dataset false

This included dataset, however, will only be as current as Cecret's maintainers are able to upload it. There is a GitHub Action that should attempt to update the nextclade dataset every Tuesday, but each update still has to be merged and go through testing. The end user can also create a nextclade dataset, and then feed that into this workflow with params.predownloaded_nextclade_dataset.

To create the nextclade dataset with nextclade

nextclade dataset get --name sars-cov-2 --output-zip sars.zip

To use with Cecret

nextflow run UPHL-BioNGS/Cecret -profile singularity --sample_sheet input.csv --download_nextclade_dataset false --predownloaded_nextclade_dataset sars.zip

Or the corresponding params can be set in a config file.
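For illustration, the config-file equivalent of the command above could look like the following sketch. The config file name 'nextclade.config' is an arbitrary choice, and 'sars.zip' is assumed to sit in the directory the workflow is launched from.

```shell
# Write a config that disables the nextclade download and points at a
# locally created dataset. 'nextclade.config' is a hypothetical file name.
cat > nextclade.config <<'EOF'
params.download_nextclade_dataset      = false
params.predownloaded_nextclade_dataset = 'sars.zip'
EOF

# Then launch with, for example:
#   nextflow run UPHL-BioNGS/Cecret -profile singularity --sample_sheet input.csv -c nextclade.config
```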

Determining CPU usage

For the sake of simplicity, processes in this workflow are designated 1 CPU, a medium number of CPUs (5), or the largest number of CPUs. The largest number is the number of CPUs of the environment launching the workflow if using the main workflow and a simple config file, or 8 if using profiles and the config template. The medium number of CPUs can be adjusted by the end user with 'params.medcpus', the largest number can be adjusted with 'params.maxcpus', or the CPUs can be specified for each process individually in a config file.

The end user can set the maximum number of CPUs that one process can take in a config file with 'params.maxcpus = <new value>' or on the command line:

nextflow run UPHL-BioNGS/Cecret -profile singularity --maxcpus <new value>
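The three adjustment routes described in this section can be sketched in a single config file. The numeric values and the file name 'cpus.config' are arbitrary examples. The 'withName' selector is standard Nextflow config syntax; 'bwa' is the process name mentioned on this page, and it is an assumption that the selector matches it in the installed version of the workflow.

```shell
# Write a config that changes the medium and maximum CPU settings and
# overrides the CPUs of one process. All values here are illustrative only.
cat > cpus.config <<'EOF'
params.medcpus = 4
params.maxcpus = 12

process {
    withName: 'bwa' {
        cpus = 12
    }
}
EOF

# Then launch with, for example:
#   nextflow run UPHL-BioNGS/Cecret -profile singularity -c cpus.config
```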

It is important to remember that nextflow will attempt to utilize all CPUs available, and that 'params.maxcpus' limits a single process rather than the whole run. As a specific example, the process 'bwa' will be allocated 'params.maxcpus'. If there are 48 CPUs available and 'params.maxcpus = 8', then 6 samples can be run simultaneously.
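The arithmetic behind that example is a simple integer division, sketched below. It assumes every concurrently running task requests 'params.maxcpus' and that memory or other limits are not the bottleneck; the variable names are illustrative.

```shell
# Estimate how many maxcpus-sized tasks fit on the machine at once.
total_cpus=48   # CPUs available to nextflow (value from the example in the text)
maxcpus=8       # value of params.maxcpus

# Shell arithmetic is integer division, so any remainder is dropped.
concurrent=$(( total_cpus / maxcpus ))
echo "Up to ${concurrent} maxcpus-sized tasks can run at once"
```

Raising 'params.maxcpus' gives each such task more threads but lowers the number of tasks that can run side by side.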

Determining depth for base calls

Sequencing has an intrinsic amount of error for every predicted base on a read. This error is reduced the more reads there are. As such, a minimum depth is required to call a base with ivar consensus, ivar variants, and bcftools variants. The main assumption of this workflow is that the virus is clonal (i.e. only one infection is represented in a sample) and that libraries were created via PCR amplification. The depth for calling bases or finding variants is set with 'params.minimum_depth', which defaults to 100. This parameter can be adjusted by the end user in a config file or on the command line.

A corresponding parameter is 'params.mpileup_depth' (default of 8000), which is the maximum number of reads that samtools (used by ivar) or bcftools loads into memory for any given position. If the end user is experiencing memory issues, this number may need to be decreased.
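As a sketch, both depth parameters can be set together in a config file. The file name 'depth.config' is hypothetical, and halving 'params.mpileup_depth' from its default of 8000 is an arbitrary illustration of lowering memory use, not a recommended value.

```shell
# Write a config that keeps the default minimum depth and lowers the
# per-position read cap. The value 4000 is illustrative only.
cat > depth.config <<'EOF'
params.minimum_depth = 100
params.mpileup_depth = 4000
EOF

# Equivalent command-line form:
#   nextflow run UPHL-BioNGS/Cecret -profile singularity --minimum_depth 100 --mpileup_depth 4000
```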

Determining if duplicates should be taken into account

For library preparation methods with baits followed by PCR amplification, it is recommended to remove duplicate reads. For most other methods, removing duplicates will generally not harm anything. To remove duplicates, set 'params.markdup' to true. This removes duplicate reads from the aligned sam file, after the filter processes and before primer trimming. This will likely enable a lower minimum depth for variant calling (default is 100).

On the command line:

nextflow run UPHL-BioNGS/Cecret -profile singularity --markdup true --minimum_depth 10

In a config file:

params.markdup = true
params.minimum_depth = 10

Other library prep methods

There are amplicon-based, bait-based, and amplicon-bait hybrid library preparation methods, all of which increase the proportion of reads for a relevant organism. If there is a common preparation for the end user, please submit an issue, and we can create a profile or config file. Remember that the bedfiles for the primer schemes and amplicons MUST match the reference.

Updating Cecret

nextflow pull UPHL-BioNGS/Cecret