Add mtbseq workflow #8

Merged: 16 commits, Oct 30, 2024
6 changes: 6 additions & 0 deletions assets/schema_input.json
@@ -12,6 +12,12 @@
        "pattern": "^\\S+$",
        "errorMessage": "Sample name must be provided and cannot contain spaces",
        "meta": ["id"]
    },
    "library": {
        "type": "string",
        "pattern": "^\\S+$",
        "errorMessage": "Library preparation protocol name must be provided and cannot contain spaces",
        "meta": ["library"]
    },
    "fastq_1": {
        "type": "string",
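The `pattern` entries in the hunk above reject empty values and values containing whitespace. A minimal Python sketch of that check follows. It is not the pipeline's validator, and it makes some assumptions: the sample-sheet column names `sample` and `library` are inferred from the error messages and `meta` keys, and the `check_row` helper is hypothetical.

```python
import re

# Same regular expression as the schema "pattern" entries:
# the JSON string "^\\S+$" decodes to the regex ^\S+$.
NO_WHITESPACE = re.compile(r"^\S+$")


def check_row(row):
    """Return error messages for the assumed `sample` and `library` columns.

    `row` is a dict representing one sample-sheet row (hypothetical layout).
    """
    errors = []
    for field in ("sample", "library"):
        value = row.get(field) or ""  # treat a missing or None value as empty
        if not NO_WHITESPACE.search(value):
            errors.append(
                f"{field} must be provided and cannot contain spaces: {value!r}"
            )
    return errors
```

Calling `check_row({"sample": "S1", "library": "nextera"})` returns an empty list, while a row with a space in `sample` or an empty `library` yields one message per offending field.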
73 changes: 73 additions & 0 deletions conf/mtbseq_base.config
@@ -0,0 +1,73 @@
params {
    mtbseq_parallel = ""
    mtbseq_only_qc = false

    //-------------------------------------
    // OPTIONS FROM MTBSEQ MANUAL
    //-------------------------------------

    // Hidden parameters
    mtbseq_user = "root"
    mtbseq_path = "MTBseq"

    // Name of the project and the analysis
    mtbseq_cohort_tsv = null
    mtbseq_project = "mtbseqnf"

    // Setting this OPTION will add an additional filter that excludes all variants except SNPs.
    mtbseq_snp_vars = false

    // Setting this OPTION has major implications on how the mapping data for each position is processed. By default, the majority allele is called and taken for further calculations.
    // If the --lowfreq_vars OPTION is set, MTBseq will consider the majority allele distinct from wild type, if such an allele is present.
    // This means that only in this detection mode, MTBseq will report variants present only in subpopulations, i.e. low frequency mutations.
    // Of course, OPTIONS --mincovf, --mincovr, --minphred and --minfreq need to be set accordingly.
    // Please be aware that output generated in this detection mode should not be used for phylogenetic analysis.
    mtbseq_lowfreq_vars = false

    // The OPTION sets a threshold for the sequence data quality to be used for the mpileup creation.
    mtbseq_minbqual = 13

    // The OPTION sets a minimum forward read coverage threshold. Alleles must have a forward coverage of this VALUE or higher to be considered.
    mtbseq_mincovf = 4

    // The OPTION sets a minimum reverse read coverage threshold. Alleles must have a reverse coverage of this VALUE or higher to be considered.
    mtbseq_mincovr = 4

    // The OPTION sets a minimum number of reads indicating an allele with a phred score of at least 20.
    mtbseq_minphred = 4

    // The OPTION sets a minimum frequency for an allele.
    mtbseq_minfreq = 75

    // The OPTION sets a minimum percentage of samples with unambiguous information for a position.
    mtbseq_unambig = 95

    // The OPTION sets a window size in which the algorithm scans for the occurrence of multiple variants within the same sample.
    // If more than one variant occurs within this window in the same sample, the positions will be excluded.
    mtbseq_window = 12

    // The OPTION sets a SNP distance that is used to classify samples into groups of samples, using agglomerative clustering.
    // If SNP distances between samples are less than or equal to this VALUE, they are grouped together.
    mtbseq_distance = 12

    // This OPTION sets the reference genome for the read mapping.
    mtbseq_ref_and_indexes_path = "${projectDir}/data/mtbseq-references/ref/"

    // This OPTION sets a list of known variant positions associated with drug resistance for resistance prediction.
    mtbseq_resilist = "${projectDir}/data/mtbseq-references/res/MTB_Resistance_Mediating.txt"

    // This OPTION sets a list of interesting regions to be used for annotation of detected variants.
    mtbseq_intregions = "${projectDir}/data/mtbseq-references/res/MTB_Extended_Resistance_Mediating.txt"

    // This OPTION specifies a gene categories file to annotate essential and non-essential genes as well as repetitive regions. SNPs in repetitive regions will be excluded from phylogenetic analysis.
    mtbseq_categories = "${projectDir}/data/mtbseq-references/cat/MTB_Gene_Categories.txt"

    // This OPTION specifies a file for base quality recalibration. The list must be in VCF format and should contain known SNPs.
    mtbseq_basecalib = "${projectDir}/data/mtbseq-references/res/MTB_Base_Calibration_List.vcf"
}
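The comments on `mtbseq_mincovf`, `mtbseq_mincovr`, `mtbseq_minphred` and `mtbseq_minfreq` describe four thresholds that an allele has to meet. The Python sketch below shows one way to read them together. It is an illustration under assumptions, not MTBseq's implementation: the `AlleleSupport` structure and the `passes_thresholds` helper are hypothetical, `minfreq` is assumed to be a percentage, and all four conditions are assumed to be combined with a logical AND.

```python
from dataclasses import dataclass


@dataclass
class AlleleSupport:
    """Read support for one allele at one position (hypothetical structure)."""
    forward_reads: int    # forward-strand reads supporting the allele
    reverse_reads: int    # reverse-strand reads supporting the allele
    phred20_reads: int    # supporting reads with a phred score of at least 20
    frequency_pct: float  # allele frequency, assumed to be in percent


def passes_thresholds(allele, mincovf=4, mincovr=4, minphred=4, minfreq=75):
    """Return True if the allele meets all four thresholds.

    Defaults mirror the values in the params block above; combining the
    conditions with AND is an assumption made for this sketch.
    """
    return (
        allele.forward_reads >= mincovf
        and allele.reverse_reads >= mincovr
        and allele.phred20_reads >= minphred
        and allele.frequency_pct >= minfreq
    )
```

With the defaults, an allele at 12% frequency is rejected even when its coverage is sufficient. Passing a lower `minfreq` lets it through, which is why the comment on `mtbseq_lowfreq_vars` says the thresholds need to be set accordingly in that detection mode.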


28 changes: 28 additions & 0 deletions conf/mtbseq_parallel_test.config
@@ -0,0 +1,28 @@
process {
    resourceLimits = [
        cpus: 4,
        memory: '8.GB',
        time: '3.h'
    ]
}

params {
    config_profile_name = 'MTBseq mode using parallel processing test profile'
    config_profile_description = 'Minimal test to check if MTBseq mode with parallel processing is working as intended'

    input = "${projectDir}/data/test_data/test_3_samples.csv"

    mtbseq_cohort_tsv = "${projectDir}/data/test_data/test_3_cohort.tsv"

    outdir = "test-profile-results"

    mode = "mtbseq"

    mtbseq_parallel = true
}


25 changes: 25 additions & 0 deletions conf/mtbseq_test.config
@@ -0,0 +1,25 @@
process {
    resourceLimits = [
        cpus: 4,
        memory: '8.GB',
        time: '3.h'
    ]
}

params {
    config_profile_name = 'MTBseq Test profile'
    config_profile_description = 'Minimal test to check if MTBseq mode is working as intended'

    input = "${projectDir}/data/test_data/test_3_samples.csv"

    mtbseq_cohort_tsv = "${projectDir}/data/test_data/test_3_cohort.tsv"

    outdir = "mtbseq-test-profile-results"

    mode = "mtbseq"
}


7 changes: 7 additions & 0 deletions data/mtbseq-references/Readme.md
@@ -0,0 +1,7 @@
## NOTE ABOUT THE REFERENCE FILES

These references are copied from `v1.0.3` of the [MTBseq source tree](https://github.com/ngs-fzb/MTBseq_source/tree/v1.0.3/var).

They are copied here so that these files can be staged directly from the `projectDir`, rather than depending on how `MTBseq` is packaged, for example as a `conda` environment or a `docker` container.
