diff --git a/docs/user-guide/preparing-files.md b/docs/user-guide/preparing-files.md
index 93be694..8ffdf58 100644
--- a/docs/user-guide/preparing-files.md
+++ b/docs/user-guide/preparing-files.md
@@ -1,8 +1,8 @@
-# 2. Preparing Files
+# Preparing Files

 The pipeline is controlled through editing configuration and manifest files. Defaults are found in the /WORKDIR/config and /WORKDIR/manifest directories, after initialization.

-## 2.1 Configs
+## Configs

 The configuration files control parameters and software of the pipeline. These files are listed below:

@@ -10,28 +10,28 @@ The configuration files control parameters and software of the pipeline. These f
 - resources/cluster.yaml
 - resources/tools.yaml

-### 2.1.1 Cluster Config
+### Cluster Config

-The cluster configuration file dictates the resouces to be used during submission to Biowulf HPC. There are two differnt ways to control these parameters - first, to control the default settings, and second, to create or edit individual rules. These parameters should be edited with caution, after significant testing.
+The cluster configuration file dictates the resources to be used during submission to Biowulf HPC. There are two different ways to control these parameters - first, to control the default settings, and second, to create or edit individual rules. These parameters should be edited with caution, after significant testing.

-### 2.1.2 Tools Config
+### Tools Config

 The tools configuration file dictates the version of each software or program that is being used in the pipeline.

-### 2.1.3 Config YAML
+### Config YAML

 There are several groups of parameters that are editable for the user to control the various aspects of the pipeline. These are :

 - Folders and Paths
-  - These parameters will include the input and ouput files of the pipeline, as well as list all manifest names.
+  - These parameters will include the input and output files of the pipeline, as well as list all manifest names.
 - User parameters
   - These parameters will control the pipeline features. These include thresholds and whether to perform processes.
 - References
   - These parameters will control the location of index files, spike-in references, adaptors and species calling information.

-#### 2.1.3.1 User Parameters
+#### User Parameters

-##### 2.1.3.1.1 (Spike in Controls)
+##### Spike in Controls

 The pipeline allows for the use of a species specific spike-in control, or the use of normalization via library size. The parameter `spikein_genome` should be set to the species term used in `spikein_reference`.

@@ -62,11 +62,11 @@ spikein_reference:
 If it's determined that the amount of spike-in is not sufficient for the run, a library normaliaztion can be performed.

 1. Complete a CARLISLE run with spike-in set to "Y". This will allow for the complete assessment of the spike-in.
-2. Run inital QC analysis on the output data
+2. Run initial QC analysis on the output data
 3. Add the alignment_stats dir to the configuration file.
 4. Re-run the CARLISLE pipeline

-##### 2.1.3.1.2 Duplication Status
+##### Duplication Status

 Users can select duplicated peaks (dedup) or non-deduplicated peaks (no_dedup) through the user parameter.

@@ -74,7 +74,7 @@ Users can select duplicated peaks (dedup) or non-deduplicated peaks (no_dedup) t
 dupstatus: "dedup, no_dedup"
 ```

-##### 2.1.3.1.3 Peak Caller
+##### Peak Caller

 Three peak callers are available for deployment within the pipeline, with different settings deployed for each caller.

@@ -112,14 +112,14 @@ A complete list of the available peak calling parameters and the recommended lis
 ### peaktype: "macs2_narrow, macs2_broad, seacr_norm_stringent, seacr_norm_relaxed, seacr_non_stringent, seacr_non_relaxed, gopeaks_narrow, gopeaks_broad"
 ```

-##### 2.1.3.1.3.1 Macs2 additional option
+##### Macs2 additional option

 MACS2 can be run with or without the control.
 adding a control will increase peak specificity Selecting "Y" for the `macs2_control` will run the paired control sample provided in the sample manifest

-##### 2.1.3.1.4 Quality Tresholds
+##### Quality Tresholds

-Thresholds for quality can be controled through the `quality_tresholds` parameter. This must be a list of comma separated values. minimum of numeric value required.
+Thresholds for quality can be controlled through the `quality_thresholds` parameter. This must be a list of comma separated values. minimum of numeric value required.

 - default MACS2 qvalue is 0.05 https://manpages.ubuntu.com/manpages/xenial/man1/macs2_callpeak.1.html
 - default GOPEAKS pvalue is 0.05 https://github.com/maxsonBraunLab/gopeaks/blob/main/README.md
@@ -130,7 +130,7 @@ Thresholds for quality can be controled through the `quality_tresholds` paramete
 quality_thresholds: "0.1, 0.05, 0.01"
 ```

-#### 2.1.3.2 References
+#### References

 Additional reference files may be added to the pipeline, if other species were to be used.

@@ -144,14 +144,14 @@ The following information must be included:
 1. regions: "list of regions to be included; IE chr1 chr2 chr3"
 2. macs2_g: "macs2 genome shorthand; IE mm IE hs"

-## 2.2 Preparing Manifests
+## Preparing Manifests

-There are two manifests, one which required for all pipeliens and one that is only required if running a differential analysis. These files describe information on the samples and desired contrasts. The paths of these files are defined in the snakemake_config.yaml file. These files are:
+There are two manifests, one which required for all pipelines and one that is only required if running a differential analysis. These files describe information on the samples and desired contrasts. The paths of these files are defined in the snakemake_config.yaml file. These files are:

 - samplemanifest
 - contrasts

-### 2.2.1 Samples Manifest (REQUIRED)
+### Samples Manifest (REQUIRED)

 This manifest will include information to sample level information.
 It includes the following column headers:
@@ -173,7 +173,7 @@ An example sampleManifest file is shown below:
 | HN6_H3K4me3 | 2 | N | HN6_IgG_rabbit_negative_control | 1 | PIPELINE_HOME/.test/HN6_H3K4me3_2.R1.fastq.gz | PIPELINE_HOME/.test/HN6_H3K4me3_2.R2.fastq.gz |
 | HN6_IgG_rabbit_negative_control | 1 | Y | - | - | PIPELINE_HOME/.test/HN6_IgG_rabbit_negative_control_1.R1.fastq.gz | PIPELINE_HOME/.test/HN6_IgG_rabbit_negative_control_1.R2.fastq.gz |

-### 2.2.2 Contrast Manifest (OPTIONAL)
+### Contrast Manifest (OPTIONAL)

 This manifest will include sample information to performed differential comparisons.
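
Several parameters touched by this diff (`dupstatus`, `peaktype`, `quality_thresholds`) are written as a single comma-separated string, and `quality_thresholds` must hold at least one numeric value. The following is a minimal sketch of how such strings could be checked before a run. The helper names `parse_csv_param` and `parse_quality_thresholds` are hypothetical and are not part of CARLISLE; how the pipeline itself parses these values is not shown in this diff.

```python
from typing import List


def parse_csv_param(value: str) -> List[str]:
    """Split a comma-separated config string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_quality_thresholds(value: str) -> List[float]:
    """Convert a quality_thresholds string into a list of floats.

    Hypothetical helper: raises ValueError if an entry is not numeric
    or if the string contains no values at all.
    """
    thresholds = []
    for item in parse_csv_param(value):
        try:
            thresholds.append(float(item))
        except ValueError:
            raise ValueError(
                f"quality_thresholds entry is not numeric: {item!r}"
            ) from None
    if not thresholds:
        raise ValueError("quality_thresholds needs at least one numeric value")
    return thresholds


if __name__ == "__main__":
    # Values copied from the config snippets in the diff above.
    print(parse_quality_thresholds("0.1, 0.05, 0.01"))
    print(parse_csv_param("dedup, no_dedup"))
```

Running a check like this on the edited config before submitting to the cluster would surface a malformed list early, rather than partway through a pipeline run.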