The Mfd Protein is the Transcription-Repair Coupling Factor (TRCF) in Mycobacterium smegmatis

In vitro and in vivo experiments with Escherichia coli have shown that the Mfd translocase is responsible for transcription-coupled repair, defined as the faster repair of the transcribed strand than of the non-transcribed strand by nucleotide excision repair. Even though the mfd gene is conserved in all bacterial lineages, there is only limited information on whether it performs the same function in other bacterial species. Here, by genome-scale analysis of the repair of UV-induced cyclobutane pyrimidine dimers (CPDs), we find that the Mfd protein is the Transcription-Repair Coupling Factor (TRCF) in Mycobacterium smegmatis. This finding, combined with the inverted strandedness of UV-induced mutations in wild-type and mfd- Escherichia coli and Bacillus subtilis, indicates that the Mfd protein is the universal TRCF in bacteria.

This repository contains the data analysis workflow.

Installation

This workflow is built with the Snakemake workflow management system and conda.
To run the workflow, you need conda installed for environment management. All other packages, including Snakemake and their dependencies, are obtained automatically through the environments prepared for each step of the workflow. You can follow the installation steps from the link.

First, clone the repository and navigate into the directory:

git clone https://github.com/CompGenomeLab/MycoSmegmatis_TRCF.git
    
cd MycoSmegmatis_TRCF

Next, you should create a conda environment with the defined packages. Install mamba and create the environment using mamba:

conda install -c conda-forge mamba

mamba create -c bioconda -c conda-forge -c r -n repair snakemake=6.3.0 python=3.8 rust=1.50 sra-tools=2.11.0

conda activate repair
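
Before running anything, it can help to confirm that the environment is active and that its tools are reachable. The snippet below is a minimal sketch, not part of the workflow. The helper names `missing_tools` and `active_env` are hypothetical, and the choice of executables to check (`snakemake`, `cargo`, `fastq-dump`) is an assumption based on the packages installed above. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation.

```python
import os
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def active_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    # Assumed executables: snakemake, cargo (from rust), fastq-dump (from sra-tools).
    missing = missing_tools(["snakemake", "cargo", "fastq-dump"])
    print("active conda environment:", active_env())
    print("missing tools:", missing if missing else "none")
```

If the reported environment is not `repair`, or any tool is listed as missing, rerun the activation step above before starting the workflow.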

Directory Structure

This workflow is prepared according to the structure recommended by Snakemake:

config/: contains the configuration files.
logs/: contains the log files of each step. This folder will automatically appear when you run the workflow.
report/: contains the description files of figures, which will be used in reports.
resources/: contains samples/ where the raw XR-seq and Damage-seq data are stored, input/ where the input files are stored, and ref_genomes/ where the reference genome files are stored. Reference genome files can be automatically produced by the workflows, if they are properly defined in the config files.
results/: contains the generated files and figures. This folder will automatically appear when you run the workflow.
workflow/: contains envs/ where the environments are stored, rules/ where the Snakemake rules are stored, and scripts/ where the scripts used inside the rules are stored.

Configuration file

The configuration files with the "initial" prefix should not be modified by the user, since they contain configuration settings that are common to all XR-seq experiments. For more detail about these configuration files, see the readme file in the config/ directory. The parameters for "config.yaml" are as follows:

sample: The name of the sample file without the extension. Multiple sample names can be given in the format below:
```
sample: 
    - "SAMPLE_1"
    - "SAMPLE_2"
    - "SAMPLE_3"
```
- Using the given sample name, the workflow will look for {SAMPLE}.fastq.gz as raw data. Therefore, the fastq file must be gzipped before running the workflow.
damage_type: The damage type of each sample should be provided here, in the same order as the samples:
```
damage_type: 
    - "64"
    - "CPD"
    - "oxaliplatin"
```
- Currently, the damage types below are available and can be provided as follows (case-insensitive):
  - (6-4)PP: 64, 64pp, (6-4)pp, 6-4pp;
  - CPD: CPD;
  - Cisplatin: cisplatin;
  - Oxaliplatin: oxaliplatin.
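
The two parameters above must line up entry by entry, so a quick check before launching the workflow can catch mismatches early. The following is a minimal sketch and not the workflow's own code. The names `DAMAGE_ALIASES`, `normalize_damage`, and `pair_samples` are hypothetical, and the location `resources/samples/{SAMPLE}.fastq.gz` is an assumption drawn from the directory description in this README.

```python
from pathlib import Path

# Aliases listed in this README; keys are lowercase because matching is case-insensitive.
DAMAGE_ALIASES = {
    "64": "(6-4)PP",
    "64pp": "(6-4)PP",
    "(6-4)pp": "(6-4)PP",
    "6-4pp": "(6-4)PP",
    "cpd": "CPD",
    "cisplatin": "Cisplatin",
    "oxaliplatin": "Oxaliplatin",
}


def normalize_damage(value):
    """Map a user-supplied damage type to its canonical name."""
    key = str(value).strip().lower()
    if key not in DAMAGE_ALIASES:
        raise ValueError(f"unsupported damage type: {value!r}")
    return DAMAGE_ALIASES[key]


def pair_samples(samples, damage_types, sample_dir="resources/samples"):
    """Pair each sample with its damage type and expected gzipped FASTQ path."""
    if len(samples) != len(damage_types):
        raise ValueError("sample and damage_type must have the same number of entries")
    return [
        (name, normalize_damage(damage), Path(sample_dir) / f"{name}.fastq.gz")
        for name, damage in zip(samples, damage_types)
    ]


if __name__ == "__main__":
    samples = ["SAMPLE_1", "SAMPLE_2", "SAMPLE_3"]
    damage_types = ["64", "CPD", "oxaliplatin"]
    for name, damage, fastq in pair_samples(samples, damage_types):
        status = "found" if fastq.exists() else "missing"
        print(f"{name}\t{damage}\t{fastq}\t{status}")
```

Run from the repository root, this prints one line per sample and flags any gzipped FASTQ file that is not where the sketch expects it.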

Usage

After adjusting the configuration file, you can run the workflow from this directory.

This workflow runs on [Slurm Workload Manager](https://slurm.schedmd.com/srun.html):

```
source al.sh
rna 
```

After the rna pipeline has completed, run:

```
source al.sh
smm 
```

To generate detailed HTML report files, run the command below after the workflow has finished:

snakemake --report report.zip
