feat: set intermediate and final output files #129

Merged on Dec 8, 2023 (23 commits; changes shown from 14 commits)

Commits
701ab06
test: add --notemp to test script
deliaBlue Nov 19, 2023
c2ba640
refactor: set intermediate files as temp
deliaBlue Nov 19, 2023
2d0fe9d
merge with dev
deliaBlue Nov 21, 2023
0ccb2e0
refactor: start tmp files
deliaBlue Nov 29, 2023
2f2fb15
refactor: start tmp files
deliaBlue Nov 29, 2023
0eadf76
ci: update paths for expected output
deliaBlue Nov 30, 2023
06fcfa3
refactor: change intermediate files to tmp dir
deliaBlue Nov 30, 2023
bc52303
test: add --no-hooks CLI option
deliaBlue Nov 30, 2023
b162f13
docs: update rule graph
deliaBlue Nov 30, 2023
93316f3
test: restore expected output
deliaBlue Nov 30, 2023
35953db
docs: add expected output files section
deliaBlue Dec 2, 2023
d3e9b1b
build: rename temporary directory
deliaBlue Dec 2, 2023
41174d0
test: update expected output with new tmp dir name
deliaBlue Dec 2, 2023
a487f7b
style: format to pass snakefmt test
deliaBlue Dec 2, 2023
dee7347
Merge branch 'dev' into 86-cleanorder-final-output-files-in-snakefile
deliaBlue Dec 3, 2023
bda1754
test: update uncollapsed sam dir
deliaBlue Dec 4, 2023
9d7b822
refactor: remove uncollapsed sam form final output
deliaBlue Dec 4, 2023
d0b274f
refactor: remove uncollapsed sam from final output
deliaBlue Dec 4, 2023
f6fc812
refactor: change intermediates directory
deliaBlue Dec 6, 2023
c41d03f
build: change intermediates directory
deliaBlue Dec 6, 2023
d299225
test: update intermediates directory
deliaBlue Dec 6, 2023
29d6f1f
docs: rewrite output files section
deliaBlue Dec 6, 2023
d086215
change logs dir
deliaBlue Dec 8, 2023
45 changes: 45 additions & 0 deletions README.md
@@ -12,6 +12,7 @@ _MIRFLOWZ_ is a [Snakemake][snakemake] workflow for mapping miRNAs and isomiRs.
2. [Usage](#usage)
- [Preparing inputs](#preparing-inputs)
- [Running the workflow](#running-the-workflow)
- [Expected output files](#expected-output-files)
- [Creating a Snakemake report](#creating-a-snakemake-report)
3. [Workflow description](#workflow-description)
4. [Contributing](#contributing)
@@ -251,6 +252,50 @@ snakemake \
After successful execution of the workflow, results and logs will be found in
the `results/` and `logs/` directories, respectively.

### Expected output files

Upon successful execution of _MIRFLOWZ_, the tool automatically removes all
intermediate files generated during the process. The final output comprises:

1. A SAM file containing alignments intersecting a pri-miR locus. These
alignments intersect with extended start and/or end positions specified in the
provided pri-miR annotations. Please note that they may not contribute to the
final counting and will not appear in the final table.

2. A SAM file containing alignments intersecting a miRNA locus. Similar to the
previous file, these alignments intersect with extended start and/or end
positions specified in the provided miRNA annotations. They may not contribute
to the final counting and might be absent from the final table.

3. A SAM file containing the uncollapsed set of alignments that contribute to
the final counting.

4. A BAM file containing the uncollapsed set of alignments contributing to the
final counting and its corresponding index file (`.bam.bai`).

5. Table(s) containing the counting data from all libraries for (iso)miRs
and/or pri-miRs. Each row corresponds to a miRNA species, and each column
represents a sample library. Counting involves aggregating contributions from
all alignments, calculated as the ratio of collapsed reads in the alignment to
the number of hits (NH value).
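
As a rough illustration of the counting rule in item 5, the sketch below sums, for each miRNA species, the ratio of collapsed reads to NH over all alignments. The record layout, the names `Alignment` and `count_contributions`, and the species labels are assumptions made for this example; they are not taken from the _MIRFLOWZ_ scripts.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record; field names are illustrative only."""

    species: str          # miRNA species the alignment is assigned to
    collapsed_reads: int  # identical reads collapsed into this alignment
    nh: int               # number of hits reported for the read (NH value)


def count_contributions(alignments):
    """Sum collapsed_reads / NH per species."""
    counts = defaultdict(float)
    for aln in alignments:
        counts[aln.species] += aln.collapsed_reads / aln.nh
    return dict(counts)


# Made-up alignments: a read collapsed 4 times with NH=2 adds 2.0 to each hit.
example = [
    Alignment("mir-A", 10, 1),
    Alignment("mir-A", 4, 2),
    Alignment("mir-B", 4, 2),
]
counts = count_contributions(example)
```

In this toy input, the multi-mapping read is split evenly between its two hits, so `mir-A` receives 10 + 2 and `mir-B` receives 2.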

To retain all intermediate files, include `--no-hooks` in the workflow call, so
that the `onsuccess` handler that deletes them is not run.

```bash
snakemake \
    --snakefile="path/to/Snakefile" \
    --cores 4 \
    --configfile="path/to/config.yaml" \
    --use-conda \
    --printshellcmds \
    --rerun-incomplete \
    --no-hooks \
    --verbose
```

After successful execution of the workflow, the intermediate files will be
found in the `results/inter_files` directory.

### Creating a Snakemake report

Snakemake provides the option to generate a detailed HTML report on runtime
5 changes: 5 additions & 0 deletions config/config_schema.json
@@ -30,6 +30,11 @@
"default": "results/",
"description": "Path to the output directory."
},
"tmp_dir":{
"type": "string",
"default": "results/inter_files",
"description": "Path to the temporary directory storing the intermediate files."
},
"local_log":{
"type": "string",
"default": "logs/local/",
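
To make the effect of the new `tmp_dir` default concrete, here is a minimal sketch of filling missing config keys from schema defaults. It assumes the entries sit under a top-level `properties` key, which this hunk does not show, and `apply_defaults` is a hypothetical helper; whether the workflow itself fills defaults depends on how `validate` is called, which is also not visible here.

```python
import json

# Fragment mirroring the `output_dir` and `tmp_dir` entries from the hunk above.
# The enclosing "properties" key is an assumption about the full schema.
SCHEMA_FRAGMENT = json.loads(
    """
    {
      "properties": {
        "output_dir": {"type": "string", "default": "results/"},
        "tmp_dir": {"type": "string", "default": "results/inter_files"}
      }
    }
    """
)


def apply_defaults(config, schema):
    """Return a copy of config with missing keys filled from schema defaults."""
    filled = dict(config)
    for key, spec in schema.get("properties", {}).items():
        if key not in filled and "default" in spec:
            filled[key] = spec["default"]
    return filled


# A user config that overrides output_dir but says nothing about tmp_dir.
user_config = {"output_dir": "my_results/"}
resolved = apply_defaults(user_config, SCHEMA_FRAGMENT)
```

Note that in this sketch the default `tmp_dir` stays at `results/inter_files` even when `output_dir` is overridden, because the two defaults are independent strings.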
1 change: 1 addition & 0 deletions config/config_template.yaml
@@ -32,6 +32,7 @@ map_chr_file: path/to/ucsc_ensembl_mappings.tsv
#### DIRECTORIES ####

output_dir: results/
tmp_dir: results/inter_files
local_log: logs/local/
cluster_log: logs/cluster/
scripts_dir: ../scripts/
760 changes: 374 additions & 386 deletions images/rule_graph.svg
110 changes: 55 additions & 55 deletions test/expected_output.md5
@@ -1,58 +1,58 @@
68f943f89b52d628851dd97fb1399d68 results/TABLES/all_mirna_counts.tab
eec9be6cda61d2728290c92c1209f455 results/TABLES/mirna_counts_test_lib
363ecee318c57ee7e2e45ca468007baa results/TABLES/all_pri-mir_counts.tab
a844e3a29159e36e2f17a0646d1e8c5f results/TABLES/pri-mir_counts_test_lib
0d76977b2e36046cc176112776c5fa4e results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam.bai
36f7d024fe6ddfd3e788aebf61c61061 results/test_lib/oligomap_genome_sorted.fasta
48e605df55bf2dd37ea5a5a74eb5872a results/test_lib/mappings_all.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_mappings.fasta
eea903fc0ab81054cf8e34193f80f4a7 results/test_lib/mappings_all_removed_inferiors.sam
98498ac521f451426a9dbabcbecb5f25 results/test_lib/alignments_intersecting_primir.bam
defdc8c46e1d73692edde0e0278f2d5e results/test_lib/oligomap_genome_mappings.fasta
1649738f226e8979d4d88a3ae47fa423 results/test_lib/segemehl_transcriptome_mappings.sam
9ecee9ab80daba0a53076b05c9f6ff53 results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam
1649738f226e8979d4d88a3ae47fa423 results/test_lib/transcriptome_mappings_filtered_nh.sam
8e22ddfa7c39ce7e4ec5945dff1576ef results/test_lib/alignments_all.bam
a124a5afdb5f7bfbcc5683260556c9c4 results/test_lib/mappings_all_no_header.sam
dd00dea3549dc1ad14f9e1505d397de5 results/test_lib/alignments_all.sam
8c24d619073f4c5ca1f439fe429d0ef4 results/test_lib/alignments_intersecting_mirna_tag.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_sorted.fasta
c218718d93f48e5987fc18b33dc488f0 results/test_lib/segemehl_genome_mappings.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/transcriptome_mappings_to_genome.sam
63a32839360a985b68e0685aafad5c54 results/test_lib/fa/reads.fa
5cc557ec2073144f47fe28ac145f4869 results/test_lib/alignments_intersecting_mirna_uncollapsed.sam
edcb854702519c0002d8ce89a21e54ef results/test_lib/reads_formatted.fasta
1a547487b8e92ad85bb26ff9b1db1f93 results/test_lib/intersected_extended_mirna.bed
721071f3ead528aa71978508db8d73f9 results/test_lib/alignments_all_sorted_test_lib.bam
ec0e9bcc8ea857da897035c8fca4078f results/test_lib/reads_trimmed_adapters.fasta
bbfc27c84b66ff41bfeee73f701b4b29 results/test_lib/alignments_intersecting_mirna_uncollapsed.bam
81bed7fc879f7a16c12d2ba912263c46 results/test_lib/alignments_intersecting_mirna.sam
dd560414078330bf3138f039da109093 results/test_lib/genome_mappings.sam
f5cb65466d328036a15b66cfbd4d8419 results/test_lib/oligomap_genome_report.txt
6cbdb9299e09b3e39b79a50db69226b5 results/test_lib/transcriptome_mappings_no_header.sam
1649738f226e8979d4d88a3ae47fa423 results/test_lib/transcriptome_mappings.sam
947607be69c16246f8dc9adbd9b971c8 results/test_lib/oligomap_genome_mappings.sam
9833208a79143eaf3f2a5fdeca0b2d94 results/test_lib/alignments_intersecting_mirna_sorted_tag.sam
02096523b293082629d5b895085468a3 results/test_lib/alignments_intersecting_primir_sorted.bam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_mappings.sam
a124a5afdb5f7bfbcc5683260556c9c4 results/test_lib/genome_mappings_no_header.sam
dd560414078330bf3138f039da109093 results/test_lib/genome_mappings_filtered_nh.sam
ae4c4963ca2cd206952b2ea2c58301dd results/test_lib/mappings_all_sorted_by_id.sam
2c77ffa021dda190d82f3f54a3312393 results/test_lib/reads_collapsed.fasta
f68693cfaa1e6ea78e1a5562ade6d9ed results/test_lib/intersected_extended_primir.bed
61f12595db9421926073d6675f7c3c42 results/test_lib/alignments_intersecting_primir.sam
c2a5770a755ada66ef63d96eec4afb00 results/test_lib/reads_filtered_for_oligomap.fasta
fe5388094985e9604a302d39d2abc82c results/test_lib/oligomap_transcriptome_report.txt
be7a0d92e57480190de57eb30baffa36 results/extended_mirna_annotation_6_nt.gff3
8148cd880602255be166beb59bbed95a results/genome_header.sam
09e24a504bfec37fee3d5ff1b5c7738e results/exons.bed
4fb453846e88593d0cac13220ec2d685 results/segemehl_genome_index.idx
d34fc868b861b1bc46db07a397dc0f10 results/genome_processed.fa.fai
21e102e4ebd3508bb06f46366a3d578d results/exons.gtf
003b92b245ac336e3d70a513033e1cee results/transcriptome_trimmed_id.fa
44dbf7c3eae00d0bc8d5e1319123746c results/chr_size.txt
cc5c3512dab0e269d82bd625de74198e results/extended_primir_annotation_6_nt.gff3
f28cc0143ab6659bef3de3a7afa1dccc results/mirna_annotations.gff3
2d437f8681f4248d4f2075f86debb920 results/transcriptome.fa
7eb64c112830266bcf416ded60b4cf77 results/segemehl_transcriptome_index.idx
4fba145540a2c61f29bfddfd0f5a4d4e results/genome_processed.fa
f91c144e491e447a50369a67220a832f results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam
a8b1a66aecf4d7b583362ea8619228ed results/test_lib/alignments_intersecting_mirna_uncollapsed.sam
9f0bad0ed3c62d0410060d8b332315e8 results/test_lib/alignments_intersecting_mirna.sam
4ae56cdb8de0fbaac24b4a49d356f7f8 results/test_lib/alignments_intersecting_primir.sam
eec9be6cda61d2728290c92c1209f455 results/inter_files/TABLES/mirna_counts_test_lib
a844e3a29159e36e2f17a0646d1e8c5f results/inter_files/TABLES/pri-mir_counts_test_lib
36f7d024fe6ddfd3e788aebf61c61061 results/inter_files/test_lib/oligomap_genome_sorted.fasta
48e605df55bf2dd37ea5a5a74eb5872a results/inter_files/test_lib/mappings_all.sam
d41d8cd98f00b204e9800998ecf8427e results/inter_files/test_lib/oligomap_transcriptome_mappings.fasta
f54bacf9bf4188541a0c0fedc203e3ed results/inter_files/test_lib/mappings_all_removed_inferiors.sam
4b86be9b7ed15ddc0067b8de4aad431c results/inter_files/test_lib/alignments_intersecting_primir.bam
defdc8c46e1d73692edde0e0278f2d5e results/inter_files/test_lib/oligomap_genome_mappings.fasta
3aca095999e737c5d9cdb66540e8b195 results/inter_files/test_lib/segemehl_transcriptome_mappings.sam
3aca095999e737c5d9cdb66540e8b195 results/inter_files/test_lib/transcriptome_mappings_filtered_nh.sam
698711937e6d98dd65b70b3a738388b4 results/inter_files/test_lib/alignments_all.bam
a124a5afdb5f7bfbcc5683260556c9c4 results/inter_files/test_lib/mappings_all_no_header.sam
cb542d2dd6b4405d690086de0bb5ec70 results/inter_files/test_lib/alignments_all.sam
d8ab74abfa3ed2b2a92c83142af1c638 results/inter_files/test_lib/alignments_intersecting_mirna_tag.sam
d41d8cd98f00b204e9800998ecf8427e results/inter_files/test_lib/oligomap_transcriptome_sorted.fasta
f34a0091f633db03a940d0c790ad265a results/inter_files/test_lib/segemehl_genome_mappings.sam
d41d8cd98f00b204e9800998ecf8427e results/inter_files/test_lib/transcriptome_mappings_to_genome.sam
63a32839360a985b68e0685aafad5c54 results/inter_files/test_lib/fa/reads.fa
edcb854702519c0002d8ce89a21e54ef results/inter_files/test_lib/reads_formatted.fasta
1a547487b8e92ad85bb26ff9b1db1f93 results/inter_files/test_lib/intersected_extended_mirna.bed
a71a2dd39c82baee52d5dbe2e3a39457 results/inter_files/test_lib/alignments_all_sorted_test_lib.bam
ec0e9bcc8ea857da897035c8fca4078f results/inter_files/test_lib/reads_trimmed_adapters.fasta
acf1608593f39294e0137069f6351058 results/inter_files/test_lib/alignments_intersecting_mirna_uncollapsed.bam
0454bc9f3edd9348a7b3e08d9c3007d8 results/inter_files/test_lib/genome_mappings.sam
f5cb65466d328036a15b66cfbd4d8419 results/inter_files/test_lib/oligomap_genome_report.txt
6cbdb9299e09b3e39b79a50db69226b5 results/inter_files/test_lib/transcriptome_mappings_no_header.sam
3aca095999e737c5d9cdb66540e8b195 results/inter_files/test_lib/transcriptome_mappings.sam
947607be69c16246f8dc9adbd9b971c8 results/inter_files/test_lib/oligomap_genome_mappings.sam
fa14b33623fd12b068a6d4ae301e7f49 results/inter_files/test_lib/alignments_intersecting_mirna_sorted_tag.sam
b6de7f5615b4b05834f4af11df993345 results/inter_files/test_lib/alignments_intersecting_primir_sorted.bam
d41d8cd98f00b204e9800998ecf8427e results/inter_files/test_lib/oligomap_transcriptome_mappings.sam
a124a5afdb5f7bfbcc5683260556c9c4 results/inter_files/test_lib/genome_mappings_no_header.sam
0454bc9f3edd9348a7b3e08d9c3007d8 results/inter_files/test_lib/genome_mappings_filtered_nh.sam
09c89a2769c919e58c3a3d3cbe2ceaf6 results/inter_files/test_lib/mappings_all_sorted_by_id.sam
2c77ffa021dda190d82f3f54a3312393 results/inter_files/test_lib/reads_collapsed.fasta
f68693cfaa1e6ea78e1a5562ade6d9ed results/inter_files/test_lib/intersected_extended_primir.bed
c2a5770a755ada66ef63d96eec4afb00 results/inter_files/test_lib/reads_filtered_for_oligomap.fasta
fe5388094985e9604a302d39d2abc82c results/inter_files/test_lib/oligomap_transcriptome_report.txt
be7a0d92e57480190de57eb30baffa36 results/inter_files/extended_mirna_annotation_6_nt.gff3
8148cd880602255be166beb59bbed95a results/inter_files/genome_header.sam
09e24a504bfec37fee3d5ff1b5c7738e results/inter_files/exons.bed
4fb453846e88593d0cac13220ec2d685 results/inter_files/segemehl_genome_index.idx
d34fc868b861b1bc46db07a397dc0f10 results/inter_files/genome_processed.fa.fai
21e102e4ebd3508bb06f46366a3d578d results/inter_files/exons.gtf
003b92b245ac336e3d70a513033e1cee results/inter_files/transcriptome_trimmed_id.fa
44dbf7c3eae00d0bc8d5e1319123746c results/inter_files/chr_size.txt
cc5c3512dab0e269d82bd625de74198e results/inter_files/extended_primir_annotation_6_nt.gff3
f28cc0143ab6659bef3de3a7afa1dccc results/inter_files/mirna_annotations.gff3
2d437f8681f4248d4f2075f86debb920 results/inter_files/transcriptome.fa
7eb64c112830266bcf416ded60b4cf77 results/inter_files/segemehl_transcriptome_index.idx
4fba145540a2c61f29bfddfd0f5a4d4e results/inter_files/genome_processed.fa
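
A checksum list in this `<md5> <path>` layout can be verified with a few lines of standard-library Python. The sketch below is only an illustration: the function names are hypothetical, it is not the test script used by the repository, and it does not handle the `*` binary-mode marker that `md5sum` can emit.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_lines(text):
    """Parse '<md5> <path>' lines into a {path: md5} dict."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, path = line.split(maxsplit=1)
        expected[path.strip()] = checksum
    return expected


def find_mismatches(expected, root="."):
    """Return paths that are missing under root or whose checksum differs."""
    bad = []
    for rel_path, checksum in expected.items():
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != checksum:
            bad.append(rel_path)
    return bad
```

A typical call would be `find_mismatches(parse_md5_lines(Path("expected_output.md5").read_text()))`, run from the directory the listed paths are relative to. Several entries above carry `d41d8cd98f00b204e9800998ecf8427e`, which is the MD5 of empty input, so those files are expected to be empty.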
2 changes: 1 addition & 1 deletion test/test_workflow_local_with_conda.sh
@@ -25,9 +25,9 @@ snakemake \
--use-conda \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose


# Snakemake report
snakemake \
--snakefile="../workflow/Snakefile" \
2 changes: 1 addition & 1 deletion test/test_workflow_local_with_singularity.sh
@@ -26,9 +26,9 @@ snakemake \
--singularity-args "--bind ${PWD}/../" \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose


# Snakemake report
snakemake \
--snakefile="../workflow/Snakefile" \
1 change: 1 addition & 0 deletions test/test_workflow_slurm_with_conda.sh
@@ -41,6 +41,7 @@ snakemake \
--use-conda \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose

# Snakemake report
1 change: 1 addition & 0 deletions test/test_workflow_slurm_with_singularity.sh
@@ -42,6 +42,7 @@ snakemake \
--singularity-args="--bind ${PWD}/../" \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose

# Snakemake report
25 changes: 21 additions & 4 deletions workflow/Snakefile
@@ -31,6 +31,23 @@ validate(config, Path("../config/config_schema.json"))


OUT_DIR = Path(config["output_dir"])
TMP_DIR = Path(config["tmp_dir"])
LOG_DIR = Path(f"{config['local_log']}/../")


###############################################################################
### onSuccess/onError handlers configuration
###############################################################################


onsuccess:
    print("\nWORKFLOW SUCCEEDED. Removing intermediate files.\n")
    shell("rm -rf {TMP_DIR}")


onerror:
    print("\nWORKFLOW FAILED. Check the log file in the LOGS/ directory.\n")
    shell("cat {log} > {LOG_DIR}/failed_workflow.log")


###############################################################################
@@ -67,14 +84,14 @@ rule finish:
OUT_DIR / "{sample}" / "alignments_intersecting_mirna.sam",
sample=pd.unique(samples_table.index.values),
),
intersect_sam=expand(
OUT_DIR / "{sample}" / "alignments_intersecting_mirna_sorted_tag.sam",
sample=pd.unique(samples_table.index.values),
),
table=expand(
OUT_DIR / "TABLES" / "all_{mir}_counts.tab",
mir=[mir for mir in config["mir_list"] if mir != "isomir"],
),
uncollapsed_sam=expand(
OUT_DIR / "{sample}" / "alignments_intersecting_mirna_uncollapsed.sam",
sample=pd.unique(samples_table.index.values),
),
uncollapsed_bam=expand(
OUT_DIR
/ "{sample}"
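
The handlers added to the Snakefile can be read as the following plain-Python sketch. It is not Snakemake code: `on_success` and `on_error` are hypothetical stand-ins for the `onsuccess:` and `onerror:` blocks, `shutil` replaces the shell calls, and the workflow log path is passed in explicitly. The config values are the defaults from the schema and template in this diff. The sketch also shows that `Path(f"{config['local_log']}/../")` keeps the `..` component rather than collapsing it.

```python
import os
import shutil
from pathlib import Path

# Default values taken from the config files changed in this diff.
config = {"tmp_dir": "results/inter_files", "local_log": "logs/local/"}

TMP_DIR = Path(config["tmp_dir"])
# pathlib drops the duplicate and trailing slashes but keeps "..".
LOG_DIR = Path(f"{config['local_log']}/../")


def on_success(tmp_dir=TMP_DIR):
    """Rough analogue of `rm -rf {TMP_DIR}`: delete the intermediates directory."""
    shutil.rmtree(tmp_dir, ignore_errors=True)


def on_error(workflow_log, log_dir=LOG_DIR):
    """Rough analogue of `cat {log} > {LOG_DIR}/failed_workflow.log`."""
    shutil.copyfile(workflow_log, Path(log_dir) / "failed_workflow.log")


# Lexically collapse "local/.." to see which directory the log ends up in.
normalized = os.path.normpath(LOG_DIR)
```

With the default `local_log`, the failed-workflow log is therefore written to the parent of `logs/local/`, that is, `logs/`. Running the workflow with `--no-hooks` skips both handlers, which is why the test scripts in this diff keep their intermediate files.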