feat: set intermediate and final output files #129

Merged
merged 23 commits into from
Dec 8, 2023
23 commits
701ab06
test: add --notemp to test script
deliaBlue Nov 19, 2023
c2ba640
refactor: set intermediate files as temp
deliaBlue Nov 19, 2023
2d0fe9d
merge with dev
deliaBlue Nov 21, 2023
0ccb2e0
refactor: start tmp files
deliaBlue Nov 29, 2023
2f2fb15
refactor: start tmp files
deliaBlue Nov 29, 2023
0eadf76
ci: update paths for expected output
deliaBlue Nov 30, 2023
06fcfa3
refactor: change intermediate files to tmp dir
deliaBlue Nov 30, 2023
bc52303
test: add --no-hooks CLI option
deliaBlue Nov 30, 2023
b162f13
docs: update rule graph
deliaBlue Nov 30, 2023
93316f3
test: restore expected output
deliaBlue Nov 30, 2023
35953db
docs: add expected output files section
deliaBlue Dec 2, 2023
d3e9b1b
build: rename temporary directory
deliaBlue Dec 2, 2023
41174d0
test: update expected output with new tmp dir name
deliaBlue Dec 2, 2023
a487f7b
style: format to pass snakefmt test
deliaBlue Dec 2, 2023
dee7347
Merge branch 'dev' into 86-cleanorder-final-output-files-in-snakefile
deliaBlue Dec 3, 2023
bda1754
test: update uncollapsed sam dir
deliaBlue Dec 4, 2023
9d7b822
refactor: remove uncollapsed sam form final output
deliaBlue Dec 4, 2023
d0b274f
refactor: remove uncollapsed sam from final output
deliaBlue Dec 4, 2023
f6fc812
refactor: change intermediates directory
deliaBlue Dec 6, 2023
c41d03f
build: change intermediates directory
deliaBlue Dec 6, 2023
d299225
test: update intermediates directory
deliaBlue Dec 6, 2023
29d6f1f
docs: rewrite output files section
deliaBlue Dec 6, 2023
d086215
change logs dir
deliaBlue Dec 8, 2023
46 changes: 45 additions & 1 deletion README.md
@@ -12,6 +12,7 @@ _MIRFLOWZ_ is a [Snakemake][snakemake] workflow for mapping miRNAs and isomiRs.
2. [Usage](#usage)
- [Preparing inputs](#preparing-inputs)
- [Running the workflow](#running-the-workflow)
- [Expected output files](#expected-output-files)
- [Creating a Snakemake report](#creating-a-snakemake-report)
3. [Workflow description](#workflow-description)
4. [Contributing](#contributing)
@@ -219,7 +220,7 @@ We recommend creating a copy of the

```bash
cp config/config_template.yaml path/to/config.yaml
```

Open the new copy in your editor of choice and adjust the configuration
parameters to your liking. The template explains what each of the
@@ -251,6 +252,49 @@ snakemake \
After successful execution of the workflow, results and logs will be found in
the `results/` and `logs/` directories, respectively.

### Expected output files

Upon successful execution of _MIRFLOWZ_, the tool automatically removes all
intermediate files generated during the process. The final outputs comprise:

1. A SAM file containing alignments intersecting a pri-miR locus. These
alignments intersect with extended start and/or end positions specified in the
provided pri-miR annotations. Please note that they may not contribute to the
final counting and may not appear in the final table. Alignments are discarded
if their start and/or end positions differ from the ends of the provided
pri-miR annotations by more bases than the extension used.

2. A SAM file containing alignments intersecting a mature miRNA locus. Similar
to the previous file, these alignments intersect with extended start and/or end
positions specified in the provided miRNA annotations. They may not contribute
to the final counting and might be absent from the final table.

3. A BAM file containing the set of alignments contributing to the final
counting and its corresponding index file (`.bam.bai`).

4. Table(s) containing the counts from all libraries for (iso)miRs
and/or pri-miRs. Each row corresponds to a miRNA species, and each column
represents a sample library. Each read contributes 1/n to every annotated
miRNA species it aligns to, where n is the number of genomic and/or
transcriptomic loci to which that read aligns.
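
The 1/n weighting described in item 4 can be illustrated with a minimal Python sketch. This is not the workflow's own counting script: the function name, the input layout and the miRNA names are hypothetical, and each read is treated as a single copy (no collapsed-read multiplicities).

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Count reads per miRNA species, weighting each read by 1/n.

    `alignments` maps a read ID to a tuple `(n_loci, species)`, where
    `n_loci` is the number of loci the read aligns to and `species` lists
    the annotated miRNA species it aligns to (hypothetical layout).
    """
    counts = defaultdict(float)
    for _read_id, (n_loci, species) in alignments.items():
        for name in species:
            # Every species the read aligns to receives the same 1/n share.
            counts[name] += 1 / n_loci
    return dict(counts)


# Hypothetical toy data, for illustration only.
toy_alignments = {
    "read_1": (1, ["hsa-miR-21-5p"]),
    "read_2": (2, ["hsa-miR-21-5p", "hsa-let-7a-5p"]),
    "read_3": (4, ["hsa-let-7a-5p"]),
}
toy_counts = fractional_counts(toy_alignments)
print(toy_counts)  # → {'hsa-miR-21-5p': 1.5, 'hsa-let-7a-5p': 0.75}
```

Under this scheme a uniquely mapping read adds a full count, while a read mapping to four loci adds only a quarter to each species it overlaps.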

To retain all intermediate files, include `--no-hooks` in the workflow call.

```bash
snakemake \
    --snakefile="path/to/Snakefile" \
    --cores 4 \
    --configfile="path/to/config.yaml" \
    --use-conda \
    --printshellcmds \
    --rerun-incomplete \
    --no-hooks \
    --verbose
```

After successful execution of the workflow, the intermediate files will be
found in the `results/intermediates` directory.
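
When the workflow is launched from a script, the choice between keeping and removing intermediates comes down to whether `--no-hooks` is passed. The sketch below only assembles the command shown above; the helper name and its `keep_intermediates` parameter are hypothetical. Note that `--no-hooks` disables all Snakemake hooks, so the `onerror` handler added in this PR would presumably be skipped as well.

```python
import shlex


def build_snakemake_command(snakefile, configfile, cores=4, keep_intermediates=False):
    """Return the workflow call as an argument list (hypothetical helper)."""
    cmd = [
        "snakemake",
        f"--snakefile={snakefile}",
        "--cores",
        str(cores),
        f"--configfile={configfile}",
        "--use-conda",
        "--printshellcmds",
        "--rerun-incomplete",
    ]
    if keep_intermediates:
        # Skips the onsuccess hook that deletes the intermediates directory.
        cmd.append("--no-hooks")
    cmd.append("--verbose")
    return cmd


command = build_snakemake_command(
    "path/to/Snakefile", "path/to/config.yaml", keep_intermediates=True
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` if the call should actually be executed.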

### Creating a Snakemake report

Snakemake provides the option to generate a detailed HTML report on runtime
5 changes: 5 additions & 0 deletions config/config_schema.json
@@ -30,6 +30,11 @@
"default": "results/",
"description": "Path to the output directory."
},
"intermediates_dir":{
"type": "string",
"default": "results/intermediates",
"description": "Path to the directory storing the intermediate files."
},
"local_log":{
"type": "string",
"default": "logs/local/",
1 change: 1 addition & 0 deletions config/config_template.yaml
@@ -32,6 +32,7 @@ map_chr_file: path/to/ucsc_ensembl_mappings.tsv
#### DIRECTORIES ####

output_dir: results/
intermediates_dir: results/intermediates
local_log: logs/local/
cluster_log: logs/cluster/
scripts_dir: ../scripts/
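
The new `intermediates_dir` key carries a fixed default in the schema, so it is not derived from `output_dir`. The standard-library sketch below shows what that implies, assuming the validator fills in schema defaults for missing keys. The `properties` wrapper around the entries and the helper name are assumptions; the diff only shows the individual entries.

```python
import json

# Fragment rebuilt from the entries in the diff; the surrounding
# "properties" wrapper is an assumption about the schema layout.
SCHEMA_FRAGMENT = json.loads(
    """
    {
      "properties": {
        "output_dir": {"type": "string", "default": "results/"},
        "intermediates_dir": {"type": "string", "default": "results/intermediates"},
        "local_log": {"type": "string", "default": "logs/local/"}
      }
    }
    """
)


def with_defaults(config, schema):
    """Return a copy of `config` with schema defaults for missing keys."""
    merged = dict(config)
    for key, spec in schema["properties"].items():
        if key not in merged and "default" in spec:
            merged[key] = spec["default"]
    return merged


# A user who relocates the output directory but omits intermediates_dir.
user_config = {"output_dir": "my_results/"}
resolved = with_defaults(user_config, SCHEMA_FRAGMENT)
print(resolved["intermediates_dir"])  # → results/intermediates
```

In other words, anyone changing `output_dir` would likely want to set `intermediates_dir` explicitly too, otherwise the intermediates still land under `results/`.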
760 changes: 374 additions & 386 deletions images/rule_graph.svg
110 changes: 55 additions & 55 deletions test/expected_output.md5
@@ -1,58 +1,58 @@
68f943f89b52d628851dd97fb1399d68 results/TABLES/all_mirna_counts.tab
eec9be6cda61d2728290c92c1209f455 results/TABLES/mirna_counts_test_lib
363ecee318c57ee7e2e45ca468007baa results/TABLES/all_pri-mir_counts.tab
a844e3a29159e36e2f17a0646d1e8c5f results/TABLES/pri-mir_counts_test_lib
0d76977b2e36046cc176112776c5fa4e results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam.bai
36f7d024fe6ddfd3e788aebf61c61061 results/test_lib/oligomap_genome_sorted.fasta
48e605df55bf2dd37ea5a5a74eb5872a results/test_lib/mappings_all.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_mappings.fasta
eea903fc0ab81054cf8e34193f80f4a7 results/test_lib/mappings_all_removed_inferiors.sam
98498ac521f451426a9dbabcbecb5f25 results/test_lib/alignments_intersecting_primir.bam
defdc8c46e1d73692edde0e0278f2d5e results/test_lib/oligomap_genome_mappings.fasta
1649738f226e8979d4d88a3ae47fa423 results/test_lib/segemehl_transcriptome_mappings.sam
9ecee9ab80daba0a53076b05c9f6ff53 results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam
1649738f226e8979d4d88a3ae47fa423 results/test_lib/transcriptome_mappings_filtered_nh.sam
8e22ddfa7c39ce7e4ec5945dff1576ef results/test_lib/alignments_all.bam
a124a5afdb5f7bfbcc5683260556c9c4 results/test_lib/mappings_all_no_header.sam
dd00dea3549dc1ad14f9e1505d397de5 results/test_lib/alignments_all.sam
8c24d619073f4c5ca1f439fe429d0ef4 results/test_lib/alignments_intersecting_mirna_tag.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_sorted.fasta
c218718d93f48e5987fc18b33dc488f0 results/test_lib/segemehl_genome_mappings.sam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/transcriptome_mappings_to_genome.sam
63a32839360a985b68e0685aafad5c54 results/test_lib/fa/reads.fa
5cc557ec2073144f47fe28ac145f4869 results/test_lib/alignments_intersecting_mirna_uncollapsed.sam
edcb854702519c0002d8ce89a21e54ef results/test_lib/reads_formatted.fasta
1a547487b8e92ad85bb26ff9b1db1f93 results/test_lib/intersected_extended_mirna.bed
721071f3ead528aa71978508db8d73f9 results/test_lib/alignments_all_sorted_test_lib.bam
ec0e9bcc8ea857da897035c8fca4078f results/test_lib/reads_trimmed_adapters.fasta
bbfc27c84b66ff41bfeee73f701b4b29 results/test_lib/alignments_intersecting_mirna_uncollapsed.bam
81bed7fc879f7a16c12d2ba912263c46 results/test_lib/alignments_intersecting_mirna.sam
dd560414078330bf3138f039da109093 results/test_lib/genome_mappings.sam
f5cb65466d328036a15b66cfbd4d8419 results/test_lib/oligomap_genome_report.txt
6cbdb9299e09b3e39b79a50db69226b5 results/test_lib/transcriptome_mappings_no_header.sam
1649738f226e8979d4d88a3ae47fa423 results/test_lib/transcriptome_mappings.sam
947607be69c16246f8dc9adbd9b971c8 results/test_lib/oligomap_genome_mappings.sam
9833208a79143eaf3f2a5fdeca0b2d94 results/test_lib/alignments_intersecting_mirna_sorted_tag.sam
02096523b293082629d5b895085468a3 results/test_lib/alignments_intersecting_primir_sorted.bam
d41d8cd98f00b204e9800998ecf8427e results/test_lib/oligomap_transcriptome_mappings.sam
a124a5afdb5f7bfbcc5683260556c9c4 results/test_lib/genome_mappings_no_header.sam
dd560414078330bf3138f039da109093 results/test_lib/genome_mappings_filtered_nh.sam
ae4c4963ca2cd206952b2ea2c58301dd results/test_lib/mappings_all_sorted_by_id.sam
2c77ffa021dda190d82f3f54a3312393 results/test_lib/reads_collapsed.fasta
f68693cfaa1e6ea78e1a5562ade6d9ed results/test_lib/intersected_extended_primir.bed
61f12595db9421926073d6675f7c3c42 results/test_lib/alignments_intersecting_primir.sam
c2a5770a755ada66ef63d96eec4afb00 results/test_lib/reads_filtered_for_oligomap.fasta
fe5388094985e9604a302d39d2abc82c results/test_lib/oligomap_transcriptome_report.txt
be7a0d92e57480190de57eb30baffa36 results/extended_mirna_annotation_6_nt.gff3
8148cd880602255be166beb59bbed95a results/genome_header.sam
09e24a504bfec37fee3d5ff1b5c7738e results/exons.bed
4fb453846e88593d0cac13220ec2d685 results/segemehl_genome_index.idx
d34fc868b861b1bc46db07a397dc0f10 results/genome_processed.fa.fai
21e102e4ebd3508bb06f46366a3d578d results/exons.gtf
003b92b245ac336e3d70a513033e1cee results/transcriptome_trimmed_id.fa
44dbf7c3eae00d0bc8d5e1319123746c results/chr_size.txt
cc5c3512dab0e269d82bd625de74198e results/extended_primir_annotation_6_nt.gff3
f28cc0143ab6659bef3de3a7afa1dccc results/mirna_annotations.gff3
2d437f8681f4248d4f2075f86debb920 results/transcriptome.fa
7eb64c112830266bcf416ded60b4cf77 results/segemehl_transcriptome_index.idx
4fba145540a2c61f29bfddfd0f5a4d4e results/genome_processed.fa
25aca3f96e7ed644067d2050393bf7a4 results/test_lib/alignments_intersecting_mirna_uncollapsed_sorted.bam
cc01c7884838a597c587437cb0acf64e results/test_lib/alignments_intersecting_mirna.sam
b1eb81426f890d671bba8c8a815edc1e results/test_lib/alignments_intersecting_primir.sam
eec9be6cda61d2728290c92c1209f455 results/intermediates/TABLES/mirna_counts_test_lib
a844e3a29159e36e2f17a0646d1e8c5f results/intermediates/TABLES/pri-mir_counts_test_lib
36f7d024fe6ddfd3e788aebf61c61061 results/intermediates/test_lib/oligomap_genome_sorted.fasta
48e605df55bf2dd37ea5a5a74eb5872a results/intermediates/test_lib/mappings_all.sam
d41d8cd98f00b204e9800998ecf8427e results/intermediates/test_lib/oligomap_transcriptome_mappings.fasta
e9aac4afeb2053385d60f5e4b07a9774 results/intermediates/test_lib/mappings_all_removed_inferiors.sam
9ebcb4ac877f37921b88ceca3ff03b62 results/intermediates/test_lib/alignments_intersecting_primir.bam
defdc8c46e1d73692edde0e0278f2d5e results/intermediates/test_lib/oligomap_genome_mappings.fasta
e632f8984d423d46bbb377ec75468521 results/intermediates/test_lib/segemehl_transcriptome_mappings.sam
e632f8984d423d46bbb377ec75468521 results/intermediates/test_lib/transcriptome_mappings_filtered_nh.sam
3344bbeb9fe01f07c04831e5b4a795ba results/intermediates/test_lib/alignments_all.bam
a124a5afdb5f7bfbcc5683260556c9c4 results/intermediates/test_lib/mappings_all_no_header.sam
d62630102c33d43d593af14c2a642839 results/intermediates/test_lib/alignments_all.sam
81103749d61bc55ee2cfc84ca1527456 results/intermediates/test_lib/alignments_intersecting_mirna_tag.sam
d41d8cd98f00b204e9800998ecf8427e results/intermediates/test_lib/oligomap_transcriptome_sorted.fasta
76643f87bb2e2bff77d1b1223d7720b5 results/intermediates/test_lib/segemehl_genome_mappings.sam
d41d8cd98f00b204e9800998ecf8427e results/intermediates/test_lib/transcriptome_mappings_to_genome.sam
63a32839360a985b68e0685aafad5c54 results/intermediates/test_lib/fa/reads.fa
e9e9698d9350b64b64c1f6d96019fce8 results/intermediates/test_lib/alignments_intersecting_mirna_uncollapsed.sam
edcb854702519c0002d8ce89a21e54ef results/intermediates/test_lib/reads_formatted.fasta
1a547487b8e92ad85bb26ff9b1db1f93 results/intermediates/test_lib/intersected_extended_mirna.bed
a287ffc43b6afbdde3e9905bc27c28a5 results/intermediates/test_lib/alignments_all_sorted_test_lib.bam
ec0e9bcc8ea857da897035c8fca4078f results/intermediates/test_lib/reads_trimmed_adapters.fasta
d7a5ab720ff9c96f41f3755a05b8f9e0 results/intermediates/test_lib/alignments_intersecting_mirna_uncollapsed.bam
1f1b873d05ec14ef9b16376a1c98315b results/intermediates/test_lib/genome_mappings.sam
f5cb65466d328036a15b66cfbd4d8419 results/intermediates/test_lib/oligomap_genome_report.txt
6cbdb9299e09b3e39b79a50db69226b5 results/intermediates/test_lib/transcriptome_mappings_no_header.sam
e632f8984d423d46bbb377ec75468521 results/intermediates/test_lib/transcriptome_mappings.sam
947607be69c16246f8dc9adbd9b971c8 results/intermediates/test_lib/oligomap_genome_mappings.sam
ce3fcd037e0a6a0b1a7a3253219e7053 results/intermediates/test_lib/alignments_intersecting_mirna_sorted_tag.sam
53764354c520d9700f13761c2721d8aa results/intermediates/test_lib/alignments_intersecting_primir_sorted.bam
d41d8cd98f00b204e9800998ecf8427e results/intermediates/test_lib/oligomap_transcriptome_mappings.sam
a124a5afdb5f7bfbcc5683260556c9c4 results/intermediates/test_lib/genome_mappings_no_header.sam
1f1b873d05ec14ef9b16376a1c98315b results/intermediates/test_lib/genome_mappings_filtered_nh.sam
6cc6165e8942a08420552aa810e629f8 results/intermediates/test_lib/mappings_all_sorted_by_id.sam
2c77ffa021dda190d82f3f54a3312393 results/intermediates/test_lib/reads_collapsed.fasta
f68693cfaa1e6ea78e1a5562ade6d9ed results/intermediates/test_lib/intersected_extended_primir.bed
c2a5770a755ada66ef63d96eec4afb00 results/intermediates/test_lib/reads_filtered_for_oligomap.fasta
fe5388094985e9604a302d39d2abc82c results/intermediates/test_lib/oligomap_transcriptome_report.txt
be7a0d92e57480190de57eb30baffa36 results/intermediates/extended_mirna_annotation_6_nt.gff3
8148cd880602255be166beb59bbed95a results/intermediates/genome_header.sam
09e24a504bfec37fee3d5ff1b5c7738e results/intermediates/exons.bed
4fb453846e88593d0cac13220ec2d685 results/intermediates/segemehl_genome_index.idx
d34fc868b861b1bc46db07a397dc0f10 results/intermediates/genome_processed.fa.fai
21e102e4ebd3508bb06f46366a3d578d results/intermediates/exons.gtf
003b92b245ac336e3d70a513033e1cee results/intermediates/transcriptome_trimmed_id.fa
44dbf7c3eae00d0bc8d5e1319123746c results/intermediates/chr_size.txt
cc5c3512dab0e269d82bd625de74198e results/intermediates/extended_primir_annotation_6_nt.gff3
f28cc0143ab6659bef3de3a7afa1dccc results/intermediates/mirna_annotations.gff3
2d437f8681f4248d4f2075f86debb920 results/intermediates/transcriptome.fa
7eb64c112830266bcf416ded60b4cf77 results/intermediates/segemehl_transcriptome_index.idx
4fba145540a2c61f29bfddfd0f5a4d4e results/intermediates/genome_processed.fa
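
A listing in this `checksum path` format can be checked against a finished run in the same spirit as `md5sum --check`. The Python sketch below is illustrative and is not the repository's test script; the function names are hypothetical. It also confirms that `d41d8cd98f00b204e9800998ecf8427e`, which appears for several files above, is the MD5 of empty input, so those expected files are empty.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(md5_listing, base_dir="."):
    """Return the paths whose file is missing or has a different checksum."""
    mismatches = []
    for line in md5_listing.splitlines():
        if not line.strip():
            continue
        # Split on the first whitespace run: "<checksum> <relative path>".
        expected, rel_path = line.split(maxsplit=1)
        target = Path(base_dir) / rel_path
        if not target.is_file() or md5_of_file(target) != expected:
            mismatches.append(rel_path)
    return mismatches


EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# Self-contained demo with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "empty.sam").touch()
    listing = (
        "d41d8cd98f00b204e9800998ecf8427e empty.sam\n"
        "00000000000000000000000000000000 missing.sam\n"
    )
    demo_mismatches = find_mismatches(listing, base_dir=tmp)

print(demo_mismatches)  # → ['missing.sam']
```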
2 changes: 1 addition & 1 deletion test/test_workflow_local_with_conda.sh
@@ -25,9 +25,9 @@ snakemake \
--use-conda \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose


# Snakemake report
snakemake \
--snakefile="../workflow/Snakefile" \
2 changes: 1 addition & 1 deletion test/test_workflow_local_with_singularity.sh
@@ -26,9 +26,9 @@ snakemake \
--singularity-args "--bind ${PWD}/../" \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose


# Snakemake report
snakemake \
--snakefile="../workflow/Snakefile" \
1 change: 1 addition & 0 deletions test/test_workflow_slurm_with_conda.sh
@@ -41,6 +41,7 @@ snakemake \
--use-conda \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose

# Snakemake report
1 change: 1 addition & 0 deletions test/test_workflow_slurm_with_singularity.sh
@@ -42,6 +42,7 @@ snakemake \
--singularity-args="--bind ${PWD}/../" \
--printshellcmds \
--rerun-incomplete \
--no-hooks \
--verbose

# Snakemake report
21 changes: 17 additions & 4 deletions workflow/Snakefile
@@ -31,6 +31,23 @@ validate(config, Path("../config/config_schema.json"))


OUT_DIR = Path(config["output_dir"])
INTERMEDIATES_DIR = Path(config["intermediates_dir"])
LOG_DIR = Path(f"{config['local_log']}/../")


###############################################################################
### onSuccess/onError handlers configuration
###############################################################################


onsuccess:
    print("\nWORKFLOW SUCCEED. Removing intermediate files.\n")
    shell("rm -rf {INTERMEDIATES_DIR}")


onerror:
    print("\nWORKFLOW FAILED. Check the log file in the log directory.\n")
    shell("cat {log} > {LOG_DIR}/failed_workflow.log")
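
For readers less familiar with Snakemake hooks, the two handlers can be mirrored in plain Python. This is a sketch of the equivalent file operations, not a replacement for the Snakefile code: the function names are hypothetical, and inside Snakemake the `log` variable holds the path to the run's log file. The last lines show why `LOG_DIR`, built as `<local_log>/../`, points at `logs` with the default `local_log` of `logs/local/`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def on_success(intermediates_dir):
    """Equivalent of `rm -rf {INTERMEDIATES_DIR}`: drop the whole tree."""
    shutil.rmtree(intermediates_dir, ignore_errors=True)


def on_error(run_log, log_dir):
    """Equivalent of `cat {log} > {LOG_DIR}/failed_workflow.log`."""
    log_dir = Path(log_dir)
    # Unlike the shell redirect, create the directory if it is missing.
    log_dir.mkdir(parents=True, exist_ok=True)
    target = log_dir / "failed_workflow.log"
    shutil.copyfile(run_log, target)
    return target


# Demo in a throwaway directory with hypothetical file contents.
with tempfile.TemporaryDirectory() as tmp:
    intermediates = Path(tmp) / "results" / "intermediates"
    intermediates.mkdir(parents=True)
    (intermediates / "genome_header.sam").write_text("@HD\n")
    on_success(intermediates)
    removed = not intermediates.exists()

    run_log = Path(tmp) / "snakemake.log"
    run_log.write_text("error details\n")
    copied_text = on_error(run_log, Path(tmp) / "logs").read_text()

# "<local_log>/../" with the schema default collapses to the parent directory.
log_dir_from_default = os.path.normpath("logs/local/" + "/../")
print(removed, log_dir_from_default)  # → True logs
```

Because the cleanup removes the entire intermediates directory, pointing `intermediates_dir` at a location that also holds final outputs would delete those too.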


###############################################################################
@@ -67,10 +84,6 @@ rule finish:
OUT_DIR / "{sample}" / "alignments_intersecting_mirna.sam",
sample=pd.unique(samples_table.index.values),
),
intersect_sam=expand(
OUT_DIR / "{sample}" / "alignments_intersecting_mirna_sorted_tag.sam",
sample=pd.unique(samples_table.index.values),
),
table=expand(
OUT_DIR / "TABLES" / "all_{mir}_counts.tab",
mir=[mir for mir in config["mir_list"] if mir != "isomir"],