pre-rel
telatin committed Apr 30, 2021
1 parent ddf0c90 commit 69026be
Showing 15 changed files with 576 additions and 213 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ htmldocs/
 bin/*
 *.bai
 profiling
+multiqc/
112 changes: 10 additions & 102 deletions README.md
@@ -5,113 +5,21 @@ Tools to extract coverage informations from BAM (and CRAM) files, based on the
 coverage and physical coverage, input from streams
 and uses a memory-efficient algorithm.

-## covtobed
-```
-covToBed 2.0.0
-Usage: covtobed [options] [<BAM>]
-Arguments:
-<BAM> the alignment file for which to calculate depth (default: STDIN)
-Core options:
--p, --physical Calculate physical coverage
--s, --stranded Report coverage separate by strand
--w, --wig <SPAN> Output in wig format (using fixed <SPAN>)
-Target files:
--r, --regions <bed> Target file in BED or GFF format (detected with the extension)
--t, --type <feat> GFF feature type to parse [default: CDS]
--i, --id <ID> GFF identifier [default: ID]
-BAM reading options:
--T, --threads <threads> BAM decompression threads [default: 0]
--F, --flag <FLAG> Exclude reads with any of the bits in FLAG set [default: 1796]
--Q, --mapq <mapq> Mapping quality threshold [default: 0]
-Other options:
---debug Enable diagnostics
--h, --help Show help
-```
-
-## covtotarget
-will count the _total nucleotide coverage_ per feature in a BED or GFF file using as input the output of [covtobed](https://github.com/telatin/covtobed) (also from STDIN).
-```
-covToTarget
-Usage: covtotarget [options] <Target> [<covtobed-output>]
+## :book: Documentation

-Arguments:
+Full documentation is available online at the **[dedicated website](https://telatin.github.io/bamtocov/)**, or in
+this repository under `docs`.

-<Target> the BED (or GFF) file containing regions in which to count reads
-<covtobed-output> covtobed output, or STDIN if not provided
+## Installation

-Options:
+The BamToCov package is available from [BioConda](https://bioconda.github.io/recipes/bamtocov/README.html)

--g, --gff Force GFF for input (otherwise autodetected by .gff extension)
--t, --type <feat> GFF feature type to parse [default: CDS]
--i, --id <ID> GFF identifier [default: ID]
--l, --norm-len Normalize by gene length
--b, --bed-output Output format is BED-like (default is feature_name [tab] counts)
--h, --help Show help
 ```
-
-Example, can be used in a stream from the BAM emitter to covtobed:
-```bash
-cat input/mini.bam | covtobed | covtotarget input/mini.gff
-```
-
-Where _covtobed_ output is:
-```text
-seq1 0 9 0
-seq1 9 109 5
-[...]
-seq2 499 599 10
-seq2 599 1000 0
-```
-
-and `covtocounts` output is (extracts):
-```text
-MGLILCEK_00002 0
-MGLILCEK_00003 51
-MGLILCEK_00010 1000
-```
-with `--norm-len` and `--bed-output`:
-```
-seq0 299 400 ZERO_COV_CHR_2 0.0
-seq1 199 400 MGLILCEK_00001 3.487562189054726
-seq1 599 650 MGLILCEK_00002 0.0
-```
-
-
-## covtocounts
-will count the _number of alignments_ in a BAM file per feature of a target BED or GFF file (basically, adds GFF support to `read-count` found in [nim-hts-tools](https://github.com/brentp/hts-nim-tools))
-```
-covToCounts 0.4.1
-Usage: covtocounts [options] <Target> <BAM-or-CRAM>
-Arguments:
-<Target> the BED (or GFF) file containing regions in which to count reads
-<BAM-or-CRAM> the alignment file for which to calculate depth
-Options:
--T, --threads <threads> BAM decompression threads [default: 0]
--r, --fasta <fasta> FASTA file for use with CRAM files [default: ].
--F, --flag <FLAG> Exclude reads with any of the bits in FLAG set [default: 1796]
--Q, --mapq <mapq> Mapping quality threshold [default: 0]
--g, --gff Force GFF for input (otherwise autodetected by .gff extension)
--t, --type <feat> GFF feature type to parse [default: CDS]
--i, --id <ID> GFF identifier [default: ID]
--n, --rpkm Add a RPKM column
--l, --norm-len Add a counts/length column (after RPKM when both used)
---header Print header
---debug Enable diagnostics
--h, --help Show help
+conda install -y -c bioconda bamtocov
 ```

 ## References
-* Brent Pedersen, Aaron Quinlan, [hts-nim: scripting high-performance genomic analyses](https://academic.oup.com/bioinformatics/article/34/19/3387/4990493) (Bioinformatics)
-* Giovanni Birolo, Andrea Telatin, [covtobed: a simple and fast tool to extract coverage tracks from BAM files](https://joss.theoj.org/papers/10.21105/joss.02119) (JOSS)
+* Brent Pedersen, Aaron Quinlan,
+[hts-nim: scripting high-performance genomic analyses](https://academic.oup.com/bioinformatics/article/34/19/3387/4990493) (Bioinformatics)
+* Giovanni Birolo, Andrea Telatin,
+[covtobed: a simple and fast tool to extract coverage tracks from BAM files](https://joss.theoj.org/papers/10.21105/joss.02119) (JOSS)
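
The `-F, --flag` default of 1796 in the usage text above is a bitmask of SAM flags. Assuming the standard SAM flag values, it decomposes into unmapped (4), secondary (256), QC fail (512) and duplicate (1024). The Python sketch below decodes a mask and applies the "exclude if any bit is set" rule; `decode_flag` and `is_excluded` are hypothetical helper names used for illustration and are not part of bamtocov.

```python
# Standard SAM flag bits and short labels (labels are informal names).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag or mask."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_excluded(read_flag, mask=1796):
    """True when the read has at least one bit of the exclusion mask set."""
    return (read_flag & mask) != 0


print(decode_flag(1796))
print(is_excluded(0), is_excluded(16), is_excluded(1024))
```

Under this rule, forward (flag 0) and reverse-strand (flag 16) reads pass the default mask, while a read marked as duplicate (flag 1024) is excluded.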
2 changes: 1 addition & 1 deletion bamtocov.nimble
@@ -1,6 +1,6 @@
 # Package

-version = "2.0.2"
+version = "2.0.3"
 author = "Andrea Telatin, Giovanni Birolo"
 description = "BAM to Coverage"
 license = "MIT"
Binary file added input/alt.bam
Binary file not shown.
33 changes: 33 additions & 0 deletions input/alt.sam
@@ -0,0 +1,33 @@
@HD VN:1.6 SO:coordinate
@SQ SN:seq1 LN:1000
@SQ SN:seq2 LN:1000
@SQ SN:seq0 LN:1000
@PG ID:samtools PN:samtools VN:1.10 CL:samtools view -bS /local/giovanni/covtools/src/../input/mini.sam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.10 CL:samtools sort -o /local/giovanni/covtools/src/../input/mini.bam -
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.11 CL:samtools view -h mini.bam
read11 0 seq1 10 60 100M * 0 0 * *
read12 0 seq1 10 60 100M * 0 0 * *
t1_out1 0 seq1 190 60 100M * 0 0 * *
t1_out2 0 seq1 190 60 100M * 0 0 * *
t1_in1 0 seq1 201 60 100M * 0 0 * *
t1_in2 0 seq1 201 60 100M * 0 0 * *
t1_mid1 0 seq1 251 60 100M * 0 0 * *
t1_mid2 0 seq1 251 60 100M * 0 0 * *
t1_edge 0 seq1 300 60 100M * 0 0 * *
t1_end 0 seq1 380 60 100M * 0 0 * *
over_t1 0 seq1 401 60 100M * 0 0 * *
just_1X 0 seq1 651 60 100M * 0 0 * *
s2for1 0 seq2 500 60 100M * 0 0 * *
s2for2 0 seq2 500 60 100M * 0 0 * *
s2for3 0 seq2 500 60 100M * 0 0 * *
s2for4 0 seq2 500 60 100M * 0 0 * *
s4for_ 0 seq2 500 60 100M * 0 0 * *
s2rev1 16 seq2 500 60 100M * 0 0 * *
s2rev2 16 seq2 500 60 100M * 0 0 * *
s2rev3 16 seq2 500 60 100M * 0 0 * *
s2rev4 16 seq2 500 60 100M * 0 0 * *
s2rev1b 16 seq2 500 60 100M * 0 0 * *
s2rev2b 16 seq2 500 60 100M * 0 0 * *
s2rev3b 16 seq2 500 60 100M * 0 0 * *
s2rev4b 16 seq2 500 60 100M * 0 0 * *
s2rev4 16 seq2 500 60 100M * 0 0 * *
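
As a rough illustration of what these test reads exercise, the Python sketch below derives coverage intervals from alignment start positions with a sweep over start/end events. It is not covtobed's implementation: the function name `coverage_intervals` is hypothetical, every read is assumed to be a single ungapped match (as with the `100M` records above), and flags, mapping quality and strand are ignored.

```python
from collections import defaultdict


def coverage_intervals(reads, ref_lengths):
    """Return {ref: [(start, end, depth), ...]} in 0-based, half-open coordinates.

    reads: iterable of (ref_name, pos_1based, aligned_length) tuples.
    ref_lengths: dict mapping ref_name to its length (the @SQ LN values).
    """
    # +1 where a read starts, -1 just past where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for ref, pos, length in reads:
        start = pos - 1  # SAM POS is 1-based
        end = min(start + length, ref_lengths[ref])
        deltas[ref][start] += 1
        deltas[ref][end] -= 1

    result = {}
    for ref, ref_len in ref_lengths.items():
        intervals = []
        depth, prev = 0, 0
        for coord in sorted(deltas[ref]):
            if coord > prev:
                _append(intervals, prev, coord, depth)
                prev = coord
            depth += deltas[ref][coord]
        if prev < ref_len:
            _append(intervals, prev, ref_len, depth)
        result[ref] = intervals
    return result


def _append(intervals, start, end, depth):
    """Add an interval, merging it into the previous one when the depth is equal."""
    if intervals and intervals[-1][2] == depth and intervals[-1][1] == start:
        intervals[-1] = (intervals[-1][0], end, depth)
    else:
        intervals.append((start, end, depth))


# Two of the seq1 reads above (read11 and read12: POS 10, 100M).
reads = [("seq1", 10, 100), ("seq1", 10, 100)]
for ref, rows in coverage_intervals(reads, {"seq1": 1000}).items():
    for start, end, depth in rows:
        print(ref, start, end, depth, sep="\t")
```

For the two reads in the example this yields three intervals on `seq1`: 0 to 9 at depth 0, 9 to 109 at depth 2, and 109 to 1000 at depth 0.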
Binary file added input/copy.bam
Binary file not shown.
6 changes: 6 additions & 0 deletions input/phi/annotation.bed
@@ -0,0 +1,6 @@
NC_001422.1 50 221 PhiX_01 . + Prodigal:002006 CDS 0 ID=PhiX_01;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_01;product=hypothetical protein
NC_001422.1 389 848 PhiX_02 . + Prodigal:002006 CDS 0 ID=PhiX_02;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_02;product=hypothetical protein
NC_001422.1 847 964 PhiX_03 . + Prodigal:002006 CDS 0 ID=PhiX_03;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_03;product=hypothetical protein
NC_001422.1 1000 2284 PhiX_04 . + Prodigal:002006 CDS 0 ID=PhiX_04;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_04;product=hypothetical protein
NC_001422.1 2394 2922 PhiX_05 . + Prodigal:002006 CDS 0 ID=PhiX_05;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_05;product=hypothetical protein
NC_001422.1 2930 3917 PhiX_06 . + Prodigal:002006 CDS 0 ID=PhiX_06;inference=ab initio prediction:Prodigal:002006;locus_tag=GINNEHPO_00006;product=hypothetical protein
100 changes: 100 additions & 0 deletions input/phi/annotation.gff
@@ -0,0 +1,100 @@
##gff-version 3
##sequence-region NC_001422.1 1 5386
NC_001422.1 Prodigal:002006 CDS 51 221 . + 0 ID=PhiX_01;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_01;product=hypothetical protein
NC_001422.1 Prodigal:002006 CDS 390 848 . + 0 ID=PhiX_02;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_02;product=hypothetical protein
NC_001422.1 Prodigal:002006 CDS 848 964 . + 0 ID=PhiX_03;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_03;product=hypothetical protein
NC_001422.1 Prodigal:002006 CDS 1001 2284 . + 0 ID=PhiX_04;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_04;product=hypothetical protein
NC_001422.1 Prodigal:002006 CDS 2395 2922 . + 0 ID=PhiX_05;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_05;product=hypothetical protein
NC_001422.1 Prodigal:002006 CDS 2931 3917 . + 0 ID=PhiX_06;inference=ab initio prediction:Prodigal:002006;locus_tag=PhiX_06;product=hypothetical protein
##FASTA
>NC_001422.1
GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAA
AAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGAC
TGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTT
GCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAG
TGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTT
CATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATCTGAGTCCGAT
GCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT
TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCG
AAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTG
CTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCT
TTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACA
TTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTACGGAAAACATTATTAATGGCG
TCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAA
ACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG
TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTT
GCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATT
TTAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGG
CGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCG
TCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGACGCCGT
TGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACAT
TTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA
GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGC
CGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGG
TTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGC
TAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGGTTTCCGTTGCTGCCATCTCAA
AAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTC
TACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGCATACTGACCA
AGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC
CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGG
CTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACA
GACCTATAAACATTCTGTGCCGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCT
TGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGG
TGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCG
TGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGC
TGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA
AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCG
CCACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGT
TAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTC
GTGATAAAAGATTGAGTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTAATTTTTGCC
GCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTT
CAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCT
GTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATG
GATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGAT
GCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACC
CTCCCGACTGCCTATGATGTTTATCCTTTGAATGGTCGCCATGATGGTGGTTATTATACC
GTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAACGTTTATGTT
GGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGAAT
CAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG
CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAG
GCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATA
CTGTAGGCATGGGTGATGCTGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACC
CTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTC
TTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGTTG
GACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTG
CATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG
TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAG
AGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTT
CACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGA
AGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGC
AGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGTATTTTA
CCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATC
AGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT
CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAG
CTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTA
ATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTC
CAAATCTTGGAGGCTTTTTTATGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTGATT
ATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTA
CTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCT
TGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG
TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGT
TAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTA
TAGACCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGC
AGTTTTGCCGCAAGCTGGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTATAATT
ACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGA
AATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAGGCTCATGCTG
ATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT
ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTG
CCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTG
AGTATGGTACAGCTAATGGCCGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTC
CTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATA
GCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGG
ACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAG
CTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA
TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGC
TGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAA
TGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACG
ACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGC
TGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCA
AATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
7 changes: 7 additions & 0 deletions input/phi/annotation.tsv
@@ -0,0 +1,7 @@
locus_tag ftype length_bp gene EC_number COG product
PhiX_01 CDS 171 hypothetical protein
PhiX_02 CDS 459 hypothetical protein
PhiX_03 CDS 117 hypothetical protein
PhiX_04 CDS 1284 hypothetical protein
PhiX_05 CDS 528 hypothetical protein
PhiX_06 CDS 987 hypothetical protein
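
The three annotation files describe the same features in different coordinate conventions: GFF3 uses 1-based inclusive coordinates, BED uses 0-based half-open ones, and the `length_bp` column of the TSV matches the BED span. A minimal Python sketch of that relationship, using the values from the files above (the helper name `gff_to_bed` is hypothetical and not part of bamtocov):

```python
def gff_to_bed(start, end):
    """Convert a 1-based inclusive GFF3 interval to a 0-based half-open BED one."""
    return start - 1, end


# (feature, GFF start, GFF end) as listed in input/phi/annotation.gff
gff_features = [
    ("PhiX_01", 51, 221),
    ("PhiX_02", 390, 848),
    ("PhiX_03", 848, 964),
    ("PhiX_04", 1001, 2284),
    ("PhiX_05", 2395, 2922),
    ("PhiX_06", 2931, 3917),
]

for name, start, end in gff_features:
    bed_start, bed_end = gff_to_bed(start, end)
    # The BED span (end - start) equals the feature length in base pairs.
    print(name, bed_start, bed_end, bed_end - bed_start, sep="\t")
```

The printed start, end and length columns can be compared directly against `annotation.bed` and the `length_bp` column of `annotation.tsv`.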