Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosdepth behaves abnormal. #3581

Closed
jxshi opened this issue Dec 10, 2021 · 4 comments
Closed

mosdepth behaves abnormal. #3581

jxshi opened this issue Dec 10, 2021 · 4 comments

Comments

@jxshi
Copy link

jxshi commented Dec 10, 2021

Version info

  • bcbio version (bcbio_nextgen.py --version): 1.2.9a
  • OS name and version (lsb_release -ds): CentOS Linux release 7.8.2003 (Core)

To Reproduce
Exact bcbio command you have used:

bcbio_nextgen.py ../config/hu.yaml -n 96

Your yaml configuration file:

details:
  - analysis: variant2
    genome_build: GRCh37
    # In order to do paired variant calling, samples should belong to the
    # same batch ("batch" under "metadata" below") and have a "phenotype"
    # field stating either "normal" or "tumor". For each batch there
    # should be a sample with "tumor" phenotype and a sample with "normal"
    # phenotype (no more than two samples per batch)
    metadata:
       batch: your-batch-name
       phenotype: tumor # or "normal"
    algorithm:
      #trim_reads: [fastp]
      #align_split_size: false
      aligner: minimap2
      platform: Illumina
      save_diskspace: true
      mark_duplicates: true
      remove_lcr: true
      recalibrate: false
      realign: false
      variantcaller: [vardict,varscan]
      #variantcaller: [vardict]
      tools_off:
      - gatk4
      - picard
      - mosdepth
      min_allele_fraction: 0.5
      #use_lowfreq_filter: false
      #svcaller: [lumpy, manta]
      #svprioritize: cancer/civic
      #hlacaller: optitype
      exclude_regions: [lcr]
      ensemble:
        numpass: 2
        use_filtered: true
      # for targetted projects, set the region
      # variant_regions: /path/to/your.bed

Log files (could be found in work/log)
Please attach (10MB max): bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.

bcbio-nextgen-commands log

[2021-12-10T02:35Z] /home/zzusah/local/share/bcbio/anaconda/envs/htslib1.10/bin/mosdepth -t 47 /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.bam -n --thresholds 1,5,25 --by <(awk 'BEGIN {FS="\t"}; {print $1 FS "0" FS $2}' /home/zzusah/local/share/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.fai) && echo '## Viral sequences (from https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files) found in unmapped reads' > /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt &&echo '## Sample: huqiusheng' >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt && echo '#virus    size    depth   1x      5x      25x     reads   reads_pct' >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt && paste <(zcat /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.regions.bed.gz) <(zgrep -v ^# /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.thresholds.bed.gz) <(samtools idxstats /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.bam | grep -v '*') | awk 'BEGIN {FS="\t"} { print $1 FS $3 FS $4 FS $10/$3 FS $11/$3 FS $12/$3 FS $15 FS $15/85589025}' | sort -n -r -k 5,5 >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt

bcbio-nextgen-debug.log

[2021-12-10T02:35Z] Analyse coverage of viral genomes
[2021-12-10T02:35Z] bam.nim(381)             open
[2021-12-10T02:35Z] Error: unhandled exception: invalid bgzf file [ValueError]
[2021-12-10T02:35Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/home/zzusah/local/share/bcbio/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 26, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/home/zzusah/local/share/bcbio/anaconda/lib/python3.7/site-packages/bcbio/provenance/do.py", line 106, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /home/zzusah/local/share/bcbio/anaconda/envs/htslib1.10/bin/mosdepth -t 47 /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.bam -n --thresholds 1,5,25 --by <(awk 'BEGIN {FS="\t"}; {print $1 FS "0" FS $2}' /home/zzusah/local/share/bcbio/genomes/Hsapiens/GRCh37/viral/gdc-viral.fa.fai) && echo '## Viral sequences (from https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files) found in unmapped reads' > /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt &&echo '## Sample: huqiusheng' >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt && echo '#virus       size    depth   1x      5x      25x     reads   reads_pct' >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt && paste <(zcat /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.regions.bed.gz) <(zgrep -v ^# /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.thresholds.bed.gz) <(samtools idxstats /mnt/ONCOBOX/zzusah/data/results/hu/work/qc/huqiusheng/viral/huqiusheng-gdc-viral.bam | grep -v '*') | awk 'BEGIN {FS="\t"} { print $1 FS $3 FS $4 FS $10/$3 FS $11/$3 FS $12/$3 FS $15 FS $15/85589025}' | sort -n -r -k 5,5 >> /mnt/ONCOBOX/zzusah/data/results/hu/work/bcbiotx/tmpy08bxy2a/huqiusheng-gdc-viral-completeness.txt
bam.nim(381)             open
Error: unhandled exception: invalid bgzf file [ValueError]
' returned non-zero exit status 1.
@naumenko-sa
Copy link
Contributor

yes, this one was giving me a hard time as well.

The very issue is well known for many years in the old samtools code:
lh3/samtools#7

But why we are hitting it here, I am not sure. In the debugger if you delete the viral.bam and re-creates just well.

I've made the samtools call safer here:
#3582

And it seems to be solving the issue in my bcbio run.

Let me know how it works for you (you'd need to upgrade to the latest code).

Sergey

@jxshi
Copy link
Author

jxshi commented Dec 12, 2021

Hi @naumenko-sa ,

I circumvented this issue by turning off viral in the template yaml file with the following option.

tools_off:
- viral

And finally, it went through. Cheers!

Best,
Jianxiang

@jxshi jxshi closed this as completed Dec 12, 2021
@naumenko-sa
Copy link
Contributor

Thanks for confirming! In my project, it finished with multicore bcbio (one-node, as opposed to ipython).

@naumenko-sa
Copy link
Contributor

see also this fix: #3587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants