Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deletion variant not called #125

Open
swomics opened this issue Mar 4, 2025 · 12 comments
Open

Deletion variant not called #125

swomics opened this issue Mar 4, 2025 · 12 comments

Comments

@swomics
Copy link

swomics commented Mar 4, 2025

Hi, I'm trying to call a verified deletion variant using some low coverage HiFi reads. I have to reduce the min-support to 1 for it to be identified in the output. However it appears as two records and is not called, whereas other indels with the same coverage seem to be called correctly. I've attached a small subset of the bam and vcf for the region, in case that helps

test.zip

@kcleal
Copy link
Owner

kcleal commented Mar 4, 2025

Hi @swomics,
Thanks for sending the test data, I will take a look

kcleal added a commit that referenced this issue Mar 4, 2025
kcleal added a commit that referenced this issue Mar 4, 2025
@kcleal
Copy link
Owner

kcleal commented Mar 4, 2025

Hi @swomics,
There were a couple of things going on. Firstly, the example reads you sent have quite a high divergence for HiFi data so they were getting filtered out. I changed to --divergence 0.2 which stopped the filtering.

However, there was also a bug which was overriding min-support = 1, and preventing any calls! The bug should now be fixed. It will be a few days before a proper release, but you can download one of the pre-built wheels from the workflow page when they finish building https://github.com/kcleal/dysgu/actions/runs/13658034481

kcleal added a commit that referenced this issue Mar 4, 2025
* Dysgu_dev <-- master (#121)

* Dysgu dev (#115)

* dysgu_dev <- master (#108)

* correct usage of update_filter_value in filter_normals

It seems like at some point the usage of update_filter_value changed, because in several spots, it is missing the sample_name argument. This breaks `dysgu filter --keep-all`.

* Update main.yml

* Update main.yml

* Update requirements.txt

* updated build process to pyproject.toml (#103)

* v1.6.6 updated README

* Update main.yml

* Added pyproject.toml

* Updated pyproject.toml and setup.py

* Update pyproject.toml

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Better recall+precision for long-reads. Improved merging pipeline. Faster runtime. Code refactoring. WIP # [skip ci]

* Fixed issues with overriding presets. --mode option updated # [skip ci]

* Fixed issues with overriding presets. --mode option updated to support r10 and revio, new presets available. --no-gt flat removed # [skip ci]

* Fixed CLI issues for tests # [skip ci]

* Fixed API issue

* v1.7.0

* added deprecated warning for mode

* Added filter for high divergence reads (long-reads only) # [skip ci]

* Better read merging for long reads. New clustering methods for long reads. Improved runtime.

* Update main.yml

* Update main.yml

* Update osx-deps

* Update main.yml macos11

* Update main.yml

* Update main.yml

* Update main.yml

* Update osx-deps

* Update osx-deps

* Update osx-deps

* Update osx-deps

* Update main.yml

* Update osx-deps

* Update main.yml

* Updated workflow

* Updated workflow

* Updated workflow

* Update main.yml

* Updated workflow

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Fixes compile error for clang

* Fixed merge None type error. Added wb3 as default compression level for long-reads (run-pipeline) (#116)

* Updated workflow

* Updated workflow

* Fixed compiler warning

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Fixed compile errors. Add logging for HP tag usage

* Added superintervals dependency

* Added PSET phaseset HP haplotype tag and AF allele frequency tags to output

* Fixed index error

* Changed empty PSET and HP to 0

* PSET and HP default to empty string

* PSET and HP default to '-1' and '0'

* #125 better calling of low support SVs

* #125 better calling of low support SVs

* Updated requirements.txt

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>
@swomics
Copy link
Author

swomics commented Mar 4, 2025

Thank you for such a quick response! I will try again tomorrow and see if the divergence parameter gets the behaviour I was looking for.

@kcleal
Copy link
Owner

kcleal commented Mar 4, 2025

No problem. I recommend getting the update though. You can download wheels for Mac or Linux here https://github.com/kcleal/dysgu/actions/runs/13658034481. These can be unzipped and installed using pip install wheelfile

@swomics
Copy link
Author

swomics commented Mar 5, 2025

Hello, I've tried again with the new build, divergence 0.2, min-support 2 & min-support 1 and it is still giving me the same behaviour (nothing called for the known SV with min-support 2 and very low quality candidate with no_call at min-support 1)

kcleal added a commit that referenced this issue Mar 5, 2025
* Dysgu_dev <-- master (#121)

* Dysgu dev (#115)

* dysgu_dev <- master (#108)

* correct usage of update_filter_value in filter_normals

It seems like at some point the usage of update_filter_value changed, because in several spots, it is missing the sample_name argument. This breaks `dysgu filter --keep-all`.

* Update main.yml

* Update main.yml

* Update requirements.txt

* updated build process to pyproject.toml (#103)

* v1.6.6 updated README

* Update main.yml

* Added pyproject.toml

* Updated pyproject.toml and setup.py

* Update pyproject.toml

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Better recall+precision for long-reads. Improved merging pipeline. Faster runtime. Code refactoring. WIP # [skip ci]

* Fixed issues with overriding presets. --mode option updated # [skip ci]

* Fixed issues with overriding presets. --mode option updated to support r10 and revio, new presets available. --no-gt flat removed # [skip ci]

* Fixed CLI issues for tests # [skip ci]

* Fixed API issue

* v1.7.0

* added deprecated warning for mode

* Added filter for high divergence reads (long-reads only) # [skip ci]

* Better read merging for long reads. New clustering methods for long reads. Improved runtime.

* Update main.yml

* Update main.yml

* Update osx-deps

* Update main.yml macos11

* Update main.yml

* Update main.yml

* Update main.yml

* Update osx-deps

* Update osx-deps

* Update osx-deps

* Update osx-deps

* Update main.yml

* Update osx-deps

* Update main.yml

* Updated workflow

* Updated workflow

* Updated workflow

* Update main.yml

* Updated workflow

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Fixes compile error for clang

* Fixed merge None type error. Added wb3 as default compression level for long-reads (run-pipeline) (#116)

* Updated workflow

* Updated workflow

* Fixed compiler warning

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>

* Fixed compile errors. Add logging for HP tag usage

* Added superintervals dependency

* Added PSET phaseset HP haplotype tag and AF allele frequency tags to output

* Fixed index error

* Changed empty PSET and HP to 0

* PSET and HP default to empty string

* PSET and HP default to '-1' and '0'

* #125 better calling of low support SVs

* #125 better calling of low support SVs

* Updated requirements.txt

* Fixed large record output, switched to symbolic

---------

Co-authored-by: Dr. K. D. Murray <[email protected]>
@kcleal
Copy link
Owner

kcleal commented Mar 5, 2025

Ok, sorry about that. Would you be able to share the vcf file for that record? And if possible the log output for dysgu? Thanks

@kcleal
Copy link
Owner

kcleal commented Mar 6, 2025

Using the test data you sent, dysgu found a 3 SVs - are these the ones you were expecting?

##command="/home/kez/miniforge3/envs/py311/bin/dysgu call --divergence 0.2 --symbolic-sv-size 0 --mode pacbio-revio renamed_file.fasta wd test.bam -x"
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	test
FR989875.1	12248982	1	C	C	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12249749;CHR2=FR989875.1;GRP=1;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=767;KIND=extra-regional;GC=35.31;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=128;OL=0;SU=8;WR=4;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	1/1:12:-1:0:1.0:60.0:0.0:8:4:0:0:0:0:1.11:0.0:5:3:1:0:0:0:-1.0:-1.0:-1.0:0.89
FR989875.1	12250402	3	A	A	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12255742;CHR2=FR989875.1;GRP=3;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=5340;KIND=extra-regional;GC=35.97;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=278;OL=0;SU=8;WR=4;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	1/1:12:-1:0:1.0:60.0:0.0:8:4:0:0:0:0:1.11:0.0:5:3:1:0:0:0:-1.0:-1.0:-1.0:0.841
FR989875.1	12257853	5	T	T	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12259528;CHR2=FR989875.1;GRP=5;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=1675;KIND=extra-regional;GC=37.38;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=113;OL=0;SU=4;WR=2;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	./.:.:-1:0:1.0:60.0:0.0:4:2:0:0:0:0:1.66:0.0:5:1:1:0:0:0:-1.0:-1.0:-1.0:0.728

gw renamed_file.fasta -b test.bam -v test.vcf --track test.vcf

Image

@swomics
Copy link
Author

swomics commented Mar 6, 2025

Here are the outputs, perhaps it's the difference in the parameters? Initially I ran with the sequel2 flag (it's older HiFi data), but I had the same issue.

2025-03-06 11:01:30,936 [INFO ] [dysgu-run] Version: 1.8.1
2025-03-06 11:01:30,936 [INFO ] run --min-support 1 --divergence 0.2 --pl pacbio -x genome.fna G7-2013-13_wd G7-2013-13.bam
2025-03-06 11:01:30,936 [INFO ] Destination: G7-2013-13_wd
2025-03-06 11:01:58,711 [INFO ] dysgu fetch G7-2013-13.bam written to G7-2013-13_wd/G7-2013-13.dysgu_reads.bam, n=170309, time=0:00:27 h:m:s
2025-03-06 11:01:58,711 [INFO ] Input file is: G7-2013-13_wd/G7-2013-13.dysgu_reads.bam
2025-03-06 11:01:58,714 [WARNING] Warning: no @rg, using input file name as sample name for output: G7-2013-13.dysgu_reads
2025-03-06 11:01:58,714 [INFO ] Sample name: G7-2013-13.dysgu_reads
2025-03-06 11:01:58,714 [INFO ] Writing vcf to stdout
2025-03-06 11:01:58,715 [INFO ] Running pipeline
2025-03-06 11:02:10,443 [INFO ] Inferred read length: 14580.0
2025-03-06 11:02:10,445 [INFO ] Max clustering dist: 1050
2025-03-06 11:02:10,445 [INFO ] Building cluster graph
2025-03-06 11:02:19,767 [INFO ] Total input reads: 170309
2025-03-06 11:02:19,795 [INFO ] Graph constructed
2025-03-06 11:02:19,796 [INFO ] Minimum support: 1
2025-03-06 11:04:59,633 [INFO ] Number of components: 187201. N candidates: 186862
2025-03-06 11:04:59,756 [INFO ] Re-alignment of soft-clips done. N candidates: 186855
2025-03-06 11:12:53,420 [INFO ] Number of candidate SVs merged: 71419
2025-03-06 11:12:53,420 [INFO ] Number of candidate SVs after merge: 115436
2025-03-06 11:12:53,483 [INFO ] Loaded n=33 chromosome coverage arrays from G7-2013-13_wd
2025-03-06 11:13:21,746 [INFO ] Loading Model: ~/lib/python3.12/site-packages/dysgu/dysgu_model.1.pkl.gz
2025-03-06 11:13:21,979 [INFO ] Model config: pacbio, diploid: True, contig features: True. N features: 26
2025-03-06 11:13:43,321 [INFO ] dysgu call G7-2013-13_wd/G7-2013-13.dysgu_reads.bam complete, n=115343, time=0:11:44 h:m:s
2025-03-06 11:13:43,488 [INFO ] dysgu run G7-2013-13.bam complete, time=0:12:12 h:m:s

FR989875.1 12257853 190306 T <DEL> . lowProb SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12259528;CHR2=FR989875.1;GRP=190306;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=1675;CONTIGA=ATAAAAAATATTAATATCCATTAATTTAAATACATGATTGCAAATGTGGTGCATATGCGTGTATTATACATATAAACTTTCCCCTTGAACCACTTTGTCTATTGATGGAAACCGCATTAAAATCCGTTGCGTAGTTTAAAAGATCTAAGCGTACAGATGGCGGGAAGCGTTAGACACAGTTCAACAATTTATCAACTCTGGCAGCAGTCAGTACTGGATAAGGCCGTAGTACGTCTTATTTTTGGATCAT;KIND=extra-regional;GC=35.6;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=21;OL=0;SU=2;WR=1;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB ./.:.:-1:0:1.0:60.0:0.0:2:1:0:0:0:0:1.17:0.0:26:1:0:0:0:0:-1.0:-1.0:-1.0:0.249
FR989875.1 12257853 190312 T <DEL> . lowProb SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12259528;CHR2=FR989875.1;GRP=190312;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=1675;CONTIGA=ATAAAAAATATTAATATCCATTAATTTAAATACATGATTGCAAATGTGGTGCATATGCGTGTATTATACATATAAACTTTCCCTTGAACCACTTTGTCTATTGATGGAAACCGCATTAAAATCCGTTGCGTAGTTTAAAAGATCTAAGCGTACAGATGGCGGGAAGCGTTAGACACAGTTCAACAATTTATCAACTCTGGCAGCAGTCAGTACTGGATAAGGCCGTAGTACGTCTTATTTTTGGATCAT;KIND=extra-regional;GC=35.34;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=21;OL=0;SU=2;WR=1;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB ./.:.:-1:0:1.0:60.0:0.0:2:1:0:0:0:0:2.15:0.0:26:0:1:0:0:0:-1.0:-1.0:-1.0:0.219

@swomics
Copy link
Author

swomics commented Mar 6, 2025

Using the test data you sent, dysgu found a 3 SVs - are these the ones you were expecting?

##command="/home/kez/miniforge3/envs/py311/bin/dysgu call --divergence 0.2 --symbolic-sv-size 0 --mode pacbio-revio renamed_file.fasta wd test.bam -x"
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	test
FR989875.1	12248982	1	C	C	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12249749;CHR2=FR989875.1;GRP=1;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=767;KIND=extra-regional;GC=35.31;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=128;OL=0;SU=8;WR=4;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	1/1:12:-1:0:1.0:60.0:0.0:8:4:0:0:0:0:1.11:0.0:5:3:1:0:0:0:-1.0:-1.0:-1.0:0.89
FR989875.1	12250402	3	A	A	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12255742;CHR2=FR989875.1;GRP=3;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=5340;KIND=extra-regional;GC=35.97;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=278;OL=0;SU=8;WR=4;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	1/1:12:-1:0:1.0:60.0:0.0:8:4:0:0:0:0:1.11:0.0:5:3:1:0:0:0:-1.0:-1.0:-1.0:0.841
FR989875.1	12257853	5	T	T	.	PASS	SVMETHOD=DYSGUv1.8.1;SVTYPE=DEL;END=12259528;CHR2=FR989875.1;GRP=5;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=1675;KIND=extra-regional;GC=37.38;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=113;OL=0;SU=4;WR=2;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio	GT:GQ:PSET:HP:AF:MAPQP:MAPQS:SU:WR:PE:SR:SC:BND:NMB:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB	./.:.:-1:0:1.0:60.0:0.0:4:2:0:0:0:0:1.66:0.0:5:1:1:0:0:0:-1.0:-1.0:-1.0:0.728

gw renamed_file.fasta -b test.bam -v test.vcf --track test.vcf

Image

Yes, it is the 1675 deletion that I would like to call and genotype in short read sets.

@kcleal
Copy link
Owner

kcleal commented Mar 6, 2025

Its a little unclear what is going wrong here. If you run the test sample you sent me, does dysgu 1.8.1 now find the SV in question? I think I might need another large test bam file to reproduce the issue. Thanks!

@swomics
Copy link
Author

swomics commented Mar 6, 2025

If I run the command as above

dysgu call --divergence 0.2 --symbolic-sv-size 0 --mode pacbio-revio ../../Sanger_genome2/GCA_905404145.2_ilBisBetu1.2_genomic.fna wd test.bam -x
2025-03-06 16:16:11,090 [INFO ] [dysgu-call] Version: 1.8.1
2025-03-06 16:16:11,091 [INFO ] Input file is: test.bam
2025-03-06 16:16:11,091 [INFO ] call --divergence 0.2 --symbolic-sv-size 0 --mode pacbio-revio ../../Sanger_genome2/GCA_905404145.2_ilBisBetu1.2_genomic.fna wd test.bam -x
2025-03-06 16:16:11,091 [WARNING] Warning: no @rg, using input file name as sample name for output: test
2025-03-06 16:16:11,091 [INFO ] Sample name: test
2025-03-06 16:16:11,091 [INFO ] Writing vcf to stdout
2025-03-06 16:16:11,091 [INFO ] Running pipeline
2025-03-06 16:16:11,092 [INFO ] Sequence divergence: 0.2
2025-03-06 16:16:11,092 [INFO ] Building cluster graph
2025-03-06 16:16:11,142 [INFO ] Total input reads: 5
2025-03-06 16:16:11,142 [INFO ] Graph constructed
2025-03-06 16:16:11,142 [INFO ] Inferred minimum support: 2
2025-03-06 16:16:11,193 [INFO ] Number of components: 8. N candidates: 3
2025-03-06 16:16:11,197 [INFO ] Number of candidate SVs merged: 0
2025-03-06 16:16:11,197 [INFO ] Number of candidate SVs after merge: 3
2025-03-06 16:16:11,199 [INFO ] Loaded n=1 chromosome coverage arrays from wd
2025-03-06 16:16:11,199 [INFO ] Number of candidate SVs dropped with support < min support: 0
Traceback (most recent call last):
File "/bin/miniconda3/envs/dysgu/bin/dysgu", line 8, in
sys.exit(cli())
^^^^^
File "
/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/core.py", line 1161, in call
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "
/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "
/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "
/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/dysgu/main.py", line 464, in call_events
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 605, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 512, in dysgu.cluster.pipe1
File "~/bin/miniconda3/envs/dysgu/lib/python3.12/site-packages/dysgu/post_call.py", line 350, in get_ref_base
end - e.posA
^^^
UnboundLocalError: cannot access local variable 'end' where it is not associated with a value

@kcleal
Copy link
Owner

kcleal commented Mar 6, 2025

Thats odd, I think that bit of code has been replaced in a newer build already. Would you mind testing with the latest build of v1.8.1 found here:

https://github.com/kcleal/dysgu/actions/runs/13679919289

Thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants