+1 end position in _ampliconx_ecDNA_x_intervals.bed in dir _classification_bed_files #13

Open
solo7773 opened this issue Apr 12, 2023 · 5 comments

Comments

@solo7773

v0.4.13

The coordinates in the cycles file are, for example,
chr1 100 500

However, in the ecDNA_x_intervals.bed output file, they become
chr1 100 501

For other outputs, such as BFB_x_intervals.bed and unknown_x_intervals.bed, they are unchanged, namely
chr1 100 500

I tried to locate the code that causes the difference. It might be within the amplicon_annotation function definition in ac_annotation.py, but I am not sure. Please help to fix it.

I also noticed that the v0.4.16 changelog says "Result table creation bugfixes". Has the issue been fixed in the new version?

@jluebeck
Member

Hi, thanks for reaching out - I will attempt to reproduce the issue locally and get back to you in the next couple of days.

Re: 0.4.16 "result table creation bugfixes" - that fix was related to transforming the classification results into more condensed .tsv and .json files, which also reported the paths of some relevant files for each focal amp. They were not bugfixes to the classification table itself. Please let me know if I misunderstood your query though.

Thanks,
Jens

@jluebeck
Member

Sorry - can you clarify, are you experiencing this coordinate error in 0.4.13? Did you experience it also in the latest version 0.4.16?

@solo7773
Author

solo7773 commented Apr 13, 2023

> Sorry - can you clarify, are you experiencing this coordinate error in 0.4.13? Did you experience it also in the latest version 0.4.16?

Hi Jens,

Thanks for your timely reply.

I just tested, and the coordinate error also shows up in v0.4.16. Please find the output details below.

=-=-= _amplicon1_cycles.txt =-=-=

Interval 1 chr7 54754577 55441772
Interval 2 chr13 23999081 24718495
Interval 3 chr13 66726562 68688055
List of cycle segments
Segment 1 chr7 54770769 54806995
Segment 2 chr7 54770769 55085821
...
Segment 5 chr7 55154431 55163550
Segment 6 chr7 55154431 55287963
Segment 7 chr7 55163846 55184468
Segment 8 chr7 55200137 55329629
...
Segment 15 chr13 24117009 24407971
...
Segment 17 chr13 66843810 67028499
...
Segment 38 chr13 67924848 68361576
...
Segment 42 chr13 68365639 68396999
Segment 43 chr13 68489102 68530848
Segment 44 chr13 68520852 68573873

=-=-= _amplicon1_ecDNA_1_intervals.bed =-=-=

chr7 54770769 55085822
chr7 55154431 55329630
chr13 24117009 24407972
chr13 66843810 68361577
chr13 68520852 68573874

=-=-= _amplicon1_unknown_1_intervals.bed =-=-=
chr13 68365639 68397000 (PS: this error is not present in v0.4.13)
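
For anyone who wants to check the pattern in the listings above, here is a minimal Python sketch. It uses only the rows posted in this comment (the elided `...` rows are left out), and the function names are hypothetical helpers, not part of AmpliconClassifier. It tests whether every .bed row keeps a start coordinate seen in the cycles file while its end coordinate is a cycles-file end plus one.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)

# Rows copied from the cycles file excerpt above; elided rows are omitted.
CYCLES_EXCERPT = """\
Segment 1 chr7 54770769 54806995
Segment 2 chr7 54770769 55085821
Segment 5 chr7 55154431 55163550
Segment 6 chr7 55154431 55287963
Segment 7 chr7 55163846 55184468
Segment 8 chr7 55200137 55329629
Segment 15 chr13 24117009 24407971
Segment 17 chr13 66843810 67028499
Segment 38 chr13 67924848 68361576
Segment 42 chr13 68365639 68396999
Segment 43 chr13 68489102 68530848
Segment 44 chr13 68520852 68573873
"""

# Rows copied from the ecDNA_1 and unknown_1 .bed excerpts above.
BED_EXCERPT = """\
chr7 54770769 55085822
chr7 55154431 55329630
chr13 24117009 24407972
chr13 66843810 68361577
chr13 68520852 68573874
chr13 68365639 68397000
"""


def parse_segments(text: str) -> List[Interval]:
    """Parse 'Segment <id> <chrom> <start> <end>' rows, skipping anything else."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] == "Segment":
            segments.append((fields[2], int(fields[3]), int(fields[4])))
    return segments


def parse_bed(text: str) -> List[Interval]:
    """Parse whitespace-separated '<chrom> <start> <end>' rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def bed_ends_shifted_by_one(segments: List[Interval], bed_rows: List[Interval]) -> bool:
    """True if each .bed start is a segment start and each .bed end is a segment end + 1."""
    starts = {(chrom, start) for chrom, start, _ in segments}
    ends = {(chrom, end) for chrom, _, end in segments}
    return all(
        (chrom, start) in starts and (chrom, end - 1) in ends
        for chrom, start, end in bed_rows
    )


if __name__ == "__main__":
    print(bed_ends_shifted_by_one(parse_segments(CYCLES_EXCERPT), parse_bed(BED_EXCERPT)))
```

The check only compares boundary coordinates; it does not try to reproduce how segments are merged into the reported intervals.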

Thanks,
Nan

@jluebeck
Member

Hi Nan,

Thanks again for checking your local files and providing these examples. I realize now the issue is as follows:

AmpliconArchitect uses a 0-based fully-closed counting system (endpoint included). This allows 'intervals' of size 1bp to have the same start and end coordinate. It is useful for AA, but not for external programs.

AmpliconClassifier reports intervals using a 0-based half-closed counting system (endpoint excluded). This is what the UCSC genome browser and the IGV browser use. We made this change to improve compatibility with outside tools when viewing where intervals map.
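
To illustrate the two conventions, here is a minimal Python sketch of the conversion, assuming intervals are plain `(chrom, start, end)` tuples. The function names are hypothetical and this is not AmpliconClassifier's actual code. Moving from the fully-closed form to the half-closed form adds 1 to the end coordinate and leaves the interval length unchanged.

```python
from typing import Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def closed_to_half_open(interval: Interval) -> Interval:
    """Convert a 0-based fully-closed interval (end included) to half-closed (end excluded)."""
    chrom, start, end = interval
    if end < start:
        raise ValueError("closed interval must satisfy start <= end")
    return (chrom, start, end + 1)


def half_open_to_closed(interval: Interval) -> Interval:
    """Convert a 0-based half-closed interval (end excluded) back to fully-closed."""
    chrom, start, end = interval
    if end <= start:
        raise ValueError("half-closed interval must satisfy start < end")
    return (chrom, start, end - 1)


def closed_length(interval: Interval) -> int:
    """Number of bases covered when the end coordinate is included."""
    _, start, end = interval
    return end - start + 1


def half_open_length(interval: Interval) -> int:
    """Number of bases covered when the end coordinate is excluded."""
    _, start, end = interval
    return end - start


if __name__ == "__main__":
    aa_style = ("chr1", 100, 500)             # coordinates from the original report
    ac_style = closed_to_half_open(aa_style)  # ("chr1", 100, 501)
    print(ac_style, closed_length(aa_style), half_open_length(ac_style))
```

A 1 bp interval such as `("chr1", 100, 100)` is valid in the closed form and becomes `("chr1", 100, 101)` in the half-closed form.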

I have updated the README to contain this information.

In my examination of my own files, I found that all amplicon types are reported with the 0-based half-closed system (at least in 0.4.16 and on). If you find some amplicon types that don't obey that, please do let me know, and if possible provide the graph and cycles file so I can reproduce it locally.

Congratulations by the way on the eccDNAdb site. It has a lot of very nice functionality!

Thank you,
Jens

@solo7773
Author

solo7773 commented Apr 14, 2023

Hi Jens,

Thanks for your detailed explanation of the difference between the AmpliconArchitect and AmpliconClassifier outputs. I have recently been working with AmpliconClassifier outputs, and I will let you know if I see other issues.

Congratulations on your new publication in Nature. I also appreciate you generously sharing the Amplicon software suite, which has helped me a lot.

Warmest regards,
Nan
