Code crashes when chromosome ID's are digit-only #120

Open
jmoellmann opened this issue Oct 31, 2024 · 0 comments
Labels
bug Something isn't working


jmoellmann commented Oct 31, 2024

When running pixy on datasets where chromosome IDs consist only of digits and start with zeros (e.g. ["0001", [...], "0016"]), the program breaks when reading from the temp files: pd.read_csv (line 328, main.py) automatically converts the IDs to numeric values, stripping the leading zeros, which results in a KeyError on lines 363 and 370.
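A minimal sketch of the failure outside of pixy. The temp-file layout (chromosome ID in column 3) and the `chrom_lengths` dictionary are hypothetical stand-ins; only the `read_csv` call mirrors the one at line 328.

```python
import io

import pandas as pd

# Hypothetical stand-in for a pixy temp file: tab-separated, no header,
# with the chromosome ID assumed to sit in column 3.
temp_text = (
    "pi\tpopA\tpopB\t0001\t1\t10000\n"
    "pi\tpopA\tpopB\t0016\t1\t10000\n"
)

# Same call shape as the current code: no dtype given, so pandas infers types.
outpanel = pd.read_csv(io.StringIO(temp_text), sep="\t", header=None)

# The digit-only IDs are inferred as integers and lose their leading zeros.
print(outpanel[3].tolist())  # [1, 16]

# Hypothetical lookup keyed by the original string IDs from the VCF.
chrom_lengths = {"0001": 1000, "0016": 2000}

key_error_raised = False
try:
    chrom_lengths[outpanel[3].iloc[0]]
except KeyError:
    # The integer 1 is not a key in a dictionary keyed by "0001".
    key_error_raised = True

print(key_error_raised)  # True
```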

This is an edge case, since most chromosome IDs are not digit-only, but some tools do output numeric-only chromosome IDs.

This bug is independent of the pixy command, the populations file, and the system architecture, and it is very easy to reproduce.

An example header and record line from a VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1 [...] XY
0001 1 . A . 100 . DP=[...] GT:[...]

I suggest the following fix at line 328, main.py:

< outpanel = pandas.read_csv(temp_file, sep='\t', header=None)
---
> outpanel = pandas.read_csv(temp_file, sep='\t', header=None, dtype={3: 'string'})
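A quick check of the suggested change, as a sketch under the same assumption that the chromosome ID is in column 3. The sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical temp-file contents; the chromosome ID is assumed to be column 3.
temp_text = (
    "pi\tpopA\tpopB\t0001\t1\t10000\n"
    "pi\tpopA\tpopB\t0016\t1\t10000\n"
)

# Force column 3 to be read as text so the IDs are kept verbatim.
fixed = pd.read_csv(
    io.StringIO(temp_text),
    sep="\t",
    header=None,
    dtype={3: "string"},
)

print(fixed[3].tolist())  # ['0001', '0016']

# Columns not named in dtype are still inferred, so positions stay numeric.
print(fixed[4].tolist())  # [1, 1]
```

Passing a dict to `dtype` only affects the listed column, so the numeric columns in the temp file should not change type.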

@jmoellmann jmoellmann added the bug Something isn't working label Oct 31, 2024