Code crashes when chromosome IDs are digit-only #120

jmoellmann · 2024-10-31T20:41:52Z

When running pixy on datasets where chromosome IDs consist only of digits and start with zeros (e.g. ["0001", [...], "0016"]), the program breaks when reading from the temp files: pd.read_csv (line 328, main.py) automatically converts the IDs to numeric values, which strips the leading zeros and results in a KeyError on lines 363 and 370.

This is an edge case, since most chromosome IDs are not digit-only, but some tools do output numeric-only chromosome IDs.

The bug is independent of the pixy command, the populations file, and the system architecture, and it is very easy to reproduce.

An example line from a VCF that triggers the problem:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1 [...] XY
0001 1 . A . 100 . DP=[...] GT:[...]
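
The conversion described above can be reproduced outside pixy. The following is a minimal sketch, not pixy's actual code: the temp-file layout (`temp_text`) and the lookup dictionary (`chrom_lookup`) are hypothetical stand-ins, and only the `read_csv` call mirrors the one quoted in this issue.

```python
import io

import pandas as pd

# Hypothetical tab-separated temp-file row with no header; the real pixy
# temp-file layout may differ. Column 3 holds the chromosome ID here.
temp_text = "pi\tpop1\tpop2\t0001\t1\t100\n"

# Same call as the one reported at line 328 of main.py.
outpanel = pd.read_csv(io.StringIO(temp_text), sep="\t", header=None)

# pandas infers an integer dtype for column 3, so "0001" becomes 1.
parsed_id = outpanel[3].iloc[0]

# Hypothetical lookup keyed by the original string IDs from the VCF.
chrom_lookup = {"0001": "first chromosome", "0016": "last chromosome"}

try:
    chrom_lookup[parsed_id]
    lookup_failed = False
except KeyError:
    # The integer 1 is not a key; only the string "0001" is.
    lookup_failed = True

print(outpanel[3].dtype, parsed_id, lookup_failed)
```

Any later lookup that uses the original string ID as a key fails in the same way once the column has been read as an integer.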

I suggest the following fix at line 328, main.py:

< outpanel = pandas.read_csv(temp_file, sep='\t', header=None)
> outpanel = pandas.read_csv(temp_file, sep='\t', header=None, dtype={3: 'string'})
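
To illustrate the effect of the suggested `dtype` argument, here is a small sketch under the same assumptions as the suggested fix: the chromosome ID sits in column 3 of a headerless, tab-separated file. The helper `read_temp_panel` and the sample rows are hypothetical and are not part of pixy.

```python
import io

import pandas as pd


def read_temp_panel(handle, chrom_col=3):
    """Read a headerless TSV, keeping the chromosome column as text.

    `chrom_col` defaults to 3 to match the column index used in the
    suggested fix; the real temp-file layout is an assumption here.
    """
    return pd.read_csv(handle, sep="\t", header=None,
                       dtype={chrom_col: "string"})


# Hypothetical temp-file rows with digit-only, zero-padded chromosome IDs.
temp_text = (
    "pi\tpop1\tpop2\t0001\t1\t100\n"
    "pi\tpop1\tpop2\t0016\t1\t100\n"
)

fixed = read_temp_panel(io.StringIO(temp_text))

# Column 3 keeps its leading zeros; other columns are still type-inferred.
chrom_ids = list(fixed[3])
print(chrom_ids, fixed[4].dtype)
```

Restricting the override to the chromosome column leaves the numeric columns numeric, so downstream arithmetic on positions and counts should be unaffected.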

jmoellmann added the "bug" label (Something isn't working) on Oct 31, 2024
