When running pixy on datasets where chromosome IDs consist only of digits and start with zeros (e.g. ["0001", [...], "0016"]), the program breaks when reading from the temp files: the IDs are automatically converted to numerics, which strips the leading zeros (pd.read_csv, line 328, main.py) and results in a KeyError on lines 363 and 370.
This is very much an edge case, as most chromosome IDs are not digit-only, but some tools do output numeric-only chromosome IDs.
The bug does not depend on the pixy command, the populations file, or the system architecture, and it is very easy to reproduce.
A line from an example VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT X1 [...] XY
0001 1 . A . 100 . DP=[...] GT:[...]
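The failure mode can be sketched outside of pixy. The snippet below is a minimal illustration, not pixy's actual code: the temp-file layout (chromosome ID in column 3) and the `by_chrom` lookup are assumptions made up for the example.

```python
import io
import pandas as pd

# Hypothetical stand-in for a tab-separated pixy temp file.
# The column layout is an assumption; only column 3 (chromosome ID) matters here.
temp_file = io.StringIO(
    "pi\tpop1\tpop2\t0001\t1\t10000\n"
    "pi\tpop1\tpop2\t0016\t1\t10000\n"
)

# Without a dtype hint, pandas infers an integer column and drops the leading zeros.
outpanel = pd.read_csv(temp_file, sep="\t", header=None)
chrom_ids = outpanel[3].tolist()

# Hypothetical lookup keyed by the original string IDs from the VCF.
by_chrom = {"0001": "first chromosome", "0016": "last chromosome"}

missing_key = None
try:
    by_chrom[chrom_ids[0]]
except KeyError as err:
    # The integer ID no longer matches the string key.
    missing_key = err.args[0]

print(chrom_ids, missing_key)
```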
I suggest the following fix at line 328, main.py:
< outpanel = pandas.read_csv(temp_file, sep='\t', header=None)
---
> outpanel = pandas.read_csv(temp_file, sep='\t', header=None, dtype={3: 'string'})
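A quick check of the effect of the `dtype` argument, again as a sketch on a made-up one-line temp file (the column layout is an assumption, with the chromosome ID in column 3 as in the suggested fix):

```python
import io
import pandas as pd

# Hypothetical one-line temp file; only column 3 (chromosome ID) matters here.
data = "pi\tpop1\tpop2\t0001\t1\t10000\n"

# Forcing column 3 to the pandas string dtype keeps the ID exactly as written.
outpanel = pd.read_csv(
    io.StringIO(data), sep="\t", header=None, dtype={3: "string"}
)

chrom = outpanel[3].iloc[0]
start = outpanel[4].iloc[0]  # other columns are still inferred as usual

print(repr(chrom), start)
```

Passing `dtype={3: str}` should behave similarly for this purpose, storing the IDs as plain Python strings in an object column.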