I have been trying to follow the documentation in lmat-doc.txt to build a custom database for use with LMAT. I've been having issues doing so. I'll try to document a specific case here.
One step in the process of building a custom database is constructing a mapping file between NCBI Taxonomy Database identifiers and the full deflines from the multi-FASTA formatted file containing the reference sequences. I've followed the documentation below in that regard:
The mapping is specified as a tab delimited file with the first column containing the tax id and the second
column should contain the header associated with sequence stored in the input fasta file (WORK/test.fa below)
For example:
418127 >ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome
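Before handing the file to any script, it may help to confirm that every line really has the two-column layout quoted above. Here is a minimal sketch of such a check. `check_mapping_lines` is a hypothetical helper, not part of LMAT, and it assumes the tax id is a plain integer and that the header keeps its leading `>` as in the documentation's example.

```python
def check_mapping_lines(lines):
    """Return (line_number, reason) pairs for lines that are not 'taxid<TAB>header'.

    Hypothetical helper for illustration only; not part of LMAT.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, "expected 2 tab-separated columns, found %d" % len(fields))
            )
        elif not fields[0].isdigit():
            problems.append((number, "tax id is not an integer"))
        elif not fields[1].startswith(">"):
            # Assumption: headers keep the leading '>' as in the documented example.
            problems.append((number, "header does not start with '>'"))
    return problems
```

It can be run on an open file handle, for example `check_mapping_lines(open("GenomeToTaxID.txt"))`. A line separated by a space rather than a tab is reported as having one column.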
When I provide my constructed GenomeToTaxID.txt file to build_header_table.py, it breaks:
reading: /media/ephemeral/taltman/lmat/GenomeToTaxID.txt
Traceback (most recent call last):
  File "./build_header_table.py", line 44, in <module>
    gi_to_tid[t[4]] = t[0]
IndexError: list index out of range
Poking into the Python script, it seems to be expecting a file with at least five columns, not two. Changing t[4] to t[1] seems to fix it.
So, either there is a documentation bug, or there is a software bug.
Any feedback would be greatly appreciated. Thanks!
Hello, it's possible that this script was merged with another version that used the following convoluted five-column format:
Taxonomy id, taxonomy id, -1, otherid, header
As you can see, there is a lot of redundancy here, and I don't think it is needed.
It appears to me the best option would be to make the change: gi_to_tid[t[1]] = t[0] and use your original format.
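For illustration, the parsing step could also accept either layout rather than hard-coding one index. This is only a sketch under assumptions: `read_header_to_taxid` is a hypothetical function, not the actual code in build_header_table.py, and it assumes the header is the second column in the two-column format and the fifth column in the older five-column format described above.

```python
def read_header_to_taxid(lines):
    """Map each FASTA header to its tax id, accepting 2- or 5-column lines.

    Hypothetical sketch; not the actual build_header_table.py logic.
    """
    header_to_tid = {}
    for number, line in enumerate(lines, start=1):
        t = line.rstrip("\n").split("\t")
        if len(t) == 2:
            # Documented format: tax id, header
            header_to_tid[t[1]] = t[0]
        elif len(t) >= 5:
            # Older format: tax id, tax id, -1, other id, header
            header_to_tid[t[4]] = t[0]
        elif line.strip():
            # Fail with a clear message instead of an IndexError
            raise ValueError(
                "line %d: expected 2 or 5 tab-separated columns, found %d"
                % (number, len(t))
            )
    return header_to_tid
```

With this approach a malformed line produces a message naming the line number, which is easier to act on than "list index out of range".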