Documentation / software bug for building custom database #11

taltman · 2020-04-23T17:38:33Z

Hi LMAT team!

I have been trying to follow the documentation in lmat-doc.txt to build a custom database for use with LMAT. I've been having issues doing so. I'll try to document a specific case here.

One step in the process of building a custom database is constructing a mapping file between NCBI Taxonomy Database identifiers and the full deflines from the multi-FASTA formatted file containing the reference sequences. I've followed the documentation below in that regard:

 The mapping is specified as a tab delimited file with the first column containing the tax id and the second
 column should contain the header associated with sequence stored in the input fasta file (WORK/test.fa below)
 For example:
 418127   >ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome

When I provide my constructed GenomeToTaxID.txt file to build_header_table.py, it breaks:

reading: /media/ephemeral/taltman/lmat/GenomeToTaxID.txt
Traceback (most recent call last):
  File "./build_header_table.py", line 44, in <module>
    gi_to_tid[t[4]] = t[0]
IndexError: list index out of range

Poking into the Python script, it seems to be expecting a file with at least five columns, not two. Changing t[4] to t[1] seems to fix it.
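
For reference, here is a minimal sketch of what I think is happening. It assumes the script splits each line on tabs to build `t` (I have not confirmed that from the rest of the script), and it uses the example line from the documentation as input:

```python
# Example mapping line taken from lmat-doc.txt: tax id, a tab, then the full header.
line = (
    "418127\t>ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|"
    "Staphylococcus aureus subsp. aureus Mu3, complete genome\n"
)

# Assumption: the script builds `t` by splitting on tabs.
t = line.rstrip("\n").split("\t")
n_fields = len(t)  # only two fields in the documented format

gi_to_tid = {}
try:
    gi_to_tid[t[4]] = t[0]  # statement from line 44 of build_header_table.py
    failed = False
except IndexError:
    failed = True  # there is no fifth column to index

# The change that worked for me: use the second column as the key.
gi_to_tid[t[1]] = t[0]
```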

So, either there is a documentation bug, or there is a software bug.

Any feedback would be greatly appreciated. Thanks!

jeallen · 2020-04-23T20:13:43Z

Hello. It's possible that this script was merged with another version that expected the following convoluted five-column format:
Taxonomy id, taxonomy id, -1, otherid, header
As you can see, there is a lot of redundancy here, and I don't think it is needed.

It appears to me the best option would be to make the change: gi_to_tid[t[1]] = t[0] and use your original format.
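
If files in both layouts need to keep working, one possible alternative is to key on the last column rather than a fixed index. This is only a sketch, not the script's actual code: the helper names `parse_mapping_line` and `load_mapping` are hypothetical, and it assumes tab-delimited lines with the tax id first and the header last in both formats.

```python
def parse_mapping_line(line):
    """Return (header, tax_id) from one tab-delimited mapping line.

    Assumption: the tax id is the first column and the header is the last,
    which holds for both the two-column and the five-column layout.
    """
    t = line.rstrip("\n").split("\t")
    if len(t) < 2:
        raise ValueError("expected at least 2 tab-delimited columns: %r" % line)
    return t[-1], t[0]


def load_mapping(lines):
    """Build the header -> tax id dictionary, skipping blank lines."""
    gi_to_tid = {}
    for line in lines:
        if not line.strip():
            continue
        header, tax_id = parse_mapping_line(line)
        gi_to_tid[header] = tax_id
    return gi_to_tid


# Usage with placeholder values for both layouts.
two_col = "418127\t>headerA\n"
five_col = "418127\t418127\t-1\totherid\t>headerB\n"
mapping = load_mapping([two_col, "\n", five_col])
```

The simpler one-line change above is enough if only the documented two-column format is used.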
