kmersim produce empty similarities.tsv file #43

ceci4323 · 2024-12-03T14:54:24Z

Hi,
Thank you for continuing to develop this tool. I'm a big supporter of the pipeline and have used it numerous times for different pathogens. I am trying to compare the k-mer sets with the command straingst kmersim --all-vs-all -t 4 -S jaccard -S subset strainge_db/*.hdf5 > similarities.tsv. My database consists of Salmonella genomes from NCBI, but when I run the command, the output file is empty. Could this be because the genomes are too similar? Salmonella is a very clonal bacterium, so I could imagine that this causes problems.

My question is: is there any way to avoid this? If not, can the kmersim step be skipped so that I can go straight to the clustering step? I was considering building a custom similarities.tsv file from the filenames in the strainge_db folder.

Kind regards,

m-nikolaidis · 2024-12-24T17:58:17Z

Hello, I had a similar problem to yours, although with a different database. In my case the cause was a few files that came out empty after the kmerize command (the culprit was Esch_coli_KE47).
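A quick way to look for such files before rerunning anything is to compare file sizes. The sketch below is only a heuristic and rests on an assumption: that a k-mer set left empty by kmerize produces a much smaller .hdf5 file than its neighbours. The function name `find_suspicious_files` and the 10% cutoff are illustrative choices, not part of StrainGE.

```python
from pathlib import Path
from statistics import median


def find_suspicious_files(db_dir, pattern="*.hdf5", fraction=0.1):
    """Return files whose size is below `fraction` of the median file size.

    Assumption: an empty k-mer set yields an unusually small file.
    This is a heuristic, not a definitive check of the file contents.
    """
    files = sorted(Path(db_dir).glob(pattern))
    if not files:
        return []
    sizes = {f: f.stat().st_size for f in files}
    cutoff = fraction * median(sizes.values())
    return [f for f, size in sizes.items() if size < cutoff]


if __name__ == "__main__":
    # "strainge_db" is the folder name used in the original question.
    for path in find_suspicious_files("strainge_db"):
        print(f"{path}\t{path.stat().st_size} bytes")
```

Any file this flags is worth re-running through kmerize or excluding from the kmersim input. A file that is not flagged can still be problematic, so the log remains the more reliable source.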

I found this out by running with only one thread (-t 4 spawns four threads, which makes the log harder to follow) and checking the output log.
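The single-threaded rerun can be sketched as a small wrapper around the command from the original question, with -t 1 in place of -t 4. It assumes that straingst is on the PATH and that its log messages go to stderr; the helper names and the kmersim.log filename are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_kmersim_cmd(kmerset_files, threads=1):
    """Build the kmersim command from the question, defaulting to one thread."""
    return [
        "straingst", "kmersim", "--all-vs-all",
        "-t", str(threads),
        "-S", "jaccard", "-S", "subset",
        *map(str, kmerset_files),
    ]


def run_single_threaded(db_dir="strainge_db", out="similarities.tsv",
                        log="kmersim.log"):
    """Run kmersim with one thread, saving stdout and stderr to separate files.

    Assumption: log messages are written to stderr. Returns the exit code,
    or None if no .hdf5 files were found in `db_dir`.
    """
    files = sorted(Path(db_dir).glob("*.hdf5"))
    if not files:
        print(f"No .hdf5 files found in {db_dir}")
        return None
    with open(out, "w") as out_fh, open(log, "w") as log_fh:
        result = subprocess.run(build_kmersim_cmd(files),
                                stdout=out_fh, stderr=log_fh)
    return result.returncode


if __name__ == "__main__":
    print("exit code:", run_single_threaded())
```

With a single thread the messages in kmersim.log appear in order, so the last file mentioned before an error is a good candidate for the problematic one.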

Hope this helps.

1 reply

ceci4323 Dec 29, 2024
Author

Great suggestion! Thank you so much
