Hello, I had a similar problem to yours, but with a different database. In my case the cause was that some files had an empty output after the kmerize command (for me it was Esch_coli_KE47). I found this out by running with only one thread (-t 4 spawns four threads, which makes the log harder to follow) and checking the output log. Hope this helps.
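If it helps, here is a rough way to shortlist candidates before rerunning with one thread. This is only a sketch built on an assumption of mine: that a k-mer set with empty output ends up as a file much smaller than the others in the database. The function name `find_suspicious_files` and the `ratio` cutoff are made up for illustration and are not part of StrainGE; the folder name `strainge_db` is taken from the command in the question.

```python
from pathlib import Path
from statistics import median


def find_suspicious_files(db_dir, pattern="*.hdf5", ratio=0.1):
    """Return files whose size is below `ratio` times the median file size.

    Heuristic only (an assumption, not StrainGE behaviour): an empty
    k-mer set is expected to produce an unusually small file.
    """
    files = sorted(Path(db_dir).glob(pattern))
    if not files:
        return []
    sizes = {f: f.stat().st_size for f in files}
    cutoff = ratio * median(sizes.values())
    return [f for f in files if sizes[f] < cutoff]


if __name__ == "__main__":
    # "strainge_db" is the database folder used in the question.
    for path in find_suspicious_files("strainge_db"):
        print(f"{path}\t{path.stat().st_size} bytes")
```

Any file this prints is only a candidate; the output log of a single-threaded run is still what confirms whether a given k-mer set is really empty.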
-
Hi,
Thank you for continuing to develop this tool. I'm a big supporter of the pipeline and have used it numerous times for different pathogens. I am trying to run the command
straingst kmersim --all-vs-all -t 4 -S jaccard -S subset strainge_db/*.hdf5 > similarities.tsv
to compare the k-mer sets. I am running it on a database of Salmonella genomes from NCBI, but the output is an empty file. Could this be because the genomes are too similar? Salmonella is a very clonal bacterium, so I could imagine this causing problems. My question is: is there any way to avoid this? And if not, can the kmersim step be skipped so that I can go straight to the clustering step? I was considering copying all the file names from the strainge_db folder into a custom similarities.tsv file.
Kind regards,