Discrepancy in Abundances in Sensitive Mode #45

echolley31 · 2024-06-05T18:54:08Z

Hi there! Thank you for your comment on my initial Metalign installation issue! I was able to get the application working, and I just had a question about the calculation of relative abundances. I posted this on Biostars, but I was wondering if you also had an answer? Thank you so much!

My lab is particularly interested in M. smegmatis, and the abundance of that within a particular soil sample. When I ran Metalign with the default parameters, M. smeg was found in 10% relative abundance, and around 30 other species were identified in the sample. This is a native soil sample, with a limited amount of M. smeg added to the soil. I would expect there to be more than just 30 species identified and for M. smeg to not be 10% of all reads within that native soil sample, given the biodiversity of soil.

When I changed the parameters to run in sensitive mode, Metalign found more than 17,000 different species, and the M. smegmatis abundance dropped to 0.015%. If these are relative abundances that include the % of unmapped reads in both runs, shouldn't Metalign find the same amount of M. smegmatis?

I've been reading on the CMash algorithm and how it pre-filters the database based upon the ratio (containment index) of k-mers from reads in common with a reference genome to the number of k-mers in that reference genome. When using the defaults, Metalign has this ratio/index cutoff of 0.01. When Metalign runs in sensitive mode, the cutoff is 0.0, effectively eliminating this pre-filtering step.
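To make sure I have the definition right, here is a minimal sketch of the containment index as I understand it. It uses exact k-mer sets rather than the sketching that CMash actually relies on, ignores reverse complements, and the function names and the `>=` comparison against the cutoff are my own assumptions, not Metalign's code.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment_index(read_seqs, reference, k):
    """Fraction of the reference's k-mers that also occur in the reads.

    Exact-set version for illustration only; CMash estimates this
    quantity from sketches instead of full k-mer sets.
    """
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    sample_kmers = set()
    for read in read_seqs:
        sample_kmers |= kmers(read, k)
    return len(ref_kmers & sample_kmers) / len(ref_kmers)


def passes_prefilter(index, cutoff):
    """Assumed rule: keep a reference genome if its index meets the cutoff."""
    return index >= cutoff


# Toy example: the single read covers 2 of the 4 distinct 4-mers
# in the reference.
index = containment_index(["ACGTA"], "ACGTACGT", k=4)
print(index, passes_prefilter(index, 0.01))
```

Under this reading, a cutoff of 0.0 keeps every genome in the database, which matches my description of sensitive mode above.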

Additionally, 61% of reads were unmapped with the defaults as opposed to 78% unmapped with --sensitive mode. Therefore, despite the --sensitive mode identifying way more organisms, the percentage of unmapped reads increased. Thus, is Metalign falsely aligning reads to M. smeg and the 30 other organisms since that is all the filtered database has? Why did the abundance of M. smegmatis go from 10% with a 0.01 CMash cutoff to a 0.015% abundance when there was no CMash cutoff?
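One thing I checked is whether the change in unmapped reads could explain the gap by itself. I don't know whether the reported abundances are normalized over all reads or only over mapped reads, so the sketch below simply converts the numbers from my two runs under the second assumption and compares both conventions. The helper name is hypothetical.

```python
def share_of_all_reads(abundance_pct, unmapped_pct):
    """Rescale an abundance given as % of mapped reads to % of all reads.

    Only meaningful if the reported abundance excludes unmapped reads,
    which is an assumption here, not something I have confirmed.
    """
    mapped_fraction = 1 - unmapped_pct / 100
    return abundance_pct * mapped_fraction


# Numbers from my two runs: (M. smegmatis abundance %, unmapped reads %)
default_run = (10.0, 61.0)
sensitive_run = (0.015, 78.0)

# Fold change if the abundances are already relative to all reads
ratio_as_reported = default_run[0] / sensitive_run[0]

# Fold change if they are relative to mapped reads only
ratio_rescaled = (share_of_all_reads(*default_run)
                  / share_of_all_reads(*sensitive_run))

print(f"fold change as reported: {ratio_as_reported:.0f}x")
print(f"fold change after rescaling: {ratio_rescaled:.0f}x")
```

Either way the M. smegmatis abundance differs by roughly three orders of magnitude between the runs, so the normalization convention alone does not seem to account for it.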

If there is anything wrong with my understanding of Metalign's algorithm or how it works, please let me know. I'm very new at this and just trying to understand this large discrepancy in the results. Thank you!
