Running Metalign
The example below assumes that Metalign's repo has been installed in "/my/directory/Metalign/" (following the wiki page), that your reads file is called "my_reads.fq" and sits in that same directory, and that you run the commands from "/my/directory/Metalign/". The output file will then be /my/directory/Metalign/metalign_results.tsv.
In reality, you do not need to run Metalign from within its install directory; just make sure you update your file paths correctly.
Metalign comes with a wrapper script called metalign.py that runs both stages of the method. The quick usage is:
python3 metalign.py my_reads.fq data/ --output metalign_results.tsv
For more sensitive or more precise results, try adding the "--sensitive" or "--precise" flag to this command.
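If you launch Metalign from your own scripts, or from outside the install directory, it can help to assemble the command in one place so that the paths stay consistent. The following is a minimal sketch, not part of Metalign: the helper name `metalign_command` and its `mode` parameter are hypothetical, and it uses only the arguments and flags mentioned above. Passing at most one of the two flags per run is a choice made for this sketch, not a documented requirement.

```python
import shlex
from pathlib import PurePosixPath


def metalign_command(reads, install_dir, output, mode=None):
    """Build the argument list for the metalign.py wrapper (does not run it).

    Hypothetical helper for illustration. `mode` may be None, "sensitive",
    or "precise"; it only decides which optional flag is appended.
    """
    install_dir = PurePosixPath(install_dir)
    cmd = [
        "python3",
        str(install_dir / "metalign.py"),   # wrapper script in the install dir
        str(reads),                         # reads file
        str(install_dir / "data") + "/",    # data directory shipped with Metalign
        "--output",
        str(output),
    ]
    if mode is not None:
        if mode not in ("sensitive", "precise"):
            raise ValueError("mode must be 'sensitive', 'precise', or None")
        cmd.append("--" + mode)
    return cmd


cmd = metalign_command(
    "/my/directory/Metalign/my_reads.fq",
    "/my/directory/Metalign",
    "/my/directory/Metalign/metalign_results.tsv",
    mode="sensitive",
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually execute the command, pass the list to `subprocess.run(cmd, check=True)`.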
You can also run the two steps of Metalign (pre-filtering the database, then alignment+profiling) separately. This is useful if you want to retain the pre-filtered database for other uses, or if you already have a SAM file to pass into the alignment+profiling stage (which will then perform only the profiling step). It also lets you tinker with the parameters more.
The simplest case looks something like this:
python3 select_db.py my_reads.fq data/ --db cmash_db.fna
python3 map_and_profile.py my_reads.fq data/ --db cmash_db.fna --output metalign_results.tsv
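The second command reads the database written by the first, so the two must run in order and the second should not start if the first fails. A minimal sketch of that sequencing is shown below; the helper names `two_stage_commands` and `run_stages` are hypothetical, and the commands themselves are exactly the two shown above.

```python
import subprocess


def two_stage_commands(reads, data_dir, db, output):
    """Return the two commands from the text, in the order they must run."""
    select = ["python3", "select_db.py", reads, data_dir, "--db", db]
    profile = ["python3", "map_and_profile.py", reads, data_dir,
               "--db", db, "--output", output]
    return [select, profile]


def run_stages(commands, runner=subprocess.run):
    """Run each stage in order.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so profiling never starts after a failed pre-filter.
    """
    for cmd in commands:
        runner(cmd, check=True)


stages = two_stage_commands(
    "my_reads.fq", "data/", "cmash_db.fna", "metalign_results.tsv"
)
# Print the commands for inspection; call run_stages(stages) to execute them.
for cmd in stages:
    print(" ".join(cmd))
```

Because the pre-filtered database (cmash_db.fna here) is an ordinary file, it stays on disk after both stages finish and can be kept for other uses.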
There are other options that can be explored, including modulating the filtering parameters for the pre-filtering stage and the profiling stage, normalizing abundances by genome length, and more. For more details, see the "Command line option descriptions" wiki page.