Running Metalign

nlapier2 edited this page Jun 30, 2020 · 1 revision

Preliminaries

The examples below assume that Metalign's repo has been installed in "/my/directory/Metalign/" (as described on the wiki), that your reads file is called "my_reads.fq" and sits in that same directory, and that you run the commands from "/my/directory/Metalign/". Under these assumptions, the output file will be /my/directory/Metalign/metalign_results.tsv.

You do not need to run Metalign from within its install directory; if you run it from elsewhere, adjust the file paths in the commands accordingly.
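As a rough illustration of what adjusting the paths could look like, the sketch below builds the wrapper command (metalign.py, described in the next section) with absolute paths, so it can be launched from any working directory. The helper name `command_from_anywhere` and the "/home/me/" paths are hypothetical; the sketch also assumes that metalign.py and data/ both sit directly inside the install directory, as in the example above.

```python
from pathlib import Path


def command_from_anywhere(install_dir, reads, output):
    """Build the wrapper command using absolute paths (hypothetical helper)."""
    install = Path(install_dir)
    return [
        "python3",
        str(install / "metalign.py"),  # wrapper script inside the install directory
        str(reads),                    # reads file, wherever it lives
        str(install / "data") + "/",   # trailing slash kept to mirror "data/" in the wiki
        "--output",
        str(output),
    ]


if __name__ == "__main__":
    # Hypothetical locations for the reads and results files.
    cmd = command_from_anywhere(
        "/my/directory/Metalign",
        "/home/me/my_reads.fq",
        "/home/me/metalign_results.tsv",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` or joined into a single shell command.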

The Easy Way

Metalign comes with a wrapper script called metalign.py that runs both stages of the method. The quick usage is:

python3 metalign.py my_reads.fq data/ --output metalign_results.tsv 

For more sensitive or more precise results, add the "--sensitive" or "--precise" flag, respectively.
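If you launch Metalign from a script, a small helper can assemble the wrapper command with or without one of these flags. This is only a sketch: the function name `wrapper_command` and its `mode` parameter are made up for illustration, and passing at most one of the two flags is an assumption of the sketch, not something this page states.

```python
def wrapper_command(reads, data_dir, output, mode=None):
    """Build the metalign.py command line as a list of arguments.

    mode is None, "sensitive", or "precise" (hypothetical parameter that
    maps onto the "--sensitive" / "--precise" flags mentioned above).
    """
    if mode not in (None, "sensitive", "precise"):
        raise ValueError("mode must be None, 'sensitive', or 'precise'")
    cmd = ["python3", "metalign.py", reads, data_dir, "--output", output]
    if mode is not None:
        cmd.append("--" + mode)  # append the chosen flag after the other arguments
    return cmd


if __name__ == "__main__":
    print(" ".join(wrapper_command("my_reads.fq", "data/", "metalign_results.tsv")))
    print(" ".join(wrapper_command("my_reads.fq", "data/", "metalign_results.tsv", "precise")))
```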

Running the two steps separately

You can run the two steps of Metalign (pre-filtering the database, then alignment+profiling) separately. This is useful in three situations: when you want to retain the pre-filtered database for other uses; when you already have a SAM file and want to pass it into the alignment+profiling stage, which will then perform only the profiling step; and when you want finer control over the parameters of each stage.

The simplest case looks something like this:

python3 select_db.py my_reads.fq data/ --db cmash_db.fna
python3 map_and_profile.py my_reads.fq data/ --db cmash_db.fna --output metalign_results.tsv
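When scripting the two steps, it helps to stop as soon as one of them fails, so that map_and_profile.py is not started if select_db.py did not finish. The following is a minimal sketch of that idea using Python's standard `subprocess` module; the helper names `two_step_commands` and `run_in_order` are hypothetical, and the arguments simply mirror the two commands above.

```python
import subprocess


def two_step_commands(reads, data_dir, db, output):
    """Return the pre-filtering and alignment+profiling commands, in order."""
    select = ["python3", "select_db.py", reads, data_dir, "--db", db]
    profile = [
        "python3", "map_and_profile.py", reads, data_dir,
        "--db", db, "--output", output,
    ]
    return [select, profile]


def run_in_order(commands):
    """Run each command in turn; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Same file names as in the example above.
    run_in_order(
        two_step_commands("my_reads.fq", "data/", "cmash_db.fna", "metalign_results.tsv")
    )
```

Both commands receive the same `--db` value, so the database written by the first step is the one read by the second.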

Other options include adjusting the filtering parameters for the pre-filtering and profiling stages, normalizing abundances by genome length, and more. For details, see the "Command line option descriptions" wiki page.
