This directory contains a Makefile that will build a private database for the 64 genomes used in Awad et al., 2017.
Run make to run the entire pipeline. (You'll need sourmash v4.4.0 installed.)
The Makefile does the following:
First, it runs curl to download the genomes from the OSF project and unpacks them.
Next, the Makefile uses the script ../fasta-to-fromfile.py to scan the genomes and produce a summary file, build.csv, that contains names and source genomes for sourmash signatures.
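
As a rough illustration of what such a summary file looks like, the sketch below writes a CSV with one row per genome. It is not the actual ../fasta-to-fromfile.py script: the helper names are hypothetical, the signature name is simply taken from the first FASTA header in each file, and the column layout (name, genome_filename, protein_filename) is assumed from the sourmash documentation for sketch fromfile.

```python
import csv
import gzip
from pathlib import Path


def first_fasta_name(path):
    """Return the first header line of a FASTA file, without the leading '>'."""
    # Hypothetical helper: handles plain and gzip-compressed FASTA files.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                return line[1:].strip()
    raise ValueError(f"no FASTA header found in {path}")


def write_fromfile_csv(fasta_paths, out_csv):
    """Write one row per genome: name, genome_filename, protein_filename."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["name", "genome_filename", "protein_filename"])
        for path in sorted(Path(p) for p in fasta_paths):
            # protein_filename is left empty: only DNA genomes are listed here.
            writer.writerow([first_fasta_name(path), str(path), ""])
```

Calling write_fromfile_csv with the unpacked genome files and "build.csv" as the output path would produce a file in the shape described above.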
Finally, the Makefile runs
sourmash sketch fromfile build.csv -p dna -o podar-ref.zip
to sketch all of the genomes in build.csv. The parameter string -p dna tells sourmash to construct DNA sketches using the default parameters; any number of parameter strings can be provided, each with its own -p.
The names for the output signatures are taken from build.csv.
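
To make the "one -p per parameter string" rule concrete, here is a minimal sketch that assembles the command line from a list of parameter strings. The function name is hypothetical, and the second parameter string in the usage note is only an illustrative value, not something the Makefile uses.

```python
def sketch_fromfile_command(csv_path, output, param_strings):
    """Build the argument list for 'sourmash sketch fromfile'."""
    command = ["sourmash", "sketch", "fromfile", csv_path]
    for params in param_strings:
        # Each parameter string gets its own -p flag.
        command += ["-p", params]
    command += ["-o", output]
    return command
```

With ["dna"] as the parameter list this reproduces the command the Makefile runs; passing ["dna", "dna,k=21,scaled=1000"] would add a second -p. The resulting list can be handed to subprocess.run(..., check=True) if you want to run it from Python.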
You can run sourmash sig summarize podar-ref.zip to get a summary of the contents of the zip file, or sourmash sig describe podar-ref.zip to get a listing of all the signatures.
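
Since podar-ref.zip is an ordinary zip archive, you can also take a quick look at its members with Python's standard zipfile module. This is only a sanity check under that assumption, not a replacement for the sourmash sig commands above; the function name is hypothetical and the code makes no assumptions about how sourmash names the files inside the archive.

```python
import zipfile


def list_zip_members(zip_path):
    """Return (member name, uncompressed size in bytes) for each zip entry."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

For example, iterating over list_zip_members("podar-ref.zip") and printing each pair shows what the archive holds without needing sourmash installed.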