From 98eb065b761074e87a390b009a7e12bdf4e97ee9 Mon Sep 17 00:00:00 2001
From: johnne
Date: Thu, 6 May 2021 09:22:44 +0200
Subject: [PATCH] Update README

---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index c266d7c..6173dee 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,20 @@ This will download, filter and cluster sequences from [GBIF Hosted Datasets](htt
 
 See below for configuration and more options.
 
+## Output
+
+Sequences in the resulting `bold_clustered.fasta` fasta file contain the original
+identifier as their primary id, and a string showing their taxonomic lineage in
+the fasta header:
+
+```bash
+>centroid=GBA28357-15 Arthropoda;Insecta;Psocodea;Philotarsidae;Aaroniella;Aaroniella sp.;seqs=1
+```
+
+In this example `centroid=` indicates that sequences from this species were
+clustered with `vsearch` and that the representative sequence for the resulting
+cluster is `GBA28357-15`.
+
 ## Configuration
 
 There are a few configurable parameters that modifies how sequences are filtered and clustered. You can modify these parameters using a config file in `yaml`