Strain Comparisons
Compare strains across samples.
Requirements:
- StrainGR call data (HDF5 files) for the samples of interest
Strains in different samples that match the same close reference genome can be compared in more detail (at the nucleotide level) using StrainGR.
To compare strains, run straingr compare:
straingr compare sample1.hdf5 sample2.hdf5 \
-o sample1.vs.sample2.summary.tsv -d sample1.vs.sample2.details.tsv
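When more than two samples are involved, the command above has to be repeated for every pair. The following is a minimal Python sketch of one way to generate those commands. `build_compare_commands` is a hypothetical helper, not part of StrainGE; it uses only the `-o` and `-d` options shown above and assumes output names follow the same `A.vs.B` pattern. It builds the argument lists and does not execute anything.

```python
from itertools import combinations
from pathlib import Path


def build_compare_commands(hdf5_files):
    """Build one `straingr compare` argument list per pair of call files.

    Hypothetical helper: output names mirror the example in the text
    (<a>.vs.<b>.summary.tsv and <a>.vs.<b>.details.tsv).
    """
    commands = []
    # Sort first so the pair order (and the output names) is reproducible.
    for file_a, file_b in combinations(sorted(hdf5_files), 2):
        prefix = f"{Path(file_a).stem}.vs.{Path(file_b).stem}"
        commands.append([
            "straingr", "compare", file_a, file_b,
            "-o", f"{prefix}.summary.tsv",
            "-d", f"{prefix}.details.tsv",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to run the comparison.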
straingr compare takes two HDF5 files generated by straingr call and compares the base calls in each sample for each scaffold in the concatenated reference. If different concatenated references were used for each sample, only the scaffolds the two concatenated references have in common will be compared.
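The restriction to shared scaffolds amounts to a set intersection over scaffold names. A minimal sketch, assuming each sample's scaffold names are available as a plain list (the function name and the way the names are obtained are hypothetical, not StrainGR's actual code):

```python
def shared_scaffolds(scaffolds_a, scaffolds_b):
    """Return the scaffold names present in both samples.

    Hypothetical helper: keeps the order of the first sample so that the
    result is stable across runs.
    """
    in_b = set(scaffolds_b)
    return [name for name in scaffolds_a if name in in_b]
```

With this sketch, scaffolds that appear in only one of the two concatenated references are simply left out of the comparison.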
The summary file (written with -o) contains several metrics that summarize the comparison of each strain (scaffold).
Warning: this file currently contains many metrics, several of which are slight variations on others. In the final version of StrainGE we will likely remove a few and keep only the most relevant ones.
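Because the summary file holds many closely related metrics, it can help to load only the columns of interest. The sketch below assumes the file is tab-separated with a header row whose names match the column list in this section; `read_summary` and `KEY_COLUMNS` are hypothetical names chosen for illustration.

```python
import csv

# Assumed subset of columns; the names are taken from the column list in this section.
KEY_COLUMNS = ["sample1", "sample2", "ref", "scaffold",
               "singleAgreePct", "gapJaccardSimilarity"]


def read_summary(handle, columns=KEY_COLUMNS):
    """Read a summary TSV from an open text handle, keeping only `columns`.

    Values are returned as strings; convert them (e.g. with float())
    as needed. A missing column raises KeyError.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [{col: row[col] for col in columns} for row in reader]
```

A typical call would be `with open("sample1.vs.sample2.summary.tsv", newline="") as f: rows = read_summary(f)`.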
Columns:
- sample1, sample2: Sample names (from filename)
- ref: The name of the original reference this scaffold belongs to
- scaffold: scaffold name
- length: length of the scaffold
- common (commonPct): Number (percentage) of positions of this scaffold that are callable in both samples
- single (singlePct): Number (percentage) of positions where both samples have a single strong call (i.e. no evidence for multiple alleles)
- singleAgree (singleAgreePct): Number (percentage) of positions where both samples have a single strong call, and the base call is the same. singleAgreePct is the ACNI metric as described in the paper.
- sharedAlleles (sharedAllelesPct): Number (percentage) of positions where both samples share an allele. This allows for positions to have multiple alleles, and at least one allele should match.
- variants (variantsPct): Number (percentage) of positions where either sample has an allele other than the reference.
- commonVariant (commonVariantPct): Number (percentage) of variants where both samples share an allele
- variantExact (variantExactPct): Number (percentage) of variants that are exactly the same in both samples (including the same positions with multiple alleles).
- AnotB (AnotBPct): Number (percentage) of variants in Sample A but not in Sample B
- BnotA (BnotAPct): Number (percentage) of variants in Sample B but not in Sample A
- gapJaccardSimilarity: Jaccard similarity between the two samples' sets of positions not marked as gap (i.e. analogous to gene content similarity).
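To make the count definitions above concrete, here is a small Python sketch that computes several of them on toy data. It is an illustration under stated assumptions, not StrainGR's implementation: each sample is represented as a hypothetical dict mapping a callable position to its set of strongly called alleles, variants are counted only among positions callable in both samples, and the percentage columns are omitted because their denominators are not specified in this section.

```python
def compare_calls(ref, calls_a, calls_b):
    """Count summary-style metrics for one scaffold from toy call data.

    ref      -- reference sequence as a string
    calls_a  -- dict: position -> set of alleles called in sample A
    calls_b  -- dict: position -> set of alleles called in sample B
    Positions missing from a dict are treated as not callable (assumption).
    """
    # Positions callable in both samples.
    common = sorted(set(calls_a) & set(calls_b))
    # Both samples have exactly one allele at the position.
    single = [p for p in common
              if len(calls_a[p]) == 1 and len(calls_b[p]) == 1]
    # Single calls that are also identical.
    single_agree = [p for p in single if calls_a[p] == calls_b[p]]
    # At least one allele in common; multiple alleles are allowed.
    shared = [p for p in common if calls_a[p] & calls_b[p]]
    # Either sample has an allele other than the reference base.
    variants = [p for p in common
                if (calls_a[p] | calls_b[p]) - {ref[p]}]
    # Variant positions with exactly the same allele set in both samples.
    variant_exact = [p for p in variants if calls_a[p] == calls_b[p]]
    return {
        "common": len(common),
        "single": len(single),
        "singleAgree": len(single_agree),
        "sharedAlleles": len(shared),
        "variants": len(variants),
        "variantExact": len(variant_exact),
    }


def jaccard(set_a, set_b):
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 for two empty sets."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)
```

For gapJaccardSimilarity, `jaccard` would be applied to the two samples' sets of positions not marked as gap.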