This guide provides essential commands for using SVanalyzer's benchmark
and merge
functions to analyze structural variants (SVs).
Compare test SVs against a truth set to estimate sensitivity and specificity.
svanalyzer benchmark --ref <reference.fasta> --test <test.vcf> --truth <truth.vcf>
--ref
: Reference FASTA file (required).--test
: VCF file with test variants (required).--truth
: VCF file with true variants (required).--maxdist
: Maximum positional distance between variants to consider a match (default: 100,000).--normshift
: Maximum normalized shift allowed (default: 0.2).--normsizediff
: Maximum normalized size difference (default: 0.2).--normdist
: Maximum normalized edit distance (default: 0.2).--minsize
: Minimum variant size to include (default: 0).--prefix
: Prefix for output files (default:benchmark
).
svanalyzer benchmark \
--ref reference.fasta \
--test sample_test.vcf \
--truth sample_truth.vcf \
--prefix benchmark_results
Group SVs by similarity to create a unified set of variants.
Merge from a single VCF file:
svanalyzer merge --ref <reference.fasta> --variants <variants.vcf> --prefix <output_prefix>
Merge from multiple VCF files:
svanalyzer merge --ref <reference.fasta> --fof <vcf_list.txt> --prefix <output_prefix>
--ref
: Reference FASTA file.--variants
: VCF file containing variants to merge.--fof
: File listing paths to VCF files (one per line).--prefix
: Prefix for output files (default:merged
).--maxdist
: Maximum distance between variants for merging (default: 2000).--reldist
: Maximum normalized edit distance (default: 0.2).--relsizediff
: Maximum normalized size difference (default: 0.2).--relshift
: Maximum normalized shift (default: 0.2).--seqspecific
: Only include variants with explicit REF and ALT sequences.
Merging from multiple VCF files:
-
Create a file listing your VCFs (e.g.,
vcf_list.txt
):caller1.vcf caller2.vcf caller3.vcf
-
Run the merge command:
svanalyzer merge \ --ref reference.fasta \ --fof vcf_list.txt \ --prefix merged_results
- Reference Consistency: Ensure the reference FASTA file matches the one used for generating your VCFs.
- Parameter Tuning: Adjust
--maxdist
,--reldist
, etc., based on your data characteristics. - Sequence-Specific Variants: Use
--seqspecific
to focus on variants with defined REF and ALT sequences. - Output Files: The
--prefix
option sets the prefix for all output files generated.