SVIM detects and classifies six classes of structural variation: deletions, insertions, inversions, tandem duplications, interspersed duplications, and translocations.
mamba install -c bioconda svim
svim alignment /home/qgn1237/qgn1237/4_single_cell_SV_chimera/1_smooth_seq_95_sc_K562_SMRT/SRR11951439/svim /projects/b1171/qgn1237/4_single_cell_SV_chimera/1_smooth_seq_95_sc_K562_SMRT/SRR11951439/SRR11951439_sort.bam ~/qgn1237/1_my_database/GRCh38_p13/GRCh38.p13.genome.fa
To include the insertion sequences in the output, add --insertion_sequences:
svim alignment --insertion_sequences /home/qgn1237/qgn1237/4_single_cell_SV_chimera/1_smooth_seq_95_sc_K562_SMRT/SRR11951439/svim /projects/b1171/qgn1237/4_single_cell_SV_chimera/1_smooth_seq_95_sc_K562_SMRT/SRR11951439/SRR11951439_sort.bam ~/qgn1237/1_my_database/GRCh38_p13/GRCh38.p13.genome.fa
SRR11951439_sort.var.vcf
SVIM's output is unfiltered, so the calls have to be filtered manually. This step is very important because SVIM reports almost every candidate it finds.
for dir in *depth/; do cd "$dir"; cd svim; filter_vcf_based_on_quality.py variants.vcf 10 > filtered_variant.vcf; cd ../..; done
Alternatively, filter with BCFtools:
bcftools view -i 'QUAL >= 10' variants.vcf > filtered_variant.vcf
# or you can do
filter_vcf_based_on_quality.py variants.vcf 10 > filtered_variant.vcf
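The quality filter itself is simple enough to sketch in Python. The function below is only an illustration of what such a filter could look like, not the actual contents of filter_vcf_based_on_quality.py; the name filter_vcf_by_quality and the two example records are made up for this sketch. It assumes an uncompressed, tab-separated VCF with QUAL in the sixth column, and it drops records whose QUAL is missing (".").

```python
def filter_vcf_by_quality(lines, min_qual):
    """Yield header lines plus records whose QUAL is >= min_qual.

    Hypothetical helper for illustration; `lines` is any iterable of
    VCF text lines (for example an open file handle).
    """
    for line in lines:
        # Header and column-name lines are always passed through.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip malformed records that do not reach the QUAL column.
        if len(fields) < 6:
            continue
        qual = fields[5]
        # A missing QUAL (".") cannot satisfy the threshold, so drop it.
        if qual == ".":
            continue
        if float(qual) >= min_qual:
            yield line


# Made-up example records, only to show the behaviour.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\tsvim.DEL.1\tN\t<DEL>\t12\tPASS\tSVTYPE=DEL;END=1500\n",
    "chr1\t5000\tsvim.INS.1\tN\t<INS>\t3\tPASS\tSVTYPE=INS;END=5000\n",
]
kept = list(filter_vcf_by_quality(example_vcf, 10))
```

On a real file, the same function can be fed an open handle for variants.vcf, with the yielded lines written to filtered_variant.vcf.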
For high-coverage datasets (>40x), a threshold of 10-15 is recommended. For low-coverage datasets, the threshold should be lower (around 3-5). For 30x coverage, I chose 8.
Alternatively, pick the threshold according to the coverage of each dataset:
for dir in *depth/; do cd "$dir"; cd svim; filter_vcf_based_on_quality.py variants.vcf 2 > filtered_variant.vcf; cd ../..; done
# For 30x, the value is 7
# For 25x, the value is 6
# For 20x, the value is 5
# For 15x, the value is 4
# For 10x, the value is 3
# For 5x, the value is 2
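The per-coverage values listed above can be kept in a small lookup so the cutoff does not have to be typed by hand for each dataset. This is a minimal sketch: the table is copied from the list above, while the function name quality_threshold, the rule of rounding an in-between coverage down to the nearest listed level, and the error for coverages outside 5x-30x are assumptions made for the example.

```python
# Coverage (x) -> QUAL cutoff, copied from the per-coverage list above.
COVERAGE_TO_QUAL = {5: 2, 10: 3, 15: 4, 20: 5, 25: 6, 30: 7}


def quality_threshold(coverage):
    """Return the cutoff for the nearest listed coverage at or below `coverage`.

    Hypothetical helper: rounding down and rejecting out-of-range values
    are assumptions of this sketch, not rules stated in the notes.
    """
    levels = sorted(COVERAGE_TO_QUAL)
    if coverage < levels[0] or coverage > levels[-1]:
        raise ValueError(
            f"no cutoff listed for {coverage}x (listed range: "
            f"{levels[0]}x to {levels[-1]}x)"
        )
    # Largest listed coverage level that does not exceed the input.
    chosen = max(level for level in levels if level <= coverage)
    return COVERAGE_TO_QUAL[chosen]


cutoff_for_12x = quality_threshold(12)  # falls back to the 10x entry
```

The returned number can then be passed as the threshold argument of the filtering commands above.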
./SVIM_steps_generator.py --bam input.bam --reference ref.fa
./SVIM_steps_generator.py --bam input.bam --reference ref.fa --quality 8