Add a Crimean-Congo hemorrhagic fever virus dataset #265

anna-parker · 2025-02-06T14:52:26Z

This PR updates the CCHF dataset started in #200

It does the following:

Adds basic QC
Updates alignment parameters:

Decreases minSeedCover to 10%, which reduces the number of sequences on NCBI Virus that cannot be aligned using this dataset from 148 to 48.
Decreases the kmer size to 7 for segments M and S, which further reduces this to 34 sequences.
Adds retryReverseComplement, which aligns 25 further sequences (leaving only 9 unaligned).
Increases the gapOpenScores to 18, 19 and 20 for the S and L segments and to 24, 26 and 28 for the M segment (see reasoning below).
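For reference, the parameter changes listed above could be written out per segment roughly as follows. This is a minimal sketch, not the contents of the actual pathogen.json files in this PR. It assumes the key names follow Nextclade's `alignmentParams` naming (`minSeedCover`, `kmerLength`, `retryReverseComplement`, `penaltyGapOpen`, `penaltyGapOpenInFrame`, `penaltyGapOpenOutOfFrame`), that the three gap-open values map to those three penalty keys in that order, and that 10% is expressed as 0.1. The L segment kmer size is left out because the description does not mention changing it.

```python
import json

# Settings shared by all three segments, as described in the list above.
# Assumption: 10% seed cover is written as the fraction 0.1.
BASE = {"minSeedCover": 0.1, "retryReverseComplement": True}


def gap_open(nuc, in_frame, out_of_frame):
    """Map the three quoted gap-open values to penalty keys.

    Assumption: the order is (general, in-frame, out-of-frame).
    """
    return {
        "penaltyGapOpen": nuc,
        "penaltyGapOpenInFrame": in_frame,
        "penaltyGapOpenOutOfFrame": out_of_frame,
    }


# Per-segment parameters; kmerLength is only set where the PR changes it.
ALIGNMENT_PARAMS = {
    "S": {**BASE, "kmerLength": 7, **gap_open(18, 19, 20)},
    "M": {**BASE, "kmerLength": 7, **gap_open(24, 26, 28)},
    "L": {**BASE, **gap_open(18, 19, 20)},
}


def pathogen_json_fragment(segment):
    """Return the alignmentParams fragment for one segment as JSON text."""
    return json.dumps({"alignmentParams": ALIGNMENT_PARAMS[segment]}, indent=2)


print(pathogen_json_fragment("M"))
```

Keeping the shared settings in one place makes it easier to see that only the kmer size and the gap-open values differ between segments.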

Adds clades for the S segment using the annotation defined in
Serena A. Carroll, Brian H. Bird, Pierre E. Rollin, Stuart T. Nichol,
"Ancient common ancestry of Crimean-Congo hemorrhagic fever virus" (2010). The Nextstrain build used for this can be found here: https://github.com/neherlab/CCHFV

More Insertions or Mutations?

I see high divergence right before the termini of all segments, especially the M segment. This is partially in line with previous findings: "Prominent features of the M RNA segment are a high degree of divergence at the first part of the M genome, along with conservation of the middle and last regions and the 10-nt termini, which are conserved in all nairoviruses" (https://pmc.ncbi.nlm.nih.gov/articles/PMC2730268/).

After discussion with @corneliusroemer I increased the gapOpen score in all segments, but especially in the M segment, to make insertions more expensive. For most sequences the number of insertions relative to the parent is now comparable to the number of mutations relative to the parent, whereas before there were approximately twice as many insertions as mutations.
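To illustrate why a higher gap-open score shifts the balance from insertions towards mutations, here is a toy calculation. It is a simplified affine-gap model and not Nextclade's actual scoring: it ignores match rewards and assumes a divergent stretch can be explained either by per-base mismatches or by one insertion plus one deletion of the same length. The mismatch penalty of 1, the gap-extension penalty of 0, and the baseline gap-open value of 6 used for comparison are assumptions for the example.

```python
def gap_pair_cost(gap_open, gap_extend, length):
    """Cost of one insertion plus one deletion, each `length` bases long."""
    return 2 * (gap_open + gap_extend * length)


def mismatch_cost(mismatch_penalty, length):
    """Cost of aligning `length` bases that all mismatch."""
    return mismatch_penalty * length


def break_even_length(gap_open, mismatch_penalty=1, gap_extend=0, max_length=10_000):
    """Shortest fully divergent stretch for which the gap pair is cheaper.

    Returns None if gaps never become cheaper within `max_length`.
    """
    for length in range(1, max_length + 1):
        if gap_pair_cost(gap_open, gap_extend, length) < mismatch_cost(mismatch_penalty, length):
            return length
    return None


# Compare an assumed baseline of 6 with the values quoted in this PR.
for value in (6, 18, 24):
    print(value, break_even_length(value))
```

Under these assumptions, raising the gap-open value from 6 to 24 moves the break-even point from 13 to 49 fully divergent bases, so shorter divergent stretches are reported as mutations rather than as insertion/deletion pairs.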

M segment alignment, only sequences with high coverage, default gapOpen score:

Current M segment alignment:

Preview

https://clades.nextstrain.org/?dataset-server=https://raw.githubusercontent.com/anna-parker/nextclade_data/cchfv_updates/data_output

chore: rebuild [skip ci]

8422bdd

anna-parker changed the title ~~add CCHF~~ WIP: Add CCHF Feb 6, 2025

anna-parker force-pushed the cchfv_updates branch 3 times, most recently from a995211 to f0dae5c Compare February 7, 2025 15:07

anna-parker and others added 5 commits February 7, 2025 16:07

add CCHF

c8d501d

update pathogen.json

0afd35b

fix qc

b041731

update urls to images

a7b2ed8

Increase bandwidth

60760eb

anna-parker force-pushed the cchfv_updates branch from f0dae5c to 60760eb Compare February 7, 2025 15:08

anna-parker added 3 commits February 7, 2025 16:11

add to data_output for local testing

10c864a

correct trees

0dbda0f

update data_output

6a03d6d

anna-parker changed the title ~~WIP: Add CCHF~~ Add a Crimean-Congo hemorrhagic fever virus dataset Feb 7, 2025

anna-parker mentioned this pull request Feb 9, 2025

[don't merge until ingest grouping changes tested] fix(ingest): also try reverse complement when identifying segment loculus-project/loculus#2651

Closed

1 task

ivan-aksamentov force-pushed the master branch from 8422bdd to b0d27df Compare February 17, 2025 07:43
