Quick walk through example

Anna Bernasconi edited this page Oct 31, 2022 · 11 revisions

This is a complete example of how to use EpiViruSurf. It should be sufficient to learn how to use the system. Please let us know if you find anything unclear (you can find our contacts here).

You start by selecting a population of sequences of interest. The SARS-CoV-2 virus and the Homo sapiens host are pre-selected, as we assume these are of greatest interest to our users. To change them, click on the CLEAR YOUR QUERY button, or place the cursor in the text field of the drop-down menu and delete its content. Here we show that country = 'usa' has also been added, and we are in the process of adding region = 'florida'. The drop-down fills in according to what you type in the text field, auto-completing it. The numbers on the side indicate how many sequences are available with that value.

After you click on metadata values, we update the count in the footer: there are now 21706 Florida sequences. We still show the total of 1965 epitopes currently available from IEDB.
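The selection step can be pictured as a filter over sequence metadata records. The following is a minimal sketch under stated assumptions, not EpiViruSurf's actual implementation: the record fields (`virus`, `host`, `country`, `region`), the functions `select_population` and `value_counts`, and the prefix-based auto-completion are all hypothetical, and the records are toy data.

```python
from collections import Counter

# Toy metadata records; field names and values are assumptions for illustration only.
SEQUENCES = [
    {"id": "s1", "virus": "sars-cov-2", "host": "homo sapiens", "country": "usa", "region": "florida"},
    {"id": "s2", "virus": "sars-cov-2", "host": "homo sapiens", "country": "usa", "region": "texas"},
    {"id": "s3", "virus": "sars-cov-2", "host": "homo sapiens", "country": "usa", "region": "florida"},
    {"id": "s4", "virus": "sars-cov-2", "host": "homo sapiens", "country": "italy", "region": "lombardy"},
]


def select_population(records, **conditions):
    """Keep the records whose metadata match every attribute = value condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


def value_counts(records, attribute, typed=""):
    """Count sequences per value of `attribute`, keeping only values that start with `typed`."""
    counts = Counter(r[attribute] for r in records)
    return {value: n for value, n in counts.items() if value.startswith(typed)}


usa = select_population(SEQUENCES, country="usa")
region_options = value_counts(usa, "region", typed="fl")  # entries a drop-down could list
florida = select_population(usa, region="florida")
footer_count = len(florida)  # the count that would appear in the footer
```

Each added condition narrows the population further, which is why the footer count changes after every selection.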

Mode 2 (yes, we prefer to explain this before Mode 1)

Below, you can choose one of the three modes of use of EpiViruSurf. We will first explain how to use Mode 2.

Here you have an interface to the metadata attributes describing IEDB epitopes. It is very similar to the one at https://www.iedb.org/, but we have added a condition on the position range, where you insert a minimum and a maximum expressed in the coordinates of the protein. For example, on the SARS-CoV-2 Spike you can use 1-1273 (or 1228, as in the figure, because other filters have already been applied and the system automatically computes which coordinates remain available). In the 'Epitope search condition' field you see what you have already selected. We immediately update the count in the footer. Here you see that the current selection retrieves 27 epitopes from IEDB.

Here we show an example of what the result table looks like.

  • In red rectangles: functionalities to modify the columns of the table and download it as a csv file
  • In orange rectangle: links to pages and references (publications) on IEDB
  • In green rectangles: links to open the selected dataset in our VirusViz application (we will explain how this works later, but if you are interested you can read the full documentation here and try VirusViz directly here)
  • In blue rectangle: all the metadata regarding each epitope (from IEDB)

Mode 3

We then switch to Mode 3, for a more advanced use of the interface. All the selected filters are maintained in this mode. This mode lets you check how heavily the population of sequences selected in the first part is mutated in the regions of the epitopes selected in the second part.

Note that if we want to restrict the statistics to specific amino acid changes, we can do so by clicking on ADD CONDITION ON AMINO ACIDS, as shown step by step in the figure (where we added a condition on the K417N substitution in the Spike).

However, as this is a sophisticated use of the search, in this example we omit this condition. After selecting the epitope attributes, we just press APPLY EPITOPE SEARCH, which triggers the computation of the result table with the statistics.

The system warns you that, for performance reasons, this mode should only be used when a population of thousands of sequences is checked against tens of epitopes. In this example we follow this guideline.

As you can see, the result table offers several options to open VirusViz:

  • VirusViz Mutated Sequences
  • VirusViz All population
  • VirusViz All Epitopes

as well as four new statistics columns:

  1. NUM MUT SEQ: the number of sequences in the selected population that exhibit at least one amino acid change within the epitope position range;
  2. TOT MUT: the total number of amino acid changes exhibited by the full population of sequences;
  3. MUT FREQ: the ratio of total variants (2) over the number of mutated sequences (1);
  4. MUT SEQ RATIO: the ratio of mutated sequences (1) over the total of the selected population (that you see in the footer).
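The four columns above can be sketched in a few lines of Python. This is an illustrative sketch, not the system's actual code: the function `epitope_statistics`, the input layout, and the toy data are hypothetical, and we assume here that TOT MUT counts only the changes falling inside the epitope position range (with inclusive bounds).

```python
def epitope_statistics(population, start, end):
    """Compute the four statistics for one epitope.

    population: dict mapping a sequence id to a list of (position, change)
                amino acid changes on the epitope's protein (hypothetical layout).
    start, end: epitope position range, assumed inclusive.
    """
    # Keep, for each sequence, only the changes inside the epitope range.
    in_range = {
        seq_id: [c for c in changes if start <= c[0] <= end]
        for seq_id, changes in population.items()
    }
    num_mut_seq = sum(1 for changes in in_range.values() if changes)
    tot_mut = sum(len(changes) for changes in in_range.values())
    mut_freq = tot_mut / num_mut_seq if num_mut_seq else 0.0
    mut_seq_ratio = num_mut_seq / len(population) if population else 0.0
    return {
        "NUM MUT SEQ": num_mut_seq,
        "TOT MUT": tot_mut,
        "MUT FREQ": mut_freq,
        "MUT SEQ RATIO": mut_seq_ratio,
    }


# Toy population of four sequences (not real observations).
toy_population = {
    "s1": [(417, "K417N")],
    "s2": [(417, "K417T"), (614, "D614G")],  # D614G lies outside the 411-420 range
    "s3": [],
    "s4": [(412, "toy change"), (417, "K417T")],
}
stats = epitope_statistics(toy_population, 411, 420)
```

With this toy input, three of the four sequences are mutated in the range and carry four changes in total, so MUT FREQ is 4/3 and MUT SEQ RATIO is 0.75.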

The content of the two columns marked with red arrows is clickable; below we explain what happens when you click on them.

Consider first the NUM MUT SEQ column. When you click on it, we show a list of those sequences (599 in the example).

When you click on TOT MUT instead, you open the 'Epitope Mutation Statistics' functionality. Here we show a summary of the characteristics of the epitope on which the functionality has been activated. You can choose up to two attributes by which to group the counts of each amino acid change present in the location of this epitope (in the example, between positions 411 and 420 of the Spike protein). Here we choose to group by lineage and then by month of collection.

As a result, the following table is obtained. Note that you can conveniently click on TOTAL to sort by descending count of mutations. In this way, you immediately see that K417T is the change that appears in most of the sequences in the selected set, followed by the alternative change K417N, and so forth. By clicking on K417T you can also sort 'by column', immediately bringing to the leftmost side of the table the lineage+month groups with the most occurrences (in this case, the P.1 lineage in March and April 2021).
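The grouping and the two sorting operations can be sketched as follows. This is a hedged illustration rather than the tool's implementation: the function names, the tuple layout, and the observations are hypothetical toy values.

```python
from collections import Counter, defaultdict

# Toy observations: (amino acid change, lineage, collection month); not real data.
OBSERVATIONS = [
    ("K417T", "P.1", "2021-03"),
    ("K417T", "P.1", "2021-03"),
    ("K417T", "P.1", "2021-04"),
    ("K417N", "B.1.351", "2021-03"),
    ("K417N", "B.1.351", "2021-04"),
]


def group_counts(observations):
    """Build a table: change -> Counter of (lineage, month) groups."""
    table = defaultdict(Counter)
    for change, lineage, month in observations:
        table[change][(lineage, month)] += 1
    return table


def sort_rows_by_total(table):
    """Order the changes by descending total count (like clicking on TOTAL)."""
    return sorted(table, key=lambda change: sum(table[change].values()), reverse=True)


def sort_columns_by_change(table, change):
    """Order the groups by descending count for one change (like clicking on that change)."""
    return [group for group, _ in table[change].most_common()]


table = group_counts(OBSERVATIONS)
rows = sort_rows_by_total(table)
columns = sort_columns_by_change(table, "K417T")
```

In this sketch `sort_columns_by_change` returns only the groups in which the chosen change occurs at least once.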

Going back to the result table, we now explain the meaning of the colored circle marks in the MUT SEQ RATIO column. When epitopes are of T cell or MHC ligand type, we recommend considering the statistics only when the response frequency is above 0.2 (green mark) or, alternatively (when marks are red/orange), ensuring that the HLA restriction used is appropriate for the observed population.

Finally, from this view it is also possible to open results in VirusViz. Several buttons allow this interaction. For brevity, here we only show what happens when you use the 'VirusViz All Epitopes' button at the top of the table. Please click on the (AA mutations only) option, as the other one (including nucleotides) is not relevant for epitope research.

VirusViz opens the selected sequences (in this case, all the ones from Florida) as a new project, optionally removing sequences that are too short (here 143). If you are interested you can read the full documentation here and try VirusViz directly here, otherwise you can continue here.

After clicking on 'Confirm', VirusViz opens on its distribution page by default. On the left, in the 'Highlight region' dropdown, you can choose one of the epitopes that were previously chosen in EpiViruSurf. We visualize a bar plot where the x-axis represents the amino acid positions of a protein, the height of each bar represents the number of sequences in the selected population that feature a change at that position, and epitope position ranges are represented as blue vertical regions.
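The quantity behind each bar can be sketched as a per-position count of sequences. This is only an illustration of the idea, not VirusViz code: the function names and the input layout are hypothetical, and the data is a toy sample.

```python
from collections import Counter


def position_counts(population):
    """Count, for each position, how many sequences carry at least one change there.

    population: dict mapping a sequence id to a list of changed positions
                on one protein (hypothetical layout).
    """
    counts = Counter()
    for positions in population.values():
        counts.update(set(positions))  # a sequence counts once per position
    return counts


def counts_in_region(counts, start, end):
    """Restrict the bar heights to an epitope range (assumed inclusive)."""
    return {pos: counts[pos] for pos in range(start, end + 1) if counts[pos]}


# Toy data: changed positions per sequence (not real observations).
toy_population = {"s1": [417, 614], "s2": [417], "s3": [614, 614], "s4": []}
heights = position_counts(toy_population)
highlighted = counts_in_region(heights, 411, 420)
```

Restricting the counts to a range corresponds to reading only the bars that fall inside one highlighted epitope region.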

If you wish to remove particular epitopes or to check their information in a more convenient way, you can do so in the 'Regions' tab (epitopes' names are clickable and they open IEDB information).

Sophisticated use of VirusViz can be initiated in the 'Groups' tab. By using metadata filters (as shown in the figure) you can build groups of sequences that were collected in particular months.

Then, you may switch to the 'Compare' tab, and build a comparative view of the distributions of sequences (for example, divided by collection month as in the figure). Note that on the left you should select:

  • Compare groups
  • Select groups (the ones you have created before)
  • Highlight region: one epitope at a time
  • Other optional visualization features to adjust the height of plots (according to your taste/goal)

... and finally Mode 1

Here we explain how to use Mode 1, also called the 'Custom epitopes' mode, as it allows you to design your own epitope by choosing a name, a protein, and a position range (or multiple ranges in the case of discontinuous epitopes).

In the figure you see such an example, where we added two segments, [677,680] and [682,685], on the Spike protein. When you are satisfied with the choice, you can press ADD EPITOPE.
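A discontinuous epitope like this one can be modeled as a name, a protein, and a list of position ranges. The sketch below is a hypothetical data structure for illustration only: `CustomEpitope`, its fields, and the name `my_epi` are assumptions, and the ranges are treated as inclusive on both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomEpitope:
    """A user-defined epitope made of one or more position ranges (hypothetical model)."""
    name: str
    protein: str
    ranges: tuple  # tuple of (start, end) pairs, assumed inclusive

    def covers(self, position):
        """Return True if the position falls inside any of the epitope's segments."""
        return any(start <= position <= end for start, end in self.ranges)


# The two segments from the example above, on the Spike protein.
epitope = CustomEpitope("my_epi", "Spike", ((677, 680), (682, 685)))
inside = epitope.covers(677)
gap = epitope.covers(681)  # position 681 lies between the two segments
```

Under this model, a change at position 681 would not count toward the epitope, because it falls in the gap between the two segments.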

If you instead wish to compute statistics only over a fraction of the selected population (in our example, it is still SARS-CoV-2 sequences from human hosts in Florida, US), you should click on the red button 'ADD CONDITION ON AMINO ACIDS (OPTIONAL)', which will open the following grey panel. As an example, here we selected amino acid changes at position 677 of the Spike (but this does not need to be a position included in the epitope). When you are satisfied with the choice, you first 'ADD' the amino acid condition and then 'ADD EPITOPE'.

The system will warn you that the custom epitope functionality should be used with care, as checking for the conservancy of epitope ranges is a good indication for B cell epitopes, but may not be enough for T cell epitopes. For these, please check appropriate HLA restrictions.

Once you have input your epitopes, they will be registered in a user-defined list. See two example epitopes below. Each epitope has its own info card. You can press MORE INFO to get quantitative information on the epitope conservancy.

Here we show what the information looks like for the first input epitope.

Remember that epitopes are always checked in the context of the population of sequences that you selected at the beginning (for us, Florida, US). If you wish to change this population, you can click on RELOAD: this will reload into all the dropdowns the information that defined that epitope+population.

In this way you could replace the initial selection (for us, Florida) with, for example, other regions (we tried Texas), collection dates, or other metadata attributes. As a result, you will obtain the following table, where each epitope is registered according to the specific population on which the quantitative information is computed. My_epi2 and My_epi2_0 are the same epitope, tested in Florida and in Texas. It appears that the Florida sequences are more mutated than the Texas sequences in the context of this epitope (on the nucleocapsid protein, range 200-210).