Near real-time visualization of SARS-CoV-2 (hCoV-19) genomic variation

CoVizu is an open source project endeavouring to visualize the global diversity of SARS-CoV-2 genomes.

This web page provides two interactive visualizations of these data. On the left, it displays a phylogenetic tree summarizing the evolutionary relationships among different SARS-CoV-2 lineages (groupings of viruses with similar genomes, useful for linking outbreaks in different places; Rambaut et al. 2020). You can navigate between different lineages by clicking on their respective boxes. Selecting a lineage displays a "beadplot" visualization in the centre of the page. Each horizontal line represents one or more samples of SARS-CoV-2 that share the same genome sequence. Beads along the line represent the dates that this variant was sampled.

The results here are in whole or part based upon data hosted at the Canadian VirusSeq Data Portal and the Nextstrain Open Data build assets. We wish to acknowledge the contributing laboratories, Canadian Public Health Laboratory Network (CPHLN) and CanCOGeN VirusSeq. Computational support is provided by the McArthur Lab (Michael G. DeGroote Institute for Infectious Disease Research, McMaster University). This version of the front-end is hosted by the Ontario Institute for Cancer Research.

For more help, click on the 🔰 icons.

A phylogenetic tree is a model of how different populations are related by common ancestors. The tree displayed here (generated by TreeTime v0.8.0) summarizes the common ancestry of different SARS-CoV-2 lineages, which are pre-defined groupings of viruses based on genome similarity.

A time scale marked with dates is drawn above the tree. The earliest ancestor (root) is drawn on the left, and the most recent observed descendants are on the right. We estimate the dates of common ancestors by comparing the sampled genomes and assuming a constant rate of evolution.
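
To illustrate what "assuming a constant rate of evolution" buys us, here is a minimal sketch of a root-to-tip regression, a much simpler stand-in for the maximum-likelihood dating that TreeTime performs. The function name and the input values are hypothetical. It fits a straight line to divergence versus sampling time; the slope estimates the rate, and the point where the line crosses zero divergence estimates the date of the root.

```python
def root_to_tip_regression(times, divergences):
    """Least-squares fit of divergence = rate * (time - root_time).

    times: sampling dates as decimal years (hypothetical inputs)
    divergences: genetic distance from the root to each sample
    Returns (rate, root_time).
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, divergences))
    rate = sxy / sxx                   # slope: substitutions per site per year
    intercept = mean_d - rate * mean_t
    root_time = -intercept / rate      # time at which divergence is zero
    return rate, root_time


# Made-up example values, for illustration only
rate, root_time = root_to_tip_regression(
    [2020.0, 2020.5, 2021.0], [0.0002, 0.0006, 0.0010]
)
```

With a constant rate, samples collected later should be proportionally more diverged from the root, which is why a single straight line is enough in this sketch.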

For each lineage, we draw a rectangle to summarize the range of sample collection dates, and colour it according to the geographic region in which it was sampled most often. To explore the samples within a lineage, click on the label (e.g., "B.4") or the rectangle to retrieve the associated beadplot.
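
The summary behind each rectangle can be sketched in a few lines. This is an illustrative example under assumed inputs, not the project's actual code: the function name and the (date, region) record layout are hypothetical.

```python
from collections import Counter
from datetime import date


def summarize_lineage(samples):
    """Summarize one lineage from (collection_date, region) records.

    Returns (earliest_date, latest_date, most_common_region), which
    correspond to the left edge, right edge, and colour of the rectangle.
    """
    dates = [d for d, _ in samples]
    region_counts = Counter(region for _, region in samples)
    top_region, _ = region_counts.most_common(1)[0]
    return min(dates), max(dates), top_region


# Made-up records, for illustration only
example = [
    (date(2020, 3, 1), "Europe"),
    (date(2020, 3, 9), "Asia"),
    (date(2020, 2, 20), "Europe"),
]
first, last, region = summarize_lineage(example)
```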

We use beadplots to visualize the different variants of SARS-CoV-2 within a lineage, where and when they have been sampled, and how they are related to each other. Every object in the beadplot has additional info in a tooltip (which you view by hovering over that object with your mouse pointer).

Each horizontal line segment represents a variant – viruses with identical genomes. We draw beads along a line to indicate when that variant was sampled. If there are no beads on the line and it is grey, then it is an unsampled variant: two or more sampled variants descend from an ancestral variant that has not been directly observed.
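
As a rough sketch of what "variant" means here, samples can be grouped by exact genome sequence, with each group keeping its list of collection dates. The function name and the (genome, date) input layout are assumptions made for this example.

```python
from collections import defaultdict


def group_variants(samples):
    """Group (genome, collection_date) records into variants.

    Samples with identical genome strings share one horizontal line;
    the sorted dates are where the beads would be drawn.
    """
    variants = defaultdict(list)
    for genome, collection_date in samples:
        variants[genome].append(collection_date)
    return {genome: sorted(dates) for genome, dates in variants.items()}


# Made-up toy sequences and ISO dates, for illustration only
variants = group_variants([
    ("ACGT", "2020-03-02"),
    ("ACGA", "2020-03-01"),
    ("ACGT", "2020-03-01"),
])
```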

The area of each bead is scaled in proportion to the number of times the variant was sampled on that day. This is important for rapid or intensively sampled epidemics, e.g., lineage D.2 in Australia. Beads are coloured with respect to the most common geographic region of the samples.
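
Scaling by area rather than radius means the radius grows with the square root of the sample count. The sketch below shows that arithmetic; the function name and the `area_per_sample` parameter are hypothetical choices for illustration.

```python
import math
from collections import Counter


def bead_radii(collection_dates, area_per_sample=10.0):
    """Return {date: radius} so that bead area is proportional to count.

    area = area_per_sample * count, and area = pi * r**2,
    so r = sqrt(area_per_sample * count / pi).
    """
    counts = Counter(collection_dates)
    return {
        day: math.sqrt(area_per_sample * n / math.pi)
        for day, n in counts.items()
    }


# Four samples on one day, one sample on another (made-up dates)
radii = bead_radii(["2020-03-01"] * 4 + ["2020-03-05"])
```

In this example the bead for four samples has twice the radius, and therefore four times the area, of the bead for a single sample.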

We draw vertical line segments to connect variants to their common ancestors. These relationships are estimated by the neighbor-joining method using RapidNJ. Tooltips for each edge report the number of genetic differences (mutations) between ancestor and descendant as the "genomic distance". Since it's difficult to reconstruct exactly when these mutations occurred, we simply map each line to when the first sample was collected.
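
One simple way to picture the "genomic distance" is as a count of mismatched positions between two aligned sequences. The sketch below assumes equal-length, already-aligned sequences and ignores ambiguous bases and gaps, so it is a simplification and not necessarily how the distances shown here are computed. The function name is hypothetical.

```python
def genomic_distance(ancestor, descendant):
    """Count positions at which two aligned sequences differ.

    Assumes both sequences come from the same alignment (equal length).
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(ancestor, descendant) if a != b)


# Made-up toy sequences that differ at two positions
distance = genomic_distance("ACGTACGT", "ACGAACTT")
```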