1 Evolution in Viruses

Viruses are parasites of cells, containing transmissible deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) as their genetic material. They are a large, extremely diverse group, capable of rapid evolution as genetic variation is continually generated through random mutations and continually removed from the population through the processes of natural selection and genetic drift. These are the processes that drive evolution, that is, changes in the inherited characteristics of a population from one generation to the next. In viruses, new genetic variation arises rapidly due to high mutation rates, typically through errors in replication which are then passed on to subsequent generations. However, in some RNA viruses, recombination and reassortment can also play an important role. For example, reassortment is quite common among influenza virus strains and may occur when multiple strains infect the same host cell and exchange segments of their genomes to produce hybrid viral progeny [1]. RNA viruses typically have higher mutation rates than DNA viruses. This is because their RNA polymerases generally lack the proofreading activity found in DNA polymerases, so new genetic variants constantly arise, allowing for evolution in changing environments [2, 3].

The processes of genetic drift and natural selection impact evolving populations to different degrees, depending on population size. Genetic drift drives random fluctuations in the frequency of different genetic variants in populations simply due to chance events. Under genetic drift, new mutations with a whole range of effects (beneficial, deleterious, or neutral) may rise in frequency in a population simply by chance [4]. Evolution by means of genetic drift dominates in small populations, for example, a population that has undergone a severe population “bottleneck.” In large populations, however, evolution is typically driven by natural selection. Natural selection will drive a mutation to increase in frequency in a population if it confers a fitness benefit and to decrease in frequency if it has a deleterious effect on fitness [5]. Evolutionary change driven by natural selection is referred to as “adaptive evolution.”
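The contrast between drift and selection can be illustrated with a minimal Wright-Fisher simulation. This is a sketch under stated assumptions, not a model of any real virus: it assumes a haploid population of constant size, non-overlapping generations, and a single new variant with relative fitness advantage s. The function name, population sizes, and values of s are hypothetical choices made for illustration.

```python
import random


def fixation_fraction(pop_size, s, start_copies=1, trials=2000, seed=0):
    """Fraction of Wright-Fisher runs in which a new variant reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        copies = start_copies
        # Run until the variant is either lost (0 copies) or fixed (pop_size copies).
        while 0 < copies < pop_size:
            freq = copies / pop_size
            # Selection: shift the expected frequency by the relative fitness 1 + s.
            p = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
            # Drift: the next generation is a binomial sample of pop_size offspring.
            copies = sum(rng.random() < p for _ in range(pop_size))
        fixed += copies == pop_size
    return fixed / trials


# Hypothetical scenarios: a mildly deleterious variant (s = -0.1) in a small
# and a larger population, and a beneficial variant (s = 0.1) in the larger one.
small_deleterious = fixation_fraction(pop_size=10, s=-0.1)
large_deleterious = fixation_fraction(pop_size=200, s=-0.1, trials=300)
large_beneficial = fixation_fraction(pop_size=200, s=0.1, trials=300)
print(small_deleterious, large_deleterious, large_beneficial)
```

Under these assumptions, the deleterious variant is expected to fix occasionally in the population of 10 but essentially never in the population of 200, while the beneficial variant in the larger population fixes far more often than the neutral expectation of 1/N.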

Viral life cycles are characterized by massive fluctuations in population sizes across multiple scales (see Fig. 5.1), and these fluctuations result in shifts in the balance between natural selection and genetic drift in driving evolutionary dynamics. In some virus species, infection of a new host can be initiated with as few as a single viral particle or virion (e.g., HIV; [7]). The data are still preliminary, but the transmission bottleneck in SARS-CoV-2 appears to be very small as well [8]. Once a new host is infected, the viral population typically grows rapidly, reaching population sizes that are easily over a billion virions within a matter of days (e.g., SARS-CoV-2: 1–100 billion virions at peak infection; [9]). With some infections, there are additional population fluctuations within the host. For example, dengue viruses experience a large bottleneck in their mosquito hosts as they move into the mosquito salivary gland [10]. Once a viral population has reached sufficient size in its host, it may then be transmitted to the next susceptible host. At this stage, the size of the transmission bottleneck can also depend on the specific mode of transmission. For example, with influenza, fewer virions are passed on through aerosol transmission than through contact transmission [11].
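A short calculation shows why a narrow transmission bottleneck strengthens genetic drift. As a simplifying assumption, suppose the founding virions are drawn independently and at random from the donor's viral population. A variant at within-host frequency f is then included among n founders with probability 1 − (1 − f)^n. The function name and the example frequencies are hypothetical.

```python
def prob_variant_transmitted(freq, bottleneck_size):
    """Probability that at least one of the founding virions carries the variant.

    Assumes founders are sampled independently and at random from the donor.
    """
    return 1 - (1 - freq) ** bottleneck_size


# A variant present in 1% of the donor's virions (hypothetical value):
single_founder = prob_variant_transmitted(0.01, 1)      # bottleneck of one virion
hundred_founders = prob_variant_transmitted(0.01, 100)  # wider bottleneck
print(single_founder, hundred_founders)
```

With a single founder, the variant is passed on only about 1% of the time regardless of its fitness effect, whereas with 100 founders it is passed on in roughly 63% of transmissions.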

Fig. 5.1
(a) Overview of the different scales at which viral populations fluctuate in size from intra-host to local populations of hosts to global networks of populations. Different selection pressures may be important at each of these levels, and genetic drift will dominate evolutionary dynamics at small population sizes. (b) Fluctuating global population dynamics of SARS-CoV-2 showing the shifting importance of genetic drift versus natural selection in driving evolutionary dynamics in the viral population. Case count data was retrieved from the COVID-19 Content Portal https://systems.jhu.edu/research/public-health/ncov/ [6]

At the inter-host level, viruses can also experience extreme population fluctuations as epidemics initiate, spread exponentially, decrease (e.g., due to the implementation of public health practices such as quarantining), and then repeat, potentially going through multiple epidemic waves. Finally, a virus spreading through a host population may infect another geographically distinct host population, usually via host migration, resulting in complex and even asynchronous viral population fluctuations across an interconnected network of host populations. Figure 5.1 gives an overview of these different scales at which population fluctuations are expected in a virus population, such as that of SARS-CoV-2 (case numbers shown in Fig. 5.1b). Thus, virus populations follow a complex pattern of extreme changes in size which can have important implications for the relative impacts of genetic drift versus natural selection over the course of a pandemic.

Two key traits of viruses that can impact their fitness are transmission rate and virulence. Transmission rate is the rate at which a virus moves from one infected host to a new susceptible host, while virulence can be defined as the harm that a pathogen inflicts on its host, which results from the pathogen using host resources for replication [12]. Much theoretical and empirical work on pathogen evolution has centered on the hypothesis that a trade-off between virulence and transmission drives pathogen fitness [13]. The assumption is that while increasing virulence might initially increase transmission rates (because there are more virions to transmit), it may eventually increase host mortality rates, which typically slows transmission because the host’s infectious period is cut short: deceased hosts do not transmit the virus. Thus, overall transmission is expected to be highest at intermediate levels of virulence, balancing the benefits of replication against the cost of a shorter infectious period [13]. Although empirical evidence of this trade-off is not clear-cut, many pathogens appear to have evolved to maintain intermediate virulence because of it. Some examples are the Zika virus [14], HIV-1 [15], dengue virus [16], and the influenza virus [17]. It remains to be seen whether SARS-CoV-2 will also follow this pattern.
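The trade-off hypothesis is often formalized through the basic reproduction number, R0 = β(α) / (μ + α + γ), where α is virulence (infection-induced mortality), β(α) is a transmission rate that increases with α but saturates, μ is background host mortality, and γ is the recovery rate. The sketch below assumes one common saturating form, β(α) = bα / (h + α). The functional form and all parameter values are illustrative assumptions and are not estimates for any particular virus.

```python
def r0(virulence, b=2.0, half_sat=0.1, background_mortality=0.02, recovery=0.1):
    """Basic reproduction number under a saturating transmission-virulence trade-off."""
    # Transmission rises with virulence but levels off (hypothetical form).
    transmission = b * virulence / (half_sat + virulence)
    # Higher virulence shortens the average infectious period.
    infectious_period = 1.0 / (background_mortality + virulence + recovery)
    return transmission * infectious_period


def optimal_virulence(grid_points=10000, max_virulence=2.0, **params):
    """Grid search for the virulence level that maximizes R0."""
    grid = [max_virulence * i / grid_points for i in range(1, grid_points + 1)]
    return max(grid, key=lambda a: r0(a, **params))


best = optimal_virulence()
print(best, r0(best), r0(0.01), r0(1.0))
```

For this functional form the optimum has a closed-form solution, α* = sqrt(h(μ + γ)), which the grid search recovers. R0 is lower at both very low and very high virulence, which is the intermediate-virulence prediction described above.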

2 Evolutionary History of SARS-CoV-2 Within the Coronavirus Group

Coronaviruses (CoVs) are a widely distributed group of RNA viruses, typically highly specialized for infecting humans, birds, and a range of mammals, causing mild to severe disease with both respiratory and gastrointestinal symptoms, depending on the species. This group is counted among the zoonotic viruses that pose great challenges for the global health community [18, 19]. Coronaviruses belong to the subfamily Orthocoronavirinae of the family Coronaviridae, in the order Nidovirales, and are subdivided into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus (not to be confused with the Alpha, Beta, Gamma, and Delta SARS-CoV-2 variants) [20]. Among the four genera, alpha- and betacoronaviruses infect only mammals, while gamma- and deltacoronaviruses infect primarily avian species along with a few mammals [21, 22]. Alpha- and betacoronaviruses normally cause respiratory illness in humans and gastroenteritis in other animals; however, there are also reports of hepatic and neurologic syndromes due to infection [23]. Although bats are considered the major natural host shaping the evolutionary dynamics of CoVs, these viruses also circulate among wildlife and domestic livestock. These secondary animal species can play important roles as intermediate hosts before the viruses infect humans [21, 24]. There are already seven reported instances of CoVs being transmitted from animals to humans that have led to the emergence of human CoVs with a wide range of virulence and transmission rates [25]. There are also some CoVs circulating in bats that are reported to be capable of infecting humans without an intermediate host [20]. Table 5.1 outlines the seven human coronaviruses (HCoVs) and their animal origins. Figure 5.2 shows the evolutionary relationships between those HCoVs and a few other animal-hosted coronaviruses within the alpha and beta groups.

Table 5.1 Species of coronavirus reported in human hosts [19, 21, 26]
Fig. 5.2
Maximum likelihood phylogeny of a selection of alpha- and betacoronaviruses, including the seven coronaviruses known to infect humans (labeled in bold). This phylogeny was inferred using the GTR + G + I model [27] fit to ORF1ab nucleotide sequences and is drawn to scale with branch lengths measured in the number of substitutions per site (see scale bar). Evolutionary analyses were conducted in MEGA X [28], and bootstrap values were calculated from 500 replications. Red labels indicate coronaviruses in the SARS-CoV-2 group; bold text indicates species found in humans. Accession numbers for sequence data used in this figure include the following: NC_019843.3, MT797634.1, NC_006577.2, NC_006213.1, NC_004718.3, NC_045512.2, MN996532.2, MT121216.1, NC_003045.1, NC_003436.1, MG762674.1, DQ022305.2, NC_010438.1, NC_038861.1, and NC_005831.2

Among the seven reported human coronaviruses, four (HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1) are endemic, low-virulence HCoVs that cause upper respiratory tract disease and are responsible for an estimated 15–30% of common colds in adults. The earliest, HCoV-229E, was isolated in 1965 using standard tissue culture from a volunteer. In 1967, HCoV-OC43 was recovered from a tracheal and nasal organ culture from the Common Cold Unit in Salisbury, United Kingdom. Only these two HCoVs were widely reported and studied until the twenty-first century. In 2004, HCoV-NL63 was isolated from a 7-month-old child in the Netherlands, while in 2005, HCoV-HKU1 was isolated from a Hong Kong patient with pneumonia. Later both strains were identified in adults and infants, indicating that these two HCoVs can be considered new agents responsible for respiratory infections [29, 30]. These four HCoVs generally show winter seasonality, circulating between December and April, and typically cause mild upper respiratory infections in humans, with occasional severe lower respiratory infections, for example, in elderly or immunocompromised individuals [31].

Apart from these four mild HCoVs, over the last two decades, the world has witnessed three major outbreaks of coronaviruses with increased morbidity and transmission rates. In 2002–2003, an outbreak of severe acute respiratory syndrome coronavirus (SARS-CoV) occurred in Guangdong Province of China and quickly spread to over 30 countries, infecting 8000 people with a mortality rate of approximately 10%. A decade later, in 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) emerged, causing a severe respiratory disease in 27 countries in the Middle East, Europe, North Africa, and Asia. MERS-CoV still poses a potential threat in the Middle East, where it is still sporadically detected in the human population [32]. In December 2019, the novel coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was reported in Wuhan, Hubei, China. On March 11, 2020, the World Health Organization declared COVID-19 (the disease caused by SARS-CoV-2) a global pandemic due to the emergence of different variants and global spread [33, 34].

Bats are considered the most probable zoonotic origin for all three of these major outbreaks of coronavirus. SARS-CoV and MERS-CoV appear to have passed through intermediate hosts (civets and camels, respectively) before transmitting to humans. However, the details of these transmission events are unclear [19, 35, 36]. As for SARS-CoV-2, the highest average genetic similarity with other sequenced virus genomes to date is with CoV RaTG13, sampled from a Rhinolophus affinis bat and estimated to have shared a common ancestor with SARS-CoV-2 decades prior to the emergence of SARS-CoV-2 in humans [37]. So far, no viruses with a closer genetic similarity to SARS-CoV-2 have been collected, and so there remains some ambiguity about the immediate source of the virus [18]. Pangolins were also initially considered a possible SARS-CoV-2 reservoir due to the high genomic similarity of Pangolin CoV with SARS-CoV-2. However, pangolins exhibit clinical and histopathological changes when infected with CoV [32], whereas natural CoV hosts are expected to be asymptomatic due to their long coevolutionary history.

3 Evolution of SARS-CoV-2 in Human Hosts

As the SARS-CoV-2 epidemic began to emerge globally in early 2020, one key part of the global effort to characterize the virus and understand how it was spreading was to acquire whole genome sequence data. These sequence data quickly began to reveal mutational changes. The rate of change of the SARS-CoV-2 genome has been estimated at between 1 × 10⁻⁶ and 5 × 10⁻⁶ nucleotide substitutions per site per day [38], or across the whole SARS-CoV-2 genome, approximately 20 genetic changes per year within a lineage [39]. In the early days of the pandemic, the observed mutations all appeared to be random and effectively neutral, but they provided a way of tracking transmission routes of specific genotypes as the virus began to spread around the globe. For example, early on there was much discussion about what the initial transmission routes were for SARS-CoV-2 into the USA. Through sequencing and tracking of specific mutations, researchers were able to conclude that SARS-CoV-2 entered the USA via multiple independent sources from both Asia and Europe [40].
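The per-site and per-genome figures can be related with simple arithmetic. The sketch below assumes a genome length of 29,903 nucleotides and a 365-day year, and it treats the rate as uniform across sites, which is a simplification. The function and variable names are chosen for illustration.

```python
GENOME_LENGTH = 29_903  # nucleotides (assumed genome length)
DAYS_PER_YEAR = 365


def substitutions_per_genome_per_year(rate_per_site_per_day,
                                      genome_length=GENOME_LENGTH):
    """Convert a per-site, per-day substitution rate to a whole-genome yearly count."""
    return rate_per_site_per_day * genome_length * DAYS_PER_YEAR


low = substitutions_per_genome_per_year(1e-6)   # lower bound of the estimated range
high = substitutions_per_genome_per_year(5e-6)  # upper bound of the estimated range
print(round(low, 1), round(high, 1))
```

Under these assumptions, the daily per-site range corresponds to roughly 11 to 55 substitutions per genome per year, a range that brackets the estimate of approximately 20 changes per year.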

Fig. 5.3
Phylogenetic tree of 3643 genomes sampled globally between December 2019 and December 2021. Clades are named according to Nextstrain nomenclature, which distinguishes clades based on global frequency, year of emergence, and a unique letter. Visualization was performed by nextstrain.org [123] with data from GISAID

4 SARS-CoV-2 Whole Genome Sequence Data

The SARS-CoV-2 genome consists of ~29,903 nucleotides and has a gene composition and structure similar to that of other betacoronaviruses: ~70% of the genome comprises the replicase gene ORF1ab, and the rest of the genome consists of S (encoding the structural spike glycoprotein), ORF3a (ORF3a protein), E (structural envelope protein), M (structural membrane glycoprotein), ORF6 (ORF6 protein), ORF7a (ORF7a protein), ORF7b (ORF7b protein), ORF8 (ORF8 protein), N (structural nucleocapsid phosphoprotein), and ORF10 (ORF10 protein). We know much about SARS-CoV-2 diversity and evolution throughout the global pandemic from ongoing analysis of the unprecedented quantity of publicly available SARS-CoV-2 whole genome sequence data. This continually growing collection of sequence data has given researchers the opportunity to observe viral molecular evolution, essentially in real time. For decades, the research field of bioinformatics has supported and promoted the practice of open data sharing [41], and the SARS-CoV-2 sequence data have been no different. Importantly, free access to SARS-CoV-2 sequence data for all researchers means more teams can work in parallel, which should allow for more rapid characterization of patterns and even solutions. Data have been made available through databases including NCBI and GISAID, a database originally developed to organize sharing of influenza sequence data but now the main repository for global SARS-CoV-2 genome sequence data. As of late 2021, there were close to six million publicly available whole SARS-CoV-2 genome sequences on GISAID. Data are from all over the globe, but unsurprisingly the origin of sequence data is unevenly distributed, with the USA and the United Kingdom contributing the highest total numbers. However, on a per-case basis, the USA is far behind many other countries, having sequenced samples from less than 1% of patients diagnosed with COVID-19 [42]. Iceland has the best per-case sequencing rate, at close to 100% of diagnosed infections [43]. Figure 5.3 shows the inferred evolutionary relationships between a subset of available SARS-CoV-2 genomes (N = 3643) to give an idea of how the virus population has evolved and diversified over the course of the pandemic.

5 Natural Selection Begins to Drive Adaptive Evolution

By the fall of 2020, clear evidence was emerging to suggest that a few key mutations in the SARS-CoV-2 genome conferred adaptive impacts of human health concern, in particular increased transmission rates and virulence. These adaptations presented new challenges for governments and public health officials trying to reduce cases. Therefore, understanding and predicting the adaptive evolution of SARS-CoV-2 have become increasingly important. Here, we identify two broad categories of approaches that researchers have taken to identify regions in the SARS-CoV-2 genome where natural selection is driving evolution. The first type of approach aims to infer how specific mutations are likely to impact protein structure and function at the molecular level and so lead to impacts on virus fitness. The second type of approach analyzes variation in genome sequences collected during the SARS-CoV-2 pandemic to identify epidemiological and phylogenetic patterns that are unlikely to have occurred simply due to chance and are likely driven by natural selection. We discuss what we can learn from both types of approaches, as well as their drawbacks. Typically, multiple lines of evidence are required before a specific mutation is identified by consensus as adaptive.

6 Evidence of Selection from Models and Experimental Tests of Protein Function

At the molecular and functional level, predictive protein models and a range of in vitro experimental tests have allowed researchers to identify genes and even specific sites within genes on the SARS-CoV-2 genome where mutations are under selection and likely to have impacts on human health. The focus here is on exploring protein function and interactions with the human immune system and host cell. Using this kind of approach, earlier studies examining other coronaviruses had already identified the spike protein as one of crucial importance [44]. Soon after SARS-CoV-2 emerged, the spike protein was identified as important in this novel coronavirus as well [45]. The spike protein interacts with the angiotensin-converting enzyme 2 (ACE2) receptor on the surface of the human host cell. If this interaction is successful, then the coronavirus has access into the human host cell. Predictive models and experimental studies have identified specific regions and amino acid positions that are likely to be most important in impacting how the spike protein interfaces with ACE2 receptors. Of particular importance is the receptor-binding domain (RBD) of the spike protein [46]. This region varies between a closed/down position and an open/up position. To successfully bind with the human ACE2 receptor, it must be in the up position [47,48,49]. This is also the region where many neutralizing antibodies bind to prevent interaction with ACE2, effectively neutralizing the virus. Many in vitro experiments have been performed looking at the impacts of specific mutations in this region (e.g., [50, 51]). Other regions of interest in the spike protein that have been identified using this approach include the furin cleavage site (e.g., [52]) and the N-terminal domain (e.g., [53]). There is also evidence that mutations in other genes of the SARS-CoV-2 genome may impact virus fitness (e.g., [54, 55]). However, the spike gene appears to have the largest predicted impact.

While studies exploring the impact of mutations on the structure and function of key proteins are useful for identifying genes and regions of potential importance, sometimes they give us only a hint at the true impacts of these mutations when in vivo. Furthermore, for logistical reasons, these studies typically examine the impacts of a single mutational change at a time. However, the impact of a mutation often depends on which other mutations are also present in a genome, a phenomenon known as epistasis [56]. There is now growing evidence of epistatic effects of mutations in SARS-CoV-2 [57]. Therefore, the impacts of multiple mutation combinations will often remain unclear until they play out in real human hosts.

7 Evidence of Selection from Phylogenetic and Epidemiological Data

This second type of approach uses observed SARS-CoV-2 sequence data to identify patterns that are unlikely to have occurred simply due to chance and so must be driven by natural selection. One way to do this is by tracking variant frequencies over time and looking for rapid increases in the frequency of virus variants. The assumption is that the genomes of those variants rapidly increasing in frequency must contain one or more mutations that increase viral fitness as compared to other variants circulating at that time. It is usually unclear from this type of data which mutation or mutations are the ones that are impacting fitness. Another drawback of this approach is that sometimes an observed rapid increase in variant frequency may occur simply due to random loss of other variants when a small number of viruses establish in a new host population. These are the potential results of evolution by means of genetic drift in a small population. Because of this potential for genetic drift to result in evolutionary patterns that are difficult to distinguish from the results of natural selection, the evidence for adaptive evolution is significantly strengthened when the same pattern of a rapidly rising variant frequency is observed in multiple independent host populations. This is the type of evidence that allowed for the identification of the spike protein amino acid change D614G as one that increases virus fitness and therefore is a potential human health concern. SARS-CoV-2 variants with this mutation were observed to rapidly increase in frequency across the globe and at multiple geographic levels: national, regional, and municipal, beginning in the spring of 2020 [58].
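One simple way to quantify a rising variant frequency is to note that, if a variant has a constant growth advantage over the variants it competes with, the log-odds (logit) of its frequency increases linearly with time. The slope of that line estimates the advantage per day. The sketch below fits this slope by ordinary least squares. The frequency series is synthetic, generated from a hypothetical advantage of 0.1 per day, and is not real surveillance data.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1 - p))


def growth_advantage_per_day(days, freqs):
    """Least-squares slope of logit(frequency) against time in days."""
    ys = [logit(f) for f in freqs]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx


# Synthetic weekly frequencies for one hypothetical host population.
example_days = [0, 7, 14, 21, 28, 35, 42]
example_freqs = [1 / (1 + math.exp(-(0.1 * t - 3))) for t in example_days]
slope = growth_advantage_per_day(example_days, example_freqs)
print(round(slope, 3))
```

In line with the reasoning above, similar positive slopes estimated independently in several host populations would be harder to attribute to drift than a single rising trajectory. A fit like this cannot, on its own, identify which mutation is responsible.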

Another way that researchers have used observed SARS-CoV-2 sequence data to look for evidence of selection is by using phylogenetic techniques to fit models of molecular evolution and identify mutations that have evolved multiple times across different independently evolving lineages of SARS-CoV-2. This pattern of repeated evolution is called “convergent evolution” and provides strong evidence for adaptive evolution, suggesting that the convergently evolved mutation must provide a fitness benefit of some kind to the virus. For example, Hodcroft et al. [59] showed evidence of convergent evolution at amino acid 677 in the spike protein. Their phylogenetic analyses of SARS-CoV-2 genome sequences from the USA (Sept.–Nov. 2020) revealed the independent evolution of spike mutations at position 677 in seven distinct lineages. The independent rise and spread of the mutation this many times is highly unlikely to have occurred by chance alone. It strongly suggests a fitness advantage in viruses that have it.

8 Adaptively Evolved Variants of Human Health Concern

In response to evidence that natural selection was driving the evolution of SARS-CoV-2 variants with important impacts on global human health, the Virus Evolution Working Group of the World Health Organization (WHO) established a naming scheme for the variants to simplify and standardize communication about SARS-CoV-2 [60]. This naming scheme has the added benefit of encouraging the media and the public to move away from location-based names such as the “UK variant” or the “South African variant,” which can be stigmatizing to countries and their residents. Other nomenclature systems for naming and tracking SARS-CoV-2 genetic lineages have also been used [61,62,63]. However, the WHO naming scheme is specifically focused on identifying variants of potential interest for public health. In particular, as is outlined in detail on their website (www.who.int/en/activities/tracking-SARS-CoV-2-variants; [64]), they provide working definitions of what they have named variants under monitoring (VUMs), variants of interest (VOIs), and variants of concern (VOCs). Variants classified into these categories are more closely monitored and assessed through coordinated field and laboratory investigations by WHO member states and partners. Variants are placed on these lists if they appear to have increased transmissibility, virulence, or immune escape, or to have undergone any other evolutionary changes with the potential to impact global public health. VOCs are those variants for which these evolutionary changes with global health impacts have been clearly demonstrated through the types of approaches described in the previous sections of this chapter. Variants may be removed from these lists if “they have been conclusively demonstrated to no longer pose a major added risk to global public health compared to other circulating SARS-CoV-2 variants” [64]. Figure 5.4 shows the frequencies of five identified VOCs over time in four different representative countries.

Fig. 5.4
Frequency of five SARS-CoV-2 variants of concern (VOCs) and pre-VOC variants over time, from June 2020 to Nov 2021, in the United Kingdom, the USA, South Africa, and Brazil. Variant frequency data retrieved from https://cov-spectrum.ethz.ch

8.1 The First Clearly Identified Adaptively Evolved Variant: Alpha

The Alpha variant (also known as B.1.1.7, or colloquially the UK variant) was the first SARS-CoV-2 variant for which there was clear evidence suggesting it had evolved impacts on human health. First detected in the United Kingdom in September 2020, it was officially designated as a VOC on December 18, 2020. The genome of the Alpha variant has 23 mutations compared to the Wuhan ancestor strain, but mutations of potential concern for human health all lie in the gene that codes for the spike protein [65]. A few lines of evidence confirmed that this variant evolved by means of natural selection and so has increased fitness compared to the other variants circulating at that time. The first line of evidence was its rapid increase in frequency, starting in Kent and Greater London in September 2020. It rapidly moved across the United Kingdom [65, 66], despite a country-wide lockdown, and was widespread across the United Kingdom by early December [67]. The Alpha variant also quickly rose in frequency in other countries as it began to spread globally. It accounted for most of the infections in the USA and many European countries by the second quarter of 2021 [38, 68, 69]. Epidemiological models fit to case counts estimated from community-based COVID-19 testing and to secondary contact data suggested that the Alpha variant had a significant transmission advantage of 59–74% over other lineages circulating at that time [70,71,72]. Further evidence of adaptively evolved changes came from structural modeling of the amino acid changes in the spike protein, which predicted that the Alpha variant could bind more easily to host cells in the human respiratory tract [73]. Viral load (the number of viral particles in a host), measured in both hospitalized and walk-in test center patients, was higher by a factor of 10 in patients with the Alpha variant than in non-Alpha variant patients [74, 75]. This suggested that high viral load might be driving the observed increase in transmission rates.
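To see what a transmission advantage of this size can mean in practice, the sketch below reads the 59–74% figure as a multiplicative increase in the effective reproduction number R. That reading is an assumption, since the cited models may define the advantage differently. The baseline value of R = 0.9 for other lineages under lockdown is hypothetical and chosen only for illustration.

```python
def variant_reproduction_number(baseline_r, transmission_advantage):
    """Reproduction number of a variant with a multiplicative transmission advantage."""
    return baseline_r * (1 + transmission_advantage)


baseline_r = 0.9  # hypothetical R for other lineages under lockdown
alpha_low = variant_reproduction_number(baseline_r, 0.59)
alpha_high = variant_reproduction_number(baseline_r, 0.74)
print(round(alpha_low, 2), round(alpha_high, 2))
```

Under these assumptions, control measures that were sufficient to make other lineages decline (R below 1) would still leave the Alpha variant growing (R of about 1.4 to 1.6), which is consistent with its spread despite a country-wide lockdown.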

Adaptive evolution of higher transmission rates in SARS-CoV-2 variants is not surprising given that transmission is an important component of viral fitness [76]. Unfortunately, and less clearly linked to viral fitness, higher virulence has also evolved in the Alpha variant. Statistical models examining community test data and deaths found a significant increase in the mortality rate (64% higher; 95% CI: 32–104%) for patients testing positive for the Alpha variant compared to patients with other variants [77,78,79]. The Alpha variant does not appear to be associated with increased risk of reinfection [80] nor a significant decrease in vaccine efficacy [81,82,83]. Therefore, these initial adaptively evolved changes did not impact reinfection rates.

8.2 Other Adaptively Evolved Variants Identified: Beta, Gamma, and Delta

The Beta and Gamma variants emerged around roughly the same time as the Alpha variant but in different regions of the world. The Beta variant (also known as B.1.351) was first detected in South Africa in October 2020. However, subsequent phylogenetic analysis suggests that the variant first evolved in July or August 2020 [84]. The Gamma variant (also known as P.1) was first detected in Brazil in November/December 2020 [85].

Preliminary data suggested that the Beta variant had evolved to be more transmissible [84, 86] and resulted in higher rates of reinfection compared with earlier SARS-CoV-2 lineages [87, 88]. In South Africa, the second wave of the SARS-CoV-2 epidemic was larger than the first and was characterized by a more rapid increase in admissions to hospitals, along with increased in-hospital mortality. Some of the increased mortality in the second wave likely arose from higher numbers of cases in older individuals and increased pressure on the health system. However, some of the increased mortality may have also been due to higher virulence of the Beta variant compared to the variants that dominated the first epidemic wave in South Africa [89].

The Gamma variant spread rapidly through Brazil and other parts of South America, and its rate of spread suggested that it also had an increased transmission rate. Analysis of samples from SARS-CoV-2 patients in the Manaus region of Brazil showed that Gamma variant samples were significantly associated with higher viral loads. Epidemiological models fit to variant case counts in Manaus, Brazil, also suggest that the Gamma variant was between 1.7 and 2.4 times more transmissible than non-Gamma variants as it emerged in Brazil [85]. The models further showed that SARS-CoV-2 infections were 1.2–1.9 times more likely to result in mortality after the emergence of Gamma in Brazil. However, this effect may have also been driven by increased stresses on the healthcare system at that time [85].

The Delta variant (also known as B.1.617.2) began to emerge a little later. It was first documented in India in October of 2020 and declared a VOC in May 2021. This variant has shown increased transmissibility compared to previous variants, spreading across much of the globe and rapidly replacing the Alpha variant in the United Kingdom [90, 91] and the USA [92]. Studies are ongoing to characterize this variant, but increased transmission is likely driven by higher viral loads, a shorter time to peak viral load, and a shorter incubation period [93,94,95], along with the ability to effectively resist antibodies [96].

8.3 Emerging Variants of Concern: Omicron

The first documented case of Omicron was in early November 2021. Since then, it has quickly swept through South Africa and was declared a VOC on November 26, 2021 [97]. Omicron is unusual as compared to other variants sequenced previously. It has more unique mutations than expected and so is quite distantly related to other sequenced SARS-CoV-2 variants. At the time of this writing, the impacts of these unique mutations are still emerging. However, the number and location of mutations on the gene that codes for the spike protein are a concern and suggest that transmission rate is likely to be further increased in this variant. Impacts of these mutations on virulence and vaccine efficacy are unclear.

8.4 Evolved Impacts on Immunity and Vaccine Effectiveness

Individuals who are infected with SARS-CoV-2 retain some level of immunity after recovery and so are less likely to become infected again. If they are reinfected, they tend to have milder symptoms. Similarly, individuals who have been fully vaccinated with one of the multiple available COVID-19 vaccines are also much less likely to become infected. No vaccine is 100% effective, but at roughly 80–95% effectiveness for preventing symptomatic pre-VOC SARS-CoV-2 infections, vaccines have made a significant impact [98]. One concern is that the virus is evolving to better evade human immune defenses, both those generated by past infection and those generated through vaccination. There have been significant shifts in vaccine efficacy against some of the evolved VOCs, and these shifts differ between specific vaccine types. Table 5.2 summarizes some of these evolved shifts in vaccine efficacy for three of the available vaccines. The potential for future adaptive evolution of SARS-CoV-2 in response to vaccines remains to be seen.

Table 5.2 Estimates of vaccine efficacy at preventing symptomatic SARS-CoV-2 infection with pre-variant and evolved variants Alpha, Beta, and Delta

9 Immunocompromised Patients as Hotspots for SARS-CoV-2 Adaptive Evolution

Typically, individuals infected with SARS-CoV-2 remain infectious for no longer than 10 days, while patients with severe-to-critical illness remain infectious for up to 20 days [104, 105]. However, in immunocompromised patients, SARS-CoV-2 can sometimes successfully evade the immune system and persist for much longer [106,107,108,109], for over 6 months in one documented case [110]. Over the course of a long-term infection, the virus population can evolve extensively within the patient (“intra-host evolution”) and accumulate an unusually high number of mutations. Immunocompromised patients are often treated over the course of their SARS-CoV-2 infection with the antiviral drug remdesivir or convalescent plasma (blood plasma derived from patients who have recovered from COVID-19), which may drive adaptive evolution in SARS-CoV-2 to better evade these treatments. For example, Kemp et al. [109] report on a long-term SARS-CoV-2 infection in an immunosuppressed patient treated with convalescent plasma. In this patient, the viral population evolved to better escape neutralizing antibodies, and the infection was eventually fatal. Thus, rapid evolution under selection in immunocompromised patients may have given rise to some (if not all) of the VOCs [84, 106, 108].

The key piece of evidence suggesting that long-term intra-host evolution (due to chronic infections) has driven the adaptive evolution of many of the VOCs is that these variants all have more unique mutations than expected as compared to other co-circulating variants. For example, each of the first three identified VOCs had about twice as many mutations in their genomes compared to other co-circulating lineages at the time they first emerged. The Alpha variant had 23 mutations compared to the Wuhan-Hu-1 reference sequence [65]. The Beta variant had 21 mutations [85], and the Gamma variant had 23 mutations [84]. These mutations are spread across the whole genome but tend to be biased toward non-synonymous mutations (those mutations that can affect protein structure) and focused on the S1 gene (the gene that codes for the spike protein) [38].

10 Potential Impacts of Cross-Species Spillback on Adaptive Evolution

Since SARS-CoV-2 is originally derived from a nonhuman animal host, we know it is certainly capable of cross-species transmission. Thus, reports of “spillback infections” (transmission of SARS-CoV-2 from humans to nonhuman animals) are not unexpected. Indeed, quite a few species of nonhuman animals have been reported to be infected with SARS-CoV-2, including pets and domesticated animals (e.g., cats, dogs, hamsters, rabbits, ferrets, and cows) and those commonly found in zoos (e.g., cougars, tigers, lions, gorillas), along with wildlife (e.g., white-tailed deer and skunks) [29, 111, 112]. Along with being a potential threat to the health of these nonhuman animals, repeated human-to-animal transmission increases the risk of adaptive evolution of SARS-CoV-2 occurring in a new animal host, followed by retransmission back to humans. The main concern with a high number of spillback cases is that new animal hosts may act as reservoirs, maintaining high numbers of infections and driving novel adaptive evolution of the virus. In 2020, this risk was realized in Denmark [113] and the Netherlands, where SARS-CoV-2 was transmitted from humans to farmed mink and then back to humans again [114, 115]. During its time in the mink population, the virus accumulated some new mutations including up to five in the spike protein, one of which was later found to confer partial escape from human antibodies [116]. In response to these events, the Danish government ordered the culling of all farmed mink in the country, estimated at approximately 17 million animals, and in the Netherlands, more than 2.7 million mink were culled [117]. At this time, it is not clear how great the risk is that mink farms will become dangerous reservoirs of SARS-CoV-2, but at least for now, proactive measures have been taken to reduce this potential risk.

Other clear examples of transmission to nonhuman animal hosts and then back to humans again have not been documented, but the potential for this occurring in the future is certainly there. Wild animal species of particular concern as SARS-CoV-2 reservoirs are those that are both abundant and live in close association with humans [111], for example, North American white-tailed deer. A study screening serum samples from wild deer in four US states in 2021 detected SARS-CoV-2 antibodies in 40% of the samples [118]. The high prevalence of antibodies in deer sampled across multiple states in this study suggests a high likelihood of within-herd spread. A second study tested samples from free-living and captive deer in Iowa in 2020 for the presence of SARS-CoV-2 RNA, which is evidence of a current or recent infection. These researchers found that one-third of the deer sampled tested positive for SARS-CoV-2 [119]. These studies highlight the need for increased surveillance of deer and potentially other wildlife populations to better determine whether nonhuman animal populations are on their way to becoming long-term reservoirs for SARS-CoV-2, generating novel evolved variants that could be transmitted back to humans or even to other animal hosts.

11 Future Evolution of SARS-CoV-2

What will the future evolution of SARS-CoV-2 look like? The evolutionary outcomes depend crucially on population size and spread. The larger the virus population, the greater the chance for rapid and continued adaptive evolution of SARS-CoV-2, with further implications for human health. The more we can work to reduce infection numbers through a range of public health practices including mask-wearing and vaccinations, the better the chance that random genetic drift will dominate the evolutionary dynamics of SARS-CoV-2 instead of natural selection. Without dramatic reductions in case numbers, natural selection will likely continue to drive adaptive evolution in SARS-CoV-2. It could be that the virulence-transmission trade-off hypothesis [13] will play out, and we will see a reduction in SARS-CoV-2 virulence moving forward, eventually shifting the disease to one that resembles the common cold. We may also see SARS-CoV-2 evolve in response to some of the biomedical interventions currently in use, whether treatments, vaccinations, or both. Certainly, we have already started to see SARS-CoV-2 evolve to better evade human host immune defenses generated by the current vaccines. It is very possible that the COVID-19 vaccines will need periodic updates to counter ongoing evolutionary changes, much like the annually updated flu vaccine. The best way to prepare for these possibilities is through continued and globally coordinated surveillance, analysis, and modeling of the evolving SARS-CoV-2 genome (e.g., [120,121,122]).
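The dependence of evolutionary outcomes on population size can be illustrated with a small simulation. The Python sketch below uses a haploid Wright-Fisher model, a standard textbook idealization that is chosen here for illustration and is not drawn from the studies cited in this chapter. It assumes a constant population size, non-overlapping generations, and a single mutation with selection coefficient `s`. The function names, parameter values, and random seed are hypothetical.

```python
import random


def mutant_fixes(pop_size, s, p0, rng):
    """Run one Wright-Fisher replicate; return True if the mutant fixes.

    Each generation, selection shifts the expected mutant frequency and
    the next generation is drawn by binomial sampling (genetic drift).
    """
    count = round(p0 * pop_size)
    while 0 < count < pop_size:
        p = count / pop_size
        # Expected frequency after selection; mutant fitness is 1 + s.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Binomial draw of pop_size offspring, one Bernoulli trial each.
        count = sum(rng.random() < p_sel for _ in range(pop_size))
    return count == pop_size


def fixation_fraction(pop_size, s, p0=0.5, replicates=200, seed=1):
    """Fraction of replicate populations in which the mutant reaches 100%."""
    rng = random.Random(seed)
    fixed = sum(mutant_fixes(pop_size, s, p0, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # A deleterious mutation (10% fitness cost) starting at 50% frequency.
    for pop_size in (10, 300):
        frac = fixation_fraction(pop_size, s=-0.1)
        print(f"population size {pop_size}: fixed in {frac:.0%} of replicates")
```

In the small population, the deleterious mutation still reaches fixation in a substantial fraction of replicates purely by chance, whereas in the larger population selection removes it almost without exception. The same logic underlies the argument above: when the virus population is large, natural selection acts efficiently, and when it is small, drift has more influence on which variants persist.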