Introduction

The rapid evolution of many viruses, driven by mutation and recombination followed by selection (Arenas et al. 2018), has been associated with, among other changes, altered transmission vector specificity, increased virulence and pathogenesis, evasion of host immunity, and resistance to therapies, including vaccine escape (e.g., Lemey et al. 2006; Woodford and Ellington 2007; Arenas and Posada 2010; Voskarides 2020). At the molecular level, the acquisition of these capabilities is often reflected in changes in viral proteins. Proteins perform a variety of functions within organisms, including catalysis of chemical reactions, building of cellular structures, and molecular recognition. Because of this central role, certain proteins have been selected as molecular targets for treatments against pathogens such as viruses. A relevant case is the genus Flavivirus, which includes a variety of viruses that are highly pathogenic for humans, such as dengue virus (DENV), West Nile virus (WNV), yellow fever virus (YFV), Zika virus (ZIKV), Japanese encephalitis virus (JEV) and tick-borne encephalitis virus (TBEV), which together cause millions of human infections every year (Gubler 2002; Gould and Solomon 2008). Flaviviruses are small enveloped viruses with a positive-sense single-stranded RNA genome of 9500–12,500 nucleotides that encodes three structural proteins (capsid C, involved in nucleocapsid assembly; prM, the precursor of the membrane protein; and envelope E; all of them mainly involved in virion structure) and seven non-structural proteins (NS1, binding control protein; NS2A, involved in formation of the viral replication complex; NS2B, cofactor of the protease; NS3, protease and helicase; NS4A, involved in formation of the viral replication complex; NS4B, a membrane protein that also participates in formation of the replication complex; and NS5, the RNA polymerase; all of them mainly involved in virus replication).
Vaccines have been successfully developed for some flaviviruses, but not for others despite much effort (Ishikawa et al. 2014). In parallel, new drugs are being developed to inhibit different flavivirus proteins, especially the envelope, protease, helicase and polymerase (e.g., Luo et al. 2015; de Wispelaere et al. 2018). Still, resistance to therapies can emerge as a consequence of certain evolutionary events in the proteins of these viruses (e.g., Wang et al. 2017). Therefore, investigating the evolution of these viral proteins is crucial to design durable and effective antiviral treatments. In this regard, two articles recently published in the Journal of Molecular Evolution investigate evolutionary processes of flavivirus proteins. Le and Vinh (2020) estimated relative substitution rates among amino acids from the proteomes of WNV, DENV and ZIKV, which could enable more realistic phylogenetic inference of flavivirus proteins. Nunez-Castilla et al. (2020) identified regions of the proteomes of ZIKV, DENV and other flaviviruses that present evolutionary constraints and could serve as candidate targets for broadly neutralizing antiviral drugs across flaviviruses. The importance, goals and limits of both studies are discussed in the following section.

Evolutionary Dynamics of Flavivirus Proteins

Protein variants emerge from evolutionary mechanisms (i.e., mutation and recombination) upon which selection operates. The action of selection can be observed in the relative substitution rates among amino acids (e.g., some amino acids can be favored over others to maintain protein folding and activity) and in the level of conservation of certain regions (e.g., evolutionary constraints that maintain protein activity). These aspects of viral protein evolution are discussed below with a focus on flaviviruses, although the discussion could also be extended to other viruses.

The Need for an Empirical Substitution Model for Flavivirus

Phylogenetic inference based on probabilistic approaches (currently the most accurate approaches in phylogenetics) relies on a substitution model of molecular evolution (Arenas 2015). At the protein level, a substitution model consists of a 20 × 20 matrix of relative exchangeability rates among amino acids (hereafter, exchangeability matrix Q) and 20 equilibrium amino acid frequencies. These parameters are usually estimated from large empirical protein datasets [e.g., mitochondrial proteins of Arthropoda (Abascal et al. 2007)]. The success of empirical substitution models of protein evolution in phylogenetic inference stems from their availability (the exchangeability matrices and amino acid frequencies of many empirical models have already been developed and implemented in user-friendly computational frameworks that are ready to use without further requirements) and their technical simplicity (they assume that sites evolve independently, which allows straightforward incorporation into likelihood functions). Currently, around 50 empirical substitution models of protein evolution are available. Most of them are based on general nuclear or mitochondrial protein data, and some were developed from proteins of particular organisms, including viruses such as human immunodeficiency virus (HIV) (Nickle et al. 2007) and influenza virus (Dang et al. 2010). Although selecting and using a best-fitting substitution model of protein evolution has traditionally been considered a mandatory step to obtain accurate phylogenetic inferences [e.g., of the topology and branch lengths of the inferred tree (Lemmon and Moriarty 2004)], there is recent controversy in the field, with authors arguing against (Spielman and Kosakovsky Pond 2018; Abadi et al. 2019) and for (Kaehler et al. 2017; Gerth 2019) the need for substitution model selection.
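To make the two ingredients of an empirical model concrete, the sketch below shows one standard way of turning them into transition probabilities and a site-independent likelihood, as done for common reversible amino acid models. The exchangeabilities (the text's matrix Q) are passed as `S` and combined with the equilibrium frequencies π into an instantaneous rate matrix `R` with R_ij = S_ij π_j, normalized to one expected substitution per site; transition probabilities are then P(t) = exp(Rt), and under site independence the log-likelihood of two aligned sequences is a sum over sites. This is a minimal pure-Python sketch under stated assumptions: the function names are hypothetical, the alignment is assumed ungapped, and the usage employs the simple Poisson model (equal exchangeabilities, uniform frequencies) rather than the values of any published empirical model.

```python
import math

# One-letter amino acid order used by many model files (A R N D C Q E G H I L K M F P S T W Y V)
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def build_rate_matrix(S, pi):
    """Combine symmetric exchangeabilities S with equilibrium frequencies pi:
    R[i][j] = S[i][j] * pi[j] for i != j, diagonal set so each row sums to zero,
    then rescale so the mean rate sum_i pi[i] * -R[i][i] equals 1."""
    n = len(pi)
    R = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        R[i][i] = -sum(R[i])
    mean_rate = -sum(pi[i] * R[i][i] for i in range(n))
    return [[r / mean_rate for r in row] for row in R]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)  # max absolute row sum
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
    return result


def pairwise_log_likelihood(seq1, seq2, R, pi, t):
    """Log-likelihood of two ungapped aligned sequences at distance t.
    Sites are independent, so the total is sum_sites log(pi[x] * P[x][y](t))."""
    P = expm([[r * t for r in row] for row in R])
    idx = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    return sum(math.log(pi[idx[a]] * P[idx[a]][idx[b]]) for a, b in zip(seq1, seq2))


# Usage with the Poisson model: all exchangeabilities equal, uniform frequencies.
S = [[1.0] * 20 for _ in range(20)]
pi = [1 / 20] * 20
R = build_rate_matrix(S, pi)
print(pairwise_log_likelihood("MKTAYIAK", "MKTAYVAK", R, pi, 0.1))
```

Swapping in a published empirical model only changes `S` and `pi`; the site-independence assumption is what reduces the likelihood to a simple sum, which is the technical simplicity referred to above.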
This relates to the more general question of whether the best-fitting model is phenomenological or relates more directly to the lineage- and gene-specific processes of evolution (Liberles et al. 2013). In my opinion, much work is required to assess this issue (e.g., evaluating data with variable molecular diversity and exploring independent metrics other than the traditionally used sequence similarity). In any case, the need for new empirical substitution models of protein evolution seems real, because many kinds of data are not represented in the currently available models. For example, using software for substitution model selection, Keane et al. (2006) found that the best-fitting empirical substitution model for large proteobacteria and archaea protein datasets was a model inferred from retroviral Pol proteins, which is not expected to properly describe the evolutionary processes of those organisms.
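Model-selection software typically ranks candidate models with information criteria such as AIC (2k − 2 ln L) or BIC (k ln n − 2 ln L), where ln L is the maximized log-likelihood and k the number of free parameters. The sketch below illustrates that ranking step only, assuming all candidates were fitted on the same alignment and tree topology. Because an empirical model fixes its exchangeabilities, k here counts the branch lengths of an unrooted tree plus optional components (+F for data-estimated frequencies, +G for a gamma shape parameter). The model names and log-likelihood values are hypothetical placeholders, not results from Keane et al. (2006) or any other study.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str
    log_likelihood: float  # maximized log-likelihood on a fixed alignment and tree
    plus_f: bool = False   # +F: 19 free equilibrium frequencies estimated from the data
    plus_g: bool = False   # +G: one gamma shape parameter for among-site rate variation


def n_free_parameters(fit, n_taxa):
    # Empirical exchangeabilities (and frequencies, without +F) are fixed, so the
    # free parameters are the 2N - 3 branch lengths of an unrooted binary tree
    # plus any optional components.
    k = 2 * n_taxa - 3
    if fit.plus_f:
        k += 19
    if fit.plus_g:
        k += 1
    return k


def aic(fit, n_taxa):
    return 2 * n_free_parameters(fit, n_taxa) - 2 * fit.log_likelihood


def bic(fit, n_taxa, n_sites):
    # Alignment length is used as the sample size, a common convention.
    return n_free_parameters(fit, n_taxa) * math.log(n_sites) - 2 * fit.log_likelihood


def rank_models(fits, n_taxa, n_sites, criterion="AIC"):
    """Return fits sorted from best (lowest score) to worst."""
    if criterion == "AIC":
        return sorted(fits, key=lambda f: aic(f, n_taxa))
    if criterion == "BIC":
        return sorted(fits, key=lambda f: bic(f, n_taxa, n_sites))
    raise ValueError(f"unknown criterion: {criterion}")


# Hypothetical log-likelihoods for placeholder models, for illustration only.
fits = [
    ModelFit("ModelA", -10250.0),
    ModelFit("ModelA+G", -10100.0, plus_g=True),
    ModelFit("ModelB+F+G", -10095.0, plus_f=True, plus_g=True),
]
for fit in rank_models(fits, n_taxa=30, n_sites=500):
    print(fit.name, aic(fit, 30))
```

With these placeholder numbers, ModelA+G ranks first (AIC 20316.0) ahead of ModelB+F+G (20344.0): the latter gains 5 log-likelihood units but pays for 19 extra frequency parameters. A ranking of this kind says which candidate fits best, not whether any candidate describes the underlying process well, which is the point raised above about retroviral Pol models fitting bacterial and archaeal data.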

In a recent issue of the Journal of Molecular Evolution, Le and Vinh (2020) present a new empirical substitution model of protein evolution based on the proteomes of the flaviviruses WNV, DENV and ZIKV. Viruses, including flaviviruses, usually present high evolutionary rates and can be subject to evolutionary constraints (e.g., caused by transmission and co-evolution with the host) that differ from those acting in other organisms. Hence, an empirical substitution model based on the proteome of a specific family of viruses can mimic the evolution of proteins of that family better than other empirical substitution models (see, e.g., Dang et al. 2010 for influenza virus proteins). Le and Vinh (2020) found the same for their new model of flavivirus protein evolution, which fit test flavivirus protein datasets better (in terms of maximum likelihood) than other empirical substitution models, including models based on proteins from other viruses such as HIV and influenza virus. Consequently, this new empirical substitution model can be useful to obtain accurate phylogenetic inferences from flavivirus protein sequences. Still, further work is needed to properly model protein evolution. In general, empirical models can be improved by incorporating protein sequences from new studies, clearly describing the underlying modeling error, and being implemented in software for substitution model selection and phylogenetic inference. More ambitiously, new substitution models of protein evolution could gain realism by avoiding technical assumptions, such as substitution reversibility, that are made for mathematical simplicity. Indeed, analyses based on empirical substitution models of protein evolution ignore that different protein sites can evolve under different evolutionary patterns with different effects on protein stability and activity (Echave et al. 2016; Jiménez-Santos et al. 2018), which motivates the use of more complex substitution models of protein evolution (e.g., Wilke 2012; Bordner and Mittelmann 2013; Echave and Wilke 2017; Bastolla and Arenas 2019; Arenas and Bastolla 2020).
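Reversibility, listed above as an assumption made for mathematical simplicity, means that the substitution process satisfies detailed balance: at equilibrium, the flow of substitutions from amino acid i to j equals the flow from j to i (π_i Q_ij = π_j Q_ji). The sketch below checks this property and contrasts it with mere stationarity (πQ = 0), which a non-reversible process can also satisfy. For brevity it uses hypothetical three-state toy matrices rather than 20 amino acids; the function names are illustrative.

```python
def rate_matrix(S, pi):
    """R[i][j] = S[i][j] * pi[j] off the diagonal; rows sum to zero."""
    n = len(pi)
    R = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        R[i][i] = -sum(R[i])
    return R


def is_stationary(R, pi, tol=1e-12):
    """pi is stationary if pi R = 0 (no net change in state frequencies)."""
    n = len(pi)
    return all(abs(sum(pi[i] * R[i][j] for i in range(n))) < tol for j in range(n))


def is_reversible(R, pi, tol=1e-12):
    """Detailed balance: the flux i -> j equals the flux j -> i for every pair."""
    n = len(pi)
    return all(abs(pi[i] * R[i][j] - pi[j] * R[j][i]) < tol
               for i in range(n) for j in range(i + 1, n))


# Toy three-state examples (hypothetical values, not amino acid data).
pi = [0.5, 0.3, 0.2]
S = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]  # symmetric exchangeabilities
R_rev = rate_matrix(S, pi)

uniform = [1 / 3] * 3
R_cycle = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]  # 0 -> 1 -> 2 -> 0 only

print(is_stationary(R_rev, pi), is_reversible(R_rev, pi))              # True True
print(is_stationary(R_cycle, uniform), is_reversible(R_cycle, uniform))  # True False
```

Building the rate matrix from a symmetric exchangeability matrix guarantees detailed balance, which is why empirical models can be stored as a symmetric matrix plus a frequency vector. The cyclic matrix keeps uniform frequencies at equilibrium but has a net directional flow of substitutions, the kind of behavior a reversible model cannot represent.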

The Need for Identifying Regions with Evolutionary Constraints Along the Proteomes of Flaviviruses

Genomic regions are often subject to different strengths of selection on molecular stability and activity, so evolutionary patterns usually vary along genome sequences (Arbiza et al. 2011; Jiménez-Santos et al. 2018; Del Amparo et al. 2020). Therefore, to design an antiviral therapy one should investigate, among other aspects, which regions of the proteome could act as potential therapy targets (e.g., by evaluating their biological function) and under which patterns those regions evolve (e.g., by evaluating their capacity to acquire diversity and potential drug resistance over a given period of time). Viral proteomes include active proteins with and without unique 3D structures (flaviviruses include both types of proteins). Interestingly, a large fraction of proteomes consists of intrinsically disordered proteins that are biologically active (Xue et al. 2012). Moreover, the conformational flexibility of structurally disordered protein regions is known to allow them to acquire a new function while maintaining the original one (e.g., affecting antibody binding in envelope proteins), and, in fact, disordered proteins of flaviviruses have shown rapid evolutionary dynamics of structural disorder that favor functional change (Ortiz et al. 2013).

In a recent issue of the Journal of Molecular Evolution, Nunez-Castilla et al. (2020) identified highly conserved regions (in both sequence and structure) in the proteomes of ZIKV, DENV and other flaviviruses that could serve as candidate targets for broadly neutralizing antiviral drugs against flaviviruses. The study also investigates regions related to viral transmission (vector specificity) by analyzing their evolutionary rates. An important consideration in this study, following previous work (Chong et al. 2018), is that intrinsically disordered proteins can present structural features conserved across the conformational ensemble that could be used as potential drug targets. Highly conserved regions (in protein sequence and structure) often contain catalytic residues (Ribeiro et al. 2020) and should be considered in the design of antiviral drugs, but other, more variable regions (e.g., stabilizing residues near a binding pocket) can also play important roles in protein activity and should not be ignored when designing antiviral drugs, because the drug must properly fit within the protein binding pockets. Interestingly, Nunez-Castilla et al. (2020) discuss the possible application to flaviviruses of some inhibitors that have already been used against other viruses such as hepatitis C virus (HCV). This opens the door to future research involving molecular docking between those inhibitors and the regions identified as potential drug targets in flavivirus proteomes. Overall, this study represents progress in the field and suggests directions for future research to improve our current set of therapies against flaviviruses.
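Sequence conservation of the kind discussed above is often summarized per alignment column and then scanned along the protein to find contiguous conserved stretches. The sketch below covers only the sequence side, using a generic approach (Shannon entropy per column, averaged over a sliding window); it is not necessarily the method used by Nunez-Castilla et al. (2020), who also considered structural conservation. The window size, the entropy threshold, the treatment of all-gap columns and the toy alignment are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def site_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring
    gaps. 0 means fully conserved; log2(20) is the maximum for amino acids."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return math.log2(20)  # assumption: treat all-gap columns as unconserved
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_regions(alignment, window=5, max_mean_entropy=0.25):
    """Return merged (start, end) half-open, 0-based intervals whose
    sliding-window mean entropy is at most max_mean_entropy."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    entropy = [site_entropy(seq[i] for seq in alignment) for i in range(length)]
    regions = []
    for start in range(length - window + 1):
        if sum(entropy[start:start + window]) / window <= max_mean_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # overlaps previous window: extend it
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy alignment (not real flavivirus sequences).
alignment = [
    "MKTAYIAKQR",
    "MKTAYVSKER",
    "MKTAYLGRDR",
    "MKTAYIPHQR",
]
print(conserved_regions(alignment))  # [(0, 5)]
```

Sliding windows blur region boundaries: raising the threshold to 0.5 in this toy example returns (0, 6), which absorbs the variable column at position 5. Thresholds therefore need calibration, and, as noted above, variable residues near a binding pocket can matter even when a purely sequence-based scan does not flag them.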

Concluding Thoughts

The evolutionary patterns observed in the proteomes of flaviviruses can be summarized by an empirical substitution model that can yield more accurate phylogenetic inferences than other currently available empirical substitution models. This new empirical substitution model of protein evolution should help to improve our understanding of the origin and evolution of flaviviruses. However, one should be aware of the assumptions made by empirical substitution models of protein evolution and, in this regard, the development and use of more realistic models, such as those that directly consider stability and functional constraints, should be encouraged.

The heterogeneous strength of selection throughout the proteomes of flaviviruses (e.g., due to constraints on molecular stability and function) results in proteome regions with variable levels of molecular diversity. Accordingly, some regions present large degrees of conservation in sequence and structure, suggesting that they play an important role in the activity of the virus and that alterations in them can be fatal for its life cycle. Consequently, these regions can be used as molecular targets of antiviral therapies. However, other, less conserved regions could also be used for this purpose, although their identification is more complex (they may not be recognized by analyzing genetic diversity alone). In any case, identifying flavivirus proteome regions that can potentially be used as molecular targets of antiviral therapies clearly represents important progress in the field. A crucial next step will be the computational and experimental evaluation of those candidate regions with current and new molecular therapies.