Abstract
One of the most widely accepted ideas about protein evolutionary rates is that functionally important residues or regions evolve more slowly than other regions; a reasonable consequence is that proteins with a higher density of functionally important sites should evolve more slowly. Oddly, the role of functional importance, mainly measured by essentiality, in determining evolutionary rate has been challenged in recent studies. Several variables other than protein essentiality, such as expression level, gene compactness, and protein–protein interactions, have been suggested to affect protein evolutionary rate. In the present review, we try to refine the concept of functional importance of a gene, and consider three factors (functional importance, expression level, and gene compactness) as independent determinants of the evolutionary rate of a protein, based not only on their known correlation with evolutionary rate but also on a reasonable mechanistic model. We suggest a framework based on these mechanistic models to correctly interpret the correlations between evolutionary rates and the various variables as well as the interrelationships among the variables.
Introduction
Darwin described evolution as “descent with modification in traits by natural selection” (Darwin 1859). Extending Darwin’s statement to the molecular level, genomic sequences have been evolving in different species since they first diverged from common ancestral sequences. The hundreds of complete genomic sequences now available strongly suggest that all life on Earth has a common origin and that morphological evolution has been mediated by molecular changes. However, evolutionary rates vary significantly among proteins (Kimura and Ohta 1974; Wilson et al. 1977; Wall et al. 2005), and identification of factors affecting protein evolutionary rates has been the focus of numerous studies (Koonin 2005; Koonin and Wolf 2006; McInerney et al. 2006; Pal et al. 2006). Despite these efforts, an unambiguous and consistent mechanism-based explanation of evolutionary rate differences among genes has not yet emerged.
One of the most consistent empirical findings related to the evolutionary rates of sequences is that genic sequences generally evolve slower than nongenic sequences (Gilbert 1978; Li and Graur 1991; Li 1997). Nearly neutral theory explains that the faster rates of nongenic sequence evolution can be attributed to the relaxed functional constraints on these regions (Ohta 1992; Li 1997). This reasoning can be extended to explain the regional evolutionary rate differences within a gene. For example, coding regions evolve slower than the 5′UTR, 3′UTR, and introns; nonsynonymous sites evolve slower than synonymous sites; fourfold degenerate sites evolve more rapidly than less degenerate sites (Li 1997); and certain domains or functional motifs evolve slower than other coding regions of a gene. All of these observations are consistent with the theory of nearly neutral evolution; i.e., the strength of functional constraints determines the rates of evolution within a gene (Li 1997). Thus, the regional sequence conservation observed within a group of orthologous genes provides a useful marker for identifying functionally important domains or motifs (Zhang and He 2005). In addition, the severity of detrimental phenotypic effects caused by genetic perturbations, including site-directed mutagenesis or the deletion of genic regions is a reasonable predictor of the functional importance of that region (Hirsh and Fraser 2001; Castillo-Davis and Hartl 2003; Krylov et al. 2003; Fraser et al. 2003; Rocha and Danchin 2004; Liao et al. 2006; Dotsch et al. 2010). Therefore, it seems clear that the functional constraint is a major determinant of evolutionary rate differences between genic and nongenic sequences, between coding and noncoding regions at the genomic level, and between functional domains and other regions within a single gene.
Functional importance is not, however, a directly measurable quantity. Perhaps for this reason, several studies aiming to identify major determinants of average evolutionary rate differences between proteins have arrived at inconsistent conclusions (Jovelin and Phillips 2009; Wang and Zhang 2009; Zeng and Gu 2010; Razeto-Barry et al. 2011). A specific point of contention has been triggered by reports suggesting that essential genes or proteins, as defined by the lethality caused by deleting these genes in model organisms such as yeast and mouse, do not necessarily evolve more slowly than nonessential genes (e.g., Hurst and Smith 1999). This is contrary to intuition considering the established relationship between functional importance and evolutionary rate described above. Following Hurst and Smith (1999), several other studies have investigated the relationship between essentiality and evolutionary rate of a protein (Hirsh and Fraser 2001; Jordan et al. 2002; Krylov et al. 2003; Rocha and Danchin 2004; Zhang and He 2005; Dotsch et al. 2010). Essentiality is one of many quantifiable variables explored in the search for determinants of genic evolutionary rate (see below for a discussion of the others). However, the interrelations among these variables and their independent contributions to evolutionary rates are not entirely clear. In this review, we try to synthesize many scattered studies that link gene evolutionary rates with various variables, and we suggest a cohesive way of viewing the relationships between evolutionary rates and their correlative variables. In the following section, we focus on three determinants that are justified by explicit models, explaining how they may directly affect coding sequence substitution.
Variables and Their Correlations with Genic Evolutionary Rates
Table 1 (see Table 2 for glossary of terms) lists the most prominent variables tested so far with regard to overall evolutionary rates of proteins as well as those studied with regard to regional evolutionary rates within a protein. Evolutionary rates measured by dN (nonsynonymous substitution rate), dS (synonymous substitution rate), or dN/dS (considered as normalized dN) have also been correlated with several other parameters (Koonin 2005; Pal et al. 2006; Koonin and Wolf 2006; McInerney et al. 2006). Essentiality has been the most compelling variable investigated as a potential determinant of protein evolutionary rate because it is an intuitive measure of overall structural–functional constraints on a protein. However, essentiality has failed to emerge as the only or primary variable. In fact, some studies even reported a weak or no negative correlation between essentiality and the evolutionary rate (Hurst and Smith 1999; Yang et al. 2003) (Table 3), although negative correlations have been observed in most of the studies (Hirsh and Fraser 2001; Jordan et al. 2002; Krylov et al. 2003; Rocha and Danchin 2004; Zhang and He 2005; Chen and Xu 2005; Wall et al. 2005; Liao et al. 2006; Plotkin and Fraser 2007; Larracuente et al. 2008; Wang and Zhang 2009; Dotsch et al. 2010).
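As a concrete illustration of the rate measures named above, the following minimal sketch computes dN, dS, and dN/dS from counts of differences and sites, applying the standard Jukes-Cantor correction to each proportion. It assumes the counts have already been obtained by some codon-based counting procedure; the function names and the example counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction: convert a proportion of differing sites
    into an estimated number of substitutions per site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from difference and site counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("dS is zero; dN/dS is undefined")
    return dn, ds, dn / ds


# Hypothetical counts for one pair of orthologous coding sequences.
dn, ds, omega = dn_ds(nonsyn_diffs=12, nonsyn_sites=700,
                      syn_diffs=45, syn_sites=230)
print(f"dN={dn:.4f}  dS={ds:.4f}  dN/dS={omega:.3f}")
```

With these illustrative counts dN/dS falls below 1, the pattern usually read as purifying selection on amino acid changes.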
Besides essentiality, the variable that has been most consistently observed to correlate with coding sequence evolutionary rate is expression level (EL) of the gene (Krylov et al. 2003; Subramanian and Kumar 2004; Rocha and Danchin 2004; Drummond et al. 2005; Wall et al. 2005; Lemos et al. 2005; Lin et al. 2007) (Table 4). As shown in Table 4, especially in single-celled organisms such as E. coli, B. subtilis, and S. cerevisiae, highly expressed genes are consistently found to evolve slowly, although Hudson and Conant (2011) argued that EL is not a primary predictor of evolutionary rate in mammals. In addition, Liao et al. (2006) showed that mammalian genes follow a different evolutionary rule, i.e., compactness, expression breadth (EB), and essentiality are more important than EL in determining the evolutionary rate. In mammals, EB was found to be consistently negatively associated with the probability of amino acid replacement, even more strongly than EL (Duret and Mouchiroud 2000; Zhang and Li 2004; Subramanian and Kumar 2004; Yang et al. 2005; Liao et al. 2006; Liao and Zhang 2006; Zhu et al. 2008; Park and Choi 2010) (Supplementary Table 1). The number of protein–protein interactions (PPI) is another variable that shows a negative correlation with evolutionary rate, mostly demonstrated in yeast (Supplementary Table 1). It is assumed that proteins with a higher degree of interaction may evolve slowly because more sites should be evolutionarily constrained in a protein that is functionally interacting with several other proteins, relative to those interacting with fewer proteins. In fact, several studies have shown that highly connected proteins in a PPI network evolve more slowly (Fraser et al. 2002; Teichmann 2002; Jordan et al. 2003; Krylov et al. 2003; Hahn et al. 2004; Lemos et al. 2005), although contradictory observations have also been reported (Bloom and Adami 2003; Zhou et al. 2008; Jovelin and Phillips 2009; Podder et al. 2009). 
Propensity of gene loss (PGL) is also correlated with evolutionary rates (Krylov et al. 2003), i.e., genes with lower PGL are more likely to evolve slowly (Supplementary Table 1).
Variables related to gene length, such as intron, UTR, and CDS lengths (sometimes jointly referred to as gene compactness), are also known to correlate with evolutionary rate. Genes with longer introns, i.e., less compact genes, are more likely to evolve slowly (Marais et al. 2005; Liao et al. 2006; Vinogradov 2010), although a contradictory observation has also been reported (Lemos et al. 2005) (Table 5). The variables pertaining to gene length have been investigated with regard to their correlation with other variables such as recombination rates (Vinogradov 2001; Comeron and Kreitman 2002; Comeron and Guthrie 2005), EL (Vinogradov 2001; Castillo-Davis et al. 2002; Urrutia and Hurst 2003; Marais et al. 2005; Carmel and Koonin 2009; Woody et al. 2011), EB (Vinogradov 2004; Vinogradov 2006; Zhu et al. 2008; Shabalina et al. 2010; Rao et al. 2010), and codon usage (Vinogradov 2001; Comeron and Kreitman 2002; Comeron and Guthrie 2005) (Supplementary Table 2). The relationships of the length variables with other variables are subtle and apparently contradictory at times, and ultimately evade a unified and biologically interpretable theoretical model.
Most of these studies have relied on correlation tests, and correlation does not necessarily imply causation. Some of the relationships between evolutionary rates and variables may be secondary or indirect, because of the variable’s inherent correlation with another, potentially unknown, causative variable. For instance, there is a positive correlation between EL and EB (Subramanian and Kumar 2004; Pal and Guda 2006; Park and Choi 2010), between EL and codon usage bias (Comeron et al. 1999; Duret and Mouchiroud 1999; Iida and Akashi 2000; Urrutia and Hurst 2003; Ingvarsson 2008; Zhou et al. 2009), between PGL and PPI (Krylov et al. 2003), and between pleiotropy and PPI (He and Zhang 2006). All of these variables are correlated with evolutionary rate. It is unclear which of these inter-variable correlations are due to direct causal links and which are indirect correlations caused by their independent correlations with evolutionary rate. One possible way to distinguish causal links from circumstantial correlation might be to look for experimental or theoretical evidence that supports a causal relationship. In the following, we describe three determinants and their corresponding models/hypotheses that specify how each directly and independently influences the evolutionary rate of genes.
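The distinction between a direct and an indirect correlation can be made concrete with a small synthetic example. The sketch below uses a first-order partial correlation, a standard statistical tool rather than a method attributed here to any cited study. All data are simulated: a hypothetical variable `z` drives both a second variable `x` and the "rate" `y`, so `x` and `y` correlate even though neither affects the other. The variable names and effect sizes are assumptions chosen purely for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]    # hypothetical causal variable
x = [zi + rng.gauss(0, 1) for zi in z]     # covaries with z, no effect on y
y = [-zi + rng.gauss(0, 1) for zi in z]    # "rate", driven by z only

raw = pearson(x, y)
controlled = partial_corr(x, y, z)
print(f"raw r(x, y) = {raw:.3f}; partial r(x, y | z) = {controlled:.3f}")
```

In this simulation the raw correlation between `x` and `y` is clearly negative, whereas the partial correlation is close to zero. This is the signature expected of a secondary relationship, though a real analysis would also have to contend with measurement noise in the controlling variable.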
“Function-Centered” Variable and the “Function (Fitness) Density” Model
Several variables including solvent accessibility (Choi et al. 2006; Lin et al. 2007; Conant and Stadler 2009; Franzosa and Xia 2009; Ramsey et al. 2011; Toth-Petroczy and Tawfik 2011), types of interaction (e.g., hydrogen bond, disulfide, ionic, hydrophobic, etc.) and types of secondary structure (helix, strand, loop, turn, etc.) in a protein (Choi et al. 2006; Peralta et al. 2011), and types of alternative splicing (e.g., ASE (alternative splicing exon), and CSE (constitutive splicing exon)) (Ermakova et al. 2006; Plass and Eyras 2006; Chen et al. 2012; Wu and Chen 2012) have been considered to be important in determining the regional evolutionary rates within a protein (Table 1). For instance, the sites encoding internal amino acids of a protein (i.e., a lower solvent accessibility), the sites involved in disulfide or hydrophobic interactions, the sites with higher electrostatic charge, or the sites participating in helix or strand structure of a protein tertiary structure were reported to be under a stronger evolutionary constraint and to evolve slowly, although there are also some contradictory findings even on this regional evolutionary rate issue (Zhou et al. 2008). At the most basic level, it is the mutations at nucleotide sites that are favored or disfavored by selection depending on if and how they affect organismal fitness or reproductive success. Therefore, according to the “function (fitness) density” model, fraction of sites in a protein or density of functionally important sites should ultimately determine a protein’s evolutionary rate (Zuckerkandl 1976; Rocha 2006; Lin et al. 2007; Wang and Zhang 2009). Thus it seems intuitive that the proteins with a greater fraction of functionally constrained sites evolve slower, implying that the region-specific variables are specific instances of a more general variable for quantifying overall functional constraints on a protein.
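The intuition behind the "function (fitness) density" model can be sketched numerically. The toy calculation below assumes only two classes of sites, constrained and unconstrained, each evolving at a fixed relative rate. This two-class simplification, the function name, and the rate values are illustrative assumptions and are not drawn from the cited works.

```python
def expected_relative_rate(constrained_fraction,
                           constrained_rate=0.05,
                           neutral_rate=1.0):
    """Average per-site rate of a protein in which a given fraction of
    sites is functionally constrained (hypothetical two-class mixture)."""
    if not 0.0 <= constrained_fraction <= 1.0:
        raise ValueError("constrained_fraction must be in [0, 1]")
    return (constrained_fraction * constrained_rate
            + (1.0 - constrained_fraction) * neutral_rate)


fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
rates = [expected_relative_rate(f) for f in fractions]
for f, r in zip(fractions, rates):
    print(f"fraction constrained = {f:.2f} -> relative rate = {r:.4f}")
```

Under these assumptions the average rate falls linearly as the density of constrained sites rises, which is the qualitative prediction of the model described above.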
Essentiality (or dispensability) was the first variable to be tested for correlation with evolutionary rate of a protein, because it is arguably an appropriate measure of a protein’s overall functional importance. Classically, proteins that have a large fraction of functionally important residues are expected to be under stronger evolutionary constraints and are also expected to be essential and to evolve relatively slowly (Hirsh and Fraser 2001; Jordan et al. 2002; Krylov et al. 2003; Rocha and Danchin 2004; Koonin 2005; Pal et al. 2006). Essentiality has generally been estimated by phenotypic lethality or sometimes by a growth rate differential after a gene knock-out or knock-down in a model organism such as yeast or mouse (Gerdes et al. 2003; Fang et al. 2005; Kim and Copley 2007; Gong et al. 2008; Scholle and Gerdes 2008; Bergmiller et al. 2012). With some exceptions, as mentioned earlier, significant negative correlations have been consistently observed between gene essentiality and evolutionary rate (Table 3). The negative relationship between essentiality and evolutionary rate was observed in multicellular organisms, but was less obvious in unicellular organisms such as yeast (Hurst and Smith 1999; Yang et al. 2003; Chen and Xu 2005), possibly due to the inadequacy of essentiality as a measure of functional importance. Essentiality may not be a robust proxy for functional importance for several reasons: (1) essentiality has been measured under laboratory conditions which are nutritionally abundant compared to natural conditions, and therefore may not faithfully reveal essential genes in natural conditions (Pal et al. 2006), (2) as such, essentiality measures the effect of complete gene deletion, a relatively rare occurrence in nature, and therefore cannot capture the fraction of functionally constrained sites in the protein (Pal et al. 2006), (3) essential genes in one species may not be essential in other organisms used to estimate the evolutionary rate (Bergmiller et al. 2012), (4) essential genes for growth may not be essential from an evolutionary point of view (Fang et al. 2005), and (5) while essentiality implies functional importance, the converse need not be true; a gene affecting reproductive fitness is clearly functionally important and will be evolutionarily constrained but may not be essential, as measured by the experiments.
Functional importance of a protein can be generally defined as the effect size on fitness caused by any perturbation in the protein’s activity. Under this definition, one can unify several variables. Specifically, a protein expressed in the early embryonic stage rather than in the late adult stage, a protein expressed broadly rather than restricted to a tissue, or a protein that is highly connected to other proteins, is more likely to be functionally important and therefore more likely to be evolutionarily constrained. Thus, functional importance of a gene has multiple facets represented by presumably independent variables. These variables nevertheless exert a joint effect on functional importance, and their individual effects on overall functional importance and evolutionary rates are difficult to disambiguate. It is notable that several studies have investigated whether proteins expressed in a broad range of tissues (EB), more connected in a PPI network, affecting several processes or phenotypes (pleiotropy), and with a lower propensity of gene loss (PGL) during evolution, are more constrained and more likely to be essential (Tudor et al. 1999; Jeong et al. 2001; Wuchty 2002; Krylov et al. 2003; Yu et al. 2004; Hahn et al. 2004; Hahn and Kern 2005; Yang et al. 2005; Goh et al. 2007; Zotenko et al. 2008; Park and Kim 2009; Hao et al. 2010) (Supplementary Table 1). In fact, all four variables (EB, PPI, pleiotropy, and PGL) have been shown to be correlated with essentiality (Supplementary Table 2). Although the correlation is sometimes weak (Coulomb et al. 2005), the existence of this correlation indicates that these four variables may affect evolutionary rate only indirectly via their effect on essentiality. Furthermore, several studies have shown that there are inherent correlations among these variables; EB has a positive correlation with PPI (Alvarez-Ponce 2012; Rodgers-Melnick et al. 2012) and pleiotropy (Tuller et al. 2008), PPI has a positive correlation with pleiotropy (He and Zhang 2006), and PGL has a negative correlation with PPI (Krylov et al. 2003). Thus, these variables (essentiality, EB, PPI, pleiotropy, and PGL) may not be independent and can operationally be grouped into a single category called the “function (fitness)-centered” variable, representing the overall functional importance of a protein. Considering the “function (fitness)-centered” variable and the corresponding “function (fitness) density” model supports the idea that functionally important proteins, and more specifically, the proteins with a greater fraction of functionally important residues, are more likely to evolve slowly. Thus, we argue that the “function (fitness)-centered” variable unifies several variables and, as such, represents a more complete variable in determining protein evolutionary rate, one that is likely to be more independent of other factors that could potentially influence evolutionary rate.
EL and Two Different Hypotheses: The “Translational Selection” Hypothesis and “Mistranslation-Induced Misfolding (MIM)” Hypothesis
The possibility that EL can constrain nucleotide sequence evolution was first recognized from the observation that highly expressed genes are biased in their synonymous codon usage (i.e., codon usage bias) in several prokaryotes and unicellular eukaryotes including E. coli, S. typhimurium, and S. cerevisiae (Pal et al. 2001; Krylov et al. 2003; Marais et al. 2004; Rocha and Danchin 2004; Wall et al. 2005), although this relationship was less obvious in multicellular organisms (Liao et al. 2006; Hudson and Conant 2011) (Supplementary Table 3). Several groups suggested the “translational selection” hypothesis as an explanation of how the requirement for a high gene expression level might directly affect synonymous changes in the coding region of a gene (Sharp et al. 1993; Akashi 1994; Akashi and Eyre-Walker 1998; Moriyama and Powell 1998; Akashi 2001; Drummond et al. 2006; Comeron 2006; Kotlar and Lavner 2006; Waldman et al. 2011; Gingold and Pilpel 2011; Plotkin and Kudla 2011). The “translational selection” hypothesis posits that highly expressed proteins require optimized codons for accurate and efficient translation, which produces a negative correlation between codon usage bias and dS, and between EL and dS (Precup and Parker 1987; Akashi 1994; Akashi and Eyre-Walker 1998; Comeron et al. 1999; Iida and Akashi 2000; Urrutia and Hurst 2001; Stoletzki and Eyre-Walker 2007; Ingvarsson 2008; Hiraoka et al. 2009). Interestingly, some studies have suggested that selection favors preferred codons at sites where misincorporations, or translational errors, are expected to be critical, implying that translational selection for increased accuracy might, in part, influence variation in codon usage bias and dN (Precup and Parker 1987; Akashi 1994; Stoletzki and Eyre-Walker 2007; Kramer and Farabaugh 2007; Drummond and Wilke 2009).
Several papers have shown that EL can constrain nonsynonymous codon evolution. The important role of EL in determining nonsynonymous evolutionary rate has been well described in bacteria, yeast, and Drosophila (Pal et al. 2001; Rocha and Danchin 2004; Zhang and He 2005; Drummond et al. 2005, 2006; Larracuente et al. 2008) (Table 4). A strong negative correlation was consistently observed between the dN and the EL of a gene, although this correlation is stronger in unicellular organisms than in multicellular organisms (Liao et al. 2006; Hudson and Conant 2011). However, the “translational selection” hypothesis is not sufficient to explain why EL shows an even stronger correlation with dN than it does with dS (Wilke and Drummond 2006; Gingold and Pilpel 2011). Drummond et al. (2006) have therefore suggested a novel hypothesis, “mistranslation-induced misfolding (MIM),” which posits that highly expressed genes evolve more slowly because they experience a stronger negative selection against the toxic effects of misfolded proteins induced by mistranslation, leading to a slower rate of coding sequence substitutions (Drummond et al. 2006; Drummond and Wilke 2008). Consistent with this hypothesis, when analyzing the relationship between codon bias and protein structural integrity, some groups found that translationally optimal codons were preferentially used at the sites at which mutations led to protein misfolding and aggregation (Zhou et al. 2009; Lee et al. 2010). In addition, Yang et al. (2012) showed through simulation that highly abundant proteins in yeast are more likely to use misfolding-minimizing amino acids and that these sites are evolutionarily more constrained than other sites of the same proteins. Thus, EL can influence the rates of both synonymous and nonsynonymous changes and represents a major independent determinant of evolutionary rate.
Gene Compactness and the “Hill-Robertson Interference” Hypothesis
Gene compactness, variously measured by intron length, UTR length, or CDS length, has additionally been considered as an independent variable determining coding sequence evolution (Comeron et al. 1999; Duret and Mouchiroud 1999; Subramanian and Kumar 2004; Liao et al. 2006; Kim and Yi 2007; Larracuente et al. 2008) (Table 5). For instance, correlative studies in several organisms including E. coli, D. melanogaster, and C. elegans revealed CDS length to be inversely correlated with codon bias (Kliman and Hey 1993; Comeron et al. 1999; Marais et al. 2001; Comeron and Kreitman 2002; Campos et al. 2012) (Supplementary Table 4). Interestingly, the relationship between intron length and codon bias is mixed: a positive correlation has been observed in unicellular organisms, whereas a negative correlation has been observed in multicellular organisms (Vinogradov 2001; Comeron and Kreitman 2002; Comeron and Guthrie 2005) (Supplementary Table 4). Recombination was considered as a mechanism underlying the relationship between gene or intron length and codon bias (Comeron et al. 1999; Marais et al. 2001; Duret 2001; Comeron and Kreitman 2002; Fedorova and Fedorov 2003; Pal et al. 2006). Several papers have reported that a lower probability of recombination in short genes can reduce the effectiveness of natural selection, thus reducing the codon bias in those genes (Kliman and Hey 1993; Hudson 1994; Betancourt et al. 2009; Charlesworth et al. 2009; Campos et al. 2012). More generally, the “Hill-Robertson interference” hypothesis posits that efficient natural selection at two genetic loci is curbed by low recombination and higher linkage between the two loci (Hill and Robertson 1966; Felsenstein 1974; Gordo and Charlesworth 2001). In other words, when two loci are genetically linked, both the fixation of beneficial mutations and the elimination of deleterious mutations at a site can be prevented by interference from selection at the linked site (Marais et al. 2005; Larracuente et al. 2007). Recombination can enhance the effectiveness of selection by breaking the linkage during meiosis (Carvalho and Clark 1999; Comeron et al. 1999; Duret 2001; Comeron and Kreitman 2002). Thus, higher rates of recombination can affect evolution in both directions, either a decreased or an increased evolutionary rate, depending on the relative occurrence of advantageous and deleterious mutations (Pal et al. 2006). Hill-Robertson interference implies that a larger genomic distance between sites under selection (say, in less compact genes with longer introns separating exons) can facilitate natural selection such that advantageous mutations can be fixed and deleterious mutations can be eliminated more efficiently.
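To give a rough sense of how intron length could translate into recombination between flanking exons, the sketch below applies Haldane's map function, which converts a genetic map distance into an expected recombination fraction. It assumes no crossover interference and a uniform per-base recombination rate; the default rate of 1e-8 per base pair per generation (about 1 cM/Mb) and the intron lengths are hypothetical values chosen only for illustration.

```python
import math


def recombination_fraction(distance_bp, rate_per_bp=1e-8):
    """Expected recombination fraction between two sites separated by
    distance_bp, using Haldane's map function r = (1 - exp(-2d)) / 2,
    where d is the map distance in Morgans."""
    if distance_bp < 0:
        raise ValueError("distance_bp must be non-negative")
    map_distance = distance_bp * rate_per_bp  # Morgans (assumed uniform rate)
    # expm1 keeps precision when the map distance is very small.
    return -0.5 * math.expm1(-2.0 * map_distance)


intron_lengths_bp = [100, 1_000, 10_000, 100_000]
rec_fractions = [recombination_fraction(n) for n in intron_lengths_bp]
for length, r in zip(intron_lengths_bp, rec_fractions):
    print(f"intron length {length:>7} bp -> recombination fraction {r:.3e}")
```

At these scales the recombination fraction grows almost linearly with intron length, so a tenfold longer intron gives roughly tenfold more opportunity to decouple selection acting on the flanking exons. How strongly this relieves interference in a real genome depends on factors the sketch ignores, such as local recombination hotspots and the strength of selection.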
There are two potential evolutionary mechanisms for relieving Hill-Robertson interference: lowering compactness via elongation of the introns, UTRs, or even CDS of a gene, or increasing recombination rates. Interestingly, in Drosophila, genes located in regions of lower recombination rates tend to have longer introns (Carvalho and Clark 1999; Comeron and Kreitman 2000), which might be interpreted as a mechanism for enhancing the probability of recombination, presumably to facilitate natural selection (Comeron and Kreitman 2000; Comeron et al. 2008). However, Prachumwat et al. (2004) reported the opposite finding in C. elegans, where longer introns are located in regions of higher recombination rates.
Clearly, intron (or exon) number is directly related to gene compactness and therefore indirectly influences gene evolutionary rate. However, intron number can also influence the evolutionary rate via alternative mechanisms. For instance, some groups showed that mammalian exonic splicing enhancers (ESEs) located at the exon–intron boundaries are strongly constrained, thus constraining the codons near the boundaries. Consequently, intron number is significantly negatively correlated with dN and dS (Parmley et al. 2007; Larracuente et al. 2008; Carmel and Koonin 2009).
The relationship between gene compactness and the rate of protein evolution is nuanced, and only a few studies have investigated this relationship directly (Table 5). Marais et al. (2005) found that among 630 Drosophila genes analyzed, the genes with introns have significantly lower dN, and there is a negative relationship between total intron length and dN. The authors concluded that the negative relationship is likely to be driven by a need for more efficient purifying selection against deleterious mutations in genes with longer introns due to relaxation of the Hill-Robertson interference. Liao et al. (2006) also showed that genes with longer introns or UTRs tend to evolve slowly, consistent with a more effective negative selection in less compact genes.
This relationship between intron length and evolutionary rate is further complicated when we consider the relationship between EL and intron length. Several studies have shown that highly expressed genes have shorter introns (Castillo-Davis et al. 2002; Urrutia and Hurst 2003; Subramanian and Kumar 2004; Warringer and Blomberg 2006) (Table 6). Taken together with the observation that genes with higher EL evolve more slowly, this would predict that genes with longer introns evolve faster than genes with shorter introns. However, as described above, genes with longer introns evolve more slowly, consistent with relaxation of the Hill-Robertson interference in the presence of purifying selection. In fact, inconsistent with the studies showing a negative relationship between intron length and EL, some studies have found that genes with longer introns or CDS are expressed at a higher level, particularly in plants such as rice and Arabidopsis (Vinogradov 2001; Marais et al. 2005; Stenoien 2007; Carmel and Koonin 2009; Woody et al. 2011). Similarly, the relationship between EL and codon bias has not been straightforward, and highly expressed genes do not necessarily use more biased codons, especially in mammals (Gonzalez et al. 1989; Fitch and Strausbaugh 1993; Hiraoka et al. 2009; Misawa and Kikuno 2011) (Supplementary Table 3). Finally, to appreciate the nuanced relationships between EL and intron length, and between intron length and evolutionary rate, one must also consider two different models that explain how or why introns are maintained in genomes: “selection for economy” and “genome design.”
The Relationship Between EL and Compactness: the “Genome Design” Versus the “Selection for Economy” Model
Several studies have attempted to show how one evolutionary-rate-correlative variable is correlated with other such variables (Rocha and Danchin 2004; Liao et al. 2006; Larracuente et al. 2008). For instance, EL has a positive correlation with EB but a negative correlation with intron length (Vinogradov 2001; Marais et al. 2005; Liao et al. 2006; Stenoien 2007; Park and Choi 2010) (Table 6). However, it is unclear whether these various correlative relationships are due to direct causation. For example, with respect to the observation that genes with a higher EL tend to have shorter introns, it is possible that these two variables are correlated because they both independently influence coding sequence evolution and are correlated with evolutionary rate. Similarly, with regard to the correlation between compactness and the various fitness-centered variables, it has not been shown whether the intron length of a gene is directly related to the functional importance of that gene independent of its relationship with evolutionary rate. Among the three broad categories of variables (functional importance, EL or EB, and compactness), no mechanistic models have been proposed that link functional importance to either EL or gene compactness. However, two models have been proposed to explain the relationship between expression and compactness: the “selection for economy” model and the “genome design” model (Castillo-Davis et al. 2002; Eisenberg and Levanon 2003; Urrutia and Hurst 2003; Wagner 2005; Vinogradov 2006).
The “selection for economy” model posits that due to high energetic costs associated with transcription and translation, natural selection would favor compactness (i.e., shorter size) of highly expressed genes (Castillo-Davis et al. 2002; Urrutia and Hurst 2003). As described above, while several studies have shown a negative correlation between EL and intron length, i.e., positive correlation between EL and compactness (Table 6), several others have found the opposite result, i.e., that highly expressed genes have longer introns (Table 6). A negative correlation between EB and intron length is also expected following the energetics argument. However, similar to EL, while some studies found a positive correlation between EB and intron length, others reported a negative correlation (Eisenberg and Levanon 2003; Rao et al. 2010; Vinogradov 2004; Zhu et al. 2008) (Table 6).
The studies that demonstrated negative correlations between EB and intron length generally adopted the “selection for economy” model (Moriyama and Powell 1998; Castillo-Davis et al. 2002; Eisenberg and Levanon 2003; Rao et al. 2010; Zhu et al. 2008). In contrast, to explain a positive correlation between EB and intron length, other studies invoke the “genome design” model, which posits that genes expressed in multiple contexts may require a more complex regulatory mechanism and thus may have longer introns to accommodate regulatory elements (Vinogradov 2004). Consistent with this possibility, in Drosophila, longer introns, especially first introns, were found to be evolutionarily more conserved (Haddrill et al. 2005; Marais et al. 2005; Presgraves 2006). Intronic sequence conservation has long been considered as an indicator of transcriptional regulatory elements (Fedorova and Fedorov 2003; Marais et al. 2005; Parra et al. 2011). In addition, Vinogradov (2004) showed that genes with intermediate breadth of expression, which are likely to require a more complex regulatory mechanism, are more likely to have longer introns than genes expressed within a specific context or expressed ubiquitously.
It seems that the “selection for economy” model could explain why some broadly or highly expressed genes have shorter introns. However, the “genome design” model does not explain why some highly expressed genes have longer introns even in unicellular eukaryotes, because it is unclear how the complexity of transcriptional regulation would affect the level of gene expression. Nor is the “selection for economy” model sufficient to explain the observation that some highly or broadly expressed genes have long introns. In those cases, longer introns might be favored because they enable recombination, thereby enhancing the efficiency of natural selection.
Concluding Remarks
In this review, we have suggested a framework based on known mechanistic models to interpret the correlations between evolutionary rate and the various variables, as well as the correlations among the variables (Fig. 1). We first attempted to clarify the concept of functional importance of a gene as it relates to evolutionary rate. While gene essentiality is a highly intuitive proxy for functional importance, we have argued that five independent variables studied so far, including essentiality, ought to be considered jointly as a “function (fitness)-centered” variable. We then considered three variables, namely functional importance, EL, and gene compactness, as independent effectors of a gene’s evolutionary rate.
We have also attempted to separate the issue of identifying determinants of evolutionary rate from the correlative (secondary) relationships among variables. For instance, some studies showed that highly expressed genes tend to have smaller introns (Castillo-Davis et al. 2002; Subramanian and Kumar 2004; Urrutia and Hurst 2003; Warringer and Blomberg 2006) and asked whether genes with smaller introns might evolve more slowly because they are highly expressed (Marais et al. 2005). In fact, the opposite result was found to be true, i.e., genes with longer introns evolve more slowly. This conceptual inconsistency arises mainly from the complex relationships among different variables: (1) the relationship between intron size (i.e., gene compactness) and EL proposed by the “selection for economy” and “genome design” hypotheses, (2) the relationship between EL and evolutionary rate posited by the MIM hypothesis, and (3) the relationship between intron size and evolutionary rate posited by the “Hill-Robertson interference” hypothesis. Because these three relationships, each controlled by a different evolutionary force, act independently, it may not be possible to predict the overall rate at which a gene with short introns will evolve. Even for relationship (1), the two models offer opposing explanations for how intron length is influenced by EL. Furthermore, it remains debatable whether the correlations of evolutionary rate with the lengths of introns, CDS, or UTRs (i.e., compactness) are mainly controlled by MIM or by the degree of relief from Hill-Robertson interference.
If the former mechanism is stronger, shorter genes should evolve more slowly, because shorter genes tend to be highly expressed, and these short but highly expressed genes are more vulnerable to toxicity caused by translational errors and thus are more likely to be under stronger selective constraint. If the latter mechanism is stronger, however, genes with shorter introns should evolve more rapidly, assuming that purifying selection is more prevalent than positive selection. Some groups demonstrated that intronless genes evolve faster in humans and other mammals, which indicates that EL rather than intron length is the primary determinant of evolutionary rates in those systems (Agarwal 2005; Shabalina et al. 2010).
Another confounding issue is that of determining whether multiple variables exert independent influences on evolutionary rate. For instance, Chen and Dokholyan (2008) showed that essential proteins tend to have lower aggregation propensity than nonessential proteins, suggesting that EL might share its influence on evolutionary rate with functional importance. On the other hand, Wolf et al. (2008) demonstrated that structural–functional constraints and EL make comparable contributions to the rate of protein sequence evolution, suggesting independent roles of EL and functional importance in determining the evolutionary rate. Kim and Yi (2007) showed, through partial correlation and principal component analysis, that protein length and essentiality play independent roles in protein evolution. Larracuente et al. (2008) also showed, through a partial correlation study in Drosophila, that gene essentiality and recombination, along with tissue specificity of gene expression and intron number, contribute to evolutionary rates. Taken together, our synthesis of the current literature suggests that three main determinants (functional importance, EL, and compactness via recombination), each supported by a mechanistic model, act simultaneously and independently to determine the overall evolutionary rate of a gene.
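To make concrete what a partial correlation analysis of the kind cited above measures, the following is a minimal sketch in Python. It is not the code or data of any cited study: the variable names (`el`, `intron_len`, `rate`), the effect sizes, and the data are synthetic and chosen purely for illustration. The sketch assumes a toy scenario in which expression level drives both intron length and evolutionary rate, while intron length has no direct effect on rate. It then compares the raw correlation with the first-order partial correlation, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic, illustrative data (an assumption, not real measurements):
# expression level (el) influences both intron length and evolutionary
# rate; intron length has no direct effect on rate in this toy model.
rng = random.Random(0)
el = [rng.gauss(0, 1) for _ in range(2000)]
intron_len = [-0.8 * e + rng.gauss(0, 1) for e in el]
rate = [-0.8 * e + rng.gauss(0, 1) for e in el]

raw = pearson(intron_len, rate)
controlled = partial_corr(intron_len, rate, el)
print(f"raw r = {raw:.3f}, partial r given EL = {controlled:.3f}")
```

In this toy scenario the raw correlation between intron length and rate is clearly positive, yet it largely disappears once expression level is controlled for. A variable that keeps a substantial partial correlation after the others are controlled for would instead be a candidate independent determinant. Real analyses would also need to account for measurement noise in the controlled variable, which can leave spurious residual correlations (Plotkin and Fraser 2007).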
References
Agarwal SM (2005) Evolutionary rate variation in eukaryotic lineage specific human intronless proteins. Biochem Biophys Res Commun 337:1192–1197
Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935
Akashi H (2001) Gene expression and molecular evolution. Curr Opin Genet Dev 11:660–666
Akashi H, Eyre-Walker A (1998) Translational selection and molecular evolution. Curr Opin Genet Dev 8:688–693
Alvarez-Ponce D (2012) The relationship between the hierarchical position of proteins in the human signal transduction network and their rate of evolution. BMC Evol Biol 12:192
Bergmiller T, Ackermann M, Silander OK (2012) Patterns of evolutionary conservation of essential genes correlate with their compensability. PLoS Genet 8:e1002803
Betancourt AJ, Welch JJ, Charlesworth B (2009) Reduced effectiveness of selection caused by a lack of recombination. Curr Biol 19:655–660
Bloom JD, Adami C (2003) Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein–protein interactions data sets. BMC Evol Biol 3:21
Campos JL, Charlesworth B, Haddrill PR (2012) Molecular evolution in nonrecombining regions of the Drosophila melanogaster genome. Genome Biol Evol 4:278–288
Carmel L, Koonin EV (2009) A universal nonmonotonic relationship between gene compactness and expression levels in multicellular eukaryotes. Genome Biol Evol 1:382–390
Carvalho AB, Clark AG (1999) Intron size and natural selection. Nature 401:344
Castillo-Davis CI, Hartl DL (2003) Conservation, relocation and duplication in genome evolution. Trends Genet 19:593–597
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA (2002) Selection for short introns in highly expressed genes. Nat Genet 31:415–418
Charlesworth B, Betancourt AJ, Kaiser VB, Gordo I (2009) Genetic recombination and molecular evolution. Cold Spring Harb Symp Quant Biol 74:177–186
Chen Y, Dokholyan NV (2008) Natural selection against protein aggregation on self-interacting and essential proteins in yeast, fly, and worm. Mol Biol Evol 25:1530–1533
Chen Y, Xu D (2005) Understanding protein dispensability through machine-learning analysis of high-throughput data. Bioinformatics 21:575–581
Chen FC, Liao BY, Pan CL, Lin HY, Chang AY (2012) Assessing determinants of exonic evolutionary rates in mammals. Mol Biol Evol 29:3121–3129
Choi SS, Vallender EJ, Lahn BT (2006) Systematically assessing the influence of 3-dimensional structural context on the molecular evolution of mammalian proteomes. Mol Biol Evol 23:2131–2133
Comeron JM (2006) Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci USA 103:6940–6945
Comeron JM, Guthrie TB (2005) Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in Drosophila. Mol Biol Evol 22:2519–2530
Comeron JM, Kreitman M (2000) The correlation between intron length and recombination in Drosophila. Dynamic equilibrium between mutational and selective forces. Genetics 156:1175–1190
Comeron JM, Kreitman M (2002) Population, evolutionary and genomic consequences of interference selection. Genetics 161:389–410
Comeron JM, Kreitman M, Aguade M (1999) Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila. Genetics 151:239–249
Comeron JM, Williford A, Kliman RM (2008) The Hill-Robertson effect: evolutionary consequences of weak selection and linkage in finite populations. Heredity (Edinb) 100:19–31
Conant GC, Stadler PF (2009) Solvent exposure imparts similar selective pressures across a range of yeast proteins. Mol Biol Evol 26:1155–1161
Coulomb S, Bauer M, Bernard D, Marsolier-Kergoat MC (2005) Gene essentiality and the topology of protein interaction networks. Proc Biol Sci 272:1721–1725
Darwin C (1859) On the origin of species by means of natural selection. J. Murray, London
Dotsch A, Klawonn F, Jarek M, Scharfe M, Blocker H, Haussler S (2010) Evolutionary conservation of essential and highly expressed genes in Pseudomonas aeruginosa. BMC Genomics 11:234
Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134:341–352
Drummond DA, Wilke CO (2009) The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet 10:715–724
Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 102:14338–14343
Drummond DA, Raval A, Wilke CO (2006) A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:327–337
Duret L (2001) Why do genes have introns? Recombination might add a new piece to the puzzle. Trends Genet 17:172–175
Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA 96:4482–4487
Duret L, Mouchiroud D (2000) Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol 17:68–74
Eisenberg E, Levanon EY (2003) Human housekeeping genes are compact. Trends Genet 19:362–365
Ermakova EO, Nurtdinov RN, Gelfand MS (2006) Fast rate of evolution in alternatively spliced coding regions of mammalian genes. BMC Genomics 7:84
Fang G, Rocha E, Danchin A (2005) How essential are nonessential genes? Mol Biol Evol 22:2147–2156
Fedorova L, Fedorov A (2003) Introns in gene evolution. Genetica 118:123–131
Felsenstein J (1974) The evolutionary advantage of recombination. Genetics 78:737–756
Fitch DH, Strausbaugh LD (1993) Low codon bias and high rates of synonymous substitution in Drosophila hydei and D. melanogaster histone genes. Mol Biol Evol 10:397–413
Franzosa EA, Xia Y (2009) Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol 26:2387–2395
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296:750–752
Fraser HB, Wall DP, Hirsh AE (2003) A simple dependence between protein evolution rate and the number of protein–protein interactions. BMC Evol Biol 3:11
Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, Bhattacharya A, Kapatral V, D’Souza M, Baev MV, Grechkin Y, Mseeh F, Fonstein MY, Overbeek R, Barabasi AL, Oltvai ZN, Osterman AL (2003) Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J Bacteriol 185:5673–5684
Gilbert W (1978) Why genes in pieces? Nature 271:501
Gingold H, Pilpel Y (2011) Determinants of translation efficiency and accuracy. Mol Syst Biol 7:481
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL (2007) The human disease network. Proc Natl Acad Sci USA 104:8685–8690
Gong X, Fan S, Bilderbeck A, Li M, Pang H, Tao S (2008) Comparative analysis of essential genes and nonessential genes in Escherichia coli K12. Mol Genet Genomics 279:87–94
Gonzalez F, Romani S, Cubas P, Modolell J, Campuzano S (1989) Molecular analysis of the asense gene, a member of the achaete-scute complex of Drosophila melanogaster, and its novel role in optic lobe development. EMBO J 8:3553–3562
Gordo I, Charlesworth B (2001) Genetic linkage and molecular evolution. Curr Biol 11:R684–R686
Haddrill PR, Charlesworth B, Halligan DL, Andolfatto P (2005) Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol 6:R67
Hahn MW, Kern AD (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol 22:803–806
Hahn MW, Conant GC, Wagner A (2004) Molecular evolution in large genetic networks: does connectivity equal constraint? J Mol Evol 58:203–211
Hao L, Ge X, Wan H, Hu S, Lercher MJ, Yu J, Chen WH (2010) Human functional genetic studies are biased against the medically most relevant primate-specific genes. BMC Evol Biol 10:316
He X, Zhang J (2006) Toward a molecular understanding of pleiotropy. Genetics 173:1885–1891
Hill WG, Robertson A (1966) The effect of linkage on limits to artificial selection. Genet Res 8:269–294
Hiraoka Y, Kawamata K, Haraguchi T, Chikashige Y (2009) Codon usage bias is correlated with gene expression levels in the fission yeast Schizosaccharomyces pombe. Genes Cells 14:499–509
Hirsh AE, Fraser HB (2001) Protein dispensability and rate of evolution. Nature 411:1046–1049
Huang YF, Niu DK (2008) Evidence against the energetic cost hypothesis for the short introns in highly expressed genes. BMC Evol Biol 8:154
Hudson RR (1994) How can the low levels of DNA sequence variation in regions of the Drosophila genome with low recombination rates be explained? Proc Natl Acad Sci USA 91:6815–6818
Hudson CM, Conant GC (2011) Expression level, cellular compartment and metabolic network position all influence the average selective constraint on mammalian enzymes. BMC Evol Biol 11:89
Hurst LD, Smith NG (1999) Do essential genes evolve slowly? Curr Biol 9:747–750
Iida K, Akashi H (2000) A test of translational selection at ‘silent’ sites in the human genome: base composition comparisons in alternatively spliced genes. Gene 261:93–105
Ingvarsson PK (2008) Molecular evolution of synonymous codon usage in Populus. BMC Evol Biol 8:307
Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411:41–42
Jordan IK, Rogozin IB, Wolf YI, Koonin EV (2002) Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res 12:962–968
Jordan IK, Wolf YI, Koonin EV (2003) No simple dependence between protein evolution rate and the number of protein–protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 3:1
Jovelin R, Phillips PC (2009) Evolutionary rates and centrality in the yeast gene regulatory network. Genome Biol 10:R35
Kawahara Y, Imanishi T (2007) A genome-wide survey of changes in protein evolutionary rates across four closely related species of Saccharomyces sensu stricto group. BMC Evol Biol 7:9
Kim J, Copley SD (2007) Why metabolic enzymes are essential or nonessential for growth of Escherichia coli K12 on glucose. Biochemistry 46:12501–12511
Kim SH, Yi SV (2007) Understanding relationship between sequence and functional evolution in yeast proteins. Genetica 131:151–156
Kimura M, Ota T (1974) On some principles governing molecular evolution. Proc Natl Acad Sci USA 71:2848–2852
Kliman RM, Hey J (1993) Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol Biol Evol 10:1239–1258
Koonin EV (2005) Systemic determinants of gene evolution and function. Mol Syst Biol 1:2005.0021
Koonin EV, Wolf YI (2006) Evolutionary systems biology: links between gene evolution and function. Curr Opin Biotechnol 17:481–487
Kotlar D, Lavner Y (2006) The action of selection on codon bias in the human genome is related to frequency, complexity, and chronology of amino acids. BMC Genomics 7:67
Kramer EB, Farabaugh PJ (2007) The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA 13:87–96
Krylov DM, Wolf YI, Rogozin IB, Koonin EV (2003) Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res 13:2229–2235
Larracuente AM, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, de Carvalho AB, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, Clifton SW, Comeron JM, Costello JC, Coyne JA, Daub J, David RG, Delcher AL, Delehaunty K, Do CB, Ebling H, Edwards K, Eickbush T, Evans JD, Filipski A, Findeiss S, Freyhult E, Fulton L, Fulton R, Garcia AC, Gardiner A, Garfield DA, Garvin BE, Gibson G, Gilbert D, Gnerre S, Godfrey J, Good R, Gotea V, Gravely B, Greenberg AJ, Griffiths-Jones S, Gross S, Guigo R, Gustafson EA, Haerty W, Hahn MW, Halligan DL, Halpern AL, Halter GM, Han MV, Heger A, Hillier L, Hinrichs AS, Holmes I, Hoskins RA, Hubisz MJ, Hultmark D, Huntley MA, Jaffe DB, Jagadeeshan S et al (2007) Evolution of genes and genomes on the Drosophila phylogeny. Nature 450:203–218
Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill D, Zhang Y, Oliver B, Clark AG (2008) Evolution of protein-coding genes in Drosophila. Trends Genet 24:114–123
Lee Y, Zhou T, Tartaglia GG, Vendruscolo M, Wilke CO (2010) Translationally optimal codons associate with aggregation-prone sites in proteins. Proteomics 10:4163–4171
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL (2005) Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein–protein interactions. Mol Biol Evol 22:1345–1354
Li W-H (1997) Molecular evolution. Sinauer Associates, Sunderland, MA
Li W-H, Graur D (1991) Fundamentals of molecular evolution. Sinauer Associates, Sunderland, MA
Liao BY, Zhang J (2006) Low rates of expression profile divergence in highly expressed genes and tissue-specific genes during mammalian evolution. Mol Biol Evol 23:1119–1128
Liao BY, Scott NM, Zhang J (2006) Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol Biol Evol 23:2072–2080
Lin YS, Hsu WL, Hwang JK, Li WH (2007) Proportion of solvent-exposed amino acids in a protein and rate of protein evolution. Mol Biol Evol 24:1005–1011
Marais G, Mouchiroud D, Duret L (2001) Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci USA 98:5688–5692
Marais G, Domazet-Loso T, Tautz D, Charlesworth B (2004) Correlated evolution of synonymous and nonsynonymous sites in Drosophila. J Mol Evol 59:771–779
Marais G, Nouvellet P, Keightley PD, Charlesworth B (2005) Intron size and exon evolution in Drosophila. Genetics 170:481–485
McInerney JO, Creevey CJ, Keane TM, Pentony MM, Naughton TJ (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6:29
Misawa K, Kikuno RF (2011) Relationship between amino acid composition and gene expression in the mouse genome. BMC Res Notes 4:20
Moriyama EN, Powell JR (1998) Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res 26:3188–3193
Ohta T (1992) The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst 23:263–286
Pal LR, Guda C (2006) Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level. BMC Evol Biol 6:91
Pal C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics 158:927–931
Pal C, Papp B, Lercher MJ (2006) An integrated view of protein evolution. Nat Rev Genet 7:337–348
Park SG, Choi SS (2010) Expression breadth and expression abundance behave differently in correlations with evolutionary rates. BMC Evol Biol 10:241
Park K, Kim D (2009) Localized network centrality and essentiality in the yeast-protein interaction network. Proteomics 9:5143–5154
Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H, Hurst LD (2007) Splicing and the evolution of proteins in mammals. PLoS Biol 5:e14
Parra G, Bradnam K, Rose AB, Korf I (2011) Comparative and functional analysis of intron-mediated enhancement signals reveals conserved features among plants. Nucleic Acids Res 39:5328–5337
Peralta H, Guerrero G, Aguilar A, Mora J (2011) Sequence variability of Rhizobiales orthologs and relationship with physico-chemical characteristics of proteins. Biol Direct 6:48
Plass M, Eyras E (2006) Differentiated evolutionary rates in alternative exons and the implications for splicing regulation. BMC Evol Biol 6:50
Plotkin JB, Fraser HB (2007) Assessing the determinants of evolutionary rates in the presence of noise. Mol Biol Evol 24:1113–1121
Plotkin JB, Kudla G (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42
Podder S, Mukhopadhyay P, Ghosh TC (2009) Multifunctionality dominantly determines the rate of human housekeeping and tissue specific interacting protein evolution. Gene 439:11–16
Popescu CE, Borza T, Bielawski JP, Lee RW (2006) Evolutionary rates and expression level in Chlamydomonas. Genetics 172:1567–1576
Prachumwat A, DeVincentis L, Palopoli MF (2004) Intron size correlates positively with recombination rate in Caenorhabditis elegans. Genetics 166:1585–1590
Precup J, Parker J (1987) Missense misreading of asparagine codons as a function of codon identity and context. J Biol Chem 262:11351–11355
Presgraves DC (2006) Intron length evolution in Drosophila. Mol Biol Evol 23:2203–2213
Ramsey DC, Scherrer MP, Zhou T, Wilke CO (2011) The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 188:479–488
Rao YS, Wang ZF, Chai XW, Wu GZ, Zhou M, Nie QH, Zhang XQ (2010) Selection for the compactness of highly expressed genes in Gallus gallus. Biol Direct 5:35
Razeto-Barry P, Diaz J, Cotoras D, Vasquez RA (2011) Molecular evolution, mutation size and gene pleiotropy: a geometric reexamination. Genetics 187:877–885
Ren XY, Vorst O, Fiers MW, Stiekema WJ, Nap JP (2006) In plants, highly expressed genes are the least compact. Trends Genet 22:528–532
Rocha EP (2006) The quest for the universals of protein evolution. Trends Genet 22:412–416
Rocha EP, Danchin A (2004) An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol 21:108–116
Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, Brunner AM, DiFazio SP (2012) Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome Res 22:95–105
Saeed R, Deane CM (2006) Protein protein interactions, evolutionary rate, abundance and age. BMC Bioinformatics 7:128
Sällström B, Arnaout RA, Davids W, Bjelkmar P, Andersson SG (2006) Protein evolutionary rates correlate with expression independently of synonymous substitutions in Helicobacter pylori. J Mol Evol 62:600–614
Schaber J, Rispe C, Wernegreen J, Buness A, Delmotte F, Silva FJ, Moya A (2005) Gene expression levels influence amino acid usage and evolutionary rates in endosymbiotic bacteria. Gene 352:109–117
Scholle MD, Gerdes SY (2008) Whole-genome detection of conditionally essential and dispensable genes in Escherichia coli via genetic footprinting. Methods Mol Biol 416:83–102
Shabalina SA, Ogurtsov AY, Spiridonov AN, Novichkov PS, Spiridonov NA, Koonin EV (2010) Distinct patterns of expression and evolution of intronless and intron-containing mammalian genes. Mol Biol Evol 27:1745–1749
Sharp PM, Stenico M, Peden JF, Lloyd AT (1993) Codon usage: mutational bias, translational selection, or both? Biochem Soc Trans 21:835–841
Stenoien HK (2007) Compact genes are highly expressed in the moss Physcomitrella patens. J Evol Biol 20:1223–1229
Stoletzki N, Eyre-Walker A (2007) Synonymous codon usage in Escherichia coli: selection for translational accuracy. Mol Biol Evol 24:374–381
Subramanian S, Kumar S (2004) Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168:373–381
Teichmann SA (2002) The constraints protein–protein interactions place on sequence divergence. J Mol Biol 324:399–407
Toth-Petroczy A, Tawfik DS (2011) Slow protein evolutionary rates are dictated by surface-core association. Proc Natl Acad Sci USA 108:11151–11156
Tudor M, Murray PJ, Onufryk C, Jaenisch R, Young RA (1999) Ubiquitous expression and embryonic requirement for RNA polymerase II coactivator subunit Srb7 in mice. Genes Dev 13:2365–2368
Tuller T, Kupiec M, Ruppin E (2008) Evolutionary rate and gene expression across different brain regions. Genome Biol 9:R142
Urrutia AO, Hurst LD (2001) Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159:1191–1199
Urrutia AO, Hurst LD (2003) The signature of selection mediated by expression on human genes. Genome Res 13:2260–2264
Vinogradov AE (2001) Intron length and codon usage. J Mol Evol 52:2–5
Vinogradov AE (2004) Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet 20:248–253
Vinogradov AE (2006) “Genome design” model: evidence from conserved intronic sequence in human-mouse comparison. Genome Res 16:347–354
Vinogradov AE (2010) Systemic factors dominate mammal protein evolution. Proc R Soc B Biol Sci 277:1403–1408
Wagner A (2005) Energy constraints on the evolution of gene expression. Mol Biol Evol 22:1365–1374
Waldman YY, Tuller T, Keinan A, Ruppin E (2011) Selection for translation efficiency on synonymous polymorphisms in recent human evolution. Genome Biol Evol 3:749–761
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW (2005) Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA 102:5483–5488
Wang Z, Zhang J (2009) Why is the correlation between gene importance and gene evolutionary rate so weak? PLoS Genet 5:e1000329
Warringer J, Blomberg A (2006) Evolutionary constraints on yeast protein size. BMC Evol Biol 6:61
Wilke CO, Drummond DA (2006) Population genetics of translational robustness. Genetics 173:473–481
Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Annu Rev Biochem 46:573–639
Wolf MY, Wolf YI, Koonin EV (2008) Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution. Biol Direct 3:40
Woody JL, Severin AJ, Bolon YT, Joseph B, Diers BW, Farmer AD, Weeks N, Muehlbauer GJ, Nelson RT, Grant D, Specht JE, Graham MA, Cannon SB, May GD, Vance CP, Shoemaker RC (2011) Gene expression patterns are correlated with genomic and genic structure in soybean. Genome 54:10–18
Wright SI, Yau CB, Looseley M, Meyers BC (2004) Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol 21:1719–1726
Wu GCT, Chen FC (2012) Determinants of exon-level evolutionary rates in Arabidopsis species. Evol Bioinf 8:389–415
Wuchty S (2002) Interaction and domain networks of yeast. Proteomics 2:1715–1723
Yang J, Gu Z, Li WH (2003) Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol 20:772–774
Yang J, Su AI, Li WH (2005) Gene expression evolves faster in narrowly than in broadly expressed mammalian genes. Mol Biol Evol 22:2113–2118
Yang JR, Liao BY, Zhuang SM, Zhang JZ (2012) Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci USA 109:E831–E840
Yu H, Zhu X, Greenbaum D, Karro J, Gerstein M (2004) TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics. Nucleic Acids Res 32:328–337
Zeng Y, Gu X (2010) Genome factor and gene pleiotropy hypotheses in protein evolution. Biol Direct 5:37
Zhang J, He X (2005) Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol 22:1147–1155
Zhang L, Li WH (2004) Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol 21:236–239
Zhou T, Drummond DA, Wilke CO (2008) Contact density affects protein evolutionary rate from bacteria to animals. J Mol Evol 66:395–404
Zhou T, Weems M, Wilke CO (2009) Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol 26:1571–1580
Zhu J, He F, Hu S, Yu J (2008) On the nature of human housekeeping genes. Trends Genet 24:481–484
Zotenko E, Mestre J, O’Leary DP, Przytycka TM (2008) Why do hubs in the yeast protein interaction network tend to be essential: reexamining the connection between the network topology and essentiality. PLoS Comput Biol 4:e1000140
Zuckerkandl E (1976) Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J Mol Evol 7:167–183
Acknowledgments
This research was supported by Kangwon National University and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2011-0010679). SH is funded by NIH R01GM085226. The authors would like to thank the anonymous reviewers for their very generous comments, and Dr. Leonid Sukharnikov and Justin Malin for additional comments on the revised version.
Cite this article
Choi, S.S., Hannenhalli, S. Three Independent Determinants of Protein Evolutionary Rate. J Mol Evol 76, 98–111 (2013). https://doi.org/10.1007/s00239-013-9543-6
DOI: https://doi.org/10.1007/s00239-013-9543-6