Introduction

Functional selection has been considered the dominant force shaping proteome evolution (Nei 1975). A competitive advantage (or disadvantage) is conferred by conformational changes and/or alterations in the chemical characteristics of the end protein product. Selection, however, can act globally, at the level of an entire protein or even an entire proteome. One example is the selective advantage gained through the use, where possible, of less biosynthetically expensive amino acids. Akashi and Gojobori (2002) drew attention to the existence of this proteome shaping force in Escherichia coli and Bacillus subtilis by showing that average protein biosynthetic cost varies inversely with expressivity, and Heizer et al. (2006) extended these findings to four additional proteomes.

Both of these studies relied upon adherence to codon usage bias (associated with translational efficiency) as a proxy for gene expressivity. Codon usage bias is the preferential use of the codons associated with the most abundant isoacceptor tRNAs (Ikemura 1981a, b), and is, itself, another example of a global evolutionary force. Care was taken in these previous studies to ensure that the codon usage bias exhibited characteristics indicative of that imparted by translational efficiency by ensuring that known highly expressed genes (such as ribosomal protein coding genes) adhered strongly to the identified bias.

These predicted protein expression levels were compared to each protein’s average biosynthetic cost for its constituent amino acids. Amino acid biosynthetic costs were determined through examination of the metabolic pathways associated with the synthesis of each amino acid and identifying the number of high-energy phosphate bonds diverted for the synthesis of the amino acid.

This analysis can be performed on any organism for which the following conditions hold: the nucleotide sequence of the genome is known, the nature of the organism’s metabolic pathways is understood, and the organism’s codon usage has been shaped predominantly by translational efficiency bias. The National Center for Biotechnology Information (NCBI) has candidate genome sequence data for 1,700 microbial genomes that allows for whole- and comparative-genome analysis to identify and further characterize global, genome-wide, evolutionary forces.

Others have identified trends suggestive of cell economy but have limited their analysis to a single organism or a small number of organisms, and have generally limited their measure of biosynthetic cost to proxies such as amino acid aromaticity, complexity, or overall protein size (Lobry and Gautier 1994; Garat and Musto 2000; Jansen and Gerstein 2000; Palacios and Wernegreen 2002; Zavala et al. 2002; Urrutia and Hurst 2003; Peixoto et al. 2004; Chanda et al. 2005; Schaber et al. 2005; Kahali et al. 2007; Raiford et al. 2008; Bragg and Wagner 2009). Larger-scale analyses have been undertaken to determine the universality of cost selection in microbial organisms. Seligmann (2003) reported evidence of cost selection in a study of 256 bacteria (along with organisms from other domains). That study, however, used amino acid molecular weight as a proxy for biosynthetic cost and did not consider lifestyle or auxotrophy. Additionally, Seligmann determined codon preference from a simple count of synonymous codons across each whole genome rather than from usage in a subset of highly expressed genes. Swire (2007) examined 31 microbial organisms (as well as 12 eukaryotes) but used the same costs for all organisms (those of E. coli). Swire’s analysis avoided the use of a proxy for expressivity by focusing instead upon the consistent usage of amino acids with respect to protein cost (i.e., biosynthetically inexpensive proteins consistently utilize biosynthetically inexpensive amino acids). Das et al. (2005) examined 50 microbial genomes, of which 34 were selected for analysis based upon the presence of translational efficiency bias and the absence of obscuring strand bias. They demonstrated that average amino acid size/complexity, aromaticity, and alcoholicity all exhibit negative correlations with the Codon Adaptation Index (CAI) (Sharp and Li 1987), indicative of a decrease in biosynthetic cost in highly expressed genes.

In this article, we examine the prevalence of cost selection across 389 organisms (with genomes that have strong translational efficiency bias), and find that average biosynthetic cost varies inversely with protein expression, and amino acid usage varies with expressivity in a way that is consistent with each amino acid’s biosynthesis cost (i.e., biosynthetically expensive amino acids are used less frequently in highly expressed proteins while usage of inexpensive amino acids increases).

Methods

Metabolic Efficiency

Demonstration of cost selection was accomplished by examining the Spearman rank correlation (Spearman 1904) between average amino acid biosynthetic cost for each protein and its expression level as predicted by its codon usage bias [Measure Independent of Length and Composition (MILC) Expression Level Predictor (MELP) (Supek and Vlahovicek 2005)]. The MILC-based MELP was chosen as a measure of codon usage bias due to its demonstrated resilience to the effects of sequence length and content bias (Supek and Vlahovicek 2005).
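
The correlation step can be illustrated with a minimal sketch, which is not the authors' implementation. It computes a Spearman rank correlation as the Pearson correlation of average ranks, using only the Python standard library. The per-protein cost and MELP values are hypothetical, and the function names are introduced here for illustration only.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-protein values: average cost and predicted expression (MELP).
avg_cost = [25.1, 23.4, 22.0, 21.2, 20.5]
melp = [0.8, 1.0, 1.1, 1.4, 1.3]
r_s = spearman(avg_cost, melp)  # negative when cost falls as MELP rises
```

A negative coefficient for a proteome is the pattern taken here as evidence of cost selection.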

Amino acid biosynthetic costs were determined using the methods devised by Craig and Weber (1998) and Akashi and Gojobori (2002) (Table 1). Costs (high-energy phosphate bonds) are determined by summing the amount of energy expended in the specific biosynthetic pathway with the amount of potential energy lost through the diversion of metabolic intermediates that would have otherwise been catabolized (Fig. 1). Wagner (2005) calculated the costs of producing amino acids both aerobically and anaerobically in E. coli and these costs were used for any bacteria with these lifestyles. Metabolic pathways are complex and our approach includes some simplifying assumptions such as a primary carbon source and lack of alternative paths to the same end products. The average amino acid biosynthetic cost for each protein was determined by summing the cost of its constituent amino acids and dividing by the protein’s total number of amino acids. Aerobic costs were used for those organisms known to be facultative. The cost estimator A_glucose (Barton et al. 2010) was utilized when organismal lifestyle could not be determined. In cases where precise lifestyle and costs could not be determined, but it was known that the organism does not metabolize glucose, molecular weight was employed (Seligmann 2003).
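
The per-protein averaging and the choice of cost scheme described above can be sketched as follows. This is a sketch under stated assumptions: the three costs are placeholders and are not the Table 1 values, and the function names and lifestyle labels are hypothetical.

```python
# Placeholder costs in high-energy phosphate bonds for three amino acids.
# These numbers are illustrative only; they are NOT the Table 1 values.
EXAMPLE_COSTS = {"G": 12.0, "L": 27.0, "W": 74.0}

def average_biosynthetic_cost(protein, costs):
    """Sum the cost of each residue and divide by the protein length."""
    if not protein:
        raise ValueError("protein sequence is empty")
    return sum(costs[residue] for residue in protein) / len(protein)

def choose_cost_scheme(lifestyle, metabolizes_glucose=True):
    """Pick a cost table name following the rules stated in the text.

    `lifestyle` is a hypothetical label: "aerobic", "anaerobic",
    "facultative", or None when the lifestyle could not be determined.
    """
    if lifestyle in ("aerobic", "facultative"):
        return "aerobic"  # facultative organisms use aerobic costs
    if lifestyle == "anaerobic":
        return "anaerobic"
    # Lifestyle unknown: A_glucose, or molecular weight without glucose use.
    return "A_glucose" if metabolizes_glucose else "molecular_weight"

cheap = average_biosynthetic_cost("GGGL", EXAMPLE_COSTS)   # (3*12 + 27) / 4
costly = average_biosynthetic_cost("WWLG", EXAMPLE_COSTS)  # (2*74 + 27 + 12) / 4
```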

Table 1 Average amino acid biosynthetic costs in high-energy bonds (~PO4)
Fig. 1 Metabolic pathways involved in amino acid biosynthesis and energy production. penP, ribose 5-phosphate; PRPP, 5-phosphoribosyl pyrophosphate; eryP, erythrose 4-phosphate; 3pg, 3-phosphoglycerate; pep, phosphoenolpyruvate; pyr, pyruvate; acCoA, acetyl-CoA; αkg, α-ketoglutarate; oaa, oxaloacetate; RuBP, ribulose-bisphosphate; TCA, tricarboxylic acid cycle. The anaerobic amino acid biosynthesis pathways are identical except that the TCA cycle is incomplete. Modified from Akashi and Gojobori (2002)

Ribosomal protein coding genes are used as a reference set for the MILC-based MELP. These genes were identified by employing a BLAST search (Altschul et al. 1990) on each target proteome using a non-redundant set of known ribosomal proteins as query sequences. The known ribosomal proteins were drawn from six representative proteomes (E. coli str. K12 substr. MG1655, Nitrosopumilus maritimus SCM1, Candidatus Carsonella ruddii PV, Natronomonas pharaonis DSM 2160, Mycoplasma genitalium G37, and Neorickettsia sennetsu str. Miyayama) to form a curated set. Amino acid sequences exhibiting alignments with 40 % identity over at least 60 % of the target amino acids were regarded as a match.
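
The match criterion can be expressed as a small predicate over the summary statistics of one BLAST alignment. This sketch assumes that identity is measured over the aligned region, that coverage is the aligned length divided by the target length, and that both thresholds are inclusive. The function and parameter names are hypothetical.

```python
def is_ribosomal_match(identical_residues, aligned_length, target_length,
                       min_identity=0.40, min_coverage=0.60):
    """Return True when an alignment meets both thresholds.

    identity = identical residues / aligned length        (assumed definition)
    coverage = aligned length / target protein length     (assumed definition)
    """
    identity = identical_residues / aligned_length
    coverage = aligned_length / target_length
    return identity >= min_identity and coverage >= min_coverage

# Hypothetical alignments against a 150-residue target protein.
good_hit = is_ribosomal_match(50, 100, 150)   # 50 % identity, 67 % coverage
weak_hit = is_ribosomal_match(30, 100, 150)   # 30 % identity
short_hit = is_ribosomal_match(50, 60, 150)   # 40 % coverage
```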

Identification of Candidate Genomes

Only genomes that exhibited characteristics consistent with those of genomes known to be translationally efficient were used in this study to ensure that MELP was a high-confidence predictor of expressivity. An algorithm developed by Carbone et al. (2003, 2005) was employed to determine the dominant bias as well as the strength of that bias. For the purposes of this study [and as recommended by Carbone et al. (2005)], genomes with a strength criterion greater than eight were considered “strongly biased,” and those with a ribosomal criterion greater than one were considered “dominant for translational efficiency.”
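
The two thresholds can be sketched as a filter. The ribosomal criterion is computed here as a z-score, following the description given in the Discussion (the average adherence score of ribosomal protein coding genes in standard normal form). The strength criterion is taken as a precomputed input because its definition is not given in this article. The use of the population standard deviation, the function names, and the example scores are all assumptions.

```python
from statistics import mean, pstdev

def ribosomal_criterion(all_scores, ribosomal_scores):
    """Mean ribosomal adherence score expressed in standard deviations
    above the mean adherence score of all genes (assumed z-score form)."""
    return (mean(ribosomal_scores) - mean(all_scores)) / pstdev(all_scores)

def is_candidate_genome(strength, ribosomal,
                        min_strength=8.0, min_ribosomal=1.0):
    """Apply the strict thresholds: strength > 8 and ribosomal > 1."""
    return strength > min_strength and ribosomal > min_ribosomal

# Hypothetical adherence scores for five genes, two of them ribosomal.
rc = ribosomal_criterion([0.0, 1.0, 2.0, 3.0, 4.0], [4.0, 4.0])
keep = is_candidate_genome(strength=9.0, ribosomal=rc)
```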

Phylogenetic and Lifestyle Context

A comprehensive review of the organisms with strong translational efficiency bias was conducted using an equal proportions test to determine if the proportions of metabolic lifestyles (determined by evaluation of the organism’s physiology) and phyla differed significantly from that expected for a random sampling. Lifestyles were determined from the literature for 1,047 of the 1,700 whole genomes available on NCBI. The phyla of 1,687 of the 1,700 organisms were obtained from the NCBI Taxonomy Browser Database. A phylum level phylogenetic tree was constructed using randomly drawn organisms from each phylum to produce a multiple sequence alignment (MSA) (ClustalW; Thompson et al. 1994). To produce the final tree, this MSA was analyzed with the PHYLIP 3.67 DrawTree program (Felsenstein 1989, 1993).
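
The exact form of the equal proportions test is not specified here. As one common reading, the sketch below applies a pooled two-proportion z-test with a two-tailed normal p value. The choice of test, the function name, and the counts are assumptions made for illustration.

```python
from math import erfc, sqrt

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-tailed normal tail probability
    return z, p_value

# Hypothetical counts of one lifestyle in two groups of 100 organisms each.
z_same, p_same = two_proportion_z_test(50, 100, 50, 100)
z_diff, p_diff = two_proportion_z_test(80, 100, 20, 100)
```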

Exogenously Acquired Amino Acids

Since biosynthetic cost is being compared to the expressivity of each organism’s complement of proteins, it is necessary to identify those amino acids that the organism cannot synthesize (i.e., amino acids for which the organism is auxotrophic). BLAST (Altschul et al. 1990) searches were performed on the entire proteome of each organism to determine amino acid biosynthetic capability. Benchmark enzymes known to be involved in amino acid biosynthesis were taken from a set of phylogenetically balanced genomes (a randomly drawn genome from each of the represented phyla for each of the target enzymes). Target organisms were determined to contain a homolog of the benchmark protein if they were at least 20 % identical. If homologs of more than 50 % of a known amino acid biosynthetic pathway were absent, then it was concluded that the organism was auxotrophic for that amino acid.
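
The auxotrophy call can be sketched as follows. It assumes that the best BLAST identity found for each benchmark enzyme is already available, and that an enzyme with no hit counts as absent. The enzyme names and the function name are hypothetical.

```python
def is_auxotrophic(pathway_enzymes, best_identity,
                   min_identity=0.20, max_missing_fraction=0.50):
    """Call auxotrophy when more than half of a pathway lacks a homolog.

    best_identity maps enzyme name -> best fractional identity found in the
    target proteome; enzymes with no hit are treated as absent (assumption).
    """
    missing = [enzyme for enzyme in pathway_enzymes
               if best_identity.get(enzyme, 0.0) < min_identity]
    return len(missing) / len(pathway_enzymes) > max_missing_fraction

# Hypothetical four-enzyme pathway.
pathway = ["enzA", "enzB", "enzC", "enzD"]
mostly_absent = is_auxotrophic(pathway, {"enzA": 0.35, "enzB": 0.10})
half_absent = is_auxotrophic(pathway, {"enzA": 0.35, "enzB": 0.25, "enzC": 0.10})
```

With exactly half of the pathway absent the organism is not called auxotrophic, since the rule requires more than 50 % of the pathway to be missing.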

To determine whether the exogenously acquired amino acids are over- or under-utilized in highly expressed genes, we calculated the fraction of these amino acids in the uppermost and lowermost expression quintiles of an organism’s genes sorted according to their predicted expression levels (MELP). The difference in these values (top quintile minus lowest quintile) was used as an indication of preference (or avoidance) of these amino acids in the protein products of highly expressed genes. Significance was determined by generating a bootstrap distribution of difference values. The entries in the distribution were generated by randomly producing quintile-sized sets of genes (drawn, randomly, from the uppermost and lowermost quintiles) and calculating the differential auxotrophy between the two sets (bootstrap size was 1,000 for each organism). The p values were determined parametrically by calculating the average and standard deviation of the bootstrap difference values, and identifying the probability, given a normal distribution, that an average fractional difference between the upper and lower quintiles could be outside the observed difference (two-tailed).
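
The resampling procedure can be sketched as below. Several details are assumptions: quintile-sized sets are drawn with replacement from the pooled top and bottom quintiles, the amino acid fraction is computed over all residues in a set, and the p value is the two-tailed normal probability of a deviation at least as large as the observed difference. The function names and the toy proteome are hypothetical.

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev

def exogenous_fraction(proteins, exogenous):
    """Fraction of all residues in `proteins` that are exogenously acquired."""
    total = sum(len(p) for p in proteins)
    hits = sum(1 for p in proteins for residue in p if residue in exogenous)
    return hits / total

def quintile_difference_test(proteins_by_melp, exogenous, n_boot=1000, seed=0):
    """proteins_by_melp must be sorted by ascending predicted expression."""
    q = len(proteins_by_melp) // 5
    if q < 1:
        raise ValueError("need at least five proteins")
    bottom, top = proteins_by_melp[:q], proteins_by_melp[-q:]
    observed = (exogenous_fraction(top, exogenous)
                - exogenous_fraction(bottom, exogenous))
    rng = random.Random(seed)
    pool = top + bottom
    diffs = []
    for _ in range(n_boot):
        # Two random quintile-sized sets drawn from the pooled quintiles.
        set_a = rng.choices(pool, k=q)
        set_b = rng.choices(pool, k=q)
        diffs.append(exogenous_fraction(set_a, exogenous)
                     - exogenous_fraction(set_b, exogenous))
    mu, sigma = mean(diffs), stdev(diffs)
    if sigma == 0:
        raise ValueError("resampled differences have zero variance")
    p_value = erfc(abs(observed - mu) / (sigma * sqrt(2)))  # two-tailed
    return observed, p_value

# Toy proteome: W is treated as the only exogenously acquired amino acid.
proteins = ["AAAAAAAAAA"] * 4 + ["AAAAAWAAAA"] * 12 + ["WWWWWAAAAA"] * 4
observed, p_value = quintile_difference_test(proteins, exogenous={"W"})
```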

Gene Culling Criteria

Once candidate genomes were identified, coding sequences with fewer than 100 codons were removed from consideration to minimize sampling effects and potential length biases (Eyre-Walker 1996). Genes identified as candidates for horizontal gene transfer (Garcia-Vallvé et al. 2003) were also removed as they may not reflect the codon usage bias of the target organism (dos Reis et al. 2003; Garcia-Vallvé et al. 2003).

Additionally, only one version of each paralogous locus was kept in each organism’s gene set. Paralogs were identified using BLAST searches (Altschul et al. 1990) that were individually performed for each protein in each organism against all other proteins from that same organism. Proteins with greater than 60 % amino acid identity and with similar lengths (40 % tolerance) were considered to be paralogous.
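
The length filter and the paralog rule can be written as two predicates. The reading of the 40 % length tolerance as a difference relative to the longer protein is an assumption, as are the function names and example values.

```python
def passes_length_filter(codon_count, min_codons=100):
    """Keep coding sequences with at least 100 codons."""
    return codon_count >= min_codons

def is_paralog_pair(identity, len_a, len_b,
                    min_identity=0.60, length_tolerance=0.40):
    """Greater than 60 % identity and lengths within the tolerance.

    The tolerance is read as (longer - shorter) / longer <= 0.40,
    which is an assumption about the intended definition.
    """
    longer, shorter = max(len_a, len_b), min(len_a, len_b)
    similar_length = (longer - shorter) / longer <= length_tolerance
    return identity > min_identity and similar_length

# Hypothetical examples.
kept = passes_length_filter(100)
dropped = passes_length_filter(99)
paralogs = is_paralog_pair(0.75, 300, 250)
too_different = is_paralog_pair(0.75, 300, 150)
```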

Results

Organisms with Strong Translational Efficiency Bias

Of the 1,700 organisms available from the NCBI Microbial Whole Genome Database, 854 had a ribosomal criterion greater than one (indicative of translational efficiency bias). Of these, 389 had a strong translational efficiency bias (strength criterion greater than eight). Amino acid biosynthetic pathways could be determined for 258 of those organisms. A complete listing of 46 distinct lifestyles of these 258 organisms can be found in the supplemental material. To determine whether the distribution of proportions of the nine lifestyles in strong translational efficiency organisms deviates from that which could be expected given a random drawing from the 46 lifestyles represented in the entire sample, an equal proportions test was performed. The results show that a number of lifestyles are significantly over- or underrepresented in the set of organisms with strong translational efficiency bias (Table 2). In addition to having significantly different proportions of lifestyles, the distribution of phyla in this group of organisms was not representative. Among the 389 organisms with strong translational efficiency bias whose phylum was identified, only seven phyla are represented (Fig. 2); 23 are not. The equal proportions test revealed that Proteobacteria, Firmicutes, Euryarchaeota, Cyanobacteria, and Actinobacteria are significantly overrepresented (Table 3). Additionally, 17 of the 23 phyla absent from this group would have been expected to appear under random sampling (i.e., they are significantly underrepresented).

Table 2 Amino acid biosynthetic pathways associated with lifestyles that exhibit significant proportional differences between the overall population and the population comprised of organisms with strong translational efficiency codon usage bias
Fig. 2 Phylogenetic context for organisms exhibiting strong translational efficiency bias. Tree generated from randomly drawn 16S sequences from each phylum. Highlighted regions are where strong translational efficiency organisms are found, including the proportion of sequenced organisms in that phylum that exhibit the trait

Table 3 Phyla that exhibit significant proportional differences between the overall population and the population comprised of organisms with strong translational efficiency codon usage bias

Many of the set of 389 genomes are from very similar strains of the same species such that only 176 species are represented at least once. Due to high degrees of variability observed between genome strains (Lawrence and Hendrickson 2005; JG Lawrence, personal communication, 2008), all genomes, including separate strains for the same species, were considered in this analysis. Only three of the 389 organisms that satisfied the requirements to be considered a genome with strong translational efficiency bias were Archaea (116 Archaea were present in the NCBI database at the time of this study).

Overall Correlations: Expressivity Versus Cost

All but four genera (six species) out of the subset of 389 genomes exhibited a negative and significant correlation between average biosynthetic cost and MELP (mean and standard deviation of Spearman rank correlation coefficients: μ(r_s) = −0.27, σ = 0.115). This analysis included all amino acids in the computation. When exogenously acquired amino acids were omitted (biosynthesis costs are not applicable for these amino acids, though as will be shown, amino acid usage indicates these costs may be representative), all but eight genera (11 species) (Table 4) exhibited a negative and significant correlation between average biosynthetic cost and MELP (mean and standard deviation of Spearman rank correlation coefficients: μ(r_s) = −0.23, σ = 0.12). Additionally, the analysis was run using A_glucose and molecular weight as biosynthesis cost surrogates for all organisms (A_glucose: μ(r_s) = −0.31, σ = 0.10; molecular weight: μ(r_s) = −0.21, σ = 0.13). Further investigations were performed on facultative organisms alone using the aerobic variant of their synthesis costs (this was the variant used in the analysis above; μ(r_s) = −0.23, σ = 0.09) and using the anaerobic costs (μ(r_s) = −0.0053, σ = 0.092).

Table 4 Genera exhibiting positive or non-significant trends (Spearman rank correlations) between average biosynthetic costs and MELP

Organisms without Negative and Significant Trends

The non-significant, negative trends for the four genera could be due to obscuring effects related to skewed GC-content. Each of the four exhibits higher GC-content than the other genomes with strong translational efficiency bias, ranging from 48 to 57 % G + C, compared to an average of 45 % (σ = 5.5 %) for the entire set of strong translational efficiency genomes. While it has been shown that optimal codons tend to align with the genomic GC-content (Hershberg and Petrov 2009), there are counterexamples. For instance, Nostoc sp. PCC 7120 is a high AT-content organism (41 % G + C), while its ribosomal protein coding genes are 47 % G + C. If its highly expressed genes aligned with genomic GC-content (i.e., translational efficiency bias and content bias positively correlated), then they would be among the most highly biased and would exhibit lower than average GC-content (Raiford et al. 2010, 2011). Additionally, the correlations between GC-content and codon usage bias values are greater in magnitude for MELP than for the dominant bias in the non-significant organisms [except for the Desulfotalea genomes, for which they are greater in magnitude for the dominant bias as calculated using the algorithm from Carbone et al. (2003)], indicating a divergence between the dominant bias and translational efficiency bias. This results in an above average number of preferred codons in ribosomal protein coding genes that disagree with the organism’s GC-content. In the set of organisms with non-significant trends, there are on average 10.6 preferred codons that are at odds with the organism’s GC-content [compared to an average of 8.6 (σ = 2.85) in all strong translational efficiency organisms].

The additional genera without significantly negative trends when exogenously acquired amino acids were removed from consideration tend to have an unusually large number of amino acids for which they are auxotrophic (Table 4). Additionally, they tend to be auxotrophic for the more biosynthetically expensive amino acids (Table 5) as well as the least biosynthetically expensive amino acid (glu; Table 5).

Table 5 Exogenously acquired amino acid usage behavior

Amino Acid Usage

Organisms tend to utilize amino acids with low biosynthetic cost in highly expressed proteins (Fig. 3). Further, the usage of each amino acid varies in a way that is consistent with its synthesis cost in that the slopes for each amino acid between its usage in each protein and that protein’s expressivity (associated gene’s MELP) are negative.

Fig. 3 Amino acid usage behavior. Each data point represents the cost-usage-expressivity relationship for a single amino acid (identified by its one-letter designation) across all proteins examined for each lifestyle/metabolic group. Y-axis is the average slope for amino acid usage versus MELP. X-axis is amino acid biosynthetic cost. The error bars represent one standard deviation. The fitted lines were generated using standard linear regression and are for visualization purposes only. a Organisms with uncharacterized metabolic pathways that are assumed to metabolize glucose [A_glucose used for biosynthesis cost (Barton et al. 2010)], b organisms with unknown metabolic pathways that do not metabolize glucose [molecular weight used for biosynthesis cost (Seligmann 2003)], c aerobic heterotrophs, d anaerobic heterotrophs, e aerobic phototrophs (note that e’s plot has a slightly greater Y-axis range than a-d). Consistent amino acid usage would be evidenced by a negative trend [i.e., inexpensive amino acids increase (positive slope) with expressivity, while expensive amino acids decrease (negative slope) with expressivity]. Plots are similar when Spearman rank correlation coefficient is used instead of slope. The overall Spearman rank correlation (r_s) between each amino acid’s biosynthesis cost and its slope across all organisms (389 organisms, 20 slopes each; slope is for MELP vs. that amino acid’s usage) is −0.11 (p value = 2.86 × 10^−22)

Consistent amino acid usage is evidenced by a negative trend [i.e., the frequency of inexpensive amino acids increases (positive slope) with expressivity, while expensive amino acids decrease (negative slope) with expressivity]. The overall Spearman rank correlation (r_s) between each amino acid’s biosynthesis cost and its slope across all organisms (389 organisms, 20 slopes each; slope is for MELP vs. that amino acid’s usage) is −0.11 (p value = 2.86 × 10^−22).
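
The per-amino-acid slope can be sketched with an ordinary least-squares fit. The sketch assumes that usage (the fraction of a protein's residues that are a given amino acid) is regressed on MELP. The toy proteins, MELP values, and function names are hypothetical.

```python
def usage_fraction(protein, amino_acid):
    """Fraction of residues in `protein` that are `amino_acid`."""
    return protein.count(amino_acid) / len(protein)

def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def usage_slopes(proteins, melp, amino_acids):
    """Slope of usage versus MELP for each amino acid (usage on the y-axis)."""
    return {aa: least_squares_slope(melp, [usage_fraction(p, aa) for p in proteins])
            for aa in amino_acids}

# Toy data: W usage falls and G usage rises as predicted expression rises.
proteins = ["WWWWGGLLLL", "WWWGGGLLLL", "WWGGGGLLLL", "WGGGGGLLLL"]
melp = [0.7, 0.9, 1.1, 1.3]
slopes = usage_slopes(proteins, melp, "GWL")
```

Ranking the resulting slopes against each amino acid's biosynthetic cost then gives the overall cost-versus-slope correlation reported above.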

Usage of Exogenously Acquired Amino Acids

Similar analysis is difficult for exogenously acquired amino acids as there are no biosynthetic costs against which to compare the usage trends. We can, however, examine the usage trends by themselves and possibly gain insights into the perceived cost to the organisms for each of these amino acids (Table 5). There is a strong positive and significant Spearman rank correlation between the slopes of usage for endogenously synthesized amino acids (Fig. 3) when compared to those that are exogenously acquired (Table 5) (r_s = 0.81, p value < 5 × 10^−324).

A bootstrap analysis of differential utilization of exogenously acquired amino acids in highly expressed versus weakly expressed genes indicated that, out of the 173 strong translational efficiency organisms with at least one exogenously acquired amino acid, 159 of them demonstrate either significant avoidance or preference of exogenous amino acids in highly expressed genes [99 exhibit significantly positive differential usage (preference) and 60 exhibit significantly negative differential usage]. A complete listing of the differential averages and p values for all 173 organisms can be found in the supplemental material.

Discussion

Previous work that utilizes this approach (adherence to translational efficiency codon usage bias as a predictor of expressivity and amino acid biosynthesis pathway used to determine biosynthesis cost) has identified metabolic efficiency as a selective force shaping proteomes in E. coli and B. subtilis (Akashi and Gojobori 2002) and four organisms with photoautotrophic or thermophilic lifestyles (Heizer Jr. et al. 2006). This study confirms the near universality of cost selection by extending the analysis to 389 microbial genomes available through the NCBI microbial genomes database [October, 2011, NCBI Complete Microbial Genomes Database (NCBI 2011)]. Cost selection causes highly expressed proteins to tend to utilize less biosynthetically expensive amino acids.

All but four genera (six species) out of the 389 genomes with strong translational efficiency bias exhibited a negative and significant correlation between average biosynthetic cost and MELP (mean and standard deviation of Spearman rank correlation coefficients: μ(r_s) = −0.27, σ = 0.115). There is a strong correlation between the slopes in usage of endogenously biosynthesized and exogenously acquired amino acids (r_s = 0.81, p value < 5 × 10^−324, Fig. 3; Table 5). This phenomenon of similar usage of synthesized and acquired amino acids has been observed indirectly before (Swire 2007), and is here shown directly.

In order to confirm the validity of including exogenously acquired amino acids, analysis was performed without them. Indeed, when amino acids for which these organisms are auxotrophic were removed from consideration, the correlations remained negative and significant for all organisms except in an additional four genera [four genera with all amino acids considered, four additional genera with acquired amino acids removed] (Table 4) (mean and standard deviation of Spearman rank correlation coefficients: μ(r_s) = −0.23, σ = 0.121).

It seems likely that unusual GC-content (relative to other strong translational efficiency organisms) could be obscuring the translational efficiency codon usage bias in the four genera exhibiting non-significant trends (“Organisms without Negative and Significant Trends” section in “Results”). This could confound expressivity prediction and, thereby, correlations with biosynthesis cost. The life histories of these organisms may also reduce their need for strict adherence to metabolic efficiency. A closer review of their life and evolutionary histories may be required to completely explain their departure from the general tendency of organisms to be significantly influenced by selective pressures favoring metabolic efficiency.

In the analysis with exogenously acquired amino acids removed, the additional organisms that did not have significant negative correlations were among those with the highest degree of auxotrophy (Table 4). These exception organisms tend to be auxotrophic for both the most and least biosynthetically expensive amino acids (Tables 1, 5). In organisms that are capable of synthesizing them, these amino acids tend to be the strongest contributors to the inverse relationship between expressivity and average biosynthetic cost (Fig. 3).

It has been suggested that organisms treat exogenously supplied amino acids differently than those for which they are auxotrophic (Alves and Savageau 2005). Most of the organisms in this study exhibited either a significant preference or avoidance of exogenously acquired amino acids (159 out of the 173 strong translational efficiency organisms that have such amino acids; 60 of these showed significantly positive differential usage in highly expressed genes, with the other 99 exhibiting significant avoidance). Generally speaking, the 60 organisms that preferentially utilize acquired amino acids have lifestyles and are associated with environments where it is likely that they will have a ready supply of those amino acids (e.g., they are obligate intracellular parasites). In contrast, the 99 that avoid the use of amino acids that must be exogenously acquired are associated with environments where the supply of those amino acids is less reliable (a list of these organisms along with their differential usage can be found in the supplementary data).

In addition to demonstrating that organisms with strong translational efficiency bias have a significant negative correlation between protein expressivity and average biosynthetic cost, this analysis also shows that each amino acid’s usage varies in a way that is consistent with its biosynthetic cost. Biosynthetically expensive amino acids are used less frequently in highly expressed proteins, and inexpensive amino acids are used more frequently in highly expressed proteins (Fig. 3). This trend (the relationship between amino acid biosynthesis cost and the MELP vs. usage slope) is reflected in an overall negative Spearman rank correlation (r_s) between each amino acid’s biosynthesis cost and its slope across all organisms (389 organisms, 20 slopes each; slope is for MELP vs. that amino acid’s usage): r_s = −0.11, p value = 2.86 × 10^−22. Lysine and leucine are consistently over- and underutilized, respectively, given their cost, across all lifestyles (Fig. 3). A contributing factor to the overuse of lysine (K) may be its preferential use in ribosomal proteins to bind to negatively charged rRNA.

Of the 1,700 genomes that currently reside on NCBI’s microbial genome database, 389 exhibit strong translational efficiency bias. The metabolic lifestyles are known for 258 of these. The ribosomal criterion for the dominant bias (average adherence score of ribosomal protein coding genes in standard normal form; i.e., a ribosomal criterion of 1 indicates that their average is one standard deviation above the average for all proteins) is positive in 80 % of all organisms, and greater than one standard deviation above the average for the genome in 50 % of all sequenced microbial organisms. Organisms that exhibit strong translational efficiency bias utilize eight different metabolic lifestyles (out of the 46 known lifestyles in the set of all 1,700 complete genomes listed on NCBI). This is significantly fewer than would be expected given the distribution of lifestyles in the underlying population (equal proportions test; Table 2). Facultative and anaerobic heterotrophs and aerobic phototrophs were significantly overrepresented relative to what would be expected if organism selection were random. The absence of five lifestyles in the strong translational efficiency group was also significant.

Phylogenetically, Proteobacteria, Firmicutes, Euryarchaeota, Cyanobacteria, and Actinobacteria were overrepresented among the strong translational efficiency genomes, and several phyla are underrepresented (Table 3). This composition is not entirely unexpected. Many proteobacteria and firmicutes, including those in this group, are free-living chemoheterotrophs that often reside in oligotrophic environments where resources are scarce. While most of these organisms have the capability for fast growth, in these environments they are slow-growing and resource limited. They cannot depend on the resources in their environment to provide intact amino acids and other building blocks for biosynthesis of macromolecules. Therefore, they need to be very metabolically efficient (especially anabolically) to compete for survival, and it is advantageous for them to be able to biosynthesize cellular components.

While the investigation was performed on only 389 of the 1,700 whole microbial genomes available at the time of this study (the subset of organisms for which expression rates could confidently be predicted), the consistency with which the highly expressed genes tend to preferentially utilize less biosynthetically expensive amino acids is compelling. The vast majority of these organisms exhibit a negative and significant correlation between expressivity and biosynthetic cost. Even when exogenously acquired amino acids are ignored, the result holds for all but a handful of genera. The few exceptions are unusual in that they are auxotrophic for a large number of amino acids (Table 4). They are also utilized in the food industry, tend to avoid the use of their exogenously acquired amino acids (Table 5), and have likely undergone human-directed breeding and/or genetic engineering.

Metabolic efficiency is a weak, global, evolutionary force, and its influence may not be detectable in all organisms (such as those with small population sizes or where energetic constraints are not limiting). Detection of the subtle changes driven by biosynthetic cost selection often requires whole proteome analysis. The effects tend to accumulate in regions of proteins that are least functionally constrained (Heizer et al. 2011), and the benefits are greatest in genes that are highly expressed. Secreted genes also undergo observable biosynthetic cost selection to minimize the loss of cellular resources (Smith 2010). Given that metabolic efficiency appears to have had a significant effect in the shaping of the proteomes of the organisms included in this study, it seems reasonable to expect that similar effects will be found in other bacterial and archaeal species as their complete genome sequences become available for analysis. Any organisms that must occasionally compete for rare resources and whose population sizes would support such proteome-wide evolutionary movements should exhibit metabolic efficiency.