Introduction

At least four independent transitions from water to terrestrial habitats occurred during the diversification of the streptophytes, which include charophycean green algae and embryophytic plants. One of these represents a major organismal radiation during the history of life on Earth, with the evolution of embryophytes (bryophytes + tracheophytes) from a freshwater charophycean ancestor (Kenrick and Crane 1997; Bateman et al. 1998; Qiu and Palmer 1999; Karol et al. 2001; Lewis and McCourt 2004; Qiu et al. 2006, 2007; Turmel et al. 2006; Lemieux et al. 2007), which is estimated to have occurred between 425 and 490 million years ago (Mya) (Sanderson 2004).

Another transition involves the unicellular monotypic genera Mesostigma and Chlorokybus that form the earliest branching lineage in the streptophyte tree (Fig. 1) (Qiu et al. 2006; Lemieux et al. 2007). Mesostigma occupies the water column in freshwater habitats, while Chlorokybus has adapted to occupy sub-aerial habitats (Lewis and McCourt 2004; Lemieux et al. 2007). Klebsormidiales is another early branching lineage that contains members that occupy terrestrial habitats, in addition to two genera from the Zygnematales that can form sub-aerial mats (i.e., Cylindrocystis, Mesotaenium) (Lewis and McCourt 2004).

Fig. 1

Streptophyta phylogeny (Karol et al. 2001; Qiu et al. 2006, 2007) showing major evolutionary grades of green algae, ‘bryophytes’, and the tracheophyte lineage. Proteomic sequence data, derived from plastid (Pt) and mitochondrial (Mt) proteomes, were obtained for taxa presented in the tree as indicated within closed brackets. Branches with broken lines indicate an ‘aquatic’ → ‘terrestrial’ transition, where proteomes may have optimized amino acid composition to cope with higher temperatures, increased ultraviolet radiation (UV), and cellular desiccation/dehydration. Major morphological adaptations for terrestrialization are described in external text-boxes at respective nodes in the tree, and an estimated date of divergence (Sanderson 2004) is provided within an internal box

To overcome increased water stress, heat, and exposure to UV, which must have accompanied these transitions, adaptive mechanisms may have been required (Graham et al. 2000; Proctor and Pence 2002; Waters 2003; Oliver et al. 2005; Rensing et al. 2008; Wang et al. 2009b; Khandelwal et al. 2010; Richardt et al. 2010; Wolf et al. 2010).

Desiccation tolerance in plants involves mechanisms of cellular protection for vegetative structures, spores, seeds, and pollen; research on this topic is centered on the accumulation of osmoregulating free amino acids (e.g., proline) in the cellular solution (Blum and Ebercon 1976; Carmo-Silva et al. 2009; Khandelwal et al. 2010; Richardt et al. 2010), over-expression of abiotic stress related genes (Frank et al. 2005; Close 1997; Cuming 1999; Proctor and Smirnoff 2000; Proctor et al. 2007; Finkelstein et al. 2008; Hájek and Beckett 2008; Rensing et al. 2008; Khandelwal et al. 2010), and epigenetic modifications (Granot et al. 2009).

Based on phylogenetic mapping, Oliver et al. (2005) suggest that primitive land plants were vegetatively and reproductively tolerant of desiccation, with active dehydrin/rehydrin proteins present in liverwort and moss lineages (Waters 2003; Close 1997; Cuming 1999; Proctor et al. 2007; Hájek and Beckett 2008; Rensing et al. 2008). With the subsequent divergence of hornworts (Anthocerotophyta), this protective mechanism seems to have been lost, but it reappeared independently in various tracheophyte lineages, presumably as they adapted to arid habitats (Oliver et al. 2005; Granot et al. 2009).

Cellular dehydration/desiccation affects the concentration of cellular solutes and is therefore likely to interrupt the balanced organization of water molecules required for protein folding and stabilization (Mrabet et al. 1992; Jaenicke and Böhm 1998; Borders et al. 1994; Strub et al. 2004). Both these processes are usually dependent on a protein’s surface–polar interface with water (Klotz 1958; Bryant 1996; Killian and Heijne 2000; Smith et al. 2004).

UV radiation is another problem terrestrial plants face; cells, cellular components and proteomes and DNA therein may all be subject to resulting oxidative damage (Tyrrell 1994; Brosche and Strid 2003; Rensing et al. 2008; Wolf et al. 2010). The penetration of UV is mainly limited to surfaces and near-surfaces of cells, and in chloroplasts, can alter thylakoid integrity thereby compromising photosynthesis (Tyrrell 1994; Kovács and Keresztes 2002; Rodrigues et al. 2006).

Increased concentrations of free aromatic compounds (e.g., tannins, flavonoids) (Cooper-Driver and Bhattacharya 1998; Waters 2003) and the release of isoprene hydrocarbon compounds from photosynthetic surfaces are thought to be adaptive mechanisms for protection from UV damage and from heat-induced production of reactive oxygen species (Hanson et al. 1999; Sharkey et al. 2008).

As demonstrated above, many mechanisms have evolved that allow plants to deal with environmental stresses during their movement to terrestrial habitats. One additional mechanism that has not been considered so far in the studies of plant colonization of land is the alteration of the amino acid composition, i.e., increase or decrease in the use of certain residues such as those with either polar charge or UV absorbing aromatic rings.

Increased use of polar charged amino acids, particularly the EKR residues, may provide protection from desiccation through thermostability and hydrophilicity (Haney et al. 1999; Killian and Heijne 2000; Nishio et al. 2003; Brocchieri 2004), while increased use of aromatic amino acids (FHWY) on protein surfaces has been suggested to provide thermostability for thermophilic proteins, due primarily to the absorptive properties of the side-chain aromatic ring (Kannan and Vishveshwara 2000). In early land plant lineages such whole proteome adaptations might have been especially important in the light-capturing chloroplast proteome, with UV-B known to damage photosystem components (Tyrrell 1994; Kovács and Keresztes 2002; Rodrigues et al. 2006).

If proteomic adaptations occurred during terrestrialization events in streptophytes, we reasoned that a signature might be found in the amino acid composition of transitional lineages when compared to their closest freshwater aquatic relatives. To answer this question we used concatenated orthologous sets of genes containing 55 chloroplast (plastid), and 13 mitochondrial protein-coding genes, and compared amino acid compositional frequencies between (1) the ‘bryophyte’ and ‘charophycean’ green algal grades, (2) the tracheophyte lineage and ‘bryophytes’, and (3) Chlorokybus and Mesostigma. We also investigated whether proteins/protein regions exposed to the organellar fluid might be under stronger selection for protein hydrophilicity and aromaticity, by comparing transmembrane domains with those of extra-membrane regions.

Materials and Methods

Taxon Sampling and Protein Alignments

Organellar sequences for sampled taxa were obtained from the NCBI GenBank database (http://www.ncbi.nlm.nih.gov) (Fig. 1; refer to Supplementary Table S1 for GenBank accession numbers). All sampled genes are presented in Supplementary Table S2.

Amino acid sequence alignments for the 55 plastid and 13 mitochondrial gene datasets (Supplementary Table S2) were concatenated and analyzed as individual compartments (Figs. 2, 3, 4), and also as classes of structurally related proteins (Supplementary Tables S2, S3).

Fig. 2

Percentage difference between mean usage frequencies for categories of amino acid residues. Comparisons are between Chlorokybus–Mesostigma (hatched bars), bryophytes–green algae (excluding Chlorokybus) (gray bars), tracheophytes–bryophytes (open bars), across concatenated sets of orthologous protein-coding genes (n = number of genes). Taxa sampled for a the plastid proteome (Pt) include six algae, four bryophytes, and 42 tracheophytes; for b the mitochondrial proteome (Mt) they include four algae, three bryophytes, and six tracheophytes. For plastid and mitochondrial data-sets, the Mann–Whitney test was used to evaluate statistical significance (see “Materials and Methods” section). The level of significance is indicated by *P < 0.05; **P < 0.01

Fig. 3

Boxplots showing the frequency (%) of the EKR residue category in extra-membrane regions of a chloroplast (Pt), and b mitochondria (Mt), for algae (excluding Chlorokybus), bryophytes, and tracheophytes. Boxes show mean, SD, and values with extreme variance (black circles). Frequency values for Chlorokybus (C.a) are schematically shown in filled circles. Between-group comparisons that are significantly different (P ≤ 0.001) are indicated with brackets

Fig. 4

Boxplots showing the frequency (%) of a polar, and b hydrophobic residue categories in extra-membrane regions of the mitochondrion, for algae (excluding Chlorokybus), bryophytes, and tracheophytes. Boxes show mean, SD, and values with extreme variance (black circles). Frequency values for Chlorokybus (C.a) are schematically shown in filled circles. Between-group comparisons that are significantly different (P ≤ 0.001) are indicated with brackets

Alignment of amino acid sequences was performed using MUSCLE (Edgar 2004). Amino acid alignment length was 13873 and 4276 for the plastid and mitochondrial matrices, respectively. We determined the amino acid composition frequencies for both gapped and ungapped alignments. Because both analyses gave similar results, we report the data obtained from ungapped alignments for consistency.
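The frequency calculation on an ungapped alignment can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "ungapped" means discarding every alignment column in which any taxon has a gap, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

def drop_gapped_columns(alignment):
    """Return the alignment with every column containing a gap ('-') removed.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    Assumption: 'ungapped' means column-wise gap removal.
    """
    columns = zip(*alignment.values())
    kept = [col for col in columns if "-" not in col]
    # Re-assemble one sequence per taxon from the columns that were kept.
    return {taxon: "".join(col[i] for col in kept)
            for i, taxon in enumerate(alignment)}

def residue_frequencies(sequence):
    """Percentage frequency of each residue, ignoring any gap characters."""
    residues = [aa for aa in sequence if aa != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}

# Toy alignment (hypothetical sequences, not data from the study).
toy = {"taxonA": "MKE-LR", "taxonB": "MRELLK"}
ungapped = drop_gapped_columns(toy)
freqs = residue_frequencies(ungapped["taxonA"])
```

In this toy case the fourth column is dropped because taxonA has a gap there, leaving five columns per taxon.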

The amino acid matrix length for transmembrane domains was 2133 (plastid), and 1509 (mitochondrion), while for extra-membrane regions the length was 11740 (plastid) and 2767 (mitochondrion). Both the concatenated plastid and mitochondrial extra-membrane datasets also contained non-membrane-bound genes (refer to Supplementary Table S2).

A manual inspection was performed to identify non-orthologous regions, facilitating an unbiased and accurate estimation of amino acid frequency (alignments available upon request).

Regarding taxon selection, we had to consider the presence of RNA editing sites in the genomic sequences, as these sites can significantly alter the amino acid composition (Giegé and Brennicke 1999; Kugita et al. 2003; Wolf et al. 2004; Jobson and Qiu 2008), and for this reason we only used sequences in which cDNA data were available in GenBank. For the hornwort Megaceros aenigmaticus (Li et al. 2009) we sequenced the cDNA of all mitochondrial genes represented in our dataset (Supplementary Table S2). We did not use available programs to approximate RNA editing sites due to the sensitivity of our analyses: i.e., even just a few undetermined sites can alter the results significantly, as determined by mitochondrial protein alignments for the bryophytes Pleurozia purpurea (GenBank accession: NC_013444) (Wang et al. 2009a) and Phaeoceros laevis (GenBank accession: NC_013765) (Xue et al. 2010), and tracheophyte Isoetes engelmannii (Grewe et al. 2009).

The sensitivity of the analyses also limited us to using only those taxa that contained genomes with the full complement of expressed target genes (Supplementary Table S2).

Several lineages of green algae and early land plants form two grades, one before and one after the colonization of land by plants (Karol et al. 2001; Qiu et al. 2006, 2007), and they showed similar frequencies of amino acid composition. Thus, for convenience of comparison, we pooled the various lineages of these two paraphyletic grades into two groups. If a strict clade-to-clade comparison approach were taken, the trend revealed here would be just as distinct, or even more so, since there would be several lineages on both sides of the terrestrialization event for comparison.

Amino Acid Compositional Analyses and Statistics

The amino acid frequency was determined for all residues across the sampled taxa in data-sets for plastid and mitochondrial proteomes (Supplementary Table S1). Previous studies have calculated an ‘expected’ frequency from codon usage data, and correlated this with the observed usage frequency (King and Jukes 1969). As we wanted to find broad patterns across amino acid categories and taxonomic groups, we took a different approach by calculating the average frequency across taxonomic groupings of streptophytes: freshwater algae (Mesostigmatales, Chlorokybales, Zygnematales, Coleochaetales, Charales), and embryophytes, comprising bryophytes (Marchantiophyta, Bryophyta, Anthocerotophyta) and tracheophytes (Lycopodiophyta, Moniliformopses, Spermatophyta) (sampled taxa are shown in Fig. 1, and Supplementary Table S1).

We categorized residues based on major biochemical side-chain properties using the EMBL-EBI ‘Amino Acid Properties and Substitutions’ website (http://www.ebi.ac.uk), and the ‘Amino Acid Properties’ website (http://www.mcb.ucdavis.edu/courses/bis102/AAProp.html), and formed sub-categories of related residues based on those deemed important in studies of adaptive thermostability (Deckert et al. 1998; Kumar and Nussinov 2001). The analyzed categories are as follows: polar side-chains ‘charged’ (DEHKR), charged polar side-chains ‘negative’ (DE), charged polar side-chains ‘positive’ (HKR), charged polar side-chains deemed important for protein thermostabilization (EKR), combined polar side-chains (DEHKNQRST), uncharged polar side-chains ‘polar-neutral’ (NQST), side-chains ‘aromatic’ (FHWY), side-chains ‘aliphatic’ (ILV), residues ‘hydrophobic’ (AGFILMPVW), and residues ‘small’ (ACDGNSTV).
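The residue categories listed above translate directly into a lookup table. The sketch below is illustrative only: the category memberships are taken from the text, while the dictionary keys, the function name, and the example sequence are hypothetical.

```python
# Residue categories as listed in the text; the key names are our own labels.
CATEGORIES = {
    "charged": "DEHKR",
    "negative": "DE",
    "positive": "HKR",
    "EKR": "EKR",
    "polar": "DEHKNQRST",
    "polar_neutral": "NQST",
    "aromatic": "FHWY",
    "aliphatic": "ILV",
    "hydrophobic": "AGFILMPVW",
    "small": "ACDGNSTV",
}

def category_frequencies(sequence):
    """Percentage of residues in `sequence` that fall in each category.

    Non-letter characters (e.g. gaps) are ignored. Categories overlap,
    so the percentages are not expected to sum to 100.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    total = len(residues)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100.0 * sum(aa in members for aa in residues) / total
        for name, members in CATEGORIES.items()
    }

# Hypothetical 10-residue peptide used purely for illustration.
example = category_frequencies("EKRFAAAAAA")
```

Because histidine appears in both the charged and aromatic categories, and several residues are both small and polar, each category is scored independently of the others.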

Data presented in Fig. 2 are the relative difference for the average compositional frequency between taxonomic groups [{AA freq. Group 1 − AA freq. Group 2}/{MAX AA freq.}]. This procedure was performed to account for differences in residue number per amino acid category.
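The relative difference can be written as a short function. This is a sketch under two stated assumptions: that "MAX AA freq." denotes the larger of the two group means, and that the result is multiplied by 100 to give the percentage plotted in Fig. 2. The function name and the input values are hypothetical.

```python
from statistics import mean

def relative_difference(group1, group2):
    """Relative difference (%) between two group mean frequencies.

    Implements (mean1 - mean2) / max(mean1, mean2) * 100.
    Assumption: MAX is the larger of the two group means.
    """
    m1, m2 = mean(group1), mean(group2)
    largest = max(m1, m2)
    if largest == 0:
        return 0.0
    return 100.0 * (m1 - m2) / largest

# Hypothetical per-taxon EKR frequencies (%), not values from the study.
bryophytes = [18.0, 19.0, 20.0]
algae = [16.0, 17.0, 18.0]
diff = relative_difference(bryophytes, algae)
```

With these invented inputs the group means are 19.0 and 17.0, so the relative difference is (19 − 17)/19, or about 10.5%. A positive value indicates higher usage in the first group.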

We also partitioned the data into the following functional gene classes: ATP = ATP synthases (plastid, mitochondrion); NDH = nicotinamide dehydrogenases (plastid, mitochondrion); PET = cytochrome-bf (plastid); PSI-II = components of photosystem I and II (plastid); and COX = cytochrome-c-oxidases (mitochondrion). For these classes we present relative difference for the average compositional frequency results for charged EKR, polar, and aromatic groups (Supplementary Table S3).

Transmembrane domain boundaries were estimated for the plastid and mitochondrial protein alignments (see above for length), across all sampled taxa, using the Sequence Annotation feature in the Protein Knowledge Database (UniProtKB) (http://www.uniprot.org).

Amino acid frequency results (relative difference for the average compositional frequency) were determined separately for alignments containing either transmembrane domains or extra-membrane regions, across all residue categories (Supplementary Table S4).
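Partitioning an aligned sequence into transmembrane and extra-membrane portions can be sketched as below. The representation of domain boundaries as 0-based, half-open column ranges is an assumption made for illustration, as are the function names and the toy sequence; the actual boundaries in the study came from UniProtKB annotations.

```python
def split_by_membrane(sequence, tm_ranges):
    """Split a sequence into (transmembrane, extra-membrane) strings.

    `tm_ranges` is a list of (start, end) column ranges, 0-based and
    half-open, marking transmembrane domains (hypothetical format).
    """
    in_tm = [False] * len(sequence)
    for start, end in tm_ranges:
        for i in range(max(start, 0), min(end, len(sequence))):
            in_tm[i] = True
    tm = "".join(aa for aa, flag in zip(sequence, in_tm) if flag)
    extra = "".join(aa for aa, flag in zip(sequence, in_tm) if not flag)
    return tm, extra

def ekr_percent(sequence):
    """Percentage of E, K, and R residues in a sequence."""
    if not sequence:
        return 0.0
    return 100.0 * sum(aa in "EKR" for aa in sequence) / len(sequence)

# Toy sequence with one hypothetical transmembrane domain at columns 3-9.
tm_part, extra_part = split_by_membrane("MKEILLVVAAKRE", [(3, 10)])
tm_ekr, extra_ekr = ekr_percent(tm_part), ekr_percent(extra_part)
```

Each partition can then be passed to the same frequency calculation, so that charged-residue usage in membrane-embedded and exposed regions is compared on equal terms.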

To examine the effect of nucleotide shifts between taxa, we did not look at the nucleotide compositional changes themselves; instead we reasoned that such changes should be more accurately reflected by relative difference between combined AT rich (FYMINK) and GC rich (GARP) amino acid residues (Supplementary Table S4). Our main reason for using this approach was the presence of RNA editing in organelle genomes, which results in an observed non-correspondence of genomic DNA → protein sequence (i.e., genomic GC and AT do not fully reflect the amino acid composition) (see above). In any case, the nucleotide frequency for taxa used in plastid and mitochondrial data-sets were presented in Jobson and Qiu (2008).
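The GC-rich versus AT-rich residue proxy described above can be illustrated with a few lines of code. The residue sets (GARP and FYMINK) come from the text; the function name and the example sequence are hypothetical, and the sketch only computes per-sequence percentages, leaving the between-group comparison to the relative-difference step.

```python
GC_RICH = set("GARP")    # residues encoded by GC-rich codons (from the text)
AT_RICH = set("FYMINK")  # residues encoded by AT-rich codons (from the text)

def codon_bias_proxy(sequence):
    """Return (GARP %, FYMINK %) for a protein sequence.

    Used as an amino-acid-level proxy for nucleotide composition,
    which avoids reading genomic DNA that may be altered by RNA editing.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        return 0.0, 0.0
    gc = 100.0 * sum(aa in GC_RICH for aa in residues) / len(residues)
    at = 100.0 * sum(aa in AT_RICH for aa in residues) / len(residues)
    return gc, at

# Hypothetical 12-residue sequence for illustration only.
garp, fymink = codon_bias_proxy("GARPFYMINKLL")
```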

We compared taxonomic group means (average compositional frequency per taxonomic grouping), and assessed significance (P ≤ 0.05) using the Mann–Whitney test (Fig. 2; Supplementary Tables S3, S4). We also visualized the compositional data as boxplots (Figs. 3, 4, 5), and compared the significance of frequency difference (P ≤ 0.05) between taxonomic groups using one-way ANOVA in R version 2.9.2 software (http://www.R-project.org) (R Development Core Team 2009). For the Mesostigma/Chlorokybus comparison we did not perform statistical analyses on the relative frequency differences.
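For group sizes as small as those compared here, a Mann–Whitney test can be computed exactly by enumerating every way of assigning the pooled ranks to the two groups. The sketch below is a pure-Python illustration of that idea and is not the software used in the study, which the text does not name for this test; the function names and input values are hypothetical, and packaged implementations may use a normal approximation that gives slightly different P values.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_exact(x, y):
    """Return (U for x, exact two-sided P) by full enumeration.

    Feasible only for small samples, since C(n1 + n2, n1) splits are tested.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))

    def u_from(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    centre = n1 * n2 / 2
    observed = abs(u_from(range(n1)) - centre)
    deviations = [abs(u_from(c) - centre)
                  for c in combinations(range(n1 + n2), n1)]
    p = sum(d >= observed - 1e-12 for d in deviations) / len(deviations)
    return u_from(range(n1)), p

# Hypothetical EKR frequencies (%): four bryophytes versus five algae.
u_stat, p_value = mann_whitney_exact([18.5, 18.9, 19.2, 19.0],
                                     [16.8, 17.1, 17.0, 17.4, 16.9])
```

In this invented example every bryophyte value exceeds every algal value, so U reaches its maximum of 20 and only 2 of the 126 possible rank assignments are as extreme, giving P = 2/126 (about 0.016).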

Fig. 5

Boxplots showing the frequency (%) of the aromatic residue category in extra-membrane regions of a plastid (Pt), and b mitochondria (Mt), for algae (excluding Chlorokybus), bryophytes, and tracheophytes. Boxes show mean, SD, and values with extreme variance (black circles). Frequency values for Chlorokybus (C.a) are schematically shown in filled circles. The value for Mesostigma (M.v) is shown with open circle, and the value for Chara (C.v) is also indicated

Results and Discussion

Most research linking protein-level amino acid compositional shifts with adaptation to new habitats has focused on extreme examples comparing optimal growth temperatures across mesophilic versus thermophilic prokaryotes. In particular, the latter group shows increased use of thermostabilizing polar charged amino acids, particularly the hydrophilic EKR category (Haney et al. 1999; Killian and Heijne 2000; McDonald 2001; Nishio et al. 2003; Singer and Hickey 2003; Brocchieri 2004), at the expense of uncharged polar residues (Haney et al. 1999; Kumar and Nussinov 2001; McDonald 2001). Differing interpretations of these observations have led to two camps: those that attribute this pattern to mutational biases driven by nucleotide compositional shifts (Foster et al. 1997; Gu et al. 1998; Jabbari et al. 2003; Lobry and Chessel 2003; Wang et al. 2004), and those tending toward the involvement of adaptive processes (Deckert et al. 1998; Haney et al. 1999; Killian and Heijne 2000; McDonald 2001; Nishio et al. 2003; Brocchieri 2004; Tekaia and Yeramian 2006; Boussau et al. 2008; Saelensminde et al. 2009).

Here we take a first look at this question across episodes of terrestrialization in the streptophyte lineage (Fig. 1). We examined the amino acid frequency of extant taxa (i.e., tree terminals), under the assumption that particular “fixed” amino acid usage frequency patterns had changed in response to shifts in the ecological/physiological environment; in our case, freshwater aquatic versus terrestrial habitats. It should be noted that our analyses are constrained by the assumption of within-taxon amino acid compositional stability over a period of >400 million years (Sanderson 2004).

Residues for Protecting Proteins Against Cellular Dehydration and Desiccation, and Maintaining Their Thermal Stability

Our comparison of amino acid frequencies in extra-membrane regions for bryophytes and algae revealed a positive difference in frequency of charged residues, including negative, positive, and EKR categories. This pattern is consistent with that of the Chlorokybus–Mesostigma comparison, with the strongest consistent increase observed for positively charged residues (HKR) in both plastid and mitochondria (Fig. 2a, b), while the greatest difference was for EKR (bryophyte–algae; P ≤ 0.05) in the plastid genome (Fig. 2a).

We then examined the within-group distribution for EKR (Fig. 3) and found an unbiased increase in the plastid proteome across all bryophytes versus algae (P ≤ 0.05), and a clear increase for Chlorokybus relative to the five other algal species (Fig. 3a). This trend was also observed in the mitochondrion (Fig. 3b). When we looked at the plastid EKR frequency in major gene classes, we found that all contribute positively to this pattern, with ATP, NDH, PET, and ribosomal/polymerase contributing a significant effect (Supplementary Table S3). For the mitochondrion, the NDH gene class contributes all of the positive EKR effect observed in Fig. 2b, while ATP and COX classes contribute a slightly negative trend (Supplementary Table S3).

When we combined all polar residues (DEHKNQRST) for the bryophyte–algae comparison we found that the mitochondrial frequency increases significantly (P ≤ 0.001) (Figs. 2b, 4a), with all gene classes contributing to this positive pattern (Supplementary Table S3), while for the same comparison in plastids, the effect is dampened, due mainly to the corresponding decrease of polar-neutral residues (NQST) (Fig. 2a). A reverse trend was found for aliphatic and hydrophobic residues (Fig. 2b), with significant losses of the latter (P ≤ 0.001) found in the mitochondrion (Fig. 4b), and a non-significant loss for both categories in plastids (Fig. 2a). For the plastid proteome we instead found that the small residue category decreases significantly (P ≤ 0.05), while remaining constant for the mitochondrion (Fig. 2b).

For the Chlorokybus mitochondrial proteome we find that the compositional frequencies of the combined polar (Fig. 4a), and hydrophobic categories (Fig. 4b) had strongly diverged from Mesostigma, reaching levels comparable to the mean of the bryophyte grade. This pattern of positive increase is observed in all mitochondrial gene classes, but only ATP and PET classes for the plastid proteome (Supplementary Table S3).

The above analyses of usage frequency within categories of amino acid residues revealed patterns within extra-membranous regions, under the hypothesis that natural selection for hydrophilicity would be stronger on exposed regions of proteins. To further test this idea we also performed the same analyses on transmembrane domains and found that in the plastid proteome both the bryophyte–algae and Chlorokybus–Mesostigma comparisons were generally consistent with this hypothesis: extra-membrane regions show a stronger increase in the charged polar residue categories than do the transmembrane domains (Supplementary Table S4A).

We found that the pattern for the mitochondrion was less clear (Supplementary Table S4B). This may be due to the limited transmembrane data (see “Materials and Methods” section), and the low percentage of charged polar residues normally located in transmembrane domains (i.e., mean frequency of EKR in mitochondria: transmembrane = 2.2%; extra-membrane = 14.3%), with a similar contrast found in the plastid proteome (data not shown). However, a more robust comparison can be made using the combined polar category, with average transmembrane mitochondrial and plastid frequencies of 29 and 33%, versus 54 and 57% for extra-membrane regions, respectively. From this comparison it seems generally clear that as streptophytes transitioned to terrestrial habitats, polar residues were gained less frequently in transmembrane domains than in extra-membrane regions (Supplementary Table S4).

We have uncovered a consistent pattern of a positive difference for charged residues as early land plants (represented by the bryophyte grade) colonized land, and as Chlorokybus adapted to aerial environments. Further, the clear algae to bryophyte EKR increase in plastids is made more striking when one considers that the ‘fixed’ frequencies for algae are represented by terminal taxa across four major nodes of streptophyte evolution (Fig. 1), with the sub-aerial Chlorokybus being the only taxon to deviate from the algal trend.

Aromatic Residues for Protecting Proteins and Nucleic Acids Against UV Damage

Increased usage of aromatic amino acids (HFWY) is observed in extra-membrane regions in the plastid proteome across all bryophytes versus all but one algal taxon (Chara) (Fig. 5a), with this comparison approaching significance (P = 0.06) (Fig. 2a). For the bryophyte–algae comparison of the plastid proteome, all gene classes, except PET, contribute to the positive HFWY usage pattern, with that for PSI-II and NDH found to be significantly higher (P ≤ 0.05) (Supplementary Table S3). The lowest levels were found for Mesostigma, with a relatively small increase (0.1%) in Chlorokybus versus Mesostigma (Fig. 5a).

Of the obligate aquatic algae, Chara shows the highest levels for HFWY in plastids, deviating from the algal mean by 0.6%, followed in usage frequency by the first branching bryophyte Marchantia (Figs. 1, 3c). Charales (represented by our sampled Chara) are thought to be the sister lineage to embryophytes (Kenrick and Crane 1997; Bateman et al. 1998; Qiu and Palmer 1999; Karol et al. 2001; Lewis and McCourt 2004; Qiu et al. 2006, 2007; Lemieux et al. 2007; Qiu 2008) (Fig. 1), and perhaps under this phylogenetic scenario (refer to Turmel et al. 2006 for an alternate phylogenetic placement of Chara), the observed build-up of aromatic residues was a pre-adaptation for life on land (Fig. 1). To examine this hypothesis, we trialed the inclusion of Chara within the bryophyte group and found HFWY is significantly elevated (P ≤ 0.001) when compared to the other aquatic algae (Fig. 5a).

The anchored thalli of Characeae, which often extend to the surface of the water column (de Bakker et al. 2005; Becker and Marin 2009), may be occasionally exposed to stronger UV when water levels decrease in their littoral habitat (de Bakker et al. 2005). In comparison to the relatively large and branched filamentous whorls found in the Characeae, the single-celled Mesostigma, branched filaments of Coleochaetales, and conjugating cells and unbranched filaments of the Zygnematales are less complex in morphology and possibly less exposed to UV (de Bakker et al. 2005).

In the mitochondrion, we find very little change between algae and bryophytes (Figs. 2b, 5b), which suggests that any adaptive increase in HFWY is restricted to the plastid proteome. In contrast to the pattern observed in plastids, for the mitochondrion Mesostigma strongly exceeded the frequency of Chlorokybus, with a 0.6% decrease in HFWY for Chlorokybus relative to the algal mean (Fig. 5a).

The above findings, restricted mainly to the light-absorbing plastid proteome, and interestingly, to the protein classes involved in photon capture (PSI-II), and cyclic electron transport (NDH), are suggestive of adaptation for dealing with increased UV damage to thylakoid membranes encountered during the colonization of land by plants (Tyrrell 1994; Wolf et al. 2010).

Evolution of Vascularized Tissues

Regarding niche occupation, it is conceivable that early land plants would have been highly restricted due to their shallowly placed rhizoids, with extant taxa generally limited to moist substrates. In contrast, the tracheophyte body plan enabled the occupation of drier habitats through evolution of highly specialized regulation of cellular water relations: efficient water uptake through branched root systems, and control of loss via specialized leaf surface stomata (Kramer and Boyer 1995; Graham et al. 2000; Chaves et al. 2003; Chen et al. 2005; Song et al. 2005; Carmo-Silva et al. 2009).

Our bryophyte–algae comparison uncovered a pattern of increased usage for both polar and aromatic residues, and we now ask whether this pattern of usage frequency continued throughout the evolution of the tracheophyte lineage (Fig. 1). During terrestrialization we observe a significant build-up of polar charged amino acids in the plastid proteome (Figs. 2a, 3a). However, after the evolution of vascularized tissues we only detect a weak increase, or decrease in the case of EKR (Figs. 2a, 3a), with only Adiantum and all four grass taxa (Supplementary Table S1) reaching levels comparable to the bryophyte mean (~18.8%) (Fig. 3a). Wang et al. (2004) suggest that this increase in grasses is more related to mutational bias, and we also observe increased levels for the GC rich GARP residues after vascularization for both proteomes (Supplementary Table S4), which suggests that nucleotide compositional shifts may have, in part, contributed to the pattern. Banerjee et al. (2006) compared amino acid frequency across nuclear genomes of A. thaliana and O. sativa and found evidence for natural selection as a possible driver of increased hydrophobicity; however, in contrast to our organellar results, their study found that hydrophobic residues were favored by the grass taxon O. sativa.

The mitochondrial proteome does not conform to the above plastid pattern for charged side-chain categories, with generally stronger positive change occurring during/after vascularization, with corresponding significant losses of both aromatic and hydrophobic residues (Fig. 2b). For the hydrophobic category, the pattern is very weak, while for the non-charged and small categories there was a significant positive frequency difference (P < 0.05 and >0.001, respectively) (Fig. 2a).

Effects of Mutational Biases

Although our observed amino acid compositional shifts are suggestive of adaptive evolution for life on land, we are aware that random forces may have been involved in shaping these patterns (Foster et al. 1997; Gu et al. 1998; Jabbari et al. 2003; Lobry and Chessel 2003; Wang et al. 2004). Natural selection acts on individual point mutations, and therefore, it is difficult to envision the concerted fixation of adaptive substitutions across whole genomes, even over long periods of time (Jobson et al. 2010). We also note that the habitat transitions examined in this study likely involved major changes in population structure, with this factor previously shown to drive higher levels of protein evolution in large mammals with smaller population sizes, possibly due to a corresponding weakness of purifying selection (Popadin et al. 2007). In this study we were not able to differentiate potential mutational effects from those driven by purifying selection. However, we tested the effect of GC bias on our observed EKR and HFWY patterns by comparing the relative difference for GC rich (GARP) and AT rich (FYMINK) codons and found no significant between-group bias for the bryophytes–algae comparison.

From these results we find very little observed change in either plastid (P = 0.24) or mitochondrial proteomes (P = 0.35) (Supplementary Table S4). However, we did find significant biases of GARP for the bryophyte–tracheophyte comparisons, with a significant difference for tracheophytes relative to bryophytes in both plastid (P ≤ 0.001) and mitochondrial proteomes (P ≤ 0.001) (Supplementary Table S4).

Perhaps the most striking result is the similarity in magnitude of GARP-shift between transmembrane domains and extra-membrane regions (Supplementary Table S4), a pattern that does not correspond with the negative usage frequency of EKR in plastid transmembrane domains for the bryophytes–algae comparison (Supplementary Table S4).

It is also worth mentioning that the genetic code presumably evolved to alternate in response to changing environments by assigning the thermostable nucleotides GC to codons that encode thermostabilizing amino acid residues (Xia and Li 1998; Ikehara 2002). In support of this idea, shifts in nucleotide composition have been linked to environmental change (Cruveiller et al. 1999; Naya et al. 2002; Knight et al. 2004; Musto et al. 2004; Romero et al. 2009), and it is becoming evident that compositional shifts may be shaped by environmental conditions (Foerstner et al. 2005; Romero et al. 2009).

Conclusions

In this study we uncovered consistent patterns of amino acid compositional fluctuation that correlate with episodes of terrestrialization: a positive change in usage frequency for residues with charged side-chains, a general decrease in the usage frequency of aliphatic, hydrophobic, and small residues, and a positive change in the usage of aromatic residues in the plastid proteome. These results agree with a hypothesis that amino acid residue changes may protect proteins and nucleic acids from UV, heat, and water stress. Since the same patterns were observed in two independent terrestrialization events, charophycean algae → bryophytes, and Mesostigma → Chlorokybus, these results suggest that the amino acid compositional changes reflect adaptive evolution at the biochemical level when these photosynthetic eukaryotes moved from water to land. Once more extensive genomic data are available for comparison, taxa representing the Klebsormidiales, and Cylindrocystis and Mesotaenium of the Zygnematales should also be assessed.