Introduction

Photosynthesis is a key chloroplast process by which photoautotrophic organisms transform light energy into chemical energy. The starting reaction is catalyzed by the enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO, E.C. 4.1.1.39), which incorporates atmospheric CO2 into the Calvin cycle, thus integrating inorganic carbon into the biosphere (Erb and Zarzycki 2018). In green algae, early diverging land plants (liverworts, mosses and hornworts), and vascular plants (fern-like plants, ferns and seed plants), the RuBisCO Form I holoenzyme is composed of eight large subunits (L8) encoded by the chloroplast rbcL gene and eight small subunits (S8) encoded by the nuclear rbcS gene family, assembled into a hexadecamer (L8S8) (Spreitzer and Salvucci 2002).

It was suggested early on that rbcL evolution is strongly constrained by function (Albert et al. 1994; Kellogg and Juliano 1997). Nucleotide sequences from the rbcL gene have been among the most preferred plastid DNA loci for reconstructing land plant phylogenies, at both deep and shallow evolutionary nodes (Chase et al. 1993; Manhart 1994; Gastony and Rollo 1995; Hasebe et al. 1995; Cameron et al. 1999; Tsubota et al. 2004; Masuzaki et al. 2010; Liu et al. 2012). Nevertheless, plant phylogeneticists using rbcL nucleotides have usually ignored functional constraints and have treated DNA sequences as if they were strings of anonymous nucleotides devoid of function, as aptly pointed out by Kellogg and Juliano (1997). Neglect of the functional constraints has gone as far as considering the rbcL gene a universal DNA barcode (and mini-barcode) for plants (Group CPW 2009; Erickson et al. 2017).

At the amino acid level, the identification of sites subjected to adaptive evolution is essential to understand the variability of RuBisCO kinetics. In addition, the assessment of coevolutionary replacements may reveal clues about the cytonuclear molecular evolution processes of the LSU and SSU rbc subunits that may be mediated by intergenomic gene conversion and altered transcription of duplicated, homoeologous nuclear genes (Gong et al. 2014).

In comparative analysis of protein-coding DNA sequences, the non-synonymous–synonymous rate ratio (dN/dS, denoted ω) has usually been used as a measure of selective pressure. In RuBisCO, most rbcL sites are likely to be functionally constrained and are under purifying selection (ω < 1), while only a relatively small number of amino acid residues might tolerate modification and are under neutral evolution (ω = 1) or positive Darwinian selection (ω > 1). This is not surprising, since RuBisCO large subunits possess the catalytic site and other amino acid residues involved in the functionality of the protein. This functionality requires proper structural folding and interactions with the small subunits of the RuBisCO holoenzyme, as well as with RuBisCO-activase, a catalytic chaperone engaged in RuBisCO activation (Andersson 2007). Several studies have revealed adaptive evolution of RuBisCO in all lineages of green plants (Kapralov and Filatov 2006). This approach has rarely been applied across land plants, and most of the results have been obtained from the analysis of seed plants (Kapralov and Filatov 2006; Sen et al. 2011; Kapralov et al. 2012; Galmés et al. 2014; Hermida-Carrera et al. 2016, 2017, 2020).

In addition, complex evolutionary processes modeling RuBisCO fitness have been inferred by identifying conserved residue sites whose mutations may be deleterious, and sites where amino acid replacements and coevolutionary substitutions are likely to improve the adaptive enzyme performance (Andersson 2007; Sen et al. 2011).

Extant representatives of early diverging plants, i.e., the three basal-most land plant lineages, liverworts (Marchantiophyta), mosses (Bryophyta), and hornworts (Anthocerotophyta), are represented by thousands of rbcL sequences in GenBank databases. However, few reports have related the genetic and kinetic variability of the LSU. In fact, only Miwa et al. (2009) and Kapralov and Filatov (2006) have examined rbcL sequence variation to assess selective pressure at the protein level.

This scanty knowledge is unfortunate since ancient land plant lineages may provide suitable case studies to gain insights into the molecular evolution of RuBisCO large subunits. First, their fossil record traces back to the early Paleozoic Era (Tomescu et al. 2018), suggesting a long history of genome evolution and gene expression linked to lineage diversification. Moreover, the colonization of terrestrial habitats where atmospheric CO2 concentrations differed from those present in the primeval aquatic environments might have prompted a continuous fine-tuning of RuBisCO under a selective pressure modifying the species-specific optima for photosynthesis (Iida et al. 2009). Finally, bryophytes have neither an efficient cuticle layer nor stomata in the prominent phase of their life cycle (the gametophyte), and their water content is completely dependent on the humidity of the environment (atmosphere and substrate) (Glime 2007). This, in addition to their simple photosynthetic structure (in particular the low ratio of the internal photosynthetic tissues to external surface area), affects the ecophysiology of photosynthesis (Green and Lange 1994).

In the present study, we focus on the rbcL gene evolution of a moss family, Orthotrichaceae (Bryophyta), which is the single family encompassing the order Orthotrichales. It is a highly speciose and cosmopolitan bryophyte group of slow-growing plants showing a relatively homogeneous gametophyte form. The family includes mostly epiphytic species showing variation in habitats.

Our main objective was to assess the evolutionary changes in the amino acid residues of the large subunit of the RuBisCO enzyme in a monophyletic group of early diverging land plants showing morphological and eco-physiological adaptations to the uneven and intermittent distribution of water in the terrestrial environment (Proctor 1979; Alpert and Oechel 1987). All these features might have modulated the evolution of RuBisCO under contrasting microdistributions and environmental niches.

Specifically, we were interested in addressing the following questions: (i) which amino acid sites appear to be positively selected and under adaptive evolution? (ii) are hot spots of amino acid variation located at specific protein locations or interfaces? (iii) are amino acid replacements lineage specific? and (iv) does variation in the amino acid residues follow any significant coevolutionary pattern?

Methods

Species Selection and Retrieval of rbcL Sequences

This study is focused on the family Orthotrichaceae, excluding Erpodiaceae and Rhachitheciaceae (Goffinet et al. 1998), as delimited by Norris et al. (2004), which forms a distinct monophyletic group using plastid DNA sequences (Tsubota et al. 2004). Generic delimitation is somewhat controversial (Goffinet et al. 1998, 2004) and contradictory hypotheses have recently resulted in the splitting (Matcham and O’Shea 2005; Lara et al. 2016; Mizia et al. 2019) or lumping of genera (Calabrese and Muñoz 2006). Nucleotide and protein accessions available from GenBank databases were downloaded (accessed 5th June 2019). Takakia and Sphagnum representatives (the basal-most clade of Bryophyta), and Tetraphis pellucida, were used for comparative (outgroup) purposes. Partial rbcL sequences shorter than 435 amino acid residues were excluded from the analysis. GenBank accession numbers for the nucleotide and amino acid sequences from 45 ingroup accessions and three outgroups analyzed in this study are shown in Online resource 1.

Phylogenetic Analysis

Sequences were aligned with MAFFT v7.427 software (Katoh et al. 2017), resulting in alignments of 1425 nucleotide sites and 475 amino acid sites (Online resource 2). The sequence from Sphagnum rigescens was used as a reference for delimiting the 5′ and 3′ ends of the rbcL gene. The initial alignment was manually inspected, and missing positions and indeterminations were noted as ‘NNN’ codons. Then, the alignment was trimmed using TrimAl v1.2 (Capella-Gutiérrez et al. 2009) with the -gappyout option. The optimal nucleotide substitution model was determined using jModelTest v.2.1.10 (Darriba et al. 2012), by comparing available models using the Akaike Information Criterion, Bayesian Information Criterion and Decision Theory Criterion. We inferred the phylogenetic relationships using both Maximum Likelihood and Bayesian approaches. Maximum Likelihood trees were generated in PhyML v3.1 (Guindon et al. 2010). The best-fit model of nucleotide evolution was the GTR model with a proportion of invariant sites of p-inv = 0.504 and substitution rates distributed according to a discrete gamma with four categories (Γ4) and an inferred shape parameter of α = 0.871. In addition, a Bayesian phylogenetic analysis of the same alignment was performed using BEAST v1.10.4 (Suchard et al. 2018) with a Markov Chain Monte Carlo (MCMC) process using the same models and parameters established for the Maximum Likelihood analysis. The chain was run for 1 × 10^6 generations and trees were sampled every 1000 generations. The output trees were subjected to a phylogenetic signal analysis using the PhyloSignal package v.1.2.1 (Keck et al. 2016), running in R v.3.6.0, to estimate p-values for the methods Cmean and λ. The trend model was defined by the fastBM function of PhyTools v.0.6–99 package (Revell 2012) with μ = 0 that implies a Brownian motion with a trend. The random model was inferred generating a normal distribution with n = 48 and α = 10.
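The model-selection step described above compares candidate substitution models by information criteria. The following minimal Python sketch illustrates only the AIC and BIC arithmetic; it is not the jModelTest implementation. The log-likelihoods and free-parameter counts are hypothetical values chosen for illustration (branch-length parameters are ignored), and the alignment length of 1425 sites is used as the sample size for BIC, which is an assumption.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

# Hypothetical (lnL, number of free substitution parameters) per model.
candidates = {
    "JC": (-6100.0, 0),
    "HKY+G": (-5950.0, 5),
    "GTR+I+G": (-5900.0, 10),
}
n_sites = 1425  # nucleotide alignment length, assumed here as sample size

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}

best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print(best_by_aic, best_by_bic)
```

Both criteria penalize extra parameters, BIC more strongly for long alignments, so a parameter-rich model is preferred only when its gain in log-likelihood outweighs the penalty.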

Adaptability Analysis of Amino Acid Substitution

Tests for positive, neutral or purifying selection at the molecular level can be carried out by comparing rates of non-synonymous (dN) and synonymous (dS) nucleotide substitutions (Yang et al. 2005) along a phylogenetic tree. The measure of selection pressure (ω = dN/dS) is expected to be equal to 1 under neutral evolution. Departures from this value are indicative of either purifying (0 < ω < 1) or positive selection (ω > 1). Estimates of ω were carried out with the CodeML program in the PAML v.4.9 software (Yang 2007) and the SLR v.1.3 program (Massingham and Goldman 2005). SLR incorporates a Nielsen–Yang model-based distribution of ω at each site in the alignment and allows every site to be under a different level of evolutionary constraint. The SLR software tests whether a particular site is evolving in a non-neutral fashion using likelihood ratio tests (LRT) of ωi = 1 against ωi ≠ 1; the test statistic is a measure of the strength of the evidence for selection.

Molecular adaptation tests (Nielsen and Yang 1998; Yang et al. 2000; Wong et al. 2004) were used to assess selection on a codon-by-codon basis, allowing for variation in dN/dS across the alignment, and were performed on the ML phylogenetic tree obtained previously. LRTs were used to compare a null model of codon substitution that does not allow any codon to have dN/dS > 1 against a model that does. Seven different models proposed by Yang et al. (2000, 2005) were compared: M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (beta), M8 (beta and ω) and M8a (beta and ω = 1).

Briefly, model 0 (M0) allows for a single ω value for all sites and branches of the ML phylogenetic tree. Model 1a (M1a) considers a proportion of conserved sites (p0) with 0 < ω0 < 1 and a proportion of neutral sites (p1) with ω1 = 1. Model 2a (M2a), the modified M2 model, adds a proportion of sites (p2) with ω2 > 1 estimated from the data. In models 7 (M7) and 8 (M8 and M8a), ω was estimated from a beta distribution B(p, q) for a proportion (p0) of sites. However, M7 does not allow for the presence of positively selected sites (ω > 1), in contrast with the M8 model. The difference between the M8 and M8a models is that under the more general M8 model ωs > 1, whereas in the M8a model ωs = 1. When the LRT indicated that models accounting for positive selection were preferred, the Bayes Empirical Bayes (BEB) inference was used to calculate posterior probabilities for the sites with ω > 1 (Yang et al. 2005). Positively selected sites were considered as supported when posterior probability values were greater than 99%.
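The nested model comparisons described above can be sketched as follows. This is a minimal illustration of the likelihood ratio test arithmetic, not CodeML output: the log-likelihood values are hypothetical, and the degrees of freedom (4 for M0 vs. M3 with three site classes, 2 for M1a vs. M2a and M7 vs. M8, 1 for M8a vs. M8) are the values conventionally used for these comparisons and are stated here as assumptions. The chi-square tail probability is computed with closed forms available for one or an even number of degrees of freedom, so that only the standard library is needed.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable (df = 1 or even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df = 1 or even df")

# (null model, alternative model, lnL null, lnL alt, df); lnL values are hypothetical.
comparisons = [
    ("M0", "M3", -5950.0, -5890.0, 4),
    ("M1a", "M2a", -5900.0, -5880.0, 2),
    ("M7", "M8", -5898.0, -5879.0, 2),
    ("M8a", "M8", -5885.0, -5879.0, 1),
]

for null, alt, lnl0, lnl1, df in comparisons:
    stat = lrt_statistic(lnl0, lnl1)
    print(f"{null} vs {alt}: 2dlnL = {stat:.2f}, df = {df}, p = {chi2_sf(stat, df):.3g}")
```

For the M8a vs. M8 comparison the null value ω = 1 lies on the boundary of the parameter space, so a plain chi-square with one degree of freedom is generally regarded as a conservative reference distribution.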

Evolutionary Trends in Amino Acid Substitutions

Reconstruction of ancestral amino acid sites showing ω > 1 with posterior probability higher than 99% in the M8 model was carried out using the maximum likelihood (ML) approach in Mesquite version 3.04 (Maddison and Maddison 2019), which assigns to each internal node the character state that maximizes the probability of obtaining the observed character states in the terminal taxa under the specified model of evolution. The ML reconstructions were conducted using the Mk1 model of evolution (Schluter et al. 1997; Pagel 1999). The Mk1 (Markov k-state 1 parameter model) is a k-state generalization of the Jukes–Cantor model, corresponding to Lewis (2001) Mk model, which assigns equal probability to changes between any two character states.
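As a brief illustration of the Mk1 model, the sketch below computes the probability of observing the same or a different state after a branch of length t, for a character with k states and a single rate parameter. The closed forms follow from the Jukes–Cantor-type rate matrix in which every off-diagonal entry equals the same rate. The parameterization (rate per specific state-to-state change) and the numerical values are assumptions for illustration and do not necessarily match Mesquite's internal scaling.

```python
import math

def mk1_probability(k, rate, t, same_state):
    """Transition probability under a k-state, one-parameter Markov model.

    rate: instantaneous rate of change from one state to each other state.
    t: branch length (time or expected-change units matching `rate`).
    """
    decay = math.exp(-k * rate * t)
    if same_state:
        return 1.0 / k + (k - 1.0) / k * decay
    return (1.0 - decay) / k

# Example: a site with three observed amino acid states (hypothetical rate and branch length).
k, rate, t = 3, 0.4, 0.5
p_stay = mk1_probability(k, rate, t, same_state=True)
p_change = mk1_probability(k, rate, t, same_state=False)
print(p_stay, p_change, p_stay + (k - 1) * p_change)
```

Because all changes are equally probable, the ML ancestral state at a node depends only on the tree, the branch lengths and the single estimated rate.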

Intra-protein Coevolution Analyses

To assess the evolutionary dependency among amino acid sites of the RuBisCO large subunit we used the CAPS software v.2 (Fares and McNally 2006) at the program server (http://caps.tcd.ie/caps/home.html). This program provides an automatization of a designed pipeline that allows comparison of a correlated variance of the evolutionary rates among pairwise sites and their estimated divergence times. The amount of amino acid replacements is used as a relative measure of time. Groups of amino acid sites were considered to be significantly coevolving when correlation values were higher than 0.5 and bootstrap values were larger than 0.95 (Yao et al. 2019).

Structural Analysis of RuBisCO

A model for the LSU structure of Macromitrium japonicum (GenBank accession BAD98508) was generated at the Swiss-Model web site (Waterhouse et al. 2018), a fully automated protein structure homology-modeling server. The protein structure was annotated with PyMol v.2.3.2 (https://pymol.org/2/), a molecular visualization tool. Structural motifs of the large subunit of RuBisCO were obtained from Kellogg and Juliano (1997) and Spreitzer and Salvucci (2002).

Results

rbcL Variability

Complete sequences of the rbcL gene (475 amino acid residues) were available for 16 ingroup accessions (14 species). The shortest sequence (437 sites) was present in one of the Pulvigera lyellii accessions (Online resource 1). There were 39 variable sites (Table 1), of which 10 were restricted to a single accession and 29 were shared by at least two sequences. Three or four amino acid states could occur at codons 32 (lysine, serine, threonine, leucine), 33 (aspartic acid, glutamic acid, glutamine), 221 (valine, cysteine, isoleucine), 230 (alanine, glycine, serine), 255 (glutamine, alanine, glutamic acid), 256 (phenylalanine, cysteine, alanine), 424 (valine, alanine, leucine), 449 (alanine, threonine, serine) and 475 (valine, leucine, isoleucine). Overall, we identified 43 unique sequences (amino acid haplotypes) in the ingroup dataset (45 accessions, 43 species), indicating that all species could be identified by a unique amino acid profile of the rbcL gene.
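The counts of variable sites and amino acid haplotypes reported above can be obtained from an aligned set of protein sequences along the following lines. This is a minimal sketch with a hypothetical toy alignment (the sequence names and residues are invented); it ignores missing or gap symbols when scoring a column as variable and compares haplotypes as exact strings, so partial sequences count as distinct haplotypes, which is a simplifying assumption.

```python
def variable_sites(alignment, missing="X-?"):
    """Return {1-based position: set of residues} for columns with more than one state."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    result = {}
    for pos in range(length):
        states = {seq[pos] for seq in sequences if seq[pos] not in missing}
        if len(states) > 1:
            result[pos + 1] = states
    return result

def unique_haplotypes(alignment):
    """Return the set of distinct sequences (exact string comparison)."""
    return set(alignment.values())

# Hypothetical toy alignment; real input would be the 475-residue rbcL alignment.
toy = {
    "sp1": "MSPQT",
    "sp2": "MSPKT",
    "sp3": "MTPQT",
    "sp4": "MSPQT",
    "sp5": "MSPK-",
}

sites = variable_sites(toy)
print(sorted(sites), len(unique_haplotypes(toy)))
```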

Table 1 Amino acid substitutions in Orthotrichaceae rbcL

Seven species with two accessions were present in GenBank. The two sequences of both Macromitrium gymnostomum and M. ferriei showed the same haplotype. However, intraspecific variation was detected in Macromitrium incurvifolium (five amino acid sites), Pulvigera lyellii (four sites), Ulota crispa (three sites), Nyholmiella obtusifolia (two sites), and Macrocoma tenuis (one site).

Phylogenetic Analysis

The test statistics and associated p-values of the phylogenetic signal analysis on the ML and Bayesian trees are indicated in Online resource 3. The trend model yielded more significant values than the random model, indicating the presence of phylogenetic signal in both the ML and Bayesian trees. Since the phylogenetic trees obtained by both approaches showed similar topologies, we selected the ML tree for the subsequent analyses, because its p-values in the random and trend model tests showed a better fit to the trend model. The selected ML tree, used for the adaptability analysis of amino acid substitution and to infer the ancestral amino acid sites, is shown in Fig. 1.

Fig. 1

Maximum Likelihood tree of Orthotrichaceae inferred from rbcL sequences under the GTR model with a p-inv = 0.504 value for the proportion of invariant sites and α = 0.871 as the gamma shape value for four rate categories. Posterior probability values are indicated by circles in different gray tones according to the inset label

Tests of Positive Selection in the rbcL Gene

Table 2 lists the estimated parameter values for seven codon substitution models of molecular evolution of the RuBisCO LSU. The significance of nested site-specific models was tested by LRT and the results are shown in Table 3. Model comparisons indicated that the rbcL gene has evolved in a non-neutral fashion since the null model assuming neutrality (M0) was rejected in favor of the alternative M1a model. Heterogeneity in ω ratios was shown in the comparison of the M3 and M0 codon models, where the former model was favored. Finally, the M2a and M8 models of positive selection were preferred over the null M1a and M7 models, suggesting the presence of positive selection. The amino acid residues 32, 33, 91, 230, 247, 251, 255, 424, 449 and 475 were identified as evolving under positive selection under the M8 model, and a subset of them (33, 91, 230, 475) under the M2a model, by the BEB analysis in PAML with a Bayesian posterior probability larger than 0.99. For each analyzed genus, the ten amino acid residues selected by the M8 model are indicated in Online resource 4. The average ω ratio calculated by SLR was 0.053, with a log-likelihood of −5878.420, indicating the conservative nature of the rbcL gene as a whole. Single-rate homogeneity was rejected at two positions, 251 and 475, which are a subset of those previously inferred by the M8 model.

Table 2 Parameter estimates and log-likelihood values for Orthotrichaceae under seven codon substitution models
Table 3 Test for selection models by LRT

Structural Distribution and Features of Amino Acid Replacement Sites

The amino acid substitutions identified in Orthotrichaceae, the location of the residues in the structural regions of the RuBisCO large subunit, their evolutionary polarity (using Sphagnum species as reference), and the type of change in physical properties are presented in Table 1. Seven substitutions were located at the N- and C-termini, 16 in α-helices, six in β-strands, two between α-helices, six between β-strands, one between an α-helix and a β-strand, and one between a β-strand and a loop. The placement of the variable sites in the three-dimensional structure of the RuBisCO large subunit is shown in Fig. 2. Inferring the ancestral state of several amino acid substitutions, and hence assessing the direction of change in their physical properties, was not always possible using outgroup comparisons. In some cases, sites were variable in the outgroup (23, 28, 255, 279, 306, 328, 340, 443, 470). In others, sites were monomorphic in the outgroup (11, 32, 33, 221, 230, 256, 449, 475) but showed three or four amino acid states in Orthotrichaceae. This precluded a straightforward interpretation of multistate characters, which would require either explicit theories of character evolution or the assumption that any transformation series between different states is possible and can be established with certainty. Inspection of the 21 sites for which the direction of change could be indicated showed that 11 substitutions changed neither the hydrophobicity nor the polarity of the amino acids, three changed only the hydrophobicity of the residues, two affected only their polarity, and five changed both their hydrophobicity and polarity (Table 1).

Fig. 2

Estimated ω values under the M8 model for Orthotrichaceae rbcL residues (bottom) using the ETE Toolkit v.3.1.1 pipeline (Huerta-Cepas et al. 2016). The position of α-helices and β-sheets is indicated according to the secondary structure given in Kellogg and Juliano (1997). Positively selected sites are indicated (top) in the tertiary structure model for the LSU of Macromitrium japonicum (GenBank accession BAD98508) generated at the Swiss-Model web site (Waterhouse et al. 2018). Alignment for the ten indicated positively selected sites is shown in Online resource 5

Ancestral Amino Acid Sequences

The ML reconstruction of ten ancestral amino acid sites (32, 33, 91, 230, 247, 251, 255, 424, 449 and 475) showing positive selection signatures (ω > 1) by BEB inferred by the M8 codon model is shown in Online resources 6–15. Overall, most of the site changes in the ancestral sequences were located at the more derived nodes of the tree. Independent changes in the different states of the ten amino acids are present throughout the phylogenetic tree.

Inter-dependent Evolution of Amino Acid Sites in the LSU Subunit

Coevolutionary interactions between pairwise amino acid sites in the LSU were assessed using the CAPS software. The most significant coevolving amino acid pairs (i.e., those showing correlation and bootstrap values higher than 0.5 and 0.95, respectively) are shown in Table 4. The relevant associations comprised nine coevolutionary pairs involving 13 sites (11, 30, 50, 89, 262, 270, 348, 387, 404, 443, 449, 470 and 475). The most interacting site (50) involved four amino acid residues and was located in the N-terminal domain (α-helix B). In turn, sites 89 (β-strand C) and 262 (β-strand 4) (C-terminal domain) were the next most frequent significantly coevolving residues, each involving two sites. Interestingly, only two sites involved in the coevolutionary interactions (449, 475) were identified as subject to adaptive selection (see above).
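The overlap between the site lists reported in the Results can be checked with simple set operations. The sketch below uses only site numbers given in the text (M8 and M2a BEB sites, SLR sites, and CAPS coevolving sites); the variable names are illustrative.

```python
# Site lists as reported in the Results.
m8_sites = {32, 33, 91, 230, 247, 251, 255, 424, 449, 475}
m2a_sites = {33, 91, 230, 475}
slr_sites = {251, 475}
caps_sites = {11, 30, 50, 89, 262, 270, 348, 387, 404, 443, 449, 470, 475}

# M2a and SLR sites are reported as subsets of the M8 sites.
print(m2a_sites <= m8_sites, slr_sites <= m8_sites)

# Sites that are both coevolving (CAPS) and positively selected (M8).
print(sorted(caps_sites & m8_sites))
```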

Table 4 Coevolving pairs of amino acids with a correlation higher than 0.50 and a bootstrap value higher than 95% obtained with CAPS v2 (Fares and McNally 2006)

Discussion

A relatively large number of rbcL gene sequences (about 9000) are currently available in public databases for the three extant groups of early land plants (GenBank accessed 7th August 2019). However, most of these nucleotide data have been obtained for phylogenetic reconstruction, taxonomic purposes or molecular species identification (Tsubota et al. 2004). In fact, only two studies specifically assessing adaptive evolution in rbcL have used protein sequences to analyze the molecular evolution of RuBisCO in early diverging plants (Kapralov and Filatov 2007; Miwa et al. 2009).

rbcL Residue Variation in Orthotrichaceae

We have identified 39 amino acid replacements representing 8.4% of variable sites in the LSU. Certainly, this number is conservative and probably does not account for the real extent of variable amino acid sites present in Orthotrichaceae. On the one hand, few accessions were available for the largest genera of the family, including Macromitrium (about 370 species; Guo et al. 2012), Orthotrichum (103 species; Lara et al. 2016), Zygodon (93 species; Calabrese and Muñoz 2006), Ulota (69 species; Garilleti et al. 2015) and Lewinskya (66 species; Lara et al. 2016), and sampling of additional unrelated species is required to substantiate the current data. On the other hand, most rbcL sequences were incomplete and usually lack about 35 residues of the complete gene as a consequence of the close location of the amplifying and sequencing primers at both the 5′ and 3′ ends, where variable sites have been identified in plants (Kellogg and Juliano 1997) and in our Orthotrichaceae dataset.

Despite these shortcomings, some conclusions can be drawn from a comparison of the sequences. Inspecting the GenBank sequence data obtained by Miwa et al. (2009) and Shaw et al. (2016) in two early diverging genera from liverworts (Conocephalum) and mosses (Sphagnum), 5.29% and 4.84% of variable amino acid sites, respectively, were identified. The roughly double value present in Orthotrichaceae was perhaps not unexpected, as it was obtained from a more diverse phylogenetic sample (involving deeper nodes; 15 genera). However, when we compare the range of variable sites in these three groups of early land plants with other vascular plant lineages for which data are available, it appears that bryophytes may show more RuBisCO variability per phylogenetic unit. For instance, lower proportions of amino acid replacements have been reported for Brassicaceae (3.93% across 45 genera and 33 species; Liu et al. 2012) and for Fagaceae plus Nothofagaceae (4.00% across 6 genera and 190 species; Hermida-Carrera et al. 2017). There is no direct association between rates of evolution and the number of variable sites in genes. In fact, it has been estimated that bryophytes show slow molecular evolution compared with vascular plants (ferns and seed plants) for ribosomal and protein-coding genes from the three genomes (Stenøien 2008). However, a larger proportion of variable sites implies an evolutionary plasticity for the molecular adaptation of the RuBisCO protein to changing selective forces. The dataset used for comparison is so reduced that it can only stimulate future research on this specific topic.

Amino Acid Site Variations are Located at Different Structural and Functional Domains

Variable amino acid sites detected in Orthotrichaceae were unevenly distributed across the LSU, which is in accordance with its complex structure and functional domains. Eight amino acid substitutions are located at the hydrophobic cores of the rbcL monomer, which are critical for the stabilization of the folded state. The hydrophobic core of the N-terminal domain and that in the interface region between the N-terminal and C-terminal domains did not show any amino acid residue variation. In addition, only one site (292) was variable at the first hydrophobic core in the C-terminal domain, at the α/β barrel that is formed by those residues from β-strand 5 that point to the interior of the barrel. In contrast, most of the variation was found at the second hydrophobic core of the C-terminal domain (200, 221, 326, 328, 374, 400, and 424). Interestingly, amino acid variation at sites 200, 328, and 400 involved changes of hydrophobicity.

Another set of amino acid substitutions was at key positions involved in the correct assembly of the RuBisCO holoenzyme, mainly at the dimer interface (L2 dimer). Thus, sites 247, 279, 301, 306, and 309 were at the C-terminal domain of two L2 subunits, site 470 was at the C-terminal domain of one L subunit interacting with the N-terminal domain of a second L unit, whereas site 288 was involved at the dimer-dimer interface. This last residue, together with sites 288 and 429, interacts between a large subunit (L subunit B) and a small subunit (S1) of the L8S8 holoenzyme. However, none of these sites is conserved in the large subunit of RuBisCO across seed plants. In fact, three alternative residues have been reported to be present at sites 279, 288, 301, and 309, four at site 306, five at site 247, six at site 429, whereas nine and ten amino acids have been indicated at sites 470 and 230, respectively (Kellogg and Juliano 1997).

The residue present at the active site (201) must be carbamylated for RuBisCO to be functional (Kannappan and Gready 2008). As expected, this residue was strictly conserved in Orthotrichaceae. This lack of variation was also observed in key residues at sites 175, 204, 294, and 334, which are involved in the multistep catalytic reactions of RuBisCO (Andersson et al. 1989; Andersson 2007; Kannappan and Gready 2008). Likewise, residues 331–338, forming Loop 6, which covers the opening of the α/β barrel, serving to close the active site and influencing the CO2/O2 specificity (Andersson 2007), were conserved.

Two variable sites in Orthotrichaceae, 91 and 94, were present in the βC–βD loop (residues between 89 and 94), a region that has been previously identified as critical for species specificity of the RuBisCO-activase interaction (Ott et al. 2000). The βC–βD loop is located on the surface of the RuBisCO holoenzyme close to the conserved Loop 6 (Andersson 2007). The binding of RuBisCO-activase (site 311) at residue 94, together with the likely steric interactions of residues 314 and 312 on RuBisCO sites 89 and 93, respectively (Portis et al. 2007), promotes conformational changes in RuBisCO releasing the inhibitor sugar phosphates from the active site (Portis 2003). Given these specific critical residues for activase recognition in the large subunit of RuBisCO, residues at the βC–βD loop would be expected to be highly conserved. This is not the case in Orthotrichaceae nor in seed plants. Thus, sites 91 and 94 are embedded within the hypervariable region from residue 86 through 95, which constitutes one of the main mutational hotspots in the amino acid sequences of the large subunit of RuBisCO (Kellogg and Juliano 1997; Larson et al. 1997). This marked variability contrasts with observations reporting a reversal in RuBisCO/RuBisCO-activase activity when mutations at the βC–βD loop occurred (Larson et al. 1997). It seems plausible to argue that mutations at the βC–βD loop might be functionally neutral if compensatory changes at this activase region do not cause substantial differences in conformation at this surface domain (Ott et al. 2000) that may disturb critical interactions for the binding of both enzymes.

Positively Selected Amino Acid Replacements and Codependent Evolution of Amino Acid Sites in the LSU Subunit

The evolution of the RuBisCO LSU subunit has been hypothesized to be under strong biophysical constraints and is modulated by recurrent tradeoffs between activity and stability (Studer et al. 2014). Thus, the acquisition of enhanced activity has been mediated by destabilizing structural mutations that are followed by compensatory mutations that restore global stability.

Coevolution of residues is common in RuBisCO and about half of the amino acid sites were detected as coevolving in green algae and land plants (Wang et al. 2011). Furthermore, an overall overlap between coevolving and positively selected residues has been detected (Wang et al. 2011).

Our results in Orthotrichales show that matches between coevolving (13 amino acid sites) and positively selected residues (ten sites) are few (two sites). The reasons underlying this low overlap merit discussion. Several algorithms involving different assumptions and methods have been developed to identify compensating alterations during protein evolution (see Juan et al. 2013 for a comprehensive account). Dunn et al. (2008) reported that a high background composed of random noise and phylogenetic components may interfere with the identification of coevolving positions. In addition, most of the studies on protein evolution are performed over the linear sequence, ignoring the predicted atomic interactions between amino acid sites. The fact that not only amino acid interactions and function, but also phylogeny and stochastic components, account for co-variation calls for caution in identifying the coevolution components (Fares and Travers 2006; Codoñer and Fares 2008). We have reanalyzed our rbcL alignment using the MIp software (Dunn et al. 2008), which considers neither tree topology nor protein structure. This method is based on information theory: it estimates the level of background mutual information for each pair of positions and then corrects the mutual information to remove the influence of entropy. Interestingly, MIp has identified only three coevolving sites (32, 255, 256), which are different from those identified by the CAPS software (11, 30, 50, 89, 262, 270, 348, 387, 404, 443, 449, 470 and 475). More importantly, two of the three coevolving sites selected by MIp (32 and 255) were also identified as evolving under positive selection under the M8 model in this study. Clearly, additional studies are needed to identify consensus coevolutionary amino acid sites in bryophytes using several contrasting assumptions and methods.
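The background correction underlying MIp can be sketched as follows. This is a minimal illustration of mutual information with the average product correction described by Dunn et al. (2008), in which the score for a pair of columns is MI(a, b) minus the product of the mean MI of each column divided by the overall mean MI. It is not the MIp software itself: gaps are treated as ordinary symbols, no significance scoring is performed, and the four-sequence toy alignment is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

def mip_scores(sequences):
    """Return {(i, j): MI minus average product correction} for 0-based column pairs."""
    columns = list(zip(*sequences))
    n_cols = len(columns)
    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(n_cols), 2)}
    # Mean MI of each column with all other columns.
    col_mean = [sum(v for pair, v in mi.items() if i in pair) / (n_cols - 1)
                for i in range(n_cols)]
    overall = sum(mi.values()) / len(mi)
    scores = {}
    for (i, j), value in mi.items():
        apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
        scores[(i, j)] = value - apc
    return scores

# Hypothetical toy alignment: columns 0 and 1 covary perfectly, column 2 is independent.
toy = ["AAK", "AAR", "GGK", "GGR"]
scores = mip_scores(toy)
print(scores)
```

In this toy case the covarying pair keeps a positive corrected score while pairs involving the independent column score zero, which is the behaviour the correction is designed to produce.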

Positively Selected Amino Acid Replacements

Bayesian Empirical Bayes inference under the M8 model identified ten rbcL sites as being under positive selection in Orthotrichaceae. However, with the SLR test only two of the ten residues (251 and 475) remained significant (95%) as positively selected sites. This decrease is not restricted to Orthotrichaceae, since similar trends have generally been observed in other studies that compared PAML and SLR results (e.g., Yao et al. 2019).
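The overlaps among the site lists reported in this section can be cross-checked with simple set operations. The sketch below only restates the site numbers given in the text; the variable names are illustrative.

```python
# Site numbers as reported in the text.
m8_sites = {32, 33, 91, 230, 247, 251, 255, 424, 449, 475}    # M8 model (BEB)
slr_sites = {251, 475}                                         # SLR test
caps_sites = {11, 30, 50, 89, 262, 270, 348, 387, 404,
              443, 449, 470, 475}                              # CAPS
mip_sites = {32, 255, 256}                                     # MIp

print(sorted(m8_sites & caps_sites))   # [449, 475]
print(sorted(m8_sites & mip_sites))    # [32, 255]
print(sorted(caps_sites & mip_sites))  # []
print(slr_sites <= m8_sites)           # True
```

The two-site overlap between the CAPS and M8 lists, and the absence of any site shared by CAPS and MIp, follow directly from the reported numbers.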

Kapralov and Filatov (2007) searched for positive selection in over 3000 rbcL sequences from species representing all lineages of green plants and other photosynthetic organisms. Their study analyzed 88 species of Bryophyta and included 12 rbcL accessions from Orthotrichales that have also been used in our study. Two of the four rbcL residues identified by Kapralov and Filatov (2007) as evolving under positive selection in Orthotrichales, sites 251 and 255, have been detected in our expanded dataset by the M8 model. Both sites are among the residues most frequently reported by Kapralov and Filatov (2007) to be under selection across all lineages of green plants. However, sites 32T, 33E, 91P, 230A, 247S, 424V, 449T, and 475I were not detected by these authors, either in Orthotrichales or in the whole Bryophyta dataset. These diverging results are intriguing, since their analysis and ours used the same (a) parameter estimates from the M8 model, (b) Bayesian Empirical Bayes approach, (c) threshold for the Bayesian posterior probability of positive selection (larger than 0.99), and (d) PAML package. Maximum likelihood estimates used to reveal diversifying selection at amino acid sites rely on the phylogenetic relationships among sequences (organisms) (Yang et al. 2000). Due to computational time constraints, Kapralov and Filatov (2007) divided all of the rbcL sequences they analyzed into 151 small monophyletic groups by manual dissection of trees constructed using the neighbor-joining algorithm. They built the phylogenetic trees using rather crude distance estimates (a homogeneous pattern of molecular evolution among lineages and uniform rates among sites), and no measure of clade reliability (such as bootstrap values) was reported for their phylogenetic results.
The four monophyletic groups presented in their Additional file 3 (Kapralov and Filatov 2007) showed spurious results that clearly conflict with currently supported phylogenetic relationships in Bryophyta (Tsubota et al. 2004; Liu et al. 2019). More importantly, the results of Kapralov and Filatov (2007) indicated that Orthotrichales is polyphyletic, an inference that contradicts all phylogenetic hypotheses based on coding and non-coding sequences from the nuclear and both organellar genomes (Goffinet and Vitt 1998; Goffinet et al. 1998, 2001, 2004; Tsubota et al. 2004; Liu et al. 2019). It is likely that the use of unsupported phylogenetic trees in Bryophyta and the analysis of rbcL sequences from polyphyletic assemblages led to artefactual results concerning the identification of positively selected sites in Bryophyta in general, and in Orthotrichaceae in particular. It has been stated that the inference of sites under positive selection does not appear to be sensitive to the tree topology when ML trees are used (Yang et al. 2000). However, these authors also emphasized that a reasonably good phylogeny is necessary to perform LRTs of positive selection (Pie 2006). The odd phylogenetic results obtained by Kapralov and Filatov (2007) for rbcL sequences from Bryophyta may have influenced the detection of overall positive selection in RuBisCO, and could have affected the number of sites identified. Further research on the subject is necessary, since the RuBisCO sites shown to be under adaptive selection (117, 169, 247, 279, 309, 340) in another lineage of early land plants (Marchantiophyta) by Miwa et al. (2009) are in disagreement with those reported by Kapralov and Filatov (2007) and in this study.

Concluding Remarks

Most members of Orthotrichaceae are epiphytes, and species from the same and from different genera usually grow intermingled. We have shown that, despite this apparent environmental homogeneity, all species analyzed to date show unique L subunit protein haplotypes. Ten rbcL sites (32, 33, 91, 230, 247, 251, 255, 424, 449 and 475) are strongly supported as being positively selected and under adaptive evolution. The pattern of amino acid variation is not lineage specific, but suggests a case of convergent evolution in which recurrent changes potentially favor the same amino acid substitutions and have likely optimized RuBisCO activity. The selected residues are located at rbcL sites that are highly variable in higher plants and close to key regions involved in dimer–dimer (L2L2) interactions, RuBisCO-activase interactions, and conformational functions during catalysis. Our results prompt future research on RuBisCO kinetics in co-growing species of Orthotrichaceae to assess to what extent the catalytic properties of the holoenzyme differ significantly and which eco-physiological microenvironments drive RuBisCO variability.