Introduction

Phycobiliproteins (PBPs) are light-harvesting antenna proteins found in cyanobacteria, red algae, and cryptomonads (Zuber 1986; Glazer 1988; Bryant 1991; MacColl 1998). Allophycocyanin (APC), phycocyanin (PC), phycoerythrin (PE), and phycoerythrocyanin (PEC) are the principal classes of PBPs in cyanobacteria. Previously determined crystal structures of PBPs (Duerring 1990; Ficner 1992; Brejc 1995; Reuter 1999; Nield et al. 2003) have revealed that the tertiary fold and the general structure of the macromolecular assemblies are remarkably conserved across all classes of PBPs. PBPs consist of two different subunits, α and β, which associate into (α/β)-monomers. The basic assembly unit of all phycobilisomes is the (α/β)3-trimer, which can further aggregate face-to-face into (α/β)6-hexamers. Phycobilisomes are composed not only of PBPs but also of a small amount of linker polypeptides, which induce the aggregation of PBP trimers (rod linker polypeptides; LR), connect the rods to the core (rod-core linkers; LRC), and connect the core to the thylakoid membrane (core-membrane linkers; LCM). Light energy is absorbed mainly by the peripheral rods, where the PBPs absorbing at the shortest wavelengths (PE or PEC) are located. The energy absorbed by PE or PEC is transferred efficiently to PC, then to APC, and finally to chlorophyll a in the thylakoid membrane (Glazer 1989; MacColl 1998).

The α and β subunits of PBPs share a well-defined, helical globin-like fold with seven helices (A, B, E, F′, F, G, and H), complemented by an additional helical hairpin domain (helices X and Y) at the N-terminus of each chain, which is responsible for the formation and stability of (α/β)-monomers (Schirmer et al. 1985; Kikuchi et al. 2000). The amino acid sequences of the biliproteins have been analyzed for homologies (Apt et al. 1995; Ducret et al. 1994). Apt et al. (1995) suggested that at least three gene duplication steps occurred during PBP evolution. First, a duplication of the ancestral gene generated a pair of tandem PBP genes. Second, this heterodimer gene pair gave rise to two separate lines of descent through duplication, generating the core (APC) and the rod PBPs. Third, the rod precursor duplicated again to form the PC and PE subfamilies. Huber (1989) first suggested that differences in chromophore “environments” helped establish a complex with unidirectional energy flow and increased light-harvesting capacity, which should be one of the major forces in PBP evolution. Our previous study also detected adaptive evolution in the chromophore-binding domain of the PE subfamilies (Qin et al. 2005).

Here we extend the analyses of PBP genes and linker polypeptide genes by isolating homologues from well-annotated and unfinished cyanobacterial genomes. This allowed us to apply multiple alignment-based tools to reveal the structural and functional evolution of PBP genes. Although we cannot conclude with confidence that PBPs have been subject to positive selection, some residues in the chromophore-binding domain and the helical hairpin domains (X and Y) are found to have elevated dN/dS ratios. Covariation analysis also reveals that these sites are significantly correlated, indicating the functional-structural importance of interactions among these residues. Further, evidence for positive selection was found in the linker polypeptides, although more sequence and structure information is needed for further analyses. The implications of these elevated nonsynonymous changes for cyanobacterial PBP protein sequences, particularly those relating to the chromophore-phycobiliprotein interaction and tertiary structure stability, are discussed.

Materials and Methods

Sequence Collection, Alignment, and Phylogenetic Analysis

Sequences were obtained from GenBank with the following accession numbers: T-elong-pc (cpcAB; NP_682748), S-8102-pc (cpcAB; NP_898114), S-6803-pc (cpcAB; NP_440551), C-8501-pc (cpcAB; ZP_00176806), T-eryth-pc (cpcAB; ZP_00327625), Mast-pec (pecAB; 149774), N-7120-pec (pecAB; 9957314), P-9313-pe (peAB; NP_895509), Calot-pe (peAB; AAT36317), S-8102-1pe (peAB; NP_898107), S-8102-2pe (peAB; NP_898100), C-8501-ap (apcAB; ZP_00175376), T-eryth-ap (apcAB; ZP_00326574), S-8102-ap (apcAB; NP_896580), T-elong-ap (apcAB; NP_681747), and S-6803-ap (apcAB; NP_441194). Rod linker polypeptides (cpcC-like proteins): Synechocystis sp. PCC 6803 (NP_440550, NP_440549), Synechococcus elongatus PCC 7942 (ZP_00164181), Spirulina platensis (AAD48442), Nostoc punctiforme PCC 73102 (ZP_00106783), Fremyella diplosiphon (CAA29461), Thermosynechococcus elongatus BP-1 (NP_682749), Nostoc sp. PCC 7120 (NP_484574), Synechocystis sp. PCC 9413 (AAC25588), Synechococcus sp. PCC 7002 (CAA57457), and Trichodesmium erythraeum IMS101 (ZP_00327468). Rod-core linker polypeptides (cpcG-like proteins): Synechocystis sp. PCC 6803 (NP_442922), Crocosphaera watsonii WH 8501 (ZP_00179726), Nostoc punctiforme PCC 73102 (ZP_00108261, ZP_00106626), Synechococcus elongatus PCC 7942 (ZP_00165045), Trichodesmium erythraeum IMS101 (ZP_00326815, ZP_00324847), Synechococcus sp. WH 8102 (NP_898088, NP_896409), Nostoc sp. PCC 7120 (NP_484578-NP_484581), Thermosynechococcus elongatus BP-1 (NP_682753-NP_682755), and Fischerella sp. PCC 7603 (CAA42433-CAA42435). Core-membrane linker polypeptides (apcE-like proteins): Synechocystis sp. PCC 6803 (NP_441972), Crocosphaera watsonii WH 8501 (ZP_00175246), Nostoc sp. PCC 7120 (NP_484064), Synechococcus sp. PCC 7002 (AAC14723), Trichodesmium erythraeum IMS101 (ZP_00325281), Gloeobacter violaceus PCC 7421 (NP_924191), and Synechococcus sp. WH 8102 (NP_896581).

Amino acid alignments were carried out using CLUSTAL W software with the default settings (Thompson et al. 1994) and manually adjusted using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). Due to the alignment uncertainty, we excluded hyper-variable regions in the linker polypeptides. The nucleotide sequences were then aligned following the same gap patterns. Further analyses were all performed on this set of aligned nucleotide sequences.
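The step of aligning nucleotide sequences "following the same gap patterns" can be illustrated with a minimal Python sketch. It is not the procedure or software actually used in this study; the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes that each coding sequence is in frame and starts at the first aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto an aligned protein sequence.

    Every residue in the protein alignment consumes one codon from the
    coding sequence; every alignment gap becomes a three-character gap.
    Trailing nucleotides (for example a stop codon) are ignored.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than 3 x residue count")

    codons = []
    pos = 0  # index of the next unused nucleotide in the coding sequence
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example (hypothetical sequences): one gap in the protein alignment
print(thread_codons("M-K", "ATGAAA"))
```

Because gaps are inserted in whole-codon units, the reading frame is preserved, which codon-based likelihood models require.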

The core-membrane linker polypeptide (apcE-like protein) is considered to be closely related to PBPs (Sidler 1994), so we used the apcE gene from Synechocystis sp. PCC 6803 as an outgroup to examine the relationships among different PBPs. Maximum likelihood and maximum parsimony trees were constructed using the PROML and PROTPARS programs from the PHYLIP software package (Felsenstein 1989). For the subsequent likelihood ratio tests, phylogenetic trees were also constructed from the PBP nucleotide sequences, and from the third codon positions alone, using the DNAML program from PHYLIP. The reliability of the trees was evaluated by the bootstrap method (Felsenstein 1989) with 1000 replications.

Estimation of dN/dS Ratios Among Amino Acid Sites and Lineages

At present, the most frequently applied methods to detect adaptive evolution are based on comparisons of the rates of nonsynonymous (replacement; dN) versus synonymous (silent; dS) substitutions (Yang and Bielawski 2000). Several recent studies have shown that some of the branch-site models might, under certain conditions, have a high false-positive rate when used to detect positively selected sites (Suzuki and Nei 2004; Zhang 2004). Here, we used the modified NSsites models and a new Bayes empirical Bayes (BEB) method for predicting positively selected sites (Wong et al. 2004). This method has good power and accuracy in detecting positive selection over a wide range of parameter values. In this paper, the maximum likelihood models M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M7 (β), and M8 (β&ω), implemented in the program CODEML of PAML version 3.14b (Yang 1997), were used to test for positive selection. However, if adaptive evolution occurs at only a few time points and affects only a few amino acids, the above site-specific models might lack the power to detect positive selection. We therefore also used the modified branch-site model A (Yang and Nielsen 2002), which allows the ω ratio to vary both among sites and among lineages. It can be compared with the new M1a (nearly neutral) to form a likelihood ratio test (test 1), with df=2. Another test, called test 2, uses as its null a model A with ω2 = 1 fixed (Yang et al. 2005). Codon usage bias is known to affect the estimation of synonymous and nonsynonymous substitution rates (Yang et al. 2000). Thus, two different codon frequency models were used in all our analyses: F3×4 and F61. The results under these two models were quite similar, and only those under F3×4 are presented. The posterior probability that a particular codon site is positively selected was estimated using the Bayes empirical Bayes approach implemented in the PAML software.
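The difference between the two codon frequency models can be made concrete with a short Python sketch of the F3×4 idea: expected codon frequencies are taken as the product of the nucleotide frequencies observed at the three codon positions, renormalized over the 61 sense codons. This is an illustration under stated assumptions, not PAML's implementation; the function name `f3x4` and the toy sequences are hypothetical, and the three stop codons assumed are those of the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def f3x4(seqs):
    """Expected sense-codon frequencies from position-specific base frequencies."""
    # counts[k][b] = number of times base b is seen at codon position k
    counts = [{b: 0 for b in BASES} for _ in range(3)]
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if all(b in BASES for b in codon):  # skip gaps and ambiguity codes
                for k, b in enumerate(codon):
                    counts[k][b] += 1

    # Convert counts to frequencies separately for each codon position
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({b: c[b] / total for b in BASES})

    # Multiply the three position-specific frequencies, dropping stop codons
    raw = {}
    for triplet in product(BASES, repeat=3):
        codon = "".join(triplet)
        if codon not in STOP_CODONS:
            raw[codon] = freqs[0][codon[0]] * freqs[1][codon[1]] * freqs[2][codon[2]]

    # Renormalize so that the 61 sense-codon frequencies sum to one
    norm = sum(raw.values())
    return {codon: value / norm for codon, value in raw.items()}


pi = f3x4(["ATGGCTAAA", "ATGGCAAAG"])  # hypothetical in-frame sequences
print(len(pi), round(sum(pi.values()), 6))
```

F3×4 therefore needs only nine free frequency parameters, whereas F61 treats each of the 61 sense-codon frequencies as a parameter estimated from the observed codon counts.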

The relative rate ratio test of Creevey and McInerney (2002) was also used to explore the putative selective effects in the PBP phylogeny (Fig. 1). This method is based on the M-K test (McDonald and Kreitman 1991), which compares the ratio of replacement-invariable (RI) to replacement-variable (RV) substitutions to the ratio of silent-invariable (SI) to silent-variable (SV) substitutions. For each internal branch of the phylogeny, substitutions are counted from the branch to the descendant tips, resulting in four values: RI, RV, SI, and SV. Using a G-test or Fisher’s exact test, we can compare the ratio of RI/RV, with the expectation that this ratio is the same as the ratio of SI/SV (McDonald and Kreitman 1991). Since RI, RV, SI, and SV numbers are calculated at each internal branch, this means that we are performing multiple statistical tests on the same data, which may increase the probability of obtaining a significant result by chance alone (Roff and Bentzen 1989). To counter this problem, neutral sequence evolution without insertions or deletions was simulated for 927 bp to generate 100 pseudo-datasets, and these sequences were evolved according to the F84 model with gamma-distributed rates using Seq-Gen (Rambaut and Grassly 1997). The simulated datasets were then analyzed using the relative rate ratio test described above, which generated the expected distribution of G value from G-test at each internal branch if the null hypothesis of neutral evolution were true for the particular dataset (Roff and Bentzen 1989). This distribution was then used to assess the critical level for every internal branch according to the previous method (Creevey and McInerney 2002).
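The per-branch comparison of RI/RV with SI/SV can be sketched in Python as a G-test of independence on a 2×2 table, followed by a comparison against G values obtained from neutral simulations. This is a minimal illustration, not the software of Creevey and McInerney (2002); the function names and all counts below are hypothetical.

```python
import math


def g_statistic(ri, rv, si, sv):
    """G statistic for independence in the 2x2 table [[RI, RV], [SI, SV]]."""
    observed = [[ri, rv], [si, sv]]
    total = ri + rv + si + sv
    row_sums = [ri + rv, si + sv]
    col_sums = [ri + si, rv + sv]

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += o * math.log(o / expected)
    return 2.0 * g


def exceed_fraction(observed_g, simulated_g):
    """Fraction of simulated G values that are at least as large as the observed one."""
    hits = sum(1 for g in simulated_g if g >= observed_g)
    return hits / len(simulated_g)


# Hypothetical counts for one internal branch
g_obs = g_statistic(ri=30, rv=10, si=10, sv=30)
print(round(g_obs, 2))

# Hypothetical G values from neutral pseudo-datasets for the same branch
print(exceed_fraction(g_obs, [0.4, 1.1, 2.7, 3.9]))
```

Using the simulated distribution at each branch, rather than a single tabulated critical value, is what addresses the multiple-testing concern raised above.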

Fig. 1

Phylogenetic tree inferred from phycobiliprotein amino acid sequences using the maximum parsimony method. Numbers under the nodes indicate bootstrap values. Scale bars represent the level of amino acid sequence divergence. An asterisk above a branch indicates an ω value >1.0 estimated under the free-ratio model. Thicker branches labeled A, B, C, and D are those used in the branch-site codon-based maximum likelihood analysis. The abbreviated names are given under Materials and Methods.

Covariation Analysis

Detecting structural interactions and statistical covariance among separate amino acid sites is of great significance for understanding protein structure and evolution (Pollock and Taylor 1997; Atchley et al. 2000). Such analyses are based on the assumption that functionally significant coordinated residues in proteins result from physicochemical interactions (e.g., hydrophobic, electrostatic). These interactions depend on the physicochemical properties (e.g., volume, charge, and hydrophobicity) of the residues (Tomii and Kanehisa 1996). Here, we estimated the pairwise correlation of residue substitutions in the helical hairpin domains (X and Y) and the chromophore-binding domain of PBPs. This approach, based on the estimation of linear correlation coefficients between amino acid physicochemical property values, was fully described by Afonnikov and Kolchanov (2004). We used several amino acid characteristics (volume, hydrophobicity, and polarity) for covariation analysis, which reflect physical and chemical interactions between residues. Different sequence weighting methods (Vingron and Argos 1985; Henikoff and Henikoff 1994) were also used in these tests to avoid sequence bias. All the analyzed positions were clustered by the nearest neighbor method, based on the following measure of closeness for a pair of positions i and j: d_ij = 1 − |r_ij|, i.e., the stronger the correlation between two positions in a protein, the smaller this distance. The linear correlation coefficient r_ij was calculated according to the previously described method (Afonnikov and Kolchanov 2004). The results are presented as a binary tree. Each node of the tree unites two sets of positions, and the position of a node on the dendrogram corresponds to the maximum correlation coefficient among all the possible pairs.
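The closeness measure d_ij = 1 − |r_ij| can be illustrated with a short Python sketch. It uses an unweighted Pearson correlation, whereas the analysis described above applies sequence weights, so it is a simplification and not the method of Afonnikov and Kolchanov (2004). The function names and the property values in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Unweighted linear correlation between two alignment columns.

    x and y hold one physicochemical property value (for example residue
    volume) per sequence, in the same sequence order.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an invariant column is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def covariation_distance(x, y):
    """d_ij = 1 - |r_ij|: strongly (anti)correlated positions are close."""
    return 1.0 - abs(pearson(x, y))


# Hypothetical property values at two positions across five sequences
pos_i = [88.6, 88.6, 166.7, 166.7, 108.5]
pos_j = [173.4, 173.4, 89.0, 89.0, 140.0]
print(round(covariation_distance(pos_i, pos_j), 3))
```

Taking the absolute value means that compensatory changes (one residue growing while its partner shrinks) count as strongly as changes in the same direction, which matches the "positively or negatively" correlated sites reported in the Results.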

Results

Evolution of PBPs

In this study, we searched for phycobilisome assembly-related genes (PC, APC, PE, PEC, and the linker polypeptides) in all the fully sequenced cyanobacterial genomes: Synechocystis sp. PCC 6803, Crocosphaera watsonii WH 8501, Thermosynechococcus elongatus BP-1, Trichodesmium erythraeum IMS101, Nostoc sp. PCC 7120, Gloeobacter violaceus PCC 7421, Synechococcus sp. WH 8102, Prochlorococcus marinus str. MIT 9313, and Prochlorococcus marinus CCMP1375. Phylogenies constructed with different methods (maximum parsimony and maximum likelihood) show quite similar topologies, in which every type of PBP forms a separate clade, with APC as the basal one (Fig. 1). Phylogenetic trees derived from the PBP nucleotide sequences, and also from the third codon positions, were very similar to the amino acid phylogeny, but with PEC positioned within the PC clade (data not shown). Both phylogenies were used in the subsequent analyses to detect adaptive evolution among different lineages. The phylogeny of PBPs has been well characterized in previous studies and reviews (Sidler 1994; Apt et al. 1995). The duplication of the ancestral PBP gene gave rise to two separate lines of descent, the core (APC) and the rod (PC and PEC). Moreover, PECs, which form a separate group within the PC family, bind phycobiliviolin chromophores and are adapted to green-light absorption (Bryant 1982).

Elevated Rates of Nonsynonymous Substitutions in PBPs

The one-ratio model (M0) yielded an estimate of ω=0.043 (Table 1), indicating strong selective constraints on PBPs. To test for variable ω ratios among PBP lineages, the one-ratio model was compared with the free-ratio model, which assumes an independent ω ratio for each branch. The log-likelihood value is lnL0 = −11974.44 for the one-ratio model and lnL1 = −11902.11 for the free-ratio model. The test statistic 2ΔlnL = 2(lnL1 − lnL0) = 144.66 (df=30) shows that the free-ratio model fits significantly better than the one-ratio model (p << 0.001), so the one-ratio model is rejected and dN/dS ratios differ among lineages. As shown in Fig. 1, the ω values on 12 of 31 branches are greater than 1.0, which suggests elevated rates of nonsynonymous substitution in these PBP lineages.
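The likelihood ratio test reported above can be reproduced with a few lines of Python. The sketch below is an illustration, not part of the original analysis: the helper names `chi2_sf` and `lrt` are hypothetical, and the chi-square tail probability is computed from standard closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= x / (2 * k)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * phi * total


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# One-ratio versus free-ratio model, using the log-likelihoods given in the text
stat, p = lrt(-11974.44, -11902.11, df=30)
print(round(stat, 2), p < 0.001)
```

The same helper applies to the branch-site comparisons that follow, with df=2 for test 1 and df=1 for test 2.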

Table 1 Parameter estimates for the PBP genes under site-specific and branch-site models

To further investigate whether particular regions of the PBPs and particular lineages show signatures of selection, the branch-site model A (Yang and Nielsen 2002) was used to test for positive selection affecting individual sites along the branches leading to the PEC lineage (branch A), the PC lineage (branch B), the PE lineage (branch C), and the APC lineage (branch D). Model A, which allows for selection along these four branches, is significantly favored over the new M1a model. In the PC lineage (including PEC), branch-site model A fits the data much better than M1a, with 2ΔlnL=117.95 and p<0.001 (df=2). Under test 2, the null model A is rejected, with 2ΔlnL=28.06 and p<0.001 (df=1). Furthermore, parameter estimates suggest a large set of sites (∼74%) evolving under strong purifying selection (ω=0.066) and a small set of sites (∼25%) under positive selection. In the PE lineage, branch-site model A also fits the data better than M1a and the null model A, with 2ΔlnL=28.1 and 13.66, respectively. Nine sites (posterior probability > 0.95) were identified as being under positive selection by the BEB method. For example, the amino acid at site 209 is alanine in the PC lineage and leucine in APC, while the homologous position in PE is the positively selected cysteine, which can bind an extra chromophore. Along branch D, however, the comparison of branch-site model A with the null model A is not significant (test 2; 2ΔlnL=3.08 and p=0.079, with df=1), although model A has a significantly higher likelihood than M1a (test 1; 2ΔlnL=31.3 and p<0.001, with df=2). Test 2 is considered to be very conservative, but test 1 might mistake relaxed selective constraint on the foreground branch for positive selection (Yang et al. 2005). We also examined the sensitivity of our analyses to the tree topology. The results (Table 1) indicate that the putative positively selected sites identified under both phylogenies were quite similar.

The relative rate ratio analysis also indicated that positive selection might operate on these four branches, with a significant difference in the ratios of RI/RV versus SI/SV (Table 2). An increase in selective constraints in the descendant lineages, however, could also lead to significant results of this test. Positive directional selection is operating when successive amino acid changes make a protein more efficient at performing a particular role, and the changes are fixed in the descendant lineages. Here, all these ratios were skewed due to the excessive number of RI events, indicating positive directional selection. In the neutral simulation study, none of the 100 simulations produced a G value greater than that obtained from the original data at these four internal branches. Thus we concluded that the results obtained from the modified M-K tests were quite significant (p<0.001).

Table 2 Relative rate ratio tests for the targeted branches in the PBP phylogeny

Variable Selective Pressures Among Amino Acid Sites in the Linker Polypeptides

To test for positive selection at individual amino acid sites, maximum likelihood analyses were carried out between model M1a (nearly neutral) and models M2a (positive selection), M7 (β), and M8 (β&ω) (Wong et al. 2004; Yang et al. 2000). M1a assumes two classes of sites in the protein: the conserved sites (0 < ω0 < 1) and the neutral sites (ω1=1). M2a, the extension of M1a, adds a third class of site, which allows for sites with ω > 1. M7 assumes a β-distribution for ω between 0 and 1, and M8 adds to M7 an extra category, with proportion p1 of sites with ω1, while the rest of sites have ω from the β distribution.

In the linker polypeptide (LR, LRC, and LCM) genes, the majority of sites were subject to strong functional constraints, with ω close to 0. No positively selected sites were identified using M1a/M2a (data not shown), and only a small proportion of sites, with low posterior probabilities (p<0.95), was found to be under positive selection using M7/M8 (Table 3). This is probably because hypervariable regions were excluded from the alignment to avoid false positives. The positively selected sites identified here are all located away from the indels (data not shown), so different gap penalties in the alignment should not influence the sites identified (Table 3). Similar parameter estimates and likelihood values were obtained using trees reconstructed with different phylogenetic methods (data not shown); hence, these maximum likelihood analyses were robust to the tree topology. The positively selected sites at a threshold posterior probability of 90% are listed in Table 3.

Table 3 Parameter estimates for the linker polypeptide genes under site-specific models

Location of Sites with Elevated dN/dS Ratios on the PBP Domains

Structural and functional studies of many PBPs, including α and β subunits, have identified several residues that are universally conserved in PBPs (Apt et al. 1995; Bickel et al. 2002). Based on Synechocystis sp. 6803 phycocyanin protein (CpcB) numbering in this study, these residues are 82C, 84R, 85D, 13D, 91R, 95Y, 100G, and 112G. The first three are critical for chromophore interaction; the middle three (13D, 91R, and 95Y) are important for α/β pairing; the final two residues (100G and 112G) appear to be necessary for achieving the proper tertiary structure of each subunit. However, 72N and 78R, necessary for chromophore interaction in PC, APC, and some PE, are replaced by other residues in the PEC lineage (Fig. 2).

Fig. 2

Posterior probabilities (>80%) for sites identified with elevated rates of nonsynonymous substitutions. The X-axis denotes position in the amino acid alignment. The Y-axis denotes the posterior probability of each site. Above the plots, the secondary structure elements are marked by gray (α subunit) or darker (β subunit) lines. A–D: branches A–D, leading to PEC, PC, PE, and APC, respectively.

Figure 2 shows the distribution of sites identified with elevated rates of nonsynonymous substitutions using the BEB method (Nielsen and Yang 1998; Wong et al. 2004) along the four branches, mapped against the PC structure (Padyana et al. 2001). Although these sites occur throughout the molecule, it is notable that most of them are concentrated in the chromophore-binding domain and the helical hairpin domains (X and Y) at the N-terminus of each chain. Figure 3 shows that the β-84 chromophore is surrounded by sites identified with elevated dN/dS ratios. Furthermore, some sites are scattered over the primary structures of the α (68T, 99L, and 116W) and β (205R, 209A, and 212Q) subunits of phycocyanin but, when mapped onto the tertiary structure, tend to cluster around the α-84 chromophore-binding site.

Fig. 3

Mapping of the residues (gray space-fill) identified with elevated rates of nonsynonymous substitutions on the chromophore-containing phycocyanin structure. A pair of α subunit (black) and β subunit (black), forming a monomer, is shown by the ribbon model. Others are shown by the backbone model. Chromophores (gray) are shown by the stick model.

We then estimated the pairwise correlation of these nonsynonymous substitutions in the helical hairpin domains (X and Y) and the chromophore-binding domain. The different sequence weighting methods (Vingron and Argos 1985; Henikoff and Henikoff 1994) used to avoid sequence bias gave similar results, and only the results derived from the Henikoff and Henikoff (1994) approach are shown. We first used amino acid volume (size) as the property for covariation analysis. In the helical hairpin domains (X and Y), five of six residues with elevated rates of nonsynonymous substitutions in the APC lineage are highly correlated (positively or negatively) at the 99.99% significance level (Fig. 4). Similarly, three of five sites in the PEC lineage are significantly correlated. In the chromophore-binding domain, nearly all the sites with elevated dN/dS ratios (201S, 206A, 207L, 209A, 210E, 211Q, 220A, 226M, and 236I) in the APC lineage are correlated at the 99.99% significance level, and similar results are found in the PEC lineage (Fig. 5). We then repeated the covariation analyses with other amino acid characteristics (polarity, surrounding hydrophobicity in α-helix, and local flexibility). Results from the first two properties are similar to those from volume, while no significant results were obtained with local flexibility (data not shown). The strong correlation of these sites may be a consequence of selective pressure and evidence of the functional-structural importance of interactions among PBP residues.

Fig. 4

Binary tree diagram for networks of correlated positions in the helical hairpin domains of the PBP α subunit. The number below each node indicates the correlation coefficient value. The vertical gray lines indicate different significance thresholds. Asterisks indicate the sites with elevated rates of nonsynonymous substitutions identified in Table 1. All positions used Synechocystis sp. PCC 6803 sequence as the reference.

Fig. 5

Binary tree diagram for networks of correlated positions in the chromophore-binding domain of the PBP β subunit. The number below each node indicates the correlation coefficient value. The vertical gray bars indicate different significance thresholds. Asterisks indicate the sites with elevated rates of nonsynonymous substitutions identified in Table 1. All positions used Synechocystis sp. PCC 6803 sequence as the reference.

Discussion

According to classical theories of gene family evolution, duplicated genes typically assume one of two evolutionary fates: the extra copy can decay due to redundancy, or the duplicated paralogues can diversify in function (Ohno 1970). In this study, we demonstrate that certain lineages of PBPs exhibit elevated rates of nonsynonymous substitutions, and that the identified sites mostly reside in or near the chromophore-binding domain and the helical hairpin domains (X and Y). There are two possible explanations for high rates of substitution in amino acid sequences: multiple events of adaptive evolution or the relaxation of functional constraints. The PAML results in our study are consistent with either positive selection or a relaxation of functional constraints. In particular, along branch D leading to the APC lineage, no significant evidence for positive selection was found, with 2ΔlnL=3.08 (p=0.079, df=1) when comparing the branch-site model A with the null model A. We therefore cannot rule out relaxation of purifying selection in this lineage, although significant results were found for the other branches. Nevertheless, the residues identified with elevated dN/dS ratios still provide useful insights into PBP evolution. A hypothesis of adaptive change based on dN/dS values can be strengthened by covariation analysis grounded in structural biology. In this study, synonymous substitution sites are evenly distributed throughout the three-dimensional structure of the PBPs (data not shown), as expected for silent changes that have no impact on the protein structure and therefore cannot be selected for or against at the protein level. In contrast, the sites experiencing nonsynonymous replacements after the duplication event are clustered in the chromophore-binding domain or the helical hairpin domains (X and Y). Most interestingly, these sites are significantly correlated and show signatures of coevolution. The fact that the majority of the identified residues could be precisely located on the PC tertiary structure provides a structural interpretation of PBP evolution.

First, several residues were identified in the helical hairpin domains (X and Y) at the N-terminus of both the α and the β subunit. These X-Y helical domains, which serve as the glue between the two subunits, are considered to play an important role in the optical properties of the chromophores as well as in the structure of the monomer (Kikuchi et al. 2000). Figure 2 shows that the X-Y helical domains of both subunits might be targeted by selective pressure, suggesting that the two subunits coevolved and that the driving force is probably related to protein-protein interactions ensuring proper subunit recognition and monomer organization.

Second, many residues around the chromophore-binding domain were also found to have elevated dN/dS ratios. This domain was thought to be the most conserved region in PBPs, since it binds the chromophore and helps to transfer harvested energy (Sidler 1994; Apt et al. 1995). However, these sites were mostly fixed in the descendant sequences within each lineage, indicating that these residues might have acquired new functions after diverging from their ancestral sequences. Chromophores alone in cyanobacteria are not able to perform the light-harvesting function, and the apobiliprotein determines the position, the environment, the orientation, the spacing, and also the spectroscopic properties of the chromophores (Borisov 1989). In PC, for example, the replacement of some residues in this domain could cause a different packing arrangement at the interface between the trimers and a significant rotation of the chromophore D ring (Stec et al. 1999). Doust et al. (2004) developed a structure-function model for the cryptophyte phycoerythrin 545 and found that the spectral differences between the tetrapyrrole pigments were mainly due to the chemical differences between the chromophores, as well as their binding and interaction with the local protein environment. The residues identified in this domain may create a specific microenvironment for the enclosed chromophore in the trimeric aggregates to ensure efficient energy transfer. Perturbation of certain sites in this domain could cause changes in spectroscopic properties, confirming the spatial requirement for efficient resonance energy transfer among chromophores and through the PBPs (Martinez-Oyanedel et al. 2004).

Third, a number of linker proteins associated with phycobilisomes were also found to evolve adaptively. Because little is known about the tertiary structures of the linker polypeptides and their interactions with PBPs, we cannot map the putative positively selected sites onto a protein structure. Previous studies revealed six highly similar motifs in LRC and LR sequences, which could indicate functional and structural interaction sites with the PBPs (Glauser et al. 1992; Sidler 1994). Notably, the positively selected sites identified in the LRC and LR sequences (Table 3) were all outside of these six motifs (data not shown). It has been suggested that these linker proteins occupy positions running through the internal cavities of the disks and may play roles in complex stabilization, rod-core assembly, and induction of the directionality of energy transfer (Adir and Lerner 2003).

Previous research has emphasized the role of conserved amino acids in the phycobiliprotein-chromophore interaction during PBP evolution (Glazer 1980; Sidler 1994; Apt et al. 1995). In contrast, we found that many sites with elevated dN/dS ratios in four lineages are located in the chromophore-binding domain and the helical hairpin domains (X and Y), and that these residues are significantly correlated, probably owing to interactions among amino acid residues. Our analyses were nevertheless biased toward these domains, because we focused only on conserved stretches that could be unambiguously aligned and excluded the most variable domains. Further studies focusing on closely related genes within each lineage would be useful to address the roles of selective pressures within the insertion or deletion domains. The sites and genes identified here with elevated rates of nonsynonymous substitutions should be interesting targets for functional evaluation.