Introduction

Cyanobacteria are the simplest organisms known to have an endogenous circadian clock (Kondo and Ishiura 1999). The model species Synechococcus elongatus PCC 7942 contains a cluster of three tandemly arrayed genes, kaiA, kaiB, and kaiC, which has been identified as a key element of the circadian system (Ishiura et al. 1998). Possession of a circadian clock has been shown to enhance the adaptive fitness of cyanobacteria in a wide variety of environmental conditions (Woelfle et al. 2004).

In addition to the three kai genes, which were thought to be indispensable for circadian oscillation (Ishiura et al. 1998; Kitayama et al. 2003; Xu et al. 2003), several other genes have been identified that control the input and output of the cyanobacterial clock (see Golden and Canales 2003 for review). An evolutionary analysis of various components of the circadian system (Dvornyk et al. 2003; Dvornyk et al. 2004; Dvornyk and Knudsen 2005; Dvornyk 2006a) suggested that cyanobacteria have at least two types of circadian system, those with and those without kaiA (hereafter referred to as the kaiABC and kaiBC systems, respectively). Species lacking kaiA possess a timing mechanism, although it is less robust than the original kaiABC system (Holtzendorff et al. 2008; Axmann et al. 2009).

The cikA (circadian input kinase) gene encodes a bacteriophytochrome-like histidine kinase involved in the input signaling of the clock (Schmitz et al. 2000). CikA was reported to have three distinct domains: GAF, histidine-protein kinase, and a receiver domain (Mutsuda et al. 2003). This structure is typical of bacteriophytochromes (Fankhauser 2001). However, CikA lacks a conserved cysteine residue that serves as a bilin ligand in the sensor domain of typical phytochromes. Owing to the absence of this residue, CikA is categorized as an unusual bacteriophytochrome (Schmitz et al. 2000). A recent study showed that CikA senses light not through a chromophore bound to the GAF domain but by detecting quinones (Ivleva et al. 2006), whose cellular concentration and redox state are light dependent. Mutants deficient in cikA have a shorter circadian period of gene expression and altered phasing of rhythmicity (Schmitz et al. 2000).

In this study, we have analyzed the occurrence, domain architecture, level of variation and phylogeny of the cikA gene in order to reconstruct the evolutionary history and to determine the evolutionary factors that have been operating on this component of the cyanobacterial circadian system. We have also attempted to estimate a timeline for key events in the evolution of both cikA and the entire circadian system. This study provides new data about the probable functional importance of various structural motifs of the CikA protein, and significantly updates our knowledge about the evolution of the cyanobacterial circadian system as a whole.

Methods

DNA and Protein Sequences

Sequences homologous to CikA and to the 16S and 23S rRNA genes were retrieved from the GenBank non-redundant database using gapped PSI-BLAST (3 iterations) and BLASTN (Altschul et al. 1990; Altschul et al. 1997). The following GenBank accession numbers of sequences from S. elongatus PCC 7942 were used as queries: AAF82192.1, AF132930.1, and CP000100.1. Only sequences from completely sequenced cyanobacterial genomes were used for the phylogenetic analysis. Because CikA belongs to a large family of bacteriophytochromes, two criteria were used to filter the sequences for subsequent analyses. First, a protein had to contain at least three of the four domains (GAF–HisKA–HATPase_c) arrayed in the same order as in the originally described CikA. Second, all of these domains had to display sufficiently high homology to bona fide CikA (a bit score of 177 was used as the lower limit). With this approach, some proteins with formally higher similarity scores (usually limited to the HisKA–HATPase_c domains) but lacking the above domain architecture were excluded from the analyses.
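The two filtering criteria can be illustrated with a minimal Python sketch. It assumes that each BLAST hit has already been annotated with an ordered list of domain names and a single bit score against bona fide CikA; the Hit class, the accessions, and the reading of "same order" as a contiguous GAF–HisKA–HATPase_c run are hypothetical choices made for illustration, not the pipeline actually used.

```python
from dataclasses import dataclass

# Criteria restated from the text: GAF-HisKA-HATPase_c in CikA order, bit score >= 177.
REQUIRED_CORE = ("GAF", "HisKA", "HATPase_c")
MIN_BIT_SCORE = 177.0


@dataclass
class Hit:
    accession: str    # hypothetical identifiers below, not real GenBank entries
    domains: tuple    # domain names in N- to C-terminal order (assumed pre-annotated)
    bit_score: float  # assumed: one score for the alignment to bona fide CikA


def has_core_in_order(domains, core=REQUIRED_CORE):
    """True if `core` occurs as a contiguous, correctly ordered run of domains."""
    n = len(core)
    return any(tuple(domains[i:i + n]) == core for i in range(len(domains) - n + 1))


def passes_filter(hit):
    return has_core_in_order(hit.domains) and hit.bit_score >= MIN_BIT_SCORE


hits = [
    Hit("hyp_1", ("GAF", "HisKA", "HATPase_c", "REC"), 850.0),
    Hit("hyp_2", ("HisKA", "HATPase_c", "REC"), 400.0),  # no GAF: excluded
    Hit("hyp_3", ("GAF", "CHASE", "PAS", "GAF", "HisKA", "HATPase_c", "REC"), 300.0),
    Hit("hyp_4", ("GAF", "HisKA", "HATPase_c"), 120.0),  # below 177: excluded
]
kept = [h.accession for h in hits if passes_filter(h)]
print(kept)  # ['hyp_1', 'hyp_3']
```

The third hypothetical hit mimics the multi-domain architecture reported for the Yellowstone Synechococcus isolates: it passes because the core run is present, even though it carries extra sensor domains.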

The sequences were aligned using MUSCLE (Edgar 2004). The aligned 16S rRNA and 23S rRNA sequences were trimmed and concatenated. The CikA protein alignment was manually adjusted to match the putative domains, based on the available data about the protein’s structure (Mutsuda et al. 2003). The cikA nucleotide sequences were then aligned by threading them onto the protein alignment with RevTrans v. 1.4 (Wernersson and Pedersen 2003), available online at http://www.cbs.dtu.dk/services/RevTrans/. The sequences used are listed in Tables S1–S3 (Supplementary data online).
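The idea behind RevTrans-style alignment can be sketched as follows: each aligned amino acid is replaced by its codon from the corresponding coding sequence, and each gap by a gap triplet, so that codon boundaries stay in register with the protein alignment. This is a hypothetical minimal function, not the RevTrans code; unlike RevTrans, it does not check that the codons actually translate to the aligned residues.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread a coding sequence onto its aligned protein sequence.

    Each residue consumes the next codon of `cds`; each gap becomes a gap triplet.
    Codons left over after the last residue (e.g., a stop codon) are ignored.
    Translation is not verified (a simplification of this sketch).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out, k = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            if k >= len(codons):
                raise ValueError("CDS is shorter than the ungapped protein")
            out.append(codons[k])
            k += 1
    return "".join(out)


# Hypothetical two-sequence example
protein_aln = {"seqA": "MK-L", "seqB": "M-RL"}
cds = {"seqA": "ATGAAACTGTAA", "seqB": "ATGCGTTTA"}
codon_aln = {name: back_translate(protein_aln[name], cds[name]) for name in protein_aln}
print(codon_aln)  # {'seqA': 'ATGAAA---CTG', 'seqB': 'ATG---CGTTTA'}
```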

Analysis of Variation and Phylogenetic Reconstruction

The DNA substitution model that best fitted the concatenated 16S rRNA and 23S rRNA genes was determined using the hierarchical likelihood-ratio test as implemented in ModelTest 3.0 (Posada and Crandall 1998) and HYPHY (Kosakovsky-Pond et al. 2005). Based on this test, the Tamura–Nei model of substitutions with gamma-distributed rates (Tamura and Nei 1993) and α = 0.40 was used for further phylogenetic analysis of the concatenated rRNA genes. For the phylogenetic reconstruction of the CikA homologs, two empirical amino acid replacement matrices were tested: WAG (Whelan and Goldman 2001) and LG (Le and Gascuel 2008). LG yielded a tree with a significantly better likelihood score. In the reconstruction of the species tree, the 16S rRNA and 23S rRNA genes of the proteobacterium Rhodobacter sphaeroides ATCC 17029 were used as an outgroup.
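For reference, the Tamura–Nei distance with gamma-distributed rates can be computed for a pair of sequences as below, with α = 0.40 as in the text. This sketch uses the closed-form pairwise distance (Tamura and Nei 1993), in which each −ln(x) term of the uncorrected formula is replaced by α(x^(−1/α) − 1). The analyses described here fit the model by ML over the whole tree instead, so the function only illustrates the model, not how ModelTest, HYPHY, or PAML use it. Pooling base frequencies from the two sequences and skipping gapped or ambiguous sites are assumptions of the sketch.

```python
import math


def tn93_gamma_distance(s1: str, s2: str, alpha: float) -> float:
    """Pairwise Tamura-Nei (1993) distance with gamma-distributed rates (shape alpha)."""
    valid = set("ACGT")
    # Pairwise deletion: keep only sites where both sequences have an unambiguous base
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in valid and b in valid]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Base frequencies pooled over both sequences (assumption of this sketch)
    counts = dict.fromkeys("ACGT", 0)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    piA, piC, piG, piT = (counts[x] / (2 * n) for x in "ACGT")
    piR, piY = piA + piG, piC + piT

    purines = {"A", "G"}
    P1 = sum(1 for a, b in pairs if {a, b} == {"A", "G"}) / n  # A<->G transitions
    P2 = sum(1 for a, b in pairs if {a, b} == {"C", "T"}) / n  # C<->T transitions
    Q = sum(1 for a, b in pairs if (a in purines) != (b in purines)) / n  # transversions

    def gamma_term(x: float) -> float:
        # Gamma counterpart of -ln(x); tends to -ln(x) as alpha -> infinity
        if x <= 0:
            raise ValueError("distance undefined (too many differences)")
        return alpha * (x ** (-1.0 / alpha) - 1.0)

    d = (2 * piA * piG / piR) * gamma_term(1 - piR * P1 / (2 * piA * piG) - Q / (2 * piR))
    d += (2 * piT * piC / piY) * gamma_term(1 - piY * P2 / (2 * piT * piC) - Q / (2 * piY))
    d += 2 * (piR * piY - piA * piG * piY / piR - piT * piC * piR / piY) * gamma_term(
        1 - Q / (2 * piR * piY)
    )
    return d


s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGTGCGTACCT"  # one A<->G transition, one G<->C transversion
print(round(tn93_gamma_distance(s1, s2, alpha=0.40), 4))
```

With equal base frequencies and equal transition proportions, the uncorrected form reduces to the Kimura two-parameter distance. Because the gamma term approaches −ln(x) from above as α grows, a small α such as 0.40 yields a larger distance for the same observed differences.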

The number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) was calculated using the Pamilo–Bianchi–Li method (Pamilo and Bianchi 1993; Li 1993) as implemented in MEGA 4 (Tamura et al. 2007). The number of synonymous substitutions per synonymous site could not be estimated because of saturation.

The phylogenetic tree of the CikA-like proteins (1113 sites in total) was constructed using the maximum-likelihood (ML) algorithm implemented in PHYML 3.0 (Guindon and Gascuel 2003). The phylogeny of the concatenated rRNA genes (4529 sites) was reconstructed using two approaches: the ML method described above and Bayesian relaxed-clock phylogeny as implemented in BEAST (Drummond and Rambaut 2007), with the MCMC run for 10 million generations and trees sampled every 1000 steps. The reliability of the ML tree topologies was evaluated using nonparametric bootstrap (100 replicates) and the approximate likelihood-ratio test (aLRT) (Anisimova and Gascuel 2006). Branch lengths in the species tree were then estimated by ML under a local clock, using the Tamura–Nei model with gamma-distributed rates (Tamura and Nei 1993) parameterized as specified above, as implemented in PAML v. 4.1 (Yang 2007). For the Bayesian reconstruction, the maximum clade credibility tree was inferred using TreeAnnotator v. 1.5.3, which is included in the BEAST software package.
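The nonparametric bootstrap used to assess node support can be sketched as follows. Each pseudo-replicate resamples alignment columns with replacement, a tree is inferred from each replicate, and the support for a clade is the fraction of replicate trees that contain it. The tree search itself (PHYML in this study) is not reproduced. The function names, the toy alignment, and the clade sets standing in for inferred trees are all hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """One pseudo-replicate: alignment columns resampled with replacement,
    keeping the original alignment length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    (ncol,) = lengths
    picks = [rng.randrange(ncol) for _ in range(ncol)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def clade_support(replicate_clades: list, clade: frozenset) -> float:
    """Fraction of replicate trees (each given as a set of clades, i.e. frozensets
    of taxon names) that contain `clade`."""
    return sum(clade in clades for clades in replicate_clades) / len(replicate_clades)


rng = random.Random(1)  # fixed seed for reproducibility
aln = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "GCGTTA", "t4": "GCATTA"}
replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
# Each replicate would next go through the tree search (not shown); the clade
# sets below are hypothetical stand-ins for the resulting 100 trees.
trees = [{frozenset({"t1", "t2"})}] * 80 + [{frozenset({"t1", "t3"})}] * 20
print(clade_support(trees, frozenset({"t1", "t2"})))  # 0.8
```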

Reconstruction of the Evolutionary Time Scale

The inferred 16S–23S rRNA tree was tested for global and local molecular clocks using the respective algorithms implemented in HYPHY (Kosakovsky-Pond et al. 2005). Based on the results of these tests, the local clock model was used for further analysis.
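The global clock test rests on a likelihood-ratio comparison. In the minimal sketch below, twice the log-likelihood difference between the unconstrained and clock-constrained trees is compared with a χ² distribution with n − 2 degrees of freedom for n taxa, the usual choice for a global clock on a fixed topology. The sketch assumes both log-likelihoods have already been computed (the values shown are hypothetical). The local clock test in HYPHY, which constrains only parts of the tree, is not reproduced. The χ² tail probability uses the closed forms for integer degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_i x^(i-1) / (2i-1)!!
    p = math.erfc(math.sqrt(half))
    if df > 1:
        term = total = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        for i in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * i - 1)
            total += term
        p += total
    return p


def global_clock_lrt(lnl_clock: float, lnl_free: float, n_taxa: int):
    """LR statistic and p-value for a global molecular clock; df = n_taxa - 2."""
    stat = 2.0 * (lnl_free - lnl_clock)
    return stat, chi2_sf(stat, n_taxa - 2)


# Hypothetical log-likelihoods for a 10-taxon tree
stat, p = global_clock_lrt(lnl_clock=-10250.0, lnl_free=-10230.0, n_taxa=10)
print(f"2*dlnL = {stat:.1f}, p = {p:.2e}")  # a small p rejects the global clock
```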

Two internal calibration points were used for the time estimates. The first (CP1) is based on the fossil record of the origin of cyanobacteria and was set at either ~3500 MYA (Schopf and Packer 1987; Walsh 1992; Kazmierczak and Altermann 2002; Altermann and Kazmierczak 2003; Schopf et al. 2007) or ~2600 MYA (Summons et al. 1999). The second (CP2), which refers to the appearance of heterocystic cyanobacteria, was calibrated using both molecular and geological data and was estimated at 2450–2100 MYA (Tomitani et al. 2006). We used an average value of 2200 MYA for the analysis. These computations were conducted using PAML v. 4.1 (Yang 2007). We also used Bayesian relaxed-clock estimation (Drummond and Rambaut 2007) as described above. Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals.
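How the choice of CP1 propagates into node ages can be illustrated with a deliberately simplified strict-clock calculation. A single calibration fixes the substitution rate, and every other node age is its height divided by that rate. This is not the local-clock (PAML) or relaxed-clock (BEAST) procedure used in this study, and the node names and heights are hypothetical. The sketch only shows that, under a single rate, all estimates scale in proportion to the assumed CP1 age.

```python
def strict_clock_ages(heights: dict, calib_node: str, calib_age_mya: float) -> dict:
    """Node ages (MYA) from node heights under one strict-clock rate.

    heights: distance (substitutions/site) from each node down to the tips of an
    ultrametric tree; `calib_node` is assigned the fixed age `calib_age_mya`.
    """
    rate = heights[calib_node] / calib_age_mya  # substitutions/site per million years
    return {node: h / rate for node, h in heights.items()}


# Hypothetical node heights; "CP1" is the calibrated node
heights = {"CP1": 0.80, "nodeX": 0.40, "nodeY": 0.20}
for cp1_age in (3500.0, 2600.0):
    ages = strict_clock_ages(heights, "CP1", cp1_age)
    print(cp1_age, {node: round(age) for node, age in ages.items()})
```

With these hypothetical heights, moving CP1 from 3500 to 2600 MYA moves nodeX from 1750 to 1300 MYA. The same proportional shift applies if the calibrated node is younger than the true origin of cyanobacteria.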

Modeling of the 3D Structure of the GAF Domain in the Bona Fide CikA Protein

The GAF domain and the adjacent N-terminal region are both critical for the circadian function of CikA, as they control phosphorylation of the kinase domains (Mutsuda et al. 2003). Therefore, we modeled a 3D structure of the GAF domain and mapped onto it the conserved motifs identified in this study. The 3D structure was modeled using a 180-residue majority consensus sequence derived from the alignment of this region in 20 bona fide CikA proteins. The initial model was constructed using the (PS)2 Protein Structure Prediction Server (http://ps2.life.nctu.edu.tw/) with the following options: both PSI-BLAST (Altschul et al. 1997) and IMPALA (Schaffer et al. 1999) for template search, and RAMP (http://software.compbio.washington.edu/ramp/ramp.html) for model building. The initial model was then optimized using the MolProbity server at http://molprobity.biochem.duke.edu/index.php (Lovell et al. 2003). The quality of successive versions of the model was assessed with PROCHECK (Laskowski et al. 1993), ERRAT (Colovos and Yeates 1993), and VERIFY_3D (Luthy et al. 1992), as implemented in SAVES (http://nihserver.mbi.ucla.edu/SAVES_3/), and with ProQ (Wallner and Elofsson 2003) (http://www.sbc.su.se/~bjornw/ProQ/ProQ.cgi). The model built upon a consensus of three templates, PDB ID codes 2K2N (Cornilescu et al. 2008), 2OOL (Yang et al. 2007), and 2O9C (Wagner et al. 2007), yielded the best scores.
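A majority consensus sequence like the one used as modeling input can be derived column by column. The sketch below is hypothetical and does not reproduce the actual rules used to build the 180-residue consensus. It takes the most frequent character in each column, breaks ties by first occurrence, and drops columns in which the gap character is the most frequent.

```python
from collections import Counter


def majority_consensus(alignment: list, gap: str = "-") -> str:
    """Most frequent character per column; ties go to the character seen first
    (Counter.most_common keeps first-encountered order for equal counts);
    columns whose most frequent character is a gap are dropped."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*alignment):
        residue, _count = Counter(column).most_common(1)[0]
        if residue != gap:
            consensus.append(residue)
    return "".join(consensus)


# Hypothetical toy alignment of three sequences
aln = ["MKV-L", "MRV-L", "MKIAL"]
print(majority_consensus(aln))  # MKVL
```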

Identification of Amino Acid Residues of Potential Functional Importance in the CikA Proteins

The level of conservation usually correlates with the functional importance of a particular amino acid site or sequence motif (Kimura 1983; Graur and Li 2000). If particular sites in one protein subfamily are more conserved (or fixed) than in other subfamilies, they are assumed to be more functionally important for that subfamily. We applied ConSeq (Berezin et al. 2004; available at http://conseq.tau.ac.il/index.html) to identify which of the conserved residues are of potential functional significance.
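ConSeq derives conservation scores from phylogeny-based estimates of per-site evolutionary rates, which is beyond a short example. As a simpler, swapped-in proxy for subfamily-specific conservation, the sketch below uses the Shannon entropy of each alignment column. It flags columns that are fixed in one subfamily (entropy 0) but variable in the others. The thresholds and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap characters in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("nan")
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def subfamily_specific_sites(subfamily, others, max_within=0.0, min_outside=1.0):
    """0-based columns that are fixed in `subfamily` but variable in `others`.

    Both arguments are lists of aligned sequences sharing the same columns.
    NaN (all-gap) columns never pass the comparisons and are skipped.
    """
    hits = []
    for i, (col_in, col_out) in enumerate(zip(zip(*subfamily), zip(*others))):
        if column_entropy(col_in) <= max_within and column_entropy(col_out) >= min_outside:
            hits.append(i)
    return hits


# Hypothetical toy data: three "clade A1" sequences vs. four from other clades
clade_a1 = ["DTK", "DTK", "DTK"]
other_clades = ["DSK", "ETK", "NAK", "QTK"]
print(subfamily_specific_sites(clade_a1, other_clades))  # [0, 1]
```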

Results

Architecture, Occurrence, and Phylogeny of the cikA Genes and Their Homologs

Comparison of the S. elongatus CikA against the NCBI Conserved Domain Database (Marchler-Bauer et al. 2003) suggested that the protein consists of four tandemly arrayed functional domains: a GAF domain, a histidine kinase phosphoacceptor domain (HisKA), a histidine kinase-like ATPase domain (HATPase_c), and a signal receiver domain (REC) (Fig. 1). HisKA and HATPase_c are usually considered components of a single histidine-protein kinase (HPK) unit and are referred to as the dimerization and catalytic (ATP/ADP-binding phosphotransfer) domains, respectively (Stock 1999). A BLAST search of the available complete bacterial genomes revealed several hundred cikA homologs that may be classified as two-component histidine kinases. However, most of the matches were limited to either two (HisKA and HATPase_c) or three (HisKA, HATPase_c, and REC) domains; relatively few homologs possessed the GAF domain. Both GAF and its adjacent N-terminal region are critical for autophosphorylation of the HisKA domain, and their deletion negatively affects CikA expression (Mutsuda et al. 2003). Therefore, only genes showing homology to GAF as well as to the HisKA and HATPase_c domains were selected for further analyses. In cyanobacteria, a number of the homologous genes contained other domains in addition to the originally described four-domain cikA architecture (GAF–HisKA–HATPase_c–REC). These included, but were not limited to, sensor domains (e.g., PAS and GAF), CBS, REC, and HPT. For example, the genes from the closely related thermophilic Yellowstone isolates Synechococcus sp. JA-2-3B’a(2-13) (JA-2 YP_476763) and Synechococcus sp. JA-3-3Ab (JA-3 YP_476201) (Table S1, Supplementary material online) have the domain architecture GAF–CHASE–PAS–PAS–GAF–HisKA–HATPase_c–REC–REC–HPT (Fig. S1, Supplementary material online).

Fig. 1

Domain architecture of the CikA protein with the mapped motifs of putative functional and/or structural importance. Dashed boxes represent regions not present in the CikA proteins of all species.

Both ML and Bayesian phylogenetic analyses of the CikA-like proteins yielded an identical tree topology featuring five major distinct clades with high statistical support. They are designated hereafter as A1–A5 (Fig. 2). Clade A1 includes the originally described CikA from S. elongatus PCC 7942 (Schmitz et al. 2000) and its closest homologs, which are thus presumed to have a circadian function. These proteins usually have a fairly stable four-domain architecture GAF–HisKA–HATPase_c–REC (as in CikA of S. elongatus PCC 7942), except for those of filamentous heterocystic Nostocales (genera Anabaena and Nostoc) lacking the REC domain (Fig. S1, Supplementary material online). All other CikA-like proteins are from cyanobacteria possessing the original kaiABC system, except for Gloeobacter violaceus PCC 7421, which does not have any kai genes (Nakamura et al. 2003). However, these homologs are more variable in their architecture, featuring various additional domains upstream in the N-terminal region or downstream in the C-terminal region, as described above (Fig. S1, Supplementary material online). The different domain architectures of the cikA homologs are likely indicative of their different functional assignments. Importantly, only the proteins from clade A1 (presumably the bona fide CikA proteins) occur in the genomes of all studied cyanobacterial species with the original kaiABC system. The proteins from clades A2–A5 were found only in a subset of the species presented here. These phylogenetic patterns are similar to those of other known circadian genes and their non-circadian homologs (Dvornyk 2006b).

Fig. 2

Unrooted maximum-likelihood tree of the CikA homologs in cyanobacteria. Bar, 0.1 substitutions per site. Maximum-likelihood probabilities of the node support <0.5 and bootstrap <50 are not shown. A1–A5 refer to the subfamilies of the CikA homologs. Clade A1 comprises bona fide CikA proteins. For the designations of the proteins see Supplementary data file 1

A notable feature of the cikA homologs is their high diversification in some species. The number of homologs varies considerably, from a single gene copy in G. violaceus PCC 7421, Trichodesmium erythraeum IMS101, and other species to eight copies in Acaryochloris marina MBIC11017 (Table S1, Supplementary material online).

Divergence of the cikA Homologs and cikA-Like Genes

The cikA homologs from cyanobacteria exhibit domain-specific patterns of nucleotide variation. HisKA and HATPase_c are usually the most conserved domains (Table 1). The genes from clade A1, which presumably have a circadian function, appear to be the second least polymorphic after those of clade A2 (Table 1). Despite this, the REC domain of the genes from clade A1 displays the lowest level of conservation.

Table 1 Patterns of nonsynonymous nucleotide substitutions (dN) in the different regions of the cikA genes and cikA-like homologs

The CikA proteins of clade A1 have several regions that are much more conserved than the corresponding regions in the other clades. These regions may be of particular importance for circadian function (Figs. 1, S2, Supplementary material online). Specifically, one of these conserved motifs corresponds to the part of the N-terminal region immediately preceding the GAF domain (Fig. 3). The N-terminal region upstream of the GAF domain was previously shown to enhance phosphorylation of the HisKA domain (Mutsuda et al. 2003); however, no specific fragment of this region was identified as a major contributor to this function. The analysis of variation suggests that this fragment (motif 1) corresponds to amino acid residues 168–183 (numbering refers to positions in the CikA protein of S. elongatus PCC 7942). Notably, in CikA of Lyngbya sp. PCC 8106 and Arthrospira maxima CS-328, the whole N-terminal region preceding the GAF domain consists only of this motif (Fig. 3), which further supports its functional significance. Another conserved region, motif 2, is located near motif 1 and includes residues 199–210 (Fig. 3).
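Locating conserved segments such as motifs 1–3 amounts to finding runs of consecutive low-variability columns and translating alignment columns into residue numbers of the reference sequence (here, S. elongatus PCC 7942). The sketch below is a hypothetical illustration only. The per-column variability scores (e.g., entropies), the threshold, and the minimum run length are assumptions, not the criteria used in this study.

```python
def conserved_runs(scores, threshold, min_len):
    """Maximal runs of consecutive columns with score <= threshold.

    Returns (start, end) pairs, 0-based and inclusive, of runs at least min_len long.
    """
    runs, start = [], None
    for i, s in enumerate(list(scores) + [float("inf")]):  # sentinel closes a final run
        if s <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


def column_to_residue(reference_aligned, gap="-"):
    """1-based residue number of the reference at each alignment column (None at gaps)."""
    mapping, pos = [], 0
    for c in reference_aligned:
        if c == gap:
            mapping.append(None)
        else:
            pos += 1
            mapping.append(pos)
    return mapping


# Hypothetical per-column variability scores and aligned reference sequence
scores = [0.9, 0.1, 0.0, 0.2, 0.8, 0.0, 0.0, 0.1, 0.0, 0.7]
reference = "MKTL-AVGRS"
numbering = column_to_residue(reference)
for start, end in conserved_runs(scores, threshold=0.2, min_len=3):
    # A run starting or ending at a reference gap would map to None; not handled here
    print(f"columns {start}-{end}: residues {numbering[start]}-{numbering[end]}")
```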

Fig. 3

Alignment of the GAF domains of the cyanobacterial CikA proteins. Motifs 1, 2, and 3 are underlined. Block arrow indicates the putative cysteine ligand. Black and gray-shaded backgrounds indicate different degrees of conservation (black is the most conserved). The upper numbers indicate positions in the alignment of the full sequences; the numbers on the right indicate positions in the respective sequences. Visualized using Genedoc (Nicholas and Nicholas 1997).

The C-terminal region of the GAF domain exhibits a much lower level of variation than the rest of the domain. Within this region, a highly conserved segment of 10 residues (pos. 309–318, motif 3) was identified in the CikA proteins (Fig. 3). A search against functional motif databases returned no apparent homologs of this motif, and its function remains undetermined.

In the cikA genes of clade A1, the highly conserved segments in the HisKA and HATPase_c domains are located near previously identified functionally important motifs (Schmitz et al. 2000; Mutsuda et al. 2003). For example, motif 4 (pos. 578–587) is immediately adjacent to the G–X–G motif (pos. 574–576) in the G box (Fig. S2, Supplementary material online). G–X–G motifs are critical for ATP binding and are located in the loops that shape the top and bottom of the binding pocket (Obermann et al. 1998). The segment between these motifs is more conserved in clade A1 than in the other clades, suggesting that the tertiary structure of the pocket is particularly important for the circadian function of the HATPase_c domain and of CikA as a whole. Notably, the patterns of variation within the pocket suggest that its tertiary structure differs among clades.

The REC domain does not occur in some members of clade A1: it is missing in heterocystic filamentous Nostocaceae. Unlike the other three domains, REC is most polymorphic in clade A1, including at positions 678 (Asp), 708 (Thr), and 727 (Lys), which are functionally important and fixed in clades A2 and A3. The conserved Asp residue functions as the phosphoryl acceptor, while the conserved Lys is essential for the β5α5 loop and for dimer formation upon phosphorylation (Solá et al. 1999). Likewise, the intermolecular recognition site immediately following the phosphorylated residue (Müller-Dieckmann et al. 1999) is conserved in the other clades but polymorphic in A1.

The Model of the CikA GAF Domain

The quality assessment parameters of the constructed 3D model are presented in Table 2; they indicate that the predicted model is of good quality. The model largely follows the experimentally determined structures of bacterial PAS-related domains (Wagner et al. 2007; Yang et al. 2007; Cornilescu et al. 2008): it consists of five antiparallel β-strands located between two groups of α-helices (see Fig. S3, Supplementary material online). The β-strands shape the bottom of the bilin-binding pocket that is characteristic of GAF domains. The conserved motifs 1, 2, and 3 correspond to helix α1 and strands β1 and β5, respectively.

Table 2 Quality assessment parameters for the constructed 3D structure of the GAF domain

Evolutionary Constraints Associated with Functional Specialization of the CikA Proteins

The results of the ConSeq analysis (Fig. S3, Supplementary material online) suggest that the conserved motifs identified in the CikA proteins may be functionally and/or structurally important. Motif 1 seems to be primarily functional: most of its residues are exposed, and seven of them are predicted to be functionally important (Fig. S4, Supplementary material online). Motif 3 is mainly of structural significance, as all of its residues but one are buried. Motifs 2 and 4 appear to be important both functionally and structurally, as they contain buried and exposed residues in about equal proportions. Furthermore, motif 4 is located near the ATP-binding pocket (Fig. 1, see above), which suggests that the histidine in this position is critical for CikA phosphorylation.

Evolutionary Time Estimates

We based our reconstruction of the time scale for the key events in the evolution of CikA and the circadian system on the following facts (see Figs. 2 and 4): (1) bona fide CikA is present only in group S1; (2) the REC domain is missing in filamentous heterocystic Nostocaceae; (3) KaiA is missing in clade S3 (Dvornyk et al. 2003). These data constrain the origin of CikA to the period between nodes 1 and 2, the loss of CikA in S2 and S3 to the period between nodes 3 and 5, the loss of the REC domain to the period between CP2 and node 4, and the loss of KaiA to the period between nodes 6 and 7. Table 3 presents the resulting time estimates for the respective nodes based on the ML and Bayesian analyses. These estimates are, however, likely biased toward higher values, because CP1 was placed at a node that is apparently not basal for cyanobacteria: as yet unknown cyanobacterial species evolutionarily older than Gloeobacter may exist.

Fig. 4

Maximum-likelihood tree with local clock of the concatenated 16S rRNA and 23S rRNA genes from cyanobacteria. Bar, 1 substitution per site. ML and posterior probabilities of node support are shown in the numerator; bootstrap proportions are shown in the denominator. ML values <0.5 and bootstrap values <50 are not shown. ML and posterior probability values equal to 1.00 are shown without decimals. S1–S4 refer to groups of species with different compositions of the circadian system. Species with bona fide CikA are boxed; those belonging to Nostocales and lacking the REC domain are shown in bold. Black boxes indicate the calibration points. Black circles and numbers mark nodes that correspond to key events in the evolution of CikA and the circadian system: the CikA origin (between nodes 1 and 2), the CikA loss (between nodes 3 and 5), the REC domain loss (between CP2 and node 4), and the KaiA loss (between nodes 6 and 7). See text for details.

Table 3 Maximum-likelihood and Bayesian time estimates, MYA, for the nodes (Fig. 4) corresponding to the major events in evolution of CikA and the circadian system

Discussion

CikA was identified as an important input component in the kaiABC system of S. elongatus PCC 7942 (Schmitz et al. 2000) and was hypothesized to transfer a signal to the central oscillator (presumably KaiA) through an as yet unknown response regulator (Mutsuda et al. 2003; Zhang et al. 2006). However, as the results of this study show, neither the cikA gene nor its apparent GAF-containing homologs occur in the cyanobacteria of clades S2 and S3 (Fig. 4). The species of clade S2 contain the kaiA gene, while those of clade S3 lack it. This leads to two conclusions. First, cikA was lost before kaiA. Second, if the species from S2 or S3 possess circadian rhythmicity, they must use a different molecular mechanism for signal input. It was hypothesized earlier that information about the light signal may be delivered to the central oscillator not through a photoreceptor but indirectly, by sensing the redox state of the cell (Ivleva et al. 2006) through another input element such as LdpA (Ivleva et al. 2005). In contrast to CikA, LdpA occurs in all known cyanobacteria (Dvornyk 2005; Dvornyk, unpublished). This suggests that indirect transfer of the light signal might have become the primary mechanism of circadian input after the loss of cikA. Recent biochemical studies support this hypothesis (Holtzendorff et al. 2008; Axmann et al. 2009). On the other hand, such an indirect signal transfer mechanism might have existed before the origin of cikA.

The circadian system of the species from clade S2 (Fig. 4) is of particular interest because it shares structural features of both systems: it has kaiA, which makes it similar to the kaiABC system, but lacks cikA, as does the kaiBC system. In addition, other features of this circadian system (e.g., the evolutionary history of the cpmA gene, which regulates the circadian output) place it closer to the kaiBC type (Dvornyk et al. 2003; Dvornyk et al. 2004; Dvornyk and Knudsen 2005; Dvornyk 2006b). If the kaiABC Δ system of clade S2 is functionally transitional between the kaiABC and kaiBC systems, as it is structurally, then studying it may provide important information about the functional evolution of the original kaiABC system into the kaiBC system.

In the original study of CikA in S. elongatus PCC 7942, this protein was classified as an atypical bacteriophytochrome because it lacks the conserved Cys residue at position 285 (Schmitz et al. 2000) that normally serves as the bilin ligand in phytochromes (Li and Lagarias 1992). It was therefore suggested that this particular CikA may interact with another protein (possibly one possessing a GAF domain) that substitutes for the bilin acceptor (Mutsuda et al. 2003). However, given that most CikA proteins in clade A1 do possess this critical cysteine residue, the proposed mechanism may represent only one of several possible variants. Recent findings by Narikawa et al. (2008) support this view, showing that GAF domains retaining the conserved C285 residue may function as violet light sensors. This, in turn, may explain the functional role of motifs 2 and 3 (Fig. 3), which would ensure the proper 3D configuration of the bilin-binding pocket (Fig. S3, Supplementary material online).

Recent studies of CikA in S. elongatus PCC 7942 proposed positive regulation of CikA phosphorylation by the GAF domain and negative regulation by the REC domain (Mutsuda et al. 2003; Zhang et al. 2006). Autophosphorylation is essential for the circadian function of CikA (Mutsuda et al. 2003). This process involves all three principal domains of the protein (GAF as a positive regulator, HisKA as the phosphoacceptor, and HATPase_c as the ATP binder) and therefore implies close inter-domain interaction, which in turn requires an appropriate tertiary structure of the CikA protein. On the other hand, because CikA belongs to the superfamily of sensor kinases of bacterial two-component signal transduction systems, it is expected to transfer a phosphoryl group to a specific, as yet unidentified, response regulator(s) (Schmitz et al. 2000; Mutsuda et al. 2003). This putative function requires structural complementarity between CikA and the response regulator. The highly conserved motifs identified in this study (Fig. 3) are probably critical for maintaining the tertiary structure of CikA, ensuring these physical interactions both between the domains and between each domain and the response regulator.

Recently, the pseudo-receiver REC domain was shown to be necessary for the circadian function of CikA in S. elongatus PCC 7942: it entrains the clock by sensing the redox state of cellular quinones (Ivleva et al. 2006) and represses the kinase activity of the protein (Mutsuda et al. 2003; Zhang et al. 2006). However, REC is more polymorphic in clade A1 than in the other clades (Table 1) and is even missing in some cyanobacteria (e.g., Nostocales). According to the principle of functional constraint in molecular evolution, the functional importance of a protein or domain correlates directly with its level of conservation (Kimura 1983; Graur and Li 2000). Thus, our results raise several questions about the functional significance of REC. Specifically, how is the autokinase activity of REC-deficient CikA controlled, and how is quinone sensing conducted? One possibility is that in some cyanobacterial species these REC-associated functions are performed by an unidentified response regulator protein.

Interestingly, two components of the cyanobacterial circadian system, CikA and SasA, which control the input and output pathways of the kaiABC system, respectively, are autophosphorylated in vitro and have a domain structure (sensor domain–HisKA–HATPase_c) similar to that common in two-component sensory transduction histidine kinases (Dutta et al. 1999; Iwasaki et al. 2000; Mutsuda et al. 2003). This similar domain organization suggests that both proteins may have originated through a common evolutionary mechanism: fusion of a sensor domain with a double-domain two-component sensory transduction histidine kinase. Two-component sensory transduction histidine kinases form a large superfamily widely distributed in prokaryotes (Nagaya et al. 1993). Proteins of this superfamily display diverse domain organizations (Dutta et al. 1999) that apparently result from multiple gene fusion events. Recent findings suggest that gene fusions have played a major role in the evolution of protein domain architectures (Yanai et al. 2002). Such fusions were probably quite common in the evolution of the CikA-like proteins: while the core domains, GAF, HisKA, and HATPase_c, exhibit relatively high similarity among the members of clades A2–A5, the domain organization of these proteins varies greatly (Fig. 2, Fig. S1, Supplementary material online).

Taken together, the data of this study and previous functional studies of cikA (Schmitz et al. 2000; Mutsuda et al. 2003; Zhang et al. 2006; Ivleva et al. 2006) provide evidence that various evolutionary mechanisms have resulted in the functional specialization of cikA as a circadian gene. After duplication of the ancestral two-component histidine kinase, cikA underwent neofunctionalization through the accretion of specific domains (GAF and REC) while maintaining conservation at functionally important sites and of its domain architecture. While the CikA-like proteins from clades A2–A5 underwent significant diversification of domain organization, the original CikA protein maintained its high level of conservation (Fig. S1, Supplementary material online). The acquired circadian function was then maintained by strong purifying selection in the regions conferring this function. This is a common pattern for circadian genes, which are usually more conserved than their non-circadian paralogs (Dvornyk et al. 2004; Dvornyk 2006b).

The results of the present study make it possible to roughly estimate the probable time of origin of the cikA gene and the main events in its evolution. The following time estimates are approximate, especially for the very early stages of this evolution, and will be updated and corrected as additional genomic data accumulate. According to the ML estimates, and depending on the accepted date of the appearance of cyanobacteria, the lower time limit of the cikA origin is about 2900 MYA and the upper limit about 2200 MYA (Table 3). The gene was lost in groups S2 and S3 between 1050 and 600 MYA. The broad range of this estimate is due to the absence of data about the circadian system in cyanobacterial species phylogenetically positioned between S. elongatus PCC 7942 and Synechococcus sp. RCC307. The loss of the REC domain in filamentous Nostocales seems to have occurred before the loss of cikA, approximately 2200–900 MYA (Fig. 4; Table 3). The Bayesian analysis yielded estimates similar to the above for all nodes except nodes 3 and 5 (Table 3); for these nodes, the Bayesian estimates push the date of the cikA loss back almost twofold, to between 1900 and 1000 MYA. This large difference between the ML and Bayesian estimates is likely not due to phylogenetic uncertainty, because both methods produced the same tree topology (Fig. 4). However, recent studies suggest that ML analysis tends to give more accurate estimates of branch lengths and, consequently, of divergence times (Schwartz and Mueller 2010).

The kaiABC system was initially thought to have evolved from the kaiBC system through the addition of kaiA, which presumably originated about 1000 MYA (Dvornyk et al. 2003). A more recent analysis of newly available, more extensive genomic data suggests a different scenario: the current kaiBC system evolved stepwise from the kaiABC system during 1050–400 MYA through the loss of kaiA and other components, including cikA (Dvornyk 2006a; Dvornyk 2009). These losses appear to have been associated with the origin and rapid radiation of cyanobacteria in clades S2 and S3 (Fig. 4). Interestingly, the time limits of this radiation (between nodes 5 and 6, i.e., about 600–500 MYA; Fig. 4 and Table 3) coincide with the period of the well-known Cambrian explosion. Furthermore, according to the results of this study, kaiA was lost between 500 and 400 MYA (Table 3), which corresponds to the hypothesized upper time limit of the last of the three periods proposed to describe the role of UV radiation in the evolution of cyanobacteria (Garcia-Pichel 1998). This period is thought to have lasted from about 1000 to 400 MYA and to have been associated with the rise in atmospheric oxygen and the formation of the Earth’s ozone shield (Canfield and Teske 1996; Garcia-Pichel 1998). It was recently suggested that the loss of kaiA, and the associated decrease in the robustness of the circadian oscillator, resulted from the adaptation of Prochlorococcus to a temperature-stable ecological niche that does not require a robust oscillator (Holtzendorff et al. 2008; Axmann et al. 2009). This scenario may not fully explain the observed phylogenetic patterns, however, since S. elongatus PCC 7942 possesses kaiA yet often occurs with Prochlorococcus in the same ecological niches (Partensky et al. 1999). Further studies may help to identify the factors that triggered the large-scale reduction of the Prochlorococcus genome (Dufresne et al. 2003), including the loss of several circadian genes.

An intriguing finding of our study is that none of the species from clades S2 and S3 has any GAF-containing genes. Such genes must have been present in the genome of the common ancestor of S. elongatus PCC 7942 and Synechococcus sp. RCC307 at the time point corresponding to node 3 (Fig. 4). For reasons yet unknown, however, all GAF-containing genes were lost in the lineage leading to Synechococcus sp. RCC307. This loss could have occurred at any point between nodes 3 and 5; a more precise estimate will only be possible once more genomic data for this lineage become available. The loss of cikA in some cyanobacterial lineages suggests that the evolution of this gene and its homologs follows the birth-and-death model (Nei and Rooney 2005).

The results of the 16S-23S rRNA phylogenetic analysis also provide some insight into the molecular systematics of cyanobacteria. Specifically, assigning the species from clades S2 and S4 to the same genus, Synechococcus, seems unjustified: they are phylogenetically quite distant from each other as well as from S. elongatus PCC 7942. The polyphyly of the genus Synechococcus has been discussed comprehensively elsewhere (Urbach et al. 1998; Honda et al. 1999; Robertson et al. 2001); in total, up to eight Synechococcus lineages are recognized on the basis of analyses of 16S rRNA and other genes (Honda et al. 1999; Robertson et al. 2001). This topic, however, warrants a separate, more comprehensive investigation involving a much larger number of the available strains.

Previous evolutionary studies have consistently shown that the elements of the circadian system usually vary less than their apparent non-circadian homologs, and that this variation is specific to each type of the system and maintained by strong purifying selection (Dvornyk et al. 2004; Dvornyk 2005). While the core genes, kaiB and kaiC, are shared among the different types of the circadian system, the input and output signaling pathways differ significantly, imposing distinct functional and selective constraints on each type. The kaiABC system originally described in S. elongatus PCC 7942 appears to be the evolutionarily oldest of the three types; however, it remains unclear what events about 1050–600 MYA and about 400 MYA led to the loss of cikA and kaiA, respectively. Further comparative evolutionary and functional studies of all types of the cyanobacterial circadian system are needed to reveal the specific molecular mechanisms that arose and were employed during the evolution of this system toward functional specialization.