Introduction

Light is the most important signal able to entrain endogenous rhythms, and central oscillators play a key role in controlling daily biochemical, physiological, and behavioral rhythms (Reppert and Weaver 2001). Mammalian species possess a central oscillator, known as the circadian clock, which is located in the suprachiasmatic nuclei of the anterior hypothalamus (Vatine et al. 2009). In contrast, experimental evidence has failed to identify a central control mechanism in non-mammalian vertebrates and instead indicates that they possess more distributed local oscillators. In this group, the pineal gland appears to play a central role in the endogenous response to light and in melatonin secretion as early as 19 h post-fertilization (Danilova et al. 2004).

The circadian clock system plays a fundamental role in anticipating the day/night cycle and coordinating the molecular oscillator that generates an endogenous rhythm that is close to 24 h. Thus, the photoperiod regulates gene expression through the central clock, generating feedback and ultimately producing an endogenous rhythm (Devlin 2002). Genes involved in circadian rhythms have been identified through genetic screening in a variety of organisms and have been classified into six groups. The period circadian clock gene family (PER) is composed of three genes (PER1, PER2, and PER3), which are located on three separate chromosomes and perform critical functions in regulating the circadian rhythms of locomotor activity, metabolism, and behavior. The cryptochrome circadian clock genes (CRY1 and CRY2) are also located on separate chromosomes; these genes encode flavin adenine dinucleotide-binding proteins that belong to the circadian core oscillator complex. The CLOCK/NPAS2 genes encode transcription factors that bind to the upstream regions of PER and CRY genes, activating their transcription. The ARNTL/ARNTL2 genes are also located on separate chromosomes and encode basic helix-loop-helix transcription factors that form a heterodimer with CLOCK and then bind enhancer elements upstream of the PER and CRY genes. The casein kinase 1 epsilon (CSNK1E) and delta (CSNK1D) genes encode serine/threonine protein kinases involved in the regulation of circadian rhythms through the phosphorylation of members of the PER gene family. Finally, the timeless gene is another key component of the circadian clock; it interacts with the PER genes and others to downregulate the activation of PER1 by CLOCK/ARNTL.

Although the genetic regulation of circadian rhythms appears to be similar in fish, insects, and mammals (Dunlap 1999), the numbers of gene copies involved in this process vary among these groups. For example, rodents possess a more complex repertoire that is characterized by three PER genes, two CRY genes, two CLOCK genes, two ARNTL genes, two CSNK1 genes, and one timeless gene, whereas species of the genus Drosophila possess only one gene copy per family (Reppert and Weaver 2001). Furthermore, zebrafish (Danio rerio) exhibits an increased number of genes (Fig. 1) compared with that of rodents. These differences have been linked to the genome duplication that occurred in the last common ancestor of teleost fish between 320 and 400 mya (Amores et al. 1998, 2011; Postlethwait et al. 2000; Taylor et al. 2001, 2003; Van de Peer et al. 2003; Hoegg et al. 2004; Jaillon et al. 2004; Meyer and Van de Peer 2005; Kasahara et al. 2007; Sato and Nishida 2010).

Fig. 1

Genomic structure of the gene families associated with the circadian clock of teleost fish

This teleost-specific genome duplication (TGD) is thought to have provided abundant raw genetic material to allow biological innovations that facilitated the radiation of the group (Amores et al. 2004, 2011; Postlethwait et al. 2000; Taylor et al. 2001, 2003; Hoegg et al. 2004; Jaillon et al. 2004; Kasahara et al. 2007; Sato and Nishida 2010; Opazo et al. 2013) into a wide variety of lineages inhabiting both marine and freshwater environments, making teleosts the most diverse vertebrate group (Meyer and Van de Peer 2005; Nelson 2006). This conclusion is supported by the many studies that have shown a link between new genes that originated during the TGD and the evolution of new functions in a variety of teleost lineages (Lister et al. 2001; Mulley et al. 2006; Hashiguchi and Nishida 2007; Hoegg and Meyer 2007; Sato and Nishida 2007; Siegel et al. 2007; Yu et al. 2007; Douard et al. 2008; Arnegard et al. 2010; Opazo et al. 2013).

Most of the studies on the genetic structure of the circadian clock in teleost fish have targeted individual genes (Wang 2008a, b, 2009), but none has taken a more holistic, multi-gene approach to understand the evolution of the complete genetic machinery associated with the regulation of endogenous rhythms. Accordingly, the goal of this study was to assess the relative contributions of whole-genome duplication and small-scale gene duplication in generating the diversity within the genetic machinery associated with the regulation of endogenous rhythms. To achieve this goal, we annotated genes from six gene families associated with the circadian clock in eight teleost fish species with available genomes and reconstructed their evolutionary history by inferring phylogenetic relationships. The results of our comparative analysis indicated that teleost species possess a variable repertoire of genes associated with the circadian clock gene families and that the present-day diversity of these genes has been shaped by a variety of phenomena, such as the complete deletion of ohnologs, differential retention of genes, and lineage-specific gene duplications. From a functional perspective, the subfunctionalization of two ohnolog genes (PER1a and PER1b) in zebrafish highlights the power of whole-genome duplication in the generation of biological diversity.

Materials and Methods

DNA Sequence Data

We used bioinformatics tools to manually annotate the complement of genes in the genomes of eight teleost fish species available from the Ensembl database (Supplementary Table 1) (zebrafish, Danio rerio; Atlantic cod, Gadus morhua; fugu, Takifugu rubripes; stickleback, Gasterosteus aculeatus; medaka, Oryzias latipes; platyfish, Xiphophorus maculatus; Nile tilapia, Oreochromis niloticus; and Tetraodon, Tetraodon nigroviridis). When available, sequences from the spotted gar (Lepisosteus oculatus), a non-teleost ray-finned fish, were included as well. We also included sequences from five tetrapod species as an outgroup for phylogenetic analyses (western clawed frog, Xenopus tropicalis; anole lizard, Anolis carolinensis; chicken, Gallus gallus; mouse, Mus musculus; and human, Homo sapiens).

We compared the annotated exon sequences obtained from public databases (e.g., GenBank) to unannotated genomic fragments (Supplementary Table 1) obtained from Ensembl using the program Blast2seq (Tatusova and Madden 1999), which is available from NCBI (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). An intact open reading frame with the canonical structure typical of vertebrate genes characterized the putative functional genes, whereas pseudogenes were identifiable based on their high sequence similarity to functional orthologs and the presence of inactivating mutations and/or a lack of exons. To distinguish among tandemly arrayed gene copies, we indexed each gene copy with the symbol T, followed by a number corresponding to the linkage order in the 5′ to 3′ orientation; thus, the first gene in the cluster is labeled T1, the second T2, and so forth.
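The indexing of tandem copies described above can be illustrated with a minimal Python sketch. It assumes that each copy is represented by a hypothetical record holding a family name and a start coordinate on the forward strand; the record type, the function name, and the coordinates are illustrative assumptions and do not come from the annotation itself.

```python
from typing import NamedTuple


class GeneCopy(NamedTuple):
    """Hypothetical record for one annotated copy in a tandem cluster."""
    family_name: str  # e.g. "CRY3"
    start: int        # illustrative start coordinate, not a real position


def label_tandem_copies(copies: list[GeneCopy]) -> list[str]:
    """Label copies T1, T2, ... following their 5'-to-3' linkage order."""
    # Sorting by start coordinate stands in for the linkage order.
    ordered = sorted(copies, key=lambda copy: copy.start)
    return [
        f"{copy.family_name}-T{index}"
        for index, copy in enumerate(ordered, start=1)
    ]


# Illustrative coordinates only, listed out of order on purpose.
cluster = [GeneCopy("CRY3", 52_000), GeneCopy("CRY3", 18_500)]
labels = label_tandem_copies(cluster)
```

Here the copy that lies first in the 5′ to 3′ direction receives the T1 suffix regardless of the order in which the copies were recorded.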

Phylogenetic Inference

We estimated phylogenetic relationships separately for each gene family in the teleost fish. We performed maximum likelihood and Bayesian analyses, as implemented in the programs Treefinder, March 2011 version (Jobb et al. 2004), and MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003), respectively. We also reconstructed the phylogenetic relationships using maximum likelihood as implemented in the program CodonPhyML (Gil et al. 2013). This approach employs a more realistic description of the evolutionary process at the protein-coding sequence level by incorporating the structure of the genetic code into the model. Sequence alignments were carried out using the G-INS-i strategy in MAFFT v6.8 (Katoh et al. 2009). The best fitting models for each codon position were estimated separately using the "propose model" routine from the program Treefinder, March 2011 version (Jobb et al. 2004). In the maximum likelihood analysis, we estimated the best tree under the selected models and assessed support for each of the nodes with 1,000 bootstrap pseudoreplicates. In the Bayesian analysis, two simultaneous independent runs were performed for 30 × 10^6 iterations of a Markov Chain Monte Carlo algorithm, with six simultaneous chains sampling trees every 1,000 generations. Support for the nodes and parameter estimates was derived from a majority rule consensus of the last 15,000 trees sampled after convergence. The average standard deviation of split frequencies remained at 0.01 after the burn-in threshold. For the maximum likelihood approach implemented in the program CodonPhyML (Gil et al. 2013), we employed the codon model described by Goldman and Yang (1994) with a subtree pruning and regrafting (SPR) heuristic search from 5 random starting trees to reconstruct the phylogenetic relationships. Support for the nodes was assessed using the aBayes method (Anisimova et al. 2011).
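The Bayesian sampling scheme implies a simple piece of arithmetic, sketched below in Python. The sketch reads the run length as 30 × 10^6 generations and assumes that the tree counts apply to each run separately and that the starting tree is not counted; the function name is hypothetical and this is not part of the MrBayes workflow itself.

```python
def mcmc_sampling_summary(generations: int,
                          sample_every: int,
                          trees_in_consensus: int) -> tuple[int, int, float]:
    """Return (trees sampled, trees discarded, burn-in fraction) for one run.

    Assumption: the initial state is not counted as a sampled tree.
    """
    total_sampled = generations // sample_every
    discarded = total_sampled - trees_in_consensus
    return total_sampled, discarded, discarded / total_sampled


# Values taken from the text: 30 x 10^6 generations, sampling every
# 1,000 generations, consensus built from the last 15,000 trees.
total, burnin, fraction = mcmc_sampling_summary(30 * 10**6, 1_000, 15_000)
```

Under these assumptions, each run yields 30,000 sampled trees, so keeping the last 15,000 corresponds to discarding the first half of the samples as burn-in.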

Results

For each species, we identified and characterized all of the structural genes belonging to the circadian clock gene families (Fig. 1). We observed that most of the putative functional genes fell into separate clades that did not deviate significantly from the expected organismal phylogenies. In some cases, we observed differential retention of genes that originated in the last common ancestor of the group, whereas in others, lineage-specific duplications and the deletion of complete sets of ohnologs were observed (Fig. 1).

Phylogenetic reconstruction of the PER gene family recovered three main groups (Fig. 2). Within each group, the monophyly of the teleost ohnologs was well supported, as were the sister group relationships among them (Fig. 2). When sequences from the spotted gar were included (e.g., PER1 and PER3), they were recovered as a sister group to the clade containing teleost ohnologs (Fig. 2). The clade that included the tetrapod sequences was recovered as a sister group to the teleost fish clade (Fig. 2). We note that the PER3b gene was lost in all of the examined species, and differential retention of PER1a was observed in zebrafish (Fig. 2).

Fig. 2

Maximum likelihood phylogenetic trees depicting the relationships of the period circadian clock gene family (PER; left), the cryptochrome circadian clock gene family (CRY; center), and the aryl hydrocarbon receptor nuclear translocator-like gene family (ARNTL; right). Values on the relevant nodes denote the bootstrap support values, Bayesian posterior probabilities, and aBayes posterior probabilities

For the genes associated with the CRY gene family, our tree recovered the sister group relationship between the CRY1a and CRY1b clades with strong support (Fig. 2). The latter gene was independently lost in Atlantic cod and in the last common ancestor of fugu and Tetraodon (Fig. 1). Additionally, the CRY1a gene was independently duplicated in fugu (Figs. 1, 2). The monophyly of the CRY2a gene was strongly supported (Fig. 2); this gene was independently lost in fugu and zebrafish (Fig. 1) and was duplicated in Tetraodon (Figs. 1, 2). The ohnolog CRY2b was not found in any of the teleost fish species analyzed, and we assume that it was lost in the last common ancestor of the group (Fig. 1). In zebrafish, we found two additional genes, CRY3-T1 and CRY3-T2, that were recovered as a sister group to the CRY2 clade (which included tetrapods). We believe that this lineage represents an old gene that originated in the common ancestor of tetrapods and fish and was only retained in zebrafish, where it was later duplicated (Figs. 1, 2).

For the ARNTL/ARNTL2 genes, we recovered a clade containing the ARNTLa genes of all teleost species as a sister to the ARNTLb gene of zebrafish (Fig. 2), which was the only species that retained this gene (Fig. 1). The single-copy ARNTL gene of the spotted gar was recovered as a sister to this clade (Fig. 2). We also recovered the sister group relationship of the ARNTL2a and ARNTL2b clades (Fig. 2). The first gene was lost in the stickleback, whereas the second was independently lost in zebrafish and Atlantic cod (Fig. 1).

For the casein kinase 1 epsilon (CSNK1E) and delta (CSNK1D) genes, our phylogenetic analysis indicated a sister group relationship between the CSNK1Ea and CSNK1Eb clades, which was in turn recovered as a sister to the CSNK1E clade of tetrapods (Fig. 3). A similar pattern was found for the CSNK1Da and CSNK1Db genes (Fig. 3). In zebrafish, we identified an additional gene, CSNK1Dc, that was recovered as a sister group of the CSNK1D clade of teleost fish. It can be assumed that CSNK1Dc is an ancient gene that originated in the common ancestor of teleost fish and was only retained in zebrafish (Figs. 1, 3). We also observed examples of the differential retention of genes (Fig. 3), where genes that originated in the last common ancestor of teleost fish were later lost in certain species (Fig. 3). For example, CSNK1Db was independently lost in the last common ancestor of fugu and Tetraodon and in the common ancestor of medaka and Nile tilapia (Fig. 1).

Fig. 3

Maximum likelihood phylogenetic trees depicting the relationships of the casein kinase 1 genes (CSNK1; left), clock circadian regulator/neuronal PAS domain protein 2 genes (CLOCK/NPAS2; center), and timeless circadian clock genes (right). Values on the relevant nodes denote the bootstrap support values, Bayesian posterior probabilities, and aBayes posterior probabilities

In the case of the CLOCK/NPAS2 gene family, we recovered the sister group relationship between the CLOCKa and CLOCKb clades of teleost fish (Fig. 3). The single-copy CLOCK gene of the spotted gar was recovered as a sister to this clade (Fig. 3). The CLOCKa gene was independently lost in Atlantic cod, stickleback, and medaka (Fig. 1), whereas the CLOCKb gene was only lost in platyfish. Among the NPAS2 genes, we only identified orthologous genes corresponding to the NPAS2a gene, and these genes were found in all species of teleost fish except for the Nile tilapia (Fig. 1).

Finally, the evolutionary pattern of the timeless gene has been simple, as we only identified a single-copy gene, timeless-a, in all teleost species examined. All of these sequences fell within a well-supported clade (Fig. 3), which was in turn recovered as a sister to the single-copy gene identified in the spotted gar (Fig. 3).

Discussion

Our findings regarding the circadian clock gene families of teleost species are consistent with the TGD that occurred in the stem lineage of this group (Amores et al. 1998, 2011; Postlethwait et al. 2000; Taylor et al. 2001, 2003; Van de Peer et al. 2003; Hoegg et al. 2004; Jaillon et al. 2004; Meyer and Van de Peer 2005; Kasahara et al. 2007; Sato and Nishida 2010). The presence of ohnologous clades and their phylogenetic arrangement as sister groups in all of the examined species are distinctive features of genomes that have experienced an extra round of genomic duplication (Christoffels et al. 2004; Brunet et al. 2006; Amores et al. 2011). Furthermore, the phylogenetic position of the spotted gar, a non-teleost ray-finned fish that was not affected by the TGD (Figs. 2, 3), as a sister to the clade containing a pair of ohnologous clades, is also consistent with the TGD. However, given that the extra round of whole-genome duplication occurred between 320 and 400 mya, lineage-specific dynamics following the TGD would have been responsible for the gene repertoire found in extant species (Fig. 1). In teleost fish species, the present-day gene diversity would have been shaped by a variety of phenomena, such as the complete deletion of ohnologs (e.g., PER3b, NPAS2b, and timeless-b), the differential retention of genes (e.g., CRY1b, ARNTL2b, and CSNK1Ea), and lineage-specific gene duplications (e.g., CRY1a-T2, CLOCKa-T2).
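The distinction between the complete deletion of an ohnolog and its differential retention can be made explicit with a minimal Python sketch. The function name and the category labels are assumptions introduced for illustration; the loss patterns are transcribed from the Results text (PER3b absent from all examined species, CRY1b absent from Atlantic cod, fugu, and Tetraodon, and timeless-a present in all species) rather than read from Fig. 1.

```python
# The eight teleost species surveyed in this study.
SPECIES = frozenset({
    "zebrafish", "Atlantic cod", "fugu", "stickleback",
    "medaka", "platyfish", "Nile tilapia", "Tetraodon",
})


def classify_retention(lost_in: set[str]) -> str:
    """Classify an ohnolog by the set of surveyed species lacking it."""
    if lost_in >= SPECIES:
        return "complete deletion"
    if not lost_in:
        return "retained in all species"
    return "differential retention"


# Loss patterns as stated in the Results text (illustrative subset).
losses = {
    "PER3b": set(SPECIES),
    "CRY1b": {"Atlantic cod", "fugu", "Tetraodon"},
    "timeless-a": set(),
}
fates = {gene: classify_retention(lost) for gene, lost in losses.items()}
```

Lineage-specific duplications are not covered by this sketch, because they concern copy number within a species rather than presence or absence across species.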

To date, few studies have provided insight regarding the evolutionary history of genes related to the circadian clock of teleost fish, and those that have were conducted by analyzing individual groups of genes (Wang 2008a, b, 2009). From a comparative and functional perspective, the studies conducted by Wang (2008a, b, 2009) have made important progress regarding evolution of the ARNTL, PER, and CLOCK/NPAS2 gene families. From a phylogenetic perspective, our results regarding the ARNTL gene family are similar to those reported by Wang (2009) but with some discrepancies. For example, in Wang’s phylogenetic reconstruction, he recovered a sister group relationship between the zebrafish ARNTLa and ARNTLb genes, which is a topology that could be attributable to a gene conversion event. In the present study, we were able to resolve the orthology among the ARNTLa genes, as we recovered a clade containing all ARNTLa genes from teleost fish that did not deviate significantly from the expected organismal phylogeny (Fig. 2).

The sister group relationship between the ARNTLb gene of zebrafish, the only teleost species that retained this gene (Fig. 1), and the ARNTLa clade indicates a homology relationship derived from a whole-genome duplication event, i.e., ohnology. Although our results regarding the PER gene family are essentially the same as those of Wang (2008b), it is worth highlighting that zebrafish was the only species that retained the PER1a gene, that the PER3b gene was lost in the last common ancestor of all teleost fish, and that PER3a was independently lost in two teleost species (Fig. 1; Wang 2008b). Similar comparative results were obtained for the CLOCK/NPAS2 gene family, except that we found an extra copy of the CLOCKa gene in the Nile tilapia, which we refer to as CLOCKa-T2 (Figs. 1, 3).

The pattern observed for the CRY1/2 gene family corresponds to a combination of various evolutionary models (Fig. 2). This gene family is characterized by the complete deletion of the CRY2b gene (Fig. 1) and by differential retention of the CRY1b, CRY2a, CRY3-T1, and CRY3-T2 genes. Additionally, lineage-specific duplications of the CRY1a and CRY2a genes were found in the lineages leading to fugu and Tetraodon, respectively (Fig. 2). It is worth noting that the CRY3 gene represents an old gene lineage that originated in the last common ancestor of tetrapods and fish but was only retained in zebrafish (Fig. 2). Although the functional diversity of these genes is not fully understood, it has been demonstrated that they oscillate in a circadian manner in zebrafish, showing various patterns and playing a variety of roles in the molecular clock (Kobayashi et al. 2000). A process of differential retention at each locus characterizes the mode of evolution of the CSNK1 gene family. Finally, the simplest scenario was found for genes belonging to the timeless gene family, in which one ohnolog, timeless-b, was lost in the last common ancestor of teleost fish, and the other ohnolog, timeless-a, was retained as a single-copy gene in all of the examined species.

Although it is important to understand the evolutionary fate of the genetic machinery underlying the molecular clock, linking this machinery to its functional consequences is one of the major goals of evolutionary biology. In this regard, Wang (2008b) determined that the duplicated copies of the PER1 gene found in zebrafish show distinct patterns of temporal and spatial expression. According to his functional evidence, when the fish are in a dark environment, the PER1a gene is expressed in the retina and in both the telencephalon and the diencephalon of the forebrain, and there is no detectable expression of PER1b. However, when the fish are exposed to light, the expression of PER1a is significantly upregulated, and the PER1b gene is constitutively expressed throughout the head region, including the brain and retina (Wang 2008b). This pattern represents a clear example of subfunctionalization, in which each daughter copy adopts part of the function of the parental gene (Force et al. 1999).

Finally, we wish to offer a note of caution regarding variation in the quality of the genomes of the surveyed species. In the literature, it has been reported that genome assembly has an impact on gene prediction (Florea et al. 2011). Hence, future improvements in fish genome assemblies will help us to better understand gene copy number variation in the gene families associated with the circadian clock of teleost fish.

Concluding Remarks

Our study revealed that following the TGD, not all ohnologous genes were equally retained in the genomes of the species surveyed. In most gene families, at least one ohnolog was lost in the last common ancestor of teleost fish. In cases in which genes were retained, lineage-specific dynamics were responsible for the gene repertoire in present-day species. This mode of evolution highlights the role of the differential retention of relatively old genes in shaping the present-day gene complement in teleost fish. Additionally, by connecting our comparative results with those of functional studies, we have provided an example of subfunctionalization, in which daughter copies adopt a portion of the function of their parental gene, as was observed for the PER1 duplicates. Taking all of these findings together, this study provides further evidence that the teleost-specific genome duplication fueled key innovations in the genes related to the circadian clock, highlighting that whole-genome duplications have played a fundamental role in the origin of evolutionary novelties.