Introduction

Molecular sequences are a rich source of information for dating the divergence of species, especially when paleontological or biogeographical evidence is not available. However, molecular dating remains a difficult methodological field, and many molecular date estimates have remained contentious (Pulquério and Nichols 2007; Miller et al. 2009; Subramanian et al. 2009). One of the difficulties is the interpretation of fossils used as calibration points in molecular phylogenies (Parham and Irmis 2008). Benton and Donoghue (2007) reviewed a large literature on fossils in order to provide a set of standard calibration points for metazoans. Santini et al. (2009) complemented this work by providing a list of calibration points for fishes. However, the use of molecular dating methods remains difficult if one is interested in lineages where no calibration point is available.

The Coelacanthimorpha is a subclass of vertebrates that flourished from the Paleozoic (Upper Devonian) until the end of the Mesozoic (Upper Cretaceous). Among the nine recognized families (Nelson 2006), only one, Latimeriidae, has known living representatives with two currently described species: Latimeria chalumnae (Smith 1939) and L. menadoensis (Pouyaud et al. 1999). Since the Coelacanthimorpha originated ca. 400 million years ago (Myr), the study of the divergence among living coelacanths is a unique opportunity to investigate the evolutionary dynamics of this lineage, often regarded as “living fossils”. For about a decade, only a single specimen of L. menadoensis was available for scientific investigation since the first specimen was not conserved (Erdmann 1999; Erdmann et al. 1999). The capture of a third coelacanth off Manado (North Sulawesi, Indonesia) in May 2007 gave a new opportunity to investigate the divergence between the two species of Latimeria.

There are two main hypotheses on the origin of the living species of coelacanths. Springer (1999) hypothesized that a large population of coelacanths present in the Indian Ocean was fragmented after the collision of the Indian subcontinent with the Eurasian plate 50 Myr. Under this hypothesis, the date of the divergence between L. chalumnae and L. menadoensis is expected to be around 40–50 Myr. On the other hand, Gordon (1998) pointed out the existence of a transoceanic flow from the West Pacific up to the Southwest Indian Ocean, so that the divergence between the two species may be much more recent because coelacanths may disperse, more or less passively, along this transoceanic current. However, dating this divergence is a difficult exercise because this speciation event could have occurred, at least in theory, anytime between 400 Myr and the present. Thus, molecular dating methods that use calibration points are difficult to apply because these points will be far from the focus event. Pouyaud et al. (1999) used a simple molecular clock method with two mitochondrial genes (12S and cytochrome b) and estimated that the two species diverged between 1.22 and 1.42 Myr. Holder et al. (1999), using a larger set of mtDNA sequences (4,823 bp), found older estimates: 4.7–6.3 Myr. Inoue et al. (2005) used the complete sequence of the mitochondrial genome (mtGenome) and Bayesian methods to estimate the same divergence at around 40 Myr. It should be noted that in these three studies, the DNA of the same individual of L. menadoensis was used. So, the present evidence is not definitive, and it does not seem possible to reject either hypothesis.

In the present paper, we report a new assessment of this issue using the data of the recently captured individual of L. menadoensis. Specifically, we studied the divergence between L. chalumnae and L. menadoensis by investigating in detail the mitochondrial genomic divergence between these two species. We compared the divergence between the two species of Latimeria and the divergence between other pairs of species for which complete mtGenome sequences are available and the divergence date between them is known. We selected the two species of chimpanzees: Pan troglodytes and P. paniscus. Various molecular studies have dated the divergence between these two species between 0.86 and 2.4 Myr (Raaum et al. 2005; Won and Hey 2005; Perry et al. 2007). Furthermore, hominid fossils give an older bound for this date since the species of chimpanzees have speciated after the common ancestor of Homo and Pan. Benton and Donoghue (2007) recognized Sahelanthropus, dated between 7 and 10 Myr, as this common ancestor. Consequently, the most recent common ancestor of P. troglodytes and P. paniscus cannot be older than 10 Myr. Therefore, if the divergence between L. chalumnae and L. menadoensis is not significantly greater than the divergence between P. troglodytes and P. paniscus, we may reject the hypothesis that the two species of coelacanths have diverged through vicariance and conclude that Gordon’s (1998) hypothesis is the most likely one.

The complementary test is to compare the divergence between the two species of coelacanths with a pair of species that diverged 40–50 Myr: if these divergences are not significantly different, then Springer’s (1999) hypothesis could be accepted. We selected two pairs of species: Bos taurus and Sus scrofa, and Takifugu rubripes and Tetraodon nigroviridis. Benton and Donoghue (2007) give the divergence of these two pairs as 48.3–53.5 and 32.25–56 Myr, respectively. We also conducted a relaxed-clock molecular analysis using a Bayesian method.

Materials and methods

DNA extraction

The genomic DNA of the second specimen of L. menadoensis was extracted from the pieces of gills that were preserved in absolute ethanol at −20°C using the Genomic DNA Mini Kit developed by Geneaid Biotech, Ltd. (Taipei, Taiwan).

PCR and sequencing

The whole mtGenome was amplified using 24 pairs of primers defined on the sequence of the holotype of L. menadoensis (GenBank accession number NC_006921; Inoue et al. 2005). The primer sequences are given in Table 1. The PCRs were done using Illustra PuReTaq Ready-To-Go PCR Beads (GE Healthcare, Chalfont St Giles, UK) with the following amplification program: an initial denaturation step at 95°C for 2 min, a cycle of three steps (denaturation at 94°C for 1 min, annealing at 60°C for 2 min, elongation at 72°C for 1 min) repeated 30 times, and a final extension of 10 min at 72°C. Double-stranded PCR products were purified with a Gel/PCR DNA Fragments Extraction Kit developed by Geneaid Biotech and were subsequently used for direct cycle sequencing with dye-labeled terminators by Macrogen, Inc. (Rockville, MD) using the same primers as those for PCR. The sequence has been deposited in GenBank (accession number: GQ911586).

Table 1 Primers used for PCR of Latimeria menadoensis mtDNA

Additional complete mtGenome sequences were obtained from GenBank (Table 2). Sasaki et al. (2007) showed convincingly that the sequence published by Zardoya and Meyer (1997; NC_001804) contains sequencing errors, so it was not used.

Table 2 Sequences from GenBank used in the present study

Statistical analyses

Sequences were aligned with clustal (Thompson et al. 1997). All nucleotide sequence manipulations and analyses were done with R 2.10.1 (R Development Core Team 2009) and the package ape 2.5 (Paradis et al. 2004). Nucleotide sequences were translated into amino acid sequences with the package seqinr 2.0-8 (Charif and Lobry 2007) in order to examine the patterns of amino acid changes. The patterns of nucleotide substitutions within the pairs of species of Latimeria and Pan were compared using loglinear regression models assuming a multinomial distribution for the observations. The principle of this method is to generalize the idea of the logistic regression, which applies to data that follow a binomial distribution, i.e. data with two possible outcomes. The multinomial regression considers variables with three or more outcomes. Therefore, it may be applied to the patterns of nucleotide differences between several pairs of sequences where the outcomes are no change, transition, and transversion. It is also possible to consider more categories of possible outcomes. With the four DNA bases, a comparison of two sequences may lead to 16 possible categories. If we consider that changes are time-reversible, then 10 categories will be considered. However, in the present analyses, we found that considering the three above categories was sufficient.
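The three-outcome coding described above can be illustrated with a minimal Python sketch. This is not the R/ape code used in the study; the function names `classify_site` and `count_changes` and the toy sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped, which is an assumption on our part.

```python
from itertools import combinations_with_replacement

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def classify_site(a, b):
    """Classify one aligned site as 'none', 'ts' (transition) or 'tv' (transversion).

    Returns None when either character is a gap or an ambiguous base.
    """
    a, b = a.upper(), b.upper()
    if a not in VALID or b not in VALID:
        return None
    if a == b:
        return "none"
    # A transition stays within purines or within pyrimidines.
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "ts" if same_class else "tv"


def count_changes(seq1, seq2):
    """Count the three outcomes over two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"ts": 0, "tv": 0, "none": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_site(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


# Ordered base pairs give 16 categories; unordered (time-reversible) pairs give 10.
N_ORDERED = len(VALID) ** 2
N_REVERSIBLE = len(list(combinations_with_replacement(sorted(VALID), 2)))
```

For example, `count_changes("ACGTAC-T", "GCTTACGC")` scores seven sites (the gapped site is ignored) and yields two transitions, one transversion, and four unchanged sites.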

For a multinomial variable Y with k categories, we have to consider k − 1 probabilities (because of the constraint that they must sum to one). The loglinear regression models the logarithmic ratio of these probabilities on the probability of the last category (Agresti 2002):

$$ \ln{\frac{{\hbox{Pr}}(Y=y_j)} {\hbox{Pr}(Y=y_k)}}\quad j=1, \ldots, k-1 $$
(1)

where $y_1, \ldots, y_k$ are the values taken by Y. These k − 1 ratios are then modeled with a standard regression model with predictors $x_1, x_2, \ldots$, which may be continuous and/or categorical, as in any regression:

$$ \ln{\frac{{\hbox{Pr}}(Y=y_j)} {\hbox{Pr}(Y=y_k)}} = \beta_{j1}x_1 + \beta_{j2}x_2 + \cdots + \alpha_j $$
(2)

where $\beta_{j1}, \beta_{j2}, \ldots, \alpha_j$ are the parameters for the jth regression.
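Equation 2 can be inverted to recover the category probabilities: writing η_j for the right-hand side of the jth regression, Pr(Y = y_j) = exp(η_j) / (1 + Σ exp(η_m)) for j < k, and the reference category takes the remaining probability. The following Python sketch illustrates this inversion; the function name `category_probabilities` and the numerical values are hypothetical and are not taken from the fitted models.

```python
import math


def category_probabilities(linear_predictors):
    """Invert the k - 1 log-ratios of Eq. 2 into k probabilities.

    linear_predictors holds eta_j = ln(Pr(Y = y_j) / Pr(Y = y_k)) for
    j = 1, ..., k - 1. The reference category y_k is returned last.
    """
    exps = [math.exp(eta) for eta in linear_predictors]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


# Hypothetical log-ratios for (transition, transversion) against "no change".
etas = [math.log(0.06 / 0.92), math.log(0.02 / 0.92)]
probs = category_probabilities(etas)
```

With these illustrative log-ratios, the function returns probabilities close to 0.06, 0.02, and 0.92, which sum to one by construction.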

We expected that the number of nucleotide changes is affected by the time separating each species within a pair but also by the position along the sequence of the mtGenome. So, the predictors x in Eq. 2 were ‘gene’, ‘codon position’, and ‘pair of species’ (or ‘lineage’). The data were prepared as follows: for each gene, codon position, and pair of species, the number of transitions, transversions, and no change ($y_1$, $y_2$, and $y_3$, respectively) were counted and associated with these predictors. The last category was used as the reference (since here k = 3). All predictors were considered as categorical. For non-coding loci, codon position was coded as a fourth category. Several models were fitted using these predictors separately or in combination, including a null model (i.e., the probabilities $\Pr(Y = y_j)$ are the same along the mtGenome and among all lineages).

Since there are 38 genes in the mtGenome, a model with all three predictors cannot be statistically identified because not all categories of nucleotide change were represented for all combinations of predictors: there was no transversion for ND4L and ATP8 and on the second codon position for all protein-coding genes but ND2. This codon position was monomorphic for three genes (ND4L, COX1, ATP8). Even the model gene + lineage has 82 parameters to be estimated that would be difficult to interpret. Furthermore, substitution rates may not be significantly different between different genes. To test whether variation within different categories of genes is significant, we pooled the data for the protein-coding, for the rRNA, and for the tRNA genes, respectively, thus creating a new predictor labeled ‘gene2’ with four categories (D-loop was kept apart). Since gene and codon position are partly confounded, we pooled together the data from the different positions as if they were from three genes, thus creating a new variable ‘gene3’ with six categories (the three codon positions, rRNA, tRNA, and D-loop). Finally, we tested whether divergence among lineages of the same approximate age was different by creating a variable ‘lineage2’ pooling Latimeria and Pan together on one side, and the pairs Bos + Sus and Takifugu + Tetraodon on the other.
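The parameter counts mentioned above can be checked with a short calculation. The sketch below assumes additive models with dummy (treatment) coding, one intercept per regression, and four levels for ‘lineage’ (the four pairs of species); the function name `n_parameters` is hypothetical.

```python
def n_parameters(levels_per_predictor, n_outcomes=3):
    """Number of coefficients in an additive multinomial logit model.

    Each of the n_outcomes - 1 regressions has one intercept plus
    (levels - 1) dummy coefficients per categorical predictor.
    """
    per_regression = 1 + sum(levels - 1 for levels in levels_per_predictor)
    return (n_outcomes - 1) * per_regression


gene_lineage = n_parameters([38, 4])    # 'gene' (38 levels) + 'lineage' (4 levels)
gene3_lineage = n_parameters([6, 4])    # 'gene3' (6 levels) + 'lineage'
gene3_lineage2 = n_parameters([6, 2])   # 'gene3' + 'lineage2' (2 levels)
```

Under these assumptions, gene + lineage has 2 × (1 + 37 + 3) = 82 parameters, in agreement with the figure given in the text, and replacing ‘lineage’ by ‘lineage2’ in a model with ‘gene3’ removes four parameters (18 versus 14).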

The models were fitted by maximum likelihood, so that they can be compared with their respective Akaike information criterion (AIC; Akaike 1973), which is equal to minus twice the log-likelihood plus twice the number of estimated parameters of the model. The model with the lowest AIC was selected. This procedure can thus be used for hypothesis testing. Additionally, we performed likelihood-ratio tests (LRT) to compare some pairs of models when relevant. After selecting a model, the probabilities for each predictor can be calculated by inverting Eq. 2. The loglinear modeling analysis was done with the R package VGAM (Yee 2009).
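The AIC-based selection can be sketched in a few lines of Python. The log-likelihoods and parameter counts below are purely hypothetical placeholders, not values from Table 4, and the helper names `aic` and `best_by_aic` are ours.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln L + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_params


def best_by_aic(models):
    """Return the label of the model with the lowest AIC.

    models maps a label to a (log_likelihood, n_params) tuple.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical candidate models: (maximized log-likelihood, number of parameters).
hypothetical_models = {
    "smaller model": (-1000.6, 14),
    "larger model": (-1000.0, 18),
}
selected = best_by_aic(hypothetical_models)
```

In this toy comparison the larger model has a slightly higher likelihood, but the penalty of two units per additional parameter outweighs the gain, so the smaller model is selected.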

Molecular dating was done with the Bayesian method developed by Thorne et al. (1998) and Thorne and Kishino (2002) where rates of molecular evolution are assumed to vary among branches in a phylogenetic tree but in a correlated way between adjacent branches (relaxed molecular clock). Two data sets were used: the first one used the same data as above, and the second one used additional fish calibration points taken from Santini et al. (2009). In both data sets, the phylogeny was unambiguous. The branch lengths were estimated with PAML 4.3 (Yang 2007). These estimates were used by Multidivtime, which runs a Markov chain Monte Carlo (MCMC) to calculate the posterior distribution of the node times of the phylogeny. The MCMC was sampled 10^4 times every 100 cycles, and 10^5 burn-in cycles were discarded. The sequences were divided into five partitions: the three codon positions of protein-coding genes, tRNA, and rRNA. The D-loop was discarded because of its fast rate of evolution and many insertions–deletions, as was ND6 (see Inoue et al. 2005). All genes were aligned separately prior to building the partitions. This analysis was run from R with the LAGOPUS package (Heibl 2010).

Results

Sequence alignment

The alignment of the four complete mtGenomes of Latimeria using clustal with its default parameters (gap opening = 15, gap extension = 6.66) resulted in the insertion of six gaps in L. menadoensis and seven in L. chalumnae. An alternative alignment was obtained by setting gap opening = 60 and gap extension = 60: it has two fewer indels (one in each sequence) located in the D-loop, but this results in eight additional base mismatches. Additionally, a gap has been placed differently within the sequence of the tRNA-Trp (certainly because of the increased penalty for gap opening) resulting in three new mismatches. Therefore, we considered the first alignment in the subsequent analyses.

The comparison of the first sequenced coelacanth from Manado with the new individual showed six differences (Table 3). Referring to the sequences of L. chalumnae after alignment, it appeared that these six sites were identical between this species and the new individual. Additionally, none of these sites were present in the fastest evolving portions of the mtGenome (e.g., D-loop), and of the five differences observed in protein-coding genes, two occurred on the second codon position, which is the slowest evolving position; the three others occurred on the third codon position. Thus, we suggest that these six differences are the results of sequencing errors in NC_006921, and, therefore, that both individuals have identical mtGenomes.

Table 3 The six positions where the two mtGenome sequences of L. menadoensis differ

Loglinear modeling

The best model selected by AIC is the one with the predictors ‘gene3’ and ‘age’ (i.e., the pooled lineage variable ‘lineage2’; Table 4). The model that comes in second in terms of AIC has ‘lineage’ instead of ‘lineage2’. Both models have very close values of log-likelihood, meaning that the four additional parameters involved in ‘lineage’ do not significantly improve the fit. Indeed, the LRT comparing the two models was not significant ($\chi^2_4 = 1.22$, P = 0.875).
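The P-value of this likelihood-ratio test can be reproduced from the reported statistic. For an even number of degrees of freedom, the upper tail of the chi-square distribution has a closed form, exp(−x/2) Σ (x/2)^i / i! for i = 0, ..., df/2 − 1, so no statistical library is needed. The following Python sketch uses a hypothetical helper name, `chi2_sf_even_df`, and is not the code used in the study.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Reported LRT statistic, with 4 degrees of freedom.
p_value = chi2_sf_even_df(1.22, 4)
```

With x = 1.22 and four degrees of freedom, this gives exp(−0.61) × 1.61 ≈ 0.875, consistent with the value reported above.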

Table 4 Results of loglinear modeling

Interestingly, the model assuming four categories of substitution rates (for each codon position and the non-coding loci, ‘pos’) fitted better than the model with different substitution rates for each gene (‘gene’), which is clearly over-parameterized.

Table 5 shows the predicted probabilities of substitution from the selected model. These probabilities clearly show the contrast between the two groups of lineages. Overall, they conform to what is known about the rates of molecular evolution. It is interesting to note that for the fastest-evolving portions of the mtGenome, namely third codon positions and D-loop, the transition (Ts) and transversion (Tv) probabilities are markedly different for the Latimeria and Pan lineages, whereas they are close for the two others. The values of the two types of RNA are also very similar for each category of lineages.

Table 5 Probabilities of change predicted from the model selected with the lowest AIC

Nucleotide differences

In the following analyses, we considered only the two species of Latimeria and of Pan. Figure 1 shows the Kimura (1980) evolutionary distances between each pair of species computed separately for each protein-coding gene and the three codon positions. Overall, the distances are similar between both pairs of species. The similarity is stronger for the third position, which is the most selectively neutral one within the codon. Interestingly, the three shortest sequences (ND3, ND4L, ATP8, all less than 500 bp) show substantial differences in these distances.
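For reference, the Kimura (1980) two-parameter distance is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the observed proportions of transitions and transversions. The Python sketch below illustrates the formula only; the function name `k2p_distance` and the counts are hypothetical, and the distances in the study were computed with the R package ape.

```python
import math


def k2p_distance(n_transitions, n_transversions, n_sites):
    """Kimura two-parameter distance from counts of differences.

    P and Q are the proportions of transitional and transversional
    differences among the n_sites compared.
    """
    p = n_transitions / n_sites
    q = n_transversions / n_sites
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent: distance is undefined")
    # Equivalent to -0.5 * ln(a * sqrt(b)).
    return -0.5 * math.log(a) - 0.25 * math.log(b)


# Hypothetical counts: 30 transitions and 10 transversions over 1,000 sites.
d = k2p_distance(30, 10, 1000)
```

In this toy example the corrected distance (about 0.041) is slightly larger than the raw proportion of differing sites (0.040), because the formula accounts for multiple substitutions at the same site. For short sequences the proportions P and Q are estimated from few sites, so the resulting distances are expected to be more variable.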

Fig. 1 Evolutionary distances within Latimeria and Pan for the 13 mitochondrial protein-coding genes and the three codon positions separately

The evolutionary distances among the 22 tRNA sequences (Fig. 2) were also overall similar between both pairs of species, though with more variation than for the protein-coding genes, which is certainly due to the short sequences of the tRNAs (all less than 80 bp).

Fig. 2 Evolutionary distances within Latimeria and Pan for the 22 tRNA genes

The evolutionary distances for the rRNA sequences are slightly larger between the two Latimeria species than between the two Pan species (Fig. 3). The opposite was observed for the D-loop sequences, where the distance within Latimeria was about half that within Pan (Fig. 3).

Fig. 3 Evolutionary distances within Latimeria and Pan for the two rRNA genes and D-loop

Amino acid differences

Each protein-coding gene sequence was translated into its amino acid sequence using the vertebrate mitochondrial genetic code, after reverse-complementing, where necessary, the sequence of the ND6 gene, which is coded on the opposite strand of the DNA molecule (see Zardoya and Meyer 1997). The amino acid sequences were then analyzed with the same procedure as for the nucleotide sequences, counting each substitution type for each molecule for each genus.
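The translation step can be sketched as follows. This is a minimal Python illustration, not the seqinr code used in the study: the codon table is built from the amino acid string of the vertebrate mitochondrial code (NCBI translation table 2) in TCAG codon order, and the function names and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code: TGA = Trp, ATA = Met, AGA/AGG = stop.
VERT_MT_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), VERT_MT_AA)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate a coding sequence codon by codon; unknown codons give 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# A gene annotated on the opposite strand (as ND6) is reverse-complemented first.
toy_opposite_strand = "TCTTCATATCAT"
toy_protein = translate(reverse_complement(toy_opposite_strand))
```

The toy sequence reverse-complements to ATG ATA TGA AGA, which translates to Met, Met, Trp, and a stop codon under this code; under the standard code, ATA would give Ile, TGA a stop, and AGA Arg.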

Between L. menadoensis and L. chalumnae, 112 amino acid substitutions were observed, while there were 95 between P. troglodytes and P. paniscus. Among the 190 possible types of amino acid changes, 29 were observed between the Latimeria species, while 30 were observed between the Pan ones (21 were common to both pairs). The Ala–Thr substitution was the most frequent one in both genera. The second most frequent substitution (Ile–Val) was also the same for both genera. Overall, the matrices of amino acid substitution frequencies are consistent between the two genera in the sense that substitutions of intermediate frequency (5–10 occurrences) were observed in both.
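The figure of 190 possible types corresponds to the number of unordered pairs of distinct amino acids among 20, i.e. 20 × 19 / 2. The following Python sketch shows one way such types could be tallied from two aligned protein sequences; the function name `substitution_types` and the toy sequences are hypothetical, and positions with gaps, stops, or unknown residues are skipped by assumption.

```python
from collections import Counter
from math import comb

N_POSSIBLE_TYPES = comb(20, 2)  # unordered pairs of distinct amino acids
SKIPPED = set("X*-")            # unknown residue, stop, gap


def substitution_types(prot1, prot2):
    """Count amino acid differences by unordered type, e.g. ('A', 'T')."""
    if len(prot1) != len(prot2):
        raise ValueError("protein sequences must be aligned to the same length")
    counts = Counter()
    for a, b in zip(prot1, prot2):
        if a != b and a not in SKIPPED and b not in SKIPPED:
            counts[tuple(sorted((a, b)))] += 1
    return counts


# Toy aligned sequences: two Ala/Thr differences and one Ile/Val difference.
toy_counts = substitution_types("MATIV", "MTAVV")
shared_types = set(toy_counts) & {("A", "T"), ("I", "V")}
```

The total number of substitutions is `sum(toy_counts.values())`, the number of observed types is `len(toy_counts)`, and types common to two pairs of species can be obtained by intersecting the key sets of their counters.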

Figure 4 shows how the types of amino acid substitutions were distributed among the 13 genes. Both genera yielded similar results, with most substitutions occurring in the ND5 gene, including several Ile–Val changes.

Fig. 4 Patterns of amino acid substitutions within Latimeria and Pan for the 13 mitochondrial protein-coding genes. The gray level indicates how many times each substitution type (as rows) occurred between each pair of species

Bayesian molecular dating

The credibility intervals of the divergence dates calculated from their posterior distributions are indicated in Fig. 5. Except for the divergence between the two species of Latimeria, all dates were constrained with upper and lower bounds taken from Benton and Donoghue (2007) and Santini et al. (2009). The credibility interval for the Latimeria divergence was 21 ± 4 Myr when using the same species as above. However, it was 28 ± 6 Myr after including the additional fish sequences and calibration data.

Fig. 5 Bayesian molecular dating of the divergence between the two species of Latimeria (a) using calibration points from Benton and Donoghue (2007), (b) using calibration points from Benton and Donoghue (2007) and Santini et al. (2009). The boxes give the credibility intervals for the divergence dates

Discussion

The sequence of the mtGenome of a second individual of L. menadoensis showed that it is similar to the previously sequenced one. The six observed differences between these two individuals are most likely attributable to sequencing errors from Inoue et al. (2005). This observed genetic homogeneity is not a surprising result. Sasaki et al. (2007) sequenced the complete mtGenome of two individuals of L. chalumnae from Tanzania and found four differences (one on a third codon position of ND5, one on tRNA-Gln, and two on D-loop). Schartl et al. (2005) found no variation in the cytochrome b sequences of 47 individuals of L. chalumnae from a much larger geographic area and only six variable sites in the D-loop, which defined six haplotypes. All these results provide evidence for a very low mitochondrial genetic diversity in the two species of Latimeria. The low population numbers, the long sexual maturation time, and the low fecundity of females are factors that decrease the effective population size of females, and therefore lead to lower variability of the matrilineally inherited mtGenome. Schartl et al. (2005) also analyzed microsatellite data and found low levels of heterozygosity, further suggesting that effective population sizes are low in L. chalumnae.

The patterns of nucleotide substitutions between L. chalumnae and L. menadoensis were very similar to those observed between P. troglodytes and P. paniscus. This was particularly the case for the third codon position of the protein-coding genes. It is generally accepted that these nucleotide sites are the most neutral among all in the genome because of the degeneracy of the genetic code (Kimura 1983; Nei and Kumar 2000). Furthermore, the difference in divergence between the two pairs of species was related to the length of the gene sequences: the longer the sequence, the closer the distances within Latimeria and within Pan. This can be explained by the variance of the substitution process, which leads to higher observed variance for the shortest sequences (Nei and Kumar 2000). These results support the hypothesis of similar divergence for both pairs of species.

Yokoyama and Tada (2000) studied the sequences of two rhodopsin genes (2,130 bp) in both species of Latimeria and found twelve nucleotide differences; ten of them occurred on third codon positions and the two others on the first codon position resulting in one amino acid difference in each gene. This result agrees with ours showing a recent divergence between L. chalumnae and L. menadoensis.

The molecular dating analysis shows contrasting results compared to those discussed above: we found a relatively old date of divergence between 20 and 30 Ma. Inoue et al. (2005) estimated the same divergence date at ca. 40 Ma. These authors used the same statistical method employed here, and it is very likely that the difference with our results arises from the selection of species. The date of 20 Ma was obtained with a limited number of species: four mammals, two fishes, and the two coelacanths. After adding six other fish species, the same date estimate was pushed 10 Myr back into the past. This is certainly due to the autocorrelation of rates as assumed in the Multidivtime model (Thorne et al. 1998). This also explains why the divergence among coelacanths is estimated to be older than the divergence between the chimpanzees in spite of our results discussed in the previous paragraph. Considering that the Sarcopterygii and other tetrapods diverged very far in the past (more than 400 Ma), the molecular dating method will tend to spread the amount of change along the branch leading to the coelacanths in order to increase the correlation in molecular substitution rates.

It is remarkable that the patterns of amino acid changes within the pairs of species of Latimeria and Pan show several similarities. The ND-series of proteins show many more amino acid changes than the cytochrome ones. For the former proteins, the number of changes was found to be proportional to the length of the protein. On the other hand, for the cytochromes, the longest protein (COX1) showed the smallest number of changes. It will be interesting in future studies to assess whether these amino acid changes have similar impacts on the secondary structure of the proteins.

Our results do not support the hypothesis that homeotherms (here mammals) have higher molecular evolutionary rates than poikilotherms (fishes). The molecular divergences between Bos–Sus and Takifugu–Tetraodon were not statistically different, and the two pairs have widely overlapping divergence dates according to the fossil record (Benton and Donoghue 2007). This is consistent with recent comparative studies that failed to find a relationship between molecular rates of evolution and metabolic rates across a wide range of animal taxa (Thomas et al. 2006; Lanfear et al. 2007).

Several recent studies have questioned the adequacy of mitochondrial DNA in quantifying population genetic variability (Bazin et al. 2006; Balloux 2009). Balloux et al. (2009) showed that some protein-coding genes are under selective pressure in the human mtGenome. Nabholz et al. (2009) showed that mutation rates of cytochrome b are on average three times higher in mammals than in birds. They also provided evidence for substantial variation among species of the same group. On the other hand, Piganeau and Eyre-Walker (2009) showed a relationship between effective population size and mitochondrial genetic diversity using a different measure based on synonymous polymorphism. All these studies indicate clearly that mtDNA data should be interpreted with care in population studies. In phylogenetic studies, it is clear that mtDNA exhibits significant rate heterogeneity in many cases, a factor that should be taken into account (Pereira and Baker 2006).

There seem to be two alternative explanations that could resolve the apparent paradox in our results: either the divergence between the two species of coelacanths is recent and the dating analysis is biased as described above, or the divergence is old and the rate of molecular evolution is remarkably slow in coelacanths. It seems difficult to invoke selective pressures that would be different in these two species because of the similarities of molecular substitutions with chimpanzees, even on third codon positions. Nabholz et al. (2009) invoked the “aging hypothesis” to explain why the mutation rate is higher in mammals than in birds. However, it is not clear how this hypothesis could apply here. Coelacanths are long-lived animals, and a low rate of mutation accumulation in their mtGenome would be interesting: since the metabolic rate hypothesis has been refuted in previous studies (Thomas et al. 2006; Lanfear et al. 2007; Nabholz et al. 2009), it would imply the existence of repair mechanisms, as suggested for birds (Nabholz et al. 2009).

Our results suggest it is unlikely that the question of the divergence date of the two living coelacanth species can be answered using a traditional approach for the estimation of molecular divergence. Future captures of other specimens of L. menadoensis would be particularly welcome in order to quantify the variation in the mtGenome of this species. These new data will certainly be essential to resolve the issue of the evolutionary dynamics of these fascinating tetrapods.