Introduction

The nonsynonymous substitution rate (Ka), the synonymous substitution rate (Ks), and their ratio (Ka/Ks; sometimes termed dN/dS) are commonly used to infer the direction and strength of selection acting on a coding sequence (Fay and Wu 2001; Kimura 1983; Li 1997; Nei and Kumar 2000; Ohta 1995; Yang and Bielawski 2000). Ka/Ks > 1 indicates positive selection, Ka/Ks < 1 indicates negative selection, and Ka/Ks ≈ 1 indicates neutral evolution.

A recent study described a surprisingly strong positive correlation between Ka/Ks and Ks in several data sets using the LPB93 algorithm (Wyckoff et al. 2005). This finding indicated the possibility of a paradigm shift in the way selection strength can be measured using the Ka/Ks ratio. The authors proposed that the Ka/Ks value reflects not only selective strength but also neutral mutation rate. A later study (Liao and Zhang 2006) did not find a strong correlation between Ka/Ks and Ks within mammalian orthologues using PAML and suggested that the correlation might be sensitive to the method or substitution model used.

Algorithms for estimating Ka and Ks normally involve three steps: counting the number of synonymous and nonsynonymous sites, counting the numbers of synonymous and nonsynonymous substitutions, and correcting for multiple substitutions (Yang and Nielsen 2000). These algorithms adopt different substitution or mutation models based on different assumptions that take various sequence features into account: this gives rise to varied estimates of evolutionary distance (Muse 1996). Thus, the estimation of Ka and Ks is sensitive to the underlying assumptions or mutation models (Zhang and Jun 2006).

Table 1 provides details on the characteristics of several of these types of algorithms, how their authors evaluated them, and to which genome projects they were applied. The NG86 algorithm (Nei and Gojobori 1986), based on the Jukes–Cantor model (JC69; Jukes and Cantor 1969), assumes substitutions with equal frequency and considers different evolutionary pathways between pairwise sequences. In contrast, the LWL85 algorithm (Li et al. 1985) introduces nondegenerate, twofold degenerate, and fourfold degenerate sites to count sites and substitutions and is based on the two-parameter model of Kimura (K2P; Kimura 1980). Although K2P is used for correction of multiple substitutions, the LWL85 algorithm allows for different rates between transitions and transversions only when counting substitutions and considers that twofold degenerate sites are one-third synonymous and two-thirds nonsynonymous. In comparison with the LWL85 algorithm, LPB93 (Li 1993) takes into account such bias when counting sites, and the differences between the LWL85 and the LPB93 algorithms lie mainly in their Ka and Ks formulas. The GY94 algorithm (Goldman and Yang 1994) is a maximum-likelihood method that adopts a codon-based model (GY-HKY) considering more features of DNA sequence evolution, e.g., transition/transversion rate bias and nucleotide/codon frequency bias. The YN00 algorithm (Yang and Nielsen 2000) is a simplified version of the GY94 algorithm (Hasegawa et al. 1985) and gives a close approximation of this more time-consuming maximum-likelihood method. The MYN algorithm (Zhang et al. 2006) is a modification of YN00 and adopts the Tamura/Nei (1993) model, which considers unequal transitional rates between purines and pyrimidines, as well as considering transversional rate and nucleotide (codon) frequencies. Besides these methods, Ina’s (1995) method and the modified NG method (Zhang et al. 1998) are also frequently used. Ina’s method does not partition sites according to site degeneracy. However, it takes into account the transition/transversion rate bias by counting synonymous and nonsynonymous sites in proportion to synonymous and nonsynonymous substitution rates. The modified NG method considers the transition/transversion rate bias and estimates the number of synonymous and nonsynonymous sites with the K2P model.
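As an illustration of the third step (correcting for multiple substitutions), the following minimal sketch gives the standard JC69 and K2P distance corrections at the nucleotide level. The function names are ours and are not taken from KaKs_Calculator or PAML; the sketch only shows how an observed proportion of differences is converted into a corrected distance, not how any of the algorithms above count sites or substitutions.

```python
import math


def jc69_distance(p):
    """JC69-corrected distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(transitions, transversions):
    """K2P-corrected distance from the observed proportions of transitional
    and transversional differences (both expressed per compared site)."""
    a = 1.0 - 2.0 * transitions - transversions
    b = 1.0 - 2.0 * transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences are too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

When transitions make up exactly one-third of the observed differences, the two corrections coincide; as the transition share grows, K2P yields a larger distance than JC69 for the same total proportion of differences.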

Table 1 Characteristics and application of commonly used KaKs algorithms

Most of these algorithms were introduced and evaluated using either simulated or small-scale real data (Table 1) but, as yet, have not been evaluated in a large-scale, genome-wide evaluation of real data. In this report, we show that there is a highly variable correlation among the above six Ka and Ks algorithms in calculations on three complete orthologue data sets. Our results indicate that the correlation between Ka/Ks and Ks is affected not only by the algorithms used, but also by different evolutionary lineages of the DNA sequences analyzed.

Data and Methods

Orthologue Data and Alignment

To define the orthologue genes and the alignments of human-mouse and mouse-rat, we retrieved orthologous gene data from the NCBI HomoloGene database (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/; version 44.1) and all sequences of the RefSeq data from the NCBI genome (ftp://ftp.ncbi.nih.gov/genomes/). RefSeq orthologues in a one-to-many or many-to-many relationship were excluded to avoid creating ambiguous orthologue pairs. A total of 15,065 and 14,198 orthologous pairs, respectively, were defined for human-mouse and mouse-rat; 15,743 fugu-tetraodon one-to-one orthologue relationships were defined using the InParanoid database (http://inparanoid.cgb.ki.se). Each pair of orthologous proteins was aligned using the blastp program in the NCBI BLAST2 package, and the final nucleotide alignment for the Ka and Ks calculation was created according to the protein alignment.
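The last step, building a nucleotide alignment from a protein alignment, can be sketched as follows. This is a hypothetical helper, not the script used in this study: it assumes that each coding sequence is in frame, that one codon corresponds to one aligned residue, and that `start` gives the number of residues preceding the locally aligned region.

```python
def thread_codons(aligned_protein, cds, start=0, gap="-"):
    """Place the codons of an ungapped coding sequence onto one row of a
    protein alignment, writing three gap characters for each protein gap."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if start + residues > len(codons):
        raise ValueError("coding sequence is too short for the aligned protein")
    out = []
    k = start  # index of the next codon to place
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)
```

Applying the helper to both rows of a pairwise protein alignment gives two nucleotide rows of equal length in which gaps fall only between codons.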

Calculation of Ka, Ks, and Divergence

The NG86, LWL85, LPB93, GY94, YN00, and MYN algorithms, implemented in KaKs_Calculator (Zhang et al. 2006), were used on all data sets to calculate Ka and Ks. For GY94 we used an F3x4 codon frequency model and default values for other parameters. Model weight was also calculated using KaKs_Calculator. Divergence (D) between orthologue pairs was calculated as the proportion distance (p-distance) of each orthologue pair at the nucleotide level.
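For concreteness, the p-distance can be sketched as below. The treatment of gapped columns is an assumption on our part (they are skipped here); the text above does not specify how gaps were handled.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites among the ungapped columns of a
    pairwise nucleotide alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are ignored
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared
```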

Computer Simulation Method

We used the simulation program evolver in PAML (Phylogenetic Analysis by Maximum Likelihood [Yang 2007]; available at: http://abacus.gene.ucl.ac.uk/software/paml.html) to generate simulation data. All evolution parameters were extracted from human-mouse orthologue alignments in this study. Codon usage was extracted from human RefSeq. Transition/transversion rate (average value, 3.820), Ka/Ks (average value, 0.182 and 0.136 for LPB93 and YN00, respectively), and substitution rate t (average value, 0.589 and 0.657 for LPB93 and YN00, respectively) were extracted from each human-mouse orthologue pair in this study. The average nucleotide frequencies are 0.257, 0.219, 0.260, and 0.264 for A, T, C, and G, respectively. The codon frequency used for simulation can be found at http://evolution.genomics.org.cn/dNdS_corre/human.codon-usage. Each simulated orthologue pair was assigned a series of parameters, including Ka/Ks, transition/transversion rate, and t, from its trained parameters and then evolved using the GY-HKY and K2P substitution models.

Orthologue Filtering Method

After calculating divergence (D), the orthologue pairs with the largest divergence (upper 5%) were removed to exclude inaccurate or ambiguously defined alignments. After cleaning, there were a total of 14,311, 13,488, and 14,954 orthologue pairs defined for human-mouse, mouse-rat, and fugu-tetraodon, respectively.
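The filtering step can be sketched as follows. The helper and its rounding rule (rounding the number of removed pairs up) are illustrative assumptions; the exact cutoff rule and the handling of ties that produced the counts above are not specified here, so the sketch is not expected to reproduce those counts exactly.

```python
import math


def drop_most_divergent(divergence_by_pair, fraction=0.05):
    """Return a copy of the mapping {pair_id: divergence} without the
    most divergent `fraction` of orthologue pairs."""
    # Assumption: the number of removed pairs is rounded up; the inner
    # round() only guards against floating-point noise in the product.
    n_drop = math.ceil(round(len(divergence_by_pair) * fraction, 9))
    ranked = sorted(divergence_by_pair.items(), key=lambda item: item[1])
    kept = ranked[:len(ranked) - n_drop]
    return dict(kept)
```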

Statistical Methods

GSL (GNU Scientific Library, www.gnu.org/software/gsl/) was used for statistical analyses with standard C.

Results

The Algorithm and its Underlying Substitution Model Impact Ka/Ks, Ks, and the Correlation Between Ka/Ks and Ks

We first used the six algorithms (NG86, LWL85, LPB93, GY94, YN00, and MYN) to assess the correlation between Ka/Ks and Ks in three vertebrate cross-species orthologue data sets (Fig. 1a–c). The data show that these analyses provide different degrees of correlation between Ka/Ks and Ks. After calculating t-values for each case with H0: r = 0 (r is the correlation coefficient), we calculated p-values for each case under a t distribution (Bernstein 1999). To make the original distribution of Ka/Ks vs Ks easily observable, we randomly selected 2000 original points with the YN00 algorithm for the fugu-tetraodon lineage (Fig. 1d), which shows a distinct correlation when Ks is low.

Fig. 1

Correlation between Ka/Ks and Ks for six different algorithms on three orthologue data sets. a Orthologues (15,065) of human-mouse with average Ks values of 0.610 (NG86), 0.606 (LWL85), 0.496 (LPB93), 0.748 (GY94), 0.755 (YN00), and 0.798 (MYN). Average Ka/Ks ratios are 0.136 (NG86), 0.140 (LWL85), 0.182 (LPB93), 0.131 (GY94), 0.136 (YN00), and 0.128 (MYN). b Orthologues (14,198) of mouse-rat with average Ks values of 0.223 (NG86), 0.231 (LWL85), 0.184 (LPB93), 0.217 (GY94), 0.219 (YN00), and 0.224 (MYN). Average Ka/Ks values are 0.155 (NG86), 0.151 (LWL85), 0.202 (LPB93), 0.187 (GY94), 0.183 (YN00), and 0.176 (MYN). c Orthologues (15,743) of fugu-tetraodon with average Ks values of 0.433 (NG86), 0.439 (LWL85), 0.379 (LPB93), 0.564 (GY94), 0.648 (YN00), and 0.624 (MYN). Average Ka/Ks values are 0.187 (NG86), 0.185 (LWL85), 0.224 (LPB93), 0.174 (GY94), 0.165 (YN00), and 0.158 (MYN). d Random selection of 2000 original points with the YN00 algorithm for the fugu-tetraodon lineage. All original Ka/Ks and Ks values for each orthologue pair were sorted by their Ks value in a, b, and c, and 300 consecutive points were placed in one bin. Subsequently, the mean values of Ka/Ks and Ks were calculated in each bin as representative Ka/Ks and Ks values for each bin
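The binning described in the legend and the test of H0: r = 0 can be sketched as below. This is not the GSL-based code used for the analyses; the function names are ours, and keeping a final bin with fewer than 300 points is an assumption. The p-value would then be read from a t distribution with n − 2 degrees of freedom, where n is the number of bins.

```python
import math


def bin_means(ks, ka_ks, bin_size=300):
    """Sort orthologue pairs by Ks and average Ks and Ka/Ks over
    consecutive bins of `bin_size` points."""
    points = sorted(zip(ks, ka_ks))
    binned = []
    for start in range(0, len(points), bin_size):
        chunk = points[start:start + bin_size]  # last bin may be smaller
        mean_ks = sum(k for k, _ in chunk) / len(chunk)
        mean_ratio = sum(w for _, w in chunk) / len(chunk)
        binned.append((mean_ks, mean_ratio))
    return binned


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for H0: r = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

In use, the per-bin means returned by `bin_means` are unzipped into two lists, passed to `pearson_r`, and the resulting r is passed to `t_statistic` together with the number of bins.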

The NG86, LWL85, and LPB93 algorithms applied to the human-mouse and mouse-rat orthologues indicate there is a relatively strong positive correlation (r² > 0.5 and p < 1e-7, for both human-mouse and mouse-rat) between Ka/Ks and Ks (Table 2), whereas GY94 shows a much weaker correlation (human-mouse lineage, r² = 0.28 and p = 5.88e-4; mouse-rat lineage, r² = 0.035 and p = 1.24e-1). YN00 and MYN yield a weak negative correlation in human-mouse lineages (r² < 0.3 and p > 1e-2) and a relatively strong negative correlation in mouse-rat lineages (r² > 0.4 and p < 1e-6). For fugu-tetraodon lineages, NG86, LWL85, and LPB93 show almost no correlation between Ka/Ks and Ks (r² < 0.05 and p > 0.1). In contrast, GY94, YN00, and MYN exhibit a stronger negative correlation (r² > 0.2 and p < 1e-3).

Table 2  Correlation coefficient (r) and statistical significance for each algorithm under different evolutionary lineages

Figure 1 shows that the correlation between Ka/Ks and Ks is not consistent for all algorithms within a particular evolutionary lineage. Compared to the GY94, YN00, and MYN algorithms, NG86, LWL85, and LPB93 show a relatively strong positive correlation between Ka/Ks and Ks in human-mouse and mouse-rat lineages. In the fugu-tetraodon lineage, however, GY94, YN00, and MYN show a much stronger negative correlation than NG86, LWL85, and LPB93 do. The analyses here show that NG86, LWL85, and LPB93 behave similarly to one another, as do GY94, YN00, and MYN. The two groups, however, differ from each other.

If we consider the substitution models at the nucleotide level of these algorithms, we can group the six algorithms into three model groups (Posada and Crandall 2001). The first group, JC69 (Jukes and Cantor 1969), includes the NG86 algorithm. The second model group, K2P (Kimura 1980), includes the LWL85 and LPB93 algorithms. The third model group, GY-HKY (Goldman and Yang 1994), includes the GY94, YN00, and MYN algorithms.

To further test the substitution model’s influence on the correlation between Ka/Ks and Ks, we carried out the following computer simulation to evaluate the correlation differences between group K2P (represented by LPB93) and group GY-HKY (represented by YN00). We did not assess JC69 because its only algorithm (NG86) performed similarly to the algorithms in K2P (Fig. 1; Table 2). The purpose of the computer simulations is to examine whether different substitution models affect the correlation between Ka/Ks and Ks in simulated data.

We first used the K2P substitution model to generate simulation data and then used LPB93 and YN00 to evaluate the correlation between Ka/Ks and Ks. The result (Fig. 2a) shows that YN00 provides a much stronger negative correlation than LPB93 does. The correlation coefficients for LPB93 and YN00 are –0.590 (p = 3.87e-5) and –0.917 (p = 7.77e-16), respectively. We then used the GY-HKY substitution model to generate our simulation sequences and calculated Ka and Ks using the LPB93 and YN00 algorithms (Fig. 2b). The difference in the correlation between the two algorithms is similar to the first simulation result. LPB93 shows a much stronger positive correlation than YN00. The correlation coefficients for LPB93 and YN00 are 0.401 (p = 6.3e-3) and 0.04 (p = 3.96e-1), respectively. Our simulation results confirm that data simulated under the K2P model (the model underlying LPB93) yield a markedly different correlation when analyzed with YN00 (the GY-HKY model) than with LPB93, and vice versa. For a detailed description of the simulation data, see Data and Methods. These results are supported by previous simulation studies (Tzeng et al. 2004), which show that when the evolutionary parameters are similar to those of the human-mouse lineage (CDS size, ~400 codons; κ ~ 2; t ~ 0.4), Ks estimated by the GY-HKY model is larger than that estimated by the K2P model, and Ka/Ks estimated by the GY-HKY model is smaller than that estimated by the K2P model (for details see Table 1 of Tzeng et al. 2004). Thus, the correlation between Ks and Ka/Ks obtained for the same data set can differ greatly between algorithms and can lead to different conclusions concerning selection direction and strength. We discuss possible explanations for the different degrees of correlation among these algorithms in the Discussion.

Fig. 2

Computer simulation for two substitution-model groups. K2P includes the LWL85 and LPB93 algorithms, and GY-HKY includes the GY94, YN00, and MYN algorithms. All simulated data were generated using a the K2P substitution model and b the GY-HKY substitution model. Ka and Ks calculations were carried out using both the LPB93 algorithm and the YN00 algorithm, and then the results were sorted by their Ks value, and 300 consecutive points were put in one bin. Subsequently, the mean value of Ka/Ks for each bin was used as a representative Ka/Ks for each bin

Correlation Dependent on Evolutionary Lineage

We next compared the correlation between Ka/Ks and Ks for a fixed algorithm across different evolutionary lineages. Although NG86, LWL85, and LPB93 show a relatively strong positive correlation in human-mouse and mouse-rat lineages (r² > 0.5 and p < 1e-7), the correlation is lost in fugu-tetraodon (r² < 0.05 and p > 0.1) (Fig. 1; Table 2). GY94 shows a weak positive correlation in human-mouse (r² = 0.28 and p = 5.88e-4), no correlation in mouse-rat (r² = 0.035 and p = 1.24e-1), and a weak negative correlation in fugu-tetraodon (r² = 0.215 and p = 3.40e-4). YN00 and MYN show almost no correlation (or very weak negative correlation) in the human-mouse lineage (r² < 0.3 and p > 1e-2) but stronger negative correlations in mouse-rat (r² > 0.4 and p < 1e-6) and fugu-tetraodon (r² > 0.3 and p < 1e-4).

Figure 1 and Table 2 show that the correlation between Ka/Ks and Ks is related to lineages for a given algorithm, indicating that the correlation is also sensitive to the orthologue data. We used the following procedure to test the assumption: the correlation is related to the orthologue data for a specific algorithm. First, we calculated the divergence level for each orthologous pair. Then we discarded the orthologues with the greatest divergence (top 5%) to avoid using incorrect and ambiguously defined alignments or orthologous relationships. For the remaining 95% of the orthologues, we recalculated the correlation coefficient and p-value for each algorithm.

Figure 3 and Table 3 show that, after this filtering procedure, almost all the algorithms present weaker positive correlations or stronger negative correlations than they do in the original orthologue data (in Fig. 1; Table 2). The changes in correlation using this procedure support the assumption that the correlation between Ka/Ks and Ks is also sensitive to evolutionary lineage and orthologue data. Correlation values for a particular algorithm can vary in different evolutionary lineages or in different data subsets (as shown in Figs. 1 and 3).

Fig. 3

Correlation between Ka/Ks and Ks for six different algorithms on three orthologue data sets after filtering the top 5% of divergent orthologues. a Orthologues (14,311) of human-mouse. b Orthologues (13,488) of mouse-rat. c Orthologues (14,954) of fugu-tetraodon. d Random selection of 2,000 original points using the YN00 algorithm for the fugu-tetraodon lineage. All original Ka/Ks and Ks values for each orthologue pair were sorted by their Ks value, and 300 consecutive points were put in one bin in a–c. Subsequently, the mean values of Ka/Ks and Ks were calculated for each bin as representative Ka/Ks and Ks values for each bin

Table 3  Correlation coefficient (r) and statistical significance for each algorithm under different evolutionary lineages after removing the top 5% of the divergent orthologue data

Alignment Quality Check

Table 4 presents data for the quality of the alignment in this study. All three evolutionary lineages were divided into two parts, according to their Ks value. For Ks < 0.25, the percentages of mismatches in the alignments are 6.16%, 6.30%, and 7.15% for human-mouse, mouse-rat, and fugu-tetraodon, respectively. For Ks > 0.25, the percentages of mismatches in the alignments are 14.99%, 9.93%, and 13.05% for human-mouse, mouse-rat, and fugu-tetraodon, respectively. The divergence for all three lineages is still in the region where Ka and Ks can be calculated accurately (Tzeng et al. 2004). We also assessed the gap rate for the whole alignments and the presence or absence of gaps at the ends of the alignments. Table 4 shows that for all three evolutionary lineages, the gap rates of both measurements are no more than 1%, except for human-mouse orthologues when Ks < 0.25 (slightly more than 1%). This indicates that the alignment qualities are still reliable for all lineages.

Table 4 Quality of alignment

To check whether the top 5% orthologues are really divergent from the other 95% of the orthologues, we further investigated the indel number, indel length, Ka, Ks, and Ka/Ks for the top 5% and the other 95% in all three evolutionary lineages. Table 5 shows that the indel number per gene and indel length per gene in the top 5% of the orthologues are about two to four times larger than those in the rest of the orthologues. Ka, Ks, and Ka/Ks in the top 5% of the divergent orthologues are significantly larger (p < 0.005, Wilcoxon rank sum test) than those in the rest of the orthologues. These results indicate that the top 5% of the divergent orthologues have less reliable alignments than the other 95%.

Table 5 Divergence between the top 5% and the other 95% of orthologues

In addition, to confirm that the pattern observed in fugu-tetraodon was not an artifact of our alignment procedure, we used another method for creating this alignment: global alignment with the software “needle”, based on the Needleman-Wunsch algorithm, in EMBOSS (Rice et al. 2000). Using this method we obtained the same Ka/Ks vs. Ks pattern at both low and high Ks values (data not shown). This concordance supports the quality of our results and indicates that they do not stem from low-quality orthologue data or alignment artifacts.

Due to the potential influence of GC, GC3 content (GC content at the third position of all codons), and transition/transversion ratio on Ks estimation (Chamary et al. 2006), we examined the GC, GC3 content, and transition/transversion rate ratio (Ts/Tv) in each group of orthologues (Table 4). Our results show that when Ks > 0.25, the difference in GC (0.017) or GC3 (0.074) content between human-mouse and fugu-tetraodon orthologues is much smaller; in contrast, when Ks < 0.25, differences in GC (0.048) or GC3(0.129) content between human-mouse and fugu-tetraodon are larger. See the Discussion for more details about the influence of these parameters on Ks.
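GC and GC3 content as defined above can be computed with a short sketch such as the following. The function names are ours, and the sketch assumes an in-frame, ungapped coding sequence and ignores any character other than A, C, G, and T.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def gc3_content(cds):
    """GC content at the third position of all codons of an in-frame CDS."""
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    return gc_content(third_positions)
```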

Discussion

Recently, Wyckoff and coworkers found a surprisingly strong positive correlation between Ka/Ks and Ks using several data sets (Wyckoff et al. 2005) and suggested a paradigm shift in the application of Ka/Ks as a measure of selective strength, indicating that the Ka/Ks value reflects not only selective strength but also neutral mutation rate. In short lineages, the positive correlation between Ka/Ks and Ks is not observed (Wyckoff et al. 2005). However, after correcting the stochastic noise of Ks in short lineages, the positive correlation can still be observed with the LPB93 algorithm (Vallender and Lahn 2007). Although some earlier studies have shown some correlation between Ka and Ks (Domazet-Loso and Tautz 2003; Lynch and Conery 2000), so far, only Wyckoff et al. have presented a strong systematic correlation between Ka/Ks and Ks. In consideration of Wyckoff and coworkers’ findings, we analyzed three orthologue data sets, representing three evolutionary lineages, with six different algorithms for evolutionary distance. Comparing NG86, LWL85, LPB93, GY94, YN00, and MYN, we found some correlations between Ka/Ks and Ks. However, the strength of those correlations was highly variable and depended on the lineage used in the calculations.

Which factors might cause different algorithms to present different levels of correlation for the same data set? The following are possible interpretations: (1) transition/transversion rate bias, (2) codon usage bias, (3) the estimation difference among different substitution models increasing with increasing substitution rate, and (4) estimation error and imperfect computation for these KaKs algorithms. As discussed previously, because it does not incorporate transition/transversion rate bias, NG86 will overestimate Ks and underestimate Ka/Ks (Yang 2006; Yang and Bielawski 2000). Ignoring codon-usage bias in NG86, LWL85, and LPB93 will result in underestimation of Ks and overestimation of Ka/Ks. When divergence increases, the estimation error will increase dramatically (Nei and Kumar 2000). That is, the percentage difference in Ks estimation between two substitution models will increase sharply when the synonymous substitution rate increases (Ks > 0.3). Because the nonsynonymous substitution rate is much lower than the synonymous substitution rate, the difference in Ka estimation between two substitution models is very small and will have little impact on the correlation. For comparison between LPB93 and YN00, when the synonymous substitution rate is low (<0.2), YN00 will show a slightly higher Ks and a slightly lower Ka/Ks than LPB93 does. When the synonymous substitution rate is higher (>0.3), YN00 will present a much higher Ks and much lower Ka/Ks than LPB93 (Yang 2006; Yang and Bielawski 2000). Therefore, assuming that LPB93 presents a straight line (strong positive correlation between Ka/Ks and Ks), YN00 will be like a parabola beneath the LPB93 straight line, which is consistent with our result in Figs. 1 and 3. Our simulation results further confirm that the systematic differences between the LPB93 and the GY94 algorithms exist for simulation data produced by either the K2P or the GY-HKY model. LPB93 will yield a more positive correlation between Ka/Ks and Ks, and GY94 will yield a more negative correlation for the same data set. One interesting result is that there is a small difference between GY94 and YN00: the negative correlation between Ka/Ks and Ks in YN00 is slightly stronger than that in GY94, although they adopt the same underlying substitution model. This small difference can be explained by the numerical calculation difference between the maximum-likelihood and the approximate method. The approximate method (YN00 or MYN) will usually yield a slightly larger estimate of Ks and a slightly lower estimate of Ka/Ks (Yang and Bielawski 2000; Yang and Nielsen 2000), thus leading to a slightly more negative correlation than the maximum-likelihood method (GY94) does. These correlation differences between two very similar methods suggest that the impact of stochastic variance and imperfect computation on the correlation cannot be ignored. Additionally, the impact of stochastic noise on the correlation between Ka/Ks and Ks was also considered in recent studies (Vallender and Lahn 2007).

Why does a given KaKs algorithm lead to positive correlations for some data sets but negative correlations for other data sets? Two possible causes are as follows. (1) The change in “real” substitution will cause different degrees of correlation for different data sets even for the same algorithm. This interpretation can be confirmed by our simulation results in Fig. 2. (2) Evolutionary traits, such as codon frequency, transition/transversion rate bias, and divergence at the nucleotide level, will have a combined impact on the correlation differences among data sets. Heterogeneity in the gene region, such as codon frequency, transition/transversion rate, CpG islands, and isochores (long stretches of compositionally homogeneous DNA), can also affect the Ka and Ks calculation performance. Warm-blooded and cold-blooded lineages differ in the compositional patterns of their isochore structure. Additionally, cold-blooded vertebrate genomes have fewer GC-rich regions and lack GC-rich isochores, which are widespread in warm-blooded vertebrate genomes. The GC3 content in a gene is highly correlated with the GC content of the isochore in which it is embedded in mammals (Chamary et al. 2006). Such a compositional difference in nucleotides may lead to codon usage bias and to different modes of genome evolution: conservative mode and transitional mode (Bernardi 1993). Our results also show that the difference in GC (or GC3) levels between human-mouse and fugu-tetraodon orthologues (we compared these two lineages due to their similar divergence) is much smaller when Ks > 0.25 than when Ks < 0.25: the correlation between Ka/Ks and Ks for human-mouse and fugu-tetraodon is different when Ks < 0.25 (negative correlation in fugu-tetraodon and positive correlation in human-mouse), whereas the correlation is similar between the two lineages when Ks > 0.25. This indicates that the isochore effect may result in different codon usage and selection bias on synonymous sites, as suggested previously (Chamary et al. 2006).

Although previous studies (Liao and Zhang 2006; Wyckoff et al. 2005) have employed correlation coefficients to measure the trend of Ka/Ks with Ks, our fugu-tetraodon results indicate that the dependence of Ka/Ks on Ks is complicated and varies with both the Ks region (low or high) and the calculation method used. Therefore, a single summary statistic, such as the correlation coefficient, may not provide a reliable or complete picture of the dependence of Ka/Ks on Ks. To ensure confidence in the correlation assessment, it is necessary to avoid relying on globally calculated correlations alone and also to consider the local dependence (correlation) of Ka/Ks on Ks.
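One simple way to examine local dependence is to compute the correlation separately in the low-Ks and high-Ks regions. The sketch below is illustrative only and was not part of the analyses reported here; the helper names are ours, and the default threshold of 0.25 merely mirrors the Ks split used in the alignment quality check.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def regional_correlation(ks, ka_ks, threshold=0.25):
    """Correlation between Ks and Ka/Ks below and above a Ks threshold."""
    low = [(k, w) for k, w in zip(ks, ka_ks) if k < threshold]
    high = [(k, w) for k, w in zip(ks, ka_ks) if k >= threshold]
    result = {}
    for name, points in (("low", low), ("high", high)):
        if len(points) < 3:
            result[name] = None  # too few points for a meaningful estimate
        else:
            xs, ys = zip(*points)
            result[name] = pearson_r(xs, ys)
    return result
```

A data set in which Ka/Ks falls with Ks below the threshold and rises above it can give a global coefficient near zero while both regional coefficients are strong, which is the situation a single global statistic would hide.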

Summary

Wyckoff and coworkers (2005) have shown a strong positive correlation between Ka/Ks and Ks with the LPB93 algorithm for human-mouse orthologues, which we have reproduced in this current study. However, when we calculated the correlation using several other algorithms (GY94, YN00, and MYN) and used more evolutionary lineages, including the cold-blooded fugu-tetraodon lineage, the positive correlation became less significant from warm-blooded to cold-blooded lineages using the NG86, LWL85, and LPB93 algorithms. At the same time, we found a weak or no significant negative correlation using GY94, YN00, and MYN in a warm-blooded lineage and stronger negative correlation using GY94, YN00, and MYN in a cold-blooded lineage. In each evolutionary lineage, the correlation was variable among algorithms that are based on different DNA substitution models. Previously, algorithms to compute Ka and Ks were justified by how well they fit some arbitrarily defined mutation models. Given the algorithm-specific and evolutionary lineage-related correlations shown in this work, great caution should be taken when using only one Ka and Ks algorithm. A data set calculation with an improperly chosen algorithm may produce an inaccurate finding, which may then be interpreted as a biological trait but probably is an artifact of the calculation.