Introduction

It has been known for some time that exonic nucleotides immediately surrounding introns have a skewed distribution (Long et al. 1998). The most common exonic splicing site found in eukaryotic genes is AG|G (| shows the position of the intron). However, it is not always appreciated that AG|G sites make up only slightly more than one quarter of all exonic splicing sites, which means that the majority of introns are inserted in a variety of other splicing sites. According to the original version of the protosplice hypothesis, introns were commonly gained between G|R (Dibb and Newman 1989), but there are numerous other arrangements. It is not yet clear whether the combinations of exonic nucleotides flanking the splice sites are remnants of the original protosplice sites, the result of evolution following intron insertion, or a combination of both (Sverdlov et al. 2004; Rogozin et al. 2005). Some aspects of this problem were investigated using simulation analysis (Ruvinsky et al. 2005). Random coding sequences were generated using the codon usage frequencies found in the studied species. Surprisingly, the simulated patterns were quite similar to the observed distribution of intron phases (0, 1, and 2), which can therefore be attributed to a significant degree to codon usage bias. This approach demonstrates that the skewed distribution of intron phases and exonic splicing sites is likely affected by the coding constraints acting upon the sequences that flank introns. A similar study based on real genes generally confirmed these conclusions (Nguyen et al. 2006).

Several essential factors that influence splicing are located mainly in introns. They have evolved steadily, shaping splicing specificity in different groups of organisms (Sheth et al. 2006; Schwartz et al. 2008). It should not, however, escape attention that the exonic nucleotides framing introns also contribute to correct splicing. A compromise between protein coding and splicing requirements for these exonic nucleotides is achieved differently depending on intron phase (Ruvinsky and Ward 2008). For instance, the higher flexibility of exonic nucleotides surrounding phase 0 introns may serve as a driving force for a further increase of AG|G frequency. In this study, we examined evolutionary changes in exonic nucleotides framing introns by comparing five mammalian species. Contrary to expectations, the frequency distributions of distinct exonic splicing sites did not differ greatly among the compared mammalian species. In human genes there was a moderate increase of AG|G frequency. In other mammalian species, particularly in rodents, some reduction in AG|G frequency compared with the reconstructed ancestral sequences was observed. It is plausible that a dynamic equilibrium determined by the substitution rates and frequencies of the exonic splicing sites is responsible for the current frequency distribution of splicing sites.

Materials and Methods

Extracting Orthologous Splicing Sites from Mammalian Genes

Initially we downloaded the “refgene” transcripts table from human genome rel. 36.1 (hg18) (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refgene.txt.gz; Karolchik et al. 2008). We then selected the completed transcripts (‘comp’ value for the ‘comp’ field) and retrieved a non-redundant set of donor and acceptor splice sites, each 24 bp in length with the splice site at the center. We restricted our set to exons spanning the CDS and containing ‘gt–ag’ intron flanks. The phases were also ascertained from the “refgene” table (‘exonFrame’ field values).

Then we used the multiz alignment files (maf) from the UCSC browser to retrieve the mouse orthologs (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/encode/MSA/DEC-2007/DEC-2007.tar.gz) for the corresponding human splice sites. The Galaxy system was used to retrieve the mouse counterparts (http://g2.trac.bx.psu.edu/wiki/MAFanalysis; Taylor et al. 2007). We did not consider orthologous splice sites if they resided in poorly aligned regions, i.e., where the identity score over a 200 bp window spanning the splice site was less than 80% (Sorek and Ast 2003).
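The alignment-quality filter can be illustrated with a minimal Python sketch. It assumes that the two sequences are already aligned and of equal length, and that the identity score is the fraction of columns with identical, non-gap nucleotides; the function names and the treatment of gaps are illustrative assumptions and are not taken from the Galaxy workflow used in the study.

```python
def window_identity(human: str, other: str) -> float:
    """Fraction of aligned columns carrying the same nucleotide.

    Assumption: both strings come from the same alignment window
    (e.g. 200 columns) and gaps ('-') never count as matches.
    """
    if len(human) != len(other) or not human:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    matches = sum(
        1 for a, b in zip(human.upper(), other.upper()) if a == b and a != "-"
    )
    return matches / len(human)


def is_well_aligned(human: str, other: str, threshold: float = 0.80) -> bool:
    """Keep a splice site only if window identity reaches the threshold."""
    return window_identity(human, other) >= threshold
```

Under these assumptions, a splice site would be discarded whenever `is_well_aligned` returns `False` for its 200 bp window.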

Finally, we compiled the exon junction sites as the last two nucleotides of the preceding exon and the first nucleotide of the following exon for valid splice sites. Adjacent exons in orthologous junctions were required to be no longer than 300 bp and introns to be longer than 60 bp; otherwise the splice site pair was discarded. We thus compiled 55,339, 34,510, 35,149, and 29,864 human–mouse, human–cow, human–dog, and human–horse orthologous exon junctions, respectively. The quality of DNA sequencing was close to 99.99% accuracy in the compared mammalian genomes.
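As an illustration of how an exonic splicing site (−2 −1|+1) is assembled and filtered, consider the following Python sketch. The function names and the string representation `XY|Z` are assumptions made for this example; the original data preparation was not done with this code.

```python
def junction_site(upstream_exon: str, downstream_exon: str) -> str:
    """Join the last two nucleotides of the upstream exon and the first
    nucleotide of the downstream exon, marking the intron position with '|'."""
    if len(upstream_exon) < 2 or len(downstream_exon) < 1:
        raise ValueError("exons are too short to form a junction site")
    return f"{upstream_exon[-2:].upper()}|{downstream_exon[0].upper()}"


def passes_length_filters(upstream_len: int, downstream_len: int,
                          intron_len: int, max_exon: int = 300,
                          min_intron: int = 60) -> bool:
    """Length criteria from the text: neither adjacent exon longer than
    300 bp, and the intron longer than 60 bp."""
    return (upstream_len <= max_exon
            and downstream_len <= max_exon
            and intron_len > min_intron)
```

For example, an upstream exon ending in `...CAG` followed by a downstream exon starting with `G...` yields the site `AG|G`.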

Creating Five Species Comparisons and Reconstruction of Ancestral Splicing Sites

Two Perl scripts were written to prepare the data that were analyzed in this study. These scripts take their input from a text file containing a list of orthologous splice site alignment records (human–mouse, human–dog, human–horse, human–cow) which are described above. A single file was created containing alignments from all five species using these data. The script we used searched and matched the alignment records based on the human mRNA ID and exon_ID. Finally, we obtained 13,905 orthologous exonic splice site alignments for all five compared species.

We also built a consensus for each of the five-species orthologous alignments based on the least number of changes required to arrive at the sequences observed in each species. The logic used to derive this consensus was based on the phylogenetic relationships between the species, traced back to their most recent common ancestor. The generally accepted phylogeny of mammalian orders {[Primates, Rodentia][(Carnivora, Perissodactyla) Artiodactyla]} was used for the reconstruction of ancestral splicing sites (Murphy et al. 2001). From this, the number of changes was calculated for each candidate nucleotide at each position (−2 −1|+1). The script then chose, for each position, the nucleotide that required the fewest changes. Where more than one nucleotide required the same minimal number of changes at a position, a consensus could not be found and an ‘n’ was placed in that position in the consensus field. The script also reported the minimum number of changes required for each position.
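The reconstruction logic can be sketched in Python with Fitch's small-parsimony procedure applied to the phylogeny given above. This is only one possible way to implement the described rule; the original Perl scripts are not reproduced here, and the tree encoding, function names, and species keys are assumptions made for this example.

```python
# Assumed encoding of {[Primates, Rodentia][(Carnivora, Perissodactyla) Artiodactyla]}
TREE = (("human", "mouse"), (("dog", "horse"), "cow"))


def fitch(node, column):
    """Bottom-up Fitch pass for one alignment column.

    Returns (set of optimal states at this node, minimal number of changes).
    """
    if isinstance(node, str):                  # leaf: observed nucleotide
        return {column[node]}, 0
    left_set, left_cost = fitch(node[0], column)
    right_set, right_cost = fitch(node[1], column)
    common = left_set & right_set
    if common:                                 # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def consensus_site(sites):
    """sites maps species name -> 3-nucleotide site, e.g. 'AGG' for AG|G.

    Returns (consensus, changes per position); ambiguous positions get 'n'.
    """
    letters, changes = [], []
    for pos in range(3):
        column = {species: seq[pos].upper() for species, seq in sites.items()}
        states, cost = fitch(TREE, column)
        letters.append(next(iter(states)) if len(states) == 1 else "n")
        changes.append(cost)
    return "".join(letters), changes
```

With this sketch, a set in which only the mouse carries AG|A while the other four species carry AG|G gives the consensus `AGG` with one change at position +1. If human and mouse both carry AG|A and the other three species carry AG|G, position +1 is ambiguous and the consensus is `AGn`.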

After visual inspection of the data it became clear that a small fraction of the splicing sites appeared to be in reverse-complementary form. This possibly occurred during the original sequencing or subsequent processing steps. The number of such potential mistakes in the data we used was minuscule, a fraction of 1%, and the presence of such “strange” sequences could be ignored without affecting the final conclusions. Nevertheless, we dealt with this minor imperfection of the data in the most parsimonious way, using another Perl script to correct the problem. The script searched for single and double occurrences of sequences like CCT (reverse-complementary to AGG). Where there was a single occurrence among the five compared species of a sequence assumed to be reverse-complementary, the sequence was changed to its correct form and both the original and modified versions of the record were output to a separate file for visual inspection. The minimal number of changes and a new consensus were also recalculated for each of these modified records. In the vast majority of the suspected cases the presence of a reverse-complementary sequence was confirmed and the modified record was kept. Where there were two occurrences of a suspected reverse-complementary sequence, the record was output to a separate file for visual inspection with no changes made to the original data. Only a few alignments had two occurrences of the suspected sequences, and in none of them was it possible to confirm whether they were reverse-complementary. No changes were made in such alignments.
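The correction step can be sketched in Python as follows. The sketch assumes that a site is "suspected" when it differs from the most common site in the alignment but its reverse complement equals that site; this criterion, the function names, and the status labels are assumptions made for illustration, not a description of the original Perl script.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, e.g. AGG -> CCT."""
    return seq.translate(_COMPLEMENT)[::-1]


def correct_single_reversal(sites):
    """sites maps species name -> 3-nucleotide site.

    Returns (sites, status). A single suspected reversal is corrected;
    two suspected reversals are only flagged for visual inspection.
    """
    majority, _ = Counter(sites.values()).most_common(1)[0]
    suspects = [
        species for species, seq in sites.items()
        if seq != majority and reverse_complement(seq) == majority
    ]
    if len(suspects) == 1:
        fixed = dict(sites)
        fixed[suspects[0]] = majority       # restore the forward orientation
        return fixed, "corrected"
    if len(suspects) == 2:
        return dict(sites), "inspect"       # leave the data unchanged
    return dict(sites), "unchanged"
```

In this sketch, records returned with the status `corrected` would then have their consensus and minimal number of changes recalculated, as described in the text.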

Calculation of Substitution Frequencies

Substitution rates were calculated in the same way as was done by Li and Graur (1991), who used Jukes and Cantor’s one-parameter model of nucleotide substitutions for two protein coding sequences. The formulae used were as follows:

$$ K_{S} = -\frac{3}{4}\ln\left[ 1 - \frac{4M_{S}}{3N_{S}} \right] $$

and

$$ K_{A} = -\frac{3}{4}\ln\left[ 1 - \frac{4M_{A}}{3N_{A}} \right] $$

where K is the number of substitutions (synonymous or nonsynonymous) per site between compared orthologous sequences since the common ancestor; M is the number of differences; and N is the total number of positions. The rate of substitutions per nucleotide position per year was calculated as:

$$ r = \frac{K}{2T}, $$

where r is the rate of nucleotide substitutions per position per year and T is the time of divergence between human and mouse, which was estimated by Li and Graur (1991) as 80 MY and re-confirmed later.
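A minimal Python sketch of this calculation is given below. The function names are assumptions, and the default divergence time of 80 MY is the value quoted above; any counts passed to the functions in practice would come from the orthologous comparisons.

```python
import math


def jukes_cantor_k(differences: int, positions: int) -> float:
    """Jukes-Cantor estimate K = -(3/4) ln[1 - 4M/(3N)]."""
    if positions <= 0:
        raise ValueError("positions must be positive")
    p = differences / positions
    if p >= 0.75:
        raise ValueError("correction is undefined for M/N >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def rate_per_year(k: float, divergence_years: float = 80e6) -> float:
    """r = K / (2T); the factor 2 accounts for both diverging lineages."""
    return k / (2.0 * divergence_years)
```

For instance, `rate_per_year(jukes_cantor_k(m, n))` converts M = `m` observed differences among N = `n` compared positions into a rate per position per year.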

Standard F tests were used in this investigation and the correlation coefficients were calculated using Excel tools.

Results

Comparing Frequencies of Orthologous Exonic Splicing Sites from Five Mammalian Species

Figure 1 shows the distribution of frequencies of the most common exonic 3-nucleotides flanking introns in the orthologous comparisons between human, mouse, dog, horse, and cow. Each site includes the last two nucleotides from the 3′ end of the preceding exon and the first exonic nucleotide from the 5′ end of the following exon (−2 −1|+1). While all 64 possible combinations (4³) of exonic 3-nucleotides were observed, only the five most frequent sites are presented in Fig. 1. In the studied species these five exonic splicing sites together exceeded 50% of all sites. The AG|G sites are more frequent in human genes, where they exceed 26.4%. In the other species the frequency varies between 21.8 and 23.7%, with the lowest value found in the mouse; the difference between human and mouse values is significant (F = 92.12, P < 0.0001).

Fig. 1

Frequencies of the most numerous orthologous exonic splicing sites from five mammalian species: Homo sapiens, Mus musculus, Canis familiaris, Equus caballus, and Bos taurus. AG|G in human is more frequent than in the other species (F = 92.12, P < 0.0001). The differences in other splicing sites are not significant. Frequencies for each species are counted independently.

Genome-wide loss of guanine in rodents (Duret et al. 2002) may partially explain the low frequency of guanine (43.8%) in murine splicing sites. In other species, particularly in human, the frequency of guanine was a little higher (45.7%). Similar but smaller differences are typical for adenine. This phenomenon, whether selective or neutral in nature, could contribute to the lower frequency of AG|G in the murine genes. The other four relatively common exonic splicing sites shown in Fig. 1 can be transformed into AG|G by a single substitution.

Substitution Rates in Exonic Splicing Sites

Comparisons between orthologous human and mouse splicing sites allow the rates of substitution since the common ancestor to be estimated. Multiple and reverse substitutions cannot be detected by such comparisons, and this may affect the accuracy of the estimates. Fortunately, the probabilities of multiple and reverse substitutions are not high. Figure 2 shows the frequencies of matches and mismatches among AG|G and rare splicing sites in human to mouse orthologous comparisons. While we found that AG|G sites are significantly different from all others (data not shown), we chose to compare AG|G with rare sites in order to assess the breadth of the differences. AG|G sites, which are the most common 3-nucleotides, comprise about ¼ of all splicing sites. Numerous rare splicing sites together also represent about ¼ of all sites. The frequencies of matches are much higher for AG|G comparisons than for rare sites (F match = 445.4, P < 0.0001). In contrast, mismatches are significantly more frequent in rare splice site comparisons (F 1mismatch = 202.9, P < 0.0001; F 2mismatches = 273.3, P < 0.0001). The percentage of mismatches in AG|G pairs is 24.9%, whereas in rare splicing sites it reaches 38.5%. Thus orthologous AG|G human to mouse pairs show a much higher degree of conservation than rare orthologous splicing sites, and consequently there were far fewer substitutions in AG|G than in rare splice sites. Correspondingly, human to mouse pairs with 1 or 2 mismatches were more frequent among rare sites than among AG|G sites.

Fig. 2

Human to mouse comparisons of match and mismatch frequencies among orthologous AG|G and rare splicing sites. The differences are highly significant (F match = 445.4, P < 0.0001; F 1mismatch = 202.9, P < 0.0001; F 2mismatches = 273.3, P < 0.0001). AG|G sites, which are the most common 3-nucleotides, comprise about ¼ of all splicing sites. The rare splicing sites also represent about ¼ of all sites

Figure 3 shows the frequencies of mismatches between orthologous human and mouse splicing sites for different nucleotide positions within exonic splicing sites for each phase. As expected, position +1 is the least conserved and accommodates 2–4 times more mismatches than the two other positions. In contrast, position −1 is the most conserved. In several cases, the differences between phases are significant. For instance, position −2 in phase 1, position −1 in phase 0, and position +1 in phase 2 have higher frequencies of mismatches because they coincide with the third position of codons and hence were able to accumulate more synonymous mutations.

Fig. 3

Frequencies of mismatches between orthologous human and mouse splicing sites relevant to intron phase and the nucleotide position within the site

Using the approach described in the “Materials and Methods” section, we calculated substitution rates per nucleotide per year for the three studied positions in the splicing sites for human and mouse (Table 1). As in Fig. 2, comparisons were made between AG|G and rare splicing sites. As Table 1 shows, synonymous positions have different locations in the splicing sites depending on intron phase. Earlier, Li and Graur (1991) estimated substitution rates per nucleotide per year between human and mouse for protein coding sequences. It was shown that nonsynonymous substitutions occur at a rate of ~0.85 × 10⁻⁹ per nucleotide per year, and synonymous substitutions occur at a rate of ~4.61 × 10⁻⁹ per nucleotide per year. Exonic splicing sites are subject to stronger constraints than protein coding sequences in general, which leads to an expectation of lower substitution rates. Indeed, in all but two cases presented in Table 1 the substitution rates are lower than the Li and Graur (1991) estimates. The exception is position +1 in phase 0 and phase 1, which is a nonsynonymous position. In this case AG|G and rare splicing sites have substitution rates 2–3 times higher than the Li and Graur estimates.

Table 1 Substitution rates per nucleotide per year in exonic splicing sites are based on human to mouse comparisons

In all human to mouse comparisons, substitution rates in AG|G sites were lower than in rare sites, and the same is true for a few other common splicing sites (data not shown). While this result is very much expected, in several cases the substitution rates in AG|G sites are dramatically lower than in rare splicing sites, which is a likely consequence of natural selection. Particularly low rates are found in position −1, confirming the strong conservation of this position. The average substitution rate per AG|G site was ~2.1 times lower than in rare splicing sites.

Conservation of the Most Common Splicing Sites and Variability of the Rare Sites

The data presented above provide clear support for much higher conservation in AG|G sites compared with rare sites. As Fig. 4 demonstrates, a more general conclusion can be drawn regarding all five compared mammalian species. Using the approach described in the “Materials and Methods” section, we calculated the average minimal number of substitutions for each group of orthologous splicing sites. These values were plotted against the number of orthologous splicing sites per set. The distribution of dots on the scatter plot shows that the most numerous splicing sites, like AG|G (the far right dot), accumulated the smallest number of substitutions since the common ancestor. In contrast, less common splicing sites have more substitutions. The coefficient of correlation is negative and significant (r = −0.36, P < 0.0001). The presence of purines contributes to conservation of exonic splicing sites. The coefficient of correlation between the number of purines per exonic splicing site and the average minimal number of substitutions for the five compared orthologous splicing sites is also negative and significant (r = −0.54, P < 0.0001).

Fig. 4

Scatter plot representing the relationship between the average minimal number of substitutions per five orthologous splicing sites from the compared mammalian species (human, mouse, dog, horse, and cow) and the number of orthologous splicing sites per set. Some splicing sites like AG|G were numerous, others were rare. The X-axis is in logarithmic scale. The trend line indicates that more numerous splicing sites generally accumulated fewer substitutions, which leads to a significant negative correlation (r = −0.36, P < 0.0001) between the average minimal number of substitutions and the number of orthologous splicing sites. The correlation between the number of purines per splicing site and the average minimal number of substitutions per five compared orthologous splicing sites is even stronger (r = −0.54, P < 0.0001).

The degree of conservation of the AG|G sites since divergence from the common ancestor was much higher than for the other splicing sites (Fig. 5). For instance, AG|G conservation was higher than that of the next most frequent splicing site, AG|A (F = 40.7, P < 0.0001), and AG|A was more conserved than AG|C or AG|T (F = 13.9, P < 0.001). About 78% of splicing sites for which an AG|G consensus was established were AG|G. The presence of other common splicing sites was noticeable, but much lower: AG|A, 7%; AG|C, 3%; and AG|T, 4%. All other splicing sites combined contribute only about 8%. In human genes, AG|G conservation is 84.3%, which is much higher than in the other studied species.

Fig. 5

Conservation of the most common exonic splicing sites in five mammalian species. The differences are statistically significant; AG|G versus AG|A (F = 40.7, P < 0.0001); AG|A versus others (F = 13.9, P < 0.001). AG|G conservation in human genes was 84.3% which is much higher than in other studied species

The data indicate that fixation of substitutions in rare splicing sites is more likely than in common splicing sites, particularly AG|G. The distribution pattern of splicing sites includes a few conserved canonic splicing sites that change slowly over lengthy evolutionary intervals and a long array of increasingly rare splicing sites which together host numerous substitutions. Such a pattern suggests the existence of a drive that could cause steady transformation of rare splicing sites towards the canonic ones during evolution. The facts described here are relevant to the period that followed the beginning of mammalian radiation some 90–100 MYA.

The probabilities of substitutions between the most common splicing sites in human genes are shown in Fig. 6. All considered substitutions occurred in the +1 position. Clearly, the probabilities of substitutions are shifted toward AG|G and exceed those of the reverse substitutions by 2–3 times. Transformation of AG|C and AG|T into AG|A is also more likely than the reverse substitutions. These data also indicate the existence of a drive toward the most common splicing sites.

Fig. 6

Diagram of substitutional changes between the most frequent exonic splicing sites in human genes. These four sites comprise nearly a half of all splicing sites. The numbers and weight of arrows indicate the probability of substitutions, calculated for all those comparisons where a consensus was established. The probability of substitutions is clearly shifted toward AG|G and, to a lesser degree, toward AG|A. Rare substitutions to other splicing sites are not shown in the diagram and do not exceed 1–2%.

Dynamic Equilibrium of the Frequencies of Splicing Sites

The drive toward AG|G leads to an expectation that its frequency will grow as long as the substitution rates continue to be biased toward AG|G. However, it is important to realize that the proportions of different splicing sites are essential for this substitutional dynamic. For simplicity, let us consider substitutions only between AG|G and AG|A. In human genes, the number of AG|G sites is nearly twice that of AG|A sites, and despite the twofold lower substitution rate from AG|G to AG|A, the actual flow of substitutions in both directions is approximately equal. This dynamic can be described quantitatively as shown in the following box.

$$ N_{{{\text{AG}}|{\text{G}}}} \underset{\beta }{\overset{\alpha }{\longleftrightarrow}}M_{{{\text{AG}}|{\text{A}}}} $$
$$ N_{{\left( {{\text{AG}}|{\text{G}}} \right){\text{e}}}} = \beta \left( {N + M} \right)/\left( {\alpha + \beta } \right) $$
$$ M_{{\left( {{\text{AG}}|{\text{A}}} \right){\text{e}}}} = \alpha \left( {N + M} \right)/\left( {\alpha + \beta } \right) $$

where α and β are substitution rates, and N and M are the numbers of AG|G and AG|A splicing sites, respectively. N_e and M_e are the equilibrium values.
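The equilibrium expressions follow from two conditions, sketched here under the assumption that α is the per-site rate of AG|G → AG|A substitution and β is the rate of the reverse change (the reading consistent with the formulas above): the two flows of substitutions balance, and the total number of sites is conserved.

$$ \alpha N_{\text{e}} = \beta M_{\text{e}}, \qquad N_{\text{e}} + M_{\text{e}} = N + M $$

Substituting $M_{\text{e}} = N + M - N_{\text{e}}$ into the first condition gives

$$ \alpha N_{\text{e}} = \beta \left( N + M - N_{\text{e}} \right) \;\Rightarrow\; N_{\text{e}} = \frac{\beta \left( N + M \right)}{\alpha + \beta}, \qquad M_{\text{e}} = \left( N + M \right) - N_{\text{e}} = \frac{\alpha \left( N + M \right)}{\alpha + \beta} $$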

In the human sample used in this study, N (AG|G) = 3671 and M (AG|A) = 1901. Taking the substitution rates α and β from Fig. 6, we can calculate the equilibrium values N (AG|G)e = 3697.5 and M (AG|A)e = 1874.4. The current and equilibrium values are clearly very close. Thus, it seems plausible that current frequencies of splicing sites in human genes are not far from the equilibrium point and, if so, further changes of AG|G frequency are not likely.
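The arithmetic can be sketched in Python. Because the exact rates from Fig. 6 are not reproduced in the text, the example uses an illustrative 1:2 ratio of α to β, based only on the statement that the substitution rate from AG|G to AG|A is about twofold lower than the reverse rate; the resulting numbers therefore differ slightly from the equilibrium values reported above. The function name is an assumption.

```python
def equilibrium(n: float, m: float, alpha: float, beta: float):
    """Equilibrium numbers of AG|G (n_e) and AG|A (m_e) sites.

    alpha: assumed rate AG|G -> AG|A; beta: assumed rate AG|A -> AG|G.
    Only the ratio alpha/beta matters for the result.
    """
    total = n + m
    n_e = beta * total / (alpha + beta)
    m_e = alpha * total / (alpha + beta)
    return n_e, m_e


# Counts from the human sample; the rates are illustrative, not the Fig. 6 values.
n_e, m_e = equilibrium(3671, 1901, alpha=1.0, beta=2.0)
print(f"N_e = {n_e:.1f}, M_e = {m_e:.1f}")
```

At these values the two flows, `alpha * n_e` and `beta * m_e`, are equal, which is the defining property of the equilibrium point.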

Estimated Changes in Frequency of Splicing Sites in Mammalian Species Since the Common Ancestor

Moderate differences in the frequencies of AG|G and, to a lesser degree, AG|A were observed among the five compared mammalian species (Fig. 1). Changes in the frequencies of other splicing sites were rather small. Here an attempt was made to estimate the AG|G frequency in the common ancestor of the compared species. We were able to reconstruct consensus data for about 2/3 of all five-species comparisons (10,351 sets of orthologous splicing sites from the total of 13,905), which allows the exact calculation of AG|G losses and gains for each of the species. In order to make estimates for the remaining 1/3 of comparisons (3,554 sets), we assumed that the rate of AG|G losses and gains for this group of comparisons was not different from that in the group where the consensus was established. Using this approach an estimate of AG|G gains and losses was made for all five compared species (Fig. 7). This estimate points to the likely course of evolutionary changes in AG|G splicing sites. The AG|G frequency in human genes has grown by approximately 1.9% above the level of the common ancestor. In the murine genes it has declined by approximately 2.2%. These changes create a significant gap between the two species. Genome-wide differences in guanine (~1.9%) and adenine (~0.7%) content observed between human and mouse genomes may well account for at least part of the changes which occurred in AG|G splicing sites. According to the estimates (Fig. 7), dog, horse, and cow may have lost approximately 0.9, 0.9, and 1.3% of AG|G sites, respectively.

Fig. 7

Estimated net gain or loss of AG|G exonic splicing sites per genome for compared mammalian species which occurred since the common ancestor. The zero line represents AG|G frequency in the common ancestor. There were 10,351 sets of orthologous splicing sites for five compared mammalian species for which consensus splicing sites were reconstructed. These data provide reliable information about the number of splicing sites which either remained unchanged or were a subject of some changes during eutherian evolution. There were also 3,554 orthologous sets for which complete reconstruction of consensus was not directly possible. However, an assumption that proportion of conservative and changed splicing sites in this group does not differ significantly from the group where the consensus was reconstructed, allowed making an estimate of AG|G net gains or losses for all 13,905 sets of orthologous splicing sites

Discussion

An earlier comparative study of eukaryotic genomes revealed a higher degree of conservation of exonic splice sites of the U2 type in more complex species, as well as conservation of the splicing sites of the U12 type (Sheth et al. 2006). A significant conservation of orthologous splice sites between human and rodents and between rodents, when compared with unrelated splice sites, was also observed by Abril et al. (2005). The focus of our investigation was on evolutionary changes of exonic splicing sites which took place since the eutherian radiation some 90–100 MYA. Comparisons of orthologous splicing sites from five mammalian species representing the major orders Primates, Rodentia, Carnivora, Perissodactyla, and Artiodactyla showed considerable similarity, with the exception of AG|G sites (Fig. 1). The significant correlation between conservation of splicing sites and their frequency indicates the existence of a trend according to which the frequencies of the most conserved splicing sites tend to increase (Figs. 4, 5). The major observed difference between the compared species was the frequency of AG|G. In human genes, it was higher than in other species, while in murine genes it was the lowest. The low frequency of AG|G in mouse can be explained, at least partially, by a significant loss of guanine over the whole genome (Duret et al. 2002) as well as in exonic splicing sites (43.8% versus 45.7% in human splicing sites). However, despite the loss of guanine, the frequency of GG|G in murine genes was slightly higher than in human genes. This fact alone requires more cautious interpretation of potential links between AG|G and guanine frequencies.

Exonic nucleotides framing introns are responsible for at least two entirely different but crucial functions: protein coding and intron splicing. This inevitably requires a compromise, which can be achieved by natural selection. We observed that the rates of nonsynonymous substitutions based on human to mouse comparisons in position +1 for phases 0 and 1 are about 2–2.5 times higher (Table 1) than the average rate of substitutions per nonsynonymous nucleotide described by Li and Graur (1991). This looks surprising because the +1 position should be under stronger selection pressure than the average nonsynonymous position, with rates of substitutions anticipated to be lower. In phase 2, where the +1 position is synonymous, the substitution rates are nearly the same despite the potentially lower cost of the substitutions. There is a simple explanation for phase 2, as synonymous substitutions are possible only between AG|G and AG|A (both coding for arginine) and hence the “degree of freedom” is low. For phases 0 and 1 the explanation might be more complex. Perhaps the +1 position, being less critical for correct splicing, serves as a “relaxation” point, which can accommodate more substitutions than an average nonsynonymous position. If so, the +1 position is the most likely point for compromise between coding and splicing requirements. As in 2/3 of the cases this position is occupied by a purine, the compromise might not be too demanding, and it is in a way similar to the situation in phase 2. Our data also show that transversions from AG|T and AG|C to AG|G (and AG|A) should be common. Comparative investigations of splicing errors occurring on both sides of the intron–exon junction may shed additional light on the explanations discussed above.

The data presented in this paper indicate that particular substitution pathways were used to increase the frequency of AG|G during evolution (Fig. 6). This mainly occurred through substitutions from AG|T, AG|C, and AG|A toward AG|G. In turn, AG|T, AG|C, and AG|A might have been fed by substitutions from less common exonic splicing sites. These data create an expectation that AG|G frequencies may increase further. However, the growth of canonic splicing site frequencies is limited for two major reasons. Firstly, the high concentration of AG|G and AG|A, depending on intron phase, limits the use of certain codons and hence amino acids in certain regions of proteins. For phase 2, it leads to a high saturation of the corresponding areas of proteins by arginine. Taking into account that arginine usage frequency in mammalian genomes is only about 4%, it is reasonable to expect a reduction of the drive toward the canonic splicing sites. Secondly, a dynamic equilibrium is reached when the flows of substitutions from AG|G to AG|A and back become equal. As our calculations show, this may happen despite the unequal rates of substitutions simply as a result of differences in the frequencies of the splicing sites. Thus, if the substitution rates remain relatively stable into the future, no significant changes in the frequency of AG|G and other exonic splicing sites are anticipated.

Finally, we made estimates of changes in the frequencies of AG|G which occurred after the commencement of eutherian radiation in five mammalian species representing different orders. These estimates tentatively indicate that in all species except humans there were some losses of AG|G in comparison with the common ancestor. In the murine genes these losses were particularly pronounced. The human genes demonstrate the opposite trend. It seems likely that selection in favor of AG|G in human genes contributed to the process, while other splicing sites did not show noticeable growth. What was behind these selective forces is not yet clear. Certainly, the molecular evolution of splicing sites need not have only a selection-based explanation. Random events constantly promote neutral evolutionary changes, and finding the right balance between the two is still a significant challenge (Sella et al. 2009).