Introduction

BK virus (BKV) belongs to the Polyomaviridae family (Cole and Conzen 2001). Sixteen members of this family are now known. All have capsids of similar size, formed by three viral capsid proteins. All show a similar genomic organization (see Fig. 1), and many regions of their genomes are highly conserved among them, indicating that they are derived from a common ancestor. Furthermore, all have unique natural hosts. For example, humans are the natural hosts of BKV and JC virus (JCV), while Papio ursinus, the chacma baboon, is thought to be the natural host of simian agent 12 (SA12), a virus closely related to BKV (Cunningham and Pipas 1985). Thus, it is considered that the Polyomaviridae coevolved with their natural hosts.

Fig. 1

BKV genome. The circular double-stranded DNA of the DUN strain (Seif et al. 1979). The origin of replication (Ori), the noncoding regulatory region (CR), and the regions encoding six viral proteins (two early regulatory proteins [large T (LT), small t (ST)], a late regulatory protein [agnoprotein; Agno], and three capsid proteins [VP1, VP2, and VP3]) are shown. A 287-bp typing region (TR) (Takasaka et al. 2004) frequently used for typing BKV is also indicated. Nucleotide numbers are those of Seif et al. (1979).

BKV was first isolated in the urine of a renal transplant patient (Gardner et al. 1971). Seroepidemiological surveys conducted in various countries have since demonstrated that this virus is ubiquitous in humans (Knowles 2001). Infection most frequently occurs asymptomatically during childhood (Knowles 2001). After the primary infection, BKV persists in renal tissue (Heritage et al. 1981; Chesters et al. 1983). The renal BKV becomes reactivated particularly in immunocompromised patients, and progeny viruses are excreted in urine. In renal transplant patients, BKV causes renal dysfunction, such as BKV-associated nephropathy (de Bruyn and Limaye 2004).

BKV is the only primate polyomavirus that has subtypes distinguishable by immunological reactivity (Knowles 2001). Jin et al. (1993a) investigated VP1 (the major capsid protein) gene sequences which may be responsible for the antigenic variability among BKV isolates (see Fig. 1). In comparisons of VP1 among representative strains belonging to various subtypes, they found that although there is 95.8% similarity over the entire coding region of VP1, there is only 61–70% similarity between residues 61 and 83. Furthermore, they found that this region is stably conserved among isolates belonging to each subtype. Finally, they found that the region spanning amino acids (aa) 61–83 is highly hydrophilic. Based on these findings, this region was postulated to constitute the epitope responsible for serotypic differences between BKV isolates.

Jin et al. (1993b) proposed genomic typing based on the nt variations in a VP1 gene region (hereafter designated the epitope region) that encodes aa 61–83. Using this variable region, the distribution pattern of BKV subtypes has been studied in several countries including England, Tanzania, the United States, and Japan (Jin 1993; Jin et al. 1993b, 1995; Agostini et al. 1995; Di Taranto et al. 1997; Baksh et al. 2001; Takasaka et al. 2004). The results of these studies have indicated that (i) subtype I predominates in all geographical regions, (ii) subtype IV occurs at lower rates, and (iii) subtypes II and III rarely occur. In addition, subtype I has been subdivided into three subgroups called Ia, Ib, and Ic based on nt variations within the typing region (Takasaka et al. 2004).

Nevertheless, it remained to be elucidated whether the phylogeny of BKV is in parallel with its antigenic variation. Naturally, a substantial number of full-length sequences of BKV isolates belonging to various subtypes are required to establish the phylogenetic relationships among BKV subtypes. Thus far, three complete BKV DNA sequences derived from different individuals (Seif et al. 1979; Yang and Wu 1979; Tavis et al. 1989) and several sequences from each of three individuals (Chen et al. 2004) have been reported. Most of these sequences, however, belonged to subtype I, with a single exception, strain AS belonging to subtype III.

In this study, we made a phylogenetic comparison of 28 complete BKV coding sequences derived from different individuals, using the maximum likelihood (ML) method (Felsenstein 1981). These sequences were determined mainly in this study, and included 21, 2, and 5 sequences belonging to subtypes I, III, and IV, respectively (no sequences belonging to subtype II, a minor subtype, were included, as no subtype II isolates were available). First, we reconstructed phylogenetic trees for individual protein-encoding genes, as well as for concatenated genes with the ML method at the nt level, and demonstrated that the BKV isolates analyzed in this work can be classified into three phylogenetic groups, I, III, and IV, consistent with previous studies. Then the order of deep branchings among these groups was analyzed in detail at both the aa and the nt levels using a JCV strain (GH1) (Sugimoto et al. 2002) and SA12 (Cantalupo et al. 2005) as outgroups.

As described above, BKV is unique in that it has four serologically distinguishable subtypes. Stoner and Hübner (2001) speculated that these subtypes might have originated in archaic humans rather than modern humans. In contrast, subclades (Ia, Ib, and Ic) within subtype I are associated with human populations (Takasaka et al. 2004; unpublished observations), suggesting that subtype I subclades separated in correspondence to the division of modern humans after their out-of-Africa migration. Further insights into the relationships between BKV and humans should be obtained if it can be determined when the division among the BKV subtypes and that among the subtype I subclades occurred. Now that the complete nt sequences of many BKV strains belonging to three of the four subtypes of BKV and that of SA12, i.e., the virus most closely related to BKV among various primate polyomaviruses (Cunningham and Pipas 1985), are available, we attempted to elucidate the time scale of the division among the subtypes of BKV and the division among subclades within subtype I, assuming that BKV and SA12 split when the natural hosts of these viruses diverged, i.e., 30 MyrBP (Hasegawa et al. 2003).

Comparison of synonymous and nonsynonymous substitution rates in protein-encoding genes is important for studying the mechanism of DNA sequence evolution (Kimura 1983). A commonly used measure in such studies is the nonsynonymous/synonymous rate ratio (ω = dN/dS, where dN and dS are the numbers of nonsynonymous and synonymous substitutions per site, respectively), termed the acceptance rate by Miyata and Yasunaga (1980). An excess of nonsynonymous substitutions over synonymous ones has been regarded as an important indicator of positive selection at the molecular level (e.g., Yang 1998). Several studies have demonstrated that some viral genomes such as those of HIV and influenza virus are under continuous pressure of positive selection (e.g., Yamaguchi-Kabata and Gojobori 2000; Yang 2000; Yang et al. 2003). In this paper, we examine whether a similar selective pressure is operating on the BKV genome.

Materials and Methods

Urine Samples

We used crude BKV DNAs previously extracted from urine samples collected from renal transplant and bone marrow transplant patients (Takasaka et al. 2004).

DNA Analysis

Entire BKV DNAs were cloned into pUC19 at the unique BamHI site as described previously (Yogo et al. 1991). The complete BKV DNA clones were prepared using a QIAGEN Plasmid Maxi kit (QIAGEN GmbH, Hilden, Germany). Purified plasmids were used for a cycle sequencing reaction using the DYEnamic ET Terminator Cycle Sequencing Kit (Amersham Biosciences, Buckinghamshire, England). Forward and reverse primers used are listed in Table 1. Primers were added to a final concentration of 0.25 pmol/μl in a final reaction volume of 20 μl. The reaction profile was 25 cycles of 30 sec at 96°C, 15 sec at 50°C, and 60 sec at 60°C. The reaction was terminated at 4°C. The cycle sequencing products were purified with Centri-Sep columns (Princeton Separations, Adelphia, NJ, USA). DNA sequencing was performed using an automated sequencer (ABI PRISM 373S DNA sequencer, Applied Biosystems, Foster City, USA).

Table 1 Primers used for sequencing full-length BKV genomes

Phylogenetic Analysis

Sequences encoding viral proteins and spacer sequences were used in the phylogenetic analyses. Positions with gaps or ambiguous alignments plus overlapping regions with different reading frames were excluded. The total number of codons remaining was 1453. Prior to reconstructing ML trees from the nt sequence data, MODELTEST 3.06 (Posada and Crandall 1998) was applied in order to find the optimal model of nt substitution by using the Akaike (1974) information criterion (AIC), first introduced by Kishino and Hasegawa (1990) in the field of molecular evolution. AIC is defined by AIC = −2 × log-likelihood + 2 × (number of parameters), and the model that minimizes this criterion is considered to be the most appropriate. The optimal model thus selected was assumed throughout in using PAUP*4.0b10 (Swofford 2002) for reconstructing phylogenetic trees. Heuristic ML searches using tree bisection/reconnection (TBR) branch swapping (initial trees were obtained by the neighbor-joining [NJ] method) were performed in PAUP*. ML nodal support was estimated by using the nonparametric bootstrap (Felsenstein 1985) with 100 replications. Furthermore, the TOPALi program (Milne et al. 2004) was applied to the sequence data to test the possibility of recombination during the evolution of BKV.
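The AIC-based model choice described above can be sketched in a few lines of Python. This is only an illustration of the formula given in the text: the function names, the candidate model labels, and the log-likelihoods and parameter counts below are hypothetical placeholders, not values from this study or output from MODELTEST.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * log-likelihood + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def best_model(candidates: dict) -> str:
    """Return the name of the candidate with the smallest AIC.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical values for illustration only (not from the paper).
# Parameter counts cover the substitution model only, not branch lengths.
candidates = {
    "JC69": (-14500.0, 0),
    "HKY85+G": (-13900.0, 5),
    "GTR+G+I": (-13850.0, 10),
}
print(best_model(candidates))
```

The penalty term 2 × (number of parameters) means that a richer model is preferred only when its gain in log-likelihood outweighs its added parameters.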

After establishing the phylogenetic relationships within several groups of BKV, we examined the branching orders among the major groups of BKV with the codon-substitution (CS) model (Goldman and Yang 1994; Yang et al. 1998) with a JCV isolate (GH1) (Sugimoto et al. 2002) and SA12 (Cantalupo et al. 2005) as outgroups. This model is more realistic and much better in terms of AIC than the models implemented in PAUP*, which do not take account of the correlation among different codon positions (e.g., Sasaki et al. 2005). We used the CodeML program in PAML (version 3.14) (Yang 1997) for this analysis.

For the CS model in phylogenetic analyses, we used Miyata, Miyazawa, and Yasunaga’s (1979) aa distance with geometric formulae (Yang et al. 1998). We also used the BaseML program in the PAML package with the GTR (Rodriguez et al. 1990) + Γ model in analyzing nt sequences with and without partition among different codon positions. In using CodeML and BaseML, the discrete Γ distribution (with eight categories) for site heterogeneity (Yang 1996) was adopted, and the shape parameter (α) of the Γ model was optimized. Bootstrap probabilities (BPs) were estimated by the RELL (resampling of estimated log-likelihoods) method (Kishino et al. 1990) with 10,000 bootstrap resamplings. The RELL method has been shown to be efficient in estimating BPs without performing ML estimations for the resampled data (Hasegawa and Kishino 1994).
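The idea behind RELL can be sketched as follows, assuming that per-site log-likelihoods have already been computed for each candidate tree. The function name and the per-site values in the example are hypothetical; this is a minimal illustration of the resampling step, not the PAML implementation.

```python
import random


def rell_bootstrap(site_lnl, n_resamples=10000, seed=0):
    """Estimate bootstrap probabilities by RELL.

    `site_lnl` maps a tree name to its list of per-site log-likelihoods
    (all lists must have the same length). Sites are resampled with
    replacement and the resampled log-likelihoods are summed; no
    likelihood re-optimization is done on the resampled data.
    """
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    wins = {tree: 0 for tree in trees}
    for _ in range(n_resamples):
        # One bootstrap replicate: draw site indices with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / n_resamples for t in trees}


# Hypothetical per-site log-likelihoods for three candidate trees.
example = {
    "Tree1": [-2.0, -3.1, -1.5, -4.0, -2.2],
    "Tree2": [-2.1, -3.0, -1.6, -4.2, -2.1],
    "Tree3": [-2.2, -3.2, -1.4, -4.1, -2.3],
}
print(rell_bootstrap(example, n_resamples=1000))
```

Because each replicate only re-sums stored site log-likelihoods, 10,000 resamplings cost far less than repeating the ML estimation 10,000 times.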

Estimation of Branching Dates with a Relaxed Molecular Clock

From the sequences of BKV isolates and SA12 (Cantalupo et al. 2005), we estimated branching dates in the BKV tree. However, the molecular clock does not always hold, and in general, we must take account of the possible rate difference among lineages (a relaxed molecular clock) in estimating the dates. The Bayesian method of Thorne and colleagues (Thorne et al. 1998; Kishino et al. 2001) is useful for this purpose as shown by Hasegawa et al. (2003) and other authors. Here, we applied this method to BKV evolution. By using a JCV isolate (GH1) (Sugimoto et al. 2002) as an outgroup and by tentatively assuming that the BKV/SA12 separation corresponds to the separation of their hosts, that is, between humans and Old World monkeys (Knowles 2001; Valis et al. 1977), we calibrated the clock with 30 MyrBP for the human/Old World monkey separation (e.g., Hasegawa et al. 2003).

Analysis of Adaptive Evolution

An excess of nonsynonymous substitutions over synonymous ones has been regarded as an important indicator of adaptive evolution or positive selection at the molecular level (e.g., Yang 1998). A lineage or an aa site that underwent positive Darwinian selection may have a nonsynonymous/synonymous rate ratio (ω = dN/dS) that is different from those of other lineages or other aa sites, or > 1. We applied the codon-based likelihood models that allow for variable ω ratios among lineages (Yang and Bielawski 2000) and among sites (Nielsen and Yang 1998; Yang et al. 2005). All these analyses were carried out with the CodeML program in PAML (Yang 1997). The F61 model of codon frequencies was used. The overlapping region between VP1 and VP2 was excluded from the phylogenetic analysis because of the different reading frame but was used in the analysis of adaptive evolution.

Results

Phylogenetic Analysis

We sequenced 23 complete BKV DNA clones, including four established previously (Goudsmit et al. 1981; Pauw and Choufoer 1978; Chauhan et al. 1984; Sugimoto et al. 1989) and 19 established in this study (the origins of these clones are shown in Table 2). From each of these and five complete BKV sequences reported previously (the sequence of strain MM was not used because of a relatively large deletion [Yang and Wu 1979]) (Table 2), together with a JCV (GH1) sequence (Sugimoto et al. 2002) used as the outgroup, we extracted sequences encoding individual proteins. For each isolate, we linked these sequences together according to the genomic organization for the concatenated analysis.

Table 2 Twenty-eight BKV isolates analyzed in this study

Since the GTR + Γ + I model (Rodriguez et al. 1990; Yang 1996) turned out to be the best of the models implemented in MODELTEST (Posada and Crandall 1998), this model was used throughout in the ML analyses with PAUP*. ML phylogenies were estimated by PAUP* using not only concatenated nt sequences (Fig. 2A) but also individual nt sequences (only sequences with significant lengths and without overlapping, i.e., those of VP1, VP2, and LT, were analyzed) (Figs. 2B–D).

Fig. 2

ML trees relating 28 BKV isolates based on nt sequences (with the GTR + Γ + I model by PAUP*). Phylogenetic trees were reconstructed from concatenated nt sequences of 4959 sites excluding the control region (A) and individual viral gene sequences (B, VP1; C, VP2; and D, LT). A JCV isolate GH1 belonging to Type C (Sugimoto et al. 2002) was used as an outgroup. The horizontal length of each branch is proportional to the estimated number of nucleotide substitutions. The abbreviations for sequences are listed in Table 2. The numbers at nodes in the trees indicate the BPs (percentage) obtained with 100 replicates. Subtypes I, III, and IV and subgroups within subtype I (Ia, Ib-1, Ib-2, and Ic) are indicated.

In the tree based on concatenated nt sequences, the 28 BKV isolates were divided into three clades corresponding to subtypes I, III, and IV with 92, 78, and 87% BPs, respectively (Fig. 2A). The trees based on individual nt sequences (Figs. 2B–D) are all congruent with this division, although the bootstrap supports are not always high and this division is sometimes unclear due to multifurcations. In addition, with the concatenated as well as with individual nt sequences, splits in subtype I generated four clades designated Ia, Ib-1, Ib-2, and Ic. Clades Ia and Ic corresponded to subgroups Ia and Ic, respectively, previously identified on an NJ tree based on 287-bp typing sequences within the VP1 gene (Takasaka et al. 2004), while subgroup Ib (Takasaka et al. 2004) was separated into Ib-1 and Ib-2 in the current analysis. For the relationships within subgroups, there are some differences between the tree based on the concatenated sequences and those based on individual sequences. For example, TW8 and RYU2 represent the first and second earliest offshoots of subgroup Ic in the concatenated tree, while other isolates (THK9 and THK11) represent the earliest offshoot in the VP2 tree. However, the BP value for the latter is low (60%), and thus we could not detect any clear evidence of incongruence among the trees. Furthermore, when TOPALi was applied to the data, no evidence of recombination was detected (data not shown).

After establishing the relationships within each of subtypes I, III, and IV by the ML analysis with PAUP* (see above), we examined the relationships among the three subtypes (I, III, and IV) of BKV with a JCV isolate (GH1) and SA12 as outgroups and with the CS model (Goldman and Yang 1994; Yang et al. 1998) which is not implemented in PAUP* by using the CodeML program in PAML. As shown in Table 3, the GTR model with partition among three codon positions (Nuc123) greatly improved the fitting of the model to the data over the model without partition (Nuc) (AIC: 27,289.4 vs 28,838.8), but the CS model was much better in approximating protein-encoding sequences than the GTR model with partition which does not take account of the correlation among the different codon positions (AIC: 26,053.4 vs 27,289.4). Thus, the CS model is the best of the nt substitution models used in this work. The aa substitution model cannot be compared with the nt substitution models with AIC because of the difference of the data.
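The AIC comparison in this paragraph can be made explicit by expressing each model's AIC as a difference from the best model. The sketch below uses only the three AIC values quoted in the text; the function name `delta_aic` is ours.

```python
# AIC values quoted in the text for three nt substitution models.
reported_aic = {"Nuc": 28838.8, "Nuc123": 27289.4, "CS": 26053.4}


def delta_aic(aic_by_model):
    """Return each model's AIC minus the minimum AIC (0 for the best model)."""
    best = min(aic_by_model.values())
    return {m: round(v - best, 1) for m, v in aic_by_model.items()}


# Print models from best to worst.
for model, delta in sorted(delta_aic(reported_aic).items(), key=lambda kv: kv[1]):
    print(f"{model}: dAIC = {delta}")
```

On these figures, partitioning by codon position lowers the AIC by about 1549 units relative to the unpartitioned model, and the CS model lowers it by a further 1236 units.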

Table 3 Comparison of alternative relationships among BKV subtypes with the nucleotide sequences of VP1, VP2, LT, ST, and Agno

All the nt substitution models, including the CS model (the best of the models used), gave only ambiguous results concerning the deep branchings in the BKV tree (Table 3). The ML tree based on the AA and Nuc123 models was Tree 3 (I/III grouping), while that based on the Nuc and CS models was Tree 1 (I/IV grouping). However, the log-likelihood differences among alternative trees were smaller than their SEs, and therefore we could not resolve the basal branching order of BKV evolution. It seems that the branchings among subtypes I, III, and IV of BKV occurred almost simultaneously.

Time Estimate of BKV Evolution

Thorne and Kishino’s Bayesian method (Thorne et al. 1998; Kishino et al. 2001) was applied to the BKV and SA12 sequences with GH1 as an outgroup, assuming 30 MyrBP as the date for the BKV/SA12 separation (see Materials and Methods). Since we could not resolve the branching order among subtypes I, III, and IV, multifurcation among these three lineages was assumed in estimating dates of divergence. The time scale of BKV evolution estimated by this method is as shown in Fig. 3; the separation among subtypes I, III, and IV was estimated to occur 4.4 ± 0.7 MyrBP (± 1 SE), and the deepest node of subtype I to occur 1.2 ± 0.2 MyrBP.

Fig. 3

The time scale of BKV evolution estimated by Thorne and Kishino’s Bayesian method assuming the BKV/SA12 separation to be 30 MyrBP when the hosts of the two viruses separated.

Analysis for Detecting Adaptive Evolution

Since we could not find any BKV genes which have ω ratios >1 when averaged over the entire tree (data not shown), we applied the codon-based likelihood models that allow for variable ω = dN/dS ratios among lineages (Yang and Bielawski 2000) and among sites (Nielsen and Yang 1998; Yang et al. 2005). We applied the method to the sequences from the 28 isolates of BKV assuming the evolutionary relationship presented in Fig. 2A.

Table 4 lists branches in which the ratio ω is estimated to exceed one. The F61 model for the codon frequencies was applied. The one-ratio model assumes that ω is constant over all branches in the tree, whereas the free-ratio model allows different ω ratios for different branches and looks for branches which have ω ratios exceeding one. For the branches with ω > 1, two LRTs were carried out: (1) comparing the one-ratio model (one ratio ω0 across all branches) with the two-ratio model (one branch has ratio ω1 different from ω0 for other branches) (LRT-1) and (2) comparing the neutral model (two-ratio model with fixed ω1 = 1) with the positive selection model (two-ratio model with free ratio ω1) (LRT-2).

Table 4 Likelihood ratio statistics (2Δℓ) for testing hypotheses on selection along different lineages

As listed in Table 4, several branches (6, 6, 1, and 5 in the VP1, VP2, LT, and Agno trees) have ω ratios >1, and estimates of ω ratios of these branches are infinite when ω1 is assumed to be a free parameter, indicating the absence of synonymous substitutions along these branches. Among these branches, however, only KOM3 in VP1 and KOM6 in LT have significantly different ω ratios from other branches (p = 0.5% and 3%) by LRT-1. For the former case, four aa substitutions occurred in VP1 along the KOM3 branch, while no synonymous substitutions occurred. It should be noted that the variation in the ω ratio among lineages is not sufficient evidence for adaptive evolution. Besides positive selection, a relaxed selective constraint can also elevate the ω ratio, and in the case of neutral evolution, ω becomes 1. Therefore, it is necessary to test whether ω is significantly >1 in order to detect positive selection by using LRT-2. When LRT-2 was applied, the ω ratios of these two branches turned out not to be significantly different from 1. Thus, the two positive cases of LRT-1 may indicate relaxation of the selective constraint along the particular branches, and we could not find evidence for positive selection.

The epitope region (aa 61–83 of VP1) is known to be responsible for antigenic differences among subtypes (Jin et al. 1993a), and we therefore carried out a similar analysis also on this short region. Although no two-ratio model of the above type (one ratio for one lineage and another ratio for the background) was preferred significantly over the one-ratio model, a two-ratio model which assigns one ratio to the basal branches of subtypes I, III, and IV and another to the background was preferred significantly over the one-ratio model (2Δℓ = 4.05 > χ²(5%) = 3.84, p < 5%), but was not significantly preferred over the fixed two-ratio model (2Δℓ = 1.32 < χ²(5%)).
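The two comparisons above can be checked numerically. The sketch below assumes that each LRT has one degree of freedom, which is what the 3.84 critical value implies; the function names are ours. For one degree of freedom, the upper-tail probability of the χ² distribution can be written with the complementary error function.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


def lrt_is_significant(statistic: float, alpha: float = 0.05) -> bool:
    """True when 2*delta(log-likelihood) is significant at level alpha (1 df)."""
    return chi2_sf_df1(statistic) < alpha


# Statistics quoted in the text for the epitope region.
print(lrt_is_significant(4.05))  # two-ratio model vs one-ratio model
print(lrt_is_significant(1.32))  # free omega1 vs omega1 fixed at 1
```

The first comparison is only just significant, while the second is far from the threshold, which matches the conclusion that the elevated ω on the basal branches is not significantly greater than 1.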

In the analysis presented above, we assumed that all aa sites in a protein are under the same selective pressure with the same ω ratio. The analysis averages the ω ratio across all sites, and positive selection can be detected only if the average is > 1. This is a conservative test of positive selection, because positive selection, if it exists, might be operating only on a limited number of aa sites in a protein. Therefore, an alternative approach is needed that allows the ω ratio to vary among sites, as implemented by Nielsen and Yang (1998), Yang and Bielawski (2000), and Yang et al. (2005). The site models of these authors use a statistical distribution to describe the random variation in ω among sites. An LRT is conducted to compare a null model that does not allow ω > 1 in the distribution with an alternative model that does. Several such LRTs have been implemented, and we used two of them following the recommendation by Yang et al. (2005). The first involves the null model M1 (neutral), which assumes two site classes in proportions p0 and p1 = 1 − p0 with 0 < ω0 < 1 and ω1 = 1, and the alternative model M2 (selection), which adds a proportion p2 of sites with ω2 > 1 estimated from the data. The second LRT compares the null model M7 (β), which assumes a β distribution for ω (in the interval 0 < ω < 1), and the alternative model M8 (β & ω), which adds an extra class of sites with positive selection (ωs > 1). If the LRT is significant, positive selection is inferred. The Bayes empirical Bayes (BEB) approach is then used to calculate the posterior probability that each site is from a particular site class, and sites with high posterior probabilities (p > 95%) coming from the class with ω > 1 are inferred to be under positive selection. This approach makes it possible to detect positive selection and to identify sites under positive selection even if the average ω ratio over all sites is <1.
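A small numerical sketch may help to show why a site model can reveal positive selection even when the average ω is below 1. All numbers below are hypothetical, as are the function names. The posterior calculation is plain Bayes' rule with fixed parameter values, i.e., the naive empirical Bayes idea; BEB additionally integrates over uncertainty in the parameter estimates and is not reproduced here.

```python
def mean_omega(proportions, omegas):
    """Average omega over site classes, weighted by class proportions."""
    assert abs(sum(proportions) - 1.0) < 1e-9
    return sum(p * w for p, w in zip(proportions, omegas))


def class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class for one site.

    `site_likelihoods[k]` is the likelihood of the data at this site
    given that the site belongs to class k.
    """
    weighted = [p * lk for p, lk in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical M2-like setup: omega0 < 1, omega1 = 1, omega2 > 1.
proportions = [0.90, 0.07, 0.03]
omegas = [0.05, 1.0, 4.0]
print(mean_omega(proportions, omegas))

# Hypothetical likelihoods at one site under each of the three classes.
print(class_posteriors(proportions, [1e-6, 2e-5, 9e-4]))
```

With these placeholder values, the mean ω is well below 1 although 3% of sites have ω = 4, and the example site is assigned to the ω > 1 class with a posterior probability above 90%.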

Table 5 lists the aa sites inferred to be under positive selection with the BEB approach. Although we could not detect positive selection from the analysis of VP1 as a whole, the analysis of the epitope region identified two (or three) sites to be under positive selection with the M2 (or M8) model. Concerning the other proteins, only two sites in Agno (site 7, R; site 14, S) were identified to be under positive selection, and even these sites were identified with only marginal significance (p = 97.5% and 99.1% for site 7 with the M2 and M8 models, and p = 95.3% for site 14 with M8 but not significant with M2). On the other hand, the naïve empirical Bayes (NEB) approach inferred as many as 24 aa sites of VP2 to be under positive selection (data not shown). Since the NEB approach tends to give a high rate of false positives, the BEB approach is recommended (Yang et al. 2005). Although we carried out extensive analyses looking for positive selection in the evolution of BKV, we could not obtain any strong evidence for positive selection when a conservative approach was used.

Table 5 Likelihood ratio statistics (2Δℓ) for testing hypotheses on positively selected sites

Discussion

Molecular Evolution of BKV

BKV isolates are classified into four subtypes based on nt variations in a VP1 region responsible for antigenic diversity (Jin 1993; Jin et al. 1993b, 1995; Agostini et al. 1995; Di Taranto et al. 1997; Baksh et al. 2001; Takasaka et al. 2004). However, it remained to be elucidated to what extent antigenic variation is driving the evolution of BKV. In other words, the role of antigenic diversity in the evolution of BKV remained to be clarified. In this study, the phylogenetic trees were constructed from nt sequences of 28 BKV isolates belonging to subtype I, III, or IV (II was excluded as no isolate belonging to this subtype was available). On all ML trees based on individual viral genes as well as on a tree based on concatenated viral genes, BKV isolates diverged into three clades corresponding to subtypes previously classified by aa or nt variation in the epitope region (Jin et al. 1993b). Thus, our results as a whole indicated that nt substitutions over the entire genome contributed to the evolution of BKV.

Deep Branching of BKV

We carried out phylogenetic analyses of BKV genomes to gain insights into the deep branchings into BKV subtypes. More sophisticated likelihood models, including one taking account of codon structure, were used to elucidate the deep branchings of BKV subtypes. Despite all our efforts, the basal branching order of BKV evolution could not be resolved. Therefore, it appears that the branchings among subtypes I, III, and IV occurred almost simultaneously.

Time Scale for the Evolution of BKV

It is likely that JCV coevolved with modern humans (Yogo et al. 2004), but BKV may have diverged into subtypes before the emergence of modern humans (Stoner and Hübner 2001). Nevertheless, it was suggested that the subclades within subtype I coevolved with modern humans (Takasaka et al. 2004). If so, the origin of subtype I subclades (Ia, Ib, and Ic) dates back to 0.05–0.1 MyrBP. This estimate is more recent by an order of magnitude than the estimate (1.2 ± 0.2 MyrBP) based on the assumption that BKV and SA12 evolved with their hosts.

This discrepancy might be due to the possibly wrong assumption that the separation between BKV and SA12 occurred in correspondence to the separation between the host species of these viruses, i.e., humans and Old World monkeys. For example, if the separation between BKV and SA12 occurred 1 MyrBP, the separation among subtypes I, III, and IV dates to 0.15 ± 0.01 MyrBP, and the deepest node of subtype I dates to 0.04 ± 0.01 MyrBP. These estimates might be more congruent with the timing of the deepest branching of subtype I. However, the above assumption that the separation between BKV and SA12 occurred 1 MyrBP implies that it occurred when each of the natural hosts of these viruses (i.e., humans and Old World monkeys) had already evolved into distinct species. Therefore, we must hypothesize that host shift occurred in the course of the evolution of primate polyomaviruses, even though host shift is rarely observed at present in primate polyomaviruses.

Although the current analysis could not determine the absolute time when the separation among BKV subtypes and the separation among subtype I subclades occurred, it showed that the separation among subtypes dates back to an age about four times earlier than the separation among subtype I subclades. If, as noted above, it can be assumed that the separation among subtype I subclades occurred in correspondence to the out-of-Africa migration of modern humans, then the separation among BKV subtypes occurred significantly before the out-of-Africa migration, supporting the hypothesis that BKV subtypes originated in archaic humans (Stoner and Hübner 2001).
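The arithmetic behind the alternative time scale can be sketched as follows. The sketch assumes that node ages scale in proportion to the single calibration date, which is our simplification and not a description of the Bayesian method itself; the function name is ours. It uses only the point estimates quoted in the text.

```python
def rescale_date(estimate_myr, old_calibration_myr, new_calibration_myr):
    """Rescale a node age proportionally to a new calibration date."""
    return estimate_myr * new_calibration_myr / old_calibration_myr


# Node ages (MyrBP) reported under the 30-MyrBP calibration.
subtype_split = 4.4   # separation among subtypes I, III, and IV
subtype_i_root = 1.2  # deepest node of subtype I

# Ages if the BKV/SA12 separation is instead placed at 1 MyrBP.
print(round(rescale_date(subtype_split, 30.0, 1.0), 2))
print(round(rescale_date(subtype_i_root, 30.0, 1.0), 2))

# Relative depth of the two nodes, which does not depend on the calibration.
print(round(subtype_split / subtype_i_root, 1))
```

Under this proportional assumption the rescaled ages come out at 0.15 and 0.04 MyrBP, and the ratio of the two node ages stays at about 3.7 whichever calibration is used, which is the basis for the "about four times" statement.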

Adaptive Evolution

Although we carried out extensive analyses looking for positive selection during the evolution of BKV, we could not detect any strong evidence for it. Indeed, the aa replacements listed in Table 4 are mostly conservative and the physicochemical distances between the aa are generally not great. For example, the four aa replacements that occurred in VP1 along the KOM3 branch (N←→H, S←→T, E←→D, R←→K) are between chemically similar aa and therefore frequently observed during molecular evolution in general (McLachlan 1971; Grantham 1974; Dayhoff et al. 1978). If positive selection is operating during the evolution of a protein, it is expected that radical replacements of aa with respect to the difference of physicochemical properties are favored over conservative replacements (Hughes 1999; Pupko et al. 2003). The conservative features observed in BKV may be different from those expected of adaptive evolution. This seems to be consistent with the idea that slightly deleterious evolution (Ohta 1992) rather than positive Darwinian evolution is operating in the case of BKV. As shown in Table 4, the branches with an ω ratio significantly higher than the background ratio are all short terminal branches (KOM3 and KOM6). By analyzing mitochondrial genomes of Hominoidea, Hasegawa et al. (1998) found that the ω ratios are higher within species than between species and interpreted this finding as follows: slightly deleterious mutations, most of which should ultimately be eliminated from the population, might remain during a short time scale such as within species, and therefore the average ω ratio is higher over a short time scale than over a long time scale. Our observation is consistent with this expectation. If nonsynonymous mutations are slightly deleterious, they will have a higher probability of fixation in a small population than in a large one.

As described above, an analysis based on entire genes could not detect any strong evidence for positive selection, but an analysis of the epitope region identified a few sites potentially under positive selection (Table 5). There remained the problem of why these sites detected in the epitope region analysis were not detected by the analysis of the whole VP1. The aa at these sites differed among subtypes (Jin et al. 1993a), and therefore it is possible that these aa changes are involved in antigenic differentiation among subtypes. BKV with antigenic changes may have gained an advantage over wild-type BKVs without the antigenic changes, and thus may have been selected in hosts carrying antibodies against wild-type BKVs.