Abstract
The human polyomavirus BK virus (BKV) is ubiquitous in humans, infecting children asymptomatically. BKV is the only primate polyomavirus that has subtypes (I–IV) distinguishable by immunological reactivity. Nucleotide (nt) variations in a major capsid protein (VP1) gene region (designated the epitope region), probably responsible for antigenic diversity, have been used to classify BKV isolates into subtypes. Here, with all the protein-encoding gene sequences, we attempted to elucidate the evolutionary relationships among 28 BKV isolates belonging to subtypes I, III, and IV (no isolate belonging to subtype II, a minor one, was included). First, using the GTR + Γ + I model, maximum likelihood trees were reconstructed for individual viral genes as well as for concatenated viral genes. On the resultant trees, the 28 BKV isolates were consistently divided into three clades corresponding to subtypes I, III, and IV, although bootstrap probabilities were not always high. Then we used more sophisticated likelihood models, one of which takes account of codon structure, to elucidate the phylogenetic relationships among BKV subtypes, but the phylogeny of the deep branchings remained ambiguous. Furthermore, the possibility of positive selection in the evolution of BKV was examined using the nonsynonymous/synonymous rate ratio as a measure of selection. An analysis based on entire genes could not detect any strong evidence for positive selection, but that based on the epitope region identified a few sites potentially under positive selection (these sites were among those showing subtype-linked polymorphisms).
Introduction
BK virus (BKV) belongs to the Polyomaviridae family (Cole and Conzen 2001). Sixteen members of this family are now known. All have capsids of similar size, formed by three viral capsid proteins. All show a similar genomic organization (see Fig. 1), and many regions of their genomes are highly conserved, indicating that they are derived from a common ancestor. Furthermore, each has a unique natural host. For example, humans are the natural hosts of BKV and JC virus (JCV), while Papio ursinus, the chacma baboon, is thought to be the natural host of simian agent 12 (SA12), a virus closely related to BKV (Cunningham and Pipas 1985). Thus, it is considered that the Polyomaviridae coevolved with their hosts.
BKV was first isolated in the urine of a renal transplant patient (Gardner et al. 1971). Seroepidemiological surveys conducted in various countries have since demonstrated that this virus is ubiquitous in humans (Knowles 2001). Infection most frequently occurs asymptomatically during childhood (Knowles 2001). After the primary infection, BKV persists in renal tissue (Heritage et al. 1981; Chesters et al. 1983). The renal BKV becomes reactivated particularly in immunocompromised patients, and progeny viruses are excreted in urine. In renal transplant patients, BKV causes renal dysfunction, such as BKV-associated nephropathy (de Bruyn and Limaye 2004).
BKV is the only primate polyomavirus that has subtypes distinguishable by immunological reactivity (Knowles 2001). Jin et al. (1993a) investigated VP1 (the major capsid protein) gene sequences which may be responsible for the antigenic variability among BKV isolates (see Fig. 1). In comparisons of VP1 among representative strains belonging to various subtypes, they found that although there is 95.8% similarity over the entire coding region of VP1, there is only 61–70% similarity between residues 61 and 83. Furthermore, they found that this region is stably conserved among isolates belonging to each subtype. Finally, they found that the region spanning amino acids (aa) 61–83 is highly hydrophilic. Based on these findings, this region was postulated to constitute the epitope responsible for serotypic differences between BKV isolates.
Jin et al. (1993b) proposed genomic typing based on the nt variations in a VP1 gene region (hereafter designated the epitope region) that encodes aa 61–83. Using this variable region, the distribution pattern of BKV subtypes has been studied in several countries including England, Tanzania, the United States, and Japan (Jin 1993; Jin et al. 1993b, 1995; Agostini et al. 1995; Di Taranto et al. 1997; Baksh et al. 2001; Takasaka et al. 2004). The results of these studies have indicated that (i) subtype I predominates in all geographical regions, (ii) subtype IV occurs at lower rates, and (iii) subtypes II and III rarely occur. In addition, subtype I has been subdivided into three subgroups called Ia, Ib, and Ic based on nt variations within the typing region (Takasaka et al. 2004).
Nevertheless, it remained to be elucidated whether the phylogeny of BKV is in parallel with its antigenic variation. Naturally, a substantial number of full-length sequences of BKV isolates belonging to various subtypes are required to establish the phylogenetic relationships among BKV subtypes. Thus far, three complete BKV DNA sequences derived from different individuals (Seif et al. 1979; Yang and Wu 1979; Tavis et al. 1989) and several sequences from each of three individuals (Chen et al. 2004) have been reported. Most of these sequences, however, belonged to subtype I, with a single exception, strain AS belonging to subtype III.
In this study, we made a phylogenetic comparison of 28 complete BKV coding sequences derived from different individuals, using the maximum likelihood (ML) method (Felsenstein 1981). These sequences were determined mainly in this study, and included 21, 2, and 5 sequences belonging to subtypes I, III, and IV, respectively (no sequences belonging to subtype II, a minor subtype, were included, as no subtype II isolates were available). First, we reconstructed phylogenetic trees for individual protein-encoding genes, as well as for concatenated genes with the ML method at the nt level, and demonstrated that the BKV isolates analyzed in this work can be classified into three phylogenetic groups, I, III, and IV, consistent with previous studies. Then the order of deep branchings among these groups was analyzed in detail at both the aa and the nt levels using a JCV strain (GH1) (Sugimoto et al. 2002) and SA12 (Cantalupo et al. 2005) as outgroups.
As described above, BKV is unique in that it has four serologically distinguishable subtypes. Stoner and Hübner (2001) speculated that these subtypes might have originated in archaic humans rather than modern humans. In contrast, subclades (Ia, Ib, and Ic) within subtype I are associated with human populations (Takasaka et al. 2004; unpublished observations), suggesting that subtype I subclades separated in correspondence to the division of modern humans after their out-of-Africa migration. Further insights into the relationships between BKV and humans could be obtained if it were determined when the division among the BKV subtypes and that among the subtype I subclades occurred. Now that the complete nt sequences of many BKV strains belonging to three of the four subtypes of BKV and that of SA12, i.e., the virus most closely related to BKV among various primate polyomaviruses (Cunningham and Pipas 1985), are available, we attempted to elucidate the time scale of the division among the subtypes of BKV and the division among subclades within subtype I, assuming that BKV and SA12 split when the natural hosts of these viruses diverged, i.e., 30 MyrBP (Hasegawa et al. 2003).
Comparison of synonymous and nonsynonymous substitution rates in protein-encoding genes is important for studying the mechanism of DNA sequence evolution (Kimura 1983). A commonly used measure in such studies is the nonsynonymous/synonymous rate ratio (ω = dN/dS, where dN and dS are the numbers of nonsynonymous and synonymous substitutions per site, respectively), termed the acceptance rate by Miyata and Yasunaga (1980). An excess of nonsynonymous substitutions over synonymous ones has been regarded as an important indicator of positive selection at the molecular level (e.g., Yang 1998). Several studies have demonstrated that some viral genomes such as those of HIV and influenza virus are under continuous pressure of positive selection (e.g., Yamaguchi-Kabata and Gojobori 2000; Yang 2000; Yang et al. 2003). In this paper, we examine whether a similar selective pressure is operating on the BKV genome.
Materials and Methods
Urine Samples
We used crude BKV DNAs previously extracted from urine samples collected from renal transplant and bone marrow transplant patients (Takasaka et al. 2004).
DNA Analysis
Entire BKV DNAs were cloned into pUC19 at the unique BamHI site as described previously (Yogo et al. 1991). The complete BKV DNA clones were prepared using a QIAGEN Plasmid Maxi kit (QIAGEN GmbH, Hilden, Germany). Purified plasmids were used for a cycle sequencing reaction using the DYEnamic ET Terminator Cycle Sequencing Kit (Amersham Biosciences, Buckinghamshire, England). Forward and reverse primers used are listed in Table 1. Primers were added to a final concentration of 0.25 pmol/μl in a final reaction volume of 20 μl. The reaction profile was 25 cycles of 30 sec at 96°C, 15 sec at 50°C, and 60 sec at 60°C. The reaction was terminated at 4°C. The cycle sequencing products were purified with Centri-Sep columns (Princeton Separations, Adelphia, NJ, USA). DNA sequencing was performed using an automated sequencer (ABI PRISM 373S DNA sequencer, Applied Biosystems, Foster City, USA).
Phylogenetic Analysis
Sequences encoding viral proteins and spacer sequences were used in the phylogenetic analyses. Positions with gaps or ambiguous alignments plus overlapping regions with different reading frames were excluded. The total number of codons remaining is 1453. Prior to the ML tree’s reconstruction from the nt sequence data, MODELTEST 3.06 (Posada and Crandall 1998) was applied in order to find the optimal model of nt substitution by using the Akaike (1974) information criterion (AIC), first introduced by Kishino and Hasegawa (1990) in the field of molecular evolution. AIC is defined by AIC = −2 × log-likelihood + 2 × (number of parameters), and a model that minimizes this criterion is considered to be the most appropriate. The optimal model thus selected was assumed throughout in using PAUP*4.0b10 (Swofford 2002) for reconstructing phylogenetic trees. Heuristic ML searches using tree bisection/reconnection (TBR) branch swapping (initial trees were obtained by the neighbor-joining [NJ] method) were performed in PAUP*. ML nodal support was estimated by using the nonparametric bootstrap (Felsenstein 1985) with 100 replications. Furthermore, the TOPALi program (Milne et al. 2004) was applied to the sequence data to test the possibility of recombination during the evolution of BKV.
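The AIC-based model choice described above can be illustrated with a minimal sketch. The model names, log-likelihoods, and parameter counts below are hypothetical values chosen for illustration; they are not output from MODELTEST or from the BKV data.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * log-likelihood + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def best_model(fits: dict) -> str:
    """Return the name of the model with the smallest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits, for illustration only (not values from this study).
example_fits = {
    "JC69": (-14520.0, 0),
    "HKY85+G": (-14210.0, 5),
    "GTR+G+I": (-14190.0, 10),
}
example_scores = {name: aic(*fit) for name, fit in example_fits.items()}
example_choice = best_model(example_fits)
```

The penalty term means that a richer model is preferred only when its gain in log-likelihood exceeds the number of parameters it adds.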
After establishing the phylogenetic relationships within several groups of BKV, we examined the branching orders among the major groups of BKV with the codon-substitution (CS) model (Goldman and Yang 1994; Yang et al. 1998) with a JCV isolate (GH1) (Sugimoto et al. 2002) and SA12 (Cantalupo et al. 2005) as outgroups. This model is more realistic and much better in terms of AIC than the models implemented in PAUP*, which do not take account of the correlation among different codon positions (e.g., Sasaki et al. 2005). We used the CodeML program in PAML (version 3.14) (Yang 1997) for this analysis.
For the CS model in phylogenetic analyses, we used Miyata, Miyazawa, and Yasunaga’s (1979) aa distance with geometric formulae (Yang et al. 1998). We also used the BaseML program in the PAML package with the GTR (Rodriguez et al. 1990) + Γ model in analyzing nt sequences with and without partition among different codon positions. In using CodeML and BaseML, the discrete Γ distribution (with eight categories) for site heterogeneity (Yang 1996) was adopted, and the shape parameter (α) of the Γ model was optimized. Bootstrap probabilities (BPs) were estimated by the RELL (resampling of estimated log-likelihoods) method (Kishino et al. 1990) with 10,000 bootstrap resamplings. The RELL method has been shown to be efficient in estimating BPs without performing ML estimations for the resampled data (Hasegawa and Kishino 1994).
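The idea behind RELL can be sketched as follows. This is an illustrative outline, not the PAML implementation: it assumes that per-site log-likelihoods have already been computed once for each candidate tree, and the function name and input layout are our own.

```python
import random


def rell_bootstrap(site_lnl, n_resamples=10000, seed=0):
    """Estimate bootstrap probabilities of candidate trees by RELL.

    site_lnl[t][s] is the log-likelihood of site s under tree t, computed
    once with the ML parameter estimates. Sites are resampled with
    replacement and the per-site values are summed again; no likelihood
    re-optimization is done for the resampled data.
    Returns the fraction of resamples in which each tree scores highest
    (ties go to the tree listed first).
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_resamples):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[s] for s in sample) for tree in site_lnl]
        wins[totals.index(max(totals))] += 1
    return [w / n_resamples for w in wins]
```

Because only sums of stored values are recomputed, 10,000 resamplings are cheap compared with repeating the ML estimation for each resampled data set.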
Estimation of Branching Dates with a Relaxed Molecular Clock
From the sequences of BKV isolates and SA12 (Cantalupo et al. 2005), we estimated branching dates in the BKV tree. However, the molecular clock does not always hold, and in general, we must take account of the possible rate difference among lineages (a relaxed molecular clock) in estimating the dates. The Bayesian method of Thorne and colleagues (Thorne et al. 1998; Kishino et al. 2001) is useful for this purpose as shown by Hasegawa et al. (2003) and other authors. Here, we applied this method to BKV evolution. By using a JCV isolate (GH1) (Sugimoto et al. 2002) as an outgroup and by tentatively assuming that the BKV/SA12 separation corresponds to the separation of their hosts, that is, between humans and Old World monkeys (Knowles 2001; Valis et al. 1977), we calibrated the clock with 30 MyrBP for the human/Old World monkey separation (e.g., Hasegawa et al. 2003).
Analysis of Adaptive Evolution
An excess of nonsynonymous substitutions over synonymous ones has been regarded as an important indicator of adaptive evolution or positive selection at the molecular level (e.g., Yang 1998). A lineage or an aa site that underwent positive Darwinian selection may have a nonsynonymous/synonymous rate ratio (ω = dN/dS) that is different from those of other lineages or other aa sites, or > 1. We applied the codon-based likelihood models that allow for variable ω ratios among lineages (Yang and Bielawski 2000) and among sites (Nielsen and Yang 1998; Yang et al. 2005). All these analyses were carried out with the CodeML program in PAML (Yang 1997). The F61 model of codon frequencies was used. The overlapping region between VP1 and VP2 was excluded from the phylogenetic analysis because of the different reading frame but was used in the analysis of adaptive evolution.
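As a small illustration of how the ω ratio is read, the following sketch computes ω from given dN and dS values and attaches the conventional interpretation. The function names and labels are ours. The sketch classifies a point estimate only; it does not replace the likelihood ratio tests used in this study.

```python
import math


def omega(d_n: float, d_s: float) -> float:
    """Nonsynonymous/synonymous rate ratio, omega = dN / dS."""
    if d_s == 0.0:
        # No synonymous change: the ratio is undefined (0/0) or unbounded.
        return math.nan if d_n == 0.0 else math.inf
    return d_n / d_s


def interpret(w: float, tol: float = 1e-9) -> str:
    """Conventional reading of a point estimate of omega (no significance test)."""
    if math.isnan(w):
        return "undefined"
    if w > 1.0 + tol:
        return "excess of nonsynonymous change"
    if w < 1.0 - tol:
        return "purifying selection"
    return "neutral"
```

A branch with nonsynonymous but no synonymous substitutions yields an unbounded estimate, which is why a formal test of whether ω is significantly greater than 1 is needed.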
Results
Phylogenetic Analysis
We sequenced 23 complete BKV DNA clones, including four established previously (Goudsmit et al. 1981; Pauw and Choufoer 1978; Chauhan et al. 1984; Sugimoto et al. 1989) and 19 established in this study (the origins of these clones are shown in Table 2). From each of these and five complete BKV sequences reported previously (the sequence of strain MM was not used because of a relatively large deletion [Yang and Wu 1979]) (Table 2), together with a JCV (GH1) sequence (Sugimoto et al. 2002) used as the outgroup, we extracted sequences encoding individual proteins. For each isolate, we linked these sequences together according to the genomic organization for the concatenated analysis.
Since the GTR + Γ + I model (Rodriguez et al. 1990; Yang 1996) turned out to be the best of the models implemented in MODELTEST (Posada and Crandall 1998), this model was used throughout in the ML analyses with PAUP*. ML phylogenies were estimated by PAUP* using not only concatenated nt sequences (Fig. 2A) but also individual nt sequences (only sequences with significant lengths and without overlapping, i.e., those of VP1, VP2, and LT, were analyzed) (Figs. 2B–D).
In the tree based on concatenated nt sequences, the 28 BKV isolates were divided into three clades corresponding to subtypes I, III, and IV with 92, 78, and 87% BPs, respectively (Fig. 2A). The trees based on individual nt sequences (Figs. 2B–D) are all congruent with this division, although the bootstrap supports are not always high and this division is sometimes unclear due to multifurcations. In addition, with the concatenated as well as with individual nt sequences, splits in subtype I generated four clades designated Ia, Ib-1, Ib-2, and Ic. Clades Ia and Ic corresponded to subgroups Ia and Ic, respectively, previously identified on an NJ tree based on 287-bp typing sequences within the VP1 gene (Takasaka et al. 2004), while subgroup Ib (Takasaka et al. 2004) was separated into Ib-1 and Ib-2 in the current analysis. For the relationships within subgroups, there are some differences among the trees based on the concatenated sequences and individual sequences. For example, TW8 and RYU2 represent the first and second earliest offshoots of subtype Ic in the concatenated tree, while other isolates (THK9 and THK11) represent the earliest offshoot in the VP2 tree. However, the BP value for the latter is low (60%), and thus we could not detect any clear evidence of incongruence among the trees. Furthermore, when TOPALi was applied to the data, no evidence of recombination was detected (data not shown).
After establishing the relationships within each of subtypes I, III, and IV by the ML analysis with PAUP* (see above), we examined the relationships among the three subtypes (I, III, and IV) of BKV with a JCV isolate (GH1) and SA12 as outgroups and with the CS model (Goldman and Yang 1994; Yang et al. 1998) which is not implemented in PAUP* by using the CodeML program in PAML. As shown in Table 3, the GTR model with partition among three codon positions (Nuc123) greatly improved the fitting of the model to the data over the model without partition (Nuc) (AIC: 27,289.4 vs 28,838.8), but the CS model was much better in approximating protein-encoding sequences than the GTR model with partition which does not take account of the correlation among the different codon positions (AIC: 26,053.4 vs 27,289.4). Thus, the CS model is the best of the nt substitution models used in this work. The aa substitution model cannot be compared with the nt substitution models with AIC because of the difference of the data.
All the nt substitution models, including the CS model (the best of the models used), gave only ambiguous results concerning the deep branchings in the BKV tree (Table 3). The ML tree based on the AA and Nuc123 models was Tree 3 (I/III grouping), while that based on the Nuc and CS models was Tree 1 (I/IV grouping). However, the log-likelihood differences among alternative trees were smaller than their SEs, and therefore we could not resolve the basal branching order of BKV evolution. It seems that the branchings among subtypes I, III, and IV of BKV occurred almost simultaneously.
Time Estimate of BKV Evolution
Thorne and Kishino’s Bayesian method (Thorne et al. 1998; Kishino et al. 2001) was applied to the BKV and SA12 sequences with GH1 as an outgroup, assuming 30 MyrBP as the date for the BKV/SA12 separation (see Materials and Methods). Since we could not resolve the branching order among subtypes I, III, and IV, multifurcation among these three lineages was assumed in estimating dates of divergence. The time scale of BKV evolution estimated by this method is as shown in Fig. 3; the separation among subtypes I, III, and IV was estimated to occur 4.4 ± 0.7 MyrBP (± 1 SE), and the deepest node of subtype I to occur 1.2 ± 0.2 MyrBP.
Analysis for Detecting Adaptive Evolution
Since we could not find any BKV genes which have ω ratios >1 when averaged over the entire tree (data not shown), we applied the codon-based likelihood models that allow for variable ω = dN/dS ratios among lineages (Yang and Bielawski 2000) and among sites (Nielsen and Yang 1998; Yang et al. 2005). We applied the method to the sequences from the 28 isolates of BKV assuming the evolutionary relationship presented in Fig. 2A.
Table 4 lists branches in which the ratio ω is estimated to exceed one. The F61 model for the codon frequencies was applied. The one-ratio model assumes that ω is constant over all branches in the tree, whereas the free ratio model allows different ω ratios for different branches and looks for branches which have ω ratios exceeding one. For the branches with ω > 1, two LRTs were carried out (1) comparing the one-ratio model (one ratio ω0 across all branches) with the two-ratio model (one branch has ratio ω1 different from ω0 for other branches) (LRT-1) and (2) comparing the neutral model (two-ratio model with fixed ω1 = 1) with the positive selection model (two ratio model with free ratio ω1) (LRT-2).
As listed in Table 4, several branches (6, 6, 1, and 5 in the VP1, VP2, LT, and Agno trees) have ω ratios >1, and estimates of ω ratios of these branches are infinite when ω1 is assumed to be a free parameter, indicating the absence of synonymous substitutions along these branches. Among these branches, however, only KOM3 in VP1 and KOM6 in LT have significantly different ω ratios from other branches (p = 0.5% and 3%) by LRT-1. For the former case, four aa substitutions occurred in VP1 along the KOM3 branch, while no synonymous substitutions occurred. It should be noted that the variation in the ω ratio among lineages is not sufficient evidence for adaptive evolution. Besides positive selection, a relaxed selective constraint can also elevate the ω ratio, and in the case of neutral evolution, ω becomes 1. Therefore, it is necessary to test whether ω is significantly >1 in order to detect positive selection by using LRT-2. When LRT-2 was applied, the ω ratios of these two branches turned out not to be significantly different from 1. Thus, the two positive cases of LRT-1 may indicate relaxation of the selective constraint along the particular branches, and we could not find evidence for positive selection.
The epitope region (aa 61–83 of VP1) is known to be responsible for antigenic differences among subtypes (Jin et al. 1993a), and we therefore carried out a similar analysis on this short region as well. Although no two-ratio model of the above type (one ratio for one lineage and another ratio for the background) was preferred significantly over the one-ratio model, a two-ratio model which assigns one ratio to the basal branches of subtypes I, III, and IV and another to the background was preferred significantly over the one-ratio model (2Δℓ = 4.05 > χ²(5%) = 3.84, p < 5%), but was not significantly preferred over the two-ratio model with ω1 fixed at 1 (2Δℓ = 1.32 < χ²(5%)).
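The two comparisons above can be checked with a short sketch. It assumes the usual asymptotic approximation in which 2Δℓ follows a χ² distribution with one degree of freedom, as implied by the critical value of 3.84 quoted in the text; the function names are ours.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_1df(two_delta_lnl: float, alpha: float = 0.05):
    """Return (p_value, reject_null) for a nested-model test with 1 df."""
    p = chi2_sf_1df(two_delta_lnl)
    return p, p < alpha


# Test statistics quoted in the text for the epitope region.
p_lrt1, sig_lrt1 = lrt_1df(4.05)  # two-ratio model vs. one-ratio model
p_lrt2, sig_lrt2 = lrt_1df(1.32)  # free omega1 vs. omega1 fixed at 1
```

Under this approximation the first comparison falls just below the 5% level and the second does not reach it, in agreement with the conclusions stated above.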
In the analysis presented above, we assumed that all aa sites in a protein are under the same selective pressure with the same ω ratio. The analysis averages the ω ratio across all sites, and positive selection can be detected only if the average is > 1. This is a conservative test of positive selection, because positive selection, if it exists, might be operating on only a limited number of aa sites in a protein. Therefore, an alternative approach is needed that allows the ω ratio to vary among sites, as implemented by Nielsen and Yang (1998), Yang and Bielawski (2000), and Yang et al. (2005). The site models of these authors use a statistical distribution to describe the random variation in ω among sites. An LRT is conducted to compare a null model that does not allow ω > 1 in the distribution with an alternative model that does. Several LRTs have been implemented, and we used two of them following the recommendation by Yang et al. (2005). The first involves the null model M1 (neutral), which assumes two site classes in proportions p0 and p1 = 1 − p0 with 0 < ω0 < 1 and ω1 = 1, and the alternative model M2 (selection), which adds a proportion p2 of sites with ω2 > 1 estimated from the data. The second LRT compares the null model M7 (β), which assumes a β distribution for ω (in the interval 0 < ω < 1), and the alternative model M8 (β & ω), which adds an extra class of sites with positive selection (ωs > 1). If the LRT is significant, positive selection is inferred. The Bayes empirical Bayes (BEB) approach is then used to calculate the posterior probability that each site is from a particular site class, and sites with high posterior probabilities (p > 95%) coming from the class with ω > 1 are inferred to be under positive selection. This approach makes it possible to detect positive selection and to identify sites under positive selection even if the average ω ratio over all sites is < 1.
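The posterior calculation at the core of this site-level inference can be sketched as follows. This is a simplified illustration in the naive empirical Bayes form, in which the class proportions and ω values are treated as known; the BEB approach additionally accounts for uncertainty in those parameter estimates, which this sketch does not attempt. The class proportions and site likelihoods below are hypothetical.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each site class for one codon site.

    proportions[k]      : prior proportion p_k of class k (sums to 1)
    site_likelihoods[k] : likelihood of the site's data given omega_k
    Bayes' rule: posterior_k = p_k * L_k / sum_j (p_j * L_j)
    """
    weighted = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical M2-style values: classes with omega0 < 1, omega1 = 1, omega2 > 1.
props = [0.90, 0.08, 0.02]
liks = [1.0e-6, 5.0e-6, 4.0e-3]
post = site_class_posteriors(props, liks)
flagged = post[2] > 0.95  # 95% posterior cutoff, as in the text
```

A site is flagged only when the data at that site are far more probable under the ω > 1 class than under the other classes, because the prior proportion of that class is typically small.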
Table 5 lists the aa sites inferred to be under positive selection with the BEB approach. Although we could not detect positive selection from the analysis of VP1 as a whole, the analysis of the epitope region identified two (or three) sites as being under positive selection with the M2 (or M8) model. Concerning the other proteins, only two sites in Agno (site 7, R; site 14, S) were identified as being under positive selection, and even these sites were identified with only marginal significance (p = 97.5% and 99.1% for site 7 with the M2 and M8 models, and p = 95.3% for site 14 with M8 but not significant with M2). On the other hand, the naïve empirical Bayes (NEB) approach inferred as many as 24 aa sites of VP2 to be under positive selection (data not shown). Since the NEB approach tends to give a high rate of false positives, the BEB approach is recommended (Yang et al. 2005). Although we carried out extensive analyses looking for positive selection in the evolution of BKV, we could not obtain any strong evidence for positive selection when a conservative approach was used.
Discussion
Molecular Evolution of BKV
BKV isolates are classified into four subtypes based on nt variations in a VP1 region responsible for antigenic diversity (Jin 1993; Jin et al. 1993b, 1995; Agostini et al. 1995; Di Taranto et al. 1997; Baksh et al. 2001; Takasaka et al. 2004). However, it remained to be elucidated to what extent antigenic variation is driving the evolution of BKV. In other words, the role of antigenic diversity in the evolution of BKV remained to be clarified. In this study, the phylogenetic trees were constructed from nt sequences of 28 BKV isolates belonging to subtype I, III, or IV (II was excluded as no isolate belonging to this subtype was available). On all ML trees based on individual viral genes as well as on a tree based on concatenated viral genes, BKV isolates diverged into three clades corresponding to subtypes previously classified by aa or nt variation in the epitope region (Jin et al. 1993b). Thus, our results as a whole indicated that nt substitutions over the entire genome contributed to the evolution of BKV.
Deep Branching of BKV
We carried out phylogenetic analyses of BKV genomes to gain insights into the deep branchings into BKV subtypes. More sophisticated likelihood models, including one taking account of codon structure, were used to elucidate the deep branchings of BKV subtypes. Despite all our efforts, the basal branching order of BKV evolution could not be resolved. Therefore, it appears that the branchings among subtypes I, III, and IV occurred almost simultaneously.
Time Scale for the Evolution of BKV
It is likely that JCV coevolved with modern humans (Yogo et al. 2004), but BKV may have diverged into subtypes before the emergence of modern humans (Stoner and Hübner 2001). Nevertheless, it was suggested that the subclades within subtype I coevolved with modern humans (Takasaka et al. 2004). If so, the origin of subtype I subclades (Ia, Ib, and Ic) would date back to 0.05–0.1 MyrBP. This estimate is more recent by an order of magnitude than the estimate (1.2 ± 0.2 MyrBP) based on the assumption that BKV and SA12 evolved with their hosts.
This discrepancy might be due to the possibly wrong assumption that the separation between BKV and SA12 occurred in correspondence to the separation between the host species of these viruses, i.e., humans and Old World monkeys. For example, if the separation between BKV and SA12 occurred 1 MyrBP, the separation among subtypes I, III, and IV dates to 0.15 ± 0.01 MyrBP, and the deepest node of subtype I dates to 0.04 ± 0.01 MyrBP. These estimates might be more congruent with the timing of the deepest branching of subtype I. However, the above assumption that the separation between BKV and SA12 occurred 1 MyrBP implies that it occurred when each of the natural hosts of these viruses (i.e., humans and Old World monkeys) had already evolved into distinct species. Therefore, we must hypothesize that host shift occurred in the course of the evolution of primate polyomaviruses, even though host shift is rarely observed at present in primate polyomaviruses.
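The effect of changing the calibration date can be approximated with simple arithmetic. The sketch below assumes that, with a single calibration point, node ages scale in proportion to the calibration date; this is our simplification, not a rerun of the Bayesian analysis, and it is applied to the point estimates only (the standard errors are not rescaled here).

```python
def rescale_age(age_myr: float, old_calibration_myr: float, new_calibration_myr: float) -> float:
    """Rescale a node age proportionally when the single calibration date changes."""
    return age_myr * new_calibration_myr / old_calibration_myr


# Point estimates from the text, obtained with the 30 MyrBP calibration.
subtype_split = rescale_age(4.4, 30.0, 1.0)   # separation among subtypes I, III, and IV
subtype_i_root = rescale_age(1.2, 30.0, 1.0)  # deepest node of subtype I
```

Rounded to two decimal places, these values correspond to the 0.15 and 0.04 MyrBP estimates quoted above for a BKV/SA12 separation at 1 MyrBP.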
Although the current analysis could not determine the absolute time when the separation among BKV subtypes and the separation among subtype I subclades occurred, it showed that the separation among subtypes dates back to an age about four times earlier than the separation among subtype I subclades. If, as noted above, it can be assumed that the separation among subtype I subclades occurred in correspondence to the out-of-Africa migration of modern humans, then the separation among BKV subtypes occurred significantly before the out-of-Africa migration, supporting the hypothesis that BKV subtypes originated in archaic humans (Stoner and Hübner 2001).
Adaptive Evolution
Although we carried out extensive analyses looking for positive selection during the evolution of BKV, we could not detect any strong evidence for it. Indeed, the aa replacements listed in Table 4 are mostly conservative and the physicochemical distances between the aa are generally not great. For example, the four aa replacements that occurred in VP1 along the KOM3 branch (N←→H, S←→T, E←→D, R←→K) are between chemically similar aa and therefore frequently observed during molecular evolution in general (McLachlan 1971; Grantham 1974; Dayhoff et al. 1978). If positive selection is operating during the evolution of a protein, it is expected that radical replacements of aa with respect to the difference of physicochemical properties are favored over conservative replacements (Hughes 1999; Pupko et al. 2003). The conservative features observed in BKV may be different from those expected of adaptive evolution. This seems to be consistent with the idea that slightly deleterious evolution (Ohta 1992) rather than positive Darwinian evolution is operating in the case of BKV. As shown in Table 4, the branches with an ω ratio significantly higher than the background ratio are all short terminal branches (KOM3 and KOM6). By analyzing mitochondrial genomes of Hominoidea, Hasegawa et al. (1998) found that the ω ratios are higher within species than between species and interpreted this finding as follows: slightly deleterious mutations, most of which should ultimately be eliminated from the population, might remain during a short time scale such as within species, and therefore the average ω ratio is higher over a short time scale than over a long time scale. Our observation is consistent with this expectation. If nonsynonymous mutations are slightly deleterious, they will have a higher probability of fixation in a small population than in a large one.
As described above, an analysis based on entire genes could not detect any strong evidence for positive selection, but an analysis of the epitope region identified a few sites potentially under positive selection (Table 5). There remained the problem of why these sites detected in the epitope region analysis were not detected by the analysis of the whole VP1. The aa at these sites differed among subtypes (Jin et al. 1993a), and therefore it is possible that these aa changes are involved in antigenic differentiation among subtypes. BKV with antigenic changes may have gained an advantage over wild-type BKVs without the antigenic changes, and thus may have been selected in hosts carrying antibodies against wild-type BKVs.
References
Agostini HT, Brubaker GR, Shao J, Levin A, Ryschkewitsch CF, Blattner WA, Stoner GL (1995) BK virus and a new type of JC virus excreted by HIV-1 positive patients in rural Tanzania. Arch Virol 140:1919–1934
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Aut Cont 19:716–723
Baksh FK, Finkelstein SD, Swalsky PA, Stoner GL, Ryschkewitsch CF, Randhawa P (2001) Molecular genotyping of BK and JC viruses in human polyomavirus-associated interstitial nephritis after renal transplantation. Am J Kidney Dis 38:354–365
Cantalupo P, Doering A, Sullivan CS, Pal A, Peden KWC, Lewis AM, Pipas JM (2005) Complete nucleotide sequence of polyomavirus SA12. J Virol 79:13094–13104
Chauhan S, Lecatsas G, Harley EH (1984) Genome analysis of BK (WW) viral DNA cloned directly from human urine. Intervirology 22:170–176
Chen Y, Sharp PM, Fowkes M, Kocher O, Joseph JT, Koralnik IJ (2004) Analysis of 15 novel full-length BK virus sequences from three individuals: evidence of a high intra-strain genetic diversity. J Gen Virol 85:2651–2663
Chesters PM, Heritage J, McCance DJ (1983) Persistence of DNA sequences of BK virus and JC virus in normal human tissues and in diseased tissues. J Infect Dis 147:676–684
Cole CN, Conzen SD (2001) Polyomaviridae: the viruses and their replication. In: Knipe DM, Howley PM (eds) Fields virology, 4th ed. Lippincott Williams & Wilkins, Philadelphia, pp 2141–2173
Coleman DV, Wolfendale MR, Daniel RA, Dhanjal NK, Gardner SD, Gibson PE, Field AM (1980) A prospective study of human polyomavirus infection in pregnancy. J Infect Dis 142:1–8
Cunningham TP, Pipas JM (1985) Simian agent 12 is a BK virus-like papovavirus which replicates in monkey cells. J Virol 54:483–492
Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure, Vol 5, Suppl 3. National Biomedical Research Foundation, Washington DC, pp 345–352
de Bruyn G, Limaye AP (2004) BK virus-associated nephropathy in kidney transplant recipients. Rev Med Virol 14:193–205
Di Taranto C, Pietropaolo V, Orsi GB, Jin L, Sinibaldi L, Degener AM (1997) Detection of BK polyomavirus genotypes in healthy and HIV-positive children. Eur J Epidemiol 13:653–657
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
Gardner SD, Field AM, Coleman DV, Hulme B (1971) New human papovavirus (B.K.) isolated from urine after renal transplantation. Lancet 1:1253–1257
Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736
Goudsmit J, Baak ML, Sleterus KW, Van der Noordaa J (1981) Human papovavirus isolated from urine of a child with acute tonsillitis. Br Med J (Clin Res Ed) 283:1363–1364
Grantham R (1974) Amino acid differences formula to help explain protein evolution. Science 185:862–864
Hasegawa M, Kishino H (1994) Accuracies of the simple methods for estimating the bootstrap probability of a maximum likelihood tree. Mol Biol Evol 11:142–145
Hasegawa M, Cao Y, Yang Z (1998) Preponderance of slightly deleterious polymorphism in mitochondrial DNA: nonsynonymous/synonymous rate ratio is much higher within species than between species. Mol Biol Evol 15:1499–1505
Hasegawa M, Thorne JL, Kishino H (2003) Time scale of eutherian evolution estimated without assuming a constant rate of molecular evolution. Genes Genet Syst 78:267–283
Heritage J, Chesters PM, McCance DJ (1981) The persistence of papovavirus BK DNA sequences in normal human renal tissue. J Med Virol 8:143–150
Howley PM, Khoury G, Byrne JC, Takemoto KK, Martin MA (1975) Physical map of the BK virus genome. J Virol 16:959–973
Hughes A (1999) Adaptive evolution of genes and genomes. Oxford University Press, New York
Jin L (1993) Rapid genomic typing of BK virus directly from clinical specimens. Mol Cell Probes 7:331–334
Jin L, Gibson PE, Knowles WA, Clewley JP (1993a) BK virus antigenic variants: sequence analysis within the capsid VP1 epitope. J Med Virol 39:50–56
Jin L, Gibson PE, Booth JC, Clewley JP (1993b) Genomic typing of BK virus in clinical specimens by direct sequencing of polymerase chain reaction products. J Med Virol 41:11–17
Jin L, Pietropaolo V, Booth JC, Ward KH, Brown DW (1995) Prevalence and distribution of BK virus subtypes in healthy people and immunocompromised patients detected by PCR-restriction enzyme analysis. Clin Diag Virol 3:285–295
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. J Mol Evol 29:170–179
Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Methods Enzymol 183:550–570
Kishino H, Miyata T, Hasegawa M (1990) Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol 30:151–160
Kishino H, Thorne JL, Bruno WJ (2001) Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol Biol Evol 18:352–361
Knowles WA (2001) The epidemiology of BK virus and the occurrence of antigenic and genomic subtypes. In: Khalili K, Stoner GL (eds) Human polyomaviruses: Molecular and clinical perspectives. John Wiley & Sons, New York, pp 527–559
McLachlan AD (1971) Tests for comparing related amino-acid sequences. Cytochrome c and cytochrome c551. J Mol Biol 61:409–424
Milne I, Wright F, Rowe G, Marshall DF, Husmeier D, McGuire G (2004) TOPALi: software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics 20:1806–1807
Miyata T, Yasunaga T (1980) Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its applications. J Mol Evol 16:23–36
Miyata T, Miyazawa S, Yasunaga T (1979) Two types of amino acid substitutions in protein evolution. J Mol Evol 12:219–236
Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936
Ohta T (1992) The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst 23:263–286
Pauw W, Choufoer J (1978) Isolation of a variant of BK virus with altered restriction endonuclease pattern. Arch Virol 57:35–42
Posada D, Crandall KA (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818
Pupko T, Sharan R, Hasegawa M, Shamir R, Graur D (2003) Detecting excess radical replacements in phylogenetic trees. Gene 319:127–135
Rodriguez F, Oliver JL, Marin A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142:485–501
Sasaki T, Nikaido M, Hamilton H, Goto M, Kato H, Kanda N, Pastene LA, Cao Y, Fordyce RE, Hasegawa M, Okada N (2005) Mitochondrial phylogenetics and evolution of Mysticete whales. Syst Biol 54:77–90
Seif I, Khoury G, Dhar R (1979) The genome of human papovavirus BKV. Cell 18:963–977
Stoner GL, Hübner R (2001) The human polyomaviruses: past, present, and future. In: Khalili K, Stoner GL (eds) Human polyomaviruses: Molecular and clinical perspectives. John Wiley & Sons, New York, pp 127–148
Sugimoto C, Hara K, Taguchi F, Yogo Y (1989) Growth efficiency of naturally occurring BK virus variants in vivo and in vitro. J Virol 63:3195–3199
Sugimoto C, Hasegawa M, Kato A, Zheng HY, Ebihara H, Taguchi F, Kitamura T, Yogo Y (2002) Evolution of human polyomavirus JC: implications for the population history of humans. J Mol Evol 54:285–297
Swofford DL (2003) PAUP*: Phylogenetic Analysis Using Parsimony (*and other methods), version 4.0b10. Sinauer Associates, Sunderland, MA
Takasaka T, Goya N, Tokumoto T, Tanabe K, Toma H, Ogawa Y, Hokama S, Momose A, Funyu T, Fujioka T, Omori S, Akiyama H, Chen Q, Zheng HY, Ohta N, Kitamura T, Yogo Y (2004) Subtypes of BK virus prevalent in Japan and variation in their transcriptional control region. J Gen Virol 85:2821–2827
Tavis JE, Walker DL, Gardner SD, Frisque RJ (1989) Nucleotide sequence of the human polyomavirus AS virus, an antigenic variant of BK virus. J Virol 63:901–911
Thorne JL, Kishino H, Painter IS (1998) Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol 15:1647–1657
Valis JD, Newell N, Reissig M, Malherbe H, Kaschula VR, Shah KV (1977) Characterization of SA12 as a simian virus 40-related papovavirus of chacma baboons. Infect Immun 18:247–252
Yamaguchi-Kabata Y, Gojobori T (2000) Reevaluation of amino acid variability of the human immunodeficiency virus type 1 gp120 envelope glycoprotein and prediction of new discontinuous epitopes. J Virol 74:4335–4350
Yang RC, Wu R (1979) BK virus DNA: complete nucleotide sequence of a human tumor virus. Science 206:456–462
Yang W, Bielawski JP, Yang Z (2003) Widespread adaptive evolution in the human immunodeficiency virus type 1 genome. J Mol Evol 57:212–221
Yang Z (1996) Among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372
Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556
Yang Z (1998) Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol 15:568–573
Yang Z (2000) Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol 51:423–432
Yang Z, Bielawski JP (2000) Statistical methods for detecting molecular adaptation. TREE 15:496–503
Yang Z, Nielsen R, Hasegawa M (1998) Models of amino acid substitution and applications to mitochondrial protein evolution. Mol Biol Evol 15:1600–1611
Yang Z, Wong WSW, Nielsen R (2005) Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol 22:1107–1118
Yogo Y, Iida T, Taguchi F, Kitamura T, Aso Y (1991) Typing of human polyomavirus JC virus on the basis of restriction fragment length polymorphisms. J Clin Microbiol 29:2130–2138
Yogo Y, Sugimoto C, Zheng HY, Ikegaya H, Takasaka T, Kitamura T (2004) JC virus genotyping offers a new paradigm in the study of human populations. Rev Med Virol 14:179–191
Acknowledgments
We are grateful to all urine donors. We thank Drs. N. Goya, Y. Ogawa, A. Momose, T. Fujioka, and H. Akiyama for providing urine samples and Drs. J. van der Noordaa and R. Rubinstein for providing BKV DNA clones. This study was supported in part by grants from the Ministry of Health, Labor and Welfare, Japan, from the Japan Society for the Promotion of Science, and from TRIC, Research Organization of Information and Systems.
Additional information
The authors Yuriko Nishimoto and Tomokazu Takasaka contributed equally to this article.
[Reviewing Editor: Dr. Rasmus Nielsen]
Cite this article
Nishimoto, Y., Takasaka, T., Hasegawa, M. et al. Evolution of BK Virus Based on Complete Genome Data. J Mol Evol 63, 341–352 (2006). https://doi.org/10.1007/s00239-005-0092-5
DOI: https://doi.org/10.1007/s00239-005-0092-5