INTRODUCTION

Tick-borne encephalitis virus (TBEV) is usually divided into three subtypes: Far Eastern (TBEV-FE), European (TBEV-Eur), and Siberian (TBEV-Sib) [1]. Several recent studies have also described two new subtypes, the Himalayan and the Baikalian [2–5]. The name of each subtype reflects the main region where its first representatives circulated. However, strains of all three main subtypes were subsequently isolated outside the territories indicated in their names. For example, in addition to Central Europe, TBEV-Eur has been found in the Baltic countries [6, 7], in Western and Eastern Siberia [8–11], and in South Korea [12–14]. TBEV-Sib is the most widely distributed subtype and occurs throughout the TBEV range [15]. TBEV-FE predominates in the Far East of Eurasia, but it has also been found in the Baltic countries, Crimea, the Urals, Siberia, and the European part of Russia [16, 17]. The phylogeographic connections between the most distant foci remain unclear.

The goal of this study was to carry out a phylogenetic analysis of TBEV-Eur strains isolated in Eastern and Western Siberia and to estimate, using a Bayesian approach, the time of their most recent common ancestor. In addition, drawing on previously published studies of the geographic distribution of TBEV, we analyzed the phylogeographic relationships between this group of Siberian TBEV-Eur strains and the closest isolates from Europe and the European part of Russia.

MATERIALS AND METHODS

The analysis used 53 nucleotide sequences of the TBEV-Eur polyprotein gene (10 245 nt) from the GenBank international database. The data set also included the sequences of two TBEV-Eur strains (nos. 214 and 172) from the collection of the Irkutsk Antiplague Research Institute of Siberia and the Far East, isolated in Irkutsk oblast (Ekhirit-Bulagatsky and Ust-Udinsky raions). Strains nos. 214 and 172 were collected in 1967 and 1968, respectively; both were isolated from the cerebrospinal fluid of patients with tick-borne encephalitis. The strains were reisolated in 2- to 3-day-old suckling white mice by intracerebral inoculation of 0.02 mL, followed by a single passage.

Total RNA was extracted using a RIBO-prep kit. Reverse transcription was performed using a Revert-L kit (AmpliSens, Federal Budget Institution of Science Central Research Institute of Epidemiology of the Federal Service for Surveillance on Consumer Rights Protection and Human Wellbeing, Moscow) according to the manufacturer's instructions. Fragments of the viral genome were amplified using a Syntol kit (Moscow). The PCR products were sequenced with an ABI PRISM BigDye Terminator v.1.1 Cycle Sequencing Kit on a 3500xL Genetic Analyzer (Applied Biosystems).

The nucleotide sequences of strains nos. 214 (MK562430) and 172 (MK560446) have been deposited in the GenBank database.

Polymorphism of the analyzed nucleotide sequences was assessed using IQ-TREE v.1.6.12 [18] and MEGA X [19]. Substitution saturation at codon positions 1 + 2 and 3 was tested in DAMBE v.6.4 using the method of X. Xia et al. [20], which compares the substitution saturation index (Iss) with the critical saturation index (Iss,c). If Iss > Iss,c, the codon position is considered saturated with substitutions and unsuitable for phylogenetic reconstruction; if Iss is significantly lower than Iss,c, saturation is considered negligible. The statistical significance of the difference between the indices was assessed with a two-tailed test in the same program.
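
The core ratio behind the Xia test can be sketched in Python. This is a simplified sketch, not DAMBE's implementation: it assumes that the entropy expected under full substitution saturation (H_FSS) can be approximated by the entropy of the pooled base frequencies (a large-sample approximation), ignores gaps and ambiguity codes, and does not compute Iss,c, which DAMBE obtains by simulation. The function name `saturation_index` is hypothetical; it would be applied separately to the codon position 1 + 2 subset and the codon position 3 subset.

```python
import math
from collections import Counter

BASES = "ACGT"


def shannon_entropy(counts):
    """Shannon entropy (bits) of a Counter of nucleotide counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def saturation_index(alignment):
    """Simplified Iss: mean per-site entropy / entropy of pooled base frequencies.

    Assumption: the pooled-frequency entropy stands in for H_FSS; DAMBE uses a
    finite-sample expectation instead. Gaps and ambiguity codes are ignored.
    """
    site_entropies = []
    pooled = Counter()
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() in BASES)
        if not counts:
            continue  # skip columns with no unambiguous bases
        site_entropies.append(shannon_entropy(counts))
        pooled.update(counts)
    h_fss = shannon_entropy(pooled)
    if h_fss == 0:
        return 0.0  # single-base alignment: no variation at all
    mean_h = sum(site_entropies) / len(site_entropies)
    return mean_h / h_fss


# Identical sequences have no site variability, so the index is 0.
print(saturation_index(["ACGT", "ACGT"]))
```

Values close to 0 indicate little saturation; values approaching 1 indicate that site patterns look as variable as randomized sequences with the same base composition.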

The BEAST v.1.8.4 software package [21], which implements Bayesian phylogenetic inference, was used to reconstruct the phylogeny and estimate the divergence times of the studied group. The reproducibility of the results for each model combination was assessed with four independent runs of the Markov chain Monte Carlo (MCMC) procedure. Each run lasted 50 to 100 million iterations and was continued until the effective sample size (ESS) reached 200. The sampling frequency was chosen so that each run stored 50 000 samples (trees), of which the first 10% were discarded as burn-in.
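
The run bookkeeping described above (logging interval, burn-in, and the ESS threshold) can be illustrated with a short NumPy sketch. The function names are hypothetical, and the ESS estimator is an assumption: it uses the common autocorrelation formula ESS = N / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. Tracer and other tools use their own estimators, so their values will differ somewhat.

```python
import numpy as np


def log_every(chain_length, n_samples=50_000):
    """Logging interval that stores n_samples states per MCMC run."""
    return chain_length // n_samples


def drop_burn_in(trace, fraction=0.10):
    """Discard the first `fraction` of the stored samples as burn-in."""
    trace = np.asarray(trace, dtype=float)
    return trace[int(round(fraction * trace.size)):]


def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the first rho <= 0."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    variance = np.dot(x, x) / n
    if variance == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * variance)
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# A 50-million-iteration run logged every 1000 steps yields 50 000 samples.
print(log_every(50_000_000))  # 1000

# Hypothetical autocorrelated trace (AR(1), phi = 0.9) standing in for a logged parameter.
rng = np.random.default_rng(1)
trace = np.empty(50_000)
trace[0] = 0.0
for t in range(1, trace.size):
    trace[t] = 0.9 * trace[t - 1] + rng.normal()
kept = drop_burn_in(trace)
print(kept.size, effective_sample_size(kept) >= 200)
```

Because successive MCMC states are correlated, the ESS is usually far smaller than the number of stored samples; this is why a longer chain (up to 100 million iterations) may be needed before every parameter passes the threshold.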

Model combinations were compared on the basis of marginal likelihoods estimated by path sampling and stepping-stone sampling (PS/SS) [22], using 100 steps with an MCMC chain of 1 million iterations per step. The parameters of the substitution models (the shape parameter α of the gamma distribution of among-site rate variation, +G4, and the proportion of invariant sites, +I) were additionally compared using the Bayesian information criterion (BIC) calculated with the phangorn package for R [23]. The analysis considered three substitution models (HKY, GTR, and SRD06), a strict molecular clock, and three population models: exponential growth of genetic diversity (EG) and the nonparametric Bayesian Skyline (BSL) [24] and Bayesian Skygrid (BSG) [25] models, which can accommodate multiple changes in genetic diversity over time. A relaxed molecular clock was not used because the coefficient of variation of the substitution rate among branches of the phylogenetic tree was low (0.08). Statistical uncertainty in divergence times and evolutionary rates is reported as the 95% highest posterior density (95% HPD) interval.
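
The two model-comparison criteria can be sketched in Python. The sketch assumes the usual parameter count for a phylogenetic BIC (free substitution-model parameters plus the 2n − 3 branch lengths of an unrooted tree with n taxa, with the number of alignment sites as the sample size) and interprets 2 ln BF on the Kass and Raftery scale. All function names are hypothetical, and the log marginal likelihoods in the example are invented placeholders, not values from Table 2.

```python
import math

# Free parameters of the substitution matrix: exchangeabilities + base frequencies.
SUBSTITUTION_PARAMS = {"HKY": 1 + 3, "GTR": 5 + 3}


def n_free_params(model, n_taxa, gamma=False, invariant=False):
    """Model parameters plus 2n - 3 branch lengths of an unrooted tree."""
    k = SUBSTITUTION_PARAMS[model] + (2 * n_taxa - 3)
    return k + int(gamma) + int(invariant)


def bic(log_likelihood, k, n_sites):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n_sites) - 2.0 * log_likelihood


def log_bayes_factor(log_ml_1, log_ml_2):
    """ln BF of model 1 over model 2 from PS/SS log marginal likelihoods."""
    return log_ml_1 - log_ml_2


def kass_raftery(log_bf):
    """Return (favoured model, strength of evidence) on the 2 ln BF scale."""
    favoured = 1 if log_bf >= 0 else 2
    two_ln_bf = 2.0 * abs(log_bf)
    if two_ln_bf <= 2:
        strength = "not worth more than a bare mention"
    elif two_ln_bf <= 6:
        strength = "positive"
    elif two_ln_bf <= 10:
        strength = "strong"
    else:
        strength = "very strong"
    return favoured, strength


# Hypothetical log marginal likelihoods for two model combinations.
print(kass_raftery(log_bayes_factor(-40_100.0, -40_112.5)))  # (1, 'very strong')
```

BIC penalizes extra parameters such as +G4 and +I directly, whereas the PS/SS marginal likelihood penalizes them implicitly by averaging the likelihood over the prior.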

The phylogeographic relationships of the Siberian TBEV-Eur strains were assessed by regression analysis following the methodology of D. Heinze et al. [26]. Genetic distances to the Sofjin-HO strain were calculated in MEGA X using the Kimura two-parameter model + G4 + I. Geographic distances from the isolation site of the Sofjin strain were measured with Google Earth (http://www.google.com/earth/index.html). The TBEV and louping-ill virus (LIV) sequences analyzed by Heinze et al. [26] were also included, together with their geographic distances from the Sofjin strain as reported in that study.
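
The two distance measures can be sketched in Python. Assumptions: the genetic distance below is the Kimura two-parameter distance with a gamma correction (Jin and Nei form, shape parameter `alpha`) and omits the invariant-site (+I) component used in MEGA X; the geographic distance is a great-circle (haversine) distance on a sphere of radius 6371 km, a stand-in for measurements taken in Google Earth. All function names are hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def transition_transversion_props(seq1, seq2):
    """Proportions of transitional (P) and transversional (Q) differences.

    Only sites where both sequences carry an unambiguous base are compared.
    """
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return ts / n, tv / n


def k2p_distance(seq1, seq2, alpha=None):
    """Kimura two-parameter distance; gamma-corrected if alpha is given."""
    p, q = transition_transversion_props(seq1, seq2)
    w1, w2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if alpha is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    return 0.5 * alpha * (w1 ** (-1.0 / alpha) + 0.5 * w2 ** (-1.0 / alpha) - 1.5)


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(h))
```

As `alpha` grows, the gamma-corrected formula converges to the uncorrected Kimura distance, which is a convenient sanity check.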

RESULTS AND DISCUSSION

Earlier studies provided a detailed description of the Siberian TBEV-Eur strains, including their virulence and invasiveness, their nucleotide and amino-acid homology, and characteristic amino-acid substitutions in various proteins of the TBEV-Eur polyprotein, and showed that TBEV-Eur as a whole is relatively genetically stable [8–11]. These studies also examined the phylogenetic relationships among the Siberian TBEV-Eur strains and their position on the overall phylogenetic tree. However, they did not estimate the divergence time of the cluster formed by strains isolated in Siberia. To address this, we applied a Bayesian phylogenetic approach and included two newly sequenced TBEV-Eur polyprotein-coding sequences in the analysis.

Phylogenetic Analysis

The polymorphism analysis identified 1309 polymorphic sites, of which 1109 were parsimony-informative; the mean and maximum numbers of pairwise nucleotide differences were 213 and 316, respectively. These values indicate that the data set is sufficiently informative. The test of X. Xia et al. [20] in DAMBE showed no substitution saturation at codon positions 1 + 2 or 3: in both cases Iss (0.016 and 0.095 for positions 1 + 2 and 3, respectively) was significantly lower than Iss,c (0.57 and 0.55, respectively), with p = 0.00.
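
The site counts reported above follow standard definitions, which a short Python sketch makes explicit: a site is polymorphic if it carries more than one nucleotide, and parsimony-informative if at least two nucleotides each occur in at least two sequences. The sketch assumes upper-case sequences, ignores gaps and ambiguity codes, and counts pairwise differences only over sites where both sequences have an unambiguous base; IQ-TREE and MEGA X may treat such sites differently. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

BASES = set("ACGT")


def site_counts(alignment):
    """Return (polymorphic sites, parsimony-informative sites) of an alignment."""
    n_polymorphic = n_informative = 0
    for column in zip(*alignment):
        counts = Counter(b for b in column if b in BASES)
        if len(counts) > 1:
            n_polymorphic += 1
            # Informative: at least two states, each present in >= 2 sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                n_informative += 1
    return n_polymorphic, n_informative


def pairwise_differences(alignment):
    """Mean and maximum number of nucleotide differences over all sequence pairs."""
    diffs = [
        sum(1 for a, b in zip(s1, s2) if a in BASES and b in BASES and a != b)
        for s1, s2 in combinations(alignment, 2)
    ]
    return sum(diffs) / len(diffs), max(diffs)


toy = ["AAAA", "AAGT", "CAGA", "CAGA"]
print(site_counts(toy))  # (3, 1)
```

In the toy alignment, the third and fourth columns are polymorphic but each variant occurs only once, so they are not parsimony-informative.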

Based on a preliminary test of substitution models in phangorn, the models were extended with gamma-distributed rate variation among sites (four discrete categories) and a proportion of invariant sites (Table 1).

Table 1.   Comparison of GTR and HKY evolutionary models with different combinations of parameters of the γ distribution (four discrete categories) and the proportion of invariant sites

For each model combination, the four independent MCMC runs converged to the same optimum, and the ESS values in every run exceeded the required threshold of 200. The PS/SS methods gave the highest marginal likelihood for the combination of a strict clock, the SRD06 substitution model, and the BSG population model (MSCSC + SRD06 + BSG).

The SRD06 model divides the alignment into two partitions: codon positions 1 + 2 and codon position 3. For each partition, the HKY + G4 substitution model with estimated nucleotide frequencies was applied (this combination of parameters performed best in an analysis of 177 loci of various RNA viruses [27]).
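
The codon-position partitioning used by SRD06 can be illustrated with a small Python sketch. It only shows how alignment columns are assigned to the two partitions (in BEAST each partition then receives its own HKY + G4 parameters and relative rate); the function name is hypothetical, and the alignment is assumed to be in frame, starting at a first codon position.

```python
def srd06_partitions(alignment):
    """Split in-frame coding sequences into (codon positions 1 + 2, codon position 3)."""
    length = len(alignment[0])
    if length % 3 != 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("expected aligned, in-frame sequences of length divisible by 3")
    pos12 = ["".join(seq[i] for i in range(length) if i % 3 != 2) for seq in alignment]
    pos3 = [seq[2::3] for seq in alignment]
    return pos12, pos3


# For the 10 245-nt polyprotein alignment, 6830 columns fall into partition 1 + 2
# and 3415 into partition 3.
pos12, pos3 = srd06_partitions(["A" * 10_245])
print(len(pos12[0]), len(pos3[0]))  # 6830 3415
```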

A preliminary run with a relaxed clock gave a coefficient of variation below 10% (μ = 0.08; 95% HPD, 0–0.15); therefore, in accordance with [28], the relaxed clock model was not considered further. The results of the PS/SS model comparison are presented in Table 2.
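
The coefficient of variation used to decide against the relaxed clock is the standard deviation of branch rates divided by their mean. A minimal NumPy sketch, assuming branch rates and, optionally, branch durations are already available from a posterior tree; the value quoted in the text came from the BEAST run itself, and whatever branch weighting BEAST applies is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def rate_cv(rates, durations=None):
    """Coefficient of variation of branch rates, optionally weighted by branch duration."""
    r = np.asarray(rates, dtype=float)
    w = np.ones_like(r) if durations is None else np.asarray(durations, dtype=float)
    mean = np.sum(w * r) / np.sum(w)
    sd = np.sqrt(np.sum(w * (r - mean) ** 2) / np.sum(w))
    return sd / mean


def prefer_strict_clock(cv, threshold=0.10):
    """Threshold rule as applied in the text: CV below 10% favours a strict clock."""
    return cv < threshold


print(prefer_strict_clock(0.08))  # True
```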

Table 2.   Comparison of values of marginal likelihood functions for all considered combinations of models

Figure 1 shows the phylogenetic tree reconstructed in BEAST v.1.8.4 from 53 nucleotide sequences (10 245 nt) of TBEV-Eur isolated between 1951 and 2017. The substitution rate was estimated with relatively high precision at 1.3E-5 substitutions per site per year, or one substitution in the polyprotein gene every 7.5 years (95% HPD, 1.0E-5–1.8E-5 substitutions per site per year, or one substitution every 5.4–9.7 years). Under the SRD06 model, the substitution rate at codon position 3 was 7.9 times that at positions 1 + 2. Converted to absolute values (using the mean evolutionary rate given above), the rate at codon positions 1 + 2 was 2.9E-6 substitutions per site per year (95% HPD, 2.2E-6–4.0E-6), and the rate at codon position 3 was 2.3E-5 substitutions per site per year (1.7E-5–3.2E-5).
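
The conversion from a per-site rate μ to the waiting time for one substitution anywhere in the polyprotein gene is a simple reciprocal, t = 1 / (μ × L), with L = 10 245 nt. A small Python sketch (the function name is hypothetical):

```python
POLYPROTEIN_LENGTH_NT = 10_245


def years_per_substitution(rate_per_site_per_year, length_nt=POLYPROTEIN_LENGTH_NT):
    """Expected years between substitutions anywhere in a sequence of length_nt sites."""
    return 1.0 / (rate_per_site_per_year * length_nt)


print(round(years_per_substitution(1.3e-5), 1))  # 7.5
print(round(years_per_substitution(1.8e-5), 1))  # 5.4
```

Recomputing from the rounded lower bound 1.0E-5 gives about 9.8 years rather than 9.7; the small difference presumably reflects rounding of the reported rate.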

Fig. 1.

Phylogenetic tree reconstructed from 53 protein-coding nucleotide sequences (10 245 nt) of TBEV-Eur using the Bayesian method implemented in BEAST. Posterior probability values are shown to the left of the main nodes. Cluster S, formed by TBEV-Eur strains from Siberia, is highlighted in gray; the gray horizontal bar at its basal node represents the 95% HPD. A timeline in calendar years is shown below the tree. The genomes sequenced in this study are shown in bold.

Twelve strains from Western and Eastern Siberia formed cluster S (posterior probability = 1). The most recent common ancestor of this cluster is dated to approximately 1928, with a 95% HPD interval spanning 50 years (1900–1950). Because the 12 strains share few informative sites (seven sites across 10 245 nt), the branching order within cluster S shows polytomies and low posterior probabilities. Nevertheless, when the data set as a whole carries enough information to estimate the evolutionary rate, the divergence time of a cluster can be estimated precisely even if its internal structure remains unresolved.
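
A 95% HPD interval, such as the 1900–1950 interval for the ancestor of cluster S, is the shortest interval containing 95% of the posterior samples. A minimal NumPy sketch of the usual sorted-sample estimator, which would be applied, for example, to tMRCA samples in calendar years taken from the post-burn-in trees (the function name is hypothetical):

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the sorted samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    m = max(1, int(round(prob * n)))  # number of samples inside the interval
    # Width of every window of m consecutive sorted samples; keep the narrowest.
    widths = x[m - 1:] - x[: n - m + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + m - 1]
```

For a symmetric posterior the HPD interval coincides with the equal-tailed interval; for a skewed one, such as a date posterior with a long tail into the past, it shifts toward the mode.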

Phylogeography of Siberian TBEV-Eur Strains

D. Heinze et al. [26] analyzed the spatial and temporal dynamics of tick-borne flaviviruses (TBFVs). They showed a linear relationship between genetic distance and geographic distance in kilometers (R2 = 0.91) for TBFVs, including TBEV strains of the three main subtypes. Based on this relationship, they proposed a tentative model of the global spread of TBFVs, according to which TBEV-Eur separated from the main genetic lineage (that is, TBEV-Eur emerged) in Europe, from where all its modern representatives, including the TBEV-Eur strains isolated in Siberia, originated.

Repeating the regression analysis of Heinze et al. with the Siberian TBEV-Eur strains added showed that these strains lie well above the regression line (Fig. 2), indicating their relatively recent appearance in Siberia (Heinze et al. reached a similar conclusion for Powassan virus isolated in Primorsky krai).
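
The logic of this test can be sketched with an ordinary least-squares fit in NumPy: the line is fitted to the reference strains only, and a query strain with a positive residual has a larger genetic distance than expected from its geographic distance. Assumptions: genetic distance is the response (y) and geographic distance the predictor (x), as the description of Fig. 2 suggests; the function names are hypothetical, and all numbers below are invented placeholders, not data from this study or from Heinze et al.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its coefficient of determination."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def residual(x, y, slope, intercept):
    """Observed minus predicted genetic distance; > 0 means above the line."""
    return y - (slope * x + intercept)


# Hypothetical reference strains: geographic distance (km) vs genetic distance.
geo_km = [500.0, 2000.0, 4000.0, 6000.0, 8000.0]
gen_dist = [0.03, 0.06, 0.11, 0.14, 0.19]
slope, intercept, r2 = fit_line(geo_km, gen_dist)

# Hypothetical query strain: geographically close to the reference site but genetically distant.
print(residual(1500.0, 0.17, slope, intercept) > 0)  # True
```

Fitting the line without the query strains, as in Fig. 2, keeps the suspected recent introductions from pulling the reference relationship toward themselves.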

Fig. 2.

Linear regression characterizing the relationship between the genetic distance and the geographical distance of the TBEV and LIV strains (R2 = 0.92, excluding the Siberian TBEV-Eur strains).

CONCLUSIONS

According to our analysis, the evolutionary rate of TBEV-Eur was 1.3E-5 substitutions per site per year, or one substitution in the polyprotein gene every 7.5 years (95% HPD, 1.0E-5–1.8E-5 substitutions per site per year, or one substitution every 5.4–9.7 years). The most recent common ancestor of the TBEV-Eur strains isolated in Siberia is dated to approximately 1928 (95% HPD, 1900–1950). The relatively young age of this cluster, the high genetic similarity of its isolates, and the results of the regression analysis indicate a recent introduction of TBEV-Eur into Siberia followed by an increase in its genetic diversity there.