Introduction

A particular significance of genetic research in the Altai and neighboring regions lies in the geographic location of the Altai area on the southwestern margin of Siberia, believed to be the main gateway for the initial peopling of the remainder of Siberia from the southern regions of Central Asia. Archaeological data suggest that modern humans could already have inhabited South Siberia in the late Pleistocene. The earliest Upper Paleolithic industries occurred in the Altai region, are dated to 43,300±1,600 years B.P., and are thought to be linked with a gradual transformation of a Mousterian tradition by the introduction of more progressive elements during the early stage of the mid-last glacial interstadial interval (45,000–30,000 years B.P.) (Derevianko 1998; Goebel 1999). A “mixed” technological character similar to that observed in Altai can be found in most of the other Late Paleolithic stone industries in Siberia dating to the second half of the Karginsk interstadial interval. Analogous combinations of Mousterian and Late Paleolithic elements are documented in the Transbaikal and Angara River basin regions, as well as in the upper Yenisei area (Chlachula 2001). They have also been recorded in synchronous industries in Mongolia and North China, thus testifying to processes of cultural development similar to those in Siberia (Okladnikov 1981; Derevianko 1998). During the following interval, i.e., at the end of the Karginsk interstadial and in the early last glacial (Sartan) stage, more progressive Late Paleolithic cultures with an advanced technique of prismatic core flaking reminiscent of European traditions emerged in vast areas of Siberia: on the eastern margin of the West Siberian Lowland, in the upper Yenisei River basin, and in the Angara River basin (Derevianko and Zenin 1996; Vasiliev et al. 1999). Such industries, however, are absent in the Altai area, indicating complex and regionally divergent cultural evolution in different parts of Siberia (Chlachula 2001).

Unfortunately, in the absence of human skeletal remains, archaeological records alone are inconclusive about the anthropological traits that characterized the Upper Paleolithic Siberian population. East Asian features thought to have been derived from early modern East Asians are present in the tooth from Denisova Cave in the Altai region and in human remains from the Afontova Gora II site, indicating that East Asians had moved into southwestern Siberia by 21,000 B.P. or even earlier (Alekseev 1998). Yet the Upper Paleolithic artifacts from the 23,000-year-old Mal’ta site near Lake Baikal in south-central Siberia (Medvedev et al. 1996) have been found in association with skeletal remains that are morphologically similar to teeth of contemporary anatomically modern humans from Europe, thus providing evidence for links between Siberia and the West during the Upper Paleolithic. Thus, assuming that the Upper Paleolithic population of South Siberia was closely related to other East Asian populations, admixture with populations from Eastern Europe probably occurred during the Neolithic. The prevalence of European features among the steppe-zone inhabitants of Tuva, Altai, Khakassia, and West Mongolia has been most pronounced since the Bronze Age or even earlier (Alexeev and Gohman 1984; Alexeev 1989). The boundary of the Eastern European influence is clearly fixed at Lake Baikal: to the east of Baikal, no palaeoanthropological find bears any trace of European admixture (Alekseev 1998).

Thus, the considerable ethnic diversity in South Siberia was largely shaped by migration processes that have occurred since the initial colonization of the region. Historically, a complex network of migrations can be traced from Central and Eastern Asia and from Western Eurasia (Alexeev 1989; Vasiliev 1993; Derevianko 1998).

Classical genetic data covering most of the present-day aboriginal populations of Siberia showed a high degree of between-population heterogeneity, often attributed to low population densities (Rychkov and Sheremetyeva 1980; Cavalli-Sforza et al. 1994; Rychkov et al. 2000), as well as considerable correlations among geographic, linguistic, and genetic variation (Cavalli-Sforza et al. 1994; Karafet et al. 1994). Recent genetic studies of Siberian populations involving high-resolution mitochondrial DNA (mtDNA) and Y-chromosomal markers have focused mainly on the peopling of the Americas (Shields et al. 1993; Torroni et al. 1993; Sukernik et al. 1996; Starikovskaya et al. 1998; Karafet et al. 1999; Santos et al. 1999; Lell et al. 2002; Derenko et al. 2000; 2001; Bortolini et al., 2003), on the history of early human migrations in Eurasia (Zerjal et al. 1997; Wells et al. 2001; Derbeneva et al. 2002) and East Asia (Su et al. 1999; Ke et al. 2001; Karafet et al. 2001), or on particular regions of Siberia (Schurr et al. 1999; Pakendorf et al. 2003; Derenko et al. 2003; Fedorova et al. 2003; Puzyrev et al. 2003). MtDNA studies revealed that modern indigenous South Siberian populations show a complex mtDNA structure reflecting diverse interactions that occurred at different times between eastern and western Eurasian gene pools (Derenko et al. 2001; Pakendorf et al. 2003; Derenko et al. 2003; Fedorova et al. 2003; Puzyrev et al. 2003). It has also been shown that the populations of South Siberia share a common genetic substratum with Central and East Asian populations (Derenko et al. 2002; 2003; Pakendorf et al. 2003), whereas ethnic groups of northeastern Siberia (Koryaks and Itel’mens) have stronger genetic affinities with eastern Siberian/East Asian populations (Schurr et al. 1999). The Y-chromosome study of Su et al. (1999) (see also Shi et al. 2005) suggested that the territories of Northern China and South Siberia were populated from mainland Southeast Asia during the last Ice Age, whereas Karafet et al. (2001, 2002) underscored an important role of Central Asia in the early peopling of Siberia. In general, a comprehensive survey of Y-chromosome SNP markers revealed that modern aboriginal Siberian populations show reduced haplogroup diversity within populations combined with a high level of between-population differentiation, and underscored the important role of directed dispersals, range expansions, and long-distance colonizations bound by common ethnic and linguistic affiliation in shaping the genetic landscape of Siberia (Karafet et al. 2002).

Although the Y-chromosome variability data cover all major North Asian regional groups, detailed data sets for the ethnic groups living in Eastern and South Siberia are still incomplete. The present research was aimed at characterizing the Y-chromosomal composition of 11 Altaic-speaking populations occupying the broader area of the Altai–Sayan and Baikal regions, and at estimating the relative contribution of different genetic components to their paternal gene pools.

Materials and methods

Subjects

A total of 1,358 samples (whole blood and hair root samples) from unrelated males were collected in the following ten autochthonous Siberian populations (Fig. 1): South Altaians (Altaians-Kizhi and Teleuts), Shors, Tuvinians, Eastern Tuvinians (Todjins), Tofalars, Sojots, Khakassians, Buryats and Evenks, as well as in Mongolians, Koreans and Russians, who were additionally typed for comparative purposes. An additional sample of Mongolic-speaking Kalmyks, thought to be descendants of western Mongolians (Oirats) who migrated to the Caspian region from Central Asia in the 17th century, was also included in our analyses. Taking into account the considerable genetic similarity between Kalmyks and Buryats demonstrated both by classical genetic data (Galushkin et al. 2001) and by mtDNA variability data (authors' unpublished data), the Kalmyks are considered to be “Siberians” for all statistical analyses. All individuals were unrelated through at least three generations and originated from the areas considered for this study. All the samples were collected with appropriate ethical approval and informed consent. These samples include those previously analyzed for mtDNA variation (Derenko et al. 2003).

Fig. 1 Population sample sites. Sample sizes are given in parentheses

Molecular analysis

The Y Alu-insertion polymorphism (YAP) at DYS287 and 16 SNP markers (RPS4Y711, SRY-8299, M89, M201, M52, M170, 12f2, M9, M20, 92R7, SRY-1532, DYS199, M173, M17, Tat, and LLY22g) previously reported to be polymorphic in Europe and/or Asia were typed as described (Karafet et al. 1999; 2001; Rosser et al. 2000; Ke et al. 2001; Zerjal et al. 2002; Flores et al. 2003) using the hierarchical typing strategy explained in Underhill et al. (2000) and Hammer et al. (2001), in which additional genotyping of a sample is restricted to markers on the appropriate branch of the Y-chromosome haplogroup tree. The Y-SNP haplogroup nomenclature used here follows the recommendations of the Y Chromosome Consortium (2002). The phylogenetic relationship of the Y-chromosome haplogroups, based on the 17 biallelic markers analyzed here, is shown in Fig. 2.

Fig. 2 Phylogenetic network of 18 Y-chromosome haplogroups potentially detected by the 17 biallelic markers shown on branches

A total of 255 samples belonging to haplogroup R1a1 (134 South Siberian and 121 Russian Y-chromosomes) were also analyzed at 11 STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, and DYS439) using the PowerPlex® Y System (Promega Corporation, Madison, USA), an ABI PRISM® 377 DNA Sequencer, internal size standards, and GeneScan fragment analysis software.

Statistical analysis

Haplogroup frequency distributions were compared by an exact test of population differentiation (Raymond and Rousset 1995) implemented in ARLEQUIN 2.000 (Schneider et al. 2000). Significance of these tests was assessed by comparing the observed frequency distributions with those explored in 10,000 Markov chain steps. This software was also used to estimate genetic distances (as pairwise values of FST), genetic diversities (and their SEs), and among-population variability (AMOVA), and to test for correlations between geographic and genetic distances through a Mantel test with 1,000 permutations. Geographical distances were obtained from the Great Circle Distance program (http://www.mercury.demon.co.uk/dist/dodist.html) from the geographical coordinates of the sampling sites. Multidimensional scaling (MDS) analysis (Kruskal 1964) of the FST values was performed with STATISTICA (StatSoft).
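The distance and correlation steps described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in ARLEQUIN or in the Great Circle Distance program: the haversine formula, the Earth radius constant, the one-sided permutation P value, and the function names `great_circle_km` and `mantel_test` are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _lower_triangle(matrix, order):
    """Below-diagonal entries of a square matrix after relabelling by `order`."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel_test(genetic, geographic, permutations=1000, seed=0):
    """Return (r, one-sided P) for two symmetric distance matrices.

    Population labels of the genetic matrix are permuted (rows and columns
    together) while the geographic matrix is held fixed.
    """
    n = len(genetic)
    identity = list(range(n))
    geo_vec = _lower_triangle(geographic, identity)
    r_obs = _pearson(_lower_triangle(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if _pearson(_lower_triangle(genetic, perm), geo_vec) >= r_obs:
            hits += 1
    # Add one to numerator and denominator so the observed value counts once.
    return r_obs, (hits + 1) / (permutations + 1)
```

In such a sketch, `genetic` would hold the pairwise FST values and `geographic` the pairwise great-circle distances, both as square symmetric lists of lists in the same population order.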

For Y-STR data, an unbiased estimate of haplotype diversity was calculated with the ARLEQUIN 2.000 program (Schneider et al. 2000). A median-joining (MJ) network of R1a1 Y-STR haplotypes was constructed using the Network 4.1.0.9 program (Bandelt et al. 1999). The network was calculated by the median-joining method (ε=0) (Bandelt et al. 1995), weighting the STR loci according to their average variability in haplogroup R1a1. The age of microsatellite variation within haplogroup R1a1 was estimated as the average squared difference in the number of repeats between all current chromosomes and the founder haplotype, averaged over microsatellite loci and divided by the effective mutation rate w of 0.00069 per locus per 25 years (Zhivotovsky et al. 2004). The upper bound for the divergence time of populations (within a haplogroup) was calculated as TD (the estimator of the time since two populations split from a common ancestor), assuming that the microsatellite variance in repeat number at the beginning of population subdivision (V0) was equal to zero (Zhivotovsky 2001).
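The age estimate described above reduces to simple arithmetic, sketched below in Python. The mutation rate and the 25-year interval are taken from the text; the function names, the representation of a haplotype as a tuple of repeat counts, and the toy data in the comments are hypothetical. The sketch covers only the point estimate, not its standard error or the TD estimator.

```python
MUTATION_RATE = 0.00069  # effective rate per locus per 25 years (from the text)
YEARS_PER_INTERVAL = 25  # interval to which the mutation rate refers


def asd_to_founder(haplotypes, founder):
    """Average squared difference in repeat number between each chromosome
    and the founder haplotype, averaged over chromosomes and loci."""
    n_loci = len(founder)
    total = 0
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("haplotype and founder must have the same loci")
        total += sum((allele - ref) ** 2 for allele, ref in zip(hap, founder))
    return total / (len(haplotypes) * n_loci)


def age_in_years(haplotypes, founder,
                 rate=MUTATION_RATE, years_per_interval=YEARS_PER_INTERVAL):
    """Age of STR variation: ASD divided by the mutation rate, scaled to years."""
    return asd_to_founder(haplotypes, founder) / rate * years_per_interval


# Hypothetical toy data: three loci, four chromosomes (one entry per chromosome).
toy_founder = (16, 25, 11)
toy_sample = [(16, 25, 11), (17, 25, 11), (16, 24, 11), (16, 25, 11)]
toy_age = age_in_years(toy_sample, toy_founder)
```

Each chromosome is listed once, so a haplotype carried by several individuals should appear as many times as it was observed.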

Published Y-SNP data for 89 Eastern Europeans (Udmurt and Mari) (Semino et al. 2000) and 829 Central/Eastern Asians (Kyrgyz, Dungans, Uyghurs, Kazakhs, Uzbeks, Tadjiks, Turkmens, Northern Han, Hui, Manchu, Oroqen, Mongolians, and Koreans) (Zerjal et al. 2002, Karafet et al. 2001) were also included in the analyses.

Results and discussion

Haplogroup distribution

Fifteen of the 18 possible haplogroups (Fig. 2) were found in Siberia (Table 1). The most frequent haplogroups, present at frequencies greater than 10% across South Siberia, were C, R1a1, N3, and P*. Haplogroups C and R1a1 were common in all populations. The highest frequencies of haplogroup C were found in Mongolic-speaking Buryats (63.9%), Kalmyks (70.6%), and Mongolians (57.4%) as well as in Tungus-speaking Evenks (40%). In other Siberian populations it was found at low or moderate frequencies (up to 17.6%). The Shors and Teleuts had the highest frequency of haplogroup R1a1 (58.8% and 55.3%, respectively), and this haplogroup was generally more frequent in populations from the Altai and eastern Sayan region than in the adjacent areas, although it is rather frequent in Sojots (23.5%) from the Baikal region and Evenks (14.0%) from Central Siberia. Haplogroup R1a1, which is common in Central and Eastern Europe as well as in some populations of Western and Central Asia, is also prevalent in the Russian sample studied (48.3%). Haplogroup P* was present in the majority of Siberian samples, except for the Evenks and Teleuts, reaching its highest frequency in Tuvinians (35.4%). The Teleuts, in turn, together with Shors and Tofalars, had a high frequency of haplogroup R1* (12.8, 19.6, and 12.5%, respectively), which was found in other Siberian populations at low frequencies (up to 7.6%). Haplogroups N3 and N*, widespread in Siberia (Karafet et al. 1999; 2002; Derenko et al. 2002) and northern Eurasia (Zerjal et al. 1997) and rarely observed in Central Asia (Zerjal et al. 2002), were also frequent in the South Siberian populations studied here. Haplogroup N3, present at its highest frequency in Tofalars (25%) and Buryats (18.9%), was quite frequent in all other Siberian populations (with the exception of Kalmyks) as well as in the Russians (14.0%) studied. Haplogroup N* chromosomes were present in 8.5% of the Siberian samples, reaching the highest frequencies in Khakassians (28.3%) and Tofalars (34.4%). They were also quite frequent in Evenks (18%), Tuvinians (14.2%) and Shors (13.7%). With the exception of haplogroup K* (with an overall frequency of 6.4%), all of the remaining haplogroups were present at frequencies of less than 2% in Siberia. In general, the haplogroup diversity in Siberia (average value 0.82) is significantly higher than that in Eastern Europe (average value 0.74; P=0.0001) and in Central/Eastern Asia (average value 0.79; P=0.034).
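How a diversity value of this kind can be obtained from haplogroup counts is sketched below in Python. The sketch assumes the standard unbiased gene diversity estimator, h = n/(n−1) × (1 − Σp_i²); the text does not spell out the formula used by ARLEQUIN, so this choice, the function name, and the counts in the example are assumptions, not data from Table 1.

```python
def unbiased_diversity(counts):
    """Unbiased gene diversity from a mapping of haplogroup -> number of
    chromosomes: h = n / (n - 1) * (1 - sum of squared frequencies)."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical counts for a sample of ten chromosomes (not taken from Table 1).
toy_counts = {"C": 5, "R1a1": 3, "N3": 2}
toy_h = unbiased_diversity(toy_counts)
```

The estimator is zero when all chromosomes carry one haplogroup and approaches one as haplogroups become numerous and evenly distributed.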

Table 1 Haplogroup distribution (no. of individuals and % values in parentheses) and Y-chromosomal diversity in populations studied

It should be noted that the Y-chromosome haplogroup frequency distribution in the 81 Buryats studied by Karafet et al. (2001) agrees closely with that in our sample. The two data sets differ only in the frequency of haplogroup N3, which accounts for 18.9% of chromosomes in our sample versus 28.4% in the Buryats from Karafet et al. (2001). A similar frequency distribution is also observed between our Evenks and the Evenks from the Yenisei region studied by Lell et al. (2002), whereas a small sample of Okhotsk Evenks from Lell et al. (2002) and the Evenks from Karafet et al. (2001) differ significantly from our sample in the frequencies of haplogroups R1a1 and N*. Even more significant frequency differences within the same ethnic group occur between our Tofalars and the 19 Tofalars studied by Lell et al. (2002). Both samples have nearly the same (highest in Siberia) frequency of haplogroup N* (34.4 and 36.8%, respectively), whereas the frequency of haplogroup N3 in the Tofalars from Lell et al. (2002) significantly exceeds that in the Tofalars studied here (47.4 and 25%, respectively). Moreover, haplogroups R1* and R1a1 were each found in our sample at a frequency of 12.5%, whereas only haplogroup R1a1, at a low frequency (5.3%), was found in the Tofalars studied by Lell et al. (2002). There are also some differences in the haplogroup distribution pattern between the two Tuvinian samples. Lower frequencies of haplogroups N3 (9.7 vs. 17.5%), C (7.1 vs. 30%) and N* (14.2 vs. 27.5%), as well as higher frequencies of haplogroups P* (35.4 vs. 15%) and R1a1 (17.7 vs. 7.5%), distinguished our 113 Tuvinians from the 40 Tuvinians studied by Lell et al. (2002). Such frequency differences between samples from the same ethnic group could best be explained by inadequate sample sizes (in some cases) or by within-population structure.

Population structure and affinities

The haplogroup distribution in the populations studied is heterogeneous (P=0.000), and only a few pairwise comparisons (3%) were nonsignificant (Table 2). Pairwise values of FST showed that, in some cases, neighboring populations were significantly different, whereas in other cases, geographically distant populations had nonsignificant pairwise FST values. To extend this observation, we tested for correlations between genetic and geographical distances. The matrix of pairwise FST values was compared with the corresponding matrix of pairwise geographical distances, and the correlation was highly significant (r=0.77; Mantel test: P=0.005), indicating that, in the South Siberian region, genetic distances are related to geographical distances, as observed in other cases (Karafet et al. 2001).

Table 2 FST distances (below diagonal) and exact probability values of differentiation (above diagonal) between populations studied

The population structure of Siberian Y-chromosomes was investigated by the AMOVA procedure (Excoffier et al. 1992). AMOVA showed that a considerable amount (22.1%, P=0.000) of the total variance was attributable to differences between populations. This value is higher than that for Eastern European populations (13.8%, P=0.000), but slightly lower than the value obtained for the 12 Eastern/Central Asian populations (24.7%, P=0.000). When AMOVA was calculated between populations pooled by geographical area (Table 3), the lowest, although significant, level of variation was observed in the Central/Eastern Asian–South Siberian comparison. This indicates that the Siberian populations studied here differ more from Eastern Europeans than from Central/Eastern Asians. Nevertheless, the percentage of variance between the South Siberians and Eastern Europeans is lower than that between the Eastern Europeans and Central/Eastern Asians, suggesting some Eastern European paternal influence on Siberian populations. Considered individually, all populations are significantly different from both the Eastern European and the Central/Eastern Asian pools. The only exception is the Sojots, who are significantly different only from the Eastern Europeans (Table 3).

Table 3. Percentage of variation among different geographical areas resulting from the AMOVA analysis

Pairwise values of FST were also used to perform an MDS analysis. A good fit between the two-dimensional plot and the source data (pairwise values of FST) was obtained, as demonstrated by the low stress value (0.014). Central/Eastern Asian and Eastern European populations constituted two different clusters. Central/Eastern Asian populations together with Evenks occupy a position close to a loose cluster formed by five of the 11 Siberian populations, namely Tuvinians, Todjins, Khakassians, Sojots, and Tofalars. Eastern Europeans together with the two groups of South Altaians (Altaians-Kizhi and Teleuts) and Shors formed a separate cluster in the lower part of the plot. The Mongolic-speaking Buryats and Kalmyks clustered together as extreme outliers, though they occupied a position closer to the Central Asians than to the Eastern Europeans (Fig. 3). In summary, the first dimension clearly separates Buryats and Kalmyks from all other populations, whereas the second dimension separates a cluster of Eastern Europeans and three Altai region populations from a cluster of Central/Eastern Asians and all other Siberian groups.

Fig. 3. MDS plot based on pairwise FST values showing relationships among South Siberian, Eastern European, and Central/Eastern Asian populations

STR diversity of Y-chromosome haplogroup R1a1

The Y-chromosome data obtained suggest that the populations of the Altai region are heterogeneous owing to the differing contributions of some male lineages, such as P* and R1 derivatives (R1* and R1a1). Haplogroup P*, corresponding to the ancestral M45-lineage, has been suggested to have originated in Central Asian populations and spread to South Siberia together with the appearance of anatomically modern humans (Wells et al. 2001). It is noteworthy that haplogroup P* reaches higher frequencies in South Siberia than in Central Asia, being especially high in the Tuvinians and the Tuvinians-Todjins (35.4 and 22.2%, respectively) as well as in Altaians-Kizhi (28.3%). Three other Altai–Sayan populations, Shors, Teleuts, and Tofalars, demonstrate the highest (in Siberia) impact of the most ancient Upper Paleolithic M173-lineages (R1* haplogroup), which are also thought to have originated in Central Asia, from where they dispersed westward into Europe (Semino et al. 2000; Wells et al. 2001). Moreover, both Teleuts and Shors exhibit the highest frequencies of haplogroup R1a1, which represents one of the most common male components of Altai–Sayan region populations. It appears that R1a1-lineages, which probably originated in the South Russia/Ukraine region around 5,000 years ago and are associated with later Indo-European movements into Central Asia (Semino et al. 2000; Wells et al. 2001), are responsible for the high genetic similarity revealed between the Altaians-Kizhi, Teleuts, Shors, and Eastern Europeans. Yet our data on Y-chromosome STR variation (Table 4, Fig. 4) demonstrate clear differences between South Siberian and Eastern European (Russian) R1a1-lineages, which are best explained by ancient rather than very recent events in South Siberian population history.

Table 4. Frequency of R1a1 haplotypes in South Siberians and Russians

Fig. 4 Median-joining network of haplogroup R1a1 based on eleven STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439) in 134 R1a1-chromosomes from ten populations of South Siberia (Tuvinians, Todjins, Shors, Khakassians, Evenks, Sojots, Teleuts, Altaians-Kizhi, Tofalars, and Buryats) and in 121 Eastern Europeans (Russians). The phylogenetic relationships between the 136 microsatellite haplotypes were determined using the program NETWORK 4.1.0.9 (Fluxus Engineering Web site). Circles represent microsatellite haplotypes with area proportional to frequency, and colors indicate the region of origin. South Siberian haplotypes are shown in black, while Eastern European (Russian) haplotypes are shown in white

The analysis of 11 Y-chromosomal microsatellites revealed 51 different haplotypes among the 134 South Siberians carrying haplogroup R1a1 (haplotype diversity: 0.959±0.007) and 95 haplotypes among the 121 Russians (0.994±0.002). In total, 136 STR haplotypes were detected among the R1a1 chromosomes studied (Table 4). Only ten of the 51 South Siberian R1a1-haplotypes were shared between the South Siberians and Eastern Europeans (Russians), thus reflecting only a small admixture with recent migrants of European descent. The two groups also differed significantly in Rst based on these STRs (Rst=0.093, P=0.000). The modal haplotype (haplotype 24 in Table 4), containing the most common allele at each locus, was found in both populations. The most frequent haplotype in the South Siberian sample (haplotype 40) is two mutational steps from the modal haplotype (at loci DYS389I and DYS389II). The most frequent haplotype observed in Russians (haplotype 26) is one step from the modal haplotype (at locus DYS385). These two most frequent regional haplotypes are thus three mutational steps from each other (at loci DYS385, DYS389I, and DYS389II). Divergence estimates based on Y-chromosome microsatellite variation indicate that, although the haplotype diversity of Eastern European R1a1-lineages exceeds that in South Siberia, the estimated ages of this haplogroup are almost equal in the two regional groups: 11,270±4,070 years in South Siberia and 11,380±3,200 years in Eastern Europe. These values are very close to the divergence time between the two regional groups studied (10,310±3,140 years). These results suggest that isolation of the regional groups occurred soon after the origin of the R1a1 haplogroup.

The modal haplotype was also at the center of the MJ network relating haplotypes within the R1a1 lineage (Fig. 4). The pattern of clustering in the network was relatively complex, with a major “star cluster” and two minor subclusters with distinct geographic distributions, South Altaian and Eastern European. The first of these is represented by R1a1 Y-chromosomes found solely in Altaians-Kizhi and Teleuts, whereas the second is formed by the Russians and Altaians. The coalescence time of the South Altaian subcluster was estimated as 4,940±1,894 years, whereas the age of the Eastern European subcluster was 9,920±3,151 years.

Conclusion

In this study we analyzed 28 Y-chromosome polymorphic markers in a representative set of ethnic groups of Siberia, Central/Eastern Asia, and Eastern Europe to investigate the genetic variation in the aboriginal South Siberian populations. We observed statistically significant genetic differentiation between the populations of the Baikal and Altai–Sayan regions. These regional differences are best explained by the different contributions of Central/Eastern Asian and Eastern European paternal lineages to the gene pools of modern South Siberians. The easternmost population of the Baikal region, the Buryats, together with Kalmyks and Mongolians, shows the highest frequencies of haplogroup C lineages, which may represent the major Central/Eastern Asian contribution to their gene pools. On the other hand, the populations of the Altai and Sayan region are characterized by a considerable impact of Eastern European lineages. However, analysis of Y-STR haplotype diversity within the R1a1 haplogroup demonstrates clear differences between the South Siberian and Eastern European R1a1-lineages, with evolutionary ages compatible with the divergence time between these two regional groups.

It is noteworthy that analysis of mtDNA variability in Siberian populations also revealed regional differences in the distribution of Eastern Asian and Western Eurasian lineages. The highest influx of Western Eurasian mtDNA lineages was observed in populations of the Altai and East Sayan region (up to 34.5% in Altaians), whereas in Buryats and Sojots from the Baikal region it was markedly lower (less than 10%) (Derenko et al. 2003). Such an east-to-west cline in the frequencies of Western Eurasian-specific mtDNA haplotypes in South Siberia was considered concordant with the archaeological and paleoanthropological views on the early presence of Western Eurasians in the South Siberian region (Alekseev 1998).

Thus, the genetic heterogeneity of South Siberian populations revealed by both Y-chromosome and mtDNA variability data is compatible with the hypothesis of more than one prehistoric migration from different sources in Eurasia, as postulated on the basis of paleoanthropological, archaeological and linguistic data. Wider sampling and additional typing of Y-chromosomal, mitochondrial and autosomal markers are likely to shed more light on the origin of the different genetic components in the South Siberians.