Introduction

Mitochondrial DNA (mtDNA), located outside the nucleus, is a crucial part of the human genome. Human mtDNA is a 16,569 bp closed circular double-stranded DNA molecule that encodes genes essential for proper cellular function [1]. In forensic studies, mtDNA exhibits several characteristics that distinguish it from nuclear DNA, such as maternal inheritance, lack of recombination, a rapid mutation rate, and high polymorphism [2,3,4,5,6,7]. Furthermore, many reports have confirmed that mtDNA sequence variations are strongly correlated with human genetic evolution and migration [6, 8]. In addition, mtDNA polymorphism analysis is suitable for highly degraded biological materials [9, 10]. Hence, the genetic analysis of mtDNA in populations from different regions can be used for maternal lineage studies. Moreover, mtDNA is a helpful genetic marker for geographic ancestry inference and anthropological research.

The Kazak group is one of the 56 ethnic groups in China and has its own language, culture and religion. According to the 6th population census in China (2010), the Kazak ethnic group has a population of approximately 1.5 million, and most Kazak individuals reside in Xinjiang Uygur Autonomous Region and in Aksai Kazakh Autonomous County, Gansu. Over a long period, the genetic structure of the Kazak group has changed greatly owing to gene flow between the Kazak and neighboring populations. In recent years, a growing number of studies have focused on the Kazak group to infer its migratory route and origin [11,12,13]. In this study, we analyzed the mtDNA genetic polymorphisms of the Chinese Xinjiang Kazak group using a novel mtDNA panel, revealed the genetic relationships between the Kazak group and reference populations, and provided useful information for studying the maternal lineage and migration history of the Chinese Kazak group.

Materials and methods

Ethical statement

This study was approved by the Ethical Committees of Southern Medical University and Xi’an Jiaotong University, China. All volunteers gave written informed consent before inclusion. Sample collection and subsequent analysis were conducted in accordance with the human and ethical research principles of Southern Medical University and Xi’an Jiaotong University, China.

Samples

Blood samples were collected from 141 unrelated healthy volunteers of the Kazak ethnic group in Ili, Xinjiang Uygur Autonomous Region. Written informed consent was obtained from each participant, and no migration events had occurred in the participants’ family histories for at least three generations. All blood samples were collected following the standard procedure.

MtDNA extraction, amplification and genotyping

MtDNA was extracted according to a previously described protocol [14]. Multiplex PCR amplification of 60 mtDNA loci (nt10398, 10873, 3010, 709, 7196, 12705, 3970, 13104, 10310, 5178, 13928, 6446, 8414, 8793, 8794, 15043, 16311, 16126, 16129, 8701, 8697, 4883, 10400, CA, 9-bp, 1719, 14668, 12811, 9824, 9123, 7028, 11719, 8584, 11251, 8020, 5460, 2706, 11215, 4216, 12372, 16362, 9698, 1541, 8684, 9477, 4491, 1811, 16316, 16319, 9545, 152, 14569, 8964, 10397, 3348, 4833, 7600, 5417, 5442 and 15784) was conducted using the Expressmarker mtDNA-SNP 60 reagent (AGCU ScienTech Incorporation, Wuxi, Jiangsu). Briefly, the 25 µl PCR reaction contained 1 µl genomic DNA, 10 µl reaction mix, 5 µl primer set, 5 µl Taq DNA polymerase and 4 µl sdH2O. The cycling parameters were set according to the manufacturer’s instructions. A 1 µl aliquot of PCR product was combined with 0.5 µl Marker SIZ-500 and 12 µl Hi-Di formamide. Capillary electrophoresis was performed on the ABI Prism 3130XL Genetic Analyzer, and sample profiles were analyzed with GeneMapper ID software v3.2.1 (Applied Biosystems, USA). Male 9948 and Female 9947A DNA samples were used as positive controls.

Data analysis

The genotyping results of the Kazak group were aligned with the revised Cambridge Reference Sequence (rCRS) [15] for subsequent statistical analysis. Haplogroups were assigned according to van Oven and Kayser (http://www.phylotree.org; [16]). Forensic statistical parameters (haplotype diversity, nucleotide diversity, number of polymorphic loci and so on) for the Kazak group were calculated with DnaSP software version 5.0. The random match probability (RMP), the probability that two individuals from a population have the same haplotype, was calculated as $\mathrm{RMP} = \sum_i X_i^2$, where $X_i$ is the frequency of the i-th mtDNA haplotype. The discrimination power (DP), the probability that two unrelated individuals sampled at random from a population have different haplotypes, is the complement of RMP ($\mathrm{DP} = 1 - \mathrm{RMP}$). Haplotype diversity measures the uniqueness of haplotypes in a population. Nucleotide diversity (π) is the average number of nucleotide differences per locus between two DNA sequences selected randomly from a given population [17, 18].
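The parameters defined above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: the function name `forensic_parameters` and the toy haplotype labels are hypothetical, and haplotype diversity is computed with Nei's unbiased estimator, h = n/(n − 1) × (1 − ΣXi²), which is assumed here rather than stated in the text.

```python
from collections import Counter


def forensic_parameters(haplotypes):
    """Return (RMP, DP, haplotype diversity) for a list of haplotype labels.

    Hypothetical helper for illustration only.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    # RMP: sum of squared haplotype frequencies X_i
    rmp = sum((c / n) ** 2 for c in counts.values())
    # DP: probability that two random individuals carry different haplotypes
    dp = 1.0 - rmp
    # Haplotype diversity: Nei's unbiased estimator (an assumption, see lead-in)
    h = n / (n - 1) * (1.0 - rmp)
    return rmp, dp, h


# Toy data: four individuals carrying three distinct haplotypes
sample = ["Hap1", "Hap1", "Hap2", "Hap3"]
rmp, dp, h = forensic_parameters(sample)
print(f"RMP={rmp:.4f}  DP={dp:.4f}  h={h:.4f}")
```

With real data, each element of the input list would be the concatenated genotype of one individual across the typed loci.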

Furthermore, Arlequin software version 3.0 was used to estimate pairwise Fst values between the Xinjiang Kazak group and other previously reported groups [12, 13, 19,20,21,22,23,24]. In addition, a phylogenetic tree was reconstructed with MEGA software version 4.0 from the pairwise genetic distances between populations, in order to infer the genetic background of the Kazak ethnic group and to evaluate the genetic relationships between the Kazak and the other reference groups.

Results

Forensic parameter analysis

Allele frequencies of the 60 mtDNA loci detected in the Chinese Kazak ethnic group are listed in Table 1. Among the 58 selected mtDNA SNP loci, the most common polymorphism was single nucleotide transition (87.93%), followed by single nucleotide transversion (nt5178, nt7196 and nt13928 variants; 5.17%). At the nt9824 locus, both transition and transversion were observed (A/T/C). No polymorphisms were detected at loci nt3348, nt8697 and nt8793.
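The distinction between transitions and transversions used above can be sketched as follows. This is an illustrative Python helper (the name `classify_substitution` is hypothetical), not part of the genotyping workflow: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and a transversion exchanges a purine for a pyrimidine. The reference allele T used for the nt9824 example is an assumption made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-nucleotide substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) -> transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# nt9824 carries three alleles (A/T/C); assuming T as the reference allele,
# both substitution types occur at this one locus.
print(classify_substitution("T", "C"))  # pyrimidine <-> pyrimidine
print(classify_substitution("T", "A"))  # pyrimidine <-> purine

# Share of transversion-only loci reported in the text: 3 of 58 SNP loci
print(round(100 * 3 / 58, 2))
```

The reported 87.93% of transition loci would correspond to 51 of 58 loci; that count is inferred from the percentage and is not stated explicitly in the text.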

Table 1 Allele frequencies of 60 mtDNA loci in Chinese Xinjiang Kazak ethnic group

As in the previously reported data for the Xinjiang Xibe ethnic group [19], both transition and transversion were observed at the nt9824 locus. In addition, single nucleotide transversions at the nt5178, nt7196 and nt13928 loci were detected in both groups, whereas no polymorphisms were found at the nt8697 and nt8793 loci in the Kazak group or at the nt4491, nt6446, nt8684 and nt13104 loci in the Xinjiang Xibe group. These results demonstrate that the allele frequencies of some mtDNA loci differ among populations. Therefore, more population data should be collected to verify the efficiency of this novel mtDNA panel in the forensic field.

With the (CA)n locus excluded, the mtDNA haplogroups of all 141 individuals in the Xinjiang Kazak group are presented in Fig. 1. Table 2 and Fig. 2 show the haplogroups and haplotypes based on polymorphisms of the 60 mtDNA loci in the Xinjiang Kazak ethnic group. Fifty-seven polymorphic loci (excluding the nt3348, nt8697 and nt8793 loci) defined 25 haplogroups and 79 haplotypes. Haplogroup D4 was the most common haplogroup (21.28%) in the Xinjiang Kazak group, followed by haplogroup H (14.18%). Of the 79 haplotypes, 53 were observed only once, 14 twice, and 12 three times or more; detailed information is shown in Table 3.

Fig. 1 MtDNA haplogroups of 141 individuals recruited from Chinese Xinjiang Kazak ethnic group using online software phylotree

Table 2 The numbers of mtDNA haplogroups and haplotypes based on 60 mtDNA loci in the 141 Xinjiang Kazak individuals

Fig. 2 The pie chart showing the mtDNA haplogroup and haplotype frequencies based on the 60 mtDNA loci in Chinese Xinjiang Kazak group

Table 3 Numbers of times haplotypes were observed and the corresponding numbers of haplotypes based on the 60 mtDNA loci in Chinese Xinjiang Kazak ethnic group

Based on all 60 mtDNA loci, the RMP and DP values in the Xinjiang Kazak group were 0.0270 and 0.9730, respectively. Further forensic statistical parameters for the 58 mtDNA SNP loci, excluding the (CA)n and 9-bp deletion polymorphisms, are presented in Table 4. For these 58 loci, the haplotype diversity was 0.978 ± 0.005, the nucleotide diversity was 0.17449, and the RMP and DP values were 0.0291 and 0.9709, respectively. The DP value of the 58 mtDNA SNP loci was therefore lower than that of the full 60-locus panel, which is consistent with a previous report on other Chinese groups (the Han population and the Uyghur group) [25].
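As a rough consistency check on the reported values, the short Python sketch below recomputes DP and haplotype diversity from the reported RMP values and the sample size of 141. It assumes that haplotype diversity follows Nei's unbiased estimator, h = n/(n − 1) × (1 − RMP); this formula is an assumption and is not stated in the text. The variable names are illustrative.

```python
n = 141          # number of sampled individuals (from the text)
rmp_60 = 0.0270  # reported RMP, all 60 loci
rmp_58 = 0.0291  # reported RMP, 58 SNP loci

# DP is the complement of RMP
dp_60 = 1 - rmp_60
dp_58 = 1 - rmp_58

# Haplotype diversity under the assumed Nei estimator
h_58 = n / (n - 1) * (1 - rmp_58)

print(round(dp_60, 4), round(dp_58, 4), round(h_58, 3))
```

Under this assumption, the recomputed figures agree with the reported DP values (0.9730 and 0.9709) and with the reported haplotype diversity of 0.978 for the 58 SNP loci.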

Table 4 Forensic statistical parameters of 58 mtDNA SNP loci excluding (CA)n and 9-bp deletion polymorphisms in Chinese Xinjiang Kazak ethnic group

Interpopulation differentiation and phylogenetic analysis

Table 5 shows the pairwise Fst and p values between the Chinese Xinjiang Kazak group and other previously reported groups [12, 13, 19,20,21,22,23,24]. Values below the diagonal are Fst values, and values above the diagonal are p values. Statistically significant values are shown in bold. The Xinjiang Kazak ethnic group showed the smallest genetic differentiation from the Xinjiang Uzbek group (Fst = 0.00808, p > 0.05), followed by the Xinjiang Han population (Fst = 0.00828, p > 0.05) and the Xinjiang Uygur ethnic group (Fst = 0.00935, p > 0.05). In contrast, the Italian population showed the largest genetic differentiation from the Kazak ethnic group (Fst = 0.20159, p < 0.05).

Table 5 Fst and p values between Kazak and other previously reported groups based on 60 mtDNA loci (the data below the diagonal were Fst values and above the diagonal were p values)

As shown in Fig. 3, a phylogenetic tree was constructed from the DA distances between the Xinjiang Kazak and other groups to further demonstrate the genetic relationships among these populations. Consistent with the Fst results, the Italian, African American, Estonian and Caucasian groups formed one cluster, while the remaining groups formed another. The Xinjiang Kazak group had closer genetic relationships with the Xinjiang Uygur and Uzbek groups, and the three groups shared a sub-branch of the phylogenetic tree. Furthermore, the Xinjiang Kazak group showed relatively small genetic distances from the Xinjiang Han, Xinjiang Xibe and Altaian Kazak groups, which suggests that these groups may also be closely related.

Fig. 3 A phylogenetic tree reconstructed by MEGA software version 4.0 based on 58 mtDNA loci excluding (CA)n and 9-bp deletion polymorphisms, revealing the genetic relationships between Chinese Xinjiang Kazak group (labeled in red) and other previously reported groups

Discussion

The genetic polymorphism analysis of mtDNA plays an essential role in population genetic studies. The analysis of hypervariable regions in human mtDNA has been widely used in forensic applications in recent years. Owing to its maternal inheritance, high polymorphism and small amplicon size, mtDNA can be used for the analysis of highly degraded biological materials, maternal ancestry inference and anthropological studies. Hence, mtDNA can be a powerful genetic marker in forensic applications.

Highly polymorphic mtDNA loci can reveal important genetic features of the studied population. Single nucleotide transitions and transversions are the most common polymorphisms. (CA)n is a length polymorphism in which n represents the number of CA dinucleotide repeats (from nt00514 to nt00524 in the rCRS). (CA)n is strongly correlated with geographic origin and can be applied to individual identification because of its relatively high DP [25]. The 9-bp deletion is the deletion of the CCCCCTCTA sequence, and its presence or absence across geographic regions is related to human migration [26,27,28]. Combining the (CA)n and 9-bp deletion polymorphisms with the SNP loci could improve the efficiency of mtDNA as a genetic marker in forensic applications.

In the present study, the pairwise Fst values and the phylogenetic tree both indicated that the Xinjiang Kazak ethnic group may be closely related to the Xinjiang Uygur, Xinjiang Uzbek and Xinjiang Han populations. The Altaian Kazak and Xinjiang Xibe groups also showed relatively small genetic distances from the Xinjiang Kazak group, which indicates closer genetic relationships among these groups; this is supported by historical records as well. The origin of the Kazak group in Chinese history can be traced back to the Western Han dynasty, when the inhabitants of the Ili River valley and Issyk Kul were regarded as the forefathers of the Kazaks [29,30,31,32]. In addition, the ‘Silk Road’ accelerated cultural and genetic exchange between the Kazak group and other populations [12, 33]. Furthermore, from 1932 to 1933, a large number of foreign Kazaks migrated to China because of severe famine [34]. Given these historical events, gene flow between the different populations was inevitable. Modern records indicate that, after long-term residence alongside the Uygurs and other ethnic groups in Xinjiang, the Kazaks broadly assimilated their culture, language and customs in Northwest China [35, 36]. This history is in line with the genetic analysis results, in which the Xinjiang Kazak ethnic group had closer genetic relationships with the Xinjiang Uygur, Xinjiang Uzbek and Xinjiang Han populations.

Conclusion

Genetic polymorphisms of the 60 mtDNA loci were investigated to evaluate their efficiency as a supplementary tool for individual identification and matrilineal parentage testing in the Chinese Xinjiang Kazak group. Among these loci, single nucleotide transition was the most common polymorphism (87.93%), followed by single nucleotide transversion (5.17%). Both transition and transversion were observed at the nt9824 locus, while three loci (nt3348, nt8697 and nt8793) showed no polymorphisms. There were 25 haplogroups and 79 haplotypes in the studied Kazak group. Haplogroup D4 was the most common haplogroup (21.28%). Of the 79 haplotypes, 53 were observed only once, 14 twice, and 12 three times or more. The haplotype diversity was 0.978 ± 0.005, and the nucleotide diversity was 0.17449. The (CA)n and 9-bp deletion polymorphisms could improve the DP of the mtDNA haplotypes. Finally, the genetic background of the Chinese Xinjiang Kazak group and its genetic relationships with other reference groups were explored through phylogenetic analysis. The results indicated that the Xinjiang Kazak ethnic group had closer genetic relationships with the Xinjiang Uygur, Xinjiang Uzbek and Xinjiang Han populations. However, to further reveal the genetic background of the Chinese Kazak group, more reference populations and genetic markers will be collected and studied in future work.