Introduction

Short tandem repeat (STR) loci are widespread throughout the human genome and are a rich source of highly polymorphic genetic markers [1, 2]. STRs show sufficient variability among individuals and have become important analytical tools in several fields of study (e.g., genetic mapping, linkage analysis, human identification, and paternity testing) [3, 4]. With their short PCR fragments, simple and efficient typing, and capacity for multiplex amplification, STR loci are considered superior to other genetic markers. STR typing is also suitable for analyzing degraded DNA and minute amounts of human DNA. Thus, STR technology has been widely used in forensic casework. Moreover, STR loci are employed in population genetics, as population data on STR loci are very useful for inferring genetic relationships among populations and for investigating the genetic diversity of different populations [5, 6].

In recent years, a growing number of new STR loci have been discovered, and these loci have been increasingly used in research and other applications. The aim of this study was to create a data set of new STR loci representing the Chinese Tujia ethnic group. We studied the allelic diversity and forensic statistical parameters of 21 new autosomal STR loci (D6S474, D12ATA63, D22S1045, D10S1248, D1S1677, D11S4463, D1S1627, D3S4529, D2S441, D6S1017, D4S2408, D17S1301, D1GATA113, D18S853, D20S482, D14S1434, D9S1122, D2S1776, D10S1435, D19S433 and D5S2500), and provide novel data on these 21 STR loci for 107 volunteers of the Tujia ethnic group from Hubei Province, China. The present study provides basic and valuable data on 21 new STR loci for population genetics, human identification and paternity testing in forensic science.

Materials and methods

Sample collection and DNA extraction

Bloodstain samples were obtained from 107 unrelated healthy individuals of the Tujia group in Enshi Tujia and Miao Autonomous Prefecture in southern Hubei Province. Human genomic DNA was extracted using the Chelex-100 protocol described by Walsh et al. [7]. All individuals provided written informed consent for the collection of samples and subsequent analysis, and the investigation was conducted in accordance with the human and ethical research principles of Xi’an Jiaotong University, China. This study was approved by the Ethics Committee of Xi’an Jiaotong University, China.

PCR amplification and DNA typing

The 21 autosomal STR loci were amplified using the AGCU 21+1 STR Fluorescence Assay Kit (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China) following the manufacturer’s instructions, in 25 μl reactions containing 0.5–2 ng genomic DNA, 10 μl reaction mix, 5 μl 21+1 primers, 0.5 μl HS-Taq DNA polymerase and ddH2O. Thermal cycling was performed on a GeneAmp PCR System 9700 (Applied Biosystems, Foster City, CA, USA).

PCR products were separated by capillary electrophoresis on an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). For each run, 1 μl of PCR product or 21+1 Allelic Ladder was mixed with 12 μl Hi-Di formamide and 0.5 μl AGCU Marker SIZ-500 internal lane standard (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China). The mixture was denatured at 95 °C for 3 min, followed by immediate chilling on ice for 3 min. Genotyping was performed by comparison with the allelic ladders using GeneMapper ID 3.2 software. The alleles of all 21 STR loci were named according to the number of repeat units present, as recommended by the DNA Commission of the International Society for Forensic Genetics [8]. In addition, the control DNA (9947A) (Promega, Madison, WI, USA) included in the kit was used for quality control. All experimental steps were carried out according to the laboratory's internal control standards and the kit controls.

Statistical and phylogenetic analysis

Hardy–Weinberg equilibrium expectations at all STR loci were evaluated with an exact test using the modified PowerStat version 1.2 spreadsheet (Promega, Madison, WI, USA) [9]. The polymorphic information content (PIC), power of discrimination (PD), power of exclusion (PE), observed heterozygosity (HO) and expected heterozygosity (HE) were calculated using Excel 2003 and the PowerStat version 1.2 spreadsheet (Promega, Madison, WI, USA) [10]. Locus-by-locus allelic frequencies were compared between the Tujia group and previously published populations using analysis of molecular variance (AMOVA, based on F-statistics), performed with ARLEQUIN version 3.1 software [11]. The p-values for linkage disequilibrium (LD) between all pairs of loci in our study were estimated with Genepop version 4.0.10 (http://genepop.curtin.edu.au/).
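The per-locus parameters named above can be computed directly from genotype data. The following Python sketch uses the commonly cited definitions (HE as one minus the sum of squared allele frequencies, the usual PIC formula, PD as one minus the sum of squared genotype frequencies, and PE from the observed heterozygote proportion). Whether the PowerStat spreadsheet applies exactly these formulas, or additional small-sample corrections, is an assumption here; the function names and toy genotypes are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def power_of_exclusion(h):
    """PE from the observed heterozygote proportion h (homozygotes: 1 - h)."""
    return h ** 2 * (1 - 2 * h * (1 - h) ** 2)


def forensic_parameters(genotypes):
    """Per-locus statistics from a list of (allele, allele) genotype tuples."""
    n = len(genotypes)
    # Allele frequencies: each individual contributes two alleles.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    # Genotype frequencies, treating (a, b) and (b, a) as the same genotype.
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)

    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1 - sum(p ** 2 for p in freqs)
    pic = he - sum(2 * p ** 2 * q ** 2 for p, q in combinations(freqs, 2))
    pd = 1 - sum((count / n) ** 2 for count in genotype_counts.values())
    return {"HO": ho, "HE": he, "PIC": pic, "PD": pd,
            "PE": power_of_exclusion(ho)}


# Hypothetical toy genotypes (repeat numbers), not data from this study.
toy = [(10, 11), (10, 10), (11, 12), (10, 12)]
print(forensic_parameters(toy))
```

As a plausibility check under one further assumption, 61 and 88 heterozygotes among 107 individuals (counts inferred from the rounded HO values of 0.570 and 0.822, not stated in the text) give PE values of 0.257 and 0.641 with this formula, which agrees with the PE range reported in the Results.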

Results and discussion

The Tujia, with a total population of over 8 million, are the 6th largest ethnic minority in the People’s Republic of China. They live in the Wuling Mountains, straddling the common borders of Hunan, Hubei and Guizhou Provinces and Chongqing Municipality. The Tujia language belongs to the Tibeto-Burman family and is usually considered an isolate within this group. Although there are different accounts of their origins, the Tujia may trace their history back over twelve centuries, and possibly beyond, to the ancient Ba people who occupied the area around modern-day Chongqing some 2,500 years ago. Today, traditional Tujia customs can only be found in the most remote areas [12].

Allelic frequencies and forensic statistical parameters of the 21 STR loci are listed in Table 1. In the Tujia ethnic group from Hubei Province, China, 155 alleles were observed, with corresponding allelic frequencies ranging from 0.005 to 0.589. No deviations from Hardy–Weinberg equilibrium were observed at any of the 21 loci. HO ranged from 0.570 (D1S1627) to 0.822 (D22S1045), HE from 0.579 (D9S1122) to 0.824 (D19S433), PIC from 0.525 (D1S1627) to 0.802 (D19S433), PE from 0.257 (D1S1627) to 0.641 (D22S1045), and PD from 0.773 (D1S1627) to 0.945 (D19S433). The combined probability of exclusion, combined power of discrimination, and combined probability of matching for all 21 STR loci were 0.9999977307, 0.999999999999999999889026 and 1.10974 × 10⁻¹⁹, respectively.
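Combined values of this kind are conventionally obtained with the product rule, which treats the loci as independent. The Python sketch below shows that calculation under this assumption; the function names and the three example PD values are hypothetical, and the text does not state exactly how the combined values were computed. Decimal arithmetic is used because ordinary floating-point numbers cannot resolve a difference from 1 as small as 10⁻¹⁹.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough digits to resolve values such as 1 - 1e-19


def combined_match_probability(per_locus_pd):
    """Product of (1 - PD_i) over loci, assuming independent loci."""
    product = Decimal(1)
    for pd in per_locus_pd:
        product *= Decimal(1) - Decimal(str(pd))
    return product


def combined_power(per_locus_values):
    """1 - prod(1 - x_i); applies to per-locus PD or PE values alike."""
    return Decimal(1) - combined_match_probability(per_locus_values)


# Hypothetical per-locus PD values, for illustration only.
example_pd = [0.773, 0.900, 0.945]
print(combined_match_probability(example_pd))
print(combined_power(example_pd))

# Consistency check on the two combined values reported in the text:
# combined PD and combined matching probability should sum to exactly 1.
reported_cpd = Decimal("0.999999999999999999889026")
reported_cmp = Decimal("1.10974E-19")
print(reported_cpd + reported_cmp == Decimal(1))
```

The last line confirms that the reported combined power of discrimination and combined probability of matching are complements of each other.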

Table 1 Allelic frequencies and forensic statistical parameters of 21 STR loci from Tujia ethnic group in Hubei Province, China (n = 107)

Linkage disequilibrium (LD) refers to the non-random association of alleles at different loci. If loci on the same human chromosome are closely linked, LD has to be evaluated before their practical application. Some articles have reported that LD between markers separated by more than 5 cM (genetic distance) or 5 Mb (physical distance) is unlikely [13, 14]. The selected loci on the same chromosome in our study are at least 50 Mb apart from each other, which means that the 21 STR loci are not physically linked. However, LD can occur between unlinked loci because of population substructure, natural selection, mutation, random genetic drift, founder effects, etc. Hence, we tested for linkage disequilibrium between all pairs of loci in our study. In 210 pairwise comparisons, 20 pairs were found with p values below 0.05 (D10S1248/D10S1435, p = 0.0001; D10S1248/D11S4463, p = 0.0014; D10S1248/D2S441, p = 0.0156; D10S1248/D9S1122, p = 0.0292; D12ATA63/D11S4463, p < 0.0001; D12ATA63/D1GATA113, p = 0.0044; D12ATA63/D4S2408, p = 0.0202; D12ATA63/D5S2500, p = 0.0050; D18S853/D19S433, p = 0.0001; D18S853/D2S441, p = 0.0295; D18S853/D4S2408, p = 0.0291; D18S853/D17S1301, p = 0.0102; D1GATA113/D1S1677, p < 0.0001; D2S441/D2S1776, p < 0.0001; D3S4529/D1GATA113, p = 0.0052; D4S2408/D3S4529, p = 0.003; D5S2500/D6S1017, p < 0.0001; D6S474/D9S1122, p = 0.0009; D6S474/D3S4529, p = 0.0151; D17S1301/D14S1434, p < 0.0001). If more than three pairs in LD involve a single locus, this locus may be removed from the panel (e.g., if A and B, A and C, and A and D are in LD, removing A can improve the panel). As shown above, the three STR loci D10S1248, D12ATA63, and D18S853 are each involved in four pairs in LD. In the population from the Tibetan ethnic minority group residing in the Lhasa region, Tibet Autonomous Region of China, three pairs in LD involve the single locus D18S853 [15]. In individuals from the Salar ethnic group in Xunhua Salar Autonomous County, Qinghai Province, China, the D10S1248, D12ATA63, D6S1017, and D14S1434 loci were in linkage disequilibrium with 4, 6, 4 and 8 STR loci, respectively [16]. The LD results indicate that 5 STR loci (i.e. D10S1248, D12ATA63, D6S1017, D14S1434 and D6S1017) have limited value for forensic applications in the Salar group. These results show that the loci with limited value differ among the above populations, and that such LD may be mainly caused by population substructure.
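The counting behind these statements can be reproduced with a short tally. The Python sketch below takes the 20 locus pairs listed above, confirms that 21 loci yield 210 pairwise comparisons, and flags loci involved in more than three pairs; the variable names are illustrative, and the threshold simply follows the rule of thumb stated in the text.

```python
from collections import Counter
from math import comb

# Locus pairs with p < 0.05 in the LD test, as listed in the text.
ld_pairs = [
    ("D10S1248", "D10S1435"), ("D10S1248", "D11S4463"),
    ("D10S1248", "D2S441"), ("D10S1248", "D9S1122"),
    ("D12ATA63", "D11S4463"), ("D12ATA63", "D1GATA113"),
    ("D12ATA63", "D4S2408"), ("D12ATA63", "D5S2500"),
    ("D18S853", "D19S433"), ("D18S853", "D2S441"),
    ("D18S853", "D4S2408"), ("D18S853", "D17S1301"),
    ("D1GATA113", "D1S1677"), ("D2S441", "D2S1776"),
    ("D3S4529", "D1GATA113"), ("D4S2408", "D3S4529"),
    ("D5S2500", "D6S1017"), ("D6S474", "D9S1122"),
    ("D6S474", "D3S4529"), ("D17S1301", "D14S1434"),
]

# Number of unordered locus pairs among 21 loci.
print(comb(21, 2))  # 210

# Count how many significant pairs each locus takes part in.
pairs_per_locus = Counter(locus for pair in ld_pairs for locus in pair)

# Flag loci involved in more than three pairs, per the rule in the text.
flagged = sorted(locus for locus, n in pairs_per_locus.items() if n > 3)
print(flagged)  # ['D10S1248', 'D12ATA63', 'D18S853']
```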

The allelic frequency distributions in the Tujia group were compared with available data for the same set of 21 STR loci in the Tibetan, Salar and Northern Han groups [15–17], and the Fst and p values are listed in Table 2. The AMOVA results showed statistically significant differences between the Tujia group and the Tibetan group at four STR loci, i.e. D12ATA63 (Fst = 0.0285; p = 0.001), D22S1045 (Fst = 0.0250; p < 0.001), D2S441 (Fst = 0.0104; p = 0.0264), and D6S1017 (Fst = 0.0256; p = 0.0020); between the Tujia group and the Salar group also at four STR loci, i.e. D1S1677 (Fst = 0.0137; p = 0.0332), D22S1045 (Fst = 0.0769; p < 0.001), D4S2408 (Fst = 0.0110; p = 0.0381), and D6S474 (Fst = 0.0181; p = 0.0078); and between the Tujia group and the Northern Han group at only one STR locus, i.e. D22S1045 (Fst = 0.0382; p < 0.001). Thus, fewer loci differed between the Tujia and Northern Han groups than between the Tujia and either of the other two groups. D22S1045 was the only locus that differed significantly in all three comparisons, which suggests that this locus shows greater ethnic differentiation than the other loci in the panel.
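To give a sense of what a per-locus Fst value measures, the Python sketch below computes a simple heterozygosity-based estimate for two populations from their allele frequencies. This is a deliberate simplification that assumes equal sample sizes; it is not the AMOVA procedure of ARLEQUIN, which estimates Fst from variance components and derives p values by permutation. The function names and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """1 minus the sum of squared allele frequencies for one population."""
    return 1 - sum(p ** 2 for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """(H_T - H_S) / H_T for two populations weighted equally."""
    alleles = set(freqs_a) | set(freqs_b)
    # Pooled frequencies: unweighted mean of the two populations.
    pooled = {
        a: (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) / 2 for a in alleles
    }
    # H_S: mean within-population heterozygosity.
    h_s = (expected_heterozygosity(freqs_a)
           + expected_heterozygosity(freqs_b)) / 2
    # H_T: heterozygosity of the pooled population.
    h_t = expected_heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus in two populations.
pop_a = {15: 0.5, 16: 0.5}
pop_b = {15: 0.4, 16: 0.6}
print(pairwise_fst(pop_a, pop_b))
```

Identical frequency distributions give a value of 0, and populations fixed for different alleles give a value of 1, which is why small values such as those in Table 2 indicate only modest differentiation even when they are statistically significant.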

Table 2 Comparison of Fst and p values between Chinese Tujia and other groups at the same set of 21 STR loci

We used D19S433 as the overlapping STR marker to compare our data with previously published data from other Chinese ethnic groups. The results showed no significant differences between our studied population and the Chinese Dongxiang [18], Salar [18], Tu [19], Ewenki [20], Yi [21], Hui [6], Russian [22], and Shaanxi Han populations [23]. However, significant differences were found between the Tujia group and the Guangdong Han population [24] and between the Tujia group and the Uygur group [25], with p values of 0.0489 and 0.0076, respectively. We also used D3S4529, D6S474, D2S1776, D10S1435, D12ATA63, D6S1017, D1S1627, D5S2500, D10S1248, D2S441, D22S1045, D1S1677 and D14S1434 as the overlapping STR markers to compare our data with previously published data from other groups [26–29]. Significant differences were found between the Tujia group and Malays at the D6S474 and D12ATA63 loci; between the Tujia and Indians at the D3S4529 and D12ATA63 loci; between the Tujia and each of three groups (Afrikaner, Asian Indian and Mixed Ancestry) at the D1S1627 locus; between the Tujia and individuals from Rio Grande do Sul, Southern Brazil, at the D2S441, D22S1045, D1S1677 and D14S1434 loci; and between the Tujia and individuals from the Maghreb at D2S441 (Table 3).

Table 3 Comparison of Fst and p values between Chinese Tujia group and other groups at the overlapping STR markers

In conclusion, we found a number of novel, highly polymorphic STR markers that are a potential extension of the traditional 15–17 STR loci used in routine forensic applications. The population data for the new set of 21 STR loci are likely to be useful in elucidating the genetic background of the Tujia group, and may also provide valuable data for future population differentiation studies.