Introduction

Short tandem repeats (STRs) are important genetic markers that are widely used in forensic applications because they are highly polymorphic [1, 2]. The STRs in the Combined DNA Index System (CODIS) are commercially available in many multiplex amplification kits (e.g., AmpFℓSTR Identifiler®). Testing with these loci usually meets the requirements of human individual identification and standard trio paternity testing. However, duo paternity testing [3, 4], complex kinship analysis (e.g., full siblings or grandparent–grandchild relationships) [5], and cases with mutations [6] may need more STR loci to obtain reliable identifications. Extra STR loci provide additional genetic information and complement conventional STR analysis [7–10]. Several studies have sought additional polymorphic STR loci that are independent of the current CODIS loci [11, 12]. This study validates a panel of 21 autosomal non-CODIS STR loci for the Northern Han population in China.

Materials and methods

Population and DNA samples

The Han population is a native ethnic group in China and, by most modern definitions, the largest single ethnic group in the world. In this study, 220 healthy unrelated Han volunteers from northern China were sampled. To estimate the mutation rates of the STR loci, 158 two-generation families were collected, comprising 80 father–child–mother trios and 78 mother–child or father–child duos. All samples were genotyped with an AmpFℓSTR Identifiler® multiplex STR kit (Applied Biosystems, Foster City, CA, USA). The relationship in each family was confirmed by a paternity index of at least 10,000 based on the STRs in the Identifiler kit.

DNA extraction and quantification

Genomic DNA was extracted using the Chelex-100 protocol described by Walsh et al. [13]. The quantity of recovered DNA was determined with the Qubit® Quantitation System (Invitrogen, CA, USA) according to the manufacturer's specifications.

DNA amplification

STRs were amplified using the AGCU 21 + 1 multiplex PCR fluorescence amplification reagents (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China), which include Amelogenin and the 21 loci D6S474, D12ATA63, D22S1045, D10S1248, D1S1677, D11S4463, D1S1627, D3S4529, D2S441, D6S1017, D4S2408, D19S433, D17S1301, D1GATA113, D18S853, D20S482, D14S1434, D9S1122, D2S1776, D10S1435 and D5S2500. Multiplex PCR was performed in a total volume of 10.0 μl containing 0.2–1.0 ng genomic DNA, 4 μl Reaction Mix, 2 μl 21 + 1 Primers, 1 U HS-Taq DNA polymerase and ddH2O. PCR was conducted on a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA) with the following protocol: an initial denaturation step at 95°C for 11 min; 10 cycles of 94°C for 1 min, 62°C for 1 min and 72°C for 1 min; 20 cycles of 90°C for 1 min, 60°C for 1 min and 72°C for 1 min; and a terminal extension step at 60°C for 60 min. The experiments were conducted in accordance with quality control measures. Cell line 9947A (Promega, Madison, WI, USA) was used as the positive standard reference material [14] and ddH2O was used as the negative control.

Genotyping

The PCR products were separated by capillary electrophoresis on ABI PRISM 3130 Genetic Analyzers (Applied Biosystems, Foster City, CA, USA). Raw data were analyzed using GenemapperID 3.2 software (Applied Biosystems, Foster City, CA, USA), and alleles were determined by comparison with the allelic ladder (AGCU ScienTech Incorporation, Wuxi, Jiangsu, China). When off-ladder peaks, triallelic patterns [15, 16] or mutations between children and parents were encountered, the samples were typed again to confirm the genotypes. Off-ladder alleles were determined by the method described by Gill et al. [17].

Statistical analysis

Hardy–Weinberg equilibrium (HWE) and expected heterozygosity (He) of each locus, as well as linkage disequilibrium (LD) between each pair of loci, were evaluated with the Genepop Version 4.0.10 software package (http://genepop.curtin.edu.au). Polymorphism information content (PIC), observed heterozygosity (Ho), discrimination power (DP) and probability of paternity exclusion (PE) for the Han population were calculated with the PowerStats program (http://www.promega.com/geneticidtools/). Further, the Han population data were compared with data from a Tibetan population [18] using the shuffling test described in Ref. [19].
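As an illustration of the per-locus quantities named above, the following minimal sketch computes He, Ho, PIC and DP from a list of genotypes using their common textbook definitions. It is an assumption that these definitions match what Genepop and PowerStats compute internally (for example, no small-sample correction is applied to He here); the function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Return (He, Ho, PIC, DP) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Textbook definitions only; not taken from Genepop or PowerStats source.
    """
    n = len(genotypes)

    # Allele frequencies by direct gene counting (2 alleles per individual).
    allele_counts = Counter(a for g in genotypes for a in g)
    p = [c / (2 * n) for c in allele_counts.values()]
    sum_p2 = sum(x ** 2 for x in p)
    sum_p4 = sum(x ** 4 for x in p)

    # Expected heterozygosity: He = 1 - sum(p_i^2).
    he = 1 - sum_p2

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2,
    # where sum_{i<j} 2 p_i^2 p_j^2 = (sum p_i^2)^2 - sum p_i^4.
    pic = 1 - sum_p2 - (sum_p2 ** 2 - sum_p4)

    # Observed heterozygosity: fraction of heterozygous individuals.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # Discrimination power: 1 - sum of squared observed genotype frequencies.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    dp = 1 - sum((c / n) ** 2 for c in geno_counts.values())

    return he, ho, pic, dp


# Hypothetical toy data: four individuals typed at one locus.
toy = [(12, 13), (12, 12), (13, 14), (12, 14)]
he, ho, pic, dp = locus_statistics(toy)
```

Computing DP from observed genotype frequencies rather than from allele frequencies avoids assuming HWE for that statistic.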

Quality control

All laboratory procedures are accredited according to ISO 17025. Furthermore, laboratory internal control standards were employed according to the recommendations published by the Paternity Testing Commission of the International Society for Forensic Genetics [20].

Results and discussion

The genomic mapping information of the 21 non-CODIS loci shown in Table 1 is based on UCSC genome data (http://www.genome.ucsc.edu) and STRbase (http://www.cstl.nist.gov/biotech/strbase). No peaks appeared in the negative control. The genotypes of 9947A matched the standard reference in all experiments and are listed in Table 1. Allele frequencies and forensic statistics of each locus are shown in Table 2. Triallelic patterns were observed at D19S433 (alleles 12, 13 and 14) in a male and at D10S1435 (alleles 12, 13 and 14) in a female. Deviations from Hardy–Weinberg equilibrium were detected only at D22S1045 (p-value = 0.0000) and D14S1434 (p-value = 0.0047). After Bonferroni correction (i.e., a threshold of 0.05/21 = 0.00238), only D22S1045 remained significant in the HWE test.

Table 1 The genomic mapping information of the 21 non-CODIS loci, together with the loci in the Identifiler® kit that are located on the same chromosomes as the non-CODIS loci
Table 2 Allele frequencies and relevant forensic statistics of the 21 non-CODIS STR loci in northern China Han population (n = 220)

Sixteen out of 21 loci had an observed heterozygosity (Ho) greater than 0.7. The highest Ho was 0.836 at D19S433 and the lowest was 0.591 at D1GATA113. Discrimination power (DP) varied between 0.762 at D1S1627 and 0.948 at D19S433. The probability of paternity exclusion (PE) in trios varied between 0.341 at D1S1627 and 0.659 at D22S1045, and the PE in duos varied between 0.189 at D1S1627 and 0.487 at D19S433 and D22S1045. The cumulative DP of these 21 loci is 0.999999999999999999991. The combined probabilities of exclusion in duos and trios are 0.999025 and 0.9999997, respectively. These statistics indicate that the 21 loci have medium to high probabilities of exclusion and discrimination power and can be useful in human identification.
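Cumulative values of this kind are conventionally obtained as 1 − Π(1 − x_i) over the per-locus values x_i, which assumes the loci are independent. The sketch below illustrates that arithmetic; it is an assumption that the cumulative figures above were derived this way, and the function name is hypothetical. Because a cumulative DP this close to 1 cannot be represented in double precision, the sketch returns the complement (the product of the per-locus complements) instead.

```python
from math import prod


def combined_complement(per_locus_values):
    """Return prod(1 - x_i) for per-locus DP or PE values x_i.

    The combined DP or PE is 1 minus this value, under the assumption
    that the loci are mutually independent. Returning the complement
    keeps precision when the combined value is extremely close to 1.
    """
    return prod(1 - x for x in per_locus_values)


# Illustration with only the two extreme per-locus DP values quoted in the
# text (0.762 at D1S1627 and 0.948 at D19S433), not the full 21-locus panel.
complement = combined_complement([0.762, 0.948])
combined_dp_two_loci = 1 - complement
```

With the full set of 21 per-locus values from Table 2, the same function would give the complement of the cumulative DP directly.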

Thirteen pairs of loci showed significant linkage disequilibrium (LD) among the 210 pairwise comparisons (Table 3). After Bonferroni correction (i.e., a threshold of 0.05/210 = 0.000238), only two pairs, D2S441 and D2S1776 (P = 0.0000) and D12ATA63 and D19S433 (P = 0.0000), remained significant in the LD test.
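The number of pairwise comparisons and the Bonferroni thresholds used for the HWE and LD tests follow from simple arithmetic, sketched below. The helper names are hypothetical; the two HWE p-values are those reported in the text, and the remaining loci are omitted because their p-values are not given here.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, n_tests, alpha=0.05):
    """Return the names whose p-value is below alpha / n_tests."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [name for name, p in p_values.items() if p < threshold]


N_LOCI = 21
N_PAIRS = comb(N_LOCI, 2)  # number of unordered locus pairs: 210

hwe_threshold = bonferroni_threshold(0.05, N_LOCI)   # 0.05 / 21
ld_threshold = bonferroni_threshold(0.05, N_PAIRS)   # 0.05 / 210

# HWE p-values reported in the text for the two loci that deviated.
hwe_p = {"D22S1045": 0.0000, "D14S1434": 0.0047}
still_significant = significant_after_bonferroni(hwe_p, N_LOCI)
```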

Table 3 The significant linkage disequilibrium values of 13 pairs of loci after pairwise comparison

Table 1 also shows that some loci in the Identifiler® kit (TPOX, D2S1338, D3S1358, FGA, D5S818, CSF1PO, TH01, vWA and D18S51) are on the same chromosomes as some of the 21 non-CODIS loci. However, almost all physical distances between STR loci on the same chromosome are close to or more than 50 Mb, except for D3S1459 and D3S1358 (40 Mb). Thus, all 21 non-CODIS loci together with the loci in the Identifiler® kit may be treated as independent, although a further genetic linkage study of the recombination fraction between D3S1459 and D3S1358 may be required.

In addition, we compared the Northern Han population in China with the Chinese Tibetan population in Lhasa [18] at these 21 non-CODIS loci. Table 4 shows the p-values of the shuffling tests [19] for population differentiation. Eleven out of 21 markers have p-values less than 0.05. Even after Bonferroni correction (0.05/21 = 0.00238), eight markers remain significant. Thus, the Northern Han and Tibetan populations differ significantly at a substantial proportion of the tested markers.
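The exact procedure of Ref. [19] is not reproduced here. As a generic illustration of a shuffling (permutation) test for population differentiation at one locus, the sketch below pools the individuals of both populations, repeatedly reshuffles the population labels, and reports how often the shuffled allele-frequency difference is at least as large as the observed one. The test statistic (sum of squared allele-frequency differences), the function names and the default seed are all assumptions made for this example.

```python
import random
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies by gene counting from (a, b) genotype tuples."""
    counts = Counter(a for g in genotypes for a in g)
    total = 2 * len(genotypes)
    return {a: c / total for a, c in counts.items()}


def freq_distance(pop_a, pop_b):
    """Sum of squared allele-frequency differences (assumed statistic)."""
    fa, fb = allele_freqs(pop_a), allele_freqs(pop_b)
    alleles = set(fa) | set(fb)
    return sum((fa.get(a, 0.0) - fb.get(a, 0.0)) ** 2 for a in alleles)


def shuffling_test(pop_a, pop_b, n_shuffles=10000, seed=0):
    """Permutation p-value for differentiation between two samples."""
    rng = random.Random(seed)
    observed = freq_distance(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    n_a = len(pop_a)
    hits = 0
    for _ in range(n_shuffles):
        # Reassign individuals to the two populations at random.
        rng.shuffle(pooled)
        if freq_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

Shuffling whole individuals rather than single alleles keeps each genotype intact, so the test does not rely on HWE within the pooled sample.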

Table 4 Shuffling test for Han and Tibet populations (10,000 shuffles). The method is described in Ref. [19]

In some complex kinship analysis cases (e.g., distant relatives or cases with mutations [21]), current commercial kits with CODIS core STRs may not provide a likelihood ratio or probability of paternity high enough for reliable identification. More STRs can provide extra information to raise the confidence of identifications. However, increasing the number of STR loci typed also increases the probability of encountering a mutation, so markers with low mutation rates are preferable. In the family genotype data, mutations were detected in four cases at four different loci (i.e., D22ATA63, D10S1248, D19S433 and D14S1434). All were one-step mutations (Table 5). With a total of 238 meioses, the estimated mutation rate at each of these four loci is 0.0042 (one mutation in 238 meioses), with a 95% confidence interval of [0.0001, 0.0232]. More pedigree samples will be tested to obtain more precise mutation rates.
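The 238 meioses correspond to two per trio and one per duo (80 × 2 + 78), and one mutation in 238 meioses gives the point estimate 1/238 ≈ 0.0042. The reported interval is consistent with an exact binomial (Clopper–Pearson) interval, although the text does not state which method was used; that choice is an assumption of the sketch below, which solves for the bounds by bisection using only the standard library. Function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, tol=1e-12):
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n."""
    # Lower bound: P(X >= x | p) = alpha / 2 (zero when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2 (one when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


meioses = 80 * 2 + 78          # trios contribute two meioses, duos one
rate = 1 / meioses             # one mutation observed at a given locus
lower, upper = clopper_pearson(1, meioses)
```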

Table 5 Mutations detected from the pedigree analysis