Introduction

Polymorphic short tandem repeats (STRs) are presently the most powerful and most widely used genetic markers for individual identification and paternity testing in forensic applications [1]. STR polymorphisms amplified by polymerase chain reaction (PCR) are highly sensitive for typing stains with a minimal amount of DNA or degraded DNA because the amplified DNA fragments are usually shorter than 300 bp [2]. Most STRs have only three to six common alleles [3]. Therefore, a system that types a large number of STR loci in one reaction is required to increase the discrimination power and to save time and sample material.

The Combined DNA Index System (CODIS) database with 13 CODIS core STR loci genotyped is one of the largest DNA databases in the world. A 15-autosomal STRs multiplex kit (AmpFℓSTR Identifiler PCR Amplification Kit, Applied Biosystems, Foster City, CA, USA), including the 13 CODIS loci, is widely used in current forensic casework. However, developing polymorphic STR markers unlinked to the CODIS loci is of growing interest among forensic practitioners. The identification of additional autosomal STRs is warranted so as to increase the number of highly polymorphic markers and improve the distinguishing ability [4]. A large pool of autosomal STRs can provide selected sets of STRs with distinct characteristics for multiplex design for specific usage in particular populations. Applying new STR markers may yield additional information and complement conventional STR analysis [5, 6].

We developed a 14-autosomal STR multiplex system that can be amplified in a single PCR reaction. Herein, we present information on this newly established fluorescent STR loci system and its application to the analysis of a Taiwanese population. The allelic frequency data and the variable repeat sequences of the 14 STR markers are presented, and the forensic parameters of these markers are evaluated.

Materials and methods

This retrospective study was approved by the Institutional Review Board. A total of 572 DNA samples from apparently healthy, unrelated Taiwanese Han subjects (94 males and 478 females) were analyzed. The blood samples and buccal swab samples were obtained from volunteer donors between 1993 and 2007. Standard phenol–chloroform/isoamyl alcohol extraction and the QIAamp blood kit (Qiagen, Hilden, Germany) were used for DNA extraction from peripheral whole blood samples, and the Blood and Tissue Genomic DNA extraction Miniprep system (Viogene, Taipei, Taiwan) was used for DNA extraction from buccal cells.

Thirty parent–child pairs from parentage testing cases with a combined paternity index (CPI) below 1,000 and 32 parent–child pairs with single-step mutations found in AmpFℓSTR Identifiler (Applied Biosystems) loci were recruited for validation of the newly developed system. These parent–child pairs were genotyped using the new 14-autosomal STR loci multiplex system. The CPI obtained using both our new system and the AmpFℓSTR Identifiler was calculated and compared with the CPI obtained using the AmpFℓSTR Identifiler only.

One multiplex PCR for each sample was performed with the newly designed primer sets. All 14 primer pairs (loci D3S1744, D4S2366, D8S1110, D12S1090, D13S765, D14S608, Penta E, D17S1294, D18S536, D18S1270, D20S470, D21S1437, Penta D, and D22S683) were designed using PRIMER3 software (http://frodo.wi.mit.edu). Of these 14 loci, four are novel (D8S1110, D13S765, D17S1294, and D18S536), and ten have been described in previous papers [7–13]. However, we modified all the previously reported primer sets in order to amplify the fragments in a single PCR reaction. The 14 autosomal STRs and amelogenin (AMEL) were typed following the methodology described previously, with minimal modifications [14]. Table 1 lists the primer sequences and dye labels used. Briefly, PCR reactions were performed in a total volume of 10 μl containing 1 ng of genomic DNA, 1× Super-Therm Gold PCR buffer (JMR Holdings, Sevenoaks, UK), 1.5 mM MgCl2, 250 μM of each deoxyribonucleotide triphosphate, primer sets, and 0.5 U of Super-Therm Gold DNA polymerase (JMR Holdings). The amount of each primer set in the multiplex PCR mixture is listed in Table 1. PCR was performed using a GeneAmp 9700 Thermal Cycler (Applied Biosystems, Foster City, CA) in 9600 mode. The cycling program consisted of pre-denaturation at 95°C for 11 min, followed by 32 cycles of denaturation at 94°C for 1 min, annealing at 59°C for 1 min, and extension at 72°C for 1 min, and then a final extension at 60°C for 45 min.

Table 1 Chromosomal location, primer sequences, labels, amount in PCR reactions, and amplification sizes of the 14 STR loci in multiplex

Electrophoresis was performed using an ABI 3100 Genetic Analyzer (Applied Biosystems) in which 1 μl of multiplex PCR product was mixed with 10 μl of Hi-Di formamide and 0.2 μl of the GeneScan-500LIZ internal size standard. Fragment sizes were automatically determined using GeneScan Analysis software (Applied Biosystems). An allelic ladder (containing the same internal size standard) was used to assign genotypes to the samples. Our allelic ladder was a mixture of adequately diluted known DNA samples carrying different alleles at each locus. Genotypes were assigned using either Genotyper or GeneMapper ID software (Applied Biosystems) by comparison with the allelic ladder and the reference DNA control samples 9947A (female; Applied Biosystems), GM9948 (male; Coriell Institute for Medical Research, Camden, NJ), and GM3657 (male; Coriell Institute for Medical Research), as recommended [15].

Because fragment length is a primary determinant of quality, fragments ranging in size from 102 (shortest) to 445 (longest) bases were evaluated, irrespective of dye format, using our allelic ladder and amplified samples. Twenty injections of each of the selected DNA samples, the control DNA samples, and our allelic ladder were conducted using an ABI 3100 Genetic Analyzer (Applied Biosystems) with the 61-cm capillary (50 cm effective length) and POP6 (Applied Biosystems). The sizing precision of the PCR fragments was assessed at each locus using three selected donor DNA samples (A, B, and C) and three control DNA samples (9947A, GM9948, and GM3657), with particular attention to fragments that exhibited the following characteristics: (1) the shortest fragment (102 bases, D21S1437, allele 7) and the longest fragment (455 bases, Penta D, allele 18), (2) alleles that differ in size by one or two bases (D18S1270, alleles 9.1 and 9.2; D22S683, alleles 13.2, 14, and 14.2), and (3) “n” and “n ± 1” alleles. Precision was calculated as the standard deviation (SD) of the estimated size, as described previously [16, 17]. The bins for the GeneMapper software and the category boundaries for the Genotyper were derived from the resulting SD.

DNA sequencing was performed for novel STRs, novel alleles, off-ladder alleles, and loci with limited sequence data. PCR products obtained with unlabelled primers were cloned into Escherichia coli DH5α (Yeastern Biotech, Taipei, Taiwan) using the pGEM-T Easy vector (pGEM-T Easy Vector Systems, Promega, Madison, WI, USA). Sequencing of both strands of the DNA inserts was performed using the Big Dye Terminator Cycle Sequencing kit (Applied Biosystems). The products were detected in the ABI PRISM 3100 Genetic Analyzer electrophoresis system and analyzed with Sequencing Analysis 3.7 software (Applied Biosystems).

Statistical analysis

The power of discrimination [18], the power of exclusion [19], the mean exclusion chance [20], the polymorphic information content [21], the observed heterozygosity, and the expected heterozygosity [22, 23] were calculated, and deviation from the Hardy–Weinberg equilibrium (HWE) was assessed by the exact test [24], using the GENEPOP (version 3.4) software package [25] and Arlequin v3.1 software [26]. Linkage disequilibrium analysis of these loci by the exact test was performed with Arlequin v3.1 software.
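As a rough illustration of how some of these parameters follow from allele frequencies, the sketch below computes the expected heterozygosity, the polymorphic information content, and the power of discrimination for a single locus. It is not the GENEPOP or Arlequin implementation: the function names are hypothetical, the formulas are the commonly used textbook forms, and the power of discrimination is derived from genotype frequencies expected under HWE rather than from observed genotype counts.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - cross


def power_of_discrimination(freqs):
    """PD = 1 - sum of squared genotype frequencies.

    Assumption: genotype frequencies are those expected under HWE
    (p_i^2 for homozygotes, 2 * p_i * p_j for heterozygotes).
    """
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2.0 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygous - heterozygous


# Hypothetical allele frequencies for one locus (must sum to 1).
example_freqs = [0.10, 0.25, 0.30, 0.20, 0.15]
print(expected_heterozygosity(example_freqs))
print(polymorphic_information_content(example_freqs))
print(power_of_discrimination(example_freqs))
```

The example frequencies are invented for illustration and are not taken from Table 3.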

Results and discussion

All samples from the 572 unrelated Taiwanese Han subjects were genotyped in one multiplex reaction to determine allele frequencies. Examples of the DNA profile and chromatogram of the allelic ladder and of an amplified DNA sample are illustrated in supplementary Figure S1 and supplementary Figure S2, respectively. All of the different alleles found for the loci were named based on the variable tandem repeat motifs and classified according to the guidelines of the International Society for Forensic Haemogenetics (ISFH) [27]. Repeats could be divided into simple, compound, and complex repeat sequences according to the ISFH (Table 2). All markers were located on different chromosomes or separated by at least 50 cM from the STRs of the AmpFℓSTR Identifiler. Therefore, these markers may be unlinked to commonly genotyped STRs and can provide additional STR information for analysis of forensic casework and parentage testing. A detection limit of at least 100 pg of DNA was observed for this multiplex system.

Table 2 Sequencing data of different alleles of the 14 STR loci

Among these 14 STR loci, 11 contained a variation in a single repeat region (D4S2366, D12S1090, D13S765, D14S608, Penta E, D17S1294, D18S536, D18S1270, D20S470, D21S1437, and Penta D) and are classified as simple repeats. For these 11 simple repeat STRs, the allelic nomenclature follows directly from the number of repeats in the simple block. The non-variant repeat blocks that were not directly adjacent to the simple repeat were excluded [28]. For locus D4S2366, there were simple repeats with motif sequence variants. One STR locus (D22S683) in this multiplex was categorized as a compound repeat because it contained more than two different repeat motifs that were directly adjacent [28]. The remaining two STRs (D3S1744 and D8S1110) were complex in structure since they consisted of more than two variable blocks interspersed with intervening non-variable sequences. The additional non-variable repeat blocks between the variable regions were not counted in the repeat length nomenclature. The sequence data and allelic nomenclature for novel STRs and loci with limited sequence data are presented in Supplementary Table S1. Deletion variants were found in the adjacent sequence data of D22S683. For locus D18S1270, there were substitutions (TCTA → TATA) in the first motif and deletion variants in adjacent sequence structures in some samples (Supplementary Table S1).

For the sizing precision study, SD data of our allelic ladder are shown in supplementary Table S2, and SD data of amplified fragments of the selected DNA samples and control samples are presented in supplementary Table S3. The SD ranged from 0.02 to 0.09 bases, so three SDs corresponded to at most ±0.27 (0.09 × 3) bases. The sizing precision study therefore supported a range of at most ±0.27 bases for binning alleles, which is within 0.5 bases, and alleles that differ in size by one base could be distinguished from each other. Based on the precision of the assay, alleles could be binned by the Genotyper and characterized correctly by comparison to the allelic ladder. The genotypes of the common control DNAs (9947A, GM9948, and GM3657) are shown in supplementary Table S4. Our genotyping results for D3S1744 and D4S2366 confirmed the results of a previous report [29].
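The precision check described above can be sketched as follows. This is a minimal illustration, assuming the sample SD of repeated injections and a ±3 SD window compared against 0.5 bases; the function names and the example fragment sizes are hypothetical and are not taken from the supplementary tables.

```python
from statistics import stdev


def sizing_sd(sizes):
    """Sample standard deviation of fragment sizes (in bases)."""
    return stdev(sizes)


def supports_one_base_resolution(max_sd, k=3.0, limit=0.5):
    """True if a +/- k*SD window stays inside +/- `limit` bases.

    Assumption: k = 3 and limit = 0.5 mirror the reasoning in the text.
    """
    return k * max_sd < limit


# Hypothetical sizes (bases) from repeated injections of one allele.
example_sizes = [102.11, 102.08, 102.15, 102.09, 102.12]
print(round(sizing_sd(example_sizes), 3))

# Largest SD reported in the text: 0.09 bases, so 3 * 0.09 = 0.27 < 0.5.
print(supports_one_base_resolution(0.09))  # True
```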

The distributions of allelic frequencies for these autosomal STRs in the 572 Taiwanese subjects are presented in Table 3. The number of alleles varied from seven (D4S2366, D13S765, and D18S536) to 26 (D22S683). The forensic parameters of the 14 loci typed in the newly developed multiplex PCR are also shown in Table 3. The discrimination power in our multiplex ranged from 0.6858 (D18S536) to 0.9168 (Penta E), with a combined discrimination power of 0.999999999. Among these 14 loci, 12 were in HWE at the 5% significance level, and two loci (D8S1110 and D22S683) with p values below 0.05 may deviate from HWE (Table 3). However, if the Bonferroni correction is applied, only p values below 0.00357 (0.05/14) are considered significant, and none of these 14 loci deviates significantly from HWE [30, 31]. Therefore, D8S1110 and D22S683 can still be used for paternity testing in Taiwanese. In the pairwise linkage disequilibrium analysis of these loci, no statistically significant linkage disequilibrium was found (p values ranging from 0.085 to 1.000). These STR markers can therefore be combined for biostatistical analysis.
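The two calculations in this paragraph can be sketched as follows. This is a hedged illustration rather than the software actually used. The function names are hypothetical, and the combined discrimination power is computed with the usual product rule, which assumes the loci are independent (consistent with the absence of significant linkage disequilibrium reported here).

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def combined_discrimination_power(per_locus_pd):
    """1 - product(1 - PD_i), assuming independent loci."""
    return 1.0 - math.prod(1.0 - pd for pd in per_locus_pd)


# 14 HWE tests at a 5% significance level, as in the text.
print(round(bonferroni_threshold(0.05, 14), 5))  # 0.00357

# Only the lowest and highest per-locus values are quoted in the text,
# so this two-locus call is purely illustrative.
print(combined_discrimination_power([0.6858, 0.9168]))
```

Passing all 14 per-locus values from Table 3 to `combined_discrimination_power` would give the combined figure for the full multiplex under the same independence assumption.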

Table 3 Allelic frequency distributions of the 14 STR loci of 572 unrelated Taiwanese Han

In order to evaluate the forensic application of this newly developed STR multiplex in paternity testing, we genotyped 30 parent–child pairs with a CPI below 1,000 when analyzed using the AmpFℓSTR Identifiler, and another 32 parent–child pairs with single-step mutations in loci of the AmpFℓSTR Identifiler. The CPI in the 30 parent–child pairs with a CPI below 1,000 increased 2,908.1- to 1,664,414,542-fold (mean, 55,027,018.2-fold), and reached a CPI of 9.1 to 119,972,836,279.2 when combined with the results from the AmpFℓSTR Identifiler. The CPI in the 32 parent–child pairs with single-step mutations increased 121.4- to 33,192,744.1-fold (mean, 2,753,791.4-fold), and reached a CPI of 27,584.2 to 23,394,156,145.3. These 14 loci genotyped simultaneously in one PCR reaction provide sufficiently informative data. Used in addition to the autosomal STRs included in the AmpFℓSTR Identifiler, this set of autosomal STRs improved the ability to prove parentage and increased the CPI. The loci also provide additional power to resolve possible single-step mutations in parent–child pairs. Comparison between the 14 loci in our multiplex and the 15 loci in the AmpFℓSTR Identifiler in Taiwanese subjects from a previous report indicated that six loci (D12S1090, Penta E, D18S1270, D20S470, Penta D, and D22S683) of our 14 STRs were among the ten most polymorphic STRs, while loci D22S683 and Penta E appeared to be the most polymorphic STR markers [32]. The combined power of exclusion of the 15 loci of the AmpFℓSTR Identifiler in Taiwanese was 0.9999977068, while the combined power of exclusion of the 14 loci in our system in Taiwanese Han was 0.9999995913 [32].
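To make the arithmetic behind a combined index concrete, the sketch below multiplies per-locus paternity indices into a CPI and combines per-locus powers of exclusion. It assumes independent loci and uses the standard product rule. The function names and all numeric inputs are hypothetical, and the posterior probability of paternity with a prior of 0.5 is included only as a common convention, not as something reported in this study.

```python
import math


def combined_index(per_locus_indices):
    """Product of per-locus paternity indices (assumes independent loci)."""
    return math.prod(per_locus_indices)


def combined_power_of_exclusion(per_locus_pe):
    """1 - product(1 - PE_i), assuming independent loci."""
    return 1.0 - math.prod(1.0 - pe for pe in per_locus_pe)


def probability_of_paternity(cpi, prior=0.5):
    """Posterior probability of paternity for a given CPI and prior."""
    return cpi * prior / (cpi * prior + (1.0 - prior))


# Hypothetical case: a low CPI from one kit, multiplied by the CPI
# contributed by an additional, unlinked set of loci.
cpi_first_kit = 500.0
cpi_additional_loci = combined_index([2.5, 4.0, 1.8, 6.0, 3.2])
cpi_total = cpi_first_kit * cpi_additional_loci

print(cpi_additional_loci)  # fold increase contributed by the added loci
print(cpi_total)
print(probability_of_paternity(cpi_total))
```

Under this product rule, the fold increase in CPI from adding an unlinked set of loci equals the CPI of those loci alone.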

In conclusion, this 14-non-CODIS autosomal STR multiplex system provides highly informative STR data and appears useful in parentage testing. Further research is necessary to investigate the sequence variation and application in different population groups.