Introduction

The human X chromosome has become the object of extensive research in population genetics and forensics in recent years (e.g., [1, 4, 7, 8, 16–18]). X chromosomal STRs (X-STRs) can complement the analysis of Y and autosomal (AS) STRs, especially in cases where the offspring is female and the alleged father is unavailable, so that close blood relatives must be studied, or in the less common situation of maternity cases. Szibor et al. [18] summarized formulae for calculating the mean exclusion chance (MEC) to evaluate the forensic efficiency of X and AS loci in a kinship analysis context. In trios involving a daughter, X chromosome markers are more efficient than AS markers, because their MEC is higher. Before forensic application, it is important to collect population data and construct reference databases documenting the genetic variation of these STRs among worldwide populations.

Studies of mutation rates and linkage disequilibrium of X-STRs, which are essential for evaluating the potential applications of these markers, are also lacking. Large PCR multiplexes for X-linked markers make population studies and databasing more efficient, and they need to be designed and optimized. Several X-STR multiplexes have been described in the literature (e.g., [2, 4, 15, 16, 21, 22]); however, amplification of a large number of STR loci in a single PCR reaction has rarely been reported. This study describes the development and optimization of a ten-locus X-linked fluorescent STR multiplex and its application to three United States population groups: African Americans, Hispanics, and Asians. The aim was to present and compare the allele frequency distributions of this newly developed X-STR decaplex in these three groups.

Materials and methods

Samples and DNA extraction

Post-mortem blood stains available for research purposes at the Forensic Biology Department of the Office of the Chief Medical Examiner, NY, USA were selected for this study. A total of 377 male samples from US-resident populations (130 African Americans, 104 Asians, and 143 Hispanics) were typed for the ten X-STRs in the present work. DNA was extracted by silica-coated magnetic bead purification on the automated M-48 bio-Robot (Qiagen, Hilden, Germany) following the manufacturer’s instructions (see also [13]).

X chromosomal STR amplification

Amplification was performed in a single multiplex PCR reaction comprising the following ten X-STRs: DXS8378, DXS9898, DXS8377, HPRTB, GATA172D05, DXS7423, DXS6809, DXS7132, DXS101, and DXS6789. Primer sequences are listed in Table S1. For DXS9898, DXS6809, DXS7132, DXS6789, and DXS101, new primers were designed using PRIMER3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi). PCR amplification was carried out with the QIAGEN Multiplex PCR kit (Qiagen) using 1X QIAGEN Multiplex PCR Master Mix and 0.5–5 ng of genomic DNA in a 10-μl final reaction volume. The final concentration of each primer was 0.2 μM. Thermocycling on a GeneAmp PCR System 9700 thermocycler (Applied Biosystems, Foster City, CA, USA) consisted of pre-incubation for 15 min at 95°C; ten cycles of 30 s at 94°C, 90 s at 60°C, and 60 s at 72°C; 20 cycles of 30 s at 94°C, 90 s at 58°C, and 60 s at 72°C; and a final incubation for 30 min at 72°C.
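As a quick check on the protocol arithmetic, the thermocycling program above can be written down as data and summed. This is a minimal sketch: the `PROGRAM` structure and helper names are hypothetical, ramp times are ignored (so the total is a lower bound on instrument run time), and the primer amount uses the identity 1 μM = 1 pmol/μl.

```python
# Hypothetical encoding of the thermocycling program described in the text.
# Ramp times between temperatures are ignored.
PROGRAM = [
    # (step name, repeats, [(temperature in °C, hold in seconds), ...])
    ("pre-incubation", 1, [(95, 15 * 60)]),
    ("cycles 1-10", 10, [(94, 30), (60, 90), (72, 60)]),
    ("cycles 11-30", 20, [(94, 30), (58, 90), (72, 60)]),
    ("final incubation", 1, [(72, 30 * 60)]),
]


def total_hold_minutes(program):
    """Sum all hold times in minutes, excluding ramping."""
    seconds = sum(repeats * sum(t for _, t in holds) for _, repeats, holds in program)
    return seconds / 60


def total_cycles(program):
    """Count amplification cycles, i.e. repeated steps with more than one hold."""
    return sum(repeats for _, repeats, holds in program if len(holds) > 1)


def primer_pmol(conc_um, volume_ul):
    """Amount of one primer in the reaction; 1 µM equals 1 pmol per µl."""
    return conc_um * volume_ul


print(total_cycles(PROGRAM))        # 30
print(total_hold_minutes(PROGRAM))  # 135.0
print(primer_pmol(0.2, 10))         # 2.0
```

So the program amounts to 30 cycles and at least 135 min of hold time, with about 2 pmol of each primer per 10-μl reaction.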

Detection, typing, and analysis of PCR products

Separation and detection were performed on the ABI PRISM 3100 Genetic Analyser, a 16-capillary electrophoresis system. To each PCR product, 9.5 μl of HiDi formamide (Applied Biosystems) and 1.5 μl of GS500 LIZ internal size standard (Applied Biosystems) were added. Fragment sizes were determined automatically with GeneScan Analysis 2.1 software (Applied Biosystems). Genotyping was performed with Genotyper software (Applied Biosystems) by comparison with the DNA control reference samples 9947A (female) and 9948 (male) from Promega commercial kits (Promega, Madison, WI, USA; originally established by the Coriell Institute as NA9947 and NA9948, respectively; http://locus.umdnj.edu/nigms), which had previously been typed by Szibor et al. [19].

Sequencing analysis

In this study, a few intermediate alleles and some previously unreported alleles were detected. DNA sequencing was therefore performed to confirm these results, and reference samples were also sequenced for loci DXS9898, HPRTB, DXS8377, GATA172D05, and DXS6809. To confirm the suspected allele dropout at the GATA172D05 locus, a new set of primers was designed using PRIMER3 software and used for sequencing (Table S1). PCR products were purified with MicroSpin S-300 HR columns (GE Healthcare, Amersham Place, UK), and the sequencing reaction was performed with the ABI BigDye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems). Products were run on an ABI PRISM 3100 Genetic Analyser and analyzed with Sequencing Analysis 3.7 software (Applied Biosystems).

Statistical analysis

Allele frequencies and their standard errors, gene diversities, analysis of molecular variance (AMOVA), and population pairwise genetic distances (RST for all markers except DXS9898, for which FST was calculated; see Pereira et al. [14] for details) were calculated with ARLEQUIN ver. 3.0 [9]. Linkage disequilibrium tests were performed with the same software for all pairs of loci. Forensic efficiency statistics for each locus, namely MEC in trios involving daughters (MECT) and in father/daughter duos (MECD), and power of discrimination in women (PDF) and in men (PDM), were calculated using the formulae of Desmarais et al. [6].
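For readers who want to reproduce per-locus values from a frequency table, the sketch below uses the closed-form X-STR expressions commonly quoted in this literature, written in terms of the power sums of the allele frequencies p_i. It assumes Hardy–Weinberg equilibrium and frequencies that sum to 1; the function names are hypothetical, and the expressions should be checked against Desmarais et al. [6] before use. The unbiased gene diversity (Nei) is included as an optional extra, with n the number of sampled X chromosomes.

```python
def _power_sum(freqs, k):
    """Sum of p_i**k over all alleles at a locus."""
    return sum(p ** k for p in freqs)


def forensic_parameters(freqs, n=None):
    """Per-locus X-STR statistics from allele frequencies (assumed to sum to 1)."""
    s2, s3, s4 = (_power_sum(freqs, k) for k in (2, 3, 4))
    params = {
        "PDM": 1 - s2,                  # males are hemizygous: one allele per man
        "PDF": 1 - 2 * s2 ** 2 + s4,    # females: 1 - sum of squared genotype freqs
        "MECT": 1 - s2 - s2 ** 2 + s4,  # trio: mother, daughter, alleged father
        "MECD": 1 - 2 * s2 + s3,        # duo: alleged father and daughter only
    }
    if n is not None:
        # Unbiased gene diversity for n sampled chromosomes.
        params["H"] = n / (n - 1) * (1 - s2)
    return params


# Hypothetical frequencies, not taken from Table 1.
print(forensic_parameters([0.4, 0.3, 0.2, 0.1], n=130))
```

Keeping the power sums explicit makes it easy to see why the duo MEC is always lower than the trio MEC for the same locus: without the mother, fewer paternal alleles are obligate.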

Results and discussion

Amplification X-STR multiplex performance

The ten X-linked STR markers were successfully amplified in a single multiplex PCR reaction under the conditions described above (Fig. S1). To achieve balanced amplification across the multiplex, primers for loci DXS9898, DXS6809, DXS7132, and DXS6789 were redesigned so that PCR product sizes fit the range of each dye color and annealing temperatures were similar. New primers for locus DXS101 were also required: the initial primers (Table S1) gave much lower amplification intensity than the other X-linked loci in the multiplex, even when their concentration was doubled relative to all other primers. After the new primers were designed and tested, amplification of DXS101 was balanced within the decaplex. DNA amounts from 0.10 to 10 ng per reaction were tested, and the best balance among loci was obtained with 0.5 ng of DNA per reaction.

Nomenclature

Allelic nomenclature for most loci followed Szibor et al. [19], except for two markers for which our sequencing of the reference samples gave different typing results. Sequencing of DNA reference samples 9947A and 9948 at locus HPRTB (Table S2) showed that, when a TCTA repeat motif structure is considered, both samples gain an extra repeat, carrying 15 TCTA repeats instead of 14. In accordance with the DNA recommendations of the International Society for Forensic Genetics [3], we emphasize that both reference samples 9947A and 9948 carry allele 15 at this marker. At DXS9898, genotyping of reference samples 9947A and 9948 revealed inconsistencies between them. DNA sequencing of 9948 revealed 13 repeat units instead of 14 (Table S2); consequently, we used allele 13 for 9948 at this locus for calibration.
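Why the choice of repeat motif changes a designation can be seen by counting uninterrupted copies of a motif in a sequence. The sketch below uses a synthetic sequence (not the real HPRTB flanking region) made of 15 tandem TCTA units; read in the CTAT frame, the same stretch yields only 14 complete units. The helper name `longest_repeat_run` is hypothetical.

```python
def longest_repeat_run(sequence, motif):
    """Largest number of consecutive, uninterrupted copies of motif in sequence.

    Every start position is tried, so runs in any reading frame are found.
    """
    seq, motif = sequence.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq)):
        count = 0
        while seq.startswith(motif, start + count * k):
            count += 1
        best = max(best, count)
    return best


# Synthetic example: 15 tandem TCTA units between arbitrary flanks.
seq = "GG" + "TCTA" * 15 + "GG"
print(longest_repeat_run(seq, "TCTA"))  # 15
print(longest_repeat_run(seq, "CTAT"))  # 14
```

The same bases give 15 or 14 depending on the motif definition, which is why the motif used for nomenclature has to be stated explicitly.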

X-STR alleles

Allele frequencies for the ten X-STR loci studied (DXS8378, DXS9898, DXS8377, HPRTB, GATA172D05, DXS7423, DXS6809, DXS7132, DXS101, and DXS6789) in the three US population groups (African American, Asian, and Hispanic) are presented in Table 1. Four previously undescribed alleles were found: two at DXS9898 in African Americans and Hispanics (alleles 7 and 13.3) and two at DXS6809 in African Americans (alleles 31.1 and 33.1). These new alleles were sequenced, and the results are presented in Table S2.

Table 1 Allele frequencies for ten X-STRs in three US population groups

A rare intermediate allele at locus HPRTB

Sequencing of an HPRTB allele initially typed, on the basis of fragment size, as the intermediate allele 12.2 revealed 13 TCTA repeats (Table S2). Comparison with the HPRTB sequences of reference samples 9947A and 9948 showed an uninterrupted repeat motif in all three sequences. However, an AG deletion at bases 48 and 49 downstream of the repeat region was detected in the “12.2” allele; a similar result was reported by Mertens et al. [12]. Based on our results and the recommendations of the DNA Commission of the ISFG [10], we named this allele 13 (D48AGdel) instead of 12.2. Accordingly, if a different reverse primer annealing upstream of the mutation (between the repeat region and the deletion) were used, the shorter amplicon would not include the deletion, and its size would be consistent with 13 repeats.
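The “12.2” call follows directly from size arithmetic: 13 TCTA repeats span 52 bp, and losing 2 bp of flanking sequence leaves the amplicon the length expected for 12 repeats plus 2 bases. The sketch below infers a size-based designation relative to a reference allele using the usual intermediate-allele convention (complete repeats, a point, then extra bases). The 300-bp reference size is a made-up value for illustration, and the function name is hypothetical.

```python
def size_based_designation(fragment_bp, ref_fragment_bp, ref_repeats, repeat_len=4):
    """Infer an allele designation from fragment size relative to a reference allele.

    Returns "N" or "N.x", where N is the number of complete repeats and x the
    number of extra bases. Sizes are rounded to whole base pairs first.
    """
    delta = round(fragment_bp - ref_fragment_bp)
    repeat_bp = ref_repeats * repeat_len + delta
    full, extra = divmod(repeat_bp, repeat_len)
    return f"{full}.{extra}" if extra else str(full)


# Hypothetical reference: allele 15 sizes at 300 bp.
# 13 repeats would size at 292 bp; a 2-bp flanking deletion gives 290 bp.
print(size_based_designation(290, 300, 15))  # 12.2  (deletion inside the amplicon)
print(size_based_designation(292, 300, 15))  # 13    (deletion outside the amplicon)
```

Size alone cannot distinguish a repeat-region change from a flanking indel, which is why sequence-based naming such as 13 (D48AGdel) is preferred here.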

A null allele at locus GATA172D05

Loss of an allele at locus GATA172D05 was observed in one sample from the Hispanic population group despite several attempts at multiplex and singleplex amplification. When PCR specificity was reduced by lowering the primer annealing temperature to 50°C in the singleplex reaction, a DNA fragment with six repeats was amplified. New primers, located away from the original flanking regions, were designed and used to sequence the locus and identify the polymorphism responsible for the allele dropout. The results revealed a G→A substitution at nucleotide 7 from the 3′ end of the reverse primer sequence (Table S2). The mutation was observed in only one of the 377 chromosomes studied, confirming the rarity of this null allele. For this reason, the GATA172D05 primer sequences were not changed.
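Positions of primer/template mismatches are conventionally counted from the primer’s 3′ end, because mismatches closer to that end generally interfere more with polymerase extension. The sketch below locates such positions; the 20-nt primer and the variant binding site are hypothetical sequences, not the GATA172D05 primers, and both strings are written 5′ to 3′ in the primer’s orientation.

```python
def mismatches_from_3prime(primer, binding_site):
    """Mismatch positions between primer and binding site, counted from the 3' end.

    Position 1 is the 3'-terminal base. Both sequences are written 5' to 3' in
    the primer's orientation (binding_site is what the primer would match exactly).
    """
    if len(primer) != len(binding_site):
        raise ValueError("primer and binding site must have the same length")
    n = len(primer)
    return [n - i for i, (a, b) in enumerate(zip(primer.upper(), binding_site.upper())) if a != b]


primer = "ACGTTGCAACGTTGCAACGT"            # hypothetical 20-nt primer
site = primer[:13] + "A" + primer[14:]    # hypothetical G->A variant at index 13
print(mismatches_from_3prime(primer, site))  # [7]
```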

Electrophoretic mobility of shorter alleles at DXS8377 and DXS9898

For DXS8377, we found evidence of anomalous mobility in two samples whose sizes initially suggested alleles 37.1 and 39.1, which could lead to genotyping errors. DNA sequencing revealed no polymorphisms within or outside the repeat regions of these two samples that could explain the intermediate-like sizes (Table S2). The same was observed at the DXS9898 locus, where an allele initially genotyped as 6.3 based on its size was found to carry seven repeats, with no point mutations in the repeat flanking regions (Table S2). The electrophoretic behavior of these shorter alleles at DXS8377 and DXS9898 therefore reinforces the need for sequenced allelic ladders for accurate typing [3].
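Typing against an allelic ladder typically assigns a fragment to the nearest ladder allele only if it falls within a fixed window, and otherwise flags it as off-ladder (OL). The sketch below assumes a ±0.5 bp window, a common convention adopted here as an assumption, and uses hypothetical ladder sizes. A fragment running 1 bp short of allele 7 is flagged rather than silently reported as 6.3, which is the situation a sequenced ladder helps resolve.

```python
def call_allele(size_bp, ladder, tolerance=0.5):
    """Assign a fragment to the nearest ladder allele within tolerance, else 'OL'.

    ladder maps allele names to their measured sizes in bp.
    """
    allele, ladder_size = min(ladder.items(), key=lambda item: abs(item[1] - size_bp))
    return allele if abs(ladder_size - size_bp) <= tolerance else "OL"


# Hypothetical ladder sizes, for illustration only.
ladder = {"6": 150.0, "7": 154.0, "8": 158.0}
print(call_allele(154.2, ladder))  # 7
print(call_allele(153.0, ladder))  # OL
```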

Forensic efficiency

Forensic statistical parameters were calculated for all three groups and are shown in Table S3. All loci in this decaplex proved highly polymorphic, confirming their potential for forensic use. DXS8377 was the most polymorphic locus in all three population groups, followed by DXS101 in African Americans and Hispanics and by DXS6809 in Asians. The least discriminating locus in all populations was DXS7423. The high combined MECT and MECD values in all three populations support the use of this decaplex in kinship analyses where the offspring is female or where father/daughter relationships are being investigated.

Combined PDF and PDM values were likewise high in all three populations. These high powers of discrimination in both females and males support the value of this X-STR multiplex in forensic identity testing.
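Combined values across loci are usually obtained by multiplying the per-locus probabilities of non-exclusion (or of a random match). The sketch below shows that product rule; it assumes the loci are independent, which the linkage disequilibrium results discussed in this paper bear on, and the per-locus inputs are hypothetical rather than the Table S3 values.

```python
from math import prod


def combined(values):
    """Combine per-locus MEC or PD values, assuming independent loci.

    The combined value is 1 minus the product of the per-locus complements.
    """
    return 1 - prod(1 - v for v in values)


print(combined([0.5, 0.5]))   # 0.75
print(combined([0.8] * 10))   # hypothetical ten-locus case, very close to 1
```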

Population pairwise comparisons

Population differentiation among the three US population groups was evaluated by genetic distance analysis. The results (Table S4) show no significant genetic distances between population groups for loci HPRTB, DXS6809, and DXS7132. In contrast, highly significant RST values were obtained in all pairwise comparisons for GATA172D05 and DXS6789. At DXS8378, DXS101, and DXS9898, the only non-significant genetic distance was between African Americans and Hispanics, and at DXS7423 the only non-significant RST value was between Asians and Hispanics. Conversely, these two groups were the only pair showing a significant genetic distance at DXS8377. These results are consistent with other studies of the genetic structure of US populations using different markers across the human genome [5, 11]. Consequently, a pooled database should not be used for this decaplex X-STR system; separate databases are required for each of the New York ethnic groups tested.
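To illustrate what an allele-size-based distance such as RST measures, the sketch below computes a simplified moment estimator for haploid (male X) repeat-number data: the share of total variance in repeat number that lies between populations. It is not the AMOVA-based computation performed by ARLEQUIN, it ignores sample-size corrections, and the input data are hypothetical.

```python
from statistics import mean, pvariance


def simple_rst(populations):
    """Simplified RST for haploid repeat-number data.

    RST = (S_total - S_within) / S_total, where S_total is the variance of
    repeat number in the pooled sample and S_within is the mean
    within-population variance. No sample-size correction is applied.
    """
    pooled = [allele for pop in populations for allele in pop]
    s_total = pvariance(pooled)
    s_within = mean(pvariance(pop) for pop in populations)
    return (s_total - s_within) / s_total if s_total > 0 else 0.0


# Hypothetical repeat numbers for two groups of men.
print(simple_rst([[10, 10, 10], [12, 12, 12]]))  # 1.0 (all variance between groups)
print(simple_rst([[10, 12], [10, 12]]))          # 0.0 (no differentiation)
```

Unlike FST, which treats alleles as unordered categories, RST weights differences by repeat number, so populations that differ only by alleles of similar length score low.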

Linkage disequilibrium

The exact test for linkage disequilibrium was performed for all pairs of loci in each of the three population groups. In Hispanics, the only significant result among the 45 pairwise comparisons was between DXS8378 and DXS7132 (p = 0.016). However, because these two markers are far apart on the chromosome and no significant associations were found between markers located between them, no real linkage disequilibrium is expected, and the result is best attributed to sampling effects. The same was observed in Asians, where two pairs of distant loci also showed significant association (p = 0.0326 between DXS9898 and DXS101 and p = 0.0434 between DXS7132 and DXS6789). In African Americans, p values below 0.05 were observed for five pairs of loci.

Among these five pairs, the strongest association (p = 0.0165) was found between DXS6809 and DXS6789, which have been described as part of the same haplotype cluster, DXS6801–DXS6809–DXS6789 [20]. Recent population admixture has probably not yet allowed recombination to break down this association. However, because these p values do not remain significant after Bonferroni correction (p < 0.0011), it cannot be considered established that, in forensic applications in African Americans, DXS6809 and DXS6789 should be treated as haplotypes rather than as independent loci. In any case, to allow future comparisons and sample size enlargements, haplotype frequencies for these two loci are shown in Table S5.
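The Bonferroni threshold quoted above follows from testing all pairs of the ten loci within a population: C(10, 2) = 45 comparisons, so a nominal α of 0.05 becomes 0.05/45 ≈ 0.0011 per test. The sketch below recomputes this and checks the nominally significant p values reported in the text; p values not reported in the text are not included.

```python
from itertools import combinations

LOCI = ["DXS8378", "DXS9898", "DXS8377", "HPRTB", "GATA172D05",
        "DXS7423", "DXS6809", "DXS7132", "DXS101", "DXS6789"]

pairs = list(combinations(LOCI, 2))   # all pairwise tests within one population
alpha = 0.05
threshold = alpha / len(pairs)        # Bonferroni-corrected per-test threshold

# Nominally significant results reported in the text: (population, pair, p).
reported = [
    ("Hispanic", ("DXS8378", "DXS7132"), 0.016),
    ("Asian", ("DXS9898", "DXS101"), 0.0326),
    ("Asian", ("DXS7132", "DXS6789"), 0.0434),
    ("African American", ("DXS6809", "DXS6789"), 0.0165),
]

print(len(pairs), round(threshold, 5))  # 45 0.00111
for population, pair, p in reported:
    status = "significant" if p < threshold else "not significant"
    print(population, pair, p, status)
```

All four reported associations fall well above the corrected threshold.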

The ten markers in the present multiplex are distributed across the four linkage groups of the X chromosome. Among these markers, however, only DXS6809 and DXS6789, which belong to the second linkage group, have been shown to be in strong linkage disequilibrium [16, 20]. The general lack of association among these ten X-STRs contributes to the high power of discrimination of this multiplex. Nevertheless, haplotype analysis has been shown to be a valuable tool in pedigree-based kinship testing. Therefore, following the strategy of Robino et al. [16], a second multiplex including markers closely linked to these ten X-STRs could help resolve such cases. The two sixplex systems developed by Robino et al. [16] overlap with eight of the ten markers in the present decaplex. Hence, only two additional markers (DXS7424 and DXS6801) would need to be typed to complete the two groups that Robino et al. [16] described as being in strong linkage disequilibrium.

In conclusion, this work demonstrates the usefulness of this X-STR decaplex system for both anthropological and identification analyses in the three US population groups studied, as well as for population genetic studies.