Introduction

Autosomal short tandem repeat (STR) polymorphisms are the gold standard for DNA-based personal and forensic identification owing to their high discrimination power, sensitivity, and relatively simple analytical and methodological requirements [1, 2]. Although STRs are the markers of choice, their results are occasionally uninformative or inconclusive, particularly in deficient paternity cases or when degraded material is the only DNA source. Hence, complementary analyses, such as polymorphisms located on the Y [3, 4] and X [5, 6] chromosomes or mitochondrial DNA (mtDNA), are frequently required [7, 8]. The transmission pattern of the human X chromosome differs between the sexes: it escapes recombination in males, whereas in females it recombines at a lower rate than autosomes, leading to higher levels of linkage disequilibrium (LD) [9]. This feature provides a complementary tool for complex paternity cases, since males, whose X chromosome remains in a hemizygous condition [10], display the recombined information from both of their mother's X chromosomes.

To increase knowledge and improve the selection of polymorphic markers, studies involving novel polymorphisms are continually being carried out. In particular, unique event polymorphisms (UEPs), such as single nucleotide polymorphisms (SNPs) and insertion/deletion (InDel) polymorphic variants, have received special attention. SNPs are useful as a complement to STR analysis in human identification cases involving degraded samples or inconclusive results [10, 11]. Moreover, these markers are helpful in inferring ancestry [12, 13], as well as visible external traits [14], even in ancient skeletal remains [10, 15]. Although SNPs are more likely to be amplified given their shorter length, they still present some technical difficulties for routine forensic casework. A variety of platforms have been proposed for SNP analysis, e.g., microarrays [16], mini-sequencing SNaPShot® (Applied Biosystems, CA, USA) [17], and TaqMan assays (Applied Biosystems) [18]; however, these are more labor- and resource-intensive than STR typing approaches.

Almost two million InDels ranging from 1 to 10,000 bp [19] have been identified and extensively characterized by wet-lab [20] or in silico strategies [21]. They are widely distributed throughout the genome and derive from single mutation events, which occur at a very low frequency. Moreover, some of them might be informative as ancestry markers, since they may display significant allele frequency differences among geographically separated populations, whereas others might be a complement in human identification tests [22–25].

Polymorphic genetic markers located on the X chromosome might complement forensic identification cases, particularly in deficient paternity cases. Aiming to evaluate their performance, we analyzed the genetic features of a panel of 33 X-chromosome InDel polymorphisms (X-InDels) including allele distribution, genetic variability, LD, and forensic statistical parameters in a representative sample from the Argentinean population.

Subjects, materials, and methodology

Donors and sample collection

Samples from unrelated voluntary donors (200 males and 120 females) were collected at the Department of Forensic Genetics and DNA Fingerprinting Service, School of Pharmacy and Biochemistry, University of Buenos Aires, Argentina, during the period 2009–2012. All donors participated in paternity testing and signed written consent statements approved by the Ethical Committee of the School of Pharmacy and Biochemistry. Samples were treated anonymously during the study. Regional sample sizes were chosen to approximate the population density of each region relative to the entire Argentinean population (National Institute of Statistics and Censuses, INDEC 2010, www.indec.mecon.ar). Three major regions were considered for sampling: the Central region (N = 209), the North-Eastern region (N = 53), and the Southern region (N = 58).

DNA extraction and quantification

Blood samples were obtained by finger puncture and spotted onto Whatman 3MM paper (Merck, Lutterworth, UK). DNA extraction was performed with a semi-automated extraction method (Maxwell®16 System, Promega, Madison, WI, USA) following the manufacturer’s instructions.

All DNA extracts were quantified by real-time PCR on a Rotor Gene 6000 instrument (Corbett, Sydney, Australia) using the Plexor HY® kit (Promega) according to the manufacturer’s instructions.

PCR amplification

The analyzed panel included 33 X-InDels, namely MID218, MID1445, MID3703, MID184, MID357, MID448804, MID3774, MID3780, MID1326, MID3763, MID193, MID3728, MID2610, MID3705, MID2652, MID3706, MID2657, MID1540, MID3692, MID304737, MID2694, MID2600, MID358, MID1705, MID3756, MID19147, MID356, MID3764, MID284601, MID2047, MID103547, and MID3712 (Figure S1). Localization, rs numbers, and primer sequences of the different markers were previously described in supplementary data by Freitas et al. [26].

Statistical analysis

Allele distribution and statistical parameters of forensic relevance, such as expected heterozygosity (HET), power of discrimination in females (PDF), and power of discrimination in males (PDM), were calculated with the online ChrX-STR.org 2.0 calculator (http://www.chrx-str.org/) [27]. The mean exclusion chances of X-chromosomal markers in trios involving daughters (MECTRIOS) and in father/daughter duos lacking the maternal genotype information (MECDUOS) were calculated according to Desmarais et al. [28]. Since allele frequencies in males and females showed no statistically significant difference, both datasets were combined for forensic parameter calculation.
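The single-locus parameters named above can be illustrated with a minimal sketch. It assumes the closed-form expressions commonly given for X-chromosomal markers (PDM = 1 − Σp², PDF = 1 − 2(Σp²)² + Σp⁴, MECTRIOS = 1 − Σp² + Σp⁴ − (Σp²)², MECDUOS = 1 − 2Σp² + Σp³) and an uncorrected expected heterozygosity; whether the calculator used in this study applies a sample-size correction is not stated here. The function name and the allele frequencies are hypothetical.

```python
from typing import Dict, Sequence


def x_forensic_parameters(freqs: Sequence[float]) -> Dict[str, float]:
    """Single-locus forensic parameters for an X-chromosomal marker.

    `freqs` holds the allele (or haplotype) frequencies of one system.
    The formulas are the commonly cited closed forms and are an
    assumption of this sketch, not a transcription of the software used.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers

    return {
        "HET": 1.0 - s2,                       # expected heterozygosity (uncorrected)
        "PDF": 1.0 - 2.0 * s2 ** 2 + s4,       # power of discrimination, females
        "PDM": 1.0 - s2,                       # power of discrimination, males
        "MEC_TRIOS": 1.0 - s2 + s4 - s2 ** 2,  # trios involving daughters
        "MEC_DUOS": 1.0 - 2.0 * s2 + s3,       # father/daughter duos
    }


# Hypothetical biallelic InDel with equally frequent alleles.
params = x_forensic_parameters([0.5, 0.5])
```

For a biallelic marker, all five parameters peak when both alleles are equally frequent, which is why InDels with strongly skewed frequencies contribute little to a panel.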

Arlequin v3.5 software was used to test Hardy-Weinberg equilibrium (HWE) and LD [29]. HWE was evaluated in females, whereas LD was tested in male haplotype counts by an extension of Fisher’s exact test on contingency tables, D’, and Chi-square values. In addition, a likelihood-ratio test for the obtained haplotypes was performed on female genotypic data. A value of p < 0.05 was considered statistically significant. Diversity parameters, including dwmin (minimum diversity within the population) and mwmax (maximum match probability within the population), were calculated according to Brinkmann et al. [30].
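As an illustration of what an HWE test on female genotypes checks, the sketch below uses a simple Chi-square goodness-of-fit test for a biallelic InDel. This is a swapped-in, simplified stand-in: it is not the procedure implemented in Arlequin, and the function name and genotype counts are hypothetical. The p value uses the identity for one degree of freedom, p = erfc(sqrt(chi2 / 2)).

```python
import math
from typing import Tuple


def hwe_chi_square(n_ii: int, n_id: int, n_dd: int) -> Tuple[float, float]:
    """Chi-square HWE test for a biallelic marker typed in females.

    n_ii, n_id, n_dd are counts of insertion homozygotes, heterozygotes,
    and deletion homozygotes. Returns (chi2, p_value) with 1 degree of
    freedom. Simplified illustration only, not an exact test.
    """
    n = n_ii + n_id + n_dd
    if n == 0:
        raise ValueError("no genotypes supplied")

    p = (2 * n_ii + n_id) / (2 * n)  # insertion allele frequency
    q = 1.0 - p                      # deletion allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ii, n_id, n_dd)

    if min(expected) == 0:
        # Monomorphic marker: no deviation can be measured.
        return 0.0, 1.0

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    return chi2, p_value


# Hypothetical counts for 120 females, exactly at HWE proportions.
chi2, p_value = hwe_chi_square(30, 60, 30)
```

With small expected counts, as for rare alleles, the Chi-square approximation becomes unreliable, which is one reason exact tests are preferred in practice.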

Results

For the 33 X-InDel markers, allele distribution, statistical parameters of forensic relevance, and LD were assessed in a sample of the Argentinean population. The majority of the samples showed different combinations of the insertion allele. In contrast, combinations of deletion alleles occurred at significantly lower frequencies. No deviation from HWE was detected among the 33 markers included in the panel.

Regarding LD, 15 markers were organized in haplotypic blocks and tested accordingly [26]. LD was considered positive when D’, a measure of recombination patterns, was ≥ 0.8 and the linkage Chi-square test in men was significant [31]. The likelihood-ratio test in women was also statistically significant [32]. We identified six blocks containing two or three linked loci, namely block I (MID356-MID357), block II (MID448804-MID3703-MID218), block III (MID3705-MID3706-MID304737), block IV (MID197147-MID3754), block V (MID3664-MID284601-MID103547), and block VI (MID3763-MID3728). The observed frequencies for the six haplotypic blocks in the Argentinean population are shown in Table S1.
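The male-based part of this decision rule can be sketched for one pair of biallelic loci. The sketch assumes the standard definitions D = p(AB) − p(A)p(B), D’ = |D| / Dmax, and Chi-square = N·D² / [p(A)(1 − p(A))p(B)(1 − p(B))] with one degree of freedom; the function names and haplotype counts are hypothetical, and the likelihood-ratio test on female genotypes is not covered.

```python
import math
from typing import Dict


def pairwise_ld(n_ii: int, n_id: int, n_di: int, n_dd: int) -> Dict[str, float]:
    """D', chi-square and p value from male haplotype counts at two loci.

    n_ii: insertion at both loci; n_id: insertion at locus A only;
    n_di: insertion at locus B only; n_dd: deletion at both loci.
    Male X haplotypes are observed directly because males are hemizygous.
    """
    n = n_ii + n_id + n_di + n_dd
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = (n_ii + n_id) / n  # insertion frequency at locus A
    p_b = (n_ii + n_di) / n  # insertion frequency at locus B
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("LD is undefined for a monomorphic locus")

    d = n_ii / n - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    chi2 = n * d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    return {"d_prime": d_prime, "chi2": chi2, "p_value": p_value}


def is_in_ld(result: Dict[str, float],
             d_prime_min: float = 0.8, alpha: float = 0.05) -> bool:
    """Apply the thresholds stated in the text: D' >= 0.8 and p < 0.05."""
    return result["d_prime"] >= d_prime_min and result["p_value"] < alpha


# Hypothetical counts: two loci in complete LD among 100 males.
linked = pairwise_ld(50, 0, 0, 50)
```

Requiring both conditions guards against pairs in which D’ is high only because one haplotype is rare and the association is not statistically supported.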

The statistical parameters were calculated using single-locus information for the 18 independently segregating systems, whereas markers in LD were treated as haplotypes. The mean heterozygosity of the 33-marker X-InDel panel was 0.36, with 15 of the 24 systems (18 single loci and 6 haplotype blocks) showing values higher than 0.3. The accumulated power of discrimination was higher in females (99.9999992 %) than in males (99.9992925 %). The mean exclusion chances in trios and duos were 99.9891736 % and 99.6099391 %, respectively (Table 1).
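Accumulated values of this kind are conventionally obtained by combining per-system values under independence, as 1 − Π(1 − x), which is why linked markers are collapsed into haplotype systems before combining. A minimal sketch follows; the function name and the per-system values are hypothetical and are not taken from Table 1.

```python
from typing import Iterable


def accumulated_power(values: Iterable[float]) -> float:
    """Combine per-system PD or MEC values assuming independent systems.

    Each value is the probability that one system discriminates (or
    excludes); the panel fails only if every system fails.
    """
    remaining = 1.0  # probability that no system has discriminated so far
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each value must lie between 0 and 1")
        remaining *= 1.0 - value
    return 1.0 - remaining


# Two hypothetical systems, each with a power of discrimination of 0.5.
combined = accumulated_power([0.5, 0.5])
```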

Table 1 Analysis of forensic statistical parameters of 33 X-InDel markers including linkage disequilibrium blocks

Haplotype diversity parameters dwmin and mwmax (match probability within the population) are indicated in Table S1. The haplotype diversity was higher than 0.99 in all cases. Block II showed the lowest mwmax, probably as a consequence of haplotype frequency distribution, whereas blocks III and VI showed the highest mwmax since over 90 % of the samples exhibited the same haplotype (Table S1).

Discussion

It is well known that increasing the number of polymorphic markers suitable for DNA-based forensic identification improves the well-established STR-based human identification analytical platform [33]. Prior to their incorporation, it is mandatory to characterize these markers completely by testing them in the population in which they will potentially be used. Therefore, the aim of this work was to evaluate the forensic suitability of additional genetic markers for inclusion in the routine forensic casework toolbox.

Regardless of LD, the results obtained with the X-InDel panel for accumulated power of discrimination in males and females, as well as the mean exclusion chances in duos and trios, were slightly lower than those reported for other InDels [26, 34, 35], X-STRs [6, 36, 37], and X-SNPs [38, 39], but comparable with those of the X-chromosomal 21-InDel panel proposed by Edelman et al. [40].

Our results for blocks I, III, and V are in accordance with previously published reports. In the Argentinean population, the MID356 and MID357 markers (block I) were found to be linked, in agreement with previous reports for Somalian and Iraqi [41], Colombian [42], Brazilian Amazon [26], Portuguese, Angolan, Mozambican, and Macanese [35], and German and Baltic [40] populations; this linkage is attributable to their extreme proximity on the X chromosome (5.2 kb apart). A similar picture is observed for block III, where MID3705, MID3706, and MID304737 are 0.07 centiMorgan (cM) apart. These markers were found to be in LD in the Brazilian Amazon population of Belem [26] and in European, African, and Northern and Southern Brazilian populations [43]. Regarding block V (MID3664-MID284601-MID103547), our results agree with those reported for the Brazilian Amazon population of Belem. Nonetheless, our results for blocks II and VI differ slightly from previously published data. Within block II (MID448804-MID3703-MID218), LD was detected in European, African, and admixed populations from Northern and Southern Brazil among MID3703, MID218, and the MID3774 marker, but these were not linked to MID448804 [43]. Furthermore, the same authors found the MID448804 marker linked to MID3780 in African, Native American, and admixed populations from Northern and Southern Brazil but not among Europeans [43]. A similar scenario was observed in the population of Belem, where linkage between MID3780, MID448804, MID218, and MID3774 was previously described [26]. This discrepancy could be explained by (a) the different parental genetic migration rates that occurred in the Brazilian Amazon and Argentinean populations and (b) possible admixture that took place in a period not long enough to detect the linkage. In the case of block VI (MID3728 and MID3763), the same results were obtained by Freitas et al. for the population of Belem [26]. Moreover, Resque et al. detected this linkage block plus MID3728 in European, African, and admixed populations from Northern and Southern Brazil [43]. However, the criterion for determining linkage in the latter study was D’ > 0.65, which is less stringent than that applied in our study. Further studies involving different and larger sample sets are needed to better understand the LD patterns in different population groups. Finally, no previous studies have reported on block IV (MID197147-MID3754).

Analysis of the forensic parameters showed that at least five markers (5/18) and blocks III and VI presented expected heterozygosity values below 0.2, indicating that these markers would not be informative for forensic analysis, at least in the Argentinean population (Table 1). The markers MID2600, MID2610, MID2694, MID3692, and MID2657 have been associated with African ancestry [44]. Since the African population, or African descendants, are not numerically well represented in Argentina [45], these markers could be excluded from panels intended for forensic purposes.

To our knowledge, this is the first X-chromosome InDel study in the Argentinean population. The analysis of forensic parameters showed that the values obtained for PDF, PDM, MECTRIOS, and MECDUOS for this panel were slightly lower than those of other InDel [26, 34, 35, 41] or STR panels [6]. Nevertheless, owing to the low mutation rates of InDels, this panel could be used as a complement in cases displaying few microsatellite transmission incompatibilities. However, to prevent data misinterpretation, markers in LD should be used with careful consideration [46, 47]. Further studies testing these markers’ potential use for ancestry analysis are ongoing.