Introduction

X chromosome short tandem repeat (X-STR) typing has emerged in forensic casework as a powerful tool for kinship testing and as a valuable source of identification data complementary to conventional autosomal STR typing and/or mitochondrial DNA sequencing [1–3]. Moreover, typing of linked X-STR loci enables differentiation of pedigrees that are otherwise indistinguishable when interpreted through unlinked autosomal STR markers [4]. Introducing X-STR typing into routine forensic work would be of great aid in solving criminal cases and in missing person identification. Research interest, population studies and product development for X-STR typing have grown continuously for several years, which is expected to result in its extensive use in forensic practice [5]. In that respect, the aim of this study was to establish a national reference X-STR database, which is needed for match interpretation in forensics as well as for intra- and inter-population studies.

The Investigator® Argus X-12 kit is a commercially available product that enables simultaneous amplification of 12 X-STR markers from a genomic DNA template. The markers are clustered in four linkage groups (LGs) of three markers each, which behave as linked loci that co-segregate during meiosis. Because of possible recombination events, linkage disequilibrium (LD) has to be considered as a factor that can affect the interpretation and calculation of probabilities in relationship testing [6, 7]. Therefore, to confirm stable inheritance of linked clusters in the population of south Croatia, we performed LD testing. We also computed forensic parameters to evaluate the forensic suitability of the Investigator® Argus X-12 kit for the south Croatia population. For markers in strong LD, haplotype frequencies cannot be inferred from allele frequencies but have to be estimated directly from population data [8]. Therefore, besides population analysis based on allele frequencies, we also estimated haplotype frequencies in the population of south Croatia and compared the results with other European and non-European populations. The overall objective of this study was to introduce X-STR typing based on the 12 markers included in the Investigator® Argus X-12 kit into the routine casework of the Forensic Science Centre “Ivan Vučetić”.

Materials and methods

A total of 197 samples (99 male and 98 female) from the south Croatia region were analyzed. To account for subpopulation variation, unrelated participants were chosen from the entire region, covering Zadar, Šibenik-Knin, Split-Dalmatia and Dubrovnik-Neretva Counties. All samples were collected during routine forensic work by the Forensic Science Centre “Ivan Vučetić”, and their use in the study was approved by the Ethics Committee of the Institute for Medical Research and Occupational Health, Zagreb, Croatia.

DNA was extracted from Flinders Technology Associates cards (Whatman, Maidstone, Kent, UK) using Chelex 100 [9]. The Investigator Quantiplex kit (Qiagen GmbH, Hilden, Germany) was used for quantification, and samples were subsequently normalized to approximately 1 ng/µL. The Investigator® Argus X-12 kit (Qiagen GmbH, Hilden, Germany) was used for amplification of amelogenin for sex determination and of 12 X-STR markers belonging to four different LGs: LG1 (DXS10148, DXS10135, DXS8378), LG2 (DXS7132, DXS10079, DXS10074), LG3 (DXS10103, HPRTB, DXS10101), and LG4 (DXS10146, DXS10134, DXS7423). Amplification products were analyzed on a 3500 genetic analyzer (Applied Biosystems, Foster City, CA, USA). Data obtained from capillary electrophoresis were analyzed using GeneMapper ID-X software (version 1.4, Applied Biosystems). Peak threshold values of 100 RFU and 200 RFU were applied for heterozygous and homozygous alleles, respectively. The linear detection range of the 3500 instrument is 20,000 RFU. All procedures and protocols were carried out following the manufacturer’s instructions. All samples containing variant alleles, female samples with triallelic patterns and male samples with biallelic patterns were confirmed by re-extraction followed by amplification and capillary electrophoresis.

Statistical analysis

The allele frequencies for all samples and haplotype frequencies for male samples were determined by counting. For biallelic male and triallelic female samples, the allele(s) with the highest frequencies were selected for further calculations. Testing for departure from Hardy–Weinberg equilibrium (HWE), including observed heterozygosity (Ho) and expected heterozygosity (He), was performed only for female samples. Pairwise LD between loci was tested by a likelihood-ratio test using the Expectation–Maximization algorithm for female samples, and by an exact test using a Markov chain for male samples. Genetic heterogeneity within the population was estimated as gene diversity, i.e. haplotype diversity (H), for male haplotype data. All of these computations were performed using Arlequin software v3.5.2.2 [10]. The significance level for all statistical tests was set to 0.05 and corrected for multiple comparisons using the Bonferroni adjustment.
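As an illustration of the counting approach and of haplotype diversity, the following minimal sketch computes relative frequencies and the sample-size-corrected diversity H = n/(n − 1) · (1 − Σp²). The function names and the example haplotypes are hypothetical and not taken from the study data; the estimator is the standard unbiased gene diversity and is assumed, not confirmed, to match the computation performed by the software used here.

```python
from collections import Counter


def frequencies_by_counting(observations):
    """Relative frequency of each distinct allele or haplotype."""
    counts = Counter(observations)
    n = sum(counts.values())
    return {item: c / n for item, c in counts.items()}


def unbiased_diversity(observations):
    """Gene/haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    freqs = frequencies_by_counting(observations)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


# Hypothetical male haplotypes for one linkage group (one allele per locus).
example_haplotypes = [
    ("20", "11", "17"),
    ("20", "11", "17"),
    ("21", "12", "16"),
    ("19", "13", "18"),
]

example_frequencies = frequencies_by_counting(example_haplotypes)
example_diversity = unbiased_diversity(example_haplotypes)
```

When every observed haplotype is unique, this estimator equals 1, which is why values close to 1 are typical for highly polymorphic linkage groups.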

Forensic parameters, comprising polymorphism information content (PIC), power of exclusion (PE), power of discrimination (PD) for males and for females, mean exclusion chance (MEC) for deficiency cases (Krüger’s formula), MEC for normal trios consisting of a mother, a daughter and a putative father (Kishida’s formula), and MEC for duos consisting of a daughter and a putative father (Desmarais’ formula), were computed from allele frequency data using the online tool available on the ChrX-STR.org web page [1].
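The exact expressions implemented by the online tool are not reproduced in the text. As a hedged sketch, the function below uses closed forms that are commonly cited for X-chromosomal markers: PIC = 1 − Σp² − (Σp²)² + Σp⁴, PD in males = 1 − Σp², PD in females = 1 − 2(Σp²)² + Σp⁴, and the father/daughter duo exclusion chance 1 − 2Σp² + Σp³. The function name and dictionary keys are hypothetical.

```python
def forensic_parameters(allele_freqs):
    """Per-locus parameters from allele frequencies (assumed closed forms)."""
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers
    return {
        # 1 - sum p_i^2 - 2 * sum_{i<j} p_i^2 p_j^2, rewritten with power sums
        "PIC": 1.0 - s2 - s2 ** 2 + s4,
        # males are hemizygous, so two males match if they carry the same allele
        "PD_male": 1.0 - s2,
        # females: 1 minus the sum of squared genotype frequencies under HWE
        "PD_female": 1.0 - 2.0 * s2 ** 2 + s4,
        # chance that a random man lacks both alleles of a random daughter
        "MEC_duo": 1.0 - 2.0 * s2 + s3,
    }


# Hypothetical locus with two equally frequent alleles.
two_allele_example = forensic_parameters([0.5, 0.5])
```

For two equally frequent alleles the sketch gives PIC = 0.375, which is the familiar upper bound for a biallelic marker and serves as a quick sanity check.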

To examine the relationship with neighboring populations, pair-wise genetic distances with sample size correction (\({{\text{F}}_{\text{ST}}}^{\text{*}}\)) [11] were calculated from the allele frequencies of the 12 loci using POPTREE2 software [12]. Inter-population comparison was represented by a multi-dimensional scaling (MDS) plot constructed with IBM SPSS Statistics for Windows, version 23.0 (IBM Corp., Armonk, NY, USA). The south Croatian population was compared with 26 world populations (Table 1).
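To show how a matrix of pair-wise distances can be turned into two-dimensional coordinates, the sketch below implements classical (Torgerson) MDS with NumPy. This is a swapped-in technique for illustration: the scaling procedure and options used in the statistical package are not specified in the text, so this sketch should not be expected to reproduce the published plot or its stress and RSQ values. The function name is hypothetical.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Classical (Torgerson) MDS: embed a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

The returned coordinates are defined only up to rotation and reflection, so the orientation of the axes carries no meaning by itself; only the relative positions of populations do.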

Table 1 References to X-STR population studies used for inter-population comparison

Results and discussion

We determined allele frequencies, Ho, He and P values for HWE for the 12 X-STR loci in the population of south Croatia (Online Resource 1, Table S1). There was no statistically significant departure from HWE at any locus. This data set is therefore suitable for match probability calculations in forensic work. We found previously reported off-ladder alleles at the following loci: DXS10103 (22), DXS10135 (24.1), DXS10146 (34.1, 38.2, 47.2) and DXS10148 (17, 22, 25, 31.1) [13, 15–17, 19, 20, 23, 30–34]. We also detected a variant allele at DXS10101 (26.1) that, to our knowledge, has not been reported previously.

In two female samples, triallelic patterns were detected (Online Resource 2, Fig. S1); type 2 pattern at DXS10134 locus (alleles 33, 34 and 38) and type 1 pattern at DXS10146 locus (alleles 27, 30 and 31) [35].

A total of four male samples exhibited a biallelic pattern at the same locus (DXS10079) (Online Resource 2, Fig. S2). Of those, three biallelic genotypes comprised alleles 20 and 21, while one consisted of alleles 20 and 22 (Table 2). This finding is exceptional because, to the best of our knowledge, no biallelic pattern in males has been reported with Investigator® Argus X-12 at any locus, except for one sample from central Croatia with alleles 21 and 22 at DXS10079 [17] (Table 2). On the other hand, triallelic patterns at the DXS10079 locus in female samples were reported in previous studies [22, 30] (Table 2), indicating a mutational hotspot. It is particularly interesting that three male profiles share an identical aberrant pattern at DXS10079 (alleles 20 and 21), which raises the question of relatedness. Unfortunately, these individuals and their families cannot be reached because we analyzed forensic casework samples. However, we took all possible measures to avoid sampling relatives (different cases, surnames, counties). Moreover, the three men displaying the same mutation were born in different mainland cities in two different counties (10,000–200,000 residents) and, according to their autosomal and Y-STR profiles, are not related (data not shown), which excludes a potential effect of isolated areas with a high rate of endogamy.

Table 2 Reported cases of biallelic (male samples) and triallelic (female samples) patterns at DXS10079 locus

The observed length mutations might be the consequence of somatic mosaicism, a parentally transmitted germline mutation or an early embryonic somatic mutation [36, 37]. In the first scenario only a subset of cells would carry the mutation, while in the second and the third, all cells of an individual would be affected. The latter seems more plausible, although it implies a double-step mutation (local chromosomal duplication and gain/loss of a tetranucleotide repeat). Namely, the allele peaks of all obtained biallelic profiles are of approximately the same height (Online Resource 2, Fig. S2), indicating an equal contribution of the two variants. A parallel could be drawn with the type 2 triallelic pattern [35], where it is assumed that both the wild-type and the mutated alleles are transmitted to the individual together, through an affected parental X chromosome. All samples but one displayed in Table 2 contain allele 20, the highest-frequency allele at the DXS10079 locus. Based on that observation we can only speculate that allele 20 might in fact be the wild-type allele, which mutated through the addition of one, two or three tetranucleotide units. The DXS10079 marker was not included in Mentype® Argus X-8, a product that was used in many population studies before the emergence of Investigator® Argus X-12. More information on this locus can therefore still be expected, and our findings suggest that the phenomenon may be more widespread. It would be of interest, at least for the Croatian population, to perform X-12 screening of volunteers’ samples and additional cell and/or familial analysis of individuals bearing the observed length mutation. A more thorough investigation of DXS10079 mutations is needed to clarify their origin, stability, chromosomal position, potential phenotypic effect, etc. In addition, further attempts should be made to assess a possible founder effect in isolated areas such as certain Croatian islands.

We determined forensic parameters (Table 3) based on allele frequencies. The most informative marker in the sample population was DXS10135 (PIC = 0.921) with a total of 26 alleles. The least polymorphic marker was DXS8378 (PIC = 0.635) with a total of 6 alleles. A previous study of the central Croatia population using the Mentype® Argus X-8 kit showed that the same markers, both belonging to LG1, were also the most and least polymorphic [38]. This is expected given the geographical and historical connection between the two neighboring regions of Croatia. The exceedingly high combined values of all forensic parameters tested, especially PD observed in female samples (0.999999999999999), demonstrate the suitability and informativeness of the Investigator® Argus X-12 kit for forensic identification and kinship testing in the south Croatia population. This is particularly relevant given that actual forensic samples are usually of low quality in terms of contamination, degradation and quantity. Partial and mixed profiles are an everyday challenge in forensic practice. Hence, new typing tools with improved performance should be implemented in an attempt to diminish these problems.
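The text does not state how the per-locus values were combined. A common convention, assumed here, is the product rule: the combined PD is 1 minus the product of (1 − PD) over all loci. This treats the loci as independent, which is only an approximation for linked markers. The function names below are hypothetical.

```python
import math


def combined_match_probability(per_locus_pd):
    """Product of (1 - PD) over loci: chance that no locus discriminates."""
    return math.prod(1.0 - pd for pd in per_locus_pd)


def combined_pd(per_locus_pd):
    """Combined power of discrimination under the product rule (assumption)."""
    return 1.0 - combined_match_probability(per_locus_pd)


# Hypothetical per-locus values, not taken from Table 3.
example_match_probability = combined_match_probability([0.9] * 12)
example_combined_pd = combined_pd([0.5, 0.5])
```

Because double-precision numbers carry roughly 16 significant digits, a combined PD such as 0.999999999999999 sits close to the resolution limit; reporting the complementary match probability preserves more information.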

Table 3 Forensic parameters for 12 X-STR markers

Haplotype frequencies for each LG were computed from the number of haplotypes observed in male samples (Online Resource 1, Table S2). LG1 was the most informative linkage group (H = 0.9981) and LG3 the least informative (H = 0.9858).

LD was tested in both men and women, and significant values were found only in male samples: within LG2 between loci DXS10074 and DXS10079, and within LG3 between loci DXS10101 and DXS10103 (Online Resource 1, Table S3). Future efforts should be directed at enlarging the sample pool representing the Croatian population in order to confirm potential LD between all linked markers.

In addition to south Croatia, 26 world populations (Table 1) with available allele frequencies for all 12 markers included in Investigator® Argus X-12 were selected for pair-wise genetic distance (\({{\text{F}}_{\text{ST}}}^{\text{*}}\)) calculations (Online Resource 1, Table S4). \({{\text{F}}_{\text{ST}}}^{\text{*}}\) values generally increase with geographical distance. All tested European (except Ibiza and Greenland), Middle Eastern and North African populations show similar \({{\text{F}}_{\text{ST}}}^{\text{*}}\) values in relation to south Croatia, ranging from 0.070 to 0.076. Ibiza displays a more pronounced genetic distance (\({{\text{F}}_{\text{ST}}}^{\text{*}}\) = 0.078), which is in concordance with the already documented position of Ibiza as distant from its geographic neighbors, probably due to demographic isolation and the distinctive matrilineal background of the island population [24]. A further increase in \({{\text{F}}_{\text{ST}}}^{\text{*}}\) values, ranging from 0.080 to 0.086 in relation to south Croatia, is evident for ethnically and geographically remote populations (Somalia, India, Cabo Verde, Japan and China). Interestingly, Greenland displays by far the highest genetic distance from all 26 populations tested (\({{\text{F}}_{\text{ST}}}^{\text{*}}\) 0.096–0.132), which indicates its genetic uniqueness. The clear differentiation of the Greenlandic population, probably due to genetic drift and interbreeding of different Inuit cultures, has already been documented [20].

To obtain better resolution of the similarity between south Croatia and geographically closer populations displaying lower \({{\text{F}}_{\text{ST}}}^{\text{*}}\) values, we performed multidimensional scaling with 15 European populations, excluding Greenland and Ibiza. Moreover, we included Morocco and Algeria, Mediterranean countries that might share genetic resemblance with south Croatia, a distinctively Mediterranean region of Croatia. The left-to-right position of populations along the first dimension axis of the MDS plot (Fig. 1) correlates well with a mild northeast-to-southwest geographical gradient, with Denmark and Morocco as the “northern” and “southern” genetic endpoints. Similar relative positioning of European populations in MDS plots based on 8 X-STR markers has been shown previously [13, 19, 22, 24]. It is interesting, although not surprising, that central Croatia [17] is part of the tightly grouped northern/central European countries, while south Croatia is positioned within the more scattered group of southern populations. A more extensive data set for all Croatian historical-cultural regions would be needed to better address the issue of intra-population variation.

Fig. 1

A two-dimensional multidimensional scaling plot drawn from sample bias corrected \({{\text{F}}_{\text{ST}}}^{\text{*}}\) genetic distances calculated from the allele frequencies of 12 X-chromosome STRs included in Investigator Argus X-12 kit with the POPTREE2 software. Stress = 0.1314/RSQ = 0.9344. ALB Albania, ALG Algeria, BLR Belarus, CCRO Central Croatia, CZR Czech Republic, DEN Denmark, GER Germany, GRE Greece, HUN Hungary, ITA Italy, LIT Lithuania, MOR Morocco, POR Portugal, SCRO South Croatia, SLO Slovenia, SWE Sweden, WMED Western Mediterranean population—pooled Valencia, Majorca and Minorca

In conclusion, we confirm the applicability of Investigator® Argus X-12 in routine forensic casework for the population of south Croatia. It can be used for both identification and familial testing, as a complement to autosomal STR typing. The allele and haplotype frequencies obtained will be included in the growing national 12 X-STR database.