Introduction

The Brazilian population is the result of the admixture of three main ethnicities: European, African, and Amerindian [1]. Moreover, the admixture process, as estimated from autosomal genetic markers, seems to have been heterogeneous across the major geopolitical regions of Brazil. Thus, although genetic variability has been maintained to a high degree, admixing has been asymmetrical, i.e., the relative contribution from each ethnicity has been unequal in the five geopolitical regions of the country. The highest Native American contribution was detected in the Northern region, while the highest African and European contributions were identified in Northeastern and Southern regions, respectively. Indeed, those differences remain detectable even after intense interregional gene flow.

Autosomal short tandem repeats (STRs) are the most widely used genetic markers in paternity and individual identification tests. However, in some complex cases, autosomal markers may be inconclusive [2], requiring the use of non-autosomal markers, such as those found on the X and Y chromosomes. In recent years, many studies have built and validated panels of STR markers on the X chromosome (X-STR markers) for use in forensic genetics, usually employing polymerase chain reaction (PCR) multiplexes of six to 12 STRs per reaction [3–8]. Such panels have been useful in complex relatedness cases [9–11] when (1) the putative father is not available, (2) the relatives are consanguineous, (3) the available relative of the putative father is either his mother or his daughter, or (4) the maternity of a son is under investigation [12].

Despite the extensive availability of autosomal [13] and Y-chromosome [14] marker databases in Brazil, there are only a few, non-representative X-STR marker databases [6, 7, 15, 16], mostly restricted to the Southeastern region [12]. Recently, we developed a 12 X-STR multiplex panel to estimate the genetic variability of Brazilian populations [8]. To extend the usefulness of that panel, it is important to evaluate the performance of the markers in forensic investigations (forensic parameters) in specific populations. This is particularly important for highly admixed populations, such as those of Brazil and other South American countries.

In this study, we investigated that panel of 12 X-chromosome STR markers for more than 2,000 individuals belonging to 16 out of the 27 Brazilian States, comprising all five of its main geopolitical regions (North, Northeast, Central-West, Southeast, and South) in order to provide a countrywide and more representative database for forensic purposes.

Materials and methods

Populations studied

This study was carried out in accordance with the ethical principles stated in the Helsinki Declaration (2000) of the World Medical Association. After free and informed consent was obtained, blood samples were collected from 2,234 individuals from 16 Brazilian States, representing the five Brazilian geopolitical regions (Fig. 1). The Northern Region was represented by 979 individuals (563 males and 416 females), comprising the States of Pará (400 individuals), Amapá (100), Amazonas (100), Acre (50), Rondônia (265), Roraima (38), and Tocantins (26); Northeastern Region, 289 individuals (204 males and 85 females), from the States of Maranhão (96), Ceará (135), and Pernambuco (58); Central-Western Region, 150 individuals (95 males and 55 females), from the States of Goiás (101) and Mato Grosso do Sul (49); Southeastern Region, 496 individuals (328 males and 168 females), from the States of São Paulo (278) and Minas Gerais (218); and Southern Region, 320 individuals (165 males and 155 females), from the States of Paraná (53) and Rio Grande do Sul (267). Samples from Roraima, Maranhão, Tocantins, and Mato Grosso do Sul consisted exclusively of males.

Fig. 1

Five Brazilian geopolitical regions (North, Northeast, Central-West, Southeast, and South). AC Acre, PA Pará, AM Amazonas, AP Amapá, RO Rondônia, RR Roraima, TO Tocantins, MA Maranhão, CE Ceará, PE Pernambuco, GO Goiás, MS Mato Grosso do Sul, MG Minas Gerais, SP São Paulo, RS Rio Grande do Sul, PR Paraná

DNA analysis

Genomic DNA was isolated from peripheral blood [17]. Samples were genotyped for 12 X-STR markers (DXS9895, DXS7132, DXS6800, DXS9898, DXS6789, DXS7133, GATA172D05, DXS7130, HPRTB, GATA31E08, DXS7423, and DXS10011) in a single multiplex reaction. Primers and PCR conditions were as described elsewhere [7, 8]. PCR products were subjected to capillary electrophoresis; fragments were separated and detected with an ABI PRISM 3130 Genetic Analyzer using the GS-500 ROX size standard, filter set D, and POP7 polymer (Applied Biosystems, Foster City, CA, USA). Alleles were identified and assigned using the GeneMapper v3.1 software (Applied Biosystems).

Allele nomenclature for all markers investigated in the present study was based on the number of repeats, in accordance with the International Society for Forensic Haemogenetics (ISFH) guidelines [18]. For the DXS9895, DXS7132, DXS6800, DXS7133, GATA172D05, and DXS7423 markers, we adopted the nomenclature proposed by Edelmann et al. [19], while the other markers were designated following previous reports: DXS9898 [20], DXS6789 [21], GATA31E08 [5], DXS7130 [22], HPRTB [23], and DXS10011 [24]. The cell line NA9947 (Promega Corporation, Madison, WI, USA) was genotyped as a typing reference sample, and the repeat numbers obtained agreed with previously published data [25].

Statistical analysis

Allelic frequencies, gene diversities, exact test of the Hardy–Weinberg equilibrium for female samples, pairwise exact test of linkage disequilibrium (LD) for male samples, analysis of molecular variance (AMOVA), population pairwise genetic distances (FST), and pairwise exact test of population differentiation were calculated using the ARLEQUIN software version 3.1 [26]. Because of the presence of intermediate alleles at three loci, genetic distance estimations were based on the number of different alleles (FST) rather than on the sum of squared size differences (RST), as in Pereira et al. [27]. The relationships between populations inferred from pairwise FST genetic distances were visualized in two-dimensional space using the multidimensional scaling (MDS) method (metric) included in the SPSS v. 14.0 software (SPSS Inc., Chicago, IL, USA).

Statistics for the evaluation of the forensic efficiency of each locus, namely power of discrimination in females (PDF), power of discrimination in males (PDM), probability of exclusion in trios involving daughters (PEXC.TRIO), and probability of exclusion in father/daughter duos lacking maternal genotype information (PEM), were computed according to Desmarais et al. [28].
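As an illustration of how such per-locus statistics follow from allele frequencies, the sketch below uses the closed-form expressions commonly attributed to Desmarais et al. for X-linked markers. The formulas are not printed in this article, so they are an assumption here, and the function and key names are hypothetical.

```python
def forensic_parameters(freqs):
    """Per-locus forensic statistics for an X-linked marker.

    freqs: allele frequencies at one locus (must sum to 1).
    Formulas are the ones commonly cited for X-STRs (assumed, not taken
    from this article); the dictionary keys are illustrative names.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers
    return {
        # power of discrimination in females (two X chromosomes)
        "PD_female": 1 - 2 * s2 ** 2 + s4,
        # power of discrimination in males (one X chromosome)
        "PD_male": 1 - s2,
        # exclusion probability in mother/daughter/putative-father trios
        "PE_trio": 1 - s2 + s4 - s2 ** 2,
        # exclusion probability in father/daughter duos without the mother
        "PE_duo": 1 - 2 * s2 + s3,
    }


if __name__ == "__main__":
    # Toy locus with two equally frequent alleles (not data from this study).
    for name, value in forensic_parameters([0.5, 0.5]).items():
        print(name, value)
```

For the toy two-allele locus, the values can be checked by hand by enumerating genotypes, which is a useful sanity check before applying the function to real frequency tables.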

Results and discussion

Genetic variability

Supplementary Table 1 shows the allelic frequency distribution and gene diversity estimates for each population. Allelic frequency distributions of males and females could be compared (exact test for population differentiation) in 12 populations. The significance level was adjusted to 0.0042 after Bonferroni correction for multiple tests. Because no significant differences were observed for those 12 populations, the allelic frequencies shown in Supplementary Table 1 represent the pooled male and female subsamples from each population.
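The adjusted significance level can be reproduced with a one-line calculation. The sketch below assumes a family-wise significance level of 0.05, which is not stated explicitly in the text; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Assumed family-wise alpha of 0.05, one test per population (12 tests).
    print(round(bonferroni_threshold(0.05, 12), 4))  # 0.0042
```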

Considering the total Brazilian sample, gene diversity ranged from 67% (DXS7133) to 95% (DXS10011). When populations were analyzed separately, the lowest gene diversity was 59% (DXS7133, in the State of Amazonas), while the highest was 96% (DXS10011, in the State of Pará). The population from the State of Ceará showed the highest mean gene diversity (79%), while the population from the State of Amazonas showed the lowest (74%).
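For readers who want to see how a gene diversity value arises from pooled male and female data, here is a minimal sketch. It assumes the usual unbiased estimator n/(n − 1) × (1 − Σp²), where n is the number of X chromosomes sampled (one per male, two per female); the helper names and the toy alleles are hypothetical and are not data from this study.

```python
from collections import Counter


def pool_x_alleles(male_alleles, female_genotypes):
    """Pool X-chromosome alleles: one per male, two per female."""
    pooled = list(male_alleles)
    for first, second in female_genotypes:
        pooled.extend((first, second))
    return pooled


def gene_diversity(alleles):
    """Unbiased gene diversity from a list of sampled alleles (assumed estimator)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two sampled chromosomes are required")
    counts = Counter(alleles)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - homozygosity)


if __name__ == "__main__":
    # Toy sample: two males and two females typed at one locus.
    pooled = pool_x_alleles([10, 11], [(10, 11), (12, 12)])
    print(len(pooled), gene_diversity(pooled))
```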

All 1,355 male haplotype-like allelic combinations of the 12 markers were unique, i.e., observed only once (Supplementary Table 2). Additionally, considering female genotypic proportions, no locus in any of the 12 populations deviated from the Hardy–Weinberg equilibrium after Bonferroni’s correction for multiple tests (Table 1).

Table 1 Forensic parameters for a panel of 12 X-STR in 16 States of Brazil

Forensic parameters

Forensic parameter estimates are shown in Table 1. The DXS10011 marker was the most informative in all populations, followed by GATA172D05 (the second most variable in Pará, Amazonas, Amapá, Rondônia, Ceará, Pernambuco, Goiás, Minas Gerais, São Paulo, and Paraná) and GATA31E08 (the second most variable in Acre, Roraima, Mato Grosso do Sul, and Rio Grande do Sul).

The lowest powers of discrimination were observed for the DXS7423 (in Acre, Pará, Amapá, Tocantins, Maranhão, Ceará, Pernambuco, and Minas Gerais) and DXS7133 (in Amazonas, Rondônia, Roraima, Goiás, Mato Grosso do Sul, São Paulo, Rio Grande do Sul, and Paraná) markers.

All forensic parameters achieved the highest values in Ceará, while the lowest values were found in Roraima. When all populations were amalgamated, the panel showed a PDF of 0.999999999999994 (ranging from 0.99999999999961 to 0.999999999999991) and a PDM of 0.9999999969 (ranging from 0.999999957 to 0.9999999960). The probability of exclusion in trios (PEXC.TRIO) ranged from 0.999999799 to 0.999999979, and in father/daughter duos lacking maternal genotype information (PEXC.MOTHERLESS) ranged from 0.999978963 to 0.999996680.
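Panel-wide values of this kind are typically obtained by combining the per-locus values. The sketch below assumes the standard product rule, which treats the loci as independent; that is a simplification for markers located on the same chromosome. The function names and the toy inputs are hypothetical.

```python
import math


def combined_match_probability(per_locus_power):
    """Product of per-locus match probabilities (1 - power), assuming independence."""
    return math.prod(1.0 - power for power in per_locus_power)


def combined_power(per_locus_power):
    """Combined power of discrimination or exclusion for the whole panel."""
    return 1.0 - combined_match_probability(per_locus_power)


if __name__ == "__main__":
    # Toy per-locus values, not taken from Table 1.
    toy = [0.9, 0.8, 0.7]
    print(combined_match_probability(toy), combined_power(toy))
```

With 12 highly polymorphic loci the combined power approaches the limits of double-precision arithmetic, so reporting the combined match probability alongside it can be more informative.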

Linkage disequilibrium analysis

Only five pairs of markers showed significant pairwise linkage disequilibrium (significance threshold p = 0.00076, after Bonferroni’s correction for 66 comparisons in each population): DXS9898/DXS9895 (in Pará), DXS7423/GATA31E08 (in Ceará), DXS6789/DXS6800 (in Rio Grande do Sul), and GATA172D05/DXS10011 and DXS10011/DXS9895 (in Maranhão) (Supplementary Table 3). The loci in each of these pairs are separated by a physical distance of 10 Mb (DXS7423/GATA31E08) or more. However, LD does not depend exclusively on the physical distance between loci; it may also result from random genetic drift, founder effect, recent interethnic admixture, population stratification, and sampling effects [12, 29]. In fact, LD between X-chromosome markers more than 10 Mb apart has already been reported in admixed Brazilian populations: e.g., between DXS6789 and GATA31E08 (separated by 45 Mb) [6], and between DXS9898 and DXS9902 (separated by over 72 Mb) [12], in populations from the Southern and Southeastern regions, respectively. In our study, none of the pairs of loci showed significant linkage disequilibrium in more than one population. Moreover, when populations were pooled according to their geopolitical regions, no linkage disequilibrium was found among the pairs of markers investigated.

Hence, the linkage disequilibrium detected might be spurious or due to sampling effects, corroborating the suggestion made by Martins et al. [12] when studying a population from Southeastern Brazil. However, considering the heterogeneity of the Brazilian populations, recent interethnic admixture may be another likely explanation for the few significant pairwise linkage disequilibria observed. Further studies with the Brazilian formative parental populations would be helpful in disentangling this question.
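The number of comparisons per population in this analysis follows directly from the 12 markers of the panel, as the short sketch below shows. The family-wise significance level of 0.05 is an assumption, since it is not stated explicitly in the text.

```python
from itertools import combinations

# The 12 X-STR markers of the panel, as listed in the DNA analysis section.
MARKERS = [
    "DXS9895", "DXS7132", "DXS6800", "DXS9898", "DXS6789", "DXS7133",
    "GATA172D05", "DXS7130", "HPRTB", "GATA31E08", "DXS7423", "DXS10011",
]

ALPHA = 0.05  # assumed family-wise significance level

# Every unordered pair of distinct loci is tested once per population.
locus_pairs = list(combinations(MARKERS, 2))
threshold = ALPHA / len(locus_pairs)

if __name__ == "__main__":
    print(len(locus_pairs), round(threshold, 5))  # 66 0.00076
```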

Interpopulational comparisons

The global genetic differentiation among the 16 Brazilian population samples, as measured by FST, was statistically significant (FST = 0.0047; p = 0.0000). Of the 120 possible pairwise FST comparisons between populations, 14 were statistically significant after correction for multiple tests (p < 0.0042) (Supplementary Table 4).
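To give an intuition for what a differentiation statistic of this kind measures, the sketch below computes Nei's GST from allele frequencies at a single locus. This is a deliberately simpler, related quantity and not the AMOVA-based estimator implemented in ARLEQUIN; populations are weighted equally, and the function name and toy frequencies are hypothetical.

```python
def gst(populations):
    """Nei's G_ST at one locus from per-population allele frequencies.

    populations: list of dicts mapping allele -> frequency.
    Equal population weights are assumed; this is an illustration,
    not the AMOVA estimator used in the study.
    """
    if not populations:
        raise ValueError("at least one population is required")
    k = len(populations)
    alleles = set().union(*populations)
    # Mean expected heterozygosity within populations.
    h_s = sum(1 - sum(p ** 2 for p in pop.values()) for pop in populations) / k
    # Expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freq = {a: sum(pop.get(a, 0.0) for pop in populations) / k for a in alleles}
    h_t = 1 - sum(f ** 2 for f in mean_freq.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    # Two toy populations with mildly different frequencies.
    print(gst([{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]))
```

Values near zero, such as those reported here, indicate that almost all of the allelic diversity is found within populations rather than between them.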

A locus-by-locus AMOVA was carried out in order to identify markers with higher differentiation. In this analysis, DXS7133 showed the highest differentiation value (FST = 0.01713; p = 0.000), similar to what had been previously found for a panel of 10 X-STR used in a study with the Southeastern Brazilian population [12].

A careful analysis of pairwise FST estimates (Supplementary Table 5) revealed that three markers (DXS6789, GATA172D05, and DXS9898) did not show FST estimates significantly different from zero. Additionally, three markers (GATA31E08, DXS7423, and DXS7130) were significant on only one occasion. Finally, four other markers showed a higher number of significant pairwise FST values: DXS6800 (seven populations), HPRTB (eight populations), DXS10011 (nine populations), and DXS7133 (12 populations).

Our results clearly show that there was considerable global differentiation among the populations studied, although differentiation was heterogeneous across the markers: a group of markers showed high differentiation (DXS7133, DXS10011, HPRTB, and DXS6800); a second group showed moderate differentiation (DXS7132 and DXS9895); and a third group showed low or no differentiation (DXS9898, DXS6789, GATA172D05, GATA31E08, DXS7423, and DXS7130).

Interregional comparisons

Populations were grouped according to the geopolitical regions they belong to. The differentiation among regions was significantly different from zero (FST = 0.00261; p = 0.0000). Pairwise comparisons showed that the Southeastern Region differs significantly from the Northern (FST = 0.0042; p = 0.0000), Northeastern (FST = 0.0030; p = 0.0010), and Southern (FST = 0.0036; p = 0.0010) regions, while the Southern and Northern regions were also significantly different from each other (FST = 0.0032; p = 0.0010) (Supplementary Table 6). The results again highlight the high differentiation for the DXS7133 marker (FST = 0.0110; p = 0.000) among regions.

Comparisons with other Brazilian populations

We expanded the genetic comparisons to six additional Brazilian populations [4, 11] that share seven X-STR markers with the populations currently studied (DXS7132, DXS9898, DXS6789, DXS7133, GATA172D05, GATA31E08, and DXS7423). Altogether, those 22 populations showed significant differences (global FST = 0.00416; p < 0.0000). Additionally, the intralocus FST among populations confirmed that the highest level of differentiation was attributable to the DXS7133 marker (FST = 0.01727; p < 0.000).

Comparisons with worldwide populations

We extended our analysis to include 10 populations from Latin America (Buenos Aires, Misiones, Rio Negro, Córdoba, Antioquia, and Costa Rica) and Europe (North and Centre of Portugal, Cantabria, and Galicia) [5] for the same seven markers. Moreover, unpublished data from our group on seven Native American tribes (138 individuals) and Africans (95 individuals; described in Santos et al. [30]) were also added to the FST analysis.

Pairwise FST estimates (Supplementary Table 7) involving all populations suggested that the Brazilian populations are less differentiated from each other (28 out of 231 pairwise FST values were statistically significant) than other populations from Latin America (46 out of 132 pairwise FST values were statistically significant). Finally, the Brazilian and Latin American populations were more similar to European (Portugal and Spain) than to African and Native American populations. These results can be visualized in the multidimensional scaling analysis charts, obtained from pairwise FST matrices (Fig. 2).

Fig. 2

Multidimensional scaling (MDS) analysis of the comparisons involving Brazilian populations and European, African, Latin American, and Native American populations

Conclusions

The present study strongly supports the high informativeness of the 12 X-STR panel described here for forensic and population purposes in all Brazilian regions. Our results also reinforce the genetic differentiation between Brazilian populations and even between the country's geopolitical regions. However, the markers contribute heterogeneously to such differentiation, with some markers showing high differentiation (especially DXS7133), while others display moderate or no differentiation at all. Thus, there is a clear need for regional or even population-specific databases for the 12 X-STR panel to be used in forensic casework and kinship analysis, in agreement with the previous suggestion by Martins et al. [12] for Southeastern Brazilian populations.