Introduction

Mitochondrial DNA (mtDNA) analysis has been a powerful tool in forensic practice because of its properties and its advantages over nuclear DNA in certain legal cases [1]. The growth of mitochondrial population data over the last 15 years has improved the understanding of mitochondrial phylogeny and has motivated the publication of new forensic guidelines to ensure data quality and to standardize practice between laboratories around the world [2, 3].

For many years, the genetic diversity of the mtDNA sequence in different ethnic populations was characterized by Sanger sequencing of hypervariable regions 1 and 2 (HV1 and HV2; positions 16024 to 16365 and 73 to 340, respectively) [4,5,6,7]. In the late 1990s, a third hypervariable region (HV3; positions 438 to 576) was also included in some population studies [8,9,10,11]. Sequencing of the entire mtDNA control region (positions 16024 to 576) is now recommended for population database studies, to increase the power of discrimination and to improve haplogroup determination [3, 12].

In Brazil, previous studies based on the analysis of the hypervariable regions have been published [13,14,15]. Brazil, in South America, has one of the most heterogeneous populations in the world, the result of five centuries of interethnic crosses between peoples from three continents: European colonizers (mainly Portuguese), enslaved Africans, and native Amerindians [16]. The mtDNA haplogroup frequencies vary between Brazilian regions [17], as a result of the country's large geographical extent and its highly admixed population [18]. Parana state is in southern Brazil and represents 2.34% of the Brazilian territory [19]. According to regional historical data, the Parana population is also tri-hybrid, with contributions from Amerindians, Africans, and Europeans [20]. This is the first study to report the mitochondrial DNA diversity of Parana state. Therefore, the aim of this work is to generate high-quality mtDNA forensic data to increase Brazilian mitochondrial genetic information, and to establish the predominant mtDNA haplogroups in the Parana state population sample set.

Materials and methods

DNA samples

Genomic DNA was extracted from peripheral blood with the Biopur Kit Extraction Mini Spin Plus (Mobius Life Science, Brazil), according to the manufacturer’s instructions. Samples were obtained from 122 healthy unrelated individuals: 94 Euro-Brazilians, 24 Brazilians of mixed ancestry, and 4 Afro-Brazilians. Individuals were assigned to one of these three groups by self-classification, and the sample size for each group was chosen to reflect its proportional contribution to the Parana population according to the most recent published data [19]. Informed consent was obtained from all participants. This study was performed according to Brazilian federal laws and was approved by the Human Research Ethics Committee of the Federal University of Parana.

PCR amplification

The entire mtDNA control region was amplified in a single amplicon (1418 bp), as recommended in [3], using the primer pair L15879 (5′ AAT GGG CCT GTC CTT GTA GT 3′) and H727 (5′ AGG GTG AAC TCA CTG GAA CG 3′) [13]. Amplification reactions were carried out with AmpliTaq Gold DNA polymerase (Applied Biosystems), following the manufacturer’s specifications, using 50 ng of genomic DNA in a final volume of 50 µL. Cycling conditions were an initial denaturation at 95 °C for 10 min, followed by 35 cycles of denaturation at 95 °C for 15 s, annealing at 58 °C for 30 s, and extension at 72 °C for 1 min 40 s.
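
Because the control region spans the origin of the circular mtDNA numbering, the stated amplicon length can be checked with a short calculation. The sketch below is illustrative only: it assumes that the primer names L15879 and H727 mark the outermost positions of the amplicon and that coordinates follow the 16569-bp revised Cambridge Reference Sequence. The function name circular_span is hypothetical.

```python
# Length of a segment on a circular genome, using 1-based inclusive coordinates.
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (bp)


def circular_span(start: int, end: int, genome_length: int = RCRS_LENGTH) -> int:
    """Count the bases from start to end (inclusive), wrapping past the origin if needed."""
    if start <= end:
        return end - start + 1
    # Segment runs from start to the last base, then from base 1 to end.
    return (genome_length - start + 1) + end


# Assumption: primer names give the outermost amplicon coordinates.
amplicon_bp = circular_span(15879, 727)
print(amplicon_bp)
```

Under these assumptions the result agrees with the 1418 bp reported for the amplicon.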

Sequencing

PCR products were purified with 10 U of Exonuclease I (EXO I) and 2 U of Shrimp Alkaline Phosphatase (SAP) in 10× SAP buffer (all from United States Biochemical, USB, Staufen, Germany). Sequencing reactions were performed with the BigDye Terminator Cycle Sequencing Kit v3.1, according to the manufacturer’s protocol. To ensure high-quality data and double coverage at all positions, both the forward and reverse strands were sequenced. The ten sequencing primers used are listed in Table 1. In cases of length heteroplasmy, double coverage of some positions was obtained by sequencing the same strand at least twice with different primers, as recommended in [3]. The sequenced fragments were analyzed on an ABI 3130 Genetic Analyzer (Applied Biosystems, CA, USA).

Table 1 Sequencing Primers

Data analyses

The mtDNA control region sequences were aligned and compared with the revised Cambridge Reference Sequence [22, 23] using SeqScape Software v2.7 (Life Technologies, Foster City, CA, USA). To ensure high-quality data, two independent evaluations of the raw data were performed. Length heteroplasmy in homopolymeric sequence stretches was interpreted by reporting the dominant variant [3, 24]. Haplogroup affiliation was inferred according to Phylotree, build 17 [25], with the EMMA software (estimating mitochondrial haplogroups using a maximum likelihood approach) provided by EMPOP ver.3 (http://www.empop.org) [26, 27]. ARLEQUIN v3.5 was used to calculate molecular diversity indices: the number of different haplotypes, number of polymorphic sites, sequence diversity, nucleotide diversity, and mean number of pairwise differences [28]. The random match probability was calculated as the sum of squared haplotype frequencies based on mtDNA control region sequences. For the statistical analysis, the C-stretch length variation at positions 16193, 309, and 573 was excluded, giving a total of 1131 usable sites.
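
For reference, the two haplotype-based statistics can be written explicitly. The notation is introduced here for illustration and is not taken from the original text: n is the number of individuals, k the number of distinct haplotypes, and n_i the number of times haplotype i is observed. The random match probability follows the definition given above; the sequence diversity is written on the assumption that ARLEQUIN applies Nei's unbiased estimator.

```latex
\[
p_i = \frac{n_i}{n}, \qquad
\mathrm{RMP} = \sum_{i=1}^{k} p_i^{2}, \qquad
\hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right)
\]
```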

Results and discussion

All haplotypes and haplogroups obtained from the Parana samples are listed in Table SM1. Among the 122 individuals analyzed, 108 haplotypes were identified. Of these, 97 were unique, 9 were observed twice, 1 was observed three times, and 1 was observed four times. The most frequent haplotype (16519C, 263G, 315.1C) was observed in 3.3% of the population sample.
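
The haplotype counts above can be cross-checked against the sample size with a few lines of arithmetic. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, while the numbers are those reported in the text.

```python
# Haplotype frequency spectrum as reported: {times observed: number of haplotypes}.
spectrum = {1: 97, 2: 9, 3: 1, 4: 1}

# Number of distinct haplotypes and number of individuals they account for.
n_haplotypes = sum(spectrum.values())
n_individuals = sum(times * count for times, count in spectrum.items())

# Share of the sample carrying the most frequent haplotype, in percent.
top_share_pct = 100 * max(spectrum) / n_individuals

print(n_haplotypes, n_individuals, round(top_share_pct, 1))
```

The totals agree with the 108 haplotypes, 122 individuals, and 3.3% reported.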

Of the 1131 positions analyzed, 191 (16.9%) were polymorphic: 161 showed only transitions, 7 only transversions, 6 both transitions and transversions, and 17 indels. A previous study with a larger sample size reported a point heteroplasmy frequency of 6% [29]. In our data, we observed a similar frequency (5.7%): point heteroplasmy was found in seven samples, comprising five Y (C/T) and two R (A/G) transition-type heteroplasmies, as follows: 16192Y, 16311Y, 16355Y, 185R, 271Y, 310Y, and 374R. All of them had been reported previously [29], except the last one (374R).
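
The percentages in this paragraph follow directly from the reported counts, as the sketch below illustrates. It assumes that each of the seven heteroplasmic samples carries exactly one of the listed positions, which the text implies but does not state. Variable names are illustrative.

```python
# Polymorphic site classes as reported in the text.
site_classes = {
    "transition only": 161,
    "transversion only": 7,
    "transition and transversion": 6,
    "indel": 17,
}
ANALYZED_POSITIONS = 1131
N_SAMPLES = 122

polymorphic = sum(site_classes.values())
polymorphic_pct = 100 * polymorphic / ANALYZED_POSITIONS

# IUPAC ambiguity codes: Y = C or T, R = A or G (both are transition-type mixtures).
IUPAC = {"Y": {"C", "T"}, "R": {"A", "G"}}
heteroplasmies = ["16192Y", "16311Y", "16355Y", "185R", "271Y", "310Y", "374R"]
code_counts = {code: sum(h.endswith(code) for h in heteroplasmies) for code in IUPAC}

# Assumption: one point heteroplasmy per affected sample.
heteroplasmy_pct = 100 * len(heteroplasmies) / N_SAMPLES

print(polymorphic, round(polymorphic_pct, 1), code_counts, round(heteroplasmy_pct, 1))
```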

The molecular diversity indices calculated for the entire control region and for HV1, HV2, and HV3 in the Parana population sample are shown in Table 2. The results confirm that HV1 is more informative than the other two hypervariable regions and show the increased power of discrimination obtained when the entire control region is analyzed.

Table 2 Genetic diversity in 122 samples from Parana state population, Brazil

The sequence diversity calculated for the entire control region was 0.9976 ± 0.0016, and the estimated random match probability between two unrelated individuals in the Parana state population was 0.0106. High sequence diversity values have also been reported in other Brazilian population studies [13,14,15, 30,31,32], confirming the population heterogeneity of the country.
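
Both point estimates can be reproduced from the haplotype counts given earlier in the results (97 haplotypes seen once, 9 twice, 1 three times, and 1 four times). The sketch below is a check under stated assumptions, not a re-implementation of ARLEQUIN: the random match probability is the sum of squared haplotype frequencies, and the sequence diversity is assumed to be Nei's unbiased estimator, n/(n-1) multiplied by one minus that sum. The standard error is not computed.

```python
# Haplotype frequency spectrum as reported: {times observed: number of haplotypes}.
spectrum = {1: 97, 2: 9, 3: 1, 4: 1}

n = sum(times * count for times, count in spectrum.items())  # individuals

# Sum of squared haplotype frequencies = random match probability.
sum_sq_freq = sum(count * (times / n) ** 2 for times, count in spectrum.items())
rmp = sum_sq_freq

# Assumption: sequence diversity uses Nei's unbiased estimator.
diversity = n / (n - 1) * (1 - sum_sq_freq)

print(f"RMP = {rmp:.4f}, sequence diversity = {diversity:.4f}")
```

Under these assumptions the values round to 0.0106 and 0.9976, matching the reported estimates.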

mtDNA haplogroups composition

The haplogroups identified in the examined sample showed the coexistence of matrilineal lineages with different phylogeographic origins. The importance of indigenous women in the formation of the Parana population was revealed by the 49.2% prevalence of Amerindian mitochondrial lineages. Haplogroups B and C had the highest frequencies (15.6% each), followed by A (13.9%) and D (4.1%), as shown in Table 3. Haplogroups A, C, and D have been identified in Guarani tribes, and haplogroups A, B, and C in Kaingang tribes, the two major Native Amerindian groups that lived in Parana state [33]. The high frequency of Amerindian matrilineal lineages in the Parana state population supports the suggestion of directional mating between European males and Amerindian females in Brazil, as reported in [34], and agrees with historical data. In the sixteenth century, mating between European men and indigenous women was encouraged as a strategy for population growth and colonial occupation of the country [35].

Table 3 Frequencies of the mtDNA haplogroups and subhaplogroups in Parana sample set

The European component accounted for 38.5% of the haplotypes in our study (haplogroups H, U3, U5, R0, T, K, J, V, W, and X2), as shown in Table 3, and the prevalence of haplogroup H agrees with previous descriptions of European lineages in Brazil [31]. Haplogroup H was the most frequent (11.5%), as in most European populations [36]. Haplogroup U (excluding U6) was the second most frequent European haplogroup in the sample set (7.4%), followed by haplogroup R0 with 6.5%. These three haplogroups are also the most frequent in the Portuguese population [37], which agrees with the history of Brazilian colonization. In the second half of the nineteenth century, many European immigrants settled in Parana, including Italians, Germans, and Ukrainians, as well as some English, French, and Swiss people [20]. In accordance with this population diversity, we also observed other European haplogroups (T, K, J, V, W, and X2) at low frequencies (Table 3).

African lineages (L2, L1, L3, L0, and U6) accounted for 12.3% of the sample, and the sub-Saharan African haplogroups (L2, L1, L3, and L0) were the most frequent among them (10.7%) (Table 3). The subhaplogroup L3e is known to have high frequencies in West-Central Africa, which was the main source of Africans brought to Brazil during the colonial period [38]. After the abolition of slavery (1888), however, the proportion of Africans in the population of Parana state decreased substantially [13]. The North African contribution in this sample set, represented by haplogroup U6 (1.6%), indicates a component of Maghreb origin, which has a high frequency in the Portuguese, the main European colonizers of Brazil [39].
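
The reported percentages are internally consistent, which can be confirmed with a brief tally. The sketch below uses only the figures quoted in this section; the grouping of values into dictionaries is an illustrative choice.

```python
# Percentages as reported in the text (Table 3 summary values).
amerindian = {"A": 13.9, "B": 15.6, "C": 15.6, "D": 4.1}
african = {"sub-Saharan (L0, L1, L2, L3)": 10.7, "North African (U6)": 1.6}
components = {"Amerindian": 49.2, "European": 38.5, "African": 12.3}

# Round to one decimal place to absorb floating-point noise.
amerindian_total = round(sum(amerindian.values()), 1)
african_total = round(sum(african.values()), 1)
overall_total = round(sum(components.values()), 1)

print(amerindian_total, african_total, overall_total)
```

The Amerindian and African subtotals match the component values, and the three components together account for the whole sample.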

Comparisons between different states and regions of Brazil emphasize the differences in haplogroup composition between Brazilian regions [13,14,15, 30,31,32]. However, precise haplogroup assignment is difficult for some haplotypes even when the entire control region is analyzed. In this situation, the analysis of mtDNA coding region SNPs is recommended to refine the haplogroup classification, as has been done for the European haplogroup H, which is subdivided into many lineages [40, 41], and for some Native American clades, such as B2, which is incompletely classified by control region motifs [42, 43]. The analysis of coding region SNPs is also a tool for increasing the discrimination power for common haplotypes in forensic cases [44].

Finally, the high sequence diversity and the relatively low random match probability calculated for the Parana state population sample set imply a high probability of differentiating between two given maternal lineages and reinforce the informativeness of mtDNA analysis in forensic cases. The haplotypes reported in the present study are available for forensic purposes via EMPOP (www.empop.org) under accession number EMP00714.