Introduction

The analysis of human mtDNA has become a powerful tool for the genetic characterization of forensic biological specimens. Several features of the mitochondrial genome, such as partial protection against nuclease activity due to its circular structure or the presence of thousands of copies per cell [1, 2], make it relatively convenient for the genetic analysis of highly degraded material where DNA typing with nuclear markers would not be successful, such as bones, teeth, hair shafts, buried material and faeces [3, 4, 5, 6, 7, 8, 9, 10]. Additionally, the substitution rate of mtDNA (especially in the non-coding region) is markedly higher than that of most nuclear genes. This is probably caused by mtDNA polymerase errors, deficiencies in repair activities, high oxidant levels and lack of recombination [11, 12]. All these mechanisms increase the genetic variability of mtDNA and therefore increase the chance of individual identification. A further advantage is maternal inheritance [13, 14], which allows intergenerational comparison to determine maternal family relationships even when several generations are missing [15].

The non-coding, highly variable region of human mtDNA called the D-loop is the region used in forensic genetics. The D-loop, which is approximately 1150 bp long, is situated between the mitochondrial tRNAPro and tRNAPhe genes. It contains two sequence-variable regions called HV1 (positions 16024–16365) and HV2 (positions 73–340), and one sequence- and length-variable region, HV3 (positions 438–574), with CA dinucleotide repeats. Furthermore, the origin of replication of the H strand (OriH), from which 7S DNA is synthesized during the first stage of replication, is situated here [16, 17, 18, 19, 20, 21].

At present, the frequencies of mtDNA types in a population are quantified by counting the number of occurrences in the sequence database. Therefore, the availability of large sequence databases is necessary. In the last decade many population studies dealing with mtDNA D-loop polymorphisms have been published [22, 23, 24, 25, 26, 27, 28]. The Czech population data of the D-loop region presented in this study is another small piece in the big puzzle of the world human mitochondrial population data.

Materials and methods

DNA isolation

Total DNA from 93 randomly chosen unrelated Caucasians from the Czech population was isolated from blood samples by Chelex extraction [29] or with a QIAamp DNA Blood Mini kit (QIAGEN, Hilden, Germany) according to the manufacturer’s protocol.

Polymerase chain reaction (PCR)

A total of 10 ng of DNA was amplified using the primers (Generi Biotech, Hradec Kralove, Czech Republic) displayed in Table 1, which anneal to the area of the tRNAPro and tRNAPhe genes and border the D-loop region. The PCR was carried out in a total volume of 50 μl consisting of 1×PCR buffer (containing 1.5 mM MgCl2), 200 μM of each dNTP, 0.1 μM of each primer and 2 U Taq polymerase (TAKARA, Shiga, Japan).
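The stated concentrations can be converted into absolute amounts per 50 μl reaction. The following is a minimal arithmetic sketch, not part of the original protocol; the helper name `amount_pmol` is hypothetical, and the only inputs are the concentrations and volume given above.

```python
def amount_pmol(concentration_uM, volume_ul):
    """Amount of substance in pmol, using the identity 1 uM x 1 ul = 1 pmol."""
    return concentration_uM * volume_ul

REACTION_VOLUME_UL = 50  # total PCR volume given in the text

primer_pmol = amount_pmol(0.1, REACTION_VOLUME_UL)  # each primer at 0.1 uM
dntp_pmol = amount_pmol(200, REACTION_VOLUME_UL)    # each dNTP at 200 uM

print(f"each primer: {primer_pmol:g} pmol")
print(f"each dNTP:   {dntp_pmol / 1000:g} nmol")
```

Under these figures each reaction contains about 5 pmol of each primer and about 10 nmol of each dNTP.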

Table 1 mtDNA amplification primers

Amplification was performed on a Progene thermocycler (Techne, Cambridge, UK) using the following conditions: denaturation at 95°C for 2 min and then 30 cycles with denaturation at 95°C for 1 min, annealing at 55°C for 1 min and extension at 72°C for 1 min. The program was finished with elongation at 72°C for 7 min.

Successfully amplified PCR products were purified with a QIAquick spin PCR purification kit (QIAGEN, Hilden, Germany).

Sequencing and electrophoresis

Sequencing reactions were carried out using the Big Dye Terminator Sequencing kit (PE/Applied Biosystems, Foster City, CA) on a Progene thermocycler (Techne, Cambridge, UK). Each reaction contained 50–200 ng of PCR product and 10 pmol of one of the sequencing primers (Table 2) (Generi Biotech, Hradec Kralove, Czech Republic); together, the primers covered the complete D-loop region. The reaction conditions were as follows: denaturation at 96°C for 30 s and then 25 cycles with denaturation at 96°C for 30 s, annealing at 50°C for 15 s and extension at 60°C for 4 min.

Table 2 mtDNA sequencing primers

The samples were precipitated and purified twice with isopropanol, resuspended in template suppression reagent (PE/Applied Biosystems, Foster City, CA), denatured and run on an ABI Prism 3100 Avant automated sequencer (PE/Applied Biosystems, Foster City, CA) at a constant voltage of 12.2 kV for 180 min.

Data analysis

Analysis of sequences was carried out using ABI Prism sequence software (PE/Applied Biosystems, Foster City, CA) and sequences were aligned using the BioEdit program [30]. The genetic characteristics of the D-loop were calculated according to Tajima [31] and Stoneking et al. [32]. Statistical analysis of the D-loop region was performed using the DNA SP v3 program [33].

Results and discussion

All sequences were manually checked in ABI Prism sequence software (PE/Applied Biosystems) and consensus sequences of the whole D-loop were created. Sequences were aligned with the BioEdit program [30] and compared with the reference sequence [16] (for the comparison table, see ESM). To make better use of all the data obtained, we introduce two new terms in this paper. The region spanned by 7S DNA during replication of the H-strand, situated between positions 16366 and 72, was named the 7S DNA spanned region (7S-SP), and the region between positions 341 and 574, which contains HV3, was called the HV3 extended region (HV3ex). All calculations were then made with the regions HV1, 7S-SP, HV2 and HV3ex.
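The partition of the D-loop into the four regions can be written as a simple position lookup. The sketch below is illustrative only: the function name `region_of` is hypothetical, and the genome length of 16569 bp (the length of the reference sequence numbering) is an assumption not stated in the text. Because 7S-SP crosses the origin of the circular numbering, it is stored as two linear intervals.

```python
# Region boundaries as given in the text, in reference-sequence numbering.
# GENOME_LENGTH is an assumption (length of the reference numbering system).
GENOME_LENGTH = 16569

REGIONS = {
    "HV1": [(16024, 16365)],
    "7S-SP": [(16366, GENOME_LENGTH), (1, 72)],  # wraps around the origin
    "HV2": [(73, 340)],
    "HV3ex": [(341, 574)],
}

def region_of(position):
    """Return the name of the region containing `position`, or None if outside."""
    for name, intervals in REGIONS.items():
        for start, end in intervals:
            if start <= position <= end:
                return name
    return None

for pos in (16519, 263, 315, 523, 16189):
    print(pos, region_of(pos))
```

With these boundaries, position 16519 falls in 7S-SP, positions 263 and 315 in HV2, position 523 in HV3ex and position 16189 in HV1.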

Comparison of the sequences of the complete D-loop region revealed 85 haplotypes among the 93 individuals (91.4%). Of these, 78 haplotypes were observed only once in the Czech population (83.9% of individuals), 6 were observed twice (12.9%) and 1 was observed 3 times (3.2%). These results differ slightly from those of Lutz et al. [20], where 95% had different haplotypes and 93% were unique sequences.
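The counting behind these figures can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study; `frequency_spectrum` and the toy haplotype strings are hypothetical, while the `reported` spectrum is taken from the numbers above.

```python
from collections import Counter

def frequency_spectrum(haplotypes):
    """Map 'number of occurrences' to 'number of distinct haplotypes seen that often'."""
    occurrences = Counter(haplotypes)      # haplotype -> how many individuals carry it
    return Counter(occurrences.values())   # occurrence count -> number of haplotypes

# Hypothetical toy input: one haplotype string per individual.
toy = ["16519C 263G 315.1C", "263G 315.1C", "16519C 263G 315.1C", "73G 263G 315.1C"]
print(dict(frequency_spectrum(toy)))

# Consistency check of the spectrum reported in the text.
reported = {1: 78, 2: 6, 3: 1}
n_haplotypes = sum(reported.values())                    # 78 + 6 + 1 = 85
n_individuals = sum(k * v for k, v in reported.items())  # 78 + 12 + 3 = 93
print(n_haplotypes, n_individuals, round(100 * n_haplotypes / n_individuals, 1))
```

The reported spectrum accounts for all 93 individuals and gives 85 distinct haplotypes, in line with the 91.4% stated above.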

The most abundant haplotype (16519C, 263G, 315.1C) in comparison to Anderson et al. [16] was observed in 3.2% of samples, which is also the typical haplotype found by Lutz et al. [20]. When we focused only on the regions HV1 and HV2, type 263G, 315.1C was observed in 4.3% of cases and is also the typical haplotype found in other Caucasian populations (US Caucasians 4.3% [22], Spanish 5% [23], Swiss 2.6% [24], Austrians 3% [25], Germans 2% [34], British 4.5% [26], French 4% [27]).

Based on the observed frequencies of mitochondrial D-loop haplotypes and on the data from each single area of this region, i.e. HV1, 7S-SP, HV2, HV3ex and HV1 together with HV2, genetic diversity (GD) [31] and random match probability (RMP) [32] were calculated (Table 3). The results for the complete D-loop and for HV1 together with HV2 are in good agreement with values obtained by other authors for large populations [20, 22, 35], while data for small isolated populations exhibited lower values of genetic diversity and higher values of random match probability [36]. Surprisingly, the HV3ex region exhibited a higher RMP and a lower GD than one would expect, i.e. it showed lower variability than other authors observed even when they used only the HV3 region in their calculations [37].
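Both quantities can be computed directly from haplotype counts. The sketch below assumes the commonly used forms of these estimators, namely random match probability as the sum of squared haplotype frequencies and genetic diversity as n/(n − 1) times one minus that sum; whether these are exactly the formulas applied for Table 3 is an assumption. The function names are hypothetical, and the counts are reconstructed from the haplotype spectrum reported above for the complete D-loop.

```python
def random_match_probability(counts):
    """Sum of squared haplotype frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def genetic_diversity(counts):
    """Unbiased estimate: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = sum(counts)
    return n / (n - 1) * (1 - random_match_probability(counts))

# Counts reconstructed from the text: 78 haplotypes seen once,
# 6 seen twice and 1 seen three times (93 individuals in total).
counts = [1] * 78 + [2] * 6 + [3]

print(round(random_match_probability(counts), 4))
print(round(genetic_diversity(counts), 4))
```

For this spectrum the sketch gives a random match probability of roughly 0.013 and a genetic diversity of roughly 0.998. These are illustrative recalculations only; the values in Table 3 remain the authoritative ones.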

Table 3 Genetic diversity, random match probability and average number of nucleotide differences in HV1, 7S-SP, HV2, HV3ex, HV1 and HV2 together a and D-loop region of mtDNA of 93 Czech Caucasians

The degree of polymorphism within the Czech population, i.e. the average number of nucleotide differences between individuals for each region and for the complete D-loop region (Table 4), is not markedly different from that found in studies of other European Caucasian populations such as the French, British and German [22].
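The average number of nucleotide differences is the mean, over all pairs of individuals, of the number of sites at which two aligned sequences differ. A minimal sketch follows, assuming equal-length aligned sequences and treating every character (including a gap symbol) as a state; the function name and the toy alignment are hypothetical and do not represent the study data.

```python
from itertools import combinations

def mean_pairwise_differences(sequences):
    """Average number of differing sites over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)

# Hypothetical toy alignment of three individuals.
toy_alignment = ["ACGTAC", "ACGTAT", "ATGTAC"]
print(mean_pairwise_differences(toy_alignment))
```

In the toy alignment the three pairs differ at 1, 1 and 2 sites, so the mean is 4/3.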

Table 4 Average number of nucleotide differences in HV1, 7S-SP, HV2, HV3ex, HV1 and HV2 together a and D-loop region of mtDNA of 93 Czech Caucasians

Table 5 displays the number of differences in the complete D-loop and separately in the HV1, 7S-SP, HV2 and HV3ex regions of the Czech population data set in comparison with the Anderson et al. reference sequence [16]. Substitution, deletion and insertion mutations were found at 167 positions in the whole D-loop region. Substitutions were observed at 151 positions, with a total of 645 differences. Of these, 96.6% were transitions, mainly T to C and C to T (63.3%), which is in accordance with other studies even though they analyzed only the HV1 and HV2 regions [22, 23, 24, 25, 26, 27]. Transversions were very rare and randomly spread over the whole region, with a slight predominance of A to C and A to T mutations (Table 5 and ESM).
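The distinction between transitions and transversions depends only on the chemical class of the two bases: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a change between the classes is a transversion. A short sketch, with a hypothetical function name, makes the rule explicit.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a substitution between two different bases: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

for ref, alt in [("T", "C"), ("C", "T"), ("A", "C"), ("A", "T")]:
    print(f"{ref} to {alt}: {substitution_type(ref, alt)}")
```

Applied to the changes named above, T to C and C to T are classified as transitions, whereas A to C and A to T are transversions.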

Table 5 Observed nucleotide substitution, insertion, deletion and heteroplasmic events found in HV1, 7S-SP, HV2, HV3ex, HV1 and HV2 together a and D-loop region of mtDNA of 93 Czech Caucasians in comparison to reference Anderson et al. sequence [16]

Most of the substitutions were distributed randomly, but several sites exhibited, as in other population studies [22, 23, 24, 25, 26, 27], a higher percentage of changes in comparison to the reference sequence [16]. These are, in particular, the already mentioned sites 16519C, which was observed in 48.3% of samples, and 263G, observed in 92.4%, as well as 16126C, observed in 24.7%, 73G, observed in 59.1%, and 195C, observed in 21.5%. All exhibited more than 20% change in comparison to the Anderson et al. reference sequence [16] (for more details see ESM).

Deletions were found at three positions, all in the HV3ex region. In the CA repeats, a dinucleotide deletion at positions 522 and 523 was found in 9 cases. Additionally, the deletion of a C at position 527 was observed in 1 sample (Table 5 and ESM).

Both types of nucleotide insertion, i.e. heteroplasmic and homoplasmic, were found, at 11 positions in total. The frequency of the homoplasmic type is relatively high, and it occurs in the region of the CA repeats in HV3ex. One CA dinucleotide was added at positions 523.1 and 523.2 in 6 cases, and 2 CA dinucleotides were added at positions 523.1–523.4 in 2 cases. Another insertion, of a T at position 455.1 and a C at position 455.2, was observed in 1 sample (Table 5 and ESM). However, the main area where insertions occurred is the poly-C tract called the C-stretch, between positions 303 and 315 in HV2, where a C was added at position 315.1 in 94.6% and at position 309.1 in 53.8% of all samples (ESM). Similar results for C insertions in this region were also observed in other populations [25, 34]. This type of variability is relatively often connected with another interesting phenomenon called length heteroplasmy, in which populations of mitochondrial haplotypes differing in length due to the insertion of nucleotides are found within one individual. Length heteroplasmy patterns between positions 303 and 315, similar to the data published by Parson et al. [25], were observed in 21 (22.5%) cases (Table 5 and ESM). Another length heteroplasmy can occur in the poly-C tract in HV1 due to a transition of T to C at position 16189 and the addition of C between positions 16184 and 16193. This was observed in 8 (8.6%) of our samples, in agreement with other authors, i.e. in all samples in which a change at position 16189 occurred together with no substitution at position 16186 [25, 38] (Table 5 and ESM).
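The labels used here, such as 16519C or 315.1C, encode a position, an optional insertion index after the decimal point, and the observed base; 315.1C denotes a C inserted as the first extra base after position 315. The sketch below parses only these two label forms. The function name and the returned tuple layout are hypothetical, and labels without a base (for example a bare position such as 523.1) or deletions are deliberately not handled.

```python
import re

# position, optional ".n" insertion index, observed base
VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT])$")

def parse_variant(label):
    """Return (position, insertion_index, base); insertion_index is 0 for substitutions."""
    match = VARIANT.match(label)
    if match is None:
        raise ValueError(f"unrecognised variant label: {label!r}")
    position = int(match.group(1))
    insertion_index = int(match.group(2)) if match.group(2) else 0
    return position, insertion_index, match.group(3)

for label in ("16519C", "263G", "315.1C", "309.1C"):
    position, index, base = parse_variant(label)
    kind = "insertion" if index else "substitution"
    print(f"{label}: position {position}, {kind}, base {base}")
```

Keeping the insertion index separate from the position lets insertions such as 315.1C and 309.1C be tallied independently of substitutions at the same reference position.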

Besides length heteroplasmy, two position heteroplasmies in HV1 at position 16155A/T in one sample and at position 16120A/C in a second sample were found (Table 5 and ESM).

Extreme care should be taken in the determination of heteroplasmy because, for example, different PCR strategies or sequencing artifacts can lead to different results [39]. In such cases, a new DNA extraction and reading of both the forward and reverse sequencing reactions should be considered. Alternatively, an error detection method based on phylogenetic analysis [40], which covers not only errors leading to false determination of heteroplasmy but also other types of errors in mtDNA data, could be used to minimize possible errors.

In conclusion, all these results and values, i.e. a sufficiently high genetic diversity, a sufficiently small RMP and a relatively high intrapopulation diversity, indicate that our data are relatively well suited for application to forensic casework and contribute to a better definition of continental and subcontinental distributions of mtDNA types [41]. Thus, they can be successfully applied especially to cases in which highly decomposed specimens are analyzed and typing with nuclear markers fails.