Introduction

The mitochondrial genome (mtDNA, mitogenome) is particularly suitable in forensic casework when STR profiling cannot be performed due to the degraded and/or scarce nuclear DNA [1, 2]. Given its matrilineal inheritance, individuals may be identified along maternal lineage across many generations [3] and that enables, for instance, testing for putative exclusion of individuals during the person identification [4]. Traditionally, mtDNA typing is based on ~ 600 bp of the hypervariable segments I and II (HVS-I and HVS-II) of the control region (CR, ~ 1100 bp) (e.g., [2, 5, 6]). However, the information available in these segments sometimes does not have sufficient discriminatory power to resolve distinct maternal lineages, and that has led to extensions of the mtDNA analyses in some cases to the third hypervariable segment, HVS-III, or to the entire CR [7,8,9]. Further extension to the complete mtDNA genome, generated today by Sanger or next generation sequencing, enables usage of sequence variation in the coding region as well, and thus analysis at the maximum resolution [10,11,12].

An essential aspect of forensic casework is the interpretation of obtained mtDNA profiles, their comparison against a reference database for a possible match, and estimation of their frequency in the relevant population sample [13, 14]. The EMPOP database (https://empop.online) has been developed specifically for forensic applications and currently comprises 48,572 quality-controlled mitotypes [15]. The number of complete mitogenomes in EMPOP, however, is still insufficient [16]. Given the continuous advancement in sequencing technologies, the wider use of complete mtDNA typing in forensic casework appears to be limited more by the lack of an appropriate database than by the costs of analysis, which are decreasing over time [16].

The Slavs, which constitute more than one third of Europeans and inhabit a rather large part of the continent [17], are an ethnolinguistic group with a complex history which has been studied at the molecular level as well (e.g., [18,19,20,21,22,23,24,25]). They are characterized by a marked geographical stratification to South-, East-, and West Slavic-speaking groups. South Slavs inhabit the Balkan Peninsula, a region characterized by exceptionally turbulent historic, demographic, and other processes during the entire history of humans. It served, in addition, as a starting point for the postglacial recolonization of Europe [24, 26, 27] and for human migrations in later periods. However, the representation of the mtDNA diversity of South Slavs in the EMPOP database is rather limited since high-quality HVS-I and HVS-II haplotypes found in individuals from five populations are only available: Bosnia and Herzegovina [28], Bulgaria [29], Northern Macedonia [30], Serbia [18], and Slovenia [28, 31]. On the other hand, complete mitogenomes of Russians and Poles, representatives of East and West Slavs, respectively, have recently been included into this database [23]. Since the enrichment and refinement of the complete mtDNA reference database contributes towards the improvement of the worldwide mtDNA phylogeny, which is essential for the interpretation of the mtDNA casework [32], our aim was to depict complete mtDNA diversity of Serbians, representatives of South Slavs.

Materials and methods

Population samples

We analyzed complete mitochondrial genomes of 226 maternally unrelated individuals from the general population of Serbia, of which 170 were sequenced de novo and 56 were taken from our previous work (GenBank accession numbers KT697997–KT698032, KM096761–KM096763, and KM096765–KM096781) [18, 33]. For comparative purposes, we used 3145 complete mitogenomes originating from ten Eurasian populations. All mitogenomes belonging to one population were pooled together and used as a representative sample of that population. In that way, we included into our analysis complete mitogenomes of 1998 Danes [34], 53 French [35], 116 Poles [23, 36], 401 Russians [23, 35], 118 Tuscans [35, 37], 80 Hungarians [36], 114 Estonians [38], 101 Spaniards [39], 91 Sardinians [35, 40], and 73 Volga Tatars [41].

Samples (saliva or blood) were collected with written informed consent from all individual participants included in the study, and genealogical information regarding the place of birth of participant’s maternal grandparents was obtained. The Serbian ancestry was confirmed for at least two generations, and, according to the geographical origin of donor’s maternal grandmothers, they were classified into the following regions of Serbia: Belgrade (N = 16), Central Serbia (N = 36), Eastern Serbia (N = 19), Kosovo and Metohija (N = 11), Southern Serbia (N = 39), Vojvodina (N = 17), and Western Serbia (N = 49), or to the neighboring countries: Bosnia and Herzegovina (N = 13), Croatia (N = 18), Northern Macedonia (N = 1), and Montenegro (N = 7).

The study was approved by the Research Ethics Committees of the Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, and Institute of Biological Problems of the North, Russian Academy of Sciences.

DNA extraction and sequencing of complete mtDNA genomes

High-quality genomic DNA was obtained from saliva (163 samples) according to Quinque et al. [42] or from blood (63 samples) using QIAamp® DNA Blood Mini Kit (QIAGEN GmbH, Hilden, Germany) in the Laboratory for Human Molecular Genetics at the Institute of Molecular Genetics and Genetic Engineering, University of Belgrade. Sequencing of complete mtDNA genomes was carried out in the Genetics Laboratory at the Institute of Biological Problems of the North, Russian Academy of Sciences, Magadan, Russia (saliva samples), and in the Division of Molecular and Forensic Genetics, Department of Forensic Medicine, Ludwik Rydygier Collegium Medicum, Nicolaus Copernicus University, Torun, Poland (blood samples). Sequencing was performed using ABI3500xL and ABI3130xL Genetic Analyzers, according to the methods described in Torroni et al. [43] and Fendt et al. [44]. As recommended by Just et al. [16], all cases of point heteroplasmy were confirmed through additional independent PCRs and sequencing reactions.

De novo sequenced mitogenomes from Serbian population were deposited in GenBank under accession numbers MK134267-MK134373 and MK617218-MK617280. The complete dataset of 226 Serbian mitogenomes went through the EMPOP quality check [15] and was made available for forensic searches in the EMPOP database (www.empop.org) under the accession number EMP00739.

Haplogroup estimation

Generated sequences were compared with the revised Cambridge reference sequence (rCRS) [45]; haplotypes were defined with the HaploSearch software [46], and sequence alignment was performed following the phylogenetic concept and the recommendations of the International Society for Forensic Genetics (ISFG) [6, 47, 48]. SAM2 [49], implemented in EMPOP [15], was used for alignment and haplogroup estimation according to PhyloTree build 17 [50]. Along with the assignment of haplotypes to the most recent common ancestor (MRCA) by SAM2, which is a conservative estimate suitable for forensics, we also performed haplotype assignment through reconstruction of most-parsimonious trees using the mtPhyl v4.015 software (http://eltsov.org). Since this version of the program does not use the updated mtDNA phylogeny available in PhyloTree build 17, the trees were modified manually according to PhyloTree build 17 and the following literature [18, 23, 25, 33, 36, 51,52,53,54,55]. New subclades were defined when they comprised at least two different mitogenomes sharing at least one mutation that is not characterized as a hotspot [32]. The length variations at nps 303-315, 522-524, 573-576, and 16180-16193, as well as the polymorphism at np 16519 and A-C transversions at nps 16182 and 16183, were excluded from the phylogenetic analysis.
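The exclusion rule described above can be sketched as a simple variant filter. This is only an illustration, not the procedure implemented in mtPhyl or SAM2: the variant notation ('16519C' for a substitution, '315.1C' for an insertion, '523del' for a deletion) and the function names are assumptions made for the example.

```python
import re

# Regions whose length variation (insertions/deletions) is ignored, as listed in the text.
LENGTH_VARIANT_RANGES = [(303, 315), (522, 524), (573, 576), (16180, 16193)]


def is_excluded(variant: str) -> bool:
    """Return True if a variant is left out of the phylogenetic analysis.

    Assumed (hypothetical) notation: '16519C' substitution,
    '315.1C' insertion, '523del' deletion.
    """
    match = re.match(r"^(\d+)(.*)$", variant)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    position = int(match.group(1))
    suffix = match.group(2)

    # Length variants inside the listed ranges are excluded.
    is_indel = suffix.startswith(".") or suffix.lower() == "del"
    if is_indel and any(lo <= position <= hi for lo, hi in LENGTH_VARIANT_RANGES):
        return True
    # Any polymorphism at np 16519 is excluded.
    if position == 16519:
        return True
    # The rCRS carries A at nps 16182 and 16183, so a C here is an A-C transversion.
    if position in (16182, 16183) and suffix.upper() == "C":
        return True
    return False


def filter_haplotype(variants):
    """Keep only the variants that enter the phylogenetic analysis."""
    return [v for v in variants if not is_excluded(v)]


example = ["263G", "315.1C", "523del", "524del", "750G", "16183C", "16189C", "16519C"]
print(filter_haplotype(example))
```

In this sketch, a substitution inside one of the listed ranges (such as 16189C) is retained, because the text excludes only length variation there.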

Statistical analysis

Basic parameters of genetic diversity in Serbian and other studied populations (number of haplotypes, H; number of polymorphic sites, P; haplotype diversity, HD; nucleotide diversity, ND; and mean pairwise difference, MPD) were assessed with Arlequin 3.5.2.2 [56]. The same tool was used for assessing genetic differentiation among populations via the analysis of molecular variance (AMOVA) and for estimating pairwise population and overall FST values. Point heteroplasmies were treated as differences, indels were excluded, and statistical significance of all tests was assessed with 10,000 permutations. The matrix of pairwise population FST values was visualized by multidimensional scaling (MDS) in two dimensions using STATISTICA 10 (StatSoft Inc., Tulsa, OK, USA).
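For orientation, two of these indices can be written out in a few lines. The sketch below is not Arlequin's implementation. It assumes aligned sequences of equal length with indel columns already removed, uses the standard unbiased estimator HD = n/(n − 1) × (1 − Σp²), and counts any character mismatch as a difference, so that an IUPAC heteroplasmy code such as Y differs from C. The toy sequences are invented for the example.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_pairwise_difference(sequences):
    """Average number of mismatching positions over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


# Toy alignment (hypothetical data, four individuals, four sites).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(haplotype_diversity(toy))
print(mean_pairwise_difference(toy))
```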

The probability that two randomly selected individuals from a population share the identical mtDNA haplotype, i.e., the random match probability (RMP), was calculated according to Stoneking et al. [57].
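A minimal sketch of this calculation follows, assuming the usual formulation in which RMP is the sum of squared haplotype frequencies in the sample. The function name and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies observed in the sample."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


# Toy sample (hypothetical): five individuals, one haplotype seen twice.
toy_sample = ["h1", "h1", "h2", "h3", "h4"]
print(random_match_probability(toy_sample))
```

In the toy sample, the shared haplotype contributes (2/5)² and each of the three singletons contributes (1/5)², so shared haplotypes raise the RMP much more than singletons do.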

Tajima’s D test implemented in Arlequin was performed to detect departures from population equilibrium. The significance of the test statistics was assessed with 10,000 bootstrap replicates. DnaSP 5.10.01 software [58] was used to assess the demographic history of populations by a mismatch distribution analysis.
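The test statistic can be illustrated with the standard formula of Tajima (1989), which compares the mean pairwise difference with the number of segregating sites scaled by the sample size. The sketch below is not Arlequin's code and omits the significance test. The input values in the usage line are hypothetical.

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, number of segregating sites S,
    and mean number of pairwise differences."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = segregating_sites
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Hypothetical inputs: 10 sequences, 16 segregating sites, mean pairwise difference 3.0.
print(tajimas_d(10, 16, 3.0))
```

A negative value arises when the mean pairwise difference is smaller than expected from the number of segregating sites, that is, when there is an excess of rare variants, the pattern usually read as a signal of population expansion.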

Demographic changes over time were assessed by Bayesian skyline plot (BSP) analysis carried out with BEAST 1.10.4 [59]. We used the HKY+G+I model of sequence evolution, which had the best fit to our dataset as inferred from the Akaike information criterion (AIC) value calculated in MEGA 6.06 [60]. We used a strict molecular clock, which, according to the ucld.stdev parameter, could not be rejected for our dataset, and a mutation rate of 1.665 × 10⁻⁸ [61]. All parameters were sampled once every 1000 steps over 150 million Markov chain Monte Carlo (MCMC) steps. Tracer v.1.7.1 [61, 62] was used to assess acceptable mixing, likelihood stationarity of the MCMC chain, and adequate effective sample sizes for each parameter (≥ 200).

Results

mtDNA haplogroup assignment

SAM2 classified 226 complete Serbian mitogenomes into 143 different mtDNA (sub)haplogroups (Table S1) following the nomenclature of PhyloTree build 17, the widely accepted phylogenetic tree of human mtDNA lineages [50]. As expected, Serbian maternal gene pool was composed mainly of West Eurasian (sub)haplogroups: the prevalent haplogroup was H (47.78%), followed by U (16.37%), J (8.85%), K (6.19%), and T (5.31%). Less frequent lineages found in Serbians comprised N1 (including I) (3.54%), W (3.1%), HV (excluding H and V) (3.1%), V (2.2%), X2 (1.32%), and R0a (0.44%). In addition, two mitogenomes belonged to East Asian haplogroup D4j (0.88%) and one to the “European” branch of African haplogroup L2a1k (0.44%).

We subsequently compared the results of haplogroup assignment obtained with SAM2 and mtPhyl. In this respect, it is worth noting that both tools provide haplogroup estimates using the PhyloTree build 17; they employ, however, different approaches: SAM2 does not rely on strict decision trees but rather on variation and fluctuation rates for all sites and regions observed in substantial number of confirmed haplotypes, while mtPhyl uses maximum parsimony for haplogrouping. The assignment of haplogroups by SAM2 and mtPhyl was generally concordant (Table S1). Nevertheless, in a few cases where multiple haplogroup assignments were feasible by SAM2, haplogrouping by mtPhyl corresponded to one of them but not to the most conservative estimate (MRCA) (Table S1, samples 20_Sb, 69_Sb, 73_Sb, 86_Sb, 92_Sb, 143_Sb, 223_Sb, and 252_Sb). For example, mtPhyl assigned the haplotype 20_Sb to haplogroup H1c9′9 (H1c + 152) whereas SAM2 assigned it to MRCA H1c.
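What reporting the MRCA means when several haplogroups are equally feasible can be sketched with a toy tree. This does not reproduce how SAM2 ranks candidates. The parent map below is a small hypothetical excerpt used only for illustration, and the function names are assumptions.

```python
def path_to_root(haplogroup, parent):
    """List a haplogroup followed by its ancestors up to the root of the map."""
    path = [haplogroup]
    while haplogroup in parent:
        haplogroup = parent[haplogroup]
        path.append(haplogroup)
    return path


def mrca(candidates, parent):
    """Most recent common ancestor of all candidate haplogroups."""
    paths = [path_to_root(c, parent) for c in candidates]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walk from the first candidate towards the root; the first shared node is the MRCA.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("candidates share no ancestor in this tree")


# Toy child -> parent map (hypothetical excerpt, not the PhyloTree structure).
toy_parent = {"H1": "H", "H1c": "H1", "H1c3": "H1c", "H1c9": "H1c"}
print(mrca(["H1c3", "H1c9"], toy_parent))
```

A parsimony-based tool may pick one of the candidate tips, whereas the conservative estimate reports their shared ancestor, which is the kind of difference described above.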

In addition, haplogroups not covered by the widely accepted PhyloTree build 17 were further refined with mtPhyl by updating the phylogeny according to new findings from the present study as well as those in the available literature [18, 23, 25, 33, 36, 51,52,53,54,55] (Fig. S1). As a result, 22 haplotypes were assigned to 10 new subclades, namely H1cm, H5a10, H7j, H13d, H15b3, H15b3a, H30a1, H110, V6a, and W3b2, not defined previously in PhyloTree build 17 or the available literature (Fig. S1 and Table S1). Subclades that comprise haplotypes from the Serbian population and were defined previously in other publications but not in PhyloTree build 17 include, for instance, H1c23a, H24a3 [23], H8b1b [52], X2q1a [25], U7b5 [55], U8a1a1a2, and U4d2b [33].

mtDNA diversity in Serbian population

The parameters describing mtDNA diversity in Serbians and other studied European populations are shown in Table 1. Of the 226 Serbian mitogenomes, 192 (85.0%) carried a unique haplotype, while the remaining 34 (15.0%) were distributed among 15 haplotypes observed more than once (Table S2).

Table 1 Parameters of genetic diversity in studied populations based on complete mitogenome sequences

Point heteroplasmies (PHP) were observed in 9 Serbian individuals (3.98%) as single occurrences (Table S3). They were all transitions, of which three were present in the control region and six were in the coding region of mtDNA (Table S3). Three PHPs recorded in Serbian mitogenomes were found in EMPOP database (198Y, 8843Y, and 15466Y).

Length heteroplasmy (LHP) was observed in more than 50% of the samples, with 27 mitogenomes (11.95%) showing more than one LHP (Table S4). The majority of length variants were found in the polycytosine stretches of HVS-II (48.2%) and HVS-I (12.8%), while LHP of the C-stretch at np 573 in HVS-III was observed in six instances (2.65%). In the coding region, LHP was observed at position 965 in three mitogenomes (1.3%). Upon inclusion of indel polymorphisms in the analysis, the number of identical haplotypes in the sample was reduced to 12, slightly improving the resolution of Serbian mtDNA haplotypes (87.6% with indels vs. 85.0% without indels).

The RMP for mitogenomic data of Serbians was 0.53%. The lowest RMP value was observed in Danes (0.06%), represented by 1998 mitogenomes, and the highest in French (1.89%), represented only by 53 complete mtDNA genomes (Table 1).
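The dependence of these values on sample size can be checked with a short calculation. With n sampled individuals, the sum of squared haplotype frequencies cannot fall below 1/n, a floor reached only when every individual carries a distinct haplotype. The sketch below, assuming RMP is computed as that sum, evaluates the floor for three of the sample sizes used in this study. The function name is hypothetical.

```python
def rmp_floor(sample_size):
    """Smallest possible sum of squared haplotype frequencies for n individuals
    (all haplotypes distinct, each with frequency 1/n)."""
    return 1.0 / sample_size


# Sample sizes as reported in the Population samples section.
for name, n in [("Danes", 1998), ("Serbians", 226), ("French", 53)]:
    print(f"{name}: n = {n}, minimum RMP = {100 * rmp_floor(n):.2f}%")
```

The floor for n = 53 equals the 1.89% reported for the French sample, which would be consistent with all 53 French mitogenomes being distinct. The Serbian value of 0.53% lies above its floor of 0.44% because some haplotypes are shared.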

mtDNA pairwise nucleotide differences in Serbian population

The mean number of pairwise differences (MPD) in the Serbian population, 27.82, was within the range observed in other European populations (Table 1). The lowest MPD value (27.00) was detected in the Sardinian population and the highest (35.18) in Volga Tatars (Table 1), as already shown by Malyarchuk et al. [23].

Negative and statistically significant Tajima’s D values observed in Serbian and other studied European populations indicated their recent expansions. However, all populations were characterized by a bimodal distribution of pairwise nucleotide differences (data not shown). In Serbian population, the number of pairwise differences ranged from 0 to 75, and two peaks had modes at 12 and 32 differences (Fig. S2).
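A mismatch distribution of this kind is simply the histogram of differences over all pairs of sequences. The sketch below is a plain illustration rather than the DnaSP procedure. It assumes aligned sequences of equal length, and the toy sequences are invented so that two clusters of related haplotypes produce two peaks.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Map each number of pairwise differences to the number of sequence pairs showing it."""
    diffs = (
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    return dict(sorted(Counter(diffs).items()))


# Toy alignment (hypothetical): two groups of closely related sequences.
toy = ["AAAAAA", "AAAAAT", "AAAATT", "TTTAAA", "TTTAAT"]
print(mismatch_distribution(toy))
```

In the toy data, comparisons within a group pile up at small differences and comparisons between groups at larger ones, which is one way a bimodal distribution can arise.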

Differentiation of European populations based on complete mtDNAs

AMOVA revealed low but statistically significant genetic differentiation among populations (FST = 0.74%, p = 0, Table S5). The pairwise FST values between the Serbian and other studied populations were generally low, but they were significant for five populations (Danes, Poles, Russians, Sardinians, and Volga Tatars) (Table S6). In the MDS plot constructed from pairwise population FST values, the Sardinian and Volga Tatar populations were outliers, while two groups of genetically close populations were observed among the other European populations (Fig. 1). One group comprised Romance-speaking populations from Western and Southwestern Europe (Tuscan, Spanish, and French), and the second comprised populations from Central and Eastern Europe (Hungarian, Polish, and Estonian). The Serbian and Russian populations were positioned between these two groups.

Fig. 1

Multidimensional scaling plot of FST distances between Serbian population and selected European populations based on complete mtDNA sequences (stress value 0.00031). Population pairwise FST values are presented in Table S6

Bayesian skyline plot in Serbian population

BSP analysis of 226 Serbian mitogenomes was performed to detect changes in population size over time (Fig. 2). A population expansion until 45.8 kya (95% CI 41.9–47.4 kya) was followed by a more stable population size with a slow and continuous decrease. The lowest population size was observed during the Last Glacial Maximum (LGM), around 23.7 kya (95% CI 22.9–24.5 kya), and was followed by steady growth until 7.1 kya (95% CI 7.1–8.7 kya), when the highest population size was recorded. This time frame corresponds to the post-LGM expansion, which was followed by a decline in population size until 1.6 kya (95% CI 1.6–2.4 kya), when a local minimum was reached. The subsequent growth of the population size coincides with the Migration period that occurred during the Early Middle Ages (fourth to ninth century A.D.).

Fig. 2

BSP indicating the median of the hypothetical effective population size through time based on 226 Serbian complete mitogenome sequences. The x-axis is the time from the present in units of thousands of years, and the y-axis is equal to Neμ (the product of the effective population size and mutation rate). The thick solid line represents the median posterior effective population size through time, and the thin lines show the 95% highest posterior density limits

Discussion

Mitochondrial genome variability, especially that present in hypervariable regions of the control region, has been successfully used in forensics for more than 30 years [1, 2]. Nowadays, a trend towards the usage of variability of complete mitogenomes is evident in the forensic community because that enables a maximum increase in discriminatory power of this molecular tool [12, 16] thus providing ultimate resolution of distinct maternal lineages. However, the wider employment of complete mitogenomes in forensic casework is currently limited by the lack of appropriate reference database [16]. For instance, only mitogenomes found in East Slavs (Russians) and West Slavs (Poles) are currently available in the EMPOP database [23] despite the fact that Slavs are an important component of the extant European population. In order to overcome this issue and to contribute to a better representation of the Slavic-speaking populations in the EMPOP database, we present complete mitogenome data for 226 Serbians, representatives of South Slavs, which is now available in this centrally curated database essential for the forensic casework.

Complete Serbian mitogenomes were classified into 143 different mtDNA (sub)haplogroups of predominantly West Eurasian origin, with East Asian (D4j) and African (L2a1k) haplogroups observed at a low frequency. It is worth mentioning that although haplogroup assignment obtained with mtPhyl and SAM2 was generally concordant, we detected some differences in haplogrouping. In a few cases, mtPhyl haplogroup assignment matched one of the multiple possible haplogroups provided by SAM2 instead of MRCA which represents conservative estimate that should preferably be used in order to avoid bias in forensic casework. On the other hand, we took advantage of the mtDNA data available in the literature [18, 23, 25, 33, 36, 51,52,53,54,55] which is not (yet) included by the PhyloTree, and assigned haplotypes found outside the PhyloTree build 17 using mtPhyl to 10 new subhaplogroups or those previously proposed by us and other research groups.

The proportion of unique mitogenomes in the Serbian population (85%) was lower than that found in Polish and Russian populations [23], and the random match probability was as low as 0.53%. This value, along with those observed in Danes (0.06%) and in Russians (0.27%), was among the lowest in our study.

Both PHP and LHP, which are much more common in humans than previously thought and have been addressed in several recent forensic studies [16, 63, 64], were observed among Serbian mitogenomes. In contrast to the frequency of control region LHPs observed in Serbians, which was comparable with that reported in other surveys [16, 63], the frequency of PHPs was lower than that found in other human populations [16, 39]. Variation in the incidence of PHPs among human populations has, however, been observed previously [16, 63]. Moreover, PHP detection certainly depends on the sequencing chemistries and electrophoretic separation systems used by different laboratories [48]. It is also worth mentioning that several studies showed that PHP occurrence is age-related, dependent on the tissue type, and position-specific [64,65,66]. Given the tissue-specific occurrence of heteroplasmies, it is justified from a forensic perspective to compare the distribution of heteroplasmic sites between tissues [67] in order to differentiate between exclusion, mixtures, and occurrence of tissue-specific differences within an individual [64]. In addition, while PHPs may increase the strength of the evidence in forensics [68], LHPs are usually disregarded in forensic databasing [48]. Nonetheless, we observed that the inclusion of indel polymorphisms may slightly improve the resolution of Serbian mtDNA haplotypes (87.6% with indels vs. 85.0% without indels).

The level of mtDNA genetic diversity in the Serbian population obtained in our study was higher than that observed in a previous survey based on the variability of the HVS-I region [18]. Furthermore, somewhat higher values of among-population variation were detected in AMOVA based on complete mitogenomes compared with those based on HVS-I/HVS-II analysis (e.g., [7, 69, 70]). In comparison with the previous analyses based on HVS-I variability [18], our investigation at the maximum resolution did not lead to a notable increase in the FST value between the Serbian and Hungarian populations (FST of 0.00184 vs. 0.00178), whereas the values increased about 1.5-fold between the Serbian and Polish populations (FST of 0.00418 vs. 0.00279) and almost doubled between the Serbian and Russian populations (FST of 0.00357 vs. 0.00185). This may be due to the difference in the population sample sizes used for the analyses or, alternatively, to the increased resolution achieved by the analysis of complete mitogenomes. The higher genetic affinity between Serbians and Russians than between Serbians and Poles observed in our MDS plot is in accordance with previous findings based on HVS-I variability [18].

The mean number of pairwise differences in the Serbian population, 27.82, is within the range observed in other European populations (this study, [23]). A recent expansion of the Serbian and other studied populations, inferred from the negative and statistically significant Tajima’s D values, was not indicated by the mismatch distribution analysis, which revealed a bimodal distribution of pairwise nucleotide differences in all analyzed populations. However, a bimodal distribution may be obtained in cases of subdivided ancestral populations even in the context of exponential population growth [71]. On the other hand, it may also indicate the presence of two groups of pairwise distributions, with nucleotide differences between recently diverged lineages accounting for the first, smaller peak, and differences between more distantly related haplotypes accounting for the second, larger peak [16].

A recent expansion of the Serbian population was also indicated by the BSP analysis, which is very accurate in detecting relatively recent demographic changes [35, 72, 73]. Furthermore, based on this analysis, we were able to correlate the time frames of the main changes in population size to relevant prehistoric and historic events. For instance, it is well known that the climate change after the LGM triggered expansions of human populations throughout Eastern and Central Europe from southern refugia [23, 35, 72]. Post-LGM expansions have been demonstrated recently based on the analysis of complete mitogenomes of Russians and Poles [23], and the same trend was observed in our study based on complete mitogenomes of Serbians, with the maximum population size recorded around 7.1 kya, i.e., during the Neolithic agricultural transition [73]. Afterwards, during the Neolithic transition, we found a decline in the population size, which is in accordance with the documented decline of Neolithic cultures throughout Europe [74, 75]. Different mechanisms have been put forward to explain this Neolithic decline, such as environmental overexploitation (i.e., decrease or disappearance of forests associated with the expansion of the steppe environment) and/or a confrontation with foraging steppe populations [76, 77], or alternatively, outbreaks of infectious diseases that spread rapidly due to the increased population density and close contact with domesticated animals. New findings suggest that an outbreak of plague, caused by Yersinia pestis, may account for the collapse of different Neolithic societies throughout Europe [78]. Further analysis of the Neolithic archaeological remains from the territory of Serbia may shed more light on the mechanisms underlying the observed Neolithic decline.

BSP for Serbian mitogenomic data also shows the onset of a rapid population growth ~ 1.6 kya, i.e., during the Migration period (fourth to ninth century A.D.). During this turbulent period, many different populations, among which were different Germanic and Slavic tribes, went through or settled in the Balkans [79,80,81]. While the analysis of segments identical by descent revealed that these migrations have had a significant impact on the formation of the contemporary autosomal gene pool of populations that nowadays inhabit Southeast, Central, and East Europe [82], our findings suggest that they could have caused population growth in parts of the Balkan Peninsula as well. Furthermore, the importance of the migrations of different Germanic and Slavic tribes for the formation of the contemporary mtDNA gene pool of Serbians has been suggested previously, based on the presence of subclades shared among Serbian and different populations of Slavic and/or Germanic origin [18, 33].

In conclusion, we present complete mitogenome data for 226 Serbians of which 170 were sequenced de novo and 56 were taken from our previous work [18, 33]. Since recent studies that employed low-resolution mtDNA data and larger sample of European populations revealed that Serbian population occupies an intermediate position between eastward (Bulgarians and Macedonians) and westward South Slavic populations (Slovenians, Croatians, Bosnians, and Herzegovinians) inhabiting the Balkans [18, 21], we argue that the Serbian population, which is found in the central part of the Balkan Peninsula, may be used as a representative population for South Slavs. Although additional population datasets are needed to represent adequately high genetic diversity in this part of Europe, our data, as the first report on complete mitogenomes in South Slavs, may constitute a reference database of their complete mitogenomes which are of interest not only for forensics but also for studies focusing on evolutionary history of human populations. We also demonstrate that the usage of complete mitogenomes increased the power of discrimination among individuals, which is essential in the forensic casework, and that the enrichment of the complete mtDNA reference database with Serbian mitogenomes contributed to a certain extent towards the improvement of the worldwide mtDNA phylogeny, which is important for the interpretation of the mtDNA casework [32].