INTRODUCTION

Each individual carries a unique genome, characterized by a nonrepeating combination of alleles at many polymorphic DNA markers. The main method of forensic DNA identification is the analysis of autosomal short tandem repeats (STRs), which have a high level of allelic polymorphism. Genotyping 15–20 such loci is sufficiently informative to establish individual identity. Autosomal STR loci are analyzed with genotyping reagent kits, the main manufacturers of which are Thermo Fisher Scientific (United States) and Promega Corporation (United States). These kits are based on the current CODIS (Combined DNA Index System) marker panel, which was proposed in 2015 and approved for use in forensic DNA analysis in 2017 by the Federal Bureau of Investigation of the US Department of Justice. Although CODIS was developed and operates in the United States, many other countries, including the Republic of Belarus and the Russian Federation, rely on its standard for DNA identification of a person in forensic examination.

In 2017, the CODIS marker panel, which previously included 13 microsatellites, was expanded. The so-called core of the system now includes 20 autosomal STR markers: CSF1PO, FGA, TH01, TPOX, vWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, D1S1656, D2S441, D2S1338, D10S1248, D12S391, D19S433, D22S1045.

These changes made it necessary to update the reference databases that hold the information needed for the probabilistic and statistical assessment of expert findings. They also made it necessary to update information on the genetic differentiation of local populations for the markers of the extended CODIS system, and on the informativeness of the new core for DNA identification in a number of local populations of Eastern Europe.
Most publications devoted to this issue currently contain information on only 16 STR markers [1–14].

In this work, we characterized allele frequencies and the level of genetic differentiation of populations and assessed the identification potential of a system of 21 autosomal STR markers (D1S1656, D2S441, D2S1338, D3S1358, D5S818, D7S820, D8S1179, D10S1248, D12S391, D13S317, D16S539, D18S51, D21S11, D22S1045, CSF1PO, FGA, Penta D, Penta E, TH01, TPOX, vWA) in population samples of eight ethnic groups of Eastern Europe: Russians, Belarusians, Moldovans, Gagauz, Poles, Komi, Mari, and Ukrainians.

MATERIALS AND METHODS

The research material consisted of DNA samples from 15 local populations of eight peoples of Eastern Europe. The total number of samples was 1093 individuals. DNA samples were taken from unrelated individuals living in Russia (Russians, 205 samples; Komi, 83; Mari, 76), Moldova (Moldovans, 117; Gagauz, 57), and Belarus (Russians, 57; Belarusians, 413; Poles, 32; Ukrainians, 110). The Belarusian material comprises six samples from the historical and ethnographic regions of the republic: Eastern Polesie (N = 41), Western Polesie (N = 56), Central region (N = 135), Podneprovye (N = 80), Poozerye (N = 55), and Ponemanye (N = 46). The Komi are represented by two samples: Komi-Zyryans (N = 43) and Komi-Izhemtsy (N = 40).

Primary biological material (venous blood) was collected from donors with written informed consent for the study. For each donor, a questionnaire was completed recording genealogy, ethnicity, and the places of birth of ancestors. The study included only DNA samples from donors who, according to the questionnaire, reported no admixture with other ethnic groups for at least three generations. An individual was assigned to an ethnic group on the basis of self-reported ethnic identity, the ethnic identity of the parents, and place of birth.

Samples were genotyped for autosomal STR markers by multiplex polymerase chain reaction followed by analysis of the amplification products on an automated genetic analyzer, as described previously [10]. Experimental work was carried out at the Center for Shared Use of Scientific Research Equipment Medical Genomics of the Tomsk National Research Medical Center and the Scientific Practical Center of the State Forensic Expertise Committee of the Republic of Belarus.

Conformity of genotype distributions to Hardy–Weinberg equilibrium, the genetic variability of STR loci, and interpopulation comparisons were assessed using Arlequin v. 3.11 [15]. Genetic differentiation of populations was analyzed using pairwise Fst values and analysis of molecular variance (AMOVA), with the matrix of mean squared differences in the number of repeats (Rst) as the matrix of genetic distances.
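As a rough illustration of the per-locus variability measures mentioned above, the following sketch computes allele frequencies, observed heterozygosity, and Nei's unbiased expected heterozygosity for a single locus. It is not Arlequin's implementation. The function name, the input format (a list of allele pairs), and the toy genotypes are assumptions made for the example.

```python
from collections import Counter

def locus_diversity(genotypes):
    """Summarize one STR locus from a list of (allele, allele) pairs."""
    n_copies = 2 * len(genotypes)  # two gene copies per diploid individual
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / n_copies for a, c in allele_counts.items()}
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # Unbiased expected heterozygosity (gene diversity) with the n/(n-1) correction.
    h_exp = n_copies / (n_copies - 1) * (1.0 - sum(p * p for p in freqs.values()))
    return freqs, h_obs, h_exp

# Hypothetical genotypes (repeat counts), not data from this study.
toy = [(10, 11), (10, 11), (11, 11), (10, 12)]
freqs, h_obs, h_exp = locus_diversity(toy)
print(freqs, h_obs, round(h_exp, 4))
```

A large gap between observed and expected heterozygosity at a locus is the kind of signal that a formal Hardy–Weinberg test then evaluates.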

The discriminatory potential of the system of 21 STR markers was assessed using standard forensic indicators: the random match probability (MP, matching probability), the power of discrimination between unrelated individuals (PD), the power of exclusion (PE), and the paternity index (PI) [16]. Allele frequencies and forensic parameters were calculated in MS Excel.
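For orientation, the four indicators can be computed for one locus from observed genotype counts. The sketch below assumes the widely used definitions (MP as the sum of squared genotype frequencies, PD = 1 − MP, PE = h²(1 − 2hH²), and a typical paternity index of 1/(2H), where h and H are the heterozygote and homozygote proportions). The text does not state which formulas from [16] were applied, so these definitions, the function name, and the toy data are assumptions.

```python
from collections import Counter

def forensic_parameters(genotypes):
    """Per-locus forensic indicators from a list of (allele, allele) pairs."""
    n = len(genotypes)
    # Treat (a, b) and (b, a) as the same genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    mp = sum((c / n) ** 2 for c in counts.values())  # matching probability
    hom = sum(c for (a, b), c in counts.items() if a == b) / n
    het = 1.0 - hom
    pe = het ** 2 * (1.0 - 2.0 * het * hom ** 2)      # power of exclusion
    # Typical paternity index; undefined (infinite) when no homozygotes are observed.
    pi = float("inf") if hom == 0 else 1.0 / (2.0 * hom)
    return {"MP": mp, "PD": 1.0 - mp, "PE": pe, "PI": pi}

# Hypothetical genotypes (repeat counts), not data from this study.
print(forensic_parameters([(10, 11), (10, 11), (11, 11), (10, 12)]))
```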

RESULTS AND DISCUSSION

Genetic Variability of 21 STR

Allele frequencies at the 21 STR loci show considerable diversity in the studied populations, although the most frequent alleles at most loci are the same in all of them. The largest total number of alleles across the 21 autosomal markers was found in the Belarusian population (250). Slightly fewer were found in the Russian and Ukrainian populations (228 and 214, respectively), and 204 alleles were found in the Moldovan population. Fewer than 200 alleles were found in the samples of Gagauz (195), Mari (192), and Poles (184). The smallest number of alleles was found in the Komi-Izhemtsy sample (163).

After Bonferroni correction (significance threshold p = 0.002), no significant deviations from Hardy–Weinberg equilibrium were observed, except for four markers in three populations. The FGA marker deviated in the Russian population (p = 0.00082 ± 0.00002). The Penta E marker deviated in the total sample of Belarusians (p = 0.00175 ± 0.00004). In the Eastern Polesie subsample, a deviation from equilibrium was found at the D1S1656 locus (p = 0.00095 ± 0.00003). Finally, D16S539 (p = 0.00095 ± 0.00003) deviated in the sample of Komi-Izhemtsy.
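The threshold of 0.002 is consistent with dividing a family-wise error rate of 0.05 by the 21 loci tested, although the text does not spell out this calculation. The sketch below makes that assumption explicit and checks the four reported p-values against the resulting threshold.

```python
alpha = 0.05   # assumed family-wise error rate (not stated in the text)
n_tests = 21   # assumed: one Hardy-Weinberg test per locus
threshold = alpha / n_tests  # about 0.00238, reported as 0.002

# p-values reported in the text for the deviating markers.
reported = {
    ("Russians", "FGA"): 0.00082,
    ("Belarusians (total)", "Penta E"): 0.00175,
    ("Eastern Polesie", "D1S1656"): 0.00095,
    ("Komi-Izhemtsy", "D16S539"): 0.00095,
}
for (sample, locus), p in reported.items():
    print(f"{sample:20s} {locus:8s} p={p:.5f} significant={p < threshold}")
```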

Analysis of genetic distances among the six samples of Belarusians did not reveal statistically significant distances between them. For this population, therefore, identification analysis does not need to take into account an individual's territorial origin within the country. Nevertheless, in further calculations we present the results both for the total sample of Belarusians and for the territorial samples.

The opposite situation is shown for the Komi samples. Analysis of Nei’s genetic distances revealed the presence of a significant genetic distance between the Komi-Zyryans and the Komi-Izhemtsy (0.0054, p = 0.004). These results provide a basis for considering these samples separately from each other in the formation of reference population data during genetic examination.
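To make the distance measure concrete, the sketch below computes Nei's standard genetic distance, D = −ln(Jxy / √(Jx·Jy)), from per-locus allele frequencies. The text does not say which of Nei's distances was used or whether a sample-size correction was applied, so the choice of the standard (uncorrected) form, the function name, and the input layout are assumptions.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """pop_x, pop_y: dict mapping locus -> dict mapping allele -> frequency."""
    loci = pop_x.keys() & pop_y.keys()  # use only loci typed in both populations
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        # Alleles missing from one population contribute zero to the cross term.
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    identity = jxy / math.sqrt(jx * jy)  # Nei's normalized genetic identity
    return -math.log(identity)

# Hypothetical frequencies for one locus, not data from this study.
x = {"D12S391": {18: 1.0}}
y = {"D12S391": {18: 0.5, 19: 0.5}}
print(round(nei_standard_distance(x, y), 4))
```

Assessing whether such a distance is significant requires a resampling or permutation procedure, which this sketch does not include.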

Significant genetic distances for the 21 analyzed STR markers also separate the Komi-Zyryans, Komi-Izhemtsy, Mari, and Moldovans from each other and from the rest of the analyzed populations. In contrast, there is no significant genetic distance between Russians, Belarusians, Poles, and Ukrainians (Table 1).

Table 1. Nei’s genetic distances

The marker with the highest number of alleles is the D12S391 locus, with 18 allelic variants in the Belarusian population and 15 in the Polish population. In the rest of the analyzed populations, the largest number of alleles was found at the Penta E locus. Allele frequency data are available from the authors upon request.

Genetic Differentiation of Populations

Genetic differentiation of the studied samples across all STR markers was assessed by analysis of molecular variance. When the entire array of samples was compared, the differences between populations were 0.35%. When samples were combined to the level of ethnic groups, differentiation between ethnic groups increased to 0.39%. A comparison of the two samples of Russians and Belarusians showed no differences at all. Comparison of Russians with the Komi and Mari gave a small negative value of genetic differentiation (−0.03%), indicating slightly greater genetic similarity between individuals from different populations than between individuals from the same population. The differentiation of the populations of Russians, Moldovans, and Gagauz was 0.13%.

When allele frequencies were analyzed for all populations from the Republic of Belarus (all samples of Belarusians, Russians, Poles, and Ukrainians), the proportion of differences between them was 0.1%. The level of differentiation between samples of Belarusians from different historical and ethnographic regions of the republic was also 0.1%. When the group of Belarusians was compared with Russians, Poles, and Ukrainians, the difference reached 0.8%. These results are to be expected, given the multi-allelic nature of STR markers and their relatively high rates of mutation and recombination. An increase in genetic differentiation is also expected when ethnic groups of different origins are compared. The results of the analysis of molecular variance show no significant genetic differences between Russians and Belarusians, or among Slavic populations in general, and a very low level of differences between Russians, Komi, and Mari in STR genotypes. This means that, despite the reported anthropological differentiation and the present-day differences between the studied ethnoterritorial groups, the part of their gene pool represented by autosomal STR markers is very weakly differentiated.

Genetic Relationships between Populations

Visualization of the matrix of genetic distances between population samples by multidimensional scaling (Fig. 1) shows that the studied samples are distributed in accordance with their ethnic proximity and geographic location. The samples genetically closest to each other are the Russians from Russia and from Belarus. The geographically most distant samples, Komi-Izhemtsy and Poles, are also the most distant from each other genetically. The Komi-Zyryans and the Komi-Izhemtsy differ markedly along the first dimension, which is probably related to a relatively strong shift in allele frequencies in the Komi-Izhemtsy sample caused by their relatively recent settlement of this territory and a founder effect in this subethnic group. In accordance with their geographic location, the samples of Belarusians from Eastern and Western Polesie, Central Belarus, Podneprovye, and Ponemanye are also close to each other. Only the Poozerye sample stands apart: it is the northernmost sample, yet it groups with the southern samples from Polesie. The analysis also highlights the genetic affinity between Moldovans and Gagauz.

Fig. 1. Position of the studied populations in the MDS space.

Identification Potential of the System of Autosomal STR Loci

To assess whether the marker system can be used for DNA identification in forensic examination, the standard forensic parameters of informativeness were calculated: the random match probability (MP, matching probability), the power of discrimination between unrelated individuals (PD), the power of exclusion (PE), and the paternity index (PI). MP and PD are used for DNA identification of a person, whereas PE and PI are calculated when determining paternity.

The forensic parameters for each locus are presented in the supplementary materials. The greatest informativeness of the extended set of markers for DNA identification was shown for the Komi samples (Komi-Zyryans, Komi-Izhemtsy) (Table 2). In the Russian population, the power of discrimination between unrelated individuals (PD) increased to 0.999999999994978, and the power of exclusion (PE) changed from 0.99999999989 to 0.999999998679. The lowest identification informativeness of the set of 21 STRs was shown for the sample of Poles. In all populations, the indicators of informativeness for identification and paternity testing are several orders of magnitude higher than the values established by the regulations in force in the Russian Federation.

Data on the allele frequencies of the 21 STR loci in the populations studied here are included in the database developed by the authors. The presented frequencies can serve as reference values (for the corresponding population or ethnic group) for the probabilistic and statistical assessment of expert findings in identifying a person, establishing kinship, and similar tasks. In addition, the data can be used for comparative population genetic studies.
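Multi-locus values of this kind are conventionally obtained by combining per-locus indicators under the assumption that the loci are independent. The sketch below shows that combination (product rule). The function name and the input values are hypothetical, and the text does not state that the combined figures were computed in exactly this way.

```python
import math

def combine_across_loci(per_locus_mp, per_locus_pe):
    """Combine per-locus MP and PE values, assuming independent loci."""
    combined_mp = math.prod(per_locus_mp)  # chance of a match at every locus
    combined_pd = 1.0 - combined_mp        # chance of a difference at some locus
    # A non-father escapes exclusion only if no locus excludes him.
    combined_pe = 1.0 - math.prod(1.0 - pe for pe in per_locus_pe)
    return combined_mp, combined_pd, combined_pe

# Hypothetical per-locus values, not data from this study.
mp, pd, pe = combine_across_loci([0.05, 0.08, 0.03], [0.55, 0.60, 0.70])
print(f"MP={mp:.3e}  PD={pd:.9f}  PE={pe:.6f}")
```

With 21 loci the combined PD lies very close to 1, so reporting the combined MP (or its reciprocal) alongside PD avoids losing information to rounding.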

Table 2. Identification informativeness of a set of 21 autosomal STR loci in populations of Eastern Europe

Thus, allele frequencies and forensic parameters of 21 autosomal STR loci were established for 15 populations of eight peoples of Eastern Europe. The kit used shows significant discriminatory potential for DNA identification and is effective in determining kinship. Assessment of the genetic differentiation of the populations showed that population affiliation needs to be taken into account when genotype databases are created to provide reference allele frequencies.