This study encompasses a total of 1020 unrelated healthy subjects belonging to four major ethnic groups of Pakistan (Punjabi, 600; Saraiki, 150; Pakhtun, 140; and Sindhi, 130). After written informed consent was obtained, blood or buccal swab samples were collected from each subject residing in the area associated with their ethnic group. To avoid sampling bias and obtain a reasonable representation of each population, participants were recruited from both genders and all age groups. The ethnic groups selected for this study account for a reasonable proportion of the total Pakistani population. For example, a comparison of the two maps shown in Figs. 1 and 2 (Supplementary Files) makes it plausible that the current sample is representative of the Pakistani population as a whole.

Genomic DNA was extracted from the blood and buccal swab samples by the organic method. All 15 loci, along with amelogenin, were co-amplified using the AmpFℓSTR Identifiler® kit (Applied Biosystems). Allele frequencies at each locus were calculated using the hierfstat package [1] for R. Observed heterozygosity (Ho) and expected heterozygosity (He) were calculated using Genepop version 4 [2]. Parameters of forensic interest, namely matching probability (MP), power of discrimination (PD), polymorphic information content (PIC), power of exclusion (PE), and typical paternity index (TPI), were calculated using PowerStats v1.2 [3] to assess the suitability of the studied marker set for Pakistani populations. Genetic distances (FST) were calculated using Poptree2 [4] with 5000 bootstrap replicates, and a neighbor-joining phylogenetic tree showing the closest and farthest genetic neighbors of the studied populations was constructed. Finally, our results were compared with those of neighboring populations from Afghanistan [5], Iran [6], China [7], Nepal [8], Bhutan [9], the United Arab Emirates [10], and India [11,12,13,14].
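The per-locus statistics named above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on hierfstat, Genepop, and PowerStats); it assumes the commonly used textbook definitions: MP as the sum of squared genotype frequencies, PD = 1 − MP, PIC after Botstein et al., PE = h²(1 − 2hH²), and TPI = 1/(2H), where h and H are the observed heterozygote and homozygote proportions. The function name `locus_stats` and the input layout (one tuple of two allele labels per individual) are hypothetical, and He is computed here without a small-sample correction, so it may differ slightly from the Genepop output.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarise one STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual
    (hypothetical input layout chosen for this sketch).
    """
    n = len(genotypes)
    # Treat (12, 14) and (14, 12) as the same genotype.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    ho = sum(c for (a, b), c in geno_counts.items() if a != b) / n
    hom = 1.0 - ho                       # observed homozygote proportion
    sum_p2 = sum(p * p for p in freqs.values())
    he = 1.0 - sum_p2                    # uncorrected expected heterozygosity
    mp = sum((c / n) ** 2 for c in geno_counts.values())
    pic = 1.0 - sum_p2 - sum(
        2 * (p * q) ** 2 for p, q in combinations(freqs.values(), 2)
    )
    pe = ho ** 2 * (1.0 - 2.0 * ho * hom ** 2)
    tpi = float("inf") if hom == 0 else 1.0 / (2.0 * hom)

    return {
        "freqs": freqs, "Ho": ho, "He": he, "MP": mp,
        "PD": 1.0 - mp, "PIC": pic, "PE": pe, "TPI": tpi,
    }


if __name__ == "__main__":
    # Toy genotypes for illustration only; not data from this study.
    toy = [(10, 11), (10, 10), (11, 12), (12, 10)]
    for name, value in locus_stats(toy).items():
        print(name, value)
```

Calling `locus_stats` once per locus and per population would yield tables analogous in layout to the supplementary allele-frequency tables, under the assumptions stated above.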

The distribution of allele frequencies for the four studied populations (Punjabi, Saraiki, Pakhtun, and Sindhi) is presented in Tables S1 to S4 of the supplementary material provided in the online version of this article. A total of 187 different alleles were observed, with frequencies ranging from 0.001 (at loci D21S11, CSF1PO, D3S1358, TH01, D13S317, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA) in the Punjabi population to 0.471 (TPOX) in the Pakhtun population. Observed heterozygosity (Ho) was lowest at locus CSF1PO in the Sindhi population (0.5615) and highest at locus FGA in the Saraiki population (0.9133). Matching probability (MP) ranged from 0.028 at locus D2S1338 in the Punjabi population to 0.174 at locus TPOX in the Pakhtun population, while the combined matching probability was 1.321 × 10⁻¹⁸ for the Punjabi population, 2.348 × 10⁻¹⁸ for the Saraiki population, 6.327 × 10⁻¹⁸ for the Pakhtun population, and 6.715 × 10⁻¹⁸ for the Sindhi population. Power of discrimination (PD) ranged from 0.826 (TPOX) in the Pakhtun population to 0.972 (D2S1338) in the Punjabi population, and the combined power of discrimination was observed to be 0.3188 for the Punjabi population, 0.3122 for the Saraiki population, 0.2822 for the Pakhtun population, and 0.2815 for the Sindhi population. Power of exclusion (PE) ranged from 0.247 at locus CSF1PO in the Sindhi population to 0.823 at locus FGA in the Saraiki population. All four major Pakistani populations were found to be in close proximity to one another when genetic distances were calculated among them and with the neighboring populations.
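Several of the per-locus values reported above are linked by simple relations, which the following Python sketch illustrates. It assumes the commonly used expressions PD = 1 − MP and PE = h²(1 − 2hH²), with h the observed heterozygosity and H = 1 − h, and it treats the combined matching probability as the product of per-locus MP values under the assumption of independent loci. The function names are hypothetical, and the per-locus list passed to `combined_matching_probability` is made up purely to show the arithmetic.

```python
import math


def power_of_exclusion(ho):
    """PE from observed heterozygosity, assuming PE = h^2 * (1 - 2*h*H^2)."""
    hom = 1.0 - ho
    return ho ** 2 * (1.0 - 2.0 * ho * hom ** 2)


def power_of_discrimination(mp):
    """PD for a single locus, assuming PD = 1 - MP."""
    return 1.0 - mp


def combined_matching_probability(per_locus_mp):
    """Product of per-locus MP values (assumes independent loci)."""
    return math.prod(per_locus_mp)


if __name__ == "__main__":
    # Reported Ho values: FGA in Saraiki, CSF1PO in Sindhi.
    print(round(power_of_exclusion(0.9133), 3))       # 0.823
    print(round(power_of_exclusion(0.5615), 3))       # 0.247
    # Reported MP extremes: D2S1338 in Punjabi, TPOX in Pakhtun.
    print(round(power_of_discrimination(0.028), 3))   # 0.972
    print(round(power_of_discrimination(0.174), 3))   # 0.826
    # Hypothetical per-locus MP values, for illustration only.
    print(combined_matching_probability([0.1, 0.05, 0.02]))
```

Under these assumptions, the reported extremes of PE follow from the reported extremes of Ho, and the reported extremes of PD follow from those of MP.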

Notably, all the studied major Pakistani populations showed close genetic affinity to one another. However, the Punjabi population showed greater genetic resemblance to the neighboring Balmiki population of India, which lives across the border in the part of Punjab that was divided at the partition of British India. Moreover, the previously studied Hazara population of Pakistan [15] was genetically close to the Uzbek population of Afghanistan and to the Uyghur population of China, which borders the northern parts of Pakistan. The Balochi population showed genetic resemblance to its Iranian counterpart and also to the Arabs living in the United Arab Emirates, while the Afridi Pathan population of India showed the greatest genetic distance from the Pakistani populations. The Nepalese and Bhutanese populations were also located at a notable distance. All the calculated genetic distances (FST) are listed in Table S5 of the Supplementary data.