Introduction

Odisha is a coastal state in eastern India with 485 km of coastline along the Bay of Bengal. It is the 9th largest state in India, covering an area of 155,707 km² (Fig. S1). According to the 2011 national census, Odisha has a population of around 42 million and ranks 11th among Indian states by population [1], accounting for about 3.47% of the total population of India. The present-day Odia population, belonging to the Indo-Aryan ethnic group, varies in ethnicity, hierarchical caste system, and native language. Odisha’s relative isolation and lack of external invasions have conserved the social, religious, and genetic structure of the Odia population. The state is also home to one of the largest and most diverse tribal populations in India, including the Santhal, Munda, Kandha, Bonda, Mahali, and Oraon. High-density population genetic data are needed to establish reference databases that statistically strengthen the results of forensic analyses. This is the first report on the Odia population for 21 autosomal STR loci using the GlobalFiler™ PCR Amplification Kit (Thermo Scientific, US). The purpose of this study was to estimate allele frequencies at these loci and to evaluate the extent of genetic diversity in the Odia population of Odisha.

In the present study, samples from 508 healthy, unrelated individuals from various geographical regions of Odisha were analyzed. All samples were obtained from routine casework at the authors' laboratory, the DNA Profiling Unit, State Forensic Science Laboratory, Rasulgarh, Bhubaneswar, Odisha, India, between July 2015 and November 2018, with written informed consent in accordance with the Declaration of Helsinki. DNA was extracted with an automated EZ1 Advanced XL system (Qiagen, Germany) using the EZ1 DNA Investigator Kit (Qiagen) according to the manufacturer's protocol. The extracted DNA was quantified on a 7500 Real-Time PCR System (Thermo Scientific, US) using the Quantifiler Duo DNA Quantification Kit (Thermo Scientific, US), following the manufacturer's protocol. The DNA samples were then amplified at 21 autosomal STR loci, comprising the 20 CODIS core loci (D3S1358, vWA, D16S539, CSF1PO, TPOX, D8S1179, D21S11, D18S51, D2S441, D19S433, TH01, FGA, D22S1045, D5S818, D13S317, D7S820, D10S1248, D1S1656, D12S391, D2S1338) and one additional autosomal locus, SE33, together with three sex-determining markers (Amelogenin, Y-indel, and DYS391), using the GlobalFiler™ PCR Amplification Kit (Thermo Scientific, US) according to the manufacturer's protocol. Amplification was performed on a Veriti™ 96-Well Fast Thermal Cycler (Thermo Scientific, US). The amplified products, together with the positive and negative controls supplied by the kit manufacturer, were separated on a 3500xL Genetic Analyzer (Thermo Scientific, US) using the GeneScan™ 600 LIZ™ Size Standard (Thermo Scientific, US). The data were analyzed using GeneMapper® ID-X Software v1.4 (Thermo Scientific, US). Statistical analyses were performed with GenAlEx 6.5 [2], the PowerStats v1.2 spreadsheet program [3], Arlequin v3.5 [4], POPTREE2 [5], and PAST 3.02a [6].

Observed allele frequencies ranged from 0.001 to 0.409 (Table S2); the most frequent allele in the studied population was allele 12 at the CSF1PO locus (0.409). The power of discrimination (PD) ranged from 0.863 (D2S441) to 0.991 (SE33), and the polymorphic information content (PIC) ranged from 0.65 (TPOX) to 0.94 (SE33). SE33 was therefore the most polymorphic and most discriminating locus in the studied population, with PIC and PD values of 0.94 and 0.991, respectively. Across all studied loci, the combined power of discrimination (CPD) was 1 (CPD = 1 − CPM, which rounds to 1 at this precision) and the combined power of exclusion (CPE) was 0.999999999704865. The combined probability of match (CPM) and combined paternity index (CPI) for all studied STR loci were 8.01 × 10⁻²⁶ and 3.45 × 10⁹, respectively. The observed heterozygosity (Hobs) ranged from 0.679 (TPOX) to 0.925 (SE33).

Locus-wise allele frequencies of the studied population were compared with previously published data for Indian populations [7,8,9,10,11,12,13,14,15] (Table S1) at 15 common STR loci, using pairwise Fst distances (Table S3). Out of the 15 common loci, the population of Odisha differed significantly at 12 loci from the Kora (Bengal), at 11 loci from the Yerukula (Andhra Pradesh), at 8 loci from the Konkanastha Brahmin (Maharashtra) and Bhil (Gujarat), at 7 loci from the Kurmans (Tamil Nadu) and the Central Indian population (Madhya Pradesh), at 6 loci from the Chenchu (Andhra Pradesh), Santhal (Chota Nagpur), and Jat Sikh (Punjab), and at 2 loci each from the Oraon (Chhattisgarh) and the population of Jharkhand (Table S3). To further examine the relationships indicated by the Fst distances, a neighbor-joining (NJ) tree was constructed (Fig. S2). The NJ tree revealed two major clusters, with the Jat Sikh (Punjab), Yerukula (Andhra Pradesh), and Bhil (Gujarat) as outlier populations. The studied population clustered with the population of Jharkhand, from the eastern geographical region of India, and with the Oraon population of Chhattisgarh, with bootstrap values ranging from 41 to 77. This suggests a close genetic affinity of the studied population with the Central Indian population as well as with geographically neighboring populations. The second cluster comprised the Konkanastha Brahmin (Maharashtra) and Kurmans (Tamil Nadu), with a bootstrap value of 57. The studied population showed significant genetic distance from populations of geographically distant regions, possibly owing to long-term geographical isolation (Fig. S2). Principal component analysis (PCA; Fig. S3) showed the same clustering pattern among the compared populations. Thus, the NJ tree and PCA produced results consistent with the previous observations.
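The distance-then-tree workflow described above can be sketched as follows. Arlequin estimates pairwise Fst within an AMOVA framework and tests significance by permutation, and the exact POPTREE2 settings are not stated here; for simplicity, this sketch substitutes a Nei Gst-style estimator, Fst = Σ(HT − HS)/ΣHT over loci shared by both populations, computed directly from allele frequencies, and then applies the standard Saitou-Nei neighbor-joining algorithm to the resulting matrix. Bootstrap resampling over loci and PCA are omitted. The function names, the input layout (population → locus → allele → frequency), and the example frequencies are hypothetical.

```python
from itertools import combinations


def nei_fst(pop1, pop2):
    """Multilocus Fst = sum(HT - HS) / sum(HT) over loci typed in both populations.

    Each population maps locus -> {allele: frequency}.
    """
    num = den = 0.0
    for locus in pop1.keys() & pop2.keys():
        p, q = pop1[locus], pop2[locus]
        # Mean within-population expected heterozygosity.
        hs = ((1 - sum(f * f for f in p.values()))
              + (1 - sum(f * f for f in q.values()))) / 2
        # Expected heterozygosity of the averaged (pooled) allele frequencies.
        ht = 1 - sum(((p.get(a, 0.0) + q.get(a, 0.0)) / 2) ** 2
                     for a in p.keys() | q.keys())
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0


def fst_matrix(pops):
    """Symmetric pairwise Fst matrix (dict of dicts) for name -> frequencies."""
    names = list(pops)
    d = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d[a][b] = d[b][a] = nei_fst(pops[a], pops[b])
    return d


def neighbor_joining(dist):
    """Saitou-Nei neighbor joining on a dict-of-dicts distance matrix.

    Returns (parent, child, branch_length) edges of an unrooted tree.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        _, a, b = min(((n - 2) * d[x][y] - r[x] - r[y], x, y)
                      for x, y in combinations(nodes, 2))
        dab = d[a][b]
        la = dab / 2 + (r[a] - r[b]) / (2 * (n - 2))
        u = f"({a},{b})"
        edges += [(u, a, la), (u, b, dab - la)]
        rest = [c for c in nodes if c not in (a, b)]
        # Distance from the new internal node to every remaining node.
        d[u] = {u: 0.0}
        for c in rest:
            d[u][c] = d[c][u] = (d[a][c] + d[b][c] - dab) / 2
        nodes = rest + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical allele frequencies at a single locus for three populations.
pops = {
    "PopA": {"locus_1": {11: 0.3, 12: 0.4, 13: 0.3}},
    "PopB": {"locus_1": {11: 0.2, 12: 0.5, 13: 0.3}},
    "PopC": {"locus_1": {10: 0.5, 12: 0.5}},
}
for edge in neighbor_joining(fst_matrix(pops)):
    print(edge)
```

Fst distances are not guaranteed to be tree-additive, so NJ may return small negative branch lengths on real data; tree-building programs commonly handle these, for example by setting them to zero.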