Human diversity in India is represented by 4693 documented population groups. The genetic structure of Indian populations has been greatly influenced by social structure, population migrations, and caste endogamy. Chhattisgarh is a predominantly tribal state, home to about 35 large and small tribes. The Majhi are said to have originated from the intermingling of the Gond, Munda, and Kawar, the major tribes of Chhattisgarh. The Majhi are Caucasian, belong to the Indo-Iranian group, and practice a very high degree of endogamy. They speak Chhattisgarhi and use the Devanagari script for both inter-group and intra-group communication. In earlier times, the Majhi made their living as boatmen, but most are now farmers who cultivate rice, wheat, and vegetables. Rice is the staple food of the Majhi. In addition, most Majhi chew tobacco and have a strong addiction to haria (their native alcoholic beverage), which is consumed on every special occasion. The members of this tribe are spread across the Ambikapur, Surguja, and Raigarh districts of the central Indian state of Chhattisgarh (Supplementary Fig. S1).

In the present study, an attempt has been made to characterize the Majhi tribal population of central India by exploring genetic polymorphism using classical autosomal (biparental) and Y-STR (uniparental) markers. Peripheral blood was collected from 129 unrelated Majhi individuals (107 males and 22 females) from Mainpat, Ambikapur district of Chhattisgarh, after obtaining written informed consent in compliance with the Declaration of Helsinki. DNA extraction, PCR (using the Promega PowerPlex 16 HS kit for autosomal STR markers and the PowerPlex Y23 kit for Y-STR markers), analysis on an ABI 3100 genetic analyzer, and statistical evaluation were performed as described previously [1, 2]. All steps were carried out with internal laboratory control standards and kit controls. The authors have passed the proficiency test of the GITAD, Spain (http://gitad.ugr.es/principal.htm) and the quality control exercise of the YHRD, Germany (www.yhrd.org). This article follows the population data publication guidelines formulated by the journal.

The allele frequencies and the forensic efficiency parameters for the 15 autosomal STR loci under study are given in Tables S1 and S2. The combined power of exclusion (CPE) and combined power of discrimination (CPD) for all 15 STR loci were 0.999998 and >0.999999, respectively. The combined matching probability (CPm) and combined paternity index (CPI) for all 15 STR loci were 3.72 × 10¹⁶ and 5.30 × 10⁹, respectively. After applying Bonferroni correction (significance threshold p < 0.003), no significant deviations from Hardy-Weinberg equilibrium were observed at any of the studied loci except D3S1358 and D5S818. Genetic distances were estimated for each autosomal STR marker against other central Indian populations [1, 3–6]. Pairwise genetic distances between the Majhi and previously published populations were computed on 15 (Supplementary Table S3a) or 13 (Supplementary Table S3b–d) autosomal STR markers, and the Majhi population was genetically distant from all populations used for comparison. The PCA plot (Supplementary Fig. S2a, b) of the Majhi tribal population compared with other central Indian tribes is consistent with the clustering pattern of the NJ tree (Supplementary Fig. S3a, b).
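As a minimal sketch of how combined parameters of this kind are commonly derived from per-locus values, the Python snippet below uses the standard product formulas, which assume independence among loci. It is not the authors' analysis pipeline. The function names are hypothetical, and the family-wise significance level of 0.05 is an assumption, since the text reports only the corrected threshold.

```python
from math import prod


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def combined_power(per_locus_values):
    """Combine per-locus power of discrimination (or exclusion).

    Assumes independent loci: combined = 1 - product(1 - value_i).
    """
    return 1.0 - prod(1.0 - v for v in per_locus_values)


def combined_product(per_locus_values):
    """Combine per-locus matching probabilities or paternity indices.

    Assumes independent loci: combined = product(value_i).
    """
    return prod(per_locus_values)


# Assumed family-wise alpha of 0.05 spread over the 15 autosomal loci.
hwe_threshold = bonferroni_threshold(0.05, 15)
print(f"Bonferroni threshold: {hwe_threshold:.4f}")
```

With an assumed alpha of 0.05 and 15 loci, the corrected threshold is approximately 0.0033, in line with the 0.003 cut-off quoted above. The per-locus inputs for `combined_power` and `combined_product` would come from Table S2.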

Of the 107 unrelated individuals typed for 23 Y-STR loci, a total of 64 Y-STR haplotypes (Supplementary Table S4) were observed, of which 31 haplotypes were unique, 30 haplotypes were observed twice, 2 haplotypes were observed six times, and 1 haplotype was observed four times. The discrimination capacity (DC), defined as the ratio between the number of different haplotypes and the total number of haplotypes, was 0.598. Haplotype diversity (HD), calculated analogously to gene diversity (GD), was 0.989. The overall haplotype diversity was calculated as 0.988. The highest gene diversity values, at the single-copy locus DYS570 and the multi-copy locus DYS385 a/b, were 0.805 and 0.952, respectively. Average gene diversity over loci was 0.652304 ± 0.325572. Match probability (MP), calculated as the sum of squared haplotype frequencies, was 2.09 × 10⁻². The data were submitted to the YHRD as “Majhi, Chhattisgarh, Central India” and can be retrieved using the accession number YP001109 (http://www.yhrd.org). The results of the AMOVA with the Y23 Indian population data available in the YHRD are presented in Table S5, and the clustering pattern of the studied population with the compared Indian populations is shown in the MDS plot generated with the population data [7, 8] available in the YHRD (Supplementary Fig. S4). The data provide a reference for autosomal and Y-STR databases in India and may be valuable for forensic and population genetic analysis.
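The haplotype-level statistics can be checked directly from the reported haplotype count spectrum. The following Python sketch is illustrative and is not the authors' software. The function name and the dictionary layout are hypothetical, and haplotype diversity is computed with the commonly used Nei estimator n/(n − 1) × (1 − Σp²), which is assumed here rather than stated in the text.

```python
def haplotype_statistics(count_spectrum):
    """Summarise a Y-STR haplotype sample.

    count_spectrum maps "times a haplotype was observed" to
    "number of distinct haplotypes observed that many times".
    """
    # Total number of typed individuals.
    n = sum(times * n_haps for times, n_haps in count_spectrum.items())
    # Number of different haplotypes.
    distinct = sum(count_spectrum.values())
    # Sum of squared haplotype frequencies (match probability).
    sum_sq = sum(n_haps * (times / n) ** 2
                 for times, n_haps in count_spectrum.items())
    return {
        "n": n,
        "distinct": distinct,
        "discrimination_capacity": distinct / n,
        "match_probability": sum_sq,
        # Nei's unbiased estimator (assumed form).
        "haplotype_diversity": n / (n - 1) * (1.0 - sum_sq),
    }


# Count spectrum reported in the text: 31 unique haplotypes, 30 seen twice,
# 2 seen six times, and 1 seen four times.
majhi = {1: 31, 2: 30, 6: 2, 4: 1}
stats = haplotype_statistics(majhi)
for name, value in stats.items():
    print(name, round(value, 4))
```

Under these assumptions the spectrum accounts for all 107 individuals and 64 haplotypes, and the sketch gives a discrimination capacity of about 0.598, a match probability of about 0.0209, and a haplotype diversity of about 0.988.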