Introduction

Determining the sex of human DNA samples is of prime importance in forensics. The amelogenin locus is routinely used for sex determination because it resides on homologous regions of the X and Y chromosomes [1]. PCR primers flanking a 6-bp deletion in the X homologue produce differently sized amplicons of 106 and 112 bp from the X and Y chromosomes, respectively [2]. However, this method can misidentify a male as a female when the AmelY region is deleted. This deletion has been reported at a frequency of 0.08% in Caucasian males, 1.8% in Indians, and as high as 8% in Sri Lankans [3–6]. Interstitial deletions of varying length, from 1.0 Mb to around 3.8 Mb, have been observed in the pericentromeric region of the short arm of the Y chromosome and may include the amelogenin locus [7, 8]. A deletion in the AmelY region has no considerable consequence for enamel formation, because the X homologue is present, but it has important implications for forensic casework [9].

Several authors have suggested including an additional Y locus to make the sex test conclusive and to prevent sex mistyping. Chen et al. [10] first described and characterized the pentanucleotide repeat [(TAAAA)n] DXYS156, which maps to both sex chromosomes and can be used for genetic identity testing. The microsatellite DXS156 has been localized by linkage analysis to a region on the long arm of the X chromosome (Xq) that shares sequence homology with the short arm of the Y chromosome (Yp) [11, 12]. This strictly sex-linked region shows a high degree of X–Y sequence homology, which has been attributed to a recent transposition from Xq to Yp followed by a Yp inversion during hominid evolution about 3–4 million years ago [13]. Because of this high sequence similarity, polymorphisms at the STR on both sex chromosomes can be detected with a single primer pair flanking the pentanucleotide repeat [10]. The repeat lies within a human long interspersed repetitive element (LINE) whose consensus sequence contains a single TAAAA motif. It is still unclear whether the X and Y chromosomes carry this repeat as part of their shared homology or as independent insertions [10]. Unlike the pseudoautosomal regions, these regions of the X and Y chromosomes do not undergo crossing-over, suggesting that the alleles of this STR remain specific to their respective chromosomes [14]. The smaller and larger alleles have usually been assigned to the X and Y chromosomes, respectively.

The multipurpose STR DXYS156 has an advantage over amelogenin in being multi-allelic, with population-restricted alleles that may indicate the probable geographic origin of an individual [15, 16]. Its X and Y alleles can be unambiguously distinguished from each other by an adenine insertion that has been reported to be Y-specific [15]. The DXYS156 locus can therefore help uncover the differential genetic variability on the homologous regions of the X and Y chromosomes [17]. Since data on the genetic variation of this pentanucleotide marker in Indian populations are scarce, this study is a step towards building a database for the use of this marker in forensic casework.
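The way DXYS156 complements amelogenin can be sketched as a small decision rule. This is a minimal illustration under stated assumptions, not the procedure used in this study: the ±1.5 bp window around the 106 and 112 bp amelogenin amplicons is a hypothetical tolerance, and treating any DXYS156 allele with a ".1" designation (such as 12.1, 14.1 or 15.1) as a Y allele is an assumption based on the Y-specific adenine insertion and the allele names reported here.

```python
from __future__ import annotations

from typing import Iterable

# Nominal amelogenin amplicon sizes (bp) from the text; the tolerance
# window is a hypothetical value chosen for illustration only.
AMEL_X_BP = 106.0
AMEL_Y_BP = 112.0
TOLERANCE_BP = 1.5


def amelogenin_calls(peak_sizes_bp: Iterable[float]) -> set[str]:
    """Return which amelogenin homologues ("X", "Y") show a peak."""
    calls: set[str] = set()
    for size in peak_sizes_bp:
        if abs(size - AMEL_X_BP) <= TOLERANCE_BP:
            calls.add("X")
        elif abs(size - AMEL_Y_BP) <= TOLERANCE_BP:
            calls.add("Y")
    return calls


def is_y_allele(designation: str) -> bool:
    """Assumption: DXYS156Y alleles carry the '.1' (adenine insertion) suffix."""
    return designation.endswith(".1")


def sex_call(amel_peaks_bp: Iterable[float], dxys156_alleles: Iterable[str]) -> str:
    """Combine amelogenin and DXYS156 evidence into a single sex call."""
    amel = amelogenin_calls(amel_peaks_bp)
    alleles = list(dxys156_alleles)
    dxys_y = any(is_y_allele(a) for a in alleles)
    dxys_x = any(not is_y_allele(a) for a in alleles)
    if dxys_y and "Y" not in amel:
        # Y allele at DXYS156 but no AmelY peak: AmelY deletion suspected.
        return "male (AmelY dropout suspected)"
    if "Y" in amel and "X" not in amel and dxys_x:
        # X allele at DXYS156 but no AmelX peak: AmelX failure suspected.
        return "male (AmelX dropout suspected)"
    if dxys_y or "Y" in amel:
        return "male"
    return "female"
```

For example, `sex_call([106.2], ["7", "12.1"])` flags a male whose amelogenin profile alone would read as female. The rule does not handle male/female mixtures, which would also be called "male"; those need the additional X and Y STR typing discussed later.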

India is home to a number of extant endogamous populations belonging to four major linguistic groups: Indo-European (IE), Dravidian (DR), Tibeto-Burman (TB), and Austro-Asiatic (AA). IE speakers are concentrated in northern and central India, while southern India is predominantly occupied by DR speakers. Northeastern India is home mainly to TB speakers, while AA speakers occupy pockets of central and eastern India. The Indian population is also socially stratified into castes and tribes. India's strategic location at the crossroads of the African, northern Eurasian, and Oriental realms, together with the presence of linguistically, demographically, and socially diverse populations, accounts for a genetic diversity second only to that of Africa [18, 19]. Since the genetic diversity within a geographical region is an indicator of the age of the populations dwelling there, it is quite likely that India was peopled by early colonizers migrating out of Africa along a southern route [19]. Genetic markers are pivotal in tracing human migration patterns and studying population stratification. Studying the variability of homologous markers on the sex-specific regions of the X and Y chromosomes allows an assessment of the role of sex-specific processes in shaping human population structure and of the differential selective pressures acting on these markers [20]. Such studies can also complement mtDNA and Y chromosome studies in illustrating sex-specific patterns of population structure [20]. In the present study, we explore the genetic diversity among selected Indian populations at the X and Y loci of the DXYS156 marker and their genetic relationships with populations of Africa and East and Southeast Asia.

Materials and methods

Sampling and subjects

Blood samples from 749 unrelated healthy males were collected from 11 endogamous populations of India [Balmiki (n = 62), Sakaldwipi Brahmin (n = 65), Kanyakubja Brahmin (n = 78), Konkanastha Brahmin (n = 71), and Mahadev Koli (n = 65) populations belonging to the IE linguistic group; Iyengar (n = 66), Kurumans (n = 67), and Gond (n = 75) populations from the DR group; Tripuri (n = 65) and Riang (n = 67) populations of the TB group; and the Munda (n = 68) population belonging to the AA group] distributed over six geographical regions (north, east, south, west, central, and northeast). At the social level, five caste and six tribal populations were studied. A detailed description of the selected populations is given elsewhere [21]. The blood samples were collected with informed consent following protocols approved by the institutional ethics committee.
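As a quick arithmetic check on the sampling design, the per-population counts above can be tallied by linguistic group. This is a trivial illustration: the dictionary only restates the counts given in the text, and the point is that they sum to 749 males, each of whom, being hemizygous at both loci, contributes exactly one X and one Y chromosome to the DXYS156 analysis.

```python
from collections import Counter

# Population -> (linguistic group, number of males), restated from the text.
SAMPLES = {
    "Balmiki": ("IE", 62),
    "Sakaldwipi Brahmin": ("IE", 65),
    "Kanyakubja Brahmin": ("IE", 78),
    "Konkanastha Brahmin": ("IE", 71),
    "Mahadev Koli": ("IE", 65),
    "Iyengar": ("DR", 66),
    "Kurumans": ("DR", 67),
    "Gond": ("DR", 75),
    "Tripuri": ("TB", 65),
    "Riang": ("TB", 67),
    "Munda": ("AA", 68),
}


def males_per_group(samples):
    """Sum the sample sizes within each linguistic group."""
    totals = Counter()
    for group, n in samples.values():
        totals[group] += n
    return dict(totals)


by_group = males_per_group(SAMPLES)
print(by_group)               # {'IE': 341, 'DR': 208, 'TB': 132, 'AA': 68}
print(sum(by_group.values())) # 749 males, hence 749 X and 749 Y chromosomes
```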

DNA isolation and quantitation

Genomic DNA was isolated and quantified as described previously [21].

Amplification and genotyping

Approximately 1 ng of DNA from each population sample was amplified for the DXYS156 locus in a 10 μl reaction on a GeneAmp® PCR System 9700 (Applied Biosystems, Foster City, CA), following the protocol of Chen et al. [10] with minor modifications. The experiments were conducted in accordance with quality control measures, using cell lines 9947A, 9948, and K562 from Promega Corp. (Madison, WI) as standard reference materials [22]. The amplified products were electrophoresed on an ABI PRISM® 3100 Genetic Analyzer with POP-4 polymer using the GeneScan™ LIZ-500 size standard (Applied Biosystems, Foster City, CA). The data were analyzed with GeneMapper® ID Software v3.2 (Applied Biosystems, Foster City, CA). The samples were also genotyped at the amelogenin locus using the AmpFlSTR® Identifiler® kit (Applied Biosystems, Foster City, CA) [21].

Sequencing of alleles

The repeat number and structure of the alleles at this locus were determined as described previously [15, 17]. The X and Y chromosome alleles were separated on a 2% Ultrapure Agarose gel and eluted using the QIAquick® Gel Extraction kit (Qiagen Inc., Valencia, CA) following the manufacturer’s protocol. The alleles were confirmed by bidirectional sequencing using Big Dye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA) and published primers.

Statistical analyses

The expectation–maximization (EM) algorithm was employed to compute maximum likelihood estimates of allele frequencies for the DXYS156X and DXYS156Y loci using Arlequin v3.1 [23]. Gene diversity, population pairwise genetic distances (FST), and analysis of molecular variance (AMOVA) among the studied populations were also computed with Arlequin. To evaluate the forensic efficiency of the marker, various statistical parameters were calculated with the ChrX-STR.org website software (http://www.chrx-str.org) [24].
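The diversity and forensic summaries named above can be illustrated with standard textbook formulas for a locus where each male carries a single allele. This is a sketch, not a reproduction of Arlequin or the ChrX-STR.org calculators: it assumes Nei's unbiased gene diversity n/(n-1)(1 - Σp_i²), the Botstein et al. polymorphism information content, and the male power of discrimination 1 - Σp_i², and the allele list in the usage example is made up rather than taken from Table 1.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative allele frequencies and sample size for a haploid sample."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return {a: c / n for a, c in counts.items()}, n


def gene_diversity(alleles):
    """Nei's unbiased gene diversity for haploid (hemizygous) data."""
    freqs, n = allele_frequencies(alleles)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    homozygosity = sum(p * p for p in freqs.values())
    return n / (n - 1) * (1.0 - homozygosity)


def pic(freqs):
    """Botstein et al. PIC: 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    ps = list(freqs.values())
    sum_sq = sum(p * p for p in ps)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - sum_sq - cross


def power_of_discrimination_haploid(freqs):
    """Chance that two unrelated males differ at a hemizygous locus."""
    return 1.0 - sum(p * p for p in freqs.values())
```

With the hypothetical sample `["7"] * 6 + ["8"] * 3 + ["9"]` (frequencies 0.6, 0.3, 0.1), gene diversity is 10/9 × 0.54 = 0.6, the male PD is 0.54, and PIC is 1 - 0.46 - 0.0738 = 0.4662.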

Validation studies

Ten paternity case samples from males showing no Y allele peak, and two samples from the Sakaldwipi Brahmin population lacking the X allele peak [21], at the amelogenin locus of AmpFlSTR® Identifiler® (Applied Biosystems, Foster City, CA) were used to validate the DXYS156 STR for forensic casework. The samples lacking the Y allele were also typed for 17 Y-STR loci using AmpFlSTR® Yfiler® (Applied Biosystems, Foster City, CA) to confirm the presence of the Y chromosome. DNA isolated from various body fluids (blood stains, semen stains, and saliva), postmortem tissue samples, and swabs (buccal, vaginal, and fingerprint swabs from surfaces) was also used to validate the marker. After isolation, DNA was purified and concentrated with Microcon® 100 to avoid PCR inhibition. The template concentration of the validation samples was determined by the same procedure as for the population samples. In addition, DNA from different male and female samples was mixed in various ratios for mixture interpretation studies.

Results and discussion

The analysis of 749 X and 749 Y chromosomes from 11 populations revealed 11 different alleles (5–15.1) spanning a size range of 130 to 181 bp. The allele frequencies are presented in Table 1. Alleles were assigned to their respective chromosomes according to Cali et al. and Karafet et al. [15, 17]. At the DXYS156X locus, allele 7 had the highest frequency, concordant with the finding that this allele is highly frequent in non-African populations [17]. At the DXYS156Y locus, allele 12.1 showed the highest frequency in all the studied populations. The long allele Y14.1 was observed at high frequency in the TB-speaking populations of northeastern India, while allele Y15.1 was found exclusively in these populations. These alleles have been reported in East Asian populations [16, 17], suggesting a genetic signature of eastern Asia. Previous Y chromosomal studies have suggested that speakers of the TB language family arrived in waves of migration from southern China through the northeastern corridor of India [25]. This passage has also served as a geographic barrier to human migration, creating a genetic discontinuity and isolation between the populations of the northeast and those of the Indian mainland [25]. The overall allelic distribution of DXYS156 in India depicts a blend of the African and Asian data. This is consistent with the finding that an early wave of exodus from Africa, along the coastline of India, was responsible for the colonization of South Asia and parts of Southeast Asia [19, 26].
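The reported size range gives a simple consistency check on the allele nomenclature. If allele 5 spans 130 bp and each TAAAA repeat adds 5 bp, the non-repeat flank is 130 - 25 = 105 bp, and allele 15.1 (15 repeats plus one extra base) comes to 105 + 75 + 1 = 181 bp, matching the upper bound. The sketch below converts a fragment size into a designation under that inferred 105 bp offset. It is an illustration only: actual genotypes are called by binning against an allelic ladder, because capillary mobility does not map exactly onto base pairs, and the 0.4 bp tolerance is hypothetical.

```python
from __future__ import annotations

REPEAT_BP = 5        # length of the TAAAA repeat unit
FLANK_BP = 105       # inferred from the text: allele 5 at 130 bp -> 130 - 5 * 5
TOLERANCE_BP = 0.4   # hypothetical sizing tolerance


def size_to_allele(size_bp: float) -> str | None:
    """Map a DXYS156 fragment size (bp) to a designation such as '12.1'."""
    length = round(size_bp)
    if abs(size_bp - length) > TOLERANCE_BP:
        return None  # too far from a whole base: treat as off-ladder
    repeats, extra_bases = divmod(length - FLANK_BP, REPEAT_BP)
    if repeats < 1:
        return None  # shorter than one full repeat: not a plausible allele
    return str(repeats) if extra_bases == 0 else f"{repeats}.{extra_bases}"
```

For example, `[size_to_allele(s) for s in (140.1, 166.0, 176.0)]` gives `['7', '12.1', '14.1']`, the most frequent X allele, the most frequent Y allele, and the long Y allele reported in northeastern India.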

Table 1 Allele frequencies and forensic parameters for the studied populations at the DXYS156 locus

Gene diversity at the X chromosome locus was lower than at the Y chromosome locus in most of the studied populations. The highest gene diversity at the DXYS156X locus was observed in the Sakaldwipi Brahmin population (0.470), and at the DXYS156Y locus in the Kurumans population (0.594). Endogamous populations inhabiting the same geographical areas were found to harbor comparable total gene diversity. The variation in genetic diversity among the populations at the X and Y loci is given in Table 1 (a) and (b) (supplementary data). Population pairwise FST at DXYS156X showed that only three populations had significant genetic distances from the remaining populations, whereas at DXYS156Y most populations were significantly distinct from each other. The AMOVA results for the X and Y chromosome loci are shown in Table 2 (a) and (b) (supplementary data).
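To give a feel for what a pairwise distance at a single haploid locus measures, here is a simplified sketch. It is not Arlequin's estimator: it computes a Nei-style G_ST = (H_T - H_S)/H_T from allele frequencies with equal population weights, and assesses significance by permuting males between the two samples, which is similar in spirit to Arlequin's permutation tests but not identical. The example samples are hypothetical.

```python
import random
from collections import Counter


def _freqs(sample):
    """Relative allele frequencies in one haploid sample."""
    n = len(sample)
    return {allele: c / n for allele, c in Counter(sample).items()}


def pairwise_gst(pop_a, pop_b):
    """Nei-style G_ST between two haploid samples, equal population weights."""
    fa, fb = _freqs(pop_a), _freqs(pop_b)
    # H_S: mean within-population gene diversity.
    hs = 1.0 - (sum(p * p for p in fa.values()) + sum(p * p for p in fb.values())) / 2
    # H_T: gene diversity of the averaged allele frequencies.
    alleles = set(fa) | set(fb)
    ht = 1.0 - sum(((fa.get(x, 0.0) + fb.get(x, 0.0)) / 2) ** 2 for x in alleles)
    return 0.0 if ht == 0 else (ht - hs) / ht


def permutation_p_value(pop_a, pop_b, n_perm=999, seed=1):
    """Observed G_ST and the share of label permutations at least as large."""
    observed = pairwise_gst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pairwise_gst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Two samples with identical frequencies give a distance of 0, while two samples fixed for different alleles (for instance all 12.1 versus all 14.1) give 1 with a very small permutation p-value.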

The substantially greater genetic diversity at the DXYS156Y locus than at the DXYS156X locus in Indian populations is congruent with the high Y diversity reported in Asians compared with Africans [16, 17]. The mutation rate on the Y chromosome is several-fold higher than on the X chromosome, favoring the accumulation of mutations in the repeat units. There is evidence that the mutation rate increases with the number of repeat units [27, 28]. Also, the smaller effective population size of the Y chromosome in each generation makes mutations occurring on this chromosome more easily heritable [16]. Sex-specific processes such as lower migration rates and a lower effective population size of males may also account for the discordant pattern of genetic variability on the sex chromosomes [20]. Previous Y chromosomal studies on Indian populations also revealed higher genetic differentiation at Y loci, owing to the prevalent practice of restricted male social mobility [29]. Earlier studies of the genetic structure of Indian ethnic populations have shown that they resulted from admixture of four or five ancestral populations [29, 30]. There is considerable genetic heterogeneity between castes and tribes, which might be attributed to their diverse origins, population histories, and restricted geographical territories, with a high degree of confounding between language and geography [26]. The variation among the studied populations at DXYS156 computed by AMOVA also reflects these previous observations. The genetic differentiation observed at the social level indicates little or no exchange of genetic material between tribal and caste populations [29, 30].

Forensic parameters for the STR are shown in Table 1. High values of polymorphism information content (PIC), power of discrimination (PD), and power of exclusion (PE) were observed in the Kurumans, Konkanastha Brahmin, and Mahadev Koli populations. The STR was polymorphic in all populations, with the southern and western regions of India displaying considerable diversity. Alleles specific to the X and Y chromosomes were observed in all validation samples that had failed to give a positive result in the amelogenin sex test (Figs. 1 and 2, supplementary data). Samples with high amounts of detectable DNA (Table 3, supplementary data) produced peaks with high relative fluorescence units (RFU). For templates with low DNA concentration (<10 pg/μl), complete amplification was achieved by increasing the number of PCR cycles to 34, a procedure also described elsewhere [31]. Genotyping of mixture samples (containing both male and female DNA) showed alleles at both the X and Y loci. The greater peak height (RFU) of the X chromosomal allele reflected its presence in both the male and the female contributors (Fig. 3, supplementary data). Such samples were further interpreted by typing additional X and Y STRs. The high values obtained for all forensic parameters show that this STR can be used efficiently for sex determination, especially when the amelogenin-based sex test fails.
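The mixture observation follows from copy-number arithmetic: each male cell contributes one X and one Y template, and each female cell two X templates, so any female DNA in a mixture raises the X signal relative to Y. The sketch below makes this explicit. It assumes roughly 6.6 pg of DNA per diploid cell (a common approximation), equal DNA content in male and female cells, and equal amplification of X and Y copies; real peak heights also depend on whether the contributors share X alleles, on stutter, and on stochastic effects. The same conversion shows why very low templates are difficult: 10 pg corresponds to only about 1.5 diploid genomes, whereas the roughly 1 ng used for population samples is about 150.

```python
PG_PER_DIPLOID_CELL = 6.6  # approximate DNA content of one diploid human cell


def genome_equivalents(dna_pg: float) -> float:
    """Approximate number of diploid cells' worth of DNA in a sample."""
    return dna_pg / PG_PER_DIPLOID_CELL


def x_to_y_template_ratio(male_pg: float, female_pg: float) -> float:
    """Expected X:Y template copies in a male + female DNA mixture."""
    if male_pg <= 0:
        raise ValueError("a Y template requires some male DNA")
    male_cells = genome_equivalents(male_pg)
    female_cells = genome_equivalents(female_pg)
    x_copies = male_cells * 1 + female_cells * 2  # XY male, XX female
    y_copies = male_cells * 1
    return x_copies / y_copies
```

Under these assumptions a pure male sample gives an X:Y ratio of 1, a 1:1 male:female mixture by mass gives about 3, and a 1:9 mixture gives about 19.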

Conclusions

The present study shows that the pentanucleotide STR DXYS156 is polymorphic in Indian populations. It can contribute to DNA profiling and also serve as an additional sex-determining marker. Higher genetic variability was observed at the Y chromosomal locus than at its X chromosomal counterpart in the studied populations. Signs of genetic affinity between Indian populations and neighboring ones point to the central role played by India in the colonization of South and Southeast Asia. This STR may be used to complement other Y chromosomal and mtDNA markers for a better understanding of the peopling of India. A database of this STR for Indian populations needs to be established for its further use in forensic identity testing and evolutionary studies. Additional studies on X–Y homologous markers may provide insight into the role of sex-specific processes in shaping the genetic structure of human populations.