Introduction

Anatomically modern humans (Homo sapiens) originated in East Africa during the Middle Paleolithic, ~ 200 thousand years ago (kya), and dispersed ‘Out-of-Africa’ to populate the world (Lewin 1987; Stringer et al. 1989; Cavalli-Sforza et al. 1994; Lahr and Foley 1998; Kivisild et al. 1999a, b). During the Neolithic period (~ 10 kya), the spread of agriculture, together with extensive immigration by demic diffusion, played a major role in repopulating Europe and created genetic crossroads along the way (Zvelebil 1980; Cavalli-Sforza 1996; Thorpe 1999; Quintana-Murci et al. 2004; Sahoo et al. 2006; Alizadeh et al. 2010). Such crossroads served as reservoirs of genetic variation for particular lineages, which subsequently expanded to other regions (De Fanti et al. 2015).

The Indian subcontinent has served as a major corridor for human dispersal, with multiple waves of migration and admixture, and harbors a sizeable fraction of global genetic diversity (Cann 2001; Basu et al. 2003). The present mitochondrial DNA (mtDNA) gene pool of India was shaped largely by its initial settlers, supplemented by minor episodes of gene flow from the East and the West (Chandrasekar et al. 2009). Autosomal genetic evidence indicates that most ethnolinguistic groups in India descend from a mixture of two divergent ancestral populations: Ancestral North Indians (ANI), related to people of West Eurasia, the Caucasus, Central Asia and the Middle East, and Ancestral South Indians (ASI), distantly related to the indigenous Andaman Islanders (Reich et al. 2009). It has been proposed that the proto-Dravidian language most likely originated in the Elam province of southwestern Iran and later spread eastwards with the movement of people to the Indus Valley and subsequently to the Indian subcontinent (McAlpin et al. 1975; Cavalli-Sforza et al. 1988; Renfrew 1996; Derenko et al. 2013). West Eurasian haplogroups are found across India and include many deep-branching lineages of the Indian mtDNA pool, although most mtDNA lineages of West Eurasian ancestry must have entered India recently, less than 10 kya (Kivisild et al. 1999a). These lineages occur at particularly high frequency among the higher caste groups of India (Bamshad et al. 1998, 2001; Basu et al. 2003), and many caste groups are considered direct descendants of Indo-Aryan immigrants (Cordaux et al. 2004). Successive invasions and migrations resulted in major demographic expansions in the region, adding new languages and cultures to the populations already settled in India. Previous genetic studies of the maternal gene pools of Indians have revealed connections with populations of Iran and the Arabian Peninsula, likely resulting from both ancient and recent gene flow (Metspalu et al. 2004; Terreros et al. 2011). Studies of unexplored Indian tribal populations are therefore warranted to understand their genetic connections with neighboring gene pools and their place in the evolution of modern humans.

Over the last decade and a half, many studies have characterized mtDNA lineages in Indian populations, covering both tribal and caste communities (Bamshad et al. 1996, 1998, 2001; Kivisild et al. 1999a, 2003; Majumder 2001, 2010; Roychoudhury et al. 2000, 2001; Edwin et al. 2002; Basu et al. 2003; Cordaux et al. 2003; Palanichamy et al. 2004, 2015; Metspalu et al. 2004; Rajkumar et al. 2005; Sun et al. 2005; Thangaraj et al. 2005, 2006, 2008, 2009; Kumar et al. 2008; Chaubey et al. 2008; Chandrasekar et al. 2009). The present study of complete mitochondrial genomes from the Melakudiya tribal population of southern India characterizes the phylogenetic distribution and haplogroup classification of mtDNA lineages in this part of the Indian matrilineal gene pool. The data obtained are compared with previously published data on the phylogenetic distribution of West Eurasian lineages.

Materials and methods

Ethics statement, sample collection, and complete mtDNA sequencing

The Institutional Ethical Committee of the Anthropological Survey of India and the University of Mysore approved the protocol and granted ethical clearance for the study. Written informed consent was obtained from all 113 healthy, unrelated subjects belonging to the Melakudiya tribal population, a Dravidian-speaking tribe from the Kodagu district of Karnataka, southern India. Genomic DNA was extracted from whole blood using a standard phenol–chloroform method (Sambrook et al. 1989). The complete mtDNA genome was amplified with 24 standard primer sets (Rieder et al. 1998); the amplicons were checked on 2% agarose gels and sequenced directly using the BigDye Terminator v3.1 Cycle Sequencing Ready Reaction Kit on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA). The resulting sequences were analyzed with SeqScape v2.5 (Applied Biosystems, Foster City, CA, USA). Mutations were scored by comparison with the revised Cambridge Reference Sequence (rCRS) (Andrews et al. 1999), and sequences were aligned with MEGA v7 (Kumar et al. 2016) and the BioEdit Sequence Alignment program (Hall 1999). The 46 complete mtDNA sequences reported in this study have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/, accession numbers: MG649324–MG649328; MH368695–MH368735) (Supplementary File Table S1).
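The scoring step can be pictured with a short sketch. It assumes each sequence has already been aligned pairwise to the rCRS (for example in MEGA or BioEdit) and writes the differences in the PhyloTree-style notation used in Figs. 1 and 2: transitions as bare positions, transversions with the derived base (as in 16,318 T), insertions as in 5642.1 T, and deletions with a d suffix. The function `score_variants` and its input format are hypothetical; this is an illustration, not how SeqScape, MEGA or BioEdit work internally.

```python
from typing import List

# Purine (A<->G) and pyrimidine (C<->T) exchanges are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def score_variants(ref_aln: str, qry_aln: str, start: int = 1) -> List[str]:
    """Hypothetical helper: list differences of an aligned query from the reference.

    Both strings are rows of one pairwise alignment ('-' = gap); `start` is the
    reference (rCRS) position of the first alignment column. Notation
    (PhyloTree style): transitions as bare positions, transversions with the
    derived base, insertions as <pos>.<k><base>, deletions as <pos>d.
    Ambiguity codes other than 'N' are not handled.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    variants: List[str] = []
    pos = start - 1   # last reference position consumed
    inserted = 0      # bases inserted after `pos` so far
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r == "-":
            if q != "-":  # query base with no reference counterpart
                inserted += 1
                variants.append(f"{pos}.{inserted}{q}")
            continue
        pos += 1
        inserted = 0
        if q == "-":
            variants.append(f"{pos}d")
        elif q == r or "N" in (r, q):
            # identical base, or an undetermined base ('N'; the rCRS itself
            # keeps an 'N' placeholder at 3107): nothing to score
            pass
        elif frozenset((r, q)) in TRANSITIONS:
            variants.append(str(pos))
        else:
            variants.append(f"{pos}{q}")
    return variants
```

For a toy alignment starting at reference position 100, `score_variants("ACG-TAC", "GCGTT-A", start=100)` returns `["100", "102.1T", "104d", "105A"]`: a transition, an insertion, a deletion and a transversion.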

Phylogenetic analysis

Haplogroups were provisionally assigned using Mitomaster (Brandon et al. 2009), based on PhyloTree Build 17 (Van Oven and Kayser 2009). Maximum-parsimony trees of the complete mtDNA sequences were reconstructed manually, aided by median-joining networks computed with NETWORK 5.1 (Bandelt et al. 1999). Tree reconstruction used the 46 complete mitogenome sequences from the present study together with previously published sequences from Palanichamy et al. (2004, 2015), Derenko et al. (2013), Khan et al. (2013), Vyas et al. (2016), Margaryan et al. (2017), Sahakyan et al. (2017) and Peng et al. (2018), with insertion sites excluded (Supplementary File Table S2).
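PhyloTree-based classification can be pictured as motif matching: a sample belongs to the deepest node whose defining mutations, together with those of all its ancestors, are present in the sample. The sketch below assumes a strict rule (every marker on the root-to-node path must be observed) and uses a hypothetical toy tree with placeholder labels and markers, not real PhyloTree Build 17 motifs. It is not Mitomaster's algorithm, which must also cope with back mutations, recurrent mutations and missing data. Insertions are dropped before matching, mirroring the exclusion of insertion sites described above.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# HYPOTHETICAL toy phylogeny: haplogroup -> (parent, defining mutations).
# Labels and markers are placeholders, not real PhyloTree Build 17 motifs.
TOY_TREE: Dict[str, Tuple[Optional[str], Set[str]]] = {
    "A": (None, {"100", "200"}),
    "A1": ("A", {"300"}),
    "A1a": ("A1", {"400", "500G"}),
    "A2": ("A", {"600"}),
}


def _lineage(hg: str, tree) -> Tuple[Set[str], int]:
    """Collect every marker from the root down to `hg` and count nodes on that path."""
    markers: Set[str] = set()
    depth = 0
    node: Optional[str] = hg
    while node is not None:
        parent, defining = tree[node]
        markers |= defining
        depth += 1
        node = parent
    return markers, depth


def assign_haplogroup(variants: Iterable[str], tree) -> Optional[str]:
    """Return the deepest haplogroup whose complete root-to-node motif is observed.

    Strict rule (assumption): all path markers must be present; back mutations,
    recurrent mutations and missing data are not handled.
    """
    # Ignore insertions such as "5642.1T", mirroring the exclusion of insertion sites.
    observed = {v for v in variants if "." not in v}
    best: Optional[str] = None
    best_depth = 0
    for hg in tree:
        markers, depth = _lineage(hg, tree)
        if markers <= observed and depth > best_depth:
            best, best_depth = hg, depth
    return best
```

With this toy tree, `assign_haplogroup(["100", "200", "300", "400", "500G"], TOY_TREE)` returns `"A1a"`, whereas dropping `"500G"` from the input stops the assignment at `"A1"`.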

Molecular dating

Ages were estimated for the coding region (positions 577–16,023; Andrews et al. 1999) using the rho (ρ) statistic, with standard errors (σ) and 95% confidence intervals; the variance of the rho-based estimates was calculated following Saillard et al. (2000). Two previously described mutation rates were used. The first, a substitution rate of 1.26 × 10−8 mutations per nucleotide per year for the entire coding region, corresponds to one mutation every 5138 years (Mishmar et al. 2003). The second, a rate of 3.5 × 10−8 mutations per nucleotide per year for synonymous changes in protein-coding genes, corresponds to one synonymous mutation (transition or transversion) every 7884 years (Soares et al. 2009).
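The rho dating can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the tree is a hypothetical nested `Node` structure in which each branch carries its mutation count m_j; ρ is the mean number of mutations from the root to each of the n sampled sequences, and σ follows the Saillard et al. (2000) estimator as it is commonly stated, σ² = Σ_j n_j² m_j / n², where n_j is the number of samples descending from branch j. For synonymous dating, the same tree would carry only synonymous mutation counts. The coding-region constant is derived here from 1.26 × 10−8 over the 15,447 bp between positions 577 and 16,023; the synonymous constant is taken as given.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Tuple

# Years per mutation implied by the two rates cited in the text.
CODING_REGION_LENGTH = 16023 - 577 + 1  # 15,447 bp (positions 577-16,023)
YEARS_PER_CODING_MUTATION = 1 / (1.26e-8 * CODING_REGION_LENGTH)  # ~5138 years
YEARS_PER_SYNONYMOUS_MUTATION = 7884    # Soares et al. (2009), taken as given


@dataclass
class Node:
    """Hypothetical tree node; `mutations` counts substitutions on the branch
    leading into it. A node without children is a sampled sequence."""
    mutations: int = 0
    children: List["Node"] = field(default_factory=list)


def _accumulate(node: Node) -> Tuple[int, int, int]:
    """Return (n, sum n_j*m_j, sum n_j^2*m_j) over all branches of this subtree,
    where n_j is the number of samples below branch j."""
    if node.children:
        parts = [_accumulate(child) for child in node.children]
        n = sum(p[0] for p in parts)
        s1 = sum(p[1] for p in parts)
        s2 = sum(p[2] for p in parts)
    else:
        n, s1, s2 = 1, 0, 0
    return n, s1 + n * node.mutations, s2 + n * n * node.mutations


def rho_sigma(root: Node) -> Tuple[float, float]:
    """rho = mean root-to-sample mutation count; sigma^2 = sum_j n_j^2 m_j / n^2.
    The root is the ancestral node, so its own `mutations` should be 0."""
    n, s1, s2 = _accumulate(root)
    return s1 / n, sqrt(s2) / n


def age_kya(rho: float, sigma: float, years_per_mutation: float) -> Tuple[float, float]:
    """Convert rho +/- sigma into an age +/- error in thousands of years."""
    return rho * years_per_mutation / 1000, sigma * years_per_mutation / 1000
```

As a toy example (hypothetical counts), `Node(0, [Node(2, [Node(1), Node(0), Node(2)]), Node(1)])` has four samples at 3, 2, 4 and 1 mutations from the root, so ρ = 2.5 and σ = √22 / 4 ≈ 1.17. Passing these to `age_kya` with either constant gives the point estimate and its ± error; a 95% interval of roughly ρ ± 1.96σ would follow under a normal approximation, which is an assumption of this sketch.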

Results and discussion

The spread of mtDNA haplogroup HV14 in South India

The mtDNA haplogroups detected and their frequencies are listed in Table 1; each individual was assigned to a haplogroup according to the nomenclature of PhyloTree Build 17 (Van Oven and Kayser 2009). In the present study, we reconstructed the phylogeny of haplogroup HV14 from 49 complete mitogenomes, comprising 11 previously published sequences (Palanichamy et al. 2004, 2015; Khan et al. 2013; Derenko et al. 2013; Vyas et al. 2016; Margaryan et al. 2017; Peng et al. 2018) and 38 newly generated Melakudiya sequences. Adding this substantial set of Melakudiya sequences produces a new branching point at the HV14a1 node, defined by a control-region transition at 146 and a coding-region transition at 3834; this branch was designated as the novel subclade HV14a1b (Fig. 1). Further, two Melakudiya sequences share a transition at 16,274, revealing a further novel branch within HV14a1b; an additional coding-region insertion at 5642.1 T and a transition at 1842 were detected in these two individual sequences (Fig. 1). Haplogroup HV14 is prominent in North/Western Europe, West Eurasia, Iran, and from the South Caucasus to Central Asia (Malyarchuk et al. 2008; Schonberg et al. 2011; Derenko et al. 2013; De Fanti et al. 2015). Although Palanichamy et al. (2015) identified haplogroup HV14a1 in three Indian samples, its distribution in India has remained poorly known. In the present study, the addition of a considerable number of Melakudiya sequences revealed the novel subclade HV14a1b at high frequency (43%) and allowed us to identify the earliest-diverging sequences in the HV14 tree, which precede the emergence of HV14a1b in the Melakudiya. Furthermore, four sequences from the Pamiris of Tajikistan and one sequence from Artsakh in the South Caucasus did not fall within HV14a and were placed as HV14* (Fig. 1).
In this study, the coalescence age of haplogroup HV14 is ~ 16.1 ± 4.2 kya, and the founder age of HV14 in the Melakudiya tribe, represented by the novel clade HV14a1b, is ~ 8.5 ± 5.6 kya (Table 2).

Table 1 Frequencies of Western Eurasian haplogroups detected in Melakudiya tribal population
Fig. 1

Maximum-parsimony tree of complete mitogenomes constructed using 38 sequences from the Melakudiya tribe and 11 previously published sequences belonging to haplogroup HV14 [Supplementary file Table S2]. The suffix @ indicates a back mutation and a plus sign (+) an insertion. Control-region mutations are underlined; synonymous transitions are shown in normal font and non-synonymous mutations in bold font. Coalescence ages (kya) based on the complete coding region are shown in normal font and those based on synonymous transitions in italics.

Table 2 Comparative phylogenetic age estimates for Western Eurasian haplogroups HV14 and U7 using the rho (ρ) statistic dating method

A likely in-situ origin of subhaplogroup U7a3a1a2 in India

Unlike haplogroup HV14, haplogroup U7 has a geographic distribution ranging from Europe to India and is prominent from the Near East and Central Asia to South Asia (Quintana-Murci et al. 2004; Metspalu et al. 2004; Kim et al. 2010; Li et al. 2010; Sahakyan et al. 2017). For the Melakudiya, we reconstructed the phylogeny of haplogroup U7 from 26 complete mitogenomes: 18 sequences from previous studies (Palanichamy et al. 2015; Sahakyan et al. 2017) and eight newly generated Melakudiya sequences. The complete mtDNA sequences form a distinct clade, U7a3a1a2, which, in addition to the Melakudiya sequences, comprises 11 sequences from Dravidian-speaking tribes and castes of Tamil Nadu, Kerala and Karnataka, five sequences from Indo-European speakers of Gujarat, Maharashtra and Uttar Pradesh, one Sri Lankan Tamil sequence and one Kuwaiti sequence (Fig. 2). The coalescence age of haplogroup U7a3a1a2 is ~ 13.3 ± 4.0 kya. The eight Melakudiya sequences share a transition at 10,238 and were classified within the U7a3a1a2* node; three additional coding-region mutations (5147, 10,245 and 13,745) and one control-region mutation (16,189) were detected in four of these sequences (Fig. 2). The founder age of U7a3a1a2* in the Melakudiya tribe is ~ 12.8 ± 9.6 kya. Studies of mtDNA control-region sequences have also detected U7 lineages with the haplotype 151–16,069–16,260–16,274–16,318 T in Afghanistan and Tajikistan (Irwin et al. 2010), in Sri Lanka (Ranaweera et al. 2014), and among Dravidian (Andhra Brahmans) and Indo-European (Gujarat) speakers of India (Metspalu et al. 2004) (Supplementary File Table S3). Coalescence ages estimated from the complete coding region for the main and sub-branches of the HV14 and U7a3a1a2 trees, together with coalescence times based on synonymous mutations, are given in Table 2.

Fig. 2

Maximum-parsimony tree of complete mitogenomes constructed using 8 sequences from the Melakudiya tribe and 19 previously published sequences belonging to haplogroup U7a3a1a2 (Supplementary file Table S2). The suffix (FS) indicates a frameshift mutation, and the T in 16,318 T denotes a transversion. Control-region mutations are underlined; synonymous transitions are shown in normal font and non-synonymous mutations in bold font. Coalescence ages (kya) based on the complete coding region are shown in normal font and those based on synonymous transitions in italics.

The complete mitogenome analysis shows that the Melakudiya tribe shares a common maternal ancestral gene pool with the people of Iran, the South Caucasus and Central Asia, reflected in the West Eurasian haplogroups HV14 and U7. This suggests that the migration may have been associated with a major Neolithic agricultural event spanning Europe and the Near East, accompanied by extensive demic diffusion, during which founder lineages branched into many daughter clades; this would account for the variation of West Eurasian haplogroup HV14 observed in the Melakudiya tribal population. The HV14 results also support the proposed Elamo-Dravidian linguistic connection, showing ancestry uniquely shared with populations of Iran (Derenko et al. 2013) as well as clustering with Central Asian populations (Margaryan et al. 2017; Peng et al. 2018). Although haplogroup U7 originated in the Near East and is widespread from Europe to India, the Melakudiya U7a3a1a2 sequences cluster with those of Indian caste and tribal populations and of neighboring populations (Irwin et al. 2010; Ranaweera et al. 2014; Sahakyan et al. 2017), hinting at an in-situ origin of this subclade in India from lineages introduced by Indo-Aryan immigrants. Furthermore, the most recent common ancestors of the newly identified subclades within haplogroups HV14 and U7 postdate the Last Glacial Maximum (LGM, ~ 20 kya), apparently reflecting a glacial bottleneck from which the largest fraction of surviving lineages originated.
The HV14 and U7 subclades detected in the Melakudiya tribal population suggest that the tribe settled early, before the origin of the caste system in India. One scenario is that an out-group population from the Iranian Plateau migrated to India by demic diffusion and later settled in southern India; over time, the Melakudiya tribe acquired new haplotypes that branch deeply into novel subclades. The Melakudiya still practice their traditional agriculture, and their West Eurasian lineages, which trace back to the Iranian Plateau, indicate Neolithic genetic continuity. This study provides comprehensive data for reconstructing ancient migration events.