Introduction

The Kurds are the fourth largest ethnic group in the Middle East (Hassanpour and Mojab 2005). Their original geographic region, the mountainous northern perimeter of the historic Fertile Crescent (the cradle of civilization, farming and domestication), extends between the Black Sea and Mesopotamia on one side, and the Anti-Taurus region and the Iranian Plateau on the other (Izady 1992). They speak an Indo-European language from the northwestern branch of the Indo-Iranian subfamily, closely related to Persian. According to Cumberland (1926), they belong to the Aryan stock, nomadic tribes now considered plausible ancestors of most present-day Iranians, and have remained fairly pure, as might be expected from their isolated geographical position. They may be descendants of the first shepherds, who have occupied the Kurdistan highlands since the early Neolithic, after the technologies for food production were developed (Braidwood and Howe 1960; Comas et al. 2000). At present, the generally accepted grouping of the Kurds by dialect is that of D. N. MacKenzie (MacKenzie 1962), who distinguished four clearly distinct groups: (i) Northern Kurdish, also known as Kurmanji, the most widely spoken variety of Kurdish; (ii) Central Kurdish, with one main subgroup, Sorani; (iii) Southern Kurdish, known as Kalhori; and (iv) Hawrami Kurdish, found in the Hawraman region of the Middle Zagros.

Genetic evidence concerning the origins of the Kurds is limited. Using allele frequencies of classical markers, Cavalli-Sforza et al. (1994) showed that the Kurmanji groups from the Caucasus region often fall outside the range observed for European populations but are genetically similar to other populations from the Middle East. In contrast, Comas et al. (2000) found that these Kurmanji groups carry mitochondrial DNA (mtDNA) lineages typical of the European gene pool. Owing to its high mutation rate and its haploid, strictly maternal mode of inheritance, mtDNA is widely accepted as a highly informative tool for studying population relationships and tracing past demographic scenarios. Following these early findings, several international studies examined the origin and genetic landscape of the Kurds, but exclusively in the Kurmanji Kurds and using only mtDNA (Richards et al. 2000; Quintana-Murci et al. 2004). Nasidze et al. (2005) presented the first major genetic study of the Kurmanji groups based on both mtDNA and Y-chromosome variation. In the mtDNA data, the Kurmanji groups were genetically closer to European groups than to Caucasian groups, whereas the Y-chromosome data showed the opposite pattern, indicating some mismatch between their maternal and paternal histories (Nasidze et al. 2005). However, these studies focussed only on Kurmanji groups from Georgia, eastern Turkey and northern Iraq, and therefore may not properly reflect the possible genetic heterogeneity of the Kurds.

Although all four major Kurdish groups are present in western Iran, no comprehensive study has so far been conducted in this area. In the present study, we used phylogeographic and population genetic analyses to extend previous observations by analysing the mtDNA D-loop sequence in a larger cohort of individuals representing the four Kurdish groups from western Iran, in the east of the Kurdish-inhabited region. We also compared them with available data from world populations in order to understand their origins, past demographic scenarios, any specific population structure governing mtDNA diversity, and the genetic relationships among the four Kurdish groups from Iran and between them and other Eurasian populations.

Materials and methods

Population samples

A total of 79 blood samples were collected from unrelated males, representing four Kurdish groups from Iran: Sorani, Hawrami, Kurmanji and Kalhori (table 1; figure 1). Informed consent of all the donors and information about their ethnic background were obtained. Only native people were included in the sample; the Kurdish ancestry was ascertained for three generations. For population comparison purposes, the mtDNA D-loop sequence data from several Middle Eastern (GenBank acc. no. EU239582–EU239560 and EU239536–EU239655; Schönberg et al. 2011; Al-Zahery et al. 2013), European (Fraumene et al. 2006; Ingman and Gyllensten 2006; Cardoso et al. 2013; Mielnik-Sikorska et al. 2013), Caucasian (Schönberg et al. 2011), South Asian (Malyarchuk et al. 2010; GenBank acc. no. EU239560 and EU239536–EU239655), Central Asian (Irwin et al. 2010), North Asian (Volodko et al. 2008), East Asian (Cheng et al. 2008) and West Asian (Malyarchuk et al. 2010) populations were taken from NCBI.

Table 1 Study populations and additional data about the original and published mtDNA datasets.
Figure 1

Map of the Kurdish-inhabited areas in the Middle East (a) after Izady 1992; and (b) our sampling localities in western Iran.

DNA extraction and typing

Genomic DNA was extracted using a standard phenol–chloroform method. Approximately 655 bp were amplified from the 5′ region of the mtDNA D-loop using two specific primers (Azimzadeh-Irani 2011): HDL29 (5′-GGTCTATCACCCTATTAACCACT-3′) and HDH901 (5′-ACTTGGGTTAATCGTGTGA-3′). Each 25 μL PCR reaction mix included 16 μL of ultrapure water, 2.5 μL of 10× PCR buffer (MgCl2 included), 1.2 μL of each primer (0.01 mM), 0.5 μL of each dNTP (0.05 mM), 0.2 U of Taq polymerase and 1.4 μL of DNA template. The thermal cycler (Primus 96 advanced Gradient, PeqLab, Germany) programme consisted of an initial step of 3 min at 95 °C, followed by 30 cycles (30 s at 94 °C, 45 s at 57 °C and 45 min at 72 °C), with a final extension of 10 min at 72 °C. The best-quality products were selected for DNA typing on an ABI Prism™ 3730 Genetic Analyzer (Applied Biosystems, Foster City, USA) by Macrogen (South Korea).

Multiple alignment and data analysis

BioEdit 7.1 (Hall 1999) was used to read the electropherograms, and ClustalW implemented in MEGA 6.0 (Kumar et al. 2008) was used to align all sequences with the revised Cambridge Reference Sequence (rCRS). The mtDNA haplogroups were determined on the basis of diagnostic sites using PhyloTree 16 (Van Oven and Kayser 2009). To reduce the error rate during haplogroup classification, we also applied the algorithm implemented in HaploGrep 2.0 (Kloss-Brandstätter et al. 2011). The best-fit model of sequence evolution was selected with jModelTest 2.1.3 (Posada 2008) based on the corrected Akaike information criterion (AICc; Hurvich and Tsai 1989). Genetic diversity statistics, pairwise FST values (with 10,000 permutations) and analyses of molecular variance (AMOVA) based on different criteria were calculated using Arlequin 3.5 (Excoffier and Lischer 2010). The FST values were used to construct a neighbour-joining (NJ) tree of populations. Historical demography was examined using two approaches: (i) neutrality tests, including Tajima’s D (Tajima 1989), Fu’s Fs (Fu 1997) and R2 (Ramos-Onsins and Rozas 2002); and (ii) mismatch distribution (MMD; Harpending 1994). The fit of the observed data to the distribution expected under a model of sudden expansion was assessed using the sum of squared deviations (SSD) and Harpending’s raggedness index (Harpending 1994). The population parameter τ (time since expansion in mutational units) was estimated under the sudden expansion model from the equation τ = 2μt (Slatkin and Hudson 1991), where μ is the mutation rate for the whole fragment (and therefore depends on fragment length) and t is the time since expansion in years. The Mantel test (Mantel 1967) implemented in the vegan library (Dixon 2003) was applied to evaluate isolation by distance (IBD) using R 3.1.3 (Team 2012).
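As an illustration of the first neutrality test listed above, the following minimal Python sketch computes Tajima's D from a set of aligned sequences using the constants given by Tajima (1989). It is not the Arlequin implementation used in this study: the function name `tajimas_d` and the toy alignment are hypothetical, gaps and ambiguous bases are treated as ordinary characters, and no significance test is performed.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, D) for a list of equal-length aligned sequences.

    pi is the mean number of pairwise differences and S the number of
    segregating sites. Illustrative sketch only (hypothetical helper).
    """
    n = len(seqs)
    if n < 4:
        # For n = 3 the variance term of D is zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    # S: alignment columns that contain more than one character state
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: average number of differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, s, d


# Toy star-like alignment (hypothetical data): one central haplotype
# and three haplotypes that each differ from it by a single mutation.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
pi, s, d = tajimas_d(toy)
print(pi, s, d)
```

In the toy alignment every variant is a singleton, so D comes out negative, which is the direction expected under an excess of low-frequency variants.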

Results

Haplogroup profile distribution

A total of 580 bp covering part of the mtDNA D-loop region, tRNA-Phe and 12S rRNA (nucleotide positions 156 to 736) were determined for each individual. Thirteen major haplogroups, including ten Western Eurasian (H, HV, U, J, K, W, T, N*, R* and V), two Eastern Eurasian (G and M1) and one sub-Saharan African (L3*), were identified in the Kurdish groups (table 2). Three of the 13 haplogroups, V (V22 in the Hawrami group), M1 (M1a in the Iraqi Kurds) and G (G3b in the Hawrami group), were singletons; the other haplogroups were shared between two or more populations. Overall, the Kurdish mtDNA pool was characterized by high frequencies of Western Eurasian haplogroups: H (37.1%), U (13.8%), J (8.48%), K (9.54%), R* (7.42%), N* (6.36%) and HV (4.24%). The Hawrami group stood out in lacking the two common, high-frequency haplogroups U and J, and in its higher than average frequency of haplogroup H (71.42%, compared with 38.27% in the other Kurdish groups). The Kurdish groups carried small fractions of Eastern Eurasian lineages: M1a in the Iraqi Kurds (6.66%) and G3b in the Hawrami group (4.76%), 2.13% overall. Sub-Saharan African lineages were virtually absent from the analysed Kurdish groups, except for L3d in the Kurmanji (5.26%) and Kalhori (5.26%) groups and L3e in the Iraqi Kurds (6.66%).

Table 2 MtDNA haplogroup frequencies in the Kurdish groups.

Genetic diversity

Subsequent analyses were restricted to 418 bp of the mtDNA D-loop region (positions 156–573). We identified 199 haplotypes in 2157 individuals, 24 of which were observed in the Iranian Kurdish groups. Of these 24 haplotypes, eight (33.33%) were singletons and 16 (66.67%) were shared between populations. Haplotypes no. 1 and no. 4 had the highest and second-highest frequencies in the Iranian Kurds (48 and four individuals, respectively). The global haplotype diversity was 0.774 ± 0.007, ranging from 0.865 ± 0.058 for the Iranian Arabs to 0.466 ± 0.131 for the Eskimos (table 3). The haplotype diversity for the pooled Kurdish samples was 0.668 ± 0.056, ranging from 0.847 ± 0.087 for the Iraqi Kurds to 0.500 ± 0.132 for the Hawrami group. The other Kurdish groups had haplotype diversities of 0.736 ± 0.111 (Kalhori), 0.705 ± 0.111 (Sorani) and 0.614 ± 0.130 (Kurmanji). Nucleotide diversity ranged from 0.0073 ± 0.004 for the Iraqi Kurds to 0.0025 ± 0.002 for the Hawrami group. The high and low levels of mtDNA diversity in the Iraqi Kurds and the Hawrami group, respectively, were also evident in their mean numbers of pairwise differences.
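For readers unfamiliar with the statistic, haplotype diversity is commonly computed with the unbiased estimator H = n/(n − 1) × (1 − Σ p_i²), where n is the sample size and p_i the frequency of haplotype i. The sketch below assumes this estimator; the text does not state which formula lies behind table 3. The function name and the haplotype counts are hypothetical and are not taken from the study data.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype (gene) diversity from a list of haplotype counts.

    counts[i] is the number of individuals carrying haplotype i.
    Hypothetical helper for illustration only.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two individuals")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of 20 individuals: one dominant haplotype
# and a few rare ones, the pattern expected in a small isolated group.
dominated = [14, 3, 2, 1]
# Hypothetical sample of 20 individuals with evenly spread haplotypes.
even = [4, 4, 4, 4, 4]

print(haplotype_diversity(dominated))
print(haplotype_diversity(even))
```

A sample dominated by one haplotype yields a lower value than an evenly spread sample of the same size, which is the pattern the text describes for the Hawrami group relative to the Iraqi Kurds.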

Table 3 Basic statistics of genetic diversity and neutrality tests for the study populations.

Population differentiation and genetic structure

The best-fit nucleotide substitution model was TN93 + G. Accordingly, the TN93 distance method (Tamura and Nei 1993) with a gamma correction (Meyer et al. 1999) value of 0.437 was used to compute pairwise FST values (table 4). Interestingly, the FST values between the Hawrami group and the other Kurdish groups were much higher than those among the other Kurdish groups, and its FST value with the Iraqi Kurds was significant (FST = 0.061, P < 0.05). Overall, however, the FST values among the four Kurdish groups from Iran were not significantly different from zero, implying that these groups are genetically indistinguishable. All Kurdish samples from Iran were therefore treated as a single population (the Iranian Kurds population; In_Ku) and the FST values were recomputed (table 4). In most cases, the estimated FST values were positive and significant, with the exception of those involving its geographic neighbours: the Armenians, Azeris, Iranian Arabs, Iraqi Kurds and Turks from Turkey. The Iranian Kurds population showed its highest FST values with the Tharu (0.341) and Changpa (0.238) from South Asia, the Eskimos (0.2) from the northeast, the Mongols (0.159) from the east and the Kirgiz (0.135) from Central Asia. The NJ tree places the Kurdish, Caucasian, Middle Eastern and European populations at one end, the South Asian populations (i.e. Tharu and Changpa) at the opposite end, and the Central Asian populations in the centre (figure 2). The correlation between the geographic and pairwise FST distance matrices was positive, strong and significant (r = 0.631, P = 0.001), implying that geographic distance has limited gene flow and strongly influenced the genetic structure.
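The logic of the Mantel test behind the correlation reported above can be sketched in a few lines of Python. This is a simplified stand-in for the vegan implementation used in the study, not a reproduction of it: it correlates the upper triangles of two distance matrices and obtains a one-sided P value by jointly permuting the rows and columns of one matrix. All names and the toy distances are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(dist_a, dist_b, permutations=999, seed=1):
    """Return (r, one-sided P) for two square distance matrices."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    r_obs = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute population labels of the second matrix (rows and columns together)
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= r_obs:
            hits += 1
    # The observed value counts as one of the permutations
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Hypothetical example: six populations along a transect, with genetic
# distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.1 * d for d in row] for row in geo]
r, p = mantel_test(geo, gen)
print(r, p)
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the dependence among distances that share a population, which is why an ordinary correlation test is not appropriate for distance matrices.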

Table 4 Pairwise FST values among the study populations.
Figure 2

NJ tree of populations based on pairwise FST values.

The AMOVA for the geographical classification of all populations revealed that 4.73% of the total mtDNA variation was attributable to differences among the geographical groups, whereas interpopulation (within groups) and intrapopulation differences explained 1.50% and 93.76% of the variation, respectively (table 5). When population samples were subdivided according to linguistic affiliation, the among-groups percentage of variance (2.31%) was significantly lower than that among populations within groups (3.11%). To further investigate the patterns of genetic variation on a smaller geographic scale, we performed a second AMOVA for the Iranian populations only. Consistent with the first analysis, this regional AMOVA also showed that a geographic classification of the Iranian populations fitted the observed mtDNA diversity much better than a linguistic classification did.

Table 5 AMOVA based on different criteria.

Demographic history

For the pooled dataset of the Kurds, Tajima’s D and Fu’s Fs were strongly negative and significantly different from zero, and R2 was small and significant; these estimates reflect an excess of low-frequency variants resulting from recent demographic expansion in the inspected mtDNA pool (table 3). In the analyses of individual populations, however, the neutrality tests gave differing results: negative and significant Tajima’s D values (with the exception of the Iraqi Kurds) and significant positive R2 values indicated recent expansion of each Kurdish group, whereas the negative but nonsignificant Fu’s Fs value for each group failed to support the hypothesis of population growth, probably owing to the strong effects of drift and/or the small sample sizes. The MMD analysis of the pooled dataset of the Kurds could not reject the model of sudden expansion (P (sim ≥ obs) > 0.05). The MMD curve of haplotypes showed a skewed exponential shape owing to an excess of small pairwise differences (figure 3). Altogether, these results imply an excess of recently diverged haplotypes and a deficit of deeper coalescence events. With a mutation rate of 32% per site per Myr (Sigurðardottir et al. 2000), the τ value of 2.57 obtained from the MMD analysis of the pooled dataset translated into an expansion time of ∼9,500 YBP. Because of the nonsignificant FST values among the Kurdish groups and the disagreement among neutrality tests regarding population growth in each individual group, we were unable to perform the MMD analysis for any Kurdish group individually.
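The conversion from τ to an expansion time follows from the relation τ = 2μt given in the methods. The short sketch below works through the arithmetic under two stated assumptions: that "32% per site per Myr" means 0.32 substitutions per site per million years, and that the fragment-wide rate μ is this per-site rate multiplied by the 418 bp used in the analyses. The function and parameter names are hypothetical.

```python
def expansion_time_years(tau, rate_per_site_per_myr, fragment_length_bp):
    """Solve tau = 2 * mu * t for t, with mu expressed per fragment per year.

    Assumption: mu = (per-site rate per year) * (fragment length in bp).
    """
    rate_per_site_per_year = rate_per_site_per_myr / 1e6
    mu_fragment_per_year = rate_per_site_per_year * fragment_length_bp
    return tau / (2.0 * mu_fragment_per_year)


# Values quoted in the text; the 418 bp fragment length is an assumption
# about which length entered the original calculation.
t = expansion_time_years(tau=2.57, rate_per_site_per_myr=0.32, fragment_length_bp=418)
print(round(t))
```

Under these assumptions the estimate comes out at roughly 9,600 years, close to the ∼9,500 YBP reported; the small difference may reflect rounding or a slightly different fragment length in the original calculation.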

Figure 3

MMD for the pooled dataset of the Kurds.
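A mismatch distribution of the kind shown in figure 3 is the frequency distribution of pairwise sequence differences, and Harpending's raggedness index is the sum of squared differences between successive frequency classes. The following sketch illustrates both on a toy alignment. It is not the Arlequin procedure used in the study (no model fitting, SSD or bootstrap is included), and the function names and sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences (0..max)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    counts = Counter(diffs)
    total = len(diffs)
    return [counts.get(k, 0) / total for k in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's raggedness index: sum of (x_i - x_{i-1})^2 for i = 1..d+1.

    d is the largest observed number of differences; the class beyond it
    has frequency zero.
    """
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


# Toy star-like alignment (hypothetical data)
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
freqs = mismatch_distribution(toy)
print(freqs, raggedness(freqs))
```

Smooth, unimodal distributions give low raggedness values, whereas the multimodal distributions expected for populations of constant size give higher ones.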

Discussion

Despite their high linguistic heterogeneity, our results revealed close genetic relationships among the Kurdish groups. Prior to this study, on the basis of a linguistic tree of the Iranian ethnic groups (Zarei 2013), we hypothesized significant genetic differentiation between the Hawrami group and the other Kurdish groups. However, although the Hawrami group had higher FST values, the only statistically significant one was that between the Hawrami group and the Iraqi Kurds. The Iranian Kurds population showed low, nonsignificant FST values with its geographic neighbours, low but significant FST values with other Western Eurasian Indo-European populations, and the highest FST values with South and Eastern Eurasian populations. Our results suggest that the linguistic classification of populations does not provide an infallible guide to ancestral relationships. The AMOVA analyses supported a sharp geographical structure in mtDNA D-loop diversity on both global and regional scales, in agreement with the study of Nasidze et al. (2005). In addition, the Mantel test confirmed isolation by distance, revealing a high and significant correlation, which indicates that the level of genetic resemblance between populations depends strongly on geographic distance (Mantel 1967). These results suggest that the real affinities of the Kurdish groups with other Indo-European populations are partly masked by the effects of gene flow from neighbouring non-Indo-European populations, as represented by the NJ tree based on pairwise FST values.

The Kurdish groups were characterized by a substantial proportion of Western Eurasian mtDNA lineages. This is consistent with the hypothesis that the ancestors of the modern Kurds originally diverged from the proto-Indo-European population. Haplogroups prevailing in East Eurasia, such as M1a and G3b, and sub-Saharan African haplogroups, such as L3d and L3e, were rare or even absent in the Kurdish groups. However, the role of recent gene flow from East Eurasian groups (probably during the Mongol and Turkish invasions (Meri 2005), or through trade along the Silk Road until the 16th century AD (Yang et al. 2008)) and from West-Central African groups in shaping the current mtDNA pool of the Kurds cannot be ignored.

Low genetic diversity is typically characteristic of small or bottlenecked populations, which show a rapid increase in the number of identical haplotypes (Hoelzel et al. 1993; Weber et al. 2000). The very high frequency of H-derived lineages and the lack of the two common, high-frequency haplogroups U and J in the Hawrami group are consistent with this idea. The low mtDNA diversity observed in the Hawrami group indicates isolation or increased drift due to small population size. This group, living in the Hawraman region of the Middle Zagros, is considered the smallest and most isolated Kurdish group. The well-known local Kurdish sentiment that ‘they have no friends but the mountains’ illustrates their ethnic and geographical isolation (Bulloch and Morris 1992). The low mtDNA diversity observed in the Kurmanji group is more surprising, but can be explained by drift or by mass migration events during its history (i.e. 5–12th century AD; Izady 1992). Later, at the beginning of the 16th century AD, particularly after the battle of Chaldiran in 1514, the Safavid dynasty (16–18th century AD) initiated the mass migration of the Kurmanji Kurds from northwestern Iran to North Khorasan province (figure 1). This was primarily aimed at dispersing the compact Kurdish population in the border zone with the Ottomans, as well as at establishing a defence line on the northeastern frontiers of the country against the constant inroads of Turkmen and Uzbek nomads (Madih 2007).

To detect population size growth, we used three neutrality tests that differ somewhat in their approach. Ramos-Onsins and Rozas (2002) showed that the most powerful of these tests are Fu’s Fs and the more recently developed R2. Fu’s Fs did not detect significant deviations from neutrality for any individual Kurdish group, but this may reflect a lack of power to detect deviations in small datasets (Ramos-Onsins and Rozas 2002). The R2 test, on the other hand, behaves better for small sample sizes. In the analyses of individual populations, the significant positive R2 estimates were consistent with the Tajima’s D results, indicating that the populations were not in genetic equilibrium or decline. Moreover, the Kurdish groups as a whole had significant negative D and Fs values and significant positive R2 values, which reflect an excess of low-frequency variants in the surveyed mtDNA pool resulting from recent demographic expansion. The MMD curve of the pooled Kurdish mtDNA haplotypes was skewed but smooth and unimodal, with a low raggedness value, indicating that a single expansion occurred ∼9,500 YBP, when farming and cattle breeding first appeared in the Fertile Crescent (Diamond and Bellwood 2003; Gupta 2004; Bellwood and Oxenham 2008). The introduction of agriculture and domestication to pre-Neolithic hunter-gatherer societies had a significant impact on the establishment and distribution of the ancestral Kurdish population, more or less in parallel with the expansion of the first farmers from the Fertile Crescent into Europe (Diamond and Bellwood 2003). Our results are largely in agreement with time estimates based on zoological and botanical remains of farming and domestication found at three main archaeological sites in the Kurdish-inhabited region: (i) Çayönü (Diyarbakır, Turkey; dating to 9,000 YBC; Braidwood et al. 1971), (ii) Ganj Dara (Kermanshah, Iran; dating to 10,000 YBP; Zeder and Hesse 2000) and (iii) Jarmo (Sulaymaniyah, Iraq; dating to 8,000 YBP; Braidwood and Howe 1960). The signature of recent expansion of the Kurds is also well represented in the high frequency of H-derived lineages: a recent study by Fu et al. (2012) suggested that H-type mtDNAs in Europe show a population expansion related to the spread of animal domestication and cultivation, beginning around 9,000 YBP and continuing to the present. In contrast, the U-type mtDNAs show population growth between 20,000 and 10,000 YBP and seem to represent earlier hunter-gatherers who initiated farming practices and admixed with immigrant populations (Fu et al. 2012). In agreement with this scenario, the relatively high frequency of haplogroup U in the Kurds may testify to earlier hunter-gatherers in the Kurdish-inhabited area who adopted farming practices and admixed with the ancestors of the Kurds following the expansion.

Overall, our mtDNA data support the scenario of a recent demographic expansion of the Kurds related to the Neolithic transition in the Near East, a scenario that is largely in agreement with a significant body of archaeological evidence. However, to provide more details about the peopling of the current Kurdish-inhabited region, it will also be necessary to employ nonrecombining Y-chromosomal (NRY) markers.