Introduction

All Triticum species are native to the ‘Fertile Crescent’ of the Near East, which encompasses the eastern Mediterranean, south-eastern Turkey, northern Iraq and western Iran, and to the neighboring regions of the South Caucasus and northern Iran (Matsuoka 2011). The South Caucasus (notably Georgia) and its early inhabitants played an important role in wheat evolution and domestication. Georgian indigenous wheats include five Triticum species and subspecies (Menabde 1961; Hammer et al. 2011):

  1. Triticum turgidum subsp. palaeocolchicum (T. karamyschevii Nevski)

  2. Triticum turgidum subsp. carthlicum (T. carthlicum Nevski)

  3. Triticum timopheevii (Zhuk.) Zhuk.

  4. Triticum zhukovskyi Menabde et Ericzjan

  5. Triticum aestivum L. subsp. macha (T. macha Dekapr. et Menabde).

Some of these species bear an evolutionarily close affinity to wild wheat species or have retained some of their features (Menabde 1961).

According to Wang et al. (1997), the wheat genus Triticum contains six species, grouped into three sections: Sect. Monococcon (diploid species: Triticum monococcum L. and Triticum urartu Tumanian ex Gandilyan); Sect. Dicoccoidea (tetraploid species: T. turgidum L. and T. timopheevii (Zhuk.) Zhuk.); and Sect. Triticum (hexaploid species: T. aestivum L. and T. zhukovskyi Menabde et Ericzjan). Three types of plasmon are typical of the genus Triticum: A, B and G. Plasmon A is found in T. monococcum and plasmon G in the Timopheevi group. Plasmon B is found in the polyploid species of the emmer and common wheat groups, T. turgidum L. and T. aestivum L. Aegilops speltoides Tausch is accepted as the donor of plasmon B to all polyploid wheats (Wang et al. 1997).

Analysis of the complete chloroplast genome sequence can shed light on the origin and evolution of polyploid wheat. Extranuclear DNA, such as chloroplast DNA, has traditionally been considered an effective tool for genealogical studies (Yamane and Kawahara 2005; Matsuoka et al. 2005; Tabidze et al. 2014). Next-generation sequencing technologies developed in recent years make it possible to determine the complete nucleotide sequence of the chloroplast DNA of many higher plants, including wheat and its relatives. A new methodology for chloroplast DNA sequencing has been developed (see Tabidze et al. 2014). Briefly, genomic DNA from many cultivars is pooled and sequenced in one Illumina lane, and each library is barcoded so that every read can be assigned to its cultivar. Reads from chromosomal DNA are too few to be useful and are ignored, whereas chloroplast DNA reads are abundant enough to compute the chloroplast DNA sequence. This methodology allows a large number of chloroplast DNAs to be sequenced simultaneously without prior chloroplast isolation or enrichment.
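The barcoding step described above can be illustrated with a minimal Python sketch. It assumes that each read arrives as a pair of index sequence and read sequence; the barcode table, accession labels and toy reads are hypothetical and serve only to show how pooled reads are assigned to cultivars. In practice this step is carried out by the sequencer software.

```python
from collections import defaultdict

# Hypothetical barcode-to-accession table (illustration only); real index
# sequences are defined by the library preparation kit.
BARCODES = {"ATCACG": "P_1", "CGATGT": "M_6"}


def demultiplex(reads, barcodes=BARCODES):
    """Group (index_sequence, read_sequence) pairs by accession.

    Reads whose index matches no known barcode are collected under
    the key 'undetermined'.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        bins[barcodes.get(index_seq, "undetermined")].append(read_seq)
    return dict(bins)


# Toy pooled lane: two barcoded libraries plus one unassignable read.
pooled = [
    ("ATCACG", "GGATTCA"),
    ("CGATGT", "TTGACCA"),
    ("ATCACG", "CCATGGA"),
    ("NNNNNN", "ACGTACG"),
]
by_accession = demultiplex(pooled)
for accession, seqs in sorted(by_accession.items()):
    print(accession, len(seqs))
```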

This technology was used to analyze the G and D plasmons of Triticum and Aegilops (Gogniashvili et al. 2015, 2016). The complete chloroplast DNA sequences of three species of Zanduri wheat (T. timopheevii, T. zhukovskyi, T. monococcum var. hornemannii (Clem.) Körn.) and of the wild species T. araraticum Jakubz., as well as the chloroplast DNA (plasmon D) of nine accessions of Aegilops tauschii Coss. and two accessions of Aegilops cylindrica Host, were determined. The complete nucleotide sequences of the chloroplast DNA of the Timopheevi group and of the Aegilops species made it possible to refine the origin and evolution of the G and D plasmons.

According to the NCBI organelle genome resources, 21 complete chloroplast DNA sequences of the genus Triticum have been published to date. Some inaccuracies have been detected in these sequences, as noted in our earlier article (Gogniashvili et al. 2015) and by other authors. Bernhardt et al. (2017) were unable to confirm all parts of the NCBI-derived sequences obtained from whole-genome shotgun sequencing, which most likely contain some unidentified assembly errors. Bahieldin et al. (2014) analyzed the published plastid genome of T. aestivum (‘Chinese Spring’) (Ogihara et al. 2002) and found some errors. Sequencing inaccuracies carry over into the construction of phylogenetic trees and can lead to an incorrect interpretation of the results. For this reason, we considered it necessary to use only sequences obtained by our method in our laboratory.

The purpose of this work was to trace the evolutionary path of plasmon B on the basis of the complete chloroplast DNA sequences of Georgian indigenous polyploid wheats. In the present paper, the complete nucleotide sequence of the chloroplast DNA (plasmon B) of 11 representatives of Georgian polyploid species of the genus Triticum was determined.

Materials and methods

Plant material, DNA isolation, PCR analysis, genomic DNA library preparation and sequencing on an Illumina MiSeq platform

The seeds of the wheat species were received from the seed bank of the Agricultural University of Georgia and from the Institute of Botany, Ilia State University. The seeds were germinated in water at room temperature. Total genomic DNA was extracted from young leaves. The leaves were frozen and ground in liquid nitrogen, and DNA was isolated according to a modified Marmur procedure (Beridze et al. 2011).

Genomic DNA libraries were constructed using the TruSeq DNA Sample Prep kit (Illumina, San Diego, CA, USA). Genomic DNAs were quantified using the Qubit BR reagents (Qubit 2.0 Fluorometer, Life Technologies). Briefly, 1 μg of DNA was sheared into 300 bp fragments on a Covaris M220 focused ultrasonicator (Covaris Inc) using SonoLab™ 7.1 software for 250 cycles/burst, 20.0 Duty factor, 50.0 Peak power, in screw-cap microtubes. After shearing, the DNA was blunt-ended, 3′-end A-tailed and ligated to indexed adaptors. The adaptor-ligated genomic DNA was size-selected with AMPure beads using the gel-free protocol described in the TruSeq DNA Sample Prep manual. Size-selected DNA was amplified by PCR to selectively enrich for fragments that have adaptors on both ends. The final amplified libraries were run on an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) to determine the average fragment size and to confirm the presence of DNA of the expected size range. The libraries were pooled in equimolar concentration, loaded onto a flow cell for cluster formation and sequenced on an Illumina MiSeq platform. The libraries were sequenced from both ends of the molecules to a total read length of 150 nt from each end. The raw .bcl files were converted into demultiplexed compressed fastq files using Casava 1.8.2 (Illumina). Sequencing of the wheat DNA samples was performed at the facilities of the National Centre for Disease Control and Public Health, Tbilisi, Georgia.

Assembly of chloroplast DNA

FASTQ files (a text-based format for storing nucleotide sequences and their quality scores) were trimmed using Sickle, a windowed adaptive quality-trimming tool for FASTQ files (https://github.com/najoshi/sickle). The reads were filtered with standard parameters (quality threshold 20, length cutoff 20). Reads containing “N” were discarded. The filtered chloroplast reads were assembled using the SOAPdenovo2 software program (version 127mer) (Li et al. 2009). The reads were first assembled de novo into contigs with k-mers 83–93. All contigs were aligned to the reference chloroplast genome sequence using BLASTN (http://www.ncbi.nlm.nih.gov). Large overlapping contigs were merged with merger from EMBOSS 6.3.1 (Rice et al. 2000). In total, 1,475,398 reads were generated for accession P_1. The number of reads mapping at k-mer 73 was 141,874. Twelve contigs of 286–56,327 bp were used for chloroplast genome assembly, with a coverage of 39. Automatic annotation of cpDNA was performed with CpGAVAS (Liu et al. 2012). For the detection of SNPs (single nucleotide polymorphisms) and indels (insertions/deletions) and for phylogenetic tree construction, the programs MAFFT and BLAST were used (Katoh et al. 2002; Altschul et al. 1990).
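The read-filtering criteria above (quality threshold 20, length cutoff 20, no “N”) can be sketched in Python as follows. This is a simplified illustration and not the Sickle algorithm: it trims low-quality bases only from the 3′ end instead of using a sliding window, and it assumes Phred+33 quality encoding. The function names and toy reads are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_and_filter(seq, qual, min_quality=20, min_length=20):
    """Trim low-quality 3' bases, then apply the length and 'N' filters.

    Returns the trimmed (sequence, quality) pair, or None if the read
    is discarded. Simplification: only the 3' end is trimmed.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    trimmed = seq[:end]
    if len(trimmed) < min_length or "N" in trimmed:
        return None
    return trimmed, qual[:end]


# Toy reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    ("ACGT" * 6, "I" * 22 + "##"),      # kept after trimming two bases
    ("ACGTN" + "ACGT" * 5, "I" * 25),   # discarded: contains N
    ("ACGT" * 3, "I" * 12),             # discarded: shorter than 20
]
results = [trim_and_filter(s, q) for s, q in reads]
filtered = [r for r in results if r is not None]
print(len(filtered), "of", len(reads), "reads kept")
```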

Results and discussion

In the present study, the chloroplast DNA of 11 wheat samples was sequenced. Among them are one sample of subsp. palaeocolchicum, two samples of subsp. carthlicum, six samples of subsp. macha and one sample of T. aestivum. All of them are Georgian indigenous wheats except D_1, T. turgidum L. subsp. durum (Desf.) Husn. (Table 1).

Table 1 Georgian polyploid wheat accessions used in the study

Using M_6 (T. aestivum subsp. macha) as a reference, five SNPs can be identified in the Georgian indigenous polyploid wheats: two noncoding and three coding substitutions. One coding substitution, in the ndhF gene, is synonymous and does not alter the amino acid. Another coding substitution is found in the rps15 gene, which is located in an inverted repeat (Table 2).

Table 2 SNPs specific for chloroplast DNA of Georgian polyploid wheats
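As an illustration of how substitutions of this kind are read off an alignment, the following Python sketch lists SNPs between a reference and a sample sequence. It assumes the two sequences are already aligned to equal length (for example, rows of a MAFFT alignment) with “-” marking gaps; the function name and toy sequences are hypothetical and are not the wheat data.

```python
def find_snps(reference, sample):
    """List single-base substitutions between two aligned sequences.

    Positions are 1-based and counted along the ungapped reference.
    Columns with a gap in either sequence are indels and are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for ref_base, sample_base in zip(reference, sample):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == "-" or sample_base == "-":
            continue
        if ref_base != sample_base:
            snps.append((ref_pos, ref_base, sample_base))
    return snps


# Toy aligned pair: two substitutions and one insertion in the sample.
toy_reference = "ACGTTA-GC"
toy_sample = "ACATTAAGT"
toy_snps = find_snps(toy_reference, toy_sample)
print(toy_snps)
```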

Compared with the reference DNA, P_1 (T. turgidum subsp. palaeocolchicum var. chvamlicum) carries two inversions, of 38 and 56 bp: the 38 bp inversion at position 56121–56158 (rbcL-rpl23 intergenic sequence) and the 56 bp inversion at position 135,896-52 (rps19-psbA intergenic sequence). The 38 bp sequence is a palindrome with a 4 bp loop and a 17 bp stem. The 56 bp inversion is also a palindrome, with a 5 bp loop and a 25 bp stem. Six 1 bp indels were detected in the Georgian polyploid wheats (Table 3), all of them in microsatellite stretches.

Table 3 Indels in chloroplast DNA of Georgian polyploid wheats
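The stem-loop structure of the 38 bp palindrome (17 bp stem, 4 bp loop, 17 bp complementary stem) can be illustrated with a short Python sketch. The sequence used here is a made-up example with the same dimensions, not the wheat sequence. The sketch shows that inverting (reverse-complementing) such a hairpin leaves both stem arms unchanged and alters only the loop.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_stem_loop(seq, stem_len):
    """True if the first stem_len bases pair with the last stem_len bases."""
    return seq[:stem_len] == reverse_complement(seq[-stem_len:])


# Hypothetical 38 bp hairpin: 17 bp stem + 4 bp loop + 17 bp stem.
stem = "ACGGTTCAAGGCTTACG"
loop = "GAAA"
hairpin = stem + loop + reverse_complement(stem)

# Inverting the whole hairpin keeps the stems and flips only the loop.
inverted = reverse_complement(hairpin)
print(len(hairpin), is_stem_loop(hairpin, 17))
print(hairpin[17:21], "->", inverted[17:21])
```

Under these assumptions, an inversion of a palindromic hairpin is detectable only through its loop, which is consistent with inversions of this size being reported together with their loop and stem lengths.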

Earlier, Gornicki et al. (2014) sequenced eight samples of the B plasmon. A small difference between those sequences and the sequences presented in this paper can be observed at position 75978-92 (infA-rps8 intergenic region):

  • This article: TTTTTTTTTTCTCTCC

  • Gornicki et al. 2014: TTTTTTTTTCTCTCCC

In our assemblies this region lies in the middle of a contig, which excludes an assembly error at this site.
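Both variants of this region begin with a poly-T tract, so the comparison can be reduced to the length of the leading homopolymer run. A minimal Python sketch, with a hypothetical helper function and the two strings copied from the list above:

```python
def leading_run_length(seq):
    """Length of the run of identical bases at the start of seq."""
    if not seq:
        return 0
    first = seq[0]
    length = 0
    for base in seq:
        if base != first:
            break
        length += 1
    return length


# The two variants of the infA-rps8 intergenic region quoted above.
this_article = "TTTTTTTTTTCTCTCC"
gornicki_2014 = "TTTTTTTTTCTCTCCC"

for label, seq in (("This article", this_article),
                   ("Gornicki et al. 2014", gornicki_2014)):
    print(label, "poly-T length:", leading_run_length(seq))
```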

To illustrate the evolutionary relationship among the studied cultivars, a phylogenetic tree was constructed based on multiple alignments using Jalview version 2 (Waterhouse et al. 2009). Figure 1 depicts the resulting phylogenetic tree, which shows that subspecies macha, carthlicum and palaeocolchicum occupy different positions.

Fig. 1 Complete chloroplast genome phylogeny of Georgian polyploid wheats. a Neighbor-joining tree using PID; b average distance tree using PID (Waterhouse et al. 2009); c neighbor-joining tree using MEGA 7. The bootstrap consensus tree was inferred from 1000 replicates; the percentage of replicate trees is shown next to the branches. P palaeocolchicum; M macha; C carthlicum; A aestivum; D durum
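The PID (percentage identity) measure underlying trees a and b can be sketched in Python as follows. This is an assumption-laden illustration and not Jalview's implementation: identity is computed only over alignment columns where neither sequence has a gap, and the distance is taken as 100 minus PID. The toy sequences are hypothetical; only the accession labels are borrowed from this study.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percentage of identical bases over columns without gaps.

    Assumption: gap columns are ignored; Jalview's handling may differ.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def pid_distance_matrix(alignment):
    """Pairwise distances (100 - PID) for a dict of aligned sequences."""
    return {
        (m, n): 100.0 - percent_identity(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical sequences, not the chloroplast genomes).
toy_alignment = {
    "P_1": "ACGTACGTAC",
    "M_6": "ACGTACGTAA",
    "C_1": "ACGTTCGTAA",
}
distances = pid_distance_matrix(toy_alignment)
for pair, dist in sorted(distances.items()):
    print(pair, dist)
```

A distance matrix of this kind is the input to neighbor-joining or average-distance clustering.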

A simplified scheme based on the SNP and indel data of Georgian polyploid wheats was constructed (Fig. 2). According to this scheme, the plasmon B (chloroplast DNA) of all Georgian polyploid wheats derives from an unknown predecessor X. Four lines arose from predecessor X. One SNP and two inversions (38 and 56 bp) gave rise to subsp. palaeocolchicum (line 1). The three other lines are: 2, the macha line; 3, the durum line; 4, the carthlicum line.

Fig. 2 Phylogenetic scheme based on single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in the chloroplast DNA of Georgian polyploid wheats

The macha line is further divided into two sublines (M_1 and M_4). The carthlicum line includes subsp. carthlicum and T. aestivum (C_1–C_2–A_1). The ancestral female parent of all the polyploid wheats studied is the unknown predecessor X.

Wild emmer (T. turgidum L. subsp. dicoccoides (Körn. ex Asch. et Graebn.) Thell.) is found today in the western ‘Fertile Crescent’ in Jordan, Syria and Israel, the central part of southeastern Turkey and mountain areas in eastern Iraq and western Iran. Based on published molecular and archaeobotanical data and on their own findings, Ozkan et al. (2011) summarized issues concerning geography and domestication of wild emmer wheat. The authors suggest that modern domestic tetraploid wheats derived from wild emmer lines from southeast Turkey (Ozkan et al. 2011). On the basis of genetic and morphological evidence, Georgian wheat (T. turgidum L. subsp. palaeocolchicum) is assumed to be a segregant from a hybrid cross between wild emmer wheat and T. aestivum, whereas Persian wheat (subsp. carthlicum) may be a segregant from a hybrid cross between domesticated emmer wheat and T. aestivum (Dvorak and Luo 2001; Kuckuck 1979, cited according to Matsuoka 2011).

The name “Persian wheat” for subsp. carthlicum is geographically misleading. The designation was first used, erroneously, by N. Vavilov (Loskutov 1999). Nevski subsequently corrected the error by naming the wheat carthlicum (Carthli is the central province of East Georgia), though the name Persian wheat is still in use (Menabde 1948; Mosulishvili et al. 2017). The common Georgian name of this subspecies is Dika. In our opinion, it is preferable to refer to the tetraploid, domesticated, hulled T. turgidum subsp. palaeocolchicum not as Georgian wheat but as West Georgian wheat, and to the tetraploid, domesticated, free-threshing T. turgidum subsp. carthlicum not as Persian wheat but as East Georgian wheat.

The third wheat subspecies found in West Georgia is the hexaploid, domesticated, hulled spelt wheat T. aestivum subsp. macha. This subspecies was discovered in West Georgia in 1928 and described by Dekaprelevich and Menabde (1932). It is endemic to Georgia and is cultivated along with the tetraploid West Georgian wheat (T. turgidum subsp. palaeocolchicum) (Dorofeev et al. 1979). Comparative and molecular genetic analyses suggest that macha wheat is a segregant from a hybrid cross between wild emmer wheat and T. aestivum. It is likely that macha and West Georgian wheats are sibling cultivars that arose in a hybrid swarm involving T. aestivum and wild emmer wheat (Tsunewaki 1968).

Earlier we termed the “Zanduri Puzzle” the fact that wild T. araraticum Jakubz. (T. timopheevii subsp. armeniacum (Jakubz.) Slageren) has not been found in Georgia, even though cultivated T. timopheevii is found only there (Gogniashvili et al. 2015). The same holds for both domesticated tetraploid wheats, subsp. palaeocolchicum and subsp. carthlicum: wild emmer has not been found in Georgia (or elsewhere in the South Caucasus), though these cultivated tetraploid wheats were found only there.

One of the important questions of wheat domestication is which people(s) took part in the process. Were they those who lived in the Fertile Crescent 10,000 years ago? When discussing the phylogenies of wheat species and subspecies, it also seems very important to consider precisely not only the place of origin but also the major diffusion routes of cultivated forms to other regions.

As we mentioned earlier (Gogniashvili et al. 2015), it is not exactly clear how the Georgian tribes came to the South Caucasus and where they lived at the time of wheat domestication. The Kartvelian peoples include Georgians (Kartvels), Zans (Megrels and Lazs) and Svans. It is proposed that the split of the Proto-Kartvelian language into Svan and Proto-Karto-Zan was a separate development and can be fixed for these languages at approximately 2600 and 4200 years, respectively (Klimov 1998). In the view of Gamkrelidze and Ivanov (2010), based on the evidence of archaic lexical and toponymic data, Proto-Kartvelian, prior to its breakup, must be placed in the mountainous regions of the western and central part of the Little Caucasus (the Transcaucasian foothills).

One possible explanation of the “Wheat Puzzle” is that Kartvelian speakers lived (with other peoples) in the area of the ‘Fertile Crescent’ and brought some wheat species and subspecies further north to the South Caucasus. It is also possible that early farmers (circa 10,000 years ago) brought them into present-day Georgia (M. Pagel, pers. comm.). The movement of Kartvelian speakers from south to north can be represented as follows: Svans, Megrels and Lazs to the north, and Georgians (Kartvels) to the north-east.

An unsolved problem is how wild emmer and T. aestivum came into contact. The native range of wild emmer is the Fertile Crescent and that of T. aestivum is the South Caucasus. The Fertile Crescent and the South Caucasus are separated by a chain of mountains: the Pontic, Lesser Caucasus and Zagros mountains. Unfortunately, the archaeological literature on this subject is not very rich. Perhaps wild emmer was brought to the South Caucasus together with cultivated emmer.