Introduction

Hexaploid wheat (Triticum aestivum L., genomes AABBDD) originated in the South Caucasus by allopolyploidization of the cultivated emmer wheat T. dicoccum Schrank (genomes AABB) with the Caucasian Ae. tauschii ssp strangulata (Eig) Tzvelev (genomes DD) (Kihara 1944; McFadden and Sears 1946, cited according to Wang et al. 2013; Dvorak et al. 1998; Dubcovsky and Dvorak 2007). Ae. tauschii Coss. (syn. Ae. squarrosa auct non L.) is a diploid (2n = 14, genome DD) goat-grass species that donated its D genome to common wheat, T. aestivum L. It is considered the most important donor of agriculturally important genes for the improvement of common wheat (Kimber and Feldman 1987). Because the genetic variation of Ae. tauschii is an important natural resource, it is of particular importance to investigate how this variation was formed during the evolutionary history of Ae. tauschii and how it is distributed across the species range. The D genome is also found in the tetraploid Ae. cylindrica Host (2n = 28, CCDD). The parents of Ae. cylindrica are Ae. caudata L. (2n = 14, CC) and Ae. tauschii. The ranges of Ae. caudata and Ae. tauschii each overlap that of Ae. cylindrica but do not overlap each other (Nakai 1981). The term plasmon is used for the cytoplasmic (organellar) genomes, i.e. those of chloroplasts and mitochondria (Tsunewaki et al. 2002; Gill and Friebe 2002). The plasmons of Ae. tauschii Coss. and Ae. cylindrica Host both belong to the D type. According to Tsunewaki et al., the plasmon diversity that exists in Triticum and Aegilops species is of great significance for understanding the evolution of these genera.

Ae. tauschii is represented by two subspecies, Ae. tauschii Coss. ssp tauschii and Ae. tauschii Coss. ssp strangulata (Eig) Tzvelev, with cylindrical and moniliform types of spike, respectively (Eig 1929). In ssp strangulata, a relict lineage “t-91s” was found that differs considerably in its genetics from other accessions of this subspecies (Dudnikov 1998, 2012; Pestsova et al. 2000). The three markedly different gene-pools of Ae. tauschii, i.e. those of ssp tauschii, “usual” ssp strangulata, and the relict lineage “t-91s” of ssp strangulata, were designated by Matsuoka et al. (2013, 2015) as TauL1, TauL2 and TauL3, respectively. These designations were adopted after a series of investigations (Matsuoka et al. 2007, 2008, 2009, 2013, 2015; Mizuno et al. 2010). Chloroplast DNA variation in Ae. tauschii was studied by Matsuoka et al. (2007, 2008, 2009), and four major haplogroups, HG7, HG9, HG16 and HG17, were identified. Of these, HG16, HG9 and HG17 belonged to ssp tauschii, ssp strangulata and the relict lineage “t-91s”, respectively, while haplogroup HG7 contained Ae. tauschii accessions from both ssp tauschii and ssp strangulata (Matsuoka et al. 2007, 2008, 2009). Analysis of AFLP polymorphism revealed two major gene-pools in Ae. tauschii, those of ssp tauschii and ssp strangulata, designated L1 and L2, respectively; Ae. tauschii accessions belonging to the HG17 cpDNA (chloroplast DNA) haplogroup occupied an intermediate position between L1 and L2 (Mizuno et al. 2010). Later, the same three gene-pools (L1, L2 and HG17) were identified using DArT analysis (Matsuoka et al. 2015) and renamed TauL1, TauL2 and TauL3, respectively; it was also noted that TauL3 is related to TauL2 (Matsuoka et al. 2013).

Ae. tauschii occupies a vast range, from Turkey to Kirgizia. The Georgian part of this range is of particular interest: although it is relatively small, a substantial part of the genetic variation of Ae. tauschii has been found there (Dudnikov 2000, 2012; Pestsova et al. 2000). Moreover, Georgia is the only country where the relict gene-pool TauL3 is rather common. Outside Georgia, it has been reported only once, as the local population t-91s in Dagestan (Dudnikov 1998, 2012).

Extranuclear DNA such as cpDNA is traditionally considered an effective tool for genealogical studies (Yamane and Kawahara 2005; Matsuoka et al. 2005; Tabidze et al. 2014; George et al. 2015; Kong and Yang 2015; Vieira et al. 2015; Oldenburg and Bendich 2015). Next-generation sequencing technologies developed in recent years enable determination of the complete nucleotide sequences of both chloroplast and mitochondrial DNA of many higher plants, including wheat and its relatives. Recently, this technology was used in our lab to sequence the cpDNA of three species of Zanduri wheat (T. timopheevii, T. zhukovskyi, T. monococcum var. hornemannii) as well as the wild species T. araraticum (Gogniashvili et al. 2015), and a new methodology of cpDNA sequencing was developed (see Tabidze et al. 2014; Gogniashvili et al. 2015). A study in which complete cpDNA is sequenced in a set of Ae. tauschii accessions originating from Georgia, the part of the range that is especially important for understanding the evolution of the species, is therefore of particular interest. Intraspecific divergence of Ae. tauschii was previously studied by Dudnikov (2012), who sequenced four regions of non-coding cpDNA, about 3000 bp in total, in 112 accessions of Ae. tauschii. However, the genealogy patterns obtained were complicated and rather contradictory. The root of the phylogenetic tree was located between the relict lineage t-91s of ssp strangulata and the lineage “AE-725” of ssp tauschii. The latter was ancestral to all the other lineages of both ssp tauschii and ssp strangulata (Dudnikov 2012).

To date, the complete cpDNA sequences of two ssp strangulata accessions and one accession of Ae. cylindrica are known (Middleton et al. 2014; Gornicki et al. 2014). In the present investigation we sequenced the complete cpDNA of nine Ae. tauschii and two Ae. cylindrica accessions of Georgian origin. Thus, data on eleven Ae. tauschii accessions were used to analyze its intraspecific phylogeny, and data on three accessions of Ae. cylindrica were used to investigate the origin of the plasmon of this species.

Materials and methods

Plant material

The seeds of Ae. tauschii Coss. and Ae. cylindrica Host were collected in East Georgia in 2010 (Jinjikhadze et al. 2010). In Ae. tauschii, subspecies were determined according to Dudnikov (2000) on the basis of the subspecies index “SI”, which is the ratio between spikelet glume width and rachis segment width, and on the basis of allozyme polymorphism at the Acph1 and Got2 loci (Jaaska 1981; Dudnikov 2000).

DNA isolation, PCR analysis, Genomic DNA library preparation, sequencing on an Illumina MiSeq platform

The seeds were germinated in water at room temperature. Total genomic DNA was extracted from young leaves. The leaves were ground in liquid nitrogen, and DNA was isolated according to the modified Marmur’s procedure (Beridze et al. 2011). The primers used for amplification of the Ae. tauschii cpDNA fragment 101,130–101,548 were: forward, AATATGGGCCCTCAACACCC; reverse, GGGTTAACCGAACTCACGGA. The PCR conditions were as follows: 1 min denaturing at 94 °C; 30 cycles of 94 °C denaturing (1 min), 55 °C annealing (1 min) and 72 °C extension (2 min); followed by a final extension step at 72 °C (5 min).
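As a quick sanity check on the primer pair above, the following minimal Python sketch computes the length, GC fraction and a rough melting temperature of each primer. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a rule of thumb chosen here for illustration; it is an assumption of this sketch and not part of the original protocol, and the function names are hypothetical.

```python
# Rough primer statistics; the Wallace rule is only a coarse estimate
# for short oligonucleotides and is an assumption of this sketch.

FORWARD = "AATATGGGCCCTCAACACCC"
REVERSE = "GGGTTAACCGAACTCACGGA"


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), f"GC={gc_fraction(primer):.2f}",
              f"Tm~{wallace_tm(primer)} C")
```

Under this estimate both primers are 20 nt long with matching GC content, so the annealing temperature of 55 °C used above lies a few degrees below the estimated melting temperature.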

Genomic DNA libraries were constructed using the TruSeq DNA Sample Prep kit (Illumina, San Diego, CA, USA). Genomic DNA was quantified using Qubit BR reagents (Qubit 2.0 Fluorometer, Life Technologies). Briefly, 1 μg of DNA was sheared into 300 bp fragments in screw-cap microtubes on a Covaris M220 focused ultrasonicator (Covaris Inc) using SonoLab™ 7.1 software, with 250 cycles/burst, a duty factor of 20.0 and a peak power of 50.0. After shearing, the DNA was blunt-ended, 3′-end A-tailed and ligated to indexed adaptors. The adaptor-ligated genomic DNA was size-selected with AMPure beads using the gel-free protocol described in the TruSeq DNA Sample Prep manual. Size-selected DNA was amplified by PCR to selectively enrich for fragments with adaptors on both ends. Final amplified libraries were run on an Agilent bioanalyzer DNA 2100 (Agilent, Santa Clara, CA, USA) to determine the average fragment size and to confirm the presence of DNA of the expected size range.

The libraries were pooled in equimolar concentration, loaded onto a flow cell for cluster formation and sequenced on an Illumina MiSeq platform. The libraries were sequenced from both ends of the molecules to a read length of 150 nt from each end. The raw .bcl files were converted into demultiplexed compressed fastq files using Casava 1.8.2 (Illumina). Sequencing of wheat DNA samples was performed at the facilities of the National Centre for Disease Control and Public Health, Tbilisi, Georgia.

Assembly of chloroplast DNA

FASTQ files (a text-based format for storing nucleotide sequences) were trimmed using Sickle, a windowed adaptive trimming tool for FASTQ files that uses quality values (https://github.com/najoshi/sickle). The reads were filtered with standard parameters (quality threshold 20, length cutoff 20). Reads containing “N” were discarded. The filtered chloroplast reads were assembled using the SOAPdenovo2 software program (version 127mer) (Li et al. 2009). The reads were first assembled de novo into contigs with k-mers 83–93. All contigs were aligned to the reference chloroplast genome sequence using BLASTN (http://www.ncbi.nlm.nih.gov). Large overlapping contigs were merged with EMBOSS 6.3.1: merger (Rice et al. 2000). In total, 2,446,306 reads were generated for accession Gt_30. The number of reads mapping at k-mer 89 was 237,690. Nine contigs of 780–75,970 bp in length were used for chloroplast genome assembly, with a coverage of 17.6.
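The read-filtering step described above can be illustrated with a simplified Python sketch. It is not Sickle's actual algorithm: it trims only the 3′ end, uses a fixed window of 5 bases (an assumption), and takes already decoded Phred scores as input. The function name and parameters are hypothetical; only the two thresholds of 20 and the removal of reads containing “N” come from the text.

```python
from typing import List, Optional, Tuple


def trim_and_filter(seq: str, quals: List[int], min_quality: int = 20,
                    min_length: int = 20, window: int = 5
                    ) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed read, or None if the read is discarded.

    Simplified windowed trimming: the read is cut at the start of the
    first window whose mean quality falls below min_quality.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq or "N" in seq.upper():
        return None  # empty reads and reads containing N are discarded
    cut = len(seq)
    last_start = max(len(seq) - window, 0)
    for start in range(last_start + 1):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_quality:
            cut = start
            break
    if cut < min_length:
        return None  # too short after trimming
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    read = "ACGT" * 8 + "ACG"          # 35 bases
    quals = [30] * 25 + [2] * 10       # quality drops near the 3' end
    print(trim_and_filter(read, quals))
```

A production pipeline would rely on Sickle itself, which also handles paired-end reads and FASTQ quality encodings.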

Automatic annotation of cpDNA was performed with CpGAVAS (Liu et al. 2012). For detection of SNPs (single nucleotide polymorphisms) and indels (insertions/deletions) and for phylogenetic tree construction, the programs MAFFT and BLAST were used (Katoh et al. 2002; Altschul et al. 1990).
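Once two chloroplast genomes have been aligned, SNPs and indels can be read off the gapped alignment column by column. The sketch below is a minimal illustration of that idea, assuming two equal-length aligned strings with “-” as the gap character, as an aligner such as MAFFT would produce. The function name, the toy sequences and the position convention are assumptions of this sketch, not the procedure used in the study.

```python
def call_variants(ref_aln: str, qry_aln: str):
    """List SNPs and indels between two aligned sequences.

    SNPs are (ref_position, ref_base, query_base), 1-based on the
    ungapped reference. Indels are (ref_position, kind, length), where
    ref_position is the last reference base before the event.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, qry_aln = ref_aln.upper(), qry_aln.upper()
    snps, indels = [], []
    ref_pos = 0
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q == "-":      # column of gaps only, skip
            i += 1
            continue
        if r == "-" or q == "-":       # start of an indel run
            ref_gap = r == "-"
            j = i
            while (j < n and (ref_aln[j] == "-") == ref_gap
                   and (qry_aln[j] == "-") == (not ref_gap)):
                j += 1
            length = j - i
            kind = "insertion" if ref_gap else "deletion"
            indels.append((ref_pos, kind, length))
            if not ref_gap:
                ref_pos += length      # deleted bases exist in the reference
            i = j
            continue
        ref_pos += 1
        if r != q:
            snps.append((ref_pos, r, q))
        i += 1
    return snps, indels


if __name__ == "__main__":
    # Toy alignment, not real cpDNA sequence
    print(call_variants("ACGTACGT--ACGT", "ACGAAC--TTACGT"))
```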

Results and discussion

Tauschii accessions

The nine accessions of Ae. tauschii and two accessions of Ae. cylindrica used in the study are listed in Table 1. Subspecies attribution is unambiguous in all the Ae. tauschii accessions studied. Accessions Gt_14, Gt_17, Gt_30, Gt_32 and Gt_34 have the Acph1 95 and Got2 105 alleles and the spike morphology characteristic of ssp strangulata, while Gt_15, Gt_19, Gt_24 and Gt_40 have the Acph1 100 and Got2 100 alleles and the spike morphology characteristic of ssp tauschii. Accessions Gt_30 and Gt_34 were found to belong to the relict lineage “t-91s” (TauL3) of ssp strangulata (Dudnikov 2012).

Table 1 Ae. tauschii (Gt) and Ae. cylindrica (Gc) accessions of East Georgia used in the study

SNP and indel analysis

Using Gt_30 (TauL3) as a reference, 33 SNPs were identified in Ae. tauschii lineages TauL1 and TauL2 (Table 2). Twenty-eight SNPs are characteristic of both TauL1 and TauL2 accessions, and 4 SNPs are additionally characteristic of TauL2. Twenty SNPs are in intergenic regions and 5 in introns; 26 SNPs are located in the LSC (Large Single Copy region of chloroplast DNA), 6 in the SSC (Small Single Copy region) and 1 in the IR (Inverted Repeat). Eight SNPs were found within genes: three in ndhF, two in rpoB and one each in matK, psbZ and petA. Five of the coding substitutions are synonymous, i.e. they do not alter the amino acid. Amino acid substitutions were observed in matK and ndhF (Table 2). A 19 bp inversion with a 3 bp loop and an 8 bp stem was found in the psbA-trnL-UUU intergenic region.
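The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below restates the figures from the text as dictionaries (the variable names are hypothetical) and verifies that the breakdowns by sequence context, by genome region and by gene are mutually consistent, and that the inversion length equals two stem arms plus the loop.

```python
# Figures taken from the paragraph above; variable names are illustrative.
SNP_TOTAL = 33
BY_CONTEXT = {"intergenic": 20, "intron": 5, "gene": 8}
BY_REGION = {"LSC": 26, "SSC": 6, "IR": 1}
BY_GENE = {"ndhF": 3, "rpoB": 2, "matK": 1, "psbZ": 1, "petA": 1}
SYNONYMOUS = 5

STEM_BP, LOOP_BP = 8, 3


def check_tallies() -> dict:
    """Return derived quantities after asserting the sums agree."""
    assert sum(BY_CONTEXT.values()) == SNP_TOTAL
    assert sum(BY_REGION.values()) == SNP_TOTAL
    assert sum(BY_GENE.values()) == BY_CONTEXT["gene"]
    return {
        # coding SNPs that are not synonymous
        "nonsynonymous": BY_CONTEXT["gene"] - SYNONYMOUS,
        # hairpin: stem arm + loop + complementary stem arm
        "inversion_bp": 2 * STEM_BP + LOOP_BP,
    }


if __name__ == "__main__":
    print(check_tallies())
```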

Table 2 SNPs specific for Ae. tauschii and Ae. cylindrica

Eight indels longer than one bp were observed (Table 3). They are located in intergenic spacers as well as within introns. The most interesting is a 27 bp indel located in the rps15-ndhF intergenic spacer of the SSC. This sequence is present in the analyzed TauL3 accessions and absent in TauL1 and TauL2 (Fig. 1), so the indel can be used for simple identification of the TauL3 lineage. BLAST analysis demonstrated that all plasmons of Aegilops and Triticum (D, B, G, S, A, M) contain the 27 bp sequence of the rps15-ndhF intergenic region, with the exception of Ae. tauschii ssp tauschii, Ae. tauschii ssp strangulata and Ae. cylindrica (all of which carry plasmon D).
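To show how the 27 bp indel translates into a gel-based marker, the sketch below computes the amplicon lengths expected with and without the 27 bp sequence for the PCR fragment 101,130–101,548 given in the methods. Whether those coordinates refer to a sequence that contains the 27 bp segment is not stated in the text, so it is exposed as an explicit parameter; the function names are hypothetical.

```python
def fragment_length(start: int, end: int) -> int:
    """Length of a fragment given inclusive 1-based coordinates."""
    return end - start + 1


def expected_amplicon_lengths(start: int, end: int, indel_len: int,
                              coords_include_indel: bool) -> dict:
    """Expected amplicon sizes for the two indel states.

    coords_include_indel is an assumption supplied by the caller: True
    if the coordinates were taken from a sequence carrying the segment.
    """
    ref_len = fragment_length(start, end)
    if coords_include_indel:
        return {"with_indel": ref_len, "without_indel": ref_len - indel_len}
    return {"with_indel": ref_len + indel_len, "without_indel": ref_len}


if __name__ == "__main__":
    for state in (True, False):
        print(state, expected_amplicon_lengths(101_130, 101_548, 27, state))
```

In either case the two products differ by 27 bp on a fragment of roughly 400 bp, a difference that a 2 % agarose gel can be expected to resolve.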

Table 3 Indels in cpDNA of Ae. tauschii and Ae. cylindrica accessions

Fig. 1

2 % agarose gel electrophoresis of the PCR-amplified Ae. tauschii cpDNA fragment 101,130–101,548. Lanes: 2 Gt_25, 3 Gt_26, 4 G-28, 5 Gt_29, 6 Gt_30, 7 Gt_33, 8 Gt_34, 9 Gt_35, 10 Gt_36, 11 Gt_37; lanes 1 and 12, 100 bp DNA marker

In Ae. cylindrica, 7 additional SNPs were observed: 4 in intergenic sequences and 3 in genes (matK, rpoB and rpoC2). The substitution in matK alters the amino acid, whereas those in rpoB and rpoC2 are synonymous. An 18 bp duplication was detected in the infA-rps8 intergenic sequence.

Phylogeny tree

At present, Ae. tauschii accessions are grouped into three intraspecific lineages: TauL1, TauL2 and TauL3 (Matsuoka et al. 2015). In the present investigation, the complete cpDNA nucleotide sequences of 9 accessions of Ae. tauschii and 2 accessions of Ae. cylindrica are presented. To illustrate the evolutionary relationships among the sequenced cpDNAs of both Aegilops species, a neighbor-joining phylogenetic tree was constructed from multiple alignments using Jalview version 2 (Waterhouse et al. 2009) (Fig. 2). The tree also includes two published cpDNA sequences of ssp strangulata (Middleton et al. 2014; Gornicki et al. 2014).
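A neighbor-joining tree is built from a matrix of pairwise distances. The sketch below shows one simple way to derive such a matrix from percentage identity (PID) between aligned sequences. It is an illustration under stated assumptions: columns where both sequences have a gap are skipped and a gap against a base counts as a mismatch, which may differ from the gap handling in Jalview, and the sequence names and toy alignment are hypothetical.

```python
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue                   # ignore columns that are gaps in both
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pid_distance_matrix(alignment: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Distance = 100 - PID for every pair of sequences."""
    names = list(alignment)
    return {
        n1: {n2: 100.0 - percent_identity(alignment[n1], alignment[n2])
             for n2 in names}
        for n1 in names
    }


if __name__ == "__main__":
    toy = {                            # toy data, not real cpDNA
        "lineage_A": "ACGTACGTAC",
        "lineage_B": "ACGTACGTAT",
        "lineage_C": "ACGAAC-TAT",
    }
    for name, row in pid_distance_matrix(toy).items():
        print(name, row)
```

The resulting matrix is the input a neighbor-joining routine would cluster; the tree-building step itself is left to dedicated software.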

Fig. 2

Complete chloroplast genome phylogeny of Ae. tauschii and Ae. cylindrica accessions: neighbour-joining tree using PID (percentage identity) (Waterhouse et al. 2009). The GenBank accessions used for the analyses are: G (TauL2) KJ614412.1 (Gornicki et al. 2014), M (TauL2) NC_022133.1 (Middleton et al. 2014), M_cyl (Ae. cylindrica) NC_023096.1 (Middleton et al. 2014)

The phylogenetic tree shows that TauL3 diverged from the other Ae. tauschii lineages in ancient times. The TauL1 lineage is relatively older than TauL2 and was ancestral to it (Fig. 2). Judging by branch lengths, sequence divergence is also relatively high within each of the TauL1 and TauL2 lineages.

Conclusion

A simplified scheme of Ae. tauschii lineages and Ae. cylindrica, based on SNP and indel data, was constructed (Fig. 3). According to this scheme, the plasmon (cpDNA) of TauL1 occupies an intermediate position between TauL3 on the one hand and TauL2 and Ae. cylindrica on the other. It is known that the cytoplasm of Ae. cylindrica was contributed by Ae. tauschii (Maan 1976; Tsunewaki 1989). The position of the Ae. cylindrica accessions on the Ae. tauschii phylogenetic tree constructed from cpDNA variation data is intermediate between TauL1 and TauL2. Thus, the complete nucleotide sequences of cpDNA of Ae. tauschii and Ae. cylindrica make it possible to refine the origin and evolution of the D plasmon of the genus Aegilops.

Fig. 3

The scheme reflecting the differences in single nucleotide polymorphisms (SNP) and indels between cpDNA of Ae. tauschii lineages and Ae. cylindrica