Introduction

Harvested from over 200 million hectares, corresponding to 650 million tons of annual production, bread wheat, Triticum aestivum L., is the most extensively grown crop after rice and maize (Mochida and Shinozaki 2013; Ani Akpinar et al. 2015). It is an important component of world food consumption, providing around 20 % of human dietary energy (Kurtoglu et al. 2014). Considering its high agronomic value, along with the ever increasing world population, sustainable and steady increases in wheat yields are of vital importance for the food security of the next generations (Parry et al. 2011). However, agricultural practices following wheat domestication have largely focused on the selection and breeding of uniform plants with high yields, considerably narrowing the gene pools of the elite cultivars (Akpinar et al. 2015). Fortunately, extant wild relatives and progenitors of bread wheat still preserve remarkable genetic diversity. These wild gene pools contain diverse alleles that contribute to adaptation to different environmental conditions, including biotic and abiotic stresses (Akpinar et al. 2012; Budak et al. 2013; Lopes et al. 2015). Exploring these sources of natural diversity and effectively combining them with powerful molecular methods (Budak 2010) can provide insights into the cellular mechanisms and associated processes underlying environmental adaptation, which may, in turn, be utilized to develop hardy varieties that are also high-yielding.

The large (17 Gb) and complex (80 % repetitive elements in content) allohexaploid wheat genome (Triticum aestivum, 2n = 6× = 42, genomic formula AABBDD) is thought to have originated from at least two different hybridization events (Chalupska et al. 2008; Marcussen et al. 2014). First, the hybridization of two diploid species, Triticum urartu (2n = 2× = 14, AuAu) and an unknown B genome related organism (2n = 2× = 14, genomic formula SS), occurred about 0.5–0.36 million years ago (MYA) and resulted in the emergence of wild tetraploid emmer wheat, Triticum turgidum ssp. dicoccoides (2n = 4× = 28, AABB) (Dvorák et al. 1993; Akhunova et al. 2010). Around 8000 years ago, another hybridization event between domesticated emmer wheat, Triticum turgidum (2n = 4× = 28, AABB), and Aegilops tauschii (2n = 2× = 14, DD), followed by whole genome duplication, gave rise to hexaploid bread wheat (Jia et al. 2013; Li et al. 2014). Domestication and evolutionary processes, such as genome-wide diploidization following polyploidization, reshaped the modern wheat genome; many genes or gene families were lost, while others were strongly preserved or newly emerged (Murat et al. 2014).

Understanding the structure and organization of the complex wheat genome can be aided by working on its diploid progenitors and relatives. Additionally, the diverse gene pools of these diploid germplasms may provide favorable alleles for wheat improvement. With this motivation and the decreasing costs of sequencing technologies, genome and/or transcriptome sequences of several wheat species have become available, delivering important clues into wheat genome structure and evolution. Sequencing the A genome progenitor of allohexaploid bread wheat, T. urartu (2n = 14, AuAu), revealed that its 4.94 Gb genome consists of 67 % repetitive elements (Ling et al. 2013). Another A genome relative of wheat is Triticum monococcum (2n = 14, AmAm), which resembles T. urartu in genome size (5.6 Gb) and content (Fricano et al. 2014). While T. urartu has persisted only in wild populations, T. monococcum has undergone domestication and diverged into different subspecies such as T. monococcum ssp. aegilopoides (wild winter wheat) and T. monococcum ssp. monococcum (domesticated spring wheat) (Fox et al. 2014). Since the genomes of both of these organisms are relatively small compared with the 17 Gb hexaploid wheat genome, they can be used as simpler surrogates whose findings can be translated to hexaploid wheat through comparative genomics (Brenchley et al. 2012). Furthermore, recent studies revealed that A genome relatives of wheat have a wide repertoire of important alleles for biotic and abiotic stress responses such as disease resistance or salt tolerance (Munns et al. 2012; Saintenac et al. 2013; Zaharieva and Monneveux 2014), which can be utilized in wheat genome improvement programs.

While the A genome progenitor of tetraploid and hexaploid wheat species is fairly well known and generally accepted, the identity of the B genome progenitor is much more controversial and remains uncertain. The main challenge is the variation and divergence among the candidate B genome progenitors of wheat (Dvorák and McGuire 1981). In spite of the difficulties, several attempts have been made to identify the B genome origin through comparative analyses of nuclear and mitochondrial DNA sequences (Dvořák et al. 1989) and chromosome rearrangements (Jiang and Gill 1994; Devos et al. 1995). Candidate B genome progenitors have been gathered under a section named “Sitopsis”, and one member of the group, Aegilops speltoides (2n = 14, SS), has shown strong evidence of being the progenitor of the B genome (Kilian et al. 2007). Ae. speltoides is a diploid plant native to Western Asia and Southeastern Europe, presenting a natural source of distinct disease resistance alleles (Sarkar and Stebbins 1956; Haider 2012). Another Sitopsis section member, Aegilops sharonensis (2n = 14, SS), exhibits remarkable adaptation to various environmental conditions (Bouyioukos et al. 2013). Certain cultivars of Ae. sharonensis are resistant to wheat stem rust disease, which is considered evidence of wide-ranging genetic variation (Olivera et al. 2007). With a relatively small genome size (7.5 Gb) and high genetic diversity, Ae. sharonensis has morphological, phenological, and ecological characteristics that are remarkably different from those of bread wheat (Eilam et al. 2007).

Despite the controversy over the identity of the B genome progenitor, the D genome donor of bread wheat is clearly defined as Ae. tauschii (2n = 14, DD). It is a goat grass extensively distributed in Eurasia, displaying a high range of diversity at the phenotypic and molecular levels (Dudnikov and Kawahara 2006). Ae. tauschii is a valuable source for the development of improved wheat cultivars tolerant to different stress conditions, owing to its exceptional ability to adapt to many different environmental conditions (Iehisa et al. 2012; Akpinar et al. 2014). The 4.23 Gb Ae. tauschii genome, containing 65 % repetitive elements, is fairly well characterized (Kumar et al. 2012; Jia et al. 2013; Luo et al. 2013). Several efforts have also focused on the identification of genomic differences, such as Single Nucleotide Polymorphisms (SNPs), between the D genome of bread wheat and Ae. tauschii, which may deliver important clues into the evolutionary history of these species and facilitate genetic transfer between the two species (Iehisa et al. 2012; Wang et al. 2013).

MicroRNAs (miRNAs) are small (19–24 nt), endogenous, non-coding RNA molecules that are involved in gene expression regulation at the post-transcriptional level (Budak et al. 2015a; Budak and Akpinar 2015). miRNAs have a large impact on several processes such as development (Neilson et al. 2007; Curaba et al. 2012), environmental adaptation, stress responses (Trindade et al. 2010; Budak and Akpinar 2011; Budak et al. 2015b) and disease resistance (Navarro et al. 2006; Lucas et al. 2014); therefore, elucidation of the miRNA repertoire as well as an in-depth understanding of miRNA functions and biogenesis are important for uncovering the complex nature of gene regulation (Sunkar et al. 2008; Budak et al. 2014). In plants, miRNA research has been ongoing for over a decade in several directions. Many studies have focused on miRNA-target interactions with the intent of revealing direct effects on biological processes, which may even have immediate impacts on phenotype (Xing et al. 2010). On the other hand, genome/transcriptome-wide miRNA mining approaches take a more comprehensive view (Zhou et al. 2008; Chen and Cao 2015; Liu et al. 2015), unlocking the miRNA contents of various plants (Kurtoglu et al. 2014). Up to now, many different methods have been developed and used for the identification of miRNAs and their specific target genes, such as cloning (Wang et al. 2004), genetic screens (Chen 2004), splinted-ligation mediated miRNA detection (Chamnongpol et al. 2010), microarray profiling (Kantar et al. 2011) and computational approaches (Wang et al. 2004; Sunkar et al. 2008; Budak and Akpinar 2011). While most experimental methods are expensive and time consuming, computational miRNA identification from high-throughput genomic or transcriptomic data presents a sound and rapid alternative. Whereas genome-wide in silico miRNA identification can unravel the whole miRNA repertoire of an organism, expression evidence for the putative miRNAs is ultimately required. On the other hand, the utilization of transcriptomic data provides an overview of miRNAs putatively expressed under certain conditions (Akpinar et al. 2015). Even though miRNA prediction tools based on support vector machine algorithms also exist (Kadri et al. 2009; Teune and Steger 2010), homology-based miRNA identification methods, which rely on large scale sequence conservation, are considered the most powerful and reliable techniques for the mass identification of plant miRNAs (Budak and Akpinar 2015; Budak and Kantar 2015).

In addition to their key roles in gene expression regulation, miRNAs might also have important roles in the stability and organization of polyploid plant genomes. Understanding how the cell regulates multiple copies of the genome may hold importance for interfering with genetic regulation (Chen 2007), and several miRNA studies on economically important polyploid crops such as cotton and sugarcane have focused on this issue (Thiebaut et al. 2012; Xie and Zhang 2015). However, to our knowledge, studies that comparatively investigate miRNA genes in diploid and polyploid backgrounds of wild and domesticated varieties are rare (Li et al. 2014). Bread wheat stands as an enticing model for such comparative microRNAome (or miRNAome) analyses, given the presence of multiple genome progenitors. Herein, we computationally investigated the miRNA repertoires of five different diploid wheat progenitors and relatives, Ae. sharonensis, Ae. speltoides, Ae. tauschii, T. monococcum, and T. urartu, using a homology-based approach on publicly available transcriptomic data and compared the related miRNA content with the T. aestivum transcriptome. Identification of putatively conserved miRNAs at the inter- and intra-species levels across different members of the Poaceae family provides valuable insight into the evolution of the microRNAome with respect to polyploidization and domestication events.

Materials and methods

Construction and pre-processing of transcriptome dataset

In order to identify the miRNA repertoires of wheat diploid relatives, transcriptomic sequences of sixteen Aegilops sharonensis accessions (Supplementary File S1), Aegilops speltoides accession B-2140016, Aegilops tauschii accession D-2220009, Triticum monococcum ssp. aegilopoides accession G3116, Triticum monococcum ssp. monococcum accession DV92, and Triticum urartu accession G1812 were obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra). Additionally, Triticum aestivum cultivar Chinese Spring “leaf” and “root” RNA-sequencing (RNA-Seq) data, obtained from Unité de Recherche Génomique Info (URGI, http://wheat-urgi.versailles.inra.fr/Seq-Repository/RNA-Seq), were included for comparative analysis of miRNAs in diploid and polyploid backgrounds. All transcriptome data were derived from control plants grown under normal conditions, except for the T. monococcum subspecies, which were germinated in the dark and then exposed to 48 h of light (Fox et al. 2014). Despite this difference, the T. monococcum data were judged to be fairly similar to the remaining datasets and were included in this study, as they enabled us to assess the effects of domestication. For T. urartu and the two T. monococcum subspecies, pre-assembled transcriptomic data were used, while for the others de novo sequence assemblies were constructed from the raw reads of each library with the Trinity software (Grabherr et al. 2011). Quality trimming and adaptor removal of reads were performed with Trimmomatic (v 0.32) with default parameters “LEADING:5, TRAILING:5, MINLEN:36” (Bolger et al. 2014). A summary description of the input datasets used in this study is given in Table 1.

Table 1 Summary list of the transcriptomic datasets used for miRNA identification

All transcriptome assemblies were further evaluated for sequence quality and context. Since the transcriptional machinery of organelles can generate long transcripts that can affect miRNA identification, organelle DNA-associated sequences were eliminated from all transcriptome assemblies (Bouyioukos et al. 2013). Chloroplast genome sequences of each organism were obtained from the NCBI archive and used as queries for the elimination process (Middleton et al. 2014). For the removal of mitochondrial DNA, the bread wheat mitochondrial genome sequence (NC_007579.1) was used as the closest data available. Reads were aligned with organellar sequences by BLASTn (version 2.2.26; e-value 1E-15, -dust “no”) and positive hits (alignment ≥ 95 % of the query and identity ≥ 95 %) were removed. In addition, since transfer ribonucleic acid (tRNA) and ribosomal ribonucleic acid (rRNA) sequences can form hairpin or hairpin-like secondary structures, sequences with positive hits to tRNA or rRNA sequences were also discarded to avoid potential false positives. For each organism, published tRNA and rRNA sequences were obtained from the European Nucleotide Archive (http://www.ebi.ac.uk/ena). Assembly sequences aligning with tRNAs/rRNAs were identified (e-value 1E-5, -dust “no”) and positive hits fulfilling the same criteria defined above were also eliminated.
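The positive-hit criteria above (alignment ≥ 95 % of the query and identity ≥ 95 %) can be illustrated with a minimal Python sketch. It is not the script used in this study. It assumes tab-separated BLAST output with five columns in the order query ID, subject ID, percent identity, alignment length, and query length; this column layout and the function names are assumptions made for illustration only.

```python
def is_positive_hit(pident, aln_len, qlen, min_identity=95.0, min_coverage=95.0):
    """Return True when a hit meets both thresholds described in the text."""
    coverage = 100.0 * aln_len / qlen  # percent of the query that is aligned
    return pident >= min_identity and coverage >= min_coverage


def positive_query_ids(tabular_lines):
    """Collect the IDs of queries having at least one positive hit.

    Assumed (hypothetical) column order per line:
    query ID, subject ID, percent identity, alignment length, query length.
    """
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        qseqid, _sseqid, pident, length, qlen = line.split("\t")[:5]
        if is_positive_hit(float(pident), int(length), int(qlen)):
            ids.add(qseqid)
    return ids
```

Sequences whose IDs are returned by `positive_query_ids` would then be dropped from the assembly before miRNA prediction.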

Reference miRNA set

For the homology-based prediction of putative miRNA repertoires from the transcriptome assemblies of bread wheat and its diploid progenitors and relatives, a query set consisting of previously identified plant mature miRNA sequences was used. A list of 4000 published unique mature miRNA sequences from 72 different Viridiplantae species was downloaded from the current release of miRBase (v21, June 2014) (Kozomara and Griffiths-Jones 2011) and combined with T. urartu mature miRNA sequences identified by Ling et al. (2013). Unique mature miRNAs identified from T. urartu small RNA databases (Ling et al. 2013) were specifically marked as “miR-Sr”. All mature miRNA sequences were filtered for redundancy, and the remaining 4942 unique mature miRNA sequences were used as queries in the homology-based miRNA identification process.
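The redundancy filtering of the reference set can be sketched as follows. This is an illustrative Python example, not the procedure actually used here: it assumes the mature sequences are available as FASTA text and treats sequences as identical after converting to upper case and replacing T with U. The function names are hypothetical.

```python
def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def collapse_redundant(records):
    """Map each unique mature sequence to the list of names sharing it.

    Assumption: sequences are compared in the RNA alphabet, case-insensitively.
    """
    unique = {}
    for name, seq in records:
        key = seq.upper().replace("T", "U")
        unique.setdefault(key, []).append(name)
    return unique
```

The keys of the returned dictionary form the non-redundant query set, while the name lists retain which entries shared each sequence.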

Homology-based in silico miRNA prediction

Homology-based in silico miRNA prediction from all transcriptome assemblies was carried out with a previously described two-step procedure (Lucas and Budak 2012). Briefly, the two-step procedure can be described as follows: (1) preliminary selection of transcriptomic sequences exhibiting sequence homology to a reference miRNA sequence, (2) subsequent filtering of these candidates based on the consistency of their stem-loop secondary structures with the general, pre-established precursor miRNA (pre-miRNA) features (Zhang et al. 2005; Unver and Budak 2009; Kantar et al. 2010). Prediction was performed with two in-house Perl scripts, “SUmirFind” and “SUmirFold”, previously described in detail (Lucas and Budak 2012; Kurtoglu et al. 2013; Kurtoglu et al. 2014). The SUmirFind script utilizes the BLASTn alignment algorithm from the BLAST+ package and detects candidate miRNA sequences that align to mature miRNAs in the miRNA reference set with fewer than three mismatches. The SUmirFold script further processes these putative sequences to generate secondary structures with UNAFold version 3.8 (Markham and Zuker 2008) and performs a basic filtering step based on the following criteria: (1) In the hairpin structure, mature miRNA and miRNA* sequences can have at most four and six mismatches, respectively, (2) miRNA* cannot be broken into separate segments or contain a large loop, (3) The GC content of the hairpin should be in the range 24–71 %, (4) The minimum free energy index (MFEI) of the hairpin should be above 0.67. Putative sequences passing the preliminary filters were further evaluated based on pre-miRNA characteristics previously described in Kurtoglu et al. (2014) with a Java-based semi-automated in-house program. Predicted miRNAs supported by the presence of at least one pre-miRNA sequence possessing the pre-defined stem-loop structure characteristics were recorded, and this was accepted as evidence of putative expression of the related miRNA.
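Criteria (3) and (4) above can be illustrated with a short Python sketch. It does not reproduce SUmirFold. It assumes the commonly used definition MFEI = [(|MFE| / length) × 100] / GC %, and it takes the minimum free energy (MFE, in kcal/mol) as an input that a folding tool such as UNAFold would supply. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a hairpin sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def mfei(seq, mfe):
    """Minimum free energy index under the assumed definition:
    adjusted MFE (|MFE| per 100 nt) divided by the GC percentage."""
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    adjusted_mfe = 100.0 * abs(mfe) / len(seq)
    return adjusted_mfe / gc


def passes_basic_filter(seq, mfe, gc_range=(24.0, 71.0), min_mfei=0.67):
    """Apply the GC-content and MFEI thresholds quoted in the text."""
    gc = gc_percent(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        return False
    return mfei(seq, mfe) > min_mfei
```

For a toy 100-nt hairpin with 50 % GC and an MFE of -45 kcal/mol, the adjusted MFE is 45 and the MFEI is 0.9, so the candidate would pass both thresholds; the mismatch and loop criteria (1) and (2) require the folded structure and are not covered by this sketch.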

Identification of repetitive elements in putative miRNA-coding sequences

In order to investigate transposable element-containing miRNAs, putative pre-miRNA sequences identified from all datasets were separately searched against a publicly available repeat library of the Poaceae family (MIPS-REdatPoaceae v9.3p, ftp://ftpmips.helmholtz-muenchen.de/plants/REdat/), which contains 34,135 different repeat sequences (Nussbaumer et al. 2013), using RepeatMasker version 4.0.5 (www.repeatmasker.org) at default settings. Sequences with more than 50 % of their length covered by repeats were recorded as transposable element-related miRNAs, or TE-miRs.
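The 50 % coverage rule can be sketched in Python as below. The example assumes that repeat annotations for one pre-miRNA have already been extracted as 1-based, inclusive (start, end) intervals; overlapping annotations are merged so that no base is counted twice. The function names are hypothetical and the sketch is not the code used in this study.

```python
def covered_length(intervals):
    """Number of bases covered by the union of 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def is_te_mir(seq_len, intervals, threshold=0.5):
    """True when repeats cover more than `threshold` of the sequence length."""
    return covered_length(intervals) / seq_len > threshold
```

Because the text specifies "more than 50 %", a sequence covered by repeats over exactly half of its length is not flagged in this sketch.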

miRNA-target prediction and functional annotation

In silico target prediction of putative miRNA sequences was carried out separately for each miRNA dataset identified from the six different organisms, using the psRNATarget web-tool (http://plantgrn.noble.org/psRNATarget/) at default parameters (Dai and Zhao 2011) against their respective transcriptomes. Sequences marked as putative miRNA targets were retrieved and their functional annotation was performed using Blast2GO (www.blast2go.com) (Conesa and Götz 2008). The initial blast step was performed locally against all non-redundant Viridiplantae (taxid: 33090) proteins (3,485,798) at an e-value cutoff of 1E-6, and the subsequent mapping and annotation steps were carried out at default parameters. Gene ontology (GO) terms were further analyzed, recorded, and visualized with multilevel pie graphs.

Results and discussion

miRNA identification from transcriptome assemblies

To explore putatively expressed miRNA repertoires of bread wheat and its diploid progenitors and relatives, transcriptome assemblies were constructed with the Trinity software after quality trimming and adaptor removal of reads by Trimmomatic (v 0.32) at default settings, where pre-assembled sequences were not available. RNA-Seq of 22 different genotypes belonging to six different organisms, Aegilops sharonensis (16 different accessions, Supplementary File S1), Aegilops speltoides accession B-2140016, Aegilops tauschii accession D-2220009, Triticum monococcum ssp. aegilopoides accession G3116, Triticum monococcum ssp. monococcum accession DV92, Triticum urartu accession G1812, and Triticum aestivum cv. Chinese Spring, yielded assembled sequences ranging from 44 to 223 Mb with an average contig length of 546 to 1847 bases (Table 2). Following further elimination of organellar transcripts and other non-coding RNAs such as tRNAs and rRNAs, homology-based in silico miRNA prediction was performed on each assembly using the 4942 previously known unique mature miRNA sequences of the reference miRNA set. Putative miRNAs from each assembly were filtered for redundancy. Identified miRNAs were named and clustered with respect to their homology to known plant miRNAs at the mature miRNA level. In cases of equal homology to diverse mature miRNA sequences from different families, pre-miRNA homology was considered. Cumulatively, a total of 17,197 unique stem-loops (or pre-miRNAs) corresponding to 259 different miRNA families were detected, of which nine belonging to T. urartu had also been previously reported by Ling and colleagues (Ling et al. 2013) (Supplementary File S2). Since no registered miRNA sequences were found in miRBase for Ae. sharonensis, Ae. speltoides, T. monococcum, and T. urartu, the great majority of miRNAs identified in this study had not been previously reported. It is also important to remark that orthologues of three different T. urartu miRNAs (miR-Sr60871, miR-Sr93419, and miR-Sr7354), which are not included in miRBase, were exclusively identified from T. aestivum transcriptomes, and a similar situation holds for Ae. tauschii with four different miRNAs (miR-Sr60871, miR-Sr80912, miR-Sr93419, miR-Sr97354).

Table 2 Assembly metrics for the genotypes used in this study. For Ae. sharonensis, the average assembly metrics of 16 different accessions is given for simplicity

In our analysis, the number of identified miRNA families among the 22 different transcriptome assemblies of wheat progenitors and relatives varied from 30 to 89, while the number of stem-loops ranged from 131 to 3046, as given in Fig. 1a. Of all datasets, T. urartu and Ae. sharonensis accession 2172 showed the highest miRNA diversity, each represented by 89 unique miRNA families (1.25-fold of the average) (Fig. 1a). Interestingly, Ae. speltoides displayed the lowest miRNA diversity and total miRNA stem-loop count, which might be related to the relatively small size of its transcriptome assembly (Table 2). Despite a slight correlation between the total length of the transcriptome assembly and the number of identified miRNA families (Pearson r² = 0.64), the associated miRNA stem-loop counts were hardly dependent on assembly size (Pearson r² = 0.37), while they displayed a positive correlation with the miRNA variety of the organism (Pearson r² = 0.59) (Supplementary Figure S1). Even though the calculated correlations might be affected by the randomness of the dataset, this situation might also arise from differential expression and genomic expansion of some miRNA precursors in different species. For instance, miR1130 family members were represented by 136 putative precursor sequences in Ae. sharonensis accession 409, while this number was 57 for Ae. sharonensis accession 1192. Since the expression of miRNAs at the mature and pre-miRNA levels is highly specific to condition, tissue, and time point, the numbers of represented precursors and the related correlations might differ in other situations.
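For readers wishing to reproduce this kind of comparison, the squared Pearson coefficient between two per-assembly measurements (for example, assembly size and miRNA family count) can be computed as in the following Python sketch. The function names are hypothetical and the snippet contains no data from this study; the per-assembly values would come from Table 2 and Fig. 1a.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Squared Pearson coefficient, the r² statistic quoted in the text."""
    return pearson_r(xs, ys) ** 2
```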

Fig. 1

Summary information regarding the miRNAs identified in this study. a The total number of identified miRNA families and corresponding miRNA stem-loops. For Ae. sharonensis, the average of 16 different accessions is given. b Main characteristics of putative pre-miRNAs: MFEI, pre-miRNA length, and GC % content. The average values are calculated and represented for each organism

The miRNA families miR1130, miR1120, miR1127, miR5049, and miR5181 had the highest numbers of corresponding stem-loops across all organisms. Among these, the miR1120 family has already been defined as one of the largest miRNA families in T. aestivum (Budak and Akpinar 2015). miR1120, miR1127, and miR1130 have also been reported as important regulators of seed and leaf development in T. aestivum, processes crucial to the survival and reproduction of plants (Han et al. 2014). In addition, these miRNA families were detected as conserved among all organisms, including the different accessions, together with 15 other conserved miRNA families (Table 3). In point of fact, it is a known phenomenon that many miRNA families and their members are evolutionarily conserved among species within the same kingdom, where they might exist as orthologues or homologues (Llave et al. 2002; Jones-Rhoades and Bartel 2004). Further examination of highly represented or conserved miRNA families may provide insights into their specific functions; hence, it might be suggested that the expression of miRNA families regulating biological processes that are functionally important for the sustainability of plants may tend to be slightly higher and more stable under normal conditions than that of other miRNA families acting in more specific pathways such as stress tolerance.

Table 3 List of miRNA families, putatively expressed under normal conditions, in all genotypes (1), in only Ae. sharonensis accessions (2), or only in one genotype (3) included in this study

The characteristics of putative miRNAs

The characteristics of the putative miRNA stem-loops, such as the GC % content, pre-miRNA/mature miRNA lengths, and MFEI values, were assessed as an indicator of the accuracy of our miRNA identification pipeline. The average mature miRNA length was 21 nucleotides in all of the organisms, as expected, since many plant mature miRNAs range from 19 to 24 nucleotides with a bias towards 21 bases in length (Thakur et al. 2011; Kurtoglu et al. 2014). Animal miRNAs show high conservation of their precursor sequences; however, plant pre-miRNAs are long and diverse, which may be related to the presence of additional regulatory elements in the miRNA genes (Zhang et al. 2006). In our analysis, the average length of putative pre-miRNAs ranged from 110 to 150 nucleotides, in accordance with previous studies. Interestingly, the average GC % of the identified pre-miRNAs correlated slightly with the average pre-miRNA length across different organisms (Fig. 1b). It has been suggested that pre-miRNA sequences encoded from intronic regions have higher GC contents, which may be related to their biogenesis (Zhu et al. 2011b). Hence, high GC contents may indicate that the respective miRNAs originate from intronic sequences. Additionally, intronic miRNA precursors were observed to be lengthy, up to 890 bases (Ramalingam et al. 2014), which may, again, be reminiscent of unique biogenesis routes (Zhang et al. 2006). It is tempting to speculate that higher GC contents may assist in the stability of longer pre-miRNA sequences during miRNA biogenesis, particularly from long intronic miRNA precursors.

Another important criterion for the determination of genuine pre-miRNA secondary structures is the Minimum Folding Energy Index (MFEI). MFEI values are used to discriminate miRNAs from other RNA species, i.e., miRNAs generally have higher MFEIs than other types of RNAs such as tRNAs (0.64), rRNAs (0.59), or mRNAs (0.62–0.66) (Schwab et al. 2005; Kantar et al. 2012). In our analysis, the average MFEI of putative pre-miRNAs was remarkably high in all organisms, in agreement with previous studies (Zhang et al. 2007; Jin et al. 2008; Kantar et al. 2012). The average MFEI of putative stem-loop structures was highest in T. urartu, at 1.34 ± 0.23 (Fig. 1b).

miRNAs from A genome-related relatives of bread wheat

The transcriptome sequences of the A genome progenitor of modern wheat, T. urartu, suggested the expression of 3046 putative miRNAs belonging to 89 families. Among these, miR1120, miR1127, miR1130, and miR5049 were the most highly expressed families, each with more than 200 counts. While none of these families is well known or well characterized, the expression of all four families has previously been shown in wheat (Yao et al. 2007; Han et al. 2014). miR1120 has been linked to developmental processes in both wheat and its close relative barley through expression profiles (Kruszka et al. 2013; Han et al. 2014). However, the miR1120 coding region was found to have significant similarities to the TcMar-Stowaway family of DNA transposons in barley, raising the possibility that the miR1120 sequence may actually correspond to another class of small non-coding RNAs (Kruszka et al. 2013). Interestingly, miR1127 exhibited a seed-specific expression profile in bread wheat, while miR1127* accumulated in the flag leaves, suggesting that the miRNA*, in this case, may have regulatory roles in growth and development (Han et al. 2014). Additionally, miR5049 and its putative target, wpk4 protein kinase, an important signal transducer responsive to various stimuli, have been reported to be drought-inducible (Sano and Youssefian 1994; Ruuska et al. 2008). Given its high putative expression under normal conditions in the A genome progenitor T. urartu, further research on this miRNA may unravel intriguing physiological roles. These four miRNA families were also enriched in the transcriptomes of T. monococcum, a close relative of the A genome progenitor T. urartu.

The two einkorn wheat subspecies, T. monococcum ssp. aegilopoides and T. monococcum ssp. monococcum, putatively expressed 80 and 87 miRNA families, respectively, even though the total counts of identified miRNA stem-loops were far lower than in T. urartu under similar conditions (1536 and 1820, respectively, vs. 3046). Although 58 miRNA families were commonly identified from all three transcriptomes, the remaining putatively expressed miRNA families differed largely from one transcriptome to another. While 11 miRNA families were putatively expressed only in domesticated einkorn wheat, the miR6220, miR9772, and miR-Sr80912 families were exclusive to the wild T. monococcum ssp. aegilopoides and T. urartu transcriptomes. miR1030, putatively expressed in domesticated T. monococcum ssp. monococcum but not in T. monococcum ssp. aegilopoides or T. urartu, was downregulated in response to drought stress in rice (Zhou et al. 2010). Interestingly, another such miRNA, miR5021, has been implicated in chilling response and dormancy, terpenoid biosynthesis, and lipid metabolism (Barakat et al. 2012; Wang et al. 2012; Fan et al. 2015; Singh et al. 2015). Although these observations are mostly from dicot plants, where this miRNA may have specialized roles, terpenoids have recently been implicated in stress tolerance in maize (Vaughan et al. 2015), suggesting that similar networks may also exist across the plant kingdom.

In our analysis, some miRNAs such as miR444, miR5171, miR9653, and miR9654 were detected in the two subspecies of T. monococcum but not in the A genome progenitor of bread wheat, T. urartu. Among these, miR444 was reported as an important regulator of the nitrate signaling pathway in rice through experimental characterization under P-deprived conditions (Yan et al. 2014). Interestingly, some of these miRNAs were detected as conserved with the B and D genome related relatives of bread wheat (certain Ae. sharonensis accessions in most cases) as well as with bread wheat itself, while miRNAs such as miR5171 were specific to T. monococcum. Between the two einkorn subspecies, 9 and 18 miRNA families were also identified as unique to the transcriptomes of the wild and domesticated genotypes, respectively. Further characterization of these miRNAs, differentially expressed in domesticated and wild A genome lineages, will likely expand our knowledge of the molecular mechanisms related to the evolution of the A genome of bread wheat and the consequences of domestication for miRNA genes.

miRNAs in bread wheat and its progenitors

While the A genome progenitor of bread wheat putatively expressed 89 miRNA families, the D genome progenitor Ae. tauschii and the closest identified B genome progenitor, Ae. speltoides, revealed the expression of relatively fewer putative miRNA families with 55 and 30 families, respectively, under similar conditions. Through identified 259 different miRNA families, only 18 % corresponding to 48 different miRNA families was detected in T. aestivum transcriptomes under normal conditions. Regarding to our results, no T. aestivum specific miRNA was detected under normal conditions; however different tissues may exhibit diverse miRNA profiles and detection of new miRNA families might be possible under the same conditions from other tissues such as spike or stem. Since 20 miRNA families were found as conserved across all genotypes used in this study (Table 3), remaining 28 different miRNA families were retained to T. aestivum from different progenitors and relatives. Comparative analysis of common miRNA families across whole genotypes used in this study revealed the unequal contribution of different organism to the miRNA repertoire of bread wheat (Supplementary Figure S2). Many miRNA families from Ae. speltoides and Ae. tauschii was remained in T. aestivum miRNAome while many of the Ae. sharonensis miRNAs (80 % of whole identified miRNA families) was identified as “lost”. Twenty-five families were detected as ‘common’ to all three progenitors, of which all but one identified from T. aestivum, as well (Fig. 2). Interestingly, miR5050 putatively expressed in T. urartu, Ae. speltoides, and Ae. tauschii, was not detected in T. aestivum transcriptome, suggesting that the expression of this miRNA, under normal conditions, might have been lost in bread wheat. Following polyploidization, genome-wide diploizidation acts on polyploid plant genomes to reduce redundancy, which can lead to structural gene loss or functional pseudogenization (Pont et al. 2013; Murat et al. 2014). 
Diploidization, followed by years of selection for particular traits, might have led to the loss of this miRNA or its expression under normal conditions in the modern bread wheat genome. In a similar study, the expression of this miRNA could not be detected via Northern blotting, while expression of its precursor could be verified in the T. aestivum cultivar Chinese Spring under normal conditions (Wei et al. 2009). This suggests that the loss of miR5050 expression in bread wheat is possibly due to a loss at the functional level. Another such miRNA, miR5021, implicated in chilling response, dormancy, terpenoid biosynthesis, and lipid metabolism as mentioned above, is putatively expressed in both Ae. tauschii and Ae. speltoides, but not in T. urartu or T. aestivum. The putative expression of this miRNA in the domesticated einkorn T. monococcum ssp. monococcum, a close relative of T. urartu, is curious. These observations indicate that differential expression patterns of miRNAs in related lineages can provide valuable insight into miRNA functions, at times in multiple directions, in different genetic backgrounds.
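The family-level comparison described above comes down to set operations on the lists of putatively expressed families per species. The following is a minimal Python sketch, assuming each species' families are available as a set of names. The function name `compare_families` and the toy sets are illustrative placeholders (miR156 merely stands in for a broadly conserved family); they are not the data or the pipeline of this study.

```python
def compare_families(progenitors, bread_wheat):
    """Summarize family sharing between progenitors and bread wheat.

    progenitors: dict mapping species label -> set of miRNA family names
    bread_wheat: set of miRNA family names detected in T. aestivum
    """
    # Families detected in every progenitor
    common = set.intersection(*progenitors.values())
    # Families detected in at least one progenitor
    any_progenitor = set.union(*progenitors.values())
    return {
        "common_retained": common & bread_wheat,
        "common_not_detected": common - bread_wheat,
        "bread_wheat_only": bread_wheat - any_progenitor,
    }


# Toy input (hypothetical memberships, loosely echoing examples in the text)
progenitors = {
    "Tu": {"miR156", "miR5050", "miR5062"},
    "Asp": {"miR156", "miR5050", "miR5021"},
    "Ata": {"miR156", "miR5050", "miR5021", "miR2120"},
}
tae = {"miR156", "miR5062", "miR2120", "miR9653"}

summary = compare_families(progenitors, tae)
for label, families in summary.items():
    print(label, sorted(families))
```

With these toy sets, miR5050 falls into the "common_not_detected" group, mirroring the pattern discussed for this family.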

Fig. 2

The number of miRNA families putatively expressed under normal conditions in bread wheat progenitors, T. urartu, Ae. speltoides, and Ae. tauschii (numbers in black). Numbers in gray indicate the number of families also expressed in T. aestivum under similar conditions

In our analyses, miR5062, miR5387, miR6248, and miR9668 family members were identified only from T. urartu and T. aestivum among the progenitors of bread wheat (T. urartu, Ae. tauschii, and Ae. speltoides). Similarly, the miR2120 family was present only in the D genome progenitor Ae. tauschii and T. aestivum transcriptomes. These miRNAs may have key roles in normal development, which would explain their retention through wheat genome evolution; however, little information is currently available regarding their physiological roles. It should be noted that these miRNAs may still be present in the genomes of the remaining progenitors but expressed under different conditions, preventing their identification from the transcriptome sequences used in this study. In fact, miR5062 has been previously reported from Ae. tauschii (Jia et al. 2013). Nevertheless, expression differences between lineages can point to potential pathways that can be targeted for crop improvement. For instance, miR2120 was upregulated in rice roots in response to cadmium stress (Huang et al. 2009). The expression of Ae. tauschii miR2120 under normal conditions may be linked to environmental adaptation. An in-depth understanding of its role may allow researchers to modulate its expression in T. aestivum in order to develop improved varieties. Two miRNAs, miR6180 and miR9653, were putatively expressed only in T. aestivum but not in any of the progenitors under normal conditions, suggesting that these miRNA families may be related to important agricultural traits. These miRNAs have been predicted in wheat and barley in previous studies; however, their expression has not been experimentally verified (Wei et al. 2009; Lv et al. 2012; Han et al. 2014). Interestingly, small RNA sequencing suggested a seed-specific expression pattern for miR9653, which may indicate a role in seed development (Han et al. 2014).
As our knowledge of these miRNAs and the pathways they participate in expands, new candidates may arise for modulating gene expression to achieve better crop performance.

miRNAs putatively expressed in Ae. sharonensis genotypes

The miRNA repertoires of 16 different Ae. sharonensis accessions collected from different locations in Israel were identified and comparatively assessed. Overall, 9509 unique miRNA stem-loops associated with 237 miRNA families were identified from all Ae. sharonensis accessions, corresponding to 55.3 % of all identified stem-loops (Supplementary File S2). Accession 396 exhibited the highest number of predicted miRNA stem-loops (973 unique miRNA stem-loops), while accession 2205 had the lowest (602 unique miRNA stem-loops). Additionally, the variety of putatively expressed miRNA families was highest in accession 2179, with 89 different miRNA families corresponding to 847 stem-loops. A total of 28 miRNA families were commonly identified from all Ae. sharonensis accessions. Notably, miR1139, miR437, miR5067, miR5070, miR5203, miR6248, miR818, and miR-Sr60871 were putatively expressed in all Ae. sharonensis accessions, although they were not necessarily conserved at the interspecies level across all organisms (Table 3). Among these, miR818 targeted serine-threonine kinases in rice and contributed to the regulation of tissue differentiation (Luo et al. 2006). Similarly, the target transcripts of miR6248 were identified as DNA-binding proteins and protein kinases, which have widespread key molecular functions within the cell (Liu 2012). Thus, the conservation of these miRNAs might be related to their regulatory roles in important biological processes, which are replaced by other essential miRNA pathways in certain genotypes. Interestingly, precursor sequences of miR437 exhibited high sequence similarity to DNA transposons of the MITE and TcMar-Stowaway families, which was also observed in maize (Zhang et al. 2009) and sugarcane (Zanca et al. 2010), suggesting that this miRNA may exist as a TE-MIR in plants. Curiously, miR437 was not detected in Ae. speltoides, Ae. tauschii, or T. aestivum transcriptomes in this study.
Fingerprinting studies on the mitochondrial genomes of different Poaceae family members suggested that Ae. speltoides and Ae. sharonensis diverged from the same ancestor and further evolved into different branches (El-Shehawi et al. 2012). Hence, the expression of miR437 might have been lost in Ae. speltoides during its evolutionary history, as a consequence of TE-related genome reorganization, although we cannot exclude the possibility that this miRNA evolved recently in Ae. sharonensis. In order to investigate putative miRNA families specifically related to the B genome lineage, we also tried to identify miRNAs conserved between the two members of the Sitopsis section, Ae. speltoides and Ae. sharonensis; however, no Sitopsis section-specific miRNAs could be detected. This may result from the early evolutionary divergence of the two species (El-Shehawi et al. 2012), although we cannot exclude the possibility that some miRNAs remained unidentified in the Ae. speltoides transcriptome assembly due to the small size of this dataset. According to Petersen and colleagues, Ae. speltoides disrupts the monophyletic cluster of the genus Aegilops within the Pooideae subfamily, suggesting a different clade for this organism (Petersen et al. 2006). Additionally, in another study, Ae. speltoides showed a closer relationship with the genus Triticum, which includes T. aestivum, T. monococcum, and T. urartu (Escobar et al. 2011); thus, it is tempting to speculate that the miRNA analysis of Ae. speltoides also provides clues about the distinct nature of this organism compared to other members of the genus Aegilops in the Pooideae subfamily.

Of the Ae. sharonensis miRNAs, 107 families were exclusive to only one accession (Table 3). Given that all 16 genotypes belong to the same study (PRJEB5340) and were exposed to the same conditions prior to RNA-sequencing, these miRNAs likely exhibit accession-specific expression profiles that can be related to individual characteristics of different accessions, such as stress tolerance or environmental adaptation, as observed across widely distributed Ae. sharonensis populations (Bouyioukos et al. 2013). For instance, miR397, putatively expressed exclusively in accession 2233, exhibited differential expression patterns in tolerant and susceptible soybean genotypes in response to rust disease (Kulcheski et al. 2011). Additionally, putative targets of miR7693, another miRNA potentially exclusive to accession 1995, were involved in biological processes such as “response to pathogen infection”, “protection against oxidative stresses”, and “disease resistance” in rice (Campo et al. 2013). Ae. sharonensis accessions have been observed to display diverse responses against rust infection (Olivera et al. 2007); hence, further investigation of the putatively accession-specific miRNA families and their targets should contribute to the elucidation of molecular pathways underlying stress responses or other beneficial traits observed in wild wheat populations.
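Families shared by all accessions and families exclusive to a single accession can both be derived by counting, for each family, the number of accessions in which it was detected. The sketch below illustrates this under the assumption that detections are stored as one set of family names per accession; the function name `core_and_exclusive` and the toy memberships are hypothetical and do not reproduce Table 3.

```python
from collections import Counter


def core_and_exclusive(families_by_accession):
    """Split families into those seen in all accessions and in exactly one."""
    n_accessions = len(families_by_accession)
    # Count in how many accessions each family was detected
    counts = Counter(
        family
        for families in families_by_accession.values()
        for family in set(families)
    )
    core = {f for f, n in counts.items() if n == n_accessions}
    # Map each single-accession family to the accession that carries it
    exclusive = {}
    for accession, families in families_by_accession.items():
        for family in families:
            if counts[family] == 1:
                exclusive[family] = accession
    return core, exclusive


# Toy input (hypothetical memberships; accession labels reused from the text)
toy = {
    "2233": {"miR437", "miR818", "miR397"},
    "1995": {"miR437", "miR818", "miR7693"},
    "2179": {"miR437", "miR818"},
}
core, exclusive = core_and_exclusive(toy)
print(sorted(core))
print(exclusive)
```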

Putative wheat miRNA targets

miRNAs regulate gene expression by binding to complementary sites on target sequences and directing cleavage, decay, or translational inhibition of the target mRNAs (Budak et al. 2015a, b). Identification of the target transcripts hints at the physiological pathways in which the respective miRNAs are involved. In order to gain insight into the functions of the putative miRNAs identified in this study, target transcripts were predicted using the psRNATarget web tool and subsequently annotated against all Viridiplantae proteins. Gene ontology (GO) annotations were grouped under the “biological process (BP)”, “molecular function (MF)”, and “cellular component (CC)” categories.

Overall, GO annotations of putative targets included “protein modification process” and “cellular component organization”-related terms in all organisms, with a marked abundance under the BP category, as could be expected from the regulatory roles of miRNAs. For the CC category, “nucleus”, “plasma membrane”, “cytosol”, and “plastid”-related terms were also commonly attributed to putative miRNA targets. Additionally, “nucleic acid binding” was the most dominant MF term in all species tested, followed by “kinase activity”, “transporter activity”, “protein and carbohydrate binding”, and “enzyme regulator activity”-related terms, pointing to central roles of miRNAs in gene expression.

miRNAs frequently target transcription factors and thereby contribute to transcriptional regulation in an indirect manner. This property of miRNAs is recurrently used as a biomarker in animal studies, specifically in cancer experiments (Nazarov et al. 2013; Wu et al. 2015). Considering this important regulatory function, miRNA families targeting transcription factors were analyzed in detail. In our analysis, the proportion of miRNA target transcripts associated with transcription factors was in the range of approximately 2 to 5 %. Interestingly, T. aestivum displayed the lowest proportion of targets related to transcription factors (2.45 % of all target transcripts), while wild species such as T. monococcum (5.51 % of all target transcripts) showed a slightly higher proportion of transcription factors among the target transcripts. This might be associated with the adaptation of wild plant species to a wide range of environments, which also involve many stress-related conditions; however, further investigation of the related miRNA families and their target transcripts is necessary. Members of several important transcription factor families, such as “WRKY”, “MADS”, or “MYB”, were detected as targets of both conserved and more specific miRNA families, such as miR1130 or miR9494. miRNA families targeting transcription factor families might be important in the regulation of many diverse pathways (Qin et al. 2014); their further characterization is necessary in order to improve our understanding of the relationship between miRNAs and transcriptional regulation.
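The transcription factor proportions quoted above are simple ratios of TF-annotated targets to all predicted targets. As a rough illustration only, the Python sketch below flags annotations by keyword and reports the percentage. The keyword list, the function name `tf_target_share`, and the toy annotations are assumptions made for this example; how TF-related targets were actually flagged in this study is not described here.

```python
# Assumed keyword list; extend as needed for other TF families
TF_KEYWORDS = ("WRKY", "MADS", "MYB", "transcription factor")


def tf_target_share(annotations, keywords=TF_KEYWORDS):
    """Return the percentage of target annotations that mention a TF keyword."""
    if not annotations:
        return 0.0
    hits = sum(
        any(k.lower() in text.lower() for k in keywords)
        for text in annotations
    )
    return 100.0 * hits / len(annotations)


# Toy annotations (hypothetical)
toy_annotations = [
    "WRKY transcription factor 40",
    "serine/threonine protein kinase",
    "MADS-box protein",
    "ABC transporter",
]
share = tf_target_share(toy_annotations)
print(f"{share:.1f} % of targets are TF-related")
```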

Annotations of the putative miRNA targets exhibited marked differences in bread wheat and its progenitors, despite an overall similarity in GO terms that is expected from their close evolutionary relationships. Under the BP category, as shown in Fig. 3, “post-embryonic development”-related terms were evident in bread wheat but did not appear in any of the progenitors, suggesting that this biological process might have gained significance following many years of cultivation. Intriguingly, “photosynthesis” and “biosynthesis” were among the top terms in Ae. speltoides, the closest identifiable relative of the B genome progenitor, while these terms were either not as significant or not present at all in A and D genome progenitors, T. urartu and Ae. tauschii, and also in T. aestivum. In contrast, “response to stress”-related terms did not comprise a significant portion of all BP terms in Ae. speltoides (Fig. 3). In general, the distributions of BP category annotations were similar in A and D genome progenitors, and T. aestivum, while Ae. speltoides had fewer terms that also differed in prominence. This may either be due to the relatively smaller set of putative miRNAs identified from Ae. speltoides generating an artificial bias in target prediction and annotation, or the actual B genome progenitor of bread wheat may have slightly different miRNA expression patterns than Ae. speltoides.

Fig. 3

GO annotations of putative miRNA targets for “biological process” in bread wheat and its progenitors. Annotations unique to one species are highlighted in pale yellow, while annotations common to all species or common to Tu, Ata, and Tae are highlighted in pale green and gray, respectively, for emphasis. Other colors represent diverse “biological process” GO assessments. Tu, T. urartu; Asp, Ae. speltoides; Ata, Ae. tauschii; Tae, T. aestivum

Under the CC category, “plastid”-related terms were the most prominent in all progenitors and T. aestivum, making up as much as 48 % of all annotations (Supplementary File S3). Plastid-related terms were also highly abundant among CC annotations of Ae. sharonensis accessions. In T. urartu, as well as in the two T. monococcum subspecies, several targets with “cell wall” and “peroxisome”-related GO terms were observed, in contrast to Ae. speltoides, Ae. tauschii, and T. aestivum, where none of the targets were annotated as cell wall or peroxisome-related. It is tempting to speculate that this observation may point to ancestral defense mechanisms existing in the A genome lineage that can be exploited in improving wheat varieties against stress factors. Intriguingly, under the MF category, “hydrolase activity” comprised a considerable portion of all target annotations in both T. aestivum and Ae. speltoides, but not in T. urartu, Ae. tauschii, or any of the T. monococcum subspecies, which otherwise had highly similar MF term distributions. In contrast, “nuclease activity” appeared only among T. urartu and Ae. tauschii target annotations, and “lipid binding”-related terms were exclusive to T. urartu (Supplementary File S3).

The Blast2GO annotations of target transcripts of putative Ae. sharonensis miRNAs showed that most of the miRNA targets are located in “plastids”, “cytosol”, and the “plasma membrane” (Supplementary Figure S1). With respect to the “molecular function” (MF) assessment of GO annotations, most of the miRNA targets were related to “nucleotide binding”, “kinase activity”, and “protein binding”, consistent with previous observations in different plant species (Luo et al. 2006; Liu 2012) (Supplementary Figure S3). In our analysis, Ae. sharonensis miRNAs were observed to target an array of biological processes, with a marked abundance of “cellular protein modification process”, “cellular transport organization”, and “transport”-related terms. Additionally, many miRNA target transcripts were involved in responses to biotic and abiotic stimuli. For instance, Ash_miR1130 and Ash_miR5049 target transcripts were associated with “abscisic acid ripening protein”, one of the well-known proteins involved in responses to different abiotic (Kalifa et al. 2004; Yang et al. 2005) and biotic stress conditions (Liu et al. 2010). Additionally, Bücker and colleagues reported that this protein acts as a cis-regulating element for miR167 expression in rice or vice versa. Ae. sharonensis populations exhibit diverse characteristics against stress conditions (Olivera et al. 2007; Bouyioukos et al. 2013); hence, the elucidation of unique mechanisms underlying stress tolerance of certain subspecies of Ae. sharonensis can be aided by further research on such miRNAs. From this point of view, further analysis and characterization of Ae. sharonensis miRNAs targeting the abscisic acid ripening protein may be promising.

Interestingly, a few miRNAs, such as miR1130 and miR398, targeted specific proteins associated with cell death, such as “programmed cell death protein 4”. Among these, miR398 is well characterized in many plants such as Arabidopsis thaliana (Sunkar et al. 2006) and Medicago truncatula (Trindade et al. 2010), and its differential expression was detected under different stress conditions such as oxidative, salt, or drought stress (Zhu et al. 2011a). It is well known that plants undergo apoptosis under different stress conditions, especially under pathogen attack, and regulation of this process is important for cell survival (Greenberg 1997; Kuzuoglu-Ozturk et al. 2012). Distinct miRNA families might have crucial roles in the regulation of apoptotic/autophagic/necrotic cell death mechanisms under stress conditions and might play significant roles in cell survival.

Repetitive sequences found within putative miRNAs

The contribution of transposable elements (TEs) to the miRNA repertoire of organisms has been fairly well described in many studies (Smalheiser and Torvik 2005; Yao et al. 2007; Piriyapongsa and Jordan 2008; Li et al. 2011). Considering this, we analyzed the association of the putative miRNAs identified in this study with known plant TEs by aligning the pre-miRNA sequences to a Poaceae repeat library containing 34,135 sequences. In each plant species, 73 to 85 % of the identified stem-loops aligned to TEs over more than 50 % of their length; these miRNAs were classified as TE-MIRs (Fig. 4a). The high resemblance of plant miRNAs to repetitive sequences of the genome has been previously shown in several studies; for instance, Piriyapongsa and Jordan revealed that 10 out of 12 Arabidopsis thaliana miRNAs were identical to repetitive elements, while the same was observed for 38 out of 83 Oryza sativa miRNAs (Piriyapongsa and Jordan 2008). Recent studies suggested that miRNA genes might have evolved from imperfect inverted repeats through accumulated mutations (Fahlgren et al. 2007; Feldman and Levy 2012). Additionally, another hypothesis proposed that miRNA genes are directly derived from homologous TEs, which may be directly transcribed into miRNAs with the help of related molecular elements (Li et al. 2011). Despite the existing ideas explaining the miRNA-TE relationship, the actual mechanism(s) of miRNA biogenesis from TEs are yet to be discovered. It is also important to note that the identification of TE-MIRs is still controversial, in particular due to the resemblance of these miRNAs to repeat-related small interfering RNAs (siRNAs). In our pipeline, the required presence of pre-miRNA-like stem-loop structures for putative mature miRNAs should prevent mis-annotation of TE-related siRNAs as TE-MIRs.
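The "more than 50 % of length" criterion can be made concrete with a small coverage calculation. The Python sketch below assumes that repeat alignments on a pre-miRNA are given as 1-based inclusive (start, end) coordinates and that overlapping hits are counted only once; both the input format and the overlap handling are assumptions of this example, as are the function names, and the actual alignment tool and parameters are not specified here.

```python
def covered_fraction(length, hits):
    """Fraction of a pre-miRNA covered by repeat hits.

    length: pre-miRNA length in bases
    hits: list of (start, end) repeat alignments, 1-based inclusive
    """
    covered = 0
    last_end = 0  # rightmost base already counted
    for start, end in sorted(hits):
        start = max(start, last_end + 1)  # skip bases counted by earlier hits
        end = min(end, length)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length


def is_te_mir(length, hits, threshold=0.5):
    """Apply the 'more than 50 % of length' rule described in the text."""
    return covered_fraction(length, hits) > threshold


# Toy stem-loops (hypothetical coordinates)
print(is_te_mir(100, [(1, 40), (30, 70)]))   # overlapping hits cover bases 1-70
print(is_te_mir(100, [(10, 30), (60, 80)]))  # two hits cover 42 bases in total
```

Counting overlapping hits once avoids inflating coverage when a stem-loop matches several related repeat families at the same position.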

Fig. 4

Repetitive elements found within pre-miRNA sequences. a The ratio of repetitive elements to total bases, b The distribution of major TE families found within pre-miRNA sequences

In our analysis, putative pre-miRNAs were associated with 667 different repeat sequences, including simple repeats, DNA transposons, and retrotransposons. Among the 259 different putative miRNA families identified in this study, 111 were annotated as TE-MIRs, of which the most abundant were the miR5049, miR1130, miR1127, miR1120, and miR5181 families. These miRNA families were also detected as “highly abundant” and constituted approximately 55 % of all identified pre-miRNA sequences across all species; therefore, it is tempting to speculate that TE proliferation may simultaneously increase TE-MIR copies. Despite the presence of both Type I and Type II repeat elements across the identified TE-MIRs, the abundances of the “Enhancer/Suppressor mutator-like DNA transposons (DNA/En-Spm)”, “miniature inverted repeat transposable elements (DNA/MITE)”, and “Tc1/Mariner DNA transposon (DNA/TcMar)” families were remarkable (Fig. 4b). Most of the identified TE-MIRs were associated with more than one repeat family, while cases where a TE-MIR uniquely mapped to a single repeat family were also observed. As an example, Ash_miR854 (Ash: Ae. sharonensis) was specifically related to (GGA)n simple repeats, while Ash_miR819 aligned with many DNA transposons from the Stowaway, Tc1/Mariner, and En-Spm repeat subfamilies. In addition, our observations were in general agreement with previous studies of orthologous miRNAs, such as osa_miR819, which was also associated with the DNA/Stowaway family (Piriyapongsa and Jordan 2008). Curiously, contrasting observations on TE-MIRs and their respective repeat families also exist; for instance, miR854 was related to the LTR/Gypsy family in rice (Piriyapongsa and Jordan 2008), while this family was associated with (GGA)n simple repeats in Ae. sharonensis in our study.

Several studies have proposed that miRNA families arise from TE-to-MITE transitions during evolutionary processes (Yao et al. 2007; Piriyapongsa and Jordan 2008), where DNA/MITE origins are also linked with the Tc1/Mariner DNA transposon family (Fattash et al. 2013). A recent study by Yu and colleagues reported that a vernalization-related gene from T. aestivum, Vrn-A1a, which contains MITE-family repeat elements in its promoter region, is potentially regulated by miR1123, which also carries similar DNA/MITE repeats in its pre-miRNA sequence (Yu et al. 2014). Using wet-lab tools, they also validated both the presence of Tae_miR1123 and its unique stem-loop, which is transcribed from MITEs. These findings, together with our observations, support the hypothesis of TE-related miRNA evolution in plants (Piriyapongsa and Jordan 2008; Li et al. 2011), especially for crop species with high contents of repetitive elements. Hence, careful handling of repetitive elements during computational miRNA identification is important for unraveling the entire miRNA repertoire of organisms, including TE-MIRs.

Bread wheat is a major constituent of human nutrition. Being a hardy crop grown extensively in temperate regions, wheat is likely to preserve its agronomic value in the future as well. However, challenged by global climate change and the ever-increasing world population, stable and sustainable increases in wheat yields are crucial to ensure the food security of upcoming generations. The modulation of miRNA-regulated pathways has become a promising aspect of crop improvement in recent years. Accordingly, several studies have focused on the identification and characterization of these small regulatory molecules in wheat and other crop species. Despite its high agronomic importance, genomics research on wheat has only begun to accelerate, due to its large, highly repetitive, and challenging genome. While genomics-based tools are slowly accumulating to aid breeding programs, wild wheat populations and progenitors provide a unique resource of favorable alleles and surrogates at the diploid level to study the complex wheat genome. Here, we explored the miRNA repertoires of bread wheat, Triticum aestivum, and its diploid progenitors and relatives, Triticum urartu, Aegilops speltoides, Aegilops tauschii, Triticum monococcum, and Aegilops sharonensis, through homology-based in silico miRNA identification from transcriptome sequences. Wheat progenitors provided a unique perspective on miRNA evolution through putatively expressed miRNAs. Two T. monococcum subspecies, the wild T. monococcum ssp. aegilopoides and the domesticated T. monococcum ssp. monococcum, offered clues about domestication from a miRNA expression point of view. The sixteen Aegilops sharonensis genotypes included in this study revealed several miRNAs that are putatively expressed in only one genotype, which may be related to the high diversity observed in Ae. sharonensis populations. In this study, we also observed many miRNAs that were associated with transposable elements.
Overall, our observations provide important insights into miRNA evolution in bread wheat and its progenitors and relatives that can be exploited to dissect regulatory pathways of interest or can be targeted for wheat improvement.