Introduction

Among the diverse classes of functional small non-coding RNA (small ncRNA), microRNAs (miRNAs) represent important regulators of gene expression that are expressed in plants, fungi, animals, and unicellular eukaryotes as well as viruses [1–6]. The typically 18–26 nucleotide long miRNAs primarily target messenger RNAs (mRNAs), but binding sites are also present in ncRNAs that can act as competing endogenous RNAs (ceRNAs) [7]. MiRBase represents the most comprehensive database that provides access to genomic miRNA annotations and underlying sequence data along with crosslinks to databases of predicted target genes. The current version (v21) comprises ~28,600 miRNA entries from 223 different species, including more than 1800 human loci [8–11]. In general, translation of targeted mRNAs is repressed followed by deadenylation and mRNA decay [12–14]. However, in specific cellular conditions such as cell cycle arrest, miRNA binding was also shown to mediate translational up-regulation of target mRNAs [15].

Canonical miRNA biogenesis in plants and animals is a two-step process that initiates with primary miRNA (pri-miRNA) transcription by RNA polymerase II (Pol-II). All pri-miRNAs have a stem-loop structure that is co-transcriptionally recognized by the Microprocessor complex in animals [16] or by DCL1 (endoribonuclease Dicer-like 1) and its paralogs in plants [17]. The Microprocessor is mainly composed of the RNase III Drosha and the double-stranded RNA-binding protein DGCR8 (DiGeorge syndrome critical region 8 gene; also Pasha). DGCR8 recognizes the RNA substrate, while Drosha functions as an endonuclease that generates ~70 nucleotide long precursor miRNAs (pre-miRNAs) with hairpin structures in a process termed cropping [18, 19]. The pre-miRNA hairpins are subsequently exported to the cytoplasm by Exportin-5 [20], where the second RNase III Dicer mediates cleavage (dicing) of pre-miRNAs into short miRNA duplexes [21, 22]. Only one strand out of the miRNA duplexes (the guide strand) is stably incorporated into the miRNA-induced silencing complex (miRISC) afterwards (Fig. 1a). Subsequent to loading of the duplex into one of four Argonaute subfamily proteins (AGO1-4) that form the core of each miRISC, the passenger strand (miRNA*) is degraded [23]. Canonical miRNA maturation in plants is also completed by loading of an argonaute complex in the cytoplasm, but the miRNA duplexes are released directly from DCL1 and subsequently exported to the cytoplasm by HASTY [24]. In contrast to mammalian cells, where miRNAs are loaded without much discrimination between different Argonaute subfamily proteins, plant AGO proteins show preferences for specific small RNA classes that are produced by distinct biogenesis pathways [23]. Mirtrons represent a widespread class of intron-derived miRNAs in animals that are generated by a non-canonical biogenesis pathway which bypasses the pri-miRNA processing step by Drosha [25, 26].
Instead of cropping, pre-miRNA-like RNAs are directly produced by refolding of debranched intron lariats during mRNA splicing (Fig. 1b). Since putative mirtrons were also identified by deep sequencing of small ncRNAs in A. thaliana and rice, mirtron production is likely to occur also in plants, although to a much lesser extent [27, 28].

Fig. 1

Canonical miRNA biogenesis and generation of intron-derived miRNAs (mirtrons) in animals. a Canonical miRNAs are transcribed independently by Pol-II into pri-miRNAs. Subsequent microprocessing of pri-miRNAs is mediated by the DGCR8/DROSHA complex and produces pre-miRNAs that are exported into the cytoplasm. b Mirtrons are transcribed together with a host gene and produced by refolding of debranched intron lariats during mRNA splicing. After export into the cytoplasm, miRNA/miRNA* duplexes are generated by cleavage of the loop via Dicer. The duplexes are loaded into RISC, where only the mature miRNA strand of the duplex is retained, whereas the passenger strand is degraded. In general, partially complementary base pairing of a miRNA seed region with a target mRNA induces translational repression, while perfect base pairing results in siRNA-induced endonucleolytic cleavage

Tools for prediction of novel miRNAs from small RNA sequencing data rely on the recognition of a specific read pattern subsequent to mapping to a reference genome [29]. The pattern reflects dicing as the last step of miRNA processing (Fig. 2). Two distinct clusters of accumulated reads cover the miR/miR* loci, separated by a short interval without overlapping reads. This region corresponds to the loop, which is removed by Dicer and consequently not sequenced. The identified patterns are used for prediction and evaluation of the secondary structure of potential miRNA precursors. The precursors that comply with user-defined criteria are then returned as predicted novel miRNAs.
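
The read pattern described above can be illustrated with a minimal sketch. It assumes a per-nucleotide coverage vector for a candidate locus and accepts the locus if exactly two covered clusters of miRNA-like length are separated by an uncovered interval. The function names and the gap thresholds are illustrative assumptions and are not taken from any specific prediction tool; only the 18–26 nucleotide length range follows the text.

```python
from typing import List, Tuple


def find_read_clusters(coverage: List[int], min_depth: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals in which coverage >= min_depth."""
    clusters = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos                      # a covered stretch begins
        elif depth < min_depth and start is not None:
            clusters.append((start, pos))    # the covered stretch ends
            start = None
    if start is not None:                    # stretch runs to the end of the locus
        clusters.append((start, len(coverage)))
    return clusters


def looks_like_mir_pair(coverage: List[int], min_depth: int = 1,
                        min_len: int = 18, max_len: int = 26,
                        min_gap: int = 3, max_gap: int = 40) -> bool:
    """True if the locus shows two miRNA-sized clusters separated by an uncovered gap.

    min_gap and max_gap are hypothetical thresholds for the loop interval.
    """
    clusters = find_read_clusters(coverage, min_depth)
    if len(clusters) != 2:
        return False
    (s1, e1), (s2, e2) = clusters
    lengths_ok = all(min_len <= end - start <= max_len for start, end in clusters)
    return lengths_ok and min_gap <= s2 - e1 <= max_gap


# Hypothetical coverage: a 22 nt cluster, a 12 nt uncovered loop, a 21 nt cluster.
example = [0] * 5 + [10] * 22 + [0] * 12 + [4] * 21 + [0] * 5
print(looks_like_mir_pair(example))
```

Actual prediction tools additionally fold the candidate precursor and score its secondary structure; this sketch covers only the mapping pattern.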

Fig. 2

Mapping pattern of miRNAs. Removal of the pre-miRNA loop by Dicer leads to two separated clusters of reads mapped to the reference genome. The figure is based on http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-112876

Partially complementary Watson–Crick base pairing of the miRNA seed region (nucleotides 2–8 from the 5′ end) to a target mRNA typically induces translational repression, while perfect base pairing of the miRNA results in siRNA-induced endonucleolytic cleavage that is mediated by AGO2 in vertebrates [30, 31] or by AGO1 and 10 along with other AGO paralogs in plants [32]. Generally, plant miRNAs show perfect or near-perfect complementarity to their mRNA targets, and consequently, the number of direct targets is lower by at least one order of magnitude compared to animal miRNAs [33]. Canonical plant miRNA target sites are found within the 5′ and 3′ untranslated region (UTR) as well as the coding sequence of mRNAs, suggesting that miRNA-directed regulation in plants is not restricted to a specific RNA context [34]. Against this backdrop, prediction of miRNA targets in plants requires less elaborate algorithms that mostly rely on sequence complementarity of a given miRNA [35]. Although miRNA binding sites in mammalian cells are also found across the entire mRNA, imperfect ‘seed’ pairing in 3′ UTRs is considered to mediate post-transcriptional silencing more efficiently than pairing within the 5′ UTR or coding sequence [33, 36, 37]. Consequently, 3′ UTR heterogeneity caused by SNPs or alternative polyadenylation (APA) strongly impacts miRNA-mediated post-transcriptional regulation and has to be taken into account for prediction of miRNA target sites within these cells. In humans, 3′ UTR associated SNPs have been linked to disease susceptibility [38], but the subsets of polymorphisms that have a functional role in regulating gene expression are yet to be defined. Polymorphisms within miRNA binding sites harbor the potential to disrupt miRNA binding or even to introduce novel binding sites in 3′ UTRs, and the biological relevance of these polymorphisms is currently being examined in large case–control studies [39]. 
APA is a common regulatory mechanism of gene expression that generates mRNAs with distinct 3′ UTRs as well as coding sequences (Fig. 3). More than 70 % of human genes encode primary transcripts that contain multiple polyadenylation sites (PA sites) [40, 41], and a systematic examination of 3′ UTRs produced by APA in murine cells revealed that approximately half of all miRNA target sites are located downstream of the first poly(A) site [42].

Fig. 3

Alternative polyadenylation (APA) results in formation of distinct mRNA isoforms that are derived from one and the same pre-mRNA. Terminal exons that contain 3′ UTRs with multiple PA sites can give rise to mRNA isoforms that differ in UTR length only (shown on top). The length of these tandem UTRs depends on recognition of promoter-proximal PA sites that are linked to shortened 3′ UTR formation or distal PA sites that result in transcription of (nearly) full-length UTRs. Additionally, APA occurs in a splicing-dependent context that comprises the incorporation of alternative terminal exons into the mature mRNA (shown at the bottom). In contrast to formation of splicing-independent tandem UTRs, 3′ exon switching affects both the sequence and the length of a given 3′ UTR. Shortened or mutually exclusive 3′ UTRs contain different miRNA recognition and protein binding sites that affect the stability, localization, and translation efficiency of an mRNA. Evasion of miRNA-mediated post-transcriptional regulation due to the lack of miRNA binding sites, caused either by shortening of tandem UTRs or by incorporation of alternative 3′ terminal exons, is typically coupled to increased translation and protein synthesis. Figure adapted from [52]

In this review we focus on the impact of SNPs and especially APA on miRNA target binding, and furthermore discuss bioinformatical approaches that account for these mechanisms in animals. Quantitative information on APA site usage can be used to extend current prediction algorithms with information on binding site availability to estimate actual interaction efficiencies.
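
As a rough illustration of how quantitative PA site usage could inform binding site availability, the following sketch assumes tandem 3′ UTR isoforms and read counts per PA site from a 3′ end sequencing experiment. A binding site is treated as available only in isoforms whose PA site lies at or downstream of the site. All names, coordinates and counts are hypothetical, and the simple multiplication of a prediction score by the availability is an assumption for illustration, not a published scoring scheme.

```python
from typing import Dict


def site_availability(site_end: int, pa_site_reads: Dict[int, int]) -> float:
    """Fraction of transcripts that retain a binding site ending at site_end.

    pa_site_reads maps each PA site position (relative to the 3' UTR start)
    to the number of reads supporting it.
    """
    total = sum(pa_site_reads.values())
    if total == 0:
        return 0.0
    retained = sum(reads for pa_pos, reads in pa_site_reads.items()
                   if pa_pos >= site_end)
    return retained / total


def weighted_score(raw_score: float, site_end: int,
                   pa_site_reads: Dict[int, int]) -> float:
    """Scale a sequence-based prediction score by the site availability."""
    return raw_score * site_availability(site_end, pa_site_reads)


# Hypothetical gene with a proximal and a distal PA site.
usage = {350: 60, 3000: 40}
print(site_availability(200, usage))   # site upstream of both PA sites
print(site_availability(1500, usage))  # site present in the long isoform only
```

A site upstream of the proximal PA site remains available in every isoform, whereas a site between the two PA sites is available only in the fraction of transcripts that use the distal site.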

Alternative polyadenylation and miRNA-mediated post-transcriptional gene regulation

The process of nuclear polyadenylation along with the underlying regulation mechanisms in eukaryotes is summarized in this section. Furthermore, experimental methods for detection of genome-wide APA events are described, and the impact of APA on miRNA binding is discussed based on the data that was generated by these techniques.

Polyadenylation of nascent mRNAs and regulation of PA site usage

Addition of poly(A) tails to the 3′ end of nascent mRNAs in eukaryotic cells occurs as a two-step, co-transcriptional process and depends on the presence of defined poly(A) signals within the pre-mRNA [43, 44]. In mammalian cells, the canonical poly(A) signal AAUAAA is recognized and subsequently bound by CPSF (cleavage and polyadenylation specific factor). Initiation of 3′ end formation is further stimulated by cooperative binding of CstF (cleavage stimulation factor) to a less defined U/GU-rich downstream element. CFIm, the first of two necessary cleavage factors, additionally assists in poly(A) signal recognition by binding to a third element with the consensus UGUA [45]. Subsequent recruitment of poly(A)-polymerase (PAP) and CFIIm results in endonucleolytic cleavage of the nascent RNA, followed by addition of adenosines to the newly formed 3′ end of the upstream cleavage product via PAP [46]. Together with Symplekin and the C-terminal domain of Pol-II, these proteins constitute the core of the pre-mRNA 3′-end processing machinery [47]. In total, however, ~90 proteins contribute to or directly interact with the pre-mRNA 3′-end processing machinery in human cells [48], and some of the corresponding mRNAs are found to be differentially expressed in a context-dependent manner.

A screening of genome-wide APA events in proliferative and arrested human cell lines revealed downregulation of the mRNAs encoding CPSF and CstF, accompanied by global lengthening of 3′ UTRs, during transition from the proliferative to the arrested state [49]. Increased expression of these genes in proliferating cells was shown to be driven by an enhanced expression of E2F transcription factors, and tightly linked to an increased utilization of proximal PA sites. In contrast, knockdown of CFIm25 resulted in 1450 transcripts with shortened 3′ UTRs in HeLa cells [50]. Decreased expression of CFIm68, another subunit from the CFIm complex, similarly resulted in globally shortened 3′ UTRs [51]. These findings are in line with previous reports of APA events associated with variations in the abundance of core polyadenylation factors (reviewed in [52]), and confirm that altered CFI levels influence PA site usage.

Apart from varying abundances of core members of the 3′-end processing machinery and the presence of cis-acting RNA elements, such as the poly(A) signal, the U/GU-rich downstream element or auxiliary sequences, trans-acting factors like splicing factors and RNA-binding proteins are also involved in the regulation of APA. The U1 small nuclear ribonucleoprotein (U1 snRNP) is usually implicated in mRNA splicing, but was also shown to play a role in APA regulation, where it globally suppresses the usage of proximal PA sites [53]. Moderately decreased U1 snRNP levels led to significantly enlarged fractions of mRNAs with proximal PA sites in human, murine and Drosophila cells [54]. Di Giammartino and colleagues provide descriptions for several other splicing factors that influence APA including Nova2, PTB (polypyrimidine tract binding protein), hnRNP H (heterogeneous nuclear Ribonucleoprotein H), SRm160 and U2AF65. Besides trans-acting factors, increasing evidence suggests that APA is regulated on the epigenetic level by DNA methylation, nucleosome positioning and posttranslational histone modifications [52]. A systematic analysis of human nucleosome occupancy patterns revealed that highly used proximal PA sites exhibit higher upstream nucleosome levels and Pol-II accumulation than lowly used sites, indicating that the nucleosomes positioned upstream of proximal PA sites influence recognition of these sites by decelerated transcription [55].

Experimental methods for genome-wide identification of PA sites

Techniques that allow for global detection of novel APA events include 3′ end sequencing approaches like Poly(A) site sequencing (PAS-Seq [56]), PolyA-seq [57], poly(A)-position profiling by sequencing (3P-Seq [58]), massive analysis of cDNA ends (MACE [59]), as well as the more recently published 3′ region extraction and deep sequencing (3′READS) protocol [60]. In comparison to RNA-Seq, these methods generate exactly one read out of the 3′ end of each mRNA, which allows for accurate 3′ UTR isoform quantification. PolyA-seq, PAS-seq, and MACE produce oligo(dT)-primed reads proximal to (0–400 nucleotides) or even overlapping the poly(A) tail. In contrast, 3P-Seq and 3′READS circumvent conventional oligo(dT) priming during library preparation, because oligo(dT)-based reverse transcription is prone to inadvertent priming of homopolymeric adenosine stretches within the mRNAs. Both 3P-Seq and 3′READS produce reads that are directly adjacent to the poly(A) tail, and thus allow for filtering of those reads with at least one or two non-genomic 3′-terminal adenine bases (termed poly(A) site supporting reads) for further analysis. In fact, 3P-Seq identified more than 8500 3′ UTR isoforms in C. elegans that were missed by standard oligo(dT)-based methods, while 3′READS from different mouse tissues revealed more than 5000 PA sites that were overlooked because of flanking homopolymeric adenosine stretches. Based on 3′READS, the number of murine mRNAs known to exhibit APA increased to almost 80 %.
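
The filtering for poly(A) site supporting reads can be sketched as follows. The example assumes that, for each read, the bases beyond the last aligned position (the 3′ tail) and the genomic sequence immediately downstream of the alignment are already available on the same strand. The function names and the default threshold of two adenines are assumptions for illustration and do not reproduce the 3P-Seq or 3′READS pipelines.

```python
def untemplated_adenines(read_tail: str, genome_downstream: str) -> int:
    """Count adenines in the read tail that are not encoded in the genome.

    read_tail: read bases 3' of the last aligned position.
    genome_downstream: genomic bases directly following the alignment.
    """
    count = 0
    for i, base in enumerate(read_tail):
        # Treat positions beyond the supplied genomic sequence as unknown.
        genomic = genome_downstream[i] if i < len(genome_downstream) else "N"
        if base == "A" and genomic != "A":
            count += 1
    return count


def is_pa_supporting(read_tail: str, genome_downstream: str,
                     min_untemplated: int = 2) -> bool:
    """True if the read carries enough non-genomic 3'-terminal adenines."""
    return untemplated_adenines(read_tail, genome_downstream) >= min_untemplated


# Tail extends into a genomic A stretch for two bases, then diverges.
print(is_pa_supporting("AAAA", "AAGT"))
# Tail is fully explained by a genomic A stretch (possible internal priming).
print(is_pa_supporting("AAA", "AAA"))
```

Reads whose terminal adenines are fully explained by the genome are discarded, because they cannot be distinguished from internal homopolymeric adenosine stretches.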

Although library preparation via MACE relies on oligo(dT)-based reverse transcription, mispriming events during library preparation are minimized by hot priming of the mRNAs [61]. Instead of sequential denaturation and reverse transcription as specified for the PolyA-seq or PAS-seq protocol, hybridization of the oligo(dT) primer and subsequent cDNA synthesis are both performed at elevated temperatures, and without temporary cooling of the samples. The comparison of murine APA events either detected by MACE or by 3P-seq, which is not affected by internal priming, revealed a large overlap between high-confidence PA sites detected by the two methods (83 % for MACE and 85 % for 3P-seq). The proportionally similar percentages of PA sites exclusively detected by one of the two methods indicate an efficient exclusion of false positive PA sites arising from internal priming, and confirm the reliability of MACE alongside 3P-seq and 3′READS for PA site detection.

Impact of APA on miRNA-mediated gene regulation

Transcription of mRNA isoforms with distinct 3′ UTRs modulates the post-transcriptional fate of these mRNAs through inclusion or exclusion of miRNA binding sites (Fig. 3) [62]. Additionally, miRNA-mediated post-transcriptional regulation is affected by the altered accessibility of miRNA binding sites in shorter/longer 3′ UTRs due to secondary structure variations, and by the varying proximity to the translation machinery [36]. In different cell types from zebrafish, approximately ten percent of the predicted miRNA–mRNA interactions are influenced by APA [63]. While the shortest 3′ UTR isoforms are present in the ovaries, a significantly more prevalent use of distal PA sites is found in brain tissues. Globally shortened 3′ UTRs were furthermore identified in early mouse embryonic stem cells [64]. Reprogramming of somatic to induced pluripotent stem cells is associated with a trend towards shortened 3′ UTRs, whereas reprogramming of spermatogonial cells is linked to 3′ UTR lengthening [65]. Together with the globally shortened 3′ UTRs in proliferative human cell lines described by Elkon and colleagues [40], these findings reflect that vertebrate cells with an increased proliferative potential generally harbor shorter 3′ UTRs.

Several studies emphasized the implications of APA in human diseases. Deregulated expression of the gene encoding brain-derived neurotrophic factor (BDNF) is associated with several neurodegenerative diseases, such as Huntington’s disease [66]. Reduced BDNF levels have been observed in corresponding cell models as well as genetic mouse models, and are likely to contribute to the clinical manifestations of the disease [67]. The 3′ UTR of the BDNF gene harbors two PA sites: one distal (~3 kb from UTR start, BDNF-L) and one proximal (~350 bp from UTR start, BDNF-S) site [41]. The BDNF-L isoform contains ten potential miRNA binding sites, predicted by at least four algorithms, whereas BDNF-S only contains six of these. Luciferase reporter assays in human embryonic kidney 293 cells (HEK-293) confirmed the direct interaction of miR-1, miR-10b, miR-155 and miR-191 with BDNF-L, while BDNF-S interacted only with miR-1 and miR-10b [68]. This is consistent with the observation that the short BDNF transcript isoform carries neither a miR-155 nor a miR-191 binding site. Furthermore, after transfection with the miR-1 precursor, luciferase activity was significantly lower for the BDNF-L isoform that carries three predicted binding sites for this miRNA, compared to BDNF-S, which carries only a single miR-1 binding site [68]. The protein level of BDNF thus largely depends on PA site usage, which alters post-transcriptional regulation of the encoding mRNA by miRNAs.

In 2009, Mayr and Bartel hypothesized that APA might represent a mechanism for genes to escape miRNA-mediated post-transcriptional repression in cancer [69]. Based on Northern blot analysis of the 3′ UTR isoforms from six alternatively polyadenylated target genes in 27 cancer cell lines compared to corresponding normal tissues and non-transformed cell lines, they concluded that 3′ UTR shortening is indeed associated with carcinogenesis. Most cancer cell lines expressed significantly shorter APA isoforms, and across all six genes and all 27 cell lines, 25–70 % of the elevated protein-expression levels could be attributed to a loss of post-transcriptional regulation via miRNAs due to 3′ UTR shortening. In ~30 % of the cell lines, this regulatory loss could even explain all changes in protein levels. Using the dataset published by Sandberg and colleagues [42], Mayr and Bartel confirmed this trend to be a global mechanism on a genome-wide scale. Moreover, the cancer cell lines showed significantly shorter 3′ UTRs than immortalized cell lines despite a comparable proliferative potential, indicating that shorter 3′ UTRs are not only associated with increased proliferation but also with malignant transformation [69]. By transfection of NIH3T3 cells with retroviral vectors harboring either the shortest 3′ UTR or the full-length isoform of proto-oncogene IGF2BP1, they showed that expression of the short isoform greatly promoted cell transformation. An in-depth analysis of the 3′ UTR isoforms of IGF2BP1 discloses nine functional PA sites in human HLF cancer cell lines, and reveals a varying number of lacking miRNA binding sites in shortened isoforms (Fig. 4). As shown by Mayr and Bartel, these shortened 3′ UTRs positively affect mRNA stability and result in increased amounts of encoded protein product.

Fig. 4

Visualization of the 3′ UTR from the human IGF2BP1 transcript by APADB. Nine PA sites (I) detected by clustering of end-coordinates of polyA-tail positive NGS reads (III) are present in human HLF liver cancer cell lines. Except for one site (IGF2BP1.3), each PA site is associated with a closely upstream (~20 bp) PA signal (II). Only the longest 3′ UTR isoform harbors all predicted TargetScan binding sites (IV), while the isoform with the shortest 3′ UTR (IGF2BP1.9) does not carry a single predicted miRNA binding site

Experimental approaches for detection of miRNA–mRNA interactions

Since the discovery of the first miRNAs in C. elegans [70–72], several techniques that allow for identification of miRNA target sites have been established. While miRNA–mRNA interactions in plants are readily detected by identification of degraded target intermediates, experimental methods in animals remain laborious and mostly focused on a predefined set of targets [73]. This section summarizes the current experimental techniques for detection/validation of miRNA–mRNA interactions in plants and animals.

Genome-wide experimental identification of miRNA binding sites in plants

Among the techniques for genome-wide analysis of miRNA-mediated mRNA degradation in plants, the most established approaches are termed parallel analysis of RNA ends (PARE) [74], genome-wide mapping of uncapped and cleaved transcripts (GMUCT) [75], and degradome sequencing [76]. In all three protocols, total RNA isolates are enriched for polyadenylated transcripts followed by 5′ adaptor ligation of an RNA oligonucleotide (GMUCT and PARE) or DNA–RNA hybrid (degradome sequencing). Since capped transcripts are not subject to 5′ adaptor ligation, only uncapped RNA fragments are subsequently reverse-transcribed and amplified. Following sequencing of these RNA fragments, miRNA-mediated cleavage sites are identified by alignment of the 5′ ends of the generated reads to respective reference mRNAs.
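
The final alignment step can be illustrated with a small sketch that tallies the 5′ ends of degradome reads along a reference mRNA and checks whether they accumulate at the position expected for a predicted target site. The sketch assumes that cleavage occurs opposite miRNA positions 10 and 11, as commonly described for AGO-mediated slicing; this position is not stated in the text above. Coordinates are 0-based, and the function names and the fraction threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable


def expected_fragment_start(site_end: int) -> int:
    """5' end of the downstream cleavage fragment for a target site.

    site_end is the half-open end of the target site on the mRNA, so the
    nucleotide paired to miRNA position 1 is at index site_end - 1 and the
    nucleotide paired to miRNA position 10 is at index site_end - 10.
    """
    return site_end - 10


def supports_cleavage(read_starts: Iterable[int], site_end: int,
                      min_fraction: float = 0.5) -> bool:
    """True if enough read 5' ends map to the expected cleavage position."""
    profile = Counter(read_starts)
    total = sum(profile.values())
    if total == 0:
        return False
    return profile[expected_fragment_start(site_end)] / total >= min_fraction


# Hypothetical transcript: target site ends at index 121, so the uncapped
# downstream fragment is expected to start at index 111.
starts = [111] * 8 + [40, 95]
print(supports_cleavage(starts, site_end=121))
```

Dedicated degradome pipelines additionally compare the signal at the candidate position with the background of 5′ ends across the transcript; the single fraction threshold used here is a simplification.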

The first application of PARE revealed a widespread presence of miRNA target sites across all annotated genes in A. thaliana and detected most previously validated targets along with nearly all previously predicted but non-validated targets. Stringent analysis of the potential cleavage sites even allowed for identification of putative cleavage sites that were previously unknown, and further validation of some of these targets by 5′ RACE confirmed their presence. At the same time, the first genome-wide screening of miRNA cleavage sites in A. thaliana tissues via degradome sequencing resulted in identification of completely novel targets and confirmed previously validated and predicted cleavage sites. To date, genome-wide miRNA–mRNA interaction profiles have been generated for a wide variety of plants including legumes and other important food crops [77, 78]. Additionally, adapted pipelines have been developed that provide tailored computational algorithms for detection of cleaved miRNA targets from degradome data [79, 80]. Only recently, two further developed protocols were published that provide step-by-step instructions for streamlined library preparation for degradome sequencing on Illumina platforms [81, 82]. Together with the adapted pipelines, these techniques provide reliable approaches to globally profile miRNA-mediated cleavage sites also in non-model plants.

Experimental methods for validation of miRNA–mRNA interactions and identification of novel miRNA binding sites in animals

A variety of experimental methods have been developed to detect miRNA–mRNA interactions in animals. The transfection of cell lines with mimic miRNAs or miRNA inhibition by antagomiRs is widely used to measure the effect on mRNA levels within the cells [83, 84]. Even though this technique provides a global profile of changes in mRNA levels via transcriptome profiling by RNA-Seq or similar techniques, direct interactions must be distinguished from indirect interactions and stress responses of the cells have to be discriminated from true responses to miRNA elevation/repression [85]. Reporter assays, on the other hand, allow for identification of direct interactions between a given miRNA and specific mRNA targets [86]. However, these assays are usually limited to a fixed set of interactions, and therefore promising candidate interactions have to be defined in advance either based on additional experimental data or by in silico prediction of potential miRNA target genes.

Genome-wide methods for detection of miRNA binding sites are based on UV cross-linking in vivo followed by immunoprecipitation of AGO-bound miRNA–mRNA complexes in vitro. HITS-CLIP [87], PAR-CLIP [88], and CLASH [89] represent the most recent approaches in this field, and their adaption to immunoprecipitation of AGO proteins allows for simultaneous identification of miRNAs along with mRNA targets and corresponding binding sites. In all three protocols, interactions between DNA and RNA-binding proteins and their respective targets are initially fixed by UV cross-linking. Following precipitation and purification of AGO(s) as the RNA-binding protein(s) of interest, cross-linked and co-immunoprecipitated RNAs are partially digested and subsequently released by proteinase digestion. The remaining RNA fragments are adaptor-ligated, reverse-transcribed into cDNA, amplified and finally subjected to next-generation sequencing (NGS). While HITS-CLIP and PAR-CLIP provide distinct sets of reads that are either derived from miRNAs or target mRNAs, CLASH involves an additional ligation step subsequent to partial digestion of protein-bound RNA and prior to adaptor ligation. This additional intramolecular ligation results in generation of chimeric reads that comprise a respective miRNA sequence along with the targeted mRNA. Using this approach, interaction sites of the RNA molecules that gave rise to a chimeric read can be identified by in silico folding. The localization of protein RNA-binding sites within sequencing data from CLIP experiments is based on identification of cross-linking induced mutation sites (CIMS). Reads generated by HITS-CLIP are screened for cross-linking induced deletions that originate during reverse transcription of RNAs with amino-acid-RNA adducts [90]. In contrast to substitutions or insertions, these deletions are highly site-specific and permit identification of interaction sites at single-nucleotide resolution. 
Likewise, reads generated by PAR-CLIP are screened for CIMS to identify binding regions. Since PAR-CLIP aims to maximize cross-linking efficiency by random incorporation of photoactivatable nucleoside analogues into nascent RNAs, CIMS are represented by specific substitutions within the sequenced reads. Feeding of cultured cells with 4-thiouridine, for instance, results in thymidine-to-cytidine transitions in the reads derived from cross-linked RNA. Once the binding regions have been defined by target CIMS, miRNA–mRNA interactions can be deduced by comparison of corresponding regions between the miRNA and mRNA reads from HITS-CLIP or PAR-CLIP experiments.
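
The screening for thymidine-to-cytidine transitions can be sketched as a simple tally over aligned reads. The example assumes ungapped alignments given as a start position and a read sequence on the same strand as the reference; the function name and the toy sequences are hypothetical and do not correspond to any published PAR-CLIP pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple


def t_to_c_counts(reference: str,
                  alignments: Iterable[Tuple[int, str]]) -> Counter:
    """Count T-to-C mismatches per reference position.

    alignments: (start, read_sequence) pairs for ungapped alignments.
    """
    counts = Counter()
    for start, read in alignments:
        # zip stops at the shorter sequence, so reads overhanging the
        # reference end are compared only over the overlapping part.
        for offset, (ref_base, read_base) in enumerate(zip(reference[start:], read)):
            if ref_base == "T" and read_base == "C":
                counts[start + offset] += 1
    return counts


reference = "ACGTTGCATT"
reads = [(0, "ACGTCGCA"), (2, "GTCGCATT"), (3, "CTGC")]
print(t_to_c_counts(reference, reads))
```

Positions at which transitions recur across independent reads are candidates for cross-linking sites; in practice the counts would be compared with the local coverage and the expected sequencing error rate.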

Using PAR-CLIP, ~3500 canonical miRNA target interactions could be identified in MCF-7 cells and the combined analysis of these with gene expression profiles from individual breast cancer patients revealed a significant correlation between expression levels of miR-182 targeted mRNAs and overall patient survival [91]. MiR-218 is significantly downregulated in human medulloblastoma and a recent screening for corresponding target sites via HITS-CLIP identified more than 600 target genes including both previously known and completely novel targets [92]. Since miRNA–mRNA interactions are directly preserved within chimeric reads, CLASH provides a quantitative basis for analysis of such interactions. The chimeric reads from human cell cultures confirmed that mRNAs represent the principal binding partner of miRNAs (~70 % of the identified interactions). The remaining binding partners comprised pseudogenes, long intergenic ncRNAs, ribosomal RNAs, transfer RNAs, small nuclear RNAs, as well as ceRNAs, such as the circular RNA sponge for miR-7 (ciRS-7). In addition to the quantitative aspects of CLASH, miRNA–mRNA base pairing patterns can be deduced from chimeric reads without prior knowledge of targets or binding modes.

In silico analysis of miRNA-mRNA interactions in animals

As previously outlined, experimental methods for identification of miRNA–mRNA interactions in animals often require highly sophisticated bioinformatical analyses and/or precedent knowledge of promising candidate interactions. Consequently, computational methods have to be constantly improved and modulated depending on the available experimental data. Current computational approaches for in silico prediction of miRNA targets are discussed in this section with special emphasis on methods that combine data from mRNA and miRNA quantification experiments.

In silico prediction methods and interaction databases

To date, several algorithms for in silico prediction of miRNAs and their target genes have been developed. Prominent tools comprise TargetScan [93], miRanda [94, 95], PITA [96] and PicTar [97], which rely on sequence complementarity of the 3′ UTRs from potential mRNA targets with the seed region of a given miRNA. Additionally, these algorithms account for the secondary structure of the miRNA and/or respective target sites (e.g. free energy of the miRNA–mRNA union: ΔG or costs to unfold the secondary structure of the target site ΔΔG), the number of potential binding sites within one transcript, and if applicable, conservation of the miRNA and/or the target site across mammals (assuming that conservation increases the likelihood of a functional site) [36, 98, 99].
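
The core of the sequence-based step, seed complementarity, can be illustrated with a minimal sketch that searches a 3′ UTR for perfect Watson–Crick matches to miRNA nucleotides 2–8. It deliberately ignores free energy, site accessibility and conservation, which the tools named above also evaluate, and it is not the algorithm of any of them. The function names and the example UTR are hypothetical; the miRNA sequence is used as an example of a let-7 family member.

```python
from typing import List

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the mRNA sequence (5'->3') that pairs perfectly with the seed.

    start and end are 1-based, inclusive miRNA positions.
    """
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]   # reverse complement


def find_seed_matches(utr: str, mirna: str) -> List[int]:
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)          # allow overlapping matches
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAA" + "CUACCUC" + "AGGG" + "CUACCUC" + "AA"   # hypothetical 3' UTR
print(seed_site(mirna))
print(find_seed_matches(utr, mirna))
```

The number of hits per transcript corresponds to the "number of potential binding sites" criterion mentioned above.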

To avoid computationally intensive re-calculation of target predictions, the results of several of the tools are also available from database resources (Table 1). These databases include all predicted interactions between known 3′ UTRs of selected organisms and known miRNAs from miRBase [9]. Nevertheless, most of them are not updated regularly and none of them takes into account that 3′ UTR length of one and the same transcript is highly variable between different biological conditions due to APA [100]. This heterogeneity in 3′ UTR length represents one of the major reasons for the relatively high false positive rate of available predictions in databases, which is estimated to reach up to 70 % [101], and consequently complicates reliable predictions for interactions in a given tissue.

Table 1 Overview of commonly used miRNA target databases

The overlaps between prediction results of different tools/databases are very low and range from 5 to 70 %. For that reason, databases such as miRwalk [102] or miRo [103] only list interactions that are commonly identified by several tools to increase the likelihood of predicting true interactions. However, even those databases that apply a combination of algorithms and tools for prediction are not capable of taking into account specific biological characteristics of a given sample. More recently published database tools try to account for these by including co-expression information into the prediction algorithm. MirCoX [104] and miRConnect [105] take advantage of the fast growing number of miRNA and mRNA sequencing data in publicly available repositories, and use the data to calculate significant negative correlations between mRNA and miRNA expression values under various biological conditions such as disease or stress states.

Databases that summarize experimentally validated interactions comprise miRecords [106], which provides 2705 records of interactions between 644 miRNAs and 1901 target genes in nine animal species, as well as TarBase [107], the largest manually curated database, indexing more than 65,000 validated interactions in 21 species. The validated interactions summarize the results from reporter assays and high-throughput experiments. However, with ~200,000 interactions estimated to occur in humans alone [108], these databases remain a relatively limited resource for miRNA target identification.

Combining quantitative expression data with sequence-based predictions

The attempt to use quantitative mRNA expression profiles from appropriately designed experiments to improve in silico miRNA target predictions has led to the development of several computational models that refine sequence-based predictions with experimental data [109–112]. These models combine mRNA and miRNA expression profiles from a particular experiment with information about predicted miRNA–mRNA interactions. The underlying frameworks search for significant negative correlations between the expression of a miRNA and its potential target gene in the given experimental setup, which increases the likelihood of a true interaction (Fig. 5). Several web services such as omiRas [113], CPSS [114] or ncPRo-seq [115] make it possible to quantify and compare miRNA expression from raw sequencing libraries of two biological conditions, such as healthy and diseased individuals. Furthermore, these quantitative data can be used to infer miRNA targets among genes that are differentially expressed between the same conditions, incorporating target predictions from various target databases. However, these models focus solely on interactions with perfect seed matches that result in post-transcriptional degradation of target mRNAs. Translational inhibition due to imperfect seed matches does not alter mRNA abundance and is therefore undetectable by correlation analysis. High-throughput measurement of protein levels could help overcome this limitation, but reliable and reproducible measurements are currently only achieved for a subset of proteins [116].

Fig. 5

Integration of experimental expression data into target prediction. In order to predict miRNA–mRNA interactions, a matrix for each miRNA–mRNA pair is generated (top). Potential miRNA–mRNA interactions exhibit a significant negative correlation (circles). Furthermore, interactions gain support by sequence complementarity (S), as visualized at the bottom left, and by the observed expression of miRNAs and mRNAs in each biological replicate (dots in the scatterplot at the bottom)
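The correlation-based filtering described in this section can be sketched in a few lines. The example assumes normalized expression values per replicate, uses the Pearson coefficient with a fixed cutoff instead of a formal significance test, and works on invented data; the names `supported_interactions` and `max_r` are hypothetical and do not correspond to any of the cited models.

```python
from math import sqrt
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("profiles must have equal length and >= 3 samples")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def supported_interactions(
    mirna_expr: Dict[str, Sequence[float]],
    mrna_expr: Dict[str, Sequence[float]],
    predicted_pairs: Iterable[Tuple[str, str]],
    max_r: float = -0.8,  # assumed cutoff; real models test significance
) -> List[Tuple[str, str, float]]:
    """Keep sequence-predicted pairs whose expression is anti-correlated."""
    kept = []
    for mirna, gene in predicted_pairs:
        r = pearson(mirna_expr[mirna], mrna_expr[gene])
        if r <= max_r:
            kept.append((mirna, gene, r))
    return sorted(kept, key=lambda item: item[2])


# Invented expression values for six replicates.
mirna_expr = {"miR-A": [1, 2, 3, 4, 5, 6]}
mrna_expr = {"GENE1": [12, 10, 8, 6, 4, 2], "GENE2": [3, 5, 4, 6, 5, 7]}
pairs = [("miR-A", "GENE1"), ("miR-A", "GENE2")]
print(supported_interactions(mirna_expr, mrna_expr, pairs))
```

Only the sequence-predicted pair with an inverse expression pattern is retained. As noted above, such a filter cannot detect targets that are regulated purely by translational inhibition.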

Including APA information into interaction predictions

Even though these approaches allow the biological conditions under investigation to be considered, 3′ UTR heterogeneity remains disregarded. 3′ UTR shortening by APA can remove miRNA binding sites from a predicted target and consequently give rise to false positive predictions. Against this backdrop, incorporating information from 3′ end sequencing data into target prediction algorithms would significantly improve the prediction of interaction efficiency [117], and thus provide biologically more relevant results. To date, the TargetScan web service (http://targetscan.org/) is the only miRNA–mRNA interaction database that includes such data in its prediction results. The predicted interactions for zebrafish miRNAs are available for each 3′ UTR isoform detected by 3P-Seq [58]. However, the abundance of specific isoforms is not taken into account when ranking interaction efficiencies. The upcoming version of TargetScan is going to include the fractions of mRNAs with a specific miRNA target site in the predictions for human, mouse and zebrafish, but these fractions will still not be tissue-specific.

Nam and colleagues were the first to introduce a revised prediction model that weights the prediction score for a miRNA and its target site by the percentage of PA isoforms in each cell type that carry this interaction site [117]. On average, this model outperforms previous models by 50 %, thus indicating a new strategy in miRNA–mRNA interaction algorithm development. The growing number of global and tissue-specific APA profiles in databases such as APADB (http://tools.genxpro.net/apadb/) provides the necessary resource for a significant improvement of miRNA–mRNA interaction predictions in diseased and stressed tissues.
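The idea of weighting a prediction score by isoform usage can be made concrete with a small sketch. It is a simplified illustration and not the published model [117]: each PA isoform is assumed to be described by its 3′ UTR length and relative abundance, a site counts as retained when it lies entirely within that length, and all names and numbers are hypothetical.

```python
from typing import List, Tuple

Isoform = Tuple[int, float]  # (3' UTR length in nt, relative abundance)


def site_retention_fraction(site_end: int, isoforms: List[Isoform]) -> float:
    """Fraction of transcripts whose 3' UTR still contains the site.

    `site_end` is the 1-based position of the last site nucleotide,
    counted from the start of the 3' UTR.
    """
    total = sum(abundance for _, abundance in isoforms)
    if total <= 0:
        raise ValueError("isoform abundances must sum to a positive value")
    retained = sum(ab for length, ab in isoforms if length >= site_end)
    return retained / total


def weighted_score(raw_score: float, site_end: int, isoforms: List[Isoform]) -> float:
    """Scale a sequence-based score by the site-carrying isoform fraction."""
    return raw_score * site_retention_fraction(site_end, isoforms)


# Hypothetical tissue in which 60 % of transcripts use a proximal PA site.
isoforms = [(300, 0.6), (1200, 0.4)]
print(weighted_score(-0.5, site_end=800, isoforms=isoforms))
print(weighted_score(-0.5, site_end=200, isoforms=isoforms))
```

A site located downstream of the proximal PA site is down-weighted in this tissue, whereas a site upstream of it keeps its full score.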

Predicting the effect of SNPs on miRNA–mRNA interactions

Single-nucleotide polymorphisms can influence miRNA binding by two distinct mechanisms. On the one hand, SNPs can alter the seed region of a miRNA or the binding site on the mRNA, which is linked to increased or decreased binding efficiency. On the other hand, SNPs can disrupt existing PA signals or create novel ones, leading to transcript isoforms with altered miRNA binding site availability. A screening of human SNPs indicates that a substantial fraction of SNPs have the potential to create or disrupt APA signals, and this type of SNP has been defined as APA-SNP [118]. Many APA-SNPs are associated with shortened transcripts and increased gene expression due to reduced miRNA binding site availability, especially in disease. Moreover, these SNPs do not only affect PA site usage by altering canonical PA signals. An APA-SNP located in the U/GU-rich downstream element of the second PA signal of the human ATP1B1 gene was shown to be associated with higher blood pressure in a European–American population [119]. This APA-SNP is linked to an increased usage of the second PA site due to increased CstF binding efficiency. The resulting transcript isoform exhibits increased translation efficiency and consequently yields a larger amount of protein product.

Recently, several databases that account for the impact of SNPs within 3′ UTRs on miRNA binding have been released [120–123]. Most of these rely on simple modifications of known algorithms such as TargetScan or miRanda to detect the effects of a given SNP on miRNA binding stability [124]. However, these databases only cover known SNPs from public resources such as dbSNP and are therefore restricted to a predefined subset of SNPs [125]. To determine the influence of novel variations, mrSNP [126] calculates the impact of user-supplied SNPs based on different criteria. Most importantly, these criteria comprise the distance in the alignment to the seed region of a miRNA, and the change in the energy of the alignment compared to an alignment without the mutation. Currently, only limited data are available for evaluation, but mrSNP correctly predicted disrupted binding for 69 % (11/16) of the SNPs with an experimentally verified effect on miRNA binding.
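How a single substitution can disrupt or create a seed match may be illustrated with a toy comparison of the reference and variant 3′ UTR. The sketch assumes a seed of miRNA nucleotides 2–7 and exact matching only; it does not implement the alignment-distance and energy criteria of mrSNP, and the function names and sequences are illustrative.

```python
from typing import Dict, List, Set


def seed_site(mirna: str) -> str:
    """mRNA sequence that pairs with miRNA nucleotides 2-7 (assumed seed)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(mirna.upper()[1:7]))


def site_positions(utr: str, site: str) -> Set[int]:
    """All 0-based start positions of exact seed matches in the UTR."""
    return {i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)}


def snp_effect(mirna: str, utr: str, snp_pos: int, alt: str) -> Dict[str, List[int]]:
    """Compare seed matches before and after a single-base substitution.

    `snp_pos` is the 0-based position in the UTR; `alt` is the variant base.
    """
    utr = utr.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    if not 0 <= snp_pos < len(utr) or len(alt) != 1:
        raise ValueError("SNP must be a single base inside the UTR")
    variant = utr[:snp_pos] + alt + utr[snp_pos + 1:]
    site = seed_site(mirna)
    before = site_positions(utr, site)
    after = site_positions(variant, site)
    return {"disrupted": sorted(before - after), "created": sorted(after - before)}


# Illustrative miRNA and invented UTR fragment with one seed match.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
print(snp_effect(example_mirna, "AAACUACCUCAGG", snp_pos=6, alt="G"))
print(snp_effect(example_mirna, "AAACUAGCUCAGG", snp_pos=6, alt="C"))
```

The first call removes the existing match and the second creates one, mirroring the decreased and increased binding efficiencies discussed above.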

Concluding remarks and perspectives

The functional fraction of the human genome remains a widely discussed topic and naturally depends on the definition of functionality. While the ENCODE sequencing project revealed that more than 80 % of the human genome codes for RNA, a recent analysis that takes evolutionary conservation into account estimates 7.1–9 % of the human genome to be functional [127]. However, only a small fraction of these functional sites (1 % of the genome) is protein-coding. Comparative analyses between mouse and human tissues suggest that most, but not all, non-coding RNAs have short-lived lineage-specific functionality, which is especially true for long non-coding RNAs. In contrast, miRNA precursors and, in particular, seed regions of mature miRNAs are highly conserved between mammals, which underlines their role as important post-transcriptional regulators.

The advent of NGS methods that allow for global investigation of the miRNAome across several tissues and disease states along with corresponding gene expression profiles provides the basis for a deeper understanding of the post-transcriptional regulation of biological pathways. Experimental methods such as CLASH permit a reliable detection and even quantification of direct miRNA–mRNA target interactions on a global scale. This trend is going to further extend the overwhelmingly large amount of sequencing data in the respective databases, and consequently more efficient algorithms and data-mining tools are needed. The resulting gain of knowledge will likely improve therapeutic options for the treatment of serious diseases such as cancer. In 2013, the first clinical study based on a miRNA mimic was initiated, and more miRNAs, currently published as biomarkers with unknown function, are waiting in line for potential clinical use. To accelerate this process, the underlying molecular mechanisms of these miRNAs, and especially their condition- and tissue-specific target genes, need further investigation. APA is one of the major reasons for 3′ UTR heterogeneity, and consequently, identification of condition- and tissue-specific PA isoforms represents the next necessary step to fill this gap in knowledge. With the growing amount of 3′ end sequencing data available in public databases, regulatory models can be further improved not only in the context of human diseases, but also for other animals and cellular processes.