
3.1 Introduction

In 2002, Eric Lai [1] compared the sequences of 11 microRNAs with the K box and Brd box motifs, which were known to mediate post-transcriptional regulation in Drosophila. He demonstrated that the first eight nucleotides of microRNAs (miRNAs), now called the seed region, were perfectly complementary to these motifs, and concluded that this complementarity may be essential for post-transcriptional regulation by microRNAs. This simple bioinformatics analysis established one of the strongest predictive features used in target prediction to date. Since then, the microRNA repertoire has grown exponentially and numerous experimental methods have been developed to confirm microRNA targets. None of these advances has produced a feature of microRNA targeting more telling than the seed region; instead, they have led to the conclusion that microRNA regulation is highly intricate and diverse. For this reason, the computational and experimental methods that have been developed generally focus on specific aspects of microRNA regulation and are used to investigate either the physical interaction between microRNAs and their putative targets or the functional outcome of microRNA targeting. Here we describe these computational and experimental methods and explain which aspects of microRNA regulation each of them addresses.

3.2 Computational Methods to Identify microRNA Targets

Despite a plethora of different algorithms and methods to predict microRNA targets, most rely on similar sequence-based approaches as their starting point. These algorithms initially search for some degree of sequence complementarity between the miRNA of interest and the 3′ untranslated region (3′UTR) of mRNAs, with emphasis on the miRNA seed region (nt 2–8). Because the miRNA:mRNA duplex can contain mismatches, gaps and G:U pairs, the number of possible targets based solely on this alignment is too large to be informative. Additional steps are therefore required to refine target predictions and rank them according to statistical confidence. Here we describe the most commonly used methods for detecting miRNA targets, classified according to the criteria used to refine the initial sequence analysis (Fig. 3.1). For each approach we provide examples of commonly used algorithms and discuss their limitations.
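
The initial sequence search can be sketched as a scan of a 3′UTR for Watson–Crick matches to the seed. This is a minimal illustration, not the implementation of any program named in this chapter: it assumes ungapped matches, ignores G:U pairs, and uses the widely used 8mer / 7mer-m8 / 7mer-A1 / 6mer site nomenclature, with let-7a as an example input.

```python
"""Minimal seed-match scanner (illustrative sketch, ungapped Watson-Crick matches only)."""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start in utr, site type) for each canonical seed match.

    Every site pairs with miRNA nt 2-7 (the 6mer core). A 7mer-m8 also pairs
    with nt 8, a 7mer-A1 has an A opposite nt 1, and an 8mer has both.
    """
    mirna, utr = to_rna(mirna), to_rna(utr)
    core = reverse_complement(mirna[1:7])  # target bases pairing with nt 2-7
    m8 = COMPLEMENT[mirna[7]]              # target base pairing with nt 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8  # base 5' of the core in the mRNA
        has_a1 = end < len(utr) and utr[end] == "A"   # base 3' of the core in the mRNA
        if has_m8 and has_a1:
            sites.append((start - 1, "8mer"))
        elif has_m8:
            sites.append((start - 1, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        else:
            sites.append((start, "6mer"))
        start = utr.find(core, start + 1)
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    print(find_seed_sites(let7a, "GGCUACCUCAGG"))  # [(2, '8mer')]
```

Such a scan returns many candidates per miRNA, which is why the refinement criteria described in the following sections are needed.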

Fig. 3.1

Computational methods to identify miRNA targets. After the initial search for sequence complementarity between the seed region of the miRNA (nt 2–8) and the putative mRNA target, most algorithms use additional criteria to refine predictions. The functional category of targets can be used to search for targets that belong to the same biological pathway or process. Combining microRNA and mRNA expression data and searching for negative correlations between them can efficiently predict miRNA targets regulated through mRNA destabilization. Evaluating the thermodynamic stability of the microRNA:mRNA duplex favours targets with a stronger physical interaction with the miRNA. Investigating sequence conservation of the target site between multiple species, or the presence of multiple target sites in the same 3′UTR, can be used to rank putative targets according to their statistical likelihood.

3.2.1 Thermodynamic Stability of the microRNA:mRNA Duplex

miRanda [2], the first freely available prediction program, measures the thermodynamic stability of the duplex formed between a miRNA and its putative target to increase prediction accuracy. Different scores are assigned to C:G, A:U, and G:U pairs, and more stable pairing is required at the 5′ end of the miRNA. A user-defined threshold can then be set to eliminate unstable duplexes. Since miRanda became available, more complete models of RNA duplex stability have been published and successfully used to predict miRNA targets. The standalone algorithm RNAhybrid [3], for example, calculates the most stable hybridization site between two sequences and can easily be incorporated into existing prediction algorithms. The PITA algorithm [4] also uses the thermodynamic stability of the miRNA:mRNA duplex but compares it with the stability of local structures within the 3′UTR of the target mRNA: if the duplex is predicted to form within a region of the 3′UTR that is already involved in a stable structure, the miRNA is less likely to bind its target. This approach is limited by the accuracy of secondary structure prediction, which becomes unreliable for long-distance interactions and therefore for larger RNA structures.
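
The two ideas in this paragraph can be illustrated together: a miRanda-style, position-weighted complementarity score, and PITA's comparison of duplex energy with the energy needed to open the target site. This is a sketch under explicit assumptions: the pair scores (+5 Watson–Crick, +2 G:U, −3 mismatch) and the ×2 weighting of the first 11 miRNA positions are illustrative values rather than any program's published parameters, gaps are not modelled, and the free energies passed to `pita_ddg` are assumed to come from an external folding tool. The ΔΔG follows the definition ΔΔG = ΔG_duplex − ΔG_open, with ΔG_open (≤ 0) taken as the free energy of the unconstrained target region minus that of the region with the site forced to be unpaired.

```python
"""Illustrative duplex scoring: miRanda-like weights and a PITA-style ddG (sketch)."""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def duplex_score(mirna: str, target_3to5: str, wc: float = 5.0, gu: float = 2.0,
                 mismatch: float = -3.0, seed_len: int = 11,
                 seed_weight: float = 2.0) -> float:
    """Score an ungapped duplex position by position.

    mirna is written 5'->3' and target_3to5 is the target site written 3'->5',
    so mirna[i] and target_3to5[i] form the i-th base pair. Scores at the first
    seed_len miRNA positions (the 5' end) are multiplied by seed_weight.
    All weights are illustrative assumptions.
    """
    if len(mirna) != len(target_3to5):
        raise ValueError("sequences must be aligned to the same length")
    total = 0.0
    for i, pair in enumerate(zip(mirna.upper(), target_3to5.upper())):
        if pair in WATSON_CRICK:
            score = wc
        elif pair in WOBBLE:
            score = gu
        else:
            score = mismatch
        weight = seed_weight if i < seed_len else 1.0
        total += score * weight
    return total


def pita_ddg(dg_duplex: float, dg_open: float) -> float:
    """PITA-style site score in kcal/mol: ddG = dG_duplex - dG_open.

    dg_duplex (< 0) is the hybridization energy; dg_open (<= 0) is the energy of
    the unconstrained target region minus that of the region with the site
    forced open. A more negative ddG indicates a more favourable site.
    """
    return dg_duplex - dg_open
```

With this sign convention, a stable local structure around the site (a strongly negative `dg_open`) makes ΔΔG less negative, which is how a site buried in secondary structure is penalized.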

3.2.2 Sequence Conservation of the Target Site Between Multiple Species

Evaluating the sequence conservation of predicted target sites between distantly related species efficiently reduces the number of false-positive predictions. Most algorithms require that the predicted miRNA target site be located in homologous regions of the 3′UTR and that the seed-binding region lie in a highly conserved region. TargetScan [5] initially searches for conserved seed-pairing regions in 3′UTR alignments of 28 vertebrate species. This set of putative targets is then refined using a context score based on the position of the site in the 3′UTR and the surrounding sequence composition, and further refined by considering supplementary pairing at the 3′ end of the miRNA [6]. Conservation-based filtering is of little use for detecting species-specific binding sites or the binding sites of species-specific miRNAs. TargetScan also reports non-conserved targets on its website.
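
A simplified conservation filter can be written as a check that the same site sequence occupies the same alignment columns in orthologous 3′UTRs. This sketch assumes a precomputed multiple alignment with '-' for gaps, and counts a site as present in a species only if that species spells the identical site (gaps removed) over the reference site's column span; TargetScan's own conservation scoring is more elaborate and is not reproduced here. Species names and sequences are toy inputs.

```python
"""Column-wise site conservation in a 3'UTR multiple alignment (simplified sketch)."""


def site_conservation(alignment: dict[str, str], reference: str,
                      site: str) -> list[tuple[int, list[str]]]:
    """For each occurrence of site in the reference UTR, list the species sharing it.

    alignment maps species -> aligned UTR ('-' = gap). Returned positions are
    0-based coordinates in the ungapped reference sequence.
    """
    ref = alignment[reference]
    columns = [i for i, base in enumerate(ref) if base != "-"]
    ref_ungapped = "".join(ref[i] for i in columns)
    results = []
    start = ref_ungapped.find(site)
    while start != -1:
        first = columns[start]                  # alignment column of the site's first base
        last = columns[start + len(site) - 1]   # alignment column of the site's last base
        sharing = sorted(species for species, seq in alignment.items()
                         if seq[first:last + 1].replace("-", "") == site)
        results.append((start, sharing))
        start = ref_ungapped.find(site, start + 1)
    return results


def conserved_sites(alignment: dict[str, str], reference: str, site: str,
                    min_species: int) -> list[int]:
    """Reference positions of sites shared by at least min_species species."""
    return [pos for pos, species in site_conservation(alignment, reference, site)
            if len(species) >= min_species]


if __name__ == "__main__":
    aln = {"human": "AACUACCUCAA", "mouse": "AACUACCUCAG", "chicken": "AACUAC-UCAA"}
    print(site_conservation(aln, "human", "CUACCUC"))  # [(2, ['human', 'mouse'])]
```

Raising `min_species` trades sensitivity for specificity, and, as noted above, a strict requirement discards species-specific sites by construction.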

3.2.3 Multiple Targets in the Same 3′UTR

Recent analysis has demonstrated that numerous mRNAs are targeted by the same miRNA at different sites within their 3′UTR, and that this multi-targeting occurs at a significantly higher rate than expected by chance. Focusing on mRNAs that have more than one predicted site for the same miRNA in the 3′UTR can therefore increase the signal-to-noise ratio of different algorithms [7, 8]. Although this approach eliminates numerous true target sites, it has the advantage of producing a list of high-confidence target genes. It requires the user to first select one or more target prediction programs and subsequently filter their results for multi-targeting; this last step can be performed on the mimiRNA website [8] (http://mimirna.centenary.org.au). The PicTar algorithm [9] uses a combinatorial approach that not only accounts for multiple binding sites of the same miRNA but also computes the likelihood that a sequence is bound by a combination of the input miRNAs. Filtering predictions based on multi-targeting drastically reduces the number of predicted targets; because the remaining candidates are more likely to be true targets, this filter is useful for studies in which targets must be validated experimentally.
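
The multi-targeting filter, and the intuition for why several sites in one UTR are informative, can be sketched as follows. All of the following are hypothetical assumptions: sites are supplied as 0-based positions already produced by some prediction program, matches closer than `min_spacing` nucleotides are collapsed so overlapping hits are not double-counted, and the chance expectation uses a uniform random-sequence model (each base with probability 1/4) with a Poisson approximation, which real 3′UTRs do not strictly follow.

```python
"""Multi-site filtering with a naive chance expectation (illustrative sketch)."""
import math


def multi_site_targets(site_positions: dict[str, list[int]], min_sites: int = 2,
                       min_spacing: int = 8) -> dict[str, list[int]]:
    """Keep genes with at least min_sites sites spaced >= min_spacing nt apart."""
    kept = {}
    for gene, positions in site_positions.items():
        accepted: list[int] = []
        for pos in sorted(positions):
            if not accepted or pos - accepted[-1] >= min_spacing:
                accepted.append(pos)
        if len(accepted) >= min_sites:
            kept[gene] = accepted
    return kept


def expected_chance_sites(utr_length: int, site_length: int = 7) -> float:
    """Expected matches to one fixed site in a uniform random sequence."""
    n_windows = max(utr_length - site_length + 1, 0)
    return n_windows / 4 ** site_length


def prob_at_least(k: int, lam: float) -> float:
    """Poisson approximation of P(X >= k) for an expected count lam."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


if __name__ == "__main__":
    sites = {"geneA": [10, 200], "geneB": [50], "geneC": [5, 9]}
    print(multi_site_targets(sites))  # {'geneA': [10, 200]}
    lam = expected_chance_sites(1000)
    print(lam, prob_at_least(2, lam))
```

For a 1,000-nt UTR the model expects about 0.06 chance matches to a given 7-nt site, so two or more chance matches are rare (probability below 0.002); real UTR composition departs from this model, so the numbers only illustrate the reasoning.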

3.2.4 Functional Category of Targets

Because a miRNA often affects several genes in the same biochemical pathway or biological process [10], considering the function of target genes may eliminate biologically irrelevant predictions. mirBridge [11] starts with a set of genes of known function and tests whether putative targets, identified by sequence analysis, are enriched within this gene set. This approach is useful for experiments in which a specific function or pathway is being dissected, but may prove limiting in studies of a specific miRNA or mRNA with no prior knowledge of its function.
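
Testing whether predicted targets are over-represented in a functional gene set is commonly done with a hypergeometric (one-sided Fisher) test. The sketch below shows that generic test as an illustration; it is not mirBridge's own statistic, and the gene counts in the example are hypothetical.

```python
"""Hypergeometric enrichment of predicted targets in a gene set (generic sketch)."""
from math import comb


def enrichment_p_value(n_genes: int, n_set: int, n_predicted: int,
                       n_overlap: int) -> float:
    """One-sided P(overlap >= n_overlap) under sampling without replacement.

    n_genes: genes considered; n_set: genes in the functional set;
    n_predicted: predicted targets; n_overlap: predicted targets in the set.
    """
    upper = min(n_set, n_predicted)
    tail = sum(comb(n_predicted, k) * comb(n_genes - n_predicted, n_set - k)
               for k in range(n_overlap, upper + 1))
    return tail / comb(n_genes, n_set)


def fold_enrichment(n_genes: int, n_set: int, n_predicted: int,
                    n_overlap: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    return n_overlap / (n_set * n_predicted / n_genes)


if __name__ == "__main__":
    # Hypothetical: 20,000 genes, a 100-gene pathway, 500 predicted targets, 10 in the pathway.
    print(fold_enrichment(20000, 100, 500, 10), enrichment_p_value(20000, 100, 500, 10))
```

In the example, 10 predicted targets fall in a set where 2.5 are expected by chance, a fourfold enrichment; the p-value states how surprising that overlap would be under random sampling.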

3.2.5 Combining microRNA and mRNA Expression Data

Numerous miRNAs inhibit gene expression by destabilizing mRNAs [12]. As a consequence, mRNA targets should be expressed at lower levels in tissues where the miRNA is expressed. Correlating mRNA and miRNA expression across multiple tissues and selecting negatively correlated pairs can successfully detect target genes [13]. Because this method is independent of any sequence analysis, it can be used to filter predictions made by any of the aforementioned algorithms. Another advantage of this approach is that it is not restricted to targets located in the 3′UTR. Although fewer miRNA targets have been reported in other regions of mature mRNAs, numerous targets in the coding region may have been overlooked because the high level of sequence conservation in exons prevents the use of conservation-based techniques (see Sect. 3.3.3). The major drawback of this approach is that it will not identify the targets of miRNAs that do not affect mRNA levels or that only “fine-tune” gene expression. The mimiRNA website [8] provides correlation analysis in human samples and displays the predicted targets from TargetScan, miRanda, and PicTar.
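
The expression-correlation filter amounts to computing, for each candidate pair, a correlation across matched tissues and keeping strongly negative values. This sketch assumes log-scaled expression values measured in the same tissues and in the same order, uses Pearson's r, and applies a hypothetical cutoff of −0.5; mimiRNA's own statistics may differ.

```python
"""Negative-correlation filter for candidate miRNA targets (sketch)."""
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns NaN if either profile is constant."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two profiles of equal length >= 3")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_targets(mirna_profile: list[float],
                           mrna_profiles: dict[str, list[float]],
                           candidates: list[str],
                           max_r: float = -0.5) -> list[tuple[str, float]]:
    """Candidates correlating with the miRNA at r <= max_r, most negative first."""
    hits = []
    for gene in candidates:
        if gene not in mrna_profiles:
            continue  # no expression data for this candidate
        r = pearson_r(mirna_profile, mrna_profiles[gene])
        if r <= max_r:  # NaN comparisons are False, so constant profiles drop out
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1])
```

Because `candidates` can be the output of any sequence-based predictor, this filter can be chained after the algorithms described earlier.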

3.2.6 Concluding Remarks Regarding Computational Methods

The goal of these different approaches is to reduce prohibitively large lists of predicted targets without losing too many true targets. Tuning these algorithms to find an optimal tradeoff between precision and sensitivity is currently impossible because relatively few targets have been validated experimentally. As a result, the efficiency of these algorithms is often tested by measuring the enrichment of predicted targets among mRNAs or proteins whose expression changes when miRNA expression is perturbed. A recent study based on protein expression following both miRNA overexpression and knockdown found that TargetScanS and PicTar gave the best results [14]. However, this type of benchmark does not account for indirect effects, which may be prevalent given that miRNAs often target transcriptional activators and repressors [13]. One commonly used approach to enhance the quality of target predictions is to consider the overlap between multiple programs. We do not recommend this, as there is no evidence that it increases prediction quality, and it systematically reduces the number of candidates [7].
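
Why intersecting the outputs of several programs is not guaranteed to help can be seen with a small benchmark calculation. The sketch computes precision (the fraction of evaluated predictions that are validated targets) and sensitivity (the fraction of validated targets that are predicted) on genes whose status is known; the gene names and prediction sets are toy values chosen only to show that an intersection can lower both measures.

```python
"""Precision and sensitivity of target predictions against a validated set (toy sketch)."""


def precision_sensitivity(predicted: set[str], validated: set[str],
                          tested: set[str]) -> tuple[float, float]:
    """Evaluate predictions only on genes whose target status has been tested."""
    pred = predicted & tested
    true = validated & tested
    hits = len(pred & true)
    precision = hits / len(pred) if pred else float("nan")
    sensitivity = hits / len(true) if true else float("nan")
    return precision, sensitivity


if __name__ == "__main__":
    tested = {"A", "B", "C", "D", "E", "F"}
    validated = {"A", "B", "C"}
    program_1 = {"A", "B", "D", "E"}
    program_2 = {"A", "C", "D", "E"}
    for name, preds in [("program 1", program_1), ("program 2", program_2),
                        ("intersection", program_1 & program_2)]:
        print(name, precision_sensitivity(preds, validated, tested))
```

Here each program alone has a precision of 0.5 and a sensitivity of 2/3, whereas their intersection drops to 1/3 for both, because the shared predictions happen to be mostly false positives.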

3.2.7 Future Directions

The degree of sequence conservation of a target, or its involvement in a pathway for which other targets are predicted (described in Sects. 3.2.2, 3.2.3 and 3.2.4 above), does not reveal the biological mechanism through which a specific miRNA binds its targets. Binding of a miRNA to an mRNA is affected by the spatial and temporal co-expression of the pair, target site availability, and the formation of a stable duplex at the target site. Future algorithms will need to evaluate all three criteria to discover the whole repertoire of miRNA targets.

Co-expression of miRNA:mRNA pairs is often evaluated by sequencing mRNA-enriched and small RNA libraries prepared from the same cells. As more of these experiments are performed on different cell types and even subcellular compartments, prediction tools will be able to integrate co-expression data with increasing efficiency.

Target site availability is currently evaluated by folding a short stretch of RNA around the putative target site. As discussed above, this does not take into account long-distance interactions between different regions of the same RNA molecule. Such interactions cannot currently be predicted reliably, because there are insufficient biochemical data on the stability of large RNA structures and because the number of possible suboptimal structures is prohibitively large. Moreover, target site accessibility should take into account RNA-binding proteins, whose binding sites are subject to the same prediction limitations as miRNA targets.

The stability of the miRNA:mRNA duplex has been thoroughly investigated through machine learning models and in vivo mutagenesis assays [15]. These studies show that there is no clear-cut rule on the amount of sequence complementarity required between a miRNA and its target, or on the positions at which complementarity should occur. These requirements most likely depend on which regions of the mature miRNA are exposed by Argonaute proteins and are therefore available to pair with the target. Understanding the different conformations of Argonaute proteins should therefore allow more accurate target predictions.

3.3 Experimental Identification and Validation of microRNA Targets

The identification of microRNAs and their target genes was originally conducted through classic genetic studies in the worm Caenorhabditis elegans, in which a miRNA mutant displayed a phenotype opposite to that of the corresponding target gene null mutant [16]. Although this approach was appropriate for small organisms such as nematodes [17] or the fruit fly Drosophila melanogaster [18], it is of limited use in larger animals such as mammals. Artificial systems are therefore needed to identify and validate miRNA target genes. Validating a putative miRNA target site requires showing that a physical interaction between the miRNA and its target mRNA leads to decreased production of the corresponding protein. Such a physical interaction implies spatiotemporal co-expression of the regulating miRNA and its target gene. On this basis, modulating miRNA expression levels should change the amount of a reporter protein such as luciferase or GFP, which is quantified in comparison with controls. Several methods have been designed to experimentally identify targeted mRNAs at various steps along the miRNA regulatory pathway (Fig. 3.2). Since the net result of miRNA-mediated gene regulation is a decrease in the amount of target protein produced, methods measuring changes in protein output resulting from variations in miRNA expression have become a standard approach to identifying and validating miRNA targets. In addition, a number of biochemical methods have been developed to experimentally identify miRNA:mRNA pairs isolated from immunopurified ribonucleoprotein complexes or enriched miRNA:mRNA duplexes. Here we describe some of the methods used to experimentally identify and validate miRNA target genes (see also refs 19–21 for review).

Fig. 3.2

Experimental methods designed to identify and validate targeted mRNAs, grouped by the part of the miRNA regulatory pathway they address. Once loaded into the RNA-induced silencing complex (miRISC), the miRNA guides miRISC to the targeted mRNA. Depending on the level of complementarity between the miRNA and the mRNA target site, miRISC follows two different routes to inhibit protein production. Partial base pairing between miRNA and mRNA (left) leads to translation inhibition and mRNA decay. High complementarity between miRNA and targeted transcript (right) results in mRNA cleavage by Argonaute slicing activity. (a) Biochemical methods have been designed to purify miRNA:mRNA complexes by immunoprecipitation (IP) of miRISC components or by pull-down of labeled miRNA (Sect. 3.3.3). (b–c) Molecular approaches are used to identify target genes through miRNA-primed reverse transcription of the targeted mRNA template, or by analysis of cleavage products (Sect. 3.3.4). (d) Proteomics analysis identifies changes in protein output upon variations in miRNA expression (Sect. 3.3.2). (e) Target genes are ultimately validated by reporter assay (Sect. 3.3.1). DIG digoxigenin, ORF open reading frame

3.3.1 Reporter Assays

In vitro reporter assays have been designed to confirm the interaction between a given miRNA and a putative target mRNA. The rationale is that upon binding to its target site(s), a given miRNA will inhibit reporter protein production, leading to a reduced protein amount or activity that can be measured relative to appropriate controls [22–25]. Typically, the putative miRNA target site is cloned downstream of the open reading frame of a reporter gene, e.g. luciferase (Renilla or firefly) or GFP. Depending on the size of the 3′UTR to be tested, either the full-length UTR or a fragment containing the predicted binding site is used. However, a partial UTR sequence may give false-positive results, because loss of the secondary structures present in the full-length UTR can make the site more accessible to the miRNA. The recombinant reporter plasmid and either a vector overexpressing the miRNA of interest or a synthetic double-stranded oligonucleotide (miRNA mimic) are then transiently co-transfected into mammalian cells, usually HeLa or HEK293 cells, and luciferase activity or fluorescence intensity is measured 24–48 h later. It is important to assess endogenous miRNA expression levels in the cell system used for the assay, as these differ from one cell type to another and some miRNAs display tissue-specific expression (e.g. hematopoietic-, brain-, or embryonic stem cell-restricted miRNAs). Alternatively, cells can be transfected with the reporter vector alone if they express suitable endogenous levels of the candidate miRNA. Reduction of miRNA activity can be achieved using miRNA inhibitors such as modified antisense oligonucleotides [26] or sponge vectors [27], which constitute an elegant option when cells have high endogenous miRNA levels.
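
The final measurement of the assay reduces to a ratio of ratios. This sketch assumes a dual-luciferase readout, in which the UTR is cloned behind one luciferase and a second, unmodified luciferase measured in the same well is used for normalization, with replicate wells transfected either with the miRNA or with a scrambled control; the readings in the example are hypothetical.

```python
"""Normalizing dual-luciferase readings and expressing repression (sketch)."""
from statistics import mean


def normalized_ratios(utr_reporter: list[float], normalizer: list[float]) -> list[float]:
    """Per-well ratio of the UTR-bearing reporter to the normalizing reporter."""
    if len(utr_reporter) != len(normalizer):
        raise ValueError("one normalizer reading per well is required")
    return [r / n for r, n in zip(utr_reporter, normalizer)]


def relative_activity(mirna_wells: list[float], control_wells: list[float]) -> float:
    """Mean normalized activity with the miRNA relative to the scrambled control."""
    return mean(mirna_wells) / mean(control_wells)


if __name__ == "__main__":
    with_mirna = normalized_ratios([100.0, 120.0], [50.0, 60.0])
    with_scramble = normalized_ratios([200.0, 240.0], [50.0, 60.0])
    print(relative_activity(with_mirna, with_scramble))  # 0.5
```

A relative activity well below 1 indicates repression; on its own it does not exclude the off-target or indirect effects that the controls discussed in this section are meant to address.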

Importantly, transfection controls must be chosen carefully. These controls include reporter vectors without the UTR sequence or with the UTR cloned in the antisense orientation. Cells must also be co-transfected with a control luciferase reporter vector to normalize for variations in transfection efficiency. Alternatively, dual luciferase reporter systems can be used, in which UTR sequences are cloned downstream of one luciferase gene (Renilla), while the other luciferase reporter (firefly) remains unaltered and is used for normalization. Specificity of miRNA regulation is assessed by co-transfection of an irrelevant miRNA or scrambled RNA duplexes; under these conditions, only transfection with the relevant miRNA should decrease reporter activity or expression. However, such a decrease could still result from an off-target effect of the miRNA, which is present at supra-physiological amounts when overexpressed, or from indirect regulation through targeting of genes that in turn affect expression of the reporter. To confirm the specific inhibition of a target gene by a miRNA, it is therefore essential to disrupt the predicted binding sites and test the modified UTR sequences in the reporter assay as well. This strategy not only validates the miRNA:mRNA interaction and regulation, but also identifies which site(s) are truly functional when multiple target sites are predicted. Finally, a modified miRNA mimic harboring the sequence complementary to the mutated UTR can be used to rescue target regulation of the mutated UTR reporter constructs. In summary, a valid reporter assay should co-transfect (1) a reporter plasmid containing the full 3′UTR sequence and (2) the same reporter construct with a disrupted target site, each together with either a miRNA-overexpressing vector or a scrambled control sequence.
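
Designing the site-disrupted UTR and the matching rescue mimic can be sketched as two sequence edits. The design rule used here is one hypothetical choice: every base of the 6-nt core that pairs with miRNA nt 2–7 is replaced by its complement, and the miRNA seed is rewritten so that it pairs with the mutated core. Other designs mutate fewer positions, and any mutant sequence should be checked for newly created sites.

```python
"""Seed-site disruption and compensatory miRNA mimic design (sketch)."""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def disrupt_site(utr: str, core_start: int, core_len: int = 6) -> str:
    """Return utr with each base of the seed-match core replaced by its complement."""
    core = utr[core_start:core_start + core_len]
    mutant_core = "".join(COMPLEMENT[base] for base in core)
    return utr[:core_start] + mutant_core + utr[core_start + core_len:]


def compensatory_mimic(mirna: str, mutant_core: str) -> str:
    """Rewrite miRNA nt 2..(1 + len(core)) so that they pair with the mutated core."""
    new_seed = reverse_complement(mutant_core)
    return mirna[0] + new_seed + mirna[1 + len(new_seed):]


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    utr = "GGCUACCUCAGG"  # UACCUC (indices 3-8) pairs with let-7a nt 2-7
    mutant_utr = disrupt_site(utr, 3)
    mimic = compensatory_mimic(let7a, mutant_utr[3:9])
    print(mutant_utr, mimic)  # GGCAUGGAGAGG UCUCCAUGUAGGUUGUAUAGUU
```

The wild type miRNA should no longer repress the mutant UTR, whereas the compensatory mimic should restore repression, which is the rescue logic described above.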

The reporter assay described above indicates that, when a given miRNA and target gene are expressed simultaneously in the same cell, they are likely to interact, and that this interaction might reduce expression of the target gene. It remains, however, an artificial system in which both the miRNA and the targeted UTR are overexpressed in a heterologous context. It is thus recommended to confirm, when possible, that such regulation occurs on the endogenous gene. Changes in protein amounts upon miRNA overexpression or inhibition can be measured by Western blot, flow cytometry, or immunocytochemistry. If antibodies are not available, other readouts can be used, for example enzymatic activity or ligand binding. Target transcript quantification can provide further evidence of miRNA-induced gene regulation. Although miRNAs were originally shown to repress mRNA translation without affecting transcript levels, it is now widely accepted that miRNA-mediated regulation is frequently accompanied by mRNA destabilization, essentially due to increased deadenylation [28, 29]. Transcripts displaying reduced levels upon ectopic miRNA expression are then analysed for the presence of miRNA target sites in their 3′UTR using prediction algorithms (see Sect. 3.2) in order to identify putative miRNA target genes [12, 26, 30].
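
Target transcript levels are often quantified by quantitative RT-PCR and analysed with the comparative Ct (2^−ΔΔCt) method. This section does not name a quantification method, so this is used here only as a common example. The sketch assumes threshold cycles averaged over replicates, a reference (housekeeping) gene measured in the same samples, and, by default, perfect doubling per cycle.

```python
"""Comparative Ct (2^-ddCt) fold change for an endogenous target transcript (sketch)."""


def fold_change_ddct(ct_target_mirna: float, ct_ref_mirna: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float,
                     amplification: float = 2.0) -> float:
    """Target level after miRNA overexpression relative to control.

    Each dCt normalizes the target to a reference gene in the same sample;
    ddCt compares miRNA-treated with control samples. amplification is the
    fold increase per PCR cycle (2.0 assumes 100 % efficiency).
    """
    dct_mirna = ct_target_mirna - ct_ref_mirna
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    ddct = dct_mirna - dct_ctrl
    return amplification ** (-ddct)


if __name__ == "__main__":
    # Target crosses threshold one cycle later (relative to the reference) after miRNA transfection.
    print(fold_change_ddct(26.0, 18.0, 25.0, 18.0))  # 0.5
```

A fold change below 1 is consistent with miRNA-induced mRNA destabilization; as noted above, targets repressed mainly at the translational level may show little change.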

3.3.2 Proteomics Methods

Several proteomics studies have been designed to identify miRNA target genes. Vinther et al. [31] used stable isotope labeling by amino acids in cell culture (SILAC), in which cells are grown in medium containing heavy isotopes of essential amino acids so that their proteins become metabolically labeled. Differences in protein synthesis are determined by mass spectrometry as the ratio of peptide peak intensities from light and heavy isotopes [32]. Of 504 proteins investigated by SILAC, the authors identified 12 with reduced expression in HeLa cells overexpressing miR-1 and grown in medium containing heavy isotopes, compared with control cells grown with light isotopes. Seed-complementary sites were found in the 3′UTRs of the corresponding genes for 8 of these proteins, a significant enrichment for the miR-1 seed motif compared with entire 3′UTR sequence databases. These investigators used the luciferase reporter assay to confirm miR-1 regulation for 6 of the 11 target genes tested [31].
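
The SILAC readout can be summarized per protein from peptide-level heavy and light intensities. This sketch assumes the labeling scheme described above (miRNA-transfected cells heavy, control cells light), summarizes each protein by its median peptide log2(heavy/light) ratio, and uses a hypothetical cutoff for calling a protein down-regulated; it does not reproduce the statistical testing of the published studies.

```python
"""Protein-level SILAC ratios from peptide intensities (sketch)."""
import math
from statistics import median


def protein_log2_ratios(peptides: list[tuple[str, float, float]]) -> dict[str, float]:
    """Median log2(heavy/light) per protein; peptides with a zero intensity are skipped.

    peptides: (protein id, heavy intensity, light intensity) triples.
    """
    per_protein: dict[str, list[float]] = {}
    for protein, heavy, light in peptides:
        if heavy > 0 and light > 0:
            per_protein.setdefault(protein, []).append(math.log2(heavy / light))
    return {protein: median(values) for protein, values in per_protein.items()}


def downregulated(ratios: dict[str, float], cutoff: float = -0.5) -> list[str]:
    """Proteins whose log2(miRNA/control) ratio is at or below the cutoff."""
    return sorted(p for p, r in ratios.items() if r <= cutoff)


if __name__ == "__main__":
    peptides = [("P1", 50.0, 100.0), ("P1", 25.0, 100.0),
                ("P2", 100.0, 100.0), ("P3", 0.0, 10.0)]
    ratios = protein_log2_ratios(peptides)
    print(ratios, downregulated(ratios))  # {'P1': -1.5, 'P2': 0.0} ['P1']
```

Down-regulated proteins identified this way would then be examined for seed-complementary sites in their 3′UTRs, as in the study described above.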

The SILAC method was subsequently used in two large-scale proteomics studies to identify target genes of several miRNAs [14, 33]. In both cases, HeLa cells were transfected with different miRNA duplexes, and protein output was measured 48 h post-transfection. Selbach et al. used a modified version of SILAC in which cells were pulse-labeled (pSILAC) so that heavy isotopes were primarily incorporated into newly synthesized proteins [14]. In addition, SILAC was used to study the impact of miR-223 deficiency in mouse neutrophils [33] and of let-7b knockdown in HeLa cells [14]. The authors concluded that each miRNA regulates hundreds of target proteins, though to a relatively modest degree. Motif analysis revealed a significant enrichment of the corresponding seed-complementary sites in the 3′UTRs of repressed genes compared with unaffected genes. While Baek et al. found that most repressed targets displayed detectable mRNA destabilization [33], Selbach et al. identified substantial direct regulation by translation inhibition [14]. Overall, these studies suggested that miRNAs act primarily by fine-tuning the expression of a large number of target genes.

Zhu et al. used two-dimensional difference gel electrophoresis (2D-DIGE) to identify miR-21 targets in a mouse breast cancer model [34]. Proteins were extracted from tumors derived from human MCF7 cells treated with an anti-miR-21 antisense or a control oligonucleotide. After labeling with two different fluorescent dyes, both protein samples were separated by two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) in the same gel. Fluorescence intensity was measured by gel imaging, and differentially expressed proteins were recovered from the gel prior to identification by mass spectrometry. This method identified seven proteins that were up-regulated in anti-miR-21-treated tumors, including tropomyosin 1 (TPM1), which was further validated by reporter assay and Western blot [34]. Of note, several proteins were also down-regulated upon anti-miR-21 treatment in this study, which suggests indirect effects of miR-21.

Another approach to target identification combined miRNA and protein expression analysis with computational predictions. miRNA profiling was performed to identify miRNAs differentially expressed between two samples, and the results were compared with proteomics data generated by 2D-PAGE coupled with mass spectrometry [35] or by reverse-phase protein arrays [36]. Reciprocally expressed miRNAs and proteins were then compared with miRNA target predictions to identify relevant target genes. This analysis identified 52 and 17 miRNA:target gene pairs in rat kidney [35] and human cartilage [36], respectively. More recently, a targeted proteomics approach was designed to identify let-7 target genes in C. elegans [37]. The method combined isotope-coded affinity tag (ICAT) protein labeling [38] with detection by selected reaction monitoring mass spectrometry [39] to quantify protein levels in wild type and let-7 mutant whole worms. By design, ICAT labeling is restricted to proteins harbouring mass spectrometry-detectable peptides that contain cysteine residues [38]. This limitation meant working on a predefined set of proteins, predicted as let-7 targets, that met these requirements, with consequently reduced proteome coverage. Of 161 proteins analysed, 29 were significantly altered in mutant worms, including ten that were downregulated, suggesting an indirect effect of miRNA regulation [37]. Ten of the identified targets were further validated by genetic analysis and, for one of them, by reporter assay. The authors then used a modified method based on metabolic labeling of worms with heavy isotopes [40] to allow full coverage of the C. elegans peptide repertoire. Of 27 predicted miR-58 targets, four were significantly upregulated in a miR-58 mutant using this modified method [37].
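
The integration step, keeping predicted pairs whose miRNA and protein change in opposite directions between the two samples, can be sketched as a simple filter. The log2 fold-change inputs, the minimum effect size, and the miRNA and gene names are hypothetical assumptions for illustration.

```python
"""Keep predicted miRNA:target pairs with reciprocal expression changes (sketch)."""


def reciprocal_pairs(mirna_lfc: dict[str, float], protein_lfc: dict[str, float],
                     predicted_pairs: list[tuple[str, str]],
                     min_abs_lfc: float = 0.5) -> list[tuple[str, str]]:
    """Pairs whose miRNA and protein log2 fold changes have opposite signs.

    Both changes must reach min_abs_lfc in magnitude; pairs lacking data are skipped.
    """
    hits = []
    for mirna, gene in predicted_pairs:
        m, p = mirna_lfc.get(mirna), protein_lfc.get(gene)
        if m is None or p is None:
            continue
        if abs(m) >= min_abs_lfc and abs(p) >= min_abs_lfc and m * p < 0:
            hits.append((mirna, gene))
    return hits


if __name__ == "__main__":
    mirna_lfc = {"miR-a": 1.2, "miR-b": -0.8, "miR-c": 0.1}
    protein_lfc = {"G1": -1.0, "G2": 0.9, "G3": -2.0}
    pairs = [("miR-a", "G1"), ("miR-a", "G2"), ("miR-b", "G2"), ("miR-c", "G3")]
    print(reciprocal_pairs(mirna_lfc, protein_lfc, pairs))  # [('miR-a', 'G1'), ('miR-b', 'G2')]
```

Requiring the pair to be predicted as well as reciprocally expressed is what distinguishes this approach from expression correlation alone.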

3.3.3 Biochemical Approaches

miRNA-mediated gene silencing in mammals requires a functional miRNA-loaded RNA-induced silencing complex (miRISC) (Fig. 3.2). Several studies identified miRNA target transcripts by virtue of their association with miRISC components, through co-immunoprecipitation of human or Drosophila Argonaute (AGO) proteins [35, 41–46], human TNRC6 proteins [45], or the nematode GW182 family proteins AIN-1 and AIN-2 [47]. This strategy was originally used by Mourelatos et al. to identify new miRNAs that co-immunoprecipitated with an AGO2/EIF2C2-containing complex in HeLa cells [48]. Immunoprecipitated mRNAs were then identified by cloning, microarray analysis, or deep sequencing. A first strategy consists of purifying all miRISC-associated mRNA species in a given cell type in order to identify the global “targetome” of that cell type, without prior knowledge of the presence of any specific miRNA. Sequence motif analysis is then performed to identify miRNA complementary sites enriched in miRISC-bound mRNAs compared with whole-cell mRNAs, thereby inferring which miRNAs are co-expressed. Easow et al. used this approach in Drosophila S2 cells stably expressing FLAG/HA-Ago1 [42]. Microarray analysis revealed significant enrichment of transcripts containing complementary sites for miR-184, miR-7 and miR-314 among mRNAs pulled down with anti-HA antibody. Similarly, Beitzinger et al. pulled down AGO1- and AGO2-associated transcripts from HEK293 cells and identified the immunoprecipitated mRNAs by complementary DNA (cDNA) library preparation and sequencing [41]. Another approach consists of comparing miRISC-associated mRNAs from cells transfected with, or deprived of, a given miRNA with those from mock-transfected or unmodified control cells. Easow et al. found a significant overrepresentation of miR-1 complementary sequences in Ago1-co-purified transcripts from miR-1-transfected S2 cells compared with untransfected cells [42].
Several studies using this method, also called RIP-Chip (ribonucleoprotein immunoprecipitation-gene chip), reported the identification of miRNA targets in 293 cells [43–45], Hodgkin lymphoma cell lines [49], human H4 glioneuronal cells [46], and C. elegans [47]. In the latter study, high-throughput sequencing was also used to identify the co-immunoprecipitated miRNAs. Notably, this experimental procedure allowed the identification of miRNA target genes with stable mRNA levels, which are likely to be regulated primarily by translational repression [43].
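
The motif-enrichment logic of these immunoprecipitation experiments can be sketched in two steps: score each transcript by its enrichment in the immunoprecipitate relative to input, then compare how often a seed match occurs among enriched versus remaining transcripts. The pseudocount, the log2 cutoff, and the assumption that IP and input signals are already normalized to each other are hypothetical choices; published analyses use platform-specific statistics.

```python
"""IP/input enrichment and seed-site frequency comparison (sketch)."""
import math


def ip_enrichment(ip: dict[str, float], input_: dict[str, float],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """log2((IP + pc) / (input + pc)) for every transcript measured in the input."""
    return {t: math.log2((ip.get(t, 0.0) + pseudocount) / (value + pseudocount))
            for t, value in input_.items()}


def _fraction_with_site(group: set[str], with_site: set[str]) -> float:
    """Fraction of a transcript group carrying a seed match (NaN if empty)."""
    return len(group & with_site) / len(group) if group else math.nan


def site_frequencies(enrichment: dict[str, float], with_site: set[str],
                     min_log2: float = 1.0) -> tuple[float, float]:
    """Seed-match frequency among enriched and among non-enriched transcripts."""
    enriched = {t for t, v in enrichment.items() if v >= min_log2}
    background = set(enrichment) - enriched
    return (_fraction_with_site(enriched, with_site),
            _fraction_with_site(background, with_site))
```

A seed match that is markedly more frequent among enriched transcripts points to a miRNA that is active in the immunoprecipitated complexes, which is the inference described above.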

Recently, the HITS-CLIP method (high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation) was developed to identify direct protein/RNA interactions [50]. This approach uses UV irradiation to crosslink nucleic acids and proteins in close proximity; the complexes are then immunopurified using an antibody against a miRISC component. Partial RNA digestion leaves miRISC-protected RNA fragments, which are then identified by high-throughput sequencing. Chi et al. used HITS-CLIP to purify Ago2-bound mRNA and miRNA species from mouse brain as well as from miR-124-transfected HeLa cells [51]. As in other studies, bound mRNAs were enriched for sites complementary to miRNAs that were either highly expressed endogenously or overexpressed following transfection. This approach, also called CLIP-Seq, was used to isolate mRNAs bound to the Argonaute protein ALG-1 in C. elegans [52] and Ago2-purified transcripts in wild type versus Dicer−/− mouse ES cells [53]. An improvement of the method, named photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP), was recently described, in which crosslinking efficiency is enhanced by incorporating the photoactivatable nucleoside analog 4-thiouridine into transcripts of cultured cells [54]. Upon UV crosslinking at 365 nm, crosslinked 4-thiouridine residues are read as cytidine during reverse transcription, so that the resulting T-to-C transitions in the sequenced cDNA allow precise identification of the RNA-protein binding site. The PAR-CLIP method was used to identify miRNA target sites in mRNAs associated with AGO and TNRC6 family proteins in 293 cells. Deep sequencing of bound RNAs revealed enrichment of sites complementary to the most highly expressed miRNAs [54].
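
Locating crosslink sites in PAR-CLIP data comes down to counting T-to-C mismatches between sequenced reads and the reference. This sketch assumes reads already aligned without gaps to the forward strand of a short reference in the DNA alphabet (so uridine appears as T) and uses a hypothetical minimum count; it ignores sequencing errors, SNPs and strand, which a real analysis would need to handle.

```python
"""Counting T-to-C transitions in aligned PAR-CLIP reads (sketch)."""
from collections import Counter


def t_to_c_counts(reference: str, reads: list[tuple[int, str]]) -> Counter:
    """Count T->C mismatches per reference position.

    reads: (0-based start on reference, read sequence) for ungapped alignments.
    """
    counts: Counter = Counter()
    for start, read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference):
                break  # read runs past the end of the reference
            if reference[pos] == "T" and base == "C":
                counts[pos] += 1
    return counts


def crosslink_sites(reference: str, reads: list[tuple[int, str]],
                    min_count: int = 2) -> list[int]:
    """Reference positions supported by at least min_count T->C transitions."""
    return sorted(pos for pos, n in t_to_c_counts(reference, reads).items()
                  if n >= min_count)


if __name__ == "__main__":
    ref = "AATTGCTAAT"
    reads = [(0, "AACTGC"), (1, "ACTGCT"), (5, "CCAAT")]
    print(crosslink_sites(ref, reads))  # [2]
```

Positions with recurrent T-to-C transitions mark where the protein was crosslinked, which is what gives PAR-CLIP its near-nucleotide resolution.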

Interestingly, these high-throughput studies revealed that a high proportion (25–50 %) of the binding sites were located within the coding sequence (CDS) of bound mRNAs [51, 52, 54]. This observation suggests that functional miRNA target sites are not restricted to 3′UTRs as previously thought, in agreement with a number of recent reports identifying miRNA target sites in the CDS [55–58]. Furthermore, Schnall-Levin et al. recently demonstrated frequent CDS targeting, through repeated miRNA binding sites, of paralogous families of C2H2 zinc-finger genes, which typically contain many tandem repeats of the finger motif [59]. Similarly, building on previously published data from mammalian cells transfected with, or deprived of, specific miRNAs [14, 33], Fang and Rajewsky showed that CDS target sites act synergistically with 3′UTR sites in miRNA-mediated regulation of gene expression [60]. Importantly, most prediction algorithms cannot identify this class of miRNA target sites because they only search 3′UTRs. However, the PITA algorithm [4], which mainly evaluates target site accessibility, and the rna22 program [57, 61], which identifies over-represented sequence patterns, can be used to detect miRNA binding sites located outside the 3′UTR. In addition, the mimiRNA algorithm [8], which identifies miRNA:mRNA pairs that display conserved negative correlation of expression across several tissues, can be used to select candidate target genes prior to searching for putative binding sites. Of note, CDS target site validation requires a modified reporter assay, in which the sequence containing the target site is fused in frame with a reporter CDS [42, 59]. Alternatively, wild type and mutated versions of the targeted CDS fused to two different epitope tags, e.g. Myc and FLAG, have been co-transfected to monitor protein down-regulation upon miRNA co-expression by Western blot [58].
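
Fusing a CDS fragment in frame with a reporter CDS requires that the fragment preserve the reading frame and contain no in-frame stop codon. The check below is a minimal sketch that assumes the fragment is given as DNA, starts at the first base of a codon, and uses the stop codons of the standard genetic code; it does not check the junctions with the reporter sequence.

```python
"""Check that a CDS fragment can be fused in frame with a reporter CDS (sketch)."""

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_problems(fragment: str) -> list[str]:
    """List reasons the fragment would break an in-frame fusion (empty list = OK)."""
    seq = fragment.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    for i in range(0, len(seq) - 2, 3):  # complete codons only
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at position {i}")
    return problems


if __name__ == "__main__":
    print(in_frame_problems("ATGGCC"))     # []
    print(in_frame_problems("atgtaagcc"))  # ['stop codon TAA at position 3']
```

The same check applies to the mutated CDS construct, since disrupting a site by point mutation can change codons in the fused sequence.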

An alternative to the aforementioned protein pull-down methods was proposed by Orom and Lund, who developed an affinity-based target gene identification procedure [62]. In this case, transfection of a biotinylated synthetic miRNA allows the purification of miRNA:mRNP complexes using streptavidin-agarose beads. This strategy is attractive since it allows identification of the target genes of a specific miRNA, whereas other methods seek to isolate virtually all miRNA-regulated transcripts. By purifying a biotin-tagged bantam miRNA in Drosophila S2 cells, the endogenous target gene Hid was efficiently identified [62]. The same group subsequently used this technique to isolate mRNAs bound to biotinylated miR-10a in mouse ES cells. Surprisingly, microarray analysis revealed that 55 of the 100 most enriched mRNAs corresponded to ribosomal protein genes, with no enrichment for known miR-10a targets or transcripts with miR-10a complementary sites [63]. They further showed that miR-10a bound conserved sites in the 5′UTR of these genes, leading to upregulation of ribosomal protein translation and ribosome formation and resulting in a ∼30 % increase in global protein synthesis [63]. Combined with 4-thiouridine-modified nucleotides and UV crosslinking, biotin-tagged miRNA ‘pullout’ was used to demonstrate a direct interaction between miR-34a and the MYC transcript in human fibroblasts. Similarly, the LAMP (labeled miRNA pull-down) assay was developed [64, 65], in which synthetic miRNAs are labeled with digoxigenin (DIG) and bound RNAs are isolated using anti-DIG agarose beads. The LAMP method was used to isolate known targets of C. elegans let-7 and lin-4, and of zebrafish let-7 and miR-1. Specifically, 302 transcripts enriched by DIG-tagged miR-1 pull-down (compared with a mutated miR-1 control) were identified, including the known miR-1 target Hand2 [66].
An improvement of the method, called TAP-Tar (tandem affinity precipitation target identification) was recently described, which combined HA-tagged AGO1-2 immunoprecipitation followed by biotinylated miRNA pull down using streptavidin beads in HeLa cells [67]. This two-step procedure was shown to recover the known miR-20a target E2F1 more efficiently than each pull down method used separately.
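Pull-down experiments such as the LAMP comparison of DIG-tagged miR-1 with a mutated miR-1 control ultimately rank transcripts by how much more abundant they are in the specific pull-down than in the control. The sketch below shows one simple way to compute such a ranking from read counts: library-size normalisation, a pseudocount, and a log2 ratio with an arbitrary threshold. It is an illustrative assumption, not the statistics used in the cited studies; the counts and the gene names other than Hand2 are hypothetical.

```python
import math


def log2_enrichment(pulldown, control, pseudocount=1.0):
    """Library-size-normalised log2 ratio of pull-down to control read counts."""
    total_p = sum(pulldown.values())
    total_c = sum(control.values())
    scores = {}
    for gene in set(pulldown) | set(control):
        # The pseudocount avoids division by zero for transcripts absent from one library.
        p = (pulldown.get(gene, 0) + pseudocount) / total_p
        c = (control.get(gene, 0) + pseudocount) / total_c
        scores[gene] = math.log2(p / c)
    return scores


def top_enriched(scores, n, min_log2=1.0):
    """Return up to n transcripts passing the log2 threshold, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, score in ranked if score >= min_log2][:n]


# Hypothetical read counts: tagged miR-1 pull-down versus mutated miR-1 control.
pulldown = {"Hand2": 120, "geneA": 10, "geneB": 50}
control = {"Hand2": 10, "geneA": 12, "geneB": 48}
scores = log2_enrichment(pulldown, control)
print(top_enriched(scores, n=10))
```

Normalising by library size matters here because the two pull-downs are rarely sequenced or hybridised to the same depth; without it, raw count differences would mostly reflect input amounts.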

3.3.4 Molecular Methods

Vatolin et al. reported the use of endogenous miRNAs as primers for cDNA synthesis by reverse transcriptase on the targeted mRNA template [68]. Although pairing of the target mRNA to the miRNA 3′ end is usually weaker than to the 5′ end (the seed region), the hypothesis underpinning this work was that the miRNA 3′ end could form a transiently stable duplex with the target mRNA to initiate cDNA synthesis (Fig. 3.2). Using cytoplasmic extracts, a first round of reverse transcription elongates the miRNA sequence to generate cDNA-miRNA molecules, which are purified and used as secondary primers to drive a second round of reverse transcription, thereby increasing the specificity of the reaction. After ligation of an adapter sequence at the 5′ end, cDNAs are PCR amplified using a primer from the adapter and a gene-specific primer corresponding to a target RNA of interest. PCR products are then cloned and sequenced, and the regulatory miRNA is identified by homology searches of the appropriate databases. Vatolin et al. recovered partial sequences of miRNAs associated with β-actin, N-Ras and K-Ras mRNAs from human hTERT-RPE1 epithelial cells, and confirmed their functional regulation by Western blot and luciferase assay upon miRNA overexpression [68].
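The priming principle can be illustrated in silico. The sketch below assumes that the k 3′-terminal nucleotides of the miRNA pair perfectly with the mRNA (k = 6 is an arbitrary, hypothetical choice), and then "extends" the miRNA as reverse transcriptase would, copying the template toward its 5′ end. The resulting product begins with the miRNA sequence, which is what a database homology search would recognise. It ignores the second round of priming, adapter ligation and PCR, and the function names and template are hypothetical.

```python
RNA_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}
DNA_COMP_OF_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(rna):
    """Reverse complement of an RNA sequence, as RNA."""
    return "".join(RNA_COMP[nt] for nt in reversed(rna))


def cdna_copy(rna_template):
    """DNA copy (5'->3') that reverse transcriptase makes of an RNA template."""
    return "".join(DNA_COMP_OF_RNA[nt] for nt in reversed(rna_template))


def mirna_primed_products(mirna, mrna, k=6):
    """Simulate cDNAs primed by the miRNA 3' end on an mRNA template.

    Assumes the k 3'-terminal miRNA nucleotides pair perfectly with the mRNA.
    Returns (site_start, product) tuples; product is the miRNA (as DNA) + new cDNA.
    """
    anchor = revcomp_rna(mirna[-k:])  # mRNA segment pairing with the miRNA 3' end
    primer_dna = mirna.replace("U", "T")
    products = []
    for i in range(len(mrna) - k + 1):
        if mrna[i:i + k] == anchor:
            # The miRNA 3'-terminal nucleotide faces mrna[i]; extension then copies
            # mrna[:i], moving toward the template 5' end.
            products.append((i, primer_dna + cdna_copy(mrna[:i])))
    return products


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
template = "GCAU" + revcomp_rna(LET7A[-6:]) + "GGG"  # hypothetical target fragment
print(mirna_primed_products(LET7A, template))
```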

Andachi modified the method by ligating an adapter sequence to the 3′ end of the cDNA and by using a biotinylated, miRNA-specific primer together with an adapter-specific primer for PCR amplification [69]. The amplification product was purified using avidin beads, and further PCR amplified with adapter-specific and nested miRNA-specific primers. When applied to C. elegans, this method isolated the known lin-4 target gene lin-14, and identified the K10C3.4 gene as a new target for let-7, which was further validated through reporter assay and genetic complementation analysis [69]. Because they rely on either a target-gene-specific or a miRNA-specific primer, the two methods described above identify miRNA:mRNA pairs one target gene or one miRNA at a time, and are therefore not suitable for high-throughput identification of miRNA targets.

In the specific context of miRNA-mediated cleavage of a target gene (Fig. 3.1), several studies identified mRNA cleavage products by RNA ligase-mediated 5′ rapid amplification of cDNA ends (RLM-RACE) [70–77]. In the original method, an RNA adapter was ligated to the 5′ phosphate of cleaved, uncapped poly-A+ RNAs. After reverse transcription with oligo-(dT), cDNAs were amplified using adapter- and gene-specific primers, before cloning and sequencing. This approach was used to validate miR-171-mediated cleavage of several transcripts of the SCL family of transcription factors in Arabidopsis thaliana [70], as well as Hoxb8 mRNA cleavage by miR-196 in mouse embryos [25]. In addition, the 5′ ends of the cloned cleavage products mapped to the mRNA nucleotide paired with the tenth nucleotide of the miRNA.
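Because the 5′ end of a miRNA-guided cleavage product maps opposite miRNA nucleotide 10, RLM-RACE clones can be checked against a predicted position. The sketch below assumes a perfectly complementary, ungapped site (a simplification of real plant sites, which tolerate some mismatches): for a site starting at 0-based mRNA position s, the nucleotide paired with miRNA nt 10 is at s + len(miRNA) − 10. The function names, toy transcript and clone positions are hypothetical.

```python
RNA_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(rna):
    """Reverse complement of an RNA sequence, as RNA."""
    return "".join(RNA_COMP[nt] for nt in reversed(rna))


def expected_fragment_5p_ends(mirna, mrna):
    """0-based mRNA positions expected to start 3' cleavage fragments.

    Only perfectly complementary sites are considered. For a site starting at s,
    mRNA position s + len(mirna) - 10 pairs with miRNA nucleotide 10.
    """
    site = revcomp_rna(mirna)
    ends = []
    start = mrna.find(site)
    while start != -1:
        ends.append(start + len(mirna) - 10)
        start = mrna.find(site, start + 1)
    return ends


def fraction_supporting(race_5p_ends, expected, tolerance=0):
    """Fraction of RACE clone 5' ends within `tolerance` nt of an expected end."""
    hits = sum(
        any(abs(end - exp) <= tolerance for exp in expected) for end in race_5p_ends
    )
    return hits / len(race_5p_ends)


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
mrna = "GGGGG" + revcomp_rna(LET7A) + "GGG"   # toy target with one perfect site
expected = expected_fragment_5p_ends(LET7A, mrna)
clones = [17, 17, 17, 30]                      # hypothetical mapped clone 5' ends
print(expected, fraction_supporting(clones, expected))
```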

An improved method, named PARE (parallel analysis of RNA ends), was developed for genome-wide identification of miRNA-induced cleavage products [71, 72]. In this modified protocol, the 5′ RNA adapter was engineered to contain an MmeI restriction site, and after reverse transcription and second-strand cDNA synthesis, double-stranded molecules were digested with MmeI, generating 20–21 nt tag sequences attached to the adapter. A DNA adapter was then ligated at the 3′ end of the tag, which was PCR amplified using 5′ adapter- and 3′ adapter-specific primers. Tags were analysed by high-throughput sequencing and matched to the Arabidopsis genome to identify corresponding target genes and infer regulatory miRNAs. This ‘degradome’ tag analysis identified a large proportion of known Arabidopsis miRNA and trans-acting siRNA (ta-siRNA) target genes, although most of the tags represented mRNA degradation products unrelated to these small RNAs [71, 72]. PARE was also used to identify miRNA and ta-siRNA target genes in rice [75]. A modified RLM-RACE methodology was also developed, in which Arabidopsis cleaved transcripts were linearly amplified by in vitro transcription using a T7 promoter, prior to microarray analysis [73, 78]. Of the 228 candidate targets identified, 14 corresponded to previously known miRNA targets [73].
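The core bookkeeping of a degradome analysis is to map each short tag back to a transcript and count how many tags start at each position; miRNA-guided cleavage then appears as a pile-up of tag 5′ ends at the predicted site. The sketch below does this with an exact-match k-mer index, truncating 20–21 nt tags to their first 20 nt. Real pipelines use dedicated short-read aligners and handle mismatches and multi-mapping reads; here every exact hit is counted, which is a deliberate simplification. The transcript name, sequence and tags are hypothetical.

```python
from collections import Counter


def index_transcripts(transcripts, tag_len):
    """Map every tag_len-mer to the (transcript, 0-based position) pairs where it occurs."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - tag_len + 1):
            index.setdefault(seq[i:i + tag_len], []).append((name, i))
    return index


def count_tag_5p_ends(tags, transcripts, tag_len=20):
    """Count the sense-strand 5' ends of tags matching a transcript exactly."""
    index = index_transcripts(transcripts, tag_len)
    counts = Counter()
    for tag in tags:
        # 20-21 nt tags are compared on their first tag_len nucleotides.
        for hit in index.get(tag[:tag_len], []):
            counts[hit] += 1
    return counts


T1 = "GGGGGAACUAUACAACCUACUACCUCAGGGCCCAAAUUU"  # hypothetical transcript
tags = [T1[17:37], T1[17:38], T1[17:37], T1[0:20], "A" * 20]
counts = count_tag_5p_ends(tags, {"T1": T1})
print(counts.most_common())
```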

Although this approach is best suited to plants, in which extensive base pairing between miRNA and mRNA leads to miRISC-mediated cleavage of the targeted mRNA, several studies reported PARE analysis of the degradome in mammalian cells [74, 76, 77]. Karginov et al. compared degradome tags from wild type versus Ago2 −/− mouse ES cells in order to identify miRNA-specific cleavage products. In wild type cells, tag abundance peaked at the target nucleotide opposite position 10 of the miRNA, whereas no peak was identified in Ago2 −/− cells [74]. This study also identified a number of target genes subjected to direct Drosha-mediated endonucleolytic cleavage, as well as Ago2- and Drosha-independent cleavage sites that were conserved in human 293 cells. In another study, Shin et al. defined a class of metazoan target sites named ‘centered sites’, which lack both perfect seed pairing and 3′-compensatory pairing, but instead harbour 11–12 contiguous nucleotides that pair with miRNA nt 4–15 [76]. Using RLM-RACE degradome sequencing, they identified a set of genes targeted for miRNA-mediated cleavage in HeLa cells and human brain, although the corresponding cleavage products were of low abundance. Although most of the putative target genes were attributed to three highly expressed miRNAs (miR-196a, -28 and -151-5p), a total of 18 additional miRNA target genes were identified [76]. Likewise, Bracken et al. performed degradome analysis on six adult mouse tissues and d16.5 whole mouse embryos, resulting in the identification of 23 putative miRNA-mediated cleavage sites, most of which displayed low read frequency [77]. Although these studies revealed the existence of miRNA-guided cleavage of target mRNAs in mammals, such targeting appears restricted to a limited number of genes. In addition, degradome analyses showed that a substantial proportion of transcripts undergo endonucleolytic cleavage, most of which is unrelated to miRNA regulation [76, 77].
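The ‘centered site’ definition can be expressed as a simple check on an ungapped miRNA:target alignment. The sketch below is a simplified reading of that definition, not the classifier used by Shin et al.: it counts only Watson-Crick pairs (G:U wobbles are treated as unpaired), requires imperfect pairing of miRNA nt 2–7 and a run of at least 11 contiguous pairs within nt 4–15, and omits the test for 3′-compensatory pairing. The function names and the mutated let-7a site are hypothetical.

```python
RNA_COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def revcomp_rna(rna):
    """Reverse complement of an RNA sequence, as RNA."""
    return "".join(RNA_COMP[nt] for nt in reversed(rna))


def pairing_profile(mirna, site):
    """Watson-Crick pairing flag for each miRNA nucleotide (index 0 = nt 1).

    site is the target segment written 5'->3', aligned antiparallel and ungapped,
    so miRNA nt k (1-based) faces site[len(site) - k].
    """
    if len(site) != len(mirna):
        raise ValueError("site and miRNA must have the same length")
    n = len(mirna)
    return [(mirna[k - 1], site[n - k]) in WATSON_CRICK for k in range(1, n + 1)]


def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def is_centered_site(mirna, site, min_run=11):
    """Simplified check: imperfect seed (nt 2-7) plus >= min_run pairs within nt 4-15."""
    paired = pairing_profile(mirna, site)
    seed_perfect = all(paired[1:7])          # miRNA nt 2-7
    central_run = longest_run(paired[3:15])  # miRNA nt 4-15
    return (not seed_perfect) and central_run >= min_run


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
perfect = revcomp_rna(LET7A)
s = list(perfect)
s[20] = "G"  # faces miRNA nt 2 (G): G-G mismatch
s[19] = "C"  # faces miRNA nt 3 (A): A-C mismatch
centered = "".join(s)
print(is_centered_site(LET7A, perfect), is_centered_site(LET7A, centered))
```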

3.3.5 Concluding Remarks

Here we have considered a diverse array of computational and experimental methods used for genome-wide identification of miRNA target genes, each with its own strengths and weaknesses. However, high-throughput approaches require formal validation to discriminate direct from indirect targeting, and to identify functional miRNA target sites among the plethora of predictions. Reporter assays can provide such information, although they should be supported by additional validation, notably evidence of miRNA and mRNA co-expression and of changes in target protein levels upon modulation of miRNA expression.

The experimental methods described above highlight the existence of a large number of miRNA binding sites outside the 3′UTR of interacting mRNAs, particularly in the CDS. Although target sites in the CDS do not appear to be as effective in regulating protein output as those present in the 3′UTR [59], their contribution to the fine-tuning of gene expression has been essentially ignored so far. In addition, most of the widely used target prediction algorithms consider only sites located within the 3′UTR, which makes CDS target site analysis even more difficult. Improved computational methods will likely be developed in the future to investigate CDS target sites, together with the recently identified centered sites [76].

New models to explore miRNA function are regularly described; among them, miRNA loss- and gain-of-function approaches are likely to play an increasing role. Such models have proved useful for functional analyses of miRNA activity and target gene identification in nematodes and Drosophila, and to a lesser extent in mouse ([79–83]; see [84] for a review). The mirKO resource [85], recently made available to the scientific community, should aid in deciphering new miRNA functions and targets in the mouse. Likewise, the generation of miRNA/mRNA targeting networks through computational analysis of putative target gene function [86] should provide additional clues for functional miRNA target gene identification.