
8.1 Introduction

In the last decade, gene expression profiling by microarrays (Brown and Botstein 1999), and more recently by RNA-Seq (Mortazavi et al. 2008), has become one of the most important and widely used tools of molecular biology. However, recent studies have shown that mRNA levels correlate only imperfectly with protein levels (Vogel et al. 2010), and that regulation at the level of translation and of protein stability plays a very important role in determining the final outcome of gene expression (Sonnenberg and Hinnebusch 2009). Ribosome profiling (also called Ribo-Seq), i.e. next-generation sequencing of mRNA fragments protected by the translating ribosome, was pioneered in the Weissman lab in 2009 (Ingolia et al. 2009). It is a method that closes some of the gap between the mRNA molecule and the protein. Since 2009, ribosome profiling has been used to shed light on many open questions in several different species (Table 8.1), from the mechanisms behind miRNA regulation (Bazzini et al. 2012) to the experimental determination of translation initiation sites (Ingolia et al. 2011). Perhaps surprisingly, and most probably because of the demanding and labour-intensive protocol behind ribosome profiling, only 56 published studies have presented new ribosome profiling datasets since the first publication. This means that although already 6 years old, ribosome profiling is still very much in the development phase: although a detailed protocol has been published (Ingolia et al. 2012), individual procedures, such as the use of cycloheximide for translation inhibition, have recently come under intense scrutiny. In the following pages, we present the different ways in which ribosome profiling has been put to use, different biological applications of the method and the current state-of-the-art experimental guidelines, with special attention to alternative protocols and still open questions.

Table 8.1 Studies that provided new ribosome profiling datasets from 2009 to 2014

8.2 Applications

Different applications of ribosome profiling have recently been reviewed (Ingolia 2014). In this section, we report on all studies (identified by using the search terms ribosome profiling or Ribo-Seq in Web of Knowledge) that generated ribosome profiling data up to December 2014, together with information on the species, main application and sequencing platform (Table 8.1). Many exciting specific discoveries have been made with ribosome profiling in the last 5 years, and it is beyond the scope of this chapter to name all of them; however, some very general conclusions can be drawn from the studies as a whole. Perhaps most important, the studies have demonstrated that global and specific regulation of gene expression at the translational level is ubiquitous across biological processes, from development to defence against oxidative stress. The mechanisms behind specific regulation most likely involve sequence features in the 5′- and 3′-UTRs of individual transcripts that are subject to different translation initiation regimes, but more research is needed before firmer conclusions can be drawn. A second important conclusion is that translation often involves initiation from alternative initiation codons on single transcripts. Third, apparently translated RNAs map to surprising regions of the genome, such as 5′-UTRs or noncoding RNAs. It is reasonable to assume that ribosome profiling will significantly increase the number of discovered peptides and proteins in the future.

In the abovementioned studies, ribosome profiling has generally been used in three different ways: (1) identification of translated RNA regions, (2) calculation of single transcript and global translation efficiency as a measure of protein synthesis, and (3) comparing ribosome occupancy along single transcripts and along the transcriptome. Each of these takes advantage of different ribosome profile properties and is described in more detail below.

8.2.1 Identification of Translated Regions

Traditionally, eukaryotic protein-coding regions were identified based on cDNA sequence data generated from known transcripts or from peptide sequences, and the longest possible ORF in a transcript was normally assumed to be the coding region (CDS). Today, although a combination of ab initio, transcriptomic, comparative genomic and machine learning approaches has increased the accuracy of gene prediction above 95 %, the prediction of coding regions still lags behind (Yip et al. 2013). Ribosome profiling provides a very promising alternative to the current state of the art (Ingolia et al. 2011) by (1) assuming that ribosome-protected regions of the mRNA are also translated and (2) taking advantage of the near-nucleotide precision of ribosome profiling: since ribosome-protected fragments are of quite uniform size, it is possible to assign the position of the ribosomal A site to a particular nucleotide, or at least to a particular codon.
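Because footprint sizes are nearly uniform, the A site can be placed at a fixed, length-dependent offset from the 5′ end of each read. The following Python sketch shows the idea under several assumptions. Reads are taken to be already aligned to the plus strand of a transcript and summarized as (5′ position, length) pairs. The offsets in the hypothetical `A_SITE_OFFSETS` table are placeholders; in practice they would be calibrated for each dataset, for example from reads around annotated start codons.

```python
from collections import Counter

# Hypothetical length-specific offsets: distance (nt) from the footprint 5' end
# to the first nucleotide of the A-site codon. Placeholder values only; they
# must be calibrated for each dataset.
A_SITE_OFFSETS = {28: 15, 29: 15, 30: 16}


def a_site_positions(reads, offsets=A_SITE_OFFSETS):
    """Count A-site nucleotide positions from (5' position, length) read tuples.

    Assumes plus-strand alignments; reads whose length has no calibrated
    offset are skipped rather than guessed.
    """
    counts = Counter()
    for start, length in reads:
        offset = offsets.get(length)
        if offset is None:
            continue
        counts[start + offset] += 1
    return counts


def codon_index(a_site_pos, cds_start):
    """0-based codon number within the CDS (cds_start = first nt of the start codon)."""
    return (a_site_pos - cds_start) // 3
```

With reads `[(100, 28), (100, 29), (101, 30), (50, 25)]`, the 25-nt read is dropped because no offset is defined for it, and the remaining reads pile up at nucleotides 115 and 117.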

One of the strategies used by several groups to identify coding regions (see Fig. 8.1 for a schematic of all strategies) was to detect all translation initiation sites (TISs) by applying a translation initiation inhibitor, such as harringtonine (Ingolia et al. 2011) or lactimidomycin (Lee et al. 2012), before sequencing ribosome-protected fragments. The result is a very sparse ribosome coverage, which is assumed to coincide with translation initiation sites. To further reduce the number of false positives, machine learning methods are used to recognize patterns of ribosome coverage similar to the pattern at known initiation sites. All studies that have used this strategy so far discovered a surprisingly high number of translation initiation sites in 5′-UTRs and in noncoding RNAs, leading to the hypothesis that current annotation misses a large part of the translated transcriptome (Ingolia et al. 2011). This hypothesis proved very controversial, and many subsequent studies have tried to confirm or refute it using alternative strategies.
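The machine learning step differs between studies and is not reproduced here. As a simpler, rule-based illustration of how candidate TISs can be read from a sparse initiation-inhibitor profile, the sketch below calls a position when three conditions hold. Its count is at least `min_reads`. It is the local maximum within `window` nt. The codon starting there is AUG or a near-cognate codon (one mismatch from AUG). The input counts are assumed to give P-site positions of harringtonine-arrested ribosomes. The thresholds and the near-cognate rule are illustrative assumptions, not the criteria of any cited study.

```python
def near_cognate(codon):
    """True for ATG or any codon that differs from ATG at exactly one position."""
    return len(codon) == 3 and sum(a != b for a, b in zip(codon, "ATG")) <= 1


def call_tis(seq, p_site_counts, min_reads=10, window=3):
    """Return candidate TIS positions (0-based) from initiation-inhibitor P-site counts.

    seq: transcript sequence (DNA alphabet); p_site_counts: position -> read count.
    min_reads and window are hypothetical thresholds.
    """
    candidates = []
    for pos in range(len(seq) - 2):
        count = p_site_counts.get(pos, 0)
        if count < min_reads:
            continue
        # Keep only peaks: no position within +/- window has a higher count.
        local_max = max(p_site_counts.get(p, 0)
                        for p in range(pos - window, pos + window + 1))
        if count == local_max and near_cognate(seq[pos:pos + 3]):
            candidates.append(pos)
    return candidates
```

On the toy transcript `CCATGGCTGACTTGAAA`, peaks at positions 2 (ATG) and 11 (the near-cognate TTG) are both reported. A peak on ACT would be rejected because that codon differs from ATG at two positions.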

Fig. 8.1

Strategies for detecting translated mRNA regions based on ribosome profiling. (a) Typical ribosome profiles obtained with or without translation inhibition by harringtonine. The harringtonine profile is high over the putative TIS, while the no-drug profile is high throughout the translated region. Segmentation of the profile identified two separate translated regions, both starting at the same TIS: a shorter and a longer peptide appear to be produced from this genomic region, as would be the case for selenoprotein translation (Zupanic et al. 2014). (b) If the ribosome density over the subcodon positions does not follow the standard subcodon pattern (usually high, low, low), translation is questionable. (c) Distribution of RNA fragment lengths in a protein-coding region and in an RNA that is protected not by ribosomes but by another complex, e.g. telomerase (Ingolia et al. 2014)

One of the arguments against prevalent translation of UTRs and noncoding RNAs was that although the discovered TISs do show translation initiation, initiation does not necessarily lead to elongation. In one study, the predicted TISs were compared to regions predicted to be translated by a segmentation algorithm, which identified genomic regions with uniform ribosome coverage, indicative of uninterrupted translation (Zupanic et al. 2014). The study showed that less than 1 % of the identified alternative TISs initiated robust translation. The segmentation method was also able to detect alternative initiation in cases where more than one TIS is used for a given transcript.
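The idea of segmenting a profile into stretches of uniform coverage can be illustrated with generic binary segmentation. This is a swapped-in textbook technique, not the algorithm used by Zupanic et al. (2014). At each step, the profile is split at the point that most reduces the total sum of squared deviations from the segment means. Recursion stops when the reduction falls below `min_gain`, a hypothetical tuning parameter. The input is assumed to be per-codon (or per-nucleotide) ribosome coverage along a transcript.

```python
def sse(values):
    """Sum of squared deviations of values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def binary_segmentation(profile, min_gain, min_len=3, offset=0):
    """Split a coverage profile into segments of roughly uniform density.

    Returns half-open (start, end) intervals in the original coordinates.
    min_gain and min_len are hypothetical tuning parameters.
    """
    n = len(profile)
    total = sse(profile)
    best_gain, best_k = 0.0, None
    for k in range(min_len, n - min_len + 1):
        gain = total - sse(profile[:k]) - sse(profile[k:])
        if gain > best_gain:
            best_gain, best_k = gain, k
    if best_k is None or best_gain < min_gain:
        return [(offset, offset + n)]
    left = binary_segmentation(profile[:best_k], min_gain, min_len, offset)
    right = binary_segmentation(profile[best_k:], min_gain, min_len, offset + best_k)
    return left + right
```

For a profile with no coverage, then a block of constant coverage, then no coverage again, the sketch returns three segments. Only the middle segment would be considered translated. An O(n²) scan per split keeps the code short. Real transcripts would call for a faster changepoint method.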

Other strategies have also been developed. In one, the size distribution of the ribosome fragments aligned to the putative translated region is compared to a standard distribution of fragment sizes, and significant deviations from the standard are deemed artefacts not connected with translation (Ingolia et al. 2014). To profile only actively translating ribosome complexes, Poly-Ribo-Seq was developed, in which polysomes (actively translated RNA-ribosome complexes) are biochemically purified prior to ribosome footprinting (Aspden et al. 2014). Another strategy is to analyse the nucleotide periodicity of ribosome profiles (the first nucleotide position in a codon has a higher ribosome density than the second and third); broken periodicity points to artefacts or to a possible frameshift during translation (Michel et al. 2012). Other strategies for defining coding regions include searching for a stop codon after the putative TIS (Zupanic et al. 2014; Albert et al. 2014; Howard et al. 2013; Guttman et al. 2013), which enables detection of premature termination during translation, and confirming the putative translated peptide sequences by mass spectrometry (Schrader et al. 2014; Smith et al. 2014; Menschaert et al. 2013). There has so far been no standardized comparison of the different strategies using common or comparable datasets, so it is currently not clear whether any single strategy is superior or whether a combination would provide the best result.
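Three of these checks can be sketched compactly:

- the fraction of reads in each reading frame (periodicity);
- the end of the ORF at the first in-frame stop codon after a putative TIS;
- a distance between a region's read-length histogram and a reference histogram.

The distance is half the L1 difference between the normalized histograms (0 for identical, 1 for non-overlapping). It is similar in spirit to the fragment-length comparison of Ingolia et al. (2014), but it is not claimed to reproduce their exact score. Inputs are assumed to be P-site position counts and length-to-count dictionaries. Any cut-offs applied to these numbers would have to be chosen per dataset.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_fractions(p_site_counts, start, end):
    """Fraction of P-site reads in frames 0, 1 and 2 of the region [start, end)."""
    frames = [0, 0, 0]
    for pos, n in p_site_counts.items():
        if start <= pos < end:
            frames[(pos - start) % 3] += n
    total = sum(frames)
    return [f / total for f in frames] if total else [0.0, 0.0, 0.0]


def orf_end(seq, tis):
    """End (exclusive) of the ORF starting at tis, just past the first in-frame
    stop codon, or None if the reading frame runs off the end of seq."""
    for pos in range(tis, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3
    return None


def length_distance(lengths, reference):
    """Half the L1 distance between two read-length histograms (length -> count)."""
    total_a = sum(lengths.values())
    total_b = sum(reference.values())
    keys = set(lengths) | set(reference)
    return 0.5 * sum(abs(lengths.get(k, 0) / total_a - reference.get(k, 0) / total_b)
                     for k in keys)
```

A well-translated CDS would be expected to show a dominant frame 0 fraction (for example 0.8, 0.1, 0.1). A roughly flat split across the three frames would fit the "questionable translation" case in Fig. 8.1b.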

8.2.2 Translational Efficiency

Biological systems react to perturbation by employing appropriate regulatory pathways. In most cases, the regulation consists of changes in gene expression; however, these changes occur at both the transcriptional and the translational level. To differentiate regulation at the translational level from regulation at the transcriptional level, a measure called translational efficiency (TE) was developed (Ingolia et al. 2009; Guo et al. 2010):

$$ \mathrm{TE}=\frac{C^{\prime}/\left(N^{\prime}L^{\prime}\right)}{C/\left(NL\right)} $$

where C′ is the number of ribosome profiling reads aligned to the coding region of an individual gene, N′ is the total number of ribosome profiling reads aligned to all coding regions, L′ is the length of the coding region of the gene, C is the number of RNA-Seq reads aligned to the gene's transcript, N is the total number of RNA-Seq reads aligned to all transcripts and L is the length of the transcript. TE can only be calculated if RNA-Seq and ribosome profiling were both performed on samples taken from the same source. At low counts, the TE metric is associated with a large error; therefore, genes with few aligned reads (usually, fewer than an average of 1 read per nucleotide; Guo et al. 2010) are disregarded in the analysis. As defined above, TE does not account for error due to alternative splicing of individual genes or alternative protein-coding regions on individual transcripts; however, if RNA-Seq and ribosome profiling are also used to estimate these two events (Zupanic et al. 2014), TE can easily be adjusted.
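A direct transcription of the TE formula into Python, with the low-coverage filter applied, might look like the sketch below. It makes two assumptions. First, the input dictionaries cover all coding regions and transcripts, so that their sums give N′ and N. Second, a gene is kept only if both libraries reach the `min_reads_per_nt` threshold; the text does not specify which library the cut-off applies to. Gene names and counts in the usage example are made up.

```python
def translational_efficiency(ribo_counts, rna_counts, cds_len, tx_len,
                             min_reads_per_nt=1.0):
    """TE per gene: (C' / (N' L')) / (C / (N L)).

    ribo_counts: gene -> ribosome profiling reads on the CDS (C')
    rna_counts:  gene -> RNA-Seq reads on the transcript (C)
    cds_len:     gene -> CDS length in nt (L')
    tx_len:      gene -> transcript length in nt (L)
    Assumes the dictionaries cover all genes, so their totals are N' and N.
    """
    n_ribo = sum(ribo_counts.values())  # N'
    n_rna = sum(rna_counts.values())    # N
    te = {}
    for gene in ribo_counts.keys() & rna_counts.keys():
        c_ribo, c_rna = ribo_counts[gene], rna_counts[gene]
        # Assumed low-coverage filter: both libraries need >= min_reads_per_nt.
        if (c_ribo / cds_len[gene] < min_reads_per_nt
                or c_rna / tx_len[gene] < min_reads_per_nt):
            continue
        ribo_density = c_ribo / (n_ribo * cds_len[gene])
        rna_density = c_rna / (n_rna * tx_len[gene])
        te[gene] = ribo_density / rna_density
    return te
```

For example, with equal lengths and equal RNA-Seq counts, a gene with twice the ribosome reads of another receives twice its TE. A gene averaging 0.05 ribosome reads per nucleotide is excluded.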

Although most studies performed so far use the above definition of translational efficiency, it lacks statistical robustness. Robustness can be improved by a linear modelling approach that leverages both RNA-Seq and ribosome profiles; the method is available as an R package (Larsson et al. 2011). Another, more recently published approach, called Babel, relies on an errors-in-variables regression model to estimate unexpected patterns in ribosome occupancy, and on Fisher's exact test to calculate significance levels (Olshen et al. 2013).

The outcome of a translational efficiency study is a list of genes that are differentially regulated at the translational level. Analogously to RNA-Seq, this list can be used to determine pathways and processes regulated at the translational level, or to identify sequence features shared by groups of genes and thereby establish the mechanisms behind their differential translation (Thoreen et al. 2012; Hsieh et al. 2012). Another option is to use ribosome profiling datasets as estimates of protein production rates and perform downstream analysis on these alone (Li et al. 2014a).

8.2.3 Ribosome Speed

A number of studies have focused not on detecting translated regions or translational efficiency, but on using the nucleotide precision of ribosome profiling to understand what controls ribosome speed along a transcript (Ingolia et al. 2011; Gardin et al. 2014; Stadler and Fire 2011; Pop et al. 2014; Li and Weissman 2012; Charneski and Hurst 2013; Artieri and Fraser 2014; Dana and Tuller 2012, 2014; Shah et al. 2013). The assumption behind this is that ribosomes spend more time on slower codons. There is therefore a higher probability of finding a ribosome on these codons, and the ribosome density on them will be higher than on their faster counterparts.
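Under this assumption, per-codon ribosome density can be summarized for each codon type. The sketch below does this for a single CDS. It assumes A-site read counts per codon have already been computed. Each count is divided by the gene's mean count so that genes with different expression levels become comparable. Optionally, the first and last `trim` codons are dropped, which is one of the bias-handling choices different studies have made. The `trim=15` default is a hypothetical value, and pooling the per-gene results across the transcriptome is left out.

```python
from collections import defaultdict


def codon_occupancy(cds_seq, codon_counts, trim=15):
    """Mean normalized A-site occupancy for each codon type in one CDS.

    codon_counts[i] is the number of A-site reads on codon i of the CDS.
    The first and last `trim` codons (hypothetical default) are ignored.
    """
    n_codons = len(cds_seq) // 3
    kept = range(trim, n_codons - trim)
    if len(kept) == 0:
        return {}
    mean = sum(codon_counts[i] for i in kept) / len(kept)
    if mean == 0:
        return {}
    per_codon = defaultdict(list)
    for i in kept:
        # Density relative to the gene average: > 1 suggests slower decoding.
        per_codon[cds_seq[3 * i:3 * i + 3]].append(codon_counts[i] / mean)
    return {codon: sum(v) / len(v) for codon, v in per_codon.items()}
```

In the toy CDS `AAAGGGAAAGGG` with counts `[1, 3, 1, 3]` and no trimming, GGG receives a normalized occupancy of 1.5 and AAA of 0.5. These values would be read as GGG being decoded more slowly, provided the biases discussed below have been addressed.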

So far, studies have come to very different conclusions, and it is not clear whether these differences depend on the species studied or are due to different analysis methods. Heterogeneity in tRNA availability across the tissues and cell types used in different experiments is also likely to contribute to the observed biases (Dittmar et al. 2006). Although all studies have used a similar methodology, there is as yet no consensus on how to account for the biases inherent to ribosome profiling (see later sections). Some studies have excluded regions at the beginning and end of coding regions from the analysis, others have included these regions but applied normalization, and still others have not accounted for bias at all.

In short, some studies have found a strong effect of codon bias on elongation speed (Pop et al. 2014). Others have found effects of tRNA availability (Dana and Tuller 2014), of positively charged amino acids (Charneski and Hurst 2013), strong control exerted by proline alone (Artieri and Fraser 2014), effects of specific stalling sequences (Li and Weissman 2012) or even none of the above (Ingolia et al. 2011). A systematic evaluation of a large number of ribosome profiling datasets with the whole set of methodologies is needed to assess the different contributions to elongation speed.

8.3 Experimental Design Guidelines

With regard to sequencing, ribosome profiling differs little from the more traditional RNA-Seq; therefore, the guidelines established in the last decade for RNA-Seq (and described in other sections of this book) should also be valid for ribosome profiling (SEQC/MAQC-III Consortium 2014; Li et al. 2014b). However, no systematic comparison of different sequencing platforms for ribosome profiling is available. The few studies that made any sort of comparison between RNA-Seq and ribosome profiling have found clear correlations between different properties of the two types of datasets (Zupanic et al. 2014; Artieri and Fraser 2014).

There are, however, important differences between ribosome profiling and RNA-Seq in the preparation of samples for sequencing and in the bioinformatic analysis after sequencing. In the following pages, we therefore focus on those parts of the ribosome profiling protocol that differ from their RNA-Seq counterparts. In the description, we mostly follow the ribosome profiling protocol published by Ingolia et al. (2012) and its modifications as proposed by various studies.

8.3.1 Technical and Biological Replicates, Sequencing Depth

A recent large-scale assessment of RNA-Seq accuracy found that technical variation due to sequencing artefacts is low, while biological variation is high (SEQC/MAQC-III Consortium 2014). The study thus emphasized the value of biological replicates for increasing the quality of RNA-Seq studies. Although some studies have required a minimum of only 2 biological replicates, the assessment suggests that substantial improvements can be made with each additional biological replicate, with the largest gains coming from the first 4–5. There is currently no reason to expect that ribosome profiling would have different requirements.

The same study also evaluated the importance of sequencing depth. It concluded that increasing the depth up to 500 million aligned reads still contributes significantly to the number of detected genes, but that further increases yield smaller improvements (SEQC/MAQC-III Consortium 2014). While the first ribosome profiling studies featured lower total read counts, some later studies have already pushed the number of aligned reads towards 100 million, which has significantly increased the number of detected genes (McManus et al. 2014). As with the number of biological replicates, there is currently no reason to give any recommendation that differs from RNA-Seq guidelines.

8.3.2 Wet Lab Protocol

A detailed ribosome profiling protocol for mammalian cells, together with a list of necessary reagents, reagent setup, equipment and equipment setup, has recently been published (Ingolia et al. 2012). In the following sections, we follow the published protocol, but also describe alternatives and point out those parts that have received criticism from the community.

8.3.2.1 Cell Lysis

Following cell culture under conditions relevant to the study, cells must be lysed. The most contentious issue during this first phase of the protocol is the timing and use of translation elongation inhibitors. In the original ribosome profiling study, cycloheximide was used to stabilize the polysomes before lysis (Ingolia et al. 2009). The study found an increase in ribosome density immediately after the TISs and postulated that an elevated 5′ ribosome density (ramp) is a general feature of translation. It was later discovered that different translation inhibitors (e.g. emetine, cycloheximide, anisomycin, chloramphenicol and tetracycline) lead to different size distributions of ribosome-protected fragments and to different shapes of the ramp. The ramp even disappears when no drugs are used (Ingolia et al. 2011; Lareau et al. 2014; Nakahigashi et al. 2014).

Recently, a critical study has cast doubt on some previous discoveries, attributing them to a bias caused by inappropriate cycloheximide use (Gerashchenko and Gladyshev 2014). The authors discovered that the nature of the ramp also depends on the concentration of the translation inhibitor used: the ramp effect becomes smaller at higher concentrations and disappears completely when the concentration is high enough. This concentration dependence was explained by slow passive diffusion of the drug into the cells: at low concentrations, cycloheximide is only partly effective and allows some extra movement of the ribosomes. For this reason, many newer studies avoid translation inhibitors and instead flash-freeze the samples (Oh et al. 2011) to stabilize ribosome positions.

8.3.2.2 Translation Initiation Site Profiling

Translation elongation inhibitors, such as cycloheximide and emetine, can bias the positions of ribosome-protected fragments and should be used with caution. Nothing similar has been reported for translation initiation inhibitors, such as harringtonine (Ingolia et al. 2011) or lactimidomycin (Lee et al. 2012). These inhibitors are applied immediately before cycloheximide treatment and cell lysis, and are used to enrich ribosomes at TISs and thus enable the discovery of new coding regions. While their use might still bias the distribution of ribosomes around the TIS, this was shown not to be critical for TIS identification.

8.3.2.3 Nuclease Footprinting

After lysis, the next step is ribosome footprinting, using endonucleases to digest the unprotected RNA. While most studies use bacterial RNase I for digestion (Ingolia et al. 2009), some recent studies also use micrococcal nuclease (MNase) (Dunn et al. 2013; Nakahigashi et al. 2014). The choice of nuclease depends on the species studied, with most studies of higher eukaryotes so far using RNase I. However, the studies that used both nucleases found no significant differences (Nakahigashi et al. 2014). Recently, a ribosome profiling kit has become available for both yeast and mammalian cells (ARTseq/TruSeq Ribo Profile Kit), and it has been used successfully in a few studies (Bazzini et al. 2014).

While the use of different endonucleases does not seem to affect the results, it has been shown that the lysis buffer can have an important effect. Buffers with lower salt and magnesium content result in narrower ribosome fragment size distributions, and fragments whose termini show more specific positioning relative to the reading frame being decoded (Ingolia et al. 2012). These can then be aligned to the genome with a higher positional resolution, making inference of the coding regions easier. Recent studies have shown that ribosome complexes are not maintained in all buffer compositions, the result being loss of a part of the ribosome footprint population (Aspden et al. 2014).

8.3.2.4 Ribosome and RNA Fragment Recovery

After nuclease digestion, ribosome-RNA complexes need to be isolated from cell lysates. In earlier studies, this was done by sucrose density gradient purification (Ingolia et al. 2009). Because of the need for special equipment and methodological difficulties, this was later replaced by sucrose cushion sedimentation (Ingolia et al. 2012). This involves layering the lysate on top of a 1 M sucrose cushion in an ultracentrifuge tube, followed by centrifugation to pellet the ribosomes.

Alternative methods for ribosome recovery include translating ribosome affinity purification (TRAP) (Heiman et al. 2008; Oh et al. 2011; Becker et al. 2013) and size-exclusion chromatography (Bazzini et al. 2014). TRAP takes advantage of genetically modified, epitope-tagged ribosomal proteins and affinity chromatography with highly specific antibodies. Size-exclusion spin column chromatography, on the other hand, separates the ribosome-RNA complexes from the other lysate contents purely on the basis of size. The speed and convenience of size-exclusion chromatography could very well make it the preferred method for ribosome recovery in the future; so far, however, it has not been used in many studies.

After recovery of ribosome-RNA complexes, the ribosomes need to be removed from the RNA fragments, which is usually done with one of the widely available RNA purification kits, such as the miRNeasy kit (Ingolia et al. 2012). Care must be taken to avoid any ribonuclease contamination from this point on, as it will lead to digestion of the RNA fragments. Finally, RNA fragments of 26 to 34 nt for mammalian cells (Ingolia et al. 2012), or shorter fragments for prokaryotes (Li et al. 2014a), are separated from the rest by gel electrophoresis followed by gel extraction. Recently, it has been shown, at least in E. coli, that a larger range of mRNA footprint sizes can also be used without significantly affecting the final results. Moreover, a recent study in yeast showed that, in the absence of cycloheximide, there are two different populations of ribosome-protected fragments: one of 28–30 nt and a shorter one of 20–22 nt (Lareau et al. 2014). In contrast to cycloheximide, anisomycin used as a translation inhibitor preserved the 20–22 nt fragments, indicating that the ribosome-RNA complex can exist in two different configurations. It therefore seems best to define the size range of included RNA fragments according to the translation inhibitor used. If no inhibitor is used, a wider fragment size distribution should be taken for further analysis.
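When the same size selection is also applied in silico after sequencing, the inhibitor-dependent length windows can be kept in one table. The windows below are illustrative and use only the values quoted in the paragraph above. The keys (organism, inhibitor) and the function name are hypothetical and should be re-checked against each dataset's observed length distribution.

```python
# Illustrative footprint-length windows (nt, inclusive), taken from the values
# quoted in the text; hypothetical keys, to be re-checked for each dataset.
SIZE_WINDOWS = {
    ("mammalian", "cycloheximide"): [(26, 34)],   # Ingolia et al. (2012) protocol
    ("yeast", "none"): [(20, 22), (28, 30)],      # two populations (Lareau et al. 2014)
}


def select_footprints(read_lengths, condition):
    """Keep read lengths that fall inside any window defined for `condition`."""
    windows = SIZE_WINDOWS[condition]
    return [n for n in read_lengths if any(lo <= n <= hi for lo, hi in windows)]
```

Keeping both yeast windows for drug-free samples retains the short 20–22 nt population, which a single 28–30 nt window would discard.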

8.3.2.5 Library Preparation

Linker Ligation

Since most studies so far have used Illumina platforms for sequencing, linker ligation is mostly performed according to Illumina's prescribed protocols, which include the addition of a poly(A) tail to each sequence. Alternatively, optimized RNA ligation of a preadenylated linker can be used to achieve similar results (Ingolia et al. 2012). Ligation is followed by reverse transcription, polyacrylamide gel electrophoresis and circularization of the reverse transcription products to obtain the cDNA molecules used in the following procedures.

Barcoding

Following circularization, barcode sequences can optionally be added for each sample (multiplexing) (Ingolia et al. 2012; Duncan and Mata 2014), followed by several cycles of PCR amplification. The amplification reactions are then either purified by magnetic bead-based methods or loaded onto nondenaturing polyacrylamide gels and separated by electrophoresis, after which the amplified PCR product is excised. The latter step is now widely available as an automated process via Pippin Prep, E-Gel and similar products. The libraries thus generated are finally characterized using one or more methods, such as qPCR, Bioanalyzer and TapeStation, to ensure library quality and concentration before sequencing.

rRNA Depletion

At this point, cDNA molecules derived from rRNA still represent a significant fraction of the sample. In most studies, a few (species-specific) rRNA molecules turned out to be responsible for the bulk of the contamination, so it was possible to remove most of it by targeting these few specific molecules. This was mostly done by hybridization to biotinylated sense-strand oligonucleotides, followed by removal of the duplexes through streptavidin affinity (Ingolia et al. 2012). Alternatively, more general removal of rRNA with rRNA removal kits before the library preparation step has also been used with good results.

8.3.2.6 Sequencing

All ribosome profiling studies conducted so far, with the exception of one (Reid and Nicchitta 2012), have used Illumina platforms (GAII or HiSeq2000) for sequencing, with the same basic protocol as in RNA-Seq (Ingolia et al. 2012). The output of an Illumina sequencing run is a FASTQ file, which contains both the sequences and the quality scores of all sequenced reads. It is the basis for the computational analysis that follows sequencing.

8.3.3 Computational Analysis

Although it takes considerable time and effort to get from the initial samples to the sequences, the sequences are of little value without proper interpretation. Computational analysis first aligns the sequenced reads to a genome and then evaluates whether the numbers of reads aligned to particular genomic regions are biologically meaningful.

8.3.3.1 Alignment

The alignment of the reads to the genome is also no different from that for RNA-Seq. First, the sequencing data are pre-processed by discarding low-quality reads, removing the 3′ linker sequence and removing the first nucleotide from the 5′ end of each read. This can be done, e.g., with the FASTX-Toolkit. Note that although the outputs of different sequencing platforms are not all the same, the FASTX-Toolkit and most similar tools can read most formats if these are correctly specified. The trimmed sequences are then first aligned to an rRNA reference using any of the available aligners (e.g. Bowtie, Subread or BWA, the Burrows-Wheeler Aligner). The non-rRNA reads are then aligned to the genome using a splice-aware aligner (e.g. TopHat2).
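The three pre-processing steps can be sketched in plain Python, as a stand-in for a dedicated tool rather than a description of how the FASTX-Toolkit works internally. The sketch makes these assumptions:

- The linker sequence `LINKER` is hypothetical and must be replaced with the one actually ligated.
- Qualities use Phred+33 encoding.
- The linker is located by an exact match to its first `match_len` bases, which ignores sequencing errors and partial linkers at read ends.
- The quality and length cut-offs are illustrative values.

```python
LINKER = "CTGTAGGCACCATCAAT"  # assumed 3' linker; replace with the sequence actually used


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred quality of a read (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def preprocess(record, linker=LINKER, min_quality=20, match_len=10,
               min_len=20, max_len=40):
    """Quality-filter a read, clip the 3' linker and drop the first 5' nucleotide.

    Returns the trimmed (header, sequence, quality), or None if the read is
    discarded. All thresholds are illustrative assumptions.
    """
    header, seq, qual = record
    if not qual or mean_phred(qual) < min_quality:
        return None
    cut = seq.find(linker[:match_len])  # simplification: exact match only
    if cut == -1:
        return None
    seq, qual = seq[1:cut], qual[1:cut]  # remove 5' nucleotide and linker
    if not min_len <= len(seq) <= max_len:
        return None
    return header, seq, qual
```

The surviving reads would then be written back to FASTQ and passed first to the rRNA alignment and then to the genome alignment. Discarding reads without a detectable linker is a conservative design choice. It reflects the expectation that a short footprint sequenced with a longer read length should always run into the linker.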

Because ribosome-protected fragments are quite short, the alignment is not always unambiguous, i.e. many reads align to more than one genomic segment. Different studies have applied different strategies to remove the bias potentially arising from such multiple alignments. Guo et al. (2010) simply discarded all reads with multiple alignments, whereas Ingolia et al. (2011) kept all alignments, thereby counting a single read multiple times. Dana and Tuller (2012) suggested an iterative approach. First, only uniquely aligned reads are kept. Then, in a second round, each multiply aligned read is assessed for the presence of neighbouring reads from the first round; reads with neighbours are kept and the rest are discarded. Since reads with no neighbours are excluded from the analysis at later stages in any case, the iterative procedure should introduce the least bias and is recommended. An iterative approach is usually implemented by running the alignment algorithms several times with different input files. The output of the alignment is either a SAM file (a human-readable text format) or a BAM file (its compressed binary equivalent), which is the basis for all further computational analysis.
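The second, rescue round of the iterative approach can be sketched on already-parsed alignments, outside the aligner. This is a minimal illustration rather than the published implementation, and it makes three assumptions:

- Alignments are reduced to (chromosome, position) pairs.
- "Neighbouring" means a uniquely aligned read within `max_dist` nt, a hypothetical parameter.
- A multiply aligned read is kept only if exactly one of its candidate loci has such a neighbour. Reads supported at several loci are treated as still ambiguous and discarded.

```python
import bisect


def rescue_multimappers(unique, multi, max_dist=30):
    """Assign multiply aligned reads to loci supported by uniquely aligned reads.

    unique: list of (chrom, pos) for uniquely aligned reads (first round).
    multi:  dict read_id -> list of candidate (chrom, pos) alignments.
    Returns dict read_id -> chosen (chrom, pos); unsupported or ambiguous
    reads are dropped.
    """
    by_chrom = {}
    for chrom, pos in unique:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    def has_neighbour(chrom, pos):
        positions = by_chrom.get(chrom, [])
        # First unique read at or after pos - max_dist; check it is close enough.
        i = bisect.bisect_left(positions, pos - max_dist)
        return i < len(positions) and positions[i] <= pos + max_dist

    rescued = {}
    for read_id, hits in multi.items():
        supported = [hit for hit in hits if has_neighbour(*hit)]
        if len(supported) == 1:  # assumption: only unambiguous support is accepted
            rescued[read_id] = supported[0]
    return rescued
```

Rescued reads could be added to the unique set and the function run again, which turns the two-round scheme into a fully iterative one.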

Another general problem, shared with RNA-Seq, is assigning reads to the correct transcript for alternatively spliced genes. Although most ribosome profiling studies have not used alternative splicing detection, alternative splicing has been shown to bias the final analysis (Zupanic et al. 2014). We therefore recommend using an algorithm for detecting alternative splicing, such as rMATS (Shen et al. 2014), during the analysis.

8.3.3.2 Biases

Since both ribosome profiling and RNA-Seq are based on the same sequencing procedures, it is reasonable to assume that they also suffer from the same biases. This was demonstrated by a recent study that used RNA-Seq profiles to normalize ribosome profiles. The study showed that the normalized average profiles represent our current understanding of translation better than the non-normalized profiles. Ribosome density was quite smooth and decreased slowly from the 5′ to the 3′ region, as would be expected if occasional ribosome drop-off occurs (Zupanic et al. 2014). Another study took a similar approach and found that normalization with RNA-Seq significantly changes earlier conclusions about ribosome speed, implicating proline as an important ribosome pausing factor (Artieri and Fraser 2014). In neither study did normalization completely remove the increased ribosome density observed in the first few codons of coding regions when translation inhibitors were used. This bias can be eliminated either by using correction factors for the biased region (Li et al. 2014a) or by simply ignoring the biased regions in the analysis. In any case, bias removal by RNA-Seq normalization and accounting for translation inhibition artefacts are necessary before any further analysis.
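RNA-Seq normalization of an average (metagene) ribosome profile can be sketched as follows. This is a generic illustration, not the exact procedure of Zupanic et al. (2014) or Artieri and Fraser (2014). Ribosome and RNA-Seq coverage are assumed to be given per codon, aligned at the start codon. For each gene, the ratio of ribosome to RNA-Seq coverage is taken at every codon, with a hypothetical `pseudocount` to avoid division by zero. The ratios are rescaled by the gene's mean ratio so that every gene contributes equally, and then averaged across genes. The first `skip` codons can be masked, corresponding to the option of ignoring the inhibitor-affected region.

```python
def normalized_metagene(genes, n_positions, pseudocount=1.0, skip=0):
    """Average RNA-Seq-normalized ribosome profile over genes.

    genes: list of (ribo_profile, rna_profile) per-codon coverage lists,
           aligned at the start codon.
    Returns a list of length n_positions; positions < skip (masked) and
    positions covered by no gene are None.
    """
    sums = [0.0] * n_positions
    counts = [0] * n_positions
    for ribo, rna in genes:
        ratios = [(r + pseudocount) / (m + pseudocount) for r, m in zip(ribo, rna)]
        if not ratios:
            continue
        gene_mean = sum(ratios) / len(ratios)
        for i, value in enumerate(ratios[:n_positions]):
            sums[i] += value / gene_mean  # each gene scaled to mean 1
            counts[i] += 1
    return [None if i < skip or counts[i] == 0 else sums[i] / counts[i]
            for i in range(n_positions)]
```

A smooth, slowly declining curve from this function would be consistent with occasional ribosome drop-off. A residual peak over the first codons would point to the inhibitor artefact discussed above.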

8.3.3.3 Functional Analysis

Once the sequences have been aligned and biases have been dealt with, visualization and functional interpretation of the data can begin. For easy visualization, we recommend the riboSeqR Bioconductor package, which produces a genome-browser-type visualization that can be useful for the analysis of open reading frames (Hardcastle 2014). The most common applications of Ribo-Seq are finding translated regions, detecting changes in translational efficiency after a perturbation, and following ribosome speed across the genome to study codon bias. Regardless of the application, there are currently no community-standard methods, nor widely available computational packages developed specifically for these tasks. Currently, the optimal strategy for a researcher is to study carefully the work performed by others and then test the proposed methods. Most papers have made the algorithms they developed available as supplementary material, and even when this is not the case, the community readily shares its computational resources.

When using Ribo-Seq to determine differentially translated transcripts after a perturbation, it is possible to use the differential expression packages developed for RNA-Seq, such as edgeR and DESeq (Robinson et al. 2010; Anders and Huber 2010). Once a list of differentially expressed genes is obtained, further functional analysis is possible, but this is beyond the scope of this chapter.

8.4 Databases

Currently, most ribosome profiling datasets are deposited in the GEO database in the SRA format (Barrett et al. 2013). However, GWIPS-viz, a database and genome browser specific to ribosome profiling, is under development (Michel et al. 2013). The database currently features some preloaded datasets from GEO, and the developers plan to add options for uploading one's own datasets in the future. In the latest update, they have made available a range of tools to help researchers develop their own workflows for the sequenced data.

Apart from a special section for ribosome profiling data in the E. coli PortEco database (Hu et al. 2014), there are no alternatives for publishing raw ribosome profiling data. However, the results of ribosome profiling analyses have been included in a few other databases. One such option is TISdb, a database of mRNA alternative translation that grew out of studies searching for TISs (Lee et al. 2012; Wan and Qian 2014). Another is HAltORF, a database of alternative out-of-frame open reading frames in human (Vanderperre et al. 2012, 2013).

8.5 Conclusion

Ribosome profiling is emerging as a powerful technique for obtaining a genome-wide snapshot of gene expression and translational control under a given cellular condition. The availability of positional information on ribosome occupancy facilitates the discovery of novel translational control elements. These include alternative initiation at non-canonical start sites, upstream and multiple ORFs, stop codon readthrough (as in selenoprotein translation) and pausing or regulation of elongation. In addition, coupled with RNA-Seq, it is a powerful tool for discovering alternative splicing variants undergoing differential translation and for measuring productive alternative splicing at the translational level. This technique is therefore expected to have a far-reaching impact on many areas of biological investigation.