1 Mass Spectrometry of Modified Ribonucleic Acids

Mass spectrometry (MS) is a powerful and popular analytical platform for the characterization of biomolecules. However, the application of MS to nucleic acids has lagged behind its application to other classes of biomolecules, owing to the simplicity, speed and sensitivity of amplification-driven technologies such as Sanger and next-generation sequencing. Where MS has proven most useful for nucleic acids is in the direct detection of modified nucleosides. In both deoxyribonucleic acids (DNA) and ribonucleic acids (RNA), many nucleosides can be enzymatically or chemically modified. These modifications have important biological functions or outcomes, affecting processes such as RNA editing, RNA stability and protein expression [1–3]. Not surprisingly, techniques and technologies that enable the rapid determination of modified nucleosides remain of ongoing interest. The value of MS as an enabling technology is that it can reveal both the mass and the structure of a modified nucleoside, which many amplification-based approaches cannot determine directly.

A primary focus of our laboratory has been developing MS approaches that enable the rapid and accurate identification of modified nucleosides from RNA. More specifically, one goal has been to create a platform for RNA modification mapping, that is, placing identified modified nucleosides into the correct RNA sequence context. RNA modification mapping by MS is based on hyphenated liquid chromatography tandem MS (LC–MS/MS), although an alternative method using matrix-assisted laser desorption/ionization MS (MALDI-MS) has also been used by us and others.

RNA modification mapping by MS was initially developed by McCloskey and co-workers [4]. The general approach involves two separate experiments. The first experiment provides a census of all the modified nucleosides in the RNA sample of interest by completely digesting the intact RNA into individual nucleosides, which are separated and identified by LC–MS/MS [5]. The second experiment requires that the intact RNA first be digested with a specific nuclease, which generates a mixture of oligonucleotides of varying length. This mixture of digestion products is then also analyzed by LC–MS/MS [6]. Here, the MS/MS step fragments each oligonucleotide by collision-induced dissociation (CID) so that the original sequence can be reconstructed [7]. As noted above, a similar approach can be used with MALDI-MS, as demonstrated by Kirpekar and co-workers when mapping post-transcriptional modifications in ribosomal RNAs (rRNAs) [8].

RNA modification mapping by MS is now facilitated by the availability of known RNA sequences, a product of advances in genomic sequencing technologies. Because these sequences describe the RNA without its modifications, one can readily calculate the molecular weights of unmodified RNAs and of their RNase digestion products using a variety of online tools. Since nearly all RNA modifications increase the mass of the canonical nucleoside, digestion products whose measured mass matches the calculated value can generally be assumed to be unmodified. Experimental strategies therefore focus data analysis primarily on those RNase digestion products whose masses do not match, as a mismatch suggests that the digestion product contains a modification. The interested reader is directed to a number of recent publications that describe RNA modification mapping by MS in more detail [9–12].
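The matching logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a published tool: it assumes RNase T1 cleaves after every G, that the products carry a linear 3′-phosphate, and it ignores the 3′-terminal fragment. The residue masses are standard monoisotopic values; the helper names, the 0.01 Da tolerance and the example sequence are hypothetical. Mass-silent modifications such as pseudouridine would pass this check unnoticed.

```python
import re

# Monoisotopic residue masses (nucleoside monophosphate minus H2O), in Da.
RESIDUE = {"A": 329.05252, "G": 345.04743, "C": 305.04129, "U": 306.02530}
H2O = 18.01056


def rnase_t1_digest(seq):
    """Hypothetical helper: cut 3' of every G; the 3'-terminal fragment is dropped."""
    return re.findall(r"[ACU]*G", seq)


def mass_3p(frag):
    """Neutral monoisotopic mass of a 5'-OH, 3'-linear-phosphate oligonucleotide."""
    return sum(RESIDUE[base] for base in frag) + H2O


def unexplained_masses(observed, seq, tol=0.01):
    """Observed neutral masses that match no predicted unmodified digestion product."""
    predicted = [mass_3p(frag) for frag in rnase_t1_digest(seq)]
    return [m for m in observed if all(abs(m - p) > tol for p in predicted)]


if __name__ == "__main__":
    seq = "CUCAGACUGCCA"  # hypothetical sequence, not a real RNA
    print(rnase_t1_digest(seq))
    print(round(mass_3p("CUCAG"), 3))
    methylated = mass_3p("ACUG") + 14.01565  # +CH2, as for a methylation
    print(unexplained_masses([1608.218, methylated], seq))
```

Only the methylated product is reported, because its mass no longer matches any calculated unmodified digestion product.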

2 Stable Isotope Labeling Methods for RNA Mass Spectrometry

While methods that identify modified nucleosides and map them onto specific sequence locations in an RNA sample are quite powerful, methods that quantify modification levels are needed to better understand the biological significance of these modifications. As in other areas of MS, the field of nucleic acid modifications has turned to stable isotope approaches to improve both the qualitative analysis of modified RNAs and the quantitative measurement of modified nucleosides and nucleic acids. Before discussing phosphate labeling of RNA with 18O-labeled water in detail, it is worth briefly summarizing other approaches used in the field, as they provide a good entry point to the analytical advantages of stable isotopes for identifying modified nucleosides and mapping those modifications onto RNA sequences.

Two styles of labeling can be employed: in vivo or in vitro. In in vivo labeling, the organism of interest is cultured in a medium containing stable isotope-labeled nutrients (e.g., essential amino acids). The organism's normal biochemical pathways incorporate the stable isotope, which provides a specific, traceable marker for identification by MS. The most common form of in vivo labeling in MS is stable isotope labeling by amino acids in cell culture (SILAC) [13], which has found widespread application in proteomics. SILAC relies on adding deuterated leucine (Leu-D3) or 13C-labeled arginine or lysine to the medium for incorporation during protein synthesis [13, 14].

The alternative approach is in vitro labeling. Most often, in vitro methods rely on chemical or enzymatic strategies to incorporate the stable isotope into the biomolecule(s) of interest. Ideally, a label changes the mass of the ion while minimally affecting its ionization efficiency and, when MS is coupled to liquid chromatography, its retention time; labels that leave the original molecule as close to unchanged as possible are therefore preferred.

Bruckl and co-workers demonstrated parallel isotope-based quantification of modified transfer RNA (tRNA) nucleosides [15]. A subset of modified tRNA nucleosides bearing a deuterium-labeled methyl group was prepared and used as internal standards to quantify modified nucleoside levels in tumor cells versus healthy tissue. The peak areas of these standards, spiked at known concentrations, were compared with calibration curves created for each isotopically labeled modified nucleoside. Kellner and co-workers developed a method for the absolute quantification of modified ribonucleosides using biosynthetic isotopomers [16]. When bacteria are fed 13C-labeled glucose, they synthesize stable isotope-labeled modified RNAs. This labeled RNA can be hydrolyzed to nucleosides, and the naturally occurring modified nucleosides generated by the bacterium are then used as internal standards.

A different stable isotope labeling strategy was developed by Dickman and co-workers [17]. Using 15N-labeled medium, they successfully mapped post-transcriptional modifications in bacterial 16S rRNA. The approach requires two samples: one cultured in 14N-labeled medium and the other in 15N-labeled medium. Because the light-to-heavy mass difference of a digestion product reflects the number of nitrogen atoms it contains, this labeling allows the unambiguous identification of base composition in each digestion product, thereby improving the accuracy of RNA modification mapping experiments. While Dickman and co-workers did not explore the quantitative applications of this approach in depth, the strategy clearly pointed to the additional utility of isotope labeling for relative quantification during RNA modification mapping experiments.
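The reasoning behind this base-composition constraint can be illustrated with a short calculation. Cytidine and uridine residues differ by only about 0.984 Da, close to the spacing of isotope peaks, but they carry three and two nitrogen atoms respectively, so their 14N/15N mass shifts differ by a full nitrogen. The sketch below assumes complete 15N incorporation in the heavy sample; the function names and example fragments are hypothetical.

```python
# Nitrogen atoms per nucleotide residue (all located in the nucleobase).
N_ATOMS = {"A": 5, "G": 5, "C": 3, "U": 2}
# Monoisotopic residue masses (NMP minus H2O), in Da.
RESIDUE = {"A": 329.05252, "G": 345.04743, "C": 305.04129, "U": 306.02530}
DELTA_15N = 15.0001089 - 14.0030740  # mass gained per 14N -> 15N substitution, Da


def nitrogen_count(frag):
    """Total nitrogen atoms in an oligonucleotide of the given sequence."""
    return sum(N_ATOMS[base] for base in frag)


def n15_shift(frag):
    """Light-to-heavy mass shift, assuming every nitrogen is 15N in the heavy sample."""
    return nitrogen_count(frag) * DELTA_15N


if __name__ == "__main__":
    for frag in ("CUCAG", "UUCAG"):  # differ by a single C -> U substitution
        print(frag, nitrogen_count(frag), round(n15_shift(frag), 3))
    print(round(RESIDUE["U"] - RESIDUE["C"], 3))  # unlabeled mass difference, Da
```

The two fragments are less than 1 Da apart unlabeled, yet their 15N shifts differ by about 1 Da because one carries 18 nitrogens and the other 17.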

Williamson and co-workers adapted the Dickman methodology to quantify rRNA modification levels [18]. Cells were cultured in minimal medium, either unlabeled or supplemented with 15N ammonium sulfate as the heavy nitrogen source. The heavy-labeled culture served as an internal standard, and known ratios of 15N- and 14N-cultured cells were combined for relative quantification. Methylated nucleosides in rRNA were quantified using medium supplemented with CD3-methionine, while pseudouridine modifications were quantified using cultures supplemented with 5,6-D-uracil.

More recently, Taoka and co-workers developed an alternative strategy for the absolute quantification of post-transcriptional modifications in rRNA [19]. This approach, termed stable isotope-labeled ribonucleic acid internal standard (SILNAS), relies on an internal standard generated by in vitro transcription of the RNA using 13C-labeled nucleoside triphosphates, which yields a uniformly labeled, unmodified copy of the rRNA. After RNase digestion of both the sample of interest and the in vitro transcript internal standard, any LC peak that lacked a co-eluting heavy (internal standard) counterpart indicated a modification in that digestion product. The modified oligonucleotide could be quantified from the ion abundance ratio of the spiked standard to the sample. Building on that approach, we have recently adapted in vitro internal standards for the comparative analysis of tRNA digests [20].

3 18O Labeling in Mass Spectrometry

18O labeling in MS originated in the field of proteomics. In 1951, Sprinson and Rittenberg used 18O to better understand the activity of proteolytic enzymes [21]. This idea was later adapted by Desiderio and Kai for protein sample preparation for MS [22]. By 1983, they were using stable isotope-incorporated peptide internal standards for field desorption MS quantification of peptides in biological tissues [22]. They prepared these internal standards by digesting proteins with trypsin in 18O-labeled water. Because of the trypsin mechanism, digestion under these conditions can label the C-terminus of the peptide with one or two 18O atoms, increasing the mass of the tryptic peptide by 2 or 4 Da (Fig. 1). Unique heavy internal standards could thus be generated and applied to the MS-based identification and quantification of peptides (and proteins).
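The size of the label follows directly from the isotope masses: each 18O that replaces a 16O adds about 2.004 Da, so on the m/z axis the labeled and unlabeled ions of charge z appear about 2.004/z apart. The short sketch below (hypothetical function names) makes that arithmetic explicit for the one- and two-atom cases described for trypsin.

```python
O16 = 15.9949146  # monoisotopic mass of 16O, Da
O18 = 17.9991610  # monoisotopic mass of 18O, Da
DELTA_18O = O18 - O16  # mass gained per incorporated 18O atom, about 2.0042 Da


def labeled_mass(neutral_mass, n_18o):
    """Neutral mass after n_18o oxygen atoms are exchanged for 18O."""
    return neutral_mass + n_18o * DELTA_18O


def mz_spacing(n_18o, charge):
    """Spacing on the m/z axis between unlabeled and labeled ions of the given charge."""
    return n_18o * DELTA_18O / abs(charge)


if __name__ == "__main__":
    for n in (1, 2):
        print(n, round(n * DELTA_18O, 4), [round(mz_spacing(n, z), 4) for z in (1, 2, 3)])
```

The sign of the charge is ignored, so the same function applies to positive-mode peptide ions and negative-mode oligonucleotide ions.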

Fig. 1

Endonuclease mechanism for RNase T1 and other RNases. The RNA oligonucleotide is cleaved and a 2′,3′-cyclic phosphate intermediate is formed. Hydrolysis of the cyclic phosphate by 18O-labeled water from the reaction solvent opens the ring and incorporates the stable isotope label into the final linear 3′-phosphate product. Protease mechanism for trypsin. The protein is cleaved at the C-terminal side of arginine or lysine residues, and 18O from the labeled reaction solvent is incorporated into the newly formed carboxylic acid at the cleavage site

In 2000, Mirgorodskaya and co-workers paired 18O stable isotope labeling of proteins with MALDI-MS for the quantification of protein samples [23]. The following year, Yao and co-workers introduced a shotgun comparative proteomics tool based on 16O versus 18O labeling of trypsin-digested proteins [24]. This comparative labeling approach has even been applied to clinical samples, in which a pooled 18O-labeled reference sample was spiked into patient samples [25]. Unlike in vivo approaches, 18O labeling is cost-effective because labeling is performed only on extracted protein, so little of the stable isotope is wasted. On the other hand, 18O labeling has drawbacks compared with in vivo methods, including a poorer dynamic range in protein identification and a limit of detection in the high femtomole range at best [26].

The use of 18O labeling in MS has not been limited to proteins. In 2013, Hamasaki and co-workers used solid-phase synthesis to incorporate 18O into oligonucleotides to enable the MS-based study of oligonucleotide therapeutics [27]. Because these labeled standards are generated by solid-phase synthesis, they can serve as quantitative standards for a variety of pharmacokinetic and pharmacodynamic studies, in particular for oligonucleotide therapeutics such as small interfering RNAs (siRNAs). The mass label allows the mechanism of action and drug clearance to be studied directly, an advantage over fluorescently tagged siRNAs, which may not behave like the untagged drug because of the structural changes introduced by the fluorescent tag.

4 18O Labeling of RNA—Early Applications for Mass Spectrometry

Learning from the field of proteomics, Beniam Berhane in our lab began investigating the applicability of enzyme-mediated labeling of nucleic acids using 18O-labeled water. The initial studies focused on whether the similarity of enzyme mechanisms between proteases, such as trypsin, and nucleases, such as ribonuclease T1 (RNase T1), would enable a similar labeling method to be used for RNA (Fig. 1) [28]. Once it was found that RNase T1 could be used to incorporate 18O onto the 3′-terminal phosphate of the oligonucleotide digestion product, this approach was exploited to simplify data interpretation in MALDI post-source decay (PSD) analysis of oligonucleotides [28].

Samples were digested in a 50:50 mixture of light (16O) and heavy (18O-labeled) water, so that digestion products bearing a 3′-phosphate appear as characteristic doublets (Fig. 2). These doublets simplify the identification of products for further analysis. The main drawback was the need for twice as much sample, because half of the sample was digested in light water in parallel with the other half digested in the 50:50 mixture. This allowed the two spectra to be compared directly; without the “normal” spectrum, it would be difficult to distinguish the +2 doublet of the 18O-labeled digest from the natural isotope envelope.
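Why a 50:50 digest produces an A + 2 doublet can be shown by adding a light isotope envelope to a copy of itself shifted by one 18O. The sketch works on nominal-mass bins (on most instruments the +2.004 Da 18O shift is not resolved from natural +2 isotope peaks) and assumes every 3′-phosphate product carries exactly one label. The function name and envelope values are placeholders, not data from Fig. 2.

```python
def mix_with_18o(light, frac_18o=0.5, shift=2):
    """Envelope of a product digested in water containing frac_18o of H2(18)O.

    light: relative intensities of M, M+1, M+2, ... for the unlabeled product.
    A fraction frac_18o of the molecules carries one 18O and moves `shift` bins up.
    """
    mixed = [0.0] * (len(light) + shift)
    for i, intensity in enumerate(light):
        mixed[i] += (1.0 - frac_18o) * intensity  # 16O-labeled copy
        mixed[i + shift] += frac_18o * intensity  # 18O-labeled copy
    return mixed


if __name__ == "__main__":
    light = [1.0, 0.75, 0.40, 0.15]  # placeholder envelope, not measured data
    print([round(x, 3) for x in mix_with_18o(light)])
    # A product lacking a 3'-phosphate is not labeled, so its envelope stays `light`.
```

In the mixed envelope the M+2 bin rises above the M bin, which is the doublet signature; comparing against the unlabeled digest shows which products were not labeled.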

Fig. 2

Reproduced with permission from Berhane et al. [28] Copyright 2003

Expanded view of MALDI mass spectral data obtained from the RNase T1 digestion products of E. coli 5S rRNA. a Digestion was done in unlabeled water. Three major ions are detected (m/z 980, 987 and 998). b After digestion in a 50:50 (v/v) mixture of unlabeled and 18O-labeled water, only those oligonucleotide digestion products that contain a 3′-phosphate exhibit the characteristic A + 2 doublet. From this, base compositions for the three ions can be assigned or confirmed.

Once we had determined that certain RNases could enzymatically label terminal phosphates with a single 18O, Zhaojing Meng in the lab turned to developing an approach for quantifying RNase digestion products [29]. This time, samples were digested separately in 18O-labeled and 16O-labeled water. Method development was performed using commercially available Escherichia coli tRNA-Val to determine the effectiveness of this strategy for sample quantification (Fig. 3). The heavy and light digestion products were combined in ratios from 1:10 through 10:1 and analyzed by MALDI-MS, and the averaged ion abundance ratio (heavy:light) was plotted against the prepared sample ratio to generate a calibration curve.
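The calibration step amounts to fitting a line through (prepared ratio, measured ratio) pairs and inverting it to estimate an unknown ratio. The sketch below assumes an unweighted ordinary least-squares fit; the data points are synthetic placeholders spanning 1:10 to 10:1, not values from the study, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ratio_from_curve(measured, slope, intercept):
    """Invert the calibration line to estimate the prepared heavy:light ratio."""
    return (measured - intercept) / slope


if __name__ == "__main__":
    prepared = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]  # heavy:light, 1:10 ... 10:1
    measured = [0.9 * r + 0.05 for r in prepared]    # synthetic, perfectly linear
    slope, intercept = fit_line(prepared, measured)
    print(round(slope, 3), round(intercept, 3))
    print(round(ratio_from_curve(1.85, slope, intercept), 3))
```

Because the prepared ratios span two orders of magnitude, the high-ratio points dominate an unweighted fit; a weighted or log-scale fit may be preferable with real data, a choice this sketch does not make.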

Fig. 3

Reproduced with permission from Meng et al. [29] Copyright 2005

Representative MALDI mass spectrum of RNase T1 digestion products obtained from a tRNA-Val mixture prepared at a heavy-to-light RNA ratio of 2:1. Asterisks denote expected RNase digestion product pairs. Inset: expanded view of the RNase T1 digestion product 5′-CUCAGp-3′, used to quantify tRNA-Val, overlaid with the isotopic distribution calculated for a 2:1 ratio.

This study showed that more accurate results were obtained when the 18O-labeled sample was more abundant than the 16O-labeled sample. When the 16O-labeled sample is more abundant, other natural isotopes in the digestion product (e.g., 13C, 15N) can interfere with accurate determination of the 18O-labeled peak abundance. By ensuring that the more abundant sample is labeled with 18O, the interfering isotope peaks from the 16O-labeled sample are proportionally much smaller than the 18O peak, which minimizes errors in relative quantification. In practice, this means the sample expected to be more abundant should be the one digested in 18O-labeled water. Accurate relative quantification also required generating a calibration curve for each RNase digestion product of interest. To demonstrate the robustness of this approach, a blinded analysis of heavy and light E. coli tRNA-Val mixtures was performed using the previously established calibration curve.
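The asymmetry can be made explicit with a simple two-peak model. If the light sample contributes L·p0 at M and L·p2 at M+2 (its natural M+2 isotope peak), and the heavy sample contributes H·p0 at M+2, then H/L = I(M+2)/I(M) − p2/p0, and the fraction of the M+2 signal that comes from the light sample is (p2/p0)/(p2/p0 + H/L). The sketch assumes complete single-18O labeling and ignores all other overlapping peaks; it illustrates the arithmetic behind the observation and is not the calibration-curve procedure used in the study. The value p2/p0 = 0.25 is a placeholder, roughly the size expected for a pentanucleotide such as CUCAGp; the function names are hypothetical.

```python
def heavy_to_light(i_m, i_m2, p2_over_p0):
    """H/L from the M and M+2 intensities, correcting for the light M+2 isotope peak.

    Model: I(M) = L * p0 and I(M+2) = L * p2 + H * p0.
    """
    return i_m2 / i_m - p2_over_p0


def light_interference(h_to_l, p2_over_p0):
    """Fraction of the M+2 signal that comes from the light sample's natural isotopes."""
    return p2_over_p0 / (p2_over_p0 + h_to_l)


if __name__ == "__main__":
    ratio = 0.25  # placeholder natural M+2/M ratio for the light product
    for h_to_l in (10.0, 1.0, 0.1):
        print(h_to_l, round(light_interference(h_to_l, ratio), 3))
```

With heavy tenfold in excess, only about 2% of the M+2 signal is interference; with light tenfold in excess, about 71% is, so any error in p2/p0 propagates strongly into the estimated ratio.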

One of the more significant limitations of enzyme-mediated labeling of RNA is that it requires complete enzymatic digestion of the RNA. Because the mechanism involves a cyclic phosphate intermediate (Fig. 1), digestion products that remain as cyclic phosphates (incomplete digestion) will not be labeled, significantly reducing the utility of this approach [28]. To circumvent this issue, higher RNase amounts and longer incubation at the optimal temperature have been found to reduce the proportion of cyclic phosphate digestion products [28, 30]. Another experimental challenge is the formation of sodium or potassium adducts of the RNase digestion products. These adducts can interfere with accurate detection and quantification, so careful sample preparation and desalting are important for accurate relative quantification by MALDI-MS.

Having demonstrated 18O-based quantification of individual RNA samples, we next turned to applying this approach to more complex mixtures of RNAs. Our specific interest was to characterize the total pool of tRNAs in a cell, which would obviate the need to purify tRNAs individually. Analyzing total tRNA pools would not only decrease sample preparation time but also yield more information from a single MS experiment. When such a strategy is applied to the total cellular pool of tRNAs, information regarding codon usage and potential codon bias can be obtained more directly [31].

The initial method for identifying individual tRNAs within an unseparated mixture of total tRNAs was presented by Mahmud Hossain, who described the signature endonuclease digestion product (SDP) concept for tRNA identification by MALDI-MS [32]. In short, an in silico digest of known tRNA sequences (e.g., all E. coli tRNAs) reveals that each individual tRNA has at least one digestion product that is unique in both mass and sequence. These unique, or signature, digestion products can therefore serve as proxies to confirm the presence of a specific tRNA within the sample mixture.
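The in silico part of the SDP concept can be sketched as follows: digest each known sequence, then keep the products whose mass occurs in no other sequence's digest (a unique mass also rules out sequence isomers). This is a minimal illustration assuming complete RNase T1 cleavage after G, ignoring modifications and the 3′-terminal fragment; the two "tRNA" sequences are hypothetical fragments, not real tRNAs, and the helper names are assumptions.

```python
import re

# Monoisotopic residue masses (NMP minus H2O) and water, in Da.
RESIDUE = {"A": 329.05252, "G": 345.04743, "C": 305.04129, "U": 306.02530}
H2O = 18.01056


def t1_products(seq):
    """Products ending in G (3'-phosphate); the 3'-terminal fragment is ignored."""
    return set(re.findall(r"[ACU]*G", seq))


def mass_3p(frag):
    """Neutral monoisotopic mass of a 5'-OH, 3'-phosphate oligonucleotide."""
    return sum(RESIDUE[base] for base in frag) + H2O


def signature_products(sequences, tol=0.01):
    """Map each sequence name to the products whose mass no other sequence yields."""
    products = {name: t1_products(seq) for name, seq in sequences.items()}
    signatures = {}
    for name, frags in products.items():
        other_masses = [mass_3p(f) for other, fs in products.items()
                        if other != name for f in fs]
        signatures[name] = sorted(
            f for f in frags
            if all(abs(mass_3p(f) - m) > tol for m in other_masses))
    return signatures


if __name__ == "__main__":
    toy = {"tRNA-1": "AUGCCGAAGCCA", "tRNA-2": "AUGCUGAAGCCA"}  # hypothetical
    print(signature_products(toy))
```

Here AUGp and AAGp are shared and therefore uninformative, while CCGp and CUGp each mark one sequence.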

Colette Castleberry built upon this SDP strategy by first demonstrating that LC–MS/MS was as effective as MALDI-MS for SDP identification [31]. In the same work, she combined RNase-mediated 18O labeling with the signature digestion product concept to create quantitative signature digestion products (qSDPs): SDPs that can be used simultaneously for tRNA identification and quantification. The criteria for defining a digestion product as a qSDP include incorporation of the 18O label, a mass difference of more than 2 Da from other known digestion products, and a linear response spanning a fivefold change in SDP amount. Using a set of qSDPs, changes in specific tRNA levels as a result of culturing conditions were then examined (Table 1).
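The three qSDP criteria translate naturally into a filter. In the sketch below, "linear response" is operationalized as a coefficient of determination of at least 0.98 across a calibration series spanning at least a fivefold range; that threshold, the data class and all names are assumptions for illustration, not the published implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mass: float             # monoisotopic mass of the 16O-labeled product, Da
    has_3p_phosphate: bool  # only 3'-phosphate products pick up the 18O label
    amounts: list           # relative amounts of the SDP in the calibration series
    responses: list         # measured heavy:light ratios for those amounts


def r_squared(x, y):
    """Coefficient of determination for a straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def is_qsdp(c, other_masses, min_fold=5.0, min_r2=0.98):
    """other_masses: masses of all other known digestion products (excluding c)."""
    if not c.has_3p_phosphate:
        return False  # criterion 1: must incorporate the 18O label
    if any(abs(c.mass - m) <= 2.0 for m in other_masses):
        return False  # criterion 2: more than 2 Da from every other known product
    if max(c.amounts) / min(c.amounts) < min_fold:
        return False  # criterion 3a: calibration must span a fivefold change
    return r_squared(c.amounts, c.responses) >= min_r2  # criterion 3b: linearity


if __name__ == "__main__":
    cand = Candidate("CUCAGp", 1608.218, True, [1, 2, 3, 4, 5],
                     [0.98, 2.05, 2.96, 4.02, 5.01])  # placeholder responses
    print(is_qsdp(cand, other_masses=[1500.0, 1700.0]))
    print(is_qsdp(cand, other_masses=[1609.2]))
```

The second call fails because a neighboring product within 2 Da would overlap the heavy member of the 16O/18O pair.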

Table 1 Analysis of RNase T1 quantitative signature digestion products from E. coli.

5 18O Labeling of RNA for Modification Mapping by Mass Spectrometry

Our lab next focused on how 3′-terminal phosphate labeling could enable alternative strategies to characterize RNA samples. Of particular interest to the lab is the discovery and characterization of post-transcriptionally modified nucleosides in RNA samples. As noted earlier, RNA modification mapping by MS is used to identify the specific sequence locations of modified nucleosides. Although several different strategies have been developed for modification mapping, we reasoned that 18O labeling could be used to multiplex the analysis, reducing analysis time and cost while improving run-to-run reproducibility.

While 18O labeling enables only duplex analysis (i.e., two different samples in a single analysis), it provides a template for higher levels of multiplexing in the future. Our first multiplexing study based on 18O labeling was described in 2012 by Siwei Li [33]. Comparative analysis of RNA digests (CARD) pairs a reference sample with known post-transcriptional modifications with a sample whose modifications are unknown. Peaks appearing as doublets separated by 2 Da indicate that the digestion product from the “unknown” sample is identical to the corresponding digestion product of the “known”, or reference, sample. Examining all doublets therefore quickly reveals the similarities between the RNA samples. By the same reasoning, digestion products appearing as singlets (from either the 16O-labeled or the 18O-labeled sample alone) indicate differences between the two RNA samples. These differences can arise either because the unknown sample is modified differently from the reference or because the two samples differ in sequence (Fig. 4).
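At its core, the doublet/singlet classification is a pairing problem on a list of deconvoluted monoisotopic masses: two masses separated by one 18O (about 2.004 Da) form a doublet, and anything left unpaired is a singlet. The greedy sketch below assumes such a mass list is already available and that natural isotope peaks have been removed by deconvolution; its name, the 5 mDa tolerance and the example masses are hypothetical. It does not decide whether a singlet came from the 16O- or the 18O-labeled sample; swapping the labels between samples (as in Fig. 5) is one way to settle that.

```python
DELTA_18O = 17.9991610 - 15.9949146  # mass difference of one 18O label, Da


def classify_card(masses, tol=0.005):
    """Split neutral monoisotopic masses into 16O/18O doublets and singlets."""
    masses = sorted(masses)
    paired = set()
    doublets = []
    for i, light in enumerate(masses):
        if i in paired:
            continue
        for j in range(i + 1, len(masses)):
            if j not in paired and abs(masses[j] - light - DELTA_18O) <= tol:
                doublets.append((light, masses[j]))
                paired.update((i, j))
                break
    singlets = [m for k, m in enumerate(masses) if k not in paired]
    return doublets, singlets


if __name__ == "__main__":
    observed = [1608.2184, 1610.2226, 2941.3896]  # illustrative masses only
    doublets, singlets = classify_card(observed)
    print("doublets:", doublets)
    print("singlets:", singlets)
```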

Fig. 4

Reproduced with permission from Li et al. [33] Copyright 2012

Schematic outline of comparative sequencing by isotope labeling and LC–MS where Escherichia coli serves as the reference organism and Citrobacter koseri serves as the candidate (unknown) to be sequenced. tRNA endonuclease digestion products that are equivalent between organisms will appear as doublets (separated by 2 Da) in the mass spectral data; digestion products that are different between the two organisms will appear as a singlet.

If the reference is truly “well characterized” in terms of post-transcriptional modification identity and pattern, the CARD approach simply reports sample equivalence through doublets and sample differences through singlets. Siwei first demonstrated proof of concept by comparing a single purified tRNA from two bacteria [33] and then expanded the approach into an RNA modification mapping strategy for total tRNAs (Fig. 5) [34]. We found that CARD is most effective when the reference and unknown samples (i.e., organisms) share high sequence homology. Using phylogenetically related organisms minimizes the number of singlets that arise simply from sequence differences, which makes mapping RNA modifications in the “unknown” sample easier.

Fig. 5

Reproduced with permission from Li et al. [34] Copyright 2013

Mass spectra corresponding to a detected singlet when a C. koseri is labeled with 18O and b E. coli is labeled with 18O. The singlet U[s4U]AACAAAGp (m/z 1469.6, charge 2−) arises from E. coli tRNA-Cys(GCA), as confirmed by the +1 m/z shift of its isotopic envelope after 18O labeling of E. coli.

While CARD significantly improved RNA modification mapping of total tRNA pools from organisms whose tRNA modification patterns were previously unknown, the small mass difference between the 18O and 16O labels, combined with interference from naturally occurring stable isotopes (e.g., 13C), limited our ability to automate the identification of singlets and doublets. To overcome this limitation, Collin Wetzel worked with Siwei to investigate culturing conditions that would minimize stable isotope interferences. Culturing cells in 12C-enriched medium essentially eliminated 13C (and 15N) isotope interferences during CARD (Fig. 6) [35]. Because doublets are then more clearly differentiated, singlet and doublet identification can be automated. Moreover, this strategy can again be combined with the SDP approach to provide more targeted tRNA analysis.
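How 12C enrichment cleans up the region around the 18O doublet can be estimated with a basic isotope-pattern calculation. The sketch convolves per-element isotope distributions at nominal-mass resolution, using commonly tabulated natural abundances, for the elemental formula of the unmodified RNase T1 product CUCAGp (C47H61N18O36P5). "Enriched" is modeled as 99.9% 12C, an assumed value, with every other element left at natural abundance; the function names are hypothetical.

```python
# Natural isotopic abundances as (nominal mass offset, fraction) pairs.
NATURAL = {
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
    "P": [(0, 1.0)],
}


def convolve(a, b, max_len):
    """Polynomial product of two isotope distributions, truncated to max_len bins."""
    out = [0.0] * max_len
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < max_len:
                out[i + j] += x * y
    return out


def isotope_pattern(formula, frac_13c=0.0107, max_len=6):
    """Abundances of M, M+1, ... for an elemental formula; carbon uses frac_13c."""
    pattern = [1.0]
    for element, count in formula.items():
        if element == "C":
            isotopes = [(0, 1.0 - frac_13c), (1, frac_13c)]
        else:
            isotopes = NATURAL[element]
        single = [0.0] * max_len
        for offset, abundance in isotopes:
            single[offset] += abundance
        for _ in range(count):
            pattern = convolve(pattern, single, max_len)
    return pattern


if __name__ == "__main__":
    cucagp = {"C": 47, "H": 61, "N": 18, "O": 36, "P": 5}
    for label, frac in (("natural", 0.0107), ("12C-enriched", 0.001)):
        p = isotope_pattern(cucagp, frac_13c=frac)
        print(label, [round(x / p[0], 3) for x in p[:3]])  # relative to M
```

Under these assumptions the M+2/M ratio falls from roughly a quarter to under a tenth; most of what remains comes from natural 18O, so enrichment reduces but does not remove that peak.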

Fig. 6

Reproduced with permission from Wetzel et al. [35] Copyright 2014

Improvements in singlet and doublet identification using 12C-enriched medium as illustrated with the doubly charged E. coli total tRNA RNase T1 digestion product A[ms2i6A]AACCGp (MW 2403.4 Da). a Mass spectrum from sample grown in rich medium and labeled with 16O during RNase T1 digestion. b Same sample as in a except labeled with both 16O and 18O during RNase T1 digestion. c Mass spectrum obtained when sample grown in 12C-enriched medium and labeled with 16O during RNase T1 digestion. d Same sample as in c except labeled with both 16O and 18O during RNase T1 digestion. Singlet and doublet identifications are simplified in c and d, respectively, by use of 12C-enriched medium.

Another limitation of the CARD strategy arises from basing singlet and doublet assignments on mass measurements alone. RNase digestion products that have the same mass but different sequences in the two samples could co-elute and appear as doublets, leading to false positives. To address this challenge, Siwei showed that 16O/18O-labeled digestion products can be differentiated based on MS/MS data [36], echoing our very first studies of 18O labeling during MALDI PSD experiments [28].

When oligonucleotides are fragmented by CID MS/MS, the most abundant fragment ions are the c-type and y-type ions, which represent the oligonucleotide sequence from the 5′- and 3′-termini, respectively. Only the y-type ions contain the 16O or 18O label on the 3′-phosphate. Using this, Siwei demonstrated that MS/MS data can confirm whether doublets detected in the mass spectrum truly arise from the same sequence rather than from sequence isomers. Doublets throughout the y-type ion series arise only if the two sequences are identical, whereas sequence isomers give singlets in the y-type ions wherever the original digestion products differ in sequence.
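The MS/MS check can be sketched by comparing y-ion ladders: every y-type fragment retains the 3′-phosphate, so for identical sequences each y ion of the 18O-labeled sample sits one 18O (about 2.004 Da) above its 16O counterpart, whereas for sequence isomers the correspondence breaks where the sequences differ. The sketch assumes the common convention that y fragments carry a 5′-OH (neutral mass equal to the residue sum plus H2O) and uses hypothetical function names and example sequences.

```python
# Monoisotopic residue masses (NMP minus H2O) and water, in Da.
RESIDUE = {"A": 329.05252, "G": 345.04743, "C": 305.04129, "U": 306.02530}
H2O = 18.01056
DELTA_18O = 17.9991610 - 15.9949146  # one 18O on the 3'-phosphate


def y_ions(seq, n_18o=0):
    """Neutral masses of y1 ... y(n-1) for a 5'-OH, 3'-phosphate oligonucleotide."""
    ladder = []
    for k in range(1, len(seq)):
        fragment = seq[-k:]  # the 3'-terminal k nucleotides, which carry the label
        ladder.append(sum(RESIDUE[b] for b in fragment) + H2O + n_18o * DELTA_18O)
    return ladder


def y_doublets(light_seq, heavy_seq, tol=0.005):
    """For each y ion, True if the 18O-labeled ion is exactly one label heavier."""
    return [abs(heavy - light - DELTA_18O) <= tol
            for light, heavy in zip(y_ions(light_seq), y_ions(heavy_seq, n_18o=1))]


if __name__ == "__main__":
    print(y_doublets("ACUG", "ACUG"))  # identical sequences
    print(y_doublets("ACUG", "CAUG"))  # isomers with the same precursor mass
```

For the isomer pair, y1 (Gp) and y2 (UGp) still form doublets, but y3 (CUGp versus AUGp) appears as a singlet, exposing the sequence difference.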

6 Future Outlook

As discussed above, phosphate labeling by enzyme-mediated incorporation of 18O into RNase digestion products has been used in numerous ways to improve the MS-based characterization of modified RNAs. However, a fundamental limitation remains: the small (2 Da) mass difference between 16O and 18O restricts the utility of this approach largely to the applications discussed above. It will be of interest to examine alternative strategies for phosphate labeling that are known in RNA biology but have not yet been applied in MS.

T4 RNA ligase has been used to radiolabel RNA for visualization since the late 1970s [37, 38]. This ligase has not been explored in MS applications because it is known to generate a variety of side products. More recently, however, a T4 RNA ligase variant was developed to reduce these unwanted ligation products [39]. With this advancement, T4 RNA ligase may become a more promising tool for stable isotope labeling of oligonucleotides, including RNase digestion products. Another potential enzyme for RNA labeling is Thg1 [40]. In the cell, Thg1 adds a single guanosine nucleotide to the 5′ terminus of tRNA-His, but it has also been shown to have 3′–5′ polymerase activity [41]. With additional study, it may be possible to exploit this unique activity to incorporate stable isotopes into specific RNA samples as part of a broader MS strategy for sample characterization.

Proteomics is another field to which RNA MS in general, and RNA modification mapping in particular, can look for inspiration and ideas for phosphate labeling and multiplexing strategies. A wide variety of multiplexing strategies has been developed in proteomics [42]. Given the demonstrated advantages of these approaches for relatively rapid characterization of multiple samples, one can envision tools and technologies for RNA MS that provide similar advantages, even if the specific chemistry and biochemistry differ because of the unique characteristics of RNA. Regardless, the value of MS as a platform for RNA analysis in general, and RNA modification mapping in particular, is now well appreciated. It remains an ongoing challenge for the community to identify and develop the sample labeling tools needed to take full advantage of this platform.