Introduction

NMR chemical shifts of biomolecules are a rich source of local structural and dynamic information (Mulder and Filatov 2010; Wishart and Case 2001). Their extensive use for protein structure determination is well documented, ranging from facilitating resonance assignment (Grzesiek and Bax 1993), detecting cis-peptide bonds (Schubert et al. 2002), predicting secondary structure (Wishart et al. 1992) and deriving angle restraints (Cornilescu et al. 1999) to the generation of 3D structures (Cavalli et al. 2007; Shen et al. 2008; Wishart et al. 2008). Their application to RNA structure determination is still limited (Lam and Chi 2010). In particular, the information content of 13C chemical shifts of RNA has not been systematically exploited, although recent studies showed that they have a strong potential to provide structural information for RNA (Fares et al. 2007; Ohlenschlager et al. 2008).

Although frequencies can be measured very accurately with modern NMR spectrometers, the chemical shift is a relative measure that depends strongly on correct calibration to a standard. Inaccurate or incorrect chemical shift referencing can blur or distort the information contained in the chemical shift data. The standard procedure for calibrating chemical shifts of biomolecules is well documented (Wishart et al. 1995) and should be applied prior to any chemical shift assignment. A reliable chemical shift database is indispensable for comparing chemical shifts of different structures and for revealing structure–chemical shift relationships. Unfortunately, a significant percentage of the chemical shifts deposited in the Biological Magnetic Resonance Data Bank (BMRB) (Seavey et al. 1991) is still incorrectly calibrated. A study from 2003 revealed that 25% of all protein entries contained incorrectly referenced 13C chemical shifts, and 40% of all protein entries appeared to have assignment errors (Zhang et al. 2003). Since then, a variety of protocols and programs have become available to detect and, where possible, correct calibration errors in deposited protein chemical shifts (Ginzinger et al. 2007; Wang and Wishart 2005; Zhang et al. 2003).

To date, such a procedure is not available for RNA chemical shift depositions. Recent studies of structure–13C chemical shift relationships of RNAs (Fares et al. 2007; Ohlenschlager et al. 2008) noted that inconsistent calibration is a serious problem for RNA chemical shift data. We therefore decided to establish a procedure to check the 13C calibration of RNA datasets. Our analysis of over sixty 13C chemical shift datasets deposited in the BMRB identified various sources of inconsistencies in 13C chemical shifts, allowing us to correct several datasets and thereby increase the number of usable chemical shift datasets. With these improved datasets, we can start to build reliable statistics that should help us decipher clear relationships between RNA structure and 13C chemical shifts.

Materials and methods

Data mining

We collected all available 13C chemical shifts of RNAs without binding partners from the BMRB (Seavey et al. 1991) (Table 1). Chemical shifts of six additional RNAs reported only in publications (Butcher et al. 1997; Jucker and Pardi 1995; SantaLucia and Turner 1993; Sich et al. 1997; Smith and Nikonowicz 1998; Szewczak and Moore 1995) and correctly referenced chemical shift data of six RNA stem-loops from our laboratory (unpublished) were added to the final database (Table 1). The secondary and tertiary structure of all datasets was extracted from the associated pdb coordinates, publications and the BMRB star file. The local structure of the terminal nucleotides of all RNAs was determined manually by analyzing the 3D structure using the pdb files, or from the secondary structure if the coordinates were not available. Subsequently, a script written in C++ was used to extract all available chemical shift values for each previously characterized nucleotide from the corresponding star files in the BMRB. These data were then converted into Microsoft Excel format. RNA chemical shift data from publications were entered manually.

Table 1 Datasets used for our analysis of chemical shift inconsistencies

Chemical shift correlations

Microcal Origin (Microcal Software Inc. MA) was used to create 2D scatter plots of chemical shift correlations. The expected chemical shift ranges for the five internal reference values (green boxes in Fig. 4) were defined as 138.7–139.7 ppm for C8 of 5′G, 136.4–137.6 ppm for C8 of 5′GG, 97.4–98.8 ppm for C5 of 3′C, 92.5–93.4 ppm for C1′ of 3′C and 69.4–70.4 ppm for C3′ of 3′C.
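
The range check implied by these five intervals can be sketched in a few lines of Python. This is only an illustration of the comparison, not the software used in this work; the dictionary keys and the function names `in_expected_range` and `check_dataset` are hypothetical labels introduced here.

```python
# Expected 13C ranges (ppm) for the five internal reference values, as defined in the text.
# The key strings are hypothetical labels chosen for this sketch.
REFERENCE_RANGES = {
    "5'G C8": (138.7, 139.7),
    "5'GG C8": (136.4, 137.6),
    "3'C C5": (97.4, 98.8),
    "3'C C1'": (92.5, 93.4),
    "3'C C3'": (69.4, 70.4),
}


def in_expected_range(name, shift_ppm):
    """Return True if a reference shift lies inside its expected range."""
    low, high = REFERENCE_RANGES[name]
    return low <= shift_ppm <= high


def check_dataset(shifts):
    """Map each assigned reference shift of a dataset to True (inside) or False (outside)."""
    return {
        name: in_expected_range(name, value)
        for name, value in shifts.items()
        if name in REFERENCE_RANGES
    }


# Made-up example values, not taken from any BMRB entry.
print(check_dataset({"5'G C8": 139.1, "3'C C3'": 67.2}))
```

Reference values that are not assigned in a dataset are simply left out of the input dictionary, so the result only reports on shifts that can actually be evaluated.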

NMR measurements

NMR experiments were performed on AVANCE III (600 or 700 MHz) and AVANCE (900 MHz) Bruker spectrometers equipped with cryogenic probes. Unless indicated otherwise, spectra were recorded at 303 K. Six RNA stem-loops with concentrations of 1.5–2.5 mM were used (their secondary structures are depicted in Supplementary Fig. 1 and their preparation is described in the Supplementary Text). For all RNA samples, 2D 1H-1H TOCSY, 2D 1H-13C natural abundance HSQC and 2D NOESY spectra were recorded in D2O, and a 2D NOESY spectrum was recorded in H2O. Typical parameters for the 2D NOESY experiments in D2O were 48 scans, t1max = 55 ms, 2,048 × 1,100 recorded data points, a mixing time of 250 ms and a relaxation delay of 1 s. Typical parameters for the 2D NOESY experiments in H2O were 96 scans, t1max = 33 ms, 2,048 × 1,000 recorded data points, a mixing time of 300 ms and a relaxation delay of 1 s. Typical parameters for the 2D 1H-1H TOCSY experiments were 4 scans, t1max = 25 ms, 2,048 × 512 recorded data points, a mixing time of 50 ms and a relaxation delay of 1 s. The 2D 1H-13C natural abundance HSQC experiment was typically recorded with 220 scans, t1max = 7.5 ms, 2,048 × 300 data points, and a relaxation delay of 1 s. To test the influence of temperature on the chemical shifts, 1H-13C natural abundance HSQC spectra of stem-loop TASL2 were recorded at 283, 293, 303 and 313 K. Temperatures were calibrated using methanol-d4 (>98.8% D, Armar AG, Switzerland) according to Findeisen et al. (2007). The NMR spectra were processed with the software Topspin 2.1 (Bruker), and analyzed using the software SPARKY (Goddard and Kneller 1999). Spectra were referenced using an external sucrose/DSS sample, which is described in detail in the Supplementary Material. The assignment of the six RNA stem-loops will be reported elsewhere.

BMRB accession codes

Chemical shifts of six newly assigned stem-loops were deposited in the BMRB under the accession numbers 17326, 17559, 17560, 17566, 17567 and 17568.

Determination of the sugar pucker

The backbone torsion angles δ were extracted from pdb files using the program AMIGOS (Duarte and Pyle 1998). δ angles between 130° and 190° were classified as C2′-endo (S-type) (Varani et al. 1996). δ angles between 50° and 110° were classified as C3′-endo (N-type). These ranges were derived from high-resolution crystal structures, and are used in our laboratory (Oberstrass et al. 2006; Schubert et al. 2007). The δ angle range for C2′-endo is identical, and the range for C3′-endo is very similar, to the ranges described by Varani et al. (1996) (55°–115° for C3′-endo). If the average of the δ angles of the structural ensemble lay in neither of these regions, the pucker was classified as unclear. Cases in which the δ angles were found in the C3′-endo region but experimental data indicated C2′-endo characteristics (e.g. H1′–H2′ couplings or a H1′–H2′ cross peak in the 2D 1H–1H COSY or 2D 1H–1H TOCSY spectrum) were also classified as unclear. Covariance ellipses were derived assuming an underlying bivariate normal distribution (Meyer 1975).
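
The classification rule can be written as a short Python function. This is a minimal sketch under stated assumptions: the function name `classify_pucker` and the flag `contradicting_experiment` are hypothetical, the ensemble average is taken as a plain arithmetic mean of δ values given on a 0–360° scale, and both undecided outcomes are labelled "unclear" here.

```python
def classify_pucker(delta_angles_deg, contradicting_experiment=False):
    """Classify the sugar pucker from the delta torsion angles of a structural ensemble.

    delta_angles_deg: delta angles in degrees, one per ensemble member (assumed 0-360 scale).
    contradicting_experiment: True if experimental data (e.g. an H1'-H2' cross peak)
        indicate C2'-endo character although delta lies in the C3'-endo region.
    """
    # Assumption: a simple arithmetic mean stands in for the ensemble average.
    mean_delta = sum(delta_angles_deg) / len(delta_angles_deg)
    if 130.0 <= mean_delta <= 190.0:
        return "C2'-endo"
    if 50.0 <= mean_delta <= 110.0:
        return "unclear" if contradicting_experiment else "C3'-endo"
    return "unclear"


# Made-up delta angles for illustration only.
print(classify_pucker([82.0, 85.5, 79.0]))
print(classify_pucker([145.0, 150.0]))
```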

Results

Data mining and initial chemical shift analysis

Our initial aim was to perform a statistical analysis of 13C RNA chemical shifts. We used all available BMRB entries containing 13C data of RNA. To eliminate the influence of binding partners in our analysis, we excluded the chemical shift depositions of RNA complexes. This resulted in a database of 58 BMRB 13C datasets. For our subsequent analysis, we added six datasets extracted from publications, and six unpublished datasets of RNA stem-loops, which were prepared for this work. All 70 entries are listed in Table 1. A simple two-dimensional plot of the 13C versus 1H chemical shifts of aromatic C6–H6 and C8–H8 pairs shows an interesting pattern (Fig. 1a). Guanine C8–H8, adenine C8–H8 and pyrimidine C6–H6 correlations are found in distinct regions. More surprisingly, within each of these groups the peaks split into two clusters, which are separated by 2.5–3 ppm in the 13C dimension (Fig. 1a). One explanation for these two clusters is that 13C chemical shifts were calibrated using at least two different standards. RNA chemical shift data should be referenced, like those of other biomolecules in aqueous solution, to 2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS). However, referencing to other standards such as tetramethylsilane (TMS), which is the general standard for substances in organic solvents (Fig. 2a), was observed. In order to analyze the datasets systematically, we looked for chemical shifts that could serve as internal 13C reference values in RNA.

Fig. 1

RNA 1H-13C chemical shift correlations of bases in an A-form helix environment of the initial chemical shift data (a) and after validation and recalibration (b). Correlations of guanines are colored in red, of adenines in green, of cytosines in blue and of uracils in cyan. A/U chemical shifts of entry 5834 were excluded in the final chemical shifts (see footnote of Supplementary Table 2 for more details)

Fig. 2

Commonly occurring nucleotides that were used to extract chemical shift reference values. a Structures of the chemical shift reference standards tetramethylsilane (TMS) and 2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS). The chemical shifts of TMS with respect to DSS were reported previously (Markley et al. 1998; Morcombe and Zilm 2003). b Schematic structure displaying the involved nucleotides in an RNA stem-loop with two G–C closing base pairs. Chemical shifts of nucleotides in and adjacent to mismatches were not used as reference points. c Schematic atomic structure of the 5′- and 3′-end indicating the chemical shift reference values in red. d Regions of a 13C-HSQC spectrum of the stem-loop FZL2 highlighting the C8 chemical shifts of two consecutive guanosines at the 5′-end, and the C3′, C5 and C1′ of a Watson–Crick base-paired cytosine at the 3′-end

Selecting internal 13C reference values for the chemical shift calibration

13C chemical shifts of each nucleotide are highly dependent on the RNA sequence. Nevertheless, we could find a set of five chemical shifts that are present in most RNA datasets, and whose values are found in narrow shift ranges in the majority of the datasets. They are therefore ideally suited as internal references to check the chemical shift calibration. The first two of these ‘reference’ 13C chemical shifts are the C8 resonances of G1 and G2 found at the 5′-end of most RNAs prepared by in vitro transcription, denoted here as 5′G and 5′GG, respectively (Fig. 2b, c). Characteristic C8–H8 cross peaks occur at ~139.1/~8.15 ppm and ~137.0/~7.65 ppm in a 13C-HSQC spectrum for 5′G and 5′GG, respectively (Fig. 2d). The terminal 5′G lacks a stacking neighboring base on its 5′ side, which results in a very distinct shift for its C8–H8 and makes it easy to identify. A mono- or a triphosphate at the 5′-end does not appear to modify the C8 chemical shift (within 0.1 ppm, see Fig. 3). Even a complete lack of phosphate, as found in chemically synthesized RNAs, does not significantly influence the 13C C8 chemical shift of the 5′G; the value is, for example, 138.8 ppm in entry 15571. Since GG is a frequently used starting sequence for RNA made by in vitro transcription, the 13C C8 resonance of G2 is a good second reference value (5′GG) for most RNAs (44 out of 70). The third reference value is the C3′ 13C chemical shift of the last 3′-nucleotide (Fig. 2b, c), which also occurs in a distinct position of a 13C-HSQC spectrum (~69.9/~4.19 ppm, see Fig. 2d), because this nucleotide lacks a phosphate at the 3′-end. This value is apparently independent of the 5′-neighbor. The fourth and fifth reference values are the C1′ and C5 chemical shifts of the 3′-terminal cytosine (3′C) involved in a Watson–Crick base pair with 5′G1, displaying 13C values of ~92.9 ppm and ~98.1 ppm, respectively. In contrast to the other reference chemical shifts, these two resonances are not found in a very distinct region of the 13C-HSQC spectrum (Fig. 2d), and a slight dependence on the 5′ neighbor might be possible. Nevertheless, these values are usually correctly assigned and can provide information that helps detect systematic errors in chemical shift datasets.

Fig. 3

Illustration of the effect of tri- versus monophosphate at the 5′-end on 1H and 13C chemical shifts. a and b 2D NOESY and 13C-HSQC spectra of the RNA stem-loop TASL1 transcribed in the presence of GMP. The H6/H8-H1′ walk is indicated by lines. There are two sets of signals visible for the 5′-end corresponding to a terminal monophosphate (red) and triphosphate (green). The 13C chemical shifts of both sets are virtually identical. c and d 2D NOESY and 13C-HSQC spectra of the RNA stem-loop TASL3 transcribed in the absence of GMP. Only one set of signals is visible for the 5′-end corresponding to a terminal triphosphate (green)

Correlations of internal reference values reveal correct calibration

In order to evaluate the 13C calibration, we analyzed the chemical shift distributions of these five reference values in all collected 13C RNA datasets using 2D correlation plots. Figure 4 shows four 2D correlations among the five references: between the two C8 of 5′G and 5′GG (Fig. 4a), between the C8 of 5′G and C3′ of 3′C (Fig. 4b), between the C1′ and C5 of the 3′C (Fig. 4c) and between the C1′ and C3′ of the 3′C (Fig. 4d). In all correlation plots the majority of the datasets cluster within ranges of about 1 ppm, indicating correct referencing (green boxes in Fig. 4).

Fig. 4

Carbon-carbon chemical shift correlations of RNA chemical shift reference values of all 13C RNA datasets (excluding the six unpublished datasets from our laboratory). a Correlations between C8 chemical shifts of Watson–Crick base paired guanosines at the 5′-end (δ13CC8 5′G) and C8 chemical shifts of Watson–Crick base paired guanosines following a guanosine at the 5′-end (δ13CC8 5′GG). b Correlations between C8 shifts of guanosines at the 5′-end (δ13CC8 5′G) and C3′ shifts of cytidines at the 3′-end (δ13CC3′ 3′C). c Correlations between C1′ chemical shifts of cytidines at the 3′-end (δ13CC1′ 3′C) and C5 chemical shifts of cytidines at the 3′-end (δ13CC5 3′C). d Correlations between C1′ chemical shifts of cytidines at the 3′-end (δ13CC1′ 3′C) and C3′ chemical shifts of cytidines at the 3′-end (δ13CC3′ 3′C). The green boxes indicate the expected ranges for correctly calibrated reference values. The blue boxes are shifted by 2.7 ppm compared to the green boxes. The black lines have a slope of 1. For each off-diagonal data point either the BMRB entry number or the PDB code is indicated. A black arrow indicates data points lying outside the range of the figure

However, in several datasets both resonances are shifted by the same amount, so that the corresponding data points are displaced along the line with a slope of 1 drawn in each plot. Along this line, a second cluster appears shifted by ~2.7 ppm in all four 2D plots (blue boxes). This 2.66 ppm offset most likely corresponds to the 13C chemical shift difference between 2,2-dimethyl-2-silapentane-5-sulfonic acid (DSS) and tetramethylsilane (TMS). TMS is the default 13C standard on Bruker spectrometers. However, biomolecules should be referenced via the absolute 1H frequency of DSS multiplied by the ratio 0.251449530, yielding the absolute 13C frequency of DSS, which is then set to 0 ppm (Markley et al. 1998). Since 1H chemical shifts of proteins, in contrast to heteronuclear data, are almost always calibrated correctly (Wang and Wishart 2005), we assume that this also holds true for RNA chemical shifts. The origin of this 2.66 ppm offset is described in more detail in the Supplementary Material. Although indirect chemical shift referencing was introduced as the standard for biomolecular NMR (Wishart et al. 1995), it is still not generally followed. However, this offset of 2.66 ppm can be easily corrected by a simple addition. When 2.66 ppm is added, all datasets lying in the blue box are found in the correct green box.
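
The arithmetic behind indirect referencing and the TMS-to-DSS correction can be illustrated with a short Python sketch. Only the ratio 0.251449530 and the 2.66 ppm offset come from the text; the function names and the example frequencies and shifts are hypothetical and chosen purely for illustration.

```python
# Ratio of the 13C zero frequency to the 1H frequency of DSS (Markley et al. 1998), as quoted in the text.
XI_13C_DSS = 0.251449530
# 13C offset between TMS- and DSS-referenced shifts, as quoted in the text.
TMS_TO_DSS_OFFSET_PPM = 2.66


def carbon_zero_frequency_mhz(proton_dss_frequency_mhz):
    """Absolute 13C frequency that corresponds to 0 ppm, from the 1H frequency of DSS."""
    return proton_dss_frequency_mhz * XI_13C_DSS


def shift_ppm(observed_mhz, zero_mhz):
    """Chemical shift in ppm of a resonance relative to the 0 ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


def tms_to_dss(shifts_ppm):
    """Recalibrate TMS-referenced 13C shifts to DSS by adding the constant offset."""
    return [round(s + TMS_TO_DSS_OFFSET_PPM, 2) for s in shifts_ppm]


# Hypothetical 1H DSS frequency of a nominal 600 MHz spectrometer.
zero_13c = carbon_zero_frequency_mhz(600.13)
print(f"13C 0 ppm frequency: {zero_13c:.6f} MHz")
# Hypothetical TMS-referenced C8 shifts of a 5'G and a 5'GG.
print(tms_to_dss([136.44, 134.30]))
```

Because the correction is a constant, it moves a data point along the line of slope 1 in every correlation plot, which is why TMS-referenced datasets fall into the blue boxes.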

The origin of the other calibration inconsistencies depicted in Fig. 4 is not always clear. Since the C8 shifts of 5′G and 5′GG are usually recorded in the same spectrum, an off-diagonal correlation cannot originate from mis-calibration, and must therefore result from a mis-assignment (Fig. 4a). The same considerations apply to the sugar shifts of the 3′C (C1′ and C3′, Fig. 4d). Chemical shifts that could originate from two different spectra could potentially differ in calibration. In this case, a correlation away from the diagonal could be the result of two differently calibrated spectra, or of a mis-assignment. Such cases appear in Fig. 4b for the 5′G C8–3′C C3′ correlations.

Experimental chemical shift of internal 13C reference values

We transcribed six RNAs ranging from 20 to 30 nts (Supplementary Fig. 1) and assigned them by NMR spectroscopy. All internal reference values of those RNAs cluster in even narrower ranges within the green boxes. To verify that the chemical shifts of the internal reference values stay within the defined tolerances (green boxes) under a variety of solution conditions, we measured spectra of the 26 nt stem-loop TASL2 at several temperatures ranging from 10 to 40°C, at several pH values ranging from 5 to 8, and at different NaCl concentrations ranging from 0 to 200 mM, with or without KH2PO4/K2HPO4 buffer. The five chemical shift reference values vary only within a small range (≤0.1 ppm compared to conditions at 30°C, pH 6.0), and are therefore largely independent of temperature, pH and salt concentration (Supplementary Table 1). One exception is the small deviation observed for the C3′ 13C shift of the 3′C, which changes by −0.2 ppm at 10°C and by +0.2 ppm at 40°C. In addition, the C8 13C chemical shift of the 5′G increases by +0.2 ppm at 200 mM NaCl. The following ranges were measured: 139.1–139.2 ppm for C8 of 5′G, 136.8–136.9 ppm for C8 of 5′GG, 97.9–98.2 ppm for C5 of 3′C, 92.8–92.9 ppm for C1′ of 3′C and 69.8–69.9 ppm for C3′ of 3′C. The 13C chemical shifts were indirectly referenced to DSS (2,2-dimethyl-2-silapentane-5-sulfonic acid) according to the recommendations for biomolecules (Markley et al. 1998).

Correction of the chemical shift data

Forty-nine of the 64 RNA 13C chemical shift datasets (without our 6 RNAs) contain at least two of the internal 13C reference chemical shifts, which allowed us to evaluate the calibration of these datasets (Table 1). We used a color code to indicate whether each individual reference value is correct (green), either shifted by 2.66 ppm or diagonally shifted (yellow), not assigned (black), absent in the RNA sequence (blank) or outside the expected ranges without detectable systematic error (red). For 22 datasets all assigned internal reference frequencies lie within the expected chemical shift ranges, and these datasets are therefore counted as correctly referenced (Table 1, category I). In addition, we added six correctly referenced datasets from our laboratory, which extends category I to 28 datasets. Seventeen datasets (category II) contained inconsistent shift values, but could be recovered either by detecting correct parts in the datasets or by recalibrating the datasets. There are two cases (category IIa) with a single outlier of more than 30 ppm, indicating that the outlier is not systematic. Seven datasets (category IIb) have at least two correctly referenced reference values that were recorded in one spectrum. For example, two C8 shifts within the expected region strongly indicate that the other C8/C6/C2 shifts of the RNA are also likely to be correct, independent of whether or not the C1′ shifts are consistent. In 8 datasets (categories IIc and IId), all the reference values are shifted by approximately the same value. The offset of the five datasets of category IIc can be explained by improper calibration to TMS instead of DSS (blue boxes). While these datasets can be easily recalibrated by adding 2.66 ppm to all 13C chemical shifts, datasets of category IId require recalibration by a different offset. For 10 datasets (category III), the origin of the inconsistency is not clear from the reference values. Therefore, we did not attempt to recalibrate these datasets. Fifteen RNA datasets lacked our internal reference values (category IV) and could not be evaluated. This was either because the corresponding chemical shifts were not assigned or because the RNA termini differed from those shown in Fig. 2b. Comments for each individual case can be found in Supplementary Table 2.
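
A strongly simplified version of this sorting logic can be sketched in Python. It is not the procedure used to build Table 1, which also relied on manual inspection of each entry: the function name `classify_dataset`, the returned labels and the tolerance `spread_tolerance` are hypothetical, and partially correct datasets (categories IIa and IIb) are not distinguished here.

```python
# Expected ranges (ppm) of the five internal reference values; key names are hypothetical labels.
REFERENCE_RANGES = {
    "5'G C8": (138.7, 139.7),
    "5'GG C8": (136.4, 137.6),
    "3'C C5": (97.4, 98.8),
    "3'C C1'": (92.5, 93.4),
    "3'C C3'": (69.4, 70.4),
}
TMS_OFFSET_PPM = 2.66


def _inside(name, value):
    low, high = REFERENCE_RANGES[name]
    return low <= value <= high


def classify_dataset(shifts, spread_tolerance=0.5):
    """Roughly sort a dataset by the behaviour of its assigned reference shifts."""
    refs = {n: v for n, v in shifts.items() if n in REFERENCE_RANGES}
    if len(refs) < 2:
        return "not evaluable"          # too few reference values (compare category IV)
    if all(_inside(n, v) for n, v in refs.items()):
        return "correct"                # compare category I
    if all(_inside(n, v + TMS_OFFSET_PPM) for n, v in refs.items()):
        return "TMS offset"             # compare category IIc
    # Deviation of each value from the midpoint of its expected range.
    offsets = [v - sum(REFERENCE_RANGES[n]) / 2 for n, v in refs.items()]
    if max(offsets) - min(offsets) <= spread_tolerance:
        return "common offset"          # compare category IId
    return "unclear"                    # needs manual inspection


# Made-up values for illustration only.
print(classify_dataset({"5'G C8": 136.5, "3'C C3'": 67.2}))
```

The order of the tests matters: the TMS check runs before the generic common-offset check so that the most frequent systematic error is reported by name.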

To demonstrate the benefit of proper calibration, we show in Fig. 1b the corrected 13C chemical shift values of the datasets of category I, IIa, consistent parts of category IIb and the recalibrated values of category IIc. The filtering and recalibration significantly improved the quality of the data, resulting in a much improved correlation between the C6/C8 and H6/H8 chemical shifts (Fig. 1b). The higher reliability and accuracy of the data revealed additional systematic inconsistencies that were not detected earlier. In one case we detected a systematic offset of C6/C8 chemical shifts of Ura/Ade that was not observed for Cyt/Gua bases (BMRB entry 5834). In another case (BMRB entry 15656, category IIb) in which the C5 reference resonance of 3′C was outside the expected range, all C5 chemical shifts are systematically shifted by ~2 ppm as illustrated in Fig. 5. For details, see footnotes of Supplementary Table 2.

Fig. 5

H5-C5 correlations of cytosines (a) and uracils (b) in an A-helix environment of the corrected database containing categories I and IIa-c. The C5 reference value of dataset 15656 was outside the expected region. Here it is apparent that all C5 chemical shifts of this dataset are systematically shifted downfield by ~2 ppm. This deeper analysis can help to distinguish systematic from nonsystematic errors

Correct referencing of the 13C chemical shift database results in better structure–chemical shift relationships: sugar pucker–13C chemical shift correlations

It was shown earlier that the sugar pucker conformation influences the sugar 13C chemical shift values (Ohlenschlager et al. 2008; Varani and Tinoco 1991). We wanted to determine whether we could now obtain a good correlation using our ensemble of corrected chemical shift data. For 29 datasets, we could also identify pdb files from which dihedral angles could be extracted. We first investigated the correlation between C1′ chemical shifts and the sugar pucker conformation. Purines and pyrimidines are treated separately because the type of base attached to the sugar affects the C1′ chemical shift. As shown in Fig. 6, purines and pyrimidines show clearly different C1′ 13C chemical shifts depending on the sugar pucker. However, there is still some overlap between the different pucker states. Nucleotides in exchange between the pucker states typically have intermediate chemical shifts (Varani and Tinoco 1991). This agrees with the observed C1′ shifts of the 5′G and of the 3′C, which are known to be in equilibrium between C2′- and C3′-endo conformations. The separation of the two sugar pucker conformations is similar to that found in a previous study using a linear combination of chemical shifts optimized to obtain maximal separation (Ohlenschlager et al. 2008). In contrast to that study, only one chemical shift is required here. An even better separation of the sugar puckers can be obtained by considering C1′–C4′ 2D correlations (Fig. 7). Assuming an underlying 2D Gaussian distribution, we calculated the corresponding covariance ellipses at two standard deviations, in which 86% of the data points are expected to lie. A clear separation of the different sugar puckers for purines (Fig. 7a) and pyrimidines (Fig. 7b) was obtained. The chemical shifts of C3′ also show an obvious dependence on the sugar pucker, whereas those of C2′ do not (Fig. 8). Altogether, the sugar puckers appear to be predictable on the basis of the C1′, C3′ or C4′ chemical shifts. In addition, the C2′–C3′ 2D plots allowed us to detect potentially swapped assignments of the C2′ and C3′ chemical shifts of some sugar resonances (Fig. 8).
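
The 86% figure follows from the bivariate normal distribution: the fraction of points inside the ellipse at k standard deviations is 1 − exp(−k²/2), which is about 0.865 for k = 2. The Python sketch below illustrates this and a point-in-ellipse test based on the Mahalanobis distance. It is not the code used to draw Fig. 7; the function names and the example points are hypothetical.

```python
import math


def fraction_inside(k):
    """Fraction of a bivariate normal distribution inside the k-standard-deviation ellipse."""
    return 1.0 - math.exp(-k * k / 2.0)


def covariance_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def inside_ellipse(point, mean, cov, k=2.0):
    """True if the squared Mahalanobis distance of the point is at most k squared."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= k * k


print(f"{fraction_inside(2.0):.3f}")

# Made-up (C1', C4') shift pairs in ppm, for illustration only.
cluster = [(92.6, 82.1), (93.0, 82.4), (92.8, 81.9), (93.3, 82.6), (92.4, 82.0)]
mean, cov = covariance_2d(cluster)
print(inside_ellipse((92.9, 82.2), mean, cov))
```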

Fig. 6

Dependence of 13C chemical shifts on the sugar pucker. Histograms showing the correlation between RNA C1′ 13C chemical shifts of (a) purines and (b) pyrimidines with the sugar pucker conformation. Riboses adopting C3′-endo conformation are colored red; riboses adopting C2′-endo conformation are colored blue. Riboses with unclear sugar pucker conformation are colored grey. Average values found for the δ13CC1′ of the 5′G and the δ13CC1′ of the 3′C are indicated by black arrows

Fig. 7

2D ribose chemical shift correlations in dependence of the sugar pucker. C1′–C4′ chemical shift correlations of purines (a) and pyrimidines (b), respectively. C2′-endo conformers are colored blue, C3′-endo conformers are colored red and nucleotides with unclear pucker conformation are colored grey, respectively. Covariance ellipses at 2 standard deviations of a bivariate normal distribution (86 percent of the data points are supposed to lie in the corresponding ellipses) show a clear separation between C2′-endo and C3′-endo conformations. Labels indicate the BMRB and residue number of outlier data points

Fig. 8

2D ribose chemical shift correlations in dependence of the sugar pucker. C2′–C3′ chemical shift correlations of purines (a) and pyrimidines (b), respectively. The different sugar pucker conformations are clearly separated along the C3′-axis. No clear correlation with the C2′ chemical shift can be seen. Six pyrimidines show unexpected C2′ values between 72 and 73.5 ppm. Inspection of the secondary structure reveals that most of these outliers are found in an A-RNA helix environment, where a C3′-endo conformation is expected, and swapping the C2′ and C3′ chemical shift values would bring both values into the expected ranges. Since it is often difficult to assign C2′ and C3′ chemical shifts unambiguously, it is very likely that these shifts were swapped during the assignment process

Discussion

The splitting of the 13C chemical shift data into two clusters (Fig. 1a), as well as previously described problems caused by improper 13C chemical shift calibration (Ohlenschlager et al. 2008), illustrates the importance of a validation procedure for deposited RNA 13C chemical shifts. For validating proper referencing of 13C resonances in RNA, we propose five internal chemical shift standards that are found in most RNA structures studied by NMR and do not vary with solution conditions: two from guanines at the RNA 5′-end (C8 of 5′G and 5′GG) and three from a cytosine at the RNA 3′-end (C1′, C3′ and C5). Using these references, we found that only 22 datasets were correctly referenced and contained exclusively correct reference values. We were able to increase the number of usable datasets from 22 to 45 by correcting several datasets and adding six new ones (Table 1). Among those, 8 datasets were recalibrated, 9 datasets were partially recovered (inconsistent parts were omitted) and 6 additional datasets were contributed from our laboratory. Improper calibration was the main source of errors. In a few cases a more detailed evaluation was necessary to distinguish systematic from non-systematic errors (Fig. 5). Overall, more than 50% of the published 13C chemical shift data of RNAs are not properly calibrated, or contain obvious errors. This is much more than we expected, since only about 25% of wrongly calibrated datasets were reported for protein 13C shifts (Zhang et al. 2003). Each individual dataset is commented on in Supplementary Table 2. In contrast to the initial data, the ensemble of correctly calibrated and corrected data shows a clear clustering of chemical shifts depending on the residue type (Fig. 1), suggesting that the entire database can now be used to systematically analyze the dependence of 13C chemical shift values on RNA sequence and structure. So far, the presented method is limited to a subset of RNAs containing specific bases at the 3′ and 5′ ends that need to be base-paired. However, further analysis of the corrected 13C database should reveal other typical chemical shifts suitable as internal reference values, which could then be used to validate the 13C calibration of RNAs with different termini or lacking assignments of the terminal nucleotides.

As a first application of this corrected database, we showed a clear correlation between the conformation of the sugar pucker and the C1′, C3′ or C4′ 13C chemical shifts (Figs. 6, 7 and 8). In a previous study, Ohlenschlager et al. needed to use a linear combination of several 13C ribose chemical shifts (Ebrahimi et al. 2001) to predict the sugar pucker conformations, yielding ~95% correct predictions (Ohlenschlager et al. 2008). With our corrected database, we can obtain equally high prediction rates for the sugar pucker conformation by directly using 13C C1′, C3′ or C4′ chemical shifts, with no need for linear combinations. This method is simpler, and does not depend on a full assignment of the sugar. Furthermore, the three values can be used for independent confirmation.

In order to prevent the publication of improperly referenced RNA chemical shifts in the future, we suggest that the five internal reference shifts proposed here should be used as a method for validation of future depositions. We nevertheless would like to emphasize the importance of correct referencing according to the recommendations for biomolecules (Markley et al. 1998). Since proper indirect chemical shift referencing seems to be less established in the RNA-NMR community, we provide a detailed calibration procedure in the Supplementary Material to ensure proper referencing for future depositions into the BMRB. Improving the quality of the 13C chemical shift data within the BMRB database should lead to more structure–chemical shift relationships for RNA that could be exploited to help resonance assignments, and facilitate RNA structure determination with NMR.