Introduction

Although methods for accurate stereospecific assignment of resonance lines in folded proteins exist, they are often cumbersome and require additional spectrometer time, which is not always available. Stereospecific assignments are often required for a detailed structural analysis of a given protein or an enzymatic reaction, and they are also crucial for general NMR-based bioinformatics. For biological applications, the Biological Magnetic Resonance Data Bank (BMRB) (Ulrich et al. 2007) is the most widely used data base for chemical shifts and is often evaluated with respect to stereospecific assignments. One example is the side chain amide protons of asparagine and glutamine residues, which usually give two well-separated resonance lines.

The signals of the two amide protons can often be assigned stereospecifically by their typical homonuclear NOE (nuclear Overhauser effect) (Wüthrich 1986) or heteronuclear J-coupling patterns (McIntosh et al. 1997). The distributions of the chemical shifts of these protons in Asn and Gln can be retrieved from the BMRB including their supposed stereochemical assignment. The probability density distributions for the given protons have, independent of the fact that they are supposed to be assigned stereochemically, two maxima separated by approximately 0.67 ppm (Asn) and 0.68 ppm (Gln) (Fig. 1). Given that the chemical shift differences are mainly determined by the chemical structure, as suggested by the values of 0.68 ppm (Asn) and 0.71 ppm (Gln) obtained in the random-coil model peptides Gly-Gly-Asn-Ala and Gly-Gly-Gln-Ala (Bundi and Wüthrich 1979), respectively, such bimodal probability distributions are difficult to explain by a simple physical model. In Gly-Gly-Asn-Ala-NH2 and Gly-Gly-Gln-Ala-NH2 the resonances of the amide protons were assigned stereospecifically by a combination of molecular dynamics with NOESY spectroscopy. The downfield shifted amide protons were assigned to Hδ21 and Hε21, respectively (Harsch et al. 2013). Harsch et al. (2013) suggested that a large part of the entries submitted to the BMRB data base as stereospecifically assigned are most probably wrong. The large chemical shift difference observed in model peptides allows a direct stereochemical assignment in random-coil structures from the chemical shifts only. If such a chemical shift difference is also found for folded proteins, a direct stereospecific assignment of these protons from the chemical shifts alone would be possible. However, such a difference cannot be directly deduced from the data deposited in the BMRB database (see above).
In the paper presented here, we will show that a stereochemical assignment is also possible for proteins from the chemical shifts alone and that the reliability of the corresponding assignments can be calculated by a Bayesian ansatz.

Fig. 1

Chemical shift distributions of side chain amide protons in asparagine and glutamine residues. Data were taken from the BMRB database. a Asn Hδ21, b Gln Hε21, c Asn Hδ22, d Gln Hε22. The distributions were created by binning with a bin width of 0.05 ppm. (Black) all entries of the BMRB where two chemical shift values were given, (red) only amide groups with ambiguity code 1 (claimed as stereospecifically assigned). f(δ) is the frequency of the chemical shifts δ when the areas of the gray curves are normalized to 1

Materials and methods

Calculation of side chain amide chemical shift distributions

Experimental amide proton chemical shifts were taken from the BMRB (state: 12.01.2017). When a chemical shift value was given in the database for only one of the two protons of an amide group, both sidechain proton chemical shifts of that group were omitted. For this reason, 417 proton chemical shift entries were removed for Asn and 308 for Gln. In the end, the data set was reduced to 20,220 sidechain groups for Asn and 18,474 sidechain groups for Gln. The unbiased artificial chemical shift data base (UACSB) was recalculated from the unbiased protein structural data base NH3D (Sillitoe et al. 2013). The structures with added protons were energy minimized in explicit water using the molecular dynamics program GROMACS (version 4.6.1) (Berendsen et al. 1995; Hess et al. 2008). The final energy-minimized structure of each model was then used for a molecular dynamics simulation in explicit water. After equilibration, 400 ps trajectories were calculated. Eleven equidistant structures were extracted from these trajectories for the creation of the UACSB database. Since stereospecifically defined atom names may change after energy minimization (bond rotation), the atom names were checked again after minimization and, if necessary, corrected with the AUREMOL (Gronwald and Kalbitzer 2004) tool IUPACify. Chemical shifts were calculated using SHIFTS (version 4.3) (Osapay and Case 1991) or SHIFTX2 (version 1.04) (Neal et al. 2003). Since the stereospecific assignments of the random-coil values of the side chain amide protons were incorrect for Asn and Gln in SHIFTS and incorrect for Asn in SHIFTX1, we corrected the stereospecific assignments of the random-coil basis for UACSB (Harsch et al. 2013). The program REDUCE (Word et al. 1999), which is recommended by SHIFTS (Osapay and Case 1991) for adding protons to X-ray structures, also uses a naming of the two protons that is not in agreement with the stereochemical IUPAC definition.
Details about UACSB will be published elsewhere.
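
The removal of incomplete amide groups described above can be sketched as follows. This is a minimal illustration and not the script used for this work: the record layout (entry ID, residue number, residue name, atom name, shift in ppm) and the function name `paired_amide_shifts` are assumptions made for the example, and HD21/HD22 and HE21/HE22 are plain-text spellings of Hδ21/Hδ22 and Hε21/Hε22.

```python
from collections import defaultdict

# Plain-text atom names of the two side chain amide protons (assumed spelling).
AMIDE_PROTONS = {"ASN": ("HD21", "HD22"), "GLN": ("HE21", "HE22")}


def paired_amide_shifts(records):
    """Keep only amide groups for which both proton shifts are present.

    records: iterable of (entry_id, residue_number, residue_name,
    atom_name, shift_ppm) tuples (hypothetical layout).
    Returns a list of (entry_id, residue_number, residue_name,
    shift_H21, shift_H22) tuples.
    """
    groups = defaultdict(dict)
    for entry_id, res_num, res_name, atom, shift in records:
        names = AMIDE_PROTONS.get(res_name)
        if names is not None and atom in names:
            groups[(entry_id, res_num, res_name)][atom] = shift

    pairs = []
    for (entry_id, res_num, res_name), shifts in groups.items():
        first, second = AMIDE_PROTONS[res_name]
        # Groups with only one of the two shifts are dropped entirely.
        if first in shifts and second in shifts:
            pairs.append((entry_id, res_num, res_name,
                          shifts[first], shifts[second]))
    return pairs
```

Applied to a list that contains both Asn protons of one residue but only Hε21 of a Gln residue, the function would return the Asn pair and discard the Gln entry.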

Assessment of stereospecific assignment

The side chain amide groups contain a pair of protons, called Hδ21 and Hδ22 in Asn and Hε21 and Hε22 in Gln, whose stereochemistry is defined in the IUPAC/IUB recommendations (Markley et al. 1998). In the following, the corresponding chemical shifts are called δ 1 and δ 2. The observed chemical shift δ can always be separated into three contributions: a contribution δ rc from the intrinsic chemical structure, a contribution δ 3D from the three-dimensional structure, and a term δ cor containing contributions (e.g. from exchange processes) that cannot be separated completely into δ rc and δ 3D. A good approximation is to define δ rc by values taken from random-coil peptides. Unfortunately, stereospecific assignments in random-coil models have been published for only a few non-equivalent resonances. For the side chain amide groups they are given by Harsch et al. (2013). The chemical shift δ is then given by

$$\delta ={{\delta }_{\text{rc}}}+{{\delta }_{\text{3D}}}+{{\delta }_{\text{cor}}}$$
(1)

For the chemical shift difference Δδ = δ 2 − δ 1 one obtains

$$~\Delta \delta =\Delta {{\delta }_{\text{rc}}}+\Delta {{\delta }_{\text{3D}}}+\Delta {{\delta }_{\text{cor}}}$$
(2)

If C 1 is the class of correct stereospecific assignments and C 2 the class of incorrect stereospecific assignments, then the conditional probability P(C 1| Δδ) that a stereochemical assignment is correct for a given Δδ can be calculated according to Bayes' theorem by

$$P({{C}_{1}}|\Delta \delta )=\frac{P(\Delta \delta |{{C}_{1}})P({{C}_{1}})}{P(\Delta \delta )}=\frac{P(\Delta \delta |{{C}_{1}})P({{C}_{1}})}{P(\Delta \delta |{{C}_{1}})P({{C}_{1}})+P(\Delta \delta |{{C}_{2}})P({{C}_{2}})}$$
(3)

When a correct assignment is not known a priori, then P(C 1) = P(C 2) = 0.5 can be assumed. However, if other information such as information from NOESY-spectroscopy becomes available, P(C 1) and P(C 2) would have to be adapted.
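
A minimal sketch of Eq. 3 is given below. It rests on two assumptions that are ours and made only for illustration: an incorrect assignment is taken to be an exchange of the two protons, so that P(Δδ|C 2) equals P(−Δδ|C 1), and the likelihood is modeled as a single Gaussian. The mean of −0.68 ppm follows the Asn random-coil difference with the sign convention Δδ = δ 2 − δ 1; the width of 0.4 ppm is a placeholder and not a fitted value.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal probability density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_correct(delta_delta, likelihood_c1, prior_c1=0.5):
    """P(C1 | delta_delta) according to Bayes' theorem (Eq. 3).

    Assumption of this sketch: the likelihood for an incorrect (exchanged)
    assignment is the mirror image of the likelihood for a correct one.
    """
    p1 = likelihood_c1(delta_delta) * prior_c1
    p2 = likelihood_c1(-delta_delta) * (1.0 - prior_c1)
    total = p1 + p2
    if total == 0.0:
        # Both likelihoods underflow: the data carry no information.
        return prior_c1
    return p1 / total


def asn_likelihood(delta_delta):
    """Illustrative likelihood; mean from the random-coil difference,
    sigma = 0.4 ppm is a hypothetical placeholder."""
    return gaussian_pdf(delta_delta, mean=-0.68, sigma=0.4)
```

With equal priors, `posterior_correct(0.0, asn_likelihood)` gives 0.5, and the posterior approaches 1 as Δδ becomes strongly negative. Additional evidence, such as NOESY data, would enter through `prior_c1`.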

By using difference distributions the problem is reduced to one dimension but information about the individual chemical shifts is lost. Equation 3 can be rewritten for pairs of chemical shifts by

$$P({{C}_{1}}|{{\delta }_{1}},{{\delta }_{2}})=\frac{P({{\delta }_{1}},{{\delta }_{2}}|{{C}_{1}})P({{C}_{1}})}{P({{\delta }_{1}},{{\delta }_{2}}|{{C}_{1}})P({{C}_{1}})+P({{\delta }_{1}},{{\delta }_{2}}|{{C}_{2}})P({{C}_{2}})}$$
(4)
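
Equation 4 can be sketched in the same way. The code below assumes, for illustration only, that an incorrect assignment exchanges the two shifts, so that P(δ 1, δ 2|C 2) is obtained by evaluating the joint density of the correct class with swapped arguments. The example density is a product of two independent Gaussians with placeholder means (7.6 and 6.9 ppm) and widths (0.4 ppm); these numbers and the independence assumption are hypothetical and are not taken from the data bases discussed here.

```python
import math


def posterior_correct_pair(delta1, delta2, joint_density_c1, prior_c1=0.5):
    """P(C1 | delta1, delta2) according to Eq. 4.

    joint_density_c1(d1, d2) is the joint density of correctly assigned
    pairs; the exchanged class is modeled by swapping the arguments.
    """
    p1 = joint_density_c1(delta1, delta2) * prior_c1
    p2 = joint_density_c1(delta2, delta1) * (1.0 - prior_c1)
    total = p1 + p2
    if total == 0.0:
        return prior_c1
    return p1 / total


def example_density(delta1, delta2):
    """Hypothetical joint density: independent Gaussians for the two protons."""
    def pdf(x, mean, sigma):
        z = (x - mean) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf(delta1, 7.6, 0.4) * pdf(delta2, 6.9, 0.4)
```

In practice the joint density would be estimated from a two-dimensional distribution of calculated shift pairs rather than from a product of Gaussians.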

To be more realistic, the probability distributions have to be smoothed. Here we use a kernel density estimator with a Gaussian kernel k(δ)

$$k(\delta )=\sum\limits_{i=1}^{n}{\frac{1}{\sqrt{2\pi }h}\exp \left( -\frac{1}{2}{{\left( \frac{\delta -{{\delta }_{i}}}{h} \right)}^{2}} \right)}$$
(5)
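
A minimal sketch of such a Gaussian kernel density estimator follows. Note one deliberate difference from Eq. 5 as written: the sum is divided by the number of data points n, so that the area under the curve is 1, as stated in the figure captions. The function name `gaussian_kde` and the fixed bandwidth argument `h` are choices made for this example.

```python
import math


def gaussian_kde(samples, h):
    """Return a density function built from Gaussian kernels of bandwidth h.

    The sum of kernels (Eq. 5) is divided by n here so that the
    resulting density integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples
        )

    return density
```

For example, `gaussian_kde(differences, 0.05)(-0.68)` would evaluate the smoothed density of a list of Δδ values at −0.68 ppm.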

The bandwidth h of the kernel was either determined by Silverman's rule (Silverman 1998)

$$h={{\left( \frac{4{{\sigma }^{5}}}{3n} \right)}^{1/5}}\,$$
(6)

or according to Nasser (2006) using a dynamic bandwidth h i (Eq. 7). In Eq. (6) the overall bandwidth h is determined by the standard deviation σ of the underlying chemical shift dataset and the number n of data points in the set. The dynamic bandwidth h i replaces h in the kernel of Eq. 5 with

$$h_{i}=\sqrt{\frac{1}{n-1}\sum\limits_{j=i-m}^{i+m}{\left( \delta_{j}-\bar{\delta}_{i} \right)^{2}}}$$
(7)

h i is the bandwidth of the predicted chemical shift value δ i and is calculated over a sliding window of the size 2m + 1 data points in the sorted set of chemical shift values. The sliding window is centered around the i-th value. This kind of smoothing was used for Fig. 2.
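
Both bandwidth choices can be sketched in a few lines. The code is an illustration under stated assumptions: the sample standard deviation is used for σ in Eq. 6, n in Eq. 7 is read as the number of points inside the window, and the window is simply truncated at both ends of the sorted data set. None of these details is specified in the text.

```python
import statistics


def silverman_bandwidth(samples):
    """Global bandwidth h = (4 sigma^5 / (3 n))^(1/5), as in Eq. 6."""
    n = len(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation (assumption)
    return (4.0 * sigma ** 5 / (3.0 * n)) ** 0.2


def sliding_window_bandwidths(samples, m):
    """Local bandwidths h_i from a window of 2m + 1 sorted values (Eq. 7).

    Returns the sorted values and one bandwidth per value. Windows are
    truncated at the edges of the data set (assumption of this sketch).
    Requires m >= 1 and at least two samples.
    """
    ordered = sorted(samples)
    n = len(ordered)
    bandwidths = []
    for i in range(n):
        window = ordered[max(0, i - m): min(n, i + m + 1)]
        bandwidths.append(statistics.stdev(window))
    return ordered, bandwidths
```

A window covering a fixed fraction of the data set, for example 15%, would correspond to choosing m so that 2m + 1 is approximately 0.15 times the number of data points.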

Fig. 2

Distribution of the pairwise sidechain amide proton chemical shift differences. The chemical shift differences Δδ of the proton resonances of the sidechain amide groups, δ(Hδ22) − δ(Hδ21) of Asn and δ(Hε22) − δ(Hε21) of Gln, were calculated from the BMRB data. The chemical shift difference distributions were smoothed by the application of a kernel density estimator with a Gaussian kernel (Silverman 1998). a Asn, b Gln. f(∆δ) is the frequency of the chemical shift differences ∆δ when the area of the gray curves is normalized to 1

In addition to the Bayesian analysis, the data were also analyzed with a support vector machine from scikit-learn (Pedregosa et al. 2011).

Software

The programs GROMACS, SHIFTS, SHIFTX2, and scikit-learn 0.14.1 can be downloaded from http://www.gromacs.org, casegroup.rutgers.edu, http://www.shiftx2.ca, and scikit-learn.org respectively. The program AUREMOL as well as the chemical shift data base are available at http://www.auremol.de.

Results and discussion

Chemical shift distribution of side chain amide protons in the BMRB data base and in the chemical shift data base UACSB

If the hypothesis is correct that the bimodal chemical shift distribution is solely the consequence of errors in the stereospecific assignments in the BMRB, the distribution of the chemical shift difference of a given pair of amide protons should be bimodal, and its two maxima should be located symmetrically about 0 ppm, as is indeed observed (Fig. 2). The distributions of the pairwise differences (Fig. 2) are surprisingly similar for Asn and Gln, in contrast to the rather different frequency distributions obtained for the individual chemical shifts. Again this is plausible, since the individual chemical shift distributions in Fig. 1 depend on the distribution of correct and incorrect stereochemical assignments in the data base as well as on the non-correlated, absolute structure-dependent shifts δ 3D of the resonances, which may be quite different for Asn and Gln. However, the pairwise chemical shift differences Δδ are mainly determined by the random-coil chemical shift difference Δδ rc, which is in most cases larger than Δδ 3D, and by the distribution of correct and incorrect stereochemical assignments.
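
The symmetry argument can be made explicit with a short sketch. Assume, for illustration, that an incorrect assignment simply exchanges the two protons and therefore reverses the sign of Δδ. If f(Δδ) denotes the distribution of correctly assigned pairs and q a hypothetical fraction of exchanged entries (a symbol introduced here, not used elsewhere in this paper), the observed distribution would be

$$f_{\text{obs}}(\Delta \delta )=(1-q)\,f(\Delta \delta )+q\,f(-\Delta \delta )$$

When f is concentrated around Δδ rc, this mixture has two maxima near +Δδ rc and −Δδ rc, that is, at positions symmetric about 0 ppm, with relative weights (1 − q) and q. Only the positions of the maxima are symmetric in general; their heights are equal only for q = 0.5.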

Experimentally determined chemical shift distributions are, especially with regard to stereospecific assignments, prone to errors, but chemical shift data bases recalculated from three-dimensional structures have the advantage that assignment errors cannot occur. Usually, chemical shifts are calculated on the basis of random-coil chemical shifts modified by tertiary structure effects. Although the accuracy of the predictions depends on the chemical shift prediction program used, general trends caused by tertiary structure effects can be predicted rather well. For this work, we used the unbiased chemical shift data base UACSB which is based on a structurally unbiased set NH3D of X-ray structures (Sillitoe et al. 2013).

Figure 3 compares the chemical shift distributions obtained from UACSB with the distributions obtained from the BMRB data base. The UACSB distributions agree almost perfectly with a subset of the chemical shift distributions obtained from the BMRB, indicating that the bimodal distributions are probably assignment artifacts. As suggested, the chemical shift distributions of the two protons are well separated. As a rule, the Hδ21 resonance in Asn is shifted downfield relative to the Hδ22 resonance and the Hε21 resonance in Gln is shifted downfield relative to the Hε22 resonance. In UACSB one can select which of the two programs, SHIFTS or SHIFTX2, is applied to the structures refined in explicit water. Note that we had to introduce a correction for the stereospecific assignment (see Materials and methods). The two distributions shown here were calculated with SHIFTS, which gives somewhat better results for the amide protons. A detailed analysis of the general differences between the two programs will be presented elsewhere, since they are not important in the present context of stereochemical assignment.

Fig. 3

Chemical shift distributions of side chain amide protons obtained from UACSB. The distribution of chemical shifts a δ(Hδ21) and c δ(Hδ22) of Asn and b δ(Hε21) and d δ(Hε22) of Gln. (Gray) distributions obtained from the BMRB, (red) distributions obtained from UACSB. The distributions were smoothed by the application of a kernel density estimator with a Gaussian kernel. The bandwidth of the kernel was derived by a sliding window. The bandwidth is the standard deviation σ of the window, which contained 15% of the whole dataset and was centered on the current data point. The random-coil values of the UACSB distribution were corrected to achieve the best match between the experimental and calculated distributions. The correction values were −0.025 ppm for Asn Hδ21, −0.078 ppm for Asn Hδ22, 0.021 ppm for Gln Hε21 and −0.046 ppm for Gln Hε22. f(δ) is the frequency of the chemical shifts δ when the area of the gray curves is normalized to 1

Deconvolution of the experimental side chain amide chemical shift distribution

The question arises whether we can deconvolute the BMRB chemical shift distribution to obtain a “true”, corrected distribution. Such a deconvolution requires a classifier that determines whether a pair of chemical shift values has been correctly assigned or whether the assignment has to be corrected. The simplest classifier has been proposed by Harsch et al. (2013) with the assumption that P corr(Δδ) = 1 for Δδ < 0, P corr(Δδ) = 0.5 for Δδ = 0, and P corr(Δδ) = 0 for Δδ > 0. Since P corr represents the probability that a stereochemical assignment is correct, a more accurate definition would acknowledge that the assignment is always correct when both chemical shifts are identical (Δδ = 0). This leads to the improved definition P corr(Δδ) = 1 for Δδ ∈ (−∞, 0] and P corr(Δδ) = 0 for Δδ ∈ (0, ∞). A second classifier could be a Bayesian classifier derived from the theoretical Δδ distribution (Fig. 4). Using these classifiers, one could reorganize the BMRB data to find the most probable distribution (Fig. 5) of the experimental data. If we assume a confidence level of 95%, approximately 55% of the stereochemical assignments in the BMRB would be correct, approximately 32% are recognized as exchanged, and for approximately 13% of the assignments no decision can be made at this confidence level (Table 1).
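
The two kinds of decision described above can be sketched as follows. This is an illustration, not the implementation used for Table 1: the function names are ours, and treating a probability exactly equal to the confidence level as sufficient is an assumption.

```python
def step_classifier(delta_delta):
    """Improved simple classifier: P_corr = 1 for delta_delta <= 0, else 0.

    delta_delta is delta_2 - delta_1 in ppm.
    """
    return 1.0 if delta_delta <= 0.0 else 0.0


def decide(p_correct, confidence=0.95):
    """Three-way decision from a probability that the assignment is correct.

    'correct'   : probability of a correct assignment reaches the level
    'exchanged' : probability of an incorrect assignment reaches the level
    'undecided' : neither statement can be made at this confidence level
    """
    if p_correct >= confidence:
        return "correct"
    if 1.0 - p_correct >= confidence:
        return "exchanged"
    return "undecided"
```

Applying `decide` to the Bayesian probability of every amide group in a data set and counting the three outcomes would give fractions of the kind reported for the BMRB.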

Fig. 4

Theoretical amide chemical shift differences and Bayesian probabilities. The ∆δ distributions were calculated from UACSB (grey). The distributions were smoothed by the application of a kernel density estimator with a Gaussian kernel (Silverman 1998). In addition, the probability P(∆δ) of the corresponding Bayesian classifier (green, y2-axis) is shown (see “Materials and methods” section). A Gaussian fit of the ∆δ distributions of UACSB is plotted in red and the probability curve of the corresponding Bayesian classifier is plotted in blue (y2-axis). a Asn, b Gln. f(∆δ) is the frequency of the chemical shift differences ∆δ when the area of the gray curves is normalized to 1

Fig. 5

Reconstructed experimental chemical shifts of side chain amide protons. Corrected chemical shift distributions were calculated from the BMRB distributions by using the Bayesian classifier (see Fig. 4). a (Hδ21) and c (Hδ22) of Asn and b (Hε21) and d (Hε22) of Gln. A confidence level of 95% was used for the correction of the order of the sidechain NH2-proton resonances of Asn and Gln. f(δ) is the frequency of the chemical shifts δ when the area of the gray curves is normalized to 1.

Table 1 Assessment of the stereospecific assignments of amide protons in the BMRBa

A program for determining the stereochemical assignment from chemical shifts only

When a pair of amide protons has been assigned, the probability for a given stereochemical assignment can be calculated from the chemical shift difference by applying Bayes' theorem (Eq. 3) to the chemical shift difference distribution. The corresponding software tool AssignmentChecker is implemented in the program AUREMOL (menu point: Assignment > Check stereospecific assignment of sidechain NH2-groups). Here, either a pair of chemical shifts can be entered manually or a list of chemical shifts can be read in, and the stereochemical assignment will be corrected. Independent of the software presented, at a confidence level >95% a stereochemical assignment is correct when the chemical shift difference δ(Hδ21) − δ(Hδ22) is greater than 0.4 ppm for Asn or δ(Hε21) − δ(Hε22) is greater than 0.42 ppm for Gln.
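
The threshold rule stated above can be applied without any dedicated software, as the following sketch shows. It is not the AssignmentChecker code. The function name and the residue keys are ours, and the "exchanged" verdict for a difference below the negative threshold is an assumption based on the symmetry of the problem rather than a statement made in the text.

```python
# Thresholds in ppm for delta(H21) - delta(H22) at a confidence level > 95 %.
THRESHOLD_PPM = {"ASN": 0.40, "GLN": 0.42}


def check_assignment(residue, shift_h21, shift_h22):
    """Classify a side chain amide assignment from its two shifts (ppm).

    shift_h21: shift submitted for Hd21 (Asn) or He21 (Gln)
    shift_h22: shift submitted for Hd22 (Asn) or He22 (Gln)
    """
    threshold = THRESHOLD_PPM[residue.upper()]
    difference = shift_h21 - shift_h22
    if difference > threshold:
        return "correct"
    if difference < -threshold:
        # Assumed by symmetry: the two protons were probably exchanged.
        return "exchanged"
    return "undecided"
```

For instance, an Asn group submitted with 7.60 ppm for Hδ21 and 6.90 ppm for Hδ22 would be accepted, whereas a difference of only 0.2 ppm would remain undecided and call for NOE or J-coupling data.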

Somewhat more significant results can be expected when both chemical shifts and not only the chemical shift difference of the sidechain amide group resonances are used as input. We tested the two-dimensional Bayesian ansatz (Eq. 4) as well as a support vector machine (Schölkopf and Smola 2002), which worked with the chemical shift values as input. For both methods, the results improved only marginally compared to the one-dimensional Bayesian ansatz (Eq. 3), which had the chemical shift difference Δδ of the resonance lines of the sidechain NH2-groups as input. Therefore, we implemented only this type in AUREMOL.

Structural implications

The theoretical chemical shift distributions calculated from the unbiased structural data set (Fig. 4) as well as the corrected experimental distributions from the BMRB (Fig. 5) are characterized by a central Gaussian-like distribution but also by significant frequency densities in the overlap region of the Hδ21/ε21 and Hδ22/ε22 distributions. The simulated data show that such an overlap should exist. However, in the corrected data from the BMRB the intensity in this range is significantly larger than in the data derived from the UACSB. This is because the values of the BMRB in the overlap region could not be redistributed at a confidence level >95%, and thus the original stereospecific assignment was retained. An additional effect that would lead to identical experimental chemical shift values of the two protons is a fast 180° flip around the C–N bond. For asparagine and glutamine in the random-coil Ac-GGXA-NH2 the flip rates are 1.3 and 0.4 s−1 at 293 K, respectively (Harsch et al. 2013). The large reorientational correlation times imply that fast exchange averaging may only occur in special cases in proteins. The main factors leading to structure-dependent deviations from the random-coil chemical shifts of amide protons are electric field effects and ring current effects. In addition, paramagnetic shifts may occur when paramagnetic centers are present. The BMRB data base contains a small number of structures with paramagnetic ligands. These can lead to extreme paramagnetic shifts of up to 111 ppm, but they are statistically irrelevant and are not contained in the theoretical data base.
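
A rough estimate illustrates how far the flip rates quoted above are from the fast-exchange regime. For a two-site exchange with equal populations, the two lines coalesce at a rate k c = πΔν/√2. Assuming, purely for illustration, a proton frequency of 600 MHz (not stated in this paper) and a shift difference of about 0.68 ppm, one obtains

$$\Delta \nu \approx 0.68\ \text{ppm}\times 600\ \text{MHz}\approx 408\ \text{Hz},\qquad {{k}_{\text{c}}}=\frac{\pi \,\Delta \nu }{\sqrt{2}}\approx 906\ {{\text{s}}^{-1}}$$

Under these assumptions the coalescence rate is several hundred times larger than the random-coil flip rates of 1.3 and 0.4 s−1, so complete averaging of the two resonances would require a very strong acceleration of the flip.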

The calculated chemical shifts from the unbiased structural data base show that sometimes upfield chemical shifts of individual Hδ21/ε21 or downfield chemical shifts of individual Hδ22/ε22 protons occur that are larger than the corresponding random-coil chemical shift differences. In fact, the largest upfield shifts of Hδ21, Hδ22, Hε21, and Hε22 are 2.39, 2.25, 3.32, and 1.68 ppm, respectively. Correspondingly, the largest downfield shift values for Hδ21, Hδ22, Hε21, and Hε22 are 8.97, 8.27, 8.86, and 8.23 ppm, respectively. Since even more extreme experimental values are found in the BMRB for diamagnetic proteins, this is clearly not an artifact of the chemical shift calculations by SHIFTS. Inspection of Fig. 3 also shows that large downfield shifts for amide protons are more likely than upfield shifts.

Correlated effects in chemical shifts were not the focus of this paper but promise new, interesting insights. A simple example is the distribution of chemical shift differences. Large differences indicate that the two protons of an amide group experience very different shifts although they are located rather close to each other. For the differences δ(Hδ22) − δ(Hδ21) and δ(Hε22) − δ(Hε21), values between −4.37 and 3.17 ppm and between −4.66 and 2.89 ppm, respectively, are calculated. In the BMRB even more extreme values are reported.

Conclusion

We have shown here that in many cases side chain amide protons can be stereochemically assigned from their chemical shift difference only. However, one has to keep in mind that this is a probability statement with a predefined error probability. There may be rare cases where the assignment is wrong. The stereochemical assignment can be important when the amide group is involved in a specific biological interaction, and it will increase the precision of distance limits. However, it is well known that the precision of the distance restraints only determines the accuracy of the 3D structures when the total number of restraints is in the low to intermediate range (<10 restraints per residue). In cases of ambiguity, other methods such as homonuclear NOE (nuclear Overhauser effect) (Wüthrich 1986) or heteronuclear J-coupling patterns should be used. Conversely, chemical shift information can be added if the latter methods cannot provide a unique solution. In addition, we have shown that simulated chemical shift data bases have clear advantages when the experimental data base itself is not flawless or is limited in size, as is expected for stereochemical assignments.