Introduction

Heterocycles that bear nitrogen together with sulfur or oxygen are very common moieties in biochemistry and are therefore potentially of high pharmaceutical value [1]. Among them, thiazines are known to have various pharmacological properties: they exert anti-inflammatory, cytostatic, sedative, analgesic, and immunosuppressive effects [2]. The oxazine structure is found in several promising antitumor compounds, such as the tumor-specific benzocycloheptoxazines [3, 4], which may also exhibit anti-inflammatory properties [5]. Furthermore, compounds from the benzoxazine family are known to act as antioxidant, analgesic, and hypolipidemic agents [6]. 1,4-Thiazines and 1,4-oxazines are potential remedies for tuberculosis owing to their antimycobacterial action [7]. Thiazoline derivatives are equally valuable; one example is agrochelin, which possesses cytotoxic and antibiotic properties [8]. Oxazolines have completely different but nonetheless highly valuable properties (e.g., the aminorex family of psychostimulants [9, 10]). Furthermore, dibenzothiazines are known building blocks of typical antipsychotics such as chlorpromazine [11]. The parent compound of this class, phenothiazine, has itself been found to be a useful insecticide and anthelmintic [12, 13]. From the perspective of combinatorial chemistry, these highly desirable pharmacological properties make this substance class one of the most interesting regions of chemical space.

There has recently been considerable interest in chemical species that can be synthesized and/or derived in multicomponent reactions such as the Ugi [14], Passerini [15, 16], and Asinger [17–20] reactions. These reactions have high atom economy and can be performed in biocompatible solvents such as glycols, ionic liquids, or even plain water, which links them to green chemistry [21–27]. Combinatorial application of these reactions quickly yields whole libraries of similar compounds [14, 28, 29]; a downside is that structure elucidation or validation can become a bottleneck in the workflow.

NMR spectroscopy is undoubtedly one of the most valuable tools for structure elucidation and has been a routine method in most modern organic research groups since the advent of pulse FT techniques. Unlike the results of other analytical methods (e.g., elemental analysis), which follow directly from the molecular formula, the expected value of a chemical shift is not easy to predict. This can make the evaluation of complex NMR spectra very tedious.

Many empirical solutions to this problem have been proposed, such as increment systems [30–32] (e.g., the CS ChemNMR Pro facility [33]). Other approaches use databases [34] and descriptors such as the HOSE code [35]; prominent examples of database-driven programs are NMRShiftDB [36] and SpecInfo [37, 38]. Another class of empirical methods comprises regression methods, such as partial least squares fitting [39, 40] and artificial neural networks [40–49], which have been used extensively. Some programs, such as the ACD NMR predictor [50], combine neural networks with the HOSE code.

In the framework of electronic structure theory, every property of a chemical system can be calculated from its wavefunction or even from its electron density. Thus, once the geometry of a system is known, its chemical shifts can in principle be calculated directly (for a detailed review of the quantum mechanical calculation of NMR parameters, see [51]). While empirical methods are inherently approximate, ab initio calculations (but not DFT methods) can be improved systematically up to arbitrary precision, given enough computational power. Furthermore, a purely quantum chemical ab initio treatment rests on a physically sound basis, whereas methods such as neural networks depend on experimental reference data and are difficult to interpret (their synaptic weights have been described as “an opaque, unreadable table … valueless as a scientific resource” [52]).

However, because of practical limitations, all computational chemistry tools rely on approximations, which are rather poorly understood or, in the case of DFT methods, uncontrollable. Although quantum chemical methods have the potential to be true simulations that are, in theory, perfectly accurate, practical obstacles may reduce their accuracy to levels far below those of purely empirical methods. Indeed, in one study in which gauge-invariant atomic orbital (GIAO) calculations were performed for various test sets, even the “gold standard” ab initio method CCSD(T) seemed to perform poorly and was inferior to various DFT functionals [53].

In the present study, we investigated the general performance of computationally inexpensive GIAO calculations, their compatibility with simple empirical corrections, and their specific limitations in the practically relevant task of computing the chemical shifts of a pharmaceutically interesting substance class.

Theoretical details

Geometry optimizations of all investigated compounds were carried out at the DFT level of theory according to the Kohn–Sham scheme [54] using the well-established B3LYP hybrid functional [55–58] and the dispersion-corrected ωB97xD functional [59] with the 6-31G(d) basis set [60–62]. Additional geometry optimizations with the 6-311++G(d,p) basis set [63–65] were performed for comparison. All shielding constants were calculated using the gauge-invariant atomic orbital (GIAO) method [66–68], and solvent effects were included using the polarizable continuum model [69] with the dielectric constant of chloroform. GIAO single-point calculations were performed using the B3LYP, ωB97xD, WP04 [70], WC04 [70], M06-2X [71], and PBE0 [72, 73] functionals, the Hartree–Fock (HF) method [74, 75], and second-order Møller–Plesset perturbation theory (MP2) [76]. The Wx04 functionals were developed to provide an accurate description of chemical shifts (x = P for proton, C for carbon) and are therefore of particular interest. M06-2X is a relatively new and promising functional, while PBE0 has long been known to be reliable for calculating chemical shifts [77]. Basis sets up to valence quintuple-ζ quality were employed in one case study, while the 6-31G(d) and pcS2 [78] basis sets were used to evaluate the test set; pcS2 was specifically designed for GIAO calculations with density functionals. All quantum chemical calculations were performed with the Gaussian 09 program [79]. External basis sets were taken from the EMSL Basis Set Exchange database [80].

Chemical shifts δ can be computed from calculated shielding constants in various ways. The most common approach uses a reference compound such as tetramethylsilane (TMS): the relative chemical shift is obtained by subtracting the shielding constant of the simulated nucleus, \( \sigma_{\mathrm{calc}} \), from that of the reference compound, \( \sigma_{\mathrm{ref}} \):

$$ \delta ={\sigma}_{\mathrm{ref}}-{\sigma}_{\mathrm{calc}} \tag{1} $$

This requires the reference shielding constant to be computed at the same level of theory as the shielding constants of the investigated compound. The choice of the reference compound is, in principle, arbitrary; TMS is used in most cases, but it may not be the best choice [81].

Another method of obtaining the shifts is better suited to the statistical analysis of large test sets. Here, linear regression of the computed shielding constants against the experimental chemical shifts in the form \( \sigma_{\mathrm{calc}} = b - m\,\delta_{\exp} \) gives the slope m and the intercept b as regression parameters. Using these parameters, the chemical shifts \( \delta_{\mathrm{calc}} \) are readily computed as

$$ {\delta}_{\mathrm{calc}}=\frac{b-{\sigma}_{\mathrm{calc}}}{m} \tag{2} $$

In a regression-based calculation, the accuracy of the method can be expressed in terms of the mean unsigned error (MUE, or mean absolute error) of the regression values:

$$ \mathrm{MUE}=\frac{\sum \left|{\delta}_{\exp }-{\delta}_{\mathrm{calc}}\right|}{n} \tag{3} $$

where n is the number of signals evaluated.
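For readers who want to reproduce this kind of evaluation, the following plain-Python sketch fits computed shieldings against experimental shifts by ordinary least squares, inverts the fitted line to obtain scaled shifts (the procedure behind Eq. 2), and computes the MUE of Eq. 3. The function names and toy numbers are hypothetical and not taken from this study. The fit keeps a signed slope s, which comes out negative because shielding decreases as the shift increases, so its magnitude corresponds to the slope m discussed in the text.

```python
from typing import Sequence


def fit_shielding_line(sigma_calc: Sequence[float], delta_exp: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of sigma_calc = b + s * delta_exp.

    Returns (b, s). The slope s keeps its sign; it is negative in practice
    because shielding decreases as the chemical shift increases.
    """
    n = len(sigma_calc)
    if n != len(delta_exp) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(delta_exp) / n
    mean_y = sum(sigma_calc) / n
    sxx = sum((x - mean_x) ** 2 for x in delta_exp)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delta_exp, sigma_calc))
    s = sxy / sxx
    b = mean_y - s * mean_x
    return b, s


def scaled_shifts(sigma_calc: Sequence[float], b: float, s: float) -> list[float]:
    """Invert the fitted line for each nucleus: delta_calc = (sigma_calc - b) / s."""
    return [(sigma - b) / s for sigma in sigma_calc]


def mean_unsigned_error(delta_exp: Sequence[float], delta_calc: Sequence[float]) -> float:
    """Eq. 3: mean absolute deviation between experimental and computed shifts."""
    return sum(abs(e - c) for e, c in zip(delta_exp, delta_calc)) / len(delta_exp)


# toy data, invented for illustration (ppm)
delta_exp = [20.0, 40.0, 130.0, 170.0]
sigma_calc = [171.0, 149.0, 61.0, 19.0]
b, s = fit_shielding_line(sigma_calc, delta_exp)
print(b, s, mean_unsigned_error(delta_exp, scaled_shifts(sigma_calc, b, s)))
```

Because the slope and intercept are fitted to the same data that are then evaluated, the MUE obtained this way measures how well the shieldings correlate with experiment rather than their absolute accuracy.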

Data processing for the test sets was facilitated by two scripts based on M. Siebert’s scripts from the Tantillo group [82–85]. The first script, geometryExtractor, extracts the optimized geometries from concatenated Gaussian output files and constructs a linked input file containing these geometries and the GIAO keyword. The second script, NMRDataExtractor, extracts the isotropic shielding constants from the concatenated output of the GIAO jobs created with the first script. We modified both scripts for more general use: whereas the original scripts were tailored to a specific test set, our versions can be applied to any concatenated output file.
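The geometryExtractor and NMRDataExtractor scripts themselves are not reproduced here. Purely as an illustration of the second step, the following hypothetical Python function collects isotropic shieldings from a concatenated output. It assumes that each job prints a block headed by a line containing "Magnetic shielding tensor", followed by lines of the form "1  C    Isotropic =   59.4620   Anisotropy = ..."; these format details are assumptions that should be checked against actual Gaussian 09 output, and correlated (e.g., MP2) GIAO jobs may print more than one such block per job.

```python
import re

# Assumed line format (verify against your own output), e.g.
#      1  C    Isotropic =    59.4620   Anisotropy =   143.4330
ISO_LINE = re.compile(r"^\s*(\d+)\s+([A-Z][a-z]?)\s+Isotropic\s*=\s*(-?\d+\.\d+)")
BLOCK_START = "Magnetic shielding tensor"  # assumed header of each shielding block


def read_isotropic_shieldings(text: str) -> list[dict[int, tuple[str, float]]]:
    """Split concatenated output text into shielding blocks.

    Each block maps atom number -> (element symbol, isotropic shielding in ppm).
    """
    blocks: list[dict[int, tuple[str, float]]] = []
    for line in text.splitlines():
        if BLOCK_START in line:
            blocks.append({})  # a new job (or a new shielding section) starts here
            continue
        match = ISO_LINE.match(line)
        if match and blocks:
            atom, element, sigma = match.groups()
            blocks[-1][int(atom)] = (element, float(sigma))
    return blocks
```

Keeping one dictionary per block preserves the job order of the concatenated file, so the shieldings can later be matched to the corresponding geometries and experimental assignments.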

Evaluation of the validity and robustness of the linear regression results indicates a high degree of reliability: cross-validation yields almost identical coefficients of determination, and small confidence intervals were obtained (for details, see the “Electronic supplementary material,” ESM).
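The actual validation procedure is described in the ESM. As a rough illustration only, the following sketch (hypothetical helper names, plain Python) refits the shielding/shift regression on k interleaved training subsets and reports the coefficient of determination R² for each, one simple way to check that the fit does not hinge on a few data points.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def k_fold_r_squared(delta_exp, sigma_calc, k=5):
    """R^2 of the regression refitted on each of k training subsets.

    Subset f leaves out every point whose index i satisfies i % k == f.
    """
    scores = []
    for fold in range(k):
        keep = [i for i in range(len(delta_exp)) if i % k != fold]
        scores.append(r_squared([delta_exp[i] for i in keep],
                                [sigma_calc[i] for i in keep]))
    return scores
```

Nearly identical R² values across the subsets would correspond to the robustness described above; a subset whose R² stands out points to influential data points.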

Results and discussion

To examine the influence of the method, basis set, and geometry on the quality of chemical shift prediction for heterocyclic compounds, we designed a test set of 24 chemically similar compounds: 1,4-thiazines, 3-thiazolines, and their oxo derivatives (Scheme 1). The first nine molecules were synthesized and characterized by the Martens group in Oldenburg. The remaining entries were taken from the SDBS database [90].

Scheme 1 Heterocyclic compounds included in the test set

Only molecules that can be described by a single geometry, or by simple averaging of computed shieldings (as in the case of freely rotating methyl groups), were included in the test set. Electronic structure calculations commonly rely on static geometries, whereas NMR is slow on the timescale of molecular motion and measures time-averaged chemical environments rather than snapshots. If a compound is subject to conformational isomerism, this is usually treated by Boltzmann averaging over all important geometries, which complicates NMR prediction [91–94].
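The test set was chosen to avoid this step, but for readers who need it, the following sketch shows the standard Boltzmann weighting of one nucleus' shielding over several conformers. It assumes relative conformer energies in kJ/mol and a temperature of 298.15 K; whether electronic or free energies should be used is a modeling choice left to the user, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def boltzmann_average(shieldings, rel_energies_kj, temperature=298.15):
    """Boltzmann-weighted average of one nucleus' shielding over conformers.

    shieldings: shielding of the same nucleus in each conformer (ppm)
    rel_energies_kj: conformer energies relative to any common zero (kJ/mol)
    """
    if len(shieldings) != len(rel_energies_kj) or not shieldings:
        raise ValueError("need one energy per conformer")
    e_min = min(rel_energies_kj)  # shift energies so the largest weight is 1
    weights = [math.exp(-(e - e_min) / (R_KJ * temperature)) for e in rel_energies_kj]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, shieldings)) / total
```

Subtracting the lowest energy before exponentiating leaves the normalized weights unchanged but avoids numerical underflow for large energy values.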

The test set contains 217 1H nuclei; by averaging over equivalent nuclei and omitting nuclei that could not be assigned unambiguously, these are reduced to 83 signals. It also contains 169 13C nuclei, which correspond to 134 signals.
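The reduction from nuclei to signals amounts to grouping computed shieldings by their experimental assignment. A minimal sketch, assuming a hypothetical mapping from atom numbers to signal labels in which nuclei that cannot be assigned are simply left out:

```python
from collections import defaultdict


def average_by_signal(shieldings, signal_of_atom):
    """Average the shieldings of all nuclei assigned to the same signal.

    shieldings: {atom number: isotropic shielding in ppm}
    signal_of_atom: {atom number: signal label}; atoms without a label
        (i.e., not safely assignable) are skipped.
    """
    groups = defaultdict(list)
    for atom, sigma in shieldings.items():
        label = signal_of_atom.get(atom)
        if label is not None:
            groups[label].append(sigma)
    return {label: sum(values) / len(values) for label, values in groups.items()}
```

For a freely rotating methyl group, for example, the three proton shieldings computed for one static geometry differ, but their average corresponds to the single observed signal.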

The results of the statistical analysis of all calculations are given in Tables 1 and 2, which show the linear regression results, most importantly the mean unsigned error (MUE) and the slope m. Table 1 contains the results of GIAO-NMR calculations performed with various methods using the 6-31G(d) basis set and either the B3LYP/6-31G(d) or the ωB97xD/6-31G(d) geometry. In addition to the gas-phase NMR calculations, calculations in a simulated solvent field were performed using the polarizable continuum model (PCM) with chloroform as the solvent. Table 2 shows the parameters for the same calculations, based on the same geometries but with the pcS2 basis set for the NMR calculations, with one exception: MP2 calculations were not performed with the pcS2 basis set because they were prohibitively expensive.

Table 1 Linear fit parameters from (PCM-)GIAO-method/6-31G(d) calculations for two different geometries
Table 2 Linear fit parameters from (PCM-)GIAO-method/pcS2 calculations for two different geometries

The results in Tables 1 and 2 were analyzed to investigate the influence of the simulated chloroform solvent field, the impact of the underlying geometry, and the importance of the method and basis set used for the NMR calculations. The most interesting of these factors is probably the last, which determines how well theoretical methods reproduce (and predict) chemical shifts. The data show that the HF-based calculations give the worst results. Interestingly, the specialized WC04 functional does not yield satisfactory results, even though it was reparametrized for exactly this purpose. Two functionals give the smallest MUE values regardless of the basis set and geometry used: PBE0 and ωB97xD. Their average errors are well below 2 ppm per carbon atom. B3LYP also gives average deviations of only a little more than 2 ppm. The differences between the functionals in the accuracy of 1H chemical shift prediction are relatively small. The abovementioned functionals also perform well here (with MUEs of 0.11–0.15 ppm), and the purpose-built WP04 functional yields the lowest errors. However, the methods are not consistent in their performance: several theoretical levels that are good at calculating 13C shifts do not necessarily perform well for 1H shifts, and vice versa.

The influence of the solvent field is only slight. For the proton shifts, the chloroform environment leads to a small systematic improvement in the MUE; for the 13C signals, the effect of solvation is rather random. Overall, the impact is very limited. This agrees with the findings of Pecul and Sadlej [95], who pointed out that explicit solvent–solute interactions, rather than implicit solvation, are required to significantly improve the prediction of chemical shifts.

Comparing the results for the different geometries, the 13C data obtained from NMR calculations based on ωB97xD/6-31G(d) geometries are usually somewhat better than those based on the corresponding B3LYP/6-31G(d) geometries. In particular, the methods with small MUEs give better results for structures optimized with the dispersion-corrected functional. For the computed 1H shifts, the errors are comparable for both underlying geometries (B3LYP structures lead to slightly lower MUEs). Using an established modern functional and a larger basis set (M06-2X/6-311++G(d,p)) for the geometry optimization has only a minor impact on the results (Table 3): the abovementioned “best” functionals PBE0 and ωB97xD give MUEs of 1.89 and 1.87 ppm, respectively, for gas-phase GIAO calculations (6-31G(d)), compared with 1.80 and 1.86 ppm for the ωB97xD/6-31G(d) structures.

Table 3 Linear fit parameters from GIAO-method/6-31G(d) calculations for the M06-2X/6-311++G(d,p) geometries

All 1H and 13C shifts of the test compounds were additionally predicted with ChemDraw. The resulting MUE is 0.35 ppm for the 1H shifts and 4.17 ppm for the 13C shifts, which is larger than the worst of the corresponding values in Tables 1 and 2. We therefore conclude that the increment-based code in ChemDraw performs rather poorly for this class of substances.

Comparison of Tables 1 and 2 shows that enlarging the basis set from 6-31G(d) to pcS2 results in a slight reduction of the mean unsigned errors of the chemical shifts in many cases. This is more pronounced for the 1H shifts, where a more systematic improvement appears possible. However, the improvements are very small considering the vastly greater computational demand of pcS2 calculations.

Closer inspection of Tables 1 and 2 reveals an interesting fact: in Table 1, the values of the slope m are scattered around the ideal value of 1 (a slope of approximately 1 is generally desirable, because values far from 1 lead to significant deviations at either end of the scale). One would expect a larger basis set to bring the slope values closer to 1 on average, as enlarging a basis set represents a systematic improvement. However, the values of m in Table 2 are always larger than those in Table 1. It appears that the basis set size affects the magnitude and/or quality of the computed shieldings.

To demonstrate that this effect is not introduced by our linear regression approach, we also calculated chemical shifts with a “traditional” reference-based procedure. Fig. 1 plots the relative chemical shifts computed for compound 1 (2,2-dimethyl-2H-1,4-benzothiazine, Chart 1; B3LYP/6-31G(d) geometry) with TMS as the reference against the respective experimental chemical shifts [96]; the best-fit lines are forced through the origin to obtain a one-parameter function.

Chart 1 Compounds used in reference-based evaluation of chemical shifts

Fig. 1 Plot of HF-calculated against experimental 13C chemical shifts δ of 1 for different basis sets (TMS used as internal reference; regression function without intercept)

One might expect changing the basis set to alter the scatter of the individual 13C chemical shifts. However, this is clearly not the case: Fig. 1 shows that enlarging the basis set from the modest 6-31G(d) to the specialized pcS2 barely changes the pattern (or the accuracy), although it does yield higher chemical shift values and a different (higher) slope. The same behavior is observed for other basis sets (the corresponding data are omitted from Fig. 1 for clarity). The results for one basis set could therefore be transformed into those for another simply by applying a scaling factor. We call this slope factor stigma (ς). It is similar, but not equivalent, to the slope m obtained in the linear regression: the differences between the slope factors ς in Fig. 1 are similar to the differences between the linear regression slopes m in Tables 1 and 2.
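For a best-fit line forced through the origin, as in Fig. 1, the slope has a closed form: minimizing \( \sum_i (\delta_{\mathrm{calc},i} - \varsigma\,\delta_{\exp,i})^2 \) gives \( \varsigma = \sum_i \delta_{\exp,i}\,\delta_{\mathrm{calc},i} / \sum_i \delta_{\exp,i}^2 \). A minimal sketch, assuming (as in Fig. 1) that the computed shifts are on the vertical axis; the function name is hypothetical:

```python
def stigma(delta_exp, delta_calc):
    """Least-squares slope of delta_calc = stigma * delta_exp (line through the origin)."""
    if len(delta_exp) != len(delta_calc) or not delta_exp:
        raise ValueError("need two non-empty sequences of equal length")
    sxy = sum(x * y for x, y in zip(delta_exp, delta_calc))
    sxx = sum(x * x for x in delta_exp)
    return sxy / sxx
```

If the computed shifts of two basis sets are indeed nearly proportional, as the unchanged pattern in Fig. 1 suggests, the ratio of their ς values would approximately map the reference-based shifts of one basis set onto those of the other.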

Although the observed increase in chemical shifts with increasing basis set size is somewhat counterintuitive, it is consistent with the above observation for the slope values and supports previous findings [97] that small basis sets are often equal or superior to larger basis sets in terms of accuracy if linear regression is used to fit the calculated shifts to experimental data. The origin of the increased slope factor remains unexplained.

These findings are further underlined by a comparison of basis sets up to quintuple-ζ quality. One example is given in Table 4, which lists the deviations of GIAO-B3LYP-computed 13C chemical shifts of thiazoline 7 (2,2,5,5-tetramethyl-2,5-dihydro-1,3-thiazole; B3LYP/6-31G(d) geometry) from the respective experimental values [98]. The shifts are again referenced to TMS.

Table 4 Deviations of GIAO-B3LYP-computed NMR shifts for various basis sets (in ppm, relative to TMS) for 7 from the experimental values, as well as the number of basis functions and the relative computation times; the linear regression results for two basis sets are provided in the last two columns for comparison

Clearly, a larger number of basis functions does not improve the accuracy of the computed chemical shifts; on the contrary, the larger the basis set, the higher the MUE. There is one exception: the medium-sized pcS2 basis set gives the largest errors, even though it was specifically designed for GIAO calculations with density functionals (in B3LYP-GIAO calculations, it gives results closer to the basis set limit than even the largest cc-pVxZ basis sets [78]).

However, when linear regression is used instead of a reference to derive the chemical shifts (last two columns in Table 4), this trend vanishes: both basis sets give results of similar accuracy and outperform all uncorrected approaches. Because the regression-derived shifts are obtained by dividing by the fitted slope m (Eq. 2), this is not surprising: the division drastically reduces or eliminates the basis set effect. The use of linear regression to evaluate computed NMR data therefore appears to be essential.
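The argument can be checked numerically under an idealized assumption: that enlarging the basis set changes every shielding, including that of the reference, by the same linear map σ' = ασ + β (suggested, but not proven, by the nearly unchanged pattern in Fig. 1). The toy numbers and the values of α and β below are invented for illustration. Under this assumption, reference-based shifts are stretched by exactly α, whereas regression-derived shifts do not change at all.

```python
def regression_shifts(sigma_calc, delta_exp):
    """Fit sigma = b + s * delta_exp by least squares and invert it: (sigma - b) / s."""
    n = len(sigma_calc)
    mx = sum(delta_exp) / n
    my = sum(sigma_calc) / n
    sxx = sum((x - mx) ** 2 for x in delta_exp)
    sxy = sum((x - mx) * (y - my) for x, y in zip(delta_exp, sigma_calc))
    s = sxy / sxx
    b = my - s * mx
    return [(y - b) / s for y in sigma_calc]


# toy data, invented for illustration (ppm)
delta_exp = [18.0, 29.5, 61.0, 128.0, 135.5, 171.0]
sigma_small = [171.0, 158.0, 128.5, 60.5, 52.0, 18.0]

# idealized basis-set effect: every shielding (including the reference) is
# transformed by the same hypothetical linear map sigma' = alpha * sigma + beta
alpha, beta = 1.05, -7.0
sigma_large = [alpha * y + beta for y in sigma_small]
sigma_ref_small = 189.0  # hypothetical reference shielding
sigma_ref_large = alpha * sigma_ref_small + beta

# reference-based shifts (Eq. 1) are stretched by exactly alpha ...
ref_small = [sigma_ref_small - y for y in sigma_small]
ref_large = [sigma_ref_large - y for y in sigma_large]

# ... whereas regression-derived shifts are unchanged
reg_small = regression_shifts(sigma_small, delta_exp)
reg_large = regression_shifts(sigma_large, delta_exp)
```

The offset β cancels in both procedures; only the multiplicative part α, which plays the role of the stigma factor, survives in the reference-based shifts.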

A comparison of the HF method with MP2, the only hierarchically higher method used in this study, suggests that ς is inversely proportional to the quality of the method. We conjecture that, for good methods with good basis sets, these opposing effects cancel each other out, so that ς approaches unity, as expected from a correct calculation. One could speculate that a non-unity stigma factor indicates an unbalanced combination of basis set and method.

It is worth noting that the WP04 and WC04 functionals were designed by adjusting the parameters of B3LYP to test sets of proton and carbon shieldings computed with the 6-311+G(2d,f) basis set and the chloroform PCM field. These reparametrized methods produce much lower systematic errors, with mean errors almost an order of magnitude lower than those of the parent B3LYP functional. In our study, where a proper regression treatment is applied, these approaches do not surpass B3LYP in accuracy. This indicates that the slope factor ς can be adjusted somewhat arbitrarily by reparametrizing the respective functionals. Another way to diminish the influence of the stigma factor is the multi-standard method of Sarotti et al. [81]. It can easily be shown that splitting the shift scale into ranges by introducing multiple standards implicitly not only corrects for offsets but also reduces the influence of pathological slope factors such as stigma. Although currently unexplained, the errors caused by the deviation of stigma from unity can easily be corrected by linear regression. Nevertheless, a deeper understanding of this rather unexpected artifact would be desirable.

In general, methods that perform well for 1H shifts do not necessarily perform as well for 13C shifts, and vice versa. The slope m also differs greatly between the two nuclei, as shown in Tables 1, 2, and 3. The accuracy of the various methods differs much more for the 13C shifts than for the 1H shifts. The latter are calculated fairly well (and much better than with the increment method) by many of the methods tested, but none of the methods studied here shows excellent performance in this task. The best combination is PCM-GIAO-WP04/pcS2//B3LYP/6-31G(d), although this theoretical level is rather costly and its practical use questionable. The 13C shifts, on the other hand, are calculated with remarkable accuracy by the combinations GIAO-PBE0/6-31G(d)//ωB97xD/6-31G(d) and PCM-GIAO-PBE0/pcS2//ωB97xD/6-31G(d). Because neither the larger basis set nor the PCM field improves the accuracy much, we recommend the PBE0 functional with the 6-31G(d) basis set, which gives an MUE as low as 1.80 ppm. Under the same conditions, the ωB97xD functional also yields good results and is our close second choice. In a study with benchmark-quality methods (GIAO-CCSD(T) with basis sets as large as 13s9p4d3f) including vibrational corrections for small molecules in the gas phase, Auer and Gauss found that errors as low as 1–2 ppm are possible [99]. This is only slightly better than our favored methods and illustrates the large performance gain that can be achieved through linear regression.

There is, however, another issue to keep in mind: the rather strong influence of relativistic effects on carbon atoms bonded to third-row elements, especially sulfur and chlorine. The largest deviations from our linear regressions are often found for carbon atoms directly bound to sulfur. Shifts for simple compounds such as carbon tetrachloride or chloroform, which were calculated for reference purposes, were found to be exceedingly inaccurate, with errors in excess of 30 ppm. We attribute this to relativistic effects: heavy atoms exert relativistic influences on directly bonded light atoms via spin–orbit interactions. This is well documented in the literature [100–103] and in accord with the findings of Dybiec and Gryff-Keller [104] and Tantillo et al. [82], who included chlorine-bearing molecules in their test sets but excluded the carbon atoms directly connected to chlorine. We decided to include the carbon atoms directly bound to sulfur in our study of heterocyclic compounds in order to determine the magnitude of the relativistic errors, which were found to be distinct but not prohibitively large, implying general applicability of our suggested approaches. However, the most pronounced errors, those for carbon atoms bound to thiocarbonyl sulfur atoms, are not purely relativistic in nature: they were considerably smaller with the MP2 method, which accounts for electron correlation but not for relativistic effects. The errors dropped from values as high as 20 ppm or more to well below 10 ppm, but remained large. If a carbon atom has more than one formal bond (e.g., one double bond or two single bonds) to a third-row element, a nonrelativistic GIAO calculation has to be considered potentially unreliable.
We also advise against using tetramethylsilane or even chloroform as a reference compound: both contain third-row elements (Si and Cl, respectively), so systematic errors will arise from neglected relativistic effects, as the overwhelming majority of practically reasonable methods are nonrelativistic.

Other signals that are difficult to reproduce are those of carbonyl groups. This is a known phenomenon, and suggested corrections include protonation and/or explicit solvation [105]. These remedies are not satisfactory, in particular for a linear regression approach. A better option could be a method with a slope factor close to 1, because the large absolute shift of the carbonyl carbon atom is much more strongly affected by deviations of the slope m.
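As a back-of-the-envelope illustration, assume that a method's shifts deviate from experiment only through a multiplicative slope factor, \( \delta_{\mathrm{calc}} \approx \varsigma\,\delta_{\exp} \), with an illustrative (not measured) value of ς = 1.05. The absolute error then grows linearly with the shift:

$$ \Delta = \left| \delta_{\mathrm{calc}} - \delta_{\exp} \right| = \left| \varsigma - 1 \right| \delta_{\exp}, \qquad \Delta(20\ \mathrm{ppm}) = 0.05 \times 20 = 1\ \mathrm{ppm}, \qquad \Delta(200\ \mathrm{ppm}) = 0.05 \times 200 = 10\ \mathrm{ppm}. $$

Under this assumption, a carbonyl carbon near 200 ppm accumulates a ten times larger error than an aliphatic carbon near 20 ppm from the same slope deviation.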

Conclusions

In this study, GIAO calculations with various functionals and two ab initio methods were carried out on a set of heterocyclic molecules. The results of our favored theoretical levels (GIAO-PBE0/6-31G(d)//ωB97xD/6-31G(d) or GIAO-ωB97xD/6-31G(d)//ωB97xD/6-31G(d)) approach the accuracy of the most elaborate benchmark-quality calculations when applied to a chemically narrow class of molecules in combination with linear regression. Purely empirical increment methods are erratic and are consistently outperformed by most of the methods used in this study.

Interestingly, the accuracy of 1H shift calculations can be more systematically improved by increasing the basis set than that of 13C shift calculations. The latter shifts seemingly become larger—but not more accurate—when the basis set is enlarged.

We observed an unexplained distortion factor ς that is proportional to the “completeness” of the basis set and seemingly inversely proportional to the quality of the method when strictly hierarchical ab initio methods are used. This distortion can be corrected by simple linear regression, but it leads to a poor systematic correlation between method quality and overall performance; in fact, very good results are achieved with very modest basis sets. Additionally, in most cases, the residual errors are qualitatively similar regardless of the method employed. This leads us to conjecture that parameters other than basis set composition and computational level also need to be considered.

Although these levels of theory were identified using a large test set, their general reliability needs to be evaluated for additional functionals and other substance classes. We encourage the use of the abovementioned levels of theory in other chemical contexts and the reporting of the results. Nevertheless, these levels of theory appear to be sufficiently reliable for wider application.