Introduction

1,3-Dipolar cycloaddition (13 DC) reactions have been extensively analyzed due to their paramount importance for the synthesis of a wealth of heterocyclic compounds [1,2,3,4,5,6,7]. As such, a great body of experimental work has been accumulated reporting the effects of the substituents, solvent, catalysts, temperature, and other factors on the thermodynamics, kinetics, regioselectivity, and stereoselectivity of these reactions for a large and diverse set of dipoles and dipolarophiles. Parallel to this, many theoretical models and calculations have been developed, not only to rationalize the experimental observations, but also to give precise details of the corresponding mechanisms and to allow accurate predictions of the primary product in a given reaction. These models range from the simple, although powerful, frontier molecular orbitals (FMO) [2, 8,9,10,11,12,13,14,15,16,17,18] treatments, to modern methods that allow a nearly complete and quantitative description of the reaction path [3,4,5,6,7, 19,20,21,22].

Especially important is the prediction of which regioisomer will be predominant in a 13 DC reaction involving asymmetric reagents. Here we consider this problem for the particular case of the addition to N-methyl-C-phenylnitrone (MPN) of a set of mono-substituted alkenes with diverse electronic effects (see Fig. 1).

Fig. 1

Reaction of N-methyl-C-phenylnitrone (1) with an asymmetric alkene (2) giving the 5-substituted (3) and 4-substituted (4) heterocycles

Under similar experimental conditions (e.g., solvent, temperature) the predominant product will be determined by the nature of the moiety X. Thus, the driving factor is typically associated with the electronic effects of the substituent group. Therefore, it is common to classify dipolarophiles in three major groups: strong electron acceptors (group 1), intermediate character (group 2), and strong electron donors (group 3) [1, 2]. The favored final regioisomers for the reaction of the MPN with dipolarophiles belonging to each group have been extensively studied. Experimental reports show a clear predominance of 4-substituted heterocycles for dipolarophiles in group 1 and the 5-substituted regioisomer for dipolarophiles belonging to group 3 [1, 2, 23]. For reactants of group 2, a tendency to obtain the 5-substituted regioisomer has been observed, although in some cases the results are not as conclusive as for groups 1 and 3 [23, 24].

Among the modern strategies developed to study chemical reactivity, conceptual DFT [25,26,27,28,29,30,31,32,33,34] appears as an attractive option, blending the simplicity and ease of interpretation of traditional FMO approaches with a rigorous theoretical foundation that permits the use of state-of-the-art computational tools. Taking this into account it is not surprising that previous works using this methodology to analyze 13 DC have appeared in the literature. The majority of these studies are based on some formulation of the hard/soft acid/base (HSAB) principle [35,36,37,38,39], though some use local electrophilicity measures [40,41,42,43]. All these approaches have an important similarity with the traditional FMO formulation of Houk et al. [1, 2] as they require the determination of the direction of total net charge transfer between the dipole and the dipolarophiles (a point that will be analyzed in detail in the following section). Here we explore a different approach, namely, the use of the dual descriptor [44,45,46]. This relatively new descriptor has been applied with great success to the study of the regioselectivity of Diels-Alder cycloadditions [47], a classical example of a concerted organic reaction. Some time ago there were vivid debates surrounding the nature of the mechanism of 13 DC, discussing if such reactions occurred in a single or in multiple steps, though nowadays the concerted mechanism is the most widely accepted [48, 49]. For this reason we believe that the dual descriptor could be valuable for rationalizing the regioselectivity of 13 DC.

In the next section we present a brief overview of conceptual DFT, introducing the dual descriptor and discussing the approaches based on the Fukui function [25, 50,51,52]. Then, we describe the computational methodology selected to study the regioselectivity of the considered systems. It must be stressed that we analyze the influence of several computational factors on the final results, namely: the way to approximate the Fukui function (e.g., finite differences [53, 54] and Galván, Gázquez and Vela’s [55] spin-density proposal), the computational method (e.g., UHF and UB3LYP), the basis set (e.g., 6-31G, 6-31G*, 6-31G**, 6–31+G** and 6–311+G**), and the atoms-in-molecules (AIM) scheme used to condense the reactivity descriptors (e.g., Becke [56], Hirshfeld [57,58,59,60], iterative Hirshfeld [61,62,63], iterative stockholder [64, 65], and extended Hirshfeld [66] partitions). We also consider the use of the extended dual descriptor [67, 68], and a novel proposal to “directionally condense” it [69]. The analysis of the results is then presented, followed by the concluding remarks.

Theoretical details

Conceptual DFT can be formulated from (equivalent) thermodynamic, variational or perturbative points of view [25,26,27,28,29,30,31,32,33,34]. The last approach is based on the fact that when a system with an initial external potential v(r) and number of electrons N undergoes a perturbation that changes its state to one with external potential v(r) + Δv(r) and number of electrons N + ΔN, the associated energy change can be computed as:

$$ \begin{array}{c}\Delta E={\left(\frac{\partial E}{\partial N}\right)}_{v\left(\mathbf{r}\right)}\Delta N+\int {\left(\frac{\delta E}{\delta v\left(\mathbf{r}\right)}\right)}_N\Delta v\left(\mathbf{r}\right)d\mathbf{r}+\frac{1}{2}{\left(\frac{\partial^2E}{\partial {N}^2}\right)}_{v\left(\mathbf{r}\right)}\Delta {N}^2+\\ {}\int \left(\frac{\delta^2E}{\delta v\left(\mathbf{r}\right)\partial N}\right)\Delta N\Delta v\left(\mathbf{r}\right)d\mathbf{r}+\frac{1}{2}\int {\left(\frac{\delta^2E}{\delta v\left(\mathbf{r}\right)\delta v\left({\mathbf{r}}^{\mathbf{\prime}}\right)}\right)}_N\Delta v\left({\mathbf{r}}^{\mathbf{\prime}}\right)\Delta v\left(\mathbf{r}\right)d\mathbf{r}d{\mathbf{r}}^{\mathbf{\prime}}+\dots \end{array} $$
(1)

The connection with chemistry is established by the fact that the partial derivatives appearing in Eq. (1) as coefficients of the perturbative expansion can be identified with key chemical concepts. For example, Parr et al. [70] and Parr and Pearson [71] argued that the derivatives \( {\left(\frac{\partial E}{\partial N}\right)}_{v\left(\mathbf{r}\right)} \) and \( {\left(\frac{\partial^2E}{\partial {N}^2}\right)}_{v\left(\mathbf{r}\right)} \) can be respectively identified with the chemical potential μ (i.e., the additive inverse of the electronegativity) and the chemical hardness η.

The essential ingredient in many conceptual DFT regioselectivity studies is the Fukui function [25, 50,51,52], defined as:

$$ f\left(\mathbf{r}\right)={\left(\frac{\partial \rho }{\partial N}\right)}_{v\left(\mathbf{r}\right)} $$
(2)

It is important to remark that the derivative discontinuity of the energy at integer N implies that in Eq. (2) we must distinguish between left- and right-hand derivatives [31, 33, 72]. In other words, when taking the limit implicit in the definition of the derivative, we must consider separately the cases with ΔN > 0 (i.e., the system gains electrons, acting as an electrophile, and thus suffering a nucleophilic attack) and ΔN < 0 (i.e., the system loses electrons, acting as a nucleophile, and thus suffering an electrophilic attack). Thus, we are prompted to work with two Fukui functions, \( {f}^{+}\left(\mathbf{r}\right) \) and \( {f}^{-}\left(\mathbf{r}\right) \), where the superscripts indicate the sign of the variation of the number of particles. There are several ways to calculate these descriptors, perhaps the most popular being the one based on the grand-canonical and size-consistent result that the value of a given state function in a system with a non-integer number of particles can be obtained by linearly interpolating between its values for the systems with the adjacent integer numbers of particles, M and M ± 1 [72,73,74,75,76,77]. With this we can write:

$$ {f}^{+}\left(\mathbf{r}\right)\cong {}^{N+1}\rho \left(\mathbf{r}\right)-{}^{N}\rho \left(\mathbf{r}\right) $$
(3)
$$ {f}^{-}\left(\mathbf{r}\right)\cong {}^{N}\rho \left(\mathbf{r}\right)-{}^{N-1}\rho \left(\mathbf{r}\right) $$
(4)

Here and in the following, the left superscript indicates the number of particles of the system. Notice also that Eqs. (3) and (4) are the natural finite-differences (FD) approximations to Eq. (2).

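Other popular expressions are those derived by Galván, Gázquez, and Vela [55] (GGV) in terms of spin densities, \( {}^{M}\rho_S\left(\mathbf{r}\right)={}^{M}\rho_{\alpha}\left(\mathbf{r}\right)-{}^{M}\rho_{\beta}\left(\mathbf{r}\right) \), and spin numbers, \( {}^{M}N_S={}^{M}N_{\alpha}-{}^{M}N_{\beta} \), as: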

$$ {f}^{+}\left(\mathbf{r}\right)\cong {}{}^{N+1}\rho_S\left(\mathbf{r}\right)/{}{}^{N+1}N_S $$
(5)
$$ {f}^{-}\left(\mathbf{r}\right)\cong {}{}^{N-1}\rho_S\left(\mathbf{r}\right)/{}{}^{N-1}N_S $$
(6)

As expressed in Eq. (2) (and in Eqs. (3)–(6)) the Fukui function is a local descriptor, in the sense that it depends only on a spatial position r. However, in some cases one would like to use simpler representations of this function in order to perform an atom-based analysis. As such, we have to condense the Fukui function, using a suitable atoms-in-molecules (AIM) procedure. Given an AIM method, such that an atom k possesses a given charge \( {}^{M}q_k \) in the M-electron state, the corresponding condensed Fukui functions can be expressed as [78]:

$$ {f}_k^{+}={}^{N}{q}_k-{}^{N+1}{q}_k $$
(7)
$$ {f}_k^{-}={}^{N-1}{q}_k-{}^{N}{q}_k $$
(8)

(Notice that, for simplicity, we have chosen to condense the Fukui function using the response-of-molecular-fragments approach [79, 80].)
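As an illustration of Eqs. (7) and (8), the following minimal Python sketch computes condensed Fukui functions from three sets of atomic charges. The function name, the atom labels and all charge values are hypothetical and serve only to show the bookkeeping; in practice the charges would come from the chosen AIM partition of the N − 1, N and N + 1 electron systems at the neutral geometry.

```python
from typing import Dict, Tuple


def condensed_fukui(
    q_cation: Dict[str, float],
    q_neutral: Dict[str, float],
    q_anion: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (f_plus, f_minus) per atom following Eqs. (7) and (8).

    q_cation, q_neutral and q_anion hold the atomic charges of the
    (N-1)-, N- and (N+1)-electron systems, respectively.
    """
    f_plus = {k: q_neutral[k] - q_anion[k] for k in q_neutral}    # Eq. (7)
    f_minus = {k: q_cation[k] - q_neutral[k] for k in q_neutral}  # Eq. (8)
    return f_plus, f_minus


# Hypothetical charges for a three-atom neutral fragment (illustration only).
q_neutral = {"A": -0.40, "B": 0.10, "C": 0.30}  # sums to 0
q_anion = {"A": -0.70, "B": -0.30, "C": 0.00}   # sums to -1
q_cation = {"A": 0.10, "B": 0.40, "C": 0.50}    # sums to +1

f_plus, f_minus = condensed_fukui(q_cation, q_neutral, q_anion)

for atom in q_neutral:
    print(f"{atom}: f+ = {f_plus[atom]:.2f}, f- = {f_minus[atom]:.2f}")
```

Provided that the atomic charges of each charge state add up to the total molecular charge, each set of condensed Fukui functions sums to one over all atoms, which offers a simple sanity check on the input charges.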

As mentioned before, most applications to date of conceptual DFT to the study of 13 DC reactions use some local formulation of Pearson’s HSAB principle or of electrophilicity measures [3, 39]. The Fukui function is a central ingredient of both approaches: it yields local (and atom-condensed) softness and electrophilicity values, which are then used jointly with some “matching criteria” to determine the most favorable regioisomer. It is important to notice that the use of the Fukui function forces us to consider the direction of total net charge transfer right at the beginning of our calculation. That is, we must start in any case by determining which species (i.e., the dipole or the dipolarophile) donates or receives electrons. We are forced to do so because this will indicate which Fukui function, \( {f}^{+} \) or \( {f}^{-} \), we should consider for each molecule. Such a determination within conceptual DFT is commonly based on the comparison of the chemical potentials (i.e., the negatives of the electronegativities) of the molecules. Typically, the chemical potential is calculated through an FD approximation involving the ionization energy I and electron affinity A of the species [25]:

$$ \mu =-\frac{I+A}{2} $$
(9)

Thus, one could expect that the dipole, D, will donate electrons to (accept electrons from) the dipolarophile, d, if \( {\mu}_D>{\mu}_d \) (\( {\mu}_D<{\mu}_d \)) [81]. If the dipole acts as a donor we have to work with the Fukui functions \( {f}_D^{-} \) and \( {f}_d^{+} \); conversely, if the dipole accepts electrons the Fukui functions involved will be \( {f}_D^{+} \) and \( {f}_d^{-} \).
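The prescription just described can be sketched in a few lines of Python. The function names and the ionization energies and electron affinities below are hypothetical placeholders (in eV), not values computed in this work; the sketch only encodes Eq. (9) and the comparison of chemical potentials.

```python
from typing import Dict, Optional


def chemical_potential(ionization_energy: float, electron_affinity: float) -> float:
    """Finite-difference chemical potential, Eq. (9): mu = -(I + A) / 2."""
    return -(ionization_energy + electron_affinity) / 2.0


def fukui_functions_to_use(mu_dipole: float, mu_dipolarophile: float) -> Optional[Dict[str, str]]:
    """Select the Fukui function of each reactant from the chemical potentials.

    Electrons are expected to flow from the species with the higher chemical
    potential to the one with the lower chemical potential.
    """
    if mu_dipole > mu_dipolarophile:
        # The dipole donates electrons: f- for the dipole, f+ for the dipolarophile.
        return {"dipole": "f-", "dipolarophile": "f+"}
    if mu_dipole < mu_dipolarophile:
        # The dipole accepts electrons: f+ for the dipole, f- for the dipolarophile.
        return {"dipole": "f+", "dipolarophile": "f-"}
    return None  # equal chemical potentials: this criterion gives no direction


# Hypothetical I and A values in eV (illustration only).
mu_dipole = chemical_potential(8.0, 0.0)    # -4.0 eV
mu_alkene = chemical_potential(11.0, 1.0)   # -6.0 eV
choice = fukui_functions_to_use(mu_dipole, mu_alkene)
print(mu_dipole, mu_alkene, choice)
```

With these placeholder numbers the dipole has the higher chemical potential, so it would be treated as the electron donor.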

The fact that the first step of conceptual DFT approaches based on the Fukui function must be the determination of which species acts as electron donor or acceptor closely resembles traditional FMO theories used to study 13 DC [1, 2]. This is not surprising, since in many cases FMO quantities can be seen as approximate conceptual DFT descriptors. For example, within a frozen-core setup, it can be easily shown that the right-hand sides of Eqs. (3) and (4) reduce to the densities of the lowest unoccupied and highest occupied molecular orbitals, respectively (i.e., LUMO and HOMO). Also, if one approximates the ionization energy and electron affinity using Koopmans’ theorem, Eq. (9) nicely delivers Sustmann’s classification of 13 DC in terms of the relative energies of the frontier orbitals of the dipole and the dipolarophile [1, 2].

This close analogy between the FMO and conceptual DFT tools serves to point out an interesting fact related to the determination of the direction of total net charge transfer between the reactants. In a detailed analysis using a perturbative treatment, Houk determined that in a great number of examples, the FMOs involved in the reaction were those corresponding to the direction of charge transfer expected in terms of the relative energies of the FMOs (e.g., electrons will tend to flow from the HOMO of the donor to the LUMO of the acceptor reactants) [1, 2]. This could be translated to conceptual DFT terms stating that the Fukui functions that one has to consider for a given dipole-dipolarophile pair are those obtained following the simple prescription given after Eq. (9). However, for the case of the nitrone, Houk found that this is not the case, and that the involved FMOs are the opposite of those predicted from solely the orbital energies. Then, in this case, the conceptual DFT recipe of determining the direction of net charge transfer in terms of chemical potential differences results in an erroneous estimate of the involved Fukui functions. Houk correctly predicted the nature of the FMOs governing the regioselectivity taking into account not only their energies, but also their atomic orbital coefficients and the interatomic interaction integrals. This fact was recently analyzed from a conceptual DFT point of view, where it was shown that it is a result of the neglect of the effects of the molecular environment on the electronegativity, a factor that was proven to be decisive in some 13 DC [31, 81, 82].

As a possible alternative to circumvent this situation we propose to use the dual descriptor as the basic tool to rationalize the regioselectivity of 13 DC. The dual descriptor, Δf(r), is defined as [44]:

$$ \Delta f\left(\mathbf{r}\right)={\left(\frac{\delta \mu }{\delta v\left(\mathbf{r}\right)}\right)}_N={\left(\frac{\partial f\left(\mathbf{r}\right)}{\partial N}\right)}_{v\left(\mathbf{r}\right)} $$
(10)

To better understand the power of this descriptor let us rewrite it in terms of the FD approximation to the second partial derivative appearing in Eq. (10):

$$ \Delta f\left(\mathbf{r}\right)\cong {f}^{+}\left(\mathbf{r}\right)-{f}^{-}\left(\mathbf{r}\right) $$
(11)

In this way it is clear that the dual descriptor measures the propensity of a given site within a molecule to suffer a nucleophilic or electrophilic attack, just by following a simple rule: a portion of a molecule in which Δf(r) > 0 (Δf(r) < 0) will be more prone to accept (donate) electrons. By combining both Fukui functions in a single descriptor, Δf(r) allows us to study charge-transfer reactions without the need to establish beforehand the direction of total charge transfer. Another advantage of the dual descriptor is that it provides a natural way to perform regioselectivity (and even stereoselectivity) predictions based on a natural principle: when two molecules A and B interact, the region of molecule A more prone to donate electrons will tend to bind to the region of B more likely to accept them. This can be easily translated in terms of the dual descriptor by stating that the region of molecule A with the lowest value (typically negative) of Δf(r) will most likely interact with the region of B with the greatest value (typically positive) of Δf(r), and vice versa.

Finally, it should be noted that an extended (state-specific) dual descriptor that generalizes the working equation given in Eq. (11) was recently proposed [67]. For the calculation of this descriptor (or rather, group of descriptors) we use the excited states of the system of interest, without the need to perform the often cumbersome (and error-prone) single-point calculations on the cationic and anionic systems. As such, this descriptor is defined as:

$$ \Delta {f}_i\left(\mathbf{r}\right)={}^{M}{\rho}^{(i)}\left(\mathbf{r}\right)-{}^{M}{\rho}^{(0)}\left(\mathbf{r}\right) $$
(12)

where \( {}^{M}{\rho}^{(i)}\left(\mathbf{r}\right) \) and \( {}^{M}{\rho}^{(0)}\left(\mathbf{r}\right) \) represent the electron densities of the ith excited state and the ground state of the system, respectively. If we use the first excited state in Eq. (12) the resulting dual descriptor is analogous to the “usual” dual descriptor calculated in Eq. (11). Despite its recent introduction, this descriptor has been successfully applied to “tricky” chemical reactions, and has proven to be a valuable tool for rationalizing diverse observations in coordination chemistry [67,68,69, 83].

Computational details

We will analyze the ability of the different formulations of the dual descriptor to predict the predominant regioisomer in the reaction of MPN with a set of monosubstituted alkenes, namely: nitro-ethylene (group 1), cyano- and methoxycarbonyl-ethylene (group 2) and phenyl- and methoxy-ethylene (group 3). (The group classification is in accordance with that based on the electronic effects of the substituents, as explained below Fig. 1). The experimentally reported preferred product in each case is shown in Fig. 2 [1, 2, 23].

Fig. 2

Primary products of the reaction of nitro-ethylene (5), cyano-ethylene (6), methoxycarbonyl-ethylene (7), phenyl-ethylene (8), and methoxy-ethylene (9) with MPN

As a first step, the geometries of all the neutral molecules involved in the study were optimized, and frequency calculations were performed to guarantee that minimum-energy structures were obtained in all cases. Some of the expressions that we use to estimate the dual descriptor also require information about the cationic and anionic species. However, since the partial derivatives entering the definitions of the Fukui function and the dual descriptor, Eqs. (2) and (10), must be evaluated at constant external potential (i.e., constant geometry), only single-point calculations at the geometry of the neutral species were performed for the charged systems. All these calculations were performed with the UHF and UB3LYP methods, using the Gaussian09 suite of programs [84], and with the basis sets 6-31G, 6-31G*, 6-31G**, 6–31+G**, and 6–311+G**. The dual descriptor was then calculated using the expression given in Eq. (11). The Fukui functions needed to evaluate Δf(r) in this approach were estimated using both the GGV spin-density formulas [55], Eqs. (5) and (6), and the FD results [25], Eqs. (3) and (4). In the case of the dual descriptor obtained using FD Fukui functions, we calculated the (atomic) condensed values of Δf(r). For this we used a set of AIM schemes commonly used in conceptual DFT studies: Becke (b) [56], Hirshfeld (h) [57,58,59,60], iterative Hirshfeld (ih) [61,62,63], iterative stockholder (is) [64, 65], and extended Hirshfeld (eh) [66] partitions. In all cases the corresponding atomic charges were calculated using the HORTON package [85]. We also calculated the state-specific dual descriptors of the molecules involved using their first and second excited states [67]. For this, we carried out TDDFT calculations at the optimized DFT geometries, using the B3LYP functional. The condensation of the values of the dual descriptor in these cases was done using the methodology of Tognetti et al. [69], which allows us to retain some information about the spatial distribution of this descriptor.

Results and discussion

Dual descriptor from GGV spin-density Fukui functions

In Table 1 we show the isosurfaces of the dual descriptor for the MPN, calculated with both the UHF and UB3LYP methods and with all the basis sets mentioned in the previous section.

Table 1 Dual descriptor for the MPN, calculated at the UHF and UB3LYP levels of theory with the basis sets 6-31G, 6-31G*, 6-31G**, 6–31+G**, and 6–311+G**, using GGV spin-density Fukui functions. Positive values of the dual descriptor are shown in red and negative values in blue. Isosurface contour values: UHF = 0.013459; UB3LYP = 0.007789

As one could expect, both methods and all basis sets agree that the oxygen atom of the dipole shows a clear preference to donate electrons. This, reflected in the negative value of Δf(r) (i.e., blue isosurfaces in Table 1 and in the following), is in accordance with the canonical resonance structure of the MPN shown in Fig. 1, which one can argue is the most important one, taking into account the electronegativities of the atoms of the dipole. The other extreme of the dipole, namely the carbon atom bonded to the phenyl group, requires a more detailed analysis. Though both methods give essentially identical results for each basis set, the results seem to depend slightly on the choice of the basis set. On the one hand, the simplest basis set shows that the value of the dual descriptor over this atom is clearly positive, thus presenting a tendency to suffer a nucleophilic attack. On the other hand, the more complete basis sets present regions of both signs of the dual descriptor surrounding this carbon atom. This is not a problem though, as we are only interested in the relative tendency of both ends of the dipole to donate or receive electrons. As such, it is clear that in all cases the terminal oxygen shows a significantly greater tendency to donate electrons than the carbon, which conversely is more likely to receive electrons in the course of a 13 DC.

Having elucidated the nucleophilic and electrophilic character of the extremes of the dipole, what remains is the corresponding analysis of the dipolarophiles. In Table 2 we present a sample of the calculations of the dual descriptor for the studied alkenes. Only the results obtained with the 6-31G** basis set are shown because the calculations performed with the other basis sets show qualitatively similar behavior (see the Supplementary information section).

Table 2 Dual descriptor calculated using GGV spin-density Fukui functions, at the UHF and UB3LYP levels of theory with the 6-31G** basis set, for the studied alkenes. Positive values of the dual descriptor are shown in red and negative values in blue. Isosurface contour values: UHF = 0.009347; UB3LYP = 0.005409

As was the case for MPN, both UHF and UB3LYP give the same qualitative results for the sign of the dual descriptor. For dipolarophiles in group 3, methoxy- and phenyl-ethylene (columns 4 and 5 of Table 2), one can clearly identify the electrophilic and nucleophilic regions. In these cases, the substituted carbon atom (C2, see Fig. 1) shows a marked tendency to accept electrons, while the other end of the dipolarophile (C1, see Fig. 1) will tend to donate them. According to this, one would expect that the C2 extreme will bind to the O atom of the MPN while the C1 atom of the alkene will bond to the C end of the dipole. This would lead to the formation of the 5-substituted heterocycles, which is in perfect agreement with the experimental observations. For nitro-ethylene, a group 1 dipolarophile (column 3, Table 2), we can determine by simple visual inspection that the C atom next to the NO2 group would tend to donate electrons, while the other C atom would accept negative charge. As such, the O end of the MPN will more likely bind to the C1 atom of this dipolarophile, with the C side of the dipole willing to accept electrons from the C2 extreme of the alkene. This is consistent with the experimental observation of the preferred formation of the 4-substituted regioisomer in this reaction.

The situation changes, though, when considering the dipolarophiles in group 2 (cyano- and methoxycarbonyl-ethylene, columns 1 and 2 in Table 2). Here we can see how, independently of the method (Table 2) or basis set (Supplementary information), the electrophilic and nucleophilic regions of these dipolarophiles parallel those of nitro-ethylene. Then, one would predict that also in this case the primary product would be the 4-substituted regioisomer, but this disagrees with the experimental reports. As previously indicated, group 2 dipolarophiles generally lead to mixtures of products in which the 5-substituted heterocycle is predominantly found. These results could indicate two possibilities: either the dual descriptor is not a suitable choice to study the regioselectivity in 13 DC, or one of the computational approximations we used to calculate this descriptor is inappropriate. As a way to test this last point we re-calculated the dual descriptor, starting now from Fukui functions derived from the FD formulas, Eqs. (3) and (4).

Dual descriptor from FD Fukui functions

In Table 3 we present the dual descriptor for the MPN, obtained from FD Fukui functions, after UHF and UB3LYP calculations using all considered basis sets.

Table 3 Dual descriptor for the MPN, calculated at the UHF and UB3LYP levels of theory with the basis sets 6-31G, 6-31G*, 6-31G**, 6–31+G**, and 6–311+G**, using FD Fukui functions. Positive values of the dual descriptor are shown in red and negative values in blue. Isosurface contour values: UHF = 0.007789; UB3LYP = 0.005409

As was the case in the previous section, no qualitative differences appear when we vary the theoretical method or the basis set. However, as previously reported, the determination of the electrophilic and nucleophilic regions by simple inspection is a little more difficult when FD Fukui functions are used. Though one could conclude that the O atom must be more prone to donate electrons than the C atom of the dipole, for both ends of the MPN all basis sets show close, if not overlapping, positive and negative regions of Δf(r). To base our analysis on a more quantitative criterion we proceeded to obtain the condensed values of the dual descriptor (i.e., substituting Eqs. (7) and (8) in Eq. (11)). The corresponding results are shown in Fig. 3.
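Written out explicitly, the substitution of Eqs. (7) and (8) into Eq. (11) gives the condensed dual descriptor of atom k in terms of its charges in the three charge states (the symbol \( \Delta {f}_k \) is introduced here only for convenience):

$$ \Delta {f}_k\cong {f}_k^{+}-{f}_k^{-}=2\,{}^{N}{q}_k-{}^{N+1}{q}_k-{}^{N-1}{q}_k $$

If the atomic charges of each charge state add up to the corresponding total charge (Q − 1, Q and Q + 1 for the N + 1, N and N − 1 electron systems, respectively), these condensed values sum to zero over all the atoms of the molecule, which can serve as a quick consistency check.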

Fig. 3

Condensed values of the dual descriptor for the MPN using UHF (above) and UB3LYP (below) according to the selected AIM schemes. More negative values of Δf(r) are shown in blue and more positive values in red

The plots presented in Fig. 3 clearly confirm that this new way to calculate the dual descriptor gives the same results for the electrophilic and nucleophilic regions of MPN as those discussed in the section above. Here we also obtained, with both methods and all basis sets, that the dipole is more likely to suffer an electrophilic attack at the O atom position, while the C end of the dipole would more probably suffer a nucleophilic attack. Notice that this tendency appears even in situations in which the condensed dual descriptor over both atoms has the same (negative) sign. We must remember that our interest lies in the relative electrophilic/nucleophilic character of these atoms. Even if in some situations (i.e., using ih charges with both computational methods and the 6-31G* and 6-31G** basis sets) the C atom shows some tendency to donate electrons, the O end still appears to be markedly favored to suffer an electrophilic attack.

Qualitative results of the dual descriptor for the dipolarophiles appear in Table 4 (only for the 6-31G** basis set, as the other basis sets give qualitatively similar results, see the Supplementary information). Here the determination of the net nucleophilicity or electrophilicity of a given atom of the alkene is even more challenging than in the previous (MPN) case.

Table 4 Dual descriptor calculated using FD Fukui functions, at the UHF and UB3LYP levels of theory with the 6-31G** basis set, for the studied alkenes. Positive values of the dual descriptor are shown in red and negative values in blue. Isosurface contour values: UHF = 0.009347 (NO2 and Ph 0.003757); UB3LYP = 0.005409 (Ph 0.002609)

As done for the MPN, a detailed analysis of the dual descriptor in this case forces us to condense it using an AIM method. To evaluate the performance of the dual descriptor in a straightforward way we will make use of the known experimental results indicating which regioisomer is predominantly obtained in each situation (see Fig. 2). In other words, after determining the nucleophilic/electrophilic character of the atoms of the dipole and knowing the primary product, we could anticipate the sign of the dual descriptor over the atoms of the alkene in order for the theoretical prediction to concur with the experimental facts. In Fig. 4 we represent this scheme where, after positioning the dipolarophile in the orientation required to obtain the preferred heterocycle, the relative values of the condensed Δf(r) over its atoms are chosen in a complementary way with respect to those of the dipole. For example, in the case of the phenyl-ethylene, obtaining the 5-substituted heterocycle requires that the substituted C atom of the alkene (C2) presents a value of Δf(r) greater than the one corresponding to the unsubstituted end (C1). (The detailed numerical results of these calculations, for all the molecules and combinations of methods, basis and partitions, are presented in the Supplementary information).

Fig. 4

Expected nucleophilic (blue) and electrophilic (red) character of the atoms of the dipolarophiles required to, jointly with the sign of the dual descriptor in the MPN, account for the preferred regioisomer in each situation

Following this strategy, we now proceed to analyze the results of the condensed values of the dual descriptor for the dipolarophiles, obtained using both UHF (Fig. 5) and UB3LYP (Fig. 6). Since we fixed beforehand the relative signs of Δf(r), the interpretation of the plots is quite simple: a given combination of theoretical method, basis set and AIM partition correctly predicts the primary product whenever the condensed value of Δf(r) for the expected electrophilic atom (always represented with red dots) is greater than the condensed value for the expected nucleophilic atom (blue dots).
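The complementarity rule behind this comparison can be sketched as follows. The function below is a hypothetical helper, and the condensed values passed to it are invented for illustration; it only encodes the pairing described in the text (the dipole end with the lower condensed Δf binds to the alkene carbon with the higher one, and an O–C2 bond corresponds to the 5-substituted heterocycle).

```python
from typing import Dict


def predict_regioisomer(
    df_dipole: Dict[str, float],
    df_alkene: Dict[str, float],
    tol: float = 1e-12,
) -> str:
    """Predict the favored regioisomer from condensed dual-descriptor values.

    df_dipole holds the values for the "O" and "C" ends of the nitrone;
    df_alkene holds the values for "C1" (unsubstituted) and "C2" (substituted).
    """
    if (abs(df_dipole["O"] - df_dipole["C"]) <= tol
            or abs(df_alkene["C1"] - df_alkene["C2"]) <= tol):
        return "undetermined"  # no differentiation between the two ends

    # The end with the lower value is the more nucleophilic (donor) one.
    donor_end = "O" if df_dipole["O"] < df_dipole["C"] else "C"
    # The end with the higher value is the more electrophilic (acceptor) one.
    acceptor_end = "C2" if df_alkene["C2"] > df_alkene["C1"] else "C1"

    if donor_end == "O":
        o_partner = acceptor_end
    else:
        # The dipole C end binds the acceptor carbon; O binds the other carbon.
        o_partner = "C1" if acceptor_end == "C2" else "C2"

    return "5-substituted" if o_partner == "C2" else "4-substituted"


# Hypothetical condensed values (illustration only).
dipole = {"O": -0.20, "C": 0.05}
donor_alkene = {"C1": -0.10, "C2": 0.08}      # C2 more electrophilic
acceptor_alkene = {"C1": 0.12, "C2": -0.05}   # C1 more electrophilic

print(predict_regioisomer(dipole, donor_alkene))
print(predict_regioisomer(dipole, acceptor_alkene))
```

Only the relative ordering of the two values on each molecule enters the prediction, in line with the emphasis on relative rather than absolute electrophilic/nucleophilic character.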

Fig. 5

Condensed values of the dual descriptor for the alkenes, using UHF, according to the selected AIM schemes

Fig. 6

Condensed values of the dual descriptor for the alkenes, using UB3LYP, according to the selected AIM schemes

For the calculations performed with UHF (Fig. 5) we can see that the b partition fails to predict the correct regioselectivity for the cyano-ethylene (independently of the basis set considered). In general, this AIM scheme gives mixed results when applied with UHF, giving erroneous predictions for the methoxycarbonyl-ethylene (with the basis sets including diffuse functions) and the phenyl-ethylene (for the 6–31+G** basis set, where both atoms of the dipolarophile are predicted to have exactly the same tendency to gain or lose electrons), but being otherwise correct. The performance of the h charges is slightly better, failing only for the cyano- and methoxycarbonyl-ethylene when basis sets including diffuse functions are used, as well as for the phenyl-ethylene in the case of the 6–31+G** basis set (here, the same pathological behavior of quasi-identical dual descriptor values over both atoms found in the b case is encountered). Here we should point out that even if our main objective is the qualitative prediction of the preferred regioisomer, we must pay close attention to the quantitative values of the condensed dual descriptor or, more precisely, to the difference between these values for both ends of the dipolarophile. In a way, the value of δ = |Δf(C1) − Δf(C2)| provides a measure of the sensitivity of the selected theoretical scheme, since a greater value of δ provides a clearer differentiation between the reactivity tendencies of the atoms of the given molecule. For the b and h partitions, the δ values are typically of the order of \( {10}^{-3} \) (and up to \( {10}^{-2} \) for the phenyl-ethylene), which means that at the UHF level these schemes do not clearly differentiate between the electrophilic and nucleophilic behaviors within the selected molecules. As such, no clear distinction for the predicted product could be obtained with these partitions.
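A minimal sketch of how δ could be used as a screening quantity is given below. The helper names and, in particular, the threshold of 0.05 are assumptions made for illustration; the text only indicates that δ values of the order of 10^-3 to 10^-2 give poor differentiation, whereas values around 10^-1 give a clearer one. The condensed values are likewise hypothetical.

```python
def sensitivity(df_c1: float, df_c2: float) -> float:
    """delta = |df(C1) - df(C2)| for the two alkene carbons."""
    return abs(df_c1 - df_c2)


def is_clear_cut(df_c1: float, df_c2: float, threshold: float = 0.05) -> bool:
    """Flag whether the two ends are clearly differentiated.

    The threshold is a hypothetical cutoff chosen for this sketch; it is not
    a value proposed in the text.
    """
    return sensitivity(df_c1, df_c2) >= threshold


# Hypothetical condensed dual-descriptor values (illustration only).
weak = (0.012, 0.010)     # delta of the order of 10^-3
strong = (-0.15, 0.10)    # delta of the order of 10^-1

for label, (c1, c2) in (("weak", weak), ("strong", strong)):
    print(label, round(sensitivity(c1, c2), 3), is_clear_cut(c1, c2))
```

A prediction flagged as not clear-cut would then be treated as inconclusive rather than as a definite assignment of the preferred regioisomer.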

For the he AIM charges at the UHF level, the general results are similar to those of the h partition, with incorrect predictions for the cyano-ethylene with the 6-31+G** basis, the nitro-ethylene with the 6-31G* and 6-31G** basis sets, and the methoxy-ethylene with the more complete basis set. However, in this case we usually find δ values of the order of 10⁻¹, which means that this partition allows a clearer differentiation of the electronic effects at both ends of the alkene.

Both the hi and is partitions behave similarly to what was observed in the UHF case, both giving erroneous predictions for the cyano-ethylene when the basis sets include diffuse functions. The hi charges also fail for the methoxy-ethylene with the 6-311+G** basis and the phenyl-ethylene with both basis sets including diffuse functions. The is AIM fails in these cases and additionally for the methoxycarbonyl-ethylene, the phenyl-ethylene and the methoxy-ethylene when diffuse functions are used. Both of these partitions give δ values of around 10⁻¹, with the is scheme providing better separation between the reactive tendencies of atoms C1 and C2 in almost all cases.

By this point one may have noticed an interesting tendency (exacerbated when hi and is charges are used): a general failure of the basis sets with diffuse functions. This may appear unexpected, since these are the most complete basis sets. However, this behavior can be easily rationalized. The key point in the analysis is that the compounds we have been studying have negative (if not very small) vertical electron affinities (see the Supplementary information). This indicates that the corresponding anions (which we had to calculate to compute the dual descriptor) are unstable. Tozer et al. [86,87,88] reported that using diffuse functions when studying systems with this characteristic can lead to a poor description of their anionic states (a fact recently studied for several 13 DC reactions) [81]. There are several ways to deal with this problem, a popular one being to use smaller basis sets that better fit the features of the system (as small basis sets artificially bind the extra electron). Thus, in the case of compounds with unstable anions, the use of diffuse functions will provide a poor description of the electron density and, consequently, questionable values of the dual descriptor. Taking this into account, we can briefly summarize the results obtained at the UHF level by stating that, if one excludes the basis sets containing diffuse functions, the hi and is charges correctly predict the preferred regioisomer in all cases (even for group 2 dipolarophiles, which could not be predicted by the dual descriptor calculated using GGV spin-density Fukui functions).
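To make explicit where the anion enters, the sketch below uses the standard finite-difference condensation in terms of atomic charges, f⁺ = q(N) − q(N+1) and f⁻ = q(N−1) − q(N), so that Δf = f⁺ − f⁻ = 2q(N) − q(N+1) − q(N−1), together with the vertical electron affinity EA = E(N) − E(N+1) evaluated at the neutral geometry. It is assumed here that this matches the FD expressions used earlier in the text; the function names and the dictionary layout are hypothetical.

```python
def condensed_dual_descriptor(q_cation, q_neutral, q_anion):
    """Condensed dual descriptor per atom from AIM charges.

    Each argument maps an atom label to its charge in the N-1 (cation),
    N (neutral) and N+1 (anion) electron systems, all computed at the
    geometry of the neutral molecule.
    """
    result = {}
    for atom, q0 in q_neutral.items():
        f_plus = q0 - q_anion[atom]    # response to gaining an electron
        f_minus = q_cation[atom] - q0  # response to losing an electron
        result[atom] = f_plus - f_minus
    return result


def vertical_electron_affinity(e_neutral, e_anion):
    """EA = E(N) - E(N+1); a negative value signals an unbound anion."""
    return e_neutral - e_anion
```

A negative value returned by `vertical_electron_affinity` flags exactly the situation discussed above, in which the anion charges feeding into f⁺ should be treated with caution.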

The condensed values of the dual descriptor obtained by using UB3LYP (Fig. 6) present several similarities with those given by UHF. In this case the b partition fails (independently of the chosen basis set) to reproduce the regioselectivity patterns for the cases of the cyano- and methoxycarbonyl-ethylene. However, for the other dipolarophiles, a complete agreement with the experiment is found, except for the case of the phenyl-ethylene when diffuse functions are taken into account. The behavior of the h charges also parallels that of the b scheme when UB3LYP is used. Once again, there is no basis set that can account for the experimental trends observed for the cyano- and methoxycarbonyl-ethylene, and the 6-31+G** basis fails for the phenyl-ethylene. These AIM schemes provide δ values ranging from 10⁻¹ to 10⁻³, but most frequently around 10⁻². Thus, UB3LYP does a better job than UHF at differentiating the reactivity of the C1 and C2 atoms in these compounds.

The he scheme performs slightly better than the b and h charges. Once again there is no basis set which gives the correct results for methoxycarbonyl-ethylene, and the reactivity of the cyano-ethylene can be accounted for with the use of the 6-31G* and 6-31G** basis sets. The corresponding results for the other dipolarophiles agree well with the experimental findings, except for the cases of nitro-ethylene (6-31G*) and phenyl-ethylene (with diffuse basis sets). As was the case for the UHF results, δ values can be found around 10⁻².

The hi and is AIM charges provide the most promising results. Here, leaving the methoxycarbonyl-ethylene case aside, because once again it cannot be reproduced with any basis set, all the other experimental trends can be explained if we use basis sets without diffuse functions. (The reason diffuse functions are expected to give unreliable results was discussed in detail before.) This, together with the results obtained at the ab initio and DFT levels, allows us to conclude that, in lieu of more involved calculations, regioselectivity studies of 13 DC based on the dual descriptor should avoid the use of basis sets containing diffuse functions. Similarly to the UHF situation, the is charges consistently provide larger δ values (e.g., always above 10⁻¹), and are therefore the best for differentiating the electrophilic and nucleophilic behaviors of the two ends of the dipolarophiles.

Before concluding this section we would like to briefly recap the case of dipolarophiles belonging to group 2 (e.g., cyano- and methoxycarbonyl-ethylene). These are the most challenging tests for any method trying to rationalize the regioselectivity of 13 DC. For the cyano-ethylene, a great number of combinations of procedural “degrees of freedom” (e.g., theoretical methods, ways to compute the Fukui functions, basis sets, AIM partitionings) give erroneous results (e.g., any basis set and the b partition with both computational methods, and the h scheme using UB3LYP). However, there are other computational choices that, consistently and rationally (e.g., hi and is using both computational methods, with any basis set without diffuse functions), correctly predict the primary product.

The methoxycarbonyl-ethylene case is more pathological. Here we should mention that, while we have chosen to follow the regioselectivity reported by Padwa et al. [23] (see Fig. 2), there are other experimental reports indicating that 4-substitution is preferred [24]. If we accept the results of Padwa et al., the performance of different computational strategies is very scattered. The UHF calculations give correct results for the vast majority of the cases (with the sole exceptions of the b and is charges when diffuse functions are used), whereas the UB3LYP calculations fail completely, as no single combination of AIM scheme and basis set predicts the preferred formation of the 5-substituted heterocycle. The opposite trends appear if one chooses to follow the other experimental reports [24] indicating that the 4-substituted heterocycle is the primary product. Here, UB3LYP gives the “right answers” and UHF fails.

State-specific dual descriptor and its directionally-condensed values

Up to now we have seen the results of the two Fukui function-based ways to calculate the dual descriptor, using Eq. (11). The GGV and, to a lesser extent, the FD dual descriptors allow one, by visual inspection, to identify the regions of this descriptor that guide the regioselectivity and stereoselectivity, according to its sign and spatial orientation, respectively. In both cases, the most important drawback is the complete absence of a quantitative measure of these preferences, making it very difficult to compare different trends among a given set of reactants, or even between different sites within a molecule.

On the other hand, the condensed dual descriptor obtained from FD is able to provide a quantitative value for the nucleophilic or electrophilic character of an atom. The main drawback of this approach is that the spatial information is lost during the condensation. The reactivity of a given atom is related not only to the overall value of the condensed dual descriptor, but also to its spatial distribution. It may very well be the case that the dominant condensed electrophilic/nucleophilic character of a moiety is due to higher absolute values of the dual descriptor in regions that, even though associated with it according to the AIM scheme in use, are inaccessible for reaction because of the spatial orientation of the molecule at the time of reacting. This loss of spatial information, while simplifying the final interpretation, could be misleading when the direction of charge transfer between the reactants plays a fundamental role in the overall reactivity.

In order to solve this problem, a novel approach has been developed that keeps the best of the two previous methods, as it provides quantitative measures for the character of the atoms while retaining the information regarding the spatial anisotropy [69]. The method is based on separating the dual descriptor in continuous domains of the same sign. By visual inspection, it is possible to select the domains that control the reactivity of a given atom in the studied reaction (according to its geometry). Then, a proper numerical integration of the descriptor over the domains gives us a quantitative measure of the electrophilic/nucleophilic character of this domain and hence of this atom in the studied reaction, in a way that takes into account the preferred orientation between the reactants.
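The domain-wise condensation can be pictured with the following minimal Python sketch. It is not the implementation of Ref. [69]: it assumes the dual descriptor has been sampled on a regular grid, stored as a dictionary from integer grid indices to values, and it replaces the visual selection of a domain by a user-supplied seed point. The integral is a plain rectangle-rule sum over the connected region that has the same sign as the seed. All names are hypothetical.

```python
from collections import deque


def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def integrate_domain(grid, seed, voxel_volume):
    """Integrate the descriptor over the same-sign domain containing `seed`.

    `grid` maps (i, j, k) index triples to descriptor values; two points
    belong to the same domain when they are face neighbours and their
    values share the same sign.
    """
    s = sign(grid[seed])
    if s == 0:
        return 0.0
    seen = {seed}
    queue = deque([seed])
    total = 0.0
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        i, j, k = queue.popleft()
        total += grid[(i, j, k)]
        for di, dj, dk in steps:
            nb = (i + di, j + dj, k + dk)
            # Grow the domain only through same-sign, unvisited neighbours.
            if nb in grid and nb not in seen and sign(grid[nb]) == s:
                seen.add(nb)
                queue.append(nb)
    return total * voxel_volume
```

Choosing the seed above or below the molecular plane, on the face exposed to the other reactant, is what carries the orientation information into the condensed value.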

We thus follow this approach to condense the values of the state-specific dual descriptor, calculated for the dipole and dipolarophiles under study. For this, we took into account that these species will tend to interact in a quasi-parallel way, which indicates that we must focus on the values of the dual descriptor that lie outside the molecular planes. In particular, to calculate the state-specific dual descriptor, we selected the first (S1) and second (S2) excited states, since these are expected to be the most relevant for the reactivity of these species.

Table 5 shows the qualitative results corresponding to the regioselectivities predicted by this method. The numeric values corresponding to these results can be found in the Supplementary information.

Table 5 Representation of whether the interaction of the MPN and the alkenes correctly predicts the experimental behavior based on the first (S1) and second (S2) state-specific dual descriptor

The state-specific dual descriptor gives mixed results. Apparently the driving factor is the selection of the state for the dual descriptor of MPN, since in the case of S2 the results are overall better than for MPN in S1. Perhaps counterintuitively, the best results were obtained for the 6-31G basis set and using the S2 dual descriptor for MPN and alkenes, as this combination is able to reproduce all the experimental results. When the S1 state of dipolarophiles is used, only the methyl carboxylate experimental selectivity fails to be predicted.

Another interesting fact is that the basis with diffuse functions performs just as well, if not better than, the basis with polarization functions but not diffuse functions. This is to be expected, since by using the state-specific dual descriptor we avoided the calculation of anions, which we believe to be the main source of errors when we considered the FD-based dual descriptor.

The 6-31G* and 6-31G** basis sets provided the worst results when the nitrone is considered in S2, in contrast with the results obtained from the condensed FD dual descriptor. In this case, the only regioselectivities that were correctly predicted are those of the methyl carboxylate and the nitro-alkene (dipolarophile in S1) and of the nitro-alkene (dipolarophile in S2). When diffuse functions are considered, we obtain the opposite results, with incorrect predictions only for these cases.

These results show that considering the nitrone in its second excited state is of paramount importance to reproduce the correct regioselectivity patterns. The inclusion of diffuse functions in the basis set does not worsen the results. In general, the preferred setup for predicting the reactivity would be to use the S2 dual descriptor for the nitrone and the alkenes together with the 6-31G basis, as it gave the correct results for all the tested systems. The robustness of such an approach remains to be established. However, as the systems analyzed here cover a wide range of electronic effects, this approach is expected to perform well in general.

Conclusions

Different ways to calculate the dual descriptor were tested, as well as the influence of the basis set selection, for predicting the correct regioselectivity of model 13 DC reactions involving MPN. Predictions based on visual inspection of the GGV-based dual descriptor showed no significant dependence on the basis sets. However, this analysis was unable to correctly predict the selectivity of the reaction of MPN and alkenes with substituents of intermediate character (CN and CO2Me).

The visual analysis of the FD dual descriptor is difficult and often inconclusive due to the great number of nodes and domains of opposite sign on the same atoms (difficulties that are exacerbated if electron correlation is taken into account). The condensation of the dual descriptor on atomic domains greatly simplifies the procedure, although information on the descriptor anisotropy is lost (which means that domains that are not oriented toward the other reactant can significantly affect the condensed value). The quality of the predictions varies greatly with the basis set selection as well as with the AIM scheme used in the condensation. All condensation schemes and electronic structure methods (UB3LYP and UHF) fail to reproduce the experimental observations when diffuse functions are included in the basis set. This seems to be because diffuse functions give poor results for the electronic structure of the anions, since the instability of the anions and the extra flexibility provided by the diffuse functions result in unrealistic electron densities. The best performing AIM scheme was the extended Hirshfeld method (he), which only fails for the dipolarophiles in group 2 when using a basis with diffuse functions. Otherwise its values give the correct predictions, as the relative values associated with the involved atoms show the correct selectivity and their difference is large enough to provide a clear differentiation between the electrophilic and nucleophilic regions.

In the case of the state-specific dual descriptor, it is important to select the correct state to obtain the dual descriptor of the MPN. Using the second excited state of the MPN gave the best results. With this method, the use of a basis with diffuse functions does not present the same problems, as it is not necessary to perform calculations on the anions. Within this approach, using the S2 dual descriptor for all systems together with the 6-31G basis set correctly reproduces all experimental results.

Before concluding, it should be pointed out that the fact that the dual descriptor fails to explain the observed regioselectivity in some cases (most notably, for the intermediate donors), might be an indication of the need to take into account other factors. Particularly, this could point to the prevalence of charge over orbital control. We are currently working to see if this is indeed the case.

In general, we have seen that the selection of the “computational conditions” when undertaking a reactivity study is of paramount importance. The use of black-box computational setups is strongly discouraged, since even the qualitative results are shown to depend heavily on the initial conditions. It seems that the best way to proceed in these cases is to test all the available options on reactions that (a) are similar to those we want to study and (b) have well-established experimental results. Using the results of that systematic study, the combination of electronic structure method, AIM scheme, etc. that best reproduces those results can be selected.
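The systematic screening suggested here could be organized along the lines of the following Python sketch. It simply enumerates every combination of method, AIM scheme and basis set, scores each one against a set of reference reactions, and returns the best-scoring combinations. The callable `predict` stands for whatever workflow produces a predicted regioisomer for a given setup and system; it, the function name, and the data layout are assumptions of this example.

```python
from itertools import product


def best_setups(methods, schemes, basis_sets, predict, reference):
    """Rank computational setups by agreement with reference results.

    `reference` maps each benchmark system to its experimentally preferred
    regioisomer; `predict(method, scheme, basis, system)` must return the
    regioisomer predicted by that setup for that system.
    """
    scores = {}
    for setup in product(methods, schemes, basis_sets):
        hits = sum(
            predict(*setup, system) == expected
            for system, expected in reference.items()
        )
        scores[setup] = hits / len(reference)  # fraction reproduced
    top = max(scores.values())
    winners = sorted(s for s, value in scores.items() if value == top)
    return winners, top
```

Returning all tied setups, rather than a single winner, keeps visible the cases in which several computational choices reproduce the benchmark equally well.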