1 Introduction

Hierarchical basis sets such as cc-pVXZ [1] and XZP [2] are known to improve the description of the energetics of a system by allowing one to ascend the basis set hierarchy without any additional cost, i.e., by permitting one to move upward on the cardinal number series (\(X=D:2\), T: 3, Q: 4, 5, …) that specifies the basis set quality. In fact, the hierarchical structure allows both the self-consistent-field (SCF) Hartree–Fock (HF) type and correlation energies to be separately extrapolated to their so-called complete basis set (CBS) limits. In this regard, several protocols have been suggested to extrapolate the SCF [3–7] and correlation [8–23] energies. Recently [19, 20, 22, 24], we have shown that a basis set re-hierarchization procedure can further improve the results of such extrapolations for single-reference correlated energies, namely those of Møller–Plesset and coupled cluster types. The novel hierarchical structures [19, 22] were determined from the requirement that the \(X \le 6\) values fall on the straight line built by fitting the \(X=6\) and 5 correlation energies with the uniform singlet- and triplet-pair extrapolation (USTE) scheme of Varandas [4], a procedure similar to the one previously utilized to obtain an extremely accurate curve for the helium dimer [25]. Specifically, the new hierarchical numbers (x) were chosen as the average values of the ones found for a test set of 18 molecules, with the resulting method denoted USTE\((x_i,x_j)\) [19]. The extrapolation protocol assumes the form

$$\begin{aligned} E_{X}^{\rm cor}=E_{\infty }^{\rm cor}+\frac{A}{x^{3}}, \end{aligned}$$
(1)

where \(E_{X}^{\rm cor}\) is the raw correlation energy calculated with the basis of cardinal number X, \(E_{\infty }^{\rm cor}\) the correlation energy at the CBS limit, x the novel hierarchical number [19, 22], and A a parameter to be determined jointly with \(E_{\infty }^{\rm cor}\) from a fit to the computed raw energies. The novel hierarchical structure has also allowed the construction of a one-parameter extrapolation scheme for different correlated methods [20], referred to as unified single-parameter extrapolation [20], USPE(x). This has been possible due to a good linear relation between the parameter A in Eq. (1) and the total energy. The CBS limit in USPE(x) is then obtained from

$$\begin{aligned} E_{X}^{\rm cor}=E_{\infty }^{\rm cor}+\frac{aE_{X}^{\rm tot}}{x^{3}}, \end{aligned}$$
(2)

where a is the average slope obtained for the 18 test systems by employing the MP2, CCSD, and CCSD(T) correlated methods, and \(E_{X}^{\rm tot}\) is the total energy calculated with the X-basis. The USTE\((x_i,x_j)\) procedure yielded excellent results in extrapolations of the correlation energy [19, 21, 22], atomization energies [19], and electrical properties [24] when compared with other two-parameter extrapolation schemes. Moreover, USPE(x) yields correlation energies [20, 22] and electrical properties [20] of at least similar accuracy to the ones obtained with the best available two-parameter extrapolation schemes, which are computationally more expensive because they require calculations with two basis sets rather than one. As current limitations, both USTE\((x_i,x_j)\) and USPE(x) are available only for single-reference methods [MP2, CCSD, and CCSD(T)], with USPE(x) being thus far available only for systems involving atoms from H to Ne.
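As an illustration of how Eqs. (1) and (2) are applied, the following Python sketch solves Eq. (1) in closed form for a pair of raw correlation energies and rearranges Eq. (2) for a single one. The function names and every numerical value in the example (including the x-numbers) are hypothetical placeholders; the actual hierarchical numbers and the slope a are those reported in Refs. [19, 20, 22].

```python
def uste_two_point(e_i, x_i, e_j, x_j):
    """Solve Eq. (1), E_x = E_inf + A / x**3, for (E_inf, A) from two energies."""
    if x_i == x_j:
        raise ValueError("the two hierarchical numbers must differ")
    w_i, w_j = x_i**3, x_j**3
    # Multiply Eq. (1) by x**3 for each basis and subtract to eliminate A.
    e_inf = (w_j * e_j - w_i * e_i) / (w_j - w_i)
    a = (e_i - e_inf) * w_i
    return e_inf, a


def uspe_one_point(e_cor, e_tot, x, slope):
    """Rearrange Eq. (2): E_inf = E_cor - slope * E_tot / x**3."""
    return e_cor - slope * e_tot / x**3


if __name__ == "__main__":
    # Hypothetical correlation energies (in E_h) and x-numbers, not data from this work.
    e_inf, a = uste_two_point(-0.2750, 2.9, -0.2900, 3.9)
    print(f"E_inf = {e_inf:.6f} E_h, A = {a:.6f}")
```

The two-point form needs calculations with two basis sets, whereas the one-point form needs only one, which is the origin of the cost difference noted above.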

It is well established that the multireference configuration interaction (MRCI) method is one of the most important tools for the calculation of accurate potential energy surfaces (PESs). The literature in this area is vast, and we direct the reader to recent publications on both neutral and ionic species [26–38], from which others can be reached via cross-referencing. To produce an accurate PES, extrapolation of the total MRCI energy to the CBS limit is necessary in many cases. The CBS limit of the dynamical correlation (dc) energy can be estimated with Varandas’s extrapolation scheme [4], as discussed in the next section. However, for the complete-active-space self-consistent-field (CASSCF) energy component, no definitive extrapolation scheme has been proposed, and it would be valuable if an accurate extrapolation protocol could also be found for this purpose. Additionally, we expect the CASSCF extrapolation scheme to be applicable also to the HF energy, as suggested elsewhere by one of us [4]. The aim of the current work is therefore to find an accurate extrapolation scheme for both the HF and CASSCF energies, hence improving on the results obtained with previously reported protocols [7, 9] specifically proposed for the HF energy.

2 Extrapolation of HF and CASSCF energies: synopsis and novel scheme

Consider the total MRCI energy [4] split as

$$\begin{aligned} E_{X}^{\rm MRCI}=E_{X}^{\rm CAS}+E_{X}^{\rm dc}, \end{aligned}$$
(3)

where \(E_{X}^{\rm CAS}\) is the CASSCF energy and \(E_{X}^{\rm dc}\) is the dynamical correlation component; as above, the subscript X denotes the basis set quality. Such a partitioning is indispensable for good-quality extrapolations, since the CASSCF and dynamical correlation energies are known to converge at distinct rates as the number of functions in the basis set increases.

The first extrapolation scheme specifically proposed for the dc energy is due to Varandas [4], with the CBS limit obtained from the following two-parameter form:

$$\begin{aligned} E_{X}^{\rm dc}=E_{\infty }^{\rm dc}+\frac{A_3}{(X+\alpha )^3} + \frac{A_5}{(X+\alpha )^5}, \end{aligned}$$
(4)

where

$$\begin{aligned} A_5=A_5^0+cA_3^n, \end{aligned}$$
(5)

with the parameters \(A_5^0\), n, and c for MRCI energies assuming the values of \(A_5^0=0.0037685459\), \(n=1.25\), and \(c=-1.17847713\). To obtain the CBS limit, two calculations with different basis sets are then required.
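Because \(A_5\) depends on \(A_3\) through Eq. (5), the pair of Eq. (4) written for two bases is nonlinear in \(A_3\). A minimal Python sketch of one way to solve it is given below, using plain bisection on \(A_3\). The value of \(\alpha\) is not stated in this section, so the default used here (\(-3/8\)) is an assumption, as are the function names and the search bracket for \(A_3\).

```python
A5_0 = 0.0037685459   # A_5^0 for MRCI energies, from the text
N_EXP = 1.25          # n, from the text
C = -1.17847713       # c, from the text
ALPHA = -0.375        # ASSUMPTION: alpha is not given in this section


def dc_model(x, e_inf, a3, alpha=ALPHA):
    """Eq. (4) with A_5 tied to A_3 through Eq. (5)."""
    u = 1.0 / (x + alpha)
    return e_inf + a3 * u**3 + (A5_0 + C * a3**N_EXP) * u**5


def extrapolate_dc(e_i, x_i, e_j, x_j, alpha=ALPHA, a3_max=2.0):
    """Return (E_inf, A_3) from two dc energies; a3_max is an assumed bracket."""
    u_i, u_j = 1.0 / (x_i + alpha), 1.0 / (x_j + alpha)
    d3, d5 = u_i**3 - u_j**3, u_i**5 - u_j**5

    def residual(a3):
        # Difference of Eq. (4) for the two bases; E_inf cancels out.
        return a3 * d3 + (A5_0 + C * a3**N_EXP) * d5 - (e_i - e_j)

    lo, hi = 0.0, a3_max
    if residual(lo) * residual(hi) > 0.0:
        raise ValueError("no sign change in the assumed A_3 bracket")
    for _ in range(200):  # bisection until the bracket is at machine precision
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    a3 = 0.5 * (lo + hi)
    e_inf = e_i - a3 * u_i**3 - (A5_0 + C * a3**N_EXP) * u_i**5
    return e_inf, a3
```

Bisection is chosen only for transparency; any one-dimensional root finder would serve, provided \(A_3\) stays non-negative so that \(A_3^n\) remains real.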

For the CASSCF energy, four popular schemes [7, 9] are now highlighted. The first [9] is of the exponential type with the form:

$$\begin{aligned} E_X^{\rm CAS}=E_{\infty }^{\rm CAS}+Ae^{-\beta X}, \end{aligned}$$
(6)

where A, \(\beta\), and \(E_{\infty }^{\rm CAS}\) are parameters to be determined by performing calculations with three distinct bases. The second [7] converts the three-parameter protocol into a two-parameter one by fixing the value of \(\beta\) for each extrapolation pair of cardinal numbers. The criterion has been to minimize the root mean square deviation (RMSD) for a set of test molecules [7], which resulted in the following values of \(\beta\): 1.54 for (TQ), 1.95 for (Q, 5), 1.72 for (5, 6), and 1.26 for (6, 7). In the third scheme [7], the extrapolation assumes the form

$$\begin{aligned} E_X^{\rm CAS}=E_{\infty }^{\rm CAS}+\frac{A}{X^{\alpha }} \end{aligned}$$
(7)

where \(\alpha\) is once more determined to minimize the RMSD of a set of test systems [7]. The resulting values are 3.54 [5], 5.34, 8.74, 9.43, and 8.18 for the (DT), (TQ), (Q, 5), (5, 6), and (6, 7) extrapolation cardinal pairs, respectively. Finally, the fourth scheme [7] assumes the form:

$$\begin{aligned} E_{X}^{\rm CAS}=E_{\infty }^{\rm CAS}+A(X+1)e^{(-\gamma X^{\frac{1}{2}})}, \end{aligned}$$
(8)

where \(\gamma\) assumes the values of 6.57 for the (TQ) pair, 9.03 for (Q, 5), 8.77 for (5, 6), and 7.10 for (6, 7). Although the values of \(\alpha\), \(\beta\), and \(\gamma\) were determined from HF energies [7], Varandas [4] proposed that, when relatively large basis sets are used, they should also yield accurate CBS values for the CASSCF energy, a suggestion that will be shown later to be corroborated by the results presented here. Indeed, the two-parameter extrapolation schemes of Karton and Martin [7] have been used by the Coimbra Group to extrapolate the CASSCF energy in calculations of a variety of accurate global PESs [26–36, 38–40].
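Once the exponent is fixed, Eqs. (6), (7), and (8) all have the form \(E_X=E_{\infty }+A\,f(X)\) and can be solved for \(E_{\infty }\) from two energies in the same way. The Python sketch below illustrates this with the pair-specific exponents quoted above; the function and dictionary names are hypothetical, and the cardinal pairs are keyed as integers (D = 2, T = 3, Q = 4).

```python
import math

# Pair-specific exponents quoted in the text, keyed by cardinal pair (X_i, X_j).
KM_BETA = {(3, 4): 1.54, (4, 5): 1.95, (5, 6): 1.72, (6, 7): 1.26}
KM_ALPHA = {(2, 3): 3.54, (3, 4): 5.34, (4, 5): 8.74, (5, 6): 9.43, (6, 7): 8.18}
KM_GAMMA = {(3, 4): 6.57, (4, 5): 9.03, (5, 6): 8.77, (6, 7): 7.10}


def exponential(beta):
    """Decay function of Eq. (6): exp(-beta * X)."""
    return lambda x: math.exp(-beta * x)


def inverse_power(alpha):
    """Decay function of Eq. (7): X**(-alpha)."""
    return lambda x: x ** (-alpha)


def sqrt_exponential(gamma):
    """Decay function of Eq. (8): (X + 1) * exp(-gamma * sqrt(X))."""
    return lambda x: (x + 1) * math.exp(-gamma * math.sqrt(x))


def two_point_cbs(e_i, x_i, e_j, x_j, decay):
    """Eliminate A from E_X = E_inf + A * decay(X) written for two bases."""
    f_i, f_j = decay(x_i), decay(x_j)
    return (e_j * f_i - e_i * f_j) / (f_i - f_j)
```

For instance, `two_point_cbs(e_t, 3, e_q, 4, exponential(KM_BETA[(3, 4)]))` would give the (TQ) estimate of the second scheme for two energies `e_t` and `e_q` supplied by the user.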

The \(E_{\infty }^{\rm CAS}\) from Eq. (6) can be written, for a fixed and universal parameter \(\beta\), as:

$$\begin{aligned} E_{\infty }^{\rm CAS}=\frac{E_{X_i}e^{\beta X_i}-E_{X_j}e^{\beta X_j}}{e^{\beta X_i}-e^{\beta X_j}}. \end{aligned}$$
(9)
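
For completeness, Eq. (9) follows from writing Eq. (6) for two bases \(X_i\) and \(X_j\) with a common \(\beta\) and multiplying each by the corresponding exponential factor; only symbols already defined above are used:

$$\begin{aligned} E_{X_i}e^{\beta X_i}&=E_{\infty }^{\rm CAS}e^{\beta X_i}+A,\\ E_{X_j}e^{\beta X_j}&=E_{\infty }^{\rm CAS}e^{\beta X_j}+A. \end{aligned}$$

Subtracting the second line from the first removes A and leaves \(E_{X_i}e^{\beta X_i}-E_{X_j}e^{\beta X_j}=E_{\infty }^{\rm CAS}\,(e^{\beta X_i}-e^{\beta X_j})\), which is Eq. (9) after division by the bracketed factor.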

To determine hierarchical numbers for CASSCF extrapolations, we will then consider the CBS energy \(E_{\infty }^{\rm CAS}\) to be known, \(\beta\) to be universal and fixed, and one hierarchical number \(X=x_k\) to be also known. Thus, we may determine the remaining hierarchical numbers \((x_i)\) by solving the following system of equations:

$$\begin{aligned} {\left\{ \begin{array}{ll} E_{\infty }^{\rm CAS}=\frac{E_{x_2}e^{\beta x_2}-E_{x_3}e^{\beta x_3}}{e^{\beta x_2}-e^{\beta x_3}}\\ E_{\infty }^{\rm CAS}=\frac{E_{x_3}e^{\beta x_3}-E_{x_4}e^{\beta x_4}}{e^{\beta x_3}-e^{\beta x_4}}\\ \quad \quad \quad \quad \quad .\\ \quad \quad \quad \quad \quad .\\ \quad \quad \quad \quad \quad .\\ E_{\infty }^{\rm CAS}=\frac{E_{x_i}e^{\beta x_i}-E_{x_k}e^{\beta x_k}}{e^{\beta x_i}-e^{\beta x_k}} \end{array}\right. }, \end{aligned}$$
(10)

with the general solution being

$$\begin{aligned} x_i=\frac{1}{\beta }\ln\left[ \frac{e^{{\beta }x_{k}}(E_{\infty }^{\rm CAS}-E_{x_{k}}^{\rm CAS})}{E_{\infty }^{\rm CAS}-E_{x_{i}}^{\rm CAS}}\right] , \end{aligned}$$
(11)

where \(i=2,3,4,5,\ldots\) and \(i\ne k\). Note that Eq. (10) has been constructed for consecutive \(x_i\) values \((x_i,x_{i+1})\), but the result is the same for pairs of nonconsecutive ones. In fact, extrapolation with pairs of nonconsecutive x values is also viable with the new hierarchical scheme, hereinafter denoted CAS-exponential, CAS-E\((x_i,x_j)\), or simply CAS-E when used for extrapolating CASSCF energies (correspondingly, HF-E for HF, or simply SCF-E when taken in its universal version; see later). Following previous usage [19, 25], the hierarchical numbers are labeled as \(x_2=d\), \(x_3= t\), \(x_4=q\), \(x_5=p\), \(x_6=h\), and \(x_7=s\). All valence-only raw energies have been calculated with the Molpro electronic structure code [41]; see the Supplementary Information (SI).
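The relation between Eqs. (9) and (11) can be checked numerically. In the Python sketch below, Eq. (11) assigns an x-number to each raw energy for a given \(\beta\), \(x_k\), and reference CBS energy, and Eq. (9) then returns that same CBS energy from any pair of the resulting x-numbers, consecutive or not. The function names and all energies in the demonstration are hypothetical and are not data from this work.

```python
import math


def cbs_exponential(e_i, x_i, e_j, x_j, beta):
    """Eq. (9): two-point CBS limit for a fixed, universal beta."""
    w_i, w_j = math.exp(beta * x_i), math.exp(beta * x_j)
    return (e_i * w_i - e_j * w_j) / (w_i - w_j)


def hierarchical_number(e_i, e_k, x_k, e_inf, beta):
    """Eq. (11), written as x_k + ln(ratio) / beta to avoid a large exponential."""
    return x_k + math.log((e_inf - e_k) / (e_inf - e_i)) / beta


if __name__ == "__main__":
    # Hypothetical raw energies (E_h) for X = 2..7 and a hypothetical CBS reference.
    raw = {2: -76.0270, 3: -76.0570, 4: -76.0650, 5: -76.0672, 6: -76.0677, 7: -76.0678}
    e_inf, beta, x_k = -76.06785, 1.63, 7.0
    x = {X: hierarchical_number(raw[X], raw[7], x_k, e_inf, beta) for X in range(2, 7)}
    x[7] = x_k
    # A nonconsecutive pair recovers the reference by construction.
    print(cbs_exponential(raw[2], x[2], raw[4], x[4], beta))
```

This only demonstrates internal consistency for a single system; in the actual scheme the x-numbers are averages over a test set, so the extrapolated value for any given molecule will deviate from its reference.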

To proceed, we have chosen the \(E_{\infty }^{\rm CAS}\) values from Eq. (6) with the (6, 7) cardinal pair and \(\beta =1.26\) [7], hereinafter denoted by KM\(\beta (6,7)\). Note that KM\(\beta (6,7)\) has been adopted simply because it yields the energies that lie closest to each other, with deviations of \(\le\)0.1 \(\mu \hbox {E}_h\). In fact, the three-parameter Eq. (6) underestimates the KM\(\beta (6,7)\) predictions by typically 10 \(\mu \hbox {E}_h\). The adopted \(x_i\)- and \(x_j\)-numbers are finally chosen as average values of the ones calculated for each member of the test set of 18 molecules (\({\rm CH}_{4}\), \({\rm CO}_{2}\), \({\rm CO}\), \({\rm F}_{2}\), \({\rm H}_{2}{\rm CO}\), \({\rm H}_{2}{\rm O}\), HCN, HF, HNO, \({\rm N}_{2}\), \({\rm NH}_{3}\), \({\rm O}_{3}\), \({\rm CH}_{2}\), \({\rm C}_{2}{\rm H}_{2}\), \({\rm C}_{2}{\rm H}_{4}\), \({\rm N}_{2}{\rm H}_{2}\), \({\rm H}_{2}{\rm O}_{2}\)) used in previous work [19, 20, 22]. To determine the novel hierarchical x-numbers, the Dunning [1] cc-pVXZ (\(X=D\), T, Q, 5, and 6) basis sets have been utilized, together with the pV7Z basis of Feller and coworkers [42, 43]. As noted above, to solve Eq. (11), both \(\beta\) and \(x_k\) must be fixed. Because the hierarchical numbers are expected [19, 22] to lie close to the corresponding cardinals, the values of \(\beta\) and \(x_k\) were further constrained to minimize the root mean square deviation (RMSD) relative to the corresponding cardinals. Since the highest x-number is expected to show the lowest variation among the various systems [22], the search was performed around \(x_k=7\). Finally, to find the minimum RMSD, a grid of points was constructed for \(1.2\le \beta \le 2.0\) and \(6.50 \le x_k\le 7.50\), with a mesh separation of 0.01. For each pair (\(\beta\), \(x_k\)) of the above grid, the following steps have then been performed:

(i) Solve Eq. (11) to obtain \(x_i\), with \(i=2, 3, 4, 5\), and 6, for each of the 18 systems.

(ii) Average the \(x_i\) numbers over the 18 test systems; this gives the hierarchical numbers d, t, q, p, h, and s (where \(s=x_k\)) for the specific pair (\(\beta\), \(x_k\)).

(iii) For the d, t, q, p, h, and s values so obtained, calculate the RMSD of the differences \((x_i-X_i)\), with the corresponding cardinal numbers taken as the reference values.

The optimum values of \(\beta\) and \(x_k\) are the ones that yield the smallest RMSD.
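Steps (i) to (iii) and the grid search can be sketched in Python as follows. This is a minimal illustration rather than the code used in this work: the function names and the input layout (a list of pairs, each holding a dictionary of raw energies keyed by cardinal number 2 to 7 and the reference CBS energy of that system) are assumptions, and every raw energy is assumed to lie above its reference so that the logarithm in Eq. (11) is defined.

```python
import math

CARDINALS = (2, 3, 4, 5, 6)  # x_i solved for; x_k plays the role of X = 7


def x_numbers(energies, e_inf, beta, x_k):
    """Step (i): Eq. (11) for one system."""
    e_k = energies[7]
    return [x_k + math.log((e_inf - e_k) / (e_inf - energies[X])) / beta
            for X in CARDINALS]


def rmsd_to_cardinals(systems, beta, x_k):
    """Steps (ii) and (iii): average over systems, then RMSD against cardinals."""
    per_system = [x_numbers(e, e_inf, beta, x_k) for e, e_inf in systems]
    averages = [sum(col) / len(per_system) for col in zip(*per_system)]
    averages.append(x_k)  # s = x_k
    targets = list(CARDINALS) + [7]
    rmsd = math.sqrt(sum((x - X) ** 2 for x, X in zip(averages, targets))
                     / len(targets))
    return rmsd, averages


def grid_search(systems):
    """Scan 1.2 <= beta <= 2.0 and 6.50 <= x_k <= 7.50 with a 0.01 mesh."""
    best = None
    for ib in range(81):
        beta = 1.2 + 0.01 * ib
        for ik in range(101):
            x_k = 6.5 + 0.01 * ik
            rmsd, averages = rmsd_to_cardinals(systems, beta, x_k)
            if best is None or rmsd < best[0]:
                best = (rmsd, beta, x_k, averages)
    return best  # (RMSD, beta, x_k, [d, t, q, p, h, s])
```

Integer loop counters are used to build the grid so that the mesh points do not accumulate floating-point drift.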

Figure 1 shows a plot of the RMSD as a function of \(\beta\) and \(x_k\) for the CASSCF method, with a similar plot being obtained for the HF energy. In turn, the optimal \(\beta\) and \(x_i\) values obtained for the HF and CASSCF energies are given in Table 1, with the CBS limit obtained with the SCF-E (HF-E and CAS-E) schemes via Eq. (9) upon setting \(X_i=x_i\). Note that a set of universal x-numbers, i.e., usable with both the HF and CASSCF methods, is also proposed.

Fig. 1 Parameter \(\beta\) and hierarchical number \(x_k\) versus root mean square deviation (RMSD) relative to the corresponding cardinals. The minimum of the RMSD for CASSCF occurs at \((\beta ,x_k)=(1.63,6.90)\)

Table 1 Hierarchical numbers for HF and CASSCF energies

3 Results and discussion

Table 1 of the SI lists the statistical analysis for the raw and extrapolated CASSCF energies of the 18 systems used to determine the x-numbers; the reference values have been obtained by KM\(\beta (6,7)\) extrapolation. A comparison for the same pairs of consecutive X/x values shows that the CAS-E scheme with the specific x-numbers outperforms the Karton–Martin protocols, with the most significant improvement in mean absolute deviation (MAD), 322 \(\mu \hbox {E}_h\) (~33%), occurring for the (DT)/(dt) pair. For (TQ)/(tq), the gain in MAD is smaller, ≥25 \(\mu \hbox {E}_h\) (~7%). In turn, for (Q, 5)/(qp), (5, 6)/(ph), and (6, 7)/(hs), the results are virtually the same for all schemes. When employing universal x-numbers, CAS-E(tq) shows an improvement of 49 \(\mu \hbox {E}_h\) relative to (tq) with specific x values. However, (dt) underperforms by 206 \(\mu \hbox {E}_h\) relative to the case where specific x-numbers are used.
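For reference, the two statistics quoted throughout this section can be computed as in the short Python sketch below, where MAD is the mean of the absolute deviations from the reference energies and both quantities are converted from \(\hbox {E}_h\) to \(\mu \hbox {E}_h\). The function name is hypothetical.

```python
import math

HARTREE_TO_MICROHARTREE = 1.0e6


def deviation_stats(values, references):
    """Return (MAD, RMSD) in micro-hartree for energies given in hartree."""
    if len(values) != len(references) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    devs = [(v - r) * HARTREE_TO_MICROHARTREE for v, r in zip(values, references)]
    mad = sum(abs(d) for d in devs) / len(devs)
    rmsd = math.sqrt(sum(d * d for d in devs) / len(devs))
    return mad, rmsd
```

A gain such as the ones quoted above is then simply the difference between the MAD values of two schemes evaluated against the same references.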

Shown in Table 2 are the statistics for the extrapolated and raw values obtained at the HF/cc-pVXZ level of theory with the test set of 106 closed-shell systems [17, 19] formed by H, C, N, O, and F atoms. Comparing the results for pairs of consecutive X/x values, it is seen that the difference between the HF-E and KM\(\alpha\) [Eq. (7)] results is only 7 \(\mu \hbox {E}_h\) (~1%) for (DT)/(dt). However, for (TQ)/(tq), the improvement obtained with HF-E is ≥152 \(\mu \hbox {E}_h\) (~35%) compared with the Karton–Martin schemes. For the other pairs with consecutive values of X/x, the results are practically the same. The notable point is that both the HF-E and CAS-E schemes using pairs of nonconsecutive x values yield results of at least similar quality to the ones obtained via the Karton–Martin and even SCF-E schemes with consecutive extrapolation pairs. For example, the (ds), (dh), (dp), and (dq) pairs show virtually the same results as (hs), (ph), (qp), and (tq), respectively, but at significantly lower computational cost. Such a pattern is also observed for other nonconsecutive pairs. In fact, the key point of hierarchical extrapolation schemes lies in the highest x-number used in the pair: the CBS estimate is practically the same regardless of the lowest x value of the pair.

Figure 2 shows the behavior of the KM\(\alpha\), KM\(\beta\), and CAS-E extrapolation schemes for the \(\mathrm O_3\) molecule at the CASSCF/cc-pVXZ level of theory. Although the results are practically the same for large X/x values, the current hierarchical scheme is the only one that predicts the correct shape of the extrapolation curve for any extrapolation pair. Thus, the methodology utilized here to generate the x-numbers makes the extrapolations consistent. On average, the x values lie closer to the correct extrapolation curve, CAS-E(hs), than the cardinal ones (see the SI). For this reason, the results of the SCF-E extrapolations are (on average) more reliable than those of the Karton–Martin protocols, especially when involving \(x=d\), t, and q. To summarize, the hierarchical numbers are expected to be more realistic than the cardinals, allowing accurate predictions in extrapolations with pairs of consecutive and nonconsecutive x values. This is to be expected based on the results of previous work [19, 22, 25] where the re-hierarchization procedure has also been utilized.

Table 2 Statistical analysis (in \(\mu \hbox {E}_h\)) for the extrapolated and raw energies of the test set of 106 systems

Fig. 2 Comparison of extrapolation schemes for \(\hbox {O}_3\) at the CASSCF/cc-pVXZ level of theory. The hierarchical numbers (#s) are the specific values from Table 1. Note that the hierarchical numbers describe the correct shape of the extrapolation curve for all x values, while the cardinals, \(X=D\) and T, present small deviations from the CAS-E(hs) curve. Additionally, despite the accurate results obtained from KM\(\alpha (6,7)\) and KM\(\beta (6,7)\), the shape of the extrapolation curve is clearly incorrect

Shown for brevity in Table 3 of the SI is the statistical analysis for the extrapolated and raw values of the 43 systems considered by Jensen [44]: \({}^{1}{\hbox {CH}}^{+}\), \({} ^{3}{\hbox {CH}}^{-}\), \({} ^{3}{\hbox {NH}}\), \({} ^{1}{\hbox {OH}}^{-}\), \({} ^{1}{\hbox {FH}}\), \({} ^{1}{\hbox {C}_2}\), \({} ^{2}{\hbox {CN}}\), \({} ^{1}{\hbox {CN}}^{-}\), \({} ^{1}{\hbox {N}_2}\), \({} ^{1}{\hbox {NO}}^{+}\), \({} ^{3}{\hbox {NO}}^{-}\), \({} ^{1}{\hbox {CO}}\), \({} ^{3}{\hbox {O}_2}\), \({} ^{1}{\hbox {CF}}^{+}\), \({} ^{3}{\hbox {CF}}^{-}\), \({} ^{3}{\hbox {NF}}\), \({} ^{1}{\hbox {OF}}^{-}\), \({} ^{1}{\hbox {F}_2}\), \({} ^{2}{\hbox {F}_2}^{-}\), \({} ^{3}{\hbox {SiH}}^{-}\), \({} ^{1}{\hbox {SH}}^{-}\), \({} ^{1}{\hbox {HCl}}\), \({} ^{2}{\hbox {CP}}\), \({} ^{1}{\hbox {CP}}^{-}\), \({} ^{1}{\hbox {CS}}\), \({} ^{2}{\hbox {SiN}}\), \({} ^{1}{\hbox {SiN}}^{-}\), \({} ^{1}{\hbox {NP}}\), \({} ^{3}{\hbox {SN}}^{-}\), \({} ^{3}{\hbox {NCl}}\), \({} ^{1}{\hbox {SiO}}\), \({} ^{3}{\hbox {PO}}^{-}\), \({} ^{3}{\hbox {SO}}\), \({} ^{1}{\hbox {SF}}^{-}\), \({} ^{3}{\hbox {PF}}\), \({} ^{1}{\hbox {ClF}}\), \({} ^{1}{\hbox {SiS}}\), \({} ^{1}{\hbox {P}_2}\), \({} ^{3}{\hbox {PS}}^{-}\), \({} ^{3}{\hbox {S}_2}\), \({} ^{1}{\hbox {SCl}}^{-}\), \({} ^{1}{\hbox {Cl}_2}\), \({} ^{2}{\hbox {Cl}_2}^{-}\), all composed of first- and second-row elements having wave functions of \(\Sigma\) symmetry; the left superscript indicates the spin multiplicity. All energies have been calculated at both the HF and CASSCF levels of theory with the aug-cc-pVXZ basis [45] at the geometries given in the original paper [44]; in this case, the reference energies are the numerical HF ones [44] and CASSCF/KM\(\beta (5,6)\) extrapolations, which are deemed sufficiently accurate for the purpose.
Note that this test set consists of neutral species formed from first- and second-row atoms, as well as positive and negative molecular ions, thus contrasting with the neutral first-row systems considered above. A comparison for extrapolation pairs with consecutive X/x values shows once more that the hierarchical methods outperform the Karton–Martin schemes for both the (dt) and (tq) pairs. The average gain for the HF method is now 93 \(\mu \hbox {E}_h\) (~6%) for the (dt) pair and \(\le 81 \mu \hbox {E}_h\) (~14%) for (tq). Regarding the CASSCF energies, the average improvement for (dt) is 334 \(\mu \hbox {E}_h\) (~21%) and 100 \(\mu \hbox {E}_h\) (~6%) when using specific and universal x-numbers, respectively. In the same order, the mean improvement for CASSCF/(tq) is ≥28 \(\mu \hbox {E}_h\) (~6%) and ≥76 \(\mu \hbox {E}_h\) (~17%) for specific and universal x values. For the other pairs of consecutive X/x values, the results are virtually the same for all schemes examined here. In turn, for extrapolations with pairs of nonconsecutive x values, the predictions are somewhat poorer, but still competitive with the ones obtained via consecutive x-numbers, thus reinforcing the observation that the key issue in the hierarchical extrapolations is the highest x-number of the pair. Also notable is the fact that the extrapolated results for Jensen’s set of 43 systems [44] are somewhat poorer than those for the first-row systems in Table 1 of the SI and Table 2. However, if only the molecules formed by first-row elements are considered from that set [44], then the RMSD is reduced by 1112 \(\mu \hbox {E}_h\), 239 \(\mu \hbox {E}_h\), and 83 \(\mu \hbox {E}_h\) for the (dt), (tq), and (qp) pairs (respectively) when specific values of x and CASSCF energies are considered. For the HF energies with specific x-numbers, the reduction in the same order is 1257 \(\mu \hbox {E}_h\), 259 \(\mu \hbox {E}_h\), 103 \(\mu \hbox {E}_h\), and 12 \(\mu \hbox {E}_h\).
In this case (only first-row molecules), extrapolation with pairs of nonconsecutive x values yields results of better (or at least equal) quality than those obtained with consecutive x values.

4 Concluding remarks

We have suggested a two-parameter scheme for extrapolating Hartree–Fock and multiconfiguration self-consistent-field energies to the complete one-electron basis set limit. The novel protocol employs the idea of basis set re-hierarchization, with the hierarchical x-numbers for the HF and CASSCF methods lying very close to each other. A universal set of hierarchical numbers is also proposed, which applies to both methods with almost equal performance. The scheme reported here allows extrapolations with arbitrary hierarchical pairs (consecutive or nonconsecutive), hence allowing a significant reduction in computational cost, especially when high-quality basis sets are used in the calculation. Note that the CASSCF energy is a significant part of the total MRCI energy. Thus, performing MRCI/aug-cc-pVQZ calculations rather than MRCI/aug-cc-pV5Z ones can represent a considerable reduction in the computational cost when thousands, even millions, of points are necessary to build a multidimensional global PES. In this regard, the numerical results presented here for test sets that include neutral, ionic, closed-shell, and open-shell species show that the new scheme outperforms the best available protocols in all cases, even when pairs of nonconsecutive x values are used. Indeed, the results can be up to 30% closer to the estimated CBS limit than the ones obtained from the Karton–Martin schemes at the same computational cost. Furthermore, the most significant improvements are obtained for small hierarchical numbers, namely pairs formed from \(x=q\), t, and d, which are the ones expected to remain affordable with increasing system size.