4.1 Introduction

Validation and certification of numerical results is a key issue in all fields involving computer simulation. The error between the exact and computed values of a given physical quantity of interest (QOI), e.g. the dissociation energy of a molecule, has several origins [1]: a model error (resulting from the choice of a computationally tractable, but not extremely accurate, model, e.g. Kohn-Sham with PBE functional), a discretization error (resulting from the choice of a finite basis set or a grid), an algorithmic error (due to the choice of stopping criteria in self-consistent field and other iterative algorithms), an implementation error (due to possible bugs or uncontrolled round-off errors), a computing error (due to random hardware failures). Quantifying these different sources of errors is of major importance for two reasons. First, guaranteed estimates on these five components of the error would allow one to supplement the computed value of the QOI returned by the numerical simulation with guaranteed error bars (certification of the result). Second, this would allow one to choose the parameters of the simulation (approximate model, discretization parameters, algorithm and stopping criteria, data structures) in an optimal way in order to minimize the computational effort required to reach the target accuracy (error balancing).

In contrast with the current situation in other fields, such as computational mechanics and engineering sciences [2], neither fully guaranteed error bounds nor black-box error balancing schemes are available yet for molecular simulation. However, recent progress has been made on the analysis of the different sources of errors for various electronic structure models, see e.g. [1, 3,4,5,6,7,8,9,10,11,12,13,14,15,16] and references therein, and in particular on discretization error, which is the matter of the present chapter.

For the sake of clarity and brevity, we will restrict ourselves to the analysis of the plane-wave approximation of the Gross-Pitaevskii model. This model was introduced in the early 60s to describe the ground state of Bose-Einstein condensates [17]. From a mathematical point of view, it can be seen as a simplified version of the Kohn-Sham model, involving a single orbital and a very simple mean-field potential. The discretization error cancellation phenomenon, which plays a crucial role in electronic structure calculation, will be analyzed in Sect. 4.4. Beforehand, we will introduce in Sect. 4.2 the key concepts of QOI-related a priori and a posteriori error estimators leading to post-processing methods, and asymptotic expansions leading to extrapolation methods; these concepts will then be applied to plane-wave discretizations of the periodic Gross-Pitaevskii and Kohn-Sham models in Sect. 4.3.

We will omit the proofs of the rigorous mathematical results mentioned in this contribution, but we will comment on these results in detail.

4.2 Basic Concepts in Discretization Error Analysis

To clarify what error analysis is about, consider a reference model for which the ground-state energy is obtained by solving a minimization problem of the form

$$\begin{aligned} E_0 = \inf \left\{ \mathscr {E}(v), \; v \in \mathscr {X}, \, c(v)=0 \right\} , \end{aligned}$$
(4.1)

where \(\mathscr {E}: \mathscr {X}\rightarrow \mathbb R\) is an energy functional defined on some infinite-dimensional function space \(\mathscr {X}\), and \(c : \mathscr {X}\rightarrow \mathscr {Y}\) represents the constraints on the admissible states (\(\mathscr {Y}\) is a finite- or infinite-dimensional vector space). Hartree-Fock, Kohn-Sham, multi-configuration self-consistent field (MCSCF), and many other models are of the generic form (4.1). For instance, the restricted Hartree-Fock problem for the helium atom can be written, in atomic units, as (4.1) with

$$ \mathscr {E}(v) = \int \limits _{\mathbb R^3} |\nabla v|^2 \, \mathrm{d}\mathbf{r} - 4 \int \limits _{\mathbb R^3} \frac{v(\mathbf{r})^2}{|\mathbf{r}|} \, \mathrm{d}\mathbf{r}+ \int \limits _{\mathbb R^3} \int \limits _{\mathbb R^3} \frac{v(\mathbf{r})^2 \, v(\mathbf{r}')^2}{|\mathbf{r}-\mathbf{r}'|} \, \mathrm{d}\mathbf{r}\, \mathrm{d}\mathbf{r}', $$

\(\mathscr {X}= H^1(\mathbb R^3)\), \(\mathscr {Y}=\mathbb R\) and \(c(v)=\int \nolimits _{\mathbb R^3}v(\mathbf{r})^2 \, \mathrm{d}\mathbf{r}-1\), where \(H^1(\mathbb R^3)\) is the Sobolev space of real-valued functions of \(\mathbb R^3\) which are square integrable and whose gradient is square integrable as well:

$$\begin{aligned} H^{1}(\mathbb R^{3}) : = \left\{ v : {\mathbb R}^{3} \rightarrow {\mathbb R} \; | \; \Vert v\Vert _{H^{1}}^{2}:=\int \limits _{{\mathbb R}^{3}} v({\mathbf {r}})^{2} \, \mathrm{d}{\mathbf {r}} + \int \limits _{{\mathbb R}^{3}} |\nabla v({\mathbf {r}})|^{2} \, \mathrm{d}{\mathbf {r}} < \infty \right\} . \end{aligned}$$
(4.2)

Likewise, the restricted Kohn-Sham LDA model for a non-magnetic molecular system with \(N_p\) electron pairs can be written as (4.1) with \(\mathscr {X}=(H^1(\mathbb R^3))^{N_p}\), \(\mathscr {Y}\) the space of real, symmetric, \(N_p \times N_p\) matrices, and for all \(v=(\phi _1,\cdots ,\phi _{N_p}) \in (H^1(\mathbb R^3))^{N_p}\),

$$ \mathscr {E}(v)=\sum _{i=1}^{N_p} \int \nolimits _{\mathbb R^3} |\nabla \phi _i|^2 + \int \nolimits _{\mathbb R^3} \rho _v V_\mathrm{nuc} + \frac{1}{2} \int \nolimits _{\mathbb R^3} \int \nolimits _{\mathbb R^3} \frac{\rho _v(\mathbf{r}) \, \rho _v(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} \, \mathrm{d}\mathbf{r}\, \mathrm{d}\mathbf{r}' + E_\mathrm{xc}^\mathrm{LDA}(\rho _v), $$
$$ \text{ with } \quad \rho _v(\mathbf{r}) = 2 \sum _{i=1}^{N_p} |\phi _i(\mathbf{r})|^2, \quad \text{ and } \quad [c(v)]_{ij}= \int \nolimits _{\mathbb R^3} \phi _i\phi _j - \delta _{ij}. $$

Here \(V_\mathrm{nuc}\) is the electrostatic potential generated by the nuclei, and \(E_\mathrm{xc}^\mathrm{LDA}\) is the local density approximation of the exchange-correlation functional [18].

4.2.1 Variational Approximations

A variational approximation of (4.1) is obtained by choosing a finite-dimensional subspace \(\mathscr {X}_\mathscr {N}\) of \(\mathscr {X}\) and considering

$$\begin{aligned} E_{0,\mathscr {N}} = \inf \left\{ \mathscr {E}(v_\mathscr {N}), \; v_\mathscr {N}\in \mathscr {X}_\mathscr {N}, \, c(v_\mathscr {N})=0 \right\} . \end{aligned}$$
(4.3)

Obviously, since \(\mathscr {X}_\mathscr {N}\subset \mathscr {X}\), we have \(E_{0,\mathscr {N}} \ge E_0\): the approximate ground-state energy \(E_{0,\mathscr {N}}\) is an upper bound of the exact ground-state energy \(E_0\).

A particularly important QOI is the ground-state energy \(E_0\). It is, therefore, natural to try and estimate the error \(E_{0,\mathscr {N}}-E_0\) and compare it to other characteristic energies of the problem (e.g. to \(k_\mathrm{B} T\)) to determine whether the discretization error is sufficiently small or not. In other cases, the QOI is a function of the minimizer u of (4.1) (e.g. the dipole moment of a neutral molecule is obtained from the ground-state electronic density, which is itself computed from the Kohn-Sham orbitals). In such cases, the exact value of the QOI is q(u) while the computed value is \(q(u_{\mathscr {N}})\), where \(q:\mathscr {X}\rightarrow \mathbb R\) is a given function, u a minimizer of (4.1), and \(u_{\mathscr {N}}\) a minimizer of (4.3). The error on the QOI to be estimated then is \(q(u_{\mathscr {N}})-q(u)\).

4.2.2 A Priori Error Analysis

For systematically improvable discretization methods, such as plane-waves (PW) [19,20,21], finite-elements [22, 23], or wavelets [24], we can construct a sequence of approximation spaces \((\mathscr {X}_\mathscr {N})_{\mathscr {N}>0}\) such that

  1.

    for \(\mathscr {N}< \mathscr {N}'\), \(\mathscr {X}_\mathscr {N}\subsetneq \mathscr {X}_{\mathscr {N}'}\), that is \(\mathscr {X}_\mathscr {N}\) gets larger and larger when \(\mathscr {N}\) grows;

  2.

    any function of \(\mathscr {X}\) can be approximated arbitrarily well by some function of \(\mathscr {X}_\mathscr {N}\) provided \(\mathscr {N}\) is large enough:

    $$ \forall v \in \mathscr {X}, \quad \min _{v_\mathscr {N}\in \mathscr {X}_\mathscr {N}} \Vert v-v_\mathscr {N}\Vert _\mathscr {X}\mathop {\rightarrow }_{\mathscr {N}\rightarrow \infty } 0, $$

    where \(\Vert \cdot \Vert _\mathscr {X}\) is the norm of the function space \(\mathscr {X}\).

This can be done by refining the mesh in finite-element methods, by increasing the resolution in wavelet methods, or by increasing the energy cut-off in PW methods. In the latter case, \(\mathscr {N}\) is usually the wave-vector cut-off, which is related to the energy cut-off \(E_{\mathrm{co},\mathscr {N}}\) by the relation \(E_{\mathrm{co},\mathscr {N}}= \frac{\mathscr {N}^2}{2}\) (in atomic units).

A priori error estimators are results stating that the computed value of the QOI converges to the exact value of the QOI when \(\mathscr {N}\) goes to infinity, and providing in addition convergence rates. A typical such result (which holds true [3] for the PW discretization of the periodic Kohn-Sham LDA model with pseudopotentials [25,26,27], for well-chosen minimizers \(u_{\mathscr {N}}\) of (4.3)) is the existence of positive constants s, \(c_-\), \(c_+\) and \(c_s\) such that for all \(\mathscr {N}\),

$$\begin{aligned} c_- \Vert u_{\mathscr {N}}-u\Vert _\mathscr {X}^2 \le E_{0,\mathscr {N}}-E_0 \le c_+ \Vert u_{\mathscr {N}}-u\Vert _\mathscr {X}^2 \end{aligned}$$
(4.4)

and

$$\begin{aligned} \Vert u_{\mathscr {N}}-u\Vert _\mathscr {X}\le \frac{c_s}{\mathscr {N}^s}. \end{aligned}$$
(4.5)

This result implies that, on the one hand, the error on the energy goes to zero at the same speed as the square of the error on the orbitals (measured in \(\mathscr {X}\)-norm), and that, on the other hand, the \(\mathscr {X}\)-norm error on the orbitals goes to zero as \(\mathscr {N}^{-s}\). Gathering (4.4) and (4.5), we obtain

$$\begin{aligned} 0 \le E_{0,\mathscr {N}}-E_0 \le \frac{c_+c_s^2}{\mathscr {N}^{2s}} = \frac{c_+c_s^2}{2^s E_{\mathrm{co},\mathscr {N}}^{s}}. \end{aligned}$$
(4.6)

The admissible values of s in (4.5)–(4.6) can usually be obtained explicitly. Typically, estimate (4.5) will hold true for any \(s < s_\mathrm{max}\), but not for \(s > s_\mathrm{max}\), where the value of \(s_\mathrm{max}\) is an explicit outcome of the mathematical analysis. As an example [3], \(s_\mathrm{max}=\frac{7}{2}\) for PW discretizations of periodic Kohn-Sham LDA models with Troullier-Martins pseudopotentials [26]. Note that \(-s_\mathrm{max}\) is basically the slope of the convex hull of the log-log plot of the discretization error \(E_{0,\mathscr {N}}-E_0\) as a function of the energy cut-off \(E_{\mathrm{co},\mathscr {N}}\). The higher \(s_\mathrm{max}\), the faster the asymptotic convergence of the computed ground-state energy towards the exact value for the considered model.
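In practice, a rough estimate of \(s_\mathrm{max}\) can be extracted from a short series of calculations at increasing cut-offs by a least-squares fit of the log-log data, a cruder but simpler proxy for the convex-hull slope mentioned above. The following sketch uses synthetic error data generated only for illustration (all constants are arbitrary).

```python
import numpy as np

# Synthetic discretization errors E_{0,N} - E_0, generated here for illustration only,
# following the model error ~ C * E_co^(-s) with s = 3.5 plus a small perturbation.
E_co = np.array([20.0, 40.0, 80.0, 160.0, 320.0])        # energy cut-offs
err = 5.0 * E_co**(-3.5) * (1.0 + 0.05 * np.sin(E_co))   # stand-in for measured errors

# Least-squares fit of log(err) = log(C) - s * log(E_co): the slope estimates -s_max.
slope, intercept = np.polyfit(np.log(E_co), np.log(err), 1)
print("estimated s_max =", -slope)                       # close to 3.5 for these data
```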

  The main interest of a priori error estimators is that they provide quantitative insight into the difficulty of obtaining an accurate approximation of a given quantity of interest with a given numerical method. Indeed, the value of \(s_\mathrm{max}\) for which

$$\begin{aligned} |q(u_\mathscr {N})-q(u)| \le \frac{C_s}{E_{\mathrm{co},\mathscr {N}}^s} \end{aligned}$$
(4.7)

for any \(s < s_\mathrm{max}\), but not for \(s>s_\mathrm{max}\), heavily depends on the QOI q. If \(s_\mathrm{max}\) is “large” (say \(s_\mathrm{max}=3\)), doubling the energy cut-off will typically increase the accuracy by a factor of 8, while if \(s_\mathrm{max}\) is “small” (say \(s_\mathrm{max}=1\)), doubling the energy cut-off will only double the accuracy. Again for PW discretizations of periodic Kohn-Sham models with Troullier-Martins pseudopotentials, we have seen that \(s_\mathrm{max}=\frac{7}{2}\) if the QOI is the energy; on the other hand, \(s_\mathrm{max}=\frac{3}{2}\) if the QOI is the value of the ground-state density at a particular point of the simulation cell, which makes the latter QOI much more difficult to converge than the former one.

This argument also allows one to clearly understand one of the main roles of pseudopotentials, namely smoothing out the Coulomb singularities generated by point nuclei. Indeed, as we have seen before, if the QOI is the ground-state energy, \(s_\mathrm{max}=\frac{7}{2}\) for Troullier-Martins pseudopotentials, while we only have \(s_\mathrm{max}=\frac{3}{2}\) for point-like nuclei.

It is also interesting to reformulate the above results in terms of the computational time \(\mathrm{CPU}_\varepsilon ^q\) necessary to reach a given accuracy \(\varepsilon \) for the QOI q. For PW Kohn-Sham LDA calculations, the computational time typically scales as \(E_{\mathrm{co},\mathscr {N}}^{3/2} \log E_{\mathrm{co},\mathscr {N}}\) (using preconditioned gradient methods and Fast Fourier Transforms, see e.g. [28] and references therein). A simple calculation shows that if (4.7) is satisfied for any \(s < s_\mathrm{max}\), but not for \(s>s_\mathrm{max}\), then

$$\begin{aligned} \log \mathrm{CPU}_\varepsilon ^q \sim - \frac{3}{2s_\mathrm{max}} \log \varepsilon . \end{aligned}$$
(4.8)
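For completeness, the simple calculation behind (4.8) goes as follows, dropping the logarithmic factor and all multiplicative constants: the target accuracy fixes the cut-off, and the cut-off fixes the cost,

$$ \varepsilon \sim E_{\mathrm{co},\mathscr {N}}^{-s_\mathrm{max}} \;\Longrightarrow \; E_{\mathrm{co},\mathscr {N}} \sim \varepsilon ^{-1/s_\mathrm{max}} \;\Longrightarrow \; \mathrm{CPU}_\varepsilon ^q \sim E_{\mathrm{co},\mathscr {N}}^{3/2} \sim \varepsilon ^{-\frac{3}{2s_\mathrm{max}}} \;\Longrightarrow \; \log \mathrm{CPU}_\varepsilon ^q \sim - \frac{3}{2s_\mathrm{max}} \log \varepsilon . $$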

We will see later that a priori error estimates can also be useful to design new, efficient, numerical schemes.

A priori error estimators, however, suffer from two severe limitations. First, the optimal value of the constant \(C_s\) in (4.7) is usually unknown. The constant \(C_s\) derived from the mathematical analysis is most often dramatically overestimated, sometimes by several orders of magnitude. In addition, it usually depends on the exact solution u to the problem, which is unknown. The constant \(C_s\) does not appear in (4.8) because this relation is in log-log scales, but an estimation of the optimal value of \(C_s\) would of course be of major interest for practical purposes. The second limitation is that an inequality such as (4.7) is only useful when the right-hand side is small enough, that is in the asymptotic regime when the cut-off energy \(E_{\mathrm{co},\mathscr {N}}\) is large enough.

4.2.3 A Posteriori Error Estimators and Post-Processing

  A posteriori estimates are very different in nature from a priori error estimates. An a posteriori discretization error estimator for the QOI q is a pair of inequalities of the form

$$\begin{aligned} \eta _\mathrm{l.b.}^q(u_\mathscr {N}) \le q(u_\mathscr {N}) - q(u) \le \eta _\mathrm{u.b.}^q(u_\mathscr {N}) \end{aligned}$$
(4.9)

(where we recall that u is a minimizer of (4.1) and \(u_\mathscr {N}\) is a minimizer of (4.3), and where \(\mathrm{l.b.}\) and \(\mathrm{u.b.}\) stand for lower bound and upper bound respectively), which, ideally, satisfy the following properties:

  1.

    the estimator is guaranteed, in the sense that inequalities (4.9) can be established with full mathematical rigour;

  2.

    the lower and upper bounds \(\eta _\mathrm{l.b.}^q(u_\mathscr {N})\) and \(\eta _\mathrm{u.b.}^q(u_\mathscr {N})\) are fully computable from the approximate solution \(u_\mathscr {N}\) and the data of the problem; in particular, they do not involve the exact solution u, in contrast with the bounds resulting from a priori error estimates;

  3.

    \(\eta _\mathrm{l.b.}^q(u_\mathscr {N})\) and \(\eta _\mathrm{u.b.}^q(u_\mathscr {N})\) are cheap to compute: their numerical values can be obtained with a negligible, or small enough, computational cost;

  4.

    the estimates are accurate, in the sense that \(\eta _\mathrm{l.b.}^q(u_\mathscr {N})\), \(q(u_\mathscr {N}) - q(u)\), and \(\eta _\mathrm{u.b.}^q(u_\mathscr {N})\) are of the same order of magnitude for generic values of \(\mathscr {N}\) (note however that \(|q(u_\mathscr {N}) - q(u)|\) can be, by chance, much smaller than \(|\eta _\mathrm{l.b.}^q(u_\mathscr {N})|\) and \(|\eta _\mathrm{u.b.}^q(u_\mathscr {N})|\) for some specific values of \(\mathscr {N}\));

  5.

    the estimates give insights on what to do to improve the quality of the approximation.

Let us clarify the last point. Finite-element methods, as well as wavelet or some hierarchical tensor methods, have more flexibility than PW discretization methods. While in the PW method, the user only controls a single discretization parameter, namely the wave-vector cut-off \(\mathscr {N}\), or equivalently, the energy cut-off \(E_{\mathrm{co},\mathscr {N}}=\frac{\mathscr {N}^2}{2}\), the quality of a finite-element approximation space can be improved by locally refining the mesh in the regions of the simulation cell where the field u strongly varies. In many cases, it is possible to construct lower and upper bounds \(\eta _\mathrm{l.b.}^q(u_\mathscr {N})\) and \(\eta _\mathrm{u.b.}^q(u_\mathscr {N})\) as a sum of localized contributions to the error [29], each of them being obtained by solving a small-size local problem. The advantage of such a decomposition is twofold: first the computation of these local contributions can be easily parallelized on a large number of processors; second, it paves the way to adaptive finite-element methods, where the mesh is refined only in the regions of the simulation cell where the local error is significant. This can be done with a black-box algorithm and can dramatically reduce the overall computational effort necessary to reach a given accuracy (compared to brute force, uniform, mesh-refinement methods).
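As an illustration of how localized error contributions can drive adaptivity, here is a minimal sketch of a standard bulk (Dörfler) marking step, a generic strategy not tied to any specific estimator from [29]: given nonnegative local indicators attached to the mesh cells, it selects a smallest set of cells carrying a prescribed fraction of the total estimated error; only those cells are refined at the next step.

```python
import numpy as np

def dorfler_mark(eta_local, theta=0.6):
    """Return the indices of a smallest set of cells whose squared local indicators
    account for a fraction theta of the total estimated (squared) error."""
    order = np.argsort(eta_local)[::-1]            # largest local contributions first
    cumulative = np.cumsum(eta_local[order]**2)
    m = np.searchsorted(cumulative, theta * cumulative[-1]) + 1
    return order[:m]

# hypothetical local error indicators attached to 8 mesh cells
eta = np.array([0.02, 0.30, 0.05, 0.01, 0.22, 0.04, 0.03, 0.08])
print(dorfler_mark(eta))                           # cells to refine at the next step
```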

Let us emphasize that the above five properties of ideal a posteriori error estimators are usually not completely fulfilled by most of the a posteriori error estimators available in practice. Indeed,

  1.

    Inequalities (4.9) are sometimes only satisfied for large enough values of \(\mathscr {N}\). In this case, it is interesting to have at our disposal checkable conditions allowing one to know whether the bounds are reliable or not. Such conditions can take the form

    $$ \text {if }c_q(u_\mathscr {N}) > 0, \text { then (4.9) holds true}, $$

    where \(c_q(u_\mathscr {N})\) is a real number computable from the approximate solution \(u_\mathscr {N}\) at low cost;

  2.

    The lower and upper bounds may not be fully computable in the sense that they are in fact a function of the (known) approximate solution \(u_\mathscr {N}\) and of the exact (unknown) solution u, but nevertheless decomposable as

    $$ \eta _\mathrm{\star .b.}^q(u_\mathscr {N},u) = \eta _\mathrm{\star .b., 1}^q(u_\mathscr {N}) + \eta _\mathrm{\star .b., 2}^q(u_\mathscr {N},u), $$

    where \(\eta _\mathrm{\star .b., 1}^q(u_\mathscr {N})\) is fully computable and \(\eta _\mathrm{\star .b., 2}^q(u_\mathscr {N},u)\) is small compared to \(\eta _\mathrm{\star .b., 1}^q(u_\mathscr {N})\), at least when \(\mathscr {N}\) is large enough. A priori error estimates can be invoked to justify the smallness of \(\eta _\mathrm{\star .b., 2}^q(u_\mathscr {N},u)\);

  3.

    Computing \(\eta _\mathrm{l.b.}^q(u_\mathscr {N})\) and \(\eta _\mathrm{u.b.}^q(u_\mathscr {N})\) may require solving an auxiliary problem of the same complexity as the original problem (4.1), which may double or triple the cost of the calculation. In engineering sciences, simulations increasingly serve as substitutes for experiments and prototypes in the design process; it is then worth paying a significant extra cost to guarantee the quality of the simulation results;

  4.

    Quite often, the relative quality of the lower and upper bounds increases with \(\mathscr {N}\). In the case when the QOI is the ground-state energy, a posteriori error estimates are of the form

    $$\begin{aligned} 0 < \eta _\mathrm{l.b.}^E(u_\mathscr {N}) \le E(u_\mathscr {N}) - E(u) \le \eta _\mathrm{u.b.}^E(u_\mathscr {N}), \end{aligned}$$
    (4.10)

    and we can define the efficiency factors of the lower and upper bounds as

    $$ 1 \le I^\mathrm{l.b.}_\mathscr {N}= \frac{E(u_\mathscr {N}) - E(u)}{\eta _\mathrm{l.b.}^E(u_\mathscr {N})} \quad \text{ and } \quad 1 \le I^\mathrm{u.b.}_\mathscr {N}= \frac{\eta _\mathrm{u.b.}^E(u_\mathscr {N}) }{E(u_\mathscr {N}) - E(u)}. $$

    The closer \(I^\mathrm{l.b.}_\mathscr {N}\) and \(I^\mathrm{u.b.}_\mathscr {N}\) to 1, the better. The a posteriori estimate (4.10) is called asymptotically exact if both \(I^\mathrm{l.b.}_\mathscr {N}\) and \(I^\mathrm{u.b.}_\mathscr {N}\) converge to 1 when \(\mathscr {N}\) goes to infinity. Note that if, for instance, \(I^\mathrm{l.b.}_\mathscr {N}\) goes to 1 when \(\mathscr {N}\) goes to infinity, then for \(\mathscr {N}\) large enough, the post-processed approximation of the ground-state energy

    $$ \widetilde{E}(u_\mathscr {N})= E(u_\mathscr {N})-\eta _\mathrm{l.b.}^E(u_\mathscr {N}) $$

    is more accurate than the original one \(E(u_\mathscr {N})\).

4.2.4 Asymptotic Expansions and Extrapolation

  In some specific cases, it is possible to expand the error \(q(u_\mathscr {N})-q(u)\) in terms of simple functions of \(\mathscr {N}\) in the limit when \(\mathscr {N}\) goes to infinity, and obtain, as an illustration (this is just an example), asymptotic expansions of the form

$$\begin{aligned} q(u_\mathscr {N})-q(u) = \frac{a_1}{\mathscr {N}^{2/3}} + \frac{a_2}{\mathscr {N}} + O \left( \frac{1}{\mathscr {N}^{4/3}} \right) . \end{aligned}$$
(4.11)

The main interest of asymptotic expansions is that they allow extrapolations. Indeed, assuming a result such as (4.11), one can combine the values of \(q(u_\mathscr {N})\) for several related values of \(\mathscr {N}\), and obtain, for instance,

$$ \left( \alpha q(u_{\mathscr {N}}) + \beta q(u_{2\mathscr {N}}) + \gamma q(u_{3\mathscr {N}}) \right) - q(u)= O \left( \frac{1}{\mathscr {N}^{4/3}} \right) , $$

where the weights \(\alpha ,\beta ,\gamma \) are obtained by solving the linear system

$$ \left( \begin{array}{ccc} 1 &{} 1 &{} 1 \\ 1 &{} \frac{1}{2^{2/3}} &{} \frac{1}{3^{2/3}} \\ 1 &{} \frac{1}{2} &{} \frac{1}{3} \end{array} \right) \left( \begin{array}{c} \alpha \\ \beta \\ \gamma \end{array} \right) = \left( \begin{array}{c}1 \\ 0 \\ 0 \end{array} \right) . $$

In other words, one can obtain a much better convergence rate by linear combinations of a few calculations performed with different values of \(\mathscr {N}\).
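As a small numerical illustration, the sketch below solves the \(3\times 3\) system above for the weights \(\alpha ,\beta ,\gamma \) and applies them to synthetic data obeying (4.11); the exact value q(u), the coefficients \(a_1,a_2\) and the small \(\mathscr {N}^{-4/3}\) tail are arbitrary choices made only for the example.

```python
import numpy as np

# Weights (alpha, beta, gamma) summing to 1 and cancelling the N^(-2/3) and N^(-1) terms.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0**(-2/3), 3.0**(-2/3)],
              [1.0, 1.0/2.0, 1.0/3.0]])
w = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))

# Synthetic data following (4.11); q_exact, a1, a2, a3 are made up for illustration.
q_exact, a1, a2, a3 = 1.0, 0.3, -0.1, 0.05
q = lambda N: q_exact + a1 / N**(2/3) + a2 / N + a3 / N**(4/3)

for N in (10, 20, 40):
    plain = q(N) - q_exact                                    # error O(N^(-2/3))
    extrap = w @ np.array([q(N), q(2*N), q(3*N)]) - q_exact   # error O(N^(-4/3))
    print(N, plain, extrap)
```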

Extrapolation methods are very appealing. Unfortunately, the situations where the error on the QOI of interest is known to admit an asymptotic expansion are not so common in the field of electronic structure calculation. An interesting example is the Makov-Payne correction for computing the energy of charged defects in insulators and semiconductors [30]. It has indeed been proved in [31] that the Makov-Payne correction corresponds to the leading term of the asymptotic expansion of the error on the ground-state energy when the discretization parameter is the size L of the supercell.

The second limitation of extrapolation methods based on asymptotic expansions of the error is that they are only efficient for \(\mathscr {N}\) “large enough”. It is usually not clear how to check whether the asymptotic regime has been reached without running a number of calculations with different values of \(\mathscr {N}\) covering a large range and checking whether the results match the prediction of the asymptotic expansion.

4.3 Periodic Gross-Pitaevskii and Kohn-Sham Models

  We now turn to the analysis of discretization errors for self-consistent quantum problems. For pedagogical reasons, we will mainly deal with the (relatively simple) Gross-Pitaevskii model, and the existing results on the Kohn-Sham model will only be mentioned. Still for pedagogical reasons, we will focus on the periodic versions of these models, and on plane-wave discretization methods.

For simplicity, we assume that the periodic simulation cell is \(\varOmega = (0,2\pi )^d\) (\(d \le 3\)), but all the results below can easily be extended to the generic case of a d-dimensional periodic cell of any shape. The fundamental Hilbert space for periodic Gross-Pitaevskii and Kohn-Sham models is

$$ L^2_{\mathrm {\#}}(\varOmega ) := \left\{ u \in L^2_\mathrm{loc}(\mathbb R^d,\mathbb R) \; | \; u \; 2\pi \mathbb Z^d\text {-periodic} \right\} , \quad \langle u | v \rangle _{L^2_{\mathrm {\#}}}=\int \limits _\varOmega u \, v, $$

where \(L^2_\mathrm{loc}(\mathbb R^d,\mathbb R)\) is the space of locally square-integrable real-valued functions on \(\mathbb R^d\). We will make extensive use of the periodic Sobolev spaces (see e.g. [32])

$$ H^s_{\mathrm {\#}}(\varOmega ) :=\left\{ v = \sum _{{\mathbf {k}} \in {\mathbb {Z}}^{d}} \widehat{v}_{\mathbf {k}} e_{\mathbf {k}}, \; v \text { real valued}, \; \Vert v\Vert _{H^{s}_{\mathrm {\#}}}^2:=\sum _{{\mathbf {k}} \in {\mathbb {Z}}^{d}} \left( 1+|{\mathbf {k}}|^2 \right) ^s |\widehat{v}_{\mathbf {k}}|^2 < \infty \right\} , $$

\(s \in \mathbb R\), where \(e_\mathbf{k}=|\varOmega |^{-1/2} e^{i \mathbf{k}\cdot \mathbf{r}}\) is the Fourier mode with wave-vector \(\mathbf{k}\in \mathbb Z^d\). Endowed with the inner products

$$ \langle u | v \rangle _{H^s_{\mathrm {\#}}}= \sum _{\mathbf{k}\in \mathbb Z^d} \left( 1+|\mathbf{k}|^2 \right) ^s \overline{\widehat{u}_\mathbf{k}} \, \widehat{v}_\mathbf{k}, $$

these spaces are Hilbert spaces. Note in particular that \(H^0_\#(\varOmega )=L^2_\#(\varOmega )\) and that

$$ H^1_\#(\varOmega )=\left\{ v \in L^2_\mathrm{loc}(\mathbb R^d,\mathbb R) \, | \, \nabla v \in (L^2_\mathrm{loc}(\mathbb R^d,\mathbb R))^d, \; v \; 2\pi \mathbb Z^d\text{-periodic } \right\} . $$

4.3.1 Plane-Wave Discretization of the Gross-Pitaevskii Model

The d-dimensional periodic Gross-Pitaevskii model is defined as

$$\begin{aligned} I = \inf \left\{ E(v), \; v \in H^1_\#(\varOmega ), \; \int \limits _\varOmega v^2 = 1 \right\} , \end{aligned}$$
(4.12)

where the Gross-Pitaevskii energy functional is given by

$$ E(v) = \int \limits _\varOmega |\nabla v|^2 + \int \limits _\varOmega V v^2 + \frac{\mu }{2} \int \limits _\varOmega v^4. $$

Here, the trapping potential V is a \(2\pi \mathbb Z^d\)-periodic real-valued continuous function, and the mean-field interaction parameter \(\mu \) is chosen positive (repulsive interaction). The mathematical properties of the minimization problem (4.12) are well-understood:

  • Problem (4.12) has exactly two minimizers, u (with \(u > 0\) in \(\varOmega \)) and \(-u\);

  • There exists a unique real number \(\lambda \in \mathbb R\) such that \((\lambda ,u)\) satisfies the nonlinear Schrödinger equation

    $$\begin{aligned} - \varDelta u + V u + \mu u^3 = \lambda u, \qquad \Vert u\Vert _{L^2_\#} = 1. \end{aligned}$$
    (4.13)

    Physically, \(\lambda \) is the chemical potential of the condensate. Mathematically, it is the Lagrange multiplier of the equality constraint \(\int _\varOmega v^2=1\) in (4.12);

  • \(\lambda \) is the lowest eigenvalue of the self-consistent Hamiltonian

    $$ H_u=-\varDelta + V + \mu u^2. $$

We refer to the appendix of [4] for detailed proofs of these standard results.

In plane-wave discretization methods, the approximation spaces are defined as

$$ \displaystyle X_{\mathscr {N}} = \left\{ v_{\mathscr {N}} = \sum _{|{\mathbf {k}}| \le {\mathscr {N}}} \widehat{v}_{\mathbf {k}} e_{\mathbf {k}}, \; v_{\mathscr {N}} \text { real valued} \right\} , $$

where \(\mathscr {N}\) is the cut-off parameter.  The Galerkin approximation of (4.12) in \(X_\mathscr {N}\) consists in searching \(u_\mathscr {N}\in X_\mathscr {N}\) satisfying the constraint \(\int _\varOmega |u_\mathscr {N}|^2 = 1\), and such that

$$\begin{aligned} I_\mathscr {N}= E(u_\mathscr {N})=\inf \left\{ E(v_\mathscr {N}), \; v_\mathscr {N}\in X_\mathscr {N}, \; \int \limits _\varOmega |v_\mathscr {N}|^2 = 1 \right\} , \quad (u_\mathscr {N},1)_{L^2_\#} \ge 0. \end{aligned}$$
(4.14)

The additional requirement \((u_\mathscr {N},1)_{L^2_\#} \ge 0\) ensures that \(u_\mathscr {N}\) approximates the positive solution u to (4.12) (and not the other solution, \(-u\)).

Relying on the fact that the operator \(-\varDelta \) commutes with the projection operator \(\varPi _\mathscr {N}\) (the \(L^2_\#\)-orthogonal projector onto \(X_\mathscr {N}\), i.e. the Fourier truncation operator defined below), we obtain that the function \(u_\mathscr {N}\) satisfies the Euler-Lagrange equation

$$\begin{aligned} -\varDelta u_\mathscr {N}+ \varPi _\mathscr {N}(V+\mu u_\mathscr {N}^2) \varPi _\mathscr {N}u_\mathscr {N}= \lambda _\mathscr {N}u_\mathscr {N}, \end{aligned}$$
(4.15)

where \(\lambda _\mathscr {N}\) is the Lagrange multiplier of the \(L^2_\#\)-normalization constraint. It can be shown that, except perhaps for very small values of \(\mathscr {N}\), \(\lambda _\mathscr {N}\) is the lowest eigenvalue of the operator \(-\varDelta + \varPi _\mathscr {N}(V+\mu u_\mathscr {N}^2) \varPi _\mathscr {N}\) on \(L^2_\#(\varOmega )\).

From a geometrical point of view, the situation is as depicted in Fig. 4.1. The positive solution u to (4.12) is not in general in the approximation space \(X_\mathscr {N}\). The best approximation of u in \(X_\mathscr {N}\) for a given norm \(\Vert \cdot \Vert _{H^s_\#}\) is the orthogonal projection of u on \(X_\mathscr {N}\) for the inner product of \(H^s_\#\). An interesting property is that this orthogonal projector is independent of s: it is simply the Fourier truncation operator \(\varPi _\mathscr {N}\) defined by

$$ \varPi _{\mathscr {N}} \left( \sum _{{\mathbf {k}} \in {\mathbb Z}^{d}} \widehat{v}_{\mathbf {k}} e_{\mathbf {k}} \right) = \sum _{|\mathbf{k}| \le \mathscr {N}} \widehat{v}_{\mathbf {k}} e_{\mathbf {k}}. $$

Indeed, for all \(s \in \mathbb R\) and all \(\mathscr {N}\in \mathbb N\), \(X_\mathscr {N}\subset H^s_\#(\varOmega )\), and for all \(v \in H^s_\#(\varOmega )\),

$$\begin{aligned}&\varPi _\mathscr {N}v \in X_\mathscr {N}, \\&\Vert v-\varPi _\mathscr {N}v\Vert _{H^s_\#} = \min _{w_\mathscr {N}\in X_\mathscr {N}} \Vert v-w_\mathscr {N}\Vert _{H^s_\#} = \left( \sum _{|{\mathbf {k}}| > \mathscr {N}} \left( 1+|{\mathbf {k}}|^2 \right) ^s|\widehat{v}_{\mathbf {k}}|^2 \right) ^{1/2} . \end{aligned}$$

Note that the Galerkin approximation \(u_\mathscr {N}\) of u obtained by solving (4.14) is not the best approximation \(\varPi _\mathscr {N}u\) of u in \(X_\mathscr {N}\) (see Fig. 4.1). The best we can hope for is that \(u_\mathscr {N}\) will be close to \(\varPi _\mathscr {N}u\) for the various norms of interest.
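The truncation operator \(\varPi _\mathscr {N}\) and the best-approximation errors above are straightforward to evaluate with the FFT. The sketch below does so in dimension \(d=1\) for an arbitrary smooth periodic test function; the grid size, the test function and the cut-offs are illustrative choices, and the Fourier coefficients computed here follow the convention \(v=\sum _k c_k e^{ikx}\), which differs from the \(L^2_\#\)-normalized modes \(e_k\) by a factor \(\sqrt{2\pi }\).

```python
import numpy as np

M = 2048                                    # fine grid on (0, 2*pi)
x = 2 * np.pi * np.arange(M) / M
v = 1.0 / (2.0 + np.cos(x))                 # a smooth 2*pi-periodic test function
vhat = np.fft.fft(v) / M                    # coefficients c_k of v = sum_k c_k e^{ikx}
k = np.fft.fftfreq(M, d=1.0 / M)            # integer wave-vectors of the grid

def truncate(vhat, N):
    """Fourier truncation Pi_N: keep only the modes |k| <= N."""
    out = vhat.copy()
    out[np.abs(k) > N] = 0.0
    return out

def hs_norm(what, s):
    """Periodic Sobolev norm ||w||_{H^s_#} computed from the coefficients c_k
    (the factor 2*pi accounts for the normalization of the modes e_k)."""
    return np.sqrt(2 * np.pi * np.sum((1 + k**2)**s * np.abs(what)**2))

for N in (4, 8, 16, 32):
    err = vhat - truncate(vhat, N)          # coefficients of v - Pi_N v
    print(N, hs_norm(err, 0), hs_norm(err, 1))   # best-approximation errors in L^2 and H^1
```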

Fig. 4.1

Graphical representation of the best approximation \(\varPi _\mathscr {N}u\) in the discretization space \(X_\mathscr {N}\) of the exact solution u to (4.12), and of the approximation \(u_\mathscr {N}\) obtained by the variational method (4.14)

4.3.2 A Priori Error Analysis

The following result has been proved in [4]. It is an extension of classical results for linear eigenvalue problems (see [33] and references therein) to the nonlinear setting of the Gross-Pitaevskii model. The case of Kohn-Sham LDA models is dealt with in [3] for PW discretizations and in [7] for other systematically improvable discretization methods.

Theorem 4.1

Let u be the unique positive minimizer of (4.12) and \(u_\mathscr {N}\) a minimizer of (4.14), which is unique for \(\mathscr {N}\) large enough. Then, there exist constants \(0< c \le C < \infty \) such that for all \(\mathscr {N}\in \mathbb N\),

$$\begin{aligned} \Vert u-\varPi _\mathscr {N}u \Vert _{H^1_\#} \le \Vert u-u_\mathscr {N}\Vert _{H^1_\#} \le C \Vert u-\varPi _\mathscr {N}u \Vert _{H^1_\#} \mathop {\longrightarrow }_{\mathscr {N}\rightarrow \infty } 0, \end{aligned}$$
(4.16)
$$\begin{aligned} c \Vert u-u_\mathscr {N}\Vert _{H^1_\#}^2 \le I_\mathscr {N}-I = E(u_\mathscr {N})-E(u) \le C \Vert u-u_\mathscr {N}\Vert _{H^1_\#}^2. \end{aligned}$$
(4.17)

Assume that \(V \in H^\sigma _\#(\varOmega )\) for some \(\sigma > d/2\). Then,

  • \((u_\mathscr {N})_{\mathscr {N}\in \mathbb N}\) converges to u in \(H^{\sigma +2}_\#(\varOmega )\);

  • there exist positive constants C and \(C_s\) such that

    $$\begin{aligned} \!\! \forall -\sigma \le s < \sigma +2, \quad \Vert u-u_\mathscr {N}\Vert _{H^s_\#} \le \frac{C_s}{\mathscr {N}^{\sigma +2-s}}, \quad |\lambda -\lambda _\mathscr {N}| \le \frac{C}{\mathscr {N}^{2(\sigma +1)}}. \end{aligned}$$
    (4.18)

According to estimate (4.17), the error on the ground-state energy behaves as the square of the \(H^1\)-norm of the error on the eigenfunction, and according to estimate (4.16), the latter goes to zero when \(\mathscr {N}\) goes to infinity.

If, in addition, the external periodic potential V is regular enough, more precisely if V belongs to the Sobolev space \(H^\sigma _\#(\varOmega )\) for some \(\sigma > d/2\), then (4.18) provides optimal a priori convergence rates for both the Lagrange multiplier \(\lambda \) and the ground-state eigenfunction (the optimality has been checked numerically [4]). The estimates \(\Vert u-u_\mathscr {N}\Vert _{H^s_\#} \le \frac{C_s}{\mathscr {N}^{\sigma +2-s}}\) are valid for the whole hierarchy of Sobolev spaces \(H^s_\#(\varOmega )\), \(-\sigma \le s < \sigma +2\), and therefore allow one to derive optimal convergence rates for any differentiable observable \(q: H^s_\#(\varOmega ) \rightarrow \mathbb R\) with \(-\sigma \le s < \sigma +2\). For instance, the value of the ground-state density at some point \(\mathbf{r}_0 \in \mathbb R^d\) is defined as \(q_{\mathbf{r}_0}(u)=u(\mathbf{r}_0)^2\). Using Sobolev embedding theorems (see e.g. [32]), we obtain that \(q_{\mathbf{r}_0}\) is a differentiable functional on the Sobolev space \(H^s_\#(\varOmega )\) for all \(s > d/2\). It follows that for all \(s < \sigma +2-d/2\), there exists \(C_s \in \mathbb R_+\) such that for all \(\mathscr {N}\),

$$ |u_\mathscr {N}(\mathbf{r}_0)^2 - u(\mathbf{r}_0)^2| = |q_{\mathbf{r}_0}(u_\mathscr {N})-q_{\mathbf{r}_0}(u)| \le \frac{C_s}{\mathscr {N}^{s}}. $$

In addition to providing optimal convergence rates for various QOI, a priori error estimates can also be used to design computational cost reduction methods based on neglecting terms with higher convergence rates. For instance, two-grid methods consist in finding in a first stage a solution \(u_n\) to the full problem in a coarse variational space \(X_n\), and in a second stage a solution \(u_{n,\mathscr {N}}\) to a simpler problem parameterized by \(u_n\) in a finer approximation space \(X_\mathscr {N}\) (see Fig. 4.2). For a well-chosen value of n, it is possible to obtain in this way, at a much lower cost, the same accuracy as if the full problem had been solved in \(X_\mathscr {N}\). These methods were introduced by Xu and Zhou to solve nonlinear elliptic problems [34], then adapted to linear eigenvalue problems in [35, 36], and to nonlinear eigenvalue problems in [37].

Fig. 4.2

Graphical representation of the two-grid method

Indeed, solving (4.14) in a fine approximation space \(X_\mathscr {N}\) is costly since it requires of the order of \(\mathscr {K}\mathscr {N}^d \ln \mathscr {N}\) elementary operations, where \(\mathscr {K}\) is a constant related to the structure of problem (4.14). In the two-grid method,

  1.

    \(u_n\) is computed by solving the full problem on the coarse approximation space \(X_n\), \(n \ll \mathscr {N}\), which requires \(\sim \mathscr {K}n^d \ln n\) elementary operations,

  2.

    \(u_{n,\mathscr {N}}\) can be computed in \(\sim \kappa \mathscr {N}^d \ln \mathscr {N}\) elementary operations with \(\kappa \ll \mathscr {K}\) since the problem to be solved is much simpler.

Typically, in the present case, the simpler problem can be (i) a linear eigenvalue problem obtained by freezing the mean-field potential to \(V+\mu u_n^2\), or (ii) the linear system

$$ -\varDelta v + (V+\mu u_n^2) v = \lambda _nu_n. $$

Both strategies have been tested numerically and are quite efficient [37]. Using the a priori error estimators (4.18), the following theoretical justification of the efficiency of the first strategy can be given.

Theorem 4.2

Assume that \(V \in H^\sigma _\#(\varOmega )\) for some \(\sigma > d/2\). Let \(u_n\) be a solution to (4.14) in a coarse approximation space \(X_n\) and \(u_{n,\mathscr {N}}\) the variational approximation in \(X_\mathscr {N}\) to the ground state of the linear eigenvalue problem

$$ -\varDelta v + (V+\mu u_n^2) v = \lambda v, \quad \Vert v\Vert _{L^2_\#}=1. $$

Then, there exists \(C \in \mathbb R_+\) such that for all n and \(\mathscr {N}\) with \(n \le \mathscr {N}\),

$$\begin{aligned} \Vert u_{n,\mathscr {N}}-u\Vert _{H^1_\#}\le & {} C \left( n^{-\sigma -3}+ \mathscr {N}^{-\sigma -1} \right) , \\ 0 \le E(u_{n,\mathscr {N}})-E(u)\le & {} C \left( n^{-\sigma -3}+ \mathscr {N}^{-\sigma -1} \right) ^2. \end{aligned}$$

Choosing \(n \sim \mathscr {N}^{\frac{\sigma +1}{\sigma +3}}\) in order to balance the error contributions in the right-hand sides of the above inequalities, we obtain the same convergence rates as in Theorem 4.1:

$$\begin{aligned}&\Vert u_{n,\mathscr {N}}-u\Vert _{H^1_\#} \le C \mathscr {N}^{-(\sigma +1)}, \qquad&0 \le E(u_{n,\mathscr {N}})-E(u) \le C \mathscr {N}^{-2(\sigma +1)},\\&\Vert u_\mathscr {N}-u\Vert _{H^1_\#} \le C \mathscr {N}^{-(\sigma +1)}, \qquad&0 \le E(u_{\mathscr {N}})-E(u) \le C \mathscr {N}^{-2(\sigma +1)}, \end{aligned}$$

with a significant gain in CPU time (see the numerical results in [37]).
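To make the two-grid strategy concrete, here is a minimal self-contained numpy sketch for the 1D Gross-Pitaevskii model on \((0,2\pi )\): the full nonlinear problem is first solved in a coarse space \(X_n\) by a simple self-consistent iteration (the Roothaan-type scheme analysed in Sect. 4.3.4), and a single linear eigenvalue problem with frozen mean-field potential, strategy (i) above, is then solved in a fine space \(X_\mathscr {N}\). The potential V, the parameter \(\mu \), the grid size M, the cut-offs and all function names are arbitrary choices made for this sketch.

```python
import numpy as np

L = 2 * np.pi                    # period of the simulation cell
M = 256                          # fine real-space grid used for FFTs and quadrature
x = L * np.arange(M) / M
V = 2.0 * np.cos(x)              # a smooth periodic trapping potential (sample choice)
mu = 1.0                         # repulsive interaction parameter (sample choice)

def hamiltonian(W, N):
    """Matrix of -d^2/dx^2 + Pi_N W Pi_N in the basis {e^{ikx}/sqrt(2*pi), |k| <= N}."""
    ks = np.arange(-N, N + 1)
    What = np.fft.fft(W) / M                               # Fourier coefficients of W
    coef = dict(zip(np.fft.fftfreq(M, d=1.0 / M).astype(int), What))
    H = np.diag(ks**2).astype(complex)
    for a, ka in enumerate(ks):
        for b, kb in enumerate(ks):
            H[a, b] += coef[ka - kb]                       # <e_ka | W | e_kb> = hat W_{ka-kb}
    return H, ks

def lowest_state(W, N):
    """Ground state of -Delta + Pi_N W Pi_N: eigenvalue and the real, positive,
    L^2-normalized eigenfunction sampled on the grid."""
    H, ks = hamiltonian(W, N)
    lam, C = np.linalg.eigh(H)
    u = np.exp(1j * np.outer(x, ks)) @ C[:, 0] / np.sqrt(L)
    u = np.real(u * np.exp(-1j * np.angle(u[np.argmax(np.abs(u))])))  # remove the global phase
    if u.sum() < 0:
        u = -u                                             # select the positive branch
    return lam[0], u / np.sqrt(L / M * np.sum(u**2))

def scf(N, maxiter=200, tol=1e-10):
    """Simple self-consistent iteration for (4.14) at cut-off N; mu is moderate here,
    so this basic scheme converges (see Sect. 4.3.4 for its possible failure modes)."""
    v = np.ones(M) / np.sqrt(L)
    for _ in range(maxiter):
        lam, v_new = lowest_state(V + mu * v**2, N)
        if np.sqrt(L / M * np.sum((v_new - v)**2)) < tol:
            return lam, v_new
        v = v_new
    return lam, v

def energy(v):
    """Gross-Pitaevskii energy E(v), with the derivative computed spectrally."""
    k = np.fft.fftfreq(M, d=1.0 / M)
    dv = np.real(np.fft.ifft(1j * k * np.fft.fft(v)))
    return L / M * np.sum(dv**2 + V * v**2 + 0.5 * mu * v**4)

n, N = 6, 40                                     # coarse and fine cut-offs (sample choice)
lam_n, u_n = scf(n)                              # stage 1: nonlinear problem in X_n
lam_2g, u_2g = lowest_state(V + mu * u_n**2, N)  # stage 2: one linear problem in X_N
lam_N, u_N = scf(N)                              # reference: nonlinear problem in X_N

print("E(coarse)   =", energy(u_n))
print("E(two-grid) =", energy(u_2g))
print("E(fine)     =", energy(u_N))
```

For large \(\mu \), the plain self-consistent loop used in the coarse stage may fail to converge (see Sect. 4.3.4); a damped or preconditioned minimization algorithm should then be substituted in stage 1, the two-grid structure itself being unchanged.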

4.3.3 A Posteriori Error Analysis and Post-Processing

A posteriori error analysis for linear elliptic eigenvalue problems has been the matter of a large number of numerical analysis papers (see [38, 39] and references therein). It turns out that even the simple case of the Laplace operator on a bounded polyhedral domain with Dirichlet boundary conditions is quite challenging (see [38, 40] and references therein). The case of linear and nonlinear Schrödinger operator has been considered in [10, 41,42,43,44,45] (see also the references therein and the appendix in [46]), leading to adaptive discretization procedures with optimal complexity [47,48,49]. Some results regarding Hartree-Fock and Kohn-Sham models have also been established [12, 15, 22].

As far as PW discretizations of Gross-Pitaevskii models are concerned, post-processing methods can be obtained by a non-standard application of Rayleigh-Schrödinger perturbation theory (RSPT).

Recall that if we have at hand a simple eigenmode \((E_0,\psi _0)\) of a reference Hamiltonian \(H_0\) on \(L^2_\#(\varOmega )\):

$$\begin{aligned} H_0 \psi _0=E_0\psi _0, \quad \Vert \psi _0\Vert _{L^2_\#}=1, \end{aligned}$$
(4.19)

and if W is a small perturbation of \(H_0\) (small in a sense made precise by Kato [50]), then the perturbed Hamiltonian \(H=H_0+W\) has a unique eigenvalue E in the vicinity of \(E_0\), which is simple. Using first-order perturbation for the eigenvector and second-order perturbation for the eigenvalue, we obtain

$$\begin{aligned} H\psi =E\psi , \;\; \text{ with } \;\;&\psi \simeq \psi _0-\varPi _{\psi _0^\perp } (H_0-E_0)|_{\psi _0^\perp }^{-1} \varPi _{\psi _0^\perp }(W\psi _0), \\&E \simeq E_0+\langle \psi _0|W|\psi _0\rangle -\langle \varPi _{\psi _0^\perp }(W\psi _0)| (H_0-E_0)|_{\psi _0^\perp }^{-1}| \varPi _{\psi _0^\perp }(W\psi _0)\rangle , \end{aligned}$$

where \(\varPi _{\psi _0^\perp }\) is the orthogonal projector on the space

$$ \psi _0^\perp =\left\{ \phi \in L^2_\#(\varOmega ) \; | \; \langle \psi _0|\phi \rangle = \int \limits _\varOmega \psi _0 \phi =0 \right\} , $$

for the \(L^2_\#(\varOmega )\) inner product, and where \((H_0-E_0)|_{\psi _0^\perp }^{-1}\) is the inverse of the restriction of the operator \(H_0-E_0\) to the invariant space \(\psi _0^\perp \) (this operator is invertible since \(E_0\) is simple).

As shown in [51], RSPT can be used to derive a posteriori error estimators. The idea is to consider the Euler-Lagrange equation of the variational approximation of (4.12) in \(X_n\), i.e.

$$\begin{aligned} -\varDelta u_n + \varPi _{n} (V+\mu u_n^2) \varPi _{n} u_n = \lambda _n u_n, \end{aligned}$$
(4.20)

as the unperturbed eigenvalue problem, and the Euler-Lagrange equation of (4.12), i.e.

$$\begin{aligned} -\varDelta u + (V+\mu u^2) u = \lambda u \end{aligned}$$
(4.21)

as the perturbed eigenvalue problem. In other words, we take

$$\begin{aligned}&H_0 = -\varDelta + \varPi _{n} (V+\mu u_n^2) \varPi _{n} , \quad \psi _0=u_n, \quad E_0=\lambda _n, \\&W= (V+\mu u^2) -\varPi _{n} (V+\mu u_n^2) \varPi _{n}. \end{aligned}$$

Note that \(\langle \psi _0|W|\psi _0\rangle =0\): the first-order correction to the eigenvalue vanishes; this is the reason why we need to consider the second-order correction of the eigenvalue. We then notice that since both \(u_n\) and \(\varDelta u_n\) belong to \(X_n\), (4.20) also reads

$$ \varPi _{n}\left( -\varDelta u_n + (V+\mu u_n^2)u_n - \lambda _n u_n\right) =0, $$

which means that the residual \(r_n:= -\varDelta u_n + (V+\mu u_n^2)u_n - \lambda _n u_n\) is in \(X_n^\perp \). Since \(u_n \in X_{n}\) and \(r_n \in X_{n}^\perp \), this implies that

$$ \varPi _{u_n^\perp } r_n =r_n. $$

It follows that

$$\begin{aligned} \varPi _{\psi _0^\perp }(W\psi _0)&=\varPi _{u_n^\perp } \left( \left( (V+\mu u^2) -\varPi _n (V+\mu u_n^2) \varPi _n\right) u_n\right) \\&=\varPi _{u_n^\perp } \left( r_n+\mu (u^2-u_n^2)u_n\right) \\&= r_n+ \mu \varPi _{u_n^\perp }\left( (u^2-u_n^2)u_n\right) . \end{aligned}$$

Next, we observe that the block representation of \(H_0\) associated with the decomposition \(L^2_\#(\varOmega )=X_{n}\oplus X_{n}^\perp \) reads

$$ H_0 = \left( \begin{array}{c|c} -\varDelta |_{X_n} + \varPi _{n} (V+\mu u_n^2) \varPi _{n} &{} 0 \\ \hline 0 &{} -\varDelta |_{X_{n}^\perp } \end{array} \right) . $$

As a consequence,

$$ (H_0-E_0)|_{\psi _0^\perp }^{-1} \varPi _{\psi _0^\perp }(W\psi _0)= -\left( u_n^{(1)} +u_n^{(2)} \right) , $$

with

$$\begin{aligned} u_n^{(1)}&=-(-\varDelta -\lambda _n)|_{X_{n}^\perp }^{-1}r_n, \\ u_n^{(2)}&=-\mu \left( -\varDelta + V+\mu u_n^2 - \lambda _n \right) |_{u_n^\perp }^{-1}\varPi _{u_n^\perp }\left( (u^2-u_n^2)u_n\right) . \end{aligned}$$

Since in PW calculations, functions are stored as vectors of Fourier coefficients, computing a very accurate approximation \(u_{n,\mathscr {N}}^{(1)}\) of \(u_n^{(1)}\) in a very fine discretization space \(X_\mathscr {N}\) with \(\mathscr {N}\gg n\) is easy. On the other hand, it can be shown using the a priori error estimates in Theorem 4.1 that \(\Vert u_n^{(2)}\Vert _{H^1_\#}\) is much smaller than \(\Vert u_n^{(1)}\Vert _{H^1_\#}\). Introducing

$$ \widetilde{u}_n = u_n + u_n^{(1)} \qquad \text{ and } \qquad \widetilde{\lambda }_n = \lambda _n + (u_n^{(1)},Wu_n)_{L^2_\#}, $$

we have

$$ \Vert u-\widetilde{u}_n\Vert _{H^1_\#} \le C n^{-2} \Vert u-u_n\Vert _{H^1_\#} \quad \text{ and } \quad |\lambda -\widetilde{\lambda }_n| \le C n^{-2} |\lambda -\lambda _n|, $$

for a constant \(C \in \mathbb R_+\) independent of n. For large enough values of n and for \(\mathscr {N}\gg n\), \(\widetilde{u}_{n,\mathscr {N}}=u_n+u_{n,\mathscr {N}}^{(1)}\) therefore represents a much better approximation of u than \(u_n\).
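Following the formulas above, the dominant correction \(u_n^{(1)}\) amounts, in Fourier space, to dividing the coefficients of the residual \(r_n\) by \(-(|k|^2-\lambda _n)\) on the modes \(|k|>n\). The sketch below implements this post-processing in the 1D plane-wave setting of the earlier sketches; the function name is ours, \(u_n\) and V are assumed to be sampled on a uniform grid of \((0,2\pi )\), the fine space is taken to be the full Fourier resolution of that grid, and the small term \(u_n^{(2)}\) is neglected by approximating \(Wu_n\) by the computable residual \(r_n\).

```python
import numpy as np

def rspt_postprocess(u_n, lam_n, V, mu, n):
    """Perturbative post-processing of a plane-wave Gross-Pitaevskii ground state.
    u_n, V: samples on the grid x_j = 2*pi*j/M; u_n is assumed to contain only the
    Fourier modes |k| <= n (with n^2 > lam_n). Returns (u_tilde, lam_tilde)."""
    M = u_n.size
    k = np.fft.fftfreq(M, d=1.0 / M)                  # integer wave-vectors of the grid
    # residual r_n = -u_n'' + (V + mu*u_n^2)*u_n - lam_n*u_n (second derivative via FFT)
    upp = np.real(np.fft.ifft(-(k**2) * np.fft.fft(u_n)))
    r = -upp + (V + mu * u_n**2) * u_n - lam_n * u_n
    rhat = np.fft.fft(r) / M                          # Fourier coefficients of r_n
    rhat[np.abs(k) <= n] = 0.0                        # r_n lies in X_n^perp (see above)
    # u_n^(1) = -(-Delta - lam_n)^{-1} r_n on X_n^perp: mode-by-mode division
    u1hat = np.zeros_like(rhat)
    mask = np.abs(k) > n
    u1hat[mask] = -rhat[mask] / (k[mask]**2 - lam_n)
    u1 = M * np.real(np.fft.ifft(u1hat))
    # post-processed quantities; W u_n is approximated by r_n (u_n^(2) is neglected)
    dx = 2 * np.pi / M
    return u_n + u1, lam_n + dx * np.sum(u1 * r)
```

In practice the pair \((u_n,\lambda _n)\) would come from a plane-wave solver such as the one sketched at the end of Sect. 4.3.2, and the correction is obtained at a negligible cost compared to that of the self-consistent solve.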

We refer to [4] for an application of this technique to Kohn-Sham LDA models.

4.3.4 Error Balancing

  As mentioned in the introduction, discretization error is only one of the various components of the overall error. In this section, we give an example of a numerical scheme automatically balancing discretization and algorithmic error for the Gross-Pitaevskii model. Still for pedagogical reasons, we consider the simplest possible self-consistent algorithm for solving the Gross-Pitaevskii equation, defined at the continuous level by

$$ \left\{ \begin{array}{l} - \varDelta {v_{k}} + V {v_k} + \mu {v}_{{k-1}}^2 {v_k} = {\lambda _k} {v_k} , \quad {v_k} \in H^1_\#(\varOmega ), \quad \Vert {v_k}\Vert _{L^2_\#} = 1, \quad (v_k,1)_{L^2_\#} \ge 0, \\ \\ {\lambda _k} \text{ lowest } \text{ eigenvalue } \text{ of } -\varDelta + V + \mu {v}_{{k-1}}^2. \end{array} \right. $$

The initial guess \(v_0\) can be chosen, for example, as a normalized ground state of the operator \(-\varDelta + V\) for small values of \(\mu \), and as the Thomas-Fermi approximation of the ground state for large values of \(\mu \), but many other choices are possible. In this algorithm, the iterate \(v_k\) is the \(L^2_\#\)-normalized positive ground-state (in the weak sense \((v_k,1)_{L^2_\#} \ge 0\)) of the mean-field operator \(-\varDelta + V + \mu {v}_{{k-1}}^2\) constructed from the previous iterate \(v_{k-1}\). In the Hartree-Fock and Kohn-Sham frameworks, this algorithm is referred to as the Roothaan algorithm, and has been analyzed from a mathematical point of view in [52, 53]. It is known in particular that the sequence \((v_k)_{k \ge 0}\)

  • either converges to the unique positive solution u to the Gross-Pitaevskii equation (4.13);

  • or oscillates between two states in the sense that there exist two functions \(v_\mathrm{e}\) and \(v_\mathrm{o}\) in \(H^1_\#(\varOmega )\), with \(v_\mathrm{e} \ne v_\mathrm{o}\) such that

    $$\begin{aligned}&-\varDelta v_\mathrm{e} + V v_\mathrm{e}+\mu v_\mathrm{o}^2 v_\mathrm{e} = \lambda _\mathrm{e} v_\mathrm{e}, \quad \Vert v_\mathrm{e}\Vert _{L^2_\#}=1, \quad (v_\mathrm{e},1)_{L^2_\#}\ge 0, \\&-\varDelta v_\mathrm{o} + V v_\mathrm{o}+\mu v_\mathrm{e}^2 v_\mathrm{o} = \lambda _\mathrm{o} v_\mathrm{o}, \quad \Vert v_\mathrm{o}\Vert _{L^2_\#}=1, \quad (v_\mathrm{o},1)_{L^2_\#}\ge 0, \end{aligned}$$

    and

    $$ v_{2k} \mathop {\longrightarrow }_{k \rightarrow \infty } v_\mathrm{e}, \qquad v_{2k+1} \mathop {\longrightarrow }_{k \rightarrow \infty } v_\mathrm{o} \quad \text{ in } H^1_\#(\varOmega ). $$

Typically, \((v_k)_{k \ge 0}\) converges if \(\mu \) is small and oscillates if \(\mu \) is large. Clearly, this is not an efficient way to solve the Gross-Pitaevskii equation. We focus on this algorithm for pedagogical reasons only, because it is easier to analyse. Note that the oscillatory behaviour can be suppressed by using an optimal damping algorithm [54]. At a discrete level, it is recommended to solve (4.14) using a preconditioned nonlinear conjugate gradient algorithm [55].

The following scheme is a discretized version of the basic self-consistent field algorithm, in which the discretization space depends on k (compare with (4.15)):

$$ \left\{ \begin{array}{l} - \varDelta {v_k} + \varPi _{n_k} \left( V + \mu {v}_{{k-1}}^2 \right) {\varPi }_{n_k} {v_k} = {\lambda _k} \; {v_k} , \quad {v_k} \in X_{n_k}, \quad \Vert {v_k}\Vert _{L^2_\#} = 1, \quad (v_k,1)_{L^{2}_{\mathrm {\#}}} \ge 0, \\ \\ \lambda _k=\lambda _{v_{k-1},n_k}, \quad \text {where } \lambda _{v,n} \text { is the lowest eigenvalue of } -\varDelta +{\varPi }_{n} \left( V + \mu {v}^2 \right) {\varPi }_{n}. \end{array} \right. $$

Intuitively, it is indeed inefficient to compute the first iterates in a very fine discretization space since we are far from convergence. It, therefore, makes sense to increase the size of the discretization space along the iterations when getting closer to the exact solution u. To automate this process, we need to define a criterion allowing the algorithm to decide when to refine the discretization space. For this purpose, we use the following result [51]:

Proposition 4.1

Let u be the unique positive minimizer to (4.12). Let J be the error criterion defined by

$$ \forall v \in H^1_\#(\varOmega ) \text { such that } \Vert v\Vert _{L^2_\mathrm {\#}}=1, \quad J(v) = E(v)-E(u) + \frac{1}{2} \Vert v -u \Vert _{L^2_\#}^2. $$

Let \(v_k \in X_{n_k}\) be the kth iterate of the above algorithm. Then, we have

$$ 0 \le J(v_k) \le \; \eta _{\mathrm{d},k}+ \eta _{\mathrm{a},k} , $$

where the discretization and algorithmic error estimators \(\eta _{\mathrm{d},k}\) and \(\eta _{\mathrm{a},k}\) are, respectively, defined by

$$ \eta _{\mathrm{d},k} {=}\frac{1}{2} \left( {\lambda _{v_k,n_k}} - { \lambda _{v_k,\infty }} \right) \ge 0, \quad \eta _{\mathrm{a},k}=\frac{1}{2} \left( \mu \int \limits _\varOmega ({v}_{{k}}^2-{v}_{{k-1}}^2){v}_{{k}}^2 + {\lambda _k} -{\lambda _{v_k,n_k}} \right) \ge 0. $$

We see that \(\eta _{\mathrm{d},k}=0\) if \(n_k=\infty \), that is, if the problem at iteration k has been solved in the whole space \(H^1_\#(\varOmega )\) (no discretization error), and that \(\eta _{\mathrm{a},k}=0\) if \(v_{k-1}=v_k\), that is if the SCF iteration has converged in the discretization space \(X_{n_k}\) (no algorithmic error). The numerical experiments reported in [51] show that, in practice, the inequalities

$$ E(v_k)-E(u) \le J(v_k) \le \; \eta _{\mathrm{d},k}+\eta _{\mathrm{a},k} $$

are almost equalities; this observation can be theoretically justified in the asymptotic regime using a priori error analysis results. As a consequence, \(\eta _{\mathrm{d},k}+\eta _{\mathrm{a},k}\) gives an accurate estimate of the energy error \(E(v_k)-E(u)\), which is split into a discretization error and an algorithmic error. A natural strategy to reach a desired accuracy \(\varepsilon \) in an optimal way from a computational point of view then consists in refining the discretization if \(\eta _{\mathrm{d},k} \gg \eta _{\mathrm{a},k}\), and in iterating otherwise in the same discretization space \(X_{n_k}\), until \(\eta _{\mathrm{d},k} + \eta _{\mathrm{a},k} \le \varepsilon \).
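One possible way to encode this refine-or-iterate strategy is sketched below; the estimators are assumed to be available (for instance through the perturbative approximations discussed right after), and the factor quantifying \(\eta _{\mathrm{d},k} \gg \eta _{\mathrm{a},k}\) is an arbitrary choice.

```python
def next_action(eta_d, eta_a, eps, c=10.0):
    """Decide how to continue the self-consistent loop after iteration k, given the
    discretization and algorithmic error estimators and a target accuracy eps."""
    if eta_d + eta_a <= eps:
        return "stop"        # target accuracy reached
    if eta_d > c * eta_a:
        return "refine"      # discretization error dominates: take n_{k+1} > n_k
    return "iterate"         # algorithmic error dominates: keep n_{k+1} = n_k

print(next_action(eta_d=1e-3, eta_a=1e-6, eps=1e-8))   # -> refine
```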

Note that at iteration k, \(v_{k-1}\), \(v_k\) and \(\lambda _k\) are known, but not \(\lambda _{v_k,n_k}\), whose computation would require solving another eigenvalue problem in \(X_{n_k}\), nor a fortiori \(\lambda _{v_k,\infty }\), which is out of reach of numerical methods. It is, therefore, not possible to compute exactly \(\eta _{\mathrm{d},k}\). On the other hand, it is possible to obtain very accurate approximations of all these numbers by adapting the approach based on Rayleigh-Schrödinger perturbation theory detailed in the previous section.

In conclusion, \(\eta _{\mathrm{d},k}\) and \(\eta _{\mathrm{a},k}\) therefore provide relatively cheap and sharp estimators of the discretization and algorithmic errors at iteration k if the quantity of interest is the energy, allowing adaptive error balancing.

We refer to [4] for an extension of this approach to the periodic Kohn-Sham LDA setting.

4.4 Error Cancellation Phenomenon

In many applications in physics, chemistry, materials science, and biology, energy differences are far more important than absolute energies. Consider for instance a simple chemical reaction that can be modelled as a transition from a local minimum of the ground-state Born-Oppenheimer potential energy surface (GS-BO-PES)—the reactants—to another local minimum of the GS-BO-PES—the products—through a well-defined saddle point—the transition state (see Fig. 4.3). According to the Arrhenius law, the reaction rate is given by the relation

$$ k = \nu _0 \; \exp \left( -E_\mathrm{a}/k_\mathrm{B}T \right) , $$

where \(\nu _0\) is a prefactor, \(E_\mathrm{a}\) the activation energy, that is, the difference between the energy \(E_\mathrm{ts}\) of the transition state and the energy \(E_\mathrm{re}\) of the reactants (see Fig. 4.3), \(k_\mathrm{B}\) the Boltzmann constant, and T the temperature. The relevant QOI, therefore, is the energy difference \(E_\mathrm{ts}-E_\mathrm{re}\), and not each of the energies \(E_\mathrm{ts}\) and \(E_\mathrm{re}\). The same is true for the reaction energy, defined as the energy difference \(E_\mathrm{pr}-E_\mathrm{re}\), where \(E_\mathrm{pr}\) is the energy of the products.

Fig. 4.3

Sketch of a chemical reaction taking place on the ground-state potential energy surface. The activation energy \(E_\mathrm{a}\) of the reaction is the difference between the energy of the transition state and that of the reactants

Considering two configurations \(R_1\) and \(R_2\) of the system, our goal is to estimate the error

$$ \underbrace{(E_{R_1,\mathscr {N}}-E_{R_2,\mathscr {N}})}_{\text {computable quantity}} - \underbrace{(E_{R_1}-E_{R_2})}_{\text {quantity of interest}} $$

where \(E_{R_j}\) is the exact ground-state energy for the configuration \(R_j\) and \(E_{R_j,\mathscr {N}}\) its variational approximation in the discretization space \(\mathscr {X}_\mathscr {N}\).

It has been observed, both in quantum chemistry and in computational materials science, that in general,

$$ |(E_{R_1,\mathscr {N}}-E_{R_2,\mathscr {N}}) - (E_{R_1}-E_{R_2}) | \ll |E_{R_1,\mathscr {N}}-E_{R_1}|+|E_{R_2,\mathscr {N}}-E_{R_2}|. $$

In other words, the error on the energy difference between two configurations is usually much lower than the error on the energy of each configuration, typically by one, sometimes two orders of magnitude. This is the so-called error cancellation phenomenon. Chemists and physicists heavily rely on this phenomenon: obtaining an accuracy of 1 kcal/mol (or 1 meV) on an energy difference turns out to be much cheaper in terms of computational effort than obtaining a similar accuracy on a single point energy.

Fig. 4.4

Two different configurations of a system composed of 2 oxygen atoms and 4 hydrogen atoms corresponding to the reactants (left) and the products (right) of the chemical reaction (4.22)

As an example, consider the two different configurations of a system composed of 2 oxygen atoms and 4 hydrogen atoms corresponding, respectively, to the reactants and the products of the chemical reaction (4.22), represented in Fig. 4.4:

$$\begin{aligned} {\quad 2 H_2 + O_2 \; \longrightarrow \; 2 H_2O.} \end{aligned}$$
(4.22)

The sum and difference of the energy errors

$$\begin{aligned} S_\mathscr {N}:&=(E_{\mathrm{reactants},\mathscr {N}}-E_\mathrm{reactants})+ (E_{\mathrm{products},\mathscr {N}}-E_\mathrm{products}), \end{aligned}$$
(4.23)
$$\begin{aligned} D_\mathscr {N}:&=|(E_{\mathrm{reactants},\mathscr {N}}-E_\mathrm{reactants})- (E_{\mathrm{products},\mathscr {N}}-E_\mathrm{products})| \\&= |\underbrace{(E_{\mathrm{reactants},\mathscr {N}}-E_{\mathrm{products},\mathscr {N}})}_{\mathrm{computed \; value \; of \; the \; QOI}} - \underbrace{(E_{\mathrm{reactants}}-E_\mathrm{products})}_{\mathrm{exact \; value \; of \; the \; QOI}}|, \nonumber \end{aligned}$$
(4.24)

for PW Kohn-Sham LDA calculations with Troullier-Martins pseudopotentials as a function of the energy cut-off \(E_\mathscr {N}=\frac{1}{2} \mathscr {N}^2\) are plotted on Fig. 4.5 (top). It can be observed that \(D_\mathscr {N}\) is indeed smaller than \(S_\mathscr {N}\) by about two orders of magnitude. In addition, the non-dimensional error cancellation factor

$$\begin{aligned} 0 \le Q_\mathscr {N}:=\frac{D_\mathscr {N}}{S_\mathscr {N}} \le 1 \end{aligned}$$
(4.25)

fluctuates about a value close to \(Q_\infty \simeq 5 \times 10^{-3}\).

Fig. 4.5

Observed convergence of the quantities \(S_\mathscr {N}\) and \(D_\mathscr {N}\) defined by (4.23)–(4.24) as a function of \(\mathscr {N}\), and behaviour of the ratio \(Q_\mathscr {N}\) for the two configurations represented on Fig. 4.4

In order to unravel the origin of the discretization error cancellation phenomenon, let us consider a simple linear 1D model for which explicit calculations can be carried out [1]. In this model, the external potential is periodic (with period \(a=1\)) and is a sum of Dirac masses:

$$ V_{\mathrm{ext},R} =- \sum _{m \in \mathbb Z} z_1 \delta _m - \sum _{m \in \mathbb Z} z_2 \delta _{m+R}. $$

For given values of the charges \(z_1\) and \(z_2\), the configurations are labelled by \(R \in (0,1)\). The ground-state energy and wave-function are obtained by computing the lowest eigenvalue and an associated normalized eigenfunction of the 1D periodic Schrödinger equation

$$\begin{aligned}&\left( -\frac{\mathrm{d}^2}{\mathrm{d}x^2} - \sum _{m \in \mathbb Z} z_1 \delta _m - \sum _{m \in \mathbb Z} z_2 \delta _{m+R} \right) u_R = E_R u_R \quad \text{ in } L^2_\mathrm{per}(0,1), \\&\int \limits _0^1 u_R^2(x) \mathrm{d}x = 1. \nonumber \end{aligned}$$
(4.26)

As a matter of illustration, the ground-state wave-function for \(z_1=1\), \(z_2=0.5\) and \(R=0.4\) is plotted on Fig. 4.6.
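Problem (4.26) is also convenient for numerical experiments: in the Fourier basis \(\{e^{2i\pi kx},\; |k| \le \mathscr {N}\}\), the matrix elements of the two Dirac combs are simply \(-z_1\) and \(-z_2\, e^{-2i\pi (k-k')R}\), so \(E_{R,\mathscr {N}}\) is obtained from a small dense diagonalization. In the sketch below, the charges are those used in Fig. 4.6, while the configurations \(R_1,R_2\), the cut-offs and the reference cut-off are arbitrary sample choices; the last printed column is the cancellation factor \(Q_\mathscr {N}\) of (4.25) for this toy model.

```python
import numpy as np

def E_RN(R, N, z1=1.0, z2=0.5):
    """Variational ground-state energy E_{R,N} of (4.26) in Span{e^{2*i*pi*k*x}, |k| <= N}."""
    k = np.arange(-N, N + 1)
    dk = k[:, None] - k[None, :]                        # k - k'
    H = np.diag((2.0 * np.pi * k)**2).astype(complex)   # kinetic part
    H -= z1                                             # matrix elements of -z1 * sum_m delta_m
    H -= z2 * np.exp(-2j * np.pi * dk * R)              # matrix elements of -z2 * sum_m delta_{m+R}
    return np.linalg.eigvalsh(H)[0]

R1, R2 = 0.5, 0.4
ref1, ref2 = E_RN(R1, 800), E_RN(R2, 800)               # quasi-exact reference values
for N in (10, 20, 40, 80):
    e1, e2 = E_RN(R1, N) - ref1, E_RN(R2, N) - ref2     # errors on the two energies
    S, D = e1 + e2, abs(e1 - e2)                        # S_N and D_N as in (4.23)-(4.24)
    print(N, S, D, D / S)                               # D/S is the cancellation factor Q_N
```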

Fig. 4.6

Exact ground-state wave-function of (4.26) for \(z_1=1\), \(z_2=0.5\) and \(R=0.4\)

The following result is proved in [1].

Theorem 4.3

Let \(z_1,z_2>0\) and \(R \in (0,1)\), let \(E_R\) be the ground-state energy of (4.26), and \(E_{R,\mathscr {N}}\) the variational approximation of \(E_R\) in the Fourier approximation space

$$ \text{ Span } \left\{ e^{2i\pi kx}, \; k \in \mathbb Z, \; |k|\le \mathscr {N}\right\} . $$

Then, we have the asymptotic expansion

$$\begin{aligned} E_{R,\mathscr {N}}-E_R = \frac{\alpha _R}{\mathscr {N}}- \frac{\alpha _R}{2\mathscr {N}^2} +\frac{\beta ^{(1)}_{R,\mathscr {N}}}{\mathscr {N}} + \frac{\gamma _R}{\mathscr {N}} \eta _{R,\mathscr {N}}+ o\left( \frac{1}{\mathscr {N}^{3-\varepsilon }}\right) , \end{aligned}$$
(4.27)

where

$$\begin{aligned} \alpha _R:= \frac{z_1^2 u_R(0)^2 + z_2^2 u_R(R)^2}{2\pi ^2}, \end{aligned}$$
(4.28)
$$ \gamma _R:= \frac{z_1z_2 u_{R}(0)u_R(R)}{\pi ^2}, \qquad \eta _{R,\mathscr {N}}:= \mathscr {N}\sum _{k=\mathscr {N}+1}^{+\infty } \frac{\cos (2\pi k R)}{k^2}, $$
$$ \beta ^{(1)}_{R,\mathscr {N}} := \frac{z_1^2 u_R(0) (u_{R,\mathscr {N}}(0)-u_R(0)) + z_2^2 u_R(R) (u_{R,\mathscr {N}}(R)-u_R(R))}{2\pi ^2}. $$

In addition

$$ |\eta _{R,\mathscr {N}}| \le \min \left( 1, \frac{2+\frac{\pi ^3}{8}}{|\sin (\pi R)| \mathscr {N}} \right) \quad \text{ and } \quad \forall \varepsilon >0, \; \exists C_\varepsilon \in \mathbb R_+ \; \text{ s.t. } \; |\beta ^{(1)}_{R,\mathscr {N}}| \le \frac{C_\varepsilon }{\mathscr {N}^{1-\varepsilon }}. $$

This result sheds light on the mechanism of discretization error cancellation for the PW discretization of (4.26). First, it implies that errors on energies and errors on energy differences all scale as \(\mathscr {N}^{-1}\), and that discretization error cancellation is a matter of prefactors:

$$ E_{R,\mathscr {N}}-E_R \mathop {\sim }_{\mathscr {N}\rightarrow \infty } \frac{\alpha _R}{\mathscr {N}}\quad \text{ while } \quad (E_{R_2,\mathscr {N}}-E_{R_1,\mathscr {N}})-(E_{R_2}-E_{R_1}) \mathop {\sim }_{\mathscr {N}\rightarrow \infty } \frac{\alpha _{R_2}-\alpha _{R_1}}{\mathscr {N}}. $$

Note that \(\alpha _R\) only depends on the charges \(z_1\) and \(z_2\) of the Dirac potentials and on the values of the ground-state densities at the positions of the Dirac potentials. Error cancellation is due to the fact that, for \(0.1 \le R_1,R_2 \le 0.9\), we have

$$\begin{aligned} |\alpha _{R_2}-\alpha _{R_1}| \ll \max (\alpha _{R_1},\alpha _{R_2}) \qquad \text{(see } \text{ Fig. } \text{4.7). } \end{aligned}$$
(4.29)

We also obtain that the error cancellation factor converges when \(\mathscr {N}\) goes to infinity:

$$ \lim \limits _{\mathscr {N}\rightarrow +\infty } Q_\mathscr {N}= \frac{|\alpha _{R_1}-\alpha _{R_2}|}{\alpha _{R_1}+\alpha _{R_2}}. $$

Numerical simulations show that the convergence is monotone for \(R_1,R_2\) away from the singularities \(R=0\) and \(R=1\) (where the two Dirac combs overlap), and oscillatory for \(R_1\) or \(R_2\) close to the singularities (see Fig. 4.8).

Inequality (4.29), which is at the root of error cancellation, can be rewritten as

$$\begin{aligned}&\left| z_1^2 \left( \rho _{R_1}(0) - \rho _{R_2}(0) \right) + z_2^2 \left( \rho _{R_1}(R_1) - \rho _{R_2}(R_2) \right) \right| \\&\qquad \ll \max \left( z_1^2 \rho _{R_1}(0) + z_2^2 \rho _{R_1}(R_1) , \; z_1^2 \rho _{R_2}(0) + z_2^2 \rho _{R_2}(R_2)\right) , \end{aligned}$$

where \(\rho _R(x)=u_R(x)^2\) is the ground-state density at point x in configuration R. For R away from the singularities 0 and 1, the ground-state density at the positions 0 and R of the Dirac potentials does not change much with R.

Fig. 4.7

Plot of the function \(R \mapsto \alpha _R\) defined by (4.28) for different sets of parameters \((z_1,z_2)\)

Fig. 4.8

Convergence of \(Q_\mathscr {N}\) to \(Q_\infty \) for \(R_1=\frac{1}{2}\) and three different values of \(R_2\)

Another interesting remark reported in [1] is that

$$ \left| \frac{\mathrm{d}E_{R,\mathscr {N}}}{\mathrm{d}R}-\frac{\mathrm{d}E_R}{\mathrm{d}R} \right| \gg \left| \frac{\frac{\mathrm{d}\alpha _R}{\mathrm{d}R}}{\mathscr {N}}\right| . $$

In other words, it is not possible to infer from \(E_{R,\mathscr {N}}-E_R \mathop {\sim }_{\mathscr {N}\rightarrow \infty } \frac{\alpha _R}{\mathscr {N}}\) a result on the convergence of the forces. In fact, it is observed that

$$ \frac{\mathrm{d}E_{R,\mathscr {N}}}{\mathrm{d}R}-\frac{\mathrm{d}E_R}{\mathrm{d}R} \mathop {\sim }_{\mathscr {N}\rightarrow \infty } \frac{\mathrm{d}}{\mathrm{d}R}\left( \frac{\gamma _R}{\mathscr {N}} \eta _{R,\mathscr {N}}\right) $$

and that the function \(R \mapsto \frac{\mathrm{d}E_{R,\mathscr {N}}}{\mathrm{d}R}-\frac{\mathrm{d}E_R}{\mathrm{d}R}\) oscillates more and more as \(\mathscr {N}\) becomes large. As a consequence,

  1.

    It is not a good idea to try and compute the energy difference between two configurations \(R_1\) and \(R_2\) by integrating the forces along a path in the configuration space linking \(R_1\) and \(R_2\);

  2.

    Extrapolation methods based on the asymptotic expansion (4.27) can be used to improve the accuracy of the energy, but they will not improve the accuracy of the forces.

4.5 Conclusion

In this chapter, we have introduced basic concepts of mathematically based discretization error analysis: a priori error estimators, a posteriori error estimators and post-processing methods, asymptotic expansions and extrapolation methods, and the error cancellation phenomenon. These concepts have been illustrated on the simple examples of plane-wave discretizations of the Gross-Pitaevskii model, and of a 1D periodic Schrödinger equation with Dirac potentials.

Significant progress on discretization error analysis for Kohn-Sham and other electronic structure models has been made in the past few years, and work in these directions is in progress in several groups around the world. As witnessed in other fields of science and engineering, rigorously founded error analysis should play a major role in the design of a new generation of electronic structure calculation software, generating numerical results supplemented with error bars, optimizing the available computational resources, and adapted to massively parallel and heterogeneous architectures.