Abstract
Substantial modifications of the choice of grids, the combination coefficients, the parallel data structures, and the algorithms used for the combination technique lead to numerical methods which are scalable. This is demonstrated by the provision of error and complexity bounds and in performance studies based on a state-of-the-art code for the solution of the gyrokinetic equations of plasma physics. The key ideas for a new fault-tolerant combination technique are mentioned. New algorithms for both initial value and eigenvalue problems have been developed and are shown to have good performance.
1 Introduction
The solution of moderate- to high-dimensional PDEs (more than four dimensions) comes with a high demand for computational power. This is due to the curse of dimensionality, which manifests itself in the fact that very large computational grids are required even for moderate accuracy. In fact, the grid sizes grow exponentially with the dimension of the problem. Regular grids are thus not feasible even when future exascale systems are to be utilized. Fortunately, hierarchical discretization schemes come to the rescue. So-called sparse grids [53] mitigate the curse of dimensionality to a large extent.
Nonetheless, the need for HPC resources remains. The aim of two recent projects, one (EXAHD) within the German priority program “Software for exascale computing” and one supported through an Australian Linkage grant and Fujitsu Laboratories of Europe, has been to study the sparse grid combination technique for the solution of moderate-dimensional PDEs which arise in plasma physics for the simulation of hot fusion plasmas. The combination technique is well-suited for such large-scale simulations on future exascale systems, as it adds a second level of parallelism which admits scalability. Furthermore, its hierarchical principle can be used to support algorithm-based fault tolerance [38, 46]. In this work, we focus on recent developments with respect to the theory and application of the underlying methodology, the sparse grid combination technique.
The sparse grid combination technique utilizes numerical solutions \(u(\gamma )\) of partial differential equations computed for selected values of the parameter vector \(\gamma\) which controls the underlying grids. As the name suggests, the method then proceeds by computing a linear combination of the component solutions \(u(\gamma )\):
$$\displaystyle u_{I} =\sum _{\gamma \in I}c_{\gamma }\,u(\gamma ). \qquad (1)$$
Computationally, the combination technique thus consists of a reduction operation which evaluates the linear combination of the computationally independent components \(u(\gamma )\). A similar structure is commonly found in data-analytic problems and is exploited by the MapReduce method. Since the inception of the combination technique, parallel algorithms have been studied which make use of this computational structure [15, 18, 19]. The current work is based on the same principles as these earlier works, see [2, 24, 25, 28, 31, 32, 38–40, 48].
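This reduction structure is easy to sketch. The toy example below (our own illustration in Python; the quadrature rule, the integrand and the level n = 4 are arbitrary choices, not taken from the cited codes) combines independent 2D trapezoidal-rule components \(u(\gamma )\) with the classical combination coefficients for d = 2:

```python
import numpy as np

def u_component(f, gamma):
    """Component approximation u(gamma): 2D trapezoidal quadrature on a
    regular grid with mesh width 2**-gamma_i in coordinate i."""
    xs = [np.linspace(0.0, 1.0, 2**g + 1) for g in gamma]
    ws = []
    for x in xs:
        w = np.full(x.size, x[1] - x[0])   # trapezoidal weights
        w[0] *= 0.5
        w[-1] *= 0.5
        ws.append(w)
    X, Y = np.meshgrid(xs[0], xs[1], indexing="ij")
    return float(np.einsum("i,j,ij->", ws[0], ws[1], f(X, Y)))

def combine(f, n):
    """Reduction step u_I = sum_gamma c_gamma u(gamma) with the classical
    coefficients for d = 2: +1 on the diagonal |gamma| = n, -1 on |gamma| = n-1."""
    u_I = 0.0
    for q, c in ((n, 1.0), (n - 1, -1.0)):
        for g1 in range(q + 1):
            u_I += c * u_component(f, (g1, q - g1))
    return u_I

f = lambda x, y: np.exp(x + y)
exact = (np.e - 1.0)**2
```

Each call to `u_component` is independent and could run on its own process group; the final loop is the reduction. For this smooth integrand the combined value is substantially more accurate than the best single component \(u((2,2))\).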
The combination technique computes a sparse grid approximation without having to implement complex sparse grid data structures. The result is a proper sparse grid function. In the case of the interpolation problem one typically obtains the exact sparse grid interpolant, but for other problems (like finite element solutions) one obtains an approximating sparse grid function. Mathematically, the combination technique is an extrapolation method, and its accuracy is established using error expansions, see [5, 44, 45]. However, specific error expansions are only known for simple cases. Some recent work on errors of the sparse grid combination technique can be found in [16, 20, 21, 47]. The scarcity of theoretical results, however, has not hindered its popularity in applications. Examples include partial differential equations in fluid dynamics, the advection and advection-diffusion equation, the Schrödinger equation, financial mathematics, and machine learning, see, e.g., [8–13, 17, 41, 51]. However, as the combination technique is an extrapolation method, it is inherently unstable and large errors may occur if the error expansions do not hold. This is further discussed in [30], where a stabilized approach, the so-called Opticom method, is also analyzed. Several new applications based on this stabilized approach are discussed in [1, 7, 23, 26, 35, 51, 52]. Other non-standard combination approximations are considered in [4, 35, 37, 43].
The main application considered in the following deals with the solution of the gyrokinetic equations by the software code GENE [14]. These equations are an approximation, for the case of a small Larmor radius, of the Vlasov equations for densities \(f_{s}\) of plasmas,
$$\displaystyle \frac{\partial f_{s}}{\partial t} + \mathbf{v}\cdot \nabla _{\mathbf{x}}f_{s} + \frac{q_{s}}{m_{s}}\left (E + \mathbf{v}\times B\right )\cdot \nabla _{\mathbf{v}}f_{s} = 0,$$
where \(q_{s}\) and \(m_{s}\) denote the charge and mass of species \(s\).
The densities are distribution functions over the state space, \(E\) and \(B\) are the electrostatic and electromagnetic fields (both external and induced by the plasma), \(\mathbf{v}\) is the velocity and \(\mathbf{x}\) the location. The fields \(E\) and \(B\) are then the solution of the Maxwell equations for the charge and current densities defined by
$$\displaystyle \rho (\mathbf{x},t) =\sum _{s}q_{s}\int f_{s}\,d\mathbf{v},\qquad \mathbf{j}(\mathbf{x},t) =\sum _{s}q_{s}\int \mathbf{v}\,f_{s}\,d\mathbf{v}.$$
While the state space has 6 dimensions (3 space and 3 velocity), the gyrokinetic equations reduce this to 5 dimensions. The index \(s\) numbers the different species (ions and electrons). The numerical scheme uses both finite differences and spectral approximations. As complex Fourier transforms are used, the densities \(f_{s}\) are complex.
In Sect. 2 a general combination technique suitable for our application is discussed. In this setting the set \(I\) occurring in the combination formula (1) uniquely determines the combination coefficients \(c_{\gamma }\) in that formula. Some parallel algorithms and data structures supporting the sparse grid combination technique are presented in Sect. 3. In order to stabilize the combination technique, the combination coefficients need to be modified and even chosen depending on the solution. This is covered in Sect. 4. Finally, Sect. 5 considers an important application area, eigenvalue problems, and covers the particular challenges and algorithms for this problem.
2 A Class of Combination Techniques
Here we call a combination technique any method which is obtained by replacing some of the hierarchical surpluses by zero. This includes the traditional sparse grid combination technique [19], the truncated combination technique [4], dimension-adaptive variants [10, 29] and even some of the fault-tolerant methods [24]. The motivation for this larger class is that the basic error splitting assumption—which can be viewed as an assumption about the surplus—often does not hold in these cases. We will now formally define this combination technique.
We assume that we have at our disposal a computer code which is able to produce approximations of some real or complex number, some vector or some function. We denote the quantity of interest by \(u\) and assume that the space of all possible \(u\) is a Euclidean vector space (including the numbers) or a Hilbert space of functions. The computer codes are assumed to compute a very special class of approximations \(u(\gamma )\) which in some way are associated with regular \(d\)-dimensional grids with step size \(h_{i} = 2^{-\gamma _{i}}\) in the \(i\)-th coordinate. For simplicity we will assume that in principle our code can compute \(u(\gamma )\) for any \(\gamma \in \mathbb{N}_{0}^{d}\). Furthermore, \(u(\gamma ) \in V (\gamma )\), where the spaces \(V (\gamma ) \subset V\) are hierarchical, such that \(V (\alpha ) \subset V (\beta )\) when \(\alpha \leq \beta\) (i.e. where \(\alpha _{i} \leq \beta _{i}\) for all \(i = 1,\ldots,d\)). For example, if \(V = \mathbb{R}\) then so are all \(V (\gamma ) = \mathbb{R}\). Another example is the space of functions with bounded (in \(L_{2}\)) mixed derivatives \(V = H_{\mathop{\mathrm{mix}}\nolimits }^{1}\left ([0,1]^{d}\right )\). In this case one may choose \(V (\gamma )\) to be appropriate spaces of multilinear functions.
The quantities of interest include solutions of partial differential equations, minima of convex functionals, and eigenvalues and eigenfunctions of differential operators. They may also be functions or functionals of solutions of partial differential equations. They may be moments of some particle densities which themselves are solutions to some Kolmogorov, Vlasov, or Boltzmann equations. The computer codes may be based on finite difference and finite element solvers, least squares and Ritz solvers, but could also just be interpolants or projections. In all these cases, the combination technique is a method which combines multiple approximations \(u(\gamma )\) to get more accurate approximations. Of course, the way the underlying \(u(\gamma )\) are computed will have some impact on the final combination approximation.
The combination technique is fundamentally tied to the concept of the hierarchical surplus [53] which was used to introduce sparse grids. However, there is a subtle difference between the surplus used to define the sparse grids and the one at the foundation of the combination technique. The surplus used for sparse grids is based on the representation of functions as a series of multiples of hierarchical basis functions. In contrast, the combination technique is based on a more general decomposition. It is obtained from the following result, which follows from two lemmas in Chapter 4 of [22].
Proposition 1 (Hierarchical surplus)
Let \(V(\gamma )\) be linear spaces with \(\gamma \in \mathbb{N}_{0}^{d}\) such that \(V(\alpha ) \subset V(\beta )\) if \(\alpha \leq \beta\) and let \(u(\gamma ) \in V(\gamma )\). Then there exist \(w(\alpha ) \in V(\alpha )\) such that
$$\displaystyle u(\gamma ) =\sum _{\alpha \leq \gamma }w(\alpha ).$$
Moreover, the \(w(\gamma )\) are uniquely determined and one has
$$\displaystyle w(\alpha ) =\sum _{\gamma \in B(\alpha )}(-1)^{\vert \alpha -\gamma \vert }\,u(\gamma ),$$
where \(B(\alpha ) =\{\gamma \geq 0\mid \alpha -1 \leq \gamma \leq \alpha \}\) and \(1 = (1,\ldots,1) \in \mathbb{N}^{d}\).
The set of \(\gamma\) is countable and the proposition is proved by induction over this set. Note that the equations are cumulative sums and the solution is given in the form of a finite difference. For the case of \(d = 2\) and \(\gamma \leq (2,2)\) one gets the following system of equations:
Note that all the components of the right hand side and the solution are elements of linear spaces. The vector of \(w(\alpha )\) is for the example:
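The finite difference character of the solution can be sketched directly (a generic Python illustration; the scalar model \(u(\gamma )\) below is made up purely for testing the identity):

```python
import itertools

def surplus(u, alpha):
    """w(alpha) = sum over gamma in B(alpha) of (-1)**|alpha - gamma| * u(gamma),
    with B(alpha) = {gamma >= 0 | alpha - 1 <= gamma <= alpha}."""
    w = 0.0
    for delta in itertools.product((0, 1), repeat=len(alpha)):
        gamma = tuple(a - d for a, d in zip(alpha, delta))
        if min(gamma) >= 0:          # B(alpha) contains only gamma >= 0
            w += (-1)**sum(delta) * u(gamma)
    return w

# toy scalar model u(gamma) in place of PDE approximations (an assumption)
u = lambda g: 1.0 / (1 + g[0]) + 0.5**g[1]
total = sum(surplus(u, a)
            for a in itertools.product(range(3), repeat=2))  # all alpha <= (2,2)
```

Summing the surpluses over all \(\alpha \leq (2,2)\) reproduces \(u((2,2))\) exactly, which is the cumulative-sum property of Proposition 1.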
For any set of indices \(I \subset \mathbb{N}_{0}^{d}\) we now define the combination technique as any method delivering the approximation
In practice, the approximation \(u_{I}\) is computed directly from the \(u(\gamma )\). The combination formula is directly obtained from Proposition 1 and one has
Proposition 2
Let \(u_{I} =\sum _{\alpha \in I}w(\alpha )\) where \(w(\alpha )\) is the hierarchical surplus for the approximations \(u(\gamma )\). Then there exists a subset \(I^{{\prime}}\) of the smallest downset which contains the set \(I\) and some coefficients \(c_{\gamma } \in \mathbb{Z}\) for \(\gamma \in I^{{\prime}}\) such that
$$\displaystyle u_{I} =\sum _{\gamma \in I^{{\prime}}}c_{\gamma }\,u(\gamma ).$$
Furthermore, one has
$$\displaystyle c_{\gamma } =\sum _{\alpha \in C(\gamma )}(-1)^{\vert \alpha -\gamma \vert }\,\chi _{I}(\alpha ),$$
where \(C(\gamma ) =\{\alpha \mid \gamma \leq \alpha \leq \gamma +1\}\) and where \(\chi _{I}(\alpha )\) is the characteristic function of \(I\).
The proof of this result is a direct application of Proposition 1, see also [22]. For the example \(d = 2\) and \(n = 2\) one gets
Note that the coefficients are \(c_{\gamma } = 1\) for the finest grids, \(c_{\gamma } = -1\) for some grids which are slightly coarser, and \(c_{\gamma } = 0\) for all the other grids. There are both positive and negative coefficients. Indeed, the results above can also be shown to be a consequence of the inclusion-exclusion principle. One can show that if \(0 \in I\) then \(\sum _{\gamma \in I}c_{\gamma } = 1\).
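The coefficient formula of Proposition 2 can be sketched as follows (a small Python illustration of the inclusion-exclusion sum; the index set is the classical one for d = 2, n = 2):

```python
import itertools

def combination_coefficients(I):
    """c_gamma = sum over alpha in C(gamma) of (-1)**|alpha - gamma| * chi_I(alpha),
    with C(gamma) = {alpha | gamma <= alpha <= gamma + 1}  (Proposition 2)."""
    I = set(I)
    d = len(next(iter(I)))
    coeffs = {}
    for gamma in I:
        c = 0
        for delta in itertools.product((0, 1), repeat=d):
            alpha = tuple(g + e for g, e in zip(gamma, delta))
            if alpha in I:           # chi_I(alpha)
                c += (-1)**sum(delta)
        coeffs[gamma] = c
    return coeffs

# classical index set for d = 2, n = 2
I = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]
c = combination_coefficients(I)
```

For this set one obtains \(c_{\gamma } = 1\) on the diagonal \(\vert \gamma \vert = 2\), \(c_{\gamma } = -1\) on \(\vert \gamma \vert = 1\) and \(c_{\gamma } = 0\) for the grid \((0,0)\), with the coefficients summing to 1 as stated above.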
An implementation of the combination technique will thus compute a linear combination of a potentially large number of component solutions \(u(\gamma )\). Two steps are required: first the independent computation of the components \(u(\gamma )\) and then the reduction to the combination \(u_{I}\). The computations thus map naturally onto a collection of loosely connected computational clusters. This is a great advantage on HPC systems, as the need for global communication is reduced to a loose coupling.
Many variants of the combination technique are obtained using the technique introduced above. They differ by their choice of the summation set \(I\). The classical combination technique utilizes
$$\displaystyle I =\{\alpha \in \mathbb{N}_{0}^{d}\mid \vert \alpha \vert \leq n\}.$$
Many variants are subsets of this set. This includes the truncated sparse grids [3, 4] defined by
where \(\downarrow \) is the operator producing the smallest downset containing the operand. Basically the same class is considered in [49] (there called partial sparse grids):
for some \(\beta \geq 1\). Sparse grids with faults [24] include sets of the form
for some \(\beta\) with \(\vert \beta \vert = n\). Finally, one may consider the two-scale combination with
where \(e_{k}\) is the standard \(k\)-th basis vector in \(\mathbb{R}^{d}\). This has been considered in [3] for the case of \(n_{0} = n_{k} = n\). Another popular choice is
This corresponds to a truncated ANOVA-type decomposition. An alternative ANOVA decomposition is obtained by choosing \(\beta ^{(k)}\) with \(\vert \mathop{\mathrm{supp}}\nolimits \beta ^{(k)}\vert = k\) and setting
The sets \(I\) are usually downsets, i.e., such that \(\beta \in I\) if there exists an \(\alpha \in I\) such that \(\beta \leq \alpha\). Note that any non-empty downset \(I\) in particular contains the zero vector. The corresponding vector space \(V (0)\) typically contains the set of constant functions.
We will now consider errors. First we reconsider the error of the \(u(\gamma )\). In terms of the surpluses, one has from the surplus decomposition of \(u(\gamma )\) that
$$\displaystyle u - u(\gamma ) =\sum _{\alpha \not\leq \gamma }w(\alpha ).$$
Let \(I_{s}(\gamma ) =\{\alpha \mid \alpha _{s} >\gamma _{s}\}.\) Then one has
as any \(\alpha\) which is not less or equal to \(\gamma\) contains at least one element \(\alpha _{s} >\gamma _{s}\). We now define
for any non-empty subset \(\sigma \subseteq \{ 1,\ldots,d\}\). A direct application of the inclusion-exclusion principle then leads to the error splitting
where
This is an ANOVA decomposition of the approximation error of \(u(\gamma )\). From this one gets the result
Proposition 3
Let \(u_{I} =\sum _{\gamma \in I^{{\prime}}}c_{\gamma }\,u(\gamma )\) and the combination coefficients \(c_{\gamma }\) be such that \(\sum _{\gamma \in I}c_{\gamma } = 1\). Then
Proof
This follows from the discussion above and because 0 ∈ I one has
□
An important point to note here is that this error formula holds for any coefficients \(c_{\gamma }\), not just the ones defined by the general combination technique. This leads to a different way to choose the combination coefficients so that the error is small. We will further discuss such choices in the next section. Note that for the general combination technique the coefficients are uniquely determined by the set \(I\). In this case one has a complete description of the error using the hierarchical surplus
In summary, we have now two strategies to design a combination approximation: one may choose either
-
the set \(I\) which contains all the \(w(\alpha )\) which are larger than some threshold
-
or the combination coefficients such that the sums \(\sum _{\alpha \in I(\gamma,\sigma )}c_{\gamma }\,z(\gamma,\sigma )\) are small.
One approach is to select the \(w(\alpha )\) adaptively, based on their size so that
Such an approach is sometimes called dimension adaptive to distinguish it from the spatially adaptive approach where grids are refined locally. One may be interested in finding an approximation for some \(u(\gamma )\), for example, for \(\gamma = (n,\ldots,n)\). In this case one considers
and one has the following error bound:
Proposition 4
Let \(I =\{\alpha \leq \gamma \mid \Vert w(\alpha )\Vert \geq \epsilon \}\) and \(u(\gamma ) - u_{I}\) be the error of the combination approximation based on the set \(I\) relative to \(u(\gamma )\). Then one has the bound
$$\displaystyle \Vert u(\gamma ) - u_{I}\Vert \leq \epsilon \left (\prod _{i=1}^{d}(\gamma _{i} + 1) -\vert I\vert \right ).$$
The result is a simple application of the triangle inequality and the fact that
In particular, if all \(\gamma _{i} = n\) one has
$$\displaystyle \Vert u(\gamma ) - u_{I}\Vert \leq \epsilon \left ((n + 1)^{d} -\vert I\vert \right ).$$
While this bound is very simple, it is asymptotically (in \(n\) and \(d\)) tight due to the concentration of measure. Note also that a similar bound for the spatially adaptive method is not available. An important point is that this error bound always holds, independently of how well the surplus approximates the exact result. For \(\gamma = (n,\ldots,n)\) one can combine the estimate of Proposition 4 with a bound on \(u - u(\gamma )\) to obtain
$$\displaystyle \Vert u - u_{I}\Vert \leq \epsilon \,(n + 1)^{d} + K4^{-n}.$$
One can then choose \(n\) which minimizes this for a given \(\epsilon\) by balancing the two terms. Conversely, for a given \(n\) the corresponding \(\epsilon\) is given by \(\epsilon _{n} = (n + 1)^{-d}K4^{-n}\). In Fig. 1 we plot \(\epsilon _{n}/K\) against \(\Vert u - u_{I}\Vert /K\) for several different \(d\) to demonstrate how the error changes with the threshold.
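The bound of Proposition 4, \(\Vert u(\gamma ) - u_{I}\Vert \leq \epsilon \left ((n+1)^{d} -\vert I\vert \right )\) for \(\gamma = (n,\ldots,n)\), can be checked numerically with synthetic surpluses (the model \(w(\alpha ) = 4^{-\vert \alpha \vert }y(\alpha )\) with random bounded \(y\) is our own test assumption):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 3, 4, 1e-3
# synthetic surpluses w(alpha) = 4**-|alpha| * y(alpha) with |y| <= 1 (assumption)
w = {a: 4.0**(-sum(a)) * rng.uniform(-1.0, 1.0)
     for a in itertools.product(range(n + 1), repeat=d)}

u_gamma = sum(w.values())                        # u(gamma) for gamma = (n,...,n)
I = {a for a, wa in w.items() if abs(wa) >= eps}  # dimension-adaptive selection
u_I = sum(w[a] for a in I)

# Proposition 4: |u(gamma) - u_I| <= eps * ((n + 1)**d - |I|)
bound = eps * ((n + 1)**d - len(I))
```

Every excluded surplus is smaller than \(\epsilon\) in absolute value, so the inequality holds deterministically, whatever the signs of the \(y(\alpha )\).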
While the combination approximation is the sum of the surpluses \(w(\alpha )\) over all \(\alpha \in I\), the result only depends on a small number of \(u(\gamma )\) close to the maximal elements of \(I\). In particular, any errors of the values \(u(\alpha )\) for small \(\alpha\) have no effect for approximations based on larger \(\alpha\). Thus when doing an adaptive approximation, the earlier errors are forgotten.
Finally, if one has a model for the hierarchical surplus, for example, if it is of the form
$$\displaystyle w(\alpha ) = 4^{-\vert \alpha \vert }\,y(\alpha )$$
for some bounded \(y(\alpha )\) then one can get specific error bounds for the combination technique, in particular the well-known bounds for the classical sparse grid technique. In this case one gets \(\Vert w(\alpha )\Vert \leq K4^{-\vert \alpha \vert }\) if one chooses \(\vert \alpha \vert \geq n\) as for the classical combination technique. One can show that the terms in the error formula for the components \(u(\gamma )\) satisfy
3 Algorithms and Data Structures
In this section we consider the parallel implementation of the combination technique for partial differential equation solvers. For large-scale simulations, for example the final target of the EXAHD project in its second phase, even a single component grid (together with the data structures to solve the underlying PDE on it) will no longer fit into the memory of a single node. Furthermore, the storage of a full grid representation of a sparse grid will exceed the predicted RAM of a whole exascale machine, and the communication overhead across a whole HPC system's network cannot be neglected. In this section we will assume that the component grids \(u(\gamma )\) are implemented as distributed regular grids. In a first stage we consider the case where the combined solution \(u_{I}\) is also a distributed regular grid. Later we will then discuss distributed sparse grid data structures.
The combination technique is a reduction operation combining the components according to Eq. (1). This reduction is based on the sum \(u^{{\prime}}\leftarrow u^{{\prime}} + cu\) of a component \(u\) (we omit the parameters \(\gamma\) for simplicity) to the resulting combination \(u^{{\prime}}\) (or \(u_{I}\)). Assume that the \(u\) and \(u^{{\prime}}\) are distributed over \(P\) and \(P^{{\prime}}\) processors, respectively.
The direct SGCT algorithm involves each component process sending all its points of \(u\) to the respective combination process. This is denoted as the gather stage. In a second stage, the combination processes first interpolate the gathered points to the combination grid \(u^{{\prime}}\) before adding them. In a third stage, the scatter stage, the data on each combination process is sampled and the samples are sent to the corresponding component processes, see Fig. 2.
In the direct SGCT algorithm, the components and combination are represented by the function values on the grid points or coefficients of the nodal basis. We have also considered a hierarchical SGCT algorithm which is based on the coefficients of the hierarchical basis, leading to a hierarchical surplus representation. When the direct SGCT algorithm is applied to these hierarchical surpluses there is no need for interpolation, and the sizes of the corresponding surplus vectors are exactly the same for both the components and the combination. However, for performance, it is necessary to coalesce the combination of surpluses as described in [49]. As the largest surpluses only occur for one component they do not need to be communicated. Despite the savings in the hierarchical algorithm, we found that the direct algorithm is always faster than the hierarchical one, and it scales better with \(n\), \(d\) and the number of processes (cores). This does however require that the representation of the combined grid \(u'\) is sparse, as is described below. We also found that the formation of the hierarchical surpluses (and its inverse) took a relatively small amount of time, and concluded that, even when the data is originally stored in hierarchical form, it is faster to dehierarchize it, apply the direct algorithm and hierarchize it again [49].
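The hierarchization kernel mentioned above can be sketched in 1D (a serial toy version of our own; the distributed, d-dimensional kernels are described in [49]). Dehierarchization is the exact inverse, which is what makes the strategy of dehierarchizing, combining directly and hierarchizing again possible:

```python
import numpy as np

def hierarchize(v):
    """Nodal values on a 1D level-n grid (2**n + 1 points, boundary included)
    -> hierarchical surplus coefficients: each interior point loses the
    interpolated value of its two hierarchical parents."""
    v = v.copy()
    n = int(np.log2(len(v) - 1))
    for level in range(n, 0, -1):              # fine to coarse
        step = 2**(n - level)
        for j in range(step, len(v) - 1, 2 * step):
            v[j] -= 0.5 * (v[j - step] + v[j + step])
    return v

def dehierarchize(w):
    """Inverse transform: surplus coefficients -> nodal values."""
    w = w.copy()
    n = int(np.log2(len(w) - 1))
    for level in range(1, n + 1):              # coarse to fine
        step = 2**(n - level)
        for j in range(step, len(w) - 1, 2 * step):
            w[j] += 0.5 * (w[j - step] + w[j + step])
    return w

x = np.linspace(0.0, 1.0, 2**4 + 1)
v = np.sin(2 * np.pi * x)
```

The round trip is lossless, and the surplus coefficients of a linear function vanish at all interior points, reflecting that the hierarchical basis represents only the deviation from the parents' interpolant.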
New adapted algorithms and implementations have been developed with optimal communication overhead, see Fig. 3 (left) and the corresponding paper in these proceedings [27]. The gather–scatter steps described above have to be invoked multiple times for the solution of time-dependent PDEs. (We found that for eigenvalue problems it is often sufficient to call the gather–scatter only once, see Sect. 5.) In any case, the gather–scatter step is the only remaining global communication of the combination technique and thus has to be examined carefully. In previous work [31] we have analyzed communication schemes required for the combination step in the framework of BSP models and developed new algorithmic variants with communication that is optimal up to constant factors. This way, the maximal communicated volume, which determines the makespan, can be drastically reduced at the cost of a slightly increased number of messages.
A distributed sparse grid data structure is described in [49]. The index set \(I\) for this case is a variant of a truncated sparse grid set, see Eq. (14). Recall that the sparse grid points are obtained by taking the union of all the component grid points. As the number of sparse grid points is much less than the number of full grid points, it makes sense to compute only the combinations for the sparse grid points. A sparse grid data structure has been developed which is similar to the CSR data structure used for sparse matrices. In this data structure one stores both the value \(u\) at each grid point and the location of the grid point. Due to the regularity of the sparse grid this can be done efficiently.
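A toy version of such a storage scheme (our own sketch, loosely analogous to CSR; not the actual data structure of [49]) stores the union of all component grid points once, as a sorted key array with a parallel value array, and locates points by binary search:

```python
import bisect
import itertools

class SparseGridStorage:
    """Store the union of the component-grid points of an index set I on the
    common 2**-n raster: one sorted key array plus one parallel value array."""
    def __init__(self, I, n):
        pts = set()
        for gamma in I:
            # points of the component grid with 2**g + 1 points per direction
            axes = [[j * 2**(n - g) for j in range(2**g + 1)] for g in gamma]
            pts.update(itertools.product(*axes))
        self.keys = sorted(pts)               # CSR-like: locations stored once
        self.vals = [0.0] * len(self.keys)    # values in a parallel array

    def index(self, point):
        i = bisect.bisect_left(self.keys, point)
        if i == len(self.keys) or self.keys[i] != point:
            raise KeyError(point)
        return i

I = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]
sg = SparseGridStorage(I, n=2)
```

For the classical d = 2, n = 2 set this stores 17 points instead of the 25 points of the full level-(2,2) grid; the gap widens rapidly with level and dimension.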
With optimal communication, distributed data structures and corresponding algorithms, excellent scaling can be obtained for large numbers of process groups, as shown in Fig. 3 (right) for runs on Hazel Hen; the timings include the local algorithmic work to hierarchize, local communication and global communication. See the corresponding paper in these proceedings [27].
4 Modified Combination Coefficients
Here we consider approximations which are based on a vector \((u(\gamma ))_{\gamma \in I}\) of numerical results. It has been seen, however, that the standard way to choose the combination coefficients is not optimal and may lead to large errors. In fact, one may interpret the truncated combination technique as a variant where some of the coefficients have been chosen to be zero and the rest adapted. In the following we present a more radical approach to choosing the coefficients \(c_{\gamma }\). An advantage of this approach is that it does not depend as much on properties of the index set \(I\); in fact, this set does not even need to be a downset.
A first method was considered in [30, 52] for convex optimization problems. Here, let the component approximations be
$$\displaystyle u(\gamma ) =\mathop{ \mathrm{argmin}}\limits _{v\in V (\gamma )}J(v).$$
Then the Opticom method, a Ritz approximation over the span of the given \(u(\gamma )\), computes
$$\displaystyle u^{O} =\mathop{ \mathrm{argmin}}\limits _{v\in \mathop{\mathrm{span}}\{u(\gamma )\mid \gamma \in I\}}J(v).$$
Computationally, the Opticom method consists of the minimization of a convex function \(P(c)\) of \(\vert I\vert \) variables of the form
$$\displaystyle P(c) = J\Big(\sum _{\gamma \in I}c_{\gamma }\,u(\gamma )\Big)$$
to get the combination coefficients. Once they have been determined, the approximation \(u^{O}\) is computed as in Sects. 2 and 3. By design, one has \(J(u^{O}) \leq J(u(\gamma ))\) for all \(\gamma \in I\). If \(I\) gives rise to a combination approximation \(u^{C}\) then one also has \(J(u^{O}) \leq J(u^{C})\). A whole family of other convex functions \(\varPhi (c)\) for the combination coefficients was considered in [30]. Using properties of the Bregman divergence, one can derive error bounds and quasi-optimality criteria for the Opticom method, see [52].
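For a quadratic functional \(J(v) = \frac{1}{2}\langle Av,v\rangle -\langle f,v\rangle\) the minimization of \(P(c)\) reduces to a small Galerkin system in the \(\vert I\vert \) coefficients. The sketch below uses synthetic data (operator, subspaces and component count are all made-up stand-ins for actual \(u(\gamma )\)):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 20
Q = rng.standard_normal((m, m))
A = Q @ Q.T + m * np.eye(m)                  # SPD operator (toy stand-in)
f = rng.standard_normal(m)

def J(v):                                    # convex energy functional
    return 0.5 * v @ A @ v - f @ v

def component(k):
    """Stand-in for u(gamma): the minimizer of J over a random k-dim subspace."""
    B = rng.standard_normal((m, k))
    return B @ np.linalg.solve(B.T @ A @ B, B.T @ f)

U = np.column_stack([component(3) for _ in range(4)])
# Opticom: minimize P(c) = J(U c)  <=>  solve (U^T A U) c = U^T f
c = np.linalg.lstsq(U.T @ A @ U, U.T @ f, rcond=None)[0]
u_opt = U @ c
```

By construction \(J(u_{\mathrm{opt}})\) is no larger than the energy of any single component, mirroring the inequality \(J(u^{O}) \leq J(u(\gamma ))\) above, since each component lies in the span being minimized over.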
A similar approach was suggested for the determination of combination coefficients for faulty sets \(I\). Let \(I\) be any set and \(I^{{\prime}}\) be the smallest downset which contains \(I\). Then let the \(w(\alpha )\) be the surpluses computed from the set of all \(u(\gamma )\) for \(\gamma \in I\) and \(\alpha \in I^{{\prime}}\). Finally, let the regular combination technique be defined as
and let for any \(c_{\gamma }\) a combination technique be
Then the difference between the new combination technique and the regular combination technique is
where \(I(\alpha ) =\{\gamma \in I\mid \gamma \geq \alpha \}\). Using the triangle inequality one obtains
with
where \(\theta\) is such that \(\Vert w(\alpha )\Vert \leq \theta (\alpha )\). Minimizing \(\varPhi (c)\) thus seems to lead to a good choice of combination coefficients, and this is confirmed by experiments [22]. The resulting combination technique forms the basis for a new fault-tolerant approach which has been discussed in [24].
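The following sketch illustrates the idea with a quadratic surrogate of \(\varPhi (c)\): the coefficients are fitted by weighted least squares to the conditions \(\sum _{\gamma \in I,\,\gamma \geq \alpha }c_{\gamma } = 1\) for all \(\alpha\) in the downset, weighted by the surplus bound \(\theta (\alpha ) = 4^{-\vert \alpha \vert }\). Both the surrogate and the weight model are our own simplifications here; the actual method minimizes \(\varPhi (c)\) itself [22, 24]:

```python
import numpy as np

def adapted_coefficients(I, downset, theta):
    """Weighted least squares for sum_{gamma in I, gamma >= alpha} c_gamma ~ 1,
    one equation per alpha in the downset, weighted by theta(alpha)."""
    I = list(I)
    A = np.array([[theta(a) if all(g >= ai for g, ai in zip(gamma, a)) else 0.0
                   for gamma in I] for a in downset])
    b = np.array([theta(a) for a in downset])
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(I, c))

theta = lambda a: 4.0**(-sum(a))
downset = [(i, j) for i in range(3) for j in range(3) if i + j <= 2]

c_ok = adapted_coefficients(downset, downset, theta)        # no faults
c_fault = adapted_coefficients([g for g in downset if g != (2, 0)],
                               downset, theta)              # grid (2,0) lost
```

Without faults the square system is uniquely solvable and reproduces exactly the classical coefficients; when the grid \((2,0)\) is lost, the weighted fit redistributes the weight over the remaining grids.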
5 Computing Eigenvalues and Eigenvectors
Here we consider the eigenvalue problem in \(V\), where one would like to compute complex eigenvalues \(\lambda\) and the corresponding eigenvectors \(u\) such that
$$\displaystyle Lu =\lambda u,$$
where \(L\) is a given linear operator defined on \(V\). We assume we have a code which computes approximations \(\lambda (\gamma ) \in \mathbb{C}\) and \(u_{\lambda }(\gamma ) \in V (\gamma )\) of the eigenvalues \(\lambda\) and the corresponding eigenvectors \(u\). We have chosen to discuss the eigenvalue problem separately as it does exhibit particular challenges which do not appear for initial and boundary value problems.
Consider now the determination of the eigenvalues \(\lambda\). Note that one typically has a large number of eigenvalues for any given operator \(L\), so first one needs to decide which eigenvalue to compute. For example, if one is interested in the stability of a system, one would like to determine the eigenvalue with the largest real part. It is possible to use the general combination technique; however, one needs to make sure that the (non-zero) combination coefficients \(c_{\gamma }\) used are such that the eigenvectors of \(L(\gamma )\) contain approximations of the eigenvector \(u\) which is of interest. Computing the surplus \(\nu (\alpha )\) for the eigenvalues \(\lambda (\gamma )\) and including all the ones which satisfy \(\vert \nu (\alpha )\vert \geq \epsilon\) for some \(\epsilon\) is a good way to make sure that we get an accurate result. Furthermore, the error bound given in Sect. 2 does hold here. As any surplus \(\nu (\alpha )\) only depends on the values \(\lambda (\gamma )\) for \(\gamma\) close to \(\alpha\), any earlier \(\lambda (\gamma )\) with a large error will not influence the final result. Practical computations confirmed the effectiveness of this approach, see [34]. If one knows which spaces \(V (\gamma )\) produce reasonable approximations for the eigenvector corresponding to some eigenvalue \(\lambda\), then one can define a set \(I(\lambda )\) containing only those \(\gamma\). Combinations over \(I(\lambda )\) will then provide good approximations of \(\lambda\). (However, as stated above, the combination technique is asymptotically stable against wrong or non-existing eigenvectors.)
Computing the eigenvectors faces the same problem as computing the eigenvalues. In addition, however, there is an extra challenge as the eigenvectors are only determined up to some complex factor. In particular, if one uses the eigenvectors \(u(\gamma )\) directly to compute the surplus functions \(w(\alpha )\) one may get highly inaccurate results. One way to deal with this is to first normalize the eigenvectors. For this one needs a functional \(s \in V ^{{\ast}}\). One then replaces the \(u(\gamma )\) by \(u(\gamma )/\langle s,u(\gamma )\rangle\) when computing the surplus, i.e., one solves the surplus equations
$$\displaystyle w(\alpha ) =\sum _{\gamma \in B(\alpha )}(-1)^{\vert \alpha -\gamma \vert }\, \frac{u(\gamma )}{\langle s,u(\gamma )\rangle }$$
and computes the combination approximation as
$$\displaystyle u_{I} =\sum _{\gamma \in I^{{\prime}}}c_{\gamma }\, \frac{u(\gamma )}{\langle s,u(\gamma )\rangle }.$$
In practice, this did give good results, and it appears reasonable that bounds on the surplus computed in this way provide a foundation for the error analysis. In any case, the error bound of the adaptive method holds. Actually, this bound even holds when the eigenvectors are not normalized. The advantage of the normalization is that the number of surpluses to include is much smaller, i.e. a computational advantage. Practical experiments also confirmed this. It remains to be shown that error splitting assumptions are typically invariant under the scaling done above.
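A small synthetic example shows why the normalization matters (the "component eigenvectors" below are the exact eigenvector multiplied by arbitrary complex phases, an idealization of the true situation):

```python
import numpy as np

rng = np.random.default_rng(2)
u_true = np.array([1.0, 2.0, -1.0, 0.5], dtype=complex)
s = np.zeros(4, dtype=complex)
s[0] = 1.0                                        # functional <s, v> = v[0]

phases = np.exp(2j * np.pi * rng.uniform(size=3)) # arbitrary per-grid phases
comps = [p * u_true for p in phases]              # component eigenvectors
coeffs = np.array([1.0, 1.0, -1.0])               # some coefficients with sum 1

# without normalization the combination is scaled by the arbitrary
# factor sum_i c_i p_i, which can be anything (even nearly zero):
naive = sum(c * v for c, v in zip(coeffs, comps))
# with normalization every component equals u_true / <s, u_true> exactly:
combined = sum(c * v / (s @ v) for c, v in zip(coeffs, comps))
```

After dividing by \(\langle s,u(\gamma )\rangle\) all components agree, so any combination with coefficient sum 1 recovers the eigenvector direction exactly; the surpluses of the normalized components are correspondingly small.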
5.1 An Opticom Approach for Solving the Eigenvalue Problem
An approach to solving the eigenvalue problem which does not require scaling has been proposed and investigated by Kowitz and collaborators [34, 36]. The approach is based on a minimization problem which determines combination coefficients in a similar manner as the Opticom method in Sect. 4. It is assumed that \(I\) is given and that the \(u(\gamma )\) for \(\gamma \in I\) have been computed and solve \(L(\gamma )u(\gamma ) =\lambda (\gamma )u(\gamma )\). Let the matrix \(G = \left [u(\gamma )\right ]_{\gamma \in I}\) and the vector \(c = [c_{\gamma }]_{\gamma \in I}^{T}\); then the combination approximation for the eigenvector can be written as the matrix-vector product
$$\displaystyle u_{I} = Gc.$$
This eigenvalue problem can be solved by computing the least squares solution of
$$\displaystyle \min _{c}\Vert LGc -\lambda Gc\Vert$$
with the normal equations
$$\displaystyle (LG -\lambda G)^{{\ast}}(LG -\lambda G)\,c = 0$$
for the solution of \(c\). Osborne et al. [33, 42] solved this by considering the problem
$$\displaystyle \min _{c}\;c^{{\ast}}K(\lambda )c\quad \text{subject to}\quad \langle s^{{\ast}},c\rangle = 1$$
with \(K(\lambda ) = (LG -\lambda G)^{{\ast}}(LG -\lambda G)\). Here \(\lambda\) is a parameter. One obtains the solution \(c(\lambda )\) together with the Lagrange multiplier \(\beta (\lambda )\), for which one then uses Newton's method to solve \(\beta (\lambda ) = 0\) with respect to \(\lambda\). With \(\beta (\lambda ) = 0\) it follows that \(K(\lambda )c = 0\) and \(\langle s^{{\ast}},c\rangle = 1\). Thus one obtains a normalized solution of the nonlinear eigenvalue problem (i.e., where \(\lambda\) occurs in a nonlinear way in \(K(\lambda )\)).
Another approach for obtaining the least squares solution is its interpretation as an overdetermined eigenvalue problem. Das et al. [6] developed an algorithm based on the QZ decomposition which allows the computation of the eigenvalue and the eigenvector in \(\mathcal{O}(mn)\) complexity, where \(n = \vert I\vert \) and \(m = \vert V \vert \).
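A brute-force way to see the overdetermined eigenvalue problem at work (our own toy setup; a simple \(\lambda\)-scan for illustration, not the \(\mathcal{O}(mn)\) QZ-based algorithm of [6]): the smallest singular value of \(LG -\lambda G\) vanishes exactly at those \(\lambda\) for which some combination \(Gc\) is an eigenvector.

```python
import numpy as np

L = np.diag([1.0, 2.0, 3.0, 4.0])        # operator with known spectrum
G = np.array([[1.0, 0.3],                # columns: component eigenvector
              [0.0, 1.0],                # approximations; their span contains
              [0.0, 0.2],                # the exact eigenvector e_1 (lambda = 1)
              [0.0, 0.1]])

def sigma_min(lam):
    """Smallest singular value of LG - lam*G; zero iff Gc is an eigenvector."""
    return np.linalg.svd(L @ G - lam * G, compute_uv=False)[-1]

grid = np.linspace(0.0, 5.0, 2001)
lam_hat = grid[np.argmin([sigma_min(l) for l in grid])]
```

The scan locates \(\lambda = 1\), the only eigenvalue of \(L\) whose eigenvector lies in the span of the columns of \(G\); the QZ-based method of [6] obtains the same minimizer without a parameter sweep.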
The approaches have both been investigated for a simple test problem (see Fig. 4, left) and for large eigenvalue computations with GENE (see Fig. 4, right). The combination approximations (though computed serially here) can usually be obtained faster than the full grid approximations. Note that the run-times here have been obtained with a prototypical implementation before the development of the scalable algorithms described in Sect. 3. For large problems, the combination approximation can be expected to be significantly faster still, as the combination technique exhibits a better parallel scalability than the full grid solution. For further details, see [34, 36].
5.2 Iterative Refinement and Iterative Methods
Besides the adaptation of the combination coefficients, the combination technique for eigenvalue problems can also be improved by refining the \(u(\gamma )\) iteratively. Based on the iterative refinement procedure introduced by Wilkinson [50], the approximation of the eigenvalue \(\lambda _{I}\) and the corresponding eigenvector \(u_{I}\) can be improved towards \(\lambda\) and \(u\) with corrections \(\varDelta \lambda\) and \(\varDelta u\) by
$$\displaystyle \lambda =\lambda _{I} +\varDelta \lambda,\qquad u = u_{I} +\varDelta u.$$
Inserting this into \(0 = Lu -\lambda u\), one obtains the corrections by solving
where the quadratic term \(\varDelta \lambda \varDelta u\) is neglected. This system is underdetermined. An additional scaling condition \(\langle s^{{\ast}},\varDelta u\rangle = 0\) with \(s \in V\) ensures that the correction \(\varDelta u\) does not change the magnitude of \(u_{I}\). Solving the linear system
we obtain the corrections \(\varDelta \lambda\) and \(\varDelta u\). The linear operator \(L\) has a large rank, and its inversion is generally infeasible in high-dimensional settings. Nevertheless, computing a single matrix-vector product \(Lu_{I}\) is feasible, so the right-hand side is easily evaluated. Within the framework of the combination technique, the corrections \(\varDelta u\) and \(\varDelta \lambda\) are computed on each subspace \(V (\gamma )\). To this end, the residual \(r = Lu -\lambda u\) and the initial combination approximation \(u_{I}\) are projected onto \(V (\gamma )\) using suitable prolongation operators [18]. The corrections \(\varDelta u(\gamma )\) and \(\varDelta \lambda (\gamma )\) are then computed on each subspace \(V (\gamma )\) by solving
Here, the significantly smaller rank of \(L(\gamma )\) allows the solution of the linear system with feasible effort. The corrections from each subspace \(V (\gamma )\) are then combined using the standard combination coefficients \(c_{\gamma }\) by
After adding the corrections to \(u_{I}\) and \(\lambda _{I}\), the process can be repeated until the corrections \(\varDelta \lambda _{I}\) and \(\varDelta u_{I}\) become negligible.
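To make one refinement step concrete, the sketch below solves the bordered linear system implied by the derivation above, namely \((L - \lambda I)\varDelta u - \varDelta\lambda\, u = -r\) together with the scaling condition \(\langle s, \varDelta u\rangle = 0\), for a small dense real matrix. In the combination technique this solve would instead be carried out per subspace \(V(\gamma)\) with the much smaller \(L(\gamma)\). The function name and the default choice \(s = u\) are our own assumptions.

```python
import numpy as np

def refine_eigenpair(L, lam, u, s=None):
    """One Wilkinson-style refinement step for the eigenpair (lam, u) of L:
    solve [[L - lam*I, -u], [s^T, 0]] [du; dlam] = [-(L u - lam u); 0]."""
    n = L.shape[0]
    if s is None:
        s = u                                  # scaling vector, <s, du> = 0
    r = L @ u - lam * u                        # residual of the current pair
    M = np.block([[L - lam * np.eye(n), -u[:, None]],
                  [s[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(M, np.append(-r, 0.0))
    return u + sol[:n], lam + sol[n]           # corrected eigenpair
```

Each step is a Newton step for the eigenvalue equation, so for a simple eigenvalue the error of the pair is roughly squared per iteration.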
Instead of using the standard combination coefficients \(c_{\gamma }\), we can also adapt the combination coefficients in order to minimize the residual \(r\). The minimizer
is then the best combination of the corrections. Both approaches have been tested for the Poisson problem as well as GENE simulations. For details see [34].
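A minimal sketch of such residual-minimizing coefficients follows. The normalization \(\sum_\gamma c_\gamma = 1\) is our simplifying assumption to exclude the trivial minimizer \(c = 0\); the actual opticom formulation in [34] may use a different constraint.

```python
import numpy as np

def opticom_coefficients(residuals):
    """Coefficients c minimizing || sum_g c_g r_g ||_2 subject to sum_g c_g = 1.
    `residuals` is an (n, k) array whose columns are the subspace residuals r_g."""
    R = np.asarray(residuals)
    k = R.shape[1]
    G = R.T @ R                                  # Gram matrix of the residuals
    # KKT system of: minimize c^T G c  subject to  1^T c = 1
    KKT = np.block([[2.0 * G, np.ones((k, 1))],
                    [np.ones((1, k)), np.zeros((1, 1))]])
    sol = np.linalg.solve(KKT, np.append(np.zeros(k), 1.0))
    return sol[:k]                               # drop the Lagrange multiplier
```

For two orthogonal residuals of equal norm the minimizer is \(c = (1/2, 1/2)\), i.e. the contributions are averaged; residuals that nearly cancel receive correspondingly larger weights.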
6 Conclusions
Early work on the combination technique showed that it yields a suitable method for the solution of simple boundary value problems on computing clusters. The work presented here demonstrated that, when combined with strongly scalable solvers for the component grids, it leads to an approach which is suitable for exascale architectures. This was investigated for the plasma physics code GENE, which was used to solve initial value problems, eigenvalue problems, and stationary solutions. In addition to the two levels of parallelism exhibited by the combination technique, the flexibility in the choice of the combination coefficients led to a new approach to algorithm-based fault tolerance which further enhanced the scalability of the method.
References
1. Ali, M.M., Southern, J., Strazdins, P.E., Harding, B.: Application level fault recovery: Using fault-tolerant Open MPI in a PDE solver. In: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, Phoenix, 19–23 May 2014, pp. 1169–1178. IEEE (2014)
2. Ali, M.M., Strazdins, P.E., Harding, B., Hegland, M., Larson, J.W.: A fault-tolerant gyrokinetic plasma application using the sparse grid combination technique. In: Proceedings of the 2015 International Conference on High Performance Computing & Simulation (HPCS 2015), pp. 499–507. IEEE, Amsterdam (2015). Outstanding paper award
3. Benk, J., Bungartz, H.J., Nagy, A.E., Schraufstetter, S.: Variants of the combination technique for multi-dimensional option pricing. In: Günther, M., Bartel, A., Brunk, M., Schöps, S., Striebel, M. (eds.) Progress in Industrial Mathematics at ECMI 2010, pp. 231–237. Springer, Berlin/Heidelberg (2010)
4. Benk, J., Pflüger, D.: Hybrid parallel solutions of the Black-Scholes PDE with the truncated combination technique. In: Smari, W.W., Zeljkovic, V. (eds.) 2012 International Conference on High Performance Computing & Simulation, HPCS 2012, Madrid, 2–6 July 2012, pp. 678–683. IEEE (2012)
5. Bungartz, H.J., Griebel, M., Rüde, U.: Extrapolation, combination, and sparse grid techniques for elliptic boundary value problems. Comput. Methods Appl. Mech. Eng. 116 (1–4), 243–252 (1994)
6. Das, S., Neumaier, A.: Solving overdetermined eigenvalue problems. SIAM J. Sci. Comput. 35 (2), 541–560 (2013)
7. Fang, Y.: One dimensional combination technique and its implementation. ANZIAM J. Electron. Suppl. 52 (C), C644–C660 (2010)
8. Franz, S., Liu, F., Roos, H.G., Stynes, M., Zhou, A.: The combination technique for a two-dimensional convection-diffusion problem with exponential layers. Appl. Math. 54 (3), 203–223 (2009)
9. Garcke, J.: Regression with the optimised combination technique. In: Proceedings of the 23rd International Conference on Machine Learning (ICML 2006), vol. 2006, pp. 321–328. ACM, New York (2006)
10. Garcke, J.: A dimension adaptive sparse grid combination technique for machine learning. ANZIAM J. 48 (C), C725–C740 (2007)
11. Garcke, J., Griebel, M.: On the computation of the eigenproblems of hydrogen and helium in strong magnetic and electric fields with the sparse grid combination technique. J. Comput. Phys. 165 (2), 694–716 (2000)
12. Garcke, J., Hegland, M.: Fitting multidimensional data using gradient penalties and combination techniques. In: Modeling, Simulation and Optimization of Complex Processes, pp. 235–248. Springer, Berlin (2008)
13. Garcke, J., Hegland, M.: Fitting multidimensional data using gradient penalties and the sparse grid combination technique. Computing 84 (1–2), 1–25 (2009)
14. Gene Development Team: GENE. http://www.genecode.org/
15. Griebel, M.: The combination technique for the sparse grid solution of PDEs on multiprocessor machines. Parallel Process. Lett. 2, 61–70 (1992)
16. Griebel, M., Harbrecht, H.: On the convergence of the combination technique. Lect. Notes Comput. Sci. Eng. 97, 55–74 (2014)
17. Griebel, M., Thurner, V.: The efficient solution of fluid dynamics problems by the combination technique. Int. J. Numer. Methods Heat Fluid Flow 5 (3), 251–269 (1995)
18. Griebel, M., Huber, W., Rüde, U., Störtkuhl, T.: The combination technique for parallel sparse-grid-preconditioning or -solution of PDEs on workstation networks. In: Bougé, L., Cosnard, M., Robert, Y., Trystram, D. (eds.) Parallel Processing: CONPAR 92 – VAPP V. Lecture Notes in Computer Science, vol. 634, pp. 217–228. Springer, Berlin/Heidelberg/London (1992). Proceedings of the Second Joint International Conference on Vector and Parallel Processing, Lyon, 1–4 Sept 1992
19. Griebel, M., Schneider, M., Zenger, C.: A combination technique for the solution of sparse grid problems. In: Iterative Methods in Linear Algebra (Brussels, 1991), pp. 263–281. North-Holland, Amsterdam (1992)
20. Harding, B.: Adaptive sparse grids and extrapolation techniques. In: Proceedings of Sparse Grids and Applications 2014. Lecture Notes in Computational Science and Engineering, vol. 109, pp. 79–102. Springer, New York (2015)
21. Harding, B.: Combination technique coefficients via error splittings. ANZIAM J. 56, C355–C368 (2016). (Online)
22. Harding, B.: Fault tolerant computation of hyperbolic PDEs with the sparse grid combination technique. Ph.D. thesis, The Australian National University (2016)
23. Harding, B., Hegland, M.: A robust combination technique. ANZIAM J. Electron. Suppl. 54 (C), C394–C411 (2012)
24. Harding, B., Hegland, M.: A parallel fault tolerant combination technique. Adv. Parallel Comput. 25, 584–592 (2014)
25. Harding, B., Hegland, M.: Robust solutions to PDEs with multiple grids. In: Garcke, J., Pflüger, D. (eds.) Sparse Grids and Applications, Munich 2012. Lecture Notes in Computational Science and Engineering, vol. 97, pp. 171–193. Springer, Cham (2014)
26. Harding, B., Hegland, M., Larson, J.W., Southern, J.: Fault tolerant computation with the sparse grid combination technique. SIAM J. Sci. Comput. 37 (3), C331–C353 (2015)
27. Heene, M., Pflüger, D.: Scalable algorithms for the solution of higher-dimensional PDEs. In: Proceedings of SPPEXA Symposium 2016. Lecture Notes in Computational Science and Engineering. Springer, Berlin/Heidelberg (2016)
28. Heene, M., Kowitz, C., Pflüger, D.: Load balancing for massively parallel computations with the sparse grid combination technique. In: Bader, M., Bungartz, H.J., Bode, A., Gerndt, M., Joubert, G.R. (eds.) Parallel Computing: Accelerating Computational Science and Engineering (CSE), pp. 574–583. IOS Press, Amsterdam (2014)
29. Hegland, M.: Adaptive sparse grids. In: Burrage, K., Sidje, R.B. (eds.) Proceedings of 10th Computational Techniques and Applications Conference CTAC-2001, vol. 44, pp. C335–C353 (2003)
30. Hegland, M., Garcke, J., Challis, V.: The combination technique and some generalisations. Linear Algebra Appl. 420 (2–3), 249–275 (2007)
31. Hupp, P., Jacob, R., Heene, M., Pflüger, D., Hegland, M.: Global communication schemes for the sparse grid combination technique. Adv. Parallel Comput. 25, 564–573 (2014)
32. Hupp, P., Heene, M., Jacob, R., Pflüger, D.: Global communication schemes for the numerical solution of high-dimensional PDEs. Parallel Comput. 52, 78–105 (2016)
33. Jennings, L.S., Osborne, M.: Generalized eigenvalue problems for rectangular matrices. IMA J. Appl. Math. 20 (4), 443–458 (1977)
34. Kowitz, C.: Applying the sparse grid combination technique in linear gyrokinetics. Ph.D. thesis, Technische Universität München (2016)
35. Kowitz, C., Hegland, M.: The sparse grid combination technique for computing eigenvalues in linear gyrokinetics. Procedia Comput. Sci. 18, 449–458 (2013). 2013 International Conference on Computational Science
36. Kowitz, C., Hegland, M.: An opticom method for computing eigenpairs. In: Garcke, J., Pflüger, D. (eds.) Sparse Grids and Applications, Munich 2012 SE – 10. Lecture Notes in Computational Science and Engineering, vol. 97, pp. 239–253. Springer, Cham (2014)
37. Kowitz, C., Pflüger, D., Jenko, F., Hegland, M.: The combination technique for the initial value problem in linear gyrokinetics. Lecture Notes in Computational Science and Engineering, vol. 88, pp. 205–222. Springer, Heidelberg (2013)
38. Larson, J.W., Hegland, M., Harding, B., Roberts, S., Stals, L., Rendell, A., Strazdins, P., Ali, M.M., Kowitz, C., Nobes, R., Southern, J., Wilson, N., Li, M., Oishi, Y.: Fault-tolerant grid-based solvers: Combining concepts from sparse grids and mapreduce. Procedia Comput. Sci. 18, 130–139 (2013). 2013 International Conference on Computational Science
39. Larson, J.W., Strazdins, P.E., Hegland, M., Harding, B., Roberts, S.G., Stals, L., Rendell, A.P., Ali, M.M., Southern, J.: Managing complexity in the parallel sparse grid combination technique. In: Bader, M., Bode, A., Bungartz, H.J., Gerndt, M., Joubert, G.R., Peters, F.J. (eds.) PARCO. Advances in Parallel Computing, vol. 25, pp. 593–602. IOS Press, Amsterdam (2013)
40. Larson, J., Strazdins, P., Hegland, M., Harding, B., Roberts, S., Stals, L., Rendell, A., Ali, M., Southern, J.: Managing complexity in the parallel sparse grid combination technique. Adv. Parallel Comput. 25, 593–602 (2014)
41. Lastdrager, B., Koren, B., Verwer, J.: The sparse-grid combination technique applied to time-dependent advection problems. In: Multigrid Methods, VI (Gent, 1999). Lecture Notes in Computational Science and Engineering, vol. 14, pp. 143–149. Springer, Berlin (2000)
42. Osborne, M.R.: A new method for the solution of eigenvalue problems. Comput. J. 7 (3), 228–232 (1964)
43. Parra Hinojosa, A., Kowitz, C., Heene, M., Pflüger, D., Bungartz, H.J.: Towards a fault-tolerant, scalable implementation of GENE. In: Recent Trends in Computational Engineering – CE2014. Lecture Notes in Computational Science and Engineering, vol. 105, pp. 47–65. Springer, Cham (2015)
44. Pflaum, C.: Convergence of the combination technique for second-order elliptic differential equations. SIAM J. Numer. Anal. 34 (6), 2431–2455 (1997)
45. Pflaum, C., Zhou, A.: Error analysis of the combination technique. Numer. Math. 84 (2), 327–350 (1999)
46. Pflüger, D., Bungartz, H.J., Griebel, M., Jenko, F., Dannert, T., Heene, M., Kowitz, C., Parra Hinojosa, A., Zaspel, P.: EXAHD: An exa-scalable two-level sparse grid approach for higher-dimensional problems in plasma physics and beyond. In: Euro-Par 2014: Parallel Processing Workshops, Porto. Lecture Notes in Computer Science, vol. 8806, pp. 565–576. Springer, Cham (2014)
47. Reisinger, C.: Analysis of linear difference schemes in the sparse grid combination technique. IMA J. Numer. Anal. 33 (2), 544–581 (2013)
48. Strazdins, P.E., Ali, M.M., Harding, B.: Highly scalable algorithms for the sparse grid combination technique. In: IPDPS Workshops, Hyderabad, pp. 941–950. IEEE (2015)
49. Strazdins, P.E., Ali, M.M., Harding, B.: The design and analysis of two highly scalable sparse grid combination algorithms (2015, under review)
50. Wilkinson, J.H.: Rounding Errors in Algebraic Processes. Her Majesty’s Stationery Office, London (1963)
51. Wong, M., Hegland, M.: Maximum a posteriori density estimation and the sparse grid combination technique. ANZIAM J. Electron. Suppl. 54 (C), C508–C522 (2012)
52. Wong, M., Hegland, M.: Opticom and the iterative combination technique for convex minimisation. Lect. Notes Comput. Sci. Eng. 97, 317–336 (2014)
53. Zenger, C.: Sparse grids. In: Hackbusch, W. (ed.) Parallel Algorithms for Partial Differential Equations. Notes on Numerical Fluid Mechanics, vol. 31, pp. 241–251. Vieweg, Braunschweig (1991)
Acknowledgements
The work presented here reviews some results of a German-Australian collaboration spanning several years, which was supported by grants from the German DFG (SPP-1648 SPPEXA: EXAHD) and the Australian ARC (LP110200410) as well as contributions by Fujitsu Laboratories of Europe (FLE), and which involved researchers from the ANU, FLE, TUM, and the Universities of Stuttgart and Bonn. Contributors to this research included Stephen Roberts, Jay Larson, Mohsin Ali, Ross Nobes, James Southern, Nick Wilson, Hans-Joachim Bungartz, Valeriy Khakhutskyy, Alfredo Parra Hinojosa, Mario Heene, Michael Griebel, Jochen Garcke, Riko Jacob, Philipp Hupp, Yuan Fang, Matthias Wong, Vivien Challis and several others.
Hegland, M., Harding, B., Kowitz, C., Pflüger, D., Strazdins, P. (2016). Recent Developments in the Theory and Application of the Sparse Grid Combination Technique. In: Bungartz, HJ., Neumann, P., Nagel, W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-40528-5_7