Abstract
With the advent of very large scale parallel computers with millions of processing cores, it has become important to also use the time direction for parallelization. Successful methods in this direction include the parareal algorithm, the paraexp algorithm, PFASST, and waveform relaxation methods of Schwarz, Dirichlet-Neumann or Neumann type. We present here a mathematical analysis of a further method to parallelize in time, proposed by Maday and Rønquist in 2007. It is based on the diagonalization of the time stepping matrix. Like many time domain parallelization methods, this seems at first not very promising, since this matrix is essentially triangular, and for a fixed time step even a Jordan block, and thus not diagonalizable. If one however chooses different time steps, diagonalization is possible, and one has to trade off between the larger discretization error due to necessarily having different time steps, and the numerical errors in the diagonalization process of these nearly non-diagonalizable matrices. We study this trade-off mathematically and propose an optimization strategy for the choice of the parameters, for a Backward Euler discretization of the heat equation in two dimensions.
Keywords
- Backward Euler Discretization
- Waveform Relaxation Method
- Large-scale Parallel Computers
- Point Finite Difference Method
- Scalar Model Problem
1 Introduction
Using the time direction in evolution problems for parallelization is an active field of research. Most of these methods are iterative; see for example the parareal algorithm analyzed in [3], a variant that became known under the name PFASST [10], waveform relaxation methods based on domain decomposition [4, 5], and [1] for a method called RIDC. Direct time parallel solvers are much rarer, see for example [2]. We present here a mathematical analysis of a different direct time parallel solver, proposed in [9]. We consider as our model partial differential equation (PDE) the heat equation on a rectangular domain Ω,
Using a Backward Euler discretization on the time mesh \(0 = t_{0} < t_{1} < t_{2} <\ldots < t_{N} = T\), \(k_{n} = t_{n} - t_{n-1}\), and a finite difference approximation \(\Delta_{h}\) of Δ over a rectangular grid with \(J = J_{1}J_{2}\) points, we obtain the discrete problem
Let \(I_{t}\) be the N × N identity matrix associated with the time domain and \(I_{x}\) be the J × J identity matrix associated with the spatial domain. Setting \(\mathbf{u}:= (\mathbf{u}^{1},\ldots,\mathbf{u}^{N})\), \(\mathbf{f}:= (\mathbf{f}^{1} + \frac{1} {k_{1}} \mathbf{u}_{0},\mathbf{f}^{2},\ldots,\mathbf{f}^{N})\) and using the Kronecker product, (2) becomes
If B is diagonalizable, \(B = SDS^{-1}\), then (3) can be solved in three steps:
\[
\begin{array}{ll}
(a) & \text{solve } (S\otimes I_{x})\,\mathbf{g} = \mathbf{f},\\[2pt]
(b) & \text{solve } \left(\tfrac{1}{k_{n}}I_{x} - \Delta_{h}\right)\mathbf{w}^{n} = \mathbf{g}^{n},\quad n = 1,\ldots,N,\\[2pt]
(c) & \text{set } \mathbf{u} = (S\otimes I_{x})\,\mathbf{w}.
\end{array}
\qquad(4)
\]
The N equations in space in step (b) can now be solved in parallel. This interesting idea comes from [9], but its application requires some care: first, B is only diagonalizable if the time steps are all different, which leads to a larger discretization error compared to using equidistant time steps, as we will see. Second, the condition number of S grows exponentially with N, which leads to inaccurate results in steps (a) and (c) because of roundoff error. We estimate these two errors accurately, and then determine, for a given user tolerance, the maximum N and the optimal time step sequence of the form \(k_{n+1} = qk_{n}\) which guarantee that both errors stay below the tolerance.
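The three steps can be sketched for the scalar model problem; the following is a minimal illustration, in which the variable names, the parameter values, and the use of `numpy.linalg.eig` are our assumptions, not the authors' implementation:

```python
import numpy as np

# Sketch of the direct solver (4) for u' = -a*u, u(0) = u0, with
# Backward Euler on geometric time steps k_n = q^{n-1} k_1 summing to T.
a, T, N, q, u0 = 1.0, 1.0, 10, 1.2, 1.0
k1 = T * (q - 1) / (q**N - 1)          # fixes k_1 from the constraint
k = k1 * q**np.arange(N)

# Time-stepping matrix B: 1/k_n on the diagonal, -1/k_n below it
B = np.diag(1.0 / k) + np.diag(-1.0 / k[1:], -1)
f = np.zeros(N)
f[0] = u0 / k[0]                       # right-hand side f^1 + u_0/k_1

# (B + a I) u = f, diagonalized via B = S D S^{-1}
d, S = np.linalg.eig(B)
g = np.linalg.solve(S, f)              # step (a): solve S g = f
w = g / (d + a)                        # step (b): N independent solves
u = (S @ w).real                       # step (c): u = S w

# Reference: sequential Backward Euler
useq, v = [], u0
for kn in k:
    v = v / (1.0 + a * kn)
    useq.append(v)
assert np.allclose(u, useq, rtol=1e-8)
```

For this small N and a q well away from 1, the diagonalized solve agrees with the sequential recursion to high accuracy; the trade-off analyzed below appears when q approaches 1.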
2 Error Estimate for Variable Time-Steps
We start by studying for a > 0 the ordinary differential equation (ODE)
For Backward Euler, \(u_{n} = (1 + ak_{n})^{-1}\,u_{n-1}\), with time steps from the division \(\mathcal{T} = (k_{1},\mathop{\ldots },k_{N})\) satisfying \(T =\sum _{n=1}^{N}k_{n}\), we define the error propagator by
such that the error at time T equals \(Err(\mathcal{T};a,T,N)u^{0}\). We also define the equidistant division \(\overline{\mathcal{T}}:= (\overline{k},\mathop{\ldots },\overline{k})\), where \(\overline{k} = T/N\).
Theorem 1 (Equidistant Partition Minimizes Error)
For any a,T and N, and any division \(\mathcal{T}\) , the error propagator is positive, and for the equidistant division \(\overline{\mathcal{T}}\) , the error is globally minimized.
Proof
Rewriting \(Err(\mathcal{T};a,T,N) =\prod \limits _{ n=1}^{N}(1 + ak_{n})^{-1} -\prod \limits _{n=1}^{N}(e^{ak_{n}})^{-1}\), we see that the error propagator is positive, since \(e^{x} > 1 + x\) for all positive x. To minimize the error, we thus have to minimize \(\varPhi (\mathcal{T} ):=\prod \limits _{ n=1}^{N}(1 + ak_{n})^{-1}\) as a function of \(\mathcal{T} \in \mathbb{R}^{N}\), with N inequality constraints \(k_{n} \geq 0\), and one equality constraint \(\sum _{n=1}^{N}k_{n} = T\). We compute the derivatives
and to show that Φ is convex, we evaluate for an arbitrary vector \(\mathbf{x} = (x_{1},\ldots,x_{N})\)
Therefore the Kuhn–Tucker theorem applies, and the unique minimum is characterized by the existence of a Lagrange multiplier p with \(\varPhi '(\mathcal{T} ) + p{\boldsymbol 1} = 0\), where \({\boldsymbol 1}\) is the vector of all ones; the only solution is \(\mathcal{T} = \overline{\mathcal{T}}\), \(p = a(1 + a\overline{k})^{-N-1}\).
We now consider a division \(\mathcal{T}_{q}\) with geometric time steps \(k_{n}:= q^{n-1}k_{1}\) for \(n = 1,\ldots,N\), as suggested in [8]. The constraint \(\sum _{n=1}^{N}k_{n} =\sum _{ n=1}^{N}q^{n-1}k_{1} = T\) fixes \(k_{1}\), and using this we get
Since according to Theorem 1 the error is minimized for q = 1, one should not choose q very different from 1, and we now study the case \(q = 1+\varepsilon\) asymptotically.
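Theorem 1 can be checked numerically; the sketch below (with an illustrative setup that is our assumption, not taken from the paper) compares the equidistant error propagator against random divisions of the interval:

```python
import numpy as np

# Numerical check of Theorem 1: the error propagator is positive, and
# the equidistant division gives the smallest error.
def err(ks, a):
    # Err(T; a, T, N) = prod (1 + a k_n)^{-1} - exp(-a T), T = sum k_n
    return np.prod(1.0 / (1.0 + a * ks)) - np.exp(-a * np.sum(ks))

a, T, N = 1.0, 1.0, 10
e_eq = err(np.full(N, T / N), a)       # equidistant division
assert e_eq > 0                        # propagator is positive

rng = np.random.default_rng(0)
for _ in range(200):
    ks = rng.dirichlet(np.ones(N)) * T # random division summing to T
    assert err(ks, a) >= e_eq          # equidistant is the minimizer
```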
Theorem 2 (Asymptotic Truncation Error Estimate)
Let \(u_{N}(q):=\varPhi (\mathcal{T}_{q})u_{0}\) be the approximate solution obtained with the division \(\mathcal{T}_{q}\) for \(q = 1+\varepsilon\) . Then, for fixed a,T and N, the difference between the geometric mesh and fixed step mesh approximations satisfies for \(\varepsilon\) small
Proof
Using a second order Taylor expansion, we obtain in the following two lemmas an expansion of \(\varPhi (\mathcal{T}_{1+\varepsilon })\) for small \(\varepsilon\).
Lemma 1
The time step \(k_{n}\) in (7) has for \(\varepsilon\) small the expansion \(k_{n} = \overline{k}(1 +\alpha _{n}\varepsilon +\beta _{n}\varepsilon ^{2} + o(\varepsilon ^{2}))\) , with \(\alpha _{n} = n -\frac{N+1} {2}\) and \(\beta _{n} = n(n - N - 2) + \frac{(N+1)(N+5)} {6}\) . These coefficients satisfy the relations \(\sum _{n}\alpha _{n} =\sum _{n}\beta _{n} = 0\) , \(\sum _{n}\alpha _{n}^{2} = \frac{N(N-1)(N+1)} {12}\) .
Lemma 2
For \(\varepsilon\) small, we have the expansion
We can now apply Lemma 2 to obtain \(\varPhi (\mathcal{T}_{1+\varepsilon }) =\varPhi (\mathcal{T}_{1})(1 + \dfrac{b^{2}} {2} \sum _{n=1}^{N}\alpha _{n}^{2}\varepsilon ^{2} + o(\varepsilon ^{2}))\), and replacing Φ in the definition of u N concludes the proof.
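The quadratic dependence on \(\varepsilon\) predicted by Theorem 2 can be observed directly: halving \(\varepsilon\) should reduce the difference to the equidistant solution by roughly a factor of four. The parameter values in this sketch are illustrative assumptions:

```python
import numpy as np

# Check that |u_N(1+eps) - u_N(1)| = O(eps^2) as in Theorem 2.
def uN(q, a=1.0, T=1.0, N=20, u0=1.0):
    # Backward Euler solution with geometric steps k_n = q^{n-1} k_1
    k1 = T * (q - 1) / (q**N - 1) if q != 1 else T / N
    k = k1 * q**np.arange(N)
    return u0 * np.prod(1.0 / (1.0 + a * k))

d1 = abs(uN(1.004) - uN(1.0))          # eps = 0.004
d2 = abs(uN(1.002) - uN(1.0))          # eps = 0.002
ratio = d1 / d2                        # should be close to 4
assert 3.5 < ratio < 4.5
```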
3 Error Estimate for the Diagonalization of B
The matrix B is diagonalizable if and only if all time steps k n are different. The eigenvalues are then \(\frac{1} {k_{n}}\), and the eigenvectors form a basis of \(\mathbb{R}^{N}\). We will see below that the matrix of eigenvectors is lower triangular. It can be chosen with unit diagonal, in which case it belongs to a special class of Toeplitz matrices:
Definition 1
A unipotent lower triangular Toeplitz matrix of size N is of the form
\[
T(p_{1},\ldots,p_{N-1}) = \begin{pmatrix}
1 & & & \\
p_{1} & 1 & & \\
\vdots & \ddots & \ddots & \\
p_{N-1} & \cdots & p_{1} & 1
\end{pmatrix}.
\]
Theorem 3 (Eigendecomposition of B)
If \(k_{n} = q^{n-1}k_{1}\) as in (7), then B has the eigendecomposition \(B = V DV ^{-1}\) , with \(D:= \mbox{ diag}( \frac{1} {k_{n}})\) , and V and its inverse are unipotent lower triangular Toeplitz matrices given by
Proof
Let \(\mathbf{v}^{(n)}\) be the eigenvector with eigenvalue \(\frac{1} {k_{n}}\). Since B is a lower bidiagonal matrix, a simple recursive argument shows that \(v_{j}^{(n)} = 0\) for j < n. One may choose \(v_{n}^{(n)} = 1\), which implies that for j > n we have \(v_{j}^{(n)} = (\prod _{i=1}^{j-n}(1 -\frac{k_{n+i}} {k_{n}} ))^{-1}\), and the matrix \(V = (\mathbf{v}^{(1)},\ldots,\mathbf{v}^{(N)})\) is lower triangular with unit diagonal. Furthermore, if \(k_{n} = q^{n-1}k_{1}\), we obtain for \(j = 1,2,\ldots,N - n\) that \(v_{n+j}^{(n)} = (\prod _{i=1}^{j}(1 - q^{i}))^{-1}\), which is independent of n and thus proves the Toeplitz structure in (10).
Consider now the inverse of V. First, it is easy to see that it is also unipotent Toeplitz. Establishing (11) is then equivalent to proving that
This result can be obtained using the q-analogue of the binomial formula, see [7]:
Theorem 4 (Simplified q-Binomial Theorem)
For any q > 0, q ≠ 1, and for any \(n \in \mathbb{N}\) ,
Multiplying (13) by \(p_{n}\) then leads to (12).
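The eigendecomposition of Theorem 3 can be verified numerically. In the sketch below, the coefficients \(r_{j} = (-1)^{j}q^{j(j-1)/2}p_{j}\) for the inverse are our reading of (11) via the q-binomial theorem, stated as an assumption; the parameter values are illustrative:

```python
import numpy as np

# Build V per Theorem 3 for geometric steps and check B V = V D.
N, q, k1 = 8, 1.3, 0.01
k = k1 * q**np.arange(N)
B = np.diag(1.0 / k) + np.diag(-1.0 / k[1:], -1)

# Toeplitz coefficients p_j = prod_{i=1}^{j} (1 - q^i)^{-1}
p = np.cumprod(1.0 / (1.0 - q**np.arange(1, N)))
V = np.eye(N)
for j in range(1, N):
    V += np.diag(np.full(N - j, p[j - 1]), -j)

D = np.diag(1.0 / k)
assert np.allclose(B @ V, V @ D)       # eigendecomposition holds

# Conjectured inverse coefficients (assumption consistent with the
# q-binomial theorem): r_j = (-1)^j q^{j(j-1)/2} p_j
r = np.array([(-1)**j * q**(j * (j - 1) / 2) * p[j - 1]
              for j in range(1, N)])
W = np.eye(N)
for j in range(1, N):
    W += np.diag(np.full(N - j, r[j - 1]), -j)
assert np.allclose(V @ W, np.eye(N))   # W is indeed V^{-1}
```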
In steps (a) and (c) of the direct time parallel solver (4), the condition number of the eigenvector matrix S has a strong influence on the accuracy of the results. Normalizing the eigenvectors with respect to the \(\ell_{2}\) norm, \(S:= V \tilde{D}\), with \(\tilde{D} = \mathrm{diag}( \frac{1} {\sqrt{1+\sum _{i=1 }^{N-n }\vert p_{i } \vert ^{2}}} )\), leads to an asymptotically better condition number:
Theorem 5 (Asymptotic Condition Number Estimate)
For \(q = 1+\varepsilon\) , we have
Proof
Note first that \(\vert q_{n}\vert \sim \vert p_{n}\vert \sim (n!\,\varepsilon ^{n})^{-1}\). Therefore
The same holds for V −1, and gives the first result. We next define \(\gamma _{n}:= \sqrt{1 +\sum _{ j=1 }^{N-n }\vert p_{j } \vert ^{2}}\), \(\tilde{d}_{n}:= \frac{1} {\gamma _{n}}\), which implies \(\tilde{D} = \mathrm{diag}(\tilde{d}_{n})\). Then \(\gamma _{n} \sim \vert p_{N-n}\vert \), and we obtain
By definition \(S^{-1} =\tilde{ D}^{-1}V ^{-1} =\tilde{ D}^{-1}T(q_{1},\cdots \,,q_{N-1})\), that is, row n of \(T(q_{1},\cdots \,,q_{N-1})\) is multiplied by \(\gamma _{n}\). Therefore
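The benefit of the \(\ell_{2}\) normalization can be observed numerically. In this sketch (parameters are illustrative assumptions), S is formed by normalizing the columns of V:

```python
import numpy as np

# Compare the condition numbers of V and of the column-normalized
# S = V D~ for q = 1 + eps.
def toeplitz_V(N, q):
    # V from Theorem 3: unit diagonal, p_j = prod_{i<=j} (1 - q^i)^{-1}
    p = np.cumprod(1.0 / (1.0 - q**np.arange(1, N)))
    V = np.eye(N)
    for j in range(1, N):
        V += np.diag(np.full(N - j, p[j - 1]), -j)
    return V

N = 8
for eps in (0.1, 0.05):
    V = toeplitz_V(N, 1.0 + eps)
    S = V / np.linalg.norm(V, axis=0)  # normalize each column in l2
    condV = np.linalg.cond(V, np.inf)
    condS = np.linalg.cond(S, np.inf)
    assert condS < condV               # normalization pays off
```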
4 Relative Error Estimates for ODEs and PDEs
We first give an error estimate for the ODE (5):
Theorem 6 (Asymptotic Roundoff Error Estimate for ODEs)
Let \(\mathbf{u}\) be the exact solution of \(B\mathbf{u} =\mathbf{ f}\), let \(\hat{\mathbf{u}}\) be the computed solution from the direct time parallel solver (4) applied to (5), and let u denote the machine precision. Then
Proof
In the ODE case, \(D = \mathrm{diag}( \frac{1} {k_{n}} + a)\) and \(\|D\|_{\infty } = 1/k_{1} + a\). Using backward error analysis [6], the computed solutions of the three steps satisfy the perturbed systems
and since S and S −1 are triangular and D is diagonal we get (see [6])
Using Algorithm (4) to solve \(B\mathbf{u} =\mathbf{ f}\) by decomposition is equivalent to solving \((S +\delta S_{1})(D +\delta D)(S^{-1} +\delta S_{2})\ \hat{\mathbf{u}} =\mathbf{ f}\), which is of the form
The relative error then satisfies (see [6])
By a direct computation, we obtain for the inverse of B
and hence \(\|B^{-1}\|_{\infty } = k_{1}(1 + q + \mathop{\ldots } + q^{N-1}) \sim Nk_{1}\). Since \(k_{1} \sim \overline{k} = T/N\), (16) is proved.
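The norm of \(B^{-1}\) used in the proof can be checked directly; note that by the constraint on the time steps, \(k_{1}(1 + q +\ldots +q^{N-1})\) equals exactly \(\sum_{n}k_{n} = T\). The sketch below uses our reconstruction of the bidiagonal entries of B and illustrative parameters:

```python
import numpy as np

# Check ||B^{-1}||_inf = k_1 (1 + q + ... + q^{N-1}) = T: the entries
# of B^{-1} are (B^{-1})_{nj} = k_j for j <= n, so row N sums to T.
T, N, q = 1.0, 10, 1.1
k1 = T * (q - 1) / (q**N - 1)
k = k1 * q**np.arange(N)
B = np.diag(1.0 / k) + np.diag(-1.0 / k[1:], -1)

Binv = np.linalg.inv(B)
norm_inf = np.abs(Binv).sum(axis=1).max()   # infinity norm of B^{-1}
assert np.isclose(norm_inf, T)
```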
The error of the direct time parallel solver at time T can be estimated by
The first term on the right is the truncation error of the sequential method using equidistant time steps. The second term is due to the geometric mesh and was estimated asymptotically in Theorem 2 to be \(\alpha \varepsilon ^{2}\). The last term can be estimated by \(\|\mathbf{u}(q) -\hat{\mathbf{ u}}\|_{\infty }/\vert u_{0}\vert \) and thus by Theorem 6, since a > 0, which implies \(\vert u_{0}\vert =\|\mathbf{ u}\|_{\infty }\). Because the second term grows with \(\varepsilon\) and the last term decreases with \(\varepsilon\), we equilibrate them asymptotically:
Theorem 7 (Optimized Geometric Time Mesh)
Suppose the time steps are geometric, \(k_{n} = q^{n-1}k_{1}\) , and \(q = 1+\varepsilon\) with \(\varepsilon\) small. Let u be the machine precision. For \(\varepsilon =\varepsilon _{0}(aT,N)\) with
where α(aT,N) is defined in (8) and ϕ(N) in (15), the error due to time parallelization is asymptotically comparable to the one produced by the geometric mesh.
Proof
This is a direct consequence of Theorem 6.
We show in Fig. 1 on the left the optimized value \(\varepsilon _{0}(aT,N)\) from Theorem 7.
Choosing \(\varepsilon =\varepsilon _{0}(aT,N)\), the ratio of the additional error due to parallelization to the truncation error of the fixed time step method is shown in Fig. 1 on the right.
In order to obtain a PDE error estimate for the heat equation (1), one can argue as follows: expanding the solution in eigenfunctions of the Laplacian, we can apply our results for the ODE with \(a =\lambda _{\ell}\), where \(\lambda _{\ell}\), ℓ = 1, 2, …, are the eigenvalues of the negative Laplacian. One can show for all N ≥ 1 and machine precision u small enough (single precision suffices in practice) that our error estimate (17) attains its maximum for \(aT < a_{T}^{{\ast}} = 2.5129\). Thus if \(\lambda _{\min }\) and T are such that \(\lambda _{\min }T > a_{T}^{{\ast}}\), one can read off the optimal choice \(\varepsilon _{0}\) and the resulting error estimate in Fig. 1 at \(aT =\lambda _{\min }T\) for a given number of processors N. Similarly, if we have N processors and do not want to increase the error compared to a sequential computation by more than a given factor, we can read off in Fig. 1 on the right the size of the time window T to use (knowing \(a =\lambda _{\min }\)), and the corresponding optimized \(\varepsilon _{0}\) on the left.
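For the five-point discretization used in the experiments, \(\lambda_{\min}\) has the classical closed form, which one can use to check the condition \(\lambda_{\min}T > a_{T}^{\ast}\). The eigenvalue formula below is the standard one for the discrete Dirichlet Laplacian, stated here as background rather than taken from the paper:

```python
import numpy as np

# Smallest eigenvalue of the 5-point negative Laplacian on the unit
# square with Dirichlet conditions and h1 = h2 = h:
# lambda_{l,m} = (4/h^2)(sin^2(l*pi*h/2) + sin^2(m*pi*h/2)), min at l = m = 1.
h = 1.0 / 10
lam_min = (4.0 / h**2) * 2.0 * np.sin(np.pi * h / 2)**2

# The discrete value is close to the continuous eigenvalue 2*pi^2
assert abs(lam_min - 2 * np.pi**2) / (2 * np.pi**2) < 0.01

# With T = 1/5 as in the experiment of Sect. 5, lambda_min*T exceeds a_T*
assert lam_min * 0.2 > 2.5129
```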
5 Numerical Experiments
We first perform a numerical experiment for the scalar model problem (5) with a = 1, T = 1 and N = 10. We show in Fig. 2 on the left
how the discretization error increases and the parallelization error decreases as functions of \(\varepsilon\), together with our theoretical estimates, as well as the condition number of the eigenvector matrix and our theoretical bound. The theoretically predicted optimized \(\varepsilon\), marked by a rhombus, is a good estimate, and on the safe side of the true optimum, which lies slightly to the left where the dashed lines meet.
We next show an experiment for the heat equation in two dimensions on the unit square, with homogeneous Dirichlet boundary conditions and the initial condition \(u_{0}(x,y) =\sin (\pi x)\sin (\pi y)\). We discretize with a standard five-point finite difference method in space with mesh size \(h_{1} = h_{2} = \frac{1} {10}\), and a Backward Euler discretization in time on the time interval \((0, \frac{1} {5})\) using N = 30 time steps. In Fig. 2 on the right we show again the measured discretization and parallelization errors compared to our theoretical bounds. As the graph shows, in this example one could solve the problem using 30 processors and obtain an error within a factor of two of the sequential computation.
References
[1] A.J. Christlieb, C.B. Macdonald, B.W. Ong, Parallel high-order integrators. SIAM J. Sci. Comput. 32(2), 818–835 (2010)
[2] M.J. Gander, S. Güttel, Paraexp: a parallel integrator for linear initial-value problems. SIAM J. Sci. Comput. 35(2), C123–C142 (2013)
[3] M.J. Gander, E. Hairer, Nonlinear convergence analysis for the parareal algorithm, in Domain Decomposition Methods in Science and Engineering XVII, ed. by O.B. Widlund, D.E. Keyes, vol. 60 (Springer, Berlin, 2008), pp. 45–56
[4] M.J. Gander, L. Halpern, Absorbing boundary conditions for the wave equation and parallel computing. Math. Comput. 74, 153–176 (2005)
[5] M.J. Gander, L. Halpern, Optimized Schwarz waveform relaxation methods for advection reaction diffusion problems. SIAM J. Numer. Anal. 45(2), 666–697 (2007)
[6] G.H. Golub, C.F. Van Loan, Matrix Computations, 4th edn. Johns Hopkins Studies in the Mathematical Sciences (Johns Hopkins University Press, Baltimore, 2013)
[7] J. Haglund, The q,t-Catalan Numbers and the Space of Diagonal Harmonics. University Lecture Series, vol. 41 (American Mathematical Society, Providence, 2008)
[8] Y. Maday, E.M. Rønquist, Fast tensor product solvers. Part II: spectral discretization in space and time. Technical report 7–9, Laboratoire Jacques-Louis Lions (2007)
[9] Y. Maday, E.M. Rønquist, Parallelization in time through tensor-product space-time solvers. C. R. Math. Acad. Sci. Paris 346(1–2), 113–118 (2008)
[10] M.L. Minion, A hybrid parareal spectral deferred corrections method. Commun. Appl. Math. Comput. Sci. 5(2), 265–301 (2010)
© 2016 Springer International Publishing Switzerland
Gander, M.J., Halpern, L., Ryan, J., Tran, T.T.B. (2016). A Direct Solver for Time Parallelization. In: Dickopf, T., Gander, M., Halpern, L., Krause, R., Pavarino, L. (eds) Domain Decomposition Methods in Science and Engineering XXII. Lecture Notes in Computational Science and Engineering, vol 104. Springer, Cham. https://doi.org/10.1007/978-3-319-18827-0_50
Print ISBN: 978-3-319-18826-3
Online ISBN: 978-3-319-18827-0