
1 Introduction

We consider parallel computational methods and technologies for solving very large non-symmetric sparse SLAEs with positive definite matrices

$$\begin{aligned} \begin{array}{l} Au=f, A=\{a_{l,m}\}\in \mathcal {R}^{N,N}, u=\{u_{l}\}, f=\{f_{l}\}\in \mathcal {R}^{N}, \\ (Av,v)\ge \delta ||v||^{2}, \delta > 0, (v,w)=\sum \limits ^{N}_{i=1}v_{i}w_{i}, ||v||^{2}=(v,v), \end{array} \end{aligned}$$
(1)

which arise in finite element or finite volume approximations of multi-dimensional boundary value problems (BVPs) on adaptive unstructured grids. Consider the PDE

$$\begin{aligned} \begin{array}{l} Lu({{\varvec{r}}})=f({{\varvec{r}}}), {{\varvec{r}}}\in \varOmega , \\ lu|_{\varGamma }=g({{\varvec{r}}}), {{\varvec{r}}}\in \varGamma , \bar{\varOmega }=\varOmega \bigcup \varGamma , \end{array} \end{aligned}$$
(2)

where L is a linear differential operator with piecewise smooth coefficients and l is a boundary condition operator which, in general, is of different type (Dirichlet, Neumann, or Robin) on different surface segments of \(\varGamma \). The computational domain \(\bar{\varOmega }=\varOmega \bigcup \varGamma \) may have a complicated geometry with multi-connected piecewise boundary surfaces and contrasting material properties in the subdomains. We suppose that the initial data of BVP (2) provide the existence of a unique solution \(u({{\varvec{r}}})\) smooth enough for the validity of the numerical methods to be applied.

In recent decades, a large literature on parallel domain decomposition methods has appeared; in the references we list only some books and papers [1,2,3,4]. The main DDM approaches in question include a balanced decomposition of the grid computational domain into parameterized overlapping or non-overlapping subdomains with different interface conditions on the internal boundaries. We also use two different structures for contacting neighbouring grid subdomains: with or without the definition of divider nodes (separators) forming a special (skeleton) grid subdomain. The first type of decomposition (with a skeleton grid subdomain) is usual for the FETI approach to DDM, see [1, 2], whereas in the second case the original matrix A has a more regular block-diagonal structure. The proposed parallel additive Schwarz algorithms are based on flexible multi-preconditioned semi-conjugate direction methods in block Krylov subspaces. The acceleration of the two-level iterative processes is provided by means of aggregation, or a coarse grid correction approach, with basis functions of different orders, which realizes a low-rank approximation of the original matrix. The auxiliary subsystems in the subdomains are solved by direct or by Krylov iterative methods. The parallel implementation of the algorithms is based on hybrid programming with MPI processes and multi-thread computing for the upper and lower levels of iterations, respectively.

This paper is organized as follows. In Sect. 2 we consider the geometric issues of the different types of domain decomposition. The next section describes the two-level iterative processes for solving SLAEs in Krylov subspaces, based on the multi-preconditioning approach proposed by C. Greif and his colleagues in [5,6,7]. Section 4 is devoted to the parallel implementation of the algorithms, which is realized in the framework of the library KRYLOV (Institute of Computational Mathematics and Mathematical Geophysics, SB RAS, Novosibirsk). The technical requirements for this code include the absence of formal program constraints on the number of degrees of freedom and on the number of processor units. In the conclusion we discuss the efficiency of the proposed methods and technologies, as well as the concept of creating a unified numerical environment for the fast solution of very large sparse SLAEs and high performance computing for parallel DDMs.

2 Geometric Issues of DDM

Domain decomposition approaches can be considered at the continuous level and at the discrete level. We use the second way and suppose that the original computational domain \(\varOmega \) has already been discretized into a grid computational domain \(\varOmega ^{h}\). So, in the following, DDM is applied to grid domains only, and the upper index “h” is omitted for brevity.

Let us decompose \(\varOmega \) into P subdomains (with or without overlapping):

$$\begin{aligned} \begin{array}{l} \varOmega =\bigcup \limits ^{P}_{q=1}\varOmega _{q}, \bar{\varOmega }_{q}=\varOmega _{q}\bigcup \varGamma _{q}, \varGamma _{q}=\bigcup \limits _{q^{\prime }\in \omega _{q}}\varGamma _{q,q^{\prime }}, \varGamma _{q,q^{\prime }}=\varGamma _{q}\bigcap \bar{\varOmega }_{q^{\prime }}, q^{\prime }\ne q. \end{array} \end{aligned}$$
(3)

Here \(\varGamma _{q}\) is the boundary of \(\varOmega _{q}\), composed of the segments \(\varGamma _{q,q^{\prime }}\), \(q^{\prime }\in \omega _{q}\), and \(\omega _{q}=\{q_{1},...,q_{M_{q}}\}\) is the set of \(M_{q}\) contacting, or conjugate, subdomains. Formally, we can also denote by \(\varOmega _{0}=R^{d}\backslash \varOmega \) the external subdomain:

$$\begin{aligned} \begin{array}{l} \bar{\varOmega }_{0}=\varOmega _{0}\bigcup \varGamma , \varGamma _{q,0}=\varGamma _{q}\bigcap \bar{\varOmega }_{0}=\varGamma _{q}\bigcap \varGamma , \varGamma _{q}=\varGamma ^{i}_{q}\bigcup \varGamma _{q,0}, \end{array} \end{aligned}$$
(4)

where \(\varGamma ^{i}_{q}=\bigcup \limits _{q^{\prime }\ne 0}\varGamma _{q,q^{\prime }}\) and \(\varGamma _{q,0}=\varGamma ^{e}_{q}\) denote the internal and external parts of the boundary of \(\varOmega _{q}\). We also define the overlapping \(\varDelta _{q,q^{\prime }}=\varOmega _{q}\bigcap \varOmega _{q^{\prime }}\) of neighbouring subdomains. If \(\varGamma _{q,q^{\prime }}=\varGamma _{q^{\prime },q}\) and \(\varDelta _{q,q^{\prime }}=\emptyset \), then the overlapping of \(\varOmega _{q}\) and \(\varOmega _{q^{\prime }}\) is empty. In particular, we suppose in (3) that each of the P subdomains has no intersection with \(\varOmega _{0}\) \((\varOmega _{q}\bigcap \varOmega _{0}=\emptyset )\).

The idea of DDM is to define a set of BVPs for all subdomains which together are equivalent to the original problem (2):

$$\begin{aligned} \begin{array}{l} Lu_{q}({{\varvec{r}}})=f_{q}, {{\varvec{r}}}\in \varOmega _{q}, l_{q,q^{\prime }}(u_{q})\big |_{\varGamma _{q,q^{\prime }}}=g_{q,q^{\prime }}\equiv l_{q^{\prime },q}(u_{q^{\prime }})\big |_{\varGamma _{q^{\prime },q}}, \\ q^{\prime }\in \omega _{q}, l_{q,0}u_{q}|_{\varGamma _{q,0}}=g_{q,0}, \; q=1,...,P. \end{array} \end{aligned}$$
(5)

At each segment of the internal boundaries of the subdomains, the interface conditions for the operators \(l_{q,q^{\prime }}\) from (5) are imposed in the form of a Robin boundary condition:

$$\begin{aligned} \begin{array}{l} \alpha _{q}u_{q}+\beta _{q} {\displaystyle \frac{\partial u_{q}}{\partial {{\varvec{n}}}_{q}}}\big |_{\varGamma _{q,q^{\prime }}}= \alpha _{q^{\prime }}u_{q^{\prime }}+\beta _{q^{\prime }} {\displaystyle \frac{\partial u_{q^{\prime }}}{\partial {{\varvec{n}}}_{q^{\prime }}}}\big |_{\varGamma _{q^{\prime },q}}, |\alpha _{q}|+|\beta _{q}|> 0, \quad \alpha _{q}\cdot \beta _{q}\ge 0. \end{array} \end{aligned}$$
(6)

Here \(\alpha _{q^{\prime }}=\alpha _{q}\), \(\beta _{q^{\prime }}=\beta _{q}\), and \({{\varvec{n}}}_{q}\) denotes the outer normal to the boundary segment \(\varGamma _{q,q^{\prime }}\) of the subdomain \(\varOmega _{q}\). Strictly speaking, at each part of the internal boundary \(\varGamma _{q,q^{\prime }}, q^{\prime }\ne 0\), a pair of different coefficient sets \(\alpha ^{(1)}_{q}, \beta ^{(1)}_{q}\) and \(\alpha ^{(2)}_{q}, \beta ^{(2)}_{q}\) for conditions of the type (6) should be given. For example, \(\alpha ^{(1)}_{q}=1\), \(\beta ^{(1)}_{q}=0\) and \(\alpha ^{(2)}_{q}=0\), \(\beta ^{(2)}_{q}=1\) correspond formally to the continuity of the sought solution and of its normal derivative, respectively. The additive Schwarz algorithm in DDM is based on an iterative process in which the BVPs in all q-th subdomains are solved simultaneously, and the right-hand sides of the boundary conditions in (5), (6) are taken from the previous iteration.

We implement domain decomposition in two steps. At the first step, we define subdomains \(\varOmega _{q}\) without overlapping, i.e. the contacting grid subdomains have no joint nodes, and each node belongs to one subdomain only. Then we define the grid boundary \(\varGamma _{q}=\varGamma ^{0}_{q}\) of \(\varOmega _{q}\), as well as the extensions \(\bar{\varOmega }^{t}_{q}=\varOmega ^{t}_{q}\bigcup \varGamma ^{t}_{q}, \; \varOmega ^{0}_{q}=\varOmega _{q}, \; t=0,...,\varDelta \), layer by layer:

$$\begin{aligned} \begin{array}{l} \varGamma _{q}\equiv \varGamma ^{0}_{q}=\{l^{\prime }\in \hat{\omega }_{l}: l\in \varOmega _{q}, l^{\prime }\notin \varOmega _{q}\}, \varOmega ^{1}_{q}=\bar{\varOmega }^{0}_{q}=\varOmega _{q}\bigcup \varGamma ^{0}_{q}, \\ \varGamma ^{t}_{q}=\{l^{\prime }\in \hat{\omega }_{l}: l\in \varOmega ^{t}_{q}, l^{\prime }\notin \varOmega ^{t}_{q}\}, \varOmega ^{t+1}_{q}=\bar{\varOmega }^{t}_{q}=\varOmega ^{t}_{q}\bigcup \varGamma ^{t}_{q}. \end{array} \end{aligned}$$
(7)

Here \(\varDelta \) is the extension, or overlapping, parameter.
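The layer-by-layer extension (7) amounts to repeatedly adding the graph neighbours of the current node set. A minimal sketch in Python (the function name and the dictionary-based stencil are our illustrative assumptions, not part of the library):

```python
def extend_subdomain(adjacency, core_nodes, delta):
    """Extend a non-overlapping subdomain layer by layer, as in (7).

    adjacency  : dict node -> neighbours of the node (the grid stencil)
    core_nodes : nodes of the initial subdomain Omega_q (= Omega_q^0)
    delta      : overlapping parameter Delta (number of layers to add)
    Returns the node set of the extended subdomain Omega_q^Delta.
    """
    omega = set(core_nodes)
    for _ in range(delta):
        # Gamma^t: outside neighbours of the current subdomain nodes
        boundary = {m for l in omega for m in adjacency[l] if m not in omega}
        omega |= boundary  # Omega^{t+1} = Omega^t U Gamma^t
    return omega

# toy 1-D grid of 10 nodes; subdomain {0,...,4} extended by Delta = 2 layers
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(sorted(extend_subdomain(adj, {0, 1, 2, 3, 4}, 2)))  # [0, 1, 2, 3, 4, 5, 6]
```

With \(\varDelta =0\) the subdomains stay disjoint; each further layer widens the overlap by one stencil neighbourhood.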

In Fig. 1 we present an example of a 2D grid domain decomposition with a skeleton grid subdomain, whose divider nodes are denoted by crosses.

Fig. 1. Decomposition of a 2D domain with a skeleton grid subdomain

The second example is presented in Fig. 2, where we have a grid decomposition without node separators for the overlapping parameter \(\varDelta =3\).

The algebraic interpretation of DDM, after approximation of the BVPs (5), is described by the block version of SLAE (1):

$$\begin{aligned} \begin{array}{l} A_{q,q}u_{q}+\sum \limits _{r\in \hat{\omega }_{q}}A_{q,r}u_{r}=f_{q}, q=1,...,P, \end{array} \end{aligned}$$
(8)

where \(u_{q}, f_{q}\in \mathcal {R}^{N^{\varDelta }_{q}}\) are subvectors whose components belong to the corresponding subdomain \(\varOmega ^{\varDelta }_{q}\), and \(N^{\varDelta }_{q}\) is the number of nodes in \(\varOmega ^{\varDelta }_{q}\).

In the case of Fig. 1, if the skeleton subdomain is numbered last, the block matrix A has the following arrow-type structure:

$$ A=\{A_{q,r}\}=\left| \begin{array}{ccc|c} A_{1,1} &{} &{} 0 &{} A_{1,P+1} \\ &{} \ddots &{} &{} \vdots \\ 0 &{} &{} A_{P,P} &{} A_{P,P+1} \\ \hline A_{P+1,1} &{} \cdots &{} A_{P+1,P} &{} A_{P+1,P+1} \end{array}\right| . $$

In the second case (decomposition without divider nodes, Fig. 2), the matrix A has a more regular block-diagonal structure.

Fig. 2. Decomposition of the grid domain without dividing nodes

The implementation of the interface conditions between adjacent subdomains can be described as follows. Let the l-th node be a near-boundary node in subdomain \(\varOmega _{q}\), see Fig. 3. Then the corresponding equation can be written in the form

$$\begin{aligned} \begin{array}{l} (B_{q,q}u)_{l}\equiv (a_{l,l}+\theta _{l}\sum \limits _{m\notin \omega _{q}}a_{l,m})u_{l}+{\displaystyle \sum \limits _{m\in \omega _{q}, m\ne l}}a_{l,m}u_{m}=f_{l}+{\displaystyle \sum \limits _{m\notin \omega _{q}}}a_{l,m}(\theta _{l}u_{l}-u_{m}). \end{array} \end{aligned}$$
(9)

Here \(\theta _{l}\) is a parameter which corresponds to the type of boundary condition at the boundary \(\varGamma _{q}\): \(\theta _{l}=0\) corresponds to the Dirichlet condition, \(\theta _{l}=1\) to the Neumann condition, and \(\theta _{l}\in (0, 1)\) to the Robin boundary condition. The diagonal blocks \(B_{q,q}\) define the block-diagonal matrix \(B_{s}=\text{ block-diag } \{B_{q,q}\}\) of the additive Schwarz (AS) iterative process. In the implementation of AS, the right-hand side of (9) is taken from the previous iteration.
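For a single near-boundary node, the modified equation (9) can be sketched as follows (dense row storage and the helper name are illustrative assumptions; the library works with sparse rows):

```python
import numpy as np

def schwarz_row(a_row, l, inside, f_l, u_prev, theta):
    """Build the near-boundary equation (9) for node l of subdomain Omega_q.

    a_row  : dense row {a_{l,m}} of A (toy storage)
    inside : boolean mask of the nodes belonging to Omega_q
    u_prev : previous Schwarz iterate, supplying u_l and u_m on the rhs
    theta  : 0 - Dirichlet, 1 - Neumann, 0 < theta < 1 - Robin interface
    Returns the modified coefficients over Omega_q and the modified rhs.
    """
    outside = ~inside
    s = a_row[outside].sum()            # sum of a_{l,m} over m not in Omega_q
    coeffs = a_row[inside].copy()
    pos = int(inside[:l].sum())         # position of node l among Omega_q unknowns
    coeffs[pos] += theta * s            # a_{l,l} + theta_l * sum
    rhs = f_l + theta * s * u_prev[l] - a_row[outside] @ u_prev[outside]
    return coeffs, rhs

# consistency check: with u_prev equal to the exact solution, (9) must hold
A = np.diag(np.full(5, 2.0)) - np.diag(np.ones(4), 1) - np.diag(np.ones(4), -1)
u = np.arange(5.0)
f = A @ u
inside = np.array([True, True, True, False, False])   # Omega_q = {0, 1, 2}
coeffs, rhs = schwarz_row(A[2], 2, inside, f[2], u, theta=0.5)
print(np.isclose(coeffs @ u[inside], rhs))  # True
```

The same identity holds for any \(\theta _{l}\), since the term \(\theta _{l}\sum a_{l,m}u_{l}\) is simply added to both sides of the original equation.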

Fig. 3. The grid stencil for a near-boundary node

The additive Schwarz iterative algorithm is defined by the corresponding preconditioning matrix \(B_{AS}\), which can be described as follows. For the subdomain \(\varOmega ^{\varDelta }_{q}\) with overlapping parameter \(\varDelta \) we define a prolongation matrix \(R^{T}_{q,\varDelta }\in \mathcal {R}^{N,N^{\varDelta }_{q}}\) which extends vectors \(u_{q}=\{u_{l},\;l\in \varOmega ^{\varDelta }_{q}\}\in \mathcal {R}^{N^{\varDelta }_{q}}\) to \(\mathcal {R}^{N}\) by the relations

$$ (R^{T}_{q,\varDelta }u_{q})_{l}=\left\{ \begin{array}{cc} (u_{q})_{l} &{} \text{ if } l\in \varOmega ^{\varDelta }_{q}, \\ 0 &{} \text{ otherwise }. \end{array}\right. $$

The transpose of this matrix defines a restriction operator which restricts vectors in \(\mathcal {R}^{N}\) to the subdomain \(\varOmega ^{\varDelta }_{q}\). The diagonal block of the preconditioning matrix \(B_{AS}\), which presents the restriction of the discretized BVP to the q-th subdomain, is given by \(\hat{A}_{q}=R_{q,\varDelta }AR^{T}_{q,\varDelta }\). In these terms, the additive Schwarz preconditioner is defined by

$$ B^{-1}_{AS}=\sum \limits ^{P}_{q=1}R^{T}_{q,\varDelta }\hat{A}^{-1}_{q}R_{q,\varDelta }. $$

Also, we define the so-called restricted additive Schwarz preconditioner by taking the prolongation \(R^{T}_{q,0}\) instead of \(R^{T}_{q,\varDelta }\), i.e.

$$ B^{-1}_{RAS}=\sum \limits ^{P}_{q=1}R^{T}_{q,0}\hat{A}^{-1}_{q}R_{q,\varDelta }. $$

Let us remark that \(B_{RAS}\) is a non-symmetric matrix even if A is symmetric.
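In exact arithmetic, the action of the AS preconditioner (and of its restricted variant) is a sum of independent local subdomain solves. A dense toy sketch (the function name and the index-array bookkeeping are our assumptions, not the library interface):

```python
import numpy as np

def apply_schwarz(A, subdomains, r, restricted=False):
    """Apply the AS (or RAS) preconditioner to a residual r:
    the sum over q of R^T_q hat(A)_q^{-1} R_{q,Delta} r.

    subdomains : list of (overlap_idx, core_idx) integer index arrays;
                 core_idx is the non-overlapping part, used by the RAS prolongation
    """
    z = np.zeros_like(r)
    for overlap, core in subdomains:
        A_q = A[np.ix_(overlap, overlap)]      # hat(A)_q = R A R^T
        y = np.linalg.solve(A_q, r[overlap])   # local solve hat(A)_q^{-1} R r
        if restricted:
            mask = np.isin(overlap, core)      # RAS: prolong core nodes only
            z[overlap[mask]] += y[mask]
        else:
            z[overlap] += y                    # AS: full prolongation
    return z

# with decoupled diagonal blocks and overlap = core, the result is exactly A^{-1} r
A = np.zeros((4, 4))
A[:2, :2] = [[4.0, 1.0], [1.0, 3.0]]
A[2:, 2:] = [[5.0, 2.0], [2.0, 4.0]]
subs = [(np.arange(2), np.arange(2)), (np.arange(2, 4), np.arange(2, 4))]
r = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(A @ apply_schwarz(A, subs, r), r))  # True
```

With genuine overlap, AS adds the overlapping contributions twice, while RAS keeps a single value per node, which is exactly why \(B_{RAS}\) loses symmetry.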

The second preconditioning matrix which we use for the DDM iterations in Krylov subspaces is responsible for the coarse grid correction, or aggregation approach, based on a low-rank approximation of the original matrix A. We define a coarse grid, or macrogrid, \(\varOmega _{c}\) and a corresponding coarse space with the number of degrees of freedom \(N_{c}\ll N\), as well as some basis functions \(w^{k}\in \mathcal {R}^{N}, \; k=1,...,N_{c}\). We suppose that the rectangular matrix \(W=(w^{1}...w^{N_{c}})\in \mathcal {R}^{N,N_{c}}\) has full rank and define the coarse grid preconditioner \(B_{c}\) as follows:

$$ B^{-1}_{c}=W\hat{A}^{-1}W^{T}, \hat{A}=W^{T}AW\in \mathcal {R}^{N_{c},N_{c}}, $$

where the small matrix \(\hat{A}\) is a low-rank approximation of A; \(W^{T}\) is called the restriction matrix and \(W\) the prolongation matrix.

In some papers, see [4, 8], the aggregation, or deflation, technique is applied to improve the initial guess for Krylov iterative methods. Let \(u^{-1}\) be an arbitrary vector. Then we can compute the initial vectors \(u^{0}\) and \(r^{0}\) by the formulas

$$\begin{aligned} \begin{array}{l} u^{0}=u^{-1}+B^{-1}_{c}r^{-1}, r^{-1}=f-Au^{-1}, \\ r^{0}=f-Au^{0}, p^{0}=r^{0}-B^{-1}_{c}Ar^{0}, \end{array} \end{aligned}$$
(10)

which provide the orthogonal properties

$$\begin{aligned} \begin{array}{l} W^{T}r^{0}=0, W^{T}Ap^{0}=0, \end{array} \end{aligned}$$
(11)

where \(p^{0}\) is the conventional initial direction vector in Krylov methods.
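Formulas (10) and the orthogonality checks (11) fit in a few lines of numpy; this is a toy dense sketch under our naming assumptions:

```python
import numpy as np

def deflated_start(A, W, f, u_m1):
    """Deflated initial guess, formulas (10): returns u0, r0, p0
    with W^T r0 = 0 and W^T A p0 = 0, cf. (11)."""
    A_hat = W.T @ A @ W                            # hat(A) = W^T A W

    def Bc_inv(v):                                 # B_c^{-1} v = W hat(A)^{-1} W^T v
        return W @ np.linalg.solve(A_hat, W.T @ v)

    r_m1 = f - A @ u_m1
    u0 = u_m1 + Bc_inv(r_m1)
    r0 = f - A @ u0
    p0 = r0 - Bc_inv(A @ r0)
    return u0, r0, p0

rng = np.random.default_rng(7)
N, Nc = 6, 2
A = rng.standard_normal((N, N)) + N * np.eye(N)    # nonsingular test matrix
W = np.zeros((N, Nc))
W[:3, 0] = W[3:, 1] = 1.0                          # piecewise-constant coarse basis
f = rng.standard_normal(N)
u0, r0, p0 = deflated_start(A, W, f, np.zeros(N))
print(np.allclose(W.T @ r0, 0), np.allclose(W.T @ (A @ p0), 0))  # True True
```

The coarse solve is performed once per application of \(B^{-1}_{c}\); in the library setting \(\hat{A}\) would be factorized once and reused.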

It is possible to choose the basis functions \(w^{k}({{\varvec{r}}})\) from the approximation principle. If the solution u of the original problem is smooth enough, we can write

$$ u=\big \{u_{l}\approx u^{c}_{l}=\sum \limits ^{N_{c}}_{k=1}c_{k}w^{k}({{\varvec{r}}}_{l})\big \}\cong W\hat{u}, $$

where the vector \(\hat{u}=\{c_{k}\}\in \mathcal {R}^{N_{c}}\) consists of the coefficients of the expansion. If we substitute this representation into (1), we obtain the approximate equation \(AW\hat{u}\approx f\). Multiplying both sides of this equation by \(W^{T}\), we have

$$\begin{aligned} \begin{array}{l} W^{T}AW\hat{u}\approx W^{T}f, \hat{u}\cong \hat{A}^{-1}W^{T}f, \hat{A}=W^{T}AW, \\ u\approx W\hat{u}\approx B^{-1}_{c}f, B^{-1}_{c}=W\hat{A}^{-1}W^{T}. \end{array} \end{aligned}$$
(12)

It is natural to take the basis functions \(w^{k}({{\varvec{r}}})\) as finite interpolation functions of different orders. In the simplest case the functions \(w^{k}({{\varvec{r}}}), \; k=1,...,N_{c}=P\), are chosen equal to unity in the k-th subdomain and to zero in the other subdomains. It is important that, in general, the coarse grid \(\varOmega _{c}\) does not depend on the domain decomposition.
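For the simplest piecewise-constant choice just described, W and the action of the coarse correction can be sketched as follows (the function names are illustrative):

```python
import numpy as np

def coarse_basis(part, P):
    """Piecewise-constant basis functions: w^k = 1 in subdomain k, 0 elsewhere.

    part : length-N array, part[l] = index (0..P-1) of the subdomain of node l
    """
    W = np.zeros((len(part), P))
    W[np.arange(len(part)), part] = 1.0
    return W

def coarse_correct(A, W, v):
    """B_c^{-1} v = W (W^T A W)^{-1} W^T v, the coarse grid correction of (12)."""
    A_hat = W.T @ A @ W
    return W @ np.linalg.solve(A_hat, W.T @ v)

# if the solution lies in the coarse space, the correction recovers it exactly
rng = np.random.default_rng(1)
G = rng.standard_normal((6, 6))
A = G @ G.T + 6 * np.eye(6)                        # SPD test matrix
W = coarse_basis(np.array([0, 0, 0, 1, 1, 1]), 2)
u = W @ np.array([2.0, -1.0])                      # u = W u_hat
print(np.allclose(coarse_correct(A, W, A @ u), u))  # True
```

Higher-order variants replace the 0/1 columns of W by interpolation weights on the macrogrid \(\varOmega _{c}\); the correction formula is unchanged.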

Let us remark that instead of the deflation approach (12), which is based on multiplying both sides of the original SLAE by \(W^{T}\), we can multiply by \(W^{T}A^{T}\). In this case we obtain

$$ \begin{array}{l} W^{T}A^{T}AW\check{u}\approx W^{T}A^{T}f, \check{u}=\check{A}^{-1}W^{T}A^{T}f, \\ u\approx W\check{u}\approx \check{B}^{-1}_{c}f, \check{A}=W^{T}A^{T}AW, \check{B}^{-1}_{c}=W\check{A}^{-1}W^{T}, \end{array} $$

and application of formulas (10) with \(\check{B}_{c}\) instead of \(B_{c}\) provides an initial guess with another kind of orthogonal properties:

$$ W^{T}A^{T}r^{0}=0, W^{T}A^{T}Ap^{0}=0. $$

3 Multi-preconditioned SCR

Now we present a general description of the multi-preconditioned semi-conjugate residual iterative method. Let \(r^{0}=f-Au^{0}\) be the initial residual of the algebraic system, and let \(B^{(1)}_{0},...,B^{(m_{0})}_{0}\) be a set of non-singular, easily invertible preconditioning matrices. Using them, we define a rectangular matrix composed of the initial direction vectors \(p^{0}_{l},\, l=1,...,m_{0}\):

$$\begin{aligned} \begin{array}{l} P_{0}=[p^{0}_{1}\cdots p^{0}_{m_{0}}]\in \mathcal {R}^{N,m_{0}}, p^{0}_{l}=(B^{(l)}_{0})^{-1}r^{0}, \end{array} \end{aligned}$$
(13)

which are assumed to be linearly independent.

The successive approximations \(u^{n}\) and the corresponding residuals \(r^{n}\) are sought with the help of the recursions

$$\begin{aligned} \begin{array}{l} u^{n+1}=u^{n}+P_{n}\bar{\alpha }_{n}=u^{0}+P_{0}\bar{\alpha }_{0}+...+P_{n}\bar{\alpha }_{n}, \\ r^{n+1}=r^{n}-AP_{n}\bar{\alpha }_{n}=r^{0}-AP_{0}\bar{\alpha }_{0}-...-AP_{n}\bar{\alpha }_{n}. \end{array} \end{aligned}$$
(14)

Here \(\bar{\alpha }_{n}=(\alpha ^{1}_{n},...,\alpha ^{m_{n}}_{n})^{T}\) are \(m_{n}\)-dimensional vectors. The direction vectors \(p^{n}_{l},\; l=1,...,m_{n}\), forming the columns of the rectangular matrices \(P_{n}=[p^{n}_{1}\cdots p^{n}_{m_{n}}]\in \mathcal {R}^{N,m_{n}}\), are defined as orthogonal in the sense of the relations

$$\begin{aligned} \begin{array}{l} P^{T}_{n}A^{T}AP_{k}=D_{n,k}=0 \text{ for } k\ne n, \end{array} \end{aligned}$$
(15)

where \(D_{n,n}=diag\{\rho _{n,l}\}\) is a symmetric positive definite matrix, because the matrices \(P_{k}\) are supposed to have full rank.

The orthogonality properties (15) provide the minimization of the residual norm \(||r^{n+1}||_{2}\) in the block Krylov subspace of dimension \(M_{n}\):

$$\begin{aligned} \begin{array}{l} K_{M_{n}}=Span\{P_{0},...,A^{n-1}P_{n-1}\}, M_{n}=\sum \limits ^{n-1}_{k=0}m_{k}, \end{array} \end{aligned}$$
(16)

if we define the coefficient vectors \(\bar{\alpha }_{n}\) by the formula

$$\begin{aligned} \begin{array}{l} \bar{\alpha }_{n}=\{\alpha _{n,l}\}=D^{-1}_{n,n}P^{T}_{n}A^{T}r^{0}. \end{array} \end{aligned}$$
(17)

For such values of \(\bar{\alpha }_{n}\) it is easy to check that the vectors \(p^{n}_{k},\; r^{n}_{k}\) satisfy the semi-conjugation condition, i.e.

$$\begin{aligned} \begin{array}{l} P^{T}_{k}A^{T}r^{n+1}=0, k=0,1,...,n. \end{array} \end{aligned}$$
(18)

In this case, the following relations are valid for the functionals of the residuals:

$$\begin{aligned} \begin{array}{l} ||r^{n+1}||^{2}\equiv (r^{n+1}, r^{n+1})=(r^{n}, r^{n})-(C_{n}r^{0}, r^{0})= \\ (r^{0},r^{0})-(C_{0}r^{0},r^{0})-...-(C_{n}r^{0}, r^{0}), C_{n}=AP_{n}D^{-1}_{n,n}P^{T}_{n}A^{T}. \end{array} \end{aligned}$$
(19)

Let us remark that such properties are valid for any direction vectors \(p^{n}_{k}\) which satisfy the orthogonality condition (15). We look for the matrices composed of the direction vectors via the recurrent relations

$$\begin{aligned} \begin{array}{l} P_{n+1}=Q_{n+1}-\sum \limits ^{n}_{k=0}P_{k}\bar{\beta }_{k,n}, \end{array} \end{aligned}$$
(20)

where the auxiliary matrices

$$\begin{aligned} \begin{array}{l} Q_{n+1}=[q^{n+1}_{1}...\; q^{n+1}_{m_{n+1}}], q^{n+1}_{l}=(B^{(l)}_{n+1})^{-1}r^{n+1},l=1,...,m_{n+1}, \end{array} \end{aligned}$$
(21)

are introduced, \(B^{(l)}_{n+1}\) are some non-singular, easily invertible preconditioning matrices, and \(\bar{\beta }_{k,n}\) are the coefficient vectors defined, after substitution of (20) into the orthogonality conditions (15), by the formula

$$\begin{aligned} \begin{array}{l} \bar{\beta }_{k,n}=D^{-1}_{k,k}P^{T}_{k}A^{T}AQ_{n+1}. \end{array} \end{aligned}$$
(22)

The following statement is valid.

Theorem 1

Let the matrices A and \(B^{(l)}_{n}, \; n=0,1,...; \; l=1,...,m_{n}\), be nonsingular, and let the matrices \(P_{n}\) have full rank. Then the iterative process (14), (17), (20)–(22) provides the minimization of the residual norm \(||r^{n}||\) in the block Krylov subspaces (16). Moreover, the following semi-orthogonality properties of the residual vectors are valid:

$$\begin{aligned} \begin{array}{l} (A^{\gamma }B^{-1}_{k,l}r^{n},r^{k})=\left\{ \begin{array}{lr} 0, \quad k< n, \\ \sigma ^{(\gamma )}_{n}=(A^{\gamma }B^{-1}_{n,l}r^{n},r^{n}), \quad k=n. \end{array}\right. \end{array} \end{aligned}$$
(23)

Also, instead of (17), the coefficients \(\alpha _{n,l}\) can be computed by the formula

$$\begin{aligned} \begin{array}{l} \alpha ^{(\gamma )}_{n,l}=(A^{\gamma }B^{-1}_{n,l}r^{n},r^{n})/\rho _{n,l}. \end{array} \end{aligned}$$
(24)
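The whole process (13), (14), (17), (20)–(22) can be condensed into a short dense sketch. This is our illustrative reading, not the library code: the diagonal matrix \(D_{n,n}\) of (15) is replaced by the full local Gram matrix \(P^{T}_{n}A^{T}AP_{n}\), inverted via a pseudo-inverse for robustness, which yields the same residual minimization:

```python
import numpy as np

def mpscr(A, f, precs, u0, n_iter):
    """Multi-preconditioned semi-conjugate residual sketch, no restarts.

    precs : list of callables v -> B_l^{-1} v (the preconditioners)
    """
    u, r = u0.copy(), f - A @ u0
    blocks = []                                        # stored (P_k, A P_k, D_k^+)
    for _ in range(n_iter):
        Q = np.column_stack([B(r) for B in precs])     # (13) / (21)
        for P_k, AP_k, Dk_pinv in blocks:              # orthogonalize, (20), (22)
            Q = Q - P_k @ (Dk_pinv @ (AP_k.T @ (A @ Q)))
        AQ = A @ Q
        D_pinv = np.linalg.pinv(AQ.T @ AQ)             # Gram matrix of (15)
        alpha = D_pinv @ (AQ.T @ r)                    # (17)
        u, r = u + Q @ alpha, r - AQ @ alpha           # (14)
        blocks.append((Q, AQ, D_pinv))
    return u, r

# toy non-symmetric system with Jacobi and identity preconditioners
A = np.diag(np.arange(2.0, 8.0)) + 0.3 * np.triu(np.ones((6, 6)), 1)
f = np.ones(6)
precs = [lambda v: v / np.diag(A), lambda v: v]
u, r = mpscr(A, f, precs, np.zeros(6), n_iter=6)
print(np.linalg.norm(r))
```

Since the identity preconditioner reproduces the classical GCR directions, the block method minimizes the residual over a subspace at least as large, and the toy run converges to machine accuracy.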

The presented MPSCR method uses a dynamic, or flexible, definition of the preconditioners \(B^{(l)}_{n}\); moreover, their number \(m_{n}\) may vary between iteration steps. We propose to use "totally flexible" variants of the "coarse" preconditioning matrices \(B_{c}\) and the Jacobi-Schwarz ones \(B_{s}\). It means that at each n-th iteration we can apply several different coarse grids \(\varOmega ^{l^{\prime }}_{c,n}, \; l^{\prime }=0,1,...,m^{c}_{n}\), of the corresponding dimensions \(N^{l^{\prime }}_{c,n}\), and a different number of Schwarz-type preconditioners \(B^{l^{\prime \prime }}_{s,n}, \; l^{\prime \prime }=0,1,...,m^{s}_{n}\) (\(m^{s}_{n}=0 \text{ or } m^{c}_{n}=0\) means that the corresponding preconditioning is not used at the current iteration). Let us note that the application of several Schwarz preconditioners at each iteration corresponds to the weighted version of domain decomposition proposed by Greif in [7], and using several aggregation approaches at one iterative step can be interpreted, in a sense, as an additive multi-grid technique. Simplified versions of SCR were considered in [9]–[11]; in [10] it is called the Generalized Conjugate Residual (GCR) method. In a sense, the SCR method is an alternative to the well-known GMRES, see [12].

A disadvantage of the considered algorithm, as of other Krylov methods for solving non-symmetric SLAEs, is the necessity of saving all the direction vectors, which requires a lot of memory for a large number of iterations. There are two main ways to avoid this. The first one consists in periodic restarts: at each iteration with number \(n_{r}=r\cdot n_{rest}, \; r=1,2,...,\) the current residual vector is computed not by the recursion (14) but from the equation \(r^{n_{r}}=f-Au^{n_{r}}\), and the next values \(u^{n+1}, r^{n+1}, P_{n+1}\) are computed recursively again from the beginning. In the second way, limited orthogonalization is used: only the \(n_{lim}\) last direction matrices \(P_{n}, P_{n-1},...,P_{n-n_{lim}}\) are saved and used in formula (20). Also, for the case when the dimensions of the Krylov subspaces are too large, it was proposed in [7] to use selective multi-preconditioning, in which some preconditioners and the corresponding direction vectors are omitted at some steps of the iterative process.

Let us also remark that in the overlapping DDM the vectors \(u^{n}\) are overdetermined in the intersections of the subdomains, and in this case we use the restricted additive Schwarz approach with the preconditioner \(B_{RAS}\).

4 Parallel Algorithms and Technologies

The presented principles of constructing the algorithms are implemented in the library Krylov [12] (Institute of Computational Mathematics and Mathematical Geophysics, SB RAS, Novosibirsk) for the efficient parallel solution of large SLAEs with sparse matrices, which are stored in the compressed sparse row format (CSR, [13]); converters from other conventional formats are available. The key approach is the automatic construction of a balanced partitioning of the weighted oriented graph that presents the structure of the grid set of equations, see Fig. 4. The synchronization of the distributed computing in DDMs is provided by the MPI processes, which are implemented for the corresponding subdomains on multi-core CPUs with shared memory.
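For reference, CSR storage keeps three arrays per matrix; a minimal matrix-vector product sketch (a plain illustration, not the library's optimized SPARSE BLAS kernel):

```python
import numpy as np

def csr_matvec(vals, cols, ptr, x):
    """y = A x for a matrix in compressed sparse row (CSR) storage:
    vals - nonzero values, stored row by row
    cols - column index of each stored value
    ptr  - ptr[i]:ptr[i+1] delimits row i inside vals/cols
    """
    y = np.zeros(len(ptr) - 1)
    for i in range(len(y)):
        lo, hi = ptr[i], ptr[i + 1]
        y[i] = vals[lo:hi] @ x[cols[lo:hi]]
    return y

# 1-D Laplacian [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR
vals = np.array([2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
cols = np.array([0, 1, 0, 1, 2, 1, 2])
ptr = np.array([0, 2, 5, 7])
print(csr_matvec(vals, cols, ptr, np.array([1.0, 1.0, 1.0])))  # [1. 0. 1.]
```

In the DDM setting, each MPI process holds the CSR rows of its own subdomain, so the row-wise loop parallelizes naturally.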

The main requirements to develop a proper software are the following:

  • no formal program restrictions on the number of degrees of freedom of the algebraic systems to be solved or on the number of processor units and computational cores used; in other words, the numerical environment should provide scalable parallelism in the weak and/or the strong sense;

  • application of advanced iterative methods, with the possibility of extending the library functionality;

  • robust implementation based on hybrid programming of the two-level computational process: MPI tools for the outer Krylov iterations and multi-thread techniques for solving the algebraic subsystems in the subdomains;

  • high performance computing by means of efficient SPARSE BLAS tools [13] and communication optimization.

Fig. 4. An example of an oriented weighted graph

The library tools include the automatic construction of a balanced domain decomposition and support varying the number of subdomains, the size of the overlapping, the type of interface conditions, and the choice of preconditioners and Krylov algorithms, etc. The current version of the library does not yet use multi-GPU computer configurations, and the corresponding development is in progress.

5 Conclusion

At present, there are many well-known libraries and packages of algebraic iterative solvers for large sparse SLAEs: HYPRE, PETSc, SPARSKIT, and others, which are available on the Internet. The goal of creating the new library Krylov consists in the development of an extendable, efficient environment for the scalable parallel solution of various types of grid algebraic systems (real and complex, symmetric and non-symmetric, positive definite and indefinite, Hermitian and non-Hermitian, etc.) by advanced DDM approaches, by means of hybrid programming on heterogeneous multi-CPU, multi-core, and multi-GPU computers. The program implementation is organized as Open Source, adapted to the evolution of computer architectures and platforms. The library has a two-fold destination. The first consists in providing numerical tools for the automatic construction of algorithms and for the fast development, validation, verification, testing, and comparative analysis of new methods. The second aim includes the development of high performance code and a friendly interface for end users. In principle, this is not a group project; it is oriented to wide cooperation within the computational algebra community.