Abstract
A hybrid approach for the solution of linear elliptic PDEs, based on the unified transform method in conjunction with domain decomposition techniques, is introduced. Given a well-posed boundary value problem, the proposed methodology relies on the derivation of an approximate global relation, which is an equation that couples the finite Fourier transforms of all the boundary values. The computational domain is hierarchically decomposed into several nonoverlapping subdomains; for each of those subdomains, a unique approximate global relation is derived. Then, by introducing a modified Dirichlet-to-Neumann iterative algorithm, it is possible to compute the solution and its normal derivative at the resulting interfaces. By considering several hierarchical levels, higher spatial resolution can be achieved. There are three main advantages associated with the proposed approach. First, since the unified transform is a boundary-based technique, the interior of each subdomain does not need to be discretized; thus, no mesh generation is required. Additionally, the Dirichlet and Neumann values can be computed on the interfaces with high accuracy, using a collocation technique in the complex Fourier plane. Finally, the interface values at each hierarchical level can be computed in parallel by considering a quadtree decomposition in conjunction with the iterative Dirichlet-to-Neumann algorithm. The proposed methodology is analysed both regarding implementation details and computational complexity. Moreover, numerical results are presented, assessing the performance of the solver.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
The numerical solution of boundary value problems (BVPs) for linear elliptic partial differential equations (PDEs) represents an important topic in the area of applied mathematics and scientific computing. In recent years, efforts have been directed towards the development of novel numerical schemes based on the unified transform, or the Fokas method [16,17,18,19]. This method can be considered as the spectral analogue of the classical method of boundary integral equations. The unified transform that was introduced by one of the authors in the late 1990s has been used for the solution of a large class of BVPs, including linear and certain nonlinear PDEs. For the solution of a particular well-posed BVP in a polygonal domain, this method involves two basic steps: first, the solution is expressed in terms of integrals in the complex Fourier plane; these integrals contain certain integral transforms of the Dirichlet and Neumann boundary values. Second, the computation of these integral transforms through the analysis of the global relation. For the second task, several numerical schemes have been developed over the last years in order to determine the unknown boundary values [13, 20, 22, 23, 28, 41].
In this paper, considering the second ingredient of the unified transform, i.e. the analysis of the global relation, we propose a class of numerical schemes exhibiting the following properties: high accuracy, avoidance of mesh generation, and inherent parallelism. Regarding the first property, high accuracy is achieved by using a Legendre expansion collocation method in the complex plane for computing the unknown Dirichlet and Neumann values on the boundary of a computational domain. The second property relies on the fact that the collocation method for the global relation represents a boundary-based technique that involves the computation of the relevant expansion coefficients; hence, the solution can be evaluated anywhere on the boundary without the generation of a particular computational grid. Regarding the third property, the inherent parallelism is achieved by considering a domain decomposition technique. In particular, by redesigning an iterative Dirichlet-to-Neumann algorithm in terms of the global relation, and by considering a quadtree-type domain decomposition scheme, it is possible to derive several hierarchical partitions of the domain, where groups of subdomains can be processed independently.
Here, we present the details of the proposed methodology for the Laplace equation on a square as well as on an L-shaped domain; however, the relevant numerical schemes can be used with slight modifications for any linear elliptic PDE that admits a global relation, in a general polygonal domain with an arbitrary number of sides. In addition to the method for solving the given PDE in the interior of the domain, we also propose a technique for estimating the error of the approximated solution and the approximated normal derivative on the boundaries of the domain. This a posteriori error estimator is then used as a stopping criterion for terminating the relevant iterative procedures. Furthermore, it is also used for designing a novel technique for adaptively generating subdomains according to a prescribed threshold criterion. Moreover, we show that above approach can be possibly advantageous for solving PDEs formulated on nonconvex polygons with multiple re-entrant corners. Recent developments concerning numerical techniques based on the unified transform can be found in [2, 10, 15, 24,25,26,27].
Over the last decades, the development in the field of high-performance computing (HPC) has directed the efforts of the research community to design highly parallel techniques. Domain decomposition techniques [34, 39, 42] have been used extensively for solving practical problems in science and engineering. There are several methods for solving PDEs using the domain decomposition approach, based on the finite difference (FDM) [38], the finite element (FEM) [48], the finite volume (FVM) [38], as well as the boundary element (BEM) [40] method and spectral methods [7, 8]. The techniques proposed in this paper share several characteristics with the boundary element method, the spectral as well as the meshless method, forming an entire new class of techniques. Regarding the scope of high-performance scientific computing, work related to the utilization of hardware and software on scalable multiprocessor systems can be found in [1, 4, 5, 30, 43]. In addition, recent developments in preconditioning techniques and parallel solution methods can be found in [21, 29, 31,32,33, 35, 36, 45, 47].
In Sect. 2, the collocation method for solving the global relation is briefly reviewed. In Sect. 3, an iterative Dirichlet-to-Neumann (DtN) algorithm based on the global relation is introduced. In Sect. 4, a parallel meshless hierarchical solver based on the iterative DtN algorithm is discussed. In Sect. 5, an adaptive technique for subdomain generation based on a posteriori error estimation is also proposed. In Sect. 6, the computational complexity of the proposed schemes is analysed and discussed. Finally, in Sect. 7, numerical results are presented demonstrating the applicability and possible advantages of the proposed techniques.
2 Discrete global relations
The global relation is an algebraic equation that couples the finite Fourier transforms of all boundary values. Given a well-posed boundary value problem subject to prescribed boundary conditions, the global relation can be solved either analytically or numerically in order to determine the unknown boundary values. In this paper, we consider the first approach, and below we review the collocation technique that will be used in conjunction with the proposed domain decomposition methodology.
Let us consider the Laplace equation in complex coordinates, on a convex two-dimensional polygon \(\varOmega \)
Additionally, let us consider the second identity of Green
as well as the identity
Considering the particular solution \(v=e^{-i \lambda z},\lambda \in {\mathbb{C}}\), derived by the method of separation of variables, and using Eqs. (2) and (3), the following global relation is derived
In order to solve numerically Eq. (4), each side of the polygonal domain \(\varOmega \) is parameterized using a local parameter t, as follows
Then, the discrete version of Eq. (4) takes the form
The collocation-type technique requires the expansion of the boundary values in terms of appropriate basis functions. Following [28], we choose the Legendre polynomials; hence,
where [12]
The finite Fourier transform of the Legendre polynomials can be computed using the modified Bessel functions of the first kind, as follows
Hence, the discrete global relation (6) becomes
By choosing a multitude of values for the complex parameter \(\lambda \), according to the heuristic rule
a linear system is derived and its solution determines the expansion coefficients for the unknown boundary values. For the particular case of the Dirichlet problem, the linear system has the following form
where G and H are nonsquare matrices associated with the Fourier transforms of the Neumann and Dirichlet boundary values, respectively. It should be noted that mixed-type boundary conditions can be implemented by appropriately exchanging columns between the matrices G and H.
3 An iterative unified transform Dirichlet-to-Neumann algorithm
In this section, we present the main domain decomposition technique, which constitutes the backbone of the proposed methodology and is referred to as the Steklov–Poincare framework [34].
Let us consider the following model problem
where L is a general elliptic operator and \(\varOmega \) a bounded two-dimensional domain.
Let us also consider a nonoverlapping decomposition of two subdomains \(\varOmega _{1}\) and \(\varOmega _{2}\), separated by an interface \(\varGamma \). At this interface, the following equations, also called as transmission conditions, hold [34]
The solution of the boundary value problem defined in Eq. (13) can be determined by solving the following coupled system of partial differential equations [34]
The Steklov–Poincare system (16) can be solved iteratively using a modified Dirichlet-to-Neumann algorithm based on the Fokas method, namely unified transform Dirichlet-to-Neumann (UTDtN). In particular, by solving in an iterative approach a Dirichlet problem on domain \(\varOmega _{1}\), as depicted in Fig. 1, followed by the solution of a mixed problem on domain \(\varOmega _{2}\) with Neumann boundary conditions on \(\varGamma \), using the approximate global relations, the resulting approximate solution converges to the exact solution of problem (13).
Let P be the matrix corresponding to the Legendre polynomials, (8), in \(t\in [-1,1]\). Let us consider a relaxation parameter \(\theta \), with \(0<\theta <1\), in order to ensure the convergence of the iterative Dirichlet-to-Neumann algorithm [34]. The solution of the coupled system (16) is given by the algorithm unified transform Dirichlet-to-Neumann (UTDtN), (Algorithm 1).
In Algorithm 1, \({\hat{f}}\) denotes the integral transform of the source term, given by
Furthermore, in lines 6 and 7 of Algorithm 1, the corresponding columns of matrices G and H should be exchanged accordingly in order to account for the mixed-type boundary conditions.
It should be stated that the UTDtN (Algorithm 1) does not require a mesh generation for the two subdomains. Particularly, the solution is computed indirectly by alternately solving for the expansion coefficients on the boundaries of the two subdomains, \(\partial \varOmega _{1}\cup \varGamma \), \(\partial \varOmega _{2}\cup \varGamma \). The solution and the normal derivative can then be evaluated anywhere along the curve \(\partial \varOmega _{1}\cup \partial \varOmega _{1} \cup \varGamma \).
4 A parallel meshless hierarchical solver based on the unified transform
In this section, a parallel, meshless, hierarchical solver is proposed for solving linear elliptic PDEs. The solver is based on a modified version of the iterative Dirichlet-to-Neumann algorithm, cf. UTDtN (Algorithm 1), and has the particular advantage of avoiding the task of mesh generation, and also it can be implemented in parallel. The proposed method requires a quadtree-type decomposition of the given rectangular domain. In Fig. 2, a hierarchical, quadtree-type decomposition of a square domain, using three successive levels, is depicted. Initially, at Level 1, the domain consists of four subdomains; for each successive level, each subdomain is further decomposed into four new subdomains until the prescribed spatial resolution is achieved. For each successive level, the solution as well as the normal derivative on the interfaces of a group of four subdomains can be computed independently.
The procedure begins by decomposing the initial rectangular domain into four nonoverlapping subdomains. At the resulting interfaces, the Dirichlet as well as the Neumann values are required. The determination of those values is accomplished by modifying the iterative algorithm UTDtN (Algorithm 1), in order to be used with four, instead of two, subdomains. In Fig. 3, the process of computing the unknown Dirichlet and Neumann values across the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\), as implemented by the UTDtN_init algorithm, is depicted.
Initially, the four subdomains are grouped into two nonadjacent pairs, as shown in Fig. 3. The process starts by iteratively solving two Dirichlet problems for the subdomains 1 and 3 (red colour), followed by the solution of two mixed problems with Neumann conditions across the interfaces for the subdomains 2 and 4 (blue colour). The arrows on each subdomain of Fig. 3 indicate the orientation of computing the expansion coefficients of the unknown interface values. For subsequent spatial levels, depending on the position of the new subdomains, the orientation for computing the new interface values changes; in that case, the orientation of the previously computed interface values should be reversed. This is a crucial implementation detail, which is taken into account when proceeding on subsequent spatial levels, as the interface values are used as input in the forthcoming computations. This procedure is described in algorithm UTDtN_init, (Algorithm 2). By selecting a maximum number of iterations MaxIt, the unknown Dirichlet and Neumann values are computed across the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\), by solving two Dirichlet problems (lines 8 and 9) followed by the solution of two mixed problems (lines 10 and 11). Then, each of the four subdomains is further decomposed into four new subdomains subject to Dirichlet boundary conditions. In UTDtN_init (Algorithm 2), \(\partial \varOmega \) denotes the boundary of the initial domain (Level 1) and \(\left\{ \partial \varOmega _{j} \right\} _{1}^{4}\) the boundaries of the four subdomains (Level 2). Moreover, \(\left\{ {\bar{a}}_{\partial \varOmega _{j}} \right\} _{1}^{4}\), \(\left\{ {\bar{b}}_{\partial \varOmega _{j}} \right\} _{1}^{4}\) are subsets of the expansion coefficients a, b, corresponding to the sides \(\left\{ \partial \varOmega _{j} \cap \partial \varOmega \right\} _{1}^{4}\). In what follows we denote by a, b the expansion coefficients of the Neumann and Dirichlet values, respectively, and by \(\left[ a,b,c,d \right] \) the indices for identifying a second-level subdomain.
In Fig. 4, a two-level decomposition of a square domain is depicted, indicating the orientations for computing the respective interface values. In Fig. 5, the process of computing the expansion coefficients of the boundary values of the second level, using the first-level interface values, is shown in detail for the upper left subdomain. The expansion coefficients of the interface values on \(\varGamma _{12}\), \(\varGamma _{32}\) are used for determining the Dirichlet boundary values on side 1 of the subdomains a, d and on side 4 of the subdomains c, d, as depicted in Fig. 5. The interfaces \(\varGamma _{12}\), \(\varGamma _{32}\) are parameterized on the interval \([-1,1]\) using equidistant nodes; hence, the boundary values on side 1 of the subdomain a are evaluated on the interval \([-1,0]\), whereas the boundary values on side 1 of the subdomain d are evaluated on the interval [0, 1], using Eq. (7). Similarly, the boundary values on side 4 of the subdomain d are evaluated on the interval \([-1,0]\), whereas the boundary values on side 4 of the subdomain c are evaluated on the interval [0, 1], using Eq. (7). Then, the respective Dirichlet expansion coefficients are obtained by solving the following linear systems:
The process of computing the boundary values on the upper right, lower right, and lower left subdomains can be performed similarly. The above process is implemented for each of the four subdomains, cf. Fig. 3, using four functions that take into consideration the position of each subdomain.
The respective procedures are given in UTDtN_upper_right (Algorithm 3), UTDtN_upper_left (Algorithm 4), UTDtN_lower_right (Algorithm 5), and UTDtN_lower_left (Algorithm 6). Those algorithms receive as input the appropriate Legendre expansion coefficients for the Dirichlet values on the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\) of the first-level decomposition, which are used for evaluating the Dirichlet boundary values of the newly formatted subdomains. If \(\left\{ \partial \varOmega _{j} \right\} _{1}^{4}\) denote the boundaries of a first-level subdomain and \(\left\{ \partial \varOmega _{j} \right\} _{a}^{d}\) the boundaries of a second-level subdomain, then \({\bar{a}}_{\partial \varOmega _{i}^{j}}\), \({\bar{b}}_{\partial \varOmega _{i}^{j}}\) are subsets of the expansion coefficients a, b, corresponding to the sides \(\left\{ \partial \varOmega _{i} \cap \partial \varOmega _{j} \right\} \), \(i\in [1,2,3,4]\), \(j\in [a,b,c,d]\).
After the initial partitioning of the computational domain, the aforementioned algorithms are applied for each of the four subdomains, independently. In particular, the given domain is firstly decomposed into four subdomains, using UTDtN_init, and the respective interface values on \(\varGamma _{12}, \varGamma _{14}, \varGamma _{32}\) and \(\varGamma _{34}\) are also computed. Afterwards, for each of the four subdomains, using Algorithms 3, 4, 5, and 6 results in a subsequent decomposition into four new subdomains where the respective interface values are also computed. At the end of this step, a total of 16 subdomains are derived with known Dirichlet boundary values. In a similar manner, at the third level, Algorithms 3, 4, 5, and 6 can be used for each of the 16 subdomains independently for a subsequent decomposition and computation of the corresponding interface values.
At the end of the third-level process, a total of 64 subdomains are derived. The whole procedure can be implemented in parallel, and the parallelism as expected increases when choosing a large number of hierarchical levels.
In Algorithm 7, the UTDtN_main parallel procedure is presented. Initially, the number of hierarchical levels, lev, is chosen, and afterwards, the UTDtN_init algorithm is used in order to partition the original domain into four subdomains. This algorithm also returns the Legendre expansion coefficients for the Dirichlet interface values \(b_{ij},\,i=1,3,\,j=2,4\), on \(\varGamma _{ij},\,i=1,3,\,j=2,4\). Subsequently, since the Dirichlet boundary values are known for each of the resulting subdomains, the four algorithms UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right, and UTDtN_lower_left are used for each of the resulting subdomains, in a parallel manner. The above procedure is repeated until the prescribed spatial resolution, i.e. number of levels, is achieved.
In Fig. 6, the parallel computing technique for a three-level domain decomposition process is depicted. Within the scope of a parallel computing environment, a set of tasks is executed by a worker \(W_{j}\) (thread) on a single core of a multicore machine. Hence, on a multicore system, ideally each worker is associated with each core. In this example, the task of decomposing each of the four subdomains of level 1 into a group of four new subdomains at level 2 is taken over by a total of four workers (or four cores). The 16 resulting subdomains can be further decomposed independently by 16 workers (or 16 cores).
5 An adaptive subdomain generation technique based on a posteriori error estimation
In this section, we propose a stopping criterion for the iterative procedures in Algorithms 2, 3, 4, 5, and 6. This criterion is based on an a posteriori error estimation technique, using the approximate global relations. The proposed criterion is then used for the design of a self-adaptive, subdomain generation algorithm, in order to increase the spatial resolution in specific areas of the computational domain, where the estimated errors are large.
Let us consider again the approximate global relation in matrix form for a linear elliptic PDE with source term
where \(\varepsilon \) is a term associated with the truncation error of the Legendre series expansion, as well as the error introduced when computing the integral transform of the source term, \({\hat{f}}\), using a numerical quadrature method. Since the exact expansion coefficients for the Dirichlet and Neumann values, a, b, should satisfy exactly the global relation, the estimator is defined as the maximum error arising when substituting the computed expansion coefficients into Eq. (19).
Using Eq. (20), it is possible to prescribe a tolerance parameter, TOL, to be used as a stopping criterion for the aforementioned iterative algorithms. Then, the order of the Legendre polynomials \(N_{\ell }\) should be chosen according to this prescribed criterion. In [7], it is shown that the approximation error when using a truncated Legendre series is given by
where \(H^{\sigma }(-1,1)\) denotes the Hilbert space with continuous derivatives of order \(\sigma \). Using the above relation, and under the hypothesis of a sufficiently smooth function u, an exponential convergence rate can be assumed [44]
where \(\mu \in {\mathbb{R}}^{+}\). Hence, the following heuristic formula can be prescribed for the choice of the order of the Legendre polynomials,
By examining a variety of experimental results, we have found that a good choice for the parameter \(\mu \) is \(\mu \approx 2\).
In Algorithm 8, a modified version of Algorithm 2 is presented, using a stopping criterion for the termination of the iterative procedure. For each iteration, and for each of the resulting subdomains, the maximum norm of the respective global relation, using the resulting expansion coefficients, is computed. The algorithm is terminated once all the estimated errors for each subdomain become less than or equal to the prescribed error tolerance, TOL. It should be noted that Algorithms 3, 4, 5, and 6 can be modified similarly.
Using the presented error estimators, it is possible to design an alternative algorithm that adaptively generates subdomains on specific areas and not across the whole domain, based on a certain criterion. Let us consider a first-level decomposition, as depicted in Fig. 3. Then, instead of deriving all 16 subdomains at the second hierarchical level, only the first-level subdomains whose error estimates are above a certain threshold parameter, TSH, are further decomposed. The process is then repeated until the desired spatial resolution is obtained.
In Fig. 7, an example of a decomposed computational domain using the adaptive subdomain generation technique, with five hierarchical levels, is shown. For this particular example, a threshold parameter, TSH, is assumed to be prescribed. Then, for each hierarchical level, the error estimates of each subdomain are compared to TSH; the estimates exceeding the value of TSH indicate the subdomains to be further decomposed. For the example presented in Fig. 7, the following subdomains are generated: Level 1: 2/4, Level 2: 4/16, Level 3: 8/64, Level 4: 7/256. In Algorithm 9, a modified version of Algorithm 7 is proposed. For each hierarchical level, the error estimate of each derived subdomain is computed along with the respective interface values. Consequently, only the subdomains that their respective error estimates satisfy the threshold criterion, TSH, are decomposed at the next level.
6 Computational complexity
The key step of the proposed algorithms is the evaluation of the discrete global relation, cf. (10). This procedure requires the formation of the coefficient matrices G, H and the solution of the resulting linear system. In order to form the coefficient matrices, the finite Fourier transform of the Legendre polynomials has to be computed for each entry, using Eq. (9). The modified Bessel function of the first kind is given by the following equation [37]
The series described by Eq. (24) is convergent for all z; however, when \(z\gg 1\), it is not (computationally) practical [37]. For large z, a computational approach and its implementation are given in [37]. Here, we discuss the computational complexity considering Eq. (24). It should be noted that for each entry of the coefficient matrices G, H, the argument of z is given by Eqs. (10) and (11). The magnitude of the argument is given by:
Using \(M=nN_{\ell }\), \(R=2M\) [28], for a polygonal domain lying on the unit circle, we have
Considering a truncated series \(k=0,\ldots ,c\) for Eq. (24), we have \(\frac{3}{2}c^{2}+\frac{7}{2}c+(c+2)\nu +2\) arithmetic operations (additions and multiplications). The coefficient matrices G, H consist of \((2nM)\times (nN_{\ell })\) elements. By considering Eqs. (9), (10), setting up the linear system corresponding to the global relation requires \(4n^{2}MN\left( \frac{3}{2}c^{2}+\frac{7}{2}c+(c+2)\nu +3 \right) +k_{1}+k_{2}-2nM\) arithmetic operations. The constants \(k_{1}\), \(k_{2}\) are associated with the number of arithmetic operations for computing the exponential coefficients appearing in matrices G, H. Hence, the computational complexity is \({\mathcal{O}}\left( (6c^{2}+4\nu c)n^{2}MN \right) \).
The next step for determining the unknown boundary values requires the solution of the resulting overdetermined linear system. Here, we have used the QR factorization based on Householder transformations and the computational complexity for a matrix \(A\in {\mathbb{R}}^{m\times n}\) is \(2mn^{2}-\frac{2}{3}n^{3}\) [6]. Hence, in our case solving the resulting linear system requires \(4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3}\). It should be stated that discretizing the global relation results in relatively small, dense coefficient matrices. Increased accuracy can be achieved in most cases by considering Legendre polynomials of order between 10 and 15. In that case, for a square domain, the order of the coefficient matrix is \(480 \times 60\), with \(N_{\ell }=15.\)
The proposed solvers UTDtN_main and UTDtN_main_adaptive rely on the algorithmic schemes UTDtN_init, UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right and UTDtN_lower_left. As discussed above, the most computationally demanding part of formulating the global relation and determining the unknown boundary values is the solution of the resulting linear system using the QR method. Since the key part of the aforementioned algorithms is the repetitive solution of the global relation, the total computational work is determined by the number of linear systems that need to be solved. Each of the algorithmic schemes UTDtN_init, UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right and UTDtN_lower_left requires \(4(\textit{MaxIt}+1)\zeta \) arithmetic operations, where \(\zeta =4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3}\). When the a posteriori error estimator is used as a stopping criterion, we have \(4i_{l,s}\zeta \) arithmetic operations, where \(i_{l,s}\) is the number of iterations performed at the \(l_{th}\) level, in the \(s_{th}\) subdomain. The total number of arithmetic operations of the algorithmic scheme UTDtN_main is described by the following expression:
Analogously, in the case of using the a posteriori estimator, we have:
Hence, the overall computational complexity of UTDtN_main is approximately:
Additionally, when using the a posteriori estimator, the overall computational complexity of UTDtN_main is approximately:
The UTDtN_main_adaptive algorithm is similar to UTDtN_main, based on a posteriori error estimation. The main difference is that at each hierarchical level, a limited number of subdomains are further discretized, according to the threshold parameter TSH. In the worst-case scenario, where all of the subdomains are discretized, the approximate computational complexity of UTDtN_main_adaptive is given by Eq. (30).
7 Numerical results
In this section, the applicability of the proposed methodology is illustrated. In particular, the Laplace equation is solved on a square domain, \([-1,1]^{2}\), using the proposed algorithms. In order to assess the accuracy of the approximated solutions, we have used the following closed-form expression
Initially, numerical results concerning the accuracy of the proposed method are given, using the two versions of the UTDtN algorithm (maximum iterations, error tolerance) for the Laplace equation on a square domain. Secondly, results are given for the UTDtN algorithm, including the adaptive subdomain generation technique based on error estimation for a square domain as well. The third set of numerical results involves the solution of Laplace PDE in an L-shaped domain in the presence of singularity, whereas in the fourth set, results are given for a nonconvex domain with multiple re-entrant corners. The final set of numerical results involves the performance of the parallel version of the UTDtN algorithm, whereas a brief comparison to a standard finite element domain decomposition scheme is conducted.
7.1 Numerical convergence on a square domain
In this subsection, the accuracy of the approximated Legendre expansion coefficients is assessed. These coefficients correspond to the solution as well as its normal derivative on the interfaces of the resulting subdomains, at each hierarchical level. The numerical errors have been computed with reference to the following equation
In Fig. 8, the computed errors for the Dirichlet \((b_{12},b_{32},b_{14},b_{34})\) and the Neumann \((a_{12},a_{32},a_{14},a_{34})\) expansion coefficients at the interfaces of a first-level decomposed domain, for each iteration with \(MaxIt=500\), are given. In Fig. 9, the computed errors for the Dirichlet and the Neumann expansion coefficients at the interfaces of the four subdomains of a second-level decomposed domain, for each iteration with \(MaxIt=400\), are given. In Figs. 10 and 11, the computed errors for the Dirichlet and the Neumann expansion coefficients at the interfaces of the subdomains 1–8 and 9–16, respectively, are shown for a third-level decomposed domain, for each iteration with \(MaxIt = 400\). In Tables 1 and 2, the estimated error, the exact errors for the Dirichlet and the Neumann values, and the total DtN iterations, for three hierarchical levels with TOL = 1E−08, are presented.
It should be noted that the a posteriori error estimation technique is a preferable termination criterion since it is possible to avoid the execution of excess iterations which can occur when assigning an arbitrarily large value for the MaxIt parameter.
In Tables 3 and 4, the estimated error, cf. Eq. (20), the exact errors for the Dirichlet and the Neumann values, as well as the total DtN iterations, for three hierarchical levels with TOL = 1E−03, are given.
7.2 Adaptive subdomain generation results
In this subsection, numerical results are presented for the UTDtN algorithm with adaptive subdomain generation, based on a posteriori error estimation. We present three different cases with various numbers for the error estimate parameter TOL and the adaptivity threshold parameter TSH. The results involve the presentation of the computational domain after the final adaptive decomposition as well as the corresponding maximum computed errors at each hierarchical level. The numerical results have been obtained with reference to the exact solution given by Eq. (31).
In Fig. 12, an adaptively decomposed subdomain into seven hierarchical levels, with TOL = 1E−03 and TSH = 1E−04, is presented. The corresponding maximum computed error for each hierarchical level is given in Fig. 13.
In Fig. 14, an adaptively decomposed subdomain into seven hierarchical levels, with TOL = 1E−04 and TSH = 5E−06, is shown, and the corresponding maximum computed error for each hierarchical level is given in Fig. 15. In Fig. 16, an adaptively decomposed subdomain into six hierarchical levels, with TOL = 1E−05, is depicted. The threshold parameter TSH has been adaptively selected with TSH = 1E−07 at levels 1–2 and TSH = 1E−06 at levels 3–6. The corresponding maximum computed error for each hierarchical level is given in Fig. 17.
The numerical results indicate that the given computational domain can be possibly further decomposed in particular areas where the a posteriori error estimate does not satisfy the prescribed threshold parameter. Moreover, the corresponding computed errors point out that the order of the accuracy is maintained through all hierarchical levels. These adaptive decompositions result in domains that are finer in areas with relatively large error estimates and coarser in the other areas.
7.3 L-shaped domain with singularity
Here, we present a test case where the Laplace equation is solved over an L-shaped domain in the presence of a singularity near the origin. Generally, there are two effective ways to successfully solve such a problem. The first approach requires the numerical solution using adaptive mesh refinement techniques near the singular point [46]. The second approach involves the inclusion of singular functions directly in the numerical method [14]. In the context of the Fokas method, supplementing the Legendre basis functions with singular functions results in exponential convergence rates as opposed to algebraic ones when those functions are not included; however, conditioning issues are expected to arise as noted in [11]. Here, the UTDtN method without singular functions is considered for solving the above problem and the numerical results are compared to a solution obtained by an adaptive finite element method. Although we expect algebraic convergence as indicated by Theorem 2.8 in [3], we show that in this numerical example, our method is faster than the adaptive FEM. In a future publication, we aim to combine our domain decomposition approach with singular functions in order to improve both the accuracy and the relevant condition number of the linear system.
In Fig. 18, the L-shaped domain and a two-level decomposition are depicted. In Fig. 19, the computational mesh generated by an adaptive finite element method using 22172 triangles is depicted on the left. On the right, the subdomains generated by the UTDtN technique where the solution approximated on each boundary are shown. In Fig. 20, the solutions obtained using the adaptive FEM (red colour) and the UTDtN technique (black colour) are plotted against each other. Using \(N_{\ell }=15\) basis functions, we obtain an error of 2.95E−04 near the singularity, by comparing to the solution computed by the finite element method. The time required for computing the solution with the adaptive FEM was 8.21 s, whereas for the UTDtN was 4.87 s. It should be mentioned that UTDtN used 83 DtN iterations at the first level as well as 145, 148, and 149 iterations for each of the three subdomains at level 2. At the third level, three subdomains were generated with 149 DtN iterations, respectively. At the fourth level, four subdomains were generated with 140, 140, 141, and 143 iterations, respectively. Finally, at the fifth level, four subdomains were again generated with 154, 154, 156, and 158 DtN iterations, respectively.
7.4 Nonconvex domain with multiple re-entrant corners
In this subsection, we consider the Laplace equation on a relatively complicated nonconvex domain with multiple re-entrant corners. We compare the usual unified transform on the boundary versus the proposed iterative domain decomposition algorithm (UTDtN). For the numerical results, we have used the exact solution given by Eq. (31). In Fig. 21, the nonconvex domain is depicted along with the corresponding ten-subdomain decomposition. We use the UTDtN algorithm in order to compute the solution on the interfaces \(\varGamma \). In Fig. 22, the numerical convergence for the solution across the interfaces \(\varGamma \) after 1000 iterations and using \(N_{\ell }=7\) basis functions is given. In Table 5, the condition numbers of the corresponding linear systems are given for the entire nonconvex domain as well as the decomposed subdomains. It is worth mentioning that using the UTDtN algorithm, the condition number decreases from 2.84E+18 to a maximum of 4.16E+04. Also, the time to compute all interface values after 1000 iterations was 4.16 s.
Since the optimal choice for the \(\theta \) value of the iterative DtN procedure is not known, we have considered a parameter testing numerical experiment. In Fig. 23, we present the maximum error across the interfaces \(\varGamma \) for several values of the parameter \(\theta \). It appears that the best numerical results in terms of accuracy are obtained in the interval \(\left[ 0.01, \; \, 0.105 \right] \).
It should be noted that the problem of decomposing a nonconvex polygon into a minimum number of convex polygons has been solved and “optimal decomposition algorithms” have been provided [9]. The above numerical results indicate that the proposed domain decomposition approach in conjunction with the method of Fokas can be useful for solving BVPs formulated on nonconvex domains.
An alternative approach using virtual sides has been introduced in [11] for solving BVPs on nonconvex domains via the Fokas method. This approach relies on introducing a virtual side between the appropriate corners of the domain in order to derive a number of convex polygons. The difference between this technique and ours lies in the solution procedure; in [11], one linear system is assembled including the matched Cauchy data across the virtual side in the solution vector. If \(n_{vs}\) virtual sides are used, the coefficient matrix is then augmented by \(2n_{vs}\) block columns of size depending on the order N of the selected basis functions. The authors have reported a two-subdomain numerical experiment obtaining high accuracy in approximately 0.06 s; however, their method needs to be tested in a complicated nonconvex domain with multiple re-entrant corners in order to make a proper comparison to our approach. It is certain that including a large number of subdomains, hence a large number of virtual sides, the technique in [11] will result in a relatively large coefficient matrix. Using our technique, it is possible to handle any nonconvex domain by solving small linear systems of low condition number, cf. Table 5, obtaining high-order accuracy as well, cf. Fig. 22. Hence, in cases of large nonconvex domains needing to be decomposed into a large number of subdomains, our method is expected to be faster since it is inherently parallel and each subdomain can be solved independently.
7.5 Parallel performance of UTDtN
In this subsection, the performance of the parallel implementation of the UTDtN algorithm is presented. We assess the proposed parallel algorithm by solving the Laplace equation on a square domain using the solution given by Eq. (31) as reference. Furthermore, to highlight the possible advantages, it is compared to a parallel implementation of a standard Schur complement approach, based on the finite element method.
In Table 6, the total time (in seconds) to compute both the Dirichlet and the Neumann interface values, for various numbers of subdomains and cores, is given with TOL = 1E−03 and TOL = 1E−08. In addition, in Figs. 24 and 25, the speedups for various numbers of subdomains and cores are presented with TOL = 1E−03 and TOL = 1E−08, respectively. The numerical results depicted in Table 6 and Figs. 24 and 25 were obtained on a machine with an Intel Xeon CPU E5-2420v2, 2.2 GHz with 64 GB RAM.
Additionally, we have performed numerical experiments on the ARIS supercomputer (GRNET). Each compute node consists of \(2\,\times\,\) Ivy Bridge Intel Xeon E5-2680v2 (10 cores each) and 64 GB RAM. In Table 7, the total time (in seconds) to compute both the Dirichlet and the Neumann interface values, for 256 and 1024 subdomains, and several numbers of cores, is given with TOL = 1E−03 and TOL = 1E−08. Moreover, in Figs. 26 and 27, the speedups for 256 and 1024 subdomains, and various numbers of cores are presented with TOL = 1E−03 and TOL = 1E−08, respectively.
In Table 8, a comparison between the parallel UTDtN and a parallel finite element, Schur complement scheme (DDFEM), is presented. The total time to compute all interface values as well as the maximum computed errors on Intel Xeon CPU E5-2420v2, 2.2 GHz, 64 GB RAM, with six cores, is given.
The DDFEM scheme produces subdomains in the same manner as the UTDtN algorithm does. The main difference is the fact that DDFEM proceeds in the discretization of the interior of the subdomains, using a triangulation. Additionally, the initial mesh can be iteratively refined according to a parameter MR, which denotes the number of mesh refinements. It should be mentioned that the refinement procedure involves the division of each triangle into four new triangles of the same shape. The DDFEM scheme relies on a Schur complement approach; however, the corresponding Schur complement matrix S does not need to be explicitly formed; the associated linear system is solved using an iterative method where only matrix-by-vector operations are required. In this case, we have considered the preconditioned conjugate gradient (PCG) method [39], using 500 maximum iterations and 1E-10 tolerance. It should be noted that in order to be properly compared, only the interface values have been computed. In “Appendix”, the details of the DDFEM algorithm are given.
The numerical results indicate that our proposed scheme is much better in terms of speed and accuracy, especially in the case of a large number of subdomains. The main disadvantage of the finite element method lies in the generation of the computational mesh, a task that is computationally expensive when higher accuracy is needed.
As it has been previously stated, our proposed technique does not require mesh generation; the solution is only computed on the subdomains’ interfaces by solving smaller one-dimensional problems. In the case that higher resolution is required, the computational domain is further hierarchically decomposed, and the solution is obtained at the newly introduced interfaces.
8 Conclusion
A class of novel techniques for the solution of linear elliptic PDEs, based on the unified transform, has been presented. The proposed techniques rely on an iterative Dirichlet-to-Neumann approach, where the Fokas global relations constitute the essential component of the methodology. By reformulating an iterative Dirichlet-to-Neumann algorithm in terms of the approximate global relation, we have designed a domain decomposition-type class of techniques that have the following properties: inherent parallelism, high accuracy, and meshless subdomains resulting in reduced dimension approximations. In addition, we have presented an error estimation technique based on the global relation that has been initially used for the termination of the associated iterative procedures and secondly for the design of a numerical scheme that adaptively generates meshless subdomains. We have presented various sets of numerical results indicating the applicability of the proposed techniques. Furthermore, we have made brief comparisons to a Schur complement finite element method as well as an adaptive finite element method, highlighting the possible advantages. Future work will be directed towards the parallelization of the proposed scheme on distributed memory parallel systems and further study and application of the proposed techniques to a variety of practical problems arising in engineering and sciences as well as ill-posed problems.
References
Arabnia HR, Taha TR (1998) A parallel numerical algorithm on a reconfigurable multi-ring network. J Telecommun Syst 10:185–203
Ashton ACL (2013) The spectral Dirichlet–Neumann map for Laplace’s equation in a convex polygon. SIAM J Math Anal 45(6):3575–3591
Babuska I, Guo B (2000) Optimal estimates for lower and upper bounds of approximation errors in the p-version of the finite element method in two dimensions. Numer Math 85(2):219–255
Balasubramanian P, Arabnia HR (2014) Computation of error resiliency of Muller C-element. In: Proceedings on International Conference on Computational Science and Computational Intelligence, pp 179–180
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor-theoretical properties and algorithms. Parallel Comput 21(11):1783–1806
Bjorck A (2015) Numerical methods in matrix computations. Texts in Applied Mathematics. Springer, Berlin
Canuto C, Hussaini MY, Quarteroni A, Zang TA (2006) Spectral methods. Springer, Berlin
Chan TF, Goovaerts D (1989) Schur complement domain decomposition algorithms for spectral methods. Appl Numer Math 6:53–64
Chazelle B, Dobkin D (1985) Optimal convex decompositions. In: Toussaint G (ed) Computational geometry. North-Holland, Amsterdam, pp 63–133
Colbrook MJ (2018) Extending the unified transform: curvilinear polygons and variable coefficient PDEs. IMA J Numer Anal. https://doi.org/10.1093/imanum/dry085
Colbrook MJ, Flyer N, Fornberg B (2018) On the Fokas method for the solution of elliptic problems in both convex and non-convex polygonal domains. J Comput Phys 374:996–1016
Courant R, Hilbert D (1989) Methods of mathematical physics, vol 1. Wiley, Hoboken
Davis C-IR, Fornberg B (2014) A spectrally accurate numerical implementation of the Fokas transform method for Helmholtz-type PDEs. Complex Var Elliptic Equ 59(4):564–577
Elliotis M, Georgiou G, Xenophontos C (2005) Solving Laplacian problems with boundary singularities: a comparison of a singular boundary integral method with the p/hp version of the finite element method. Appl Math Comput 169:485–499
Fernandez A, Baleanu D, Fokas AS (2018) Solving PDEs of fractional order using the unified transform method. Appl Math Comput 339:738–749
Fokas AS (1997) A unified transform method for solving linear and certain nonlinear PDEs. Proc R Soc Lond Ser A 453:1411–1443
Fokas AS (2001) Two-dimensional linear PDEs in a convex polygon. Proc R Soc Lond Ser A 457:371–393
Fokas AS (2002) A new transform method for evolution PDEs. IMA J Appl Math 67:559–590
Fokas AS (2008) A unified approach to boundary value problems. SIAM, Philadelphia
Fornberg B, Flyer N (2011) A numerical implementation of Fokas boundary integral approach: Laplace’s equation on a polygonal domain. Proc R Soc A 467:2083–3003
Franceschini A, Paludetto Magri V, Ferronato M, Janna C (2018) A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices. SIAM J Matrix Anal Appl 39:123–147
Fulton S, Fokas AS, Xenophontos C (2004) An analytical method for linear elliptic PDEs and its numerical implementation. J Comput Appl Math 167:465–483
Grylonakis E-NG, Filelis-Papadopoulos CK, Gravvanis GA (2015) A note on solving the generalized Dirichlet to Neumann map on irregular polygons using Generic Factored Approximate Sparse Inverses. CMES Comput Model Eng Sci 109(6):505–517
Grylonakis E-NG, Filelis-Papadopoulos CK, Gravvanis GA (2017) A hybrid method for solving inhomogeneous elliptic PDEs based on Fokas method. Comput Methods Appl Math. https://doi.org/10.1515/cmam-2017-0053
Grylonakis E-NG, Filelis-Papadopoulos CK, Gravvanis GA (2018) A class of unified transform techniques for solving linear elliptic PDEs in convex polygons. Appl Numer Math 129:159–180
Grylonakis E-NG, Filelis-Papadopoulos CK, Gravvanis GA, Fokas AS (2019) An iterative spatial-stepping numerical method for linear elliptic PDEs using the Unified Transform. J Comput Appl Math 352:194–209
Grylonakis E-NG, Filelis-Papadopoulos CK, Gravvanis GA, Fokas AS (2019) An adaptive complex collocation method for solving linear elliptic PDEs in regular convex polygons based on the unified transform. Numer Math Theory Methods Appl 12(2):348–369
Hashemzadeh P, Fokas AS, Smitheman SA (2015) A numerical technique for linear elliptic partial differential equations in polygonal domains. Proc Math Phys Eng Sci 471:20140747. https://doi.org/10.1098/rspa.2014.0747
Janna C, Castelletto N, Ferronato M (2015) The effect of graph partitioning techniques on parallel Block FSAI preconditioning: a computational study. Numer Algorithms 68(4):813–836
Jayashree HV, Thapliyal H, Arabnia HR, Agrawal VK (2016) Ancilla-input and Garbage-output Optimized Design of a Reversible Quantum Integer Multiplier. J Supercomput 72(4):1477–1493
Jiri K, Rozloznik M, Tuma M (2017) An adaptive multilevel factorized sparse approximate inverse preconditioning. Adv Eng Softw 113:19–24
Kyziropoulos PE, Filelis-Papadopoulos CK, Gravvanis GA (2018) A class of symmetric factored approximate inverses and hybrid two-level solver. Int J Comput Methods 15(2):1850050
Makaratzis AT, Filelis-Papadopoulos CK, Gravvanis GA (2016) Parallel multilevel recursive approximate inverse techniques for solving general sparse linear systems. J Supercomput 72(6):2259–2282
Mathew T (2008) Domain decomposition methods for the numerical solution of partial differential equations. Springer, Berlin
Moutafis BE, Filelis-Papadopoulos CK, Gravvanis GA (2017) Parallel multi-projection preconditioned methods based on semi-aggregation techniques. J Comput Sci 22:45–54
Moutafis BE, Filelis-Papadopoulos CK, Gravvanis GA (2018) Parallel Schur complement techniques based on multiprojection methods. SIAM J Sci Comput 40(4):634–654
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical recipes. The art of scientific computing, 3rd edn. Cambridge University Press, Cambridge
Quarteroni A (2014) Numerical models for differential problems (MS&A), 2nd edn. Springer, Berlin
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia
Sauter SA, Schwab C (2010) Boundary element methods. Springer, Berlin
Sifalakis AG, Fokas AS, Fulton S, Saridakis YG (2008) The generalized Dirichlet–Neumann map for linear elliptic PDEs and its numerical implementation. J Comput Appl Math 219(1):9–34
Toselli A, Widlund O (2005) Domain decomposition methods—algorithms and theory. Springer, Berlin
Valafar H, Arabnia HR, Williams G (2004) Distributed global optimization and its development on the multiring network. Int J Neural Parallel Sci Comput 12(4):465–490
Wang H, Xiang S (2012) On the convergence rates of Legendre approximation. Math Comput 81:861–877
Xi Y, Li R, Saad Y (2016) An algebraic multilevel preconditioner with low-rank corrections for sparse symmetric matrices. SIAM J Matrix Anal Appl 37(1):235–259
Zhu JZ, Zienkiewicz OC (1988) Adaptive techniques in the finite element method. Commun Appl Numer Methods 4:197–204
Zhu Y, Sameh AH (2017) PSPIKE+: a family of parallel hybrid sparse linear system solvers. J Comput Appl Math 311:682–703
Zienkiewicz OC, Taylor OL, Zhu JZ (2013) The finite element method: its basis and fundamentals, 7th edn. Butterworth-Heinemann, Oxford
Acknowledgements
The authors acknowledge the Greek Research and Technology Network (GRNET) for the provision of the National HPC facility ARIS under project PR004033—ScaleSciCompII, and support from EPSRC, UK. The authors are also thankful to Matt Colbrook for useful suggestions.
Funding
Funding was provided by Engineering and Physical Sciences Research Council for A.S. Fokas.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Finite Element Domain Decomposition (DDFEM) algorithm
Appendix: Finite Element Domain Decomposition (DDFEM) algorithm
Let us consider a linear system Au = f, arising from a finite element discretization of a linear elliptic PDE. For a general decomposition into s subdomains, the linear system has the following structure [39]
where matrix A can also be written as
The vectors \(\left\{ u_{j} \right\} _{1}^{s}\) represent the solution at the interior of the s subdomains, and \(u_{\varGamma }\) represents the solution at the interfaces. Using block Gaussian elimination, the interface values are obtained by solving the following reduced system [39]
where
is called the Schur complement matrix [39]. The reduced system can be solved without explicitly assembling the Schur complement matrix S by considering a Krylov subspace iterative method. The matrix-by-vector operations \(Su_{\varGamma }\) are performed as follows [39]:
In Algorithm 10 the Schur complement, finite element procedure is described. It should be noted that R represents a restriction matrix. Further implementation details can be found in [34].
Rights and permissions
About this article
Cite this article
Grylonakis, E.N.G., Gravvanis, G.A., Filelis-Papadopoulos, C.K. et al. A parallel unified transform solver based on domain decomposition for solving linear elliptic PDEs. J Supercomput 75, 4947–4985 (2019). https://doi.org/10.1007/s11227-019-02772-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-019-02772-2