1 Introduction

The numerical solution of boundary value problems (BVPs) for linear elliptic partial differential equations (PDEs) is an important topic in applied mathematics and scientific computing. In recent years, efforts have been directed towards the development of novel numerical schemes based on the unified transform, or the Fokas method [16,17,18,19]. This method can be considered as the spectral analogue of the classical method of boundary integral equations. The unified transform, introduced by one of the authors in the late 1990s, has been used for the solution of a large class of BVPs, including linear and certain nonlinear PDEs. For the solution of a well-posed BVP in a polygonal domain, the method involves two basic steps: first, the solution is expressed in terms of integrals in the complex Fourier plane, and these integrals contain certain integral transforms of the Dirichlet and Neumann boundary values; second, these integral transforms are computed through the analysis of the global relation. For the second step, several numerical schemes have been developed in recent years in order to determine the unknown boundary values [13, 20, 22, 23, 28, 41].

In this paper, considering the second ingredient of the unified transform, i.e. the analysis of the global relation, we propose a class of numerical schemes exhibiting the following properties: high accuracy, avoidance of mesh generation, and inherent parallelism. Regarding the first property, high accuracy is achieved by using a Legendre expansion collocation method in the complex plane for computing the unknown Dirichlet and Neumann values on the boundary of a computational domain. The second property relies on the fact that the collocation method for the global relation represents a boundary-based technique that involves the computation of the relevant expansion coefficients; hence, the solution can be evaluated anywhere on the boundary without the generation of a particular computational grid. Regarding the third property, the inherent parallelism is achieved by considering a domain decomposition technique. In particular, by redesigning an iterative Dirichlet-to-Neumann algorithm in terms of the global relation, and by considering a quadtree-type domain decomposition scheme, it is possible to derive several hierarchical partitions of the domain, where groups of subdomains can be processed independently.

Here, we present the details of the proposed methodology for the Laplace equation on a square as well as on an L-shaped domain; however, the relevant numerical schemes can be used with slight modifications for any linear elliptic PDE that admits a global relation, in a general polygonal domain with an arbitrary number of sides. In addition to the method for solving the given PDE in the interior of the domain, we also propose a technique for estimating the error of the approximate solution and the approximate normal derivative on the boundaries of the domain. This a posteriori error estimator is then used as a stopping criterion for terminating the relevant iterative procedures. Furthermore, it is also used for designing a novel technique for adaptively generating subdomains according to a prescribed threshold criterion. Moreover, we show that the above approach can be advantageous for solving PDEs formulated on nonconvex polygons with multiple re-entrant corners. Recent developments concerning numerical techniques based on the unified transform can be found in [2, 10, 15, 24,25,26,27].

Over the last decades, developments in the field of high-performance computing (HPC) have directed the efforts of the research community towards the design of highly parallel techniques. Domain decomposition techniques [34, 39, 42] have been used extensively for solving practical problems in science and engineering. There are several methods for solving PDEs using the domain decomposition approach, based on the finite difference (FDM) [38], the finite element (FEM) [48], the finite volume (FVM) [38], as well as the boundary element (BEM) [40] method and spectral methods [7, 8]. The techniques proposed in this paper share several characteristics with the boundary element, spectral, and meshless methods, forming an entirely new class of techniques. Regarding the scope of high-performance scientific computing, work related to the utilization of hardware and software on scalable multiprocessor systems can be found in [1, 4, 5, 30, 43]. In addition, recent developments in preconditioning techniques and parallel solution methods can be found in [21, 29, 31,32,33, 35, 36, 45, 47].

In Sect. 2, the collocation method for solving the global relation is briefly reviewed. In Sect. 3, an iterative Dirichlet-to-Neumann (DtN) algorithm based on the global relation is introduced. In Sect. 4, a parallel meshless hierarchical solver based on the iterative DtN algorithm is discussed. In Sect. 5, an adaptive technique for subdomain generation based on a posteriori error estimation is also proposed. In Sect. 6, the computational complexity of the proposed schemes is analysed and discussed. Finally, in Sect. 7, numerical results are presented demonstrating the applicability and possible advantages of the proposed techniques.

2 Discrete global relations

The global relation is an algebraic equation that couples the finite Fourier transforms of all boundary values. Given a well-posed boundary value problem subject to prescribed boundary conditions, the global relation can be solved either analytically or numerically in order to determine the unknown boundary values. In this paper, we consider the latter approach, and below we review the collocation technique that will be used in conjunction with the proposed domain decomposition methodology.

Let us consider the Laplace equation in complex coordinates, on a convex two-dimensional polygon \(\varOmega \)

$$\begin{aligned} \frac{\partial ^{2}u}{\partial z \partial {\bar{z}}}=0, \, z=x+iy, \, z \in \varOmega . \end{aligned}$$
(1)

Additionally, let us consider Green's second identity

$$\begin{aligned} \oint _{\partial \varOmega } \left( u \frac{\partial v }{\partial n} - v \frac{\partial u}{\partial n} \right) {\mathrm{d}}s = 0, \end{aligned}$$
(2)

as well as the identity

$$\begin{aligned} \frac{\partial v}{\partial n} {\mathrm{d}}s=i \left( \frac{\partial v}{\partial {\bar{z}}} d {\bar{z}} - \frac{\partial v}{\partial z} {\mathrm{d}}z \right) . \end{aligned}$$
(3)

Substituting the particular solution \(v=e^{-i \lambda z},\lambda \in {\mathbb{C}}\), obtained by the method of separation of variables, into Eq. (2) and using Eq. (3), we derive the following global relation

$$\begin{aligned} \oint _{\partial \varOmega }{e^{-i\lambda z} \left[ \frac{\partial u}{\partial \eta } + \lambda u \frac{{\mathrm{d}}z}{{\mathrm{d}}s} \right] }{\mathrm{d}}s=0. \end{aligned}$$
(4)

In order to solve Eq. (4) numerically, each side of the polygonal domain \(\varOmega \) is parameterized using a local parameter t, as follows

$$\begin{aligned} z&=m_{j}+th_{j}, t\in \left[ -1,1 \right] ,\quad j=1,2,\ldots ,n,\\ m_{j}&=\frac{z_{j+1}+z_{j}}{2}, h_{j}=\frac{z_{j+1}-z_{j}}{2}. \end{aligned}$$
(5)

Then, the discrete version of Eq. (4) takes the form

$$\begin{aligned} \sum _{j=1}^{n}e^{-i\lambda m_{j}}\int _{-1}^{1}e^{-i\lambda h_{j}t}\left[ \left| h_{j} \right| \frac{\partial u_{j}(t)}{\partial \eta } + \lambda h_{j} u_{j}(t) \right] {\mathrm{d}}t=0, \lambda \in {\mathbb{C}}. \end{aligned}$$
(6)

The collocation-type technique requires the expansion of the boundary values in terms of appropriate basis functions. Following [28], we choose the Legendre polynomials; hence,

$$\begin{aligned} u_{j}(t)&=\sum _{\ell =0}^{N_{\ell }-1}b_{\ell }^{j}P_{\ell }(t), \quad j=1,\ldots ,n, t\in \left[ -1,1 \right] ,\\ \frac{\partial u_{j}(t)}{\partial \eta }&=\sum _{\ell =0}^{N_{\ell }-1}a_{\ell }^{j}P_{\ell }(t),\quad j=1,\ldots ,n, t\in \left[ -1,1 \right] , \end{aligned}$$
(7)

where [12]

$$\begin{aligned} P_{\ell }(t)=\frac{1}{2^{\ell }\ell !}\sum _{i=0}^{\ell }(-1)^{\ell -i}\left( {\begin{array}{c}\ell \\ i\end{array}}\right) \frac{(2i)!}{(2i-\ell )!}t^{2i-\ell }. \end{aligned}$$
(8)

The finite Fourier transform of the Legendre polynomials can be computed using the modified Bessel functions of the first kind, as follows

$$\begin{aligned} {\widehat{P}}_{\ell }(\omega )=\frac{\sqrt{2\pi \omega }}{\omega }I_{\ell +\frac{1}{2}}(\omega ). \end{aligned}$$
(9)
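
As a brief illustration of Eq. (9), the following minimal sketch (our own, not part of the original implementation) evaluates \({\widehat{P}}_{\ell }(\omega )\) with SciPy's modified Bessel function and cross-checks it against direct quadrature of \(\int _{-1}^{1}P_{\ell }(t)e^{\omega t}\,{\mathrm{d}}t\); the helper name legendre_hat is ours.

```python
# Sketch: finite Fourier transform of Legendre polynomials via Eq. (9),
# cross-checked against direct quadrature.  Assumes SciPy is available;
# scipy.special.iv accepts complex arguments.
import numpy as np
from scipy.special import iv, eval_legendre
from scipy.integrate import quad

def legendre_hat(ell, w):
    """Return sqrt(2*pi/w) * I_{ell+1/2}(w), cf. Eq. (9)."""
    # principal branches of sqrt and iv are consistent for the arguments used here
    return np.sqrt(2.0 * np.pi / w) * iv(ell + 0.5, w)

# Cross-check against int_{-1}^{1} P_ell(t) e^{w t} dt for a complex argument w.
ell, w = 3, 0.7 - 1.2j
f = lambda t: eval_legendre(ell, t) * np.exp(w * t)
re, _ = quad(lambda t: f(t).real, -1.0, 1.0)
im, _ = quad(lambda t: f(t).imag, -1.0, 1.0)
print(legendre_hat(ell, w), re + 1j * im)  # the two values should agree
```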

Hence, the discrete global relation (6) becomes

$$\begin{aligned} \sum _{j=1}^{n}\sum _{\ell =0}^{N_{\ell }-1}e^{-i\lambda m_{j}}\left[ \left| h_{j} \right| a_{\ell }^{j} + \lambda h_{j} b_{\ell }^{j} \right] {\widehat{P}}_{\ell }(-i \lambda h_{j})=0, \lambda \in {\mathbb{C}}. \end{aligned}$$
(10)

By choosing a sufficient number of values for the complex parameter \(\lambda \), according to the heuristic rule

$$\begin{aligned} \lambda =-{\bar{h}}_{p}\frac{R}{M}r, R>0, M \in {\mathbb{Z}}^{+}, p=1,\ldots ,n, r=1,\ldots ,M, \end{aligned}$$
(11)

a linear system is derived and its solution determines the expansion coefficients for the unknown boundary values. For the particular case of the Dirichlet problem, the linear system has the following form

$$\begin{aligned} Ga=Hb, \end{aligned}$$
(12)

where G and H are nonsquare matrices associated with the Fourier transforms of the Neumann and Dirichlet boundary values, respectively. It should be noted that mixed-type boundary conditions can be implemented by appropriately exchanging columns between the matrices G and H.
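
The following sketch illustrates one possible assembly of the system (12) for a Dirichlet problem, reusing legendre_hat from the previous sketch. The helper name assemble_GH, the splitting of the complex collocation equations into real and imaginary rows (which yields the \(2nM\) rows mentioned in Sect. 6), the sign convention for H, and the choice \(R/M=2\) from [28] are our assumptions rather than the authors' exact implementation.

```python
# Sketch: assemble the collocation system Ga = Hb of Eq. (12) on a convex
# polygon and solve for the Neumann coefficients a by least squares.
import numpy as np

def assemble_GH(corners, Nl, R_over_M=2.0):
    z = np.asarray(corners, dtype=complex)
    n = len(z)
    m = 0.5 * (np.roll(z, -1) + z)          # side midpoints, Eq. (5)
    h = 0.5 * (np.roll(z, -1) - z)          # side half-lengths, Eq. (5)
    M = n * Nl
    # collocation points, Eq. (11): lam = -conj(h_p) * (R/M) * r
    lam = np.concatenate([-np.conj(h[p]) * R_over_M * np.arange(1, M + 1)
                          for p in range(n)])
    G = np.zeros((lam.size, n * Nl), dtype=complex)
    H = np.zeros_like(G)
    for k, lm in enumerate(lam):
        for j in range(n):
            for ell in range(Nl):
                Phat = legendre_hat(ell, -1j * lm * h[j])
                col = j * Nl + ell
                G[k, col] = np.exp(-1j * lm * m[j]) * abs(h[j]) * Phat
                H[k, col] = -np.exp(-1j * lm * m[j]) * lm * h[j] * Phat
    # split complex rows into real/imaginary parts -> (2nM) x (nNl) real system
    return np.vstack([G.real, G.imag]), np.vstack([H.real, H.imag])

# usage sketch: square with vertices on the unit circle, Dirichlet data b known
# corners = [1, 1j, -1, -1j]; G, H = assemble_GH(corners, Nl=15)
# a, *_ = np.linalg.lstsq(G, H @ b, rcond=None)   # least-squares solve for a
```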

3 An iterative unified transform Dirichlet-to-Neumann algorithm

In this section, we present the main domain decomposition technique, which constitutes the backbone of the proposed methodology and is referred to as the Steklov–Poincaré framework [34].

Let us consider the following model problem

$$\begin{aligned} Lu&=f\quad {\text{in}}\, \varOmega ,\\ u&=g\quad {\text{on}}\,\partial \varOmega , \end{aligned}$$
(13)

where L is a general elliptic operator and \(\varOmega \) a bounded two-dimensional domain.

Let us also consider a nonoverlapping decomposition into two subdomains \(\varOmega _{1}\) and \(\varOmega _{2}\), separated by an interface \(\varGamma \). At this interface, the following equations, also called transmission conditions, hold [34]

$$u_{1}= u_{2}, $$
(14)
$$\frac{\partial u_{1}}{\partial n}= \frac{\partial u_{2}}{\partial n}. $$
(15)

The solution of the boundary value problem defined in Eq. (13) can be determined by solving the following coupled system of partial differential equations [34]

$$\begin{aligned} Lu_{1}&=f,\quad {\text{in}}\, \varOmega _{1},\\ u_{1}&=g,\quad {\text{on}}\, \partial \varOmega _{1} {\setminus} \varGamma ,\\ u_{1}&=u_{2},\quad {\text{on}}\, \varGamma ,\\ Lu_{2}&=f,\quad {\text{in}}\, \varOmega _{2},\\ u_{2}&=g,\quad {\text{on}}\, \partial \varOmega _{2} {\setminus} \varGamma ,\\ \frac{\partial u_{2}}{\partial n}&=\frac{\partial u_{1}}{\partial n},\quad {\text{on}}\, \varGamma .\\ \end{aligned}$$
(16)

The Steklov–Poincaré system (16) can be solved iteratively using a modified Dirichlet-to-Neumann algorithm based on the Fokas method, namely the unified transform Dirichlet-to-Neumann (UTDtN) algorithm. In particular, a Dirichlet problem is solved iteratively on domain \(\varOmega _{1}\), as depicted in Fig. 1, followed by the solution of a mixed problem on domain \(\varOmega _{2}\) with Neumann boundary conditions on \(\varGamma \), using the approximate global relations; the resulting approximate solution converges to the exact solution of problem (13).

Fig. 1 A two-subdomain decomposition with an interface \(\varGamma \)

Let P denote the matrix of the Legendre polynomials (8) evaluated at the chosen nodes in \(t\in [-1,1]\). Let us consider a relaxation parameter \(\theta \), with \(0<\theta <1\), in order to ensure the convergence of the iterative Dirichlet-to-Neumann algorithm [34]. The solution of the coupled system (16) is given by the unified transform Dirichlet-to-Neumann algorithm, UTDtN (Algorithm 1).

Algorithm 1 UTDtN

In Algorithm 1, \({\hat{f}}\) denotes the integral transform of the source term, given by

$$\begin{aligned} {\hat{f}}=\iint _{\varOmega } e^{-i \lambda z} f(x,y)\, {\mathrm{d}}x\, {\mathrm{d}}y. \end{aligned}$$
(17)

Furthermore, in lines 6 and 7 of Algorithm 1, the corresponding columns of matrices G and H should be exchanged accordingly in order to account for the mixed-type boundary conditions.

It should be stated that UTDtN (Algorithm 1) does not require mesh generation for the two subdomains. In particular, the solution is computed indirectly by alternately solving for the expansion coefficients on the boundaries of the two subdomains, \(\partial \varOmega _{1}\cup \varGamma \), \(\partial \varOmega _{2}\cup \varGamma \). The solution and the normal derivative can then be evaluated anywhere along the curve \(\partial \varOmega _{1}\cup \partial \varOmega _{2} \cup \varGamma \).
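
To make the structure of the iteration concrete, the sketch below shows the relaxed Dirichlet-to-Neumann loop that Algorithm 1 follows; it is our own schematic rather than the authors' listing. The callables solve_dirichlet and solve_mixed are hypothetical and stand for global-relation solves of the form (12) on \(\varOmega _{1}\) and \(\varOmega _{2}\); the sign convention for the interface normal is omitted.

```python
import numpy as np

def utdtn_two_subdomain(solve_dirichlet, solve_mixed, b_gamma0, theta=0.1,
                        tol=1e-8, max_it=500):
    """Relaxed Dirichlet-to-Neumann iteration on the interface coefficients."""
    b_gamma = np.asarray(b_gamma0, dtype=float)
    a_gamma = None
    for k in range(max_it):
        # Dirichlet solve on Omega_1 with trace b_gamma on Gamma:
        # returns the Legendre coefficients of du_1/dn on Gamma
        a_gamma = solve_dirichlet(b_gamma)
        # mixed solve on Omega_2 with Neumann data a_gamma on Gamma:
        # returns the Legendre coefficients of u_2 restricted to Gamma
        b_new = solve_mixed(a_gamma)
        # relaxation of the Dirichlet trace, 0 < theta < 1, cf. [34]
        b_next = theta * b_new + (1.0 - theta) * b_gamma
        if np.linalg.norm(b_next - b_gamma, np.inf) <= tol:
            return b_next, a_gamma, k + 1
        b_gamma = b_next
    return b_gamma, a_gamma, max_it
```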

4 A parallel meshless hierarchical solver based on the unified transform

In this section, a parallel, meshless, hierarchical solver is proposed for solving linear elliptic PDEs. The solver is based on a modified version of the iterative Dirichlet-to-Neumann algorithm, cf. UTDtN (Algorithm 1); it has the particular advantage of avoiding the task of mesh generation and can be implemented in parallel. The proposed method requires a quadtree-type decomposition of the given rectangular domain. In Fig. 2, a hierarchical, quadtree-type decomposition of a square domain, using three successive levels, is depicted. Initially, at Level 1, the domain consists of four subdomains; at each successive level, each subdomain is further decomposed into four new subdomains until the prescribed spatial resolution is achieved. At each successive level, the solution as well as the normal derivative on the interfaces of a group of four subdomains can be computed independently.

The procedure begins by decomposing the initial rectangular domain into four nonoverlapping subdomains. At the resulting interfaces, the Dirichlet as well as the Neumann values are required. The determination of those values is accomplished by modifying the iterative algorithm UTDtN (Algorithm 1), in order to be used with four, instead of two, subdomains. In Fig. 3, the process of computing the unknown Dirichlet and Neumann values across the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\), as implemented by the UTDtN_init algorithm, is depicted.

Fig. 2 Quadtree-type decomposition of a square domain using three levels

Fig. 3 The process of computing the unknown Dirichlet and Neumann values across the interfaces of the decomposed domain (color figure online)

Initially, the four subdomains are grouped into two nonadjacent pairs, as shown in Fig. 3. The process starts by iteratively solving two Dirichlet problems for the subdomains 1 and 3 (red colour), followed by the solution of two mixed problems with Neumann conditions across the interfaces for the subdomains 2 and 4 (blue colour). The arrows on each subdomain of Fig. 3 indicate the orientation of computing the expansion coefficients of the unknown interface values. For subsequent spatial levels, depending on the position of the new subdomains, the orientation for computing the new interface values changes; in that case, the orientation of the previously computed interface values should be reversed. This is a crucial implementation detail, which is taken into account when proceeding on subsequent spatial levels, as the interface values are used as input in the forthcoming computations. This procedure is described in algorithm UTDtN_init, (Algorithm 2). By selecting a maximum number of iterations MaxIt, the unknown Dirichlet and Neumann values are computed across the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\), by solving two Dirichlet problems (lines 8 and 9) followed by the solution of two mixed problems (lines 10 and 11). Then, each of the four subdomains is further decomposed into four new subdomains subject to Dirichlet boundary conditions. In UTDtN_init (Algorithm 2), \(\partial \varOmega \) denotes the boundary of the initial domain (Level 1) and \(\left\{ \partial \varOmega _{j} \right\} _{1}^{4}\) the boundaries of the four subdomains (Level 2). Moreover, \(\left\{ {\bar{a}}_{\partial \varOmega _{j}} \right\} _{1}^{4}\), \(\left\{ {\bar{b}}_{\partial \varOmega _{j}} \right\} _{1}^{4}\) are subsets of the expansion coefficients a, b, corresponding to the sides \(\left\{ \partial \varOmega _{j} \cap \partial \varOmega \right\} _{1}^{4}\). In what follows we denote by a, b the expansion coefficients of the Neumann and Dirichlet values, respectively, and by \(\left[ a,b,c,d \right] \) the indices for identifying a second-level subdomain.

In Fig. 4, a two-level decomposition of a square domain is depicted, indicating the orientations for computing the respective interface values. In Fig. 5, the process of computing the expansion coefficients of the boundary values of the second level, using the first-level interface values, is shown in detail for the upper left subdomain. The expansion coefficients of the interface values on \(\varGamma _{12}\), \(\varGamma _{32}\) are used for determining the Dirichlet boundary values on side 1 of the subdomains a, d and on side 4 of the subdomains c, d, as depicted in Fig. 5. The interfaces \(\varGamma _{12}\), \(\varGamma _{32}\) are parameterized on the interval \([-1,1]\) using equidistant nodes; hence, the boundary values on side 1 of the subdomain a are evaluated on the interval \([-1,0]\), whereas the boundary values on side 1 of the subdomain d are evaluated on the interval [0, 1], using Eq. (7). Similarly, the boundary values on side 4 of the subdomain d are evaluated on the interval \([-1,0]\), whereas the boundary values on side 4 of the subdomain c are evaluated on the interval [0, 1], using Eq. (7). Then, the respective Dirichlet expansion coefficients are obtained by solving the following linear systems:

$$\begin{aligned} Pb_{a1}&=u_{a1}\\ Pb_{d1}&=u_{d1}\\ Pb_{c4}&=u_{c4}\\ Pb_{d4}&=u_{d4}. \end{aligned}$$
(18)

The process of computing the boundary values on the upper right, lower right, and lower left subdomains can be performed similarly. The above process is implemented for each of the four subdomains, cf. Fig. 3, using four functions that take into consideration the position of each subdomain.
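
A minimal sketch of the coefficient transfer of Eq. (18) is given below; the helper name transfer_coeffs, the equidistant nodes, and the specific half-interval mappings are our assumptions, and the orientation reversals discussed above are not handled.

```python
import numpy as np
from numpy.polynomial import legendre as L

def transfer_coeffs(b_parent, half):
    """Evaluate the parent-interface Legendre series on one half of the interface
    and refit Legendre coefficients in the local parameter of the child side.
    half = 'first' maps the child parameter s in [-1,1] onto t in [-1,0],
    half = 'second' onto t in [0,1]."""
    Nl = len(b_parent)
    s = np.linspace(-1.0, 1.0, Nl)            # equidistant nodes, as in the text
    t = 0.5 * (s - 1.0) if half == 'first' else 0.5 * (s + 1.0)
    u = L.legval(t, b_parent)                 # parent series evaluated on the half
    P = L.legvander(s, Nl - 1)                # P[i, l] = P_l(s_i)
    b_child, *_ = np.linalg.lstsq(P, u, rcond=None)  # solve P b = u, cf. Eq. (18)
    return b_child
```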

Algorithm 2 UTDtN_init

The respective procedures are given in UTDtN_upper_right (Algorithm 3), UTDtN_upper_left (Algorithm 4), UTDtN_lower_right (Algorithm 5), and UTDtN_lower_left (Algorithm 6). Those algorithms receive as input the appropriate Legendre expansion coefficients for the Dirichlet values on the interfaces \(\varGamma _{ij},\,i=1,3,\,j=2,4\) of the first-level decomposition, which are used for evaluating the Dirichlet boundary values of the newly formed subdomains. If \(\left\{ \partial \varOmega _{j} \right\} _{1}^{4}\) denote the boundaries of a first-level subdomain and \(\left\{ \partial \varOmega _{j} \right\} _{a}^{d}\) the boundaries of a second-level subdomain, then \({\bar{a}}_{\partial \varOmega _{i}^{j}}\), \({\bar{b}}_{\partial \varOmega _{i}^{j}}\) are subsets of the expansion coefficients a, b, corresponding to the sides \(\left\{ \partial \varOmega _{i} \cap \partial \varOmega _{j} \right\} \), \(i\in [1,2,3,4]\), \(j\in [a,b,c,d]\).

After the initial partitioning of the computational domain, the aforementioned algorithms are applied to each of the four subdomains independently. In particular, the given domain is first decomposed into four subdomains, using UTDtN_init, and the respective interface values on \(\varGamma _{12}, \varGamma _{14}, \varGamma _{32}\) and \(\varGamma _{34}\) are computed. Afterwards, applying Algorithms 3, 4, 5, and 6 to each of the four subdomains decomposes each of them into four new subdomains and computes the respective interface values. At the end of this step, a total of 16 subdomains are derived with known Dirichlet boundary values. In a similar manner, at the third level, Algorithms 3, 4, 5, and 6 can be applied to each of the 16 subdomains independently for a subsequent decomposition and the computation of the corresponding interface values.

Fig. 4 The process of computing the unknown Dirichlet and Neumann values across the interfaces of the decomposed domain, using two hierarchical levels

Fig. 5 The process of obtaining the Legendre expansion coefficients for the Dirichlet boundary values of the second-level subdomains, corresponding to the first-level, upper left subdomain

At the end of the third-level process, a total of 64 subdomains are derived. The whole procedure can be implemented in parallel and, as expected, the available parallelism increases with the number of hierarchical levels.

Algorithm 3 UTDtN_upper_right
Algorithm 4 UTDtN_upper_left
Algorithm 5 UTDtN_lower_right
Algorithm 6 UTDtN_lower_left
Algorithm 7 UTDtN_main

In Algorithm 7, the UTDtN_main parallel procedure is presented. Initially, the number of hierarchical levels, lev, is chosen, and afterwards, the UTDtN_init algorithm is used in order to partition the original domain into four subdomains. This algorithm also returns the Legendre expansion coefficients for the Dirichlet interface values \(b_{ij},\,i=1,3,\,j=2,4\), on \(\varGamma _{ij},\,i=1,3,\,j=2,4\). Subsequently, since the Dirichlet boundary values are known for each of the resulting subdomains, the four algorithms UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right, and UTDtN_lower_left are used for each of the resulting subdomains, in a parallel manner. The above procedure is repeated until the prescribed spatial resolution, i.e. number of levels, is achieved.

In Fig. 6, the parallel computing technique for a three-level domain decomposition process is depicted. Within the scope of a parallel computing environment, a set of tasks is executed by a worker \(W_{j}\) (thread) on a single core of a multicore machine. Hence, on a multicore system, ideally each worker is associated with a distinct core. In this example, the task of decomposing each of the four subdomains of level 1 into a group of four new subdomains at level 2 is taken over by a total of four workers (or four cores). The 16 resulting subdomains can be further decomposed independently by 16 workers (or 16 cores).
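
A sketch of this worker-to-subdomain mapping, using a process pool, is given below; refine_subdomain and utdtn_init are hypothetical wrappers around Algorithms 2–6 and not the authors' code.

```python
from concurrent.futures import ProcessPoolExecutor

def decompose_level(subdomains, refine_subdomain, max_workers=None):
    """Refine every subdomain of the current level independently."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        children_lists = list(pool.map(refine_subdomain, subdomains))
    # each parent yields four children, so the next level has 4x as many subdomains
    return [child for children in children_lists for child in children]

# usage sketch for three levels, starting from the four first-level subdomains:
# level = utdtn_init(domain)              # hypothetical wrapper around Algorithm 2
# for _ in range(2):                      # levels 2 and 3 -> 16 and 64 subdomains
#     level = decompose_level(level, refine_subdomain)
```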

Fig. 6 The parallel computing technique for a three-level hierarchical decomposition, resulting in 64 subdomains. For each level, each worker (core) \(W_{j}\) takes over the task of generating a group of four subdomains along with approximating the corresponding interface values

5 An adaptive subdomain generation technique based on a posteriori error estimation

In this section, we propose a stopping criterion for the iterative procedures in Algorithms 2, 3, 4, 5, and 6. This criterion is based on an a posteriori error estimation technique, using the approximate global relations. The proposed criterion is then used for the design of a self-adaptive subdomain generation algorithm, in order to increase the spatial resolution in specific areas of the computational domain where the estimated errors are large.

Let us consider again the approximate global relation in matrix form for a linear elliptic PDE with source term

$$\begin{aligned} Ga=Hb+{\hat{f}}+\varepsilon , \end{aligned}$$
(19)

where \(\varepsilon \) is a term associated with the truncation error of the Legendre series expansion, as well as the error introduced when computing the integral transform of the source term, \({\hat{f}}\), using a numerical quadrature method. Since the exact expansion coefficients a, b satisfy the global relation exactly, the estimator is defined as the maximum norm of the residual obtained when substituting the computed expansion coefficients \(a^{*}\), \(b^{*}\) into Eq. (19):

$$\begin{aligned} E=\left\| Ga^{*}-Hb^{*}-{\hat{f}} \right\| _{\infty }. \end{aligned}$$
(20)

Using Eq. (20), it is possible to prescribe a tolerance parameter, TOL, to be used as a stopping criterion for the aforementioned iterative algorithms. Then, the order of the Legendre polynomials \(N_{\ell }\) should be chosen according to this prescribed criterion. In [7], it is shown that the approximation error when using a truncated Legendre series is bounded as follows

$$\begin{aligned} \left\| u-u_{N_{\ell }} \right\| _{L^{2}(-1,1)}\le cN_{\ell }^{-\sigma }\left\| u \right\| _{H^{\sigma }(-1,1)}, \end{aligned}$$
(21)

where \(H^{\sigma }(-1,1)\) denotes the Sobolev space of functions with square-integrable derivatives up to order \(\sigma \). Using the above relation, and under the hypothesis of a sufficiently smooth function u, an exponential convergence rate can be assumed [44]

$$\begin{aligned} \left\| u-u_{N_{\ell }} \right\| _{L^{2}(-1,1)} \sim e^{-\mu N_{\ell }}, \end{aligned}$$
(22)

where \(\mu \in {\mathbb{R}}^{+}\). Hence, the following heuristic formula can be prescribed for the choice of the order of the Legendre polynomials,

$$\begin{aligned} N_{\ell }=\left\lceil \frac{-\ln (TOL)}{\mu } \right\rceil . \end{aligned}$$
(23)

By examining a variety of experimental results, we have found that a good choice for the parameter \(\mu \) is \(\mu \approx 2\).
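
For instance, with TOL = 1E−08 and \(\mu =2\), Eq. (23) gives \(N_{\ell }=\left\lceil -\ln (10^{-8})/2 \right\rceil =\left\lceil 9.21 \right\rceil =10\), which is consistent with the polynomial orders between 10 and 15 used in Sect. 6.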

Algorithm 8

In Algorithm 8, a modified version of Algorithm 2 is presented, using a stopping criterion for the termination of the iterative procedure. At each iteration, and for each of the resulting subdomains, the maximum norm of the residual of the respective global relation, evaluated with the computed expansion coefficients, is computed. The algorithm is terminated once the estimated errors for all subdomains become less than or equal to the prescribed error tolerance, TOL. It should be noted that Algorithms 3, 4, 5, and 6 can be modified similarly.

Using the presented error estimators, it is possible to design an alternative algorithm that adaptively generates subdomains in specific areas, and not across the whole domain, based on a certain criterion. Let us consider a first-level decomposition, as depicted in Fig. 3. Then, instead of deriving all 16 subdomains at the second hierarchical level, only the first-level subdomains whose error estimates are above a certain threshold parameter, TSH, are further decomposed. The process is then repeated until the desired spatial resolution is obtained.
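
The sketch below outlines this adaptive loop under our own naming: estimate_error evaluates the estimator (20) from the assembled matrices of a subdomain, and refine_subdomain stands for the level advance performed by Algorithms 3–6; both are hypothetical helpers, not the authors' code.

```python
import numpy as np

def estimate_error(G, H, a_star, b_star, f_hat=0.0):
    """A posteriori estimator E of Eq. (20) for one subdomain (f_hat = 0 for Laplace)."""
    return np.linalg.norm(G @ a_star - H @ b_star - f_hat, np.inf)

def adaptive_decompose(subdomains, refine_subdomain, estimate, TSH, max_levels):
    """Refine, level by level, only the subdomains whose estimate exceeds TSH."""
    for _ in range(max_levels):
        errs = [estimate(s) for s in subdomains]        # estimate wraps estimate_error
        flagged = [s for s, e in zip(subdomains, errs) if e > TSH]
        if not flagged:
            break
        kept = [s for s, e in zip(subdomains, errs) if e <= TSH]
        refined = [child for s in flagged for child in refine_subdomain(s)]
        subdomains = kept + refined
    return subdomains
```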

Fig. 7 An adaptively decomposed computational domain with five hierarchical levels

In Fig. 7, an example of a computational domain decomposed using the adaptive subdomain generation technique, with five hierarchical levels, is shown. For this particular example, a threshold parameter, TSH, is assumed to be prescribed. Then, for each hierarchical level, the error estimates of each subdomain are compared to TSH; the estimates exceeding the value of TSH indicate the subdomains to be further decomposed. For the example presented in Fig. 7, the following subdomains are generated: Level 1: 2/4, Level 2: 4/16, Level 3: 8/64, Level 4: 7/256. In Algorithm 9, a modified version of Algorithm 7 is proposed. For each hierarchical level, the error estimate of each derived subdomain is computed along with the respective interface values. Consequently, only the subdomains whose error estimates exceed the threshold TSH are decomposed at the next level.

Algorithm 9 UTDtN_main_adaptive

6 Computational complexity

The key step of the proposed algorithms is the evaluation of the discrete global relation, cf. (10). This procedure requires the formation of the coefficient matrices G, H and the solution of the resulting linear system. In order to form the coefficient matrices, the finite Fourier transform of the Legendre polynomials has to be computed for each entry, using Eq. (9). The modified Bessel function of the first kind is given by the following equation [37]

$$\begin{aligned} I_{\nu }(z)= \left( \frac{z}{2} \right) ^{\nu }\sum _{k=0}^{\infty }\frac{\left( \frac{z^{2}}{4} \right) ^{k}}{k!\varGamma (\nu +k+1)}. \end{aligned}$$
(24)

The series described by Eq. (24) converges for all z; however, when \(z\gg 1\), it is not computationally practical [37]. For large z, a computational approach and its implementation are given in [37]. Here, we discuss the computational complexity considering Eq. (24). It should be noted that for each entry of the coefficient matrices G, H, the argument z is given by Eqs. (10) and (11). The magnitude of the argument is given by:

$$\begin{aligned} \left| z \right| =\left| -i\lambda h_{j} \right| =\left| -i\left( -{\bar{h}}_{p}\frac{R}{M}r \right) h_{j} \right| =\left| i {\bar{h}}_{p}h_{j}\frac{R}{M}r \right| , \, p,j=1,\ldots ,n, \, r=1,\ldots ,M. \end{aligned}$$
(25)

Using \(M=nN_{\ell }\), \(R=2M\) [28], for a polygonal domain lying on the unit circle, we have

$$\begin{aligned} \left| i {\bar{h}}_{p}h_{j}2r \right| \le 2\left| {\bar{h}}_{p} \right| \left| h_{j} \right| \left| r \right| =2nN_{\ell } , \, p,j=1,\ldots ,n. \end{aligned}$$
(26)

Considering a truncated series \(k=0,\ldots ,c\) for Eq. (24), we have \(\frac{3}{2}c^{2}+\frac{7}{2}c+(c+2)\nu +2\) arithmetic operations (additions and multiplications). The coefficient matrices G, H consist of \((2nM)\times (nN_{\ell })\) elements each. By considering Eqs. (9) and (10), setting up the linear system corresponding to the global relation requires \(4n^{2}MN_{\ell }\left( \frac{3}{2}c^{2}+\frac{7}{2}c+(c+2)\nu +3 \right) +k_{1}+k_{2}-2nM\) arithmetic operations. The constants \(k_{1}\), \(k_{2}\) are associated with the number of arithmetic operations for computing the exponential coefficients appearing in matrices G, H. Hence, the computational complexity is \({\mathcal{O}}\left( (6c^{2}+4\nu c)n^{2}MN_{\ell } \right) \).

The next step for determining the unknown boundary values requires the solution of the resulting overdetermined linear system. Here, we have used the QR factorization based on Householder transformations; the computational complexity for a matrix \(A\in {\mathbb{R}}^{m\times n}\) is \(2mn^{2}-\frac{2}{3}n^{3}\) [6]. Hence, in our case, solving the resulting linear system requires \(4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3}\) arithmetic operations. It should be stated that discretizing the global relation results in relatively small, dense coefficient matrices. Increased accuracy can be achieved in most cases by considering Legendre polynomials of order between 10 and 15. In that case, for a square domain with \(N_{\ell }=15\), the size of the coefficient matrix is \(480 \times 60\).
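
As a quick sanity check of these counts (our own arithmetic, not from the paper), the snippet below evaluates \(\zeta \) for the square-domain setting just quoted and confirms that it coincides with the QR count \(2mn^{2}-\frac{2}{3}n^{3}\) for the \(480\times 60\) matrix.

```python
# Worked example of zeta = 4 n^3 M N_l^2 - (2/3) n^3 N_l^3 for n = 4, N_l = 15.
n, Nl = 4, 15
M = n * Nl                                   # M = n * N_l, cf. Sect. 6
m_rows, n_cols = 2 * n * M, n * Nl           # coefficient matrix of size 480 x 60
zeta = 4 * n**3 * M * Nl**2 - (2 * n**3 * Nl**3) // 3
assert zeta == 2 * m_rows * n_cols**2 - (2 * n_cols**3) // 3
print(zeta)                                  # 3312000 arithmetic operations per QR solve
```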

The proposed solvers UTDtN_main and UTDtN_main_adaptive rely on the algorithmic schemes UTDtN_init, UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right and UTDtN_lower_left. As discussed above, the most computationally demanding part of formulating the global relation and determining the unknown boundary values is the solution of the resulting linear system using the QR method. Since the key part of the aforementioned algorithms is the repetitive solution of the global relation, the total computational work is determined by the number of linear systems that need to be solved. Each of the algorithmic schemes UTDtN_init, UTDtN_upper_right, UTDtN_upper_left, UTDtN_lower_right and UTDtN_lower_left requires \(4(\textit{MaxIt}+1)\zeta \) arithmetic operations, where \(\zeta =4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3}\). When the a posteriori error estimator is used as a stopping criterion, we have \(4i_{l,s}\zeta \) arithmetic operations, where \(i_{l,s}\) is the number of iterations performed at the \(l\)-th level, in the \(s\)-th subdomain. The total number of arithmetic operations of the algorithmic scheme UTDtN_main is described by the following expression:

$$\begin{aligned}&4(MaxIt+1)\zeta +\sum _{j=0}^{lev}4^{j}16(MaxIt+1)\zeta \\&\quad =4(MaxIt+1)\zeta +16(MaxIt+1)\zeta \left( -\frac{1-4^{lev}}{3}+4^{lev} \right) \\&\quad =4(MaxIt+1)\zeta +16(MaxIt+1)\zeta \left( \frac{4^{lev+1}-1}{3} \right) \\&\quad =(MaxIt+1)\zeta \left\{ 4+16\left( \frac{4^{lev+1}-1}{3} \right) \right\} \\&\quad =\left( \frac{4^{lev+3}-4}{3} \right) (MaxIt+1)\zeta . \end{aligned}$$
(27)

Analogously, in the case of using the a posteriori estimator, we have:

$$\begin{aligned}&4i_{0,0}\zeta +4(i_{1,1}+i_{1,2}+i_{1,3}+i_{1,4})\zeta +\cdots \\&\quad =4i_{0,0}\zeta +4\zeta \sum _{m=1}^{lev}\sum _{n=1}^{4^{m}}i_{m,n}=4\zeta \left\{ i_{0,0}+\sum _{m=1}^{lev}\sum _{n=1}^{4^{m}}i_{m,n} \right\} . \end{aligned}$$
(28)

Hence, the overall computational complexity of UTDtN_main is approximately:

$$\begin{aligned} {\mathcal{O}} \left( \left( \frac{4^{lev+3}-4}{3} \right) (MaxIt+1)\left( 4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3} \right) \right) . \end{aligned}$$
(29)

Additionally, when using the a posteriori estimator, the overall computational complexity of UTDtN_main is approximately:

$$\begin{aligned} {\mathcal{O}} \left( 4\left( 4n^{3}MN_{\ell }^{2}-\frac{2}{3}n^{3}N_{\ell }^{3} \right) \left\{ i_{0,0}+\sum _{m=1}^{lev}\sum _{n=1}^{4^{m}}i_{m,n} \right\} \right) . \end{aligned}$$
(30)

The UTDtN_main_adaptive algorithm is similar to UTDtN_main based on a posteriori error estimation. The main difference is that, at each hierarchical level, only a limited number of subdomains are further decomposed, according to the threshold parameter TSH. In the worst-case scenario, where all of the subdomains are decomposed, the approximate computational complexity of UTDtN_main_adaptive is given by Eq. (30).

7 Numerical results

In this section, the applicability of the proposed methodology is illustrated. In particular, the Laplace equation is solved on a square domain, \([-1,1]^{2}\), using the proposed algorithms. In order to assess the accuracy of the approximate solutions, we have used the following closed-form expression

$$\begin{aligned} u(x,y)=e^{1+x}\cos (2+y). \end{aligned}$$
(31)

Initially, numerical results concerning the accuracy of the proposed method are given, using the two versions of the UTDtN algorithm (maximum iterations, error tolerance) for the Laplace equation on a square domain. Secondly, results are given for the UTDtN algorithm with the adaptive subdomain generation technique based on error estimation, also on a square domain. The third set of numerical results involves the solution of the Laplace equation in an L-shaped domain in the presence of a singularity, whereas in the fourth set, results are given for a nonconvex domain with multiple re-entrant corners. The final set of numerical results involves the performance of the parallel version of the UTDtN algorithm, and a brief comparison to a standard finite element domain decomposition scheme is conducted.

7.1 Numerical convergence on a square domain

In this subsection, the accuracy of the approximated Legendre expansion coefficients is assessed. These coefficients correspond to the solution as well as its normal derivative on the interfaces of the resulting subdomains, at each hierarchical level. The numerical errors have been computed using the following relative error

$$\begin{aligned} \frac{\left\| v_{\text{analytical}}-v_{\text{approximated}} \right\| _{\infty }}{\left\| v_{\text{analytical}} \right\| _{\infty }}. \end{aligned}$$
(32)
Fig. 8 The computed errors at each iteration for the Dirichlet and Neumann expansion coefficients, on the interfaces of a first-level decomposed domain, with \(MaxIt=500\)

In Fig. 8, the computed errors for the Dirichlet \((b_{12},b_{32},b_{14},b_{34})\) and the Neumann \((a_{12},a_{32},a_{14},a_{34})\) expansion coefficients at the interfaces of a first-level decomposed domain, for each iteration with \(MaxIt=500\), are given. In Fig. 9, the computed errors for the Dirichlet and the Neumann expansion coefficients at the interfaces of the four subdomains of a second-level decomposed domain, for each iteration with \(MaxIt=400\), are given. In Figs. 10 and 11, the computed errors for the Dirichlet and the Neumann expansion coefficients at the interfaces of the subdomains 1–8 and 9–16, respectively, are shown for a third-level decomposed domain, for each iteration with \(MaxIt = 400\). In Tables 1 and 2, the estimated error, the exact errors for the Dirichlet and the Neumann values, and the total DtN iterations, for three hierarchical levels with TOL = 1E−08, are presented.

It should be noted that the a posteriori error estimation technique is a preferable termination criterion, since it avoids the execution of excess iterations, which can occur when an arbitrarily large value is assigned to the MaxIt parameter.

Fig. 9 The computed errors (y-axis) at each iteration (x-axis) for the Dirichlet and Neumann expansion coefficients, on the interfaces of a second-level decomposed domain, with \(MaxIt=400\)

Fig. 10 The computed errors (y-axis) at each iteration (x-axis) for the Dirichlet and Neumann expansion coefficients, on the interfaces of a third-level decomposed domain, with \(MaxIt=400\) (subdomains 1–8)

Fig. 11 The computed errors (y-axis) at each iteration (x-axis) for the Dirichlet and Neumann expansion coefficients, on the interfaces of a third-level decomposed domain, with \(MaxIt=400\) (subdomains 9–16)

Table 1 Computed and estimated errors for the interface values at levels 1 and 2, as well as total DtN iterations, with TOL = 1E−03
Table 2 Computed and estimated errors for the interface values at level 3, as well as total DtN iterations, with TOL = 1E−03

In Tables 3 and 4, the estimated error, cf. Eq. (20), the exact errors for the Dirichlet and the Neumann values, as well as the total DtN iterations, for three hierarchical levels with TOL = 1E−03, are given.

Table 3 Computed and estimated errors for the interface values at Levels 1 and 2, as well as total DtN iterations, with TOL = 1E−08
Table 4 Computed and estimated errors for the interface values at level 3, as well as total DtN iterations, with TOL = 1E−08

7.2 Adaptive subdomain generation results

In this subsection, numerical results are presented for the UTDtN algorithm with adaptive subdomain generation, based on a posteriori error estimation. We present three different cases with various values of the error tolerance parameter TOL and the adaptivity threshold parameter TSH. The results include the computational domain after the final adaptive decomposition as well as the corresponding maximum computed errors at each hierarchical level. The numerical results have been obtained with reference to the exact solution given by Eq. (31).

Fig. 12 Adaptively generated subdomains for seven hierarchical levels, with TOL = 1E−03 and TSH = 1E−04

Fig. 13 Maximum computed error for each of the seven hierarchical levels, with TOL = 1E−03 and TSH = 1E−04

In Fig. 12, a computational domain adaptively decomposed into seven hierarchical levels, with TOL = 1E−03 and TSH = 1E−04, is presented. The corresponding maximum computed error for each hierarchical level is given in Fig. 13.

Fig. 14 Adaptively generated subdomains for seven hierarchical levels, with TOL = 1E−04 and TSH = 5E−06

Fig. 15 Maximum computed error for each of the seven hierarchical levels, with TOL = 1E−04 and TSH = 5E−06

In Fig. 14, a computational domain adaptively decomposed into seven hierarchical levels, with TOL = 1E−04 and TSH = 5E−06, is shown, and the corresponding maximum computed error for each hierarchical level is given in Fig. 15. In Fig. 16, a computational domain adaptively decomposed into six hierarchical levels, with TOL = 1E−05, is depicted. The threshold parameter TSH has been adaptively selected, with TSH = 1E−07 at levels 1–2 and TSH = 1E−06 at levels 3–6. The corresponding maximum computed error for each hierarchical level is given in Fig. 17.

The numerical results indicate that the given computational domain can be further decomposed in particular areas where the a posteriori error estimate exceeds the prescribed threshold parameter. Moreover, the corresponding computed errors indicate that the order of accuracy is maintained across all hierarchical levels. These adaptive decompositions result in domains that are finer in areas with relatively large error estimates and coarser in the remaining areas.

Fig. 16 Adaptively generated subdomains for six hierarchical levels with TOL = 1E−05 and adaptive threshold; TSH = 1E−07 at levels 1–2, and TSH = 1E−06 at levels 3–6

Fig. 17 Maximum computed error for each of the six hierarchical levels with TOL = 1E−05 and adaptive threshold; TSH = 1E−07 at levels 1–2, and TSH = 1E−06 at levels 3–6

7.3 L-shaped domain with singularity

Here, we present a test case where the Laplace equation is solved over an L-shaped domain in the presence of a singularity near the origin. Generally, there are two effective ways to solve such a problem. The first approach relies on adaptive mesh refinement near the singular point [46]. The second approach involves the inclusion of singular functions directly in the numerical method [14]. In the context of the Fokas method, supplementing the Legendre basis functions with singular functions results in exponential convergence rates, as opposed to algebraic ones when those functions are not included; however, conditioning issues are expected to arise, as noted in [11]. Here, the UTDtN method without singular functions is considered for solving the above problem, and the numerical results are compared to a solution obtained by an adaptive finite element method. Although we expect algebraic convergence, as indicated by Theorem 2.8 in [3], we show that in this numerical example our method is faster than the adaptive FEM. In a future publication, we aim to combine our domain decomposition approach with singular functions in order to improve both the accuracy and the relevant condition number of the linear system.

Fig. 18 L-shaped domain with boundary conditions and a two-level decomposition

In Fig. 18, the L-shaped domain and a two-level decomposition are depicted. In Fig. 19, the computational mesh generated by an adaptive finite element method using 22172 triangles is depicted on the left. On the right, the subdomains generated by the UTDtN technique, on whose boundaries the solution is approximated, are shown. In Fig. 20, the solutions obtained using the adaptive FEM (red colour) and the UTDtN technique (black colour) are plotted against each other. Using \(N_{\ell }=15\) basis functions, we obtain an error of 2.95E−04 near the singularity, compared to the solution computed by the finite element method. The time required for computing the solution with the adaptive FEM was 8.21 s, whereas the UTDtN required 4.87 s. It should be mentioned that UTDtN used 83 DtN iterations at the first level, as well as 145, 148, and 149 iterations for each of the three subdomains at level 2. At the third level, three subdomains were generated, each with 149 DtN iterations. At the fourth level, four subdomains were generated with 140, 140, 141, and 143 iterations, respectively. Finally, at the fifth level, four subdomains were again generated with 154, 154, 156, and 158 DtN iterations, respectively.

Fig. 19 Left: the computational mesh generated by the adaptive FEM. Right: the subdomains generated by the UTDtN technique

Fig. 20 The numerical solution over the L-shaped domain using the adaptive FEM (red) and the UTDtN algorithm (black), shown from multiple angles (color figure online)

7.4 Nonconvex domain with multiple re-entrant corners

In this subsection, we consider the Laplace equation on a relatively complicated nonconvex domain with multiple re-entrant corners. We compare the standard unified transform approach, applied directly to the boundary of the entire domain, with the proposed iterative domain decomposition algorithm (UTDtN). For the numerical results, we have used the exact solution given by Eq. (31). In Fig. 21, the nonconvex domain is depicted along with the corresponding ten-subdomain decomposition. We use the UTDtN algorithm in order to compute the solution on the interfaces \(\varGamma \). In Fig. 22, the numerical convergence for the solution across the interfaces \(\varGamma \) after 1000 iterations, using \(N_{\ell }=7\) basis functions, is given. In Table 5, the condition numbers of the corresponding linear systems are given for the entire nonconvex domain as well as for the decomposed subdomains. It is worth mentioning that, using the UTDtN algorithm, the condition number decreases from 2.84E+18 to a maximum of 4.16E+04. Also, the time to compute all interface values after 1000 iterations was 4.16 s.

Since the optimal choice for the value of \(\theta \) in the iterative DtN procedure is not known, we have performed a parameter study. In Fig. 23, we present the maximum error across the interfaces \(\varGamma \) for several values of the parameter \(\theta \). It appears that the best numerical results in terms of accuracy are obtained in the interval \(\left[ 0.01, 0.105 \right] \).

It should be noted that the problem of decomposing a nonconvex polygon into a minimum number of convex polygons has been solved and “optimal decomposition algorithms” have been provided [9]. The above numerical results indicate that the proposed domain decomposition approach in conjunction with the method of Fokas can be useful for solving BVPs formulated on nonconvex domains.

An alternative approach using virtual sides has been introduced in [11] for solving BVPs on nonconvex domains via the Fokas method. This approach relies on introducing a virtual side between appropriate corners of the domain in order to derive a number of convex polygons. The difference between this technique and ours lies in the solution procedure; in [11], one linear system is assembled, including the matched Cauchy data across the virtual side in the solution vector. If \(n_{vs}\) virtual sides are used, the coefficient matrix is then augmented by \(2n_{vs}\) block columns whose size depends on the order N of the selected basis functions. The authors have reported a two-subdomain numerical experiment obtaining high accuracy in approximately 0.06 s; however, their method needs to be tested on a complicated nonconvex domain with multiple re-entrant corners in order to make a proper comparison to our approach. Certainly, when a large number of subdomains, and hence a large number of virtual sides, is included, the technique in [11] will result in a relatively large coefficient matrix. Using our technique, it is possible to handle any nonconvex domain by solving small linear systems of low condition number, cf. Table 5, while obtaining high-order accuracy as well, cf. Fig. 22. Hence, in cases of large nonconvex domains that need to be decomposed into a large number of subdomains, our method is expected to be faster, since it is inherently parallel and each subdomain can be solved independently.

Fig. 21 Nonconvex domain with multiple re-entrant corners and a ten-subdomain decomposition

Fig. 22 Numerical convergence of the interface values \(\varGamma \) after 1000 iterations

Table 5 Condition number of the linear system for the entire domain versus the decomposed subdomains
Fig. 23 Maximum error across the interfaces \(\varGamma \) for several values of the relaxation parameter \(\theta \)

7.5 Parallel performance of UTDtN

In this subsection, the performance of the parallel implementation of the UTDtN algorithm is presented. We assess the proposed parallel algorithm by solving the Laplace equation on a square domain using the solution given by Eq. (31) as reference. Furthermore, to highlight the possible advantages, it is compared to a parallel implementation of a standard Schur complement approach, based on the finite element method.

In Table 6, the total time (in seconds) to compute both the Dirichlet and the Neumann interface values, for various numbers of subdomains and cores, is given with TOL = 1E−03 and TOL = 1E−08. In addition, in Figs. 24 and 25, the speedups for various numbers of subdomains and cores are presented with TOL = 1E−03 and TOL = 1E−08, respectively. The numerical results depicted in Table 6 and Figs. 24 and 25 were obtained on a machine with an Intel Xeon CPU E5-2420v2, 2.2 GHz with 64 GB RAM.

Additionally, we have performed numerical experiments on the ARIS supercomputer (GRNET). Each compute node consists of \(2\,\times\,\) Ivy Bridge Intel Xeon E5-2680v2 (10 cores each) and 64 GB RAM. In Table 7, the total time (in seconds) to compute both the Dirichlet and the Neumann interface values, for 256 and 1024 subdomains, and several numbers of cores, is given with TOL = 1E−03 and TOL = 1E−08. Moreover, in Figs. 26 and 27, the speedups for 256 and 1024 subdomains, and various numbers of cores are presented with TOL = 1E−03 and TOL = 1E−08, respectively.

In Table 8, a comparison between the parallel UTDtN and a parallel finite element, Schur complement scheme (DDFEM), is presented. The total time to compute all interface values as well as the maximum computed errors on Intel Xeon CPU E5-2420v2, 2.2 GHz, 64 GB RAM, with six cores, is given.

The DDFEM scheme produces subdomains in the same manner as the UTDtN algorithm does. The main difference is that DDFEM also discretizes the interior of the subdomains, using a triangulation. Additionally, the initial mesh can be iteratively refined according to a parameter MR, which denotes the number of mesh refinements. It should be mentioned that the refinement procedure involves the division of each triangle into four new triangles of the same shape. The DDFEM scheme relies on a Schur complement approach; however, the corresponding Schur complement matrix S does not need to be formed explicitly, since the associated linear system is solved using an iterative method where only matrix-by-vector operations are required. In this case, we have considered the preconditioned conjugate gradient (PCG) method [39], using a maximum of 500 iterations and a tolerance of 1E−10. It should be noted that, in order to obtain a fair comparison, only the interface values have been computed. In the "Appendix", the details of the DDFEM algorithm are given.
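
For illustration, the sketch below shows how a matrix-free Schur complement solve of this kind can be set up with SciPy; it is our own schematic of the standard technique, not the DDFEM code of the "Appendix". The block names A_II, A_IG, A_GI, A_GG (interior/interface blocks of the assembled FEM matrix, assumed sparse and symmetric positive definite) are hypothetical, and plain CG is used here instead of the authors' PCG setup.

```python
import numpy as np
import scipy.sparse.linalg as spla

def schur_interface_solve(A_II, A_IG, A_GI, A_GG, f_I, f_G, tol=1e-10, maxiter=500):
    """Solve S u_G = f_G - A_GI A_II^{-1} f_I with S applied matrix-free."""
    solve_II = spla.factorized(A_II.tocsc())     # sparse LU of the interior block
    def apply_S(x):
        # S x = A_GG x - A_GI A_II^{-1} A_IG x, without forming S explicitly
        return A_GG @ x - A_GI @ solve_II(A_IG @ x)
    S = spla.LinearOperator((A_GG.shape[0], A_GG.shape[0]), matvec=apply_S, dtype=float)
    rhs = f_G - A_GI @ solve_II(f_I)             # condensed right-hand side
    # recent SciPy uses rtol=; older releases use tol= instead
    u_G, info = spla.cg(S, rhs, rtol=tol, maxiter=maxiter)
    return u_G, info
```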

Table 6 The total time to compute all Dirichlet and Neumann interface values for various numbers of subdomains and cores
Fig. 24 Speedups for various numbers of subdomains and cores, with TOL = 1E−03

Fig. 25 Speedups for various numbers of subdomains and cores, with TOL = 1E−08

Table 7 The total time to compute all Dirichlet and Neumann interface values for various numbers of subdomains and cores, on ARIS HPC system
Fig. 26 Speedups for various numbers of subdomains and cores, with TOL = 1E−03, on ARIS HPC system

Fig. 27 Speedups for various numbers of subdomains and cores, with TOL = 1E−08, on ARIS HPC system

Table 8 Total time and corresponding computed error for approximating the interface values, using the DDFEM and UTDtN_main algorithms, for several values of the parameters MR and TOL, and for several numbers of subdomains, sdoms

The numerical results indicate that our proposed scheme performs significantly better in terms of both speed and accuracy, especially in the case of a large number of subdomains. The main disadvantage of the finite element method lies in the generation of the computational mesh, a task that is computationally expensive when higher accuracy is needed.

As it has been previously stated, our proposed technique does not require mesh generation; the solution is only computed on the subdomains’ interfaces by solving smaller one-dimensional problems. In the case that higher resolution is required, the computational domain is further hierarchically decomposed, and the solution is obtained at the newly introduced interfaces.

8 Conclusion

A class of novel techniques for the solution of linear elliptic PDEs, based on the unified transform, has been presented. The proposed techniques rely on an iterative Dirichlet-to-Neumann approach, where the Fokas global relations constitute the essential component of the methodology. By reformulating an iterative Dirichlet-to-Neumann algorithm in terms of the approximate global relation, we have designed a domain decomposition-type class of techniques with the following properties: inherent parallelism, high accuracy, and meshless subdomains resulting in reduced-dimension approximations. In addition, we have presented an error estimation technique based on the global relation, which is used first for terminating the associated iterative procedures and second for designing a numerical scheme that adaptively generates meshless subdomains. We have presented various sets of numerical results indicating the applicability of the proposed techniques. Furthermore, we have made brief comparisons to a Schur complement finite element method as well as to an adaptive finite element method, highlighting the possible advantages. Future work will be directed towards the parallelization of the proposed scheme on distributed memory parallel systems, and towards the further study and application of the proposed techniques to a variety of practical problems arising in engineering and the sciences, as well as to ill-posed problems.