Abstract
As the number of processor cores on supercomputers grows ever larger, algorithms with a high degree of parallelism attract more attention. In this work, we propose a two-level space–time domain decomposition method for solving an inverse source problem associated with the time-dependent convection–diffusion equation in three dimensions. We introduce a mixed finite element/finite difference method and one-level and two-level space–time parallel domain decomposition preconditioners for the Karush–Kuhn–Tucker system induced by reformulating the inverse problem as an output least-squares optimization problem in the entire space–time domain. The new full space–time approach eliminates the sequential steps both in the optimization outer loop and in the inner forward and backward time-marching processes, and thus achieves a high degree of parallelism. Numerical experiments validate that this approach is effective and robust for recovering unsteady moving sources. We present strong scalability results obtained on a supercomputer with more than 1000 processors.
1 Introduction
In this paper, we consider an inverse problem associated with the time-dependent convection–diffusion equation defined in \({\varOmega }\subset \mathbf {R}^3\):
where \(f(\mathbf{x}, t)\) is the source profile to be recovered, \(a(\mathbf {x})\) and \(\mathbf {v}(\mathbf {x})\) are the given diffusivity and convective coefficients, and \(\varGamma _1\) and \(\varGamma _2\) are two disjoint parts of the boundary \(\partial {\varOmega }\). Dirichlet and Neumann boundary conditions are imposed on \(\varGamma _1\) and \(\varGamma _2\), respectively. When the observation data \(C(\mathbf{x},t)\) is available at certain locations, several classes of inverse problems associated with the convection–diffusion equation (1) have been investigated, such as the recovery of the diffusivity coefficient, with applications in, for example, laminar wavy film flows [21] and flows in porous media [27], and the recovery of the source, with applications in, for example, convective heat transfer problems [39], indoor airborne pollutant tracking [25], and ground water contamination modeling [29, 32, 36].
The main focus of this work is to study the following inverse problem: given the measurement data \(C^{\varepsilon }(\mathbf {x},t)\) of \(C(\mathbf {x},t)\) at some locations inside \(\varOmega \) for the period \(0<t<T\) (\(\varepsilon \) denotes the noise level), we try to recover the space–time-varying source locations and intensities, i.e., the source function \(f(\mathbf{x}, t)\) in Eq. (1). The inverse source problem has been studied in different settings, for example, the recovery of the location and time-dependent intensity of point sources in [4, 13, 20, 34], of piecewise-constant sources in [3, 37], and of Gaussian concentrated sources in [2, 3]. Among these different approaches, the Tikhonov optimization method is the most popular [4, 13, 19, 34]; it reformulates the original inverse source problem into an output least-squares optimization problem with PDE constraints, introducing appropriate regularizations to ensure the stability of the resulting optimization problem with respect to the change of noise in the observation data [14, 35].
We define the following objective functional with Tikhonov regularization:
where \(C=C(f)\) is the solution to the system (1) corresponding to a given source \(f,\,A(\mathbf{x})\) is the data range indicator function given by \(A(\mathbf{x})=\sum _{i=1}^{s} \delta (\mathbf{x}-\mathbf{x}_i)\), with \(\mathbf{x}_1,\,\mathbf{x}_2,\,\ldots ,\,\mathbf{x}_{s}\) being a set of specified locations where the concentration C is measured, and \(C^{\varepsilon }(\mathbf{x},t)\) represents the measurement of \(C(\mathbf{x},t)\) at a specified location \(\mathbf{x}\) and time t. The term \(N_{\beta }(f)\) in (2) is called the regularization with respect to the source. Since \(f(\mathbf{x}, t)\) depends on both space and time, we propose the following space–time \(H^1\)–\(H^1\) regularization:
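A standard space–time \(H^1\)–\(H^1\) regularization of this type, with \(\beta_1\) weighting the spatial gradient and \(\beta_2\) the time derivative, would read as follows (a plausible reconstruction consistent with the surrounding text; the paper's exact displayed formula may differ in constants or norms):

```latex
N_{\beta}(f) \;=\; \frac{\beta_1}{2}\int_0^T\!\!\int_{\varOmega} |\nabla f|^2 \,d\mathbf{x}\,dt
\;+\; \frac{\beta_2}{2}\int_0^T\!\!\int_{\varOmega} \Big|\frac{\partial f}{\partial t}\Big|^2 \,d\mathbf{x}\,dt .
```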
Here \(\beta _1\) and \(\beta _2\) are two regularization parameters. Other regularizations, such as \(H^1\)–\(L^2\), may be used, but we will show later by numerical experiments that \(H^1\)–\(H^1\) regularization may offer better numerical reconstructions.
Various approaches are available for the minimization of the nonlinear functional J(f) in (2) associated with the system (1) [22, 34, 38]. One of them is the Lagrange multiplier method, which converts the constrained minimization of J(f) into an unconstrained minimization of the corresponding Lagrange functional. This results in the solution of a so-called Karush–Kuhn–Tucker (KKT) system [23], which here involves three coupled time-dependent PDEs, namely the governing equation (1) for the concentration C, its adjoint equation for the Lagrange multiplier, and the equation for the identified source function f; see Sect. 2 for more details. For solving such a KKT system, the traditional reduced space SQP method is a popular and natural choice [2, 13, 34]. The SQP method solves the three coupled PDEs in the KKT system alternately by iteration. The essentially sequential nature of SQP is easy to see: the outer iteration is sequential among the solutions of the three PDEs, the governing equation (1) marches forward in time, and the adjoint equation for the Lagrange multiplier marches backward in time. Parallelization of SQP is possible in the solution of each of the three time-dependent PDEs, which can be carried out by traditional fast algorithms such as domain decomposition and multigrid methods [2]. SQP requires little memory, but it usually takes a large number of iterations to converge. Because of its essentially sequential nature, the reduced space SQP method is less suitable for parallel computers with a large number of processor cores than full space SQP methods. Full space methods were studied for steady-state problems in [9, 11]; for unsteady problems, one needs to eliminate the sequential steps in the outer iteration of SQP and solve the full space–time system as a single coupled system.
Because of the much larger size of the system, the full space approach may not be suitable for parallel computers with a small number of processor cores, but it has fewer sequential steps and thus offers the much higher degree of parallelism required by large-scale supercomputers [12].
Constructing efficient parallel methods for highly nonlinear PDE-constrained optimization problems is a very active research direction. An unsteady PDE-constrained optimization problem was solved in [38] for the boundary control of unsteady incompressible flows by solving a subproblem at each time step; the time-marching process remains sequential and each subproblem is steady-state. Parareal algorithms were studied in [5, 15, 24]; they involve a coarse solver for prediction and a fine solver for correction in time. The parallel implicit time integrator (PITA), space–time multigrid, and multiple shooting methods can be viewed as improved versions of the parareal algorithm [17, 18]. The parareal algorithm combined with a domain decomposition [26] or multigrid method can be powerful. However, most existing parareal-related studies have focused mainly on stability and convergence [16].
In this work, we study the simultaneous reconstruction of the time history and intensity profile of a source function. To this end, we propose a fully implicit, mixed finite element and finite difference discretization scheme for the globally coupled KKT system, and a corresponding space–time overlapping Schwarz preconditioner for solving the large-scale discretized KKT system. The method removes all the sequential inner time steps and achieves full parallelization in both space and time. We shall not compare the proposed method with traditional reduced space SQP methods, since it is likely that, for computers with a small number of processor cores, the traditional approach is still the better choice. The focus of the current study is to formulate the technical details of our new space–time parallel algorithm, which may play a promising role on future exascale computing systems. Furthermore, to resolve the well-known problem that the number of linear iterations of one-level methods increases with the number of processors [33], we develop a two-level space–time hybrid Schwarz preconditioner, which offers better performance in terms of the number of iterations and the total compute time.
The rest of the paper is arranged as follows. In Sect. 2, we describe the mathematical formulation of the inverse problem and the derivation of the KKT system. We propose, in Sect. 3, the main algorithm of the paper, and discuss several technical issues involved in the fully implicit discretization scheme and the one- and two-level overlapping Schwarz methods for solving the KKT system. Numerical experiments for the recovery of 3D sources are given in Sect. 4, and some concluding remarks are provided in Sect. 5.
2 Strong Formulation of KKT System
We now derive the KKT system of the minimization of J(f) in (2) combined with the Eq. (1). To do so, we formally write the first equation in (1) as an operator equation \(L(C, f)=0\). By introducing a Lagrange multiplier or adjoint variable \(G\in H^1(0,T; H^1({\varOmega }))\), the Lagrange functional [3, 22]
transforms the PDE-constrained minimization of J(f) in (2) into an unconstrained saddle-point optimization problem, where (G, L(C, f)) denotes the inner product of G and L(C, f). Two approaches are available for the resulting saddle-point optimization of \({\mathscr {J}}\): the optimize–then–discretize approach and the discretize–then–optimize approach. The first derives a continuous optimality system and then applies a discretization scheme, such as a finite element method, to obtain a discrete system ready for computation. The second discretizes the Lagrange functional \({\mathscr {J}}\) directly, so that the objective functional becomes a finite-dimensional quadratic polynomial, and the solution algorithm is based on this polynomial system. The two approaches perform the approximation and discretization at different stages; both have been applied successfully [28]. We use the optimize–then–discretize approach in this work.
The first-order optimality conditions of \({\mathscr {J}}\) in (4), i.e., the KKT system, are obtained by taking its variations with respect to \(G,\,C\) and f:
for all \(v, w\in L^2(0,T; H^1_{\varGamma _1}({\varOmega }))\) with zero traces on \(\varGamma _1\) and \(g\in H^1(0,T; H^1({\varOmega }))\). Then by using integration by parts, we may derive the following strong form of the KKT system:
To derive the boundary, initial and terminal conditions for each variable, we use the fact that (5) holds for arbitrary test functions v, w and g. The state equation, i.e., the first equation in (6), keeps the same conditions as in (1). For the adjoint equation, i.e., the second equation in (5) or (6), integration by parts gives, for any test function \(w\in L^2(0,T; H^1_{\varGamma _1}({\varOmega }))\) with \(w(\cdot , 0)=0\) and \(a(\mathbf{x}){\partial w}/{\partial \mathbf {n}}=0\) on \(\varGamma _2\):
By the arbitrariness of w, the boundary and terminal conditions for G are derived:
Similarly for the third equation of (5) or (6), we can deduce
Using the arbitrariness of g, we derive the boundary, initial and terminal conditions for f:
3 A Fully Implicit and Fully Coupled Method
In this section, we first introduce a mixed finite element and finite difference method for the discretization of the continuous KKT system derived in the previous section, then we briefly mention the algebraic structure of the discrete system of equations. In the second part of the section, we introduce the one- and two-level space–time Schwarz preconditioners that are the most important components for the success of the overall algorithm.
3.1 Fully Implicit Space–Time Discretization
In this subsection, we introduce a fully implicit finite element/finite difference scheme to discretize (6). To discretize the state and adjoint equations, i.e., the first two equations in (6), we use a second-order Crank–Nicolson finite difference scheme in time and a piecewise linear continuous finite element method in space. Consider a regular triangulation \({\mathscr {T}}^h\) of domain \(\varOmega \), and a time partition \(P^{\tau }\) of the interval [0, T]: \(0=t^0<t^1<\cdots < t^M=T,\) with \(t^n=n \tau , \tau =T/M\). Let \(V^h\) be the piecewise linear continuous finite element space on \({\mathscr {T}}^h\), and \(\mathring{V}^h\) be the subspace of \(V^h\) with zero trace on \(\varGamma _1\). We introduce the difference quotient and the averaging of a function \(\psi (\mathbf{x}, t)\) as
with \(\psi ^n(\mathbf{x}):=\psi (\mathbf{x}, t^n)\). Let \(\pi _h\) be the finite element interpolation associated with the space \(V^h\); then we obtain the discretizations of the state and adjoint equations by finding the sequences of approximations \(C_h^n, G_h^n\in V^h\) for \(n=0, 1,\,\ldots ,\,M\) such that \(C_h^0=\pi _hC_0,\,G_h^M=0\), and \(C_h^n(\mathbf{x})=\pi _h p(\mathbf{x}, t^n), G_h^n(\mathbf{x})=0\) for \(\mathbf{x}\in \varGamma _1\), satisfying
We approximate the source function f differently from the state and adjoint variables in (8). The source function satisfies an elliptic equation [see the third equation in (6)] in the space–time domain \({\varOmega }\times (0,T)\), so we apply \({\mathscr {T}}^h \times P^{\tau }\) to generate a partition of the space–time domain \({\varOmega }\times (0,T)\), and then apply the piecewise linear finite element method in both space (three dimensions) and time (one dimension), with the corresponding space denoted by \(W_h^{\tau }\), to approximate f. The equation for \(f\in W_h^{\tau }\) is then discretized as follows: find the sequence \(f_h^n\) for \(n=0, 1,\,\ldots ,\,M\) such that
The coupled system (8)–(9) is the so-called fully discretized KKT system. In the Appendix, we provide some details of the discrete structure of this KKT system.
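As a minimal illustration of this type of discretization (our own 1D finite difference sketch with arbitrary parameter choices, not the paper's 3D finite element scheme), the Crank–Nicolson step for a model diffusion equation combines the difference quotient in time with the average of the spatial operator at two consecutive time levels:

```python
import numpy as np

# Model 1D problem: dC/dt - a * d2C/dx2 = f on (-2, 2), zero Dirichlet BC.
# Crank-Nicolson: (C^{n+1} - C^n)/tau = (a/2) * L(C^{n+1} + C^n) + f^{n+1/2}
N, a, tau = 50, 1.0, 1.0 / 40            # interior points, diffusivity, time step
h = 4.0 / (N + 1)                        # mesh size on (-2, 2)
x = np.linspace(-2 + h, 2 - h, N)

# Discrete Laplacian (tridiagonal, approximates d2/dx2)
L = (np.diag(-2.0 * np.ones(N)) + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1)) / h**2

I = np.eye(N)
A_impl = I - 0.5 * tau * a * L           # implicit (left-hand) operator
A_expl = I + 0.5 * tau * a * L           # explicit (right-hand) operator

C = np.exp(-4 * x**2)                    # initial concentration
f_half = np.zeros(N)                     # source averaged at t^{n+1/2}
for n in range(40):                      # march from t=0 to t=1
    C = np.linalg.solve(A_impl, A_expl @ C + tau * f_half)
```

With a zero source and homogeneous Dirichlet boundary, the concentration simply diffuses and decays; the scheme is unconditionally stable and second-order in time.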
3.2 One- and Two-Level Space–Time Schwarz Preconditioner
The unknowns of the KKT system (8)–(9) are often ordered physical variable by physical variable, namely in the form
Such ordering is used extensively in reduced space SQP methods [13]. In our all-at-once method, the unknowns C, G and f are ordered mesh point by mesh point and time step by time step, and all unknowns associated with a point stay together as a block. At each mesh point \(\mathbf{x}_{j},\,j=1,\,\ldots ,\,N\), and time step \(t^n,\,n=0,\,\ldots ,\,M\), the unknowns are arranged in the order of \(C_{j}^n, G_{j}^n,f_{j}^n\). Such ordering avoids zero values on the main diagonal of the matrix and has better cache performance for point-block LU (or ILU) factorization based subdomain solvers. More precisely, we define the solution vector as
then the linear system (8)–(9) is rewritten as
where \(F_h\) is a sparse block matrix of size \((M+1)(3N)\) by \((M+1)(3N)\) with the following block structure:
where the block matrices \(S_{ij}\) for \(0\le i,j \le M\) are of size \(3N\times 3N\); most of them are zero except those in the tridiagonal stripes \(\{S_{i,i-1}\}, \{S_{i,i}\}, \{S_{i,i+1}\}\). If we denote the \(N\times N\) submatrices for C, G and f in each block \(S_{ij}\) by \(S_{ij}^C, S_{ij}^G, S_{ij}^f\), respectively, their sparsity patterns differ: \(S_{ij}^f\) is the densest and \(S_{ij}^G\) is the sparsest, owing to the discretization scheme we have used. The system (10) is large and ill-conditioned, and is therefore difficult to solve, since the space–time coupled system is denser than the decoupled one, especially in three dimensions.
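To make the point-block ordering concrete, a small helper (our illustration, not taken from the paper's code) maps a time step n, a mesh point j and a component (C, G or f) to its position in the solution vector, with all three unknowns at a space–time point stored contiguously:

```python
def pbidx(n, j, comp, N):
    """Index of unknown `comp` (0: C, 1: G, 2: f) at mesh point j and
    time step n, for a spatial mesh with N points: unknowns are ordered
    time step by time step, mesh point by mesh point, in 3-blocks."""
    return (n * N + j) * 3 + comp

# The three unknowns C, G, f at point j=2, time step n=1 on a mesh with
# N=4 points occupy three consecutive positions in the vector.
base = pbidx(1, 2, 0, N=4)
assert [pbidx(1, 2, c, N=4) for c in range(3)] == [base, base + 1, base + 2]
```

Keeping the 3-block together is what avoids zeros on the main diagonal and improves cache behavior in the point-block (I)LU subdomain solves.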
We shall design the preconditioner by extending the classical spatial Schwarz preconditioner to include both spatial and temporal variables. Such an approach eliminates all sequential steps and the unknowns at all time steps are solved simultaneously. We use a right-preconditioned Krylov subspace method to solve (10),
where \(M^{-1}\) is a space–time Schwarz preconditioner and \(U=M^{-1}U'\).
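Right preconditioning can be sketched with SciPy as follows (illustrative only; the paper's implementation is built on PETSc, and the tridiagonal matrix and diagonal preconditioner here are stand-ins for the discrete KKT matrix \(F_h\) and the space–time Schwarz preconditioner): GMRES is applied to \(F M^{-1}\), and the solution of the preconditioned system is mapped back by \(U = M^{-1}U'\).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
F = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

Minv = sp.diags(1.0 / F.diagonal())       # stand-in for M^{-1} (Jacobi)
# Operator for the right-preconditioned system F M^{-1}
FMinv = spla.LinearOperator((n, n), matvec=lambda v: F @ (Minv @ v))

Uprime, info = spla.gmres(FMinv, b)       # solve (F M^{-1}) U' = b
U = Minv @ Uprime                         # recover U = M^{-1} U'
```

Right preconditioning has the practical advantage that the GMRES residual is the true residual of the original system, which makes the stopping criterion independent of the preconditioner.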
Denoting the space–time domain by \(\varTheta ={\varOmega }\times (0,T)\), an overlapping decomposition of \(\varTheta \) is defined as follows: we divide \({\varOmega }\) into \(N_s\) subdomains, \({\varOmega }_1, {\varOmega }_2,\,\ldots ,\,{\varOmega }_{N_s}\), and partition the time interval [0, T] into \(N_t\) subintervals: \(0=T_0<T_1<T_2<\cdots <T_{N_t}=T\). By combining the space subdomains and time subintervals, a decomposition of \(\varTheta \) is \(\varTheta = \cup _{i=1}^{N_s} (\cup _{j=1}^{N_t} \varTheta _{ij})\), where \(\varTheta _{ij} = {\varOmega }_i \times (T_{j-1}, T_{j})\). For convenience, the number of subdomains, i.e., \(N_sN_t\), is equal to the number of processors. These subdomains \(\varTheta _{ij}\) are then extended to \(\varTheta '_{ij}\) so that they overlap each other: the boundary of each subdomain is extended by an integer number of mesh cells in each dimension, and the cells outside of \(\varTheta \) are trimmed. The corresponding overlapping decomposition of \(\varTheta \) is \(\varTheta = \cup _{i=1}^{N_s} (\cup _{j=1}^{N_t} \varTheta '_{ij})\); see the left part of Fig. 1. The matrix on each subdomain \(\varTheta _{ij}'={\varOmega }_{i}'\times (T'_{j-1}, T'_{j}),\,i=1,2,\,\ldots ,\,N_s,\,j=1,2,\,\ldots ,\,N_t\), is the discretized version of the following system of PDEs
with the following boundary conditions
along with the initial and terminal time boundary conditions
One may notice from (13) that homogeneous Dirichlet boundary conditions are applied in each time interval \((T'_{j-1}, T'_{j})\), so the solution of the subdomain problem is not really physical. This is one of the major differences between the space–time Schwarz method and the parareal algorithm [24]: the time boundary condition for each subproblem of the parareal algorithm is obtained by an interpolation of the coarse solution, and if the coarse mesh is fine enough, the solution of the subdomain problem is physical. Therefore, the parareal algorithm can be used as a solver, whereas our space–time Schwarz method can only be used as a preconditioner. Surprisingly, as we shall see from the numerical experiments in Sect. 4, the Schwarz preconditioner performs excellently even though the time boundary conditions violate the physics.
We solve the subdomain problems using the same method as for the global problem (10); no time marching is performed in the new algorithm, and all unknowns affiliated with a subdomain are solved simultaneously. Let \(M_{ij}\) be the matrix generated in the same way as the global matrix \(F_h\) in (10) but for the subproblem (11)–(13), and \(\tilde{M}_{ij}^{-1}\) be an exact or approximate inverse of \(M_{ij}\). Denoting by \(R^{\delta }_{ij}\) the restriction matrix from \(\varTheta \) to the subdomain \(\varTheta '_{ij}\), with overlap size \(\delta \), we propose the following one-level space–time restricted Schwarz preconditioner for the global matrix \(F_h\):
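In matrix terms, a restricted additive Schwarz (RAS) preconditioner is typically written \(M^{-1} = \sum _{i,j} (R^{0}_{ij})^{T} \tilde{M}_{ij}^{-1} R^{\delta }_{ij}\): overlapping restriction, non-overlapping prolongation. A 1D sketch with exact subdomain solves and a uniform decomposition (our simplification for illustration, not the paper's parallel implementation):

```python
import numpy as np

def ras_apply(A, r, nsub, delta):
    """One-level restricted additive Schwarz (RAS): solve the residual
    equation on overlapping index sets, but add back only the
    non-overlapping (restricted) part of each local solution."""
    n = A.shape[0]
    size = n // nsub
    z = np.zeros(n)
    for s in range(nsub):
        lo, hi = s * size, ((s + 1) * size if s < nsub - 1 else n)
        olo, ohi = max(lo - delta, 0), min(hi + delta, n)   # overlapped block
        zloc = np.linalg.solve(A[olo:ohi, olo:ohi], r[olo:ohi])
        z[lo:hi] = zloc[lo - olo: hi - olo]                 # restricted add-back
    return z

# RAS-preconditioned Richardson iteration on a 1D Laplacian test matrix
n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
u = np.zeros(n)
for _ in range(400):
    u = u + ras_apply(A, b - A @ u, nsub=4, delta=2)
```

In the paper the same idea is applied in four dimensions (three space plus time), the subdomain solves are inexact (ILU), and the outer iteration is GMRES rather than Richardson.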
As is well known, one-level domain decomposition methods are not scalable as the number of subdomains or processors increases [33]; multilevel methods are needed in order to achieve scalability [1, 33]. We now propose a two-level space–time additive Schwarz preconditioner. To this end, we partition \({\varOmega }\) with a fine mesh \({\varOmega }^h\) and a coarse mesh \({\varOmega }^c\). For the time interval, we have a fine partition \(P^{\tau }\) and a coarse partition \(P^{\tau _c}\) with \(\tau < \tau _c\). We adopt nested meshes, i.e., the nodal points of the coarse mesh \({\varOmega }^c\times P^{\tau _c}\) are a subset of the nodal points of the fine mesh \({\varOmega }^h\times P^{\tau }\). In practice, the size of the coarse mesh should be adjusted properly to obtain the best performance. On the fine level, we simply apply the previously defined one-level space–time additive Schwarz preconditioner; to solve the coarse problem efficiently, a parallel coarse preconditioner is also necessary. Here we use the overlapping space–time additive Schwarz preconditioner and, for simplicity, divide \({\varOmega }^c\times P^{\tau _c}\) into the same number of subdomains as on the fine level, using the non-overlapping decomposition \(\varTheta = \cup _{i=1}^{N_s} (\cup _{j=1}^{N_t} \varTheta _{ij})\). When the subdomains are extended to overlapping ones, the overlap size is not necessarily the same as that on the fine mesh; see the right part of Fig. 1 for a coarse version of the space–time decomposition. We denote and define the preconditioner on the coarse level by
where \(\delta _c\) is the overlapping size on the coarse mesh. Here the matrix \(\tilde{M}^{-1}_{ij,c}\) is an approximate inverse of \(M_{ij,c}\) which is obtained by a discretization of (11)–(13) on the coarse mesh on \(\varTheta '_{ij}\).
To combine the coarse preconditioner with the fine mesh preconditioner, we need a restriction operator \(I^c_h\) from the fine to the coarse mesh and an interpolation operator \(I^h_c\) from the coarse to the fine mesh. For the nested structured mesh and linear finite elements used here, \(I^h_c\) is easily obtained by linear interpolation on the coarse mesh, and \(I^c_h = (I^h_c)^T\). We note that when the coarse and fine meshes are nested, instead of \(I^c_h = (I^h_c)^T\) we may take \(I^c_h\) to be a simple restriction, e.g., the injection that assigns the values on the coarse mesh directly from the corresponding fine mesh values. In general, the coarse and fine preconditioners can be combined additively or multiplicatively. According to our experiments, the following multiplicative version works well:
where \(F_c^{-1}\) corresponds to the GMRES solver right-preconditioned by \(M_{c}^{-1}\) on the coarse level, and \(F_h\) is the discrete KKT system (10) on the fine level.
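The fine–coarse combination can be sketched in 1D as follows (our simplified stand-in: an exact Galerkin coarse solve in place of the inner preconditioned GMRES, and a damped Jacobi sweep in place of the one-level space–time Schwarz preconditioner; the paper instead rediscretizes (11)–(13) on the coarse mesh): a coarse correction is applied first, then the fine-level preconditioner acts on the updated residual.

```python
import numpy as np

def interpolation(nc):
    """Linear interpolation I_c^h from a nested coarse mesh (nc interior
    points) to the fine mesh with nf = 2*nc + 1 interior points."""
    nf = 2 * nc + 1
    P = np.zeros((nf, nc))
    for i in range(nc):
        P[2 * i + 1, i] = 1.0        # coarse node coincides with a fine node
        P[2 * i, i] += 0.5           # fine midpoint below
        P[2 * i + 2, i] += 0.5       # fine midpoint above
    return P

def two_level_apply(Fh, Fc, P, r, Mf_inv):
    """Multiplicative two-level preconditioner: coarse correction first,
    then the fine-level preconditioner applied to the updated residual."""
    z = P @ np.linalg.solve(Fc, P.T @ r)   # coarse solve, I_h^c = (I_c^h)^T
    z = z + Mf_inv @ (r - Fh @ z)          # fine-level correction
    return z

nc = 15
nf = 2 * nc + 1
h = 1.0 / (nf + 1)
Fh = (2.0 * np.eye(nf) - np.eye(nf, k=1) - np.eye(nf, k=-1)) / h**2
P = interpolation(nc)
Fc = P.T @ Fh @ P                               # Galerkin coarse operator
Mf_inv = (2.0 / 3.0) * np.diag(1.0 / np.diag(Fh))  # damped-Jacobi stand-in

b = np.ones(nf)
u = np.zeros(nf)
for _ in range(100):                            # preconditioned Richardson
    u = u + two_level_apply(Fh, Fc, P, b - Fh @ u, Mf_inv)
```

The coarse correction removes the smooth error components that the one-level method handles poorly, which is what restores scalability as the number of subdomains grows.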
4 Numerical Experiments
In this section we present some numerical experiments to study the parallel performance and robustness of the newly developed algorithms. With the one-level preconditioner, we use a restarted GMRES method (restart 50) to solve the preconditioned system; with the two-level preconditioner, we use the restarted flexible GMRES (fGMRES) method [30] (restart 30), since the overall preconditioner changes from iteration to iteration because of the iterative coarse solver. Although fGMRES needs more memory than GMRES, we have observed that its number of iterations can be significantly reduced. The relative convergence tolerance of both GMRES and fGMRES is \(10^{-6}\), and the initial guesses are zero. The overlap between two neighboring subdomains, denoted by iovlp, is set to 1 unless otherwise specified. The subsystem on each subdomain is solved by an incomplete LU factorization ILU(k), with fill-in level k, and \(k=0\) if not specified. The algorithms are implemented using the Portable, Extensible Toolkit for Scientific Computation (PETSc) [6] and run on a Dawning TC3600 blade server system, with a peak performance of 1.271 PFlops, at the National Supercomputing Center in Shenzhen, China.
In our computations, the settings for the model system (1) are taken as follows. The computational domain, the terminal time and the initial condition are taken to be \({\varOmega }=(-2,2)^3,\,T=1\) and \(C(\cdot ,0)=0\) respectively. Let \(L=S=H=2\), then the homogeneous Dirichlet and Neumann conditions in (1) are respectively imposed on \(\varGamma _1=\{\mathbf {x}=(x_1,x_2,x_3); ~|x_1|=L~\text{ or } ~|x_2|=S\}\) and \(\varGamma _2=\{\mathbf {x}=(x_1,x_2,x_3); ~|x_3|=H\}\). Furthermore, the diffusivity and convective coefficients are set to be \(a(\mathbf {x})=1.0\) and \(\mathbf {v}(\mathbf {x})=(1.0,1.0,1.0)^T\).
In order to generate the observation data, we solve the forward convection–diffusion equation (1) on a very fine \(265\times 265\times 265\) mesh with a small time step size 1/96, and the resulting approximate solution \(C(\mathbf{x},t)\) is used as the noise-free observation data. The measurement data are chosen on a set of nested meshes (the concrete choice is given later for each numerical example), which are not necessarily part of the fine mesh used to compute the noise-free data; hence linear interpolation is needed to obtain the concentration at each selected measurement point. A random noise is then added in the following form at the locations where the measurements are taken:
Here r is a random function with the standard Gaussian distribution, and \(\varepsilon \) is the noise level; we take \(\varepsilon = 1\,\%\) in our numerical experiments unless specified otherwise. As for most inverse problems, the regularization parameters [\(\beta _1\) and \(\beta _2\) in (3)] are important for effective numerical reconstruction. In this work we do not discuss the selection of these regularization parameters, but choose them heuristically.
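The noise model can be sketched as follows (the displayed formula is assumed here to be the common multiplicative form \(C^{\varepsilon }=C\,(1+\varepsilon r)\), consistent with the description; the paper's exact formula may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(C_obs, eps):
    """Pointwise multiplicative Gaussian noise at the measurement points:
    C_eps = C * (1 + eps * r) with r ~ N(0, 1).  (Assumed form.)"""
    r = rng.standard_normal(C_obs.shape)
    return C_obs * (1.0 + eps * r)

C_obs = np.ones((14, 14, 14))        # noise-free data on a 14x14x14 grid
C_eps = add_noise(C_obs, eps=0.01)   # 1% noise, as in most experiments
```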
The numerical tests are designed to investigate the reconstruction of different types of three-dimensional sources by the proposed one- and two-level space–time Schwarz methods, as well as the robustness of the algorithm with respect to different noise levels, different regularizations and different amounts of measurement data. In addition, the parallel efficiency of the proposed algorithms is studied.
4.1 Reconstruction of 3D Sources
We devote this subsection to testing the numerical reconstruction of three representative 3D sources by the proposed one-level space–time method, with \(np=256\) processors. Each of the three examples is constructed with its own special difficulty.
Example 1
(Two Gaussian sources). This example tests two moving Gaussian sources in \({\varOmega }\), with the source f taking the form:
with \(a=2.0\), and the two moving centers of the sources given by
The moving traces of the sources are shown in Fig. 2.
In the first experiment, we use a \(40\times 40\times 40\) mesh and time step size 1/39 for the inversion process. The measurements are taken on a \(14\times 14\times 14\) mesh, uniformly located in \({\varOmega }\), with mesh size 1/13. The regularization parameters are set to \(\beta _1=3.6\times 10^{-6}\) and \(\beta _2=3.6\times 10^{-3}\). In Fig. 3, the numerically reconstructed sources are compared with the exact ones at three moments \(t_1=10/39, t_2=20/39, t_3=30/39\). We can see that the source locations and intensities are quite close to the true values at the three chosen moments. We then increase the noise level to \(\varepsilon = 5\,\%\) and \(\varepsilon = 10\,\%\), still with the same set of parameters. The reconstruction results are shown in Fig. 4. We observe that the reconstructed profiles deteriorate and become oscillatory as the noise level increases. This is expected, since the regularized solutions provided by the minimization of the functional J(f) in (2) become less accurate, as do their numerical approximations obtained from the discretized KKT system (8)–(9). We have tested four different sets of regularization parameters for the \(H^1\)–\(H^1\) regularization in (3), and present the \(L^2\)-norm errors between the reconstructed source f and the exact source function \(f^*\) at the aforementioned three moments \(t_1,\,t_2\) and \(t_3\). The \(L^2\)-norm error at time \(t_j\) is defined here by \(e_j= \Vert (f-f^*)(t_j)\Vert _{L^2(\varOmega )}\) for \(j=1,2,3\), and is shown in Table 1 for each set of regularization parameters.
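On a uniform mesh, the discrete \(L^2(\varOmega )\) error at a fixed time can be computed by lumped (one point per node) quadrature, as in this sketch (our illustration; the paper likely uses finite element quadrature):

```python
import numpy as np

def l2_error(f_num, f_exact, h):
    """Approximate L2(Omega) error on a uniform 3D mesh with spacing h,
    using lumped quadrature: each node carries cell volume h^3."""
    return np.sqrt(h**3 * np.sum((f_num - f_exact) ** 2))

# Toy check: a constant offset of 0.1 over Omega = (-2, 2)^3 should give
# an error close to 0.1 * |Omega|^{1/2} = 0.1 * 8 = 0.8.
n = 41                               # 41 nodes per direction, h = 4/40
h = 4.0 / (n - 1)
f_num = np.full((n, n, n), 0.1)
f_exact = np.zeros((n, n, n))
err = l2_error(f_num, f_exact, h)
```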
Example 2
(Four constant sources). Appropriate choices of regularizations are important for the inversion process. In the previous example we have used an \(H^1\)–\(H^1\) Tikhonov regularization in both space and time. In this example, we intend to compare the \(H^1\)–\(H^1\) regularization with the following \(H^1\)–\(L^2\) regularization
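An \(H^1\)-in-space, \(L^2\)-in-time regularization of this kind, with the time-derivative term of \(N_{\beta }(f)\) replaced by an \(L^2\) term, would read as follows (a plausible reconstruction; the paper's exact displayed formula may differ):

```latex
\tilde{N}_{\beta}(f) \;=\; \frac{\beta_1}{2}\int_0^T\!\!\int_{\varOmega} |\nabla f|^2 \,d\mathbf{x}\,dt
\;+\; \frac{\beta_2}{2}\int_0^T\!\!\int_{\varOmega} |f|^2 \,d\mathbf{x}\,dt .
```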
For the comparison, we consider the case in which four constant sources move along the diagonals of the cube toward the far corners. The four source distributions are specified by
for \(i=1, 2, 3, 4\), where the constants \(a_i\) are given by \(a_1 = a_4 = 2.0,\,a_2 = a_3 = 1.0\), and their traces are described respectively by
The same mesh and measurements as in Example 1 are used, and the regularization parameters are set to \(\beta _1= 10^{-5}, \beta _2= 10^{-3}\) in \(N_{\beta }(f)\) and \(\beta _1= 10^{-5}, \beta _2= 10^{-8}\) in \(\tilde{N}_{\beta }(f)\), respectively. The reconstruction results are compared with the true solution at three moments \(t_1=10/39,\,t_2=20/39,\,t_3=30/39\), and two slices at \(x=0.95\) and \(x=-0.95\). It is observed from Fig. 5 that the resolution of the source profile is much better with the \(H^1\)–\(H^1\) regularization \(N_{\beta }(f)\) than with the \(H^1\)–\(L^2\) regularization \(\tilde{N}_{\beta }(f)\); the latter yields a reconstruction that is much less stable and much more oscillatory. Furthermore, we demonstrate in Table 2 the \(L^2\)-norm errors between the reconstructed source functions \(f_i\) and the exact sources \(f_i^*\) at the three specified moments \(t_1,\,t_2,\,t_3\) for both the \(H^1\)–\(H^1\) and \(H^1\)–\(L^2\) regularizations. The \(L^2\)-norm error at time \(t_j\) is defined here by \(e_j= \Vert \sum _{i=1}^4 (f_i-f_i^*)(t_j)\Vert _{L^2(\varOmega )}\) for \(j=1,2,3\). We can see that the errors with the \(H^1\)–\(H^1\) regularization are slightly smaller than those with the \(H^1\)–\(L^2\) regularization.
Example 3
(Eight moving sources). This last example presents a very challenging case in which eight Gaussian sources are initially located at the corners of the cubic physical domain and then move inside the cube along their own traces given below. The Gaussian sources are described by
where the coefficients \(a_i\) and the source traces are represented by
and
We use the \(64\times 64\times 64\) mesh and the time step size 1/47, with the regularization parameters \(\beta _1=3.6\times 10^{-5}\) and \(\beta _2=3.6\times 10^{-1}\). We compare the results recovered from two sets of measurements, collected on two meshes, \(22\times 22\times 22\) and \(10\times 10\times 10\), with mesh sizes 1/21 and 1/9, respectively. The solution is shown in Fig. 6 at three moments \(t= 0.0, 10/47, 1.0\). Clearly better reconstructions are observed with the larger set of measurements collected on the finer \(22\times 22\times 22\) mesh, although the coarser \(10\times 10\times 10\) mesh is good enough for locating the sources, only with recovered source intensities smaller than the true values.
4.2 Performance in Parallel Efficiency
In the previous subsection, we have shown with three representative examples that the proposed algorithm can successfully recover the intensities and distributions of unsteady sources and is robust with respect to the noise in the data, the choice of Tikhonov regularization and the number of measurements. These simulations were all computed with the proposed one-level space–time method with \(np=256\) processors. In this subsection, we focus on the proposed two-level space–time method and study its parallel efficiency with respect to the number of ILU fill-in levels, namely k in ILU(k), the overlap size iovlp on the fine level, and the mesh size on the coarse level. We also compare the number of iterations and the total compute time of the one- and two-level methods with increasing numbers of degrees of freedom (DOFs) and processors.
First we test how the number of fGMRES iterations and the total compute time of the two-level method change with the ILU fill-in level. We use the coarse mesh \(21\times 21\times 21\) with the time step 1/20, the fine mesh \(41\times 41\times 41\) with the time step 1/40 for Examples 1, 2, and 3, and the overlap size \(iovlp=1\); the total number of DOFs on the fine mesh is thus 16 times that on the coarse mesh. Table 3 shows the comparison with \(np= 256\) processors, where columns 2–3, 4–5 and 6–7 present the results for Examples 1, 2 and 3 respectively. We observe that as the fill-in level increases, the number of fGMRES iterations decreases, but the total compute time increases. When the fill-in level reaches 3, the compute time increases significantly while the number of iterations is reduced by only about a factor of three. This suggests a suitable fill-in level of \({ ilulevel} =0\) or 1.
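The iterations-versus-cost tradeoff of the fill-in level can be reproduced in miniature. SciPy exposes a threshold-based incomplete LU rather than the level-based ILU(k) used in the paper, and the 2D Laplacian below is only a stand-in for the space–time KKT blocks, but the qualitative behavior is the same: more fill-in gives a stronger preconditioner (fewer Krylov iterations) at a higher factorization and per-iteration cost.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A 2D Laplacian serves as a stand-in test matrix; the paper's space-time
# KKT system is not reproduced here.
n = 32
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))).tocsc()
b = np.ones(A.shape[0])

def gmres_iters(fill_factor):
    # Threshold ILU: larger fill_factor keeps more fill-in in L and U.
    ilu = spla.spilu(A, fill_factor=fill_factor, drop_tol=1e-4)
    M = spla.LinearOperator(A.shape, ilu.solve)
    count = [0]
    def cb(pr_norm):          # called once per preconditioned iteration
        count[0] += 1
    x, info = spla.gmres(A, b, M=M, callback=cb, callback_type='pr_norm')
    assert info == 0          # converged
    return count[0]
```

Comparing `gmres_iters(1.0)` against `gmres_iters(20.0)` shows the iteration count dropping as the factorization is allowed more fill-in, mirroring the trend in Table 3.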
Next we examine the impact of the overlap size. We use the same fine and coarse meshes for all examples, and ILU(0) as the solver for each subdomain problem on both the coarse and fine meshes. The overlap size on the coarse mesh is set to 1. We test different overlap sizes on the fine level, with the results given in Table 4. When the overlap size increases from 1 to 2 and then to 4, the number of fGMRES iterations decreases slowly while the total compute time increases. We therefore mostly use \(iovlp =1\) in the subsequent computations.
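The role of the overlap parameter can be seen in a minimal one-level additive Schwarz sketch. This is a dense 1D illustration with exact subdomain solves (the paper uses ILU subdomain solves on overlapping space–time subdomains); `overlapping_partition` and `additive_schwarz_apply` are illustrative helpers, not the paper's implementation.

```python
import numpy as np

def overlapping_partition(n, nsub, iovlp):
    # Split indices 0..n-1 into nsub contiguous blocks, each extended by
    # iovlp points on each side (the overlap).
    size = n // nsub
    return [np.arange(max(0, s * size - iovlp), min(n, (s + 1) * size + iovlp))
            for s in range(nsub)]

def additive_schwarz_apply(r, subdomains, A):
    # One application z = M^{-1} r of the one-level additive Schwarz
    # preconditioner: solve the restriction of A on each overlapping
    # subdomain and sum the extended local corrections.
    z = np.zeros_like(r)
    for idx in subdomains:
        z[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return z

# 1D Laplacian stand-in; the paper applies the same construction, with ILU
# subdomain solves, to the coupled space-time KKT system.
n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
z = additive_schwarz_apply(np.ones(n), overlapping_partition(n, 4, 1), A)
```

Increasing `iovlp` enlarges each local solve, which is exactly the tradeoff in Table 4: somewhat fewer outer iterations, but more work per preconditioner application.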
It is well known that the size of the coarse mesh is an important factor for a two-level method, so we next investigate the performance of our two-level method with different coarse meshes. If the mesh is too coarse, both the number of outer iterations and the total compute time increase; on the other hand, if the mesh is not coarse enough, too much time is spent on the coarse solver: the number of outer iterations may decrease significantly, but the compute time may still increase. For this experiment, we fix the fine mesh \(43\times 43\times 43\) with the time step 1/42, and set the coarse mesh size in each direction to 1/2 or 1/3 of the fine mesh size. Combining these options yields four sets of coarse meshes with corresponding time steps. Denoting by n the ratio of the DOFs on the fine mesh to those on the coarse mesh, we have \(n=8, 16, 24\) and 27 for these coarse meshes respectively. \(np=256\) processors are used, the fill-in level of the subdomain ILU solver is \(ilulevel =0\), and the overlap size on both the coarse and fine levels is \(iovlp =1\). The results presented in Table 5 indicate that the number of fGMRES iterations generally increases as the coarse mesh becomes coarser; the compute time, however, does not follow this trend: it decreases when the coarse mesh is fixed at \(22\times 22\times 22\) and the time step is reduced from 1/42 to 1/21, and then grows slowly for the next two coarse mesh settings. As a result, the proper coarse mesh for this example is \(22\times 22\times 22\) with the time step 1/21, i.e. a coarsening factor of 16 is optimal for this test case.
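The quoted ratios n combine spatial and temporal coarsening: counting cells per direction, n is the cube of the spatial coarsening factor times the temporal one. The quick check below reproduces \(n=8, 16, 24, 27\); only the \(22\times 22\times 22\) coarse meshes with time steps 1/42 and 1/21 are stated explicitly in the text, so the remaining two coarse settings in the comments are inferred, not quoted.

```python
# DOF ratio n between the fine and coarse space-time meshes: the 43^3 fine
# mesh has 42 cells per direction and 42 time steps.
def dof_ratio(fine_cells, fine_steps, coarse_cells, coarse_steps):
    return (fine_cells / coarse_cells) ** 3 * (fine_steps / coarse_steps)

ratios = [
    dof_ratio(42, 42, 21, 42),  # 22^3 coarse mesh, time step 1/42 -> n = 8
    dof_ratio(42, 42, 21, 21),  # 22^3 coarse mesh, time step 1/21 -> n = 16
    dof_ratio(42, 42, 21, 14),  # inferred: 22^3 mesh, time step 1/14 -> n = 24
    dof_ratio(42, 42, 14, 42),  # inferred: 15^3 mesh, time step 1/42 -> n = 27
]
```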
Lastly we compare the performance of the one-level and two-level space–time Schwarz preconditioners in Tables 6 and 7. On the coarse level, a restarted GMRES is used together with the one-level space–time Schwarz preconditioner; ILU(0) is used as the local solver on each subdomain, and the coarse overlap size is set to 1. A tighter convergence tolerance on the coarse mesh can reduce the number of outer fGMRES iterations, but it often increases the total compute time. In the following numerical examples, we set the tolerance to \(10^{-1}\) and the maximum number of GMRES iterations on the coarse mesh to four.
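Because the coarse-level correction is itself an inexact iterative solve, the outer Krylov method must be flexible: the effective preconditioner changes from one outer iteration to the next. A minimal flexible GMRES can be sketched as follows; this is a generic textbook fGMRES, not the paper's PETSc-based implementation, and it stores the preconditioned directions Z so that `M_apply` may be any inexact inner solve.

```python
import numpy as np

def fgmres(A, b, M_apply, maxiter=50, tol=1e-8):
    # Minimal flexible GMRES with zero initial guess. Unlike standard
    # right-preconditioned GMRES, the preconditioned directions Z are kept,
    # so M_apply may be an inexact inner solve (e.g. a few GMRES iterations
    # with the coarse-level Schwarz preconditioner, as in the two-level method).
    n = b.size
    r = b.astype(float).copy()
    beta = np.linalg.norm(r)
    V = np.zeros((n, maxiter + 1))
    Z = np.zeros((n, maxiter))
    H = np.zeros((maxiter + 1, maxiter))
    V[:, 0] = r / beta
    x = np.zeros(n)
    for j in range(maxiter):
        Z[:, j] = M_apply(V[:, j])        # flexible: M may change every step
        w = A @ Z[:, j]
        for i in range(j + 1):            # Arnoldi, modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        x = Z[:, :j + 1] @ y              # minimizer over the current subspace
        if H[j + 1, j] < 1e-14 or np.linalg.norm(b - A @ x) < tol * beta:
            break                         # happy breakdown or converged
        V[:, j + 1] = w / H[j + 1, j]
    return x
```

With `M_apply` set to a few inner iterations on the coarse problem (capped at four, with tolerance \(10^{-1}\), as above), the outer loop remains convergent even though the preconditioner is only applied approximately.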
In the following experiments for Examples 1, 2 and 3, we use three sets of fine meshes, \(33\times 33\times 33,\,49\times 49\times 49\) and \(67\times 67\times 67\), with corresponding time steps \(1/32,\,1/48\) and 1/66 respectively, while the coarse meshes are chosen as \(17\times 17\times 17,\,17\times 17\times 17\) and \(23\times 23\times 23\), with corresponding time steps \(1/16,\,1/48\) and 1/66. The DOFs on the fine meshes are thus 16, 27 and 27 times those on the coarse meshes for Examples 1, 2 and 3 respectively. We use \(np=64, 128\) and 512 processors for the three sets of meshes and compare the performance with the one-level method in Table 6. The two-level method saves both iterations and total compute time on all three sets of meshes: the number of iterations is mostly reduced by at least a factor of four compared with the one-level method, while the compute time is usually reduced by a factor of two to four.
Next we fix the spatial mesh at \(49\times 49\times 49\) and the time step at 1/48, resulting in a very large-scale discrete system with 17,294,403 DOFs. For the two-level method, we set the coarse mesh to \(17\times 17\times 17\) with the time step 1/48, so the DOFs on the fine mesh are about 27 times those on the coarse mesh. The problem is then solved with \(np = 128, 256, 512\), and 1024 processors. The performance of the one-level and two-level methods is presented in Table 7. We observe that when the number of sources is small, both methods are scalable with up to 512 processors, but the two-level method takes much less compute time. The strong scalability deteriorates when the number of processors becomes too large for the size of the problem. As the number of sources increases, the scalability becomes slightly worse for both methods, although the two-level method remains faster in terms of total compute time.
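The strong-scalability statements above correspond to the usual speedup and efficiency bookkeeping relative to the smallest processor count. The sketch below shows that computation on hypothetical timings; the numbers are illustrative placeholders, not the measured values from Table 7.

```python
def strong_scaling(times, procs):
    # Speedup and parallel efficiency relative to the smallest processor
    # count, as one would compute from the compute times in Table 7.
    t0, p0 = times[0], procs[0]
    speedup = [t0 / t for t in times]
    eff = [s * p0 / p for s, p in zip(speedup, procs)]
    return speedup, eff

# Hypothetical timings for np = 128, 256, 512, 1024 -- NOT the paper's
# measured values, just an illustration of the bookkeeping.
procs = [128, 256, 512, 1024]
times = [400.0, 210.0, 120.0, 95.0]
speedup, eff = strong_scaling(times, procs)
```

An efficiency that stays near 1 up to 512 processors and then drops mirrors the observed deterioration when the processor count becomes too large for the problem size.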
5 Concluding Remarks
In this work we have proposed and studied a fully implicit, space–time coupled, mixed finite element and finite difference discretization method, together with parallel one-level and two-level domain decomposition solvers, for the three-dimensional unsteady inverse convection–diffusion problem. With a suitable number of measurements, this all-at-once approach provides an acceptable reconstruction of the physical sources in space and time simultaneously. The classical overlapping Schwarz preconditioner is extended successfully to the coupled space–time problem, with a homogeneous Dirichlet boundary condition applied on both the spatial and temporal parts of the space–time subdomain boundaries. The one-level method is easier to implement, but the two-level hybrid space–time Schwarz method performs much better in terms of the number of iterations and the total compute time. Good scalability results were obtained for problems with more than 17 million degrees of freedom on a supercomputer with more than 1000 processors. The approach is promising for more general unsteady inverse problems in large-scale applications.
References
Aitbayev, R., Cai, X.-C., Paraschivoiu, M.: Parallel two-level methods for three-dimensional transonic compressible flow simulations on unstructured meshes. In: Proceedings of Parallel CFD’99 (1999)
Akcelik, V., Biros, G., Draganescu, A., Ghattas, O., Hill, J., Waanders, B.: Dynamic data-driven inversion for terascale simulations: real-time identification of airborne contaminants. In: Proceedings of Supercomputing, Seattle, WA (2005)
Akcelik, V., Biros, G., Ghattas, O., Long, K.R., Waanders, B.: A variational finite element method for source inversion for convective–diffusive transport. Finite Elem. Anal. Des. 39, 683–705 (2003)
Atmadja, J., Bagtzoglou, A.C.: State of the art report on mathematical methods for groundwater pollution source identification. Environ. Forensics 2, 205–214 (2001)
Baffico, L., Bernard, S., Maday, Y., Turinici, G., Zerah, G.: Parallel-in-time molecular-dynamics simulations. Phys. Rev. E 66, 2–5 (2002)
Balay, S., Buschelman, K., Eijkhout, V., Gropp, W.D., Kaushik, D., Knepley, M.G., McInnes, L.C., Smith, B.F., Zhang, H.: PETSc Users Manual. Technical Report, Argonne National Laboratory (2014)
Battermann, A.: Preconditioners for Karush–Kuhn–Tucker Systems Arising in Optimal Control. Master's thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia (1996)
Biros, G., Ghattas, O.: Parallel preconditioners for KKT systems arising in optimal control of viscous incompressible flows. In: Proceedings of Parallel CFD’99, Williamsburg, Virginia, USA (1999)
Cai, X.-C., Liu, S., Zou, J.: Parallel overlapping domain decomposition methods for coupled inverse elliptic problems. Commun. Appl. Math. Comput. Sci. 4, 1–26 (2009)
Cai, X.-C., Sarkis, M.: A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J. Sci. Comput. 21, 792–797 (1999)
Chen, R.L., Cai, X.-C.: Parallel one-shot Lagrange–Newton–Krylov–Schwarz algorithms for shape optimization of steady incompressible flows. SIAM J. Sci. Comput. 34, 584–605 (2012)
Deng, X.M., Cai, X.-C., Zou, J.: A parallel space–time domain decomposition method for unsteady source inversion problems. Inverse Probl. Imag. (2015)
Deng, X.M., Zhao, Y.B., Zou, J.: On linear finite elements for simultaneously recovering source location and intensity. Int. J. Numer. Anal. Model. 10, 588–602 (2013)
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer Academic Publishers, Dordrecht (1998)
Farhat, C., Chandesris, M.: Time-decomposed parallel time-integrators: theory and feasibility studies for fluid, structure, and fluid-structure applications. Int. J. Numer. Methods Eng. 58, 1397–1434 (2003)
Gander, M.J., Hairer, E.: Nonlinear convergence analysis for the parareal algorithm. In: Proceedings of the 17th International Conference on Domain Decomposition Methods, vol. 60, pp. 45–56 (2008)
Gander, M.J., Petcu, M.: Analysis of a Krylov subspace enhanced parareal algorithm for linear problems. In: Cances E. et al. (eds.) Paris-Sud Working Group on Modeling and Scientific Computing 2007–2008. ESAIM Proceedings of EDP Science, LesUlis, vol. 25, pp. 114–129 (2008)
Gander, M.J., Vandewalle, S.: Analysis of the parareal time-parallel time-integration method. SIAM J. Sci. Comput. 29, 556–578 (2007)
Gorelick, S., Evans, B., Remson, I.: Identifying sources of groundwater pollution: an optimization approach. Water Resour. Res. 19, 779–790 (1983)
Hamdi, A.: The recovery of a time-dependent point source in a linear transport equation: application to surface water pollution. Inverse Probl. 24, 1–18 (2009)
Karalashvili, M., Groß, S., Marquardt, W., Mhamdi, A., Reusken, A.: Identification of transport coefficient models in convection–diffusion equations. SIAM J. Sci. Comput. 33, 303–327 (2011)
Keung, Y.L., Zou, J.: Numerical identifications of parameters in parabolic systems. Inverse Probl. 14, 83–100 (1998)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings of 2nd Berkeley Symposium. University of California Press, Berkeley, pp. 481–492 (1951)
Lions, J.-L., Maday, Y., Turinici, G.: A parareal in time discretization of PDE’s. C. R. Acad. Sci. Ser. I Math. 332, 661–668 (2001)
Liu, X., Zhai, Z.: Inverse modeling methods for indoor airborne pollutant tracking literature review and fundamentals. Indoor Air 17, 419–438 (2007)
Maday, Y., Turinici, G.: The parareal in time iterative solver: a further direction to parallel implementation. Domain Decompos. Methods Sci. Eng. 40, 441–448 (2005)
Nilssen, T.K., Karlsen, K.H., Mannseth, T., Tai, X.-C.: Identification of diffusion parameters in a nonlinear convection–diffusion equation using the augmented Lagrangian method. Comput. Geosci. 13, 317–329 (2009)
Prudencio, E., Byrd, R., Cai, X.-C.: Parallel full space SQP Lagrange–Newton–Krylov–Schwarz algorithms for PDE-constrained optimization problems. SIAM J. Sci. Comput. 27, 1305–1328 (2006)
Revelli, R., Ridolfi, L.: Nonlinear convection–dispersion models with a localized pollutant source II: a class of inverse problems. Math. Comput. Model. 42, 601–612 (2005)
Saad, Y.: A flexible inner–outer preconditioned GMRES algorithm. SIAM J. Sci. Comput. 14, 461–469 (1993)
Samarskii, A.A., Vabishchevich, P.N.: Numerical Methods for Solving Inverse Problems of Mathematical Physics. Walter de Gruyter, Berlin (2007)
Skaggs, T., Kabala, Z.: Recovering the release history of a groundwater contaminant. Water Resour. Res. 30, 71–80 (1994)
Smith, B., Bjørstad, P., Gropp, W.: Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, Cambridge (2004)
Wong, J., Yuan, P.: A FE-based algorithm for the inverse natural convection problem. Int. J. Numer. Methods. Fluid 68, 48–82 (2012)
Woodbury, K.A.: Inverse Engineering Handbook. CRC Press, Boca Raton (2003)
Yang, X.-H., She, D.-X., Li, J.-Q.: Numerical approach to the inverse convection-diffusion problem. In: 2007 International Symposium on Nonlinear Dynamics (2007 ISND), Journal of Physics: Conference Series, vol. 96, p. 012156 (2008)
Yang, L., Deng, Z.-C., Yu, J.-N., Luo, G.-W.: Optimization method for the inverse problem of reconstructing the source term in a parabolic equation. Math. Comput. Simul. 80, 314–326 (2009)
Yang, H., Prudencio, E., Cai, X.-C.: Fully implicit Lagrange–Newton–Krylov–Schwarz algorithms for boundary control of unsteady incompressible flows. Int. J. Numer. Methods Eng. 91, 644–665 (2012)
Zhang, J., Delichatsios, M.A.: Determination of the convective heat transfer coefficient in three-dimensional inverse heat conduction problems. Fire Saf. J. 44, 681–690 (2009)
Acknowledgments
The authors would like to thank the anonymous referees for their insightful comments and suggestions that helped us improve the quality of the paper. The work was partly supported by NSFC 11501545, 91330111, Shenzhen Program JCYJ20140901003939012, KQCX20130628112914303, 201506303000093 and 863 Program 2015AA01A302. The second author was partly supported by NSF CCF-1216314. The third author was substantially supported by Hong Kong RGC Grants 404611 and 405513.
The Discrete Structure of the KKT System
The KKT system (8)–(9) is formulated as follows:
To better understand the discrete structure of (16), we denote the identity and zero matrices by I and \(\mathbf {0}\) respectively, and the basis functions of the finite element spaces \(V^h\) and \(W_h^{\tau }\) by \(\phi =(\phi _i)^T,\,i=1,\,\ldots ,\,N\), and \(g_j^n,\,j=1,\,\ldots ,\,N,\,n=0,\,\ldots ,\,M\), respectively. We let
and based on these element matrices we define
Then the system (16) takes the following form
where the block matrices \(B_C\), \(B_G\) and \(B_f\) are given by
Deng, X., Cai, XC. & Zou, J. Two-Level Space–Time Domain Decomposition Methods for Three-Dimensional Unsteady Inverse Source Problems. J Sci Comput 67, 860–882 (2016). https://doi.org/10.1007/s10915-015-0109-1
Keywords
- Space–time method
- Multilevel method
- Domain decomposition preconditioner
- Unsteady inverse source problem
- Parallel computing