Abstract
Partial differential equations (PDEs) are indispensable for describing complex processes. PDE constrained parameter estimation is still a prevailing topic of research, and the growth of computation time with increasing problem complexity is one of the main difficulties. With the application of multiple shooting, the number of derivatives required for the generalized Gauss–Newton method rises rapidly. We introduce a method to overcome this challenge: by using directional derivatives, the computational effort can be reduced to the minimal number. We demonstrate our method on the heat equation.
Keywords
- Newton Method
- Multiple Shooting
- Partial Differential Equation
- Differential Algebraic Equation
- Parameter Estimation Problem
1 Introduction
Validated models are essential for process optimization and optimal control in chemistry, engineering, and other fields. Usually these models depend on parameters that are not known initially but have to be identified from measurement data. Derivative based methods, such as the generalized Gauss–Newton method for direct multiple shooting by Bock [5], have shown good results for parameter estimation problems with ordinary differential equations (ODEs).
If spatially distributed processes are taken into account, we have to consider constraints given by partial differential equations (PDEs). If the PDEs are discretized by a method of lines, we end up with a high-dimensional system of differential algebraic equations (DAEs). In general, we formulate the parameter estimation problem as a nonlinear least squares problem and apply the generalized Gauss–Newton method. In the context of multiple shooting, the effort for the computation of the Jacobians in each iteration of the generalized Gauss–Newton method is tremendous. We present a method which couples the evaluation of the Jacobians and the subsequent block-Gaussian elimination; thus, the number of required derivatives is reduced to the minimal number. The reduced approach was introduced by Schlöder [13] in 1987 for parameter estimation problems constrained by high-dimensional systems of ODEs. A first extension to DAE constrained problems was presented by Bauer [2]. In this paper, we develop a different formulation of the reduced approach for DAE constraints which we regard as approximations of the solution of a partial differential equation. The first application of the reduced approach to PDE constrained parameter estimation problems was presented by Dieses [1], who considered only ODEs to approximate the solution of a PDE.
We first introduce the general formulation of a parameter estimation problem and the generalized Gauss–Newton method. Afterwards, we present the reduced approach. Then, an application example is investigated to show the advantages of our method compared to the conventional approach. Finally, some conclusions are drawn.
2 Problem Formulation
We consider a dynamic system defined by partial differential equations,
on the bounded domain \(\varOmega \subset \mathbb{R}^{d}.\) \(\mathcal{F}\) is an arbitrary function of time \(t \in [t_{0},t_{\mathrm{end}}],\) independent variables \(x \in \mathbb{R}^{d},\) dependent variables \(u \in \mathbb{R}^{n_{u}}\) and parameters \(p \in \mathbb{R}^{n_{p}}.\) We examine transient problems. Thus, let initial values of the form
be given that may depend on the parameters. Additionally, let boundary conditions be defined by
for some constants a and b that may each be zero, but not both at the same time, and a given function c on the boundary of the domain.
Assume that n ex experiments have been executed which have provided a set of measurements \(\eta _{k}^{\,j},\ k = 1,\ldots,m^{\,j},\ j = 1,\ldots,n_{\mathrm{ex}}.\) By \(h_{k}^{j}(t_{k}^{\,j},u^{{\ast},j}(t_{k}^{\,j}),p^{{\ast}}),\) k = 1, …, m j, j = 1, …, n ex, we denote the corresponding model response evaluated for the true, but unknown parameters p ∗ and states \(u^{{\ast},j}(t_{k}^{\,j})\) computed by Eq. (1) for p ∗. We assume the measurement errors
to be normally distributed
with mean 0 and standard deviation \(\sigma _{k}^{j}.\) The difference between measurement values and model response can be evaluated for other values of p and u, too, and we obtain the residuals
The parameter estimation problem is then to minimize the weighted sum of the residuals
with the standard deviations \(\sigma _{k}^{j}\) as weights. Equation (5) can also be interpreted as a maximum likelihood estimator for the parameters, see Seber [14].
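As a minimal sketch (not part of the authors' software, and assuming the residuals (4) take the standard form \((\eta_{k} - h_{k})/\sigma_{k}\)), the weighted least-squares objective (5) can be written as:

```python
import numpy as np

# Minimal sketch of the weighted least-squares objective (5), assuming the
# residuals (4) take the standard form (eta_k - h_k) / sigma_k.
def weighted_lsq_objective(eta, h_model, sigma):
    """eta: measurements, h_model: model responses h_k, sigma: standard deviations."""
    r = (eta - h_model) / sigma       # weighted residual vector F_1
    return 0.5 * float(np.dot(r, r))  # 1/2 * ||F_1||_2^2
```

All names here are illustrative; the actual implementations discussed later (PARFIT, PAREMERA) are written in Fortran.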
Often, we have to deal with additional interior point and boundary constraints as well:
with \(r_{l}: \mathbb{R}^{n_{u}} \times \mathbb{R}^{n_{p}} \rightarrow \mathbb{R}^{r}.\)
Considering the model equations (1)–(3) and the interior point constraints (6) as additional constraints, we state the PDE constrained parameter estimation problem
3 Discretization in Space and in Time
In this section, we present concepts for the discretization of the model equations (1) in space and the concept of multiple shooting. If not declared otherwise, the following methods are presented for the first experiment only. For the remaining experiments, the steps have to be repeated. We neglect the superscript 1.
The first step in the parametrization of the parameter estimation problem (7) is the discretization of the PDE constraints (1) by a method of lines, cf. Schiesser [12], e.g., by a finite difference method (FDM) or a finite element method (FEM).
The approach leads to a high-dimensional system of differential algebraic equations
where A(y(t), z(t), p) denotes the mass matrix that may depend on the spatially discretized states \(y(t) \in \mathbb{R}^{n_{y}}\) and \(z(t) \in \mathbb{R}^{n_{z}}\) and the parameters p. We consider only DAEs with differentiation index 1, i.e. the matrix \(\frac{\partial g} {\partial z}\) is regular.
To solve System (8) in time, we apply direct multiple shooting, see Bock [6]. Thus, the parameter estimation problem is transformed into a problem with finite dimensional constraints.
We define the shooting grid, i.e., a partition of the time interval [t 0, t end],
and the shooting intervals
We introduce artificial initial values \(s_{i} = (s_{i}^{y^{T} },\ s_{i}^{z^{T} })^{T},\ i = 0,\ldots,n_{\mathrm{ms}},\) with
for the differential and algebraic states y(t) and z(t), respectively, and examine the relaxed DAE system
on each of the n ms + 1 subintervals. Here, β(t) is a continuous, monotonically decreasing function with
The evaluation of the DAE system leads to a step-by-step formulation of the trajectory
We refer to \(\psi _{i}(t;s_{i},p),\ t \in \mathcal{I}_{i},\ i = 0,\ldots,n_{\mathrm{ms}},\) as the nominal trajectory.
By Eq. (11), we obtain a piecewise continuous, finite dimensional parametrization of the nominal trajectories of (10). To assure continuity of the trajectory for the solution \(\hat{p}\) of the parameter estimation problem for the whole time interval and consistency for the algebraic equations, we impose continuity constraints
and consistency constraints
The variables s i , i = 0, …, n ms are additional degrees of freedom of the parameter estimation problem.
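The step-by-step structure of the trajectory (11) and the continuity constraints (12) can be illustrated with a toy scalar problem. The sketch below is purely hypothetical (it is not the paper's algorithm, and all names are assumptions): it integrates \(y' = -p\,y\) with explicit Euler on each shooting interval and evaluates the mismatch \(\psi(\tau_{i+1};s_{i},p) - s_{i+1}\):

```python
import numpy as np

# Toy multiple-shooting sketch: integrate y' = -p*y with explicit Euler on
# each shooting interval and evaluate the continuity residuals
#   psi(tau_{i+1}; s_i, p) - s_{i+1}
# of Eq. (12). All names are illustrative.
def integrate(s_i, p, t0, t1, n_steps=100):
    y, dt = s_i, (t1 - t0) / n_steps
    for _ in range(n_steps):
        y = y + dt * (-p * y)        # one explicit Euler step
    return y

def continuity_residuals(s, p, tau):
    # s holds the artificial initial values s_0, ..., s_nms on the grid tau
    return np.array([integrate(s[i], p, tau[i], tau[i + 1]) - s[i + 1]
                     for i in range(len(tau) - 1)])
```

At a solution of the parameter estimation problem, these residuals vanish up to the integration tolerance, which is exactly what the constraints (12) demand.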
Before we formulate the finite dimensional constrained parameter estimation problem, we have to adjust the interior point constraints (6) to the shooting discretization. With the initial conditions (9) added to the set of constraints, we introduce a new vector of variables s r that is locally uniquely determined by the interior point constraints, i.e. the matrix \(\frac{\partial r} {\partial s^{r}},\) has full rank. Here, we use the definition
where we neglect the dependencies of the nominal trajectories ψ of s and p, respectively.
When we present the generalized Gauss–Newton method in Sect. 5, we will clarify the necessity of the variables s r in more detail. For later considerations, we define the vector
and the function
Note that the initial conditions may depend on the variables s r, too.
Summarized, this results in the following finite dimensional nonlinear least squares problem for n ex experiments:
4 The Generalized Gauss–Newton Method
For readability, we introduce a shorter notation of Problem (15)
with \(F_{1}(s,p) \in \mathbb{R}^{n_{1}},\ F_{2}(s,p) \in \mathbb{R}^{n_{2}}\) and \((s,p) \in \mathbb{R}^{n}.\) We use the definitions
Bock [5] suggested to apply the generalized Gauss–Newton method to solve nonlinear least squares problems with ODE constraints. The first application to DAE constrained problems was presented by Bock et al. [7]. For a detailed description we refer to Körkel [10]. Problem (16) is solved iteratively by examining linearized equations
Therefore, we need to compute the Jacobians
For problem (15) and n ex = 1, the Jacobian has the following structure:
with the derivatives
- of the residuals of the measurements
  $$\displaystyle{ D_{i}^{1}:= \frac{\partial F_{1}} {\partial s_{i}},\ i = 0,\ldots,n_{\mathrm{ms}},\quad D_{\hat{v}}^{1}:= \frac{\partial F_{1}} {\partial \hat{v}}, }$$
- of the continuity constraints
  $$\displaystyle{ G_{i}:= \frac{\partial \psi ^{y}} {\partial s_{i}}(\tau _{i+1};s_{i},p),\ i = 0,\ldots,n_{\mathrm{ms}} - 1,\quad G_{i}^{\hat{v}}:= \frac{\partial \psi ^{y}} {\partial \hat{v}}(\tau _{i+1};s_{i},p), }$$
- of the consistency conditions
  $$\displaystyle{ H_{i}:= \frac{\partial g} {\partial s_{i}}(\tau _{i},s_{i},p),\ i = 0,\ldots,n_{\mathrm{ms}},\quad H_{i}^{\hat{v}}:= \frac{\partial g} {\partial \hat{v}}(\tau _{i},s_{i},p), }$$
- and of the initial conditions and the interior point constraints
  $$\displaystyle{ D_{i}^{2}:= \frac{\partial d} {\partial s_{i}},\ i = 0,\ldots,n_{\mathrm{ms}},\quad D_{\hat{v}}^{2}:= \frac{\partial d} {\partial \hat{v}}, }$$
with \(\hat{v} = (p,s^{r}).\)
To assure uniqueness of the solution of Problem (16), we assume that the following two conditions hold for all values of (s, p), where we have to evaluate F and J:
- Constraint Qualification (CQ)
  $$\displaystyle{ \text{rank}\;J_{2}(s,p) = n_{2}, }$$ (21)
- Positive Definiteness (PD)
  $$\displaystyle{ \text{rank}\;J(s,p) = n. }$$ (22)
We recall from Sect. 3 the dimension of F 2. The continuity conditions (12) sum up to n y ⋅ n ms constraints, the consistency constraints (13) provide n z ⋅ (n ms + 1) additional constraints, and the initial conditions and the interior point and boundary constraints (14) result in n y + n r equations. We end up with
Since we have already defined
shooting variables, we need to introduce n r additional variables s r to guarantee that (CQ) is fulfilled.
The linearization of (16) leads to a comparatively large, but sparse Jacobian. Bock [5] introduced the condensing algorithm for ODE constrained parameter estimation problems that exploits the sparse structure of (20) and eliminates the shooting variables s i , i = 1, …, n ms by a block-Gaussian elimination. The condensed system depends only on \(\varDelta s_{0}^{y},\ \varDelta s^{r}\) and Δ p
By projecting on the algebraic variables s i z, i = 0, …, n ms, the method can be applied to DAE constrained parameter estimation problems too, cf. Leineweber [11].
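In spirit, the forward elimination behind condensing can be sketched for the ODE part as follows. This is a simplified illustration with small dense blocks and assumed names, not the actual PARFIT algorithm: the linearized continuity constraints \(\varDelta s_{i+1} = G_{i}\varDelta s_{i} + G_{i}^{p}\varDelta p + h_{i}\) are propagated forward, so that the increment at each node becomes an affine function of \((\varDelta s_{0}, \varDelta p)\):

```python
import numpy as np

# Simplified condensing sketch: propagate the linearized continuity constraints
#   ds_{i+1} = G[i] @ ds_i + Gp[i] @ dp + h[i]
# forward, expressing ds at the last node as an affine function of (ds_0, dp).
# Block names are illustrative, not from the paper.
def condense(G, Gp, h, n_s, n_p):
    E_s = np.eye(n_s)                # coefficient of ds_0
    E_p = np.zeros((n_s, n_p))       # coefficient of dp
    u = np.zeros(n_s)                # constant term
    for Gi, Gpi, hi in zip(G, Gp, h):
        # substitute the previous affine representation into the next constraint
        E_s, E_p, u = Gi @ E_s, Gi @ E_p + Gpi, Gi @ u + hi
    return E_s, E_p, u               # ds_last = E_s @ ds_0 + E_p @ dp + u
```

This eliminates all intermediate shooting increments, leaving a small system in the remaining unknowns, which is the structural idea behind the condensed system (23).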
The procedure of evaluating the Jacobians first and applying the condensing method afterwards is referred to as the general approach. The general approach is implemented in the software package for parameter estimation PARFIT which is based on the methods presented in this section and in Bock [5] and Körkel [10].
5 The Reduced Approach
Especially for DAE constraints that result from parametrized PDEs, the effort to compute (20) is tremendous due to the high dimension of s i . The computation of the submatrices G i and H i scales with the number of (discretized) states. For (n y + n z ) ≫ n p , the effort to evaluate the blocks G i and H i dominates the evaluation of the Jacobian, since the computation of these blocks requires the solution of (n y + n z ) variational differential equations. That is why the general approach is not suitable for PDE constrained parameter estimation problems in the context of multiple shooting.
Schlöder [13] developed an approach for high-dimensional ODE systems that couples the evaluation of the Jacobians and the subsequent condensing by using directional derivatives. Thereby, the effort of the computation of the Jacobian (20) reduces to the one of single shooting, i.e., the smallest possible number. Bauer [3] extended this method to parameter estimation problems with differential algebraic constraints.
We developed a different formulation of the reduced approach which fully eliminates the algebraic constraints and, thus, leads to a reduced condensed system of equal size as Problem (23). The approach of Bauer leads to redundant constraints which may cause numerical problems.
As in Sect. 3, we present the following method only for the first experiment and neglect the superscript 1. For the next steps, we assume that the interior point constraints and the residuals of the measurements can be written in the following form
We refer to Eqs. (24) as separability conditions. We define the derivatives of (6) and (4) with respect to \(\hat{v} = (p,s^{r})\) according to
cf. Eq. (20). To eliminate the shooting variables at t = τ 0, we define
and we examine the rows of the Jacobian (20) that belong to the initial conditions (25) and to the consistency constraints at t = τ 0
Obviously, it holds
Since we consider only DAEs with differentiation index 1, the matrix
is regular. We eliminate n y + n z variables Δ s 0 formally from (26) and obtain
with
We refer to (29) as seed matrices. The following steps are closely related to the ODE formulation of the reduced approach presented by Schlöder [13] with an extension to the consistency constraints which are solved locally for \(s_{j}^{z},\ j = 0,n_{\mathrm{ms}}.\)
The idea is to apply the explicit representation for the increments Δ s 0 given by Eq. (28) to the evaluation of the remaining constraints. We define
Since we use the matrices \(\hat{E}_{l}^{p,0},\ \hat{E}_{l}^{s,0}\) and the vectors \(\hat{u}_{l}^{0},\ l = 1,2,\) for the computation of the reduced condensed system, the notations in Eqs. (30) correspond to (23). Then, we compute recursively
We set
and obtain the reduced condensed system
The seed matrices are updated iteratively by
In Eqs. (30), (31) and (33), the expressions given by “ ⋅ ” are evaluated by directional derivatives and do not denote matrix products. In contrast to the general approach described in Sect. 4, we have to evaluate only n p + n r + 1 directions instead of n y + n z + n p .
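As a rough illustration only (the paper computes these directions via variational equations alongside the integration, not by finite differences), a forward difference quotient shows why one direction costs a single additional evaluation rather than a full Jacobian:

```python
import numpy as np

# Rough illustration of a directional derivative: J(x) @ v is approximated by
# one extra evaluation of f, instead of forming the full Jacobian column by
# column. (The paper generates such directions via variational equations;
# this finite-difference helper is only a hypothetical sketch.)
def directional_derivative(f, x, v, eps=1e-7):
    return (f(x + eps * v) - f(x)) / eps
```

The reduced approach thus needs only n p + n r + 1 such directions, independent of the number of discretized states.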
If the initial conditions (14) are given in the form
i.e., Eq. (14) does not depend explicitly on the variables s r, the number of required directions is independent of the number of states.
After we have solved Problem (32) for Δ p and Δ s r, the increments for the shooting variables \(s_{1},\ldots,s_{n_{\mathrm{ms}}}\) are determined by
The methods presented in this section have been implemented in a software package for parameter estimation called PAREMERA, a new implementation in Fortran90 that is suited for the treatment of multiple experiments.
6 Example
In the following, we examine the 1D heat equation, see, e.g., Evans [9], which describes the distribution of heat in a given region over time. The system is defined by
with homogeneous Dirichlet conditions on the domain Ω × T = [0, 1] × [0, 1] (Fig. 1).
Here, u is the unknown function, usually referred to as the temperature. The parameter p 1 is the thermal diffusivity. We discretize (35) with second order central finite differences for three different mesh sizes \(\varDelta x \in \left \{0.01,0.002,0.001\right \}\) to obtain ODE systems with 101, 501 and 1001 states, respectively.
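This semidiscretization can be sketched as follows; the snippet is a minimal illustration of the second order central difference scheme under homogeneous Dirichlet conditions, with assumed function names:

```python
import numpy as np

# Method-of-lines right-hand side for u_t = p1 * u_xx on [0, 1] with
# homogeneous Dirichlet boundary conditions, using second order central
# differences. With dx = 0.01 this gives the 101-state ODE system of the text.
def heat_rhs(u, p1, dx):
    dudt = np.zeros_like(u)
    # interior points: u_xx ~ (u[i-1] - 2*u[i] + u[i+1]) / dx**2
    dudt[1:-1] = p1 * (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx**2
    # dudt[0] = dudt[-1] = 0 keeps the boundary values fixed at zero
    return dudt
```

Each halving of the mesh size doubles the number of states, which is precisely what makes the Jacobian blocks of the general approach so expensive here.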
To compare the reduced approach with the general one, the time interval is decomposed into four subintervals [τ i , τ i+1) with τ i = 0.25 ⋅ i, i = 0, …, 4. We measure the peak at x = 0.5 at t ∈ { 0.2, 0.4, 0.6, 0.8, 1.0}. The measurement data is generated by integrating the ODE system with the software package DAESOL by Bauer [4] for the known true parameter
and adding Gaussian noise. For the calculations, p 1 ∗ is scaled to one.
We compare the results of the two previously mentioned parameter estimation tools PAREMERA and PARFIT. Here, we use a version of PARFIT which is capable of exploiting multiple experiment structures, see von Schwerin [15]. Both tools are embedded in the software toolbox VPLAN, see Körkel [10].
We apply \(p_{1}^{0} = 2\) as the starting parameter for all six settings (three mesh sizes and two algorithms). All computations are executed on a 64-bit computer with 4 GB memory and an Intel® Core2Duo with 2 × 2.8 GHz. The results are listed in Table 1.
Both algorithms converge for all mesh sizes to approximately the same solution \((\hat{p}_{1} \approx 1.00806),\) but there is a significant difference in the time per iteration. For 101 states, PAREMERA is around four times faster than PARFIT. With an increasing number of states, the difference in computational time grows: PAREMERA is 13 and 26 times faster for 501 and 1001 states, respectively.
Another quantity that differs drastically is the number of iterations. For all discretizations, PAREMERA achieves convergence after 5 iterations, while PARFIT finds the solution after 10 iterations. This can be explained by the different globalization strategies: in PAREMERA, the restricted monotonicity test (RMT) is implemented as presented in Bock et al. [8], whereas PARFIT uses only a first order Taylor series expansion to compute the curvature information.
Note that the computed increments (Δ s, Δ p) in the first iteration of both algorithms are exactly the same, since both algorithms solve the same system.
In Fig. 2, a comparison is shown between the two fitting curves with \(p_{1}^{0} = 2\) on the left hand side and the resulting parameter \(\hat{p}_{1} = 1.00806\) computed with PAREMERA on the right hand side, respectively.
Even for this rather small example, we could show the advantages of the reduced approach. For more complex problems, we expect even more significant savings in computation time. This is important for solving higher dimensional PDE problems and for online parameter estimation. Thus, the reduced approach should be favored for this kind of problem.
References
Altmann-Dieses, A., Schlöder, J.P., Bock, H.G., Richter, O.: Optimal experimental design for parameter estimation in column outflow experiments. Water Resour. Res. 38, 1186ff (2002)
Bauer, I.: Numerische Verfahren zur Lösung von Anfangswertaufgaben und zur Generierung von ersten und zweiten Ableitungen mit Anwendungen bei Optimierungsaufgaben in Chemie und Verfahrenstechnik. Ph.D. thesis, Universität Heidelberg (1999)
Bauer, I.: Numerische Verfahren zur Lösung von Anfangswertaufgaben und zur Generierung von ersten und zweiten Ableitungen mit Anwendungen in Chemie und Verfahrenstechnik. Preprint, SFB 359, Universität Heidelberg (2001)
Bauer, I., Bock, H.G., Schlöder, J.P.: DAESOL—a BDF-code for the numerical solution of differential algebraic equations. Internal report, IWR, SFB 359, Universität Heidelberg (1999)
Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften 183 (1987)
Bock, H.G., Plitt, K.J.: A Multiple Shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247. Pergamon Press, Budapest (1984). Available at http://www.iwr.uni-heidelberg.de/groups/agbock/FILES/Bock1984.pdf
Bock, H.G., Schlöder, J.P., Schulz, V.H.: Numerik großer Differentiell-Algebraischer Gleichungen—Simulation und Optimierung. In: Schuler, H. (ed.), Prozeß-Simulation, Chap. 2, pp. 35–80. VCH, Germany (1995)
Bock, H.G., Kostina, E.A., Schlöder, J.P.: On the role of natural level functions to achieve global convergence for damped Newton methods. In: Powell, M.J.D., Scholtes, S. (eds.) System Modelling and Optimization: Methods, Theory and Applications, pp. 51–74. Kluwer, Dodrecht (2000)
Evans, L.C.: Partial Differential Equations. Graduate Studies in Mathematics, vol. 19, 2nd edn. American Mathematical Society, Providence, RI (2010)
Körkel, S.: Numerische Methoden für optimale Versuchsplanungsprobleme bei nichtlinearen DAE-Modellen. Ph.D. thesis, Universität Heidelberg, Heidelberg (2002)
Leineweber, D.B.: Efficient reduced SQP methods for the optimization of chemical processes described by large sparse DAE models. Fortschritt-Berichte VDI Reihe 3, Verfahrenstechnik, vol. 613. VDI Verlag, Düsseldorf (1999)
Schiesser, W.E.: The Numerical Method of Lines: Integration of Partial Differential Equations, vol. 17. Academic, San Diego (1991)
Schlöder, J.P.: Numerische Methoden zur Behandlung hochdimensionaler Aufgaben der Parameteridentifizierung. Dissertation, Hohe Mathematisch-Naturwissenschaftliche Fakultät der Rheinischen Friedrich-Wilhelms-Universität zu Bonn (1987)
Seber, G.A.F., Wild, C.J.: Nonlinear Regression. Wiley, New York (1989)
von Schwerin, R.: Numerical methods, algorithms, and software for higher index nonlinear differential-algebraic equations in multibody system simulation. Ph.D. thesis, Universität Heidelberg (1997)
Acknowledgements
Financial support by BASF SE and HGS MathComp is gratefully acknowledged.
Copyright information
© 2015 Springer International Publishing Switzerland
Kircheis, R., Körkel, S. (2015). Parameter Estimation for High-Dimensional PDE Models Using a Reduced Approach. In: Carraro, T., Geiger, M., Körkel, S., Rannacher, R. (eds) Multiple Shooting and Time Domain Decomposition Methods. Contributions in Mathematical and Computational Sciences, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-23321-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23320-8
Online ISBN: 978-3-319-23321-5