
1 Introduction

Validated models are essential for process optimization and optimal control in chemistry, engineering, and related fields. Usually these models depend on parameters that are not known initially but have to be identified from measurement data. Derivative-based methods, such as the generalized Gauss–Newton method for direct multiple shooting by Bock [5], have shown good results for parameter estimation problems with ordinary differential equations (ODEs).

If spatially distributed processes are taken into account, we have to consider constraints given by partial differential equations (PDEs). If the PDEs are discretized by a method of lines, we end up with a high-dimensional system of differential algebraic equations (DAEs). In general, we formulate the parameter estimation problem as a nonlinear least squares problem and apply the generalized Gauss–Newton method. In the context of multiple shooting, the effort for the computation of the Jacobians in each iteration of the generalized Gauss–Newton method is tremendous. We present a method that couples the evaluation of the Jacobians with the subsequent block Gaussian elimination. Thus, the number of required derivatives is reduced to the minimum. The reduced approach was introduced by Schlöder [13] in 1987 for parameter estimation problems constrained by high-dimensional systems of ODEs. A first extension to DAE-constrained problems was presented by Bauer [2]. In this paper, we develop a different formulation of the reduced approach for DAE constraints that arise as approximations of the solution of a partial differential equation. The first application of the reduced approach to PDE-constrained parameter estimation problems was presented by Dieses [1], who considered only ODEs to approximate the solution of a PDE.

We first introduce the general formulation of a parameter estimation problem and the generalized Gauss–Newton method. Afterwards, we present the reduced approach. Then, an application example is investigated to show the advantages of our method compared to the conventional approach. Finally, some conclusions are drawn.

2 Problem Formulation

We consider a dynamic system defined by partial differential equations,

$$\displaystyle{ 0 = \mathcal{F}\left (t,x,u, \frac{\partial u_{i}} {\partial t}, \frac{\partial u_{i}} {\partial x_{j}}, \frac{\partial ^{2}u_{i}} {\partial x_{j}\partial x_{k}},\ldots,p\right ), }$$
(1)

on the bounded domain \(\varOmega \subset \mathbb{R}^{d}\). Here, \(\mathcal{F}\) is an arbitrary function with time \(t \in [t_{0},t_{\mathrm{end}}]\), independent variables \(x \in \mathbb{R}^{d}\), dependent variables \(u \in \mathbb{R}^{n_{u}}\) and parameters \(p \in \mathbb{R}^{n_{p}}\). We examine transient problems. Thus, let initial values of the form

$$\displaystyle{ u(t_{0}) = u_{0}(p) }$$
(2)

be given that may depend on the parameters. Additionally, let boundary conditions be defined by

$$\displaystyle{ au + b\frac{\partial u} {\partial n} = c\quad \text{on }\partial \varOmega }$$
(3)

for constants a and b, not both zero, and a given function c on the boundary of the domain.

Assume that \(n_{\mathrm{ex}}\) experiments have been executed which have provided a set of measurements \(\eta _{k}^{\,j},\ k = 1,\ldots,m^{\,j},\ j = 1,\ldots,n_{\mathrm{ex}}\). By \(h_{k}^{j}(t_{k}^{\,j},u^{{\ast},j}(t_{k}^{\,j}),p^{{\ast}})\), \(k = 1,\ldots,m^{\,j}\), \(j = 1,\ldots,n_{\mathrm{ex}}\), we denote the corresponding model response evaluated for the true, but unknown parameters \(p^{{\ast}}\) and the states \(u^{{\ast},j}(t_{k}^{\,j})\) computed from Eq. (1) for \(p^{{\ast}}\). We assume the measurement errors

$$\displaystyle{ \varepsilon _{k}^{j}:=\eta _{ k}^{\,j} - h_{ k}^{j}(t_{ k}^{\,j},u^{j,{\ast}}(t_{ k}^{\,j}),p^{{\ast}}),\quad k = 1,\ldots,m^{\,j},\ j = 1,\ldots,n_{\mathrm{ ex}}, }$$

to be normally distributed

$$\displaystyle{ \varepsilon _{k}^{j} \sim \mathcal{N}\left (0,\left (\sigma _{ k}^{j}\right )^{2}\right ),\quad k = 1,\ldots,m^{\,j},\ j = 1,\ldots,n_{\mathrm{ ex}}, }$$

with mean 0 and standard deviation \(\sigma _{k}^{j}.\) The difference between measurement values and model response can be evaluated for other values of p and u, too, and we obtain the residuals

$$\displaystyle{ \eta _{k}^{\,j} - h_{ k}^{j}(t_{ k}^{\,j},u^{j}(t_{ k}^{\,j}),p),\quad k = 1,\ldots,m^{\,j},\ j = 1,\ldots,n_{\mathrm{ ex}}. }$$
(4)

The parameter estimation problem is then to minimize the weighted sum of the squared residuals

$$\displaystyle{ \frac{1} {2}\sum _{j=1}^{n_{\mathrm{ex}} }\sum _{k=1}^{m^{\,j} }\left (\frac{\eta _{k}^{\,j} - h_{k}^{j}(t_{k}^{\,j},u^{j}(t_{k}^{\,j}),p)} {\sigma _{k}^{j}} \right )^{2} }$$
(5)

with the inverse variances \(\left (\sigma _{k}^{j}\right )^{-2}\) as weights. Minimizing Eq. (5) corresponds to maximum-likelihood estimation of the parameters, see Seber [14].
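To make the estimator concrete, a minimal sketch of evaluating (5) is given below; the data layout and the function `model_response` (a stand-in for the model responses \(h_{k}^{j}\), obtained by simulating (1)) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def objective(p, experiments, model_response):
    """Weighted least-squares objective (5).

    experiments: one dict per experiment j with keys 't' (times t_k^j),
    'eta' (measurements eta_k^j) and 'sigma' (standard deviations
    sigma_k^j).  model_response(t, p) is a hypothetical stand-in for
    h_k^j evaluated on the states obtained by simulating the model.
    """
    val = 0.0
    for ex in experiments:
        res = (ex["eta"] - model_response(ex["t"], p)) / ex["sigma"]
        val += 0.5 * np.dot(res, res)  # 1/2 * sum of squared weighted residuals
    return val
```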

Often, we have to deal with additional interior point and boundary constraints as well:

$$\displaystyle{ 0 =\sum _{ l=1}^{n_{r}^{j} }r_{l}^{j}(u^{j}(t_{ l}^{\,j}),p) }$$
(6)

with \(r_{l}: \mathbb{R}^{n_{u}} \times \mathbb{R}^{n_{p}} \rightarrow \mathbb{R}^{n_{r}}.\)

Considering the model equations (1)–(3) and the interior point constraints (6) as additional constraints, we state the PDE-constrained parameter estimation problem

$$\displaystyle{ \min _{u,p}\quad \frac{1} {2}\sum _{j=1}^{n_{\mathrm{ex}} }\sum _{k=1}^{m^{\,j} }\left (\frac{\eta _{k}^{\,j} - h_{k}^{j}(t_{k}^{\,j},u^{j}(t_{k}^{\,j}),p)} {\sigma _{k}^{j}} \right )^{2} }$$
(7a)
$$\displaystyle{ \text{s.t. }0 = \mathcal{F}^{j}\left (t,x,u^{j}, \frac{\partial u_{i}^{j}} {\partial t}, \frac{\partial u_{i}^{j}} {\partial x_{j}}, \frac{\partial ^{2}u_{i}^{j}} {\partial x_{j}\partial x_{k}},\ldots,p\right ),\quad j = 1,\ldots,n_{\mathrm{ex}}, }$$
(7b)
$$\displaystyle{ u^{j}(t_{ 0}) = u_{0}^{j}(p), }$$
(7c)
$$\displaystyle{ c^{j} = a^{j}u^{j} + b^{j}\frac{\partial u^{j}} {\partial n} \quad \text{ on }\partial \varOmega, }$$
(7d)
$$\displaystyle{ 0 =\sum _{ l=1}^{n_{r}^{j} }r_{l}^{j}(u^{j}(t_{ l}^{\,j}),p). }$$
(7e)

3 Discretization in Space and in Time

In this section, we present concepts for the discretization of the model equations (1) in space and the concept of multiple shooting. Unless stated otherwise, the following methods are presented for the first experiment only; for the remaining experiments, the steps have to be repeated. We therefore omit the superscript 1.

The first step in the parametrization of the parameter estimation problem (7) consists of the discretization of the PDE constraints (1) by a method of lines, cf. Schiesser [12], e.g., by a finite difference method (FDM) or a finite element method (FEM).

The approach leads to a high-dimensional system of differential algebraic equations

$$\displaystyle{ A(y(t),z(t),p)\dot{y} = f(t,y(t),z(t),p), }$$
(8a)
$$\displaystyle{ 0 = g(t,y(t),z(t),p), }$$
(8b)
$$\displaystyle{ y(t_{0}) = y_{0}(p), }$$
(8c)

where A(y(t), z(t), p) denotes the mass matrix, which may depend on the spatially discretized differential states \(y(t) \in \mathbb{R}^{n_{y}}\), the algebraic states \(z(t) \in \mathbb{R}^{n_{z}}\) and the parameters p. We consider only DAEs with differentiation index 1, i.e., the matrix \(\frac{\partial g} {\partial z}\) is regular.
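To illustrate how a system of the form (8) arises, the sketch below semi-discretizes the 1D heat equation (anticipating the example in Sect. 6) with central finite differences. In this simple case the mass matrix A is the identity and the algebraic part (8b) is empty; a finite element discretization would instead yield a banded mass matrix. This is an illustrative sketch, not the implementation used in the paper.

```python
import numpy as np

def semi_discretize_heat(n, p):
    """Method of lines for u_t = p * u_xx on (0, 1) with homogeneous
    Dirichlet boundaries, second-order central differences on n
    interior grid points.  Returns the mass matrix A and the right-hand
    side f of the semi-discrete system A y' = f(t, y, p)."""
    dx = 1.0 / (n + 1)
    L = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / dx**2
    A = np.eye(n)               # identity for FDM; banded for FEM

    def f(t, y):
        return p * (L @ y)      # boundary values are fixed to zero

    return A, f
```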

To solve System (8) in time, we apply direct multiple shooting, see Bock [6]. Thus, the parameter estimation problem is transformed into a problem with finite-dimensional constraints.

We define the shooting grid, i.e., a partition of the time interval \([t_{0},t_{\mathrm{end}}]\),

$$\displaystyle{ \tau _{0} = t_{0} <\tau _{1} <\ldots <\tau _{n_{\mathrm{ms}}} <\tau _{n_{\mathrm{ms}}+1} = t_{end}, }$$

and the shooting intervals

$$\displaystyle{ \mathcal{I}_{i} = [\tau _{i},\tau _{i+1}),\quad i = 0,\ldots,n_{\mathrm{ms}}. }$$

We introduce artificial initial values \(s_{i} = (s_{i}^{y^{T} },\ s_{i}^{z^{T} })^{T},\ i = 0,\ldots,n_{\mathrm{ms}},\) with

$$\displaystyle{ s_{0}^{y} = y_{ 0}(p) }$$
(9)

for the differential and algebraic states y(t) and z(t), respectively, and examine the relaxed DAE system

$$\displaystyle{ A(y(t),z(t),p)\dot{y}(t) = f(t,y(t),z(t),p),\qquad t \in \mathcal{I}_{i} }$$
(10a)
$$\displaystyle{ y(\tau _{i}) = s_{i}^{y}, }$$
(10b)
$$\displaystyle{ 0 = g(t,y(t),z(t),p) -\beta (t)g(\tau _{i},s_{i},p),\quad i = 0,\ldots,n_{\mathrm{ms}}, }$$
(10c)
$$\displaystyle{ \beta (t) \in [0,1],\quad \beta (\tau _{i}) = 1, }$$
(10d)

on each of the \(n_{\mathrm{ms}} + 1\) subintervals. Here, β(t) is a continuous, monotonically decreasing function, e.g., the linear decay \(\beta (t) = (\tau _{i+1} - t)/(\tau _{i+1} -\tau _{i})\), with

$$\displaystyle{ \lim _{t\rightarrow \tau _{i+1}}\beta (t) = 0. }$$

The evaluation of the DAE system leads to a step-by-step formulation of the trajectory

$$\displaystyle{ \left (\begin{array}{*{10}c} y(t)\\ z(t) \end{array} \right ) =\psi (t;s_{i},p),\quad t \in \mathcal{I}_{ i},\ i = 0,\ldots,n_{\mathrm{ms}}. }$$
(11)

We refer to \(\psi (t;s_{i},p),\ t \in \mathcal{I}_{i},\ i = 0,\ldots,n_{\mathrm{ms}},\) as the nominal trajectory.

By Eq. (11), we obtain a piecewise continuous, finite-dimensional parametrization of the nominal trajectories of (10). To assure continuity of the trajectory for the solution \(\hat{p}\) of the parameter estimation problem on the whole time interval, and consistency of the algebraic equations, we impose continuity constraints

$$\displaystyle{ c(\tau _{i},s_{i},s_{i-1},p):=\psi ^{y}(\tau _{ i};s_{i-1},p) - s_{ i}^{y} = 0,\quad i = 1,\ldots,n_{\mathrm{ ms}} }$$
(12)

and consistency constraints

$$\displaystyle{ g(\tau _{i},s_{i},p) = 0,\quad i = 0,\ldots,n_{\mathrm{ms}}. }$$
(13)

The variables \(s_{i},\ i = 0,\ldots,n_{\mathrm{ms}},\) are additional degrees of freedom of the parameter estimation problem.
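A minimal sketch of this shooting parametrization for a pure ODE case (no algebraic states, so the consistency constraints (13) are absent) is given below; the integrator and its tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def continuity_residuals(f, tau, s, p):
    """Continuity constraints (12) for an ODE system y' = f(t, y, p).

    tau: shooting grid tau_0 < ... < tau_{n_ms + 1},
    s:   list of the n_ms + 1 artificial initial values s_i.
    Each interval is integrated from its own node value s[i]; at a
    solution the mismatch psi(tau_{i+1}; s_i, p) - s_{i+1} vanishes.
    """
    res = []
    for i in range(len(s) - 1):
        sol = solve_ivp(f, (tau[i], tau[i + 1]), s[i], args=(p,),
                        rtol=1e-8, atol=1e-10)
        res.append(sol.y[:, -1] - s[i + 1])
    return np.concatenate(res)
```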

Before we formulate the finite-dimensional constrained parameter estimation problem, we have to adjust the interior point constraints (6) to the shooting discretization. With the initial conditions (9) added to the set of constraints, we introduce a new vector of variables \(s^{r}\) that is locally uniquely determined by the interior point constraints, i.e., the matrix \(\frac{\partial r} {\partial s^{r}}\) has full rank. Here, we use the definition

$$\displaystyle{ r^{j}:=\sum _{ l=1}^{n_{r}^{j} }r_{l}^{j}(\psi ^{j}(t_{ l}^{\,j}),p,s^{r}),\quad j = 1,\ldots,n_{\mathrm{ ex}}, }$$

where we suppress the dependencies of the nominal trajectories ψ on s and p.

When we present the generalized Gauss–Newton method in Sect. 4, we will clarify the necessity of the variables \(s^{r}\) in more detail. For later considerations, we define the vector

$$\displaystyle{ s:= (s_{0},\ldots,s_{n_{\mathrm{ms}}},s^{r}) }$$

and the function

$$\displaystyle{ d(s,p):= \left (\begin{array}{*{10}c} y_{0}(s^{r},p) - s_{ 0}^{y} \\ r \end{array} \right ). }$$
(14)

Note that the initial conditions may depend on the variables \(s^{r}\), too.

In summary, this results in the following finite-dimensional nonlinear least squares problem for \(n_{\mathrm{ex}}\) experiments:

$$\displaystyle{ \min _{s,p}\quad \frac{1} {2}\sum _{j=1}^{n_{\mathrm{ex}} }\sum _{k=1}^{m^{\,j} }\left (\frac{\eta _{k}^{\,j} - h_{k}^{j}(t_{k}^{\,j},\psi ^{j}(t_{k}^{\,j}),p)} {\sigma _{k}^{j}} \right )^{2} }$$
(15a)
$$\displaystyle{ \text{s.t. }\quad 0 = c^{j}(\tau _{ i},s_{i},s_{i-1},p),\qquad \qquad i = 1,\ldots,n_{\mathrm{ ms}}^{j},\ j = 1,\ldots,n_{\mathrm{ ex}}, }$$
(15b)
$$\displaystyle{ 0 = g^{j}(\tau _{ i},s_{i},p),\qquad \qquad i = 0,\ldots,n_{\mathrm{ms}}^{j},\ j = 1,\ldots,n_{\mathrm{ ex}}, }$$
(15c)
$$\displaystyle{ 0 = d^{j}(s,p). }$$
(15d)

4 The Generalized Gauss–Newton Method

For readability, we introduce a shorter notation of Problem (15):

$$\displaystyle{ \min _{s,p}\quad \frac{1} {2}\left \|F_{1}(s,p)\right \|_{2}^{2} }$$
(16a)
$$\displaystyle{ \text{s.t. }0 = F_{2}(s,p) }$$
(16b)

with \(F_{1}(s,p) \in \mathbb{R}^{n_{1}},\ F_{2}(s,p) \in \mathbb{R}^{n_{2}}\) and \((s,p) \in \mathbb{R}^{n}.\) We use the definitions

$$\displaystyle{ F_{1}:= \left (\frac{\eta _{k}^{\,j} - h_{k}^{j}(t_{k}^{\,j},\psi ^{j}(t_{k}^{\,j}),p)} {\sigma _{k}^{j}} \right )_{\begin{array}{c}k=1,\ldots,m^{\,j} \\ j=1,\ldots,n_{\mathrm{ex}}\end{array}} }$$
(17a)
$$\displaystyle{ F_{2}:= \left (\begin{array}{*{10}c} \left (c^{j}(\tau _{i},s_{i},s_{i-1},p)\right )_{\begin{array}{c}i=1,\ldots,n_{\mathrm{ms}}^{j} \\ j=1,\ldots,n_{\mathrm{ex}} \end{array}} \\ \left (g^{j}(\tau _{i},s_{i},p)\right )_{\begin{array}{c}i=0,\ldots,n_{\mathrm{ms}}^{j} \\ j=1,\ldots,n_{\mathrm{ex}} \end{array}} \\ \left (d^{j}(s,p)\right )_{j=1,\ldots,n_{\mathrm{ex}}} \end{array} \right ). }$$
(17b)

Bock [5] suggested applying the generalized Gauss–Newton method to solve nonlinear least squares problems with ODE constraints. The first application to DAE-constrained problems was presented by Bock et al. [7]. For a detailed description we refer to Körkel [10]. Problem (16) is solved iteratively by means of the linearized problems

$$\displaystyle{ \min _{\varDelta s,\varDelta p}\quad \frac{1} {2}\left \|F_{1} + J_{1}\left (\begin{array}{c} \varDelta s\\ \varDelta p \end{array} \right )\right \|_{2}^{2} }$$
(18a)
$$\displaystyle{ \text{s.t. }0 = F_{2}+J_{2}\left (\begin{array}{c} \varDelta s\\ \varDelta p \end{array} \right ). }$$
(18b)

Therefore, we need to compute the Jacobians

$$\displaystyle{ J_{1}:= \frac{dF_{1}} {d(s,p)}, }$$
(19a)
$$\displaystyle{ J_{2}:= \frac{dF_{2}} {d(s,p)}. }$$
(19b)

For Problem (15) and \(n_{\mathrm{ex}} = 1\), the Jacobian has the following structure:

$$\displaystyle{ J = \left (\begin{array}{*{10}c} J_{1} \\ J_{2} \end{array} \right ) = \left (\begin{array}{*{10}c} D_{0}^{1} & D_{1}^{1} & \cdots & D_{n_{\mathrm{ms}}}^{1} & D_{s^{r}}^{1} & D_{p}^{1} \\ G_{0} & (-I,0)& & & G_{0}^{s^{r} } & G_{0}^{p}\\ & \ddots & \ddots & & \vdots & \vdots \\ & & G_{n_{\mathrm{ms}}-1} & (-I,0)&G_{n_{\mathrm{ms}}-1}^{s^{r} } & G_{n_{\mathrm{ms}}-1}^{p} \\ H_{0} & & & & H_{0}^{s^{r} } & H_{0}^{p}\\ & & \ddots & & \vdots & \vdots \\ & & & H_{n_{\mathrm{ms}}} & H_{n_{\mathrm{ms}}}^{s^{r} } & H_{n_{\mathrm{ms}}}^{p} \\ D_{0}^{2} & D_{1}^{2} & \cdots & D_{n_{\mathrm{ms}}}^{2} & D_{s^{r}}^{2} & D_{p}^{2} \end{array} \right ), }$$
(20)

with the derivatives

  • of the residual of the measurements

    $$\displaystyle{ D_{i}^{1}:= \frac{\partial F_{1}} {\partial s_{i}},\ i = 0,\ldots,n_{\mathrm{ms}},\quad D_{\hat{v}}^{1}:= \frac{\partial F_{1}} {\partial \hat{v}}, }$$
  • of the continuity constraints

    $$\displaystyle{ G_{i}:= \frac{\partial \psi ^{y}} {\partial s_{i}}(\tau _{i+1};s_{i},p),\ i = 0,\ldots,n_{\mathrm{ms}} - 1,\quad G_{i}^{\hat{v}}:= \frac{\partial \psi ^{y}} {\partial \hat{v}}(\tau _{i+1};s_{i},p), }$$
  • of the consistency conditions

    $$\displaystyle{ H_{i}:= \frac{\partial g} {\partial s_{i}}(\tau _{i},s_{i},p),\ i = 0,\ldots,n_{\mathrm{ms}},\quad H_{i}^{\hat{v}}:= \frac{\partial g} {\partial \hat{v}}(\tau _{i},s_{i},p), }$$
  • and the initial conditions and the interior point constraints

    $$\displaystyle{ D_{i}^{2}:= \frac{\partial d} {\partial s_{i}},\ i = 0,\ldots,n_{\mathrm{ms}},\quad D_{\hat{v}}^{2}:= \frac{\partial d} {\partial \hat{v}}. }$$

with \(\hat{v} = (p,s^{r}).\)

To assure uniqueness of the solution of Problem (16), we assume that the following two conditions hold for all values of (s, p) at which we have to evaluate F and J:

  • Constraint Qualification (CQ)

    $$\displaystyle{ \text{rank}\;J_{2}(s,p) = n_{2}, }$$
    (21)
  • Positive Definiteness (PD)

    $$\displaystyle{ \text{rank}\;J(s,p) = n. }$$
    (22)
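Under (CQ) and (PD), the linearized problem (18) has a unique solution. The following dense null-space sketch makes the linear algebra concrete; it deliberately ignores the sparse structure of (20) that condensing exploits.

```python
import numpy as np
from scipy.linalg import qr, lstsq, solve_triangular

def ggn_step(F1, F2, J1, J2):
    """Solve  min || F1 + J1 d ||_2  s.t.  F2 + J2 d = 0.

    Assumes (CQ): J2 has full row rank, and (PD): (J1; J2) has full
    column rank.  Dense null-space method, for illustration only."""
    n2 = J2.shape[0]
    Q, R = qr(J2.T)                    # J2 = R[:n2].T @ Q[:, :n2].T
    Q1, Q2 = Q[:, :n2], Q[:, n2:]      # Q2 spans the null space of J2
    # particular solution of the linearized constraints
    d_part = Q1 @ solve_triangular(R[:n2].T, -F2, lower=True)
    # minimize the residual over the null space: d = d_part + Q2 @ w
    w = lstsq(J1 @ Q2, -(F1 + J1 @ d_part))[0]
    return d_part + Q2 @ w
```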

We recall from Sect. 3 the dimension of F_2. The continuity conditions (12) sum up to \(n_{y} \cdot n_{\mathrm{ms}}\) constraints, the consistency constraints (13) provide \(n_{z} \cdot (n_{\mathrm{ms}} + 1)\) additional constraints, and the initial conditions together with the interior point and boundary constraints (14) result in \(n_{y} + n_{r}\) equations. We end up with

$$\displaystyle{ n_{2} = (n_{y} + n_{z}) \cdot (n_{\mathrm{ms}} + 1) + n_{r}. }$$

Since we have already defined

$$\displaystyle{ (n_{y} + n_{z}) \cdot (n_{\mathrm{ms}} + 1) }$$

shooting variables, we need to introduce \(n_{r}\) additional variables \(s^{r}\) to guarantee that (CQ) is fulfilled.

The linearization of (16) leads to a comparatively large but sparse Jacobian. Bock [5] introduced the condensing algorithm for ODE-constrained parameter estimation problems, which exploits the sparse structure of (20) and eliminates the shooting variables \(s_{i},\ i = 1,\ldots,n_{\mathrm{ms}},\) by a block Gaussian elimination. The condensed system depends only on \(\varDelta s_{0}^{y},\ \varDelta s^{r}\) and Δp:

$$\displaystyle{ \min _{\varDelta s_{0}^{y},\varDelta s^{r},\varDelta p}\quad \frac{1} {2}\left \|u_{1} + E_{1}\varDelta s_{0}^{y} + E_{ 1}^{r}\varDelta s^{r} + E_{ 1}^{p}\varDelta p\right \|_{ 2}^{2} }$$
(23a)
$$\displaystyle{ \text{s.t. }\quad 0 = u_{2} + E_{2}\varDelta s_{0}^{y} + E_{ 2}^{r}\varDelta s^{r} + E_{ 2}^{p}\varDelta p. }$$
(23b)

By additionally projecting onto the algebraic variables \(s_{i}^{z},\ i = 0,\ldots,n_{\mathrm{ms}}\), the method can be applied to DAE-constrained parameter estimation problems, too, cf. Leineweber [11].
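For the ODE case and without the variables \(s^{r}\), the elimination behind (23) can be sketched as a forward recursion: the linearized continuity constraints express every Δs_i, i ≥ 1, through Δs_0 and Δp, and the measurement rows are accumulated accordingly. The block names follow (20); treating them as dense arrays is an illustrative simplification.

```python
import numpy as np

def condense(F1, D, Dp, G, Gp, c):
    """Block Gaussian elimination (condensing), ODE case.

    Linearized continuity: G[i] @ ds_i + Gp[i] @ dp + c[i] = ds_{i+1}.
    D[i] = dF1/ds_i for i = 0..n_ms, Dp = dF1/dp.  Returns the data of
    the condensed problem  min || u1 + E1 @ ds_0 + Ep @ dp ||_2 .
    """
    ny, npar = Gp[0].shape
    P, Q, r = np.eye(ny), np.zeros((ny, npar)), np.zeros(ny)
    E1, Ep, u1 = D[0].copy(), Dp.copy(), F1.copy()
    for i in range(len(G)):
        P = G[i] @ P               # ds_{i+1} = P @ ds_0 + Q @ dp + r
        Q = G[i] @ Q + Gp[i]
        r = G[i] @ r + c[i]
        E1 = E1 + D[i + 1] @ P
        Ep = Ep + D[i + 1] @ Q
        u1 = u1 + D[i + 1] @ r
    return u1, E1, Ep
```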

The procedure of evaluating the Jacobians first and applying the condensing method afterwards is referred to as the general approach. The general approach is implemented in the parameter estimation software package PARFIT, which is based on the methods presented in this section, see Bock [5] and Körkel [10].

5 The Reduced Approach

Especially for DAE constraints that result from discretized PDEs, the effort to compute (20) is tremendous due to the high dimension of \(s_{i}\). The computation of the submatrices \(G_{i}\) and \(H_{i}\) scales with the number of (discretized) states: it requires the solution of \(n_{y} + n_{z}\) variational differential equations. Hence, for \((n_{y} + n_{z}) \gg n_{p}\), the evaluation of the blocks \(G_{i}\) and \(H_{i}\) dominates the evaluation of the Jacobian. That is why the common approach is not suitable for PDE-constrained parameter estimation problems in the context of multiple shooting.

Schlöder [13] developed an approach for high-dimensional ODE systems that couples the evaluation of the Jacobians with the subsequent condensing by using directional derivatives. Thereby, the effort for the computation of the Jacobian (20) is reduced to that of single shooting, i.e., the smallest possible number of directions. Bauer [3] extended this method to parameter estimation problems with differential algebraic constraints.

We develop a different formulation of the reduced approach which fully eliminates the algebraic constraints and thus leads to a reduced condensed system of the same size as Problem (23). The approach of Bauer leads to redundant constraints, which may cause numerical problems.

As in Sect. 3, we present the following method only for the first experiment and omit the superscript 1. For the next steps, we assume that the interior point constraints and the residuals of the measurements can be written in the following form:

$$\displaystyle{ 0 =\sum _{ i=0}^{n_{\mathrm{ms}} }\sum _{t_{k}\in \mathcal{I}_{i}}\hat{h}_{k}(\psi (t_{k}),p) =\sum _{ i=0}^{n_{\mathrm{ms}} }R_{i}^{1},\qquad \hat{h}_{ k} = \frac{\eta _{k} - h_{k}} {\sigma _{k}}, }$$
(24a)
$$\displaystyle{ 0 =\sum _{ i=0}^{n_{\mathrm{ms}} }\sum _{t_{l}\in \mathcal{I}_{i}}r_{l}(\psi (t_{ l}),p,s^{r}) =\sum _{ i=0}^{n_{\mathrm{ms}} }R_{i}^{2}. }$$
(24b)

We refer to Eqs. (24) as separability conditions. We define the derivatives of (4) and (6) with respect to \(\hat{v} = (p,s^{r})\) according to

$$\displaystyle\begin{array}{rcl} D_{\hat{v}}^{1}& =& \frac{d} {d\hat{v}}\sum _{i=0}^{n_{\mathrm{ms}} }\sum _{t_{k}\in \mathcal{I}_{i}}\hat{h}_{k}(\psi (t_{k}),p) =\sum _{ i=0}^{n_{\mathrm{ms}} } \frac{d} {d\hat{v}}\sum _{t_{k}\in \mathcal{I}_{i}}\hat{h}_{k}(\psi (t_{k}),p) =\sum _{ i=0}^{n_{\mathrm{ms}} }D_{\hat{v},i}^{1}, {}\\ D_{\hat{v}}^{2}& =& \frac{d} {d\hat{v}}\sum _{i=0}^{n_{\mathrm{ms}} }\sum _{t_{l}\in \mathcal{I}_{i}}r_{l}(\psi (t_{l}),p,s^{r}) =\sum _{ i=0}^{n_{\mathrm{ms}} } \frac{d} {d\hat{v}}\sum _{t_{l}\in \mathcal{I}_{i}}r_{l}(\psi (t_{l}),p,s^{r}) =\sum _{ i=0}^{n_{\mathrm{ms}} }D_{\hat{v},i}^{2}, {}\\ \end{array}$$

cf. Eq. (20). To eliminate the shooting variables at t = τ 0, we define

$$\displaystyle{ d_{0}(s,p):= y_{0}(s^{r},p) - s_{ 0}^{y}, }$$
(25)

and we examine the rows of the Jacobian (20) that belong to the initial conditions (25) and to the consistency constraints at \(t = \tau _{0}\):

$$\displaystyle\begin{array}{rcl} D_{0}^{s_{0}^{y} }\varDelta s_{0}^{y} + D_{ 0}^{s_{0}^{z} }\varDelta s_{0}^{z} + D_{ 0}^{s^{r} }\varDelta s^{r} + D_{ 0}^{p}\varDelta p + d_{ 0}(s,p)& =& 0, \\ H_{0}^{s_{0}^{y} }\varDelta s_{0}^{y} + H_{ 0}^{s_{0}^{z} }\varDelta s_{0}^{z} + H_{ 0}^{s^{r} }\varDelta s^{r} + H_{ 0}^{p}\varDelta p + g(\tau _{ 0},s_{0},p)& =& 0.{}\end{array}$$
(26)

Obviously, it holds

$$\displaystyle{ D_{0}^{s_{0}^{y} } = -I_{n_{y}},\quad D_{0}^{s_{0}^{z} } = 0. }$$

Since we consider only DAEs with differentiation index 1, the matrix

$$\displaystyle{ M:= \left (\begin{array}{*{10}c} -I_{n_{y}} & 0 \\ H_{0}^{s_{0}^{y} } & H_{0}^{s_{0}^{z} } \end{array} \right ) }$$
(27)

is regular. We formally eliminate the \(n_{y} + n_{z}\) variables \(\varDelta s_{0}\) from (26) and obtain

$$\displaystyle\begin{array}{rcl} \left (\begin{array}{*{10}c} \varDelta s_{0}^{y} \\ \varDelta s_{0}^{z} \end{array} \right )& =& -\left (\begin{array}{*{10}c} -I_{n_{y}} & 0 \\ H_{0}^{s_{0}^{y} } & H_{0}^{s_{0}^{z} } \end{array} \right )^{-1}\left [\left (\begin{array}{*{10}c} D_{0}^{s^{r} } \\ H_{0}^{s^{r} } \end{array} \right )\varDelta s^{r} + \left (\begin{array}{*{10}c} D_{0}^{p} \\ H_{0}^{p} \end{array} \right )\varDelta p + \left (\begin{array}{*{10}c} d_{0} \\ g(\tau _{0},s_{0},p) \end{array} \right )\right ] \\ & =& M_{s}^{0}\varDelta s^{r} + M_{ p}^{0}\varDelta p + M_{ r}^{0} {}\end{array}$$
(28)

with

$$\displaystyle{ M_{r}^{0}:= -\left (\begin{array}{*{10}c} -I_{n_{y}} & 0 \\ H_{0}^{s_{0}^{y} } & H_{0}^{s_{0}^{z} } \end{array} \right )^{-1}\left (\begin{array}{*{10}c} d_{0} \\ g(\tau _{0},s_{0},p) \end{array} \right ), }$$
(29a)
$$\displaystyle{ M_{s}^{0}:= -\left (\begin{array}{*{10}c} -I_{n_{y}} & 0 \\ H_{0}^{s_{0}^{y} } & H_{0}^{s_{0}^{z} } \end{array} \right )^{-1}\left (\begin{array}{*{10}c} D_{0}^{s^{r} } \\ H_{0}^{s^{r} } \end{array} \right ), }$$
(29b)
$$\displaystyle{ M_{p}^{0}:= -\left (\begin{array}{*{10}c} -I_{n_{y}} & 0 \\ H_{0}^{s_{0}^{y} } & H_{0}^{s_{0}^{z} } \end{array} \right )^{-1}\left (\begin{array}{*{10}c} D_{0}^{p} \\ H_{0}^{p} \end{array} \right ). }$$
(29c)

We refer to (29) as seed matrices. The following steps are closely related to the ODE formulation of the reduced approach presented by Schlöder [13], with an extension to the consistency constraints, which are solved locally for \(s_{j}^{z},\ j = 0,\ldots,n_{\mathrm{ms}}\).
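Because of the block-triangular structure of the matrix (27), its inverse in (28) and (29) is never formed explicitly; applying it only requires one linear solve with the regular block \(H_{0}^{s_{0}^{z}}\). A small sketch:

```python
import numpy as np

def apply_M_inverse(Hy, Hz, rhs_top, rhs_bot):
    """Solve M @ (a, b) = (rhs_top, rhs_bot) with the block matrix
    M = [[-I, 0], [Hy, Hz]] from (27): the first block row yields
    a = -rhs_top directly; the second needs one solve with Hz."""
    a = -rhs_top
    b = np.linalg.solve(Hz, rhs_bot - Hy @ a)
    return a, b
```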

The idea is to apply the explicit representation for the increments Δ s 0 given by Eq. (28) to the evaluation of the remaining constraints. We define

$$\displaystyle{ \hat{E}_{l}^{p,0}:= D_{ 0}^{l} \cdot M_{ p}^{0} + D_{ p,0}^{l}, }$$
(30a)
$$\displaystyle{ \hat{E}_{l}^{s,0}:= D_{ 0}^{l} \cdot M_{ s}^{0} + D_{ s^{r},0}^{l},\quad l = 1,2, }$$
(30b)
$$\displaystyle{ \hat{u}_{l}^{0}:= D_{ 0}^{l} \cdot M_{ r}^{0} + R_{ 0}^{l}. }$$
(30c)

Since we use the matrices \(\hat{E}_{l}^{p,0},\ \hat{E}_{l}^{s,0}\) and the vectors \(\hat{u}_{l}^{0},\ l = 1,2,\) for the computation of the reduced condensed system, the notation in Eqs. (30) corresponds to (23). Then, we compute recursively

$$\displaystyle{ \hat{E}_{l}^{p,i} =\hat{ E}_{ l}^{p,i-1} + D_{ i}^{l} \cdot M_{ p}^{i} + D_{ p,i}^{l}, }$$
(31a)
$$\displaystyle{ \hat{E}_{l}^{s,i} =\hat{ E}_{ l}^{s,i-1} + D_{ i}^{l} \cdot M_{ s}^{i} + D_{ s^{r},i}^{l},\quad i = 1,\ldots,n_{\mathrm{ ms}}\quad l = 1,2, }$$
(31b)
$$\displaystyle{ \hat{u}_{l}^{i} =\hat{ u}_{ l}^{i-1} + D_{ i}^{l} \cdot M_{ r}^{i} + R_{ i}^{l}. }$$
(31c)

We set

$$\displaystyle{ \tilde{E}_{l}^{p}:=\hat{ E}_{ l}^{p,n_{\mathrm{ms}} },\quad \tilde{E}_{l}^{s}:=\hat{ E}_{ l}^{s,n_{\mathrm{ms}} },\quad \tilde{u}_{l}:=\hat{ u}_{l}^{n_{\mathrm{ms}} },\quad l = 1,2, }$$

and obtain the reduced condensed system

$$\displaystyle{ \min _{\varDelta p,\varDelta s^{r}}\quad \frac{1} {2}\left \|\tilde{u}_{1} +\tilde{ E}_{1}^{p}\varDelta p +\tilde{ E}_{ 1}^{s}\varDelta s^{r}\right \|_{ 2}^{2} }$$
(32a)
$$\displaystyle{ \text{s.t.}\quad 0 =\tilde{ u}_{2} +\tilde{ E}_{2}^{p}\varDelta p +\tilde{ E}_{ 2}^{s}\varDelta s^{r}. }$$
(32b)

The seed matrices are updated iteratively by

$$\displaystyle{ M_{p}^{i}:= \left (\begin{array}{*{10}c} G_{i} \\ H_{i+1} \end{array} \right )\cdot M_{p}^{i-1}+\left (\begin{array}{*{10}c} G_{i}^{p} \\ H_{i+1}^{p} \end{array} \right ), }$$
(33a)
$$\displaystyle{ M_{s}^{i}:= \left (\begin{array}{*{10}c} G_{i} \\ H_{i+1} \end{array} \right )\cdot M_{s}^{i-1}+\left (\begin{array}{*{10}c} G_{i}^{s^{r} } \\ H_{i+1}^{s^{r} } \end{array} \right ),\qquad i = 1,\ldots,n_{\mathrm{ms}}, }$$
(33b)
$$\displaystyle{ M_{r}^{i}:= \left (\begin{array}{*{10}c} G_{i} \\ H_{i+1} \end{array} \right )\cdot M_{r}^{i-1}+\left (\begin{array}{*{10}c} c(\tau _{i+1},s_{i+1},s_{i},p) \\ g(\tau _{i+1},s_{i+1},p) \end{array} \right ). }$$
(33c)

In Eqs. (30), (31) and (33), the expressions involving ⋅ are evaluated as directional derivatives and do not denote explicit matrix products. In contrast to the common approach described in Sect. 4, we have to evaluate only \(n_{p} + n_{r} + 1\) directions instead of \(n_{y} + n_{z} + n_{p}\).
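To make this concrete: a product such as \(G_{i} \cdot M_{p}^{i-1} + G_{i}^{p}\) in (33a) is, column by column, the derivative of the interval map ψ along a combined direction in (s, p). The finite-difference sketch below illustrates one such direction; the actual implementation evaluates these directions with variational DAEs (internal numerical differentiation), not by finite differences.

```python
import numpy as np

def directional_derivative(psi, s, p, ds, dp, eps=1e-7):
    """Directional derivative of the interval map psi(s, p) along the
    combined direction (ds, dp):

        (dpsi/ds) @ ds + (dpsi/dp) @ dp
          ~ (psi(s + eps*ds, p + eps*dp) - psi(s, p)) / eps.

    With ds a column of the seed matrix M_p^{i-1} and dp the matching
    unit vector, this is one column of G_i @ M_p^{i-1} + G_i^p."""
    s, p, ds, dp = map(np.asarray, (s, p, ds, dp))
    return (psi(s + eps * ds, p + eps * dp) - psi(s, p)) / eps
```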

If the initial conditions (14) are given in the form

$$\displaystyle{ d_{0}(s,p) = y_{0}(p) - s_{0}^{y}, }$$

i.e., Eq. (14) does not depend explicitly on the variables s r, the number of required directions is independent of the number of states.

After we have solved Problem (32) for \(\varDelta p\) and \(\varDelta s^{r}\), the increments for the shooting variables \((s_{1},\ldots,s_{n_{\mathrm{ms}}})\) are determined by

$$\displaystyle{ \varDelta s_{i}^{j} = M_{ r}^{i,j} + M_{ s}^{i,j}\varDelta s^{r} + M_{ p}^{i,j}\varDelta p,\quad i = 0,\ldots,n_{\mathrm{ ms}}^{j},\ j = 1,\ldots,n_{\mathrm{ ex}}. }$$
(34)
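Expanding the condensed solution back to all shooting nodes is then a cheap matrix-vector recursion; a sketch under the same notational assumptions as above:

```python
def recover_increments(Mr, Ms, Mp, dsr, dp):
    """Recover the node increments via (34):
    ds_i = Mr[i] + Ms[i] @ dsr + Mp[i] @ dp for i = 0..n_ms."""
    return [Mr[i] + Ms[i] @ dsr + Mp[i] @ dp for i in range(len(Mr))]
```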

The methods presented in this section have been implemented in the parameter estimation software package PAREMERA. PAREMERA is a new implementation in Fortran 90 that is suited for the treatment of multiple experiments.

6 Example

In the following, we examine the 1D heat equation, see, e.g., Evans [9], which describes the distribution of heat in a given region over time. The system is defined by

$$\displaystyle{ 0 = \frac{\partial u} {\partial t} - p_{1}\nabla ^{2}u, }$$
(35a)
$$\displaystyle{ 0 = u(t,0) = u(t,1), }$$
(35b)
$$\displaystyle{ u(0,x) = -4 \cdot x \cdot \left (x - 1\right ) }$$
(35c)

with homogeneous Dirichlet conditions on the domain Ω × T = [0, 1] × [0, 1] (Fig. 1).

Fig. 1 Heat distribution over the domain Ω × T. The white asterisks mark the measurement points.

Here, u is the unknown function, usually referred to as the temperature. The parameter p_1 is the thermal diffusivity. We discretize (35) with second-order central finite differences for three different mesh sizes \(\varDelta x = \left \{0.01, 0.002, 0.001\right \}\) to obtain ODE systems with 101, 501 and 1001 states, respectively.
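A sketch of this setup for the coarsest mesh is given below; the stiff integrator, its tolerances and the noise level are illustrative assumptions (the paper uses DAESOL and scales \(p_{1}\) to one).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Heat equation (35) with homogeneous Dirichlet boundaries, discretized
# by second-order central differences with dx = 0.01 (101 grid points).
p1_true = 0.1
dx = 0.01
x = np.linspace(0.0, 1.0, 101)
u0 = -4.0 * x * (x - 1.0)                  # initial condition (35c)

def rhs(t, u):
    du = np.zeros_like(u)
    du[1:-1] = p1_true * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return du                              # boundary values remain zero

# simulate and sample the peak at x = 0.5 (measurement setup below)
sol = solve_ivp(rhs, (0.0, 1.0), u0, method="BDF",
                t_eval=[0.2, 0.4, 0.6, 0.8, 1.0], rtol=1e-8, atol=1e-10)
mid = np.argmin(np.abs(x - 0.5))
eta = sol.y[mid] + 0.01 * np.random.default_rng(1).standard_normal(5)
```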

To compare the results of the reduced approach and the common one, the time interval is decomposed into four subintervals \([\tau _{i},\tau _{i+1})\) with \(\tau _{i} = 0.25 \cdot i,\ i = 0,\ldots,4\). We measure the peak at \(x = 0.5\) at \(t \in \{ 0.2,\ 0.4,\ 0.6,\ 0.8,\ 1.0\}\). The measurement data is generated by integrating the ODE system with the software package DAESOL by Bauer [4] for the known true parameter

$$\displaystyle{ p_{1}^{{\ast}} = 0.1 }$$

and adding Gaussian noise. For the calculations, p 1 is scaled to one.

We compare the results of the two previously mentioned parameter estimation tools, PAREMERA and PARFIT. Here, we use a version of PARFIT that is able to exploit multiple-experiment structures, see von Schwerin [15]. Both tools are embedded in the software toolbox VPLAN by Körkel et al. [10].

We apply \(p_{1}^{0} = 2\) as starting parameter for all six settings (three mesh sizes and two algorithms). All computations are executed on a 64-bit computer with 4 GB memory and an Intel® Core2Duo with 2 × 2.8 GHz. The results are listed in Table 1.

Table 1 Survey of the results

Both algorithms converge for all mesh sizes to approximately the same solution \((\hat{p}_{1} \approx 1.00806)\), but there is a significant difference in the time per iteration. For 101 states, PAREMERA is around four times faster than PARFIT. With an increasing number of states, the difference in computational time grows: PAREMERA is 13 and 26 times faster for 501 and 1001 states, respectively.

Another striking difference is the number of iterations. For all discretizations, PAREMERA achieves convergence after 5 iterations, while PARFIT finds the solution after 10 iterations. This can be explained by the different globalization strategies. In PAREMERA, the restricted monotonicity test (RMT) is implemented as presented in Bock et al. [8]. In PARFIT, only a first-order Taylor series expansion is used to compute the curvature information.

Note that the computed increments (Δs, Δp) in the first iteration of both algorithms are exactly the same, since both algorithms solve the same system.

In Fig. 2, the data fit for the starting parameter \(p_{1}^{0} = 2\) (left) is compared with the fit for the estimated parameter \(\hat{p}_{1} = 1.00806\) computed with PAREMERA (right).

Fig. 2 Data fits for the starting parameter \(p_{1}^{0} = 2\) and the estimated parameter \(\hat{p}_{1} = 1.00806\).

Even for this rather small example, we could demonstrate the advantages of the reduced approach. For more complex problems, we expect even more significant savings in computation time. This is important for solving higher-dimensional PDE problems and for online parameter estimation. Thus, the reduced approach should be favored for this kind of problem.