
1 Introduction

Many processes in engineering, chemistry, or physics can be described by dynamical systems given by ordinary differential equations (ODEs) or differential-algebraic equations (DAEs). These equations usually depend on model parameters, for example material-specific constants that are not directly accessible by measurement. However, as the models are often highly nonlinear, simulation results can vary strongly with the values of the model parameters. Thus it is desirable to estimate the model parameters with high precision. This process is called model validation. Only a validated model can be used to make meaningful predictions.

The first step in the model validation process is usually a parameter estimation. That means the model is fitted to given measurement data, yielding a first estimate for the parameters. Then one can perform a sensitivity analysis to obtain estimates for the covariance matrix of the parameters. The covariance matrix can reveal large uncertainties in the parameter values or correlations between different parameters. In this context, it is also possible to quantify the uncertainty of arbitrary model quantities of interest.

An important observation is that the covariance matrix depends not only on the parameters but also on the experimental conditions. This leads to the task of optimum experimental design (OED): Choose experimental conditions such that a subsequent parameter estimation yields parameters with minimum uncertainty. The uncertainty of the vector of parameters is characterized by a functional on the predicted covariance matrix. OED for general statistical models has been studied for several decades and it is a well-established field of research, see the textbooks [2, 10, 18]. Nonlinear OED for processes modeled by differential equations has been investigated by several authors, see, e.g., [4, 12, 15] for an overview.

In mathematical terms, optimum experimental design can be cast as a special (nonstandard) type of optimal control (OC) problem. As the objective, namely the functional on the covariance matrix, depends on first-order sensitivities of the states, variational differential equations or sensitivity equations must be explicitly included in the problem formulation, leading to large but specially structured differential equation systems. We are interested in direct methods for OED problems, in particular direct multiple shooting, that transform the infinite-dimensional optimal control problem into a finite-dimensional nonlinear programming problem (NLP). The direct multiple shooting method for optimal control problems as described by Bock and Plitt [7] makes use of the partial separability of the objective function. This leads to a block-diagonal Hessian of the Lagrangian that can and should be exploited by Newton-type methods. However, a straightforward formulation of the OED objective function lacks partial separability, so special care must be taken to reformulate the OED problem as a standard optimal control problem. In [14, 16] direct multiple shooting has been applied to OED. In [13], a collocation discretization is applied to a similar problem. The main contribution of this paper consists of detailed descriptions of the structured NLPs that result from a multiple shooting discretization, as well as numerical results that demonstrate their benefits and limitations.

The paper is organized as follows: In Sect. 2, we give an introduction to optimum experimental design and formulate it as a nonstandard optimal control problem. In Sect. 3, the direct multiple shooting method is described for standard optimal control problems. Afterwards, in Sect. 4 we propose two ways to transform the OED problem into a specially structured standard OC problem. In Sect. 5 we show how to efficiently evaluate the constraint Jacobian as well as the gradient and Hessian of the objective. A numerical example from chemical engineering illustrates the efficacy of the approach in Sect. 6. Section 7 concludes.

2 Nonlinear Optimum Experimental Design for Parameter Estimation

Optimum experimental design aims at improving parameter estimation results in a statistical sense. We first introduce the type of parameter estimation problems whose results we seek to improve with OED along with some general notation that we use throughout the paper.

2.1 Parameter Estimation for Dynamical Systems

We consider a dynamical process on a fixed time horizon \([t_{0},t_{f}]\) that is described by the following differential-algebraic equation (DAE) system with interior point and boundary conditions:

$$\displaystyle\begin{array}{rcl} \dot{y}(t)& =& f(t,y(t),z(t),p,u(t)),\quad y(t_{0}) = y_{0}(p,u){}\end{array}$$
(1a)
$$\displaystyle\begin{array}{rcl} 0& =& g(t,y(t),z(t),p,u(t)){}\end{array}$$
(1b)
$$\displaystyle\begin{array}{rcl} 0& =& \sum _{i=1}^{N_{r} }r_{i}(t_{i},y(t_{i}),z(t_{i}),p){}\end{array}$$
(1c)

where

$$\displaystyle\begin{array}{rcl} y(t) \in \mathbb{R}^{n_{y} }& & \quad \mathit{(differential\ states)} {}\\ z(t) \in \mathbb{R}^{n_{z} }& & \quad \mathit{(algebraic\ states)} {}\\ p \in \mathbb{R}^{n_{p} }& & \quad \mathit{(parameters)} {}\\ u(t) \in \mathbb{R}^{n_{u} }& & \quad \mathit{(controls)} {}\\ r_{i}(t_{i},y(t_{i}),z(t_{i}),p) \in \mathbb{R}^{n_{r} }& & \quad \mathit{(boundary\ conditions)}. {}\\ \end{array}$$

Throughout the text we will use the notation \(x(t) = (y(t)^{T},z(t)^{T})^{T}\) to denote both differential and algebraic states.

Assume measurement data \(\eta _{1},\ldots,\eta _{N_{m}}\) are available at sampling times \(t_{1},\ldots,t_{N_{m}}\) such that

$$\displaystyle\begin{array}{rcl} h_{i}(t_{i},y(t_{i}),z(t_{i}),p^{\star }) =\eta _{ i} +\varepsilon _{i},\quad i = 1,\ldots,N_{m},& & {}\\ \end{array}$$

where \(p^{\star }\) are the true—but inaccessible—parameters, y and z the corresponding states, and \(\varepsilon _{i}\) are independently normally distributed with zero mean and standard deviation \(\sigma _{i}\). This assumption states that the model is structurally correct and errors arise only due to inaccuracies in the measurement process. We call

$$\displaystyle\begin{array}{rcl} h_{i}(t,y(t),z(t),p),\quad i = 1,\ldots,N_{m}& & {}\\ \end{array}$$

the model response or observable.

The maximum likelihood parameter estimation problem can be stated as:

$$\displaystyle\begin{array}{rcl} \min _{y_{0},x,p}\sum _{i=1}^{N_{m} }& & \left (\frac{h_{i}(t_{i},y(t_{i}),z(t_{i}),p) -\eta _{i}} {\sigma _{i}} \right )^{2}{}\end{array}$$
(2a)
$$\displaystyle\begin{array}{rcl} \mathrm{s.t.}\ \dot{y}(t)& =& f(t,y(t),z(t),p,u(t)),\quad y(t_{0}) = y_{0}(p,u){}\end{array}$$
(2b)
$$\displaystyle\begin{array}{rcl} 0& =& g(t,y(t),z(t),p,u(t)){}\end{array}$$
(2c)
$$\displaystyle\begin{array}{rcl} 0& =& \sum _{i=1}^{N_{r} }r_{i}(t_{i},y(t_{i}),z(t_{i}),p){}\end{array}$$
(2d)

Parameter estimation problems constrained by differential equations can be solved by different approaches, e.g. by direct multiple shooting in combination with a generalized Gauss-Newton method, see [6].

2.2 Sensitivity Analysis

The solution \(\hat{p}\) of the parameter estimation problem (2) is a random variable because the measurements \(\eta _{i}\) are random. The variance-covariance matrix C of \(\hat{p}\) is given by:

$$\displaystyle\begin{array}{rcl} C = \left (\begin{array}{*{10}c} I &0 \end{array} \right )\left (\begin{array}{*{10}c} \mathcal{J}_{1}^{T}\mathcal{J}_{ 1} & \mathcal{J}_{2}^{T} \\ \mathcal{J}_{2} & 0 \end{array} \right )^{-1}\left (\begin{array}{*{10}c} I \\ 0 \end{array} \right )& &{}\end{array}$$
(3)

where \(\mathcal{J}_{1} \in \mathbb{R}^{N_{m}\times n_{p}}\) and \(\mathcal{J}_{2} \in \mathbb{R}^{n_{r}\times n_{p}}\) are the Jacobians of the residual vectors of the parameter estimation problem. We denote by \(\mathcal{J}_{1,i}\) the rows of \(\mathcal{J}_{1}\) and by \(\mathcal{J}_{2,i}\) the summands that make up \(\mathcal{J}_{2}\):

$$\displaystyle\begin{array}{rcl} \mathcal{J}_{1,i}& =& \frac{\sqrt{w_{i}}} {\sigma _{i}} \left (\frac{\partial h_{i}} {\partial x} (t_{i},y(t_{i}),z(t_{i}),p)\frac{\partial x} {\partial p}(t_{i}) + \frac{\partial h_{i}} {\partial p} (t_{i},y(t_{i}),z(t_{i}),p)\right ),\quad i = 1,\ldots,N_{m}{}\end{array}$$
(4)
$$\displaystyle\begin{array}{rcl} \mathcal{J}_{2}& =& \sum _{i=1}^{N_{r} }\mathcal{J}_{2,i},\quad \mathcal{J}_{2,i} = \frac{\partial r_{i}} {\partial x} (t_{i},y(t_{i}),z(t_{i}),p)\frac{\partial x} {\partial p}(t_{i}) + \frac{\partial r_{i}} {\partial p} (t_{i},y(t_{i}),z(t_{i}),p).{}\end{array}$$
(5)

We assume that \(\mathcal{J}_{2}\) has full rank and that \(\mathcal{J}_{1}^{T}\mathcal{J}_{1}\) is positive definite on \(\text{Ker}\mathcal{J}_{2}\) which implies existence of C.

In (4) we have also introduced measurement weights \(w_{i} \in \{ 0,1\},\ i = 1,\ldots,N_{m}\), one for each measurement time. They are fixed in the parameter estimation context but become design variables in the experimental design problem, where they allow us to select or deselect measurements.
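
As a concrete illustration of (3), the following NumPy sketch assembles the KKT-type block matrix from given (already weighted) Jacobians and extracts the covariance; the function name and the toy data are ours, purely for illustration.

```python
# Minimal sketch of the covariance formula (3); assumes J1 and J2 are
# already assembled and satisfy the rank conditions stated above.
import numpy as np

def covariance(J1, J2):
    n_p = J1.shape[1]
    n_r = J2.shape[0]
    # KKT-type block matrix from (3).
    K = np.block([[J1.T @ J1, J2.T],
                  [J2,        np.zeros((n_r, n_r))]])
    # C is the upper-left n_p x n_p block of the inverse.
    return np.linalg.inv(K)[:n_p, :n_p]

# Toy data: 5 measurements, 2 parameters, 1 boundary condition.
rng = np.random.default_rng(0)
C = covariance(rng.standard_normal((5, 2)), rng.standard_normal((1, 2)))
print(C)
```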

The sensitivities of the states x with respect to the parameters p are subject to the following variational differential-algebraic equations (VDAE), also called sensitivity equations:

$$\displaystyle\begin{array}{rcl} \dot{y}_{p}(t)& =& \frac{\partial f} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial f} {\partial p} (t,y(t),z(t),p,u(t)){}\end{array}$$
(6)
$$\displaystyle\begin{array}{rcl} 0& =& \frac{\partial g} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial g} {\partial p}(t,y(t),z(t),p,u(t)),{}\end{array}$$
(7)

where

$$\displaystyle\begin{array}{rcl} x_{p}(t) = \frac{\partial x} {\partial p}(t) = \left (\frac{\partial y} {\partial p}(t), \frac{\partial z} {\partial p}(t)\right ).& & {}\\ \end{array}$$

Initial values for the VDAE are given by

$$\displaystyle\begin{array}{rcl} y_{p}(t_{0}) = \frac{\partial y_{0}} {\partial p} & & {}\\ \end{array}$$

for the variational differential states and by (7) for the variational algebraic states. Note that (6) and (7) depend on y(t) and z(t) and therefore have to be solved together with (1a) and (1b).
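
As a minimal illustration of the variational equations in the pure ODE case, consider the scalar model \(\dot{y} = -py\), \(y(0) = 1\), whose sensitivity has the closed form \(y_{p}(t) = -te^{-pt}\). The following sketch integrates (1a) and (6) together, as just described; the algebraic part (7) is absent in this ODE-only example.

```python
# Nominal equation plus sensitivity equation (6) for dy/dt = -p*y:
# df/dy = -p and df/dp = -y, so dy_p/dt = -p*y_p - y with y_p(0) = 0.
import numpy as np
from scipy.integrate import solve_ivp

p = 1.3

def rhs(t, v):
    y, y_p = v
    return [-p * y, -p * y_p - y]

sol = solve_ivp(rhs, (0.0, 2.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
t_f = sol.t[-1]
# numerical sensitivity vs. analytic value -t*exp(-p*t)
print(sol.y[1, -1], -t_f * np.exp(-p * t_f))
```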

2.3 The Optimum Experimental Design Problem

Based on the sensitivity analysis, we can predict the variance-covariance matrix for different experimental settings that are characterized by controls u(t) as well as a choice of measurements. An experiment may also be constrained by external process constraints, e.g. safety or cost constraints.

The task of optimum experimental design is to choose experimental settings such that the predicted covariance matrix has the best properties in some statistical sense. The quality of the matrix is measured by a criterion ϕ from statistical experimental design (a small illustration follows the list):

  • A-criterion: \(\phi =\mathop{ \mathrm{tr}}\nolimits C\)

  • D-criterion: \(\phi =\det C\)

  • E-criterion: \(\phi =\max \{\lambda _{i}:\ i = 1,\ldots,n_{p},\ \lambda _{i}\ \text{eigenvalue of}\ C\} = \|C\|_{2}\)

  • M-criterion: \(\phi =\max \{ C_{ii}:\ i = 1,\ldots,n_{p}\}\)
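
The following toy NumPy implementation evaluates the four criteria on a given covariance matrix; the function names are our own and not taken from any OED package.

```python
import numpy as np

def a_criterion(C):  # trace of C
    return np.trace(C)

def d_criterion(C):  # determinant of C
    return np.linalg.det(C)

def e_criterion(C):  # largest eigenvalue = spectral norm for s.p.d. C
    return np.linalg.eigvalsh(C).max()

def m_criterion(C):  # largest diagonal entry, i.e. worst single variance
    return np.diag(C).max()

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(a_criterion(C), d_criterion(C), e_criterion(C), m_criterion(C))
```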

The complete optimum experimental design problem is

$$\displaystyle\begin{array}{rcl} \min _{y_{0},x,x_{p},u,w}& & \phi \left (\left (\begin{array}{*{10}c} I &0 \end{array} \right )\left (\begin{array}{*{10}c} \mathcal{J}_{1}^{T}\mathcal{J}_{ 1} & \mathcal{J}_{2}^{T} \\ \mathcal{J}_{2} & 0 \end{array} \right )^{-1}\left (\begin{array}{*{10}c} I \\ 0 \end{array} \right )\right ){}\end{array}$$
(8a)
$$\displaystyle\begin{array}{rcl} \mathrm{s.t.}\ \dot{y}(t)& =& f(t,y(t),z(t),p,u(t)),\quad t \in [t_{0},t_{f}]{}\end{array}$$
(8b)
$$\displaystyle\begin{array}{rcl} 0& =& g(t,y(t),z(t),p,u(t)),\quad t \in [t_{0},t_{f}]{}\end{array}$$
(8c)
$$\displaystyle\begin{array}{rcl} y(t_{0})& =& y_{0}(p,u){}\end{array}$$
(8d)
$$\displaystyle\begin{array}{rcl} \dot{y}_{p}(t)& =& \frac{\partial f} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial f} {\partial p} (t,y(t),z(t),p,u(t)),\ t \in [t_{0},t_{f}]{}\end{array}$$
(8e)
$$\displaystyle\begin{array}{rcl} 0& =& \frac{\partial g} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial g} {\partial p}(t,y(t),z(t),p,u(t)),\ t \in [t_{0},t_{f}]{}\end{array}$$
(8f)
$$\displaystyle\begin{array}{rcl} y_{p}(t_{0})& =& \frac{\partial y_{0}} {\partial p}{}\end{array}$$
(8g)
$$\displaystyle\begin{array}{rcl} 0& =& \sum _{i=1}^{N_{r} }r_{i}(t_{i},y(t_{i}),z(t_{i}),p){}\end{array}$$
(8h)
$$\displaystyle\begin{array}{rcl} 0& \leq & c\left (t,y(t),z(t),u(t),w\right ),\quad t \in [t_{0},t_{f}]{}\end{array}$$
(8i)
$$\displaystyle\begin{array}{rcl} w_{i}& \in & \{0,1\},\quad i = 1,\ldots,N_{m}{}\end{array}$$
(8j)
$$\displaystyle\begin{array}{rcl} \mathcal{J}_{1,i}& =& \frac{\sqrt{w_{i}}} {\sigma _{i}} \left (\frac{\partial h_{i}} {\partial x} (t_{i},y(t_{i}),z(t_{i}),p)x_{p}(t_{i}) + \frac{\partial h_{i}} {\partial p} (t_{i},y(t_{i}),z(t_{i}),p)\right ),\quad i = 1,\ldots,N_{m}{}\end{array}$$
(8k)
$$\displaystyle\begin{array}{rcl} \mathcal{J}_{2}& =& \sum _{i=1}^{N_{r} }\frac{\partial r_{i}} {\partial x} (t_{i},y(t_{i}),z(t_{i}),p)x_{p}(t_{i}) + \frac{\partial r_{i}} {\partial p} (t_{i},y(t_{i}),z(t_{i}),p){}\end{array}$$
(8l)

with the nominal DAE system (8b) and (8c) with initial values (8d), the variational DAE system (8e) and (8f) with initial values (8g), multipoint boundary constraints from the parameter estimation problem (8h), path and control constraints (8i), and integrality constraints for the measurement weights (8j). The Jacobians of the parameter estimation residuals are given by (8k) and (8l); they define the covariance matrix whose functional ϕ is minimized in (8a). Note that while initial values for the nominal differential states (8b) may be degrees of freedom in the optimization, initial values for the variational differential states (8e) are explicitly defined by the relation (8g), and those for the algebraic states \(z(t)\) and \(z_{p}(t)\) are implicitly defined by the algebraic conditions (8c) and (8f).

3 The Direct Multiple Shooting Method for Optimal Control Problems

The direct multiple shooting method for optimal control problems was first introduced in [7]. Let us first consider the standard optimal control problem

$$\displaystyle\begin{array}{rcl} & \min _{\tilde{y}_{0},\tilde{x},\tilde{u}}\ & \varPhi (\tilde{x}(t_{f})){}\end{array}$$
(9a)
$$\displaystyle\begin{array}{rcl} \mathrm{s.t.}\ \dot{\tilde{y}}(t)& =& \tilde{f}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t)),\quad \tilde{y}(t_{0}) =\tilde{ y}_{0}{}\end{array}$$
(9b)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{g}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t)){}\end{array}$$
(9c)
$$\displaystyle\begin{array}{rcl} 0& \leq & \tilde{c}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t)){}\end{array}$$
(9d)
$$\displaystyle\begin{array}{rcl} 0& \leq & \sum _{i=1}^{N_{\tilde{r}} }\tilde{r}_{i}(t_{i},\tilde{y}(t_{i}),\tilde{z}(t_{i})).{}\end{array}$$
(9e)

In direct methods the infinite-dimensional optimal control problem (9) is approximated by a nonlinear programming problem (NLP) which is then solved by suitable numerical methods. The following infinite-dimensional objects of the optimal control problem must be treated adequately when setting up the finite-dimensional NLP:

  • control functions \(\tilde{u}\)

  • differential and algebraic states \(\tilde{y}\) and \(\tilde{z}\)

  • path constraints \(0 \leq \tilde{ c}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t))\)

3.1 Control Functions

We consider a time grid

$$\displaystyle\begin{array}{rcl} t_{0} =\tau _{ 0}^{c} <\tau _{ 1}^{c} < \cdots <\tau _{ N_{c}}^{c} = t_{ f}& &{}\end{array}$$
(10)

on which the control function \(\tilde{u}(\cdot )\) is parameterized by means of local basis functions:

$$\displaystyle\begin{array}{rcl} \tilde{u}(t) =\varphi ^{j}(t,q^{j}),\quad t \in [\tau _{ j}^{c},\tau _{ j+1}^{c}],& & {}\\ \end{array}$$

where the \(q^{j} \in \mathbb{R}^{n_{u}}\) are vectors of finitely many real optimization variables. We define \(q:= (q^{0},\ldots,q^{N_{c}-1})^{T}\). The local functions \(\varphi ^{j}\) are typically polynomials of low degree, e.g. linear or constant functions.
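
For the common piecewise constant choice \(\varphi ^{j}(t,q^{j}) = q^{j}\), the discretized control can be evaluated as in the following sketch; the grid, the values, and the function name are illustrative assumptions.

```python
# Piecewise constant control parameterization on the grid (10).
import numpy as np

def u(t, tau_c, q):
    """Evaluate the control at time t.

    tau_c : grid points tau_0^c < ... < tau_{N_c}^c
    q     : one control vector per interval, shape (N_c, n_u)
    """
    j = np.searchsorted(tau_c, t, side="right") - 1
    j = min(max(j, 0), len(q) - 1)  # clamp t = t_f into the last interval
    return q[j]

tau_c = np.linspace(0.0, 12.0, 7)  # 6 control intervals
q = np.array([[0.3], [0.0], [1.0], [1.0], [0.2], [0.0]])
print(u(3.5, tau_c, q), u(12.0, tau_c, q))
```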

3.2 States

In shooting methods, initial value problem solvers are employed to obtain representations of the states for given q and \(\tilde{y}_{0}\). In the case of direct single shooting and pure ODEs, the states are regarded as dependent variables, and only q and \(\tilde{y}_{0}\) are kept as variables in the optimization problem. Thus the tasks of simulation and optimization are kept separate.

The direct multiple shooting method for DAEs is a simultaneous strategy to resolve simulation and optimization in parallel. Again, we consider a discretization of the time horizon

$$\displaystyle{ t_{0} =\tau _{ 0}^{s} <\tau _{ 1}^{s} < \cdots <\tau _{ N_{s}}^{s} = t_{ f} }$$
(11)

where we assume without loss of generality that the grid points are a subset of the grid points of the control grid (10). On this shooting grid we consider the following set of initial value problems with initial values \(s_{x}^{j} = (s_{y}^{j},s_{z}^{j})\) that become variables in the optimization problem:

$$\displaystyle\begin{array}{rcl} \dot{\tilde{y}}(t)& =& \tilde{f}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t)),\quad \tilde{y}(\tau _{j}^{s}) = s_{y}^{j}{}\end{array}$$
(12a)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{g}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t)) -\theta _{j}(t)\,\tilde{g}(\tau _{j}^{s},s_{y}^{j},s_{z}^{j},\tilde{u}(\tau _{j}^{s})),\quad \tilde{z}(\tau _{j}^{s}) = s_{z}^{j},{}\end{array}$$
(12b)

where \(\theta _{j}(\cdot )\) is a fast decreasing damping function with \(\theta _{j}(\tau _{j}^{s}) = 1\). This relaxed formulation was proposed in [8]; it means that the algebraic condition (12b) is automatically consistent for any initial values \(s_{z}^{j}\), so the DAE solver does not need to solve the (nonlinear) algebraic condition for feasible initial values in every iteration of the optimization algorithm. Instead, the nonlinear algebraic consistency conditions

$$\displaystyle\begin{array}{rcl} 0 =\tilde{ g}(\tau _{j}^{s},s_{ y}^{j},s_{ z}^{j},\hat{q}^{j}),\quad j = 0,\ldots,N_{ s}& & {}\\ \end{array}$$

are added to the optimization problem; they ensure that the original DAE is satisfied at the solution of the optimization problem.

Note that the DAEs (12) are solved independently on the smaller time intervals \([\tau _{j}^{s},\tau _{j+1}^{s}]\), as the initial values \(s_{x}^{j}\) are variables of the optimization problem. To ensure equivalence to the original system (1), continuity conditions are added to the optimization problem for every shooting interval. Let us denote by \(\tilde{y}(\tau _{j+1}^{s};s_{y}^{j},s_{z}^{j},\hat{q}^{j})\) a representation of the solution to problem (12) on the interval \([\tau _{j}^{s},\tau _{j+1}^{s}]\), where \(\hat{q}^{j}\) denotes the subvector of q that represents \(\tilde{u}\) on this interval. Then the continuity conditions read:

$$\displaystyle\begin{array}{rcl} \tilde{y}(\tau _{j+1}^{s};s_{ y}^{j},s_{ z}^{j},\hat{q}^{j}) = s_{ y}^{j+1},\quad j = 0,\ldots,N_{ s} - 1.& & {}\\ \end{array}$$

Figure 1 illustrates the concept of direct multiple shooting. Note that we explicitly maintain separate grids for controls and states. A special case is of course to choose the same grid for both. However, in our experience, the decoupling of grids provides greater flexibility, and a smaller number of shooting intervals can greatly accelerate convergence for problems where a relatively fine discretization of the controls is desirable.

Fig. 1
figure 1

Concept of direct multiple shooting for one state and one piecewise constant control. The continuity conditions are violated (vertical dotted lines). Note how the control is also allowed to switch within shooting intervals
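
To make the continuity conditions concrete, the following sketch integrates each shooting interval of a hypothetical scalar ODE from its node value and collects the defects \(\tilde{y}(\tau _{j+1}^{s};s_{y}^{j},\hat{q}^{j}) - s_{y}^{j+1}\); at a feasible point of the NLP they all vanish. Dynamics, grids, and node values are made up for illustration.

```python
# Continuity residuals of a multiple shooting discretization (ODE case).
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, qj):
    return -y + qj  # stand-in dynamics with piecewise constant control qj

def continuity_residuals(tau_s, s_y, q):
    res = []
    for j in range(len(tau_s) - 1):
        # integrate interval j starting from the node value s_y[j]
        sol = solve_ivp(rhs, (tau_s[j], tau_s[j + 1]), [s_y[j]],
                        args=(q[j],), rtol=1e-10, atol=1e-12)
        res.append(sol.y[0, -1] - s_y[j + 1])
    return np.array(res)

tau_s = np.linspace(0.0, 4.0, 5)                 # 4 shooting intervals
s_y = np.array([1.0, 0.6, 0.5, 0.4, 0.35])       # node values (variables)
q = np.array([0.2, 0.2, 0.3, 0.3])               # control per interval
print(continuity_residuals(tau_s, s_y, q))
```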

3.3 Path Constraints

All path constraints such as (9d) that are required to hold at infinitely many points are evaluated at finitely many checkpoints only. Let us assume, without loss of generality, that the checkpoints are the grid points of the shooting grid (11). Then the discretized path constraints read

$$\displaystyle\begin{array}{rcl} 0 \leq \tilde{ c}(\tau _{j}^{s},s_{ y}^{j},s_{ z}^{j},\hat{q}^{j}),\quad j = 0,\ldots,N_{ s}.& &{}\end{array}$$
(13)

Depending on the choice of the time grid, the constraints might be violated between grid points. There exist strategies to adaptively add checkpoints, see, e.g., [17], but to keep the notation simple we assume within the scope of this paper that the checkpoints coincide with the points of the shooting grid.

3.4 Structured NLP

We have now addressed all constraints of the optimal control problem and can formulate the structured multiple shooting NLP as follows:

$$\displaystyle\begin{array}{rcl} & \min _{s_{y},s_{z},q}\ & \varPhi (s_{x}^{N_{s}}){}\end{array}$$
(14a)
$$\displaystyle\begin{array}{rcl} \mathrm{s.t.}\ 0& =\tilde{ y}_{0} - s_{y}^{0}&{}\end{array}$$
(14b)
$$\displaystyle\begin{array}{rcl} 0& =\tilde{ y}(\tau _{j+1}^{s};s_{y}^{j},s_{z}^{j},\hat{q}^{j}) - s_{y}^{j+1},& j = 0,\ldots,N_{s} - 1{}\end{array}$$
(14c)
$$\displaystyle\begin{array}{rcl} 0& =\tilde{ g}(\tau _{j}^{s},s_{y}^{j},s_{z}^{j},\hat{q}^{j}),& j = 0,\ldots,N_{s}\phantom{ - 1}{}\end{array}$$
(14d)
$$\displaystyle\begin{array}{rcl} 0& \leq \tilde{c}(\tau _{j}^{s},s_{y}^{j},s_{z}^{j},\hat{q}^{j}),& j = 0,\ldots,N_{s}\phantom{ - 1}{}\end{array}$$
(14e)
$$\displaystyle\begin{array}{rcl} 0& \leq \sum _{j=0}^{N_{s}-1}\sum _{i:\,\tau _{j}^{s}\leq t_{i}<\tau _{j+1}^{s}}\tilde{r}_{i}(t_{i},\tilde{y}(t_{i};s_{y}^{j},s_{z}^{j},\hat{q}^{j}),\tilde{z}(t_{i};s_{y}^{j},s_{z}^{j},\hat{q}^{j})),&{}\end{array}$$
(14f)

where (14c) and (14d) are the continuity and consistency conditions that guarantee that the original DAE (1) is solved at the solution of the optimization problem.

In Newton-type methods, the Jacobian of the constraints and the Hessian of the Lagrangian are of special importance. The continuity and consistency constraints with index j depend nonlinearly only on the variables \(s_{x}^{j}\) and \(\hat{q}^{j}\). This leads to a constraint Jacobian with banded structure and a Hessian of the Lagrangian with block-diagonal structure according to the shooting discretization. These structures can be seen in the KKT matrix as depicted in Fig. 2.

Fig. 2
figure 2

Sparsity pattern of the KKT matrix of a multiple shooting discretized optimal control problem. The constraint Jacobian comprises continuity constraints that are responsible for the banded structure. Linearly coupled constraints give rise to a dense block. The Hessian of the Lagrangian (upper left) has block diagonal structure. Different block sizes may occur if the numbers of control variables on two intervals differ

Depending on the shooting discretization, problems of type (14) can be very large, but sparse. Algorithmic techniques such as condensing (see [7]) exploit this sparsity and considerably reduce the additional effort caused by the larger matrices when a fine multiple shooting discretization is used.

4 Optimum Experimental Design as Separable NLP

We now want to apply the multiple shooting discretization as described in the previous section to the optimum experimental design problem (8). In particular, we need to extend the problem formulation (14) to cope with the special kind of coupled objective that is characteristic of OED. Multiple shooting was first applied to OED problems in [16] and has been further investigated in [14].

4.1 Measurements

The grid of possible measurements depends on the process and should be independent of the shooting and control grid. In particular, more than one measurement could be taken at the same time, see [15].

In the original formulation, integrality of the measurement weights is required. In our formulation we employ a continuous relaxation:

$$\displaystyle\begin{array}{rcl} 0 \leq w_{i} \leq 1,\quad i = 1,\ldots,N_{m}& & {}\\ \end{array}$$

In practice, this often yields satisfactory results: a bang-bang structure is observed for the measurements, so integrality is satisfied automatically. In fact, there is also some theoretical evidence for this, see [20].

The measurement weights, along with the controls, are experimental design variables. All simple bounds and linear constraints on the measurement weights fit into the framework of general path constraints and linearly coupled interior point constraints (9d) and (9e).

4.2 Dynamical System

We combine nominal and variational states into one system:

$$\displaystyle\begin{array}{rcl} \tilde{y}(t) = \left (\begin{array}{c} y(t)\\ \downharpoonright \! y_{ p}(t)\! \downharpoonleft \end{array} \right )\quad \in \mathbb{R}^{n_{y}+n_{y}\cdot n_{p} }& & {}\\ \end{array}$$

for the differential states and

$$\displaystyle\begin{array}{rcl} \tilde{z}(t) = \left (\begin{array}{c} z(t)\\ \downharpoonright \! z_{ p}(t)\! \downharpoonleft \end{array} \right )\quad \in \mathbb{R}^{n_{z}+n_{z}\cdot n_{p} }& & {}\\ \end{array}$$

for the algebraic states, where we denote by \(\downharpoonright \!\cdot \!\downharpoonleft \) the map that combines the columns of an m × n matrix into a single m ⋅ n column vector by stacking them one below the other.

That leaves us with the new DAE system

$$\displaystyle\begin{array}{rcl} \dot{\tilde{y}}(t)& =& \tilde{f}(t,\tilde{y},\tilde{z},p,u) = \left (\begin{array}{l} f(t,y(t),z(t),p,u(t)) \\ \downharpoonright \! \frac{\partial f} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial f} {\partial p}(t,y(t),z(t),p,u(t))\! \downharpoonleft \end{array} \right ){}\end{array}$$
(15)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{g}(t,\tilde{y},\tilde{z},p,u) = \left (\begin{array}{l} g(t,y(t),z(t),p,u(t)) \\ \downharpoonright \! \frac{\partial g} {\partial x}(t,y(t),z(t),p,u(t))x_{p}(t) + \frac{\partial g} {\partial p}(t,y(t),z(t),p,u(t))\! \downharpoonleft \end{array} \right ).{}\end{array}$$
(16)

This system has of course a special structure that can and should be exploited in an efficient implementation. We will give details on this in Sect. 5.
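
For the ODE part, the combined right-hand side (15) can be assembled generically from user-supplied Jacobians, as in the following sketch; the stacking operator \(\downharpoonright \!\cdot \!\downharpoonleft \) corresponds to column-major (Fortran-order) reshapes, and the toy model is our own.

```python
# Augmented RHS stacking f with vec(df/dx * x_p + df/dp), cf. (15).
import numpy as np

def make_augmented_rhs(f, dfdx, dfdp, n_y, n_p):
    def f_tilde(t, v, p, u):
        y = v[:n_y]
        y_p = v[n_y:].reshape(n_y, n_p, order="F")  # unstack columns
        dy = f(t, y, p, u)
        dy_p = dfdx(t, y, p, u) @ y_p + dfdp(t, y, p, u)
        return np.concatenate([dy, dy_p.reshape(-1, order="F")])
    return f_tilde

# Toy model: dy_i/dt = p_i * y_i with n_y = n_p = 2.
f = lambda t, y, p, u: p * y
dfdx = lambda t, y, p, u: np.diag(p)
dfdp = lambda t, y, p, u: np.diag(y)
f_tilde = make_augmented_rhs(f, dfdx, dfdp, n_y=2, n_p=2)
v0 = np.concatenate([np.array([1.0, 2.0]), np.zeros(4)])  # y_p(0) = 0
print(f_tilde(0.0, v0, np.array([0.5, -1.0]), None))
```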

4.3 Objective Function

An important difference between the optimum experimental design problem (8) and the standard optimal control problem (9) is the nonlinear coupling in time in the objective, caused by the matrix inversion in the computation of the covariance matrix, as noted in [16]. In particular, this violates the partial separability of the Lagrange function that is responsible for its sparse, block-diagonal Hessian. We present two approaches to resolve this nonlinear coupling.

4.3.1 Linearly Coupled Constraint for Information Matrix

Recall the definition of the covariance matrix:

$$\displaystyle\begin{array}{rcl} C = \left (\begin{array}{*{10}c} I &0 \end{array} \right )\left (\begin{array}{*{10}c} \mathcal{J}_{1}^{T}\mathcal{J}_{ 1} & \mathcal{J}_{2}^{T} \\ \mathcal{J}_{2} & 0 \end{array} \right )^{-1}\left (\begin{array}{*{10}c} I \\ 0 \end{array} \right )& & {}\\ \end{array}$$

with the rows of \(\mathcal{J}_{1}\) and the summands that constitute \(\mathcal{J}_{2}\) as defined by (4) and (5):

$$\displaystyle\begin{array}{rcl} \mathcal{J}_{1,i}& =& \frac{\sqrt{w_{i}}} {\sigma _{i}} \left (\frac{\partial h_{i}} {\partial x} x_{p}(t_{i}) + \frac{\partial h_{i}} {\partial p} \right ),\quad i = 1,\ldots,N_{m} {}\\ \mathcal{J}_{2,i}& =& \frac{\partial r_{i}} {\partial x} (t_{i},y(t_{i}),z(t_{i}),p)x_{p}(t_{i}) + \frac{\partial r_{i}} {\partial p} (t_{i},y(t_{i}),z(t_{i}),p). {}\\ \end{array}$$

In [16] it has been pointed out that

$$\displaystyle\begin{array}{rcl} \left (\begin{array}{*{10}c} \mathcal{J}_{1}^{T}\mathcal{J}_{ 1} & \mathcal{J}_{2}^{T} \\ \mathcal{J}_{2} & 0 \end{array} \right ) = \left (\begin{array}{*{10}c} \sum _{i=1}^{N_{m}}\mathcal{J}_{1,i}^{T}\mathcal{J}_{1,i}&\sum _{i=1}^{N_{r}}\mathcal{J}_{2,i}^{T} \\ \sum _{i=1}^{N_{r}}\mathcal{J}_{2,i} & 0 \end{array} \right ).& &{}\end{array}$$
(17)

Note that, in particular, \(\mathcal{J}_{1,i}\) and \(\mathcal{J}_{2,i}\) depend on evaluations of the nominal and variational states x and \(x_{p}\) at the individual points \(t_{i}\). Thus, (17) implies that the matrices \(\mathcal{J}_{1}^{T}\mathcal{J}_{1}\) and \(\mathcal{J}_{2}\) only exhibit a linear coupling in time. In a multiple shooting context, we assign the points \(t_{i}\) to the proper shooting intervals and plug in the representations of the solutions \(x(t_{i};s_{y}^{j},s_{z}^{j},\hat{q}^{j})\) and \(x_{p}(t_{i};s_{\tilde{y}}^{j},s_{\tilde{z}}^{j},\hat{q}^{j})\), respectively. We write this as

$$\displaystyle\begin{array}{rcl} \mathcal{J}_{1}^{T}\mathcal{J}_{ 1}& =& \sum _{j=0}^{N_{s}-1}\sum _{ i:\,\tau _{j}^{s}<t_{i}\leq \tau _{j+1}^{s}}\mathcal{J}_{1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})^{T}\mathcal{J}_{ 1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i}) {}\\ \mathcal{J}_{2}& =& \sum _{j=0}^{N_{s}-1}\sum _{ i:\,\tau _{j}^{s}<t_{i}\leq \tau _{j+1}^{s}}\mathcal{J}_{2,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j}). {}\\ \end{array}$$
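
A small sketch of this interval-wise accumulation of \(\mathcal{J}_{1}^{T}\mathcal{J}_{1}\), with synthetic stand-ins for the unweighted rows, the measurement times, and the shooting grid:

```python
# Blockwise assembly of the information matrix H = J1^T J1, where each
# summand only involves the shooting variables of its interval.
import numpy as np

def information_matrix(J1_rows, w, sigma, t_meas, tau_s):
    n_p = J1_rows.shape[1]
    H = np.zeros((n_p, n_p))
    for j in range(len(tau_s) - 1):
        # measurements assigned to the interval (tau_j, tau_{j+1}]
        idx = np.where((t_meas > tau_s[j]) & (t_meas <= tau_s[j + 1]))[0]
        for i in idx:
            row = np.sqrt(w[i]) / sigma[i] * J1_rows[i]
            H += np.outer(row, row)
    return H

rng = np.random.default_rng(1)
J1_rows = rng.standard_normal((6, 2))  # stand-ins for dh/dx * x_p + dh/dp
w, sigma = np.ones(6), np.ones(6)
t_meas, tau_s = np.linspace(0.5, 3.5, 6), np.array([0.0, 2.0, 4.0])
print(information_matrix(J1_rows, w, sigma, t_meas, tau_s))
```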

We introduce additional variables H and J and linearly coupled constraints that fit into the framework of (14). The objective then only depends on the newly introduced variables H and J, and we obtain the following structured NLP:

$$\displaystyle\begin{array}{rcl} \min _{s_{\tilde{y}},s_{\tilde{z}},q,w,H,J}\phi \left (\left (\begin{array}{*{10}c} I &0 \end{array} \right )\left (\begin{array}{*{10}c} H &J^{T} \\ J & 0 \end{array} \right )^{-1}\left (\begin{array}{*{10}c} I \\ 0 \end{array} \right )\right )& &{}\end{array}$$
(18a)
$$\displaystyle\begin{array}{rcl} \mathrm{s.t.}\ 0& =& H -\sum _{j=0}^{N_{s}-1}\sum _{ i:\,\tau _{j}^{s}<t_{i}\leq \tau _{j+1}^{s}}\mathcal{J}_{1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})^{T}\mathcal{J}_{ 1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i}){}\end{array}$$
(18b)
$$\displaystyle\begin{array}{rcl} 0& =& J -\sum _{j=0}^{N_{s}-1}\sum _{ i:\,\tau _{j}^{s}<t_{i}\leq \tau _{j+1}^{s}}\mathcal{J}_{2,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j}){}\end{array}$$
(18c)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{y}(\tau _{0}^{s};\hat{q}^{0}) - s_{\tilde{ y}}^{0}{}\end{array}$$
(18d)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{y}(\tau _{j+1}^{s};s_{\tilde{ y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j}) - s_{\tilde{ y}}^{j+1},\quad j = 0,\ldots,N_{ s} - 1{}\end{array}$$
(18e)
$$\displaystyle\begin{array}{rcl} 0& =& \tilde{g}(\tau _{j}^{s},s_{\tilde{ y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j}),\quad j = 0,\ldots,N_{ s}\phantom{ - 1}{}\end{array}$$
(18f)
$$\displaystyle\begin{array}{rcl} 0& \leq & c(\tau _{j}^{s},s_{ y}^{j},s_{ z}^{j},\hat{q}^{j},w^{j}),\quad j = 0,\ldots,N_{ s}\phantom{ - 1}{}\end{array}$$
(18g)
$$\displaystyle\begin{array}{rcl} 0& \leq & w_{i} \leq 1,\quad i = 1,\ldots,N_{m}{}\end{array}$$
(18h)
$$\displaystyle\begin{array}{rcl} 0& =& \sum _{j=0}^{N_{s}-1}\sum _{i:\,\tau _{j}^{s}\leq t_{i}<\tau _{j+1}^{s}}r_{i}(t_{i},y(t_{i};s_{y}^{j},s_{ z}^{j},\hat{q}^{j}),z(t_{ i};s_{y}^{j},s_{ z}^{j},\hat{q}^{j})).{}\end{array}$$
(18i)

4.3.2 Pseudo States for Covariance Matrix

Another possibility to resolve the coupling in the objective is to move the computation of the covariance to the constraints by deriving a recursion formula.

Using this formula, we introduce matrix-valued variables

$$\displaystyle\begin{array}{rcl} C^{j} = \left (\begin{array}{*{10}c} C_{1}^{j}&C_{2}^{jT} \\ C_{2}^{j}& C_{3}^{j} \end{array} \right ) \in \mathbb{R}^{(n_{p}+n_{r})\times (n_{p}+n_{r})},\quad j = 0,\ldots,N_{ s}& &{}\end{array}$$
(19)

and add them as additional variables to the NLP, together with constraints for the recursion at the multiple shooting nodes. This resembles the treatment of the dynamical states in a multiple shooting method; hence we refer to \(C^{j}\) as pseudo states for the covariance matrix.

Let us first derive the formula in the unconstrained case: Let \(H^{j}\) denote the information matrix including all terms up to time \(\tau _{j}^{s}\). Then \(H^{j+1}\) is given as the sum of \(H^{j}\) and the information gain by measurements and constraint evaluations in the interval \((\tau _{j}^{s},\tau _{j+1}^{s}]\):

$$\displaystyle\begin{array}{rcl} H^{j+1} = H^{j} +\sum _{ i:\tau _{j}<t_{i}\leq \tau _{j+1}}\mathcal{J}_{1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})^{T}\mathcal{J}_{ 1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i}),& &{}\end{array}$$
(20)

where we start with some initial information given as a positive definite matrix \(H^{0}\), e.g., from previous experiments or from the literature.

At every grid point \(\tau _{j}^{s}\), the covariance matrix taking into account all measurements up to time \(\tau _{j}^{s}\) is the inverse of \(H^{j}\). From (20) we obtain the following recursion formula for the covariance matrix:

$$\displaystyle\begin{array}{rcl} C^{0}& =& (H^{0})^{-1}{}\end{array}$$
(21)
$$\displaystyle\begin{array}{rcl} C^{j+1}& =& \left ((C^{j})^{-1} +\sum _{ i:\tau _{j}<t_{i}\leq \tau _{j+1}}\mathcal{J}_{1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})^{T}\mathcal{J}_{ 1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})\right )^{-1}.{}\end{array}$$
(22)

Equation (22) can be simplified to

$$\displaystyle\begin{array}{rcl} C^{j+1}& =& C^{j}\left (I +\sum _{ i:\tau _{j}<t_{i}\leq \tau _{j+1}}\mathcal{J}_{1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})^{T}\mathcal{J}_{ 1,i}(s_{\tilde{y}}^{j},s_{\tilde{ z}}^{j},\hat{q}^{j},w_{ i})C^{j}\right )^{-1}. {}\\ \end{array}$$
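
A small numerical sketch of this recursion in the unconstrained case, cross-checked against the one-shot inverse of the accumulated information; all data here is synthetic.

```python
# Covariance recursion C^{j+1} = C^j (I + M_j C^j)^{-1} with the interval
# information M_j = sum_i J_{1,i}^T J_{1,i}; a large a priori covariance
# C^0 encodes weak initial information H^0 = (C^0)^{-1}.
import numpy as np

def covariance_recursion(C0, M_list):
    C = C0.copy()
    n = C.shape[0]
    for M in M_list:  # one information increment per shooting interval
        C = C @ np.linalg.inv(np.eye(n) + M @ C)
    return C

rng = np.random.default_rng(2)
C0 = 1e6 * np.eye(2)  # weak prior information
M_list = []
for _ in range(4):
    J = rng.standard_normal((3, 2))
    M_list.append(J.T @ J)
C_end = covariance_recursion(C0, M_list)

# cross-check against the one-shot formula C = (H^0 + sum_j M_j)^{-1}
H = np.linalg.inv(C0) + sum(M_list)
print(np.allclose(C_end, np.linalg.inv(H)))
```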

This can be easily generalized to the case of constrained parameter estimation problems and we obtain the complete NLP by replacing the coupled constraints (18b) and (18c) by the pseudo continuity constraints

$$\displaystyle\begin{array}{rcl} 0& =& \left (\left (\begin{array}{*{10}c} C_{1}^{j}&C_{2}^{jT} \\ C_{2}^{j}& C_{3}^{j} \end{array} \right )^{-1} +\sum _{ i:\tau _{j}<t_{i}\leq \tau _{j+1}}\left (\begin{array}{*{10}c} \mathcal{J}_{1i}^{T}\mathcal{J}_{1i}&\mathcal{J}_{2i}^{T} \\ \mathcal{J}_{2i} & 0 \end{array} \right )\right )^{-1} -\left (\begin{array}{*{10}c} C_{1}^{j+1} & C_{2}^{j+1T} \\ C_{2}^{j+1} & C_{3}^{j+1} \end{array} \right ), \\ & & j = 0,\ldots,N_{s} - 1. {}\end{array}$$
(23)

As initial values for the pseudo states we take some a priori uncertainty given by a positive definite matrix \(C_{1}^{0}\) and a full rank matrix \(C_{2}^{0}\) such that \(C_{2}^{0}C_{2}^{0T}\) is positive definite, to ensure invertibility of \(C^{0}\). The a priori uncertainty should be chosen several orders of magnitude larger than the expected uncertainty after the OED to make sure that the choice of \(C_{1}^{0}\) and \(C_{2}^{0}\) does not interfere significantly with the solution. The objective for this formulation simplifies to

$$\displaystyle\begin{array}{rcl} \phi \left (\left (\begin{array}{*{10}c} I &0 \end{array} \right )\left (\begin{array}{*{10}c} C_{1}^{N_{s}} & C_{2}^{N_{s}T} \\ C_{2}^{N_{s}} & C_{3}^{N_{s}} \end{array} \right )\left (\begin{array}{*{10}c} I\\ 0 \end{array} \right )\right ).& &{}\end{array}$$
(24)

While this formulation seems very much in the spirit of multiple shooting (the covariance of the system is modelled as a kind of state on each interval, coupled via continuity-type constraints, and the nonlinearity of the objective is distributed over the shooting intervals), it is computationally much less appealing. The objective (24) is simpler because the matrix inversion is now contained in the constraints. However, the pseudo continuity constraints (23) are numerically delicate: each contains two matrix inversions (in fact, this can be reduced to one by an appropriate reformulation) of potentially ill-conditioned matrices.

5 Evaluation of Problem Functions

After discretization we need to solve the large but structured NLP (18). In derivative-based methods such as sequential quadratic programming, the overall runtime often correlates with the number of variables and constraints. In shooting methods for optimal control, and especially for optimum experimental design, however, the runtime is often dominated by the evaluation of the states and their derivatives within the continuity constraints, because these comprise possibly expensive calls to external numerical integrators. Thus it is worthwhile to have a closer look at the structure of constraints and objective and to derive efficient and accurate evaluation schemes. We concentrate on formulation (18), with linearly coupled constraints for the information matrix, as it is numerically more promising.

5.1 Constraint Derivatives

The OED problem (18) has a special structure in the constraints due to the fact that the dynamic system consists of closely related states, namely nominal and corresponding variational states. Ideally, they are evaluated together using the principle of internal numerical differentiation (IND), see [1, 5].

Let us now have a closer look at the derivatives of continuity and consistency constraints (18e) and (18f) for the system that consists of nominal and variational states.

We denote by

$$\displaystyle\begin{array}{rcl} s_{\tilde{x}}^{j}& =& \left (\begin{array}{*{10}c} s_{x}^{j},&s_{x,p_{ 1}}^{j},&\ldots,&s_{ x,p_{N_{p}}}^{j} \end{array} \right )^{T}{}\\ \end{array}$$

the shooting variables for the nominal and variational states at one shooting node.

Observation 1. Variational states for different parameters are independent. This means

$$\displaystyle\begin{array}{rcl} \frac{\partial x_{p_{i}}(\tau _{j+1})} {\partial s_{x,p_{k}}^{j}} = 0,\quad i\neq k.& & {}\\ \end{array}$$

Observation 2. By differentiating (6) and (7), we see that the derivative of a variational state with respect to its initial value satisfies the following n x × n x variational DAEs:

$$\displaystyle\begin{array}{rcl} \dot{\left (\frac{\partial y_{p_{i}}(t)} {\partial s_{x,p_{i}}^{j}}\right )}& =& \frac{\partial } {\partial x_{p_{i}}}\left (\frac{\partial f} {\partial x}x_{p_{i}}(t) + \frac{\partial f} {\partial p}\right ) \cdot \frac{\partial x_{p_{i}}(t)} {\partial s_{x,p_{i}}^{j}} = \frac{\partial f} {\partial x} \cdot \frac{\partial x_{p_{i}}(t)} {\partial s_{x,p_{i}}^{j}} {}\\ 0& =& \frac{\partial } {\partial x_{p_{i}}}\left (\frac{\partial g} {\partial x}x_{p_{i}}(t) + \frac{\partial g} {\partial p}\right ) \cdot \frac{\partial x_{p_{i}}(t)} {\partial s_{x,p_{i}}^{j}} = \frac{\partial g} {\partial x} \cdot \frac{\partial x_{p_{i}}(t)} {\partial s_{x,p_{i}}^{j}}. {}\\ \end{array}$$

These are the same equations that describe the sensitivity of the nominal states x with respect to their initial values, and hence

$$\displaystyle\begin{array}{rcl} \frac{\partial x_{p_{i}}(\tau _{j+1})} {\partial s_{x,p_{i}}^{j}} = \frac{\partial x(\tau _{j+1})} {\partial s_{x}^{j}}.& & {}\\ \end{array}$$

In particular, this means that we do not need to evaluate any additional state sensitivities when we explicitly discretize the variational equations by multiple shooting. Instead, \(\frac{\partial x(\tau )} {\partial s_{x}}\) needs to be computed only once for each shooting interval and can then be used multiple times in the constraint Jacobian.

The part of the constraint Jacobian that corresponds to the derivative of the continuity constraints (18e) coupling the states starting at \(\tau _{j-1}^{s}\) and at \(\tau _{j}^{s}\) has the following structure:

$$\displaystyle{ \left (\begin{array}{ccccccccccc} \frac{\partial y(\tau _{j})} {\partial s_{y}^{j-1}} & & & & \frac{\partial y(\tau _{j})} {\partial s_{z}^{j-1}} & & & & \frac{\partial y(\tau _{j})} {\partial q^{j-1}} & 0 \\ \frac{\partial y_{p_{1}}(\tau _{j})} {\partial s_{y}^{j-1}} & \frac{\partial y(\tau _{j})} {\partial s_{y}^{j-1}} & & & \frac{\partial y_{p_{1}}(\tau _{j})} {\partial s_{z}^{j-1}} & \frac{\partial y(\tau _{j})} {\partial s_{z}^{j-1}} & & & \frac{\partial y_{p_{1}}(\tau _{j})} {\partial q^{j-1}} & 0\\ \vdots & &\ddots & & \vdots & &\ddots & & \vdots & \vdots \\ \frac{\partial y_{p_{N_{ p}}}(\tau _{j})} {\partial s_{y}^{j-1}} & & & \frac{\partial y(\tau _{j})} {\partial s_{y}^{j-1}} & \frac{\partial y_{p_{N_{ p}}}(\tau _{j})} {\partial s_{z}^{j-1}} & & & \frac{\partial y(\tau _{j})} {\partial s_{z}^{j-1}} & \frac{\partial y_{p_{N_{ p}}}(\tau _{j})} {\partial q^{j-1}} & 0 \end{array} \right ), }$$

where the variable vector is ordered as follows:

$$\displaystyle\begin{array}{rcl} \left (\begin{array}{*{10}c} s_{y}^{j-1},&s_{ y,p_{1}}^{j-1},&\ldots,&s_{ y,p_{N_{p}}}^{j-1},&s_{ z}^{j-1},&s_{ z,p_{1}}^{j-1},&\ldots,&s_{ z,p_{N_{p}}}^{j-1},&q^{j-1},&w^{j-1} \end{array} \right )^{T}.& & {}\\ \end{array}$$

A similar structure can be observed for the consistency conditions (18f).

Note that the derivatives \(\frac{\partial y} {\partial (\cdot )}\) are first-order and \(\frac{\partial y_{p}} {\partial (\cdot )}\) are second-order sensitivities of the states and must be supplied by the integrator.

5.2 Objective Derivatives

The objective (18a) also deserves special attention as it is a nontrivial—but more or less fixed for the whole problem class—function that in particular comprises the inversion of a symmetric matrix.

First of all, we note that (18a) only depends on the newly introduced variables H and J. This allows us to derive explicit formulas for the first and second derivatives of the objective. Furthermore, H and J enter the constraints only linearly through the coupled constraints (18b) and (18c), so the second derivative of the objective contains the entire curvature of the problem with respect to H and J. This can be used to cheaply compute part of the Hessian approximations in SQP methods without the need to evaluate higher-order state sensitivities.

We only discuss the case of unconstrained parameter estimation problems (i.e. no matrix J) and \(\phi (\cdot ) =\mathop{ \mathrm{tr}}\nolimits (\cdot )\) as optimization criterion.

5.2.1 Objective Gradient

We have an objective ϕ that maps a symmetric matrix to a scalar; however, we only include the lower triangular part as degrees of freedom in the nonlinear program. So whenever we need the gradient of the objective in a derivative-based method, we need the derivatives of ϕ with respect to every entry \(H_{uv}\), \(1 \leq v \leq u \leq n_{p}\).

In the unconstrained case we have \(H^{-1} = C\) and for every entry \(H_{uv}\)

$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uv}} = \frac{\partial \phi } {\partial C} \cdot \frac{\partial C} {\partial H_{uv}}.& & {}\\ \end{array}$$

Using the general formula

$$\displaystyle{ \frac{\partial (H^{-1})_{ij}} {\partial H_{kl}} = -(H^{-1})_{ ik} \cdot (H^{-1})_{ lj} }$$
(25)

we obtain for the derivative with respect to a fixed entry \(H_{uv}\):

$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uv}} =\phantom{ -}\sum _{\begin{array}{c}1\leq i,j\leq n_{p} \\ 1\leq k,l\leq n_{p}\end{array}} \frac{\partial \phi } {\partial C_{ij}} \cdot \frac{\partial C_{ij}} {\partial H_{kl}} \cdot \frac{\partial H_{kl}} {\partial H_{uv}} = -\sum _{\begin{array}{c}1\leq i,j\leq n_{p} \\ 1\leq k,l\leq n_{p}\end{array}} \frac{\partial \phi } {\partial C_{ij}} \cdot C_{ik} \cdot C_{lj} \cdot \frac{\partial H_{kl}} {\partial H_{uv}}.& & {}\\ \end{array}$$

Taking symmetry into account, we have

$$\displaystyle\begin{array}{rcl} \frac{\partial H_{kl}} {\partial H_{uv}} = \left \{\begin{array}{l} 1,\quad \mathrm{if}\ (k,l) = (u,v)\ \mathrm{or}\ (l,k) = (u,v)\\ 0, \quad \mathrm{else}, \end{array} \right.& & {}\\ \end{array}$$

and thus

$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uu}}& =& -\sum _{1\leq i,j\leq n_{p}} \frac{\partial \phi } {\partial C_{ij}} \cdot C_{iu} \cdot C_{uj}{}\end{array}$$
(26)
$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uv}}& =& -2\sum _{1\leq i,j\leq n_{p}} \frac{\partial \phi } {\partial C_{ij}} \cdot C_{iu} \cdot C_{vj},\quad u > v.{}\end{array}$$
(27)

For \(\phi =\mathop{ \mathrm{tr}}\nolimits\) this simplifies to

$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uu}}& =& -\sum _{1\leq i\leq n_{p}}C_{iu}^{2}{}\end{array}$$
(28)
$$\displaystyle\begin{array}{rcl} \frac{\partial \phi } {\partial H_{uv}}& =& -2\sum _{1\leq i\leq n_{p}}C_{iu} \cdot C_{vi},\quad u > v.{}\end{array}$$
(29)

This can be calculated immediately once the covariance matrix C is computed.
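
The formulas (28) and (29) are easy to verify numerically; the sketch below compares them against a central finite difference that perturbs \(H_{uv}\) and \(H_{vu}\) simultaneously. The test matrix and the step size are arbitrary choices.

```python
# Gradient of phi = tr(H^{-1}) w.r.t. the lower triangular entries of H,
# using (28) for diagonal and (29) for off-diagonal entries.
import numpy as np

def grad_trace_inv(H):
    C = np.linalg.inv(H)
    n = H.shape[0]
    G = np.zeros((n, n))
    for u in range(n):
        for v in range(u + 1):
            if u == v:
                G[u, u] = -np.sum(C[:, u] ** 2)               # (28)
            else:
                G[u, v] = -2.0 * np.sum(C[:, u] * C[v, :])    # (29)
    return G

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
H = A @ A.T + 3.0 * np.eye(3)  # symmetric positive definite test matrix

u, v, eps = 2, 0, 1e-6
E = np.zeros((3, 3)); E[u, v] = E[v, u] = eps  # symmetric perturbation
fd = (np.trace(np.linalg.inv(H + E)) - np.trace(np.linalg.inv(H - E))) / (2 * eps)
print(grad_trace_inv(H)[u, v], fd)
```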

5.2.2 Objective Hessian

We now use formulae (28) and (29) to derive an explicit formula for the Hessian of \(\mathop{\mathrm{tr}}\nolimits (H^{-1})\) with respect to the entries of H.

When computing \(\frac{\partial ^{2}\mathop{ \mathrm{tr}}\nolimits (H^{-1})} {\partial H_{uv}\partial H_{rs}}\) we distinguish between three cases:

  • \(H_{uv}\) and \(H_{rs}\) diagonal elements

  • \(H_{uv}\) diagonal, \(H_{rs}\) off-diagonal element

  • \(H_{uv}\) and \(H_{rs}\) off-diagonal elements.

For two diagonal elements we obtain:

$$\displaystyle\begin{array}{rcl} \frac{\partial ^{2}\phi (C)} {\partial H_{uu}\partial H_{rr}}& =& \frac{\partial } {\partial H_{uu}}\left ( \frac{\partial \phi } {\partial H_{rr}}\right ) = \frac{\partial } {\partial H_{uu}}\left (-\sum _{1\leq i\leq n_{p}}C_{ir}^{2}\right ) {}\\ & =& -2\sum _{1\leq i\leq n_{p}} \frac{\partial C_{ir}} {\partial H_{uu}} \cdot C_{ir} = 2\sum _{1\leq i\leq n_{p}}C_{iu} \cdot C_{ur} \cdot C_{ir}. {}\\ \end{array}$$

If we have one diagonal and one off-diagonal element we compute:

$$\displaystyle\begin{array}{rcl} \frac{\partial ^{2}\phi (C)} {\partial H_{uu}\partial H_{rs}}& =& \frac{\partial } {\partial H_{uu}}\left ( \frac{\partial \phi } {\partial H_{rs}}\right ) = \frac{\partial } {\partial H_{uu}}\left (-2\sum _{1\leq i\leq n_{p}}C_{ir} \cdot C_{si}\right ) {}\\ & =& -2\sum _{1\leq i\leq n_{p}}\left ( \frac{\partial C_{ir}} {\partial H_{uu}} \cdot C_{si} + C_{ir} \cdot \frac{\partial C_{si}} {\partial H_{uu}}\right ) {}\\ & =& 2\sum _{1\leq i\leq n_{p}}\left (C_{iu} \cdot C_{ur} \cdot C_{si} + C_{ir} \cdot C_{su} \cdot C_{ui}\right ). {}\\ \end{array}$$

Taking the second derivative with respect to two off-diagonal elements yields:

$$\displaystyle\begin{array}{rcl} \frac{\partial ^{2}\phi (C)} {\partial H_{uv}\partial H_{rs}}& =& \frac{\partial } {\partial H_{uv}}\left ( \frac{\partial \phi } {\partial H_{rs}}\right ) + \frac{\partial } {\partial H_{vu}}\left ( \frac{\partial \phi } {\partial H_{rs}}\right ) {}\\ & =& \frac{\partial } {\partial H_{uv}}\left (-2\sum _{1\leq i\leq n_{p}}C_{ir} \cdot C_{si}\right ) + \frac{\partial } {\partial H_{vu}}\left (-2\sum _{1\leq i\leq n_{p}}C_{ir} \cdot C_{si}\right ) {}\\ & =& -2\sum _{1\leq i\leq n_{p}}\left ( \frac{\partial C_{ir}} {\partial H_{uv}} \cdot C_{si} + C_{ir} \cdot \frac{\partial C_{si}} {\partial H_{uv}} + \frac{\partial C_{ir}} {\partial H_{vu}} \cdot C_{si} + C_{ir} \cdot \frac{\partial C_{si}} {\partial H_{vu}}\right ) {}\\ & =& 2\sum _{1\leq i\leq n_{p}}\left (C_{iu} \cdot C_{vr} \cdot C_{si} + C_{ir} \cdot C_{su} \cdot C_{vi} + C_{iv} \cdot C_{ur} \cdot C_{si} + C_{ir} \cdot C_{sv} \cdot C_{ui}\right ).{}\\ \end{array}$$

As the variables H enter the constraints only linearly, we note that the objective Hessian with respect to H is in fact the same as the Hessian of the Lagrangian with respect to H.
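
The Hessian entries can be checked in the same spirit; the sketch below verifies the case of two diagonal elements, \(\frac{\partial ^{2}\phi } {\partial H_{uu}\partial H_{rr}} = 2\sum _{i}C_{iu}C_{ur}C_{ir}\), by a second-order finite difference on an arbitrary positive definite test matrix.

```python
# Finite-difference check of the diagonal/diagonal Hessian formula for
# phi = tr(H^{-1}).
import numpy as np

def hess_diag_diag(C, u, r):
    # 2 * sum_i C_iu * C_ur * C_ir
    return 2.0 * C[u, r] * np.sum(C[:, u] * C[:, r])

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
H = A @ A.T + 3.0 * np.eye(3)
C = np.linalg.inv(H)

phi = lambda M: np.trace(np.linalg.inv(M))
u, r, eps = 0, 1, 1e-4
Eu = np.zeros((3, 3)); Eu[u, u] = eps
Er = np.zeros((3, 3)); Er[r, r] = eps
fd = (phi(H + Eu + Er) - phi(H + Eu - Er)
      - phi(H - Eu + Er) + phi(H - Eu - Er)) / (4 * eps**2)
print(hess_diag_diag(C, u, r), fd)
```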

6 Numerical Results

We test our methods on two examples. The first is a predator-prey model adapted to OED [19] that serves as a proof of concept for both formulations. The second is a more involved example from chemical engineering: the urethane reaction [15]. We report SQP iterations, which essentially amount to derivative evaluations, as well as CPU times, and compare the results to a single shooting implementation.

6.1 Implementation

We implemented direct multiple shooting for OED within our software package VPLAN [15], which allows one to formulate, simulate, and optimize DAE models. From a user-specified formulation of the nominal DAE system, parameters, controls, and process constraints, it generates structured NLPs for OED as described in Sect. 4. A special focus is on the efficient sparse evaluation of the constraints and their derivatives as outlined in Sect. 5, and on the parallelization of state integration and derivative evaluation on the multi-experiment and shooting-node level.

As integrator we use DAESOL [3], a variable-order, variable-stepsize BDF method that can efficiently compute sensitivities of first and second order. The absolute and relative tolerances for all computations were set to \(10^{-9}\).

For the solution of the structured nonlinear programs, we implemented a filter-based line search SQP method as described in [21]. The block-structured quadratic subproblems are solved by a modified version of the parametric active set solver qpOASES [11] that uses direct sparse linear algebra. The Hessian of the Lagrangian is approximated blockwise according to the number of shooting intervals. For each block, a positive definite damped BFGS update is employed, scaled by the centered Oren-Luenberger sizing factor as described in [9]. Both full space and limited memory updates are available. In a multiple shooting context, the exact objective Hessian is computed cheaply as discussed in Sect. 5.2 and used for the lowermost diagonal block instead of a BFGS approximation. The optimality and nonlinear feasibility tolerances are set to \(10^{-5}\). Details of the SQP implementation will be discussed in an upcoming publication.

All results were obtained on a workstation with two Intel Xeon hexacore CPUs (2.4 GHz) allowing 24 parallel threads in total and 32 GB RAM running Ubuntu 12.04.

6.2 Example 1: Predator-Prey Model

We consider predator-prey dynamics taken from [19] on the time horizon [0, 12] with fixed initial values and an additional fishing term \(0 \leq u(t) \leq 1\) as control:

$$\displaystyle\begin{array}{rcl} \dot{y}_{1}(t)& =& y_{1}(t) - p_{1} \cdot y_{1}(t) \cdot y_{2}(t) - 0.4 \cdot u(t) \cdot y_{1}(t),\quad y_{1}(0) = 0.5 {}\\ \dot{y}_{2}(t)& =& -y_{2}(t) + p_{2} \cdot y_{1}(t) \cdot y_{2}(t) - 0.2 \cdot u(t) \cdot y_{2}(t),\quad y_{2}(0) = 0.7. {}\\ \end{array}$$

We assume \(p_{1} = p_{2} = 1\) and that each state can be observed at most four times during the experiment, with constant variances for the measurement errors, i.e., \(\sigma _{i} = 1\). Both the control u(t) and the grid of possible measurements are discretized on 50 equidistant intervals. The objective is to minimize the average variance of \(p_{1}\) and \(p_{2}\), i.e., \(\frac{1} {2}\mathop{ \mathrm{tr}}\nolimits (C)\).

The initial guess for the controls is \(u(t) \equiv 0.3\) with all 50 measurements selected, i.e., \(w_{i}^{1} = w_{i}^{2} = 1\), \(i = 1,\ldots,50\). It yields an objective function value (average variance) of \(\frac{1} {2}\mathop{ \mathrm{tr}}\nolimits (C) = 0.00683\), but note that this design is infeasible: all 50 possible measurements are selected for both states, while only four per state are allowed.
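
For reference, the following sketch simulates the system under this constant initial guess; the OED itself, of course, additionally requires the variational states and the covariance evaluation on top of such a simulation.

```python
# Predator-prey dynamics from this example with u(t) = 0.3, p1 = p2 = 1.
import numpy as np
from scipy.integrate import solve_ivp

p1, p2 = 1.0, 1.0
u = lambda t: 0.3  # constant initial guess for the fishing control

def rhs(t, y):
    y1, y2 = y
    return [y1 - p1 * y1 * y2 - 0.4 * u(t) * y1,
            -y2 + p2 * y1 * y2 - 0.2 * u(t) * y2]

sol = solve_ivp(rhs, (0.0, 12.0), [0.5, 0.7], rtol=1e-8, atol=1e-10)
print(sol.y[:, -1])  # states at t_f = 12
```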

6.3 Example 2: Urethane Reaction

The urethane reaction is a well-known example from chemical reaction kinetics, see [15]. The reaction scheme is the following:

$$\displaystyle\begin{array}{rcl} A + B& \rightarrow & C {}\\ A + C& \rightleftharpoons & D {}\\ 3A& \rightarrow & E {}\\ \end{array}$$

The reactants are phenylisocyanate A and butanol B in the solvent dimethylsulfoxide L. During the reaction, the product urethane C, the byproduct allophanate D, and the byproduct isocyanurate E are formed.

The products C, D, and E are modelled as differential states, while A, B, and L can be computed from C, D, and E using molar number balances. In total, six parameters have to be identified, namely the frequency factors and activation energies of the Arrhenius kinetics. The objective is to minimize \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C)\). To achieve this, one experiment is to be designed for the time horizon \([t_{0},t_{end}] = [0\,\mathrm{h},80\,\mathrm{h}]\), and one out of three possible measurements can be taken at each of 11 equidistant points in time. The reaction is run in a stirred tank reactor with two feeds: feed 1 contains phenylisocyanate and the solvent, feed 2 contains butanol and the solvent. Both can be fed into the reactor during the process. Furthermore, the temperature can be controlled. For our numerical experiments, we parameterize the derivatives of the control functions, \(\dot{T}(t)\), \(\dot{\mathrm{feed}}_{1}(t)\), and \(\dot{\mathrm{feed}}_{2}(t)\), by piecewise constant functions on ten equidistant intervals. The actual process controls T(t), \(\mathrm{feed}_{1}(t)\), and \(\mathrm{feed}_{2}(t)\) are set up as additional differential states, so we end up with a total of six state variables. The constraints on the controls are formulated as path constraints. The full model including process constraints and measurement methods is summarized in Fig. 3. The variances of the measurement errors are assumed constant with \(\sigma _{i} = 1\).

Fig. 3
figure 3

Urethane reaction model

The initial guess for the controls is depicted in Fig. 4. It yields an objective function value (average variance) of \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C) = 1936.3\).

Fig. 4
figure 4

Initial values for controls and corresponding states for the urethane example. The objective (average variance) is \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C) = 1936.3\)

6.4 Results Predator-Prey

We solved the problem with both direct multiple shooting formulations introduced in Sect. 4. We keep the control discretization fixed but vary the number of shooting intervals giving rise to different, yet equivalent, NLPs. We discovered a number of different local minima that differ mainly with respect to the placing of the measurements. Table 1 shows the results for the first multiple shooting formulation—a coupled constraint for the information matrix—as well as the single shooting formulation while Table 2 shows results for the formulation where the covariance is distributed over the shooting intervals. Both algorithms terminated successfully when restarted in the minima found by the other one indicating the structural correctness of our approach.

Table 1 Performance of multiple shooting for information matrix for predator-prey
Table 2 Performance of multiple shooting for covariance matrix for predator-prey

Table 1 shows that the first formulation converges for all discretizations. However, we note that the number of SQP iterations increases when we increase the number of shooting intervals. Preliminary experiments indicate that this is probably due to the BFGS update, which is unable to reflect negative curvature of the underlying Lagrangian. A block SQP method that can handle indefinite approximations, and can thus reduce this effect, is currently under development. When we look at the CPU time, however, we see that the benefits of the efficient evaluation scheme for the multiple shooting formulation outweigh the smaller number of SQP iterations for single shooting, making direct multiple shooting with a moderate number of shooting intervals the best overall choice. The following aspects are responsible for this:

  • derivatives with respect to controls are required only locally, which means fewer second-order directional derivatives are needed;

  • derivative evaluation is easily parallelized on a multicore machine.

While the first formulation converges for all discretizations within a reasonable number of SQP iterations, the second formulation does not converge for every multiple shooting discretization. Furthermore, many of the SQP steps were reduced steps, which means that many additional constraint and objective evaluations were necessary. Overall, the behaviour was competitive with the first formulation and with single shooting only for certain problem instances.

6.5 Results Urethane

We solved the problem with the more promising variant of direct multiple shooting, i.e., we transform it into an NLP of the form (18) with a linearly coupled constraint for the information matrix. Again, we use different numbers of shooting intervals with the same control discretization. The problem exhibits several structurally different local minima; however, all of them yield significantly better objective values than the initial guess. The states and controls for one of them are depicted in Fig. 5.

Fig. 5
figure 5

Optimum experimental design for the urethane example. The objective (average variance) is \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C) = 0.0665\)

The SQP method was able to find a local minimum for every multiple shooting discretization. The results comprising the number of major iterations, the final objective value and the CPU time in seconds are summarized in Table 3.

Table 3 Performance of single and multiple shooting for urethane

We see that the method performs comparably well in terms of SQP iterations for single and multiple shooting. In terms of CPU time, the results again shift strongly in favor of multiple shooting because the derivative evaluation is much cheaper.

7 Conclusions

In this paper, we reviewed a nonstandard optimal control problem formulation of OED. For this formulation, we showed how to extend the classical direct multiple shooting method for optimal control problems to OED problems in two ways, leading to highly structured nonlinear programs. Special structures in the constraint and objective derivatives were highlighted that must be taken into account in an efficient implementation. The algorithms presented are implemented within the software package VPLAN. We presented two application examples, one of them a challenging example from chemical engineering, that could be solved successfully with one of the new formulations. Our implementation outperforms an existing single shooting implementation in terms of CPU time.

We expect direct multiple shooting for OED to have even more benefits for challenging, large-scale real-life problems. Especially when nontrivial path constraints are present, as is often the case for real-life systems, direct single shooting can run into problems finding feasible points. The direct multiple shooting method as introduced in this paper allows choosing fine shooting discretizations in critical regions and offers more flexibility for initialization. Another point is that OED, even for small nominal systems, requires the solution of an additional \(n_{x} \times n_{p}\) variational system; even though this can be done efficiently using the principles of IND, its solution becomes very time consuming for large-scale systems. Here the lower sensitivity load as well as the excellent potential for parallelization provide great benefits for multiple shooting.