Abstract
Optimum experimental design (OED) for parameter identification has become a key technique in the model validation process for dynamical systems. This paper deals with optimum experimental design for systems modelled by differential-algebraic equations. We show how to formulate OED as a nonstandard nonlinear optimal control problem. The direct multiple shooting method is a state-of-the-art method for the solution of standard optimal control problems that leads to structured nonlinear programs. We present two ways to adapt direct multiple shooting to OED by introducing additional variables and constraints. We highlight special structures in the constraint and objective derivatives, whose evaluation is usually the bottleneck when solving dynamic optimization problems by multiple shooting. We have implemented a structure-exploiting algorithm that takes all these structures into account. Two benchmark examples show the efficiency of the new algorithm.
Keywords
- Optimal Control Problem
- Multiple Shooting
- Parameter Estimation Problem
- Path Constraint
- Optimum Experimental Design
1 Introduction
Many processes in engineering, chemistry, or physics can be described by dynamical systems given by ordinary differential equations (ODEs) or differential-algebraic equations (DAEs). These equations usually depend on model parameters, for example material-specific constants that are not directly accessible by measurement. However, as the models are often highly nonlinear, simulation results can vary strongly depending on the values of the model parameters. Thus it is desirable to estimate the model parameters with high precision. This process is called model validation. Only a validated model can be used to make meaningful predictions.
The first step in the model validation process is usually a parameter estimation. That means the model is fitted to given measurement data, yielding a first estimate for the parameters. Then one can perform a sensitivity analysis to obtain estimates for the covariance matrix of the parameters. The covariance matrix can reveal large uncertainties in the parameter values or correlations between different parameters. In this context, it is also possible to quantify the uncertainty of arbitrary model quantities of interest.
An important observation is that the covariance matrix depends not only on the parameters but also on the experimental conditions. This leads to the task of optimum experimental design (OED): Choose experimental conditions such that a subsequent parameter estimation yields parameters with minimum uncertainty. The uncertainty of the vector of parameters is characterized by a functional on the predicted covariance matrix. OED for general statistical models has been studied for several decades and it is a well-established field of research, see the textbooks [2, 10, 18]. Nonlinear OED for processes modeled by differential equations has been investigated by several authors, see, e.g., [4, 12, 15] for an overview.
In mathematical terms, optimum experimental design can be cast as a special (nonstandard) type of optimal control (OC) problem. As the objective, namely the functional on the covariance matrix, depends on first-order sensitivities of the states, variational differential equations or sensitivity equations must be explicitly included in the problem formulation, leading to large, but specially structured differential equation systems. We are interested in direct methods for OED problems, in particular direct multiple shooting, that transform the infinite dimensional optimal control problem into a finite dimensional nonlinear programming problem (NLP). The direct multiple shooting method for optimal control problems as described by Bock and Plitt [7] makes use of the partial separability of the objective function. This leads to a block-diagonal Hessian of the Lagrangian that can and should be exploited by Newton-type methods. However, a straightforward formulation of the OED objective function lacks the feature of partial separability, so special care must be taken to reformulate the OED problem as a standard optimal control problem. In [14, 16] direct multiple shooting has been applied to OED. In [13], a collocation discretization is applied to a similar problem. The main contribution of this paper consists in detailed descriptions of the structured NLPs that result from a multiple shooting discretization as well as numerical results that demonstrate their benefits and limitations.
The paper is organized as follows: In Sect. 2, we give an introduction to optimum experimental design and formulate it as a nonstandard optimal control problem. In Sect. 3, the direct multiple shooting method is described for standard optimal control problems. Afterwards, in Sect. 4 we propose two ways to transform the OED problem into a specially structured standard OC problem. In Sect. 5 we show how to efficiently evaluate the constraint Jacobian as well as the gradient and Hessian of the objective. A numerical example from chemical engineering illustrates the efficacy of the approach in Sect. 6. Section 7 concludes.
2 Nonlinear Optimum Experimental Design for Parameter Estimation
Optimum experimental design aims at improving parameter estimation results in a statistical sense. We first introduce the type of parameter estimation problems whose results we seek to improve with OED along with some general notation that we use throughout the paper.
2.1 Parameter Estimation for Dynamical Systems
We consider a dynamical process on a fixed time horizon \([t_0, t_f]\) that is described by the following differential-algebraic equation (DAE) system with interior point and boundary conditions:
where
Throughout the text we will use the notation \(x(t) = (y(t)^T, z(t)^T)^T\) to denote both differential and algebraic states.
Assume measurement data \(\eta _{1},\ldots,\eta _{N_{m}}\) are available at sampling times \(t_{1},\ldots,t_{N_{m}}\) such that
where \(p^{\star }\) are the true—but inaccessible—parameters, y and z the corresponding states, and \(\varepsilon _{i}\) are independently normally distributed with zero mean and standard deviation \(\sigma _{i}\). This assumption states that the model is structurally correct and errors arise only due to inaccuracies in the measurement process. We call
the model response or observable.
The maximum likelihood parameter estimation problem can be stated as:
Parameter estimation problems constrained by differential equations can be solved by different approaches, e.g. by direct multiple shooting in combination with a generalized Gauss-Newton method, see [6].
2.2 Sensitivity Analysis
The solution \(\hat{p}\) of the parameter estimation problem (2) is a random variable due to the fact that the measurements \(\eta_i\) are random. The variance-covariance matrix C of \(\hat{p}\) is given by:
where \(\mathcal{J}_{1} \in \mathbb{R}^{N_{m}\times n_{p}}\) and \(\mathcal{J}_{2} \in \mathbb{R}^{n_{r}\times n_{p}}\) are the Jacobians of the residual vectors of the parameter estimation problem. We denote by \(\mathcal{J}_{1,i}\) the rows of \(\mathcal{J}_{1}\) and by \(\mathcal{J}_{2,i}\) the summands that make up \(\mathcal{J}_{2}\):
We assume that \(\mathcal{J}_{2}\) has full rank and that \(\mathcal{J}_{1}^{T}\mathcal{J}_{1}\) is positive definite on \(\text{Ker}\mathcal{J}_{2}\) which implies existence of C.
In (4) we have also introduced measurement weights \(w_{i} \in \{ 0,1\},\ i = 1,\ldots,N_{m}\) for each measurement time. They are fixed in the parameter estimation context but will be design variables in the experimental design where they allow us to select or de-select measurements.
The sensitivities of the states x with respect to the parameters p are subject to the following variational differential-algebraic equations (VDAE), also called sensitivity equations:
where
Initial values for the VDAE are given by
for the variational differential states and by (7) for the variational algebraic states. Note that (6) and (7) depend on y(t) and z(t) and therefore have to be solved together with (1a) and (1b).
2.3 The Optimum Experimental Design Problem
Based on the sensitivity analysis, we can predict the variance-covariance matrix for different experimental settings that are characterized by controls u(t) as well as a choice of measurements. An experiment may also be constrained by external process constraints, e.g. safety or cost constraints.
The task of optimum experimental design is to choose experimental settings such that the predicted covariance matrix has the best properties in some statistical sense. The quality of the matrix is measured by a criterion ϕ from statistical experimental design:
- A-criterion: \(\phi =\mathop{ \mathrm{tr}}\nolimits C\)
- D-criterion: \(\phi =\det C\)
- E-criterion: \(\phi =\max \{\lambda _{i}: i = 1,\ldots,n_{p},\ \lambda _{i}\text{ eigenvalue of }C\} = \|C\|_{2}\)
- M-criterion: \(\phi =\max \{ C_{ii}: i = 1,\ldots,n_{p}\}\)
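Given a computed covariance matrix, the four criteria are straightforward to evaluate. A minimal NumPy sketch (the function name and the example matrix are illustrative, not from the paper):

```python
import numpy as np

def oed_criteria(C):
    """Evaluate the four classical design criteria for a covariance matrix C.

    A: trace, D: determinant, E: largest eigenvalue (spectral norm for SPD C),
    M: largest diagonal entry (worst single-parameter variance).
    """
    C = np.asarray(C, dtype=float)
    return {
        "A": np.trace(C),
        "D": np.linalg.det(C),
        "E": np.max(np.linalg.eigvalsh(C)),  # eigvalsh: C is symmetric
        "M": np.max(np.diag(C)),
    }

# Example: a 2x2 covariance with correlated parameters
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
crit = oed_criteria(C)
```

All four criteria are scalar surrogates for the "size" of C; which one is appropriate depends on whether average, joint, or worst-case parameter uncertainty matters for the application.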
The complete optimum experimental design problem is
with the nominal DAE system (8b) and (8c) with initial values (8d), the variational DAE system (8e) and (8f) with initial values (8g), multipoint boundary constraints from the parameter estimation problem (8h), path and control constraints (8i), and integrality constraints for the measurement weights (8j). The Jacobians of the parameter estimation residuals (8k) and (8l) are given to define the covariance matrix on which a functional ϕ is minimized (8a). Note that while initial values for the nominal differential states (8b) may be degrees of freedom in the optimization, initial values for the variational differential states (8e) are explicitly defined by the relation (8g), and for the algebraic states \(z(t)\) and \(z_p(t)\) they are implicitly defined by the algebraic conditions (8c) and (8f).
3 The Direct Multiple Shooting Method for Optimal Control Problems
The direct multiple shooting method for optimal control problems has been first introduced in [7]. Let us first consider the standard optimal control problem
In direct methods the infinite-dimensional optimal control problem (9) is approximated by a nonlinear programming problem (NLP) which is then solved by suitable numerical methods. The following infinite-dimensional objects of the optimal control problem must be treated adequately when setting up the finite-dimensional NLP:
- control functions \(\tilde{u}\)
- differential and algebraic states \(\tilde{y}\) and \(\tilde{z}\)
- path constraints \(0 \leq \tilde{ c}(t,\tilde{y}(t),\tilde{z}(t),\tilde{u}(t))\)
3.1 Control Functions
We consider a time grid
on which the control function \(\tilde{u}(\cdot )\) is parameterized by means of local basis functions:
where the \(q^{j} \in \mathbb{R}^{n_{u}}\) are vectors of finitely many real optimization variables. We define \(q:= (q^{0},\ldots,q^{N_{c}-1})^{T}\). The local functions \(\varphi ^{j}\) are typically polynomials of low degree, e.g. linear or constant functions.
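A piecewise constant parameterization, the simplest choice of local basis functions, can be sketched as follows (function name and grids are illustrative):

```python
import numpy as np

def piecewise_constant_control(q, t_grid):
    """Represent u(t) by constant basis functions on [t_j, t_{j+1}).

    q      : control values q^0, ..., q^{Nc-1}, one per interval
    t_grid : control grid t_0 < t_1 < ... < t_Nc
    """
    q = np.asarray(q, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)

    def u(t):
        # locate interval index j with t_grid[j] <= t < t_grid[j+1]
        j = np.searchsorted(t_grid, t, side="right") - 1
        return q[np.clip(j, 0, len(q) - 1)]

    return u

# Three intervals on [0, 3] with values 0.3, 0.7, 0.1
u = piecewise_constant_control([0.3, 0.7, 0.1], [0.0, 1.0, 2.0, 3.0])
```

With this choice the optimization variables are just the interval values \(q^j\); a piecewise linear basis would add continuity couplings between neighbouring intervals.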
3.2 States
In shooting methods, initial value problem solvers are employed to obtain representations of the states x for given q and \(y_0\). In the case of direct single shooting and pure ODEs, the states are regarded as dependent variables, and only q and \(y_0\) are kept as variables in the optimization problem. Thus the tasks of simulation and optimization are kept separate.
The direct multiple shooting method for DAEs is a simultaneous strategy to resolve simulation and optimization in parallel. Again, we consider a discretization of the time horizon
where we assume without loss of generality that the grid points are a subset of the grid points of the control grid (10). On this shooting grid we consider the following set of initial value problems with initial values \(s_x^j = (s_y^j, s_z^j)\) that become variables in the optimization problem:
where \(\theta _{j}(\cdot )\) is a rapidly decreasing damping function with \(\theta_j(\tau _{j}^{s}) = 1\). This relaxed formulation was proposed in [8] and means that the algebraic condition (12b) is automatically consistent for any initial values \(s_z^j\). Thus the DAE solver does not need to solve the (nonlinear) algebraic condition in every iteration of the optimization algorithm to find feasible initial values. Instead, the nonlinear algebraic consistency conditions
are added to the optimization problem which ensures the solution of the original DAE at the solution of the optimization problem.
Note that the DAEs (12) are solved independently on the smaller time intervals \([\tau_j^s, \tau_{j+1}^s]\) as the initial values \(s_x^j\) are variables of the optimization problem. To ensure equivalence to the original system (1), continuity conditions are added to the optimization problem for every shooting interval. Let us denote by \(\tilde{y}(\tau _{j+1}^{s};s_{y}^{j},s_{z}^{j},\hat{q}^{j})\) a representation of the solution to problem (12) on the interval \([\tau_j^s, \tau_{j+1}^s]\), where \(\hat{q}^{j}\) denotes the subvector of q that represents \(\tilde{u}\) on \([\tau_j^s, \tau_{j+1}^s]\). Then the continuity conditions read as:
Figure 1 illustrates the concept of direct multiple shooting. Note that we explicitly maintain separate grids for controls and states. A special case is of course to choose the same grid for both. However, in our experience, the decoupling of grids provides greater flexibility and a smaller number of shooting intervals can greatly accelerate convergence for problems where a relatively fine discretization of the controls is desirable.
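The continuity conditions can be illustrated for a plain ODE: each residual compares the end state of an interval-local initial value problem with the next shooting variable, and all residuals vanish exactly along a true trajectory. A sketch using SciPy's generic integrator in place of a tailored DAE solver:

```python
import numpy as np
from scipy.integrate import solve_ivp

def continuity_residuals(f, s, tau, q):
    """Multiple shooting continuity residuals for an ODE y' = f(t, y, q).

    s   : shooting variables s^0, ..., s^N, one per grid node
    tau : shooting grid tau_0 < ... < tau_N
    q   : per-interval control values (piecewise constant, a simplifying assumption)

    Returns the stacked residuals y(tau_{j+1}; s^j, q^j) - s^{j+1}.
    """
    res = []
    for j in range(len(tau) - 1):
        sol = solve_ivp(lambda t, y: f(t, y, q[j]), (tau[j], tau[j + 1]),
                        np.atleast_1d(s[j]), rtol=1e-10, atol=1e-12)
        res.append(sol.y[:, -1] - np.atleast_1d(s[j + 1]))
    return np.concatenate(res)

# Example: y' = -y with y(0) = 1, exact solution y(t) = exp(-t).
tau = np.array([0.0, 1.0, 2.0])
s_true = np.exp(-tau)                 # shooting values taken from the exact solution
r = continuity_residuals(lambda t, y, q: -y, s_true, tau, q=[0.0, 0.0])
```

In the NLP these residuals become equality constraints; an SQP or Newton-type method drives them to zero simultaneously with optimality.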
3.3 Path Constraints
All path constraints such as (9d) that are required to hold at infinitely many points are evaluated on finitely many checkpoints only. Let us assume—without loss of generality—that the checkpoints are the grid points of the multiple shooting grid (11). Then the discretized path constraints read as
Depending on the choice of the time grid, the constraints might be violated between grid points. There exist strategies for adaptively adding checkpoints, see, e.g., [17], but to keep the notation simple we assume for the scope of this paper that they match the grid points of the shooting grid.
3.4 Structured NLP
We have now addressed all constraints of the optimal control problem and can formulate the structured multiple shooting NLP as follows:
where (14c) and (14d) are the continuity and consistency conditions that guarantee the solution of the original DAE (1) at the solution of the optimization problem.
In Newton-type methods, the Jacobian of the constraints and the Hessian of the Lagrangian are of special importance. Clearly, the continuity and consistency constraints with index j depend nonlinearly only on the variables \(s_x^j\) and \(\hat{q}^{j}\). This leads to a constraint Jacobian with a banded structure and a Hessian of the Lagrangian with a block diagonal structure according to the shooting discretization. These structures can be seen in the KKT matrix as depicted in Fig. 2.
Depending on the shooting discretization, problems of type (14) can be very large, but sparse. Algorithmic techniques such as condensing (see [7]) exploit this sparsity and considerably reduce the additional effort caused by the larger matrices that arise from a fine multiple shooting discretization.
4 Optimum Experimental Design as Separable NLP
We now want to apply the multiple shooting discretization described in the previous section to the optimum experimental design problem (8). In particular, we need to extend the problem formulation (14) to cope with the special kind of coupled objective that is characteristic of OED. Multiple shooting for OED problems was first applied in [16] and further investigated in [14].
4.1 Measurements
The grid of possible measurements depends on the process and should be independent of the shooting and control grid. In particular, more than one measurement could be taken at the same time, see [15].
In the original formulation, integrality of the measurement weights is required. In our formulation we employ a continuous relaxation:
In practice, this often yields satisfactory results, as a bang-bang structure is observed for the measurements and so integrality is satisfied automatically. In fact there is also some theoretical evidence for this, see [20].
The measurement weights, along with the controls, are experimental design variables. All simple bounds and linear constraints on the measurement weights fit into the framework of general path constraints and linearly coupled interior point constraints (9d) and (9e).
4.2 Dynamical System
We combine nominal and variational states into one system:
for the differential states and
for the algebraic states, where we denote by \(\downharpoonright \!\cdot \!\downharpoonleft \) the map that combines the columns of an m × n matrix into a single m ⋅ n column vector by stacking them one below the other.
That leaves us with the new DAE system
This system has of course a special structure that can and should be exploited in an efficient implementation. We will give details on this in Sect. 5.
4.3 Objective Function
An important difference between the problem of optimum experimental design (8) and the standard optimal control problem (9) is the nonlinear coupling in time in the objective that is due to the inversion when computing the covariance matrix, as it has been noted in [16]. In particular this violates the property of partial separation of the Lagrange function that is responsible for its sparse, block-diagonal Hessian. We present two approaches to resolve that nonlinear coupling.
4.3.1 Linearly Coupled Constraint for Information Matrix
Recall the definition of the covariance matrix:
with the rows of \(\mathcal{J}_{1}\) and the summands that constitute \(\mathcal{J}_{2}\) as defined by (4) and (5):
In [16] it has been pointed out that
Note that in particular, \(\mathcal{J}_{1,i}\) and \(\mathcal{J}_{2,i}\) depend on evaluations of the nominal and variational states x and \(x_p\) at individual points \(t_i\). Thus, (17) implies that the matrices \(\mathcal{J}_{1}^{T}\mathcal{J}_{1}\) and \(\mathcal{J}_{2}\) only exhibit a linear coupling in time. In a multiple shooting context, we assign the points \(t_i\) to the proper shooting intervals and plug in the representations of the solution \(x(\tau _{j+1}^{s};s_{y}^{j},s_{z}^{j},\hat{q}^{j})\) and \(x_{p}(\tau _{j+1}^{s};s_{\tilde{y}}^{j},s_{\tilde{z}}^{j},\hat{q}^{j})\), respectively. We write this as
We introduce additional variables H and J and linearly coupled constraints that fit into the framework of (14). The objective then only depends on the newly introduced variables H and J and we obtain the following structured NLP:
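The key point — in the unconstrained case the information matrix \(\mathcal{J}_1^T\mathcal{J}_1\) is a plain sum of rank-1 terms, one per measurement — can be seen in a short sketch with hypothetical data; splitting the sum by shooting interval and adding the partial results changes nothing:

```python
import numpy as np

def information_matrix(J1_rows, w, sigma):
    """Accumulate the information matrix H = sum_i w_i sigma_i^-2 g_i g_i^T.

    J1_rows : measurement sensitivity rows g_i = d h_i / d p (hypothetical values)
    w       : measurement weights in [0, 1]
    sigma   : measurement standard deviations
    """
    n_p = len(J1_rows[0])
    H = np.zeros((n_p, n_p))
    for g, wi, si in zip(J1_rows, w, sigma):
        g = np.asarray(g, dtype=float)
        H += wi / si**2 * np.outer(g, g)     # one rank-1 contribution per measurement
    return H

# Four measurements of a 2-parameter model, split into two "shooting intervals"
rows = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [1.0, 2.0]]
w = [1, 1, 1, 1]
sigma = [1.0, 1.0, 1.0, 1.0]
H_global = information_matrix(rows, w, sigma)
H_split = (information_matrix(rows[:2], w[:2], sigma[:2])
           + information_matrix(rows[2:], w[2:], sigma[2:]))
```

This additivity is exactly what makes the coupled constraints for H linear in the interval-local contributions.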
4.3.2 Pseudo States for Covariance Matrix
Another possibility to resolve the coupling in the objective is to move the computation of the covariance to the constraints by deriving a recursion formula.
Using this formula, we introduce matrix-valued variables
and constraints for the recursion at the multiple shooting nodes and add them as additional variables to the NLP. This resembles the treatment of dynamical states in a multiple shooting method, hence we refer to \(C^j\) as pseudo states for the covariance matrix.
Let us first derive the formula in the unconstrained case: let \(H^j\) denote the information matrix including all terms up to time \(\tau_j\). Then \(H^j\) is given as the sum of \(H^{j-1}\) and the information gained from measurements and constraint evaluations in the interval \((\tau_{j-1}, \tau_j]\):
where we start with some initial information given as a positive definite matrix \(H^0\), e.g., from previous experiments or from the literature.
At every grid point \(\tau_j\), the covariance matrix taking into account all measurements up to time \(\tau_j\) is the inverse of \(H^j\). From (20) we obtain the recursion formula for the covariance matrix:
Equation (22) can be simplified to
This can be easily generalized to the case of constrained parameter estimation problems and we obtain the complete NLP by replacing the coupled constraints (18b) and (18c) by the pseudo continuity constraints
As initial values for the pseudo states we take some a priori uncertainty given by a positive definite matrix \(C_1^0\) and a full-rank matrix \(C_2^0\) such that \(C_2^0 C_2^{0,T}\) is positive definite to ensure invertibility of \(C^0\). The a priori uncertainty should be chosen several orders of magnitude larger than the uncertainty expected after the OED, so that the choice of \(C_1^0\) and \(C_2^0\) does not significantly affect the solution. The objective for this formulation simplifies to
While this formulation seems very much in the spirit of multiple shooting (the covariance of the system is modelled as a kind of state on each interval, coupled via continuity-type constraints, and the nonlinearity of the objective is distributed over the shooting intervals), it is computationally much less appealing. The objective (24) is simpler because the matrix inversion is now contained in the constraints; however, the pseudo continuity constraints (23) are numerically delicate, each containing two matrix inversions (in fact, an appropriate reformulation reduces this to one) of potentially ill-conditioned matrices.
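The recursion and its numerical character can be sketched for the unconstrained case. The naive update performs two inversions per step; a rank-1 Sherman-Morrison update is one possible inversion-free reformulation (the paper does not state which reformulation it uses):

```python
import numpy as np

def covariance_step(C_prev, G, w, sigma):
    """Naive pseudo-state update C_j = (C_{j-1}^{-1} + dH_j)^{-1} (two inversions)."""
    dH = sum(wi / si**2 * np.outer(g, g) for g, wi, si in zip(G, w, sigma))
    return np.linalg.inv(np.linalg.inv(C_prev) + dH)

def covariance_step_sm(C_prev, G, w, sigma):
    """Equivalent update via rank-1 Sherman-Morrison steps, avoiding explicit
    inversion of the (possibly ill-conditioned) matrices."""
    C = C_prev.copy()
    for g, wi, si in zip(G, w, sigma):
        if wi == 0:
            continue                          # de-selected measurement adds nothing
        Cg = C @ g
        C = C - np.outer(Cg, Cg) / (si**2 / wi + g @ Cg)
    return C

C0 = np.eye(2) * 1e4                          # large a priori uncertainty
G = [np.array([1.0, 0.5]), np.array([0.2, 1.0])]
C1 = covariance_step(C0, G, w=[1, 1], sigma=[1.0, 1.0])
C1_sm = covariance_step_sm(C0, G, w=[1, 1], sigma=[1.0, 1.0])
```

With the large a priori uncertainty \(C^0\), the first inversion in the naive variant involves a nearly singular matrix, which illustrates why this formulation is numerically delicate.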
5 Evaluation of Problem Functions
After discretization we need to solve the large but structured NLP (18). In derivative-based methods such as sequential quadratic programming, the overall runtime often corresponds to the number of variables and constraints. In shooting methods for optimal control, and especially for optimum experimental design, however, the runtime is often dominated by the evaluation of the states and their derivatives within the continuity constraints, because these comprise possibly expensive calls to external numerical integrators. Thus it is worthwhile to take a closer look at the structures of the constraints and objective and to derive efficient and accurate evaluation schemes. We concentrate on formulation (18) of the problem (linearly coupled constraints for the information matrix), as it is numerically more promising.
5.1 Constraint Derivatives
The OED problem (18) has a special structure in the constraints due to the fact that the dynamic system consists of closely related states, namely nominal and corresponding variational states. Ideally, they are evaluated together using the principle of internal numerical differentiation (IND), see [1, 5].
Let us now have a closer look at the derivatives of continuity and consistency constraints (18e) and (18f) for the system that consists of nominal and variational states.
We denote by
the shooting variables for the nominal and variational states at one shooting node.
Observation 1. Variational states for different parameters are independent. This means
Observation 2. By differentiating (6) and (7), we see that the derivative of a variational state with respect to its initial value satisfies the following n x × n x variational DAEs:
These are the same equations that describe the sensitivity of the nominal states x with respect to their initial values, and hence
In particular, this means that we do not need to evaluate any additional state sensitivities when we explicitly discretize the variational equations by multiple shooting. Instead, \(\frac{\partial x(\tau )} {\partial s_{x}}\) needs to be computed only once for each shooting interval and can then be reused multiple times in the constraint Jacobian.
The part of the constraint Jacobian that corresponds to the derivative of the continuity constraints (18e) coupling the states starting at \(\tau_{j-1}\) and at \(\tau_j\) has the following structure:
where the variable vector is ordered as follows:
A similar structure can be observed for the consistency conditions (18f).
Note that the derivatives \(\frac{\partial y} {\partial (\cdot )}\) are first-order and \(\frac{\partial y_{p}} {\partial (\cdot )}\) are second-order sensitivities of the states and must be supplied by the integrator.
5.2 Objective Derivatives
The objective (18a) also deserves special attention as it is a nontrivial—but more or less fixed for the whole problem class—function that in particular comprises the inversion of a symmetric matrix.
First of all we note that (18a) only depends on the newly introduced variables H and J. This allows us to derive explicit formulas for the first and second derivatives of the objective. Furthermore, H and J enter the constraints only linearly through the coupled constraints (18b) and (18c), so the second derivative of the objective contains the entire curvature of the problem with respect to H and J. This can be used to cheaply compute part of the Hessian approximation in SQP methods without the need to evaluate higher-order state sensitivities.
We only discuss the case of unconstrained parameter estimation problems (i.e. no matrix J) and \(\phi (\cdot ) =\mathop{ \mathrm{tr}}\nolimits (\cdot )\) as optimization criterion.
5.2.1 Objective Gradient
We have an objective ϕ that maps a symmetric matrix to a scalar; however, we only include the lower triangular part as degrees of freedom in the nonlinear program. So whenever we need the gradient of the objective in a derivative-based method, we need derivatives of ϕ with respect to every entry \(H_{uv}\), \(1 \leq v \leq u \leq n_{p}\).
In the unconstrained case we have \(H^{-1} = C\) and for every entry \(H_{uv}\)
Using the general formula
we obtain for the derivative with respect to a fixed entry \(H_{uv}\):
Taking symmetry into account, we have
and thus
For \(\phi =\mathop{ \mathrm{tr}}\nolimits\) this simplifies to
This can be calculated immediately once the covariance matrix C is computed.
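The gradient of the trace criterion, including the factor 2 that off-diagonal entries pick up from symmetry, can be checked against finite differences. A sketch with a hypothetical 2×2 information matrix:

```python
import numpy as np

def tr_inv_gradient(H):
    """Gradient of tr(H^{-1}) w.r.t. the lower-triangular entries of symmetric H:
    -(C^2)_uu on the diagonal, -2 (C^2)_uv off the diagonal, where C = H^{-1}."""
    C = np.linalg.inv(H)
    C2 = C @ C
    grad = -C2.copy()
    grad[~np.eye(H.shape[0], dtype=bool)] *= 2.0   # H_uv = H_vu counts twice
    return np.tril(grad)

def fd_derivative(H, u, v, eps=1e-6):
    """Central finite difference w.r.t. the symmetric pair H_uv = H_vu."""
    E = np.zeros_like(H)
    E[u, v] = E[v, u] = 1.0
    return (np.trace(np.linalg.inv(H + eps * E))
            - np.trace(np.linalg.inv(H - eps * E))) / (2 * eps)

H = np.array([[3.0, 0.5],
              [0.5, 2.0]])
g = tr_inv_gradient(H)
```

Since C is available anyway from the objective evaluation, this gradient costs only one additional matrix product.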
5.2.2 Objective Hessian
We now use formulae (28) and (29) to derive an explicit formula for the Hessian of \(\mathop{\mathrm{tr}}\nolimits (H^{-1})\) with respect to the entries of H.
When computing \(\frac{\partial ^{2}\mathop{ \mathrm{tr}}\nolimits (H^{-1})} {\partial H_{uv}\partial H_{rs}}\) we distinguish between three cases:
- \(H_{uv}\) and \(H_{rs}\) diagonal elements
- \(H_{uv}\) diagonal, \(H_{rs}\) off-diagonal element
- \(H_{uv}\) and \(H_{rs}\) off-diagonal elements.
For two diagonal elements we obtain:
If we have one diagonal and one off-diagonal element we compute:
Taking the second derivative with respect to two off-diagonal elements yields:
As the variables H enter the constraints only linearly we note that the objective Hessian with respect to H is in fact the same as the Hessian of the Lagrangian with respect to H.
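The three cases can also be assembled uniformly via symmetric unit directions \(E_{uv}\): the second directional derivative of \(\mathop{\mathrm{tr}}\nolimits(H^{-1})\) along \(E_{uv}\) and \(E_{rs}\) is \(\mathop{\mathrm{tr}}\nolimits(C E_{uv} C E_{rs} C) + \mathop{\mathrm{tr}}\nolimits(C E_{rs} C E_{uv} C)\). A sketch of this equivalent direction-based assembly (not the paper's case-by-case formulas), verified against finite differences:

```python
import numpy as np

def sym_unit(n, u, v):
    """Symmetric unit direction E with E[u, v] = E[v, u] = 1."""
    E = np.zeros((n, n))
    E[u, v] = E[v, u] = 1.0
    return E

def tr_inv_hessian(H):
    """Hessian of tr(H^{-1}) w.r.t. the lower-triangular entries of symmetric H."""
    n = H.shape[0]
    C = np.linalg.inv(H)
    idx = [(u, v) for u in range(n) for v in range(u + 1)]
    Hess = np.zeros((len(idx), len(idx)))
    for a, (u, v) in enumerate(idx):
        Eu = sym_unit(n, u, v)
        for b, (r, s) in enumerate(idx):
            Er = sym_unit(n, r, s)
            # second directional derivative of tr(H^{-1})
            Hess[a, b] = (np.trace(C @ Eu @ C @ Er @ C)
                          + np.trace(C @ Er @ C @ Eu @ C))
    return Hess, idx

H = np.array([[3.0, 0.5],
              [0.5, 2.0]])
Hess, idx = tr_inv_hessian(H)
```

Because \(\mathop{\mathrm{tr}}\nolimits(H^{-1})\) is convex on the positive definite cone, this exact Hessian block is positive semidefinite and can safely replace a BFGS block.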
6 Numerical Results
We test our methods on two examples: The first one is a predator-prey model adapted to OED [19] that serves as proof of concept for both formulations. The second one is a more involved example from chemical engineering: the urethane reaction [15]. We report SQP iterations, which essentially correspond to derivative evaluations, as well as CPU times, and compare the results to a single shooting implementation.
6.1 Implementation
We implemented direct multiple shooting for OED within our software package VPLAN [15], which allows the user to formulate, simulate, and optimize DAE models. From a user-specified formulation of the nominal DAE system, parameters, controls, and process constraints, it generates structured NLPs for OED as described in Sect. 4. A special focus is on the efficient sparse evaluation of the constraints and their derivatives as outlined in Sect. 5, and on the parallelization of state integration and derivative evaluation on the multi-experiment and shooting-node level.
As integrator we use DAESOL [3], a variable-order, variable-stepsize BDF method that can efficiently compute sensitivities of first and second order. The absolute and relative tolerances for all computations were set to 10−9.
For the solution of the structured nonlinear programs, we implemented a filter-based line search SQP method as described in [21]. The block-structured quadratic subproblems are solved by a modified version of the parametric active set solver qpOASES [11] that uses direct sparse linear algebra. The Hessian of the Lagrangian is approximated blockwise according to the number of shooting intervals. For each block a positive definite damped BFGS update is employed, scaled by the centered Oren-Luenberger sizing factor as described in [9]. Both full-space and limited-memory updates are available. In a multiple shooting context, the exact objective Hessian is computed cheaply as discussed in Sect. 5.2 and used for the lowermost diagonal block instead of a BFGS approximation. The optimality and nonlinear feasibility tolerances are set to 10−5. Details of the SQP implementation will be discussed in an upcoming publication.
All results were obtained on a workstation with two Intel Xeon hexacore CPUs (2.4 GHz) allowing 24 parallel threads in total and 32 GB RAM running Ubuntu 12.04.
6.2 Example 1: Predator-Prey Model
We consider predator-prey dynamics taken from [19] on the time horizon [0, 12] with fixed initial values and an additional fishing term 0 ≤ u(t) ≤ 1 as control:
We assume \(p_{1} = p_{2} = 1\) and that both states can be observed at most four times each during the experiment, with constant variances for the measurement errors, i.e., \(\sigma _{i} = 1\). Both the control u(t) and the grid of possible measurements are discretized on 50 equidistant intervals. The objective is to minimize the average variance of \(p_1\) and \(p_2\), i.e. \(\frac{1} {2}\mathop{ \mathrm{tr}}\nolimits (C)\).
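The paper's model equations are not reproduced in this excerpt; as a stand-in, the following sketch uses the standard Lotka-Volterra fishing model common in the OED literature, with hypothetical fishing coefficients. It integrates the nominal states together with the variational states of Sect. 2.2, which is the forward sensitivity information the covariance prediction is built from:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, v, u, p1, p2, c0=0.4, c1=0.2):
    """Nominal states (y1, y2) plus the 2x2 sensitivity matrix S = d(y1,y2)/d(p1,p2),
    propagated with the variational ODE S' = f_y S + f_p.
    Model form and coefficients c0, c1 are illustrative assumptions."""
    y1, y2 = v[0], v[1]
    S = v[2:].reshape(2, 2)
    f = np.array([y1 - p1 * y1 * y2 - c0 * u * y1,
                  -y2 + p2 * y1 * y2 - c1 * u * y2])
    fy = np.array([[1.0 - p1 * y2 - c0 * u, -p1 * y1],
                   [p2 * y2, -1.0 + p2 * y1 - c1 * u]])
    fp = np.array([[-y1 * y2, 0.0],
                   [0.0, y1 * y2]])
    return np.concatenate([f, (fy @ S + fp).ravel()])

# Fixed initial values => the sensitivities start at zero
v0 = np.concatenate([[0.5, 0.7], np.zeros(4)])
sol = solve_ivp(lambda t, v: rhs(t, v, 0.3, 1.0, 1.0), (0.0, 12.0), v0,
                rtol=1e-10, atol=1e-12)
S_end = sol.y[2:, -1].reshape(2, 2)
```

Evaluating the observation sensitivities from S at the selected measurement times and weighting them by \(w_i/\sigma_i^2\) yields the information matrix whose inverse is the predicted covariance.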
The initial guess for the controls is \(u(t) \equiv 0.3\) and all 50 measurements selected, i.e. \(w_{i}^{1} = w_{i}^{2} = 1\), \(i = 1,\ldots,50\). It yields an objective function value (average variance) of \(\frac{1} {2}\mathop{ \mathrm{tr}}\nolimits (C) = 0.00683\) but note that this design is infeasible: all 50 possible measurements for both states are selected but only four per state are allowed.
6.3 Example 2: Urethane Reaction
The urethane reaction is a well-known example from chemical reaction kinetics, see [15]. The reaction scheme is the following:
Educts are phenylisocyanate A and butanol B in the solvent dimethylsulfoxide L. During the reaction the product urethane C, the byproduct allophanate D and the byproduct isocyanate E are formed.
The products C, D, and E are modelled as differential states, while A, B, and L can be computed from C, D, and E using a molar number balance. In total, six parameters have to be identified, namely the frequency factors and activation energies of the Arrhenius kinetics. The objective is to minimize \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C)\). To achieve this, one experiment is to be designed for the time horizon \([t_0, t_{\mathrm{end}}] = [0\,\mathrm{h}, 80\,\mathrm{h}]\), and one out of three possible measurements can be taken at each of 11 equidistant points in time. The reaction is run in a stirred-tank reactor with two feeds: feed 1 contains phenylisocyanate and the solvent, feed 2 contains butanol and the solvent. Both can be fed into the reactor during the process. Furthermore, the temperature can be controlled. For our numerical experiments, we parameterize the derivatives of the control functions, \(\dot{T}(t)\), \(\dot{\mathrm{feed}}_{1}(t)\), and \(\dot{\mathrm{feed}}_{2}(t)\), by piecewise constant functions on ten equidistant intervals. The actual process controls T(t), \(\mathrm{feed}_{1}(t)\), and \(\mathrm{feed}_{2}(t)\) are set up as additional differential states, so we end up with a total of six state variables. The constraints on the controls are formulated as path constraints. The full model, including process constraints and measurement methods, is summarized in Fig. 3. The variances of the measurement errors are assumed constant with \(\sigma _{i} = 1\).
The initial guess for the controls is depicted in Fig. 4. It yields an objective function value (average variance) of \(\frac{1} {6}\mathop{ \mathrm{tr}}\nolimits (C) = 1936.3\).
6.4 Results Predator-Prey
We solved the problem with both direct multiple shooting formulations introduced in Sect. 4. We keep the control discretization fixed but vary the number of shooting intervals giving rise to different, yet equivalent, NLPs. We discovered a number of different local minima that differ mainly with respect to the placing of the measurements. Table 1 shows the results for the first multiple shooting formulation—a coupled constraint for the information matrix—as well as the single shooting formulation while Table 2 shows results for the formulation where the covariance is distributed over the shooting intervals. Both algorithms terminated successfully when restarted in the minima found by the other one indicating the structural correctness of our approach.
Table 1 shows that the first formulation converges for all discretizations. However, we note that the number of SQP iterations increases when we increase the number of shooting intervals. Preliminary experiments suggest that this is due to the BFGS update, which is unable to reflect negative curvature of the underlying Lagrangian. A block SQP method that can handle indefinite approximations, which may reduce this effect, is currently under development. When we look at the CPU time, however, we see that the benefits of the efficient evaluation scheme for the multiple shooting formulation outweigh the smaller number of SQP iterations for single shooting, making direct multiple shooting with a moderate number of shooting intervals the best overall choice. The following aspects are responsible for this:
- derivatives with respect to the controls are required only locally, which means fewer second-order directional derivatives are needed;
- derivative evaluation is easily parallelized on a multicore machine.
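The second point exploits the fact that, given the shooting start values, the state and sensitivity integrations on the individual intervals are independent of each other. A toy sketch of this embarrassingly parallel structure, using explicit Euler on the scalar model \(\dot{x} = -p\,x\) with sensitivity \(G = \partial x/\partial p\) as a stand-in for the actual DAE solver:

```python
from concurrent.futures import ProcessPoolExecutor

def propagate(args):
    """Integrate one shooting interval of the toy ODE x' = -p*x
    together with its parameter sensitivity G' = -p*G - x
    (explicit Euler for brevity).  Only the interval's own start
    values (x0, G0) enter, so all intervals are independent."""
    x0, G0, p, h, n = args
    x, G = x0, G0
    for _ in range(n):
        x, G = x + h * (-p * x), G + h * (-p * G - x)
    return x, G

if __name__ == "__main__":
    p, h, n = 0.5, 0.01, 100               # parameter, step size, steps
    # hypothetical start values for three shooting intervals
    starts = [(1.0, 0.0, p, h, n),
              (0.6, -0.6, p, h, n),
              (0.36, -0.72, p, h, n)]
    with ProcessPoolExecutor() as ex:      # one interval per worker
        results = list(ex.map(propagate, starts))
    print(results)
```

The same decoupling is what makes the per-interval derivative evaluations in multiple shooting easy to distribute over the cores of a multicore machine.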
While the first formulation converges for all discretizations within a reasonable number of SQP iterations, the second formulation does not converge for every multiple shooting discretization. Furthermore, many of the SQP steps were reduced steps, meaning that many additional constraint and objective evaluations were necessary. Overall, the behaviour was competitive with the first formulation and with single shooting only for certain problem instances.
6.5 Results Urethane
We solved the problem with the more promising variant of direct multiple shooting, i.e., we transform it into an NLP of the form (18) where we introduce a coupled constraint for the information matrix. Again, we use different numbers of shooting intervals with the same control discretization. The problem exhibits several structurally different local minima; however, all of them yield significantly better objective values than the initial guess. The states and controls for one of them are depicted in Fig. 5.
The SQP method was able to find a local minimum for every multiple shooting discretization. The results comprising the number of major iterations, the final objective value and the CPU time in seconds are summarized in Table 3.
We see that the method performs comparably well in terms of SQP iterations for single and multiple shooting. In terms of CPU time, the results again shift strongly in favor of multiple shooting because the derivative evaluation can be done much more cheaply.
7 Conclusions
In this paper, we reviewed a nonstandard optimal control problem formulation of OED. For this formulation, we showed how to extend the classical direct multiple shooting method for optimal control problems to OED problems in two ways, leading to highly structured nonlinear programs. We highlighted special structures in the constraint and objective derivatives that must be taken into account in an efficient implementation. The algorithms presented are implemented within the software package VPLAN. We presented two application examples, one of them a challenging example from chemical engineering that could be solved successfully with one of the new formulations. Our implementation outperforms an existing single shooting implementation in terms of CPU time.
We expect direct multiple shooting for OED to have even more benefits for challenging, large-scale real-life problems. Especially when nontrivial path constraints are present, as is often the case for real-life systems, direct single shooting can run into problems finding feasible points. The direct multiple shooting method as introduced in this paper allows choosing fine shooting discretizations in critical regions and offers more flexibility for initialization. Another point is that OED, even for small nominal systems, essentially requires the solution of an additional \(n_{x} \times n_{p}\) variational system, and even though this can be done efficiently using the principles of IND, its solution becomes very time consuming for large-scale systems. Here the lower sensitivity load as well as the excellent potential for parallelization provide great benefits for multiple shooting.
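For an ODE model \(\dot{x} = f(t, x, p)\), the variational system mentioned above takes the standard forward-sensitivity form

```latex
\dot{G}(t) = \frac{\partial f}{\partial x}\bigl(t, x(t), p\bigr)\, G(t)
           + \frac{\partial f}{\partial p}\bigl(t, x(t), p\bigr),
\qquad
G(t_0) = \frac{\partial x_0}{\partial p},
\qquad
G(t) = \frac{\partial x(t)}{\partial p} \in \mathbb{R}^{n_x \times n_p},
```

so \(n_x \cdot n_p\) additional equations must be integrated alongside the nominal states; this is the sensitivity load that dominates the cost for large-scale systems.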
References
Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht-Karls-Universität Heidelberg (2010)
Atkinson, A.C., Donev, A.: Optimum Experimental Designs. Oxford Statistical Sciences Series, vol. 8. Oxford University Press, Oxford (1992)
Bauer, I., Bock, H.G., Schlöder J.P.: DAESOL – a BDF-code for the numerical solution of differential algebraic equations. Internal Report, IWR, SFB 359, Universität Heidelberg (1999)
Bauer, I., Bock, H.G., Körkel, S., Schlöder, J.P.: Numerical methods for optimum experimental design in DAE systems. J. Comput. Appl. Math. 120(1–2), 1–15 (2000)
Bock, H.G.: Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert, K.H., Deuflhard, P., Jäger, W. (eds.) Modelling of Chemical Reaction Systems. Springer Series in Chemical Physics, vol. 18, pp. 102–125. Springer, Heidelberg (1981)
Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften, vol. 183. Universität Bonn, Bonn (1987)
Bock, H.G., Plitt, K.J.: A Multiple Shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247. Pergamon Press, Budapest (1984). Available at http://www.iwr.uni-heidelberg.de/groups/agbock/FILES/Bock1984.pdf
Bock, H.G., Eich, E., Schlöder, J.P.: Numerical solution of constrained least squares boundary value problems in differential-algebraic equations. In: Strehmel, K. (ed.) Numerical Treatment of Differential Equations. Proceedings of the NUMDIFF-4 Conference, Halle-Wittenberg, 1987. Texte zur Mathematik, vol. 104, pp. 269–280. Teubner, Leipzig (1988)
Contreras, M., Tapia, R.A.: Sizing the BFGS and DFP updates: numerical study. J. Optim. Theory Appl. 78(1), 93–108 (1993)
Fedorov, V.V.: Theory of Optimal Experiments. Elsevier, Amsterdam (1972)
Ferreau, H.J., Kirches, C., Potschka, A., Bock, H.G., Diehl, M.: qpOASES: A parametric active-set algorithm for quadratic programming. Math. Program. Comput. 6(4), 327–363 (2014)
Franceschini, G., Macchietto, S.: Model-based design of experiments for parameter precision: state of the art. Chem. Eng. Sci. 63, 4846–4872 (2008)
Hoang, M.D., Barz, T., Merchan, V.A., Biegler, L.T., Arellano-Garcia, H.: Simultaneous solution approach to model-based experimental design. AIChE J. 59(11), 4169–4183 (2013)
Janka, D.: Optimum experimental design and multiple shooting. Master’s thesis, Universität Heidelberg, Heidelberg (2010)
Körkel, S.: Numerische Methoden für optimale Versuchsplanungsprobleme bei nichtlinearen DAE-Modellen. Ph.D. thesis, Universität Heidelberg, Heidelberg (2002)
Körkel, S., Potschka, A., Bock, H.G., Sager, S.: A multiple shooting formulation for optimum experimental design. Math. Program. (2012, submitted revisions)
Potschka, A., Bock, H.G., Schlöder, J.P.: A minima tracking variant of semi-infinite programming for the treatment of path constraints within direct solution of optimal control problems. Optim. Methods Softw. 24(2), 237–252 (2009)
Pukelsheim, F.: Optimal Design of Experiments. Classics in Applied Mathematics, vol. 50. SIAM, Philadelphia (2006). ISBN 978-0-898716-04-7
Sager, S.: MIOCP benchmark site (2014). http://mintoc.de
Sager, S.: Sampling decisions in optimum experimental design in the light of Pontryagin’s maximum principle. SIAM J. Control. Optim. 51(4), 3181–3207 (2013)
Wächter, A., Biegler, L.T.: Line search filter methods for nonlinear programming: motivation and global convergence. SIAM J. Optim. 16(1), 1–31 (2005)
© 2015 Springer International Publishing Switzerland
Janka, D., Körkel, S., Bock, H.G. (2015). Direct Multiple Shooting for Nonlinear Optimum Experimental Design. In: Carraro, T., Geiger, M., Körkel, S., Rannacher, R. (eds) Multiple Shooting and Time Domain Decomposition Methods. Contributions in Mathematical and Computational Sciences, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-23321-5_4
Print ISBN: 978-3-319-23320-8
Online ISBN: 978-3-319-23321-5