Abstract
Direct Multiple Shooting is a flexible and efficient method for solving difficult optimal control problems constrained by ordinary differential equations or differential-algebraic equations. The aim of this article is to concisely summarize the main conceptual and methodological approaches for solving optimal control problems with parabolic partial differential equation constraints via a Direct Multiple Shooting method as well. The main obstacle is the sheer size of the discretized optimization problems. We explain a typical direct discretization approach and discuss an inexact SQP method based on two-grid Newton-Picard preconditioning. Special attention is given to a-posteriori κ-estimators that monitor contraction and to the structure-exploiting treatment of the resulting large-scale quadratic programming subproblems, including an extended condensing technique that exploits the Multiple Shooting and two-grid Newton-Picard structures. Finally, we present numerical results for an advection-diffusion and a bacterial chemotaxis example.
Keywords
- Optimal Control Problem
- Quadratic Programming
- Coarse Grid
- Multiple Shooting
- Sequential Quadratic Programming
1 Introduction
Early approaches to optimal control of ordinary differential equations (ODEs) were mostly based on Dynamic Programming [3] or Pontryagin’s Maximum Principle [16]. Due to the curse of dimensionality, Dynamic Programming is not applicable to problems with more than a few state variables. The Maximum Principle is an indirect method in the sense that the controls are given in closed form in terms of adjoint states, the so-called co-states, which satisfy an adjoint differential equation and suitable boundary conditions. The resulting boundary value problems in the states and co-states can then be discretized by numerical methods for boundary value problems such as Multiple Shooting [15] or Collocation [21]. The Maximum Principle is a typical example of an optimize-then-discretize approach. It was later found that ideas from numerical methods for boundary value problems can also be applied to the optimal control problem itself instead of to its optimality conditions, thus dispensing with the need to formulate adjoint equations, which can be a time-consuming affair (discretize-then-optimize). Furthermore, this approach lends itself quite naturally to a direct formulation, in which the discretized controls are not eliminated in favor of adjoint variables. For Direct Collocation, we refer the reader to [4, 25] and for Direct Multiple Shooting to [6]. Direct formulations have been found to reflect the real conditioning of the optimal control problem better, in particular when the mapping from the adjoint to the control has a large Lipschitz constant. We remark that there are also direct optimize-then-discretize methods, for which the control is kept in the system of optimality conditions and its discretization.
In optimal control for partial differential equations (PDEs), indirect optimize-then-discretize methods prevail (see, e.g., [24]). This is mostly due to the fact that indirect methods usually yield more accurate representations of the control (see also [11, Chap. 3]).
The contribution of this article is to summarize a Direct Multiple Shooting method for parabolic PDE constrained problems from Potschka [17, 18] in a concise form and to present numerical results for problems that could also be treated well with indirect methods. The main challenge in this approach is to tame the large-scale nature of the discretized Nonlinear Programming problems (NLPs), which can be mastered by the use of inexact Sequential Quadratic Programming (SQP) with two-grid Newton-Picard preconditioning.
The main strengths of the Direct Multiple Shooting approach can unfold in particular for practitioners who want to solve parabolic PDE constrained optimization problems and need an accurate representation of the system dynamics, but can live with lower resolution of the optimal control. This situation is not unusual due to physical restrictions of manipulator hardware. These practitioners can then benefit from time savings in the problem setup, because no adjoint equations need to be derived, but also in the problem solution, because the method can be easily parallelized on the basis of the Multiple Shooting structures. Furthermore, the solutions delivered by Direct Multiple Shooting can serve as initial guesses for other methods with higher accuracy.
2 Problem Formulation
Let \(\varOmega \subset \mathbb{R}^{d}\) be a bounded d-dimensional spatial domain with sufficiently regular boundary ∂ Ω. Regarding time, we only consider the fixed interval (0, 1) here for simplicity. We assume that the control satisfies
with \(n_{q}^{\mathrm{d}}\) distributed and \(n_{q}^{\mathrm{b}}\) boundary controls. We would already like to point out that the pure boundary control case \(n_{q}^{\mathrm{d}} = 0\) is especially advantageous in a Multiple Shooting approach, as we describe in Sect. 3.
We consider two kinds of coupled state variables: ODE states \(v \in C^{1}([0,1]; \mathbb{R}^{n_{v}})\), which are not spatially distributed, and PDE states \(u \in W(0,1)^{n_{u}}\), for which we use the standard Hilbert space for parabolic PDEs
\[W(0,1) = \{u \in L^{2}((0,1);V ): \partial_{t}u \in L^{2}((0,1);V^{\ast})\},\]
where \((V,L^{2}(\varOmega ),V^{\ast})\) is a Gelfand triple and \(\partial_{t}u\) denotes the time derivative of u in the sense of vectorial distributions (see, e.g., [29, Chap. IV]). We assume that the spatial regularity of u is \(V \subseteq H^{1}(\varOmega )\). According to [7, Chap. XVIII, Theorem 1], we can trade some spatial regularity for higher temporal regularity, because W(0, 1) is continuously embedded in \(C^{0}([0,1]; L^{2}(\varOmega ))\). Thus, we have existence of the trace \(u(t) \in L^{2}(\varOmega )^{n_{u}}\) for all t ∈ [0, 1], which is important for the formulation of boundary constraints in time.
We assume that the time derivative \(\dot{v}\) equals a sufficiently regular nonlinear function \(f^{\mathrm{ODE}}: Q \times L^{2}(\varOmega )^{n_{u}} \times \mathbb{R}^{n_{v}} \rightarrow \mathbb{R}^{n_{v}}\) and \(-\partial_{t}u\) equals a possibly nonlinear elliptic differential operator \(A: Q \times V \times \mathbb{R}^{n_{v}} \rightarrow V^{\ast}\). Furthermore, we allow for generalized temporal boundary constraints, mixed control-ODE-state path constraints, and final ODE state constraints via the functions
The objective function \(\varPhi: L^{2}(\varOmega )^{n_{u}} \times \mathbb{R}^{n_{v}} \rightarrow \mathbb{R}\) depends on the final values of the ODE and PDE states. Finally, we can state the problem of interest of this article as
More general problems with free initial and final time or integral-type objectives can be equivalently reformulated to fit problem class (1) by introducing extra v variables. The precise regularity requirements on the functions above are a delicate issue. From a practical point of view, we require that (1) is well-posed and that there exists an appropriate discretization for the involved variables and functions such that the solution of the resulting finite dimensional optimization problem is consistent with (1).
3 Discretization
In Potschka [17], problem (1) is discretized in space first and then in time. We explain here a different route, where time is discretized before space. In both cases, though, the resulting discretized problem (3) exhibits the same structure.
3.1 Time Discretization
We start out by discretizing the controls in time. To this end, let \(0 = t^{0} <\ldots < t^{n_{\text{MS}}} = 1\) denote a partition of the interval [0, 1]. On the shooting intervals \(I^{i}:= (t^{i},t^{i+1}),i = 0,\mathop{\ldots },n_{\text{MS}} - 1\), we perform a piecewise discretization of the controls in time. The piecewise nature is important for decoupling properties to be exploited in the numerics. We restrict ourselves here to the simplest case of a piecewise constant discretization in time such that
where \(\chi _{I^{i}}\) denotes the characteristic function of I i.
In the next step, we parametrize the state variables by local initial values \(u^{i},\boldsymbol{v}^{i},i = 0,\mathop{\ldots },n_{\text{MS}}\): We assume that the local initial value problems of the form (1b)–(1c) on I i with initial values \(u^{i},\boldsymbol{v}^{i}\) admit a unique solution denoted by
Continuity of the states on the full time horizon (0, 1) must then be enforced with matching conditions of the form
At this stage of the discretization procedure, we have arrived at a problem depending only on variables that are either real vectors or functions distributed in space only, namely \(q^{i},u^{i},\boldsymbol{v}^{i},i = 0,\mathop{\ldots },n_{\text{MS}}\). Note that we have, en passant, added an additional control variable \(q^{n_{\text{MS}}}\) to lend a common structure to the optimization variables corresponding to each shooting node \(t^{i}\). We have to eliminate these additional degrees of freedom later via (3f).
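The shooting parametrization and the matching conditions (2) can be illustrated on a toy scalar ODE in place of the PDE setting above. The following minimal sketch (all names are illustrative and not taken from the paper's implementation) parametrizes \(v'(t) = av(t)\), \(v(0) = 1\) by local initial values on \(n_{\text{MS}}\) shooting intervals and solves the initial condition plus matching conditions by a Newton method with a finite-difference Jacobian:

```python
import numpy as np

# Toy Multiple Shooting illustration for v'(t) = a*v(t), v(0) = 1: the state
# is parametrized by local initial values s^i, and continuity is enforced via
# matching conditions s^{i+1} - vbar^i(t^{i+1}; s^i) = 0.

a, n_ms = -2.0, 5                      # decay rate, number of shooting intervals
t = np.linspace(0.0, 1.0, n_ms + 1)    # shooting grid 0 = t^0 < ... < t^{n_MS} = 1

def integrate(s, t0, t1, steps=50):
    """Integrate v' = a*v from t0 to t1 with classical RK4, starting at v = s."""
    h = (t1 - t0) / steps
    v = s
    for _ in range(steps):
        k1 = a * v
        k2 = a * (v + 0.5 * h * k1)
        k3 = a * (v + 0.5 * h * k2)
        k4 = a * (v + h * k3)
        v += h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return v

def residual(s):
    """Initial condition plus matching conditions; F(s) = 0 at the solution."""
    r = [s[0] - 1.0]
    for i in range(n_ms):
        r.append(s[i + 1] - integrate(s[i], t[i], t[i + 1]))
    return np.array(r)

# Newton's method on F(s) = 0 with a finite-difference Jacobian.
s = np.ones(n_ms + 1)                  # crude initial guess at every node
for _ in range(10):
    F = residual(s)
    J = np.empty((n_ms + 1, n_ms + 1))
    for j in range(n_ms + 1):
        e = np.zeros(n_ms + 1)
        e[j] = 1e-7
        J[:, j] = (residual(s + e) - F) / 1e-7
    s -= np.linalg.solve(J, F)

print(s[-1], np.exp(a))                # final state vs. exact solution e^a
```

A crucial structural point carries over to the PDE case: the residual and the Jacobian couple only neighboring shooting nodes, so their evaluation decouples across intervals and parallelizes naturally.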
3.2 Space Discretization
Now, we can discretize q i and u i in space, e.g., by Finite Differences, Finite Elements, or Finite Volumes. We denote the corresponding vectors by \(\boldsymbol{q}^{i}\) and \(\boldsymbol{u}^{i}\). The quantities \(\overline{u}^{i}\) and \(\overline{v}^{i}\) can then be discretized by an appropriate time stepping method. We denote the result by \(\overline{\boldsymbol{u}}^{i}(t^{i+1};\boldsymbol{q}^{i},\boldsymbol{u}^{i},\boldsymbol{v}^{i})\) and \(\overline{\boldsymbol{v}}^{i}(t^{i+1};\boldsymbol{q}^{i},\boldsymbol{u}^{i},\boldsymbol{v}^{i})\). It is not mandatory to use the same grids for the controls or states on all shooting intervals, or even within one shooting interval for the time steps of states in case of a Rothe-type time stepping scheme. According to [10], a good compromise between mesh adaptivity and a small number of Finite Element matrix assembly calls is to fix the state mesh on each shooting interval and discretize the matching conditions (2) in a variational manner. In this case, a Method of Lines discretization in time seems to be a flexible and efficient approach, because existing ODE solvers can be used. For a more detailed discussion of discretization issues of parabolic PDEs, we refer the reader to [23].
For simplicity, we assume here that \(\boldsymbol{q}^{i}\) and \(\boldsymbol{u}^{i}\) are based on the same spatial discretization in each shooting node. However, we assume that there is a hierarchy of nested discretizations \(V _{h}^{1} \subset V _{h}^{2} \subset \cdots \subset V\) for the states, which in turn yields on each level \(\ell= 1,2,\mathop{\ldots }\) discretized states and shooting solutions. The path constraint is discretized on the shooting grid only. For more sophisticated methods, see Potschka [19].
All remaining functions of problem (1) need to be appropriately discretized as well. Finally, we arrive at a highly structured, finite dimensional optimization problem on each spatial discretization level ℓ
The numerical evaluation of (3c) and (3e) is expensive, because it comprises the solution of an initial value problem for the spatially discretized PDE. To this end, we use an adaptive backward differentiation formula method with a monitoring strategy [1, 2], which is implemented in the C++ code DAESOL-II. It also supplies efficient first and second order forward and backward directional derivatives of the solutions of the initial value problems on the basis of Internal Numerical Differentiation [5] and Algorithmic Differentiation [9, 28], whose backward mode delivers an automated and efficient way to compute gradients in the style of adjoint equations.
We want to remark that special attention has to be paid to the question of consistency of the discretized problem (3) with the original problem (1) for fine discretizations. For instance, it is necessary to use correct weighting matrices for the discrete approximations of Hilbert space inner products and their corresponding norms. Their inverses must also be used for correct discrete Riesz representations of discrete variables that really live in the dual space, for example the Lagrange multipliers of (3b) and (3c). Then, it is typically possible to prove consistency of (3) and (1) for the particular problem at hand.
4 Newton-Picard Inexact SQP
When disregarding its special structure for a moment, we see that problem (3) is a Nonlinear Programming (NLP) problem of the form
with \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\) and \(g: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}\). We assume throughout that f and g are twice continuously differentiable functions and that the sets \(\mathcal{E}\) and \(\mathcal{I}\) form a partition of \(\{1,\mathop{\ldots },m\} = \mathcal{E}\ \dot{\cup }\ \mathcal{I}\). For \(z = (x,y) \in \mathbb{R}^{n+m}\), we can then define the Lagrangian function
For the theory of NLP we refer the reader to [14].
4.1 The Equality Constrained Case
We first consider the equality constrained case \(\mathcal{I} =\{\}\). If \(x^{{\ast}}\in \mathbb{R}^{n}\) satisfies a constraint qualification and is a local minimizer of (3), then there exists a Lagrange multiplier \(y^{{\ast}}\in \mathbb{R}^{m}\) such that with z ∗ = (x ∗, y ∗)
In practice, a Newton-type method is used to solve the necessary optimality condition (5) in order to obtain a critical point \(z^{\ast}\). In general, we must then check (second order) sufficient conditions to ensure that \(x^{\ast}\) is indeed a minimizer and not a maximizer or a saddle point.
Because of the large-scale nature of (3), it is a good idea to employ iterative methods for the solution of the linearized subproblems within the Newton-type method. We resort to a rather simple iterative method here, namely a Linear Iterative Splitting Approach (LISA), because we can generalize it from linear systems to Quadratic Programming (QP) problems in Sect. 4.4.
Let \(N = n + m,J(z) = \frac{\mathrm{d}F} {\mathrm{d}z} (z),\) and \(\hat{M}: \mathbb{R}^{N} \rightarrow \mathbb{R}^{N\times N}\) be given such that \(\hat{M} (z)J(z) \approx \mathbb{I}_{N}\). Then, the LISA-Newton iterates are defined via
where \(\alpha_{k} \in (0,1]\) is an appropriately chosen damping factor to ensure global convergence and \(\varDelta z^{k}\) is computed via the inner LISA iteration
The iteration (7) converges for all values of \(F(z^{k})\) and \(\varDelta z_{0}^{k}\) if and only if the spectral radius \(\kappa_{\mathrm{lin}}\) of \(A_{k}:= \mathbb{I}_{N} -\hat{M} (z^{k})J(z^{k})\) is less than 1, see [22, Theorem 4.1].
The choice of the damping factors \(\alpha_{k}\) exceeds the scope of this article. The interested reader is referred to the discussion in [8] and its specialization for the LISA-Newton method [17].
4.2 A-Posteriori κ-Estimators
Based on [17, Lemma 4.28], we observe that there is a representation of \(\varDelta z_{l}^{k}\) for l ≥ 1 in terms of a truncated Neumann series
where \(M(z^{k})\) depends on l. On the basis of this M, we can observe the connection
between the linear convergence rate \(\kappa_{\mathrm{lin}}\) of the LISA iteration (7) and the linear convergence rate \(\kappa _{\varepsilon }\) (which depends on an \(\varepsilon\)-dependent norm) of the nonlinear LISA-Newton iteration (6), if each \(\varDelta z^{k}\) is computed from l LISA iterations [17, Theorem 4.29]. This implies that under the assumption \(\kappa_{\mathrm{lin}} < 1\), l = 1 is already enough for local convergence and, moreover, if we perform l > 1 LISA iterations per outer step, then this extra effort is compensated for by fewer outer LISA-Newton iterations, at least asymptotically for \(k \rightarrow \infty \).
Furthermore, this result is the basis for a-posteriori κ-estimators [17]. To explain the different estimators, we fix k and define \(\zeta_{l}:= \varDelta z_{l+1}^{k} -\varDelta z_{l}^{k}\). We immediately observe that \(\zeta_{l+1} = A_{k}\zeta_{l}\). The Rayleigh κ-estimator is based on the Rayleigh quotient
The sequence \((\kappa_{l}^{\mathrm{Rayleigh}})\) converges to \(\kappa_{\mathrm{lin}}\) for \(l \rightarrow \infty \) if \(A_{k}\) is diagonalizable, has a single eigenvalue of largest modulus, and \(\zeta_{1}\) has a nonzero component in the corresponding eigenspace. For non-diagonalizable \(A_{k}\), the Root κ-estimator
can be used. It converges if \(\kappa_{\mathrm{lin}} > 0\) and \(\zeta_{1}\) has a nonzero component in the dominant invariant subspace of \(A_{k}\). However, convergence to \(\kappa_{\mathrm{lin}}\) can be quite slow. In contrast, the Ritz κ-estimator converges in a finite number of iterations. It is based on the largest Ritz value \(\kappa_{l}^{\mathrm{Ritz}}\) of \(A_{k}\) on the order-l Krylov subspace generated by \(A_{k}\) and \(\zeta_{1}\). The disadvantage is that an orthonormal basis of the Krylov subspace needs to be maintained, which can be prohibitive due to excessive memory consumption. In our numerical experience, this is not a problem because the Newton-Picard based preconditioners \(\hat{M}\) described in Sect. 4.3 require only few LISA iterations.
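The Rayleigh κ-estimator can be demonstrated on a synthetic iteration matrix with known spectrum (a symmetric, hence diagonalizable, stand-in for \(A_{k}\); the true contraction rate is planted at 0.8):

```python
import numpy as np

# Sketch of the a-posteriori Rayleigh κ-estimator: the LISA increments satisfy
# ζ_{l+1} = A_k ζ_l, so Rayleigh quotients of successive increments converge to
# the dominant eigenvalue of the (here symmetric) iteration matrix A_k, i.e.
# to the contraction rate κ_lin.  The matrix A is a synthetic stand-in.

rng = np.random.default_rng(1)
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthonormal basis
eigs = np.linspace(0.05, 0.5, n)
eigs[-1] = 0.8                                      # planted dominant eigenvalue
A = Q @ np.diag(eigs) @ Q.T                         # κ_lin = ρ(A) = 0.8

zeta = rng.standard_normal(n)                       # ζ_1, generic start vector
for _ in range(40):
    zeta_next = A @ zeta                            # ζ_{l+1} = A ζ_l
    kappa_rayleigh = (zeta @ zeta_next) / (zeta @ zeta)
    zeta = zeta_next

print(kappa_rayleigh)                               # ≈ 0.8 = κ_lin
```

In the actual method only two or three increments per outer iteration are available, which is why the faster-converging Ritz variant is used in the computations of Sect. 6.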
The κ-estimators can also be used to assess the error of the LISA iteration (7) in the following sense: For all \(\varepsilon \in (0,1 -\kappa _{\mathrm{lin}})\), there exists a vector norm \(\left \Vert.\right \Vert _{{\ast},k}\) with corresponding matrix norm \(\left \Vert A_{k}\right \Vert _{{\ast},k} \leq \kappa _{\mathrm{lin}}+\varepsilon < 1\) such that (compare [17, Lemma 4.30])
In our implementation, we have l ≥ 2 and terminate the LISA iterations as soon as either the κ-estimate drops below \(\kappa _{\mathrm{max}} = \sqrt{1/2}\) or a maximum number of LISA iterations, in our case \(l_{\mathrm{max}} = 7\), is reached. In the latter case, we improve the preconditioner \(\hat{M}\) in order to reduce the contraction rate \(\kappa_{\mathrm{lin}}\) as discussed in the following section. In most outer iterations only l = 2 LISA iterations are required. We used the Ritz κ-estimator for the computations in Sect. 6.
4.3 Two-Grid Newton-Picard Preconditioning
The requirement \(\kappa_{\mathrm{lin}} < 1\) for convergence of the LISA iteration (7) is a strong requirement, which calls for specially tailored preconditioners. In our case, Newton-Picard preconditioning is the method of choice for computing \(\hat{M}\), see [20]. Although grid-independent convergence has so far only been proven for a linear-quadratic model problem with single shooting, our numerical experience on nonlinear problems with Multiple Shooting is completely satisfactory.
The general paradigm of Newton-Picard preconditioners is the following: Under reasonable assumptions, \(\overline{u}^{i}\) has a compact Fréchet derivative with respect to \(q^{i}\) and \(u^{i}\). Thus, even though the discretized Jacobian matrices \(G_{q}^{i} = \partial \overline{\boldsymbol{u}}^{i}/\partial \boldsymbol{q}^{i}\) and \(G_{u}^{i} = \partial \overline{\boldsymbol{u}}^{i}/\partial \boldsymbol{u}^{i}\) are in general large dense matrices, their eigenvalues and singular values cluster at 0 and they can thus be approximated well by low-rank matrices.
From a geometrical point of view, the compactness is a result of a smoothing property of \(\overline{u}^{i}\), which can alternatively be exploited to form a low-rank approximation by a two-grid approach. To this end, we approximate the Jacobian matrices \(G_{q}^{i}\) and \(G_{u}^{i}\) on coarse grids with suitable prolongation and restriction matrices P and R. This is the preferred way because no large eigenvalue and singular value problems have to be solved. Furthermore, this approach can be extended in a straightforward fashion to Multiple Shooting approaches.
4.4 The Inequality Constrained Case
We now consider the inequality constrained case when \(\mathcal{I}\neq \{\}\). The Jacobian
is a saddle-point matrix in unsymmetric form. It is singular in general, e.g., due to rank deficiency of \(J_{2}(z)\) if lower and upper variable bounds are present. Thus, we need to resort to a pseudo-inverse approach. Let us first consider the case of exact derivatives in the SQP method. In this case, we can define a suitable pseudo-inverse
by computing the primal-dual solution \(\tilde{z} = (\tilde{x}, \tilde{y} ) \in \mathbb{R}^{n+m}\) of the QP
and then the step \(\varDelta z^{k} = \tilde{z}- z^{k}\). One can show that if QP (8) has a unique solution at \(z^{k}\) and if some \(\hat{z} = (\hat{x}, \hat{y} ) \in \mathbb{R}^{N}\) satisfies \(\hat{y} _{i} \geq -y_{i}\) for all \(i \in \mathcal{I}\), then \(J^{\oplus}\) acts linearly on the second argument like a pseudo-inverse in the sense that \(J^{\oplus }(z^{k},J(z^{k})\hat{z} ) = \hat{z}\) [17, Lemma 4.36].
Furthermore, if the active set of QP (8) does not change from iteration k − 1 to k and the second order sufficient condition and strong complementarity hold, then we can construct a matrix \(M_{k}\) such that for a neighborhood U of \(F(z^{k})\) it holds that
In words, the pseudo-inverse \(J^{\oplus}\) acts locally like a matrix [17, Theorem 4.37]. In addition, if the SQP method with \(J^{\oplus}\) converges to \(z^{\ast}\), then \(z^{\ast}\) is a critical point of NLP (4). Moreover, the second order sufficiency condition and strong complementarity transfer from QP (8) at \(z^{\ast}\) to NLP (4) at \(z^{\ast}\) [17, Theorem 4.41].
In the case of inexact derivatives on the basis of the Newton-Picard approximation from Sect. 4.3 we use a structured approximation of the Jacobian
In analogy to the construction of J ⊕(z k, . ) from J(z k), we can construct a QP preconditioner \(\hat{M} (z^{k})\) for a QP-LISA iteration. We first compute the primal-dual solution \(\tilde{z} = (\tilde{x}, \tilde{y} )\) of the QP
and then \(\varDelta z_{l}^{k} = \tilde{z}- z^{k} -\varDelta z_{l-1}^{k}\).
In order to have valid κ-estimators, the working set of QP (9) should be fixed for l > 1, effectively deferring further changes of the active set to the next outer iteration.
5 Numerical Solution of the Large-Scale QPs
In order to solve the large-scale QP (9) efficiently, we need to exploit the specific block-sparse and low-rank structures generated in the Multiple Shooting and Newton-Picard approaches. To this end, we first regroup the \(n_{1}\) PDE variables and \(n_{2}\) non-PDE variables of NLP (3) in the order
Then we observe that we can also divide the constraints of NLP (3) to obtain the structured NLP form
where \(\vert \mathcal{E}_{1}\vert = n_{1}\). The constraints (10b) comprise the PDE boundary constraints (3b) and PDE matching conditions (3c).
5.1 Multiple Shooting Structure Exploitation
The QP constraint matrix C has the form
with the derivative abbreviations
at the current iterate \(z^{k}\). We want to stress that, contrary to appearances, the block \(C_{11}\) is several orders of magnitude larger than the blocks \(C_{22}\) and \(C_{32}\) on fine spatial discretization levels.
Let \(M_{B} = \mathbb{I}_{n_{u}} - (\prod _{i=1}^{n_{\text{MS}}}G_{ u}^{i})R_{ uu}^{\mathrm{b}}\). It is easy to see that if \(M_{B}\) is invertible, then \(C_{11}\) is also invertible and its inverse is given by \(C_{11}^{-1} =\)
This decomposition into block-triangular matrices yields a procedural recipe for evaluating matrix-vector products with \(C_{11}^{-1}\) that does not require explicitly forming either \(C_{11}\) or any part of \(C_{11}^{-1}\), except for the \(M_{B}^{-1}\) block. In the following, we shall see that a two-grid Newton-Picard approximation of \(C_{11}\) also enables us to use a simple procedural form for matrix-vector products with the approximation of \(M_{B}^{-1}\).
5.2 Two-Grid Newton-Picard Structure Exploitation
To this end, we need to establish grid transfer operators between a coarse and a fine grid. For two grid levels \(\ell_{\mathrm{c}} \leq \ell_{\mathrm{f}}\), we denote the PDE-state degrees of freedom by \(n_{\mathrm{c}} = n_{u}^{\ell_{\mathrm{c}}}\) and \(n_{\mathrm{f}} = n_{u}^{\ell_{\mathrm{f}}}\). Typically, \(n_{\mathrm{c}} \approx 100\), while \(n_{\mathrm{f}} > 10{,}000\). We then assume that the prolongation operator \(P \in \mathbb{R}^{n_{\mathrm{f}}\times n_{\mathrm{c}}}\) and the restriction operator \(R \in \mathbb{R}^{n_{\mathrm{c}}\times n_{\mathrm{f}}}\) satisfy
On nested grids, it is rather simple to construct P via interpolation. The restriction of \(v_{\mathrm{f}} \in \mathbb{R}^{n_{\mathrm{f}}}\) can then be computed as the unique coarse grid vector \(v_{\mathrm{c}} \in \mathbb{R}^{n_{\mathrm{c}}}\) whose prolongation minimizes the \(L^{2}\)-distance to \(v_{\mathrm{f}}\) on the fine grid. In case of a variational spatial discretization, we can use the coarse and fine grid mass matrices \(M_{\mathrm{c}} \in \mathbb{R}^{n_{\mathrm{c}}\times n_{\mathrm{c}}}\) and \(M_{\mathrm{f}} \in \mathbb{R}^{n_{\mathrm{f}}\times n_{\mathrm{f}}}\) in order to obtain the form
which satisfies (11) due to \(P^{\mathrm{T}}M_{\mathrm{f}}P = M_{\mathrm{c}}\). Again, the matrix R should not be formed explicitly if \(M_{\mathrm{c}}^{-1}\) is a dense matrix.
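The variational grid transfer can be verified on a minimal 1D example with piecewise-linear finite elements on nested uniform meshes of (0, 1). Since the coarse hat functions are exact linear combinations of the fine ones, \(P^{\mathrm{T}}M_{\mathrm{f}}P = M_{\mathrm{c}}\) holds exactly, and hence \(RP = \mathbb{I}\):

```python
import numpy as np

# Variational grid transfer for 1D P1 finite elements on nested meshes:
# P embeds the coarse space into the fine one, R = M_c^{-1} P^T M_f is the
# L2-orthogonal restriction, and R @ P = I follows from P^T M_f P = M_c.

def mass_matrix(n_el):
    """P1 mass matrix on a uniform mesh of (0,1) with n_el elements,
    boundary nodes included."""
    h = 1.0 / n_el
    M = np.zeros((n_el + 1, n_el + 1))
    local = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    for e in range(n_el):
        M[e:e + 2, e:e + 2] += local
    return M

n_el_c, n_el_f = 4, 8                # nested: each coarse element split in two
M_c, M_f = mass_matrix(n_el_c), mass_matrix(n_el_f)

# Prolongation: each coarse hat function is an exact combination of fine hats.
P = np.zeros((n_el_f + 1, n_el_c + 1))
for j in range(n_el_c + 1):
    P[2 * j, j] = 1.0                # coinciding nodes
    if j > 0:
        P[2 * j - 1, j] = 0.5        # fine midpoint to the left
    if j < n_el_c:
        P[2 * j + 1, j] = 0.5        # fine midpoint to the right

# R = M_c^{-1} P^T M_f, realized as a solve (M_c^{-1} is never formed).
R = np.linalg.solve(M_c, P.T @ M_f)

print(np.linalg.norm(P.T @ M_f @ P - M_c))          # ≈ 0: nestedness
print(np.linalg.norm(R @ P - np.eye(n_el_c + 1)))   # ≈ 0: property (11)
```

In higher dimensions the construction is analogous element by element; only the local interpolation weights and mass matrices change.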
We then approximate the blocks of C via a two-grid approach. To this end, we denote all matrices on the coarse grid with hats (\(\hat{\ }\)) and approximate C by a matrix \(\tilde{C}\) of the same block pattern, but consisting of the approximated blocks
Along the same lines, we use the approximations
We can then observe that if \(\hat{M} _{B}\) is invertible, then \(\tilde{M} _{B}\) is also invertible and we obtain the procedural recipe [17, Lemma 7.2]
Thus, we do not need to explicitly form any matrix but \(\hat{M}_{B}\) on the coarse grid in order to evaluate matrix-vector products with \(\tilde{M}_{B}\) and thus also with \(\tilde{C}_{11}^{-1}\). Furthermore, this form of \(\tilde{M} _{B}\) allows for an even more concise representation of \(\tilde{C}_{11}^{-1}\). To this end, we define the projectors
where ⊗ denotes the Kronecker product of matrices. Thus \(\varPi_{\mathrm{slow}}\) is a block-diagonal matrix of \(n_{\text{MS}}\) blocks PR. Then, we can show [17, Theorem 7.3] that
This equation illustrates the name Newton-Picard: On the slowly converging modes, we approximate the inverse of \(C_{11}\) by its coarse grid counterpart and perform a Newton iteration on this subspace, while we only use a Picard (or fixed point) iteration on the anyway fast converging modes by approximating the Jacobian with a negative identity. Moreover, we can now easily derive [17, Corollary 7.4] that if it exists, the Newton-Picard approximation of block \(C_{11}\) has the inverse
5.3 QP Condensing
We now consider again QPs with a structure inherited from NLP (10)
We show that we can employ a partial null-space approach called condensing in order to efficiently solve the large-scale, structured QP (13). Condensing is the key linear algebra ingredient for an efficient Direct Multiple Shooting method (see, e.g., [6]), but must be extended to also exploit the two-grid Newton-Picard structures. For this purpose, we prefer the following presentation of the concept of condensing.
Theorem 1 (Condensing)
Assume that \(C_{11}\) in QP (13) is invertible and define
Let furthermore \((x_{2}^{{\ast}},y_{\mathcal{E}_{2}}^{{\ast}},y_{\mathcal{I}}^{{\ast}}) \in \mathbb{R}^{n_{2}+m_{2}+m_{3}}\) be a primal-dual solution of the QP
If we choose
then \((x^{{\ast}},y^{{\ast}}):= (x_{1}^{{\ast}},x_{2}^{{\ast}},y_{\mathcal{E}_{1}}^{{\ast}},y_{\mathcal{E}_{2}}^{{\ast}},y_{\mathcal{I}}^{{\ast}})\) is a primal-dual solution of QP (13) .
Proof
See, e.g., [17, Theorem 7.6].
Theorem 1 shows how the solution of the structured, large-scale QP (13) can be boiled down to a medium-scale QP (14), which only contains the non-PDE variables \(x_{2}\). The pre- and post-processing steps only require matrix-vector products with the large-scale blocks, e.g., \(n_{2}\) matrix-vector products for the computation of \(C_{11}^{-1}C_{12}\).
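The elimination mechanism behind Theorem 1 can be sketched for the equality-constrained case with random stand-in matrices (inequalities and multiplier recovery are omitted here): the PDE variables \(x_{1}\) are expressed through \(C_{11}^{-1}\), and the remaining QP in \(x_{2}\) is solved and compared against the full KKT system.

```python
import numpy as np

# Equality-constrained sketch of condensing: eliminate x1 through the
# invertible block C11 and solve a small QP in the non-PDE variables x2.
# All matrices are random stand-ins for the structured blocks of QP (13).

rng = np.random.default_rng(2)
n1, n2, m2 = 8, 3, 2                     # "large" PDE block, small rest
B = rng.standard_normal((n1 + n2, n1 + n2))
B = B @ B.T + np.eye(n1 + n2)            # SPD Hessian
b = rng.standard_normal(n1 + n2)
C11 = np.eye(n1) + 0.1 * rng.standard_normal((n1, n1))   # invertible block
C12 = rng.standard_normal((n1, n2))
C2 = rng.standard_normal((m2, n1 + n2))  # remaining constraint rows [C21 C22]
c1 = rng.standard_normal(n1)
c2 = rng.standard_normal(m2)

# Reference: full KKT system of
#   min 1/2 x^T B x + b^T x  s.t.  C11 x1 + C12 x2 = c1,  C2 x = c2.
C = np.block([[C11, C12], [C2]])
K = np.block([[B, C.T], [C, np.zeros((n1 + m2, n1 + m2))]])
x_full = np.linalg.solve(K, np.concatenate([-b, c1, c2]))[:n1 + n2]

# Condensing: x1 = d + E x2 with d = C11^{-1} c1 and E = -C11^{-1} C12,
# i.e. x = x0 + Z x2 with x0 = (d, 0) and partial null-space basis Z = (E; I).
d = np.linalg.solve(C11, c1)
E = -np.linalg.solve(C11, C12)
Z = np.vstack([E, np.eye(n2)])
x0 = np.concatenate([d, np.zeros(n2)])

B_red = Z.T @ B @ Z                      # condensed Hessian
b_red = Z.T @ (B @ x0 + b)               # condensed gradient
C_red = C2 @ Z                           # condensed constraint matrix
c_red = c2 - C2 @ x0

K_red = np.block([[B_red, C_red.T], [C_red, np.zeros((m2, m2))]])
x2 = np.linalg.solve(K_red, np.concatenate([-b_red, c_red]))[:n2]
x_cond = x0 + Z @ x2                     # expand back to the full variables

print(np.linalg.norm(x_cond - x_full))   # ≈ 0: both solve the same QP
```

In the actual method, \(C_{11}^{-1}\) is of course never formed; the products with \(C_{11}^{-1}\) are realized procedurally via the two-grid Newton-Picard recipe of Sect. 5.2.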
5.4 Two-Grid Hessian Approximation
If we now use the approximated version of QP (13), we see that based on Sects. 5.1 and 5.2, the partial null-space basis can be evaluated purely on the coarse grid due to
This observation suggests a projected Newton-Picard approximation of the Hessian matrix via
Consequently, we have
and thus we can also compute the condensed two-grid Newton-Picard Hessian matrix purely on the coarse grid according to
Thus, it is possible to set up QP (14) efficiently with only a few fine-grid operations.
6 Numerical Examples
We now report numerical results for Direct Multiple Shooting applied to tracking problems for an advection-diffusion equation in 2D and a bacterial chemotaxis example in 1D. Further examples can be found in [17], including a real-world separation process from chemical engineering with economic objective and constraint functions. All results were computed on a Linux workstation with four 2.67 GHz Intel Core i7 cores and 24 GB of RAM.
6.1 Advection-Diffusion Equation
We consider an advection-diffusion equation on \(\varOmega \subset (-1,1)^{2}\), whose boundary \(\partial \varOmega\) is partitioned into disjoint sets \(\varGamma_{i}, i = 0,1,2,3\). Let ν denote the outwards pointing normal on \(\partial \varOmega\) and let \(\boldsymbol{U} \in L^{2}(\varOmega )^{2}\) be a divergence-free velocity field. For \(\gamma,\varepsilon,T > 0\), inflow profiles \(u_{i}^{\mathrm{in}} \in L^{2}(\varGamma_{i}), i = 1,2\), and a desired profile \(\hat{u}\in L^{2}(\varOmega )\), we apply the proposed method to the following periodic tracking-type boundary-control problem with control gradient constraints
The setting \(q_{i} \in H^{1}(0,T)\) can be reformulated with \(L^{2}(0,T)\) controls by introducing ODE states \(v_{i}\), whose time derivative we control, and adequately adapting all occurrences of \(q_{i}\) in (16).
We obtain a suitable velocity field \(\boldsymbol{U}\) as the solution of a Stokes problem with parabolic inflow velocities on \(\varGamma_{1} =\{ -1\} \times (-1,-1/3)\) and \(\varGamma_{2} =\{ 1\} \times (-1,-1/3)\) and free outflow velocity on \(\varGamma_{3} = (-1/3,1/3) \times \{ 1\}\) (compare Fig. 1)
We use Finite Differences on equidistant, staggered grids for the discretization of the Stokes problem. This discretization is compatible with an upwind-flux Finite Volume method for the advection-diffusion equation, because the velocity degrees of freedom align with the center of the volume interfaces.
We compute optimal controls for \(T = 10\), \(\hat{u} (x) = 1 + x_{1}\), \(u_{i}^{\mathrm{in}}(x) =\exp (-10(x_{2} + 2/3)^{2}), i = 1,2,\) and for varying values of \(\varepsilon = 10^{0},10^{-1},10^{-2},5 \cdot 10^{-3}\) and \(\gamma = 10^{-2}, 10^{-3}\) on \(n_{\text{MS}} = 20\) shooting intervals. We use a 5-level hierarchy of grids with mesh sizes \(h = 2/15, 1/15, 1/30, 1/60, 1/120\), giving rise to 125, 500, 2000, 8000, and 32,000 degrees of freedom on the respective levels. As initial values, we chose \(\boldsymbol{u}^{i} = 0,\boldsymbol{v}^{i} = (0.1,0.1)^{\mathrm{T}},\boldsymbol{q}^{i} = (0.01,0.01)^{\mathrm{T}}\) for \(i = 0,\mathop{\ldots },20\).
One purpose of treating an advection-diffusion problem here is to fathom the limits of the Newton-Picard approach, which exploits diffusion numerically. We can expect the problem to become more difficult to solve as we approach the hyperbolic limit \(\varepsilon \rightarrow 0\). Because the upwind discretization achieves numerical stability by introducing spurious numerical diffusion, the discretized problem exhibits more diffusion on coarser grid levels, where this effect is more pronounced. Thus, the efficiency of the diffusion-exploiting Newton-Picard preconditioners is slightly better for coarser fine grids. We also want to remark that problem (16) is a linear-quadratic problem. We tackle it with a nonlinear solver, and thus the reader should keep in mind that numerical linear algebra approaches (e.g., along the lines of Potschka et al. [20]) are more efficient, because they exploit linearity explicitly.
From the computational results in Table 1, we can observe the expected trend that the overall number of inexact SQP iterations grows for decreasing values of the diffusion coefficient \(\varepsilon\). The increase is higher for lower regularization parameters γ. The number of inexact SQP iterations on the finest fine grid level grows only slightly. For high diffusion \(\varepsilon = 10^{0}\), level \(\ell_{\mathrm{c}} = 1\) for the coarse grid is already enough for sufficient local contraction on all fine grid levels \(\ell_{\mathrm{f}} \leq 5\). For \(\varepsilon = 10^{-1}\), \(\ell_{\mathrm{c}} = 1\) is still sufficiently fine for the higher regularization parameter \(\gamma = 10^{-2}\), but not for \(\gamma = 10^{-3}\). For \(\varepsilon = 10^{-2}, 5 \cdot 10^{-3}\), the κ-estimators trigger a refinement of the coarse level to \(\ell_{\mathrm{c}} = 2\) in order to obtain local contraction of the method on all fine grid levels \(\ell_{\mathrm{f}} \leq 5\). For \(\varepsilon = 10^{-3}\), the coarse grid needs to be refined to \(\ell_{\mathrm{c}} = 3\). In this case, the memory consumption of DAESOL-II on the coarse and fine grids exceeds the available RAM. This problem, however, is a momentary technical restriction of the software implementation and not a fundamental restriction of the method.
We can also observe in Table 1 that the number of adaptive time-steps for the initial value problem solver increases with decreasing diffusion coefficient \(\varepsilon\). From the number of required time-steps we see that the time discretization has a higher resolution than the spatial discretization. The reason for this imbalance in accuracy is the stability requirements for the solution of the initial value problems in (3c) and their first and second order derivatives, as discussed in [17, Chap. 9, Sect. 3]. If we take the intermediate time steps into account, we obtain on the finest level between \(2.1 \cdot 10^{7}\) and \(3.0 \cdot 10^{7}\) state variables, although NLP (3) really only has \(6.7 \cdot 10^{5}\) degrees of freedom in the PDE state variables \(\boldsymbol{u}^{i},i = 0,\mathop{\ldots },n_{\text{MS}}\) (independently of \(\varepsilon\)).
If we compare the serial simulation time for the numerical solution of the initial value problems for fixed initial values and fixed controls with the wall-clock time for the inexact SQP method parallelized on four cores, we obtain a ratio of 27–95. This is quite remarkable, because already for the forward problem of finding a time-periodic solution for fixed controls, several Newton-Picard iterations (in the sense of Lust et al. [13]) would be required.
For completeness, we depict the optimal control and the corresponding optimal final state for the case \(\varepsilon = 10^{-2}\) and \(\gamma = 10^{-3}\) in Figs. 2 and 3. We can observe that most of the control action happens at \(\varGamma _{2}\), where high values of u are to be tracked. The control gradient constraint (16i) is active at the slopes of the initial peak of \(q_{2}\). Furthermore, the optimal state at the end of the period matches the desired profile \(\hat{u}\) well, except for the area around the inflow boundary \(\varGamma _{2}\).
6.2 Bacterial Chemotaxis
On the basis of a chemotaxis model by Tyson et al. [26, 27], we consider on Ω = (0, 1) with \(\partial \varOmega =\varGamma _{1} \cup \varGamma _{2} =\{ 0\} \cup \{ 1\}\) the nonlinear tracking-type boundary control problem (compare also [12])
where \(\partial _{\nu }\) denotes the derivative in the direction of the outward-pointing normal on ∂Ω. The parameters considered here are \(D_{z} = 0.3\), α = 4, β = 1, w = 0.1, μ = 10, ρ = 0.1, \(\gamma _{c} = 10^{-2}\), \(\gamma _{q} = 10^{-5}\). The targets to be tracked are \(\hat{z}(x) = 2x\), \(\hat{c}(x) = 0\) and the initial states are given as \(z_{0}(x) = 1\), \(c_{0}(x) = 0\).
On the basis of a Finite Difference discretization on a six-level hierarchy of nested grids with 322, 642, 1282, 2562, 5122, and 10,242 degrees of freedom for the PDE states, we can use the proposed Direct Multiple Shooting method to obtain the optimal controls and states depicted in Fig. 4. Level \(\ell _{c} = 1\) for the coarse level is sufficient to yield good local contraction of the inexact SQP method. In the solution, DAESOL-II uses 889 adaptive integration steps for the solution of the initial value problems on all 20 shooting intervals together. In this case, the spatial and temporal discretization errors are well balanced, because a high-order method is used in time.
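The stated degrees of freedom are consistent with a nested 1D hierarchy in which every coarse node reappears on the next finer grid and each node carries the two PDE states z and c. The following sketch reproduces the numbers under this assumed construction, starting from 161 nodes on the coarsest level; the starting node count and the function name are illustrative.

```python
# Hypothetical sketch: degrees of freedom on a six-level hierarchy of
# nested 1D grids on (0, 1), assuming two PDE states (z and c) per node
# and node counts n_{l+1} = 2*n_l - 1 (every coarse node is a fine node).
def dofs_per_level(n_coarse=161, levels=6, n_states=2):
    dofs = []
    n = n_coarse
    for _ in range(levels):
        dofs.append(n_states * n)
        n = 2 * n - 1  # halve the mesh width, keeping shared nodes
    return dofs

print(dofs_per_level())  # [322, 642, 1282, 2562, 5122, 10242]
```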
In Table 2 we summarize the cumulative CPU time needed for the computations concerning the initial value problems on each spatial discretization level. This is the main part of the computational burden; the solution of all the condensed QPs (14), for instance, only takes 0.6 s. We observe that most of the effort is spent on levels 1 and 6. The effort on level 1 is due to the coarse grid computations in each iteration. Due to DAESOL-II's memory requirements, we could not refine the fine grid further. Were we able to do so, the numerical effort spent on the finest grid level ℓ = 7 would dominate the overall effort even more.
7 Conclusions
We elaborated on the challenges and a numerical approach for a Direct Multiple Shooting method for optimal control problems with coupled ODE and parabolic PDE constraints. Even difficult boundary conditions like time periodicity of the states can be treated efficiently in this framework. The large-scale nature of the resulting NLPs can be tamed by a structure-exploiting two-grid Newton-Picard inexact SQP method. Encouraging numerical results indicate that challenging PDE constrained optimal control problems can be solved efficiently with a Direct Multiple Shooting method. The proposed method does not require the often time-consuming derivation of adjoint equations and can be easily parallelized. It is particularly suited for practitioners who need an accurate resolution of the system dynamics and can live with a lower resolution of the optimal control.
We would like to thank the anonymous reviewer for the constructive comments on the initial manuscript. Support by the EU through S. Engell's and H.G. Bock's ERC Advanced Investigator Grant MOBOCON (291 458) is gratefully acknowledged.
References
Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht-Karls-Universität Heidelberg (2010)
Albersmeyer, J., Bock, H.G.: Sensitivity generation in an adaptive BDF-method. In: Bock, H.G., Kostina, E., Phu, X.H., Rannacher, R. (eds.) Modeling, Simulation and Optimization of Complex Processes. Proceedings of the International Conference on High Performance Scientific Computing, pp. 15–24, Hanoi, 6–10 March 2006. Springer, Berlin (2008)
Bellman, R.E.: Dynamic Programming, 6th edn. Princeton University Press, Princeton (1957)
Biegler, L.T.: Solution of dynamic optimization problems by successive quadratic programming and orthogonal collocation. Comput. Chem. Eng. 8, 243–248 (1984)
Bock, H.G.: Recent advances in parameter identification techniques for ODE. In: Deuflhard, P., Hairer, E. (eds.) Numerical Treatment of Inverse Problems in Differential and Integral Equations, pp. 95–121. Birkhäuser, Boston (1983)
Bock, H.G., Plitt, K.J.: A multiple shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247, Budapest. Pergamon Press (1984)
Dautray, R., Lions, J.-L.: Evolution problems I. In: Craig, A. (ed.) Mathematical Analysis and Numerical Methods for Science and Technology, vol. 5. Springer, Berlin (1992)
Deuflhard, P.: Newton Methods for Nonlinear Problems. Affine Invariance and Adaptive Algorithms. Springer Series in Computational Mathematics, vol. 35. Springer, Berlin (2006)
Griewank, A., Walther, A.: Evaluating Derivatives, 2nd edn. SIAM, Philadelphia (2008)
Hesse, H.K.: Multiple shooting and mesh adaptation for PDE constrained optimization problems. Ph.D. thesis, University of Heidelberg (2008)
Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints. Springer, New York (2009)
Lebiedz, D., Brandt-Pollmann, U.: Manipulation of self-aggregation patterns and waves in a reaction-diffusion system by optimal boundary control strategies. Phys. Rev. Lett. 91(20), 208301 (2003)
Lust, K., Roose, D., Spence, A., Champneys, A.R.: An adaptive Newton-Picard algorithm with subspace iteration for computing periodic solutions. SIAM J. Sci. Comput. 19(4), 1188–1209 (1998)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Osborne, M.R.: On shooting methods for boundary value problems. J. Math. Anal. Appl. 27, 417–433 (1969)
Pontryagin, L.S., Boltyanski, V.G., Gamkrelidze, R.V., Miscenko, E.F.: The Mathematical Theory of Optimal Processes. Wiley, Chichester (1962)
Potschka, A.: A direct method for the numerical solution of optimization problems with time-periodic PDE constraints. Ph.D. thesis, Universität Heidelberg (2011)
Potschka, A.: A Direct Method for Parabolic PDE Constrained Optimization Problems. Advances in Numerical Mathematics. Springer, Berlin (2013)
Potschka, A., Bock, H.G., Schlöder, J.P.: A minima tracking variant of semi-infinite programming for the treatment of path constraints within direct solution of optimal control problems. Optim. Methods Softw. 24(2), 237–252 (2009)
Potschka, A., Mommer, M.S., Schlöder, J.P., Bock, H.G.: Newton-Picard-based preconditioning for linear-quadratic optimization problems with time-periodic parabolic PDE constraints. SIAM J. Sci. Comput. 34(2), 1214–1239 (2012)
Russell, R.D., Shampine, L.F.: A collocation method for boundary value problems. Numer. Math. 19, 1–28 (1972)
Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. SIAM, Philadelphia (2003)
Thomée, V.: Galerkin Finite Element Methods for Parabolic Problems. Springer Series in Computational Mathematics, vol. 25, 2nd edn. Springer, Berlin (2006)
Tröltzsch, F.: Optimale Steuerung partieller Differentialgleichungen: Theorie, Verfahren und Anwendungen, 2nd edn. Vieweg+Teubner Verlag, Wiesbaden (2009)
Tsang, T.H., Himmelblau, D.M., Edgar, T.F.: Optimal control via collocation and non-linear programming. Int. J. Control. 21, 763–768 (1975)
Tyson, R., Lubkin, S.R., Murray, J.D.: Model and analysis of chemotactic bacterial patterns in a liquid medium. J. Math. Biol. 38, 359–375 (1999). doi:10.1007/s002850050153
Tyson, R., Lubkin, S.R., Murray, J.D.: A minimal mechanism for bacterial pattern formation. Proc. R. Soc. B Biol. Sci. 266, 299–304 (1999)
Walther, A., Kowarz, A., Griewank, A.: ADOL-C: a package for the automatic differentiation of algorithms written in C/C++. Technical report, Institute of Scientific Computing, Technical University Dresden (2005)
Wloka, J.: Partial Differential Equations. Cambridge University Press, Cambridge (1987). Translated from the German by C.B. Thomas and M.J. Thomas
© 2015 Springer International Publishing Switzerland
Potschka, A. (2015). Direct Multiple Shooting for Parabolic PDE Constrained Optimization. In: Carraro, T., Geiger, M., Körkel, S., Rannacher, R. (eds) Multiple Shooting and Time Domain Decomposition Methods. Contributions in Mathematical and Computational Sciences, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-23321-5_6