Abstract
In this paper, we show how a special class of inverse optimal control problems of elliptic partial differential equations can be solved globally. Using the optimal value function of the underlying parametric optimal control problem, we transfer the overall hierarchical optimization problem into a nonconvex single-level one. Unfortunately, standard regularity conditions like Robinson’s CQ are violated at all the feasible points of this surrogate problem. It is, however, shown that locally optimal solutions of the problem solve a Clarke-stationarity-type system. Moreover, we relax the feasible set of the surrogate problem iteratively by approximating the lower level optimal value function from above by piecewise affine functions. This allows us to compute globally optimal solutions of the original inverse optimal control problem. The global convergence of the resulting algorithm is shown theoretically and illustrated by means of a numerical example.
1 Introduction
The aim of the inverse problem considered in this paper is the determination of finitely many parameters in the cost function of a given optimal control problem of a linear (partial) differential equation such that the resulting optimal state and control minimize a given superordinate functional, e.g. the distance to given data functions, see Example 2.1. In the context of human locomotion, similar inverse problems of ordinary differential equations are considered in [2, 3, 28].
Due to their structure, the inverse problems of interest turn out to be so-called bilevel optimal control problems, i.e. hierarchical optimization problems with two decision levels where at least one decision maker has to solve an optimal control problem, see e.g. the monographs [4, 7, 10, 32] and [19, 25, 33, 34] for detailed introductions to bilevel programming and optimal control (of ordinary as well as partial differential equations), respectively. Some more applications of bilevel optimal control can be found in [12, 17, 22, 24] while necessary optimality conditions are the subject of e.g. [5, 26, 27, 35, 36]. First steps regarding inverse optimal control of partial differential equations were carried out recently, see [16, 20]. Therein, the authors heavily exploit the uniqueness of the lower level solution for any fixed upper level variable and the properties of the associated solution operator. Note that optimal control problems with variational inequality constraints like the optimal control of the obstacle problem can be interpreted as bilevel optimal control problems as well, see [15] and the references therein.
To the best of our knowledge, there mainly exist methods for the numerical handling of inverse optimal control problems of ordinary differential equations, see e.g. [2, 3, 17, 18]. These algorithms focus on the replacement of the lower level optimal control problem by means of optimality conditions. This, however, is a delicate approach since the resulting surrogate problem is not necessarily equivalent to the original bilevel programming problem anymore, see [8], where the authors deal with this issue in the finite-dimensional situation. Furthermore, there is an uncomfortable lack of convergence results.
In this paper, we will strike a different path to derive necessary optimality conditions and to state an algorithm which can be used to solve a special class of inverse optimal control problems of (partial) differential equations. For that purpose, the optimal value function of the parametric optimal control problem (OC(x)) is used. The idea of utilizing value functions in hierarchical programming dates back to [30]. Here, we first exploit the aforementioned optimal value function in order to transfer the given hierarchical model (IOC) into an equivalent single-level program. Although the resulting nonconvex surrogate problem does not satisfy standard constraint qualifications, see Lemma 5.1 and Example 5.1, necessary optimality conditions of Clarke-stationarity-type can be derived via a relaxation approach, see Theorem 5.2. In our setting, the lower level value function is convex which allows us to compute an upper approximation which is piecewise affine. This idea is taken from [9]. Afterwards, we can decompose the obtained surrogate problem into finitely many convex optimal control problems which enables us to solve the relaxed surrogate problem globally. In an iterative way, the upper approximation of the lower level value function is improved, see Algorithm 1 for details. Finally, it is shown that the proposed algorithm converges to a global solution of the underlying inverse optimal control problem, see Theorem 6.1.
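The convexity fact behind the piecewise affine upper approximation can be illustrated in one dimension: for a convex function (standing in for the lower level value function), the chordal interpolant through any set of nodes lies above the function, and nested refinement of the nodes tightens the overestimate. The following minimal sketch uses an arbitrary convex toy function and illustrative nodes; it is not the algorithm of Sect. 6.

```python
import numpy as np

# Convex toy function standing in for the lower level value function phi.
phi = lambda t: np.log(1.0 + np.exp(3.0 * t)) + 0.5 * t ** 2

nodes = np.linspace(-1.0, 1.0, 5)          # coarse partition of S = [-1, 1]

def upper(t):
    """Piecewise affine interpolant of phi through the nodes."""
    return np.interp(t, nodes, phi(nodes))

t_check = np.linspace(-1.0, 1.0, 401)
gap = upper(t_check) - phi(t_check)        # nonnegative by convexity of phi
```

Refining the node set (keeping the old nodes) can only decrease the pointwise gap, which is the mechanism by which the relaxed feasible sets shrink towards the true one.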
The remaining parts of the paper are organized as follows: In Sect. 2, the precise problem statement is presented. Afterwards, we clarify our notation and comment on some preliminary results in Sect. 3. Section 4 is dedicated to the study of the properties of the lower level optimal value function. In Sect. 5, we transfer the original inverse optimal control problem into an equivalent single-level optimal control problem by exploiting the aforementioned optimal value function. Furthermore, we discuss the properties of the resulting surrogate problem. Using a relaxation approach, necessary optimality conditions for (IOC) are derived for the special setting where \(U_{\text {ad}}\) is a box-constrained set in \(L^2(\varOmega )\). Finally, we present a solution algorithm in Sect. 6. Its global convergence is shown theoretically. An illustrative example is included in order to visualize the obtained theory. Some concluding remarks and perspectives of future research are presented in Sect. 7.
2 Problem statement
We consider the parametric optimal control problem (also called lower level problem)
whose parameters \(x\in \mathbb {R}^n\) have to be identified in the superordinate upper level optimization problem
Therein, \(\varPsi :\mathbb {R}^n\rightrightarrows {\mathcal {Y}}\times {\mathcal {U}}\) denotes the solution set mapping of the parametric optimization problem (OC(x)) and \(S\subset \mathbb {R}^n\) is a nonempty polytope, i.e. a compact polyhedron.
Before we comment on the inverse optimal control problem (IOC) in more detail, we state the fundamental assumptions of this paper.
Assumption 2.1
Let \({\mathcal {Y}}\), \({\mathcal {M}}\), and \({\mathcal {U}}\) be Hilbert spaces. Furthermore, the objective functional \(F:\mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\rightarrow \mathbb {R}\) is assumed to be continuously Fréchet differentiable and convex. The set \(S\subset \mathbb {R}^n\) is a nonempty polytope. Moreover, we fix an isomorphism \({\mathtt {A}}\in {\mathbb {L}}\left[ {\mathcal {Y}},{\mathcal {Y}}^\star \right] \) as well as continuous linear operators \({\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {Y}}^\star \right] \), \({\mathtt {C}}\in {\mathbb {L}}\left[ {\mathcal {Y}},{\mathcal {M}}\right] \), \({\mathtt {P}}\in {\mathbb {L}}\left[ \mathbb {R}^n,{\mathcal {M}}\right] \), and \({\mathtt {Q}}\in {\mathbb {L}}\left[ \mathbb {R}^n,{\mathcal {U}}\right] \). The set of feasible controls \(U_{\text {ad}}\subset {\mathcal {U}}\) is assumed to be nonempty, closed, and convex. Finally, the regularization parameter \(\sigma >0\) is fixed.
Due to these assumptions, the lower level objective function \(f:\mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\rightarrow \mathbb {R}\) is continuous and (jointly) convex.
Next, we present a specific setting of the data which shows the close relationship between (IOC) and inverse optimal control.
Example 2.1
Here, we comment on the inverse optimal control of Poisson’s equation. To this end, let \(\varOmega \subset \mathbb {R}^d\) be a bounded domain and choose the spaces \({\mathcal {Y}}:=H^1_0(\varOmega )\) as well as \({\mathcal {M}}={\mathcal {U}}:=L^2(\varOmega )\). For some observed state \(y_{\text {o}}\in L^2(\varOmega )\) and some observed control \(u_{\text {o}}\in L^2(\varOmega )\), we consider the tracking-type objective
We set \(S:=\varDelta ^n\) where \(\varDelta ^n\) represents the standard simplex, i.e.
Furthermore, let \({\mathtt {C}}\) be the continuous embedding \(H^1_0(\varOmega )\hookrightarrow L^2(\varOmega )\) and let \({\mathtt {Q}}\) be the zero operator. For fixed form functions \(f_1,\dots ,f_n\in L^2(\varOmega )\), we define \({\mathtt {P}}\in {\mathbb {L}}\left[ \mathbb {R}^n,L^2(\varOmega )\right] \) by means of
i.e. the objective of the lower level problem (OC(x)) takes the following form:
The operator \({\mathtt {B}} \in {\mathbb {L}}\left[ L^2(\varOmega ),H^{-1}(\varOmega )\right] \), where \(H^{-1}(\varOmega )\) denotes the dual space of \(H^1_0(\varOmega )\), is the canonical embedding, i.e., the adjoint of \({\mathtt {C}}\). Finally, \({\mathtt {A}} \in {\mathbb {L}}\left[ H_0^1(\varOmega ), H^{-1}(\varOmega )\right] \) equals the negative Laplace operator, i.e., we have \(\left\langle {\mathtt {A}}[y], v\right\rangle _{H_0^1(\varOmega )} = \int _\varOmega \nabla y(\omega ) \cdot \nabla v(\omega ) \, \mathrm {d}\omega \) for all \(y,v \in H_0^1(\varOmega )\).
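To make the setting of Example 2.1 concrete, the following is a minimal one-dimensional finite-difference sketch of the lower level problem for the special case \(U_{\text {ad}}={\mathcal {U}}\) (no control constraints), so that the reduced problem can be solved via its normal equations. The grid size, \(\sigma \), and the form functions below are illustrative choices, and the discrete \(L^2\) weighting is dropped for brevity.

```python
import numpy as np

m = 50                                     # interior grid points on (0, 1)
h = 1.0 / (m + 1)
grid = np.linspace(h, 1.0 - h, m)

# A ~ negative Laplacian with homogeneous Dirichlet conditions; B, C ~ identity
A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
S = np.linalg.inv(A)                       # control-to-observation operator

sigma = 1e-2
forms = np.column_stack([np.sin(np.pi * grid),
                         np.sin(2.0 * np.pi * grid)])  # form functions f_1, f_2

def P(x):
    """P[x] = sum_i x_i f_i."""
    return forms @ x

def lower_level(x):
    """Unique minimizer of 0.5*||S u - P x||^2 + 0.5*sigma*||u||^2."""
    u = np.linalg.solve(S.T @ S + sigma * np.eye(m), S.T @ P(x))
    return S @ u, u                        # state psi_y(x), control psi_u(x)

y, u = lower_level(np.array([0.7, 0.3]))
```

The returned pair satisfies the discrete state equation \(A[y]=B[u]\) and the first order optimality condition of the reduced problem.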
In [26], the author studies a bilevel programming problem which is closely related to (IOC) in terms of necessary optimality conditions. Exploiting the fact that under the postulated assumptions, (OC(x)) possesses a unique optimal solution for any instance of the parameter \(x\in \mathbb {R}^n\), the bilevel program is transferred into a single-level problem by inserting the solution mapping of the lower level problem into the upper level objective. It is shown that the lower level solution mapping is directionally differentiable as long as \(U_{\text {ad}}\) is polyhedric or, more generally, if the projection onto \(U_{\text {ad}}\) is directionally differentiable in the sense of Haraux, see [14]. Thus, the author is capable of deriving necessary optimality conditions for the bilevel programming problem via its implicit reformulation. However, the necessary constraint qualifications may fail to hold in the special setting (IOC) since the upper level variable is finite dimensional while the lower level variables are in general not.
Here, we want to use the so-called optimal value function \(\varphi :\mathbb {R}^n\rightarrow \mathbb {R}\) of the parametric optimization problem (OC(x)) defined by
in order to solve problem (IOC) globally. It is well known that the surrogate optimization problem
is equivalent to the model problem (IOC), see [7]. Utilizing value functions in bilevel programming dates back to [30].
3 Notation and preliminaries
3.1 Notation
In this paper, we equip \(\mathbb {R}^n\), the space of all real vectors with \(n\in {\mathbb {N}}\) components, with the Euclidean norm \(\left| \cdot \right| _{2}\). The Euclidean inner product of two vectors \(x,y\in \mathbb {R}^n\) is denoted by \(x\cdot y\). The sets \(\mathbb {R}^+_0\) and \(\mathbb {R}^-_0\) represent the nonnegative and nonpositive real numbers, respectively. For some arbitrary Banach space \({\mathcal {X}}\), \(\left\| \cdot \right\| _{{\mathcal {X}}}\) denotes its norm. The set \({\mathbb {B}}_{\mathcal {X}}^\varepsilon ({\bar{x}})\) is the closed \(\varepsilon \)-ball around \({\bar{x}}\in {\mathcal {X}}\) w.r.t. the norm in \({\mathcal {X}}\). Let \({\mathcal {X}}^\star \) be the (topological) dual of \({\mathcal {X}}\). Then, \(\left\langle \cdot , \cdot \right\rangle _{{\mathcal {X}}}:{\mathcal {X}}^\star \times {\mathcal {X}}\rightarrow \mathbb {R}\) expresses the associated dual pairing. For some set \(A\subset {\mathcal {X}}\), \({\text {conv}}\,A\), \({\text {cone}}\, A\), \({\text {cl}}\,A\), \({\text {int}}\,A\), and \(\partial A\) denote the convex hull of A, the smallest cone containing A, the closure of A, the interior of A, and the boundary of A, respectively. Furthermore, we define the polar cone of A by means of
Note that \(A^\circ \) is a nonempty, closed, convex cone. Now, assume that A is convex and fix a point \({\bar{x}}\in A\). We define the radial cone, the tangent (or Bouligand) cone, and the normal cone (in the sense of convex analysis) to A at \({\bar{x}}\) via
Note that \({\mathcal {N}}_A({\bar{x}})=(A-\{{\bar{x}}\})^\circ \) holds.
For some other Banach space \({\mathcal {Y}}\), \({\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {Y}}\right] \) is the Banach space of all bounded, linear operators mapping from \({\mathcal {X}}\) to \({\mathcal {Y}}\). For some operator \({\mathtt {F}}\in {\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {Y}}\right] \), \({\mathtt {F}}^\star \in {\mathbb {L}}\left[ {\mathcal {Y}}^\star ,{\mathcal {X}}^\star \right] \) denotes its adjoint. If \({\mathcal {X}}\) and \({\mathcal {Y}}\) are Hilbert spaces, then \({\mathtt {F}}^\star \in {\mathbb {L}}\left[ {\mathcal {Y}},{\mathcal {X}}\right] \) can also denote the so-called Hilbert space adjoint of \({\mathtt {F}}\) (which is obtained from the usual adjoint operator by identifying the spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\) with their respective dual spaces via the associated Riesz isomorphisms). The operators \({\mathtt {O}}\in {\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {Y}}\right] \) and \({\mathtt {I}}_{\mathcal {X}}\in {\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {X}}\right] \) represent the zero-operator (which maps all elements of \({\mathcal {X}}\) to the zero in \({\mathcal {Y}}\)) and the identity operator of \({\mathcal {X}}\), respectively. For Banach spaces \({\mathcal {X}}_1,\dots ,{\mathcal {X}}_n\) and \({\mathcal {Y}}_1,\dots ,{\mathcal {Y}}_m\) as well as operators \({\mathtt {F}}_{i,j}\in {\mathbb {L}}\left[ {\mathcal {X}}_j,{\mathcal {Y}}_i\right] \), \(i=1,\dots ,m\), \(j=1,\dots ,n\), we define the associated product operator in \({\mathbb {L}}\left[ {\mathcal {X}}_1\times \dots \times {\mathcal {X}}_n,{\mathcal {Y}}_1\times \dots \times {\mathcal {Y}}_m\right] \) by
for all \((x_j)_{j=1}^n\in {\mathcal {X}}_1\times \dots \times {\mathcal {X}}_n\). For a Hilbert space \({\mathcal {H}}\), an operator \({\mathtt {G}}\in {\mathbb {L}}\left[ {\mathcal {H}},{\mathcal {H}}^\star \right] \) is called elliptic (or coercive) if there is a constant \(\alpha >0\) such that
Recall that a mapping \(J:{\mathcal {X}}\rightarrow {\mathcal {Y}}\) is called Fréchet differentiable at \({\bar{x}}\in {\mathcal {X}}\) if there exists an operator \(J'({\bar{x}})\in {\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {Y}}\right] \), which satisfies
In case of existence, \(J'({\bar{x}})\) is called the Fréchet derivative of J at \({\bar{x}}\). If the mapping \({\mathcal {X}}\ni x\mapsto J'(x)\in {\mathbb {L}}\left[ {\mathcal {X}},{\mathcal {Y}}\right] \) is well-defined and continuous in a neighborhood of \({\bar{x}}\), then J is said to be continuously Fréchet differentiable at \({\bar{x}}\).
Finally, we would like to mention that for an arbitrary domain \(\varOmega \subset \mathbb {R}^d\), \(L^2(\varOmega )\) is used to represent the usual Lebesgue space of (equivalence classes of) measurable, square-integrable functions. As usual, \(H^1_0(\varOmega )\) denotes the closure of \(C^\infty _0(\varOmega )\), the set of all arbitrarily often continuously differentiable functions with compact support in \(\varOmega \), w.r.t. the common \(H^1\)-Sobolev norm, see [1] for details. We use \(H^{-1}(\varOmega ):=H^1_0(\varOmega )^\star \) for its dual.
3.2 Preliminary results
Now, we take a look at the optimization problem
where \(j:{\mathcal {X}}\rightarrow \mathbb {R}\) as well as \(J:{\mathcal {X}}\rightarrow {\mathcal {Y}}\) are continuously Fréchet differentiable mappings between Banach spaces \({\mathcal {X}}\) and \({\mathcal {Y}}\) while \(C\subset {\mathcal {Y}}\) is a nonempty, closed, convex set. A feasible point \({\bar{x}}\in {\mathcal {X}}\) of (P) satisfies the so-called Karush-Kuhn-Tucker (KKT for short) conditions if the following holds:
If \({\bar{x}}\) is a locally optimal solution of (P) which fulfills Robinson’s constraint qualification, i.e. the condition
is satisfied, then the KKT-conditions are valid, see [6, Theorem 3.9]. In the absence of Robinson’s constraint qualification, this result does not hold in general. Further information on constraint qualifications and necessary optimality conditions addressing optimization problems in Banach spaces can be found in [6, 31, 38].
In order to show the equivalence of certain constraint qualifications of Robinson-type in the setting of product structures, the following lemma will be useful. Similar results can be found in [26, Lemma 3.4, Corollary 3.5].
Lemma 3.1
Let Banach spaces \({\mathcal {X}}_1\), \({\mathcal {X}}_2\), \({\mathcal {Y}}_1\), and \({\mathcal {Y}}_2\) as well as sets \(U\subset {\mathcal {X}}_2\) and \(V\subset {\mathcal {Y}}_1\) be given. Moreover, let \({\mathtt {A}}\in {\mathbb {L}}\left[ {\mathcal {X}}_1,{\mathcal {Y}}_1\right] \), \({\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {X}}_2,{\mathcal {Y}}_1\right] \), \({\mathtt {C}}\in {\mathbb {L}}\left[ {\mathcal {X}}_1,{\mathcal {Y}}_2\right] \), and \({\mathtt {D}}\in {\mathbb {L}}\left[ {\mathcal {X}}_2,{\mathcal {Y}}_2\right] \) be linear operators such that \({\mathtt {C}}\) is an isomorphism. We consider the following conditions:
Then, (1a) and (1b) are equivalent.
Proof
We show both implications separately.
“\(\Longrightarrow \)”: Assume that (1a) is valid and choose \(y\in {\mathcal {Y}}_1\). Then, we find \(x\in {\mathcal {X}}_1\), \(u\in U\), and \(v\in V\) such that \({\mathtt {A}}[x]-{\mathtt {B}}[u]-v=y\) and \({\mathtt {C}}[x]+{\mathtt {D}}[u]=0\) are valid. Observing \(x=({\mathtt {C}}^{-1}\circ (-{\mathtt {D}}))[u]\), we obtain \(({\mathtt {A}}\circ {\mathtt {C}}^{-1}\circ (-{\mathtt {D}})+{\mathtt {B}})[u]-v=y\). Consequently, (1b) holds.
“\(\Longleftarrow \)”: Next, we suppose that (1b) is valid and choose \(y_1\in {\mathcal {Y}}_1\) as well as \(y_2\in {\mathcal {Y}}_2\) arbitrarily. Due to the validity of (1b), we find \({\tilde{u}}\in U\) and \({\tilde{v}}\in V\) such that
holds true. Let us set \({\tilde{x}}:={\mathtt {C}}^{-1}[y_2-{\mathtt {D}}[{\tilde{u}}]]\). Then, we have \({\mathtt {C}}[{\tilde{x}}]+{\mathtt {D}}[{\tilde{u}}]=y_2\) and
This shows that (1a) holds. \(\square \)
4 Properties of the lower level optimal value function
The following result follows from standard arguments which are, nevertheless, included for the reader’s convenience.
Lemma 4.1
There are Lipschitz continuous functions \(\psi ^y:\mathbb {R}^n\rightarrow {\mathcal {Y}}\) and \(\psi ^u:\mathbb {R}^n\rightarrow {\mathcal {U}}\) which satisfy \(\varPsi (x)=\{(\psi ^y(x),\psi ^u(x))\}\) for all \(x\in \mathbb {R}^n\), i.e. the solution set mapping of (OC(x)) is single-valued and Lipschitz continuous.
Proof
Let us first introduce the control-to-observation operator \({\mathtt {S}}:= {\mathtt {C}}\circ {\mathtt {A}}^{-1}\circ {\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {M}}\right] \) of the lower level optimal control problem (OC(x)). Eliminating the state variable y, the so-called reduced formulation of (OC(x)) is given by
Observing that its objective function is continuous, convex, and coercive for any choice of x while its feasible set is nonempty, closed, and convex, standard arguments show that the reduced problem possesses a unique optimal solution \(\psi ^u(x)\) for each \(x\in \mathbb {R}^n\). Defining \(\psi ^y:={\mathtt {A}}^{-1}\circ {\mathtt {B}}\circ \psi ^u\), we can deduce \(\varPsi (x)=\{(\psi ^y(x),\psi ^u(x))\}\).
It is easily seen that the globally optimal solution \(\psi ^u(x)\) of the reduced problem is the uniquely determined solution of the following variational inequality of the first kind:
Noting that the operator \({\mathtt {S}}^\star \circ {\mathtt {S}}+\sigma {\mathtt {I}}_{\mathcal {U}}\) is elliptic with constant \(\sigma >0\) it follows that
see [23, Theorem II.2.1]. Thus, \(\psi ^u\) is Lipschitz continuous. Exploiting the representation \(\psi ^y={\mathtt {A}}^{-1}\circ {\mathtt {B}}\circ \psi ^u\), the Lipschitz continuity of \(\psi ^y\) follows immediately. \(\square \)
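When \(U_{\text {ad}}\) is a box, the variational inequality characterizing \(\psi ^u(x)\) can be approximated numerically by a projected gradient iteration, since the projection onto a box is a componentwise clipping. This is not the method used in the paper; the matrix standing in for \({\mathtt {S}}\), the target vector, the bounds, and the step size below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
S = rng.standard_normal((m, m)) / m        # stand-in for S = C A^{-1} B
sigma = 0.5
target = rng.standard_normal(m)            # stand-in for the tracking term
ua, ub = -0.2 * np.ones(m), 0.2 * np.ones(m)

def solve_vi(iters=5000):
    """Minimize 0.5*||S u - target||^2 + 0.5*sigma*||u||^2 over [ua, ub]."""
    L = np.linalg.norm(S, 2) ** 2 + sigma  # Lipschitz constant of the gradient
    u = np.zeros(m)
    for _ in range(iters):
        grad = S.T @ (S @ u - target) + sigma * u
        u = np.clip(u - grad / L, ua, ub)  # projection onto the box U_ad
    return u

u_star = solve_vi()
```

Since the reduced objective is \(\sigma \)-strongly convex, the iteration converges linearly to the unique solution of the variational inequality.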
As a consequence of Lemma 4.1, the continuity of F, and the compactness of \(S\subset \mathbb {R}^n\), we obtain the following corollary from Weierstraß’ famous theorem.
Corollary 4.1
The inverse optimal control problem (IOC) possesses a globally optimal solution.
Note that throughout the remainder of the paper, we will use the mappings \(\psi ^y:\mathbb {R}^n\rightarrow {\mathcal {Y}}\) and \(\psi ^u:\mathbb {R}^n\rightarrow {\mathcal {U}}\) as introduced in Lemma 4.1.
In the following, we study the properties of the optimal value function \(\varphi \) in more detail. First, we show that \(\varphi \) is a convex function. The essentials of the proof date back to [11, Proposition 2.1].
Lemma 4.2
The optimal value function \(\varphi \) is convex.
Proof
Obviously, the function f is convex w.r.t. all variables. Defining
we easily see that
is valid. Clearly, M is convex.
Choose \({\tilde{x}},x'\in \mathbb {R}^n\) as well as \(\alpha \in [0,1]\) arbitrarily. Let us fix the corresponding lower level solutions \(({\tilde{y}},{\tilde{u}}):=(\psi ^y({\tilde{x}}),\psi ^u({\tilde{x}}))\) and \((y',u'):=(\psi ^y(x'),\psi ^u(x'))\). The joint convexity of f as well as the obvious fact \(\alpha M+(1-\alpha )M=M\) lead to
This completes the proof. \(\square \)
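The assertion of Lemma 4.2 can be checked numerically on a small quadratic instance with an unconstrained control (all data below are illustrative): the function \(\varphi (x)=\min _u\,\tfrac{1}{2}\Vert Su-Px\Vert ^2+\tfrac{\sigma }{2}\Vert u\Vert ^2\) should satisfy the midpoint convexity inequality along arbitrary pairs of parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 3
S = rng.standard_normal((m, m)) / m        # stand-in for the control-to-observation map
Pmat = rng.standard_normal((m, n))         # stand-in for the operator P
sigma = 0.1
H = S.T @ S + sigma * np.eye(m)

def phi(x):
    u = np.linalg.solve(H, S.T @ (Pmat @ x))   # unconstrained lower level solution
    return 0.5 * np.linalg.norm(S @ u - Pmat @ x) ** 2 + 0.5 * sigma * (u @ u)

pairs = [(rng.standard_normal(n), rng.standard_normal(n)) for _ in range(100)]
midpoint_convex = all(
    phi(0.5 * (a + b)) <= 0.5 * phi(a) + 0.5 * phi(b) + 1e-9 for a, b in pairs
)
```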
For later purposes, we formulate the KKT-system of the lower level problem (OC(x)) for a feasible point \((y,u)\in {\mathcal {Y}}\times {\mathcal {U}}\) of (OC(x)) for fixed parameter \(x\in \mathbb {R}^n\):
It is easily seen that for fixed \(x\in \mathbb {R}^n\), \(y:=\psi ^y(x)\), and \(u:=\psi ^u(x)\), the Lagrange multipliers \(p\in {\mathcal {Y}}\) and \(\lambda \in {\mathcal {U}}^\star \) are uniquely determined. Thus, we introduce mappings \(\phi ^p:\mathbb {R}^n\rightarrow {\mathcal {Y}}\) and \(\phi ^\lambda :\mathbb {R}^n\rightarrow {\mathcal {U}}^\star \) which assign to each \(x\in \mathbb {R}^n\) the respective Lagrange multipliers p and \(\lambda \) which characterize the associated lower level solution (y, u). Clearly, \(\phi ^p\) and \(\phi ^\lambda \) are Lipschitz continuous since \(\psi ^y\) and \(\psi ^u\) are Lipschitz continuous, see Lemma 4.1.
The next lemma shows the differentiability of \(\varphi \). We note that this result follows partially from [6, Theorem 4.13]. However, we include a proof for the reader’s convenience.
Lemma 4.3
The optimal value function \(\varphi \) is continuously Fréchet differentiable. At a given point \({\bar{x}}\in \mathbb {R}^n\), the associated Fréchet derivative is given as follows:
Proof
Fix a reference point \({\bar{x}}\in \mathbb {R}^n\) and choose \(x \in \mathbb {R}^n\) arbitrarily. For brevity, we will use the notation
We exploit that f is a quadratic functional and that the functions \(\psi ^y\) and \(\psi ^u\) are Lipschitz continuous. Thus, we obtain
Here, \((\cdot )\) abbreviates the argument \(({\bar{x}}, {\bar{y}}, {\bar{u}})\). By utilizing the optimality conditions of (OC(x)), we have
Together with \(y - {\bar{y}} = \mathtt {A}^{-1}[\mathtt {B}[u - {\bar{u}}]]\), this yields
Finally, we observe
due to the definition of the normal cone and the Lipschitz continuity of \(\psi ^u\) as well as \(\phi ^\lambda \). Together with the straightforward computation of \(f'_x(\cdot )\), this yields
and the proof is completed. \(\square \)
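The proof shows that the derivative of \(\varphi \) reduces to the partial derivative of f with respect to x, evaluated at the fixed lower level solution (the remaining terms cancel through the lower level optimality conditions). On a toy unconstrained instance with the state eliminated (all data below are illustrative), this envelope-type formula can be verified against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 3
S = rng.standard_normal((m, m)) / m        # stand-in for the control-to-observation map
Pmat = rng.standard_normal((m, n))         # stand-in for the operator P
sigma = 0.1
H = S.T @ S + sigma * np.eye(m)

def argmin_u(x):
    return np.linalg.solve(H, S.T @ (Pmat @ x))

def phi(x):
    u = argmin_u(x)
    return 0.5 * np.linalg.norm(S @ u - Pmat @ x) ** 2 + 0.5 * sigma * (u @ u)

def grad_phi(x):
    u = argmin_u(x)                        # minimizer held fixed (envelope theorem)
    return -Pmat.T @ (S @ u - Pmat @ x)    # d/dx of 0.5*||S u - P x||^2

x0 = np.array([0.3, -0.1, 0.5])
eps = 1e-6
fd = np.array([(phi(x0 + eps * e) - phi(x0 - eps * e)) / (2.0 * eps)
               for e in np.eye(n)])
```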
5 The optimal-value-reformulation
5.1 On the lack of regularity
Exploiting Lemma 4.3, we know that the optimal-value-reformulation (OVR) of (IOC) is an optimization problem with continuously Fréchet differentiable data. However, (OVR) is still a challenging problem due to the following observation.
Lemma 5.1
Robinson’s CQ is violated at each feasible point of (OVR).
Proof
We fix an arbitrary feasible point \(({\bar{x}},{\bar{y}},{\bar{u}})\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) of (OVR). In order to show the lemma’s assertion, it is sufficient to construct a nontrivial singular Lagrange multiplier, see [6, Proposition 3.16(i)].
Let us consider the smooth optimization problem
By definition of the optimal value function \(\varphi \), its infimal value is given by 0. The feasibility of \(({\bar{x}},{\bar{y}},{\bar{u}})\) for (OVR) yields \(f({\bar{x}},{\bar{y}},{\bar{u}})-\varphi ({\bar{x}})=0\), i.e. \(({\bar{x}},{\bar{y}},{\bar{u}})\) is a globally optimal solution of (3). Obviously, Robinson’s CQ is satisfied at each feasible point of problem (3). Thus, we find \(p\in {\mathcal {Y}}\) and \(\eta \in {\mathcal {U}}^\star \) which satisfy
Thus, \((0,1,p,\eta )\in \mathbb {R}^n\times \mathbb {R}\times {\mathcal {Y}}\times {\mathcal {U}}^\star \) is a nonvanishing singular Lagrange multiplier for (OVR) at \(({\bar{x}},{\bar{y}},{\bar{u}})\). Since \(({\bar{x}},{\bar{y}},{\bar{u}})\) was an arbitrarily chosen feasible point of (OVR), the proof is completed. \(\square \)
In [37], the authors try to overcome this well-known drawback of the optimal-value-reformulation of finite-dimensional bilevel programming problems by penalizing the violation of the constraint \(f(x,y,u)-\varphi (x)\le 0\) in the objective function. There exist several situations where this penalization is exact (e.g. if the lower level problem is fully linear) and, consequently, the KKT-conditions of the optimal-value-reformulation may serve as necessary optimality conditions for the original bilevel programming problem. However, in our situation, the lower level problem (OC(x)) is quadratic (and possibly infinite dimensional). In the following example, we show that even in the finite-dimensional setting, it is not promising to rely on the KKT-conditions of (OVR).
Example 5.1
We consider the state-reduced bilevel programming problem
The lower level solution operator and the lower level optimal value function are given as stated below:
One can easily check that the globally optimal solution of the bilevel programming problem is given by \(({\bar{x}},{\bar{u}})=(\tfrac{1}{2},\tfrac{1}{2})\). On the other hand, the KKT-conditions of the associated optimal-value-reformulation at \(({\bar{x}},{\bar{u}})\) are given by
and cannot be satisfied.
5.2 Relaxing the optimal value constraint
The violation of Robinson’s CQ at all the feasible points of (OVR) is caused by the constraint \(f(x,y,u)-\varphi (x)\le 0\) which is, in fact, fulfilled with equality for all feasible points due to the definition of the optimal value function. Thus, a natural idea to overcome this problem is to consider the relaxed problem
for a sequence of positive relaxation parameters \(\{\varepsilon _k\}_{k\in \mathbb {N}}\) converging to zero as \(k\rightarrow \infty \). In the results below, we discuss the features of this relaxation approach.
Lemma 5.2
Fix \(k\in \mathbb {N}\). Then, Robinson’s CQ is valid at each feasible point of (OVR\((\varepsilon _k)\)).
Proof
Let \(({\bar{x}},{\bar{y}},{\bar{u}})\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) be a feasible point of (OVR\((\varepsilon _k)\)). We consider two cases.
Case 1: \(f({\bar{x}},{\bar{y}},{\bar{u}})-\varphi ({\bar{x}})=\varepsilon _k\). Suppose on the contrary that Robinson’s CQ is violated at \(({\bar{x}},{\bar{y}},{\bar{u}})\). Then, we have
Exploiting the fact that \({\mathtt {A}}\) is a bijection, this leads to
where \(\bar{{\mathtt {S}}}:={\mathtt {A}}^{-1}\circ {\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {Y}}\right] \) is the solution operator of the state equation, see Lemma 3.1. The latter is only possible if
is valid. Clearly, this is a first order optimality condition for the optimization problem
Noting that f is convex while \(\bar{{\mathtt {S}}}\) is linear, the objective function of this optimization problem is convex in u. Consequently, \({\bar{u}}\) solves this problem globally. Due to the definition of \(\bar{{\mathtt {S}}}\), we have \({\bar{u}}=\psi ^u({\bar{x}})\) and \(\bar{{\mathtt {S}}}[{\bar{u}}]={\bar{y}}=\psi ^y({\bar{x}})\). As a result, \(f({\bar{x}},{\bar{y}},{\bar{u}})-\varphi ({\bar{x}})=0\) is valid. This, however, contradicts \(f({\bar{x}},{\bar{y}},{\bar{u}})-\varphi ({\bar{x}})=\varepsilon _k>0\).
Case 2: \(f({\bar{x}},{\bar{y}},{\bar{u}})-\varphi ({\bar{x}})<\varepsilon _k\). The condition
is trivially satisfied. Thus, Lemma 3.1 shows that
is valid as well. However, the latter equals Robinson’s CQ for (OVR\((\varepsilon _k)\)) at \(({\bar{x}},{\bar{y}},{\bar{u}})\). \(\square \)
Lemma 5.3
For each \(k\in \mathbb {N}\), the surrogate problem (OVR\((\varepsilon _k)\)) possesses an optimal solution.
Proof
Let \(\{(x_l,y_l,u_l)\}_{l\in \mathbb {N}}\) be a minimizing sequence of (OVR\((\varepsilon _k)\)), i.e.
where \(\alpha _k\in \mathbb {R}\cup \{-\infty \}\) denotes the infimal value of (OVR\((\varepsilon _k)\)).
Obviously, \(\{x_l\}_{l\in \mathbb {N}}\subset S\) is bounded. Using the feasibility of \((x_l,y_l,u_l)\) to (OVR\((\varepsilon _k)\)), we obtain
which yields
Since \(\{x_l\}_{l\in \mathbb {N}}\) is bounded while \(\varphi \) is continuous, see Lemma 4.3, \(\{u_l\}_{l\in \mathbb {N}}\) is bounded as well. Furthermore, we have \(y_l=({\mathtt {A}}^{-1}\circ {\mathtt {B}})[u_l]\) for each \(l\in \mathbb {N}\), i.e. the continuity of \({\mathtt {A}}^{-1}\circ {\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {Y}}\right] \) and the boundedness of \(\{u_l\}_{l\in \mathbb {N}}\) yield the boundedness of \(\{y_l\}_{l\in \mathbb {N}}\). Thus, the sequence \(\{(x_l,y_l,u_l)\}_{l\in \mathbb {N}}\) is bounded and, therefore, possesses a weakly convergent subsequence (w.l.o.g. we do not relabel this sequence) with weak limit point \(({\bar{x}},{\bar{y}},{\bar{u}})\) since \({\mathcal {Y}}\) and \({\mathcal {U}}\) are Hilbert spaces while \(\mathbb {R}^n\) is finite dimensional. Particularly, we have \(x_l\rightarrow {\bar{x}}\). The closedness of S and the convexity and closedness of \(U_{\text {ad}}\) lead to \({\bar{x}}\in S\) and \({\bar{u}}\in U_{\text {ad}}\). Furthermore, \({\mathtt {A}}[{\bar{y}}]-{\mathtt {B}}[{\bar{u}}]=0\) follows from the linearity and continuity of the operators \({\mathtt {A}}\) and \({\mathtt {B}}\). Since f is convex and continuous, it is weakly lower semicontinuous. Furthermore, \(\varphi \) is continuous. This yields
i.e. \(({\bar{x}},{\bar{y}},{\bar{u}})\) is feasible to (OVR\((\varepsilon _k)\)).
Finally, since F is convex and continuous, it is weakly lower semicontinuous as well and we obtain
Combining this observation with the feasibility of \(({\bar{x}},{\bar{y}},{\bar{u}})\) to (OVR\((\varepsilon _k)\)), we obtain that it is a global solution of this problem. This completes the proof. \(\square \)
Lemma 5.4
Fix \(k\in \mathbb {N}\). Let \((x,y,u)\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) be a feasible point of (OVR\((\varepsilon _k)\)) and set \({\bar{y}}:=\psi ^y(x)\) as well as \({\bar{u}}:=\psi ^u(x)\). Then, we have
where \(C>0\) is a constant independent of k, x, y, and u.
Proof
Exploiting the fact that \({\mathcal {U}}\) and \({\mathcal {M}}\) are Hilbert spaces while (y, u) is feasible to (OC(x)) and \(({\bar{y}},{\bar{u}})\) is optimal for (OC(x)), we obtain
which yields the first formula. The second one follows from \(y - {\bar{y}} = ({\mathtt {A}}^{-1}\circ {\mathtt {B}})[u - {\bar{u}}]\) with \(C:=\left\| {\mathtt {A}}^{-1}\circ {\mathtt {B}}\right\| _{{\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {Y}}\right] }\). \(\square \)
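The first estimate of Lemma 5.4 rests on the \(\sigma \)-strong convexity of the reduced lower level objective in u: a control whose lower level optimality gap is at most \(\varepsilon _k\) can deviate from \(\psi ^u(x)\) by at most \(\sqrt{2\varepsilon _k/\sigma }\) in norm. A quick numerical check on a toy unconstrained instance (all data below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 15
S = rng.standard_normal((m, m)) / m           # stand-in for the control-to-observation map
sigma = 0.2
target = rng.standard_normal(m)               # stand-in for the tracking term

def f(u):
    return 0.5 * np.linalg.norm(S @ u - target) ** 2 + 0.5 * sigma * (u @ u)

u_bar = np.linalg.solve(S.T @ S + sigma * np.eye(m), S.T @ target)
phi_val = f(u_bar)                            # lower level optimal value

violations = 0
for _ in range(200):
    u = u_bar + 0.5 * rng.standard_normal(m)  # arbitrary perturbed control
    eps = f(u) - phi_val                      # its optimality gap (>= 0)
    if np.linalg.norm(u - u_bar) > np.sqrt(2.0 * eps / sigma) + 1e-9:
        violations += 1
```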
Theorem 5.1
For each \(k\in \mathbb {N}\), let \(({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) be a globally optimal solution of (OVR\((\varepsilon _k)\)). Then, the sequence \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) possesses a convergent subsequence whose limit point is a globally optimal solution of (OVR) and, thus, of (IOC). Moreover, each accumulation point of \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) is a globally optimal solution of (OVR).
Proof
Since \(\{{\bar{x}}_k\}_{k\in \mathbb {N}}\subset S\) is bounded, it possesses a convergent subsequence (w.l.o.g. we use the same index again) with limit point \({\bar{x}}\) which lies in S since this set is closed. Let us set \({\bar{y}}:=\psi ^y({\bar{x}})\) and \({\bar{u}}:=\psi ^u({\bar{x}})\). Obviously, \(({\bar{x}},{\bar{y}},{\bar{u}})\) is feasible to (OVR).
Using Lemma 5.4, we find a constant \(C>0\) such that
holds. That is why we obtain
where \(L^u>0\) denotes the Lipschitz constant of \(\psi ^u\), see Lemma 4.1. Particularly, \(\{{\bar{u}}_k\}_{k\in \mathbb {N}}\) converges to \({\bar{u}}\). Similarly, we can show that \(\{{\bar{y}}_k\}_{k\in \mathbb {N}}\) converges to \({\bar{y}}\).
Now, let \((x,y,u)\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) be feasible to (OVR). Then, it is feasible to (OVR\((\varepsilon _k)\)) for all \(k\in \mathbb {N}\) which yields
Noting that \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) converges strongly to \(({\bar{x}},{\bar{y}},{\bar{u}})\) while F is continuous, we obtain \(F({\bar{x}},{\bar{y}},{\bar{u}})\le F(x,y,u)\). Since \(({\bar{x}},{\bar{y}},{\bar{u}})\) is feasible to (OVR), it must be a globally optimal solution of this problem.
We can reprise the above arguments in order to show that each accumulation point of the sequence \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) is a globally optimal solution of (OVR). \(\square \)
If, for some \(k\in \mathbb {N}\), a globally optimal solution of (OVR\((\varepsilon _k)\)) is feasible to (IOC), then it is a globally optimal solution of the latter problem as well, since the feasible set of (IOC) is contained in the feasible set of (OVR\((\varepsilon _k)\)).
In the upcoming Sect. 5.3, we show how the above theory can be used to derive necessary stationarity conditions for local minimizers of the inverse optimal control problem (IOC).
5.3 Derivation of stationarity conditions
We intend to derive stationarity conditions for (IOC). However, we will only consider the case where \({\mathcal {U}}:=L^2(\varOmega )\) holds for a bounded domain \(\varOmega \subset \mathbb {R}^d\) equipped with Lebesgue’s measure and \(U_{\text {ad}}\) is given as stated below:
Therein, \(u_a,u_b:\varOmega \rightarrow [-\infty ,\infty ]\) are measurable functions such that \(U_{\text {ad}}\) is nonempty.
First, we will formulate the desired stationarity conditions.
Definition 5.1
We say that a feasible point \(({\bar{x}}, {\bar{y}}, {\bar{u}})\in \mathbb {R}^n\times {\mathcal {Y}}\times L^2(\varOmega )\) of (OVR) is weakly stationary, W-stationary for short, if there exist multipliers \({\bar{p}}\in {\mathcal {Y}}\), \({\bar{\lambda }}\in L^2(\varOmega )\), \({\bar{z}}\in \mathbb {R}^n\), \({\bar{w}}\in L^2(\varOmega )\), \({\bar{\mu }}\in {\mathcal {Y}}\), \({\bar{\rho }}\in {\mathcal {Y}}\), and \({\bar{\xi }}\in L^2(\varOmega )\) which satisfy
If these multipliers additionally satisfy the condition
then \(({\bar{x}},{\bar{y}},{\bar{u}})\) is said to be Clarke-stationary, C-stationary for short.
Here, the measurable sets \(I^{a+}({\bar{u}}),I^{b-}({\bar{u}})\subset \varOmega \) are given by
and these sets are well-defined up to subsets of \(\varOmega \) possessing measure zero.
Note that (4f)–(4i) are the KKT-conditions for the lower level problem (OC(x)) for the fixed parameter \(x={\bar{x}}\). Obviously, these conditions are always satisfied for a feasible point of (OVR). The remaining conditions (4a)–(4e), (4j), and (4k) can be interpreted as the weak stationarity conditions of the equilibrium problem
which results from (IOC) replacing the implicit constraint \((y,u)\in \varPsi (x)\) by the lower level feasibility and optimality conditions.
We will derive the system of W-stationarity by using the KKT-conditions of (OVR\((\varepsilon _k)\)) and observing the behaviour of the system as k tends to infinity. For a fixed \(k\in \mathbb {N}\), let \((x_k,y_k,u_k)\in \mathbb {R}^n\times {\mathcal {Y}}\times L^2(\varOmega )\) be a globally optimal solution of (OVR\((\varepsilon _k)\)). Due to Lemma 5.2, we know that the point \((x_k,y_k,u_k)\) satisfies the KKT-conditions of (OVR\((\varepsilon _k)\)). Thus, there exist multipliers \(z_k\in \mathbb {R}^n\), \(\alpha _k\in \mathbb {R}\), \(p_k\in {\mathcal {Y}}\), and \(\lambda _k\in L^2(\varOmega )\) which solve the system
Recall that \(\psi ^y:\mathbb {R}^n\rightarrow {\mathcal {Y}}\) and \(\psi ^u:\mathbb {R}^n\rightarrow L^2(\varOmega )\) are the solution functions of the lower level (OC(x)), while \(\phi ^p:\mathbb {R}^n\rightarrow {\mathcal {Y}}\) and \(\phi ^\lambda :\mathbb {R}^n\rightarrow L^2(\varOmega )\) denote the associated Lagrange multiplier mappings defined via the KKT-system (2), see Sect. 4 for details.
Lemma 5.5
For each \(k\in \mathbb {N}\), let \((x_k,y_k,u_k)\in \mathbb {R}^n\times {\mathcal {Y}}\times L^2(\varOmega )\) be a global minimizer of (OVR\((\varepsilon _k)\)) and let \((z_k,\alpha _k,p_k,\lambda _k)\in \mathbb {R}^n\times \mathbb {R}\times {\mathcal {Y}}\times L^2(\varOmega )\) be the respective Lagrange multipliers from (7). We assume w.l.o.g. that \(\{(x_k,y_k,u_k)\}_{k\in \mathbb {N}}\) converges to a globally optimal solution \(({\bar{x}},{\bar{y}},{\bar{u}})\in \mathbb {R}^n\times {\mathcal {Y}}\times L^2(\varOmega )\) of (IOC), see Theorem 5.1. Then, there exist \({\bar{z}}\in \mathbb {R}^n\), \({\bar{w}}\in L^2(\varOmega )\), \({\bar{\rho }}\in {\mathcal {Y}}\), \({\bar{\xi }}\in L^2(\varOmega )\), and \({\bar{\mu }}\in {\mathcal {Y}}\) such that the convergences
hold along a subsequence. Moreover, the limits \({\bar{z}}\), \({\bar{w}}\), \({\bar{\xi }}\), \({\bar{\mu }}\), and \({\bar{\rho }}\) satisfy the conditions (4a)–(4e).
Proof
Exploiting the introduced notation, we notice that
is valid for each \(k\in \mathbb {N}\). Let us focus on (8a). Multiplying (9a) and (9b) with \(\alpha _k\) and subtracting these equations from (7b) and (7c) yields
Testing (10a) with \(y_k-\psi ^y(x_k)\) gives us
and, therefore,
holds for a constant \(C_1>0\) that is independent of k. If we use this after testing (10b) with \(u_k-\psi ^u(x_k)\), we obtain
for a constant \(C_2>0\) that is independent of k. Here, the first inequality follows from \(\alpha _k\ge 0\), \(\phi ^\lambda (x_k)\in {\mathcal {N}}_{U_{\text {ad}}}(\psi ^u(x_k))\), and \(\lambda _k\in {\mathcal {N}}_{U_{\text {ad}}}(u_k)\). From above, it follows that \(\alpha _k(u_k-\psi ^u(x_k))\) is bounded in \(L^2(\varOmega )\). Hence, we can extract a weakly convergent subsequence and denote its weak limit by \({\bar{w}}\in L^2(\varOmega )\) in order to satisfy (8a).
Due to \({\mathtt {A}}[y_k-\psi ^y(x_k)]={\mathtt {B}}[u_k-\psi ^u(x_k)]\), we obtain (8b) and (4d) from (8a). Since F is continuously Fréchet differentiable, it follows from (10a) that \({\mathtt {A}}^\star [p_k-\alpha _k\phi ^p(x_k)]\) converges weakly. Since \({\mathtt {A}}\) is continuously invertible, the sequence \(\{p_k-\alpha _k\phi ^p(x_k)\}_{k\in \mathbb {N}}\) is weakly convergent as well. Thus, defining \({\bar{\rho }}\) as its weak limit yields (8c) and (4b). Using the newly obtained convergence in (10b), we get that \(\lambda _k-\alpha _k\phi ^\lambda (x_k)\) converges weakly in \(L^2(\varOmega )\). The weak limit \({\bar{\xi }}\) satisfies (8d) and (4c).
It remains to show the convergence of \(z_k\). Using Lemma 4.3, we obtain
Then, the previous convergences and (7a) imply (8e) and (4a). Finally, (4e) follows from \(x_k\rightarrow {\bar{x}}\) and \(z_k\rightarrow {\bar{z}}\) due to the properties of the normal cone. \(\square \)
Lemma 5.6
We consider the setting of Lemma 5.5. Then, the multipliers \({\bar{\xi }},{\bar{w}}\in L^2(\varOmega )\) satisfy (4j) and (4k).
Proof
We start with proving (4j). W.l.o.g., we assume that \(\{u_k\}_{k\in \mathbb {N}}\) and \(\{\psi ^u(x_k)\}_{k\in \mathbb {N}}\) converge pointwise almost everywhere to \({\bar{u}}\) (otherwise, we select a subsequence with that property). Due to \(\lambda _k\in {\mathcal {N}}_{U_{\text {ad}}}(u_k)\) and \(\phi ^\lambda (x_k)\in {\mathcal {N}}_{U_{\text {ad}}}(\psi ^u(x_k))\), we have
for all \(k\in \mathbb {N}\). Thus, the pointwise convergence of \(\{u_k\}_{k\in \mathbb {N}}\) and \(\{\psi ^u(x_k)\}_{k\in \mathbb {N}}\) implies that \( \lambda _k-\alpha _k\phi ^\lambda (x_k)\rightarrow 0\) holds almost everywhere on \(I^{a+}({\bar{u}})\cap I^{b-}({\bar{u}})\). Since \(\{\lambda _k-\alpha _k\phi ^\lambda (x_k)\}_{k\in \mathbb {N}}\) converges weakly in \(L^2(\varOmega )\) and pointwise almost everywhere on a subset, the limits must coincide on that subset, i.e. \({\bar{\xi }}=0\) holds almost everywhere on \(I^{a+}({\bar{u}})\cap I^{b-}({\bar{u}})\).
We continue with proving (4k). If \(\{\alpha _k\}_{k\in \mathbb {N}}\) has a bounded subsequence, this already implies \({\bar{w}}=0\) and (4k). Thus, we only need to consider the situation \(\alpha _k\rightarrow \infty \). The continuity of \(\phi ^\lambda \) implies \(\phi ^\lambda (x_k)\rightarrow {\bar{\lambda }}\) in \(L^2(\varOmega )\). Combining this with (8d) yields the convergence \(\alpha _k^{-1}\lambda _k\rightarrow {\bar{\lambda }}\) in \(L^2(\varOmega )\).
Now, let \(G\subset \varOmega \) be an arbitrary measurable set and let \(\chi _G\in L^\infty (\varOmega )\) denote the corresponding characteristic function which equals 1 on G and vanishes otherwise. The inequalities below follow immediately from the definition of the normal cone:
Using the strong convergence to \({\bar{\lambda }}\) and the weak convergence to \({\bar{w}}\) yields
Since \(G\subset \varOmega \) was an arbitrary measurable set, this implies (4k). \(\square \)
In the next lemma, we show that condition (5) holds under an additional assumption.
Lemma 5.7
We consider the setting of Lemma 5.5. If one of the operators \({\mathtt {B}}\) or \({\mathtt {C}}\) is compact, then the resulting multipliers \({\bar{\xi }},{\bar{w}}\in L^2(\varOmega )\) additionally satisfy the condition (5).
Proof
In this proof, we will only consider subsequences such that the convergences in Lemma 5.5 hold. Let \(G\subset \varOmega \) be an arbitrary measurable set and \(\chi _G\in L^\infty (\varOmega )\) the associated characteristic function.
Our first goal is to show
If, on the one hand, \({\mathtt {B}}\) is compact, then (11) follows immediately from (8a), (8c), and the definition of the adjoint. If, on the other hand, \({\mathtt {C}}\) is compact, then we obtain the strong convergence \(\alpha _k({\mathtt {C}}^\star \circ {\mathtt {C}})[y_k-\psi ^y(x_k)] \rightarrow ({\mathtt {C}}^\star \circ {\mathtt {C}})[{\bar{\mu }}]\). Using (10a), this implies the strong convergence \(p_k-\alpha _k\phi ^p(x_k)\rightarrow {\bar{\rho }}\), from which (11) follows.
Now, we can combine (11) with (4c), (10b), and the sequential weak lower semicontinuity of \(w\mapsto \left\langle \sigma w, \chi _Gw\right\rangle _{L^2(\varOmega )}\) which yields
Here, the last inequality follows from \(\lambda _k\in {\mathcal {N}}_{U_{\text {ad}}}(u_k)\) and \(\phi ^\lambda (x_k)\in {\mathcal {N}}_{U_{\text {ad}}}(\psi ^u(x_k))\). Since \(G\subset \varOmega \) is an arbitrary subset, the pointwise condition in the claim follows. \(\square \)
Now, we are in a position to state the final result of this section.
Theorem 5.2
Let \(({\bar{x}},{\bar{y}},{\bar{u}})\in \mathbb {R}^n\times {\mathcal {Y}}\times L^2(\varOmega )\) be a locally optimal solution of (IOC). Then, \(({\bar{x}},{\bar{y}},{\bar{u}})\) is a W-stationary point of (IOC).
If, additionally, one of the operators \({\mathtt {B}}\) or \({\mathtt {C}}\) is compact, then \(({\bar{x}},{\bar{y}},{\bar{u}})\) is already a C-stationary point of (IOC).
Proof
First, we note that due to Lemma 4.1, \({\bar{x}}\) is a locally optimal solution of
Let \(\varepsilon >0\) be the associated radius of local optimality w.r.t. the 1-norm. Then, \({\bar{x}}\) is the unique globally optimal solution of the regularized problem
where \({\mathbb {B}}^\varepsilon _1({\bar{x}}):=\{x\in \mathbb {R}^n\,|\,\left| x-{\bar{x}}\right| _{1}\le \varepsilon \}\) is the closed \(\varepsilon \)-ball around \({\bar{x}}\) w.r.t. the 1-norm. Consequently, \(({\bar{x}},{\bar{y}},{\bar{u}})\) is the unique global minimizer of
Now, we are in a position to apply Lemmas 5.5, 5.6, and 5.7 to (12) which allows us to infer that \(({\bar{x}},{\bar{y}},{\bar{u}})\) is a W- or even (under the additional compactness of \({\mathtt {B}}\) or \({\mathtt {C}}\)) C-stationary point of (12).
Noting that the first-order derivative of \(x\mapsto \tfrac{1}{2}\left| x-{\bar{x}}\right| _{2}^2\) vanishes at \({\bar{x}}\) while \({\bar{x}}\) is an interior point of \({\mathbb {B}}^\varepsilon _1({\bar{x}})\), the W- and C-stationarity conditions of (12) and (IOC) at the point \(({\bar{x}},{\bar{y}},{\bar{u}})\) coincide. This shows the theorem’s assertion. \(\square \)
We finish this section with two brief remarks. First, we comment on the compactness of \({\mathtt {B}}\) and \({\mathtt {C}}\). Afterwards, the actual strength of the derived necessary optimality conditions is discussed.
Remark 5.1
Let us consider the setting of Example 2.1. Therein, we fixed a bounded domain \(\varOmega \subset \mathbb {R}^d\) and set \({\mathcal {M}}={\mathcal {U}}:=L^2(\varOmega )\) as well as \({\mathcal {Y}}:=H^1_0(\varOmega )\). Furthermore, \({\mathtt {C}}\) was chosen to be the natural embedding of \(H^1_0(\varOmega )\) into \(L^2(\varOmega )\) which is known to be compact, see [1, Theorem 6.3].
Next, assume that \({\mathtt {A}}\in {\mathbb {L}}\left[ H^1_0(\varOmega ),H^{-1}(\varOmega )\right] \) is an elliptic operator while \({\mathtt {B}}\) is given by
Thus, \({\mathtt {B}}\) equals the natural embedding of \(L^2(\varOmega )\) into \(H^{-1}(\varOmega )\) which is compact since the adjoint embedding \(H^1_0(\varOmega )\hookrightarrow L^2(\varOmega )\) is compact.
Consequently, in the standard setting of elliptic optimal control, the operators \({\mathtt {B}}\) and \({\mathtt {C}}\) are compact. Thus, due to Theorem 5.2, each locally optimal solution of the associated program (IOC) is a C-stationary point.
Remark 5.2
In the setting where \({\mathcal {Y}}:=\mathbb {R}^m\) and \({\mathcal {U}}:=\mathbb {R}^p\) hold while the controls have to satisfy standard box-constraints at the lower level stage, the equilibrium problem (6) is a so-called mathematical program with complementarity constraints with affine data. Due to [13, Theorem 3.9], each local minimizer of this optimization problem is a so-called Mordukhovich-stationary point, and this is a stronger stationarity concept than Clarke-stationarity, see [13, Sect. 2]. As a consequence, the locally optimal solutions of the associated inverse optimal control problem are Mordukhovich-stationary points.
It remains a question of future research whether it is possible to extend the results from Theorem 5.2 to (pointwise) Mordukhovich-stationarity in the \(L^2\)-setting. Related issues w.r.t. other equilibrium-type optimal control problems are discussed in [15, 16].
6 Computing globally optimal solutions
The major drawback of Theorem 5.1 w.r.t. its applicability is the fact that the optimal value function \(\varphi \) is only implicitly known and, consequently, it is not clear how to solve the relaxed problems (OVR\((\varepsilon _k)\)) at any iterate of a potential algorithm.
Due to the convexity of \(\varphi \), see Lemma 4.2, we note that the optimal-value-reformulation (OVR) of (IOC) is a so-called DC-program where DC is the classical abbreviation for Difference of Convex Functions. This structure allows the construction of algorithms which can be used to find the global minimizers of (IOC), see [21, Sect. 5] and the references therein. However, the fact that \(\varphi \) is only implicitly known may induce some essential difficulties again when trying to adapt those methods directly. On the other hand, we note that the concavity of (OVR) only appears within the constraint \(f(x,y,u)-\varphi (x)\le 0\) and only w.r.t. the variable x.
In order to exploit this observation for the construction of a suitable solution algorithm, we approximate the convex optimal value function \(\varphi \) from above using a piecewise affine function \(\xi \) which exactly interpolates \(\varphi \) at an increasing number of points from S. Thus, we formulate a relaxed surrogate problem associated with (OVR) which is different from (OVR\((\varepsilon _k)\)). Since \(\xi \) is piecewise affine, our approach allows us to decompose the resulting surrogate problems into finitely many convex subproblems which are easy to solve. This idea is used in [9, Sect. 4] and [10, Sect. 3.6.2] to solve finite-dimensional bilevel programming problems with fully convex lower level problem.
6.1 The algorithm and its convergence properties
Let \(X:=\{x^1,\dots ,x^m\}\subset \mathbb {R}^n\) be a nonempty set such that \(S\subset {\text {int}}\,{\text {conv}}\,X\) holds true. The convexity of \(\varphi \) yields
Obviously, \(\xi _X:{\text {conv}}\,X\rightarrow \mathbb {R}\) is the optimal value function of a fully convex linear parametric optimization problem and, thus, convex and piecewise affine, see [29, Theorem 6.7, Theorem 6.9]. These properties also follow from the observation
where \({\text {epi}}\,\xi _X\) represents the epigraph of \(\xi _X\). Particularly, there exists a finite partition \(\{R_X^j\}_{j=1}^{r(X)}\) of \({\text {conv}}\,X\) into so-called regions of stability such that \(\xi _X\) is affine on every set \(R_X^j\), \(j=1,\dots ,r(X)\).
Note that for each \(i=1,\dots ,m\), we have \(\xi _X(x^i)=\varphi (x^i)\), i.e. \(\xi _X\) interpolates \(\varphi \) exactly at all the points in X. This observation gives rise to the formulation of Algorithm 1.
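The construction of \(\xi _X\) can be made concrete numerically. The following Python sketch evaluates \(\xi _X\) at a point x by solving the linear parametric program mentioned above, i.e. it minimizes \(\sum _i \lambda _i\varphi (x^i)\) over all convex combinations of the points in X representing x. The model function for \(\varphi \) and the point set X are invented for illustration; SciPy's linprog serves as LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def xi(points, phi_vals, x):
    """Evaluate the piecewise affine overestimator xi_X(x), i.e. solve the
    linear program  min sum_i lam_i*phi(x^i)  subject to
    sum_i lam_i*x^i = x,  sum_i lam_i = 1,  lam >= 0."""
    m = len(points)
    A_eq = np.vstack([points.T, np.ones(m)])          # (n+1) x m equality system
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    res = linprog(phi_vals, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, None)] * m)
    assert res.success
    return res.fun

phi = lambda x: x[0] ** 2 + x[1] ** 2                 # invented convex stand-in for varphi
X = np.array([[-2., -2.], [-2., 2.], [2., -2.], [2., 2.], [0., 0.]])
vals = np.array([phi(p) for p in X])

exact = xi(X, vals, [0.0, 0.0])    # at an interpolation point: xi_X = phi = 0
upper = xi(X, vals, [1.0, 0.0])    # in between only an upper bound: 4 >= phi = 1
```

For a convex \(\varphi \), the LP value at any \(x^i\in X\) coincides with \(\varphi (x^i)\), while at all other points of \({\text {conv}}\,X\) one merely obtains an overestimate; this mirrors the interpolation property stated above.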
Below, we comment on some features of Algorithm 1.
Remark 6.1
As in Lemma 5.3, it is possible to show that (OVR\((X_k)\)) possesses a global solution for each \(k\in \mathbb {N}\). As mentioned above, at each iteration \(k\in \mathbb {N}\), we can decompose S into \(r(k)\in \mathbb {N}\) regions of stability on each of which \(\xi _{X_k}\) is affine. Thus, (OVR\((X_k)\)) can be decomposed into r(k) convex subproblems which can be solved by exploiting standard methods.
Remark 6.2
For arbitrarily chosen \(k,k'\in \mathbb {N}\) satisfying \(k\le k'\), we have
In order to analyze the qualitative properties of Algorithm 1, we need the following lemma.
Lemma 6.1
Let \(X_1\subset \mathbb {R}^n\) be a finite set satisfying \(S\subset {\text {int}}\,{\text {conv}}\,X_1\). Then, there is a constant \(L>0\) such that \(\xi _X\) is Lipschitz continuous on the set S with modulus L for each set \(X\subset \mathbb {R}^n\) which is the union of \(X_1\) and some finite subset of S.
Proof
By definition of \(\varphi \), \(\xi _X\), and \(\xi _{X_1}\) the following relations are obvious:
Since \(\xi _{X_1}\) is continuous on the compact set \({\text {conv}}\,X_1\), its maximal value \(M\ge 0\) is well-defined and provides an upper bound for \(|\xi _X(x)|\) for all \(x\in {\text {conv}}\,X_1\).
Since we have \(S\subset {\text {int}}\,{\text {conv}}\,X_1\), there is some \(\varepsilon >0\) such that \(S+{\mathbb {B}}^\varepsilon _2(0)\subset {\text {conv}}\,X_1\) holds true. Here, \({\mathbb {B}}^\varepsilon _2(0)\) denotes the closed \(\varepsilon \)-ball in \(\mathbb {R}^n\) around 0 w.r.t. the Euclidean norm.
For \(x,y\in S\) satisfying \(x\ne y\), we define
By construction, we have \(z\in S+{\mathbb {B}}^\varepsilon _2(0)\subset {\text {conv}}\,X_1\) and \(x=(1-\alpha )y+\alpha z\). Noting that \(\xi _X\) is convex, we obtain \(\xi _X(x)\le \xi _X(y)+\alpha (\xi _X(z)-\xi _X(y))\) which yields the estimate \(\xi _X(x)-\xi _X(y)\le \alpha |\xi _X(z)-\xi _X(y)|\le 2\alpha M\). Thus, we obtain
Interchanging the roles of x and y yields that \(\xi _X\) is Lipschitz continuous on S with Lipschitz modulus \(L:=\tfrac{2M}{\varepsilon }\). Note that neither M nor \(\varepsilon \) depend on the precise choice of X. Thus, the proof is complete. \(\square \)
The next example shows that the requirement \(S\subset {\text {int}}\,{\text {conv}}\,X_1\) in Lemma 6.1 cannot be relaxed.
Example 6.1
Let us assume that \(n = 2\) and \(\varphi (x) = x_2^2\) hold. Further, we set
Now, we consider an increasing sequence \(\{\theta _k\}_{k\in \mathbb {N}} \subset (0,1)\subset \mathbb {R}\) with \(\theta _k\rightarrow 1\). For all \(k\in \mathbb {N}\), let us define \(X_k := X_1 \cup \bigcup _{i = 1}^{k-1} \{(\theta _i, 0)\}\subset \mathbb {R}^2\). Then, it can be checked that
is valid. In particular, the Lipschitz constant of \(\xi _{X_k}\) on S is given by \((1 - \theta _{k-1})^{-1}\). Clearly, this term is not bounded as \(k \rightarrow \infty \).
Now, we are well prepared to study the qualitative properties of Algorithm 1.
Theorem 6.1
Either, Algorithm 1 terminates after a finite number of steps producing a globally optimal solution, or it computes a sequence \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) of globally optimal solutions of (OVR\((X_k)\)). This sequence possesses a convergent subsequence whose limit point \(({\bar{x}},{\bar{y}},{\bar{u}})\) is a globally optimal solution of (OVR) and, thus, of (IOC). Moreover, each accumulation point of \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) is a globally optimal solution of (OVR).
Proof
The feasible set of (OVR\((X_k)\)) is larger than the feasible set of (OVR) for each \(k\in \mathbb {N}\). Thus, if \(({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\) is feasible to (OVR) for some \(k\in \mathbb {N}\), then this point must be a globally optimal solution of this problem.
Suppose that the algorithm does not terminate. Clearly, the sequence \(\{{\bar{x}}_k\}_{k\in \mathbb {N}}\) is bounded. The feasibility of \(({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\) to (OVR\((X_k)\)) and Remark 6.2 yield
Since \(\{{\bar{x}}_k\}_{k\in \mathbb {N}}\) is bounded while \(\xi _{X_1}\) is continuous on S, \(\{{\bar{u}}_k\}_{k\in \mathbb {N}}\) is bounded. From the relation \({\bar{y}}_k=({\mathtt {A}}^{-1}\circ {\mathtt {B}})[{\bar{u}}_k]\) for each \(k\in \mathbb {N}\) and the continuity of \({\mathtt {A}}^{-1}\circ {\mathtt {B}}\in {\mathbb {L}}\left[ {\mathcal {U}},{\mathcal {Y}}\right] \), \(\{{\bar{y}}_k\}_{k\in \mathbb {N}}\) is bounded as well. Consequently, \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) is bounded and, therefore, possesses a weakly convergent subsequence (without relabeling) with weak limit \(({\bar{x}},{\bar{y}},{\bar{u}})\). Particularly, we have \({\bar{x}}_k\rightarrow {\bar{x}}\). The closedness of S and the closedness and convexity of \(U_{\text {ad}}\) yield \({\bar{x}}\in S\) and \({\bar{u}}\in U_{\text {ad}}\). Furthermore, \({\mathtt {A}}[{\bar{y}}]-{\mathtt {B}}[{\bar{u}}]=0\) follows from the linearity and continuity of \({\mathtt {A}}\) and \({\mathtt {B}}\). Let \(L>0\) be the Lipschitz constant of \(\xi _{X_k}\) on S which, due to Lemma 6.1, does not depend on k. Since f is convex and continuous, it is weakly lower semicontinuous. Furthermore, \(\varphi \) is continuous by means of Lemma 4.3. Due to the construction of the algorithm, we have \(\xi _{X_k}({\bar{x}}_{k-1})=\varphi ({\bar{x}}_{k-1})\) for each \(k\in \mathbb {N}\) which yields
Therefore, \(({\bar{x}},{\bar{y}},{\bar{u}})\) is feasible to (OVR). Moreover, the sequence of lower level function values \(\{f({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) converges to \(f({\bar{x}},{\bar{y}},{\bar{u}})\). Combining this with the weak lower semicontinuity of the functionals given by \(\mathbb {R}^n\times {\mathcal {Y}}\ni (x,y)\mapsto \tfrac{1}{2}\left\| {\mathtt {C}}[y]-{\mathtt {P}}[x]\right\| _{{\mathcal {M}}}^2\in \mathbb {R}\) and \(\mathbb {R}^n\times {\mathcal {U}}\ni (x,u)\mapsto \tfrac{1}{2}\left\| u-{\mathtt {Q}}[x]\right\| _{{\mathcal {U}}}^2\in \mathbb {R}\) yields
That is why we have \(\left\| {\bar{u}}_k-{\mathtt {Q}}[{\bar{x}}_k]\right\| _{{\mathcal {U}}}\rightarrow \left\| {\bar{u}}-{\mathtt {Q}}[{\bar{x}}]\right\| _{{\mathcal {U}}}\). Recalling \({\bar{u}}_k-{\mathtt {Q}}[{\bar{x}}_k]\rightharpoonup {\bar{u}}-{\mathtt {Q}}[{\bar{x}}]\) and the fact that \({\mathcal {U}}\) is a Hilbert space, we already have \({\bar{u}}_k-{\mathtt {Q}}[{\bar{x}}_k]\rightarrow {\bar{u}}-{\mathtt {Q}}[{\bar{x}}]\). Since \({\bar{x}}_k\rightarrow {\bar{x}}\) and, thus, \({\mathtt {Q}}[{\bar{x}}_k]\rightarrow {\mathtt {Q}}[{\bar{x}}]\) holds true, we obtain the strong convergence \({\bar{u}}_k\rightarrow {\bar{u}}\) in \({\mathcal {U}}\). Applying the operator \(\mathtt {A}^{-1}\circ {\mathtt {B}}\), the convergence \({\bar{y}}_k\rightarrow {\bar{y}}\) in \({\mathcal {Y}}\) follows.
Pick an arbitrary feasible point \((x,y,u)\in \mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\) of (OVR). By construction, we have
Since F is continuous while \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) converges strongly to \(({\bar{x}},{\bar{y}},{\bar{u}})\), we obtain
This shows that \(({\bar{x}},{\bar{y}},{\bar{u}})\) is a globally optimal solution of (OVR) and, consequently, of (IOC).
The above arguments can be reprised in order to show that each accumulation point of the sequence \(\{({\bar{x}}_k,{\bar{y}}_k,{\bar{u}}_k)\}_{k\in \mathbb {N}}\) is a globally optimal solution of (OVR). \(\square \)
Next, we present a counterexample which shows that Theorem 6.1 will not necessarily hold if we only require \(S \subset {\text {conv}}\,X_1\) in Algorithm 1.
Example 6.2
We consider a finite-dimensional version of (IOC). In particular, we investigate the upper level problem
in which \(\varPsi :\mathbb {R}^2 \rightrightarrows \mathbb {R}\) is the solution map of the (state-reduced) lower level problem
The set \(X_1\) is chosen as in Example 6.1. It is clear that \(\varPsi (x) = \{x_1\}\) and \(\varphi (x) = x_2^2\) hold for all \(x \in S\). Thus, the unique global solution of (13) is given by \({\bar{x}} := (0,0)\) and \({\bar{u}} := 0\).
Now, we will check that the sequence \(\{({\bar{x}}_k, {\bar{u}}_k)\}_{k\in \mathbb {N}}\) generated by Algorithm 1 (adapted to the problem at hand, cf. [9]) is recursively given by
with initial data \({\bar{x}}_0 = 0\). The convergences \({\bar{x}}_k \rightarrow (1,0)\) and \({\bar{u}}_k \rightarrow 1 + \tfrac{\sqrt{2}}{2}\) can be checked. This limit point is not a feasible point of (13).
In order to check that the algorithm produces the above iterates, we consider the subproblem
with \(X := X_1 \cup \{(\alpha , 0)\}\) where \(\alpha \in [0,1)\). As in Example 6.1, we find
Thus, (15) can be decomposed into three convex problems and, due to symmetry, we have to solve two of them. The first problem is
It can be checked (e.g. via the KKT-conditions) that its unique solution is given by
Moreover, the objective value can be bounded from above by
The second subproblem for (15) is
The last constraint implies
Thus, for a feasible point (x, u) of this second subproblem, we can bound the objective by
Hence, the global solution of (15) is given by (16).
Now, the problem solved by Algorithm 1 in step k is precisely (15) with \(\alpha = ({\bar{x}}_{k-1})_1\), see also the calculation in Example 6.1.
Finally, we mention that Example 6.2 shows that [9, Theorem 4.3] does not hold as stated. In the proof, the authors claim that \(\xi _{X_k}\) possesses the same Lipschitz constant as \(\varphi \) on S, but this is not true, see also Example 6.1.
6.2 A numerical example
Now, we are going to present a numerical example of an inverse optimal control problem with a PDE constraint. This will illustrate the performance of Algorithm 1. We will use an example where the reduced upper level objective function \(x \mapsto F(x, \psi ^y(x), \psi ^u(x))\) has multiple local minimizers and is not differentiable at the global minimum.
The problem under investigation is very similar to Example 2.1. First, we fix the domain \(\varOmega :=(-1,1)\times (-1,1)\) and set \(n:=2\), \({\mathcal {Y}}:=H_0^1(\varOmega )\), and \({\mathcal {M}}={\mathcal {U}}:=L^2(\varOmega )\). Moreover, we define \(f_1, f_2 \in L^2(\varOmega )\) via
for all \(\omega =(\omega _1,\omega _2)\in \varOmega \) and choose \(\sigma :=0.01\) for the regularization parameter. Let us set
In order to construct a nonsmooth and nonconvex example, we first set \({\tilde{x}}:=(0.3,0.3)\) and define \({\tilde{u}}\in L^2(\varOmega )\) to be the unique solution of the optimal control problem without control constraints
Actually, we do not solve (17) but its finite element discretization. For the discretization, we use piecewise linear finite elements (with mesh size 0.1) for the state y and the control u. In order to obtain a coefficientwise projection formula (for problem (19) below), we use mass lumping for the control variables.
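The role of mass lumping can be illustrated with a small NumPy sketch in a hypothetical 1D P1 setting (not the actual 2D discretization used here): replacing the consistent mass matrix by its diagonal row-sum lumping makes the discrete best approximation in a box decouple, which is precisely what yields a coefficientwise projection formula for the control.

```python
import numpy as np

n, h = 6, 0.2                               # hypothetical 1D P1 mesh: 5 cells on [0,1]
M = np.zeros((n, n))                        # consistent P1 mass matrix
for e in range(n - 1):                      # element mass matrix h/6 * [[2,1],[1,2]]
    M[e:e + 2, e:e + 2] += h / 6 * np.array([[2.0, 1.0], [1.0, 2.0]])
M_lumped = np.diag(M.sum(axis=1))           # mass lumping: diagonal of row sums

# with the diagonal lumped matrix, the constrained best approximation
#   min_v (v - u)^T M_lumped (v - u)   s.t.   ua <= v <= ub
# decouples into scalar problems; each one is solved by clipping the coefficient
u = np.array([-2.0, 0.3, 1.8, 0.7, -0.1, 2.5])
ua, ub = 0.0, 1.0
v = np.clip(u, ua, ub)
```

With the consistent (tridiagonal) matrix M, the box-constrained best approximation would couple neighbouring coefficients, so no such pointwise formula would be available.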
Next, we define the lower and upper bounds \(u_a,u_b\in L^2(\varOmega )\) by
These bounds will be used in the lower level problem in order to get a nonsmooth and nonconvex example.
Now, we consider the inverse optimal control problem
where \(\varPsi :\mathbb {R}^2\rightrightarrows H^1_0(\varOmega )\times L^2(\varOmega )\) denotes the solution map of the parametric optimal control problem
For problem (19), we use the same discretization as for (17).
We give some comments concerning the construction of this problem. Since \({\tilde{u}}\) is the optimal solution of problem (17) and satisfies \(u_a \le {\tilde{u}} \le u_b\) almost everywhere on \(\varOmega \) (by choice of the bounds), it is also the solution of the lower level problem (19) for \(x = {\tilde{x}}\). Moreover, since \({\tilde{u}}\) even solves the unconstrained problem (17), the bounds in (19) are only weakly active, i.e., the multipliers corresponding to these bounds vanish. Thus, we can expect that the function \(x \mapsto \psi ^u(x)\) is only directionally differentiable at \({\tilde{x}}\). Finally, the objective function of the upper level problem (18) is chosen such that \({\tilde{x}}\) is indeed a global minimizer.
For convenience, we denote the reduced objective function of the upper level problem by
We can see in Fig. 1 that this construction yields a reduced objective function \(\gamma \) with a global minimizer at \({\bar{x}} = {\tilde{x}}=(0.3,0.3)\). Moreover, \(\gamma \) is nonsmooth at \({\bar{x}}\) and has further local minimizers at (0, 0) and approximately at (0.36, 0).
We initialize the algorithm with the choice
in order to guarantee \(S\subset {\text {int}}\,{\text {conv}}\,X_1\). Then, we run 1000 iterations of Algorithm 1. Because we solve a relaxed optimization problem in each iteration, we obtain (increasing) lower bounds on the optimal value of the bilevel problem (18). On the other hand, calculating \(\gamma ({\bar{x}}_k)\) yields upper bounds. Since \({\bar{x}}_k\) is the solution of the relaxed problem, the true value \(\gamma ({\bar{x}}_k)\) can be quite large. Therefore, we denote by \({\hat{x}}_k\) the point with the smallest known value of \(\gamma \) up to iteration k, i.e.,
This yields a decreasing upper bound \(\gamma ({\hat{x}}_k)\).
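The interplay of these bounds can be mimicked on a small, invented finite-dimensional toy instance of the method. All problem data below are hypothetical and chosen so that the lower level solution map \(\psi ^u\) and the value function \(\varphi \) are known in closed form; here \(f(x,u)\le \varphi (x)\) holds if and only if \(u=x/2\), the unique bilevel solution is \(x=1.2\), \(u=0.6\) with value 0.2, and SciPy's SLSQP solves the convex subproblems of each relaxation.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical toy data: lower level  min_u f(x,u)  with solution map psi
# and value function phi; upper level objective F
f = lambda x, u: 0.5 * (u - x) ** 2 + 0.5 * u ** 2
psi = lambda x: 0.5 * x                      # lower level solution map
phi = lambda x: f(x, psi(x))                 # optimal value function x**2/4
F = lambda x, u: (x - 1.0) ** 2 + (u - 1.0) ** 2

pts = [-2.0, 2.0]                            # X_1 with S = [-2,2] in its hull
for k in range(25):
    pts.sort()
    best = None
    # on each segment the overestimator xi is affine, so the relaxed
    # problem decomposes into convex subproblems (cf. Remark 6.1)
    for xl, xr in zip(pts[:-1], pts[1:]):
        a = (phi(xr) - phi(xl)) / (xr - xl)
        b = phi(xl) - a * xl
        res = minimize(lambda z: F(z[0], z[1]), x0=[(xl + xr) / 2, 0.0],
                       bounds=[(xl, xr), (None, None)],
                       constraints=[{'type': 'ineq',
                                     'fun': lambda z, a=a, b=b:
                                         a * z[0] + b - f(z[0], z[1])}])
        if res.success and (best is None or res.fun < best.fun):
            best = res
    xk = best.x[0]
    lower = best.fun                         # increasing lower bound (relaxation)
    upper = F(xk, psi(xk))                   # upper bound gamma(x_k), feasible value
    pts.append(xk)                           # refine the interpolation at x_k
# x_k tends to the global solution 1.2; both bounds tend to the optimal value 0.2
```

As in the PDE example, the lower bounds stem from the relaxed problems while the upper bounds come from evaluating the true reduced objective at the computed iterates.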
The convergence of these lower and upper bounds, as well as the difference between both bounds can be seen in Fig. 2a. Moreover, we present the triangulation of \({\text {conv}}\,X_1\) given by the regions of stability at iteration \(k=50\) in Fig. 3. Recall that the nodes of this triangulation are given by the set \(X_k := X_1 \cup \bigcup _{j = 1}^{k-1}\{{\bar{x}}_j\}\). Finally, we record the Euclidean distances \(\left| {\bar{x}}_k-{\bar{x}}\right| _{2}\) and \(\left| {\hat{x}}_k-{\bar{x}}\right| _{2}\) in Fig. 2b, and these results indicate that \({\bar{x}}_k\rightarrow {\bar{x}}\) holds as predicted by Theorem 6.1.
From these results, we can see that the convergence of the algorithm is comparatively slow. Particularly, we have to solve several auxiliary convex problems in each iteration [to obtain the global solution of (OVR\((X_k)\))]. This is, however, the price to be expected for the guaranteed convergence towards a global minimizer.
It is possible to improve the performance of the algorithm by reducing the curvature of the value function \(\varphi \). This can be done by using the modified function
instead of f as the objective function of the lower level problem (OC(x)). Here, we exploit the matrix
where \(\mathtt {S}:=\mathtt {C}\circ \mathtt {A}^{-1}\circ \mathtt {B}\) is the control-to-observation operator of (OC(x)). Note that this transformation changes neither the lower level solution mapping \(\varPsi \) nor the overall (local and global) solutions of the superordinate bilevel optimization problem. Although \({\tilde{f}}\) is no longer (jointly) convex on \(\mathbb {R}^n\times {\mathcal {Y}}\times {\mathcal {U}}\), it can be shown that it is convex on the subspace
Since the feasible set of the lower level problem is contained in this subspace, this convexity is sufficient for our purposes. Due to the reduced curvature of the resulting optimal value function \({\tilde{\varphi }}\), its approximation by piecewise affine functions is much simpler. In Fig. 4a, we can see that the gap between the upper and lower bounds closes faster than in the case without curvature reduction, cf. Fig. 2a. Likewise, the distances \(\left| {\bar{x}}_k-{\bar{x}}\right| _{2}\) and \(\left| {\hat{x}}_k-{\bar{x}}\right| _{2}\) converge faster as well, see Fig. 4b (cf. Fig. 2b).
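The mechanism exploited throughout, namely that a convex value function is approximated from above by piecewise affine interpolation on an increasing node set, can be illustrated by a one-dimensional toy sketch. This is not the paper's multi-dimensional construction; it merely demonstrates that, by convexity, the interpolant lies above the function and that refining the nodes tightens the approximation.

```python
def affine_upper_approx(phi, nodes):
    """Toy 1-D sketch: piecewise affine interpolant of a convex function.

    By convexity of `phi`, the chord between two nodes lies above the
    graph, so the interpolant is an upper approximation on the node hull.
    """
    nodes = sorted(nodes)
    vals = [phi(x) for x in nodes]
    def interp(x):
        for (a, fa), (b, fb) in zip(zip(nodes, vals), zip(nodes[1:], vals[1:])):
            if a <= x <= b:
                t = (x - a) / (b - a)      # convex combination weight
                return (1 - t) * fa + t * fb
        raise ValueError("point outside the convex hull of the nodes")
    return interp
```

Flatter functions (smaller curvature) are captured more accurately by such chords, which is exactly why the curvature reduction above accelerates the method.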
7 Conclusions
In this paper, a certain class of inverse optimal control problems has been studied. Using the optimal value function \(\varphi \) of the associated parametric optimal control problem, we were able to derive necessary optimality conditions of Clarke-stationarity-type in the setting where the controls are chosen from a standard box-constrained set in \(L^2(\varOmega )\). At the moment, it is not clear whether this result can be extended to (pointwise) Mordukhovich-stationarity. Afterwards, a global solution algorithm for the problem class of interest which exploits an iteratively refined piecewise affine upper approximation of \(\varphi \) has been suggested. Its convergence properties have been investigated theoretically and illustrated by means of an example from PDE control. Finally, we demonstrated that it is possible to speed up our algorithm by reducing the curvature of \(\varphi \).
Clearly, the computational realization of Algorithm 1 heavily exploits the finite-dimensional setting of the upper level variable x. In the future, it needs to be investigated whether some of our ideas can be carried over to the infinite-dimensional situation. The most difficult part of Algorithm 1 is the global solution of the nonconvex relaxed surrogate problems. While this can be done by decomposing the problem into finitely many convex optimization problems in the presence of a finite-dimensional upper level variable, it is not clear how this issue can be handled in the infinite-dimensional situation. Thus, a natural topic for future research is to study settings where these surrogate problems are solved only locally or up to some kind of stationarity.
References
Adams, R.A., Fournier, J.J.F.: Sobolev Spaces. Elsevier Science, Oxford (2003)
Albrecht, S., Leibold, M., Ulbrich, M.: A bilevel optimization approach to obtain optimal cost functions for human arm movements. Numer. Algebra Control Optim. 2(1), 105–127 (2012). https://doi.org/10.3934/naco.2012.2.105
Albrecht, S., Ulbrich, M.: Mathematical programs with complementarity constraints in the context of inverse optimal control for locomotion. Optim. Methods Softw. 32(4), 670–698 (2017). https://doi.org/10.1080/10556788.2016.1225212
Bard, J.F.: Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic, Dordrecht (1998)
Benita, F., Mehlitz, P.: Bilevel optimal control with final-state-dependent finite-dimensional lower level. SIAM J. Optim. 26(1), 718–752 (2016). https://doi.org/10.1137/15M1015984
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2000)
Dempe, S.: Foundations of Bilevel Programming. Kluwer, Dordrecht (2002)
Dempe, S., Dutta, J.: Is bilevel programming a special case of a mathematical program with complementarity constraints? Math. Program. 131(1), 37–48 (2012). https://doi.org/10.1007/s10107-010-0342-1
Dempe, S., Franke, S.: On the solution of convex bilevel optimization problems. Comput. Optim. Appl. 63(3), 685–703 (2016). https://doi.org/10.1007/s10589-015-9795-8
Dempe, S., Kalashnikov, V., Pérez-Valdéz, G., Kalashnykova, N.: Bilevel Programming Problems—Theory, Algorithms and Applications to Energy Networks. Springer, Berlin (2015)
Fiacco, A.V., Kyparisis, J.: Convexity and concavity properties of the optimal value function in parametric nonlinear programming. J. Optim. Theory Appl. 48(1), 95–126 (1986). https://doi.org/10.1007/BF00938592
Fisch, F., Lenz, J., Holzapfel, F., Sachs, G.: On the solution of bilevel optimal control problems to increase the fairness in air races. J. Guid. Control Dyn. 35(4), 1292–1298 (2012). https://doi.org/10.2514/1.54407
Flegel, M.L., Kanzow, C.: On M-stationary points for mathematical programs with equilibrium constraints. J. Math. Anal. Appl. 310(1), 286–302 (2005). https://doi.org/10.1016/j.jmaa.2005.02.011
Haraux, A.: How to differentiate the projection on a convex set in Hilbert space. Some applications to variational inequalities. J. Math. Soc. Jpn. 29(4), 615–631 (1977). https://doi.org/10.2969/jmsj/02940615
Harder, F., Wachsmuth, G.: Comparison of optimality systems for the optimal control of the obstacle problem. GAMM Mitt. 40(4), 312–338 (2018). https://doi.org/10.1002/gamm.201740004
Harder, F., Wachsmuth, G.: Optimality conditions for a class of inverse optimal control problems with partial differential equations. Optimization (2018). https://doi.org/10.1080/02331934.2018.1495205
Hatz, K.: Efficient Numerical Methods for Hierarchical Dynamic Optimization with Application to Cerebral Palsy Gait Modeling. Ph.D. thesis, University of Heidelberg, Germany (2014)
Hatz, K., Schlöder, J.P., Bock, H.G.: Estimating parameters in optimal control problems. SIAM J. Sci. Comput. 34(3), A1707–A1728 (2012). https://doi.org/10.1137/110823390
Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints. Springer, Berlin (2009)
Holler, G., Kunisch, K., Barnard, R.C.: A bilevel approach for parameter learning in inverse problems. Inverse Probl. 34(11), 1–28 (2018). https://doi.org/10.1088/1361-6420/aade77
Horst, R., Thoai, N.V.: DC programming: overview. J. Optim. Theory Appl. 103(1), 1–43 (1999). https://doi.org/10.1023/A:1021765131316
Kalashnikov, V., Benita, F., Mehlitz, P.: The natural gas cash-out problem: a bilevel optimal control approach. Math. Probl. Eng. (2015). https://doi.org/10.1155/2015/286083
Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic Press, New York (1980)
Knauer, M., Büskens, C.: Hybrid Solution Methods for Bilevel Optimal Control Problems with Time Dependent Coupling. In: M. Diehl, F. Glineur, E. Jarlebring, W. Michiels (eds.) Recent Advances in Optimization and its Applications in Engineering: The 14th Belgian-French-German Conference on Optimization, pp. 237–246. Springer, Berlin (2010). https://doi.org/10.1007/978-3-642-12598-0_20
Lewis, F.L., Vrabie, D., Syrmos, V.L.: Optimal Control. Wiley, Hoboken (2012)
Mehlitz, P.: Necessary optimality conditions for a special class of bilevel programming problems with unique lower level solution. Optimization 66(10), 1533–1562 (2017). https://doi.org/10.1080/02331934.2017.1349123
Mehlitz, P., Wachsmuth, G.: Weak and strong stationarity in generalized bilevel programming and bilevel optimal control. Optimization 65(5), 907–935 (2016). https://doi.org/10.1080/02331934.2015.1122007
Mombaur, K., Truong, A., Laumond, J.P.: From human to humanoid locomotion—an inverse optimal control approach. Auton. Robots 28(3), 369–383 (2010). https://doi.org/10.1007/s10514-009-9170-7
Nožička, F., Guddat, J., Hollatz, H., Bank, B.: Theorie der Linearen Parametrischen Optimierung. Akademie, Berlin (1974)
Outrata, J.V.: On the numerical solution of a class of Stackelberg problems. Z. für Oper. Res. 34(4), 255–277 (1990). https://doi.org/10.1007/BF01416737
Robinson, S.M.: Stability theory for systems of inequalities, part II: differentiable nonlinear systems. SIAM J. Numer. Anal. 13(4), 497–513 (1976). https://doi.org/10.1137/0713043
Shimizu, K., Ishizuka, Y., Bard, J.F.: Nondifferentiable and Two-Level Mathematical Programming. Kluwer Academic, Dordrecht (1997)
Tröltzsch, F.: Optimal Control of Partial Differential Equations. Vieweg, Wiesbaden (2009)
Troutman, J.L.: Variational Calculus and Optimal Control. Springer, New York (1996)
Ye, J.J.: Necessary conditions for bilevel dynamic optimization problems. SIAM J. Control Optim. 33(4), 1208–1223 (1995). https://doi.org/10.1137/S0363012993249717
Ye, J.J.: Optimal strategies for bilevel dynamic problems. SIAM J. Control Optim. 35(2), 512–531 (1997). https://doi.org/10.1137/S0363012993256150
Ye, J.J., Zhu, D.L.: Optimality conditions for bilevel programming problems. Optimization 33(1), 9–27 (1995). https://doi.org/10.1080/02331939508844060
Zowe, J., Kurcyusz, S.: Regularity and stability for the mathematical programming problem in Banach spaces. Appl. Math. Optim. 5(1), 49–62 (1979). https://doi.org/10.1007/BF01442543
This work is supported by the DFG Grant Analysis and Solution Methods for Bilevel Optimal Control Problems (Grant Nos. DE 650/10-1 and WA3636/4-1) within the Priority Program SPP 1962 (Non-smooth and Complementarity-based Distributed Parameter Systems: Simulation and Hierarchical Optimization).
Dempe, S., Harder, F., Mehlitz, P. et al. Solving inverse optimal control problems via value functions to global optimality. J Glob Optim 74, 297–325 (2019). https://doi.org/10.1007/s10898-019-00758-1
Keywords
- Bilevel optimal control
- Global optimization
- Inverse optimal control
- Optimality conditions
- Solution algorithm