1 Introduction

For some positive time \(T>0\) and the associated time interval \(I:=(0,T)\), we consider the following optimal control problem with terminal complementarity constraints

$$\begin{aligned} \begin{aligned} J(x,u)&\,\rightarrow \,\min \limits _{x,u}&\\ \mathtt S(u)-x&\,=\,0&\\ g_i(x(T))&\,\le \,0&i\in {\mathcal {L}}\\ 0\,\le \, G_j(x(T))\perp H_j(x(T))&\,\ge \,0&j\in {\mathcal {K}}. \end{aligned} \end{aligned}$$
(OCTCC)

Here, we use the index sets \({\mathcal {L}}:=\{1,\ldots ,l\}\) and \({\mathcal {K}}:=\{1,\ldots ,k\}\). Furthermore, the mapping \(\mathtt S:L^2(I)^m\rightarrow H^1(I)^n\) represents the operator which assigns to any \(u\in L^2(I)^m\) the uniquely determined solution \(x\in H^1(I)^n\) of the ODE-system

$$\begin{aligned} \begin{aligned} \dot{x}(t)-\mathbf Ax(t)-\mathbf Bu(t)&\,=\,0&\text {a.e.}\ \text { on }I\\ x(0)&\,=\,0.&\end{aligned} \end{aligned}$$
(ODE)

It is well known that \(\mathtt S\) is linear and continuous, see [2, Section 18]. For simplicity, the target-type objective functional \(J:H^1(I)^n\times L^2(I)^m\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} J(x,u):=f(x(T))+\tfrac{1}{2}\left\| x-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2+\tfrac{\sigma }{2}\left\| u-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2 \end{aligned}$$

for any \(x\in H^1(I)^n\) and \(u\in L^2(I)^m\) will be considered throughout this manuscript. However, it is possible to add some integral terms postulating additional assumptions. Note that terminal equality constraints can be easily added to the model as well.

In the recent paper [5], problem (OCTCC) was presented in a more general setting, where mixed control-state constraints were included as well. Some real-world applications from gas balancing on energy markets [15] or multi-agent control [4, 7, 16] motivate the study of (OCTCC). Clearly, model (OCTCC) belongs to the rich class of so-called mathematical programs with complementarity constraints, MPCCs for short.

The precise assumptions on (OCTCC) are stated below.

Assumption 1.1

The functions \(f,g_1,\ldots ,g_l,G_1,\ldots ,G_k,H_1,\ldots ,H_k:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) are continuously differentiable. Furthermore, f is bounded from below. The matrices \(\mathbf A\in {\mathbb {R}}^{n\times n}\) and \(\mathbf B\in {\mathbb {R}}^{n\times m}\) are chosen such that

$$\begin{aligned}\begin{bmatrix}\mathbf B&\mathbf A\mathbf B&\dots&\mathbf A^{n-1}\mathbf B\end{bmatrix}\in {\mathbb {R}}^{n\times nm}\end{aligned}$$

possesses full row rank n (i.e. the ODE-system in the constraints of (OCTCC) is controllable, see [3]). In order to exclude trivial situations, it is assumed that (OCTCC) possesses at least one feasible point. Finally, the regularization parameter \(\sigma >0\), the desired state \(\mathbf x_\text {d}\in H^1(I)^n\), and the desired control \(\mathbf u_\text {d}\in L^2(I)^m\) are fixed.

Under the postulated assumptions, (OCTCC) possesses an optimal solution, see [5, Theorem 5.1]. In [5], the authors derive necessary optimality conditions and constraint qualifications for (OCTCC) which were motivated by the rich theory on finite-dimensional complementarity programming, see [17, 19]. However, the manuscript does not answer the question how problem (OCTCC) can be treated numerically. The aim of this study is to close this gap. Therefore, we follow standard ideas and relax the complementarity constraints as suggested by Scholtes [21]. However, there exist several other relaxation techniques for the computational treatment of complementarity constraints, see [13]. In principle, the findings of this study can be extended to all these relaxation methods doing some obvious changes. Particularly, this manuscript justifies the application of well-known numerical methods for the treatment of finite-dimensional complementarity problems to solve (OCTCC). Note that we do not focus on numerical details of the proposed method’s implementation. Here, we abstain from a detailed numerical analysis associated with (OCTCC).

The remaining parts of the paper are structured as follows: Sect. 2 presents fundamental notation and recalls some preliminaries from [5]. Afterwards, we study Scholtes’ relaxation scheme for (OCTCC) in Sect. 3. First, a global convergence result is presented. Second, we consider the situation where only Karush–Kuhn–Tucker (KKT for short) points of the surrogate problems are computed. We visualize the proposed approach in Sect. 4 by means of two numerical examples where the second one represents a particular model from multi-agent control with terminal friction constraints. Finally, the paper is closed by some concluding remarks in Sect. 5.

2 Preliminaries

In this work, we basically exploit standard notation which has been used in [5] already. However, let us briefly recall that \(L^2(I)\) denotes the Banach space of all (equivalence classes of) scalar, measurable function on I whose square is Lebesgue integrable which is equipped with the usual norm \(\left\| \cdot \right\| _{L^2(I)}\). Furthermore, \(H^1(I)\) is the common Sobolev space of all functions \(x\in L^2(I)\) which possess a weak derivative \(\dot{x}\) which belongs to \(L^2(I)\) as well. It is equipped with the norm defined below:

$$\begin{aligned} \forall x\in H^1(I):\quad \left\| x\right\| _{H^1(I)}:=\left( \left\| x\right\| _{L^2(I)}+\left\| \dot{x}\right\| _{L^2(I)}\right) ^{1/2}. \end{aligned}$$

By \(L^2(I)^n\) and \(H^1(I)^n\) we denote the n-fold Cartesian product of \(L^2(I)\) and \(H^1(I)\), respectively. Let the mapping \(\mathtt E:H^1(I)^n\rightarrow L^2(I)^n\) be the associated natural embedding and denote by \(\mathtt E_T:H^1(I)^n\rightarrow {\mathbb {R}}^n\) the evaluation operator \(H^1(I)^n\ni x\mapsto x(T)\in {\mathbb {R}}^n\). For brevity, we will use the notation \(x_T:=x(T)=\mathtt E_T(x)\) for all \(x\in H^1(I)^n\) throughout the remaining parts of the paper. By means of [5, Lemma 4.2], \(\mathtt E\) and \(\mathtt E_T\) are compact.

Recall that \(\mathtt S:L^2(I)^m\rightarrow H^1(I)^n\) denotes the solution operator of (ODE). We set \(\overline{\mathtt S}:=\mathtt E\circ \mathtt S\) and \(\mathtt S_T:=\mathtt E_T\circ \mathtt S\). Note \(\overline{\mathtt S}:L^2(I)^m\rightarrow L^2(I)^n\) and \(\mathtt S_T:L^2(I)^m\rightarrow {\mathbb {R}}^n\). The controllability of the system (ODE) ensures that the operator \(\mathtt S_T\) is surjective. Thus, we already know that the associated adjoint \(\mathtt S_T^\star \) is injective. Below, the adjoint operators \(\overline{\mathtt S}^\star \) and \(\mathtt S_T^\star \) are characterized explicitly. These results are taken from [5, Lemmas 4.3, 4.4].

Lemma 2.1

  1. 1.

    For any \(v\in L^2(I)^n\), we have \(\overline{\mathtt S}^\star (v)=\mathbf B^\top p_1\) where \(p_1\in H^1(I)^n\) is the unique solution of the ODE-system

    $$\begin{aligned}\begin{aligned} \dot{p}_1(t)+\mathbf A^\top p_1(t)+v(t)&\,=\,0\qquad \text {a.e.}\ \text { on }I\\ p_1(T)&\,=\,0. \end{aligned}\end{aligned}$$
  2. 2.

    For any \(b\in {\mathbb {R}}^n\), we have \(\mathtt S_T^\star (b)=\mathbf B^\top p_2\) where \(p_2\in H^1(I)^n\) is the unique solution of the ODE-system

    $$\begin{aligned} \begin{aligned} \dot{p}_2(t)+\mathbf A^\top p_2(t)&\,=\,0\qquad \text {a.e.}\ \text { on }I\\ p_2(T)&\,=\,b. \end{aligned} \end{aligned}$$

The subsequent corollary follows easily from the above lemma.

Corollary 2.1

For \(v\in L^2(I)^n\), \(b\in {\mathbb {R}}^n\), and \(w\in L^2(I)^m\), the function \(p\in H^1(I)^n\) may solve the system

$$\begin{aligned}\begin{aligned} \dot{p}(t)+\mathbf A^\top p(t)+v(t)&\,=\,0\qquad \text {a.e.}\ \text { on }I\\ w(t)+\mathbf B^\top p(t)&\,=\,0\qquad \text {a.e.}\ \text { on }I\\ p(T)&\,=\,b. \end{aligned}\end{aligned}$$

Then, \(0=\overline{\mathtt S}^\star (v)+\mathtt S_T^\star (b)+w\) holds true.

3 Scholtes’ relaxation technique

The terminal complementarity constraints appearing in our model (OCTCC) make a direct numerical treatment difficult due to two main issues: First, the feasible set of (OCTCC) is almost disconnected. Second, the appearance of the terminal complementarity constraint implies that standard regularity assumptions from nonlinear programming are generally violated at the feasible points of (OCTCC). In order to overcome these difficulties, for a sequence \(\{\theta ^r\}_{r\in {\mathbb {N}}}\subset {\mathbb {R}}^+\) of positive relaxation parameters converging to zero, we consider the relaxed problem

figure a

This idea was introduced in [21] to solve standard complementarity problems in the finite-dimensional context. Qualitative results associated with this relaxation approach are presented in [13] as well. Noting that the complementarity requirement enters problem (OCTCC) only in terms of the terminal conditions, it is reasonable to think that some of these results can be generalized to the optimal control setting we are considering here.

Mimicking the proof of [5, Theorem 5.1], we easily see that (\(\hbox {OCTCC}(\theta ^r)\)) possesses an optimal solution for any \(r\in {\mathbb {N}}\). Clearly, the feasible set of (\(\hbox {OCTCC}(\theta ^r)\)) is a superset of the feasible set of (OCTCC). Consequently, if a local minimizer of (\(\hbox {OCTCC}(\theta ^r)\)) is already feasible to (OCTCC), then this point is a local minimizer of the latter.

Forthwith, we will present two types of convergence results. First, we investigate the situation where (\(\hbox {OCTCC}(\theta ^r)\)) can be solved globally for any \(r\in {\mathbb {N}}\). This might be possible if the structure of the terminal constraints appearing in (OCTCC) is not too difficult. Here, we show the natural result that the global minimizers of (\(\hbox {OCTCC}(\theta ^r)\)) converge strongly (along a subsequence) to a global minimizer of (OCTCC). Second, we examine the case where only KKT points of the surrogate problems (\(\hbox {OCTCC}(\theta ^r)\)) can be computed. Due to the nonconvexity of (\(\hbox {OCTCC}(\theta ^r)\)), this assumption is much more natural than the first one. Clearly, one cannot hope that the computed sequence converges to a local or even global minimizer of (OCTCC). However, we prove that under reasonable assumptions a sequence of KKT points associated with (\(\hbox {OCTCC}(\theta ^r)\)) converges (along a subsequence) to a so-called Clarke-stationary point of (OCTCC), see Definition 3.2 below. This seems to be a natural extension of a similar convergence result for finite-dimensional MPCCs obtained in [13, Theorem 3.1].

We start with the promised global convergence result.

Theorem 3.1

Let \(\{\theta ^r\}_{r\in {\mathbb {N}}}\subset {\mathbb {R}}^+\) be a sequence of positive relaxation parameters converging to zero. For any \(r\in {\mathbb {N}}\), let \((\bar{x}^r,\bar{u}^r)\in H^1(I)^n\times L^2(I)^m\) be a globally optimal solution of (\(\hbox {OCTCC}(\theta ^r)\)). Then, \(\{(\bar{x}^r,\bar{u}^r)\}_{r\in {\mathbb {N}}}\) possesses a convergent subsequence (without relabeling) whose limit point \((\bar{x},\bar{u})\in H^1(I)^n\times L^2(I)^m\) is a globally optimal solution of (OCTCC).

Proof

Fix an arbitrary feasible point \((x,u)\in H^1(I)^n\times L^2(I)^m\) of (OCTCC). Clearly, (xu) is feasible to (\(\hbox {OCTCC}(\theta ^r)\)) as well which yields

$$\begin{aligned} \begin{aligned}&f(\bar{x}^r_T)+\tfrac{1}{2}\left\| \bar{x}^r-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2+\tfrac{\sigma }{2}\left\| \bar{u}^r-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\\&\quad \le f(x_T)+\tfrac{1}{2}\left\| x-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2+\tfrac{\sigma }{2}\left\| u-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2 \end{aligned} \end{aligned}$$

for any \(r\in {\mathbb {N}}\) by definition of J. Noting that f is bounded from below while \(\sigma >0\) holds true, \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) must be bounded in \(L^2(I)^m\). Thus, it possesses a weakly convergent subsequence (without relabeling) with weak limit \(\bar{u}\in L^2(I)^m\). Set \(\bar{x}:=\mathtt S(\bar{u})\). Due to the continuity of \(\mathtt S\), \(\{\bar{x}^r\}_{r\in {\mathbb {N}}}\) converges weakly in \(H^1(I)^n\) to \(\bar{x}\). Since \(\mathtt E\) is compact, we obtain \(\bar{x}^r\rightarrow \bar{x}\) in \(L^2(I)^n\). Noting that \(\mathtt E_T\) is a compact operator as well, we have \(\bar{x}^r_T\rightarrow \bar{x}_T\) in \({\mathbb {R}}^n\). The continuity of g, G, and H as well as \(\theta ^r\rightarrow 0\) guarantee the feasibility of \((\bar{x},\bar{u})\) to (OCTCC).

Next, we exploit the above convergences, the continuity of f, as well as the weak lower semicontinuity of norms in order to obtain

$$\begin{aligned} J(\bar{x},\bar{u})&=f(\bar{x}_T)+\tfrac{1}{2}\left\| \bar{x}-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2+\tfrac{\sigma }{2}\left\| \bar{u}-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\\&\le \lim \limits _{r\rightarrow \infty }f(\bar{x}^r_T) +\lim \limits _{r\rightarrow \infty }\tfrac{1}{2}\left\| \bar{x}^r-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2 +\liminf \limits _{r\rightarrow \infty }\tfrac{\sigma }{2}\left\| \bar{u}^r-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\\&=\liminf \limits _{r\rightarrow \infty }\left( f(\bar{x}^r_T)+\tfrac{1}{2}\left\| \bar{x}^r-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2 +\tfrac{\sigma }{2}\left\| \bar{u}^r-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\right) \\&\le \limsup \limits _{r\rightarrow \infty }\left( f(\bar{x}^r_T)+\tfrac{1}{2}\left\| \bar{x}^r-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2 +\tfrac{\sigma }{2}\left\| \bar{u}^r-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\right) \\&\le f(x_T)+\tfrac{1}{2}\left\| x-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2+\tfrac{\sigma }{2}\left\| u-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2=J(x,u) \end{aligned}$$

Thus, \((\bar{x},\bar{u})\) solves (OCTCC) globally. Choosing \(x:=\bar{x}\) as well as \(u:=\bar{u}\) and exploiting \(f(\bar{x}^r_T)\rightarrow f(\bar{x}_T)\) as well as \(\bar{x}^r\rightarrow \bar{x}\) in \(L^2(I)^n\), we obtain \(\left\| \bar{u}^r-\mathbf u_\text {d}\right\| _{L^2(I)^m}\rightarrow \left\| \bar{u}-\mathbf u_\text {d}\right\| _{L^2(I)^m}\). We combine this with the weak convergence \(\bar{u}^r\rightharpoonup \bar{u}\) in \(L^2(I)^m\) and the property of \(L^2(I)^m\) to be a Hilbert space in order to get the strong convergence \(\bar{u}^r\rightarrow \bar{u}\) in \(L^2(I)^m\). Finally, the continuity of \(\mathtt S\) already yields \(\bar{x}^r\rightarrow \bar{x}\) in \(H^1(I)^n\) which completes the proof. \(\square \)

In practice, the computation of a globally optimal solution of the nonconvex relaxed surrogate problem (\(\hbox {OCTCC}(\theta ^r)\)) might be difficult. Instead, it is a nearby presumption that only KKT points of the surrogate problems can be identified. In the following definition, the KKT conditions of (\(\hbox {OCTCC}(\theta ^r)\)) are presented. The derivation of this system is omitted here since the necessary arguments mainly parallel those ones used in [5, Section 6].

Definition 3.1

For fixed \(r\in {\mathbb {N}}\), let \((\bar{x}^r,\bar{u}^r)\in H^1(I)^n\times L^2(I)^m\) be feasible to problem (\(\hbox {OCTCC}(\theta ^r)\)). Then, \((\bar{x}^r,\bar{u}^r)\) is a KKT point of (\(\hbox {OCTCC}(\theta ^r)\)) if and only if there are an adjoint state \(p^r\in H^1(I)^n\) and multipliers \(\lambda ^r\in {\mathbb {R}}^l\) as well as \(\alpha ^r,\beta ^r,\xi ^r\in {\mathbb {R}}^k\) which solve the following system:

$$\begin{aligned} 0&=\dot{p}^r(t)+\mathbf A^\top p^r(t)+\bar{x}^r(t)-\mathbf x_\text {d}(t)\qquad \text {a.e.}\ \text { on }I,\\ p^r_T&=\nabla f(\bar{x}^r_T)+\sum \limits _{i\in {\mathcal {L}}}\lambda _i^r\nabla g_i(\bar{x}^r_T)\\&\quad -\sum \limits _{j\in {\mathcal {K}}}\bigl [\alpha _j^r-\xi ^r_jH_j(\bar{x}^r_T)\bigr ]\nabla G_j(\bar{x}^r_T)\\&\quad -\sum \limits _{j\in {\mathcal {K}}}\bigl [\beta _j^r-\xi _j^rG_j(\bar{x}^r_T)\bigr ]\nabla H_j(\bar{x}^r_T),\\ 0&=\sigma (\bar{u}^r(t)- \mathbf u_\text {d}(t))+\mathbf B^\top p^r(t)\qquad \text {a.e.}\ \text { on }I,\\ \lambda ^r&\ge 0,\quad \forall i\notin {\mathcal {I}}^g_r:\,\lambda _i^r=0,\\ \alpha ^r&\ge 0,\quad \forall j\notin {\mathcal {I}}^G_r:\,\alpha _j^r=0,\\ \beta ^r&\ge 0,\quad \forall j\notin {\mathcal {I}}^H_r:\,\beta _j^r=0,\\ \xi ^r&\ge 0,\quad \forall j\notin {\mathcal {I}}^{GH}_r:\,\xi _j^r=0. \end{aligned}$$

Here, the index sets \({\mathcal {I}}^g_r\), \({\mathcal {I}}^G_r\), \({\mathcal {I}}^H_r\), and \({\mathcal {I}}^{GH}_r\) are defined as stated below:

$$\begin{aligned} {\mathcal {I}}^g_r&:=\{i\in {\mathcal {L}}\,|\,g_i(\bar{x}^r_T)=0\},\\ {\mathcal {I}}^{G}_r&:=\{j\in {\mathcal {K}}\,|\,G_j(\bar{x}^r_T)=0\},\\ {\mathcal {I}}^{H}_r&:=\{j\in {\mathcal {K}}\,|\,H_j(\bar{x}^r_T)=0\},\\ {\mathcal {I}}^{GH}_r&:=\{j\in {\mathcal {K}}\,|\,G_j(\bar{x}^r_T)H_j(\bar{x}^r_T)-\theta ^r=0\}. \end{aligned}$$

As mentioned above, the second convergence result of this work shows that a sequence \(\{(\bar{x}^r,\bar{u}^r)\}_{r\in {\mathbb {N}}}\subset H^1(I)^n\times L^2(I)^m\) of KKT points associated with (\(\hbox {OCTCC}(\theta ^r)\)) contains a convergent subsequence whose limit point is a so-called Clarke-stationary point of (OCTCC) provided \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) is bounded. This observation parallels classical results from [13, 21] for standard complementarity problems. Below, we state an appropriate generalization of Clarke-stationarity for (OCTCC). Other reasonable stationarity notions for (OCTCC), namely weak, Mordukhovich- and strong stationarity, are introduced in [5, Section 6].

Definition 3.2

Let \((\bar{x},\bar{u})\in H^1(I)^n\times L^2(I)^m\) be a feasible point of (OCTCC). Then, \((\bar{x},\bar{u})\) is called Clarke-stationary, C-stationary for short, if and only if there exist an adjoint state \(p\in H^1(I)^n\) as well as multipliers \(\lambda \in {\mathbb {R}}^l\) and \(\mu ,\nu \in {\mathbb {R}}^k\) which satisfy the following conditions:

$$\begin{aligned} 0&=\dot{p}(t)+\mathbf A^\top p(t)+\bar{x}(t)-\mathbf x_\text {d}(t)\qquad \text {a.e.}\ \text { on }I, \end{aligned}$$
(1a)
$$\begin{aligned} p_T&=\nabla f(\bar{x}(T))+\sum \limits _{i\in {\mathcal {L}}}\lambda _i\nabla g_i(\bar{x}_T) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\mu _j\nabla G_j(\bar{x}_T)+\nu _j\nabla H_j(\bar{x}_T)\bigr ],\end{aligned}$$
(1b)
$$\begin{aligned} 0&=\sigma (\bar{u}(t)- \mathbf u_\text {d}(t))+\mathbf B^\top p(t)\qquad \text {a.e.}\ \text { on }I,\end{aligned}$$
(1c)
$$\begin{aligned} \lambda&\ge 0,\quad \forall i\notin {\mathcal {I}}^g:\,\lambda _i=0,\end{aligned}$$
(1d)
$$\begin{aligned} \forall j&\in {\mathcal {I}}^{+0}:\,\mu _j=0,\end{aligned}$$
(1e)
$$\begin{aligned} \forall j&\in {\mathcal {I}}^{0+}:\,\nu _j=0,\end{aligned}$$
(1f)
$$\begin{aligned} \forall j&\in {\mathcal {I}}^{00}:\,\mu _j\nu _j\ge 0. \end{aligned}$$
(1g)

Therein, the index sets \({\mathcal {I}}^g\), \({\mathcal {I}}^{+0}\), \({\mathcal {I}}^{0+}\), and \({\mathcal {I}}^{00}\) are defined as stated below:

$$\begin{aligned}\begin{aligned} {\mathcal {I}}^g&:=\{i\in {\mathcal {L}}\,|\,g_i(\bar{x}_T)=0\},\\ {\mathcal {I}}^{0+}&:=\{j\in {\mathcal {K}}\,|\,G_j(\bar{x}_T)=0\,\wedge \, H_j(\bar{x}_T)>0\},\\ {\mathcal {I}}^{+0}&:=\{j\in {\mathcal {K}}\,|\,G_j(\bar{x}_T)>0\,\wedge \, H_j(\bar{x}_T)=0\},\\ {\mathcal {I}}^{00}&:=\{j\in {\mathcal {K}}\,|\,G_j(\bar{x}_T)=0\,\wedge \, H_j(\bar{x}_T)=0\}. \end{aligned}\end{aligned}$$

Next, we postulate our standing assumptions for further theoretical investigations of the relaxation technique.

Assumption 3.1

We fix a sequence \(\{\theta ^r\}_{r\in {\mathbb {N}}}\subset {\mathbb {R}}^+\) converging to zero. For any \(r\in {\mathbb {N}}\), let \((\bar{x}^r,\bar{u}^r)\in H^1(I)^n\times L^2(I)^m\) be a KKT point of (\(\hbox {OCTCC}(\theta ^r)\)). Furthermore, let \(p^r\in H^1(I)^n\), \(\lambda ^r\in {\mathbb {R}}^l\), and \(\alpha ^r,\beta ^r,\xi ^r\in {\mathbb {R}}^k\) be the associated Lagrange multipliers, see Definition 3.1. We assume that \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) is bounded and, thus, possesses a weakly convergent subsequence with weak limit \(\bar{u}\in L^2(I)^m\). For simplicity of notation, we do not relabel this subsequence but use the expression \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) again. Finally, \(\bar{x}:=\mathtt S(\bar{u})\) is the state function associated with \(\bar{u}\).

Let us briefly comment on the assumption that \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) is bounded.

Remark 3.1

The boundedness of \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\subset L^2(I)^m\) is not very restrictive and can be guaranteed under a mild assumption: If the sequence \(\{J(\bar{x}^r,\bar{u}^r)\}_{r\in {\mathbb {N}}}\) of objective values associated with the KKT points \((\bar{x}^r,\bar{u}^r)\) of (\(\hbox {OCTCC}(\theta ^r)\)) is bounded in \({\mathbb {R}}\), then \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) needs to be bounded in \(L^2(I)^m\) since the function f is bounded from below. Note that this observation has been exploited in the proof of Theorem 3.1 as well.

For the proof of our next convergence result, we need to study the behavior of the index sets \({\mathcal {I}}^g_r\), \({\mathcal {I}}^G_r\), and \({\mathcal {I}}^H_r\) as \(r\rightarrow \infty \). Some corresponding observations which directly follow from Assumption 3.1 are stated in the lemma below.

Lemma 3.1

For sufficiently large \(r\in {\mathbb {N}}\), the following relations hold true:

$$\begin{aligned}\begin{aligned}&{\mathcal {I}}^g_r\subset {\mathcal {I}}^g,\\&{\mathcal {I}}^G_r\subset {\mathcal {I}}^{0+}\cup {\mathcal {I}}^{00},\\&{\mathcal {I}}^H_r\subset {\mathcal {I}}^{+0}\cup {\mathcal {I}}^{00}. \end{aligned}\end{aligned}$$

Proof

Due to \(\bar{u}^r\rightharpoonup \bar{u}\) in \(L^2(I)^m\) and the continuity of the solution operator \(\mathtt S\) associated with (ODE), we obtain \(\bar{x}^r\rightharpoonup \bar{x}\) in \(H^1(I)^n\). Recalling that \(\mathtt E_T\) is compact, \(\bar{x}^r_{T}\rightarrow \bar{x}_{T}\) holds true in \({\mathbb {R}}^n\). Thus, the validity of the presented inclusions is guaranteed by continuity of g, G, and H.\(\square \)

Now, we are in position to state our second convergence result. For its validation, we exploit ideas used in the proof of [13, Theorem 3.1]. However, some essentials of infinite-dimensional programming have to be taken into account as well in order to show the following theorem.

Theorem 3.2

Suppose that the following constraint qualification is valid:

$$\begin{aligned} \left. \begin{aligned}&0=\sum \limits _{i\in {\mathcal {L}}}\lambda _i\nabla g_i(\bar{x}_{T}) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\mu _j\nabla G_j(\bar{x}_{T})+\nu _j\nabla H_j(\bar{x}_{T})\bigr ],\\&\lambda \ge 0,\quad \forall i\notin {\mathcal {I}}^g:\,\lambda _i=0,\\&\forall j\in {\mathcal {I}}^{+0}:\,\mu _j=0,\\&\forall j\in {\mathcal {I}}^{0+}:\,\nu _j=0 \end{aligned} \right\} \,\Longrightarrow \, \left\{ \begin{aligned}&\lambda =0,\\&\mu =0,\\&\nu =0. \end{aligned} \right. \end{aligned}$$
(CQ)

Then, \((\bar{x},\bar{u})\) is a C-stationary point of (OCTCC). Furthermore, the following convergences hold along a subsequence:

$$\begin{aligned} \bar{x}^r\rightarrow \bar{x}\quad \text {in }H^1(I)^n,\qquad \bar{u}^r\rightarrow \bar{u}\quad \text {in }L^2(I)^m,\qquad p^r\rightarrow p\quad \text {in }H^1(I)^n.\end{aligned}$$

Here, \(p\in H^1(I)^n\) is the adjoint state which appears in the system (1).

If \(\mathbf u_d \in H^1(I)^m\) is valid, then \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\subset H^1(I)^m\) and \(\bar{u}\in H^1(I)^m\) hold. Furthermore, we obtain \(\bar{u}^r\rightarrow \bar{u}\) in \(H^1(I)^m\).

Proof

From \(\bar{u}^r\rightharpoonup \bar{u}\) in \(L^2(I)^m\), we obtain \(\bar{x}^r\rightharpoonup \bar{x}\) in \(H^1(I)^n\) and \(\bar{x}^r_{T}\rightarrow \bar{x}_{T}\) in \({\mathbb {R}}^n\) since \(\mathtt S\) is continuous while \(\mathtt S_T\) is compact and continuous.

We note that for any \(r\in {\mathbb {N}}\) and any \(j\in {\mathcal {K}}\), the multipliers \(\alpha ^r_j\) and \(\xi ^r_j\) (\(\beta ^r_j\) and \(\xi ^r_j\), respectively) cannot be positive at the same time since \({\mathcal {I}}^G_r\cap {\mathcal {I}}^{GH}_r=\varnothing \) (\({\mathcal {I}}^H_r\cap {\mathcal {I}}^{GH}_r=\varnothing \)) is valid by definition. For any \(r\in {\mathbb {N}}\), let us introduce \(\mu ^r,\nu ^r\in {\mathbb {R}}^k\) as stated below for any \(j\in {\mathcal {K}}\):

$$\begin{aligned} \begin{aligned} \mu _j^r&:= {\left\{ \begin{array}{ll} \alpha ^r_j&{}\text {if }\alpha ^r_j>0,\\ -\xi ^r_jH_j(\bar{x}^r_{T})&{}\text {if }\xi ^r_j>0\,\wedge \,j\notin {\mathcal {I}}^{+0},\\ 0&{}\text {otherwise,} \end{array}\right. } \\ \nu _j^r&:= {\left\{ \begin{array}{ll} \beta ^r_j&{}\text {if }\beta ^r_j>0,\\ -\xi ^r_jG_j(\bar{x}^r_{T})&{}\text {if }\xi ^r_j>0\,\wedge \,j\notin {\mathcal {I}}^{0+},\\ 0&{}\text {otherwise.} \end{array}\right. } \end{aligned} \end{aligned}$$

Then, we obtain

$$\begin{aligned} \begin{aligned} p^r_{T}&=\nabla f(\bar{x}^r_{T})+\sum \limits _{i\in {\mathcal {L}}}\lambda ^r_i\nabla g_i(\bar{x}^r_{T}) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\mu ^r_j\nabla G_j(\bar{x}^r_{T})+\nu ^r_j\nabla H_j(\bar{x}^r_{T})\bigr ]\\&\quad +\sum \limits _{j\in {\mathcal {I}}^{+0}}\xi ^r_j H_j(\bar{x}^r_{T})\nabla G_j(\bar{x}^r_{T}) +\sum \limits _{j\in {\mathcal {I}}^{0+}}\xi ^r_jG_j(\bar{x}^r_{T})\nabla H_j(\bar{x}^r_{T}) \end{aligned} \end{aligned}$$

from Definition 3.1. Note that the appearing index sets \({\mathcal {I}}^{+0}\) and \({\mathcal {I}}^{0+}\) correspond to the limit point \(\bar{x}\) and not to the KKT points of (\(\hbox {OCTCC}(\theta ^r)\)). Now, we apply Corollary 2.1 and Definition 3.1 to deduce

$$\begin{aligned} \begin{aligned} 0&=\overline{\mathtt S}^\star (\mathtt E(\bar{x}^r)-\mathbf x_\text {d})+\sigma (\bar{u}^r-\mathbf u_\text {d})\\&\quad +\mathtt S_T^\star \left[ \nabla f(\bar{x}^r_{T})+\sum \limits _{i\in {\mathcal {L}}}\lambda ^r_i\nabla g_i(\bar{x}^r_{T}) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\mu ^r_j\nabla G_j(\bar{x}^r_{T})+\nu ^r_j\nabla H_j(\bar{x}^r_{T})\bigr ]\right. \\&\quad \left. +\sum \limits _{j\in {\mathcal {I}}^{+0}}\xi ^r_j H_j(\bar{x}^r_{T})\nabla G_j(\bar{x}^r_{T}) +\sum \limits _{j\in {\mathcal {I}}^{0+}}\xi ^r_jG_j(\bar{x}^r_{T})\nabla H_j(\bar{x}^r_{T})\right] . \end{aligned} \end{aligned}$$
(2)

Suppose that the sequence \(\{(\lambda ^r,\mu ^r,\nu ^r,\xi ^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\}_{r\in {\mathbb {N}}}\) is not bounded, define the constants \(\kappa ^r:=\left| (\lambda ^r,\mu ^r,\nu ^r,\xi ^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\right| _{2}\) for all \(r\in {\mathbb {N}}\), and set

$$\begin{aligned} (\tilde{\lambda }^r,\tilde{\mu }^r,\tilde{\nu }^r,\tilde{\xi }^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}}) := \frac{1}{\kappa ^r}(\lambda ^r,\mu ^r,\nu ^r,\xi ^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}}) \end{aligned}$$

for all \(r\in {\mathbb {N}}\). Then, \(\{(\tilde{\lambda }^r,\tilde{\mu }^r,\tilde{\nu }^r,\tilde{\xi }^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\}_{r\in {\mathbb {N}}}\) is a bounded sequence and converges w.l.o.g. to a nonvanishing multiplier \((\tilde{\lambda },\tilde{\mu },\tilde{\nu },\tilde{\xi }_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\). Hence, dividing (2) by the positive number \(\kappa ^r\), taking the limit \(r\rightarrow \infty \), and observing that \(\{\bar{x}^r\}_{r\in {\mathbb {N}}}\) and \(\{\bar{u}^r\}_{r\in {\mathbb {N}}}\) are bounded in \(L^2(I)^n\) and \(L^2(I)^m\), respectively, yields

$$\begin{aligned}0 = \mathtt S^\star _T \left[ \sum \limits _{i\in {\mathcal {L}}}\tilde{\lambda }_i\nabla g_i(\bar{x}_{T}) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\tilde{\mu }_j\nabla G_j(\bar{x}_{T})+\tilde{\nu }_j\nabla H_j(\bar{x}_{T})\bigr ] \right] \end{aligned}$$

as well as \(\tilde{\lambda }\ge 0\), \(\tilde{\lambda }_i=0\) (\(i\notin {\mathcal {I}}^g\)), \(\tilde{\mu }_j=0\) (\(j\in {\mathcal {I}}^{+0}\)), and \(\tilde{\nu }_j=0\) (\(j\in {\mathcal {I}}^{0+}\)), see Lemma 3.1. We note that the validity of the constraint qualification (CQ) implies \(\tilde{\lambda }=0\), \(\tilde{\mu }=0\), and \(\tilde{\nu }=0\) since \(\mathtt S_T^\star \) is injective. Thus, due to the above observation, \(\tilde{\xi }_{{\mathcal {I}}^{+0}}\) or \(\tilde{\xi }_{{\mathcal {I}}^{0+}}\) possesses a nonvanishing component. Assume w.l.o.g. that there is \(j_0\in {\mathcal {I}}^{+0}\) such that \(\tilde{\xi }_{j_0}\) does not vanish. Then, \(\xi ^r_{j_0}>\varepsilon \kappa ^r\) holds true for sufficiently large \(r\in {\mathbb {N}}\) and some \(\varepsilon >0\). By construction, \(\nu ^r_{j_0}<-\varepsilon \kappa ^r G_{j_0}(\bar{x}^r_{T})\) is valid for sufficiently large \(r\in {\mathbb {N}}\), i.e. taking the limit \(r\rightarrow \infty \) yields

$$\begin{aligned} \tilde{\nu }_{j_0} = \lim \limits _{r\rightarrow \infty }\frac{\nu ^r_{j_0}}{\kappa ^r} \le -\varepsilon \lim \limits _{r\rightarrow \infty } G_{j_0}(\bar{x}^r_T) =-\varepsilon G_{j_0}(\bar{x}_T)<0 \end{aligned}$$

due to \(j_0\in {\mathcal {I}}^{+0}\). This, however, contradicts \((\tilde{\lambda },\tilde{\mu },\tilde{\nu })=(0,0,0)\).

Thus, \(\{(\lambda ^r,\mu ^r,\nu ^r,\xi ^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\}_{r\in {\mathbb {N}}}\) is bounded and converges (along a subsequence without relabeling) to a multiplier \((\lambda ,\mu ,\nu ,\xi _{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\). Due to \(\bar{x}^r\rightarrow \bar{x}\) in \(L^2(I)^n\), \(\bar{x}^r_{T}\rightarrow \bar{x}_{T}\) in \({\mathbb {R}}^n\), and the continuity of f, g, G, and H, we infer \(\bar{u}^r\rightarrow \bar{u}\) in \(L^2(I)^m\), (1d), (1e), (1f), as well as

$$\begin{aligned} \begin{aligned}&0=\overline{\mathtt S}^\star (\mathtt E(\bar{x})-\mathbf x_\text {d})+\sigma (\bar{u}-\mathbf u_\text {d})\\&\qquad +\mathtt S_T^\star \left[ \nabla f(\bar{x}_{T})+\sum \limits _{i\in {\mathcal {L}}}\lambda _i\nabla g_i(\bar{x}_{T}) -\sum \limits _{j\in {\mathcal {K}}}\bigl [\mu _j\nabla G_j(\bar{x}_{T})+\nu _j\nabla H_j(\bar{x}_{T})\bigr ]\right] \end{aligned} \end{aligned}$$

from (2), see Lemma 3.1. Now, we apply Lemma 2.1 in order to obtain the conditions (1a)–(1c).

Fix \(j\in {\mathcal {I}}^{00}\) and suppose that \(\mu _j\nu _j<0\) holds true. If \(\mu _j<0\) and \(\nu _j>0\) are valid, then \(\mu ^r_j<0\) and \(\nu ^r_j>0\) must be satisfied for sufficiently large \(r\in {\mathbb {N}}\). On the other hand, \(\mu ^r_j<0\) implies \(\xi ^r_j>0\) which contradicts \(\beta ^r_j=\nu ^r_j>0\). Similarly, we obtain a contradiction from \(\mu _j>0\) and \(\nu _j<0\). Therefore, condition (1g) holds as well, i.e. \((\bar{x},\bar{u})\) is a C-stationary point of (OCTCC).

Noting that we have the convergences \(\bar{x}^r\rightarrow \bar{x}\) in \(L^2(I)^n\), \(\bar{x}^r_{T}\rightarrow \bar{x}_{T}\) in \({\mathbb {R}}^n\), and \((\lambda ^r,\mu ^r,\nu ^r,\xi ^r_{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\rightarrow (\lambda ,\mu ,\nu ,\xi _{{\mathcal {I}}^{+0}\cup {\mathcal {I}}^{0+}})\) while the solution operator of the ODE-system

$$\begin{aligned}\begin{aligned} \dot{p}(t)+\mathbf A^\top p(t)+v(t)&\,=\,0\qquad \text {a.e.}\ \text { on }I\\ p_{T}&\,=\,b \end{aligned}\end{aligned}$$

is continuous as a mapping \(L^2(I)^n\times {\mathbb {R}}^n\ni (v,b)\mapsto p\in H^1(I)^n\), see e.g. [2, Chapter 18], we obtain \(p^r\rightarrow p\) in \(H^1(I)^n\) from Definition 3.1 and (1a), (1b). Combining Definition 3.1 and (1c), we have

$$\begin{aligned} \bar{u}^r=\mathbf u_\text {d}-\tfrac{1}{\sigma }\mathbf B^\top p^r\rightarrow \mathbf u_\text {d}-\tfrac{1}{\sigma }\mathbf B^\top p=\bar{u} \end{aligned}$$

in \(L^2(I)^m\). Thus, if \(\mathbf u_\text {d}\) is a function from \(H^1(I)^m\), then the same holds true for \(\bar{u}^r\), \(r\in {\mathbb {N}}\), and \(\bar{u}\) and the above convergence can be extended to \(H^1(I)^m\). This completes the proof.\(\square \)

Let us present some brief remarks regarding the regularity condition (CQ).

Remark 3.2

Assume that the computed limit point \((\bar{x},\bar{u})\) satisfies the constraint qualification (CQ). If \((\bar{x},\bar{u})\) is a local minimizer of (OCTCC), then it is already a Mordukhovich-stationary point, i.e. it satisfies the C-stationarity conditions (1) where (1g) is strengthened to

$$\begin{aligned} \forall j\in {\mathcal {I}}^{00}:\,\mu _j\nu _j=0\,\vee \,(\mu _j>0\,\wedge \,\nu _j>0), \end{aligned}$$

see [5, Theorem 7.5]. Staying close to the notion of finite-dimensional complementarity programming, the constraint qualification (CQ) might be referred to as MPCC-MFCQ, see [13, Definition 2.4].

In order to ensure that locally or even globally optimal solutions of the surrogate problem (\(\hbox {OCTCC}(\theta ^r)\)) satisfy the KKT conditions stated in Definition 3.1, the validity of a constraint qualification is necessary. Here, we rely on Robinson’s constraint qualification which is the fundamental regularity condition in the context of Banach space programming. It has been introduced by Robinson [20] in order to study the stability properties of parameterized nonlinear systems in Banach spaces. Later, Kurcyusz and Zowe exploited this condition in order to derive necessary optimality conditions of KKT-type in Banach space programming, see [24]. Further information on Robinson’s constraint qualification can be found in the monograph [6].

Here, we want to emphasize that the validity of the constraint qualification (CQ) at the limit point \((\bar{x},\bar{u})\) implies that Robinson’s constraint qualification holds at the iterates \((\bar{x}^r,\bar{u}^r)\) w.r.t. the surrogate problem (\(\hbox {OCTCC}(\theta ^r)\)) for sufficiently large r. Consequently, the assumption that \((\bar{x}^r,\bar{u}^r)\) is a KKT point of (\(\hbox {OCTCC}(\theta ^r)\)) would not be restrictive anymore provided \((\bar{x}^r,\bar{u}^r)\) is a locally optimal solution of the relaxed surrogate.

Note that the upcoming result can be seen as a natural extension of [13, Theorem 3.2] which shows that the validity of MPCC-MFCQ at the limit point produced by Scholtes’ relaxation scheme (applied to standard MPCCs) implies that MFCQ holds in a neighborhood of this point for all the relaxed surrogate problems where the relaxation parameter is sufficiently small. The proof of this result is omitted since it follows easily reprising the arguments in [13] while recalling that the linear operator \((\bar{\mathtt S},\mathtt S_T):L^2(I)^m\rightarrow L^2(I)^n\times {\mathbb {R}}^n\) is surjective, see [5, Lemmas 7.1 and 7.4].

Lemma 3.2

Let the constraint qualification (CQ) be valid. Then, Robinson’s constraint qualification is valid at the iterates \((\bar{x}^r,\bar{u}^r)\) for sufficiently large \(r\in {\mathbb {N}}\). Particularly, if \((\bar{x}^r,\bar{u}^r)\) is locally optimal for (\(\hbox {OCTCC}(\theta ^r)\)), then it satisfies the KKT conditions from Definition 3.1.

In our above considerations, we only commented on Scholtes’ relaxation scheme. Investigating the presented proofs which, from the infinite-dimensional point of view, only exploit the controllability of (ODE) and the present function space setting but not the precise geometry of the relaxed feasible set, similar results are likely to be satisfied for other relaxation schemes, see [13].

4 Numerical experiments

4.1 Discretization strategy

Throughout the section, we assume \(\mathbf u_\text{ d }\in H^1(I)^m\) which is non-restrictive in the setting of the aforementioned underlying real-world applications where \(\mathbf u_\text {d}\) generally vanishes, see [4, 7, 15, 16]. Invoking Theorem 3.2, we now can restrict our consideration to the situation where the control function in (OCTCC) is chosen from \(H^1(I)^m\). Noting that \(H^1(I)\) is continuously embedded into \(C(\overline{I})\), see [1, Theorem 6.3], the pointwise evaluation of state and control functions is reasonable in this setting.

For the computational treatment of (OCTCC) via a sequence of surrogate problems of the form (\(\hbox {OCTCC}(\theta ^r)\)), we rely on a first-discretize-then-optimize-approach. Therefore, we decompose I into \(N\in \mathbb N\) equidistant intervals of length \(h:=T/N\). The discretized variables are given by \(x_s:=x(hs)\) and \(u_s:= u(hs)\) for \(s=0,\ldots , N\). Note that the data functions \(\mathbf x_\text {d}\) and \(\mathbf u_\text {d}\) are discretized similarly. Now, we need to choose an appropriate strategy to represent the discretized linear system associated with (ODE). Therefore, we exploit the trapezoidal rule which is a certain Runge–Kutta method, see [8]. Particularly, we set

$$\begin{aligned} x_{s+1}=x_{s}+\tfrac{h}{2}\Bigl [\mathbf A\bigl [x_s+x_{s+1}\bigr ]+\mathbf B\bigl [u_s+u_{s+1}]\Bigr ]\qquad s=0,\ldots ,N-1. \end{aligned}$$

For any function \(w\in H^1(I)\), we have

$$\begin{aligned} \begin{aligned} \int _{0}^Tw(t)\mathrm dt&\approx \sum \limits _{s=0}^{N-1}\frac{w(sh)+w((s+1)h)}{2N}= \frac{1}{N}\left( \tfrac{1}{2}w(0)+\sum \limits _{s=1}^{N-1}w(sh)+\tfrac{1}{2}w(T)\right) . \end{aligned} \end{aligned}$$

This observation provides the following discretization scheme for the norms in the objective functional of (OCTCC):

$$\begin{aligned}\begin{aligned}&\tfrac{1}{2}\left\| x-\mathbf x_\text {d}\right\| _{L^2(I)^n}^2 \\&\quad = \frac{1}{2}\sum _{i=1}^n\int _0^T(x_i(t)-\mathbf x_{\text {d,}i}(t))^2\mathrm dt\\&\quad \approx \frac{1}{2N}\sum \limits _{i=1}^n\left( \tfrac{1}{2}(x_{0,i}-\mathbf x_{\text {d},0,i})^2 +\sum \limits _{s=1}^{N-1}(x_{s,i}-\mathbf x_{\text {d},s,i})^2+\tfrac{1}{2}(x_{T,i}-\mathbf x_{\text {d},T,i})^2\right) . \end{aligned}\end{aligned}$$

Similarly, the term \(\tfrac{\sigma }{2}\left\| u-\mathbf u_\text {d}\right\| _{L^2(I)^m}^2\) is discretized.

Note that whenever \(\mathbf u_\text {d}\notin H^1(I)^m\) holds true, we cannot rely on the pointwise evaluation of the control which only possesses \(L^2\)-regularity in general. Thus, the above discretization strategy has to be changed slightly. One possible way in order to proceed might be the approximation of the control by piecewise constant functions, see e.g. [10, Section 5].

The resulting finite-dimensional programs are (due to the relaxation of the terminal complementarity constraint) standard nonlinear problems which can be solved using e.g. the interior-point-solver IPOPT, see [23].

In this study, we focus on the theoretical details of the suggested relaxation method which is why the derivation of error estimates for the suggested Runge–Kutta method is clearly beyond of the paper’s scope. However, let us briefly mention that related considerations regarding the numerical analysis of optimal control problems of ordinary differential equations can be found in [8,9,10, 22]. Another idea for the derivation of error estimates for a different discretization strategy would be to couple the well-known method of variational discretization, where a discrete counterpart of the solution operator to (ODE) depending on h and its convergence in a suitable operator topology are considered, see [11, 12], with already available error estimates for the numerical handling of linear ordinary differential equations via higher order Runge–Kutta schemes, see [8]. Noting that the terminal constraints in (ODE) and (\(\hbox {OCTCC}(\theta ^r)\)) are not convex, the foreshadowed considerations may turn out to be technically challenging.

4.2 Example 1

Here, we first review [5, Example 7.9] in terms of the presented numerical method. For \(n=m:=2\) and \(T:=\ln 2\), we consider the problem

$$\begin{aligned} \begin{aligned} \tfrac{1}{2}\bigl (\Vert x_1-1\Vert ^2_{L^2(I)}+\Vert x_2\Vert ^2_{L^2(I)}\bigr ) +\tfrac{1}{2}\bigl (\Vert u_1\Vert ^2_{L^2(I)}+\Vert u_2\Vert ^2_{L^2(I)}\bigr )&\,\rightarrow \,\min \limits _{x,u}&\\ \dot{x}_1(t)-u_1(t)\,=\,\dot{x}_2(t)-u_2(t)&\,=\,0&\quad&\text {a.e.}\ \text { on }I&\\ x_1(0)\,=\,x_2(0)&\,=\,0&\\ x_1(T)-x_2(T)&\,\le \,0&\\ 0\,\le \,x_1(T)\,\perp \,x_2(T)&\,\ge \,0.&\end{aligned} \end{aligned}$$
(3)

From [5] we know that the unique global solution of (3) is given by

$$\begin{aligned} \bar{x}_1(t) = \tfrac{1}{3}\sinh (t) - \cosh (t) + 1, \qquad \bar{u}_1(t)= \tfrac{1}{3}\cosh (t) - \sinh (t),\qquad \bar{x}_2 (t) = \bar{u}_2 (t) =0 \end{aligned}$$

for all \(t\in I\). First, we analytically calculate the optimal solution of the associated relaxed problem in the given function space setting. For this purpose, we introduce

$$\begin{aligned} \begin{aligned} x_1^c(t)&:=c\sinh t-\cosh h+1&\qquad x_2^d(t)&:=d\sinh t&\\ u_1^c(t)&:=c\cosh t-\sinh t&\qquad u_2^d(t)&:=d\cosh t \end{aligned} \end{aligned}$$

for all \(t\in I\) and constants \(c,d\in {\mathbb {R}}\). Using standard methods, see [14], one can easily check that the relaxation of problem (3) possesses two KKT points which depend on \(\theta >0\). For \(\theta \ge 1/25\), we set \(c_1:=7/15\) and \(d_1:=2/15\) in order to obtain the KKT point \(P=(\bar{x}^1,\bar{u}^1):=((x^{c_1}_1,x^{d_1}_2),(u^{c_1}_1,u^{d_1}_2))\). Furthermore, for any \(\theta \le \tfrac{1}{3}\sqrt{5}\), \(c_2(\theta ):=\tfrac{1}{3}(2\sqrt{\theta }+1)\), and \(d_2(\theta ):=\tfrac{2}{3}\sqrt{\theta }\), another KKT point \(Q(\theta )=(\bar{x}^2(\theta ),\bar{u}^2(\theta ))\) is given by \(((x^{c_2(\theta )}_1,x^{d_2(\theta )}_2),(u^{c_2(\theta )}_1,u^{d_2(\theta )}_2))\). One can easily check that both KKT points coincide for \(\theta =1/25\). The objective value of P equals 19 / 30, while the objective value of \(Q(\theta )\) is given by \(\tfrac{5}{6}\theta -\tfrac{1}{3}\sqrt{\theta }+\tfrac{2}{3}\). Comparing both, the global minimizer of the \(\theta \)-relaxation is given by \(Q(\theta )\) for any \(\theta \in (0,1/25)\) and P for \(\theta \in [1/25,\infty )\).

Figure 1 shows the evolution of controls and states for \(\theta \in [0,0.04]\) and \(N=500\). Note that IPOPT computes the global minimizer of the relaxed problems for sufficiently small values of \(\theta \).

Fig. 1
figure 1

Surface plots of the solutions to the discretized, relaxed surrogate problems

4.3 Example 2

The following example reflects the situation of multi-agent-control with terminal friction conditions. Let us consider \(n_a\in \mathbb N\) agents whose position at time t is denoted by \(x_i(t)\), \(i=1,\ldots ,n_a\). All agents need to start their movements at time \(t=0\) at the point 0. It is desirable that agent i moves as close to the given path \(\mathbf x_{\text {d},i}\) as possible for all \(i=1,\ldots ,n_a\). By \(x_{i+n_a}:=\dot{x_i}\), we denote the speed of agent i, \(i=1,\ldots ,n_a\), which can be controlled by \(u_i\). At terminal time T, the agents need to satisfy given frictional constraints in order to allow the sharing of goods while their respective speed must be zero. Thus, we have \(n=2n_a\) and \(m=n_a\) in this situation and it can be easily checked that the associated ODE-system satisfies the proposed controllability assumption. An overall formulation of the problem is given by

(4)

For our numerical calculation, we choose \(T:=1\), \(n_a:=2\), \(\sigma :=10^{-2}\), and \(k:=1\) with

$$\begin{aligned}G_1(x_T):=1-x_1(T)\qquad H_1(x_T):=1-x_2(T)\end{aligned}$$

for all \(x\in H^1(I)^4\). Furthermore, let us set \(\mathbf x_{\text {d},1}(t):=t-t^2\) and \(\mathbf x_{\text {d},2}(t):=t^2-t^3\) for all \(t\in I\). Note that we have \(\mathbf x_\text {d,1}(1)=\mathbf x_\text {d,2}(1)=0\) for this choice of the desired states which clearly antagonizes the complementarity requirement on \(x_1\) and \(x_2\).

It is important to highlight that IPOPT allows to compute the numerical solution of the relaxed problem associated with (4) without any relaxation. However, the algorithm solves a sequence of barrier problems that may lead to infeasible solutions of the original problem if the feasible set is too small. Furthermore, it turns out that regarding different starting points, the relaxation approach is much more robust than a direct treatment of (4) when it comes down to identifying global minimizers. Thus, it is reasonable to rely on the relaxation approach.

To start, we note that due to \(n_a=2\), the discretized problem associated with (4) can be decomposed into two convex programs by replacing the terminal complementarity constraint by \(x_{T,1}=1\) and \(x_{T,2}\le 1\) in the first or \(x_{T,1}\le 1\) and \(x_{T,2}=1\) in the second case. Thus, by comparing the solution of both programs one can find the global minimizer and its corresponding objective value \(\text {Obj}(N)^*\), where N denotes the total number of (equidistant) sampling intervals. We emphasize that the global minimizer, which clearly depends on N, satisfies \(x_{T,1}=1\) and \(x_{T,2}<1\) for sufficiently large N. The strategy helps to determine the relative gap \({\mathcal {G}}(\theta ,N)\) between the objective value of the relaxed problem Obj(\(\theta , N)\) at the solution determined by IPOPT and \(\text {Obj}(N)^*\) which computes as stated below:

$$\begin{aligned} {\mathcal {G}}(\theta ,N):=\left| \frac{\text {Obj} (\theta ,N) - \text {Obj}(N)^*}{\text {Obj}(N)^*} \right| . \end{aligned}$$

We note that Obj(\(\theta , N)\) does not necessary equal the optimal objective value of the associated \(\theta \)-relaxation with N sampling intervals since the latter is nonconvex and it cannot be guaranteed that IPOPT finds its global minimizer.

We solve the discretized surrogate problem for all

$$\begin{aligned} (\theta ,N)\in \{0.01,0.05,0.1\}\times \{80,160,240,320,400\} \end{aligned}$$

in order to test the performance of the relaxations. Here, we follow the approach of [13] by starting with large values of \(\theta \), i.e., \(\theta =0.1\) and decreasing its value while using the previously found solution as a starting vector. This approach allows to improve the quality of the solution as \(\theta \) is reduced. In order to start the algorithm, we use the global minimizer of the program (4) where the complementarity constraint is weakened to the nonnegativity requirements \(G_j(x_T),H_j(x_T)\ge 0\) for all \(j\in {\mathcal {K}}\). We note that the resulting program is convex and its global minimizer is easily computed using IPOPT.

To illustrate our findings, Fig. 2 displays the position \(x_1\) and \(x_2\) of the agents for different combinations of \(\theta \) and N. One easily sees that feasibility for the complementarity problem is achieved as \(\theta \) tends to zero.

Fig. 2
figure 2

Position of the agents \(x_1\) (dots) and \(x_2\) (triangles) for \((\theta ,N)\in \{0.1,0.05,0.01\} \times \{80,400\}\)

We note that IPOPT solves the \(\theta \)-relaxations globally in this example and by means of Theorem 3.1, the associated sequence of solutions tends to the global minimizer of the underlying complementarity program (4) as \(\theta \) falls to zero. Particularly, the consideration of the relative gap \({\mathcal {G}}(\theta ,N)\) is meaningful and we report on the development of \({\mathcal {G}}(\theta ,N)\) in Table 1. As expected, for fixed N, \({\mathcal {G}}(\theta ,N)\) tends to zero as \(\theta \) falls to zero.

Table 1 Development of relative gap \({\mathcal {G}}(\theta ,N)\)

5 Conclusions

Terminal complementarity constraints appear frequently in the context of optimal control of linear dynamical systems. Common examples arise in the fields of multi-agent control, satellite clustering [7], flocking [18], spacecraft formation [4], or natural gas balancing [15]. In this work, we have shown that relaxation methods which are well known from finite-dimensional complementarity programming, see [13], can be used to treat optimal control problems with terminal complementarity constraints. Exemplary, we analyzed Scholtes’ relaxation technique but similar results are likely to hold for different relaxation approaches. On the one hand, it has been shown that global solutions of the relaxed surrogate problems converge in norm to a global minimizer of the complementarity problem. On the other hand, we demonstrated that a sequence of KKT points associated with the relaxed surrogate problems converges (under reasonable assumptions) to a C-stationary point of the complementarity problem. Numerical examples were presented to illustrate the method and visualize possible applications in multi-agent control.

There exist many interesting topics deserving further investigation like the computation of error estimates resulting from discretization or the practical solution of real-world problems exploiting the proposed numerical method.