1 Introduction

We consider the following linear-quadratic control problem:

$$\begin{aligned} \mathrm{(OS)}\qquad \min _{(x,u)\in X} f(x,u) \quad \text {s.t.}\quad \dot{x}(t)&=A(t)x(t)+B(t)u(t) \quad \text {a.e. on } [0,T],\\ x(0)&=a,\\ u(t)&\in U \quad \text {a.e. on } [0,T], \end{aligned}$$

where the cost functional \(f\) is defined by

$$\begin{aligned} f(x,u) :=&\textstyle {\frac{1}{2}} x(T)^{\mathsf {T}}Q x(T) + q^{\mathsf {T}}x(T)\\&+ \int _0^T\!\textstyle \frac{1}{2} x(t)^{\mathsf {T}}W(t)x(t) + x(t)^{\mathsf {T}}S(t) u(t) + w(t)^{\mathsf {T}}x(t) +r(t)^{\mathsf {T}}u(t)\,{\mathrm {d}}t. \end{aligned}$$

Here, \(u(t)\in {\mathbb {R}}^m\) is the control, and \(x(t)\in {\mathbb {R}}^n\) is the state of the system at time \(t\). Further \(Q\in {\mathbb {R}}^{n \times n}\), \(q\in {\mathbb {R}}^n\). The functions \(W:[0,T]\rightarrow {\mathbb {R}}^{n\times n}\), \(S:[0,T]\rightarrow {\mathbb {R}}^{n\times m}\), \(w:[0,T]\rightarrow {\mathbb {R}}^n\), \(r:[0,T]\rightarrow {\mathbb {R}}^m\), \(A:[0,T]\rightarrow {\mathbb {R}}^{n\times n}\) and \(B:[0,T] \rightarrow {\mathbb {R}}^{n\times m}\) are assumed to be Lipschitz continuous. The set \(U\subset {\mathbb {R}}^m\) is defined by upper and lower bounds, i.e.,

$$\begin{aligned} U := \{u\in {\mathbb {R}}^m\mid b_l\le u\le b_u\} \end{aligned}$$

with \(b_l, b_u\in {\mathbb {R}}^m\), \(b_l<b_u\).

Numerical solution methods for optimal control problems have been investigated over the last decades. Most of the research has dealt with shooting and direct approximation approaches. Discretizations based on Euler’s method or more general Runge–Kutta methods are well studied for the case that the optimal control is at least Lipschitz continuous (see e.g. [2, 8–12, 23, 25, 35]). First results on the error analysis for bang–bang controls can be found in [36]. In [6, 7] Euler discretizations for a class of linear-quadratic control problems with bang–bang solutions have been investigated. These results were extended to a stable implicit discretization scheme in [5].

Since regularization leads to problems with smoother solutions, a combined regularization–discretization approach is a good alternative to direct approximation. The regularization of the cost functional and of the constraints of optimal control problems has therefore been studied in recent years (see e.g. [21, 26–28, 34]). The dependence of solutions on regularization parameters combined with discretization has been investigated in [18] for multiplier methods for optimal control problems governed by ODEs, and in [22] for elliptic problems with state constraints. First results on bang–bang controls of linear-quadratic problems without a mixed state–control term in the cost functional were presented in [4]. The authors proved error estimates of order \({\mathcal {O}}(\sqrt{h})\) if the regularization parameter \(\alpha =\sqrt{h}\) is chosen w.r.t. the mesh size \(h\) of the discretization. However, by treating the regularization and discretization separately and combining the error estimates via the triangle inequality, the actual error was overestimated. This is a known issue in the analysis of combined regularization–discretization approaches (see e.g. [18]). In [33] the error estimates from [4] were improved to order \({\mathcal {O}}(h)\) by combining proof techniques from [6, 7] and [4].

In this paper we extend these convergence results on the discrete regularization of linear-quadratic problems with bang–bang controls to weaker assumptions of order \(k\) on the structure of the switching function (see Remarks 2 and 3). In our main result, Theorem 5.1, we prove error estimates of order \({\mathcal {O}}((\alpha +h)^{1/(k+1)})\) w.r.t. the mesh size \(h\) of the discretization and the regularization parameter \(\alpha \) for the control, state and adjoint state variables.

The paper is organized as follows. In Sect. 1 we give some preliminaries and known auxiliary results for the linear-quadratic control problem \(\mathrm{(OS)}\). Moreover, we generalize the second-order condition, which has been an important tool in the analysis of [6, Lemma 4.1, Theorem 4.2], to order \(k+1\) under weaker assumptions on the structure of the switching function. In Sect. 2 we present the regularization technique and prove estimates of order \(k\) for the regularization error. The combined regularization–discretization approach is introduced in Sect. 3. Section 4 is concerned with error estimates for the optimal values of the discrete control problems. The main result is then derived in Sect. 5: we prove Hölder-type error estimates of order \({\mathcal {O}}(h^{1/(k+1)})\) for control, state and adjoint state. In Sect. 6 we improve these error estimates for linear problems. Finally, the theoretical findings are illustrated by a numerical example.

We use the following notation (cf. [47]): \({\mathbb {R}}^n\) is the \(n\)-dimensional Euclidean space with the inner product denoted by \(\langle x, y\rangle \) and the norm \(|x| := \langle x,x\rangle ^{1/2}\). For an \(m \times n\)-matrix \(B\) we use the spectral norm \(\Vert B\Vert := \sup _{|z| \le 1} |Bz|\). By \(L^p(0,T;{\mathbb {R}}^m)\) we denote the Banach space of measurable vector functions \(u:[0,T]\rightarrow {\mathbb {R}}^m\) for \(1\le p <\infty \), with

$$\begin{aligned} \Vert u\Vert _p := \left( \int _0^T |u(t)|^p\,{\mathrm {d}}t\right) ^{1/p} < \infty , \end{aligned}$$

and \(L^\infty (0,T;{\mathbb {R}}^m)\) is the Banach space of essentially bounded vector functions \(u:[0,T]\rightarrow {\mathbb {R}}^m\) with the norm

$$\begin{aligned} \Vert u\Vert _\infty := \max _{1\le i\le m} \mathop {\hbox {ess sup}}\limits _{t\in [0,T ]} |u_i(t)|. \end{aligned}$$

A function \(u\) will be said to be of bounded variation if the total variation \({\mathrm {V}}_0^T u\) of \(u\) on \([0,T]\) is finite. For \(1\le p \le \infty \) we denote by \(W^1_p(0,T;{\mathbb {R}}^n)\) the Sobolev spaces of absolutely continuous functions \(x:[0,T]\rightarrow {\mathbb {R}}^n\)

$$\begin{aligned} W^1_p(0,T;{\mathbb {R}}^n) := \left\{ x\in L^p\left( 0,T;{\mathbb {R}}^n\right) \mid \dot{x}\in L^p\left( 0,T;{\mathbb {R}}^n\right) \right\} \end{aligned}$$

with

$$\begin{aligned} \Vert x\Vert _{1,p} := \left( |x(0)|^p+\Vert \dot{x}\Vert _p^p\right) ^{1/p} \end{aligned}$$

for \(1\le p<\infty \) and

$$\begin{aligned} \Vert x\Vert _{1,\infty } := \max \left\{ |x(0)|,\Vert \dot{x}\Vert _\infty \right\} . \end{aligned}$$

We define \(X := X_1\times X_2\), \(X_1 := W^1_{\infty }(0,T;{\mathbb {R}}^n)\), \(X_2 := L^\infty (0,T;{\mathbb {R}}^m)\), and we denote by

$$\begin{aligned} {\mathcal {U}}:=\{u\in X_2\mid u(t)\in U\;\text {a.e. on}\,[0,T]\} =\{u\in X_2\mid b_l\le u(t)\le b_u\;\text {a.e. on}\,[0,T]\} \end{aligned}$$

the set of admissible controls, and by

$$\begin{aligned} {\mathcal {F}} :=\{(x,u)\in X \mid u\in {\mathcal {U}},\; \dot{x}(t) = A(t)x(t)+B(t)u(t)\;\text {a.e. on } [0,T], \;x(0)=a\} \end{aligned}$$

the feasible set of \(\mathrm{(OS)}\). The linear ODE

$$\begin{aligned} \dot{x}(t)&= A(t)x(t)+B(t)u(t) \quad \text {a.e. on } [0,T],\nonumber \\ x(0)&= a, \end{aligned}$$
(1)

will be called the system equation of \(\mathrm{(OS)}\). It is well known that the solution \(x\) of (1) satisfies

$$\begin{aligned} \Vert x\Vert _\infty \le c_1 |a| + c_2 \Vert u\Vert _1 \end{aligned}$$
(2)

with constants \(c_1\) and \(c_2\) independent of \(a\) and \(u\).
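For intuition, the stability estimate (2) can be checked numerically: by Grönwall’s inequality one may take, for instance, \(c_1 = e^{LT}\) and \(c_2 = \Vert B\Vert e^{LT}\) with \(L \ge \sup _t \Vert A(t)\Vert \). The following sketch verifies this bound along forward-Euler trajectories of a randomly chosen time-invariant sample system; the system data and the integrator are illustrative assumptions, not part of the problem above.

```python
import numpy as np

# Numerical check of estimate (2): sup_t |x(t)| <= e^{L T} (|a| + ||B|| ||u||_1)
# with L = ||A|| (spectral norm), which follows from Gronwall's inequality.
rng = np.random.default_rng(0)
T, n_steps = 1.0, 10_000
dt = T / n_steps
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 1))
L, nB = np.linalg.norm(A, 2), np.linalg.norm(B, 2)

for _ in range(20):
    a = rng.normal(size=2)
    u = rng.uniform(-1.0, 1.0, size=(n_steps, 1))  # admissible control samples
    x, sup_x, u_l1 = a.copy(), np.linalg.norm(a), 0.0
    for i in range(n_steps):                        # forward Euler for (1)
        x = x + dt * (A @ x + B @ u[i])
        sup_x = max(sup_x, np.linalg.norm(x))
        u_l1 += dt * np.linalg.norm(u[i])
    assert sup_x <= np.exp(L * T) * (np.linalg.norm(a) + nB * u_l1)
```

The discrete Grönwall argument \((1+\mathrm{d}t\,L)^i \le e^{LT}\) guarantees that the asserted bound holds exactly for the Euler iterates, not only in the limit \(\mathrm{d}t \rightarrow 0\).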

Definition 1.1

A pair \((x^0,u^0)\in {\mathcal {F}}\) is called a minimizer for the Problem \(\mathrm{(OS)}\) if \(f(x^0,u^0)\le f(x,u)\) for all \((x,u)\in {\mathcal {F}}\), and a strict minimizer if \(f(x^0,u^0) < f(x,u)\) for all \((x,u)\in {\mathcal {F}}\), \((x,u)\not =(x^0,u^0)\).

In view of convexity of the Problem \(\mathrm{(OS)}\) we make the following assumption throughout the paper:

  • \(\mathrm{(AC)}\) Let the matrices \(Q\) and \(W(t)\), \(t\in [0,T]\), be symmetric and

    $$\begin{aligned}&z(T)^{\mathsf {T}}Qz(T) + \int _0^T \! z(t)^{\mathsf {T}}W(t)z(t) +2 z(t)^{\mathsf {T}}S(t)v(t) \,{\mathrm {d}}t\ge 0\\&\text {for all } (z,v)\in {\mathcal {F}} - {\mathcal {F}}, \,\text {i. e.}\,(z,v)\in X \text { with}\\&\dot{z}(t) = A(t)z(t)+B(t)v(t) \quad \text {a.e. on } [0,T],\\&z(0) = 0,\\&v(t)\in U-U \quad \text {a.e. on } [0,T]. \end{aligned}$$

The following auxiliary results concerning linear-quadratic problems of type \(\mathrm{(OS)}\) are common knowledge and can be found in [6]. The feasible set \({\mathcal {F}}\) is nonempty, closed, convex and bounded. If \(\text{(AC) }\) holds, then the cost functional is convex and continuous on \({\mathcal {F}}\). Therefore, a minimizer \((x^0,u^0)\in W^1_2(0,T;{\mathbb {R}}^n)\times L^2(0,T;{\mathbb {R}}^m)\) of \(\mathrm{(OS)}\) exists (cf. [13, Chap. II, Proposition 1.2]), and since \({\mathcal {U}}\) is bounded we have \((x^0,u^0)\in X=W^1_{\infty }(0,T;{\mathbb {R}}^n)\times L^\infty (0,T;{\mathbb {R}}^m)\). Moreover, the cost functional is Lipschitz continuous on \({\mathcal {F}}\), i.e. there is a constant \(L_f\) such that

$$\begin{aligned} |f(x,u)-f(z,v)| \le L_f\left( \Vert x-z\Vert _\infty +\Vert u-v\Vert _1\right) \quad \forall (x,u), (z,v)\in {\mathcal {F}}. \end{aligned}$$

An immediate consequence of the compactness of \(U\), the Lipschitz continuity of \(A\) and \(B\) as well as the solution formula for linear differential equations, is the existence of a constant \(L_x\) such that for any admissible control \(u\in {\mathcal {U}}\) and the associated solution \(x\) of the system Eq. (1) we have

$$\begin{aligned} \Vert x\Vert _{1,\infty } \le L_x, \end{aligned}$$
(3)

where the constant \(L_x\) is independent of \(x\). This estimate shows that the feasible trajectories are uniformly Lipschitz with Lipschitz modulus \(L_x\).

Let \((x^0,u^0)\in {\mathcal {F}}\) be a minimizer of \(\mathrm{(OS)}\). Then there exists an adjoint variable \(\lambda ^0 \in W_\infty ^1(0,T; {\mathbb {R}}^n)\) such that the adjoint equation

$$\begin{aligned} -\dot{\lambda }^0(t)&= A(t)^{\mathsf {T}}\lambda ^0(t) + W(t)x^0(t) + S(t)u^0(t)+ w(t) \quad \text {a.e. on } [0,T],\nonumber \\ \lambda ^0(T)&= Qx^0(T)+q, \end{aligned}$$
(4)

and the minimum principle

$$\begin{aligned} \left[ B(t)^{\mathsf {T}}\lambda ^0(t)+S(t)^{\mathsf {T}}x^0(t) + r(t)\right] ^{\mathsf {T}}\left( u-u^0(t)\right) \ge 0 \quad \forall u \in U,\; \text {a.e. on}\,[0,T], \end{aligned}$$
(5)

hold. Denoting the switching function by

$$\begin{aligned} \sigma ^0(t):=B(t)^{\mathsf {T}}\lambda ^0(t)+S(t)^{\mathsf {T}}x^0(t) + r(t), \end{aligned}$$
(6)

it is well-known that for each \(j\in \{1,\ldots ,m\}\), the minimum principle (5) implies

$$\begin{aligned} u^0_j(t) = \left\{ \begin{array}{ll} b_{l,j}, &{} \text{ if }\,\sigma ^0_j(t)>0,\\ b_{u,j}, &{} \text{ if }\,\sigma ^0_j(t)<0,\\ \text{ undetermined, } &{} \text{ if }\,\sigma ^0_j(t)=0. \end{array}\right. \end{aligned}$$
(7)
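Characterization (7) translates into a simple componentwise rule. The following sketch evaluates it pointwise for one component \(j\); the function name and the tolerance used to detect a vanishing switching function are illustrative assumptions.

```python
import numpy as np

def bang_bang_component(sigma_j, b_lj, b_uj, tol=1e-12):
    """Rule (7): lower bound where sigma_j^0 > 0, upper bound where
    sigma_j^0 < 0, undetermined (NaN) where sigma_j^0 vanishes."""
    sigma_j = np.asarray(sigma_j, dtype=float)
    u_j = np.full(sigma_j.shape, np.nan)
    u_j[sigma_j > tol] = b_lj
    u_j[sigma_j < -tol] = b_uj
    return u_j

vals = bang_bang_component([0.3, -0.7, 0.0, 1.2], b_lj=-1.0, b_uj=1.0)
# vals -> [-1., 1., nan, -1.]
```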

Remark 1

Since \(\lambda \) satisfies the adjoint equation, the parameter functions \(A\), \(W\), \(S\) and \(w\) are Lipschitz continuous, and \(u\) is bounded, it follows that \(\dot{\lambda }\) is bounded, i.e. there exists a constant \(L_\lambda \) such that for any feasible pair \((x,u) \in {\mathcal {F}}\) and the associated solution \(\lambda \) of the adjoint equation we have

$$\begin{aligned} \Vert \lambda \Vert _{1,\infty } \le L_\lambda . \end{aligned}$$

Hence \(\lambda \) is uniformly Lipschitz continuous with Lipschitz modulus \(L_\lambda \), which implies that the switching function \(\sigma \) is uniformly Lipschitz continuous, too. Analogously to (2), a solution \(\lambda ^0\) of (4) satisfies

$$\begin{aligned} \Vert \lambda ^0\Vert _\infty \le c_1 \Vert x^0\Vert _\infty + c_2 \Vert u^0\Vert _1 + c_3 \Vert w\Vert _\infty + c_4 |q| \end{aligned}$$
(8)

with constants \(c_1\), \(c_2\), \(c_3\) and \(c_4\) independent of \((x^0,u^0)\), \(w\) and \(q\).

In the case of Lipschitz continuous optimal controls the convergence analysis of discretization methods is usually based on a second-order optimality condition (see e.g. [10, 23]). For bang–bang controls such conditions are available, too (see e.g. [1, 14–17, 24, 29–31]). In [6] a second-order condition introduced in [14] turned out to be very useful for the analysis of the Euler discretization of linear-quadratic problems. We extend the approach of [6, Sect. 4] and [4] to the class of problems \(\mathrm{(OS)}\) with a mixed state–control term in the cost functional. Under weaker assumptions on the structure of the switching function we derive a condition of order \(k+1\), which will play an important role in the convergence analysis of discrete regularization approaches. This condition is closely connected to the controllability index in [19, 32] (see Remark 3). We make the following assumption on bang–bang regularity (cf. [1] resp. [14, 15]).

\(\mathrm{(A1)}\) :

There exists a solution \((x^0,u^0)\in {\mathcal {F}}\) of \(\mathrm{(OS)}\) such that the set \(\Sigma \) of zeros of the components \(\sigma ^0_j\), \(j=1,\ldots ,m\), of the switching function \(\sigma ^0\) defined by (6) is finite and \(0,T\notin \Sigma \), i.e. \(\Sigma =\{s_1,\ldots ,s_l\}\) with \(0<s_1<\cdots <s_l<T\).

Assumption (A1) implies that \(u^0\) is of bounded variation. We denote the set of active indices for the components of the switching function \(\sigma ^0\) by

$$\begin{aligned} {\mathcal {I}}(s_\iota ):= \{1\le j\le m: \sigma ^0_j(s_\iota )=0\}, \end{aligned}$$

and formulate the second assumption:

\(\mathrm{(A2)}^k\) :

There is a natural number \(\overline{k} \in {\mathbb {N}}\), for which there exist constants \(\bar{\sigma },\bar{\tau } > 0\), such that for all \(\iota \in \{1,\dots ,l\}\), \(j\in {\mathcal {I}}(s_\iota )\) and all \(t \in [s_\iota -\bar{\tau },s_\iota +\bar{\tau }]\) it holds

$$\begin{aligned} \left| \sigma _j(t)\right| \ge \bar{\sigma }\left| t-s_\iota \right| ^{\overline{k}}. \end{aligned}$$

We define \(k \in {\mathbb {N}}\) as the smallest natural number that fulfills this condition.

Remark 2

The Assumption \(\mathrm{(A2)}^k\) is weaker than Assumption \(\mathrm{(A2)}\) in [4, 6] (since \(\mathrm{(A2)}^1=\mathrm{(A2)}\)). Depending on the parameter \(k \in {\mathbb {N}}\) we will derive generalized convergence results for the discrete regularization approach for problems of type \(\mathrm{(OS)}\). Under the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\) we will prove Hölder-type error estimates of order \({\mathcal {O}}(h^{1/(k+1)})\) for control, state and adjoint state.

Example 1.2

We introduce a class of linear problems depending on a parameter \(k \in {\mathbb {N}}\), which fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)^k}\) (cf. [7, Example 2.10], [19, Sect. 4]). In the terminology of [32] these problems have controllability index \(k \in {\mathbb {N}}\). With \(n=k+1\), \(s \in {\mathbb {R}}^{k}\) and \(X=W^1_\infty (0,1;{\mathbb {R}}^n) \times L^\infty (0,1;{\mathbb {R}})\) we define

[Problem \(\mathrm{(B)}_k\): minimize \(x_1(1)\) subject to \(\dot{x}_j(t)=s_j x_{j+1}(t)+u(t)\), \(j=1,\dots ,k\), \(\dot{x}_{k+1}(t)=u(t)\) and \(|u(t)|\le 1\) a.e. on \([0,1]\)]

The adjoint equation can be written as

$$\begin{aligned} -\dot{\lambda }(t)&=\big (0, s_1 \lambda _1(t), s_2 \lambda _2(t), \dots , s_k \lambda _k(t)\big )^{\mathsf {T}},\\ \lambda (1)&=(1, 0, \dots , 0 )^{\mathsf {T}}. \end{aligned}$$

For \(j=1,\dots ,k+1\) we obtain

$$\begin{aligned} \lambda _j^0(t)=\left( \frac{(-1)^{j+1}}{(j-1)!} \prod \limits _{i=1}^{j-1} s_i\right) (t-1)^{j-1} \end{aligned}$$

and the corresponding switching function

$$\begin{aligned} \sigma ^0(t)=\sum \limits _{j=1}^{k+1}\lambda _j^0(t). \end{aligned}$$

In order to guarantee that for \(k \in {\mathbb {N}}\) the Assumption \(\mathrm{(A2)}^k\) is fulfilled, we choose the parameter \(s\) such that the switching function \(\sigma ^0\) has a zero of order \(k\) at \(t=0.5\), i.e. \(\sigma ^0(0.5)=0\) and all derivatives up to order \(k-1\) vanish at \(t=0.5\), too. These requirements can be written as

$$\begin{aligned} \sum \limits _{j=\ell +1}^{k+1} \frac{1}{(j-1-\ell )! \, 2^{\,j-1-\ell }}\prod \limits _{i=1}^{j-1} s_i =0, \quad \ell =0,\dots ,k-1 \end{aligned}$$

and are fulfilled if we choose \(s_j := -2(k-j+1)\) for \(j=1,\dots ,k\). The resulting switching function is

$$\begin{aligned} \sigma ^0(t)=2^k(t-0.5)^k. \end{aligned}$$

Therefore the solution of \(\mathrm{(B)}_k\) fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\) with \(k \in {\mathbb {N}}\). With the help of (7) we can characterize the optimal control. For odd \(k\) it holds

$$\begin{aligned} u^0(t)= {\left\{ \begin{array}{ll} 1, &{} 0\le t < 0.5, \\ -1, &{} 0.5 < t \le 1. \end{array}\right. } \end{aligned}$$

If \(k\) is even, the optimal control is the constant function \(u^0(t) \equiv -1\). Because of the weakened Assumption \(\mathrm{(A2)}^k\), this example shows that the error estimates we will derive in Sects. 5 and 6 hold for a larger class of control problems than the results in [4].
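The closed form of the switching function in this example can be verified numerically. The sketch below assembles \(\sigma ^0(t)=\sum _j \lambda _j^0(t)\) from the explicit formula for \(\lambda _j^0\) with \(s_j=-2(k-j+1)\) and compares it with \(2^k(t-0.5)^k\):

```python
import math

def sigma0(k, t):
    """Switching function of (B)_k with s_j = -2(k - j + 1), assembled from
    the explicit adjoint components lambda_j^0(t)."""
    s = [-2.0 * (k - j + 1) for j in range(1, k + 1)]
    total = 0.0
    for j in range(1, k + 2):
        coeff = (-1) ** (j + 1) / math.factorial(j - 1) * math.prod(s[: j - 1])
        total += coeff * (t - 1.0) ** (j - 1)
    return total

# agreement with the closed form 2^k (t - 0.5)^k
for k in (1, 2, 3, 4):
    for t in (0.0, 0.25, 0.5, 0.8, 1.0):
        assert abs(sigma0(k, t) - 2 ** k * (t - 0.5) ** k) < 1e-9
```

The agreement reflects the binomial identity \(\sum _{j=1}^{k+1}\binom{k}{j-1}\,2^{j-1}(t-1)^{j-1} = (1+2(t-1))^k = 2^k(t-0.5)^k\) hidden in the choice of \(s\).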

The following result is a generalization of [6, Lemma 4.1] (cf. [14, Lemma 3.3]):

Lemma 1.3

Let \((x^0,u^0)\) be a minimizer for Problem (OS) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\), and let the switching function \(\sigma \) be defined by (6). Then there are constants \(\beta ,\gamma ,\bar{\delta }>0\) such that for any admissible control \(u \in {\mathcal {U}}\),

$$\begin{aligned} \int _0^T \sigma ^0(t)^{\mathsf {T}}\left( u(t)-u^0(t)\right) \!\,{\mathrm {d}}t\ge \beta \,\Vert u-u^0\Vert _1^{k+1} \end{aligned}$$

if \(\Vert u-u^0\Vert _1 \le 2\gamma \bar{\delta }\), and

$$\begin{aligned} \int _0^T \sigma ^0(t)^{\mathsf {T}}\left( u(t)-u^0(t)\right) \!\,{\mathrm {d}}t\ge \beta \,\Vert u-u^0\Vert _1 \end{aligned}$$

if \(\Vert u-u^0\Vert _1 > 2\gamma \bar{\delta }\).

Proof

For \(0 < \delta < \bar{\tau }\) we define

$$\begin{aligned} I(\delta ) := \bigcup \limits _{1\le \iota \le l} \left[ s_\iota -\delta ,s_\iota +\delta \right] . \end{aligned}$$

Let \(j \in \{1,\dots ,m\}\) be arbitrary and

$$\begin{aligned} \Sigma _j := \left\{ \tau _1,\dots ,\tau _{l_j}\right\} \text { with } 0 <\tau _1<\cdots <\tau _{l_j}<T \end{aligned}$$

the set of zeros of \(\sigma _j\). Moreover we define

$$\begin{aligned} I_{-}(\delta ) := \bigcup \limits _{\iota =1,\dots ,l_j} \left[ \tau _\iota -\delta ,\tau _\iota +\delta \right] \quad \text {and} \quad I_{+}(\delta ) := [0,T] \setminus I_{-}(\delta ). \end{aligned}$$

Since \(\sigma \) is continuous and \(\sigma _j\) has no zeros on the compact set \(I_{+}(\bar{\tau })\), there exist constants \(\sigma _{j,\min }\) with

$$\begin{aligned} \sigma _{j,\min } := \min \limits _{t\in I_{+}(\bar{\tau })} \left| \sigma _j(t)\right| > 0. \end{aligned}$$

We choose \(0< \bar{\delta }\le \bar{\tau }\), such that

$$\begin{aligned} \bar{\delta }^k \bar{\sigma } \le \min \limits _{1\le j \le m} \sigma _{j,\min }. \end{aligned}$$

From the Assumption \(\mathrm{(A2)}^k\) we obtain for all \(0<\delta <\bar{\delta }\) and \(j \in \left\{ 1,\dots ,m\right\} \)

$$\begin{aligned} \left| \sigma _j(t) \right| \ge \delta ^k \bar{\sigma } \quad \forall \, t \in [0,T] \setminus I(\delta ). \end{aligned}$$
(9)

Let \(u \in {\mathcal {U}}\) be arbitrary. From the minimum principle (5) and (7) we know that the signs of \(\sigma _j(t)\) and \(u_j(t)-u_j^0(t)\) coincide a.e. on \([0,T]\), i.e. \(\sigma _j(t)\left( u_j(t)-u_j^0(t)\right) =|\sigma _j(t)|\,|u_j(t)-u_j^0(t)|\) a.e. It follows from (9) that

$$\begin{aligned} J&= \int _0^T\! \sigma ^0(t)^{\mathsf {T}}\left( u(t)-u^0(t)\right) \,{\mathrm {d}}t\ge \int \limits _{[0,T]\setminus I(\delta )}\! \sigma ^0(t)^{\mathsf {T}}\left( u(t)-u^0(t)\right) \,{\mathrm {d}}t\nonumber \\&= \int \limits _{[0,T]\setminus I(\delta )}\! \sum \limits _{j=1}^m |\sigma _j(t)|\,|u_j(t)-u_j^0(t)| \,{\mathrm {d}}t\ge \delta ^k \bar{\sigma } \sum \limits _{j=1}^m \int \limits _{[0,T]\setminus I(\delta )}\!|u_j(t)-u_j^0(t)| \,{\mathrm {d}}t.\nonumber \\ \end{aligned}$$
(10)

Furthermore, for \(1\le j \le m\) it holds

$$\begin{aligned} |u_j(t)-u_j^0(t)| \le b_{u,j}-b_{l,j} \quad \forall \, t \in [0,T], \end{aligned}$$

and with

$$\begin{aligned} \gamma := 2lm \max \limits _{1\le j \le m} (b_{u,j}-b_{l,j}) \end{aligned}$$

we get

$$\begin{aligned} \sum \limits _{j=1}^m \int _{I(\delta )}\!|u_j(t)-u_j^0(t)| \,{\mathrm {d}}t\le \gamma \delta . \end{aligned}$$

Together with (10) this implies

$$\begin{aligned} J \ge \delta ^k \bar{\sigma } \left( \Vert u-u^0\Vert _1-\gamma \delta \right) . \end{aligned}$$

Now we choose \(\delta := \min \left\{ \bar{\delta }, \frac{1}{2\gamma }\Vert u-u^0\Vert _1\right\} \). If \(\delta =\bar{\delta }\), i.e. it holds \(\Vert u-u^0\Vert _1 > 2\gamma \bar{\delta }\), we obtain

$$\begin{aligned} J \ge \frac{\bar{\delta }^k \bar{\sigma }}{2} \Vert u-u^0\Vert _1. \end{aligned}$$
(11)

If \(\delta =\frac{1}{2\gamma }\Vert u-u^0\Vert _1\), i.e. it holds \(\Vert u-u^0\Vert _1 \le 2\gamma \bar{\delta }\), we have

$$\begin{aligned} J \ge \frac{\bar{\sigma }}{2^{k+1}\gamma ^k} \Vert u-u^0\Vert _1^{k+1}. \end{aligned}$$
(12)

With

$$\begin{aligned} \beta := \min \left\{ \frac{\bar{\delta }^k \bar{\sigma }}{2}, \frac{\bar{\sigma }}{2^{k+1}\gamma ^k} \right\} \end{aligned}$$

the assertion follows directly from (11) and (12). \(\square \)

This lemma yields a minorant of order \(k+1\) for the first-order term \(\int _0^T \sigma ^0(t)^{\mathsf {T}}(u(t)-u^0(t))\,{\mathrm {d}}t\) in a sufficiently small \(L^1\)-neighborhood of \(u^0\) and a linear minorant outside this neighborhood. Moreover it directly implies the following generalization of [6, Theorem 4.2]:

Theorem 1.4

Let \((x^0,u^0)\) be a minimizer for Problem \(\mathrm{(OS)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Then there are constants \(\beta ,\gamma ,\bar{\delta }>0\) such that for any feasible pair \((x,u) \in {\mathcal {F}}\),

$$\begin{aligned} f(x,u)-f\left( x^0,u^0\right) \ge \beta \,\Vert u-u^0\Vert _1^{k+1} \end{aligned}$$

if \(\Vert u-u^0\Vert _1 \le 2\gamma \bar{\delta }\), and

$$\begin{aligned} f(x,u)-f\left( x^0,u^0\right) \ge \beta \,\Vert u-u^0\Vert _1 \end{aligned}$$

if \(\Vert u-u^0\Vert _1 > 2\gamma \bar{\delta }\).

Proof

Let \((x^0,u^0)\) be a solution of \(\mathrm{(OS)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\), with corresponding adjoint variable \(\lambda ^0\). Further let \((x,u) \in {\mathcal {F}}\) be arbitrary. We define \(z := x-x^0\), \(v := u-u^0\) and obtain

$$\begin{aligned} f(x,u)-f\left( x^0,u^0\right)&= x^0(T)^{\mathsf {T}}Q z(T) + \frac{1}{2} z(T)^{\mathsf {T}}Q z(T) + q^{\mathsf {T}}z(T)\\&\quad +\int _0^T \! \frac{1}{2} z(t)^{\mathsf {T}}W(t) z(t) + z(t)^{\mathsf {T}}S(t) v(t) \,{\mathrm {d}}t\\&\quad +\int _0^T \! \left[ W(t)x^0(t)+S(t)u^0(t)+w(t)\right] ^{\mathsf {T}}z(t) \,{\mathrm {d}}t\\&\quad +\int _0^T \! \left[ S(t)^{\mathsf {T}}x^0(t)+r(t)\right] ^{\mathsf {T}}v(t) \,{\mathrm {d}}t. \end{aligned}$$

With Assumption \(\mathrm{(AC)}\) it follows

$$\begin{aligned} f(x,u)-f\left( x^0,u^0\right)&\ge x^0(T)^{\mathsf {T}}Q z(T) + q^{\mathsf {T}}z(T)\nonumber \\&+\int _0^T \! \left[ W(t)x^0(t)+S(t)u^0(t)+w(t)\right] ^{\mathsf {T}}z(t)\,{\mathrm {d}}t\nonumber \\&+\int _0^T \! \left[ S(t)^{\mathsf {T}}x^0(t)+r(t)\right] ^{\mathsf {T}}v(t)\,{\mathrm {d}}t. \end{aligned}$$
(13)

Using integration by parts, we deduce from the terminal condition of the adjoint Eq. (4)

$$\begin{aligned} x^0(T)^{\mathsf {T}}Q z(T) + q^{\mathsf {T}}z(T)&=\lambda ^0(T)^{\mathsf {T}}z(T)\\&=\int _0^T \! \dot{z}(t)^{\mathsf {T}}\lambda ^0(t)\,{\mathrm {d}}t+\int _0^T \! z(t)^{\mathsf {T}}\dot{\lambda }^0(t)\,{\mathrm {d}}t. \end{aligned}$$

Since \(z\) satisfies the system Eq. (1) with control \(v\) and initial value \(z(0)=0\), and \(\lambda ^0\) solves the adjoint Eq. (4) with \((x,u)=(x^0,u^0)\), we obtain

$$\begin{aligned}&x^0(T)^{\mathsf {T}}Q z(T) + q^{\mathsf {T}}z(T) =\int _0^T \! \left[ A(t)z(t)+B(t)v(t)\right] ^{\mathsf {T}}\lambda ^0(t)\,{\mathrm {d}}t\\&\quad -\int _0^T \! z(t)^{\mathsf {T}}\left[ A(t)^{\mathsf {T}}\lambda ^0(t)+W(t)x^0(t)+S(t)u^0(t)+w(t)\right] \!\,{\mathrm {d}}t. \end{aligned}$$

Plugging this into (13) leads together with (6) to

$$\begin{aligned} f(x,u)-f\left( x^0,u^0\right)&\ge \int _0^T \! \left[ B(t)^{\mathsf {T}}\lambda ^0(t)+S(t)^{\mathsf {T}}x^0(t)+r(t)\right] ^{\mathsf {T}}v(t)\,{\mathrm {d}}t\\&= \int _0^T \! \sigma ^0(t)^{\mathsf {T}}v(t)\,{\mathrm {d}}t, \end{aligned}$$

and the assertion follows directly from Lemma 1.3. \(\square \)

Remark 3

The results of order \(k+1\) from Lemma 1.3 and Theorem 1.4 are strongly connected to the findings in [19] and [32]. In [32] the authors derived stability results for Mayer-type problems with the help of metric regularity and smoothness assumptions on the problem parameters. The structure of the switching function is characterized with the help of the “controllability index”. In the case of control constraints defined by constant upper and lower bounds

$$\begin{aligned} u(t) \in U=\left\{ u \in {\mathbb {R}}^m \big |b_l \le u \le b_u\right\} \quad \text {a.e. on } [0,T] \end{aligned}$$

the controllability index depends on the derivatives up to order \(k-1\) of the switching function (cf. [32, Remark 1]).

Using the stability results from [32] it was shown in [19] that the direct Euler discretization for Mayer-type problems converges with order \({\mathcal {O}}(h^{1/k})\) w.r.t. the controllability index \(k\) and the mesh size \(h\). Under the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\) we will show convergence of order \({\mathcal {O}}(h^{1/(k+1)})\) for a discrete regularization approach applied to linear-quadratic problems of type \(\mathrm{(OS)}\) with bang–bang controls. Moreover we will improve these Hölder-type error estimates for linear problems to order \({\mathcal {O}}(h^{1/k})\) without making further assumptions on the smoothness of the problem parameters.

2 Regularization

By adding the regularization term \(\frac{\alpha }{2} \Vert u\Vert _2^2\) to the cost functional of our initial problem \(\mathrm{(OS)}=\mathrm{(OS)}^0\) we obtain the following class of linear-quadratic control problems with \(\alpha \ge 0\):

$$\begin{aligned} \mathrm{(OS)}^\alpha \qquad \min _{(x,u)\in X} f^\alpha (x,u) \quad \text {s.t.}\quad \dot{x}(t)&=A(t)x(t)+B(t)u(t) \quad \text {a.e. on } [0,T],\\ x(0)&=a,\\ u(t)&\in U \quad \text {a.e. on } [0,T]. \end{aligned}$$

Hereby, the cost functional is defined as

$$\begin{aligned} f^\alpha (x,u)&:=\frac{1}{2} x(T)^{\mathsf {T}}Q x(T) + q^{\mathsf {T}}x(T) + \frac{\alpha }{2} \Vert u\Vert _2^2\\&\qquad + \int _0^T \! \frac{1}{2} x(t)^{\mathsf {T}}W(t) x(t) + x(t)^{\mathsf {T}}S(t) u(t) +w(t)^{\mathsf {T}}x(t)+ r(t)^{\mathsf {T}}u(t) \,{\mathrm {d}}t\\&=f(x,u) + \frac{\alpha }{2} \Vert u\Vert _2^2. \end{aligned}$$

The feasible set of \(\mathrm{(OS)}^\alpha \) is again \({\mathcal {F}}\). If we choose \(\alpha >0\), the cost functional \(f^\alpha \) is strictly convex, and therefore the problem \(\mathrm{(OS)}^\alpha \) has a unique Lipschitz continuous solution \((x^\alpha ,u^\alpha )\) (cf. [9, Lemma 4]) and a corresponding adjoint state \(\lambda ^\alpha \in X_1\) such that the adjoint equation

$$\begin{aligned} -\dot{\lambda }^\alpha (t)&= A(t)^{\mathsf {T}}\lambda ^\alpha (t) + W(t) x^\alpha (t) + S(t) u^\alpha (t) + w(t) ,\\ \lambda ^\alpha (T)&= Q x^\alpha (T) + q \end{aligned}$$

and the minimum principle

$$\begin{aligned} \left( \alpha u^\alpha (t) + B(t)^{\mathsf {T}}\lambda ^\alpha (t) + S(t)^{\mathsf {T}}x^\alpha (t) + r(t)\right) ^{\mathsf {T}}\left( u-u^\alpha (t)\right) \ge 0\quad \forall \, u \in U, \end{aligned}$$
(14)

hold a.e. on \([0,T]\). For \(t \in [0,T]\) the switching function \(\sigma ^\alpha \) of \(\mathrm{(OS)}^\alpha \) is defined by

$$\begin{aligned} \sigma ^\alpha (t) := \alpha u^\alpha (t) + B(t)^{\mathsf {T}}\lambda ^\alpha (t)+S(t)^{\mathsf {T}}x^\alpha (t) + r(t). \end{aligned}$$

From (14) it follows that for \(\alpha >0\) the optimal control \(u^\alpha \) is

$$\begin{aligned} u^\alpha (t)={\mathrm{Pr}}_{[b_l,b_u]} \left[ -\frac{1}{\alpha } \left( B(t)^{\mathsf {T}}\lambda ^\alpha (t) + S(t)^{\mathsf {T}}x^\alpha (t)+r(t)\right) \right] . \end{aligned}$$
(15)
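The projection formula (15) acts componentwise, so it can be realized by a simple clipping operation. In the sketch below the data functions `B`, `S`, `r` and the trajectories `lam`, `x` are hypothetical placeholders passed in as callables; `np.clip` performs the projection \(\mathrm{Pr}_{[b_l,b_u]}\) onto the box.

```python
import numpy as np

def regularized_control(t, alpha, B, S, r, lam, x, b_l, b_u):
    """Evaluate u^alpha(t) via (15): project -(B^T lam + S^T x + r)/alpha
    componentwise onto the box [b_l, b_u]."""
    raw = -(B(t).T @ lam(t) + S(t).T @ x(t) + r(t)) / alpha
    return np.clip(raw, b_l, b_u)

# toy scalar example: n = m = 1, B = 1, S = 0, r = 0, lam(t) = t - 0.5
u = regularized_control(
    0.75, 0.1,
    B=lambda t: np.eye(1), S=lambda t: np.zeros((1, 1)),
    r=lambda t: np.zeros(1), lam=lambda t: np.array([t - 0.5]),
    x=lambda t: np.zeros(1), b_l=-1.0, b_u=1.0,
)
# raw value -2.5 is projected to the lower bound -1.0
```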

We will show that for \(\alpha \rightarrow 0\) the solution \((x^\alpha ,u^\alpha )\) of \(\mathrm{(OS)}^\alpha \) converges to \((x^0,u^0)\). The order of convergence depends on the structure of the switching function of the initial problem \(\mathrm{(OS)}\).

Theorem 2.1

Let \((x^0,u^0)\) be a solution of \(\mathrm{(OS)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Then the uniquely determined solution \((x^\alpha ,u^\alpha )\) of \(\mathrm{(OS)}^\alpha \) can be estimated by

$$\begin{aligned} \Vert u^\alpha -u^0\Vert _1 \le c_u \alpha ^{\frac{1}{k}}\quad \text {and} \quad \Vert x^\alpha -x^0\Vert _\infty \le c_x \alpha ^{\frac{1}{k}}. \end{aligned}$$

For the corresponding adjoint variable it holds

$$\begin{aligned} \Vert \lambda ^\alpha -\lambda ^0\Vert _1 \le c_\lambda \alpha ^{\frac{1}{k}}. \end{aligned}$$

The constants \(c_u\)\(c_x\) and \(c_\lambda \) are independent of the regularization parameter \(\alpha \).

Proof

For the solution \((x^\alpha ,u^\alpha )\) of \(\mathrm{(OS)}^\alpha \) it holds

$$\begin{aligned} f\left( x^\alpha ,u^\alpha \right) +\frac{\alpha }{2} \Vert u^\alpha \Vert _2^2=f^\alpha \left( x^\alpha ,u^\alpha \right) \le f^\alpha \left( x^0,u^0\right) = f\left( x^0,u^0\right) + \frac{\alpha }{2}\Vert u^0\Vert _2^2 \end{aligned}$$

and therefore

$$\begin{aligned} f\left( x^\alpha ,u^\alpha \right) -f\left( x^0,u^0\right) \le \frac{\alpha }{2}\left( \Vert u^0\Vert _2^2-\Vert u^\alpha \Vert _2^2\right) . \end{aligned}$$
(16)

Since \(U\) is bounded we obtain

$$\begin{aligned} \Vert u^0\Vert _2^2-\Vert u^\alpha \Vert _2^2&= \int _0^T \! |u^0(t)|^2 - |u^\alpha (t)|^2 \,{\mathrm {d}}t\\&= \int _0^T \! \left( |u^0(t)| + |u^\alpha (t)|\right) \left( |u^0(t)| - |u^\alpha (t)|\right) \,{\mathrm {d}}t\\&\le \int _0^T \! \left( |u^0(t)| + |u^\alpha (t)|\right) |u^0(t) - u^\alpha (t)| \,{\mathrm {d}}t\\&\le \left( \Vert u^0\Vert _\infty + \Vert u^\alpha \Vert _\infty \right) \Vert u^0 - u^\alpha \Vert _1\\&\le c_1 \Vert u^0 - u^\alpha \Vert _1 \end{aligned}$$

with a constant \(c_1\) independent of \(\alpha \). With (16) it follows

$$\begin{aligned} f\left( x^\alpha ,u^\alpha \right) -f\left( x^0,u^0\right) \le \frac{c_1}{2} \alpha \Vert u^0 - u^\alpha \Vert _1. \end{aligned}$$
(17)

From Theorem 1.4 we know that there are constants \(\beta \)\(\gamma \)\(\bar{\delta } > 0\) with

$$\begin{aligned} f\left( x^\alpha ,u^\alpha \right) -f\left( x^0,u^0\right) \ge \beta \Vert u^\alpha -u^0\Vert _1^{k+1}, \end{aligned}$$

if \(\Vert u^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }\) and

$$\begin{aligned} f\left( x^\alpha ,u^\alpha \right) -f\left( x^0,u^0\right) \ge \beta \Vert u^\alpha -u^0\Vert _1, \end{aligned}$$

if \(\Vert u^\alpha -u^0\Vert _1 \ge 2 \gamma \bar{\delta }\). With (17) we can deduce

$$\begin{aligned} \frac{c_1}{2} \alpha \Vert u^0 - u^\alpha \Vert _1 \ge \beta \Vert u^\alpha -u^0\Vert _1^{k+1}, \end{aligned}$$

if \(\Vert u^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }\) and

$$\begin{aligned} \frac{c_1}{2} \alpha \Vert u^0 - u^\alpha \Vert _1 \ge \beta \Vert u^\alpha -u^0\Vert _1, \end{aligned}$$

if \(\Vert u^\alpha -u^0\Vert _1 \ge 2 \gamma \bar{\delta }\). Together with

$$\begin{aligned} \Vert u^0 - u^\alpha \Vert _1 \le T \left( \Vert u^0\Vert _\infty + \Vert u^\alpha \Vert _\infty \right) \end{aligned}$$

we obtain

$$\begin{aligned} \frac{c_1}{2 \beta } \alpha \ge \Vert u^\alpha -u^0\Vert _1^{k}, \end{aligned}$$
(18)

if \(\Vert u^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }\) and

$$\begin{aligned} \frac{c_1 T}{2 \beta } \left( \Vert u^0\Vert _\infty + \Vert u^\alpha \Vert _\infty \right) \alpha \ge \Vert u^\alpha -u^0\Vert _1, \end{aligned}$$

if \(\Vert u^\alpha -u^0\Vert _1 \ge 2 \gamma \bar{\delta }\). In both cases, for sufficiently small \(\alpha \) it follows that

$$\begin{aligned} \Vert u^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }. \end{aligned}$$

From (18) we obtain the desired estimate for the control

$$\begin{aligned} \Vert u^\alpha -u^0\Vert _1 \le c_u \alpha ^{\frac{1}{k}} \end{aligned}$$

with a constant \(c_u\) independent of \(\alpha \). Since \(z := x^\alpha -x^0\) solves

$$\begin{aligned} \dot{z}(t)&=A(t)z(t)+B(t)\left( u^\alpha (t)-u^0(t)\right) \quad \text {a.e. on } [0,T],\\ z(0)&=0 \end{aligned}$$

we get from (2)

$$\begin{aligned} \Vert x^\alpha -x^0\Vert _\infty = \Vert z\Vert _\infty \le \bar{c}_1 \Vert u^\alpha -u^0\Vert _1 \le c_x \alpha ^\frac{1}{k} \end{aligned}$$

with a constant \(c_x\) independent of \(\alpha \). Moreover \(\mu := \lambda ^\alpha - \lambda ^0\) solves

$$\begin{aligned} -\dot{\mu }(t)&=A(t)^{\mathsf {T}}\mu (t)\!+\!W(t)\left( x^\alpha (t)\!-\!x^0(t)\right) \!+\! S(t)\left( u^\alpha (t)-u^0(t)\right) \quad \text {a.e. on } [0,T],\\ \mu (T)&=Q\left( x^\alpha (T)-x^0(T)\right) . \end{aligned}$$

From (8) we can deduce the estimate

$$\begin{aligned} \Vert \lambda ^\alpha - \lambda ^0\Vert _\infty = \Vert \mu \Vert _\infty \le \bar{c}_1 \Vert x^\alpha -x^0\Vert _\infty + \bar{c}_2 \Vert u^\alpha -u^0\Vert _1 \le c_\lambda \alpha ^\frac{1}{k} \end{aligned}$$

with a constant \(c_\lambda \) independent of \(\alpha \). \(\square \)

Now we illustrate the theoretical findings of Theorem 2.1 with the help of Problem \(\mathrm{(B)}_k\) from Example 1.2.

Example 2.2

The \(L^2\)-regularization of \(\mathrm{(B)}_k\) leads to:

figure d

We have already seen that with \(s_j=-2(k-j+1)\) for \(j=1,\dots ,k\) the switching function of \(\mathrm{(B)}_k^0\) is given by

$$\begin{aligned} \sigma ^0(t)=2^k(t-0.5)^k \quad \text {a.e. on } [0,1]. \end{aligned}$$

Using the projection formula (15) the optimal control \(u^\alpha \) of the regularized problem \(\mathrm{(B)}_k^\alpha \) with \(\alpha >0\) can be written as

$$\begin{aligned} u^\alpha (t)= \mathrm{Pr}_{[-1,1]}\left[ -\frac{1}{\alpha }\sigma ^0(t)\right] \quad \text {a.e. on } [0,1]. \end{aligned}$$

We define

$$\begin{aligned} t_1 := \frac{1}{2} - \frac{1}{2} \alpha ^{1/k}\quad \text {and}\quad t_2 := \frac{1}{2} + \frac{1}{2} \alpha ^{1/k}, \end{aligned}$$

and for sufficiently small \(\alpha \), we obtain \([t_1,t_2] \subset [0,1]\). From the projection formula (15) it follows for odd \(k \in {\mathbb {N}}\)

$$\begin{aligned} u^\alpha (t)= {\left\{ \begin{array}{ll} 1, &{} 0 \le t \le t_1 ,\\ -\frac{2^k}{\alpha }(t-0.5)^k, &{} t_1 < t < t_2,\\ -1, &{} t_2 \le t \le 1, \end{array}\right. } \end{aligned}$$

and for even \(k \in {\mathbb {N}}\)

$$\begin{aligned} u^\alpha (t)= {\left\{ \begin{array}{ll} -1, &{} 0 \le t \le t_1 ,\\ -\frac{2^k}{\alpha }(t-0.5)^k, &{} t_1 < t < t_2,\\ -1, &{} t_2 \le t \le 1. \end{array}\right. } \end{aligned}$$

From this we can conclude that for arbitrary \(k \in {\mathbb {N}}\) and sufficiently small \(\alpha >0\) the optimal control \(u^0\) of \(\mathrm{(B)}_k^0\) and the optimal control \(u^\alpha \) of \(\mathrm{(B)}_k^\alpha \) coincide on \([0,1]\setminus [t_1,t_2]\). Since it holds

$$\begin{aligned} \Vert u^\alpha -u^0\Vert _1=2 \left( \frac{1}{2} \alpha ^{\frac{1}{k}} - \int _{0}^{\frac{1}{2} \alpha ^{\frac{1}{k}}} \frac{2^k}{\alpha } t^k \,{\mathrm {d}}t\right) = \left( 1-\frac{1}{k+1}\right) \alpha ^{\frac{1}{k}}, \end{aligned}$$

this example illustrates the theoretical results of Theorem 2.1.
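The closed-form value of \(\Vert u^\alpha -u^0\Vert _1\) can also be checked numerically. The following sketch evaluates \(u^\alpha \) via the projection formula and approximates the \(L^1\)-distance by a midpoint rule; the grid resolution `n` is an ad hoc choice for illustration:

```python
import numpy as np

def l1_gap(k, alpha, n=200_000):
    """Midpoint-rule approximation of ||u^alpha - u^0||_1 for (B)_k on [0, 1]."""
    dt = 1.0 / n
    t = (np.arange(n) + 0.5) * dt
    sigma0 = 2.0**k * (t - 0.5)**k                 # switching function sigma^0
    u_alpha = np.clip(-sigma0 / alpha, -1.0, 1.0)  # projection formula (15)
    u0 = np.where(sigma0 > 0.0, -1.0, 1.0)         # bang-bang control -sign(sigma^0)
    return dt * np.abs(u_alpha - u0).sum()
```

For \(k=3\) and \(\alpha =10^{-2}\) the computed value agrees with \((1-\frac{1}{k+1})\alpha ^{1/k} \approx 0.162\) up to quadrature error.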

3 The regularization–discretization approach

We choose a natural number \(N\) and define the mesh size \(h := T/N\). The space \(X_2\) of controls is approximated by functions in the subspace \(X_{2,N}\subset X_2\) of piecewise constant functions represented by their values \(u_{h,i} := u(t_i)\) at the grid points \(t_i := ih\), \(i=0,1,\ldots ,N-1\). Further, we approximate state and adjoint state variables by functions in the subspace \(X_{1,N}\subset X_1\) of continuous, piecewise linear functions represented by their values \(x_{h,i} := x(t_i)\), \(\lambda _{h,i} := \lambda (t_i)\) at the grid points \(t_i\), \(i=0,1,\ldots ,N\). We define \({\mathcal {U}}_N := {\mathcal {U}}\cap X_{2,N}\) and \(X_N := X_{1,N} \times {\mathcal {U}}_N\). In order to get a discrete system equation we use Euler’s method:

$$\begin{aligned} x_{h,i+1}&= x_{h,i} + h \left[ A(t_i)x_{h,i} + B(t_i) u_{h,i}\right] , \quad i=0,1,\ldots ,N-1,\nonumber \\ x_{h,0}&= a. \end{aligned}$$
(19)
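A minimal sketch of scheme (19), assuming callables `A` and `B` that return the coefficient matrices at time `t` and an array `u_h` of control grid values:

```python
import numpy as np

def euler_state(A, B, a, u_h, T):
    """Forward Euler for x' = A(t)x + B(t)u, x(0) = a, as in scheme (19)."""
    N = len(u_h)
    h = T / N
    x = np.empty((N + 1, len(a)))
    x[0] = a
    for i in range(N):
        t_i = i * h
        x[i + 1] = x[i] + h * (A(t_i) @ x[i] + B(t_i) @ u_h[i])
    return x  # grid values x_{h,0}, ..., x_{h,N}
```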

Discretizing the cost functional of \(\mathrm{(OS)}\) with respect to the discretization method for the system equation and adding a discrete regularization term leads to the following class of discrete control problems depending on the regularization parameter \(\alpha \):

figure e

where the cost functional \(f_N^\alpha \) is defined by

$$\begin{aligned} f_N^\alpha (x,u) :=&\frac{1}{2} x_N^{\mathsf {T}}Q x_N + q^{\mathsf {T}}x_N + h\sum _{i=0}^{N-1}\frac{\alpha }{2} \, u_i^{\mathsf {T}}u_i + \frac{1}{2} x_i^{\mathsf {T}}W(t_i)x_i + x_i^{\mathsf {T}}S(t_i)u_i\\&+h\sum _{i=0}^{N-1}w(t_i)^{\mathsf {T}}x_i +r(t_i)^{\mathsf {T}}u_i. \end{aligned}$$
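The discrete cost \(f_N^\alpha \) can be evaluated directly from the grid values. A sketch, with the coefficient functions `W`, `S`, `w`, `r` assumed to be callables returning their values at time `t`:

```python
import numpy as np

def f_N_alpha(x, u, W, S, w, r, Q, q, alpha, h):
    """Discrete cost f_N^alpha for grid values x (N+1 rows) and u (N rows)."""
    N = len(u)
    val = 0.5 * x[N] @ Q @ x[N] + q @ x[N]          # terminal cost
    for i in range(N):
        t = i * h
        val += h * (0.5 * alpha * u[i] @ u[i]       # regularization term
                    + 0.5 * x[i] @ W(t) @ x[i]
                    + x[i] @ S(t) @ u[i]
                    + w(t) @ x[i] + r(t) @ u[i])
    return val
```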

The feasible set of \(\mathrm{(OS)}_N^\alpha \) is independent of the regularization parameter \(\alpha \). We denote it by \({\mathcal {F}}_N\). Analogously to (2) it holds for the solution \(x_h\) of (19)

$$\begin{aligned} \Vert x_h\Vert _\infty \le c_1 |a| + c_2 \Vert u_h\Vert _1 \end{aligned}$$
(20)

with constants \(c_1\) and \(c_2\) independent of \(a\) and \(u_h\).

Remark 4

Since we added the regularization term \(h\sum _{i=0}^{N-1}\frac{\alpha }{2} \, u_i^{\mathsf {T}}u_i\) to the cost functional, the problem \(\mathrm{(OS)}_N^\alpha \) fulfills a common sufficient condition for the convergence of finite-dimensional optimization methods. Therefore the combined regularization–discretization approach is numerically more stable and has advantages over direct approximation methods as presented for example in [6].

The following auxiliary results are similar to the ones concerning Euler discretization (cf. [6]).

Definition 3.1

A pair \((x_h^\alpha ,u_h^\alpha )\in {\mathcal {F}}_N\) is called a minimizer for Problem \(\mathrm{(OS)}_N^\alpha \) if \(f_N^\alpha (x_h^\alpha ,u_h^\alpha )\le f_N^\alpha (x_h,u_h)\) for all \((x_h,u_h)\in {\mathcal {F}}_N\), and a strict minimizer for Problem \(\mathrm{(OS)}_N^\alpha \) if \(f_N^\alpha (x_h^\alpha ,u_h^\alpha ) < f_N^\alpha (x_h,u_h)\) for all \((x_h,u_h)\in {\mathcal {F}}_N\), \((x_h,u_h)\not =(x_h^\alpha ,u_h^\alpha )\).

Again, since \(U\) is compact there exists a constant \(L_x\) independent of \(N\) such that for any admissible control \(u_h\in {\mathcal {U}}\) and the associated solution \(x_h \in X_{1,N}\) of the discrete system Eq. (19) we have

$$\begin{aligned} |\dot{x}_h(t)| \le L_x \quad \text {a.e. on}\,[0,T], \end{aligned}$$

which shows that the discrete feasible trajectories are uniformly Lipschitz with Lipschitz modulus \(L_x\) independent of \(N\), where w.l.o.g. \(L_x\) is the same constant as in (3).

Compactness of \(U\) further implies that Problem \(\mathrm{(OS)}_N^\alpha \) has a solution \((x_h^\alpha ,u_h^\alpha ) \in X_N\), and for any solution there exists a continuous, piecewise linear multiplier \(\lambda _h^\alpha \in X_{1,N}\) such that for \(i=0,\ldots ,N-1\) the discrete adjoint equation

$$\begin{aligned} -\frac{\lambda _{h,i+1}^\alpha - \lambda _{h,i}^\alpha }{h}&= A(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha + W(t_i) x_{h,i}^\alpha +S(t_i) u_{h,i}^\alpha + w(t_i),\nonumber \\ \lambda _{h,N}^\alpha&= Q x_{h,N}^\alpha +q, \end{aligned}$$
(21)

and the discrete minimum principle

$$\begin{aligned} \left[ \alpha \, u_{h,i}^\alpha +B(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha +S(t_i)^{\mathsf {T}}x_{h,i}^\alpha +r(t_i)\right] ^{\mathsf {T}}\left( u-u_{h,i}^\alpha \right) \ge 0 \quad \forall u\in U \end{aligned}$$
(22)

are satisfied (cf. [6]). By \(\sigma _h^\alpha :[0,t_{N-1}]\rightarrow {\mathbb {R}}^m\) we denote the discrete switching function, which is the continuous and piecewise linear function defined by the values

$$\begin{aligned} \sigma _h^\alpha (t_i) := \alpha \, u_{h,i}^\alpha +B(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha +S(t_i)^{\mathsf {T}}x_{h,i}^\alpha +r(t_i), \quad i=0,\ldots ,N-1. \end{aligned}$$

If \(\alpha = 0\), from (22) we obtain for \(j=1,\ldots ,m\), \(i=0,\ldots ,N-1\),

$$\begin{aligned} u^0_{h,j}(t_i) = \left\{ \begin{array}{ll} b_{l,j}, &{} \text{ if }\,\sigma _{h,j}^0(t_i)>0,\\ b_{u,j}, &{} \text{ if }\,\sigma _{h,j}^0(t_i)<0,\\ \text{ undetermined, } &{} \text{ if }\,\sigma _{h,j}^0(t_i)=0. \end{array}\right. \end{aligned}$$

Otherwise if \(\alpha >0\), the following projection formula holds

$$\begin{aligned} u_{h,i}^\alpha ={\mathrm {Pr}}_{[b_l,b_u]}\left[ -\frac{1}{\alpha } \left( B(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha +S(t_i)^{\mathsf {T}}x_{h,i}^\alpha +r(t_i)\right) \right] ,\quad i=0,\ldots ,N-1. \end{aligned}$$
(23)
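One simple way to exploit (19), (21) and (23) numerically is a damped fixed-point iteration: integrate the state forward, the adjoint backward, then update the control by the projection formula. The following sketch is illustrative only; the damping factor and iteration count are ad hoc choices not taken from the paper, and convergence of this naive iteration is not guaranteed in general:

```python
import numpy as np

def kkt_fixed_point(A, B, W, S, w, r, Q, q, a, b_l, b_u, alpha, N, T,
                    iters=200, damping=0.2):
    """Damped fixed-point sketch for the discrete system (19), (21), (23)."""
    h = T / N
    n = len(a)
    u = np.zeros((N, len(b_l)))
    for _ in range(iters):
        x = np.empty((N + 1, n)); x[0] = a               # state, scheme (19)
        for i in range(N):
            t = i * h
            x[i + 1] = x[i] + h * (A(t) @ x[i] + B(t) @ u[i])
        lam = np.empty((N + 1, n)); lam[N] = Q @ x[N] + q  # adjoint, scheme (21)
        for i in range(N - 1, -1, -1):
            t = i * h
            lam[i] = lam[i + 1] + h * (A(t).T @ lam[i + 1] + W(t) @ x[i]
                                       + S(t) @ u[i] + w(t))
        # projection formula (23), applied with damping
        g = np.array([B(i * h).T @ lam[i + 1] + S(i * h).T @ x[i] + r(i * h)
                      for i in range(N)])
        u = (1.0 - damping) * u + damping * np.clip(-g / alpha, b_l, b_u)
    return x, u, lam
```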

Analogously to (8), for the solution \(\lambda _h^\alpha \) of (21) we have

$$\begin{aligned} \Vert \lambda _h^\alpha \Vert _\infty \le c_1 \Vert x_h^\alpha \Vert _\infty + c_2 \Vert u_h^\alpha \Vert _1 + c_3 \Vert w\Vert _\infty + c_4 |q| \end{aligned}$$
(24)

with constants \(c_1\), \(c_2\), \(c_3\) and \(c_4\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(w\) and \(q\). For the following proofs of convergence results we need some auxiliary results (cf. [6, Lemma 3.1, Lemma 3.2], [33, Satz 3.2.1, Lemma 3.2.2, Satz 3.2.3]):

Lemma 3.2

Let \((x,u) \in {\mathcal {F}}\) be arbitrary with \(u\) having bounded variation, and let \(\hat{u} \in {\mathcal {U}}_N\) be the piecewise constant approximation of \(u\) in the grid points. Then there exists a uniquely defined \(\hat{x} \in X_{1,N}\), such that \((\hat{x},\hat{u}) \in {\mathcal {F}}_N\), and it holds

$$\begin{aligned} \Vert u-\hat{u}\Vert _1 \le h {\mathrm {V}}_0^T u \quad \text {and} \quad \Vert x-\hat{x}\Vert _\infty \le c_1 h {\mathrm {V}}_0^T \dot{x} \le \left( c_2 + c_3 {\mathrm {V}}_0^T u\right) h, \end{aligned}$$

where \(c_1\), \(c_2\) and \(c_3\) are constants independent of \((x,u)\) and \(N\).
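The first estimate of Lemma 3.2 can be observed numerically: the following sketch approximates \(\Vert u-\hat{u}\Vert _1\) for a given control `u` on a fine auxiliary grid (the grid sizes are ad hoc assumptions):

```python
import numpy as np

def pc_interp_l1_error(u, T, N, n_fine=100_000):
    """L1 distance between u and its piecewise constant interpolant u(t_i)."""
    h = T / N
    t = (np.arange(n_fine) + 0.5) * (T / n_fine)        # midpoints of a fine grid
    u_hat = u(np.minimum(np.floor(t / h), N - 1) * h)   # value at left grid point
    return (T / n_fine) * np.abs(u(t) - u_hat).sum()
```

For the monotone control \(u(t)=t\) on \([0,1]\) one gets \(h/2\), consistent with the bound \(h\,{\mathrm {V}}_0^T u = h\).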

Lemma 3.3

Let \((x_h,u_h) \in {\mathcal {F}}_N\). Then there exists a unique state \(x \in X_1\), such that \((x,u_h) \in {\mathcal {F}}\) and

$$\begin{aligned} \Vert x-x_h\Vert _\infty \le c h, \end{aligned}$$

with a constant \(c\) independent of \((x_h,u_h)\) and \(N\).

Lemma 3.4

Let \((x,u) \in {\mathcal {F}}\) be arbitrary with \(u\) having bounded variation, and let \(\lambda \) be the corresponding adjoint variable. Moreover let \(\hat{u} \in {\mathcal {U}}_N\) be the piecewise constant approximation of \(u\) in the grid points, \(\hat{x} \in X_{1,N}\) the solution of the discrete system Eq. (19) with \(u=\hat{u}\) and \(\hat{\lambda } \in X_{1,N}\) the solution of the discrete adjoint Eq. (21) with \((x,u)=\left( \hat{x}, \hat{u}\right) \). Then it holds

$$\begin{aligned} \Vert \lambda -\hat{\lambda }\Vert _\infty \le c_1 h + c_2 h {\mathrm {V}}_0^T u \end{aligned}$$

with constants \(c_1\) and \(c_2\) independent of \((x,u)\) and \(N\).

Remark 5

In order to derive convergence results we introduce the following notations. Let \(\hat{u}^0 \in {\mathcal {U}}_N\) be the piecewise constant approximation of \(u^0\) in the grid points and \(\hat{x}^0 \in X_{1,N}\) the uniquely defined solution of the discrete system Eq. (19) with \(u=\hat{u}^0\). Moreover let \(\hat{z}^0 \in X_1\) resp. \(z_h^\alpha \in X_1\) be the uniquely defined solutions of the system Eq. (1) with \(u=\hat{u}^0\) resp. \(u=u_h^\alpha \). For arbitrary \(\alpha \ge 0\) it holds

$$\begin{aligned} \left( x^0,u^0\right) \in {\mathcal {F}}, \; \left( \hat{z}^0,\hat{u}^0\right) \in {\mathcal {F}}, \; \left( z_h^\alpha ,u_h^\alpha \right) \in {\mathcal {F}}, \; \left( x_h^\alpha ,u_h^\alpha \right) \in {\mathcal {F}}_N, \; \left( \hat{x}^0 ,\hat{u}^0\right) \in {\mathcal {F}}_N. \end{aligned}$$
(25)

4 Convergence of optimal values

To show convergence of the optimal values we have to prove the following auxiliary result:

Lemma 4.1

For \((x_h,u_h) \in {\mathcal {F}}_N\) it holds

$$\begin{aligned} \left| f(x_h,u_h)-f_N^\alpha (x_h,u_h) \right| \le \bar{c}_1 h + \bar{c}_2 \alpha \end{aligned}$$

with constants \(\bar{c}_1\) and \(\bar{c}_2\) independent of \((x_h,u_h)\), \(\alpha \) and \(N\).

Proof

It follows from the Lipschitz continuity of \(x_h\), \(W\), \(S\), \(w\) and \(r\) that for \((x_h,u_h) \in {\mathcal {F}}_N\) it holds (cf. [6, Lemma 3.3])

$$\begin{aligned} \left| f(x_h,u_h)-f_N^0(x_h,u_h)\right| \le \bar{c}_1 h \end{aligned}$$
(26)

with a constant \(\bar{c}_1\) independent of \((x_h,u_h)\), \(\alpha \) and \(N\). Since \(U\) is bounded, we obtain

$$\begin{aligned} \left| f_N^0(x_h,u_h)-f_N^\alpha (x_h,u_h)\right| = \left| h \sum \limits _{i=0}^{N-1}\frac{\alpha }{2}u_{h,i}^{\mathsf {T}}u_{h,i} \right| \le \frac{T}{2} \Vert u_h\Vert _\infty ^2 \alpha = \bar{c}_2 \alpha , \end{aligned}$$
(27)

with a constant \(\bar{c}_2\) independent of \((x_h,u_h)\), \(\alpha \) and \(N\). With the help of the triangle inequality we obtain the assertion from (26) and (27). \(\square \)

Now we are ready to formulate the following theorem:

Theorem 4.2

Let \((x^0,u^0) \in {\mathcal {F}}\) be a solution of \(\mathrm{(OS)}\), with \(u^0\) having bounded variation. Then, for every solution \((x_h^\alpha ,u_h^\alpha ) \in {\mathcal {F}}_N\) of \(\mathrm{(OS)}_N^\alpha \) it holds

$$\begin{aligned} \left| f\left( x^0,u^0\right) -f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) \right| \le \bar{c}_1 h + \bar{c}_2 \alpha \end{aligned}$$

with constants \(\bar{c}_1\) and \(\bar{c}_2\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\).

Proof

Since \((x_h^\alpha ,u_h^\alpha )\) is a solution of \(\mathrm{(OS)}_N^\alpha \) we obtain with (25)

$$\begin{aligned} 0 \le f_N^\alpha \left( \hat{x}^0,\hat{u}^0\right)&-f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) \le f_N^\alpha \left( \hat{x}^0,\hat{u}^0\right) - f\left( x^0,u^0\right) + f\left( x^0,u^0\right) \\&- f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) . \end{aligned}$$

Therefore it holds

$$\begin{aligned} f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) - f\left( x^0,u^0\right)&\le f_N^\alpha \left( \hat{x}^0,\hat{u}^0\right) - f\left( x^0,u^0\right) \\&\le f_N^\alpha \left( \hat{x}^0,\hat{u}^0\right) \!-\!f\left( \hat{x}^0, \hat{u}^0\right) \!+\!f\left( \hat{x}^0,\hat{u}^0\right) - f\left( x^0,u^0\right) . \end{aligned}$$

With Lemmas 3.2 and 4.1 and the Lipschitz continuity of \(f\) we obtain

$$\begin{aligned} f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) - f\left( x^0,u^0\right) \le c_1 h + c_2 \alpha \end{aligned}$$
(28)

with constants \(c_1\) and \(c_2\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\). Since \((x^0,u^0)\) solves \(\mathrm{(OS)}\) it holds

$$\begin{aligned} 0 \le f(z_h^\alpha ,u_h^\alpha ) \!-\! f\left( x^0,u^0\right) \!=\!f(z_h^\alpha ,u_h^\alpha ) - f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) \!+\!f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) -f\left( x^0,u^0\right) \end{aligned}$$

and therefore

$$\begin{aligned} f\left( x^0,u^0\right) -f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right)&\le f(z_h^\alpha ,u_h^\alpha ) -f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) \\&\le f(z_h^\alpha ,u_h^\alpha ) - f\left( x_h^\alpha ,u_h^\alpha \right) +f\left( x_h^\alpha ,u_h^\alpha \right) -f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) . \end{aligned}$$

With Lemmas 3.3 and 4.1 and the Lipschitz continuity of \(f\) we obtain

$$\begin{aligned} f\left( x^0,u^0\right) - f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) \le c_3 h + c_4 \alpha \end{aligned}$$

with constants \(c_3\) and \(c_4\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\). Together with (28) this leads to the desired assertion. \(\square \)

5 Hölder-type error estimates

With the help of Theorem 4.2 we can now prove the following convergence result for the discrete regularization:

Theorem 5.1

Let \((x^0,u^0) \in {\mathcal {F}}\) be a solution of \(\mathrm{(OS)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Then, for sufficiently small \(h\) every solution \((x_h^\alpha , u_h^\alpha ) \in {\mathcal {F}}_N\) of \(\mathrm{(OS)}_N^\alpha \) can be estimated by

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1\le c_u (h+\alpha )^\frac{1}{k+1}\quad \text {and} \quad \Vert x_h^\alpha -x^0\Vert _\infty \le c_x (h+\alpha )^\frac{1}{k+1}. \end{aligned}$$

For the corresponding adjoint variables we get

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty \le c_\lambda (h+\alpha )^\frac{1}{k+1}. \end{aligned}$$

The constants \(c_u\), \(c_x\) and \(c_\lambda \) are independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\).

Proof

From Theorem 1.4 we know that there exist constants \(\beta \), \(\gamma \), \(\bar{\delta }>0\), such that with (25) we have

$$\begin{aligned} f(z_h^\alpha ,u_h^\alpha )-f\left( x^0,u^0\right) \ge \beta \Vert u_h^\alpha -u^0\Vert _1^{k+1}, \end{aligned}$$
(29)

if \(\Vert u_h^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }\) and

$$\begin{aligned} f(z_h^\alpha ,u_h^\alpha )-f\left( x^0,u^0\right) \ge \beta \Vert u_h^\alpha -u^0\Vert _1, \end{aligned}$$
(30)

if \(\Vert u_h^\alpha -u^0\Vert _1 \ge 2 \gamma \bar{\delta }\). With Lemmas 3.3, 4.1, Theorem 4.2 and the Lipschitz continuity of \(f\) we further obtain

$$\begin{aligned} f(z_h^\alpha ,u_h^\alpha )-f\left( x^0,u^0\right)&\le \left| f(z_h^\alpha ,u_h^\alpha )-f\left( x_h^\alpha ,u_h^\alpha \right) \right| +\left| f\left( x_h^\alpha ,u_h^\alpha \right) -f_N^\alpha \left( x_h^\alpha , u_h^\alpha \right) \right| \\&\quad + \left| f_N^\alpha \left( x_h^\alpha ,u_h^\alpha \right) -f\left( x^0,u^0\right) \right| \\&\le c_1 L_f \Vert z_h^\alpha -x_h^\alpha \Vert _\infty + c_2 h + c_3 \alpha \\&\le \bar{c} (h + \alpha ) \end{aligned}$$

with a constant \(\bar{c}\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\). Together with (29) and (30) it follows

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1 \le \max \left\{ \bar{c}\beta ^{-1} (h+\alpha ), \left( \bar{c}\beta ^{-1}(h + \alpha )\right) ^\frac{1}{k+1}\right\} . \end{aligned}$$

For sufficiently small \(h\) and \(\alpha \) it holds \(\Vert u_h^\alpha -u^0\Vert _1 \le 2 \gamma \bar{\delta }\), and from (29) we obtain

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1 \le c_u (h + \alpha )^\frac{1}{k+1} \end{aligned}$$
(31)

with a constant \(c_u\) independent of \((x_h^\alpha ,u_h^\alpha )\), \(\alpha \) and \(N\). From Assumption \(\mathrm{(A1)}\) we know that \(u^0\) has bounded variation, and together with (20), (31) and Lemma 3.2 it follows for the state variables

$$\begin{aligned} \Vert x_h^\alpha -x^0\Vert _\infty&= \Vert x_h^\alpha -\hat{x}^0+\hat{x}^0-x^0\Vert _\infty \nonumber \\&\le \Vert x_h^\alpha -\hat{x}^0\Vert _\infty +\Vert \hat{x}^0-x^0\Vert _\infty \nonumber \\&\le c_3 \Vert u_h^\alpha -\hat{u}^0\Vert _1 + c_4 h \nonumber \\&\le c_3 \Vert u_h^\alpha -u^0+u^0-\hat{u}^0\Vert _1 + c_4 h \nonumber \\&\le c_x (h+\alpha )^\frac{1}{k+1} \end{aligned}$$
(32)

with a constant \(c_x\) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). For the corresponding adjoint states we obtain from (24), (31), (32) and Lemmas 3.2 and 3.4

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty&= \Vert \lambda _h^\alpha -\hat{\lambda }^0+\hat{\lambda }^0-\lambda ^0\Vert _\infty \\&\le \Vert \lambda _h^\alpha -\hat{\lambda }^0\Vert _\infty +\Vert \hat{\lambda }^0 -\lambda ^0\Vert _\infty \\&\le c_5 \Vert x_h^\alpha -\hat{x}^0\Vert _\infty + c_6 \Vert u_h^\alpha - \hat{u}^0\Vert _1 + c_7 h \\&\le c_\lambda (h+\alpha )^\frac{1}{k+1} \end{aligned}$$

with a constant \(c_\lambda \) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). \(\square \)

By choosing the regularization parameter \(\alpha \) with respect to the mesh size \(h\) of the discretization as \(\alpha := c_\alpha h\) with a constant \(c_\alpha \ge 0\) independent of \(N\), it follows directly from Theorem 5.1:

Corollary 5.2

Let \((x^0,u^0) \in {\mathcal {F}}\) be a solution of \(\mathrm{(OS)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Moreover we choose \(\alpha := c_\alpha h\) with a constant \(c_\alpha \ge 0\) independent of \(N\). Then, for sufficiently small \(h\) every solution \((x_h^\alpha , u_h^\alpha ) \in {\mathcal {F}}_N\) of \(\mathrm{(OS)}_N^\alpha \) can be estimated by

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1\le c_u h^\frac{1}{k+1}\quad \text {and} \quad \Vert x_h^\alpha -x^0\Vert _\infty \le c_x h^\frac{1}{k+1}. \end{aligned}$$

For the corresponding adjoint variables we get

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty \le c_\lambda h^\frac{1}{k+1}. \end{aligned}$$

The constants \(c_u\), \(c_x\) and \(c_\lambda \) are independent of \((x_h^\alpha ,u_h^\alpha )\) and \(N\).

Remark 6

For \(\alpha =0\), Corollary 5.2 directly gives Hölder-type error estimates for the direct Euler discretization of \(\mathrm{(OS)}\) under the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Therefore the results of this paper also extend the findings of [6] and [7]. Analogous convergence results for an implicit discretization scheme can be found in [33].

6 Improved error estimates for linear problems

The error estimates of Theorem 5.1 and Corollary 5.2 can be improved for the following class of linear problems:

figure f

For arbitrary \(\alpha \ge 0\) the adjoint equation of \(\mathrm{(OL)}^\alpha \)

$$\begin{aligned} -\dot{\lambda }^\alpha (t)&=A(t)^{\mathsf {T}}\lambda ^\alpha (t) \quad \text {a.e. on } [0,T],\\ \lambda ^\alpha (T)&=q \end{aligned}$$

is independent of \((x,u)\) and \(\alpha \). Therefore it can be solved directly, and it holds \(\lambda ^0=\lambda ^\alpha \). For \(t \in [0,T]\) the corresponding switching function is defined by

$$\begin{aligned} \sigma ^\alpha (t) := B(t)^{\mathsf {T}}\lambda ^\alpha (t)+\alpha u^\alpha (t)=B(t)^{\mathsf {T}}\lambda ^0(t)+\alpha u^\alpha (t)=\sigma ^0(t)+\alpha u^\alpha (t). \end{aligned}$$

The discrete adjoint equation for the finite-dimensional Problem \(\mathrm{(OL)}_N^\alpha \) is given by

$$\begin{aligned} -\frac{\lambda _{h,i+1}^\alpha -\lambda _{h,i}^\alpha }{h}&= A(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha , \quad i=0,\dots ,N-1 \\ \lambda _{h,N}^\alpha&= q. \end{aligned}$$

Again, it holds \(\lambda _h^0=\lambda _h^\alpha \). The corresponding discrete switching function \(\sigma _h^\alpha \in X_{1,N}\) is a continuous, piecewise linear function \(\sigma _h^\alpha :[0,t_{N-1}] \rightarrow {\mathbb {R}}^m\), which is uniquely defined by

$$\begin{aligned} \sigma _{h,i}^\alpha := B(t_i)^{\mathsf {T}}\lambda _{h,i+1}^\alpha + \alpha u_{h,i}^\alpha = \sigma _{h,i}^0 + \alpha u_{h,i}^\alpha , \quad i=0,\dots ,N-1. \end{aligned}$$

From (23) we know for \(\alpha > 0\), that we can characterize the optimal control \(u_h^\alpha \) of \(\mathrm{(OL)}_N^\alpha \) by

$$\begin{aligned} u_{h,i}^\alpha ={{\mathrm {Pr}}}_{[b_l,b_u]}\left[ -\frac{1}{\alpha } \sigma _{h,i}^0\right] , \quad i=0,\dots ,N-1. \end{aligned}$$
(33)
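Since the adjoint recursion for \(\mathrm{(OL)}_N^\alpha \) does not involve \((x,u)\), the discrete optimal control can be computed without any iteration: one backward sweep for \(\lambda _h\), then the projection (33). A sketch, assuming callables `A` and `B` for the coefficient matrices:

```python
import numpy as np

def linear_control(A, B, q, b_l, b_u, alpha, N, T):
    """Solve (OL)_N^alpha: backward sweep for lambda_h, then projection (33)."""
    h = T / N
    lam = np.empty((N + 1, len(q))); lam[N] = q
    for i in range(N - 1, -1, -1):
        lam[i] = lam[i + 1] + h * (A(i * h).T @ lam[i + 1])
    sigma0 = np.array([B(i * h).T @ lam[i + 1] for i in range(N)])  # sigma_{h,i}^0
    return np.clip(-sigma0 / alpha, b_l, b_u)
```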

For our linear problems \(\mathrm{(OL)}\) resp. \(\mathrm{(OL)}_N^\alpha \) the adjoint variables \(\lambda ^0\) resp. \(\hat{\lambda }^0\) corresponding to \((x^0,u^0)\) resp. \((\hat{x}^0,\hat{u}^0)\) (cf. (25)) are independent of \((x,u)\) and \(\alpha \). Therefore it holds \(\lambda ^\alpha =\lambda ^0\) and \(\lambda _h^\alpha =\hat{\lambda }^0\). From Lemma 3.4 we obtain

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty = \Vert \hat{\lambda }^0-\lambda ^0\Vert _\infty \le c_\lambda h \end{aligned}$$
(34)

with a constant \(c_\lambda \) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). For arbitrary \(i=0, \dots , N-1\) it further holds

$$\begin{aligned} \left| \sigma _h^0(t_i)-\sigma ^0(t_i) \right|&\le \left| B(t_i)^{\mathsf {T}}\lambda _h^0(t_{i+1})- B(t_i)^{\mathsf {T}}\lambda ^0(t_i)\right| \\&\le \Vert B\Vert _\infty \left| \lambda _h^0(t_i)-\lambda ^0(t_i)\right| + \Vert B\Vert _\infty \left| \lambda _h^0(t_{i+1})-\lambda _h^0(t_i)\right| \\&\le \Vert B\Vert _\infty \Vert \lambda _h^0-\lambda ^0\Vert _\infty + \Vert B\Vert _\infty L_{\lambda _h^0} h\\&=\Vert B\Vert _\infty \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty + \Vert B\Vert _\infty L_{\lambda _h^0} h\\&\le c h \end{aligned}$$

with a constant \(c\) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). With the Lipschitz continuity of \(\sigma ^0\) and \(\sigma _h^0\) it follows for \(i = 0,\dots ,N-2\) and \(t \in [t_i, t_{i+1})\)

$$\begin{aligned} \left| \sigma _h^0(t)-\sigma ^0(t)\right|&= \left| \sigma _h^0(t)-\sigma _h^0(t_i)+\sigma _h^0(t_i)-\sigma ^0(t_i) +\sigma ^0(t_i)-\sigma ^0(t)\right| \nonumber \\&\le \left| \sigma _h^0(t_i)-\sigma ^0(t_i)\right| +\left| \sigma _h^0(t)-\sigma _h^0(t_i)\right| +\left| \sigma ^0(t_i)-\sigma ^0(t) \right| \nonumber \\&\le c h + \left( L_{\sigma _h^0} + L_{\sigma ^0}\right) h \nonumber \\&\le c_\sigma h \end{aligned}$$
(35)

with a constant \(c_\sigma \) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). From (34) and (35) we obtain

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty \le c_\lambda h \quad \text {and} \quad \max \limits _{t \in [0,t_{N-1}]}|\sigma _h^0(t)-\sigma ^0(t)| \le c_\sigma h \end{aligned}$$
(36)

with constants \(c_\lambda \) and \(c_\sigma \) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). Using (36) we are now able to prove the following improved error estimates:

Theorem 6.1

Let \((x^0,u^0) \in {\mathcal {F}}\) be a solution of \(\mathrm{(OL)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Then, for sufficiently small \(h\) and \(\alpha \) every solution \((x_h^\alpha , u_h^\alpha ) \in {\mathcal {F}}_N\) of \(\mathrm{(OL)}_N^\alpha \) can be estimated by

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1\le c_u (h+\alpha )^\frac{1}{k}\quad \text {and} \quad \Vert x_h^\alpha -x^0\Vert _\infty \le c_x (h+\alpha )^\frac{1}{k}. \end{aligned}$$

For the corresponding adjoint variable it holds

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty \le c_\lambda h. \end{aligned}$$

Moreover \(u_h^\alpha \) and \(u^0\) coincide, except on a set \(M\) of measure \(\mu (M) \le \bar{\kappa }(h+\alpha )^{\frac{1}{k}}\). The constants \(c_u\), \(c_x\), \(c_\lambda \) and \(\bar{\kappa }\) are independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\).

Proof

Let

$$\begin{aligned} \Sigma _j := \left\{ \tau _1,\dots ,\tau _{l_j}\right\} \text { with } 0 <\tau _1<\cdots <\tau _{l_j}<T \end{aligned}$$

be the set of zeros of \(\sigma _j^0\). Moreover we define

$$\begin{aligned} I_{-}(\delta ) := \bigcup \limits _{\iota =1,\dots ,l_j} \left[ \tau _\iota -\delta ,\tau _\iota +\delta \right] \quad \text {and} \quad I_{+}(\delta ) := [0,T] \setminus \left( I_{-}(\delta ) \cup [t_{N-1},T]\right) . \end{aligned}$$

Since the switching function \(\sigma _j^0\) is Lipschitz continuous it follows from the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\)

$$\begin{aligned} \sigma _{j,\min }^0 := \min _{t \in I_+(\bar{\tau })} |\sigma _j^0(t)| >0. \end{aligned}$$

From (36) we obtain

$$\begin{aligned} |\sigma _{h,j}^0(t)| \ge |\sigma _j^0(t)|-c_\sigma h \ge \sigma _{j,\min }^0 - c_\sigma h \quad \forall t \in I_+(\bar{\tau }). \end{aligned}$$

For sufficiently small \(h\) this implies

$$\begin{aligned} \sigma _{h,j,\min }^0 := \min _{t \in I_+(\bar{\tau })} |\sigma _{h,j}^0(t)| \ge \frac{1}{2} \sigma _{j,\min }^0 > 0. \end{aligned}$$

For arbitrary \(\iota \in \{1,\dots ,l_j\}\) we obtain from \(\mathrm{(A2)}^k\) and (36) for \(t \in [\tau _\iota -\bar{\tau },\tau _\iota +\bar{\tau }]\)

$$\begin{aligned} |\sigma _{h,j}^0(t)| \ge |\sigma _j^0(t)|-c_\sigma h \ge \bar{\sigma }|t-\tau _\iota |^k - c_\sigma h \end{aligned}$$

and therefore

$$\begin{aligned} \left| \frac{1}{\alpha }\sigma _{h,j}^0(t)\right| \ge \frac{1}{\alpha }|\sigma _j^0(t)|-c_\sigma \frac{h}{\alpha } \ge \frac{\bar{\sigma }}{\alpha } |t-\tau _\iota |^k - c_\sigma \frac{h}{\alpha }. \end{aligned}$$

We define

$$\begin{aligned} \gamma _j := \max \left\{ -b_{l,j},b_{u,j}\right\} \end{aligned}$$

and

$$\begin{aligned} d_j(h,\alpha ) := \left( \frac{1}{\bar{\sigma }}\left( c_\sigma h + \gamma _j \alpha \right) \right) ^\frac{1}{k}. \end{aligned}$$

If \(|t-\tau _\iota | > d_j(h,\alpha )\), then \(|\sigma _{h,j}^0(t)| \ge \bar{\sigma }|t-\tau _\iota |^k - c_\sigma h > \bar{\sigma }\, d_j(h,\alpha )^k - c_\sigma h = \gamma _j \alpha \), hence \(\left| \frac{1}{\alpha }\sigma _{h,j}^0(t)\right| >\gamma _j\) and therefore \(u_{h,j}^\alpha =u_j^0\). Now we choose the index \(i \in \{0,\dots ,N-1\}\), such that \(\tau _\iota \in [t_i,t_{i+1})\). Let \(\ell \in {\mathbb {N}}\) be the smallest number for which it holds

$$\begin{aligned} \ell > d_j(h,\alpha ) h^{-1}. \end{aligned}$$

Then we obtain

$$\begin{aligned} t_{i+\ell +1}-\tau _\iota \ge t_{i+\ell +1}-t_{i+1} = \ell h > d_j(h,\alpha ) \end{aligned}$$

and

$$\begin{aligned} \tau _\iota - t_{i-\ell } \ge t_i - t_{i-\ell }= \ell h > d_j(h,\alpha ) \end{aligned}$$

and

$$\begin{aligned} d_j(h,\alpha ) h^{-1} < \ell \le d_j(h,\alpha ) h^{-1} + 1. \end{aligned}$$

By setting

$$\begin{aligned} \ell _\iota ^+ := i+\ell +1 \quad \text {and} \quad \ell _\iota ^- := i-\ell \end{aligned}$$

we get for sufficiently small \(h\)

$$\begin{aligned} t_{\ell _\iota ^+}-t_{\ell _\iota ^-} = (2 \ell + 1) h \le \left( 2 d_j(h,\alpha ) h^{-1} + 3\right) h \le \kappa d_j(h,\alpha ) \end{aligned}$$

with a constant \(\kappa \) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). For sufficiently small \(h\) and \(\alpha \) it follows directly

$$\begin{aligned}{}[t_{\ell _\iota ^-},t_{\ell _\iota ^+}] \subset \left[ \tau _\iota -\bar{\tau },\tau _\iota +\bar{\tau }\right] . \end{aligned}$$

By defining

$$\begin{aligned} I_{j,-} := \left( \bigcup \limits _{\iota =1,\dots ,l_j} [t_{\ell _\iota ^-},t_{\ell _\iota ^+}]\right) \cup [t_{N-1},T] \end{aligned}$$

we obtain

$$\begin{aligned} \left| \frac{1}{\alpha }\sigma _{h,j}^0(t)\right| >\gamma _j \quad \forall \, t \in [0,T] \setminus I_{j,-}. \end{aligned}$$
(37)

Now we set

$$\begin{aligned} \gamma := \max _{1 \le j \le m} \gamma _j, \quad d(h,\alpha ) := \left( \frac{1}{\bar{\sigma }}\left( c_\sigma h + \gamma \alpha \right) \right) ^\frac{1}{k} \quad \text {and} \quad M := \bigcup \limits _{j=1,\dots ,m} I_{j,-}. \end{aligned}$$

From (37), together with the projection formula (33) and the estimate (36), we can deduce that \(u^0\) and \(u_h^\alpha \) coincide on \([0,T] \setminus M\). The measure \(\mu (M)\) can be estimated by

$$\begin{aligned} \mu (M) \le \kappa \sum \limits _{j=1}^m l_j d_j(h,\alpha ) \le \kappa \sum \limits _{j=1}^m l_j d(h,\alpha ) \le \bar{\kappa }(h+\alpha )^{\frac{1}{k}} \end{aligned}$$
(38)

with a constant \(\bar{\kappa }\) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). In order to estimate \(\Vert u_h^\alpha -u^0\Vert _1\) we define \(\rho := \max \limits _{1\le j \le m}(b_{u,j}-b_{l,j})\) and obtain

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1&= \int _0^T \! |u_h^\alpha (t)-u^0(t)| \,{\mathrm {d}}t\nonumber \\&= \int _M \! |u_h^\alpha (t)-u^0(t)| \,{\mathrm {d}}t\nonumber \\&\le \rho \mu (M) \le \rho \bar{\kappa }(h+\alpha )^{\frac{1}{k}} = c_u (h+\alpha )^{\frac{1}{k}} \end{aligned}$$
(39)

with a constant \(c_u\) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). For the state variables we obtain from (39), analogously to the proof of Theorem 5.1,

$$\begin{aligned} \Vert x_h^\alpha -x^0\Vert _\infty \le c_x (h+\alpha )^{\frac{1}{k}} \end{aligned}$$
(40)

with a constant \(c_x\) independent of \((x_h^\alpha , u_h^\alpha )\), \(\alpha \) and \(N\). We get the desired assertion from the estimates (36), (38), (39) and (40). \(\square \)

Again, by coupling \(\alpha \) with \(h\) we obtain the following convergence result directly from Theorem 6.1:

Corollary 6.2

Let \((x^0,u^0) \in {\mathcal {F}}\) be a solution of \(\mathrm{(OL)}\) that fulfills the Assumptions \(\mathrm{(A1)}\) and \(\mathrm{(A2)}^k\). Moreover we choose \(\alpha := c_\alpha h\) with a constant \(c_\alpha >0\) independent of \(N\). Then, for sufficiently small \(h\), every solution \((x_h^\alpha , u_h^\alpha ) \in {\mathcal {F}}_N\) of \(\mathrm{(OL)}_N^\alpha \) can be estimated by

$$\begin{aligned} \Vert u_h^\alpha -u^0\Vert _1\le c_u h^\frac{1}{k} \quad \text {and} \quad \Vert x_h^\alpha -x^0\Vert _\infty \le c_x h^\frac{1}{k}. \end{aligned}$$

For the corresponding adjoint variable it holds

$$\begin{aligned} \Vert \lambda _h^\alpha -\lambda ^0\Vert _\infty \le c_\lambda h\,. \end{aligned}$$

Moreover \(u_h^\alpha \) and \(u^0\) coincide, except on a set \(M\) of measure \(\mu (M) \le \bar{\kappa }h^{\frac{1}{k}}\). The constants \(c_u\), \(c_x\), \(c_\lambda \) and \(\bar{\kappa }\) are independent of \((x_h^\alpha , u_h^\alpha )\) and \(N\).

Remark 7

Results similar to Corollary 6.2 can be found in [19] for the direct Euler discretization of Mayer-type problems with controllability index \(k\) (see Remark 3).

7 Numerical example

We illustrate the theoretical findings of Corollary 6.2 by approximating the solution of \(\mathrm{(B)}_k\) from Example 1.2 using discrete regularization:

Example 7.1

We choose \(\alpha := 10h\) and define \(e_N := \Vert u_h^\alpha -u^0\Vert _1\). The numerical results of solving \(\mathrm{(B)}_k\) with the discrete regularization technique are displayed in Table 1 and Fig. 1. They confirm the theoretical results of Corollary 6.2.
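The convergence order reported in Table 1 can be estimated from errors on successively refined meshes via the usual experimental order of convergence; a small helper (the error values used in the test below are illustrative, not those of Table 1):

```python
import numpy as np

def eoc(errors, refinement=2.0):
    """Experimental order of convergence: log(e_N / e_{2N}) / log(refinement)."""
    e = np.asarray(errors, dtype=float)
    return np.log(e[:-1] / e[1:]) / np.log(refinement)
```

For errors behaving like \(h^{1/k}\) under mesh halving, `eoc` returns values near \(1/k\), matching Corollary 6.2.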

Fig. 1
figure 1

Example \(\mathrm{(B)}_k\): solution of the discrete regularization with \(\alpha =10h\)

Table 1 Example \(\mathrm{(B)}_k\): error of the discrete regularization with \(\alpha =10h\)

Remark 8

Figure 2 shows the solution of the direct Euler discretization of Example \(\mathrm{(B)}_3\). It seems as if the optimal control had three switching points. As we can see in Fig. 1, the combined regularization–discretization approach gives a better understanding of the structure of the optimal control.

Fig. 2
figure 2

Example \(\mathrm{(B)}_k\): solution of the direct Euler discretization

8 Conclusions

In this paper we proved error estimates for a combined regularization–discretization approach for a class of optimal control problems with bang–bang solutions. We were able to prove convergence of order \({\mathcal {O}}(h^{1/(k+1)})\) with respect to the controllability index \(k\) for problems with a mixed state–control term in the cost functional, under weaker assumptions on the structure of the switching function than in [4, 6, 7].