1 Introduction

Discretization methods such as Euler discretization are used for the numerical solution of optimal control problems. The accuracy of the approximate solutions obtained in this way is often satisfactory from a practical point of view. However, if the optimal control has a special structure, a discretization method may help to detect this structure, and methods based on structural assumptions can then be used to determine the optimal control more accurately. Especially for bang–bang controls, Euler discretization can be used to compute approximations of the switching points, and efficient numerical approaches, such as switching time parameterization, can then be employed to determine the switching times more accurately (see e.g. Kaya et al. [23], Maurer et al. [29], Osmolovskii and Maurer [34] and the papers cited therein). It is well known that, in particular when the optimal controls are of bang–bang or bang-singular type, many difficulties are encountered in computing an approximate solution. Therefore, it is of practical interest to have conditions implying error estimates for approximate solutions which ensure that the approximate controls (optimal controls of the discretized problems) converge to the optimal control of the original, continuous-time control problem. Such error estimates are closely related to estimates for solutions of perturbed optimal control problems.

Discretization and perturbation of nonlinear optimal control problems governed by ordinary differential equations are well studied for the case that the optimal control is sufficiently smooth, and the results are usually based on strong second-order optimality conditions which require coercivity of the second derivative of the Lagrangian function with respect to the control variables (see e.g. Dontchev and Hager [10, 11], Dontchev et al. [12], Malanowski [24,25,26], Malanowski et al. [27], Alt [1,2,3]). For control problems with control appearing linearly such conditions are not satisfied and the optimal control may be discontinuous. Therefore, there have been only a few papers on discretization of such problems (see e.g. Alt and Mackenroth [5], Dhamo and Tröltzsch [9], Veliov [41] and the papers cited therein).

New second-order optimality conditions for optimal control problems with control appearing linearly have been developed during the last 10–15 years (see e.g. Felgenhauer [15,16,17,18, 20], Maurer et al. [29], Osmolovskii and Maurer [32,33,34] and the papers cited therein). In case of bang–bang controls these conditions have been used in Alt et al. [4], Alt and Seydenschwanz [7], and in Seydenschwanz [40] to obtain error estimates for Euler discretization of linear-quadratic optimal control problems governed by ordinary differential equations and in Deckelnick and Hinze [8] for discretizations of elliptic control problems. For convex control problems of Mayer type with a linear system equation and bang–bang solutions Veliov [41] has shown convergence of order 1 for Euler discretization. These results have been extended in Haunschmied et al. [21] under more general conditions based on a result on stability of optimal control problems under strong bi-metric regularity of Quincampoix and Veliov [36]. Pietrus et al. [35] investigate high order discrete approximations to Mayer type problems based on second order Volterra-Fliess approximations. Felgenhauer [19] shows convergence of order 1 for a class of nonlinear optimal control problems, where the linear term in the system equation does not depend on the state variables and the solution has bang-singular-bang structure. Alt et al. [6] prove convergence of order 1 for implicit Euler discretization of a general class of convex, linear-quadratic control problems with bang–bang solutions.

In the present paper we investigate a class of optimal control problems with a nonlinear cost functional of Mayer type, a nonlinear system equation with control appearing linearly and constraints defined by lower and upper bounds for the controls. Under the assumption that the cost functional satisfies a growth condition of order \(\kappa \ge 1\) we prove for the discrete solutions Hölder type error estimates of order \(1/\kappa \) w.r.t. the mesh size of the discretization. If a stronger second-order condition for the derivative of the Lagrangian w.r.t. the control and a weakened coercivity condition for the second derivative of the Lagrangian are satisfied, the order of convergence can be improved to 1.

We use the following notations: \(\mathbb {R}^n\) is the n-dimensional Euclidean space with the inner product denoted by \(\langle x, y\rangle \) and the norm \(|x| = \langle x,x\rangle ^{1/2}\). For an \(m\times n\)-matrix M we denote the spectral norm by \(\Vert M\Vert = \sup _{|z| \le 1} |Mz|\). Let \(t_0,t_f\in \mathbb {R}\), \(t_0<t_f\). We denote by \(L^1(t_0,t_f;\mathbb {R}^m)\) the Banach space of integrable, measurable functions \(u:[t_0,t_f]\rightarrow \mathbb {R}^m\) with

$$\begin{aligned} \Vert u\Vert _1 = \int _{t_0}^{t_f} \sum _{i=1}^m |u_i(t)|\,{\mathrm {d}}t= \sum _{i=1}^m \Vert u_i\Vert _1< \infty , \end{aligned}$$

by \(L^\infty (t_0,t_f;\mathbb {R}^m)\) the Banach space of essentially bounded functions \(u:[t_0,t_f]\rightarrow \mathbb {R}^m\) with the norm

$$\begin{aligned} \Vert u\Vert _\infty = \max _{1\le i\le m} {{\mathrm{ess\,sup}}}_{t\in [t_0,t_f ]} |u_i(t)|, \end{aligned}$$

and \(C(t_0,t_f;\mathbb {R}^m)\) is the Banach space of continuous functions \(u:[t_0,t_f]\rightarrow \mathbb {R}^m\) with the norm

$$\begin{aligned} \Vert u\Vert _\infty = \max _{1\le i\le m} \max _{t\in [t_0,t_f ]} |u_i(t)|. \end{aligned}$$

For \(p\in \{1,\infty \}\) we denote by \(W^1_p(t_0,t_f;\mathbb {R}^n)\) the spaces of absolutely continuous functions on \([t_0,t_f]\) with derivative in \(L^p(t_0,t_f;\mathbb {R}^n)\), i.e.

$$\begin{aligned} W^1_p(t_0,t_f;\mathbb {R}^n) = \left\{ x\in L^p(t_0,t_f;\mathbb {R}^n)\mid \dot{x}\in L^p(t_0,t_f;\mathbb {R}^n)\right\} \end{aligned}$$

with

$$\begin{aligned} \Vert x\Vert _{1,1} = |x(t_0)|+\Vert \dot{x}\Vert _1,\quad \Vert x\Vert _{1,\infty } = \max \left\{ \Vert x\Vert _\infty ,\Vert \dot{x}\Vert _\infty \right\} . \end{aligned}$$

Let \(X= X_1\times X_2\), where \(X_1 = W^1_1(t_0,t_f;\mathbb {R}^n)\), \(X_2 = L^1(t_0,t_f;\mathbb {R}^m)\). We consider the following optimal control problem:

where g is defined by

$$\begin{aligned} g(x,u,t) = g^{(1)}(x,t)+g^{(2)}(x,t)u. \end{aligned}$$
(1.1)

Here, \(u(t)\in \mathbb {R}^m\) is the control, and \(x(t)\in \mathbb {R}^n\) is the state of a system at time \(t\in [t_0,t_f]\). Further \(f:\mathbb {R}^n\rightarrow \mathbb {R}\), \(g^{(1)}:\mathbb {R}^n\times [t_0,t_f]\rightarrow \mathbb {R}^n\), \(g^{(2)}:\mathbb {R}^n\times [t_0,t_f]\rightarrow \mathbb {R}^{n\times m}\), and the set \(U\subset \mathbb {R}^m\) is defined by lower and upper bounds, i.e.,

$$\begin{aligned} U = \{u\in \mathbb {R}^m\mid b_{\ell }\le u\le b_u\} \end{aligned}$$

with \(b_{\ell }, b_u\in \mathbb {R}^m\), \(b_{\ell }<b_u\), where all inequalities are to be understood componentwise.
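To fix ideas, the control-affine structure (1.1) together with the box set U can be sketched in code. This is a minimal sketch; the particular choices of \(g^{(1)}\), \(g^{(2)}\), the dimensions, and the bounds below are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Illustrative control-affine right-hand side g(x,u,t) = g1(x,t) + g2(x,t) @ u,
# cf. (1.1), with n = 2 states and m = 1 control. g1 and g2 are hypothetical.
def g1(x, t):
    return np.array([x[1], -x[0]])

def g2(x, t):
    return np.array([[0.0], [1.0]])  # n x m matrix

def g(x, u, t):
    return g1(x, t) + g2(x, t) @ u

# Box constraint set U = {u : b_l <= u <= b_u}, inequalities componentwise.
b_l, b_u = np.array([-1.0]), np.array([1.0])

def project_onto_U(u):
    # componentwise projection onto the box U
    return np.clip(u, b_l, b_u)
```

Any admissible control takes values in U almost everywhere; the projection is only a convenience for constructing such controls numerically.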

The organization of the paper is as follows. In Sect. 2 we state some basic results. Section 3 introduces the Euler discretization for Problem (OC). In Sect. 4 Hölder type error estimates are derived assuming a growth condition for the cost functional. Under stronger second-order conditions we prove in Sect. 5 convergence of order 1. Section 6 discusses some numerical results.

2 Basic results

We denote by

$$\begin{aligned} {\mathscr {U}}=\{u\in X_2\mid u(t)\in U\,\forall ' t\in [t_0,t_f]\} \end{aligned}$$

the set of admissible controls, and by

$$\begin{aligned} {{\mathscr {F}}}=\big \{(x,u)\in X\mid {\dot{x}}(t) = g(x(t),u(t),t)\;\hbox { a.e. on}\ [t_0,t_f],\; x(t_0)=a,\;u\in {\mathscr {U}}\,\big \} \end{aligned}$$

the feasible set of Problem (OC). For \(\varepsilon >0\) and \((x^*,u^*)\in X\)

$$\begin{aligned} {{\mathscr {B}}}_\varepsilon (x^*,u^*)=\left\{ (x,u)\in X\mid \Vert x-x^*\Vert _\infty< \varepsilon ,\; \Vert u-u^*\Vert _1 < \varepsilon \,\right\} . \end{aligned}$$

is the open ball around \((x^*,u^*)\) with radius \(\varepsilon \).

Definition 1

A pair \((x^*,u^*)\in {{\mathscr {F}}}\) is called a local minimizer of f on \({\mathscr {F}}\) or a local solution of Problem (OC), if there exists \(\varepsilon >0\) such that \(f(x^*(t_f))\le f(x(t_f))\) for all \((x,u)\in {{\mathscr {F}}}\cap {{\mathscr {B}}}_\varepsilon (x^*,u^*)\). \(\Diamond \)

Note that we allow discontinuous optimal controls, especially bang–bang controls. Therefore, we consider local solutions w.r.t. the \(L^1\)-norm for control functions. We suppose in the following:

  1. (2.1)

    There exists \(\bar{\varepsilon }>0\) and \((x^*,u^*)\in {{\mathscr {F}}}\) such that \(f(x^*(t_f))\le f(x(t_f))\) for all \((x,u)\in {{\mathscr {F}}}\cap {\mathscr {B}}_{\bar{\varepsilon }}(x^*,u^*)\), i.e., \((x^*,u^*)\) is a local solution of (OC).

Since U is bounded, there exists a constant \(K_u\) such that for all \(u\in {{\mathscr {U}}}\)

$$\begin{aligned} |u(t)| \le K_u\quad \hbox { a.e. on}\ [t_0,t_f]. \end{aligned}$$
(2.2)

Let

$$\begin{aligned} {{\mathscr {N}}}_{\bar{\varepsilon }}(x^*) = \left\{ x\in X_1\mid \Vert x-x^*\Vert _\infty < \bar{\varepsilon }\right\} , \end{aligned}$$

and let \({{\mathscr {B}}}\subset \mathbb {R}^n\) be a convex and open set such that

$$\begin{aligned} {{\mathscr {B}}} \supset \left\{ z\in \mathbb {R}^n\mid z=x(t)\; \text {for some }\; t\in [t_0,t_f]\; \text {and some } \;x\in {\mathscr {N}}_{\bar{\varepsilon }}(x^*)\right\} . \end{aligned}$$

For given numbers \(n_1,n_2\in \mathbb {N}\), \(n_1\le n_2\), we define

$$\begin{aligned} J_{n_1}^{n_2} = \{n_1,n_1+1,\ldots ,n_2\}. \end{aligned}$$

We suppose that the following assumptions are satisfied:

  1. (2.3)

    The functions f, \(g^{(1)}\), and \(g^{(2)}\) are continuously differentiable w.r.t. x on \({{\mathscr {B}}}\).

  2. (2.4)

    The functions f, \(g^{(1)}\), and \(g^{(2)}\) are Lipschitz continuous, i.e., there are constants \(L_f\), \(\tilde{L}_f\) and \(L_g\) such that

    $$\begin{aligned}&|f(x)-f(z)|\le L_f\,|x-z|,\\&|g^{(1)}(x,t)-g^{(1)}(z,s)|\le L_g\,(|x-z|+|t-s|),\\&\Vert g^{(2)}(x,t)-g^{(2)}(z,s)\Vert \le L_g\,(|x-z|+|t-s|), \end{aligned}$$

    for all \(s,t\in [t_0,t_f]\) and all \(x,z\in {{\mathscr {B}}}\).

  3. (2.5)

    The functions \(f_x\), \(g^{(1)}_x\), and \(g^{(2)}_x\) are Lipschitz continuous, i.e., there are constants \(L_f^{(1)}\) and \(L_g^{(1)}\) such that

    $$\begin{aligned}&|f_x(x)-f_x(z)|\le L_f^{(1)}\,|x-z|,\\&|g^{(1)}_{j,x}(x,t)-g^{(1)}_{j,x}(z,s)| \le L_g^{(1)}\,(|x-z|+|t-s|),\; j\in J_1^n,\\&|g^{(2)}_{ji,x}(x,t)-g^{(2)}_{ji,x}(z,s)| \le L_g^{(1)}\,(|x-z|+|t-s|),\; j\in J_1^n,\;i\in J_1^m, \end{aligned}$$

    for all \(s,t\in [t_0,t_f]\) and all \(x,z\in {{\mathscr {B}}}\).

For \((x,u)\in X\) with \(\Vert x-x^*\Vert _\infty \le \bar{\varepsilon }\) it follows from (2.2) and (2.4) that

$$\begin{aligned} |g(x(t),u(t),t)|&\le |g(x^*(t),u^*(t),t)| +|g(x(t),u(t),t)-g(x^*(t),u^*(t),t)|\\&\le |g(x^*(t),u^*(t),t)| +|g^{(1)}(x(t),t)-g^{(1)}(x^*(t),t)|\\&\quad +\Vert g^{(2)}(x(t),t)-g^{(2)}(x^*(t),t)\Vert \,|u(t)|\\&\quad +\Vert g^{(2)}(x^*(t),t)\Vert \,|u(t)-u^*(t)|\\&\le |g(x^*(t),u^*(t),t)|+L_g\bar{\varepsilon }(1+K_u) +2K_u\Vert g^{(2)}(x^*(t),t)\Vert . \end{aligned}$$

This implies

$$\begin{aligned} |g(x(t),u(t),t)| \le K_g \end{aligned}$$
(2.6)

with some constant \(K_g\) independent of \((x,u)\in X\) with \(\Vert x-x^*\Vert _\infty \le \bar{\varepsilon }\). Moreover, for \((x,u)\in {\mathscr {F}}\) with \(x\in {{\mathscr {N}}}_{\bar{\varepsilon }}(x^*)\) and \(t,s\in [t_0,t_f]\) we have

$$\begin{aligned} |\dot{x}(t)-\dot{x}(s)| \le&|g^{(1)}(x(t),t)-g^{(1)}(x(s),s)| +\Vert g^{(2)}(x(t),t)-g^{(2)}(x(s),s)\Vert |u(t)|\\&+\Vert g^{(2)}(x(s),s)\Vert |u(t)-u(s)|. \end{aligned}$$

By (2.2), Assumption (2.4), and (2.6) this implies

$$\begin{aligned} |\dot{x}(t)-\dot{x}(s)| \le L_g(1+K_u)(|x(t)-x(s)|+|t-s|) + K_g\,|u(t)-u(s)|. \end{aligned}$$
(2.7)

This further implies that with some constant \(L_x\)

$$\begin{aligned} |\dot{x}(t)| \le L_x\quad \forall ' t\in [t_0,t_f], \end{aligned}$$
(2.8)

for all \((x,u)\in {{\mathscr {F}}}\) with \(x\in {\mathscr {N}}_{\bar{\varepsilon }}(x^*)\), which shows that the feasible trajectories \(x\in {{\mathscr {N}}}_{\bar{\varepsilon }}(x^*)\) are uniformly Lipschitz with Lipschitz modulus \(L_x\).

The Hamiltonian \({{\mathscr {H}}}:\mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^n \times [t_0,t_f]\rightarrow \mathbb {R}\) for Problem (OC) is defined by

$$\begin{aligned} {{\mathscr {H}}}(x,u,\lambda ,t) = \lambda ^{\mathsf {T}}g(x,u,t) = \sum _{j=1}^n\lambda _j\left[ g^{(1)}_j(x,t) +\sum _{i=1}^m u_ig^{(2)}_{j,i}(x,t)\right] . \end{aligned}$$

We denote by

$$\begin{aligned} g^{(2)}_{.i}(x,t) = \left[ g^{(2)}_{1i}(x,t), \ldots , g^{(2)}_{ni}(x,t)\right] ^{\mathsf {T}},\; i=1,\ldots ,m, \end{aligned}$$

the i-th column vector of \(g^{(2)}(x,t)\) and by

$$\begin{aligned} g^{(2)}_{j}(x,t) = \left[ g^{(2)}_{j1}(x,t), \ldots , g^{(2)}_{jm}(x,t)\right] ,\; j=1,\ldots ,n, \end{aligned}$$

the j-th row of \(g^{(2)}(x,t)\). Then

$$\begin{aligned}&{{\mathscr {H}}}_x(x,u,\lambda ,t) = \lambda ^{\mathsf {T}}g_x(x,u,t) = \lambda ^{\mathsf {T}}\left[ g^{(1)}_x(x,t) +\sum _{i=1}^m u_ig^{(2)}_{.i,x}(x,t)\right] ,\\&{{\mathscr {H}}}_u(x,u,\lambda ,t) = \lambda ^{\mathsf {T}}g^{(2)}(x,t) = \sum _{j=1}^n \lambda _j g^{(2)}_j(x,t). \end{aligned}$$

Optimality conditions for Problem (OC) are well-known. Let \((x^*,u^*)\in {{\mathscr {F}}}\) be a local solution of (OC). Then there exists a function \(\lambda ^*\in W_1^1(t_0,t_f;\mathbb {R}^n)\) such that the adjoint equation

$$\begin{aligned} -\dot{\lambda }^*(t) = {\mathscr {H}}_x(x^*(t),u^*(t),\lambda ^*(t),t)^{\mathsf {T}} = g_x(x^*(t),u^*(t),t)^{\mathsf {T}}\lambda ^*(t) \end{aligned}$$
(2.9)

is satisfied for a.a. \(t\in [t_0,t_f]\) with terminal condition \(\lambda ^*(t_f)=f_x(x^*(t_f))^\mathsf {T}\), and the local minimum principle

$$\begin{aligned}&{{\mathscr {H}}}_u(x^*(t),u^*(t),\lambda ^*(t),t)^{\mathsf {T}} (u-u^*(t))\nonumber \\&\quad =\lambda ^*(t)^{\mathsf {T}}g^{(2)}(x^*(t),t)(u-u^*(t))\ge 0 \end{aligned}$$
(2.10)

holds for a.a. \(t\in [t_0,t_f]\) and all \(u\in U\).

We denote by \(\sigma ^*:[t_0,t_f]\rightarrow \mathbb {R}^m\) the switching function defined by

$$\begin{aligned} \sigma ^*(t) = {\mathscr {H}}_u(x^*(t),u^*(t),\lambda ^*(t),t)^{\mathsf {T}} = g^{(2)}(x^*(t),t)^{\mathsf {T}}\lambda ^*(t). \end{aligned}$$
(2.11)

For a strong local solution \((x^*,u^*)\in {{\mathscr {F}}}\) of Problem (OC) with associated adjoint function \(\lambda ^*\in X_1\), (2.10) implies for \(i\in \{1,\ldots ,m\}\)

$$\begin{aligned} u^*_i(t) = \left\{ \begin{array}{ll} b_{\ell ,i}, &{}\quad \text{ if } \sigma ^*_i(t)>0\text{, }\\ b_{u,i}, &{}\quad \text{ if } \sigma ^*_i(t)<0\text{, }\\ \text{ undetermined, } &{}\quad \text{ if } \sigma ^*_i(t)=0\text{. } \end{array}\right. \end{aligned}$$
(2.12)

Therefore, the optimal control \(u^*\) is of bang–bang type or may have singular arcs.
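A pointwise evaluation of the bang–bang law (2.12) can be sketched as follows. This is a minimal sketch: the function name is ours, and the fallback on the set \(\{\sigma ^*_i(t)=0\}\), where the control is undetermined by (2.12), is an arbitrary tie-breaking rule, not part of the theory.

```python
import numpy as np

# Componentwise bang-bang law (2.12): the sign of the switching function
# selects the active bound. On {sigma_i = 0} the control is undetermined
# (possible singular arc); the fallback to b_l below is an arbitrary choice.
def bang_bang_control(sigma, b_l, b_u):
    u = np.where(sigma > 0, b_l, b_u)  # sigma_i > 0 -> lower, sigma_i < 0 -> upper
    u = np.where(sigma == 0, b_l, u)   # undetermined case: arbitrary fallback
    return u
```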

3 Euler Approximation

Let \(N\in \mathbb {N}\), \(h=h_N=(t_f-t_0)/N\) be the mesh size and \(t_j=t_0+jh\), \(j\in J_0^{N}\), the grid points of the discretization. We approximate the space \(X_2\) of controls by functions in the subspace \(X_{2,N}\subset X_2\) of piecewise constant functions \(u_h\) represented by their values \(u_h(t_j) = u_{h,j}\) at the grid points \(t_j\), \(j\in J_0^{N-1}\). Further, we approximate state and adjoint state variables by functions \(x_h\), resp. \(\lambda _h\), in the subspace \(X_{1,N}\subset X_1\) of continuous, piecewise linear functions represented by their values \(x_h(t_j)=x_{h,j}\), resp. \(\lambda _h(t_j)=\lambda _{h,j}\), at the grid points \(t_j\), \(j\in J_0^N\). Then based on Euler’s method for the discretization of the system equation we obtain the discrete optimal control problem

By \({{\mathscr {F}}}_N\) we denote the feasible set of \(\text{(OC) }_N\).
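The forward Euler recursion underlying Problem \(\text{(OC) }_N\) can be sketched as follows: a minimal implementation, assuming a user-supplied right-hand side g and a piecewise constant control given by its grid values; all names are ours.

```python
import numpy as np

def euler_trajectory(g, a, u_h, t0, tf):
    """Forward Euler: x_{j+1} = x_j + h * g(x_j, u_j, t_j), x_0 = a, where
    u_h holds the piecewise-constant control values u_{h,j}, j = 0,...,N-1."""
    N = len(u_h)
    h = (tf - t0) / N
    x = np.empty((N + 1, np.size(a)))
    x[0] = a
    for j in range(N):
        t_j = t0 + j * h
        x[j + 1] = x[j] + h * g(x[j], u_h[j], t_j)
    return x
```

The returned array collects the grid values \(x_{h,j}\); the continuous, piecewise linear function \(x_h\) is obtained by linear interpolation between them.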

Definition 2

A pair \((x^*_h,u^*_h)\in {{\mathscr {F}}}_N\) is called a local minimizer of f on \({{\mathscr {F}}}_N\) or a local solution of Problem \(\text{(OC) }_N\), if there exists \(\varepsilon >0\) such that \(f(x^*_{h,N})\le f(x_{h,N})\) for all \((x_h,u_h)\in {{\mathscr {F}}}_N\cap {{\mathscr {B}}}_\varepsilon (x^*_h,u^*_h)\). \(\Diamond \)

Since \({{\mathscr {F}}}_N\) is nonempty and bounded, Problem \(\text{(OC) }_N\) has a (global) solution. Optimality conditions can be derived in the same way as in Ioffe and Tihomirov [22, Section 6.4]. For any local solution \((x^*_h,u^*_h)\in {{\mathscr {F}}}_N\) of Problem \(\text{(OC) }_N\) there exists a multiplier \(\lambda ^*_h\) such that the discrete adjoint equation

$$\begin{aligned} -\frac{\lambda ^*_{h,j+1}-\lambda ^*_{h,j}}{h_N} = H_x(x^*_{h,j},u^*_{h,j},\lambda ^*_{h,j+1},t_j)^{\mathsf {T}} = g_x(x^*_{h,j},u^*_{h,j},t_j)^{\mathsf {T}}\lambda ^*_{h,j+1} \end{aligned}$$
(3.1)

for \(j\in J_0^{N-1}\) with terminal condition \(\lambda ^*_{h,N} = f_x(x^*_{h,N})^{\mathsf {T}}\), and the discrete minimum principle

$$\begin{aligned} H_u(x^*_{h,j},u^*_{h,j},\lambda ^*_{h,j+1},t_j)(u-u^*_{h,j}) = (\lambda ^*_{h,j+1})^{\mathsf {T}}g^{(2)}(x^*_{h,j},t_j)(u-u^*_{h,j}) \ge 0\nonumber \\ \end{aligned}$$
(3.2)

for \(j\in J_0^{N-1}\) and all \(u\in U\) are satisfied.
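The discrete adjoint equation (3.1) is an explicit backward recursion: solving for \(\lambda ^*_{h,j}\) gives \(\lambda ^*_{h,j} = \lambda ^*_{h,j+1} + h_N\,g_x(x^*_{h,j},u^*_{h,j},t_j)^{\mathsf {T}}\lambda ^*_{h,j+1}\). A minimal sketch, assuming user-supplied Jacobians g_x and f_x (names ours):

```python
import numpy as np

def discrete_adjoint(g_x, f_x, x, u_h, t0, tf):
    """Backward recursion for (3.1):
    lam_j = lam_{j+1} + h * g_x(x_j, u_j, t_j)^T @ lam_{j+1},
    with terminal condition lam_N = f_x(x_N)^T; g_x returns the n x n Jacobian."""
    N = len(u_h)
    h = (tf - t0) / N
    lam = np.empty((N + 1, x.shape[1]))
    lam[N] = f_x(x[N])                       # terminal condition
    for j in range(N - 1, -1, -1):
        t_j = t0 + j * h
        lam[j] = lam[j + 1] + h * g_x(x[j], u_h[j], t_j).T @ lam[j + 1]
    return lam
```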

By \(\lambda ^*_h\) we denote the continuous, piecewise linear function defined by the values \(\lambda ^*_h(t_j)=\lambda ^*_{h,j}\), \(j=0,\ldots ,N\), and by \(\sigma ^*_h\) we denote the piecewise constant function defined by the values

$$\begin{aligned} \sigma ^*_h(t_j):=g^{(2)}(x^*_{h,j},t_j)^{\mathsf {T}}\lambda ^*_{h,j+1}, \quad j\in J_0^{N-1}, \end{aligned}$$
(3.3)

the discrete analogue of the switching function (2.11). From (3.2) we obtain for \(i=1,\ldots ,m\), \(j\in J_0^{N-1}\),

$$\begin{aligned} u^*_{h,i}(t_j) = \left\{ \begin{array}{ll} b_{\ell ,i} &{}\quad \text{ if } \sigma ^*_{h,i}(t_j)>0\text{, }\\ b_{u,i} &{}\quad \text{ if } \sigma ^*_{h,i}(t_j)<0\text{, }\\ \text{ undetermined } &{}\quad \text{ if } \sigma ^*_{h,i}(t_j)=0\text{. } \end{array}\right. \end{aligned}$$
(3.4)
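Given grid values of the state and the adjoint, the discrete switching function (3.3) and the control values it induces via (3.4) can be sketched as follows; the function name and the tie-breaking at \(\sigma ^*_{h,i}(t_j)=0\) are our assumptions.

```python
import numpy as np

def discrete_switching_and_control(g2, x, lam, b_l, b_u, t0, tf):
    """Discrete switching function (3.3), sigma_{h,j} = g2(x_j, t_j)^T @ lam_{j+1},
    and the bang-bang values it induces via (3.4). Ties (sigma = 0) are
    undetermined in the theory; b_l is an arbitrary fallback here."""
    N = len(lam) - 1
    h = (tf - t0) / N
    sigma = np.array([g2(x[j], t0 + j * h).T @ lam[j + 1] for j in range(N)])
    u = np.where(sigma < 0, b_u, b_l)  # sigma < 0 -> upper bound, else lower
    return sigma, u
```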

4 Error estimates for local minimizers

We first prove some auxiliary results. For a function \(z:[t_0,t_f]\rightarrow \mathbb {R}^k\) of bounded variation and \(s_1,s_2\in [t_0,t_f]\), \(s_1<s_2\), we denote by \(\displaystyle \mathsf {V}_{s_1}^{s_2}z\) the total variation of z on \([s_1,s_2]\).

Lemma 1

Suppose that \(u\in X_2\) has bounded variation, and let \(u_h\in X_{2,N}\) be the piecewise constant function defined by the values \(u_{h,j}=u(t_j)\), \(j\in J_0^{N-1}\). Then

$$\begin{aligned} \Vert u-u_h\Vert _1\le h_N\mathsf {V}_{t_0}^{t_f}u. \end{aligned}$$
(4.1)

Proof

Since for \(s\in [t_j,t_{j+1}]\)

$$\begin{aligned} |u(s)-u(t_j)| \le |u(s)-u(t_j)| + |u(t_{j+1})-u(s)| \le \mathsf {V}_{t_j}^{t_{j+1}}u, \end{aligned}$$

we have

$$\begin{aligned} \Vert u-u_h\Vert _1= & {} \sum _{j=0}^{N-1}\int _{t_j}^{t_{j+1}}|u(s)-u(t_j)|\,ds \le \sum _{j=0}^{N-1}h_N\mathsf {V}_{t_j}^{t_{j+1}}u\\\le & {} h_N\mathsf {V}_{t_0}^{t_f}u, \end{aligned}$$

which proves (4.1). \(\square \)
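For a step control with a single switch, the bound (4.1) can be checked numerically. A minimal sketch: the switch location, the interval \([0,1]\), and the quadrature resolution are arbitrary illustrative choices.

```python
import numpy as np

def check_variation_bound(N, t_switch=0.37):
    """Numerically verify (4.1) for a step control u on [0,1] with one jump of
    height 1, so V_{t0}^{tf} u = 1 and the claimed bound is h_N * 1."""
    t0, tf = 0.0, 1.0
    h = (tf - t0) / N
    u = lambda t: np.where(t < t_switch, 0.0, 1.0)
    # midpoint quadrature of ||u - u_h||_1 on a fine auxiliary grid
    M = 200000
    s = t0 + (tf - t0) * (np.arange(M) + 0.5) / M
    j = np.minimum(((s - t0) / h).astype(int), N - 1)
    u_h = u(t0 + j * h)                 # sampled piecewise-constant control
    err = np.sum(np.abs(u(s) - u_h)) * (tf - t0) / M
    return err, h                       # L1 error and the bound h_N * V
```

Here the L1 error equals the distance from the switch to the right endpoint of the grid interval containing it, which is at most \(h_N\), in agreement with (4.1).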

Remark 1

In many applications the optimal control \(u^*\) is a piecewise Lipschitz continuous function. In this case \(u^*\) has bounded variation. \(\Diamond \)

The following result is a special case of Sendov and Popov [39, Theorem 6.1].

Lemma 2

If Assumptions (2.3) and (2.4) are satisfied, \((x,u)\in {{\mathscr {F}}} \cap {{\mathscr {B}}}_\varepsilon (x^*,u^*)\), \(\dot{x}\) has bounded variation, and \(x_h\) is the solution of the discrete system equation

$$\begin{aligned} x_{h,j+1}= x_{h,j}+h_Ng(x_{h,j},u(t_j),t_j),\; j\in J_0^{N-1}, x_{h,0} = a, \end{aligned}$$
(4.2)

then

$$\begin{aligned} \max _{1\le j\le N} |x_{h,j}-x(t_j)| \le c_1\,h_N\mathsf {V}_{t_0}^{t_f}\dot{x}, \end{aligned}$$
(4.3)

where \(c_1= e^{(t_f-t_0)L_g(1+K_u)}\) is a constant independent of N.

Lemma 3

Suppose that Assumptions (2.1), (2.3), and (2.4) are satisfied, and that \(u^*\) has bounded variation. Then for \((x,u)\in {{\mathscr {F}}}\cap {{\mathscr {B}}}_\varepsilon (x^*,u^*)\) we have

$$\begin{aligned} \mathsf {V}_{t_0}^{t_f}\dot{x} \le L_g(1+K_u)(L_x+1)(t_f-t_0) +c_2\,\mathsf {V}_{t_0}^{t_f} u \end{aligned}$$
(4.4)

where \(c_2\) is a constant independent of N.

Proof

The variation of \(\dot{x}\) can be estimated by the variation of the right-hand side of the system equation. For \(t,s\in [t_0,t_f]\) we have by (2.7)

$$\begin{aligned} |\dot{x}(t)-\dot{x}(s)| \le L_g(1+K_u)(|x(t)-x(s)|+|t-s|) + K_g\,|u(t)-u(s)|. \end{aligned}$$

Hence, by (2.8) we obtain

$$\begin{aligned} \mathsf {V}_{t_0}^{t_f}\dot{x} \le L_g(1+K_u)(L_x+1)(t_f-t_0) +K_g\,\mathsf {V}_{t_0}^{t_f} u, \end{aligned}$$

which proves the assertion. \(\square \)

For \(\rho >0\) we consider the auxiliary problem

which is Problem \(\text{(OC) }_{N}\) with the additional constraints

$$\begin{aligned} \Vert x_h-x^*\Vert _\infty \le \rho ,\; \Vert u_h-u^*\Vert _1\le \rho . \end{aligned}$$
(4.5)

For \(\rho >0\) we denote by \({{\mathscr {F}}}_{N,\rho }\) the feasible set of Problem \(\text{(OC) }_{N,\rho }\), i.e.

$$\begin{aligned} {{\mathscr {F}}}_{N,\rho }=\left\{ (x_h,u_h)\in {{\mathscr {F}}_N}\mid \Vert x_h-x^*\Vert _\infty \le \rho ,\;\Vert u_h-u^*\Vert _1 \le \rho \,\right\} . \end{aligned}$$

Lemma 4

Suppose that Assumptions (2.1), (2.3), and (2.4) are satisfied, and that \(u^*\) has bounded variation. Further let \(\rho >0\) be arbitrary but fixed. Then for sufficiently large N Problem \(\text{(OC) }_{N,\rho }\) has a solution.

Proof

Let \({\hat{u}}_h\in X_{2,N}\) be defined by the values \({\hat{u}}_{h,j}=u^*(t_j)\), \(j\in J_0^{N-1}\). Then \({\hat{u}}_h\in {{\mathscr {U}}}\), and by Lemma 1 we have

$$\begin{aligned} \Vert u^*-{\hat{u}}_h\Vert _1\le h_N\mathsf {V}_{t_0}^{t_f}u^*. \end{aligned}$$

Let \({\hat{x}}_h\) be the solution of the discrete system equation (4.2) for \(u=u^*\). By Lemmas 2 and 3 we have

$$\begin{aligned} \max _{1\le j\le N} |{\hat{x}}_{h,j}-x^*(t_j)| \le c_1\left( L_g(1+K_u)(L_x+1)(t_f-t_0) +c_2\,\mathsf {V}_{t_0}^{t_f} u^*\right) h_N. \end{aligned}$$

This shows that \(({\hat{x}}_h,{\hat{u}}_h)\in {{\mathscr {F}}}_{N,\rho }\), and hence \({{\mathscr {F}}}_{N,\rho }\not =\emptyset \) for sufficiently large N. Since the cost functional is continuous and the feasible set is closed and bounded, a solution exists. \(\square \)

The following result on the dependence of solutions of the system equation on parameters is well-known.

Lemma 5

If Assumptions (2.1), (2.3), and (2.5) are satisfied, there exists \(\bar{\rho }\in \,]0,\bar{\varepsilon }]\) such that for each \(u\in X_2\) with \(\Vert u-u^*\Vert _1<\bar{\rho }\), and each \(\eta \in L^1(t_0,t_f;\mathbb {R}^n)\) with \(\Vert \eta \Vert _1<\bar{\rho }\) the perturbed system equation

$$\begin{aligned} \dot{x}(t) = g(x(t),u(t),t)+\eta (t)\;\text {a.e. on } [t_0,t_f],\; x(t_0)=a, \end{aligned}$$

has a unique solution \(x = x(u,\eta )\), and for \(x_i=x(u_i,\eta _i)\), \(i=1,2\), we have

$$\begin{aligned} \Vert x_2-x_1\Vert _{1,1}\le c_s\left( \Vert u_2-u_1\Vert _1+\Vert \eta _2-\eta _1\Vert _1\right) , \end{aligned}$$

where the constant \(c_s\) is independent of u and \(\eta \).

We further need the following auxiliary result.

Lemma 6

Suppose that Assumptions (2.1), (2.3), (2.4), and (2.5) are satisfied, and let \(\bar{\rho }>0\) be given by Lemma 5. Then there is a number \(\bar{N}\in \mathbb {N}\) such that for \(N\ge {\bar{N}}\) and any \((x_h,u_h)\in {\mathscr {F}}_{N,\bar{\rho }}\) there exists a function \(z\in X_1\) such that \((z,u_h)\in {{\mathscr {F}}}\) and

$$\begin{aligned} \Vert z-x_h\Vert _{1,1} \le c h_N \end{aligned}$$
(4.6)

with a constant c independent of N and \((x_h,u_h)\in {\mathscr {F}}_{N,\bar{\rho }}\).

Proof

Let \((x_h,u_h)\in {{\mathscr {F}}}_{N,\bar{\rho }}\) and \(N\in \mathbb {N}\) be given. Since \(\Vert x_h-x^*\Vert _\infty \le \bar{\rho }\le \bar{\varepsilon }\) it follows from (2.6) that

$$\begin{aligned} |g(x_h(t_j),u_h(t_j),t_j)| \le K_g. \end{aligned}$$
(4.7)

By Lemma 5 the system equation of (OC) for \(u=u_h\), i.e.,

$$\begin{aligned} \dot{z}(t) = g(z(t),u_h(t),t)\;\text {a.e. on }[t_0,t_f],\quad z(t_0)=a, \end{aligned}$$

has a unique solution z, i.e. \((z,u_h)\in {{\mathscr {F}}}\). We define the piecewise constant function \({\bar{x}}_h:[t_0,t_f]\rightarrow \mathbb {R}^n\) by \({\bar{x}}_h(t) = x_h(t_j)\) for \(t\in [t_j,t_{j+1}[\), \(j\in J_0^{N-1}\). Then the discrete system equation (4.2) for \(x_h\) can be written in the form

$$\begin{aligned} \dot{x}_h(t) = g(x_h(t),u_h(t),t)+\eta (t)\;\text {a.e. on } [t_0,t_f], \quad x_h(t_0)=a, \end{aligned}$$

where

$$\begin{aligned} \eta (t) = g^{(1)}({\bar{x}}_h(t),t)- g^{(1)}(x_h(t),t) +(g^{(2)}(\bar{x}_h(t),t)-g^{(2)}(x_h(t),t))u_h(t). \end{aligned}$$

Since \(x_h\) solves the discrete system equation (4.2) we have by (4.7) for \(t\in [t_j,t_{j+1}[\),

$$\begin{aligned} |{\bar{x}}_h(t)-x_h(t)|= & {} \left| (t-t_j)\frac{x_h(t_{j+1})-x_h(t_j)}{t_{j+1}-t_j}\right| \\= & {} (t-t_j)\,|g(x_h(t_j),u_h(t_j),t_j)| \le h_NK_g, \end{aligned}$$

and by (2.2), Assumption (2.4), and (2.8) it follows that for \(t\in [t_0,t_f]\),

$$\begin{aligned} |\eta (t)| \le L_g(1+K_u)|{\bar{x}}_h(t)-x_h(t)| \le L_g(1+K_u)K_g h_N. \end{aligned}$$

We choose \({\bar{N}}\in \mathbb {N}\) such that

$$\begin{aligned} (t_f-t_0)L_g(1+K_u)K_g h_N < \bar{\rho }\end{aligned}$$

for \(N\ge {\bar{N}}\). Then

$$\begin{aligned} \Vert \eta \Vert _1 \le (t_f-t_0)\Vert \eta \Vert _\infty < \bar{\rho }\end{aligned}$$

for \(N\ge {\bar{N}}\). By Lemma 5 this implies

$$\begin{aligned} \Vert x_h-z\Vert _{1,1} \le c_s\,\Vert \eta \Vert _1 \le c_sL_g(1+K_u)K_g(t_f-t_0)h_N, \end{aligned}$$

which proves (4.6). \(\square \)

In order to obtain error estimates for local solutions we proceed similarly to Alt [1, 2] (compare also Alt et al. [6]). In addition to (2.1) we use the following growth condition for the cost functional:

  1. (4.8)

    There exist \(\alpha >0\), \(\kappa \ge 1\) such that

    $$\begin{aligned} f(x(t_f))-f(x^*(t_f)) \ge \alpha \,\Vert u-u^*\Vert _1^\kappa \end{aligned}$$

    for all \((x,u)\in {{\mathscr {F}}}\cap {\mathscr {B}}_{\bar{\varepsilon }}(x^*,u^*)\).

Remark 2

The growth condition required here implies that \((x^*,u^*)\) is a strict local solution of (OC). Such conditions are closely related to second-order optimality conditions (see e.g. Ioffe and Tihomirov [22, Chapter 7] or Maurer and Zowe [31, Theorem 5.6]). In the following section we use the stronger second-order optimality condition (5.7) implying (4.8) with \(\kappa =2\) (see Theorem 3).

Theorem 1

Let Assumptions (2.1), (2.3), (2.4), (2.5), and (4.8) be satisfied and suppose that \(u^*\) has bounded variation. Then for each \(0<\rho <\bar{\rho }\), where \(\bar{\rho }>0\) is given by Lemma 5, Problem \(\text{(OC) }_{N,\rho }\) has a global solution for sufficiently large N. Further for each such solution \((x^*_h,u^*_h)\) the estimates

$$\begin{aligned} \Vert u^*_h-u^*\Vert _1\le c_uh_N^{\frac{1}{\kappa }},\quad \Vert x^*_h-x^*\Vert _{1,1}\le c_xh_N^{\frac{1}{\kappa }} \end{aligned}$$
(4.9)

hold with constants \(c_u\), \(c_x\) independent of N and the solution \((x^*_h,u^*_h)\).

Proof

We choose \(N\ge {\bar{N}}\) sufficiently large, where \({\bar{N}}\) is defined by Lemma 6. Then by Lemma 4 Problem \(\text{(OC) }_{N,\rho }\) has a (global) solution. Let \((x^*_h,u^*_h)\) be any such solution. By Lemma 6 there exists a function \(z^*\in X_1\), such that \((z^*,u^*_h)\in {{\mathscr {F}}}\) and

$$\begin{aligned} \Vert z^*-x^*_h\Vert _{1,1} \le c_1 h_N \end{aligned}$$
(4.10)

with a constant \(c_1\) independent of N and \((x^*_h,u^*_h)\). Further, since \((x^*_h,u^*_h)\in {{\mathscr {F}}}_{N,\rho }\) we have \(\Vert x^*_h-x^*\Vert _\infty \le \rho \). Together with (4.10) and the fact that \(\Vert z^*-x^*_h\Vert _\infty \le \Vert z^*-x^*_h\Vert _{1,1}\) this implies

$$\begin{aligned} \Vert z^*-x^*\Vert _\infty \le \Vert z^*-x^*_h\Vert _\infty + \Vert x^*_h-x^*\Vert _\infty \le c_1h_N+\rho \end{aligned}$$

and therefore

$$\begin{aligned} \Vert z^*-x^*\Vert _\infty< \bar{\rho }\le \bar{\varepsilon }\end{aligned}$$
(4.11)

for sufficiently large N. Since \((x^*_h,u^*_h)\in {\mathscr {F}}_{N,\rho }\) we have \((z^*,u^*_h)\in {{\mathscr {F}}}\cap {{\mathscr {B}}}_{\bar{\varepsilon }}(x^*,u^*)\). By (4.8) we therefore have

$$\begin{aligned} f(z^*(t_f))-f(x^*(t_f)) \ge \alpha \,\Vert u^*_h-u^*\Vert ^\kappa _1. \end{aligned}$$

Further, by (2.4) and (4.10) we have

$$\begin{aligned} f(z^*(t_f)) = f(x^*_h(t_f)) + f(z^*(t_f)) - f(x^*_h(t_f)) \le f(x^*_h(t_f)) +L_fc_1h_N, \end{aligned}$$

and therefore

$$\begin{aligned} \alpha \,\Vert u^*_h-u^*\Vert _1^\kappa \le f(x^*_h(t_f)) - f(x^*(t_f)) + L_fc_1h_N \end{aligned}$$
(4.12)

for N sufficiently large.

Let \({\hat{u}}_h\in X_{2,N}\) be defined as in the proof of Lemma 4. Then for sufficiently large N we have \(({\hat{x}}_h,{\hat{u}}_h)\in {{\mathscr {F}}}_{N,\rho }\) (see proof of Lemma 4) and therefore \(f({\hat{x}}_h(t_f)) \ge f(x^*_h(t_f))\). Further we have

$$\begin{aligned} \max _{1\le j\le N} |{\hat{x}}_h(t_j)-x^*(t_j)| \le c_2h_N \end{aligned}$$
(4.13)

with a constant \(c_2\) independent of N. By (4.12), (4.13), and (2.4) this implies

$$\begin{aligned} \alpha \,\Vert u^*_h-u^*\Vert _1^\kappa \le f({\hat{x}}_h(t_f)) - f(x^*(t_f)) + L_fc_1h_N \le L_f(c_1+c_2)h_N. \end{aligned}$$
(4.14)

In the proof of Lemma 6 we have shown that the discrete system equation (4.2) for \(x_h=x^*_h\) can be written in the form

$$\begin{aligned} \dot{x}_h(t) = g(x_h(t),u_h(t),t)+\eta (t)\; \text {a.e. on } [t_0,t_f], \quad x_h(t_0)=a, \end{aligned}$$

where \(|\eta (t)|\le c_3h_N\) with a constant \(c_3\) independent of N. By Lemma 5 we therefore obtain

$$\begin{aligned} \Vert x^*_h-x^*\Vert _{1,1} \le c_4\left( \Vert u^*_h-u^*\Vert _1+\Vert \eta \Vert _1\right) \le c_4\left( \Vert u^*_h-u^*\Vert _1+c_3(t_f-t_0)h_N\right) ,\nonumber \\ \end{aligned}$$
(4.15)

where the constant \(c_4\) is independent of N and the solution \((x^*_h,u^*_h)\). Since \(\kappa \ge 1\) implies \(h_N \le (t_f-t_0)^{1-1/\kappa }\,h_N^{1/\kappa }\), the estimates (4.9) now follow from (4.14) and (4.15). \(\square \)

Remark 3

Note that Theorem 1 assumes that \((x^*_h,u^*_h)\) is a global solution of Problem \(\text{(OC) }_{N,\rho }\). For such a solution we have \(\Vert u^*_h-u^*\Vert _1<\rho \) and \(\Vert x^*_h-x^*\Vert _{1,1}<\rho \) for sufficiently large N, i.e. the additional constraints (4.5) are not active, and \((x^*_h,u^*_h)\) is a local minimizer of Problem \(\text{(OC) }_N\). Similar results on the existence of approximate local minimizers for control problems obtained by Euler discretization, together with error estimates for the discrete solutions, are well known for the case that the optimal control is continuous (see e.g. Malanowski et al. [27], Dontchev and Hager [11], Dontchev et al. [12, 13]). In these papers a strong second-order sufficient optimality condition is used which also implies local uniqueness of the discrete solutions. This cannot be shown under the weaker condition (4.8) used here. \(\Diamond \)

If \((x^*_h,u^*_h)\) is a global solution of Problem \(\text{(OC) }_{N,\rho }\), then by Remark 3 it is a local minimizer of Problem \(\text{(OC) }_N\). Therefore a multiplier \(\lambda ^*_h\) exists satisfying the discrete adjoint equation (3.1). In order to derive an error estimate for this multiplier we need some auxiliary results. Since the adjoint equation is a linear differential equation, one easily obtains the following result.

Lemma 7

Suppose that Assumptions (2.1), (2.3), (2.4), and (2.5) are satisfied. Let \(\bar{\rho }>0\) be given by Lemma 5 and \(0<\rho \le \bar{\rho }\). Then if N is sufficiently large we have for any solution \((x^*_h,u^*_h)\) of Problem \(\text{(OC) }_{N,\rho }\) and the associated adjoint function \(\lambda ^*_h\) the estimate

$$\begin{aligned} \Vert \lambda ^*_h\Vert _\infty \le K_\lambda \end{aligned}$$
(4.16)

with a constant \(K_\lambda \) independent of N and the solution \((x^*_h,u^*_h)\).

In the proof of Lemma 6 we have shown that the discrete state variables can be viewed as the solution of a perturbation of the system equation of Problem (OC). In the same way one can show that the discrete adjoint variables \(\lambda ^*_h\) can be viewed as the solution of a perturbation of the adjoint equation (2.9).

Lemma 8

Suppose that Assumptions (2.1), (2.3), (2.4), and (2.5) are satisfied. Let \(\bar{\rho }>0\) be given by Lemma 5 and \(0<\rho \le \bar{\rho }\). Then, if N is sufficiently large, we can write the discrete adjoint equation (3.1) in the form

$$\begin{aligned} -\dot{\lambda }^*_{h}(t) = g_x(x^*_h(t),u^*_h(t),t)^{\mathsf {T}}\lambda ^*_h(t)+\xi _h(t) \end{aligned}$$
(4.17)

for a.a. \(t\in [t_0,t_f]\), where the function \(\xi _h:[t_0,t_f]\rightarrow \mathbb {R}^n\) can be estimated by

$$\begin{aligned} |\xi _h(t)| \le c_\xi h_N \end{aligned}$$
(4.18)

for a.a. \(t\in [t_0,t_f]\) with a constant \(c_\xi \) independent of N and the solution \((x^*_h,u^*_h)\).
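The first-order bound (4.18) on the perturbation \(\xi _h\) can be checked numerically in a simple special case. The sketch below assumes the scalar toy dynamics \(g(x,u,t)=x\) (so \(g_x=1\); this is an illustration, not the paper's system) and compares the explicit-Euler adjoint recursion with the exact solution of the continuous adjoint equation \(-\dot{\lambda }=\lambda \), \(\lambda (1)=1\):

```python
import numpy as np

# Hedged illustration of Lemma 8: for the assumed scalar dynamics g(x,u,t) = x,
# the backward discrete adjoint recursion lambda_j = (1 + h) * lambda_{j+1}
# deviates from the exact adjoint lambda(t) = exp(1 - t) by O(h_N).

def adjoint_error(N):
    h = 1.0 / N
    t = np.linspace(0.0, 1.0, N + 1)
    lam = np.empty(N + 1)
    lam[N] = 1.0                             # terminal condition
    for j in range(N - 1, -1, -1):
        lam[j] = (1.0 + h) * lam[j + 1]      # discrete adjoint recursion
    exact = np.exp(1.0 - t)                  # solution of -lambda' = lambda
    return np.max(np.abs(lam - exact))

e1, e2 = adjoint_error(100), adjoint_error(200)
# halving h roughly halves the error, consistent with the O(h_N) bound (4.18)
assert e2 < 0.6 * e1
```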

Now we can derive an error estimate for the discrete adjoint functions.

Theorem 2

Let the assumptions of Theorem 1 be satisfied and suppose that \(u^*\) has bounded variation. Then for each \(0<\rho <\bar{\rho }\), where \(\bar{\rho }>0\) is given by Lemma 5, Problem \(\text{(OC) }_{N,\rho }\) has a (global) solution for sufficiently large N. Further for each such solution \((x^*_h,u^*_h)\) and the associated adjoint function \(\lambda ^*_h\) the estimate

$$\begin{aligned} \Vert \lambda ^*_h-\lambda ^*\Vert _1\le c_\lambda h_N^{\frac{1}{\kappa }} \end{aligned}$$
(4.19)

holds with a constant \(c_\lambda \) independent of N and the solution \((x^*_h,u^*_h)\).

Proof

We define \(\lambda := \lambda ^*_h-\lambda ^*\). By (2.9) and (4.17) we have

$$\begin{aligned} -\dot{\lambda }(t) =&-\dot{\lambda }^*_{h}(t) + \dot{\lambda }^*(t)\\ =&\left[ g^{(1)}_x(x^*_h(t),t) +\sum _{i=1}^m u^*_h(t)_ig^{(2)}_{.i,x}(x^*_h(t),t)\right] ^{\mathsf {T}} \lambda ^*_h(t)+\xi _h(t)\\&-\left[ g^{(1)}_x(x^*(t),t) +\sum _{i=1}^m u^*(t)_ig^{(2)}_{.i,x}(x^*(t),t)\right] ^{\mathsf {T}} \lambda ^*(t)\\ =&\left[ g^{(1)}_x(x^*_h(t),t) +\sum _{i=1}^m u^*_h(t)_ig^{(2)}_{.i,x}(x^*_h(t),t)\right. \\&\;\left. -g^{(1)}_x(x^*(t),t) -\sum _{i=1}^m u^*(t)_ig^{(2)}_{.i,x}(x^*(t),t)\right] ^{\mathsf {T}} \lambda ^*_h(t)+\xi _h(t)\\&+ \left[ g^{(1)}_x(x^*(t),t) +\sum _{i=1}^m u^*(t)_ig^{(2)}_{.i,x}(x^*(t),t)\right] ^{\mathsf {T}} \lambda (t) \end{aligned}$$

with terminal condition \(\lambda (t_f) = f_x(x^*_{h,N})^{\mathsf {T}} - f_x(x^*(t_f))^\mathsf {T}\). Since this is a linear differential equation it follows that

$$\begin{aligned} \Vert \lambda \Vert _{1,1}\le c_{\lambda ,1}\left( \Vert x^*_h-x^*\Vert _{1,1} + \Vert u^*_h-u^*\Vert _1+h_N\right) \end{aligned}$$

with some constant \(c_{\lambda ,1}\) independent of N and \((x^*_h,u^*_h)\). Finally, together with (4.15) we obtain

$$\begin{aligned} \Vert \lambda ^*_h-\lambda ^*\Vert _{1,1} \le c_{\lambda ,2}\left( \Vert u^*_h-u^*\Vert _1+h_N\right) \end{aligned}$$
(4.20)

with a constant \(c_{\lambda ,2}\) independent of N and the solution \((x^*_h, u^*_h)\). By Theorem 1 this implies (4.19). \(\square \)

5 Improved error estimates

We can improve the error estimates of the last section to order 1 if we replace condition (4.8) by a stronger second-order sufficient optimality condition. To this end we require in addition to Assumptions (2.1), (2.3), (2.4), and (2.5):

  1. (5.1)

    The functions f, \(g^{(1)}\), and \(g^{(2)}\) are twice continuously differentiable w.r.t. x on \({{\mathscr {B}}}\).

  2. (5.2)

    The functions \(f_{xx}\), \(g^{(1)}_{xx}\), and \(g^{(2)}_{xx}\) are Lipschitz continuous, i.e., there are constants \(L_f^{(2)}\) and \(L_g^{(2)}\) such that for all \(s,t\in [t_0,t_f]\) and all \(x,z\in {{\mathscr {B}}}\)

    $$\begin{aligned}&|f_{xx}(x)-f_{xx}(z)| \le L_f^{(2)}\,|x-z|,\\&|g^{(1)}_{j,xx}(x,t)-g^{(1)}_{j,xx}(z,s)| \le L_g^{(2)}\,(|x-z|+|t-s|),\; j\in J_1^n,\\&|g^{(2)}_{ji,xx}(x,t)-g^{(2)}_{ji,xx}(z,s)| \le L_g^{(2)}\,(|x-z|+|t-s|),\; j\in J_1^n,\;i\in J_1^m. \end{aligned}$$

Remark 4

Assumptions (5.1), (5.2) imply Assumptions (2.3), (2.4), and (2.5). \(\Diamond \)

As in Alt [2, Section 6] we can formulate Problem (OC) as an abstract optimization problem of type

$$\begin{aligned} \min _{z\in X} F(z)\quad \text {s.t.}\quad z\in C,\quad G(z)\in K, \end{aligned}$$

where \(z=(x,u)\in X\), \(F:X\rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} F(z) = F(x,u) = f(x(t_f)), \end{aligned}$$

\(G:X\rightarrow Y:= L^1(t_0,t_f;\mathbb {R}^n)\times \mathbb {R}^n\) is defined by

$$\begin{aligned} G(z)(t) = G(x,u)(t)= \left( \begin{array}{@{}ll@{}} g(x(t),u(t),t) - \dot{x}(t)\\ x(t_0)-a \end{array}\right) , \end{aligned}$$

and \(C=X_1\times {{\mathscr {U}}}\), \(K=\{0_Y\}\). As shown in Alt [2] it then follows by results of Robinson [37, 38] that the set

$$\begin{aligned} T(x^*,u^*)&= \{(x,u)\in C\mid G(x^*,u^*) +G'(x^*,u^*)\left( (x,u)-(x^*,u^*)\right) \in K\}\\&= \{(x,u)\mid (x,u)\in X, u\in {{\mathscr {U}}}, x(t_0)=a,\\ {\dot{x}}-{\dot{x}}^*&= g_x(x^*(\cdot ),u^*(\cdot ),\cdot )(x-x^*) +g_u(x^*(\cdot ),u^*(\cdot ),\cdot )(u-u^*)\} \end{aligned}$$

approximates the feasible set of Problem (OC) in the sense of Maurer and Zowe [31, Definition 4.1]. From Alt [2, Lemma 2.1] and Lemma 5 we therefore get the following result:

Lemma 9

Let Assumptions (2.1), (2.3), (2.4), and (2.5) be satisfied. Then for each \(\gamma >0\) there exists \(\rho (\gamma )>0\) such that for each \((x,u)\in {{\mathscr {F}}}\) with \(\Vert u-u^*\Vert _1 < \rho (\gamma )\) there exists \(({\bar{x}},{\bar{u}})\in T(x^*,u^*)\) with

$$\begin{aligned} \Vert x-{\bar{x}}\Vert _{1,1}+\Vert u-{\bar{u}}\Vert _1 \le \gamma \left( \Vert x-x^*\Vert _{1,1}+\Vert u-u^*\Vert _1\right) . \end{aligned}$$

For \(\lambda \in L^\infty (t_0,t_f;\mathbb {R}^n)\) we define the Lagrange function by

$$\begin{aligned} {{\mathscr {L}}}(x,u,\lambda ) = f(x(t_f)) +\int _{t_0}^{t_f}\lambda (t)^{{\mathsf {T}}}\left[ g(x(t),u(t),t) -\dot{x}(t)\right] \,{\mathrm {d}}t. \end{aligned}$$

It follows from the adjoint Eq. (2.9) that

$$\begin{aligned} {{\mathscr {L}}}_x(x^*,u^*,\lambda ^*)(x)&= f_x\left( x^*(t_f)\right) x(t_f) \nonumber \\&\quad + \int _{t_0}^{t_f}\lambda ^*(t)^{{\mathsf {T}}}\left[ g_x(x^*(t),u^*(t),t)x(t)-\dot{x}(t)\right] \,{\mathrm {d}}t=0\qquad \end{aligned}$$
(5.3)

for all \(x\in X_1\) with \(x(t_0)=0\), and by the local minimum principle (2.10) we have

$$\begin{aligned} \begin{aligned} {{\mathscr {L}}}_u(x^*,u^*,\lambda ^*)(u-u^*)&= \int _{t_0}^{t_f}\lambda ^*(t)^{{\mathsf {T}}}g_u(x^*(t),u^*(t),t) \left( u(t)-u^*(t)\right) \,{\mathrm {d}}t\\&= \int _{t_0}^{t_f}\sigma ^*(t)^{{\mathsf {T}}}\left( u(t)-u^*(t)\right) \,{\mathrm {d}}t\ge 0 \end{aligned} \end{aligned}$$
(5.4)

for all \(u\in {\mathscr {U}}\), where \(\sigma ^*\) is the switching function defined by (2.11).

By \({{\mathscr {L}}}''\) we denote the second derivative of \({{\mathscr {L}}}\) w.r.t. \((x,u)\). Since the control u appears only linearly in Problem (OC), we have

$$\begin{aligned} {{\mathscr {H}}}_{uu}(x,u,\lambda ,t)=0\quad \text {for all } (x,u,\lambda ,t)\in \mathbb {R}^n\times \mathbb {R}^m\times \mathbb {R}^n\times [t_0,t_f], \end{aligned}$$
(5.5)

and therefore

$$\begin{aligned} {{\mathscr {L}}}''({\bar{x}}, \bar{u},\bar{\lambda })\left( (x_1,u_1),(x_2,u_2)\right)&= x_1(t_f)^{{\mathsf {T}}}f_{xx}\left( {\bar{x}}(t_f)\right) x_2(t_f)\\&\quad +\int _{t_0}^{t_f}\!x_1(t)^{{\mathsf {T}}} {{\mathscr {H}}}_{xx}({\bar{x}}(t),{\bar{u}}(t),\bar{\lambda }(t),t)x_2(t)\,{\mathrm {d}}t\\&\quad +\int _{t_0}^{t_f}\!x_1(t)^{{\mathsf {T}}} {{\mathscr {H}}}_{xu}({\bar{x}}(t),{\bar{u}}(t),\bar{\lambda }(t),t)u_2(t)\,{\mathrm {d}}t\\&\quad +\int _{t_0}^{t_f}\!u_1(t)^{{\mathsf {T}}} {{\mathscr {H}}}_{ux}(\bar{x}(t),{\bar{u}}(t),\bar{\lambda }(t),t)x_2(t)\,{\mathrm {d}}t\end{aligned}$$

for all \(({\bar{x}},{\bar{u}},\bar{\lambda })\in X\times X_1\) and all \((x_1,u_1), (x_2,u_2)\in X\).

If (5.1) is satisfied, then there exists a constant \(C_{\!{\mathscr {L}}}\) such that

$$\begin{aligned} \begin{aligned}&|{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (x_1,u_1),(x_2,u_2)\right) | \\&\quad \le C_{\!{\mathscr {L}}}\left( \Vert x_1\Vert _\infty \Vert x_2\Vert _\infty +\Vert x_1\Vert _\infty \Vert u_2\Vert _1+\Vert x_2\Vert _\infty \Vert u_1\Vert _1\right) \end{aligned} \end{aligned}$$
(5.6)

for all \(({\bar{x}},{\bar{u}},\bar{\lambda })\in X\times X_1\) with \(\Vert \bar{x}-x^*\Vert _\infty <\bar{\rho }\), \(\Vert \bar{\lambda }-\lambda ^*\Vert _\infty <\bar{\rho }\), \({\bar{u}}\in {\mathscr {U}}\), and all \((x_1,u_1)\), \((x_2,u_2)\in X\).

In the case of a continuous optimal control, convergence of order 1 can be shown for the Euler approximation if a strong second-order optimality condition is satisfied which in particular requires that the bilinear form \({{\mathscr {L}}}''(x^*,u^*,\lambda ^*)\) is positive definite w.r.t. the control function (compare e.g. Dontchev et al. [13]). By (5.5) this condition cannot be satisfied for the class of control problems considered here. We instead use a second-order condition for the switching function \(\sigma ^*\) defined by (2.11). This condition has been introduced by Felgenhauer [15] (see also Maurer and Osmolovskii [30], Maurer et al. [29]) and has been used e.g. in Alt et al. [6], Alt and Seydenschwanz [7], Seydenschwanz [40] to investigate Euler discretization of linear-quadratic control problems:

  1. (5.7)

    There exists \(\bar{\alpha }> 0\) such that

    $$\begin{aligned} \int _{t_0}^{t_f} \sigma ^*(t)^{{\mathsf {T}}}\left( u(t)-u^*(t)\right) \,{\mathrm {d}}t= {{\mathscr {L}}}_u(x^*,u^*,\lambda ^*)(u-u^*) \ge \bar{\alpha }\,\Vert u-u^*\Vert _1^2, \end{aligned}$$

    for all \(u\in {\mathscr {U}}\).

Remark 5

One should note that Assumption (5.7) excludes singular arcs of the optimal control, i.e., the optimal control \(u^*\) must be of bang–bang type. As shown in Alt et al. [6, Lemma 4] the assumption is satisfied if the optimal control is of bang–bang type with finitely many boundary arcs and if an additional growth condition for the switching function around its zeros holds. \(\Diamond \)
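The mechanism behind Remark 5 can be illustrated numerically for a hypothetical scalar switching function with a single simple zero. In the sketch below everything is a made-up toy: \(\sigma (t)=t-1/2\) on \([0,1]\) with \(U=[-1,1]\), so the minimum principle forces the bang-bang control \(u^*(t)=1\) for \(t<1/2\) and \(u^*(t)=-1\) for \(t>1/2\); the linear growth of \(\sigma \) at its zero yields (5.7), in this toy case with \(\bar{\alpha }=1/8\):

```python
import numpy as np

# Hedged numerical check of the growth condition (5.7) for the assumed toy
# switching function sigma(t) = t - 1/2 on [0, 1] with U = [-1, 1] and the
# induced bang-bang control u*. Since sigma grows linearly at its single
# zero, int sigma (u - u*) dt >= alpha_bar * ||u - u*||_1^2 with
# alpha_bar = 1/8 in the continuous setting.

rng = np.random.default_rng(0)
N = 2000
h = 1.0 / N
t = (np.arange(N) + 0.5) * h                # midpoint grid on [0, 1]
sigma = t - 0.5
u_star = np.where(t < 0.5, 1.0, -1.0)       # bang-bang, one switch at 1/2

for _ in range(100):
    u = rng.uniform(-1.0, 1.0, size=N)      # arbitrary feasible control
    d = u - u_star
    lhs = h * np.sum(sigma * d)             # int sigma (u - u*) dt
    rhs = (h * np.sum(np.abs(d))) ** 2      # ||u - u*||_1^2
    assert lhs >= 0.12 * rhs                # growth with alpha_bar ~ 1/8
```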

In Alt and Seydenschwanz [7] and Alt et al. [6] we used an additional assumption ensuring convexity of the linear-quadratic control problems considered there. Here we use the somewhat weaker assumption:

  1. (5.8)

    There exists \(\beta > 0\) such that \(\alpha :=\bar{\alpha }-\beta >0\) and

    $$\begin{aligned}&z(t_f)^{{\mathsf {T}}} f_{xx}\left( x^*(t_f)\right) z(t_f) + \int _{t_0}^{t_f}\!z(t)^{{\mathsf {T}}} {{\mathscr {H}}}_{xx}\left( x^*(t),u^*(t),\lambda ^*(t),t\right) z(t)\,{\mathrm {d}}t\\&\quad +2\int _{t_0}^{t_f}\!z(t)^{{\mathsf {T}}} {{\mathscr {H}}}_{xu}\left( x^*(t),u^*(t),\lambda ^*(t),t\right) v(t)\,{\mathrm {d}}t\\&\quad = {\mathscr {L}}''(x^*,u^*,\lambda ^*)\left( (z,v),(z,v)\right) \ge -\beta \,\Vert v\Vert _1^2 \end{aligned}$$

    for all \((z,v)=(x,u)-(x^*,u^*)\) with \((x,u)\in T(x^*,u^*)\).

Remark 6

The condition \((z,v)=(x,u)-(x^*,u^*)\) with \((x,u)\in T(x^*,u^*)\) is equivalent to \(u\in {{\mathscr {U}}}\), \(z(t_0)=0\) and

$$\begin{aligned} \dot{z}(t) = g_x(x^*(t),u^*(t),t)z(t)+g_u(x^*(t),u^*(t),t)v(t) \end{aligned}$$

for a.a. \(t\in [t_0,t_f]\). Therefore, \(\Vert z\Vert _{1,1}\le c\,\Vert v\Vert _1\) with a constant c independent of v. \(\Diamond \)
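The bound \(\Vert z\Vert _{1,1}\le c\,\Vert v\Vert _1\) in Remark 6 is a standard Gronwall estimate for the linearized equation. As a minimal sanity check one can take the assumed toy scalar dynamics \(\dot{z}=z+v\), \(z(0)=0\) on \([0,1]\) (i.e. \(g_x=g_u=1\), not the paper's system), for which Gronwall's lemma gives \(\Vert z\Vert _\infty \le e\,\Vert v\Vert _1\):

```python
import numpy as np

# Hedged check of the Gronwall bound behind Remark 6 for the assumed scalar
# linearized dynamics z' = z + v, z(0) = 0 on [0, 1]: the discrete Euler
# iterates satisfy |z_j| <= (1 + h)^N * ||v||_1 <= e * ||v||_1.

rng = np.random.default_rng(1)
N = 5000
h = 1.0 / N
for _ in range(20):
    v = rng.uniform(-1.0, 1.0, size=N)
    z, zmax = 0.0, 0.0
    for j in range(N):                 # forward Euler for z' = z + v
        z = z + h * (z + v[j])
        zmax = max(zmax, abs(z))
    v_l1 = h * np.sum(np.abs(v))       # discrete L1 norm of v
    assert zmax <= np.e * v_l1 + 1e-12
```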

Example 1

It can easily be seen that Assumption (6) used in Alt et al. [6] for a class of linear quadratic control problems is equivalent to (5.8).

If the system equation is linear then the coercivity condition in (5.8) reads

$$\begin{aligned} z(t_f)^{{\mathsf {T}}} f_{xx}\left( x^*(t_f)\right) z(t_f) \ge -\beta \,\Vert v\Vert _1^2 \end{aligned}$$

for all \((z,v)=(x,u)-(x^*,u^*)\) with \((x,u)\in T(x^*,u^*)\). \(\Diamond \)

We now show that Assumptions (5.7) and (5.8) imply the growth condition (4.8) with \(\kappa =2\). The proof is based on a result of Ioffe and Tihomirov [22, Chapter 7] concerning a general second-order sufficient optimality condition for equality constrained optimization problems. More general results can be found in Maurer [28], and Maurer and Zowe [31] (see also Alt [2]). A general result on sufficient optimality conditions for optimal control problems can be found in Felgenhauer [15]. We need some auxiliary results which are modifications of corresponding results in Sect. 3 of Alt [2].

Lemma 10

Let Assumptions (2.1), (5.1), (5.2), (5.7), and (5.8) be satisfied. Then there exists \(0<\delta _1\le \bar{\rho }\) such that

$$\begin{aligned} {{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (z,v),(z,v)\right) \ge -\left( \beta +\frac{\alpha }{4}\right) \Vert v\Vert _1^2 \end{aligned}$$

for all \((z,v)=(x,u)-(x^*,u^*)\) with \((x,u)\in T(x^*,u^*)\) and all \(({\bar{x}},{\bar{u}},\bar{\lambda })\in X\times X_1\) with \(\Vert {\bar{x}}-x^*\Vert _\infty +\Vert {\bar{u}}-u^*\Vert _1 +\Vert \bar{\lambda }-\lambda ^*\Vert _\infty < \delta _1\).

Proof

Let \((z,v)=(x,u)-(x^*,u^*)\) with \((x,u)\in T(x^*,u^*)\) and \(({\bar{x}},{\bar{u}},\bar{\lambda })\in X\times X_1\) with \(\Vert \bar{x}-x^*\Vert _\infty +\Vert {\bar{u}}-u^*\Vert _1 +\Vert \bar{\lambda }-\lambda ^*\Vert _\infty < \bar{\rho }\). By Assumption (5.8) we have

$$\begin{aligned}&{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (z,v),(z,v)\right) = {{\mathscr {L}}}''(x^*,u^*,\lambda ^*)\left( (z,v),(z,v)\right) \\&\qquad +{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (z,v),(z,v)\right) -{{\mathscr {L}}}''(x^*,u^*,\lambda ^*)\left( (z,v),(z,v)\right) \\&\quad \ge -\beta \,\Vert v\Vert _1^2 + {{\mathscr {L}}}''({\bar{x}},\bar{u},\bar{\lambda })\left( (z,v),(z,v)\right) -{\mathscr {L}}''(x^*,u^*,\lambda ^*)\left( (z,v),(z,v)\right) , \end{aligned}$$

i.e.,

$$\begin{aligned} \begin{aligned} {{\mathscr {L}}}''&({\bar{x}},{\bar{u}},\bar{\lambda })\left( (z,v),(z,v)\right) +\beta \,\Vert v\Vert _1^2 \ge z(t_f)^{{\mathsf {T}}}\left[ f_{xx}\left( \bar{x}(t_f)\right) -f_{xx}\left( x^*(t_f)\right) \right] z(t_f)\\&+\int _{t_0}^{t_f}\!z(t)^{{\mathsf {T}}} \left[ {{\mathscr {H}}}_{xx}(\bar{x}(t),{\bar{u}}(t),\bar{\lambda }(t),t) -{{\mathscr {H}}}_{xx}(x^*(t),u^*(t),\lambda ^*(t),t)\right] z(t)\,{\mathrm {d}}t\\&+2\int _{t_0}^{t_f}\!z(t)^{{\mathsf {T}}} \left[ {{\mathscr {H}}}_{xu}(\bar{x}(t),{\bar{u}}(t),\bar{\lambda }(t),t) -{\mathscr {H}}_{xu}(x^*(t),u^*(t),\lambda ^*(t),t)\right] v(t)\,{\mathrm {d}}t. \end{aligned} \end{aligned}$$
(5.9)

By (5.2) the absolute value of the first term on the right hand side of this inequality can be estimated by

$$\begin{aligned} c_1\Vert {\bar{x}}-x^*\Vert _\infty \Vert z\Vert _\infty ^2 \end{aligned}$$
(5.10)

with some constant \(c_1\) independent of z, v, \({\bar{x}}\), \(\bar{u}\), and \(\bar{\lambda }\). Using

$$\begin{aligned}&{{\mathscr {H}}}_{xx}({\bar{x}}(t),{\bar{u}}(t),\bar{\lambda }(t),t) -{{\mathscr {H}}}_{xx}(x^*(t),u^*(t),\lambda ^*(t),t)\\&\quad =\sum _{j=1}^n\bar{\lambda }(t)_j\left[ g^{(1)}_{j,xx}({\bar{x}}(t),t) -g^{(1)}_{j,xx}(x^*(t),t)\right] \\&\qquad +\sum _{j=1}^n\left[ \bar{\lambda }(t)_j -\lambda ^*(t)_j\right] g^{(1)}_{j,xx}(x^*(t),t)\\&\qquad +\sum _{j=1}^n\bar{\lambda }(t)_j \sum _{i=1}^m \bar{u}(t)_i\left[ g^{(2)}_{ji,xx}({\bar{x}}(t),t) -g^{(2)}_{ji,xx}(x^*(t),t)\right] \\&\qquad +\sum _{j=1}^n\bar{\lambda }(t)_j \sum _{i=1}^m \left[ \bar{u}(t)_i-u^*(t)_i\right] g^{(2)}_{ji,xx}(x^*(t),t)\\&\qquad +\sum _{j=1}^n\left[ \bar{\lambda }(t)_j-\lambda ^*(t)_j\right] \sum _{i=1}^m u^*(t)_ig^{(2)}_{ji,xx}(x^*(t),t). \end{aligned}$$

the absolute value of the second term on the right hand side of (5.9) can be estimated by

$$\begin{aligned} c_2\left( \Vert {\bar{x}}-x^*\Vert _\infty +\Vert {\bar{u}}-u^*\Vert _1 +\Vert \bar{\lambda }-\lambda ^*\Vert _\infty \right) \Vert z\Vert _\infty ^2 \end{aligned}$$
(5.11)

with some constant \(c_2\) independent of z, v, \({\bar{x}}\), \(\bar{u}\), and \(\bar{\lambda }\). In the same way it can be shown that the absolute value of the third term on the right hand side of (5.9) can be estimated by

$$\begin{aligned} c_3\left( \Vert \bar{x}-x^*\Vert _\infty +\Vert \bar{\lambda }-\lambda ^*\Vert _\infty \right) \Vert z\Vert _\infty \Vert v\Vert _1 \end{aligned}$$
(5.12)

with some constant \(c_3\) independent of z, v, \({\bar{x}}\), \(\bar{u}\), and \(\bar{\lambda }\). Since z satisfies the linear differential equation

$$\begin{aligned} {\dot{z}}(t) = g_x(x^*(t),u^*(t),t)z(t) +g_u(x^*(t),u^*(t),t)\left( u(t)-u^*(t)\right) \; \text {a.e. on } [t_0,t_f] \end{aligned}$$

with initial condition \(z(t_0)=0\) we have

$$\begin{aligned} \Vert z\Vert _\infty = \Vert x-x^*\Vert _\infty \le \Vert x-x^*\Vert _{1,1}\le c_4\Vert u-u^*\Vert _1 \end{aligned}$$
(5.13)

with a constant \(c_4\) independent of x and u. Now combining (5.10)–(5.13) the absolute value of the right hand side of (5.9) can be estimated by

$$\begin{aligned} c_5&\left( \Vert {\bar{x}}-x^*\Vert _\infty +\Vert {\bar{u}}-u^*\Vert _1 +\Vert \bar{\lambda }-\lambda ^*\Vert _\infty \right) \Vert v\Vert _1^2. \end{aligned}$$

The assertion then follows if we choose \(\delta _1>0\) small enough. \(\square \)

Lemma 11

Let Assumptions (2.1), (5.1), (5.2), (5.7), and (5.8) be satisfied. Then there exists \(0<\delta _2\le \delta _1\) such that

$$\begin{aligned} {{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (x-x^*,u-u^*), (x-x^*,u-u^*)\right) \ge -\left( \beta +\frac{\alpha }{2}\right) \Vert u-u^*\Vert _1^2 \end{aligned}$$

for all \((x,u)\in {{\mathscr {F}}}\) with \(\Vert u-u^*\Vert _1 < \delta _2\) and all \(({\bar{x}},{\bar{u}},\bar{\lambda })\in X\times X_1\) with \(\Vert \bar{x}-x^*\Vert _\infty + \Vert {\bar{u}}-u^*\Vert _1 + \Vert \bar{\lambda }-\lambda ^*\Vert _\infty < \delta _2\).

Proof

We choose \(\gamma >0\) such that

$$\begin{aligned} \begin{aligned}&\left( \beta +\frac{\alpha }{4}\right) \left( 1+\gamma (1+c_s)\right) ^2 +C_{\!{\mathscr {L}}}\gamma (1+c_s)\left( 1+(1+2\gamma )(1+c_s)\right) \\&\quad +3C_{\!{\mathscr {L}}}\gamma ^2(1+c_s)^2 \le \left( \beta +\frac{\alpha }{2}\right) , \end{aligned} \end{aligned}$$
(5.14)

where \(C_{\!{\mathscr {L}}}\) is defined by (5.6) and \(c_s\) is the constant defined by Lemma 5. Let \(\rho (\gamma )\) be defined by Lemma 9. We choose \(0<\delta _2\le \delta _1\) such that

$$\begin{aligned} (1+c_s)\delta _2 < \min \{\bar{\rho },\rho (\gamma )\}. \end{aligned}$$

Then for \((x,u)\in {{\mathscr {F}}}\) with \(\Vert u-u^*\Vert _1 < \delta _2\) we have by Lemma 5

$$\begin{aligned} \Vert x-x^*\Vert _{1,1}+\Vert u-u^*\Vert _1 \le c_s\Vert u-u^*\Vert _1+\Vert u-u^*\Vert _1 < \min \{\bar{\rho },\rho (\gamma )\}. \end{aligned}$$
(5.15)

Hence by Lemma 9 there exists \(({\bar{z}},{\bar{v}})\in T(x^*,u^*)\) with

$$\begin{aligned} \Vert x-{\bar{z}}\Vert _{1,1}+\Vert u-{\bar{v}}\Vert _1 \le \gamma \left( \Vert x-x^*\Vert _{1,1}+\Vert u-u^*\Vert _1\right) . \end{aligned}$$

Together with (5.15) this implies

$$\begin{aligned} \Vert x-{\bar{z}}\Vert _{1,1}+\Vert u-{\bar{v}}\Vert _1 \le \gamma (1+c_s)\Vert u-u^*\Vert _1 \end{aligned}$$
(5.16)

and therefore

$$\begin{aligned} \begin{aligned} (1-\gamma (1+c_s))\Vert u-u^*\Vert _1&\le \Vert {\bar{v}}-u^*\Vert _1 \le (1+\gamma (1+c_s))\Vert u-u^*\Vert _1,\\ \Vert {\bar{z}}-x^*\Vert _{1,1}&\le (1+\gamma )(1+c_s)\Vert u-u^*\Vert _1. \end{aligned} \end{aligned}$$
(5.17)

Further using \((x-x^*,u-u^*)=({\bar{z}}-x^*+x-{\bar{z}},\bar{v}-u^*+u-{\bar{v}})\) we obtain

$$\begin{aligned} \begin{aligned}&{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (x-x^*,u-u^*), (x-x^*,u-u^*)\right) \\&\quad = {{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( ({\bar{z}}-x^*, {\bar{v}}-u^*),({\bar{z}}-x^*,{\bar{v}}-u^*)\right) \\&\qquad +2 {{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (\bar{z}-x^*, {\bar{v}}-u^*),(x-{\bar{z}},u-{\bar{v}})\right) \\&\qquad + {{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( (x-{\bar{z}}, u-{\bar{v}}),(x-{\bar{z}},u-{\bar{v}})\right) \end{aligned} \end{aligned}$$
(5.18)

Next we estimate the terms on the right hand side of (5.18). By Lemma 10 and (5.17) we have

$$\begin{aligned} \begin{aligned} {{\mathscr {L}}}''&({\bar{x}},{\bar{u}},\bar{\lambda })\left( ({\bar{z}}-x^*, {\bar{v}}-u^*),({\bar{z}}-x^*,{\bar{v}}-u^*)\right) \\&\ge -\left( \beta +\frac{\alpha }{4}\right) \Vert {\bar{v}}-u^*\Vert _1^2 \ge -\left( \beta +\frac{\alpha }{4}\right) \left( 1+\gamma (1+c_s)\right) ^2\Vert u-u^*\Vert _1^2. \end{aligned} \end{aligned}$$
(5.19)

By (5.6), (5.16) we obtain

$$\begin{aligned}&|{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( ({\bar{z}}-x^*, {\bar{v}}-u^*),(x-{\bar{z}},u-{\bar{v}})\right) |\\&\quad \le C_{\!{\mathscr {L}}}\left( \Vert {\bar{z}}-x^*\Vert _\infty \Vert x-\bar{z}\Vert _\infty +\Vert {\bar{z}}-x^*\Vert _\infty \Vert u-{\bar{v}}\Vert _1 +\Vert x-{\bar{z}}\Vert _\infty \Vert {\bar{v}}-u^*\Vert _1\right) \\&\quad \le C_{\!{\mathscr {L}}}\left( \Vert {\bar{z}}-x^*\Vert _{1,1}\Vert x-\bar{z}\Vert _{1,1} +\Vert {\bar{z}}-x^*\Vert _{1,1}\Vert u-{\bar{v}}\Vert _1 +\Vert x-{\bar{z}}\Vert _{1,1}\Vert {\bar{v}}-u^*\Vert _1\right) \\&\quad \le C_{\!{\mathscr {L}}}\gamma (1+c_s)\left( \Vert {\bar{z}}-x^*\Vert _{1,1} +\Vert {\bar{v}}-u^*\Vert _1\right) \Vert u-u^*\Vert _1. \end{aligned}$$

By (5.17) this implies

$$\begin{aligned} \begin{aligned}&|{{\mathscr {L}}}''({\bar{x}},{\bar{u}},\bar{\lambda })\left( ({\bar{z}}-x^*, {\bar{v}}-u^*),(x-{\bar{z}},u-{\bar{v}})\right) |\\&\quad \le C_{\!{\mathscr {L}}}\gamma (1+c_s)\left( 1+(1+2\gamma ) (1+c_s)\right) \Vert u-u^*\Vert _1^2. \end{aligned} \end{aligned}$$
(5.20)

Again by (5.6) and (5.16) we obtain

$$\begin{aligned} \begin{aligned} |{{\mathscr {L}}}''&({\bar{x}},{\bar{u}},\bar{\lambda })\left( (x-{\bar{z}},u-\bar{v}), (x-{\bar{z}},u-{\bar{v}})\right) |\\&\le C_{\!{\mathscr {L}}}\left( \Vert x-{\bar{z}}\Vert _\infty ^2 +2\Vert x-{\bar{z}}\Vert _\infty \Vert u-{\bar{v}}\Vert _1\right) \\&\le C_{\!{\mathscr {L}}}\left( \Vert x-{\bar{z}}\Vert _{1,1}^2 +2\Vert x-{\bar{z}}\Vert _{1,1}\Vert u-{\bar{v}}\Vert _1\right) \\&\le 3C_{\!{\mathscr {L}}}\gamma ^2(1+c_s)^2\Vert u-u^*\Vert _1^2. \end{aligned} \end{aligned}$$
(5.21)

Now inserting the estimates (5.19), (5.20), (5.21) into (5.18), the assertion follows from (5.14). \(\square \)

We can now show that Assumptions (5.7) and (5.8) imply the growth condition (4.8) with \(\kappa =2\).

Theorem 3

Let Assumptions (2.1), (5.1), (5.2), (5.7), and (5.8) be satisfied. Then

$$\begin{aligned} f(x(t_f))-f(x^*(t_f)) \ge \frac{3}{4}\alpha \,\Vert u-u^*\Vert _1^2 \end{aligned}$$

for all \((x,u)\in {{\mathscr {F}}}\) with \(\Vert u-u^*\Vert _1 < \delta _2\), where \(\delta _2\) is defined by Lemma 11. Moreover, condition (4.8) is satisfied with \(\kappa =2\).

Proof

For arbitrary \((x,u)\in {{\mathscr {F}}}\) with \(\Vert u-u^*\Vert _1 < \delta _2\) we have

$$\begin{aligned} f\left( x(t_f)\right) -f\left( x^*(t_f)\right) = {\mathscr {L}}(x,u,\lambda ^*)-{{\mathscr {L}}}(x^*,u^*,\lambda ^*). \end{aligned}$$

Using \(x(t_0)-x^*(t_0)=0\) and (5.3) we get by Taylor expansion

$$\begin{aligned} f\left( x(t_f)\right) -f\left( x^*(t_f)\right)&= {{\mathscr {L}}}_u(x^*,u^*,\lambda ^*)(u-u^*)\\&\quad + \frac{1}{2}{{\mathscr {L}}}''(z,v,\lambda ^*)\left( (x-x^*, u-u^*),(x-x^*,u-u^*)\right) , \end{aligned}$$

where \((z,v)=(1-\tau )(x^*,u^*)+\tau (x,u)\) with \(\tau \in \,]0,1[\). By (5.7) and Lemma 11 this implies

$$\begin{aligned} f\left( x(t_f)\right) -f\left( x^*(t_f)\right)&\ge \bar{\alpha }\,\Vert u-u^*\Vert _1^2 -\frac{1}{2}\left( \beta +\frac{\alpha }{2}\right) \Vert u-u^*\Vert _1^2\\&=\left( \frac{3}{4}\bar{\alpha }-\frac{\beta }{4}\right) \,\Vert u-u^*\Vert _1^2 \ge \frac{3}{4}\alpha \,\Vert u-u^*\Vert _1^2, \end{aligned}$$

which proves the first part of the assertion. For arbitrary \((x,u)\in {{\mathscr {F}}}_{\delta _2}\) we have \((x,u)\in {{\mathscr {F}}}\) and \(\Vert u-u^*\Vert _1<\delta _2\) which implies that condition (4.8) is satisfied with \(\kappa =2\), \(\alpha \) replaced by \(\frac{3}{4}\alpha \), and \(\bar{\varepsilon }\) replaced by \(\delta _2\). \(\square \)

Remark 7

Note that for the proof of Theorem 3 we only need the fact that \((x^*,u^*)\) is feasible and satisfies, together with the unique solution \(\lambda ^*\) of the adjoint equation, the minimum principle and Assumptions (2.3), (2.4), (2.5), (5.1), (5.2), (5.7), and (5.8). Theorem 3 then shows that \((x^*,u^*)\) is a strict local solution, i.e., Assumptions (5.7) and (5.8) can be viewed as a sufficient optimality condition. In the case of linear-quadratic control problems as considered in Alt et al. [6] the second derivatives of the Hamiltonian and the Lagrange function do not depend on \((x,u)\). This allows one to use a more general version of condition (5.7). \(\Diamond \)

For the derivation of error estimates of order 1 for the discrete solutions we proceed similarly to Dontchev and Veliov [14] and Haunschmied et al. [21] and use the fact that the discrete solutions can be interpreted as solutions of a perturbation of Problem (OC). This approach has also been used in Alt et al. [6] for linear-quadratic control problems. For the more general class of nonlinear control problems considered here we adapt results of Alt [2, Section 3], where Lipschitz continuity of perturbed solutions of nonlinear optimization problems has been studied.

Lemma 12

Let Assumptions (2.1), (5.1), (5.2), (5.7), and (5.8) be satisfied. Further let \(0<\rho <\bar{\rho }\), where \(\bar{\rho }>0\) is given by Lemma 5, and let \((x^*_h,u^*_h)\) be a (global) solution of Problem \(\text{(OC) }_{N,\rho }\). Then there is a function \(\zeta _h:[t_0,t_f]\rightarrow \mathbb {R}^m\) satisfying

$$\begin{aligned} |\zeta _h(t)| \le K_\lambda (L_g(L_x+1)+c_A)h_N \end{aligned}$$
(5.22)

such that

$$\begin{aligned} \int _{t_0}^{t_f}\left[ \lambda ^*_h(t)^{\mathsf {T}}g^{(2)}(x^*_h(t),t) +\zeta _h(t)^{\mathsf {T}}\right] \left( u^*(t)-u^*_h(t)\right) \,{\mathrm {d}}t\ge 0. \end{aligned}$$
(5.23)

Proof

By the discrete minimum principle (3.2) we have

$$\begin{aligned} \lambda ^*_h(t_{j+1})^{\mathsf {T}}g^{(2)}(x^*_h(t_j),t_j) (u-u^*_h(t_j))\ge 0 \quad \forall u\in U,\; j\in J_0^{N-1}. \end{aligned}$$
(5.24)

We define piecewise constant functions \({\bar{x}}_h:[t_0,t_f]\rightarrow \mathbb {R}^n\), \(B_h:[t_0,t_f]\rightarrow \mathbb {R}^{n\times m}\), and \(\bar{\lambda }_h:[t_0,t_f]\rightarrow \mathbb {R}^n\) by

$$\begin{aligned} {\bar{x}}_h(t)=x^*_h(t_j),\; B_h(t)=g^{(2)}\left( x^*_h(t_j),t_j\right) ,\; \bar{\lambda }_h(t)=\lambda ^*_h(t_{j+1}),\; \end{aligned}$$

for \(t\in [t_j,t_{j+1}[\), \(j\in J_0^{N-1}\). Then we can write the discrete switching function \(\sigma ^*_h\) defined by (3.3) in the form

$$\begin{aligned} \sigma ^*_h(t)=B_h(t)^{\mathsf {T}}\bar{\lambda }_h(t) =g^{(2)}(x^*_h(t),t)^{\mathsf {T}}\lambda ^*_h(t)+\zeta _h(t)\; \hbox { for a.a.~}\ t\in [t_0,t_f], \end{aligned}$$

where \(\zeta _h\) is defined by

$$\begin{aligned} \zeta _h(t) = B_h(t)^{\mathsf {T}}\bar{\lambda }_h(t) -g^{(2)}(x^*_h(t),t)^{\mathsf {T}}\lambda ^*_h(t). \end{aligned}$$

Further we can write the discrete minimum principle (5.24) in the form

$$\begin{aligned} \sigma ^*_h(t)^{\mathsf {T}}(u-u^*_h(t))= & {} \left[ g^{(2)}(x^*_h(t),t)^{\mathsf {T}}\lambda ^*_h(t) +\zeta _h(t)\right] ^{\mathsf {T}}(u-u^*_h(t))\nonumber \\\ge & {} 0 \quad \forall u\in U \end{aligned}$$
(5.25)

for a.a. \(t\in [t_0,t_f]\). From (5.25) we further obtain (5.23). In order to estimate \(|\zeta _h(t)|\) we use (2.4). For \(t\in [t_j,t_{j+1}[\), \(j\in J_0^{N-1}\), we have

$$\begin{aligned} |\zeta _h(t)|&\le |\bar{\lambda }_h(t)|\,\Vert B_h(t)-g^{(2)}(x^*_h(t),t)\Vert +\Vert g^{(2)}(x^*_h(t),t)\Vert \,|\bar{\lambda }_h(t)-\lambda ^*_h(t)|\\&\le K_\lambda L_g\left( |x^*_h(t_j)-x^*_h(t)|+|t_j-t|\right) +c_g|\lambda ^*_h(t_{j+1})-\lambda ^*_h(t)| \end{aligned}$$

with some constant \(c_g\) independent of \((x^*_h,u^*_h)\in {{\mathscr {F}}}_{N,\rho }\). By (2.8) we have

$$\begin{aligned} |x^*_h(t_j)-x^*_h(t)| \le L_xh_N, \end{aligned}$$

and from the discrete adjoint equation we obtain

$$\begin{aligned} |\lambda ^*_h(t_{j+1})-\lambda ^*_h(t)| \le h_N \Vert A_h(t_j)\Vert |\lambda ^*_{h,j+1}| \le c_AK_\lambda h_N, \end{aligned}$$

which implies (5.22). \(\square \)
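As noted in the introduction, once a discrete switching function has been computed on the grid, its sign changes yield approximations of the switching points of the bang-bang control, which can then be refined by structure-exploiting methods. The following sketch shows the elementary localization step (simple linear interpolation between neighbouring grid points); the sampled switching function below is a made-up example with zeros at 0.305 and 0.795, not one arising from the paper's problems:

```python
import numpy as np

# Hedged sketch: locate the sign changes of a sampled switching function and
# refine the crossing locations by linear interpolation.

def switching_points(t, sigma):
    """Approximate the zeros of a sampled switching function at its sign
    changes by linear interpolation between neighbouring grid points."""
    idx = np.nonzero(sigma[:-1] * sigma[1:] < 0)[0]
    return t[idx] - sigma[idx] * (t[idx + 1] - t[idx]) / (sigma[idx + 1] - sigma[idx])

t = np.linspace(0.0, 1.0, 101)
sp = switching_points(t, (t - 0.305) * (t - 0.795))   # toy sigma, zeros at 0.305, 0.795
```

In practice the refined points `sp` would serve as starting values for a switching time parameterization as in Kaya et al. [23] or Maurer et al. [29].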

Theorem 4

Let Assumptions (2.1), (5.1), (5.2), (5.7), and (5.8) be satisfied and suppose that \(u^*\) has bounded variation. Then for each \(0<\rho \le \delta _2\) Problem \(\text{(OC) }_{N,\rho }\) has a (global) solution for sufficiently large N. Further for each such solution \((x^*_h,u^*_h)\) and the associated adjoint function \(\lambda ^*_h\) the estimates

$$\begin{aligned} \Vert u^*_h-u^*\Vert _1\le c_uh_N,\; \Vert x^*_h-x^*\Vert _{1,1}\le c_xh_N,\; \Vert \lambda ^*_h-\lambda ^*\Vert _{1,1}\le c_\lambda h_N \end{aligned}$$
(5.26)

hold with constants \(c_u\), \(c_x\), and \(c_\lambda \) independent of N and the solution \((x^*_h,u^*_h)\).

Proof

Let \(0<\rho \le \delta _2\) be given. By Theorem 3 condition (4.8) is satisfied. Therefore, by Theorem 1 Problem \(\text{(OC) }_{N,\rho }\) has a (global) solution for sufficiently large N. Further for each such solution \((x^*_h,u^*_h)\) the estimates

$$\begin{aligned} \Vert u^*_h-u^*\Vert _1\le c_uh_N^{\frac{1}{2}},\quad \Vert x^*_h-x^*\Vert _{1,1}\le c_xh_N^{\frac{1}{2}} \end{aligned}$$

hold with constants \(c_u\), \(c_x\) independent of N, and by (4.19) we have

$$\begin{aligned} \Vert \lambda ^*_h-\lambda ^*\Vert _{1,1}\le c_{\lambda ,2} h_N^{\frac{1}{2}} \end{aligned}$$

with some constant \(c_{\lambda ,2}\) independent of N and \(x^*_h\), \(x^*\), \(u^*_h\), and \(u^*\). For sufficiently large N we therefore have

$$\begin{aligned} \Vert&x^*_h-x^*\Vert _\infty + \Vert u^*_h-u^*\Vert _1 + \Vert \lambda ^*_h-\lambda ^*\Vert _{1,1} \\&\le \Vert x^*_h-x^*\Vert _{1,1} + \Vert u^*_h-u^*\Vert _1 + \Vert \lambda ^*_h-\lambda ^*\Vert _{1,1} <\delta _2, \end{aligned}$$

and \((x^*_h,u^*_h)\in {{\mathscr {F}}}_{N,\bar{\rho }}\). By Lemma 6 there exists a function \(z^*_h\in X_1\), such that \((z^*_h,u^*_h)\in {{\mathscr {F}}}\) and

$$\begin{aligned} \Vert z^*_h-x^*_h\Vert _{1,1} \le c_z h_N \end{aligned}$$
(5.27)

with a constant \(c_z\) independent of N, which implies

$$\begin{aligned} \Vert z^*_h-x^*\Vert _\infty \le \delta _2 \end{aligned}$$
(5.28)

for sufficiently large N. As in the proof of Theorem 3, using \(z^*_h(t_0)-x^*(t_0)=0\) and (5.3), we get by Taylor expansion around \((x^*,u^*)\)

$$\begin{aligned} f\left( z^*_h(t_f)\right) -f\left( x^*(t_f)\right)&= {\mathscr {L}}(z^*_h,u^*_h,\lambda ^*) -{{\mathscr {L}}}(x^*,u^*,\lambda ^*)\\&= {{\mathscr {L}}}_u(x^*,u^*,\lambda ^*)(u^*_h-u^*)\\&\quad +\frac{1}{2}{{\mathscr {L}}}''(z,v,\lambda ^*)\left( (z^*_h-x^*,u^*_h-u^*),(z^*_h-x^*,u^*_h-u^*)\right) , \end{aligned}$$

where \((z,v)=(1-\tau )(x^*,u^*)+\tau (z^*_h,u^*_h)\) with \(\tau \in \,]0,1[\). By (5.7), (5.8), and Lemma 11 this implies

$$\begin{aligned} \begin{aligned} f\left( z^*_h(t_f)\right) -f\left( x^*(t_f)\right)&\ge \bar{\alpha }\,\Vert u^*_h-u^*\Vert _1^2 -\frac{1}{2}\left( \beta +\frac{\alpha }{2}\right) \,\Vert u^*_h-u^*\Vert _1^2\\&= \left( \frac{3}{4}\alpha +\frac{\beta }{2}\right) \Vert u^*_h-u^*\Vert _1^2. \end{aligned} \end{aligned}$$
(5.29)

Similarly we obtain by Taylor expansion around \((z^*_h,u^*_h)\)

$$\begin{aligned}&f\left( x^*(t_f)\right) -f\left( z^*_h(t_f)\right) = {\mathscr {L}}(x^*,u^*,\lambda ^*_h) -{{\mathscr {L}}}(z^*_h,u^*_h,\lambda ^*_h)\\&\quad = {{\mathscr {L}}}_x(z^*_h,u^*_h,\lambda ^*_h)(x^*-z^*_h) +{{\mathscr {L}}}_u(z^*_h,u^*_h,\lambda ^*_h)(u^*-u^*_h)\\&\qquad +\frac{1}{2}{{\mathscr {L}}}''(z,v,\lambda ^*_h)\left( (x^*-z^*_h, u^*-u^*_h),(x^*-z^*_h,u^*-u^*_h)\right) , \end{aligned}$$

where \((z,v)=(1-\tau )(z^*_h,u^*_h)+\tau (x^*,u^*)\) with \(\tau \in \,]0,1[\,\). By Lemma 11 this implies

$$\begin{aligned} f\left( x^*(t_f)\right) -f\left( z^*_h(t_f)\right) \ge&\;{\mathscr {L}}_x(z^*_h,u^*_h,\lambda ^*_h)(x^*-z^*_h) +{{\mathscr {L}}}_u(z^*_h,u^*_h,\lambda ^*_h)(u^*-u^*_h)\\&\;-\frac{1}{2}\left( \beta +\frac{\alpha }{2}\right) \,\Vert u^*_h-u^*\Vert _1^2. \end{aligned}$$

Combining this estimate with (5.29) we obtain

$$\begin{aligned} \frac{\alpha }{2}\,\Vert u^*_h-u^*\Vert _1^2 \le -{\mathscr {L}}_x(z^*_h,u^*_h,\lambda ^*_h)(x^*-z^*_h) -{\mathscr {L}}_u(z^*_h,u^*_h,\lambda ^*_h)(u^*-u^*_h). \end{aligned}$$
(5.30)
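The constant \(\frac{\alpha }{2}\) in (5.30) is obtained by adding (5.29) and the preceding inequality: the left-hand sides cancel, and the coefficients combine to

$$\begin{aligned} \left( \frac{3}{4}\alpha +\frac{\beta }{2}\right) -\frac{1}{2}\left( \beta +\frac{\alpha }{2}\right) = \frac{3}{4}\alpha -\frac{\alpha }{4} = \frac{\alpha }{2}. \end{aligned}$$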

We define \(z=x^*-z^*_h\). Using integration by parts we obtain for the first term on the right hand side of (5.30)

$$\begin{aligned} -{{\mathscr {L}}}_x(z^*_h,u^*_h,\lambda ^*_h)(z) =&-f_x\left( z^*_h(t_f)\right) z(t_f) -\int _{t_0}^{t_f}\lambda ^*_h(t)^{{\mathsf {T}}} g_x(z^*_h(t),u^*_h(t),t)z(t)\,{\mathrm {d}}t\\&+\int _{t_0}^{t_f}\lambda _h^*(t)^{{\mathsf {T}}}\dot{z}(t)\,{\mathrm {d}}t\\ =&-f_x\left( z^*_h(t_f)\right) z(t_f) -\int _{t_0}^{t_f}\lambda ^*_h(t)^{{\mathsf {T}}} g_x(z^*_h(t),u^*_h(t),t)z(t)\,{\mathrm {d}}t\\&+\lambda ^*_h(t_f)^{{\mathsf {T}}}z(t_f) - \int _{t_0}^{t_f}\dot{\lambda }_h^*(t)^{{\mathsf {T}}}z(t)\,{\mathrm {d}}t. \end{aligned}$$

By Lemma 8 the discrete adjoint equation (3.1) can be written in the form (4.17). Using this and the terminal condition \(\lambda ^*_h(t_f) = f_x(x^*_h(t_f))^{\mathsf {T}}\) we further obtain

$$\begin{aligned} -{{\mathscr {L}}}_x(z^*_h,u^*_h,\lambda ^*_h)(z)&= -f_x\left( z^*_h(t_f)\right) z(t_f) -\int _{t_0}^{t_f}\lambda ^*_h(t)^{{\mathsf {T}}} g_x(z^*_h(t),u^*_h(t),t)z(t)\,{\mathrm {d}}t\\&\quad +f_x(x^*_h(t_f))z(t_f)\\&\quad + \int _{t_0}^{t_f}\left[ \lambda ^*_h(t)^{{\mathsf {T}}} g_x(x^*_h(t),u^*_h(t),t) +\xi _h(t)^{{\mathsf {T}}}\right] z(t)\,{\mathrm {d}}t. \end{aligned}$$

By (2.4), (4.16), (4.18), and (5.27) this implies

$$\begin{aligned} -{{\mathscr {L}}}_x(z^*_h,u^*_h,\lambda ^*_h)(x^*-z^*_h)&\le L_f|z^*_h(t_f)-x^*_h(t_f)|\,|x^*(t_f)-z^*_h(t_f)|\\&\quad +(t_f-t_0)\left( K_\lambda L_g\Vert z^*_h-x^*_h\Vert _\infty +\Vert \xi _h\Vert _\infty \right) \Vert x^*-z^*_h\Vert _\infty \\&\le \left[ L_f c_z + (t_f-t_0)\left( K_\lambda L_g c_z +c_\xi \right) \right] h_N\Vert x^*-z^*_h\Vert _\infty . \end{aligned}$$

It follows from Lemma 5 that

$$\begin{aligned} \Vert z^*_h-x^*\Vert _\infty \le \Vert z^*_h-x^*\Vert _{1,1} \le c_s\Vert u^*_h-u^*\Vert _1, \end{aligned}$$
(5.31)

and

$$\begin{aligned} \begin{aligned}&-{{\mathscr {L}}}_x(z^*_h,u^*_h,\lambda ^*_h)(x^*-z^*_h)\\&\quad \le \left[ L_f c_z + (t_f-t_0)\left( K_\lambda L_g c_z +c_\xi \right) \right] c_sh_N\Vert u^*_h-u^*\Vert _1. \end{aligned} \end{aligned}$$
(5.32)

By Lemma 12 there is a function \(\zeta _h:[t_0,t_f]\rightarrow \mathbb {R}^m\) satisfying (5.22) such that the discrete minimum principle can be written in the form (5.23). Therefore, defining

$$\begin{aligned} \bar{\zeta }_h(t) = \lambda ^*_h(t)^{\mathsf {T}}\left[ g^{(2)}(x^*_h(t),t) -g^{(2)}(z^*_h(t),t)\right] +\zeta _h(t) \end{aligned}$$

we obtain for the second term on the right hand side of (5.30)

$$\begin{aligned} -{{\mathscr {L}}}_u(z^*_h,u^*_h,\lambda ^*_h)(u^*-u^*_h)&=-\int _{t_0}^{t_f}\lambda ^*_h(t)^{\mathsf {T}} g^{(2)}(z^*_h(t),t)\left( u^*(t)-u^*_h(t)\right) \,{\mathrm {d}}t\\&=-\int _{t_0}^{t_f}\left[ \lambda ^*_h(t)^{\mathsf {T}} g^{(2)}(x^*_h(t),t)+\zeta _h(t)^{\mathsf {T}}\right] \left( u^*(t)-u^*_h(t)\right) \,{\mathrm {d}}t\\&\quad +\int _{t_0}^{t_f}\bar{\zeta }_h(t)^{\mathsf {T}} \left( u^*(t)-u^*_h(t)\right) \,{\mathrm {d}}t\\&\le \int _{t_0}^{t_f}\bar{\zeta }_h(t)^{\mathsf {T}} \left( u^*(t)-u^*_h(t)\right) \,{\mathrm {d}}t. \end{aligned}$$

Further by (2.4), (4.16), (5.22), and (5.27) we have

$$\begin{aligned} |\bar{\zeta }_h(t)| \le K_\lambda L_g|x^*_h(t)-z^*_h(t)|+|\zeta _h(t)| \le K_\lambda \left( L_g(c_z+L_x+1)+c_A\right) h_N, \end{aligned}$$

which implies

$$\begin{aligned}&-{{\mathscr {L}}}_u(z^*_h,u^*_h,\lambda ^*_h)(u^*-u^*_h)\nonumber \\&\quad \le (t_f-t_0)K_\lambda \left( L_g(c_z+L_x+1) + c_A\right) h_N\Vert u^*_h-u^*\Vert _1. \end{aligned}$$
(5.33)

Finally (5.30) and the estimates (5.32) and (5.33) show that with some constant \({\tilde{c}}_u\) independent of N and the solution \((x^*_h,u^*_h)\),

$$\begin{aligned} \frac{\alpha }{2}\,\Vert u^*_h-u^*\Vert _1^2 \le {\tilde{c}}_u h_N \Vert u^*_h-u^*\Vert _1, \end{aligned}$$

which immediately implies the estimate for \(\Vert u^*_h-u^*\Vert _1\) in (5.26). The estimate for \(\Vert x^*_h-x^*\Vert _{1,1}\) then follows from (5.27) and (5.31), and the estimate for \(\Vert \lambda ^*_h-\lambda ^*\Vert _{1,1}\) follows from (4.20). \(\square \)

Remark 8

(compare Remark 3) Theorem 4 also assumes that \((x^*_h,u^*_h)\) is a global solution of Problem \(\text{(OC) }_{N,\rho }\), and we have not shown uniqueness of the discrete solutions. The reason is that, for the class of control problems considered here, condition (5.7) does not in general hold for the discrete control problems. The linear-quadratic control problems considered in Alt et al. [6] are convex optimization problems; therefore, the solutions of the discrete problems are global solutions. In this case Theorem 4 implies Alt et al. [6, Theorem 14] for \(\kappa =1\). \(\Diamond \)

6 Numerical results

Example 2

We consider the following modification of the rocket car problem discussed in Alt et al. [4, Example 6.1] with a nonlinear and nonconvex cost functional and a nonlinear state equation:

Here the function g is defined by

$$\begin{aligned} g(x_1,x_2,u) = \begin{pmatrix} x_2\\ (1+\varepsilon x_2)u \end{pmatrix}, \end{aligned}$$

and

$$\begin{aligned} g_x(x_1,x_2,u) = \begin{pmatrix} 0 &{} 1\\ 0 &{} \varepsilon u \end{pmatrix},\quad g_u(x_1,x_2,u) = \begin{pmatrix} 0\\ 1+\varepsilon x_2 \end{pmatrix}. \end{aligned}$$

The Hamiltonian is defined by

$$\begin{aligned} H(x_1,x_2,u,\lambda _1,\lambda _2) = \lambda _1 x_2 + \lambda _2(1+\varepsilon x_2)u, \end{aligned}$$

and we have

$$\begin{aligned} H_x(x_1,x_2,u,\lambda _1,\lambda _2) = \left( 0,\; \lambda _1+\varepsilon \lambda _2u\right) ,\quad H_u(x_1,x_2,u,\lambda _1,\lambda _2) = \lambda _2(1+\varepsilon x_2), \end{aligned}$$

and

$$\begin{aligned} H_{ux}(x_1,x_2,u,\lambda _1,\lambda _2) = \left( 0,\varepsilon \lambda _2\right) ,\quad H_{xx}(x_1,x_2,u,\lambda _1,\lambda _2) = \begin{pmatrix} 0 &{} 0\\ 0 &{} 0 \end{pmatrix}. \end{aligned}$$

Therefore, the condition on the quadratic form in Assumption (5.8) is equivalent here to

$$\begin{aligned} 3x_1^*(5)z_1(5)^2+z_2(5)^2 + 2\varepsilon \int _0^5 z_2(t)\lambda ^*_2(t)v(t)\,{\mathrm {d}}t\ge -\beta \,\Vert v\Vert _1^2 \end{aligned}$$

for all \((z,v)=(x,u)-(x^*,u^*)\) with \(u\in {{\mathscr {U}}}\), \(z(t_0)=0\) and

$$\begin{aligned} \dot{z}_1(t)&= z_2(t),\\ \dot{z}_2(t)&= \varepsilon u^*(t)z_2(t) +\left( 1 + \varepsilon x^*_2(t)\right) v(t) \end{aligned}$$

for a.a. \(t\in [0,5]\). From the numerical results we have \(x_1^*(5)\approx 1.5475>0\). Therefore, this condition is satisfied for arbitrarily small \(\beta \) if \(\varepsilon \) is sufficiently small.

For \(\varepsilon = \frac{1}{2}\) the optimal control is of bang–bang type with one switching point \(s_1\approx 4.49848\). The discretization errors reported in Table 1 indicate convergence of order 1 w.r.t. the mesh size h, as predicted by Theorem 4. Figure 1 shows the computed optimal control and states, and the switching function for \(N=100\). \(\Diamond \)
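As a minimal sketch, the forward Euler discretization of the state equation of this example can be implemented as follows. The control bounds \([-1,1]\), the sign pattern of the bang–bang control, and the initial state \(x(0)=0\) are assumptions made for the sketch, not data taken from the problem statement above.

```python
import numpy as np

def euler_simulate(u_func, x0, t0=0.0, tf=5.0, N=100, eps=0.5):
    """Forward Euler for the example dynamics
       x1' = x2,   x2' = (1 + eps*x2) * u(t)."""
    h = (tf - t0) / N
    t = t0 + h * np.arange(N + 1)
    x = np.zeros((N + 1, 2))
    x[0] = x0
    for k in range(N):
        x1, x2 = x[k]
        u = u_func(t[k])                      # piecewise constant control
        x[k + 1] = [x1 + h * x2,
                    x2 + h * (1.0 + eps * x2) * u]
    return t, x

# Hypothetical bang-bang control using the switching point s1 reported in
# the text; the bounds and the sign pattern are assumptions, not from the paper.
s1 = 4.49848
u_bb = lambda t: 1.0 if t < s1 else -1.0

t, x = euler_simulate(u_bb, x0=[0.0, 0.0], N=100)
```

Refining N and comparing against a reference solution then exhibits the first-order behaviour expected from Theorem 4.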

Table 1 Discretization error

Fig. 1 Optimal solution for \(N = 100\)
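The first-order behaviour reported in Table 1 can be quantified with the usual experimental order of convergence; the helper below is a generic sketch, not code from the paper.

```python
import math

def eoc(h, err):
    """Experimental order of convergence between successive meshes:
       p_k = log(err[k-1] / err[k]) / log(h[k-1] / h[k])."""
    return [math.log(err[k - 1] / err[k]) / math.log(h[k - 1] / h[k])
            for k in range(1, len(h))]
```

For errors of size \(c\,h\) the computed values tend to 1, in line with Theorem 4.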

7 Conclusions

In this paper we derived error estimates for Euler approximation of a class of nonlinear optimal control problems of Mayer type with control appearing linearly. Such estimates were previously known only in the case of continuous controls, for linear-quadratic problems affine w.r.t. the control, and for some special classes of control problems with a nonlinear cost functional but a linear or semilinear state equation. The results were obtained under the growth condition (4.8) for the cost functional or under the stronger second-order sufficient optimality condition (5.7), which excludes singular arcs of the optimal control (see Remark 5). Felgenhauer [18] shows for scalar bang–bang controls that a second-order optimality condition for the so-called “induced finite-dimensional problem” of optimizing the switching times implies (4.8) with \(\kappa =2\). It is an open question whether (4.8) can be satisfied if the optimal control has singular arcs. It should be noted, however, that in this case other second-order conditions may be useful (see e.g. Felgenhauer [19], where a second-order condition in connection with the Goh transformation has been used).

A nonlinear control problem may have many local solutions, and Example 2 shows that the analytical verification of conditions (4.8) and (5.7) may be difficult. Therefore, another important topic, not treated in this paper, is the numerical verification of such conditions. For the numerical verification of (4.8) in the case of bang–bang controls one can use known results. First, the control problem is solved by Euler discretization in order to obtain a good approximation of the switching times. If this succeeds, in a second step the induced finite-dimensional problem mentioned above can be solved to compute the switching times more accurately. Then the test for the numerical verification of a second-order sufficient optimality condition for the induced finite-dimensional problem discussed in Maurer et al. [29] can be applied. If the test is successful, the results of Felgenhauer [18] show that (4.8) holds with \(\kappa =2\). While the results of Maurer et al. [29] are stated for general nonlinear control problems, the results of Felgenhauer [18] are so far restricted to scalar control problems. As an alternative, tests based on Riccati differential equations can be used (see Felgenhauer [18, Section 4], Osmolovskii and Maurer [34] and the papers cited therein).
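The first step of this procedure, reading off starting values for the switching times from a discrete bang–bang control, can be sketched as follows; this is a generic helper under the assumption that the control is stored on the Euler grid, not code from the cited works.

```python
import numpy as np

def detect_switching_times(t, u):
    """Locate the grid intervals where a discrete bang-bang control
    changes sign and return their midpoints as starting values for
    the induced finite-dimensional (switching time) problem."""
    u = np.asarray(u, dtype=float)
    idx = np.nonzero(np.diff(np.sign(u)) != 0)[0]
    return 0.5 * (t[idx] + t[idx + 1])
```

These starting values can then be refined by optimizing the switching times directly, followed by the second-order test of Maurer et al. [29].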