1 Introduction

The ultimate purpose of this paper is to study optimal control problems with state constraints for systems governed by differential inclusions. This is a fairly nontrivial class of optimization problems. The result we prove seems to offer the sharpest and most general necessary optimality conditions available. It strengthens Clarke’s recent theorem [1] (so far the strongest for problems without state constraints), in particular by extending it to problems involving state constraints, and accordingly it weakens the assumptions on which the principal results of Vinter’s monograph [2] (for problems with state constraints) are based.

But I believe the way the result has been obtained deserves special, if not the main, attention. The key element of the proof is the reduction of the original problem to unconstrained minimization of a functional that looks similar to the classical Bolza problem of the calculus of variations with a Lipschitz integrand and off-integral term. (Actually, in the absence of state constraints it is a standard Bolza problem with nonsmooth integrand and off-integral term. In the presence of state constraints, the latter has a more complicated structure.) The proof of necessary optimality conditions for a strong minimum in this “generalized Bolza problem” combines a simple relaxation theorem, which allows us to reduce the task to finding necessary conditions for a weak minimum in the relaxed problem, with a subsequent application of standard rules of the nonconvex subdifferential calculus, which is a well-developed chapter of local variational analysis.

The idea behind the reduction is actually very simple: if in an optimization problem the cost function is Lipschitz near a solution and the constraints are subregular at the point, then the problem admits exact penalization with a Lipschitz (but necessarily nonsmooth) penalty function giving a certain measure of violation of the constraints. This fact was first mentioned and used in [3], but the very idea of using one or another type of exact penalization appeared even earlier, at the very dawn of variational analysis. Ioffe and Tikhomirov in [4] and Rockafellar in [5] used such a reduction to prove the existence of solutions of optimal control problems. Clarke in [6] applied a similar reduction to prove the maximum principle for optimal control problems under the calmness assumption. This is a fairly weak assumption (implied, in particular, by metric regularity of the constraints), although one difficult to verify directly. However, the techniques used in [6] did not work in the absence of calmness, and a different approach (associated with controllability) was chosen in the accompanying paper [7] to prove a full version of the maximum principle.
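Schematically (a statement given here only for orientation, with ad hoc notation), the mechanism is this: let \({\overline{x}}\) be a local solution of the problem of minimizing f over M, let f be Lipschitz with constant K near \({\overline{x}}\), and assume the subregularity estimate \(d(x,M)\le c\,\varphi (x)\) holds near \({\overline{x}}\) for some \(c>0\) and a nonnegative lsc function \(\varphi \) vanishing on M. Then for any x near \({\overline{x}}\), any \(\delta >0\) and any \(u\in M\) with \(\Vert x-u\Vert \le d(x,M)+\delta \),

$$\begin{aligned} f({\overline{x}})\le f(u)\le f(x)+ K\Vert x-u\Vert \le f(x)+Kc\,\varphi (x)+K\delta , \end{aligned}$$

and letting \(\delta \rightarrow 0\) shows that \({\overline{x}}\) is an unconstrained local minimizer of the Lipschitz function \(f+Kc\,\varphi \). This is precisely the pattern formalized by the optimality alternative of Proposition 5.1 below.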

In the three works just mentioned, the reduction was based on the standard trick in which a mapping is replaced by the indicator function of its graph, so the penalty functions turned out to be extended real-valued. Loewen [8] was the first to use a Lipschitz penalty to reduce an optimal control problem with free end points (automatically calm) to a Bolza problem with an everywhere finite integrand, even one having certain Lipschitz properties. His proof of the maximum principle for such problems was amazingly simple. But again, the techniques could not be applied to problems with fixed end points and, like Clarke, Loewen had to use another technique for the proof of a general maximum principle.

A proof fully based on reduction to an unconstrained Bolza problem was given by Ioffe in [9] with the help of a different exact penalization technique associated with the subregularity property of constraints. This technique was later formalized in [17] in terms of a certain “optimality alternative” (see the last section). The other key factor that made the proof in [9] possible was the necessary condition for Bolza problems obtained by Ioffe and Rockafellar in [10]. This paper was a crucial step in another (but related) line of developments concerning the very structure of necessary optimality conditions in optimal control problems with differential inclusions.

Here is a brief account of the developments up to the mentioned work [1] of Clarke. Starting with his early papers [6, 7] and up to Loewen’s book [8], the adjoint Euler–Lagrange inclusions were stated in terms of Clarke’s normal cones, which are convex closures of limiting normal cones. In the pioneering papers by Smirnov [11] and Loewen and Rockafellar [12], a new and substantially more precise version of the Euler–Lagrange inclusion was introduced that involved only partial convexification of the limiting normal cones. The problem was that in both cases the proof needed convexity of the values of the set-valued mapping on the right-hand side of the inclusion. Mordukhovich in [13] showed that a modification of Smirnov’s proof (for bounded-valued and Hausdorff Lipschitz set-valued maps) makes it possible to establish the same type of Euler–Lagrange inclusion without the convexity assumption. However, the maximum principle was lost in this proof. In 1997, Ioffe in the mentioned paper [9] and Vinter and Zheng [14], using a different approach, did prove that both the partially convexified Euler–Lagrange inclusion and the maximum principle are necessary for a strong local minimum in optimal control problems with differential inclusions under fairly general conditions, basically the same as in [12] but without the convexity assumption on the set-valued map. A detailed proof of the result combining the approaches of [9, 14] is presented in Vinter’s monograph [2].

Another line of developments concerns Lipschitz properties of the set-valued mapping in the right-hand side of the differential inclusion near the optimal trajectory (see the comments after the statement of Theorem 3.3).

The plan of the paper is the following. The next section contains information, mainly from variational analysis, needed for the statements and proofs of the main results. Section 3 is central. It contains the statements of the problems and of the main theorems. In this section we also give some applications and versions of the theorems. The subsequent two sections contain the proofs of the main theorems for the generalized Bolza problem (Sect. 4) and the optimal control problem (Sect. 5), and in a short last section we return to a question that has remained open for quite a while. To conclude, we mention that for optimal control problems in the classical Pontryagin form the proofs can be noticeably simplified. We hope to discuss this in detail elsewhere.

2 Preliminaries

In what follows \(x\in \mathbb {R}^n\), \(\Vert x\Vert \) is the standard Euclidean norm of x, B is the unit ball, \(B(x,r)\) is the closed ball of radius r around x and \(\langle w,x\rangle \) stands for the inner product. The symbol C[0, T] stands for the space of \(\mathbb {R}^n\)-valued continuous functions on [0, T] with the standard norm

$$\begin{aligned} \Vert x(\cdot )\Vert = \max _{0\le t\le T}\Vert x(t)\Vert ; \end{aligned}$$

by \(W^{1,p}[0,T]\), \(p\in [1,\infty ]\) (or just \(W^{1,p}\)), we denote the space of \(\mathbb {R}^n\)-valued absolutely continuous functions on [0, T] whose derivatives belong to \(L^p\) with the norm

$$\begin{aligned} \Vert x(\cdot )\Vert _{1,p}= \Vert x(0)\Vert + \Vert \dot{x}(\cdot )\Vert _{p}. \end{aligned}$$

By \(f\circ G\), we denote the composition of a mapping G and a function f. But when the dependence on the variables should be emphasized, we often write f(G(x)) rather than \((f\circ G)(x)\).

2.1 Subdifferentials

We shall need several types of subdifferentials: the proximal and limiting subdifferentials in \(\mathbb {R}^n\) [2, 15] and in general Hilbert spaces [16], the Dini–Hadamard subdifferential in separable Banach spaces, and the G-subdifferential [17, 18] and Clarke’s generalized gradients in Banach spaces [19], all coinciding for convex functions with the subdifferential in the sense of convex analysis. We shall denote by \(\partial _p\) the proximal subdifferential, by \(\partial _C\) the generalized gradient of Clarke, by \(\partial \) both the limiting subdifferential in \(\mathbb {R}^n\) and the subdifferential in the sense of convex analysis, by \(\partial ^-\) the Dini–Hadamard subdifferential and by \(\partial _G\) the G-subdifferential in a general Banach space.

Here are some basic facts about these subdifferentials that we shall need in the proofs.

Proposition 2.1

([10], Theorem 2) Let X be a Hilbert space, and let \(f_1,\ldots ,f_k\) be lsc functions on X which are finite at a certain x and uniformly lower semicontinuous at x in the following sense: there is an \(\varepsilon >0\) such that for any sequences \(\{x_{im}\}\) of elements of \(B(x,\varepsilon )\cap \mathrm{dom}~f_i\) with \(\Vert x_{im}-x_{jm}\Vert \rightarrow 0\) as \(m\rightarrow \infty \) for \({i,j=1,\dots ,k}\), there is a sequence \(\{u_m\}\) such that \(\Vert x_{im}-u_m\Vert \rightarrow 0\) for every \(i=1,\ldots ,k\) and

$$\begin{aligned} \liminf _{m\rightarrow \infty }\sum _i(f_i(x_{im})- f_i(u_m))\ge 0. \end{aligned}$$

Set \(f(x)= f_1(x)+\cdots + f_k(x)\) and let \(x^*\in \partial _pf(x)\). Then for any \(\delta >0\) there are \(x_i\), \(i=1,\ldots ,k\), with \(\Vert x_i-x\Vert \le \delta \) and \(|f_i(x_i)-f_i(x)|<\delta \), and \(x_i^*\in \partial _p f_i(x_i)\) such that \(\Vert x_1^*+\cdots +x_k^*-x^*\Vert <\delta \).

The uniform lower semicontinuity property is satisfied, in particular, when all functions but possibly one of them are Lipschitz near x.

Proposition 2.2

([16], Chapter 3, Theorem 5.10) Let \(\varphi (t,x)\) be a function on \([0,T]\times \mathbb {R}^n\) which is measurable in t and lower semicontinuous in x. Consider in \(L^2(0,T)\) the functional

$$\begin{aligned} f(x(\cdot ))= \int _0^T \varphi (t,x(t)){\mathrm{d}}t \end{aligned}$$

and assume that \(\xi (\cdot )\in \partial _pf(x(\cdot ))\). Then \(\xi (t)\in \partial _p\varphi (t,\cdot )(x(t))\) almost everywhere.

Note that it is assumed in [16] that \(\varphi \) does not depend on t and is globally Lipschitz as a function of x. But neither of the assumptions actually plays any role in the proof.
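To illustrate the proposition, take \(\varphi (t,x)=\Vert x-a(t)\Vert ^2\) with some \(a(\cdot )\in L^2\) (an ad hoc example, not taken from [16]). Then f is convex and Fréchet differentiable on \(L^2\) with \(\nabla f(x(\cdot ))= 2(x(\cdot )-a(\cdot ))\), so that \(\partial _pf(x(\cdot ))=\{2(x(\cdot )-a(\cdot ))\}\), and indeed \(2(x(t)-a(t))\in \partial _p\varphi (t,\cdot )(x(t))\) almost everywhere, in agreement with the proposition.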

Proposition 2.3

([17], Corollary 7.15) Let X be a Banach space, and let the functions \(f_i\), \(i=1,\ldots ,k\), be Lipschitz continuous in a neighborhood of a certain \({\overline{x}}\). Set \(f(x)=\max _if_i(x)\) and let \(I=\{i:\; f_i({\overline{x}})=f({\overline{x}}) \}\). Then \(\partial _Gf({\overline{x}})\) is contained in the union of the sets \(\sum \alpha _i \partial _Gf_i({\overline{x}})\) over all \(\alpha _i\ge 0\) such that \(\; \alpha _i= 0\) for \(i\not \in I\) and \( \sum \alpha _i=1\).
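A one-dimensional illustration: for \(f(x)=|x|=\max \{x,-x\}\) and \({\overline{x}}=0\) both functions are active, so \(I=\{1,2\}\) and the proposition gives \(\partial _Gf(0)\subset \{\alpha -(1-\alpha ):\; \alpha \in [0,1]\}=[-1,1]\); since f is convex and \(\partial _G\) then coincides with the subdifferential of convex analysis, the inclusion here is in fact an equality.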

Proposition 2.4

([17], Proposition 4.60) Let X and Y be Banach spaces and \(A: X\rightarrow Y\) a linear bounded operator with \(\mathrm{Im}~A=Y\). Let further f be a function on Y which is Lipschitz continuous in a neighborhood of a certain \({\overline{y}}=A{\overline{x}}\). Set \(g= f\circ A\). Then

$$\begin{aligned} \partial _Gg({\overline{x}}) = A^*(\partial _Gf({\overline{y}})). \end{aligned}$$

Surprisingly, the next result seems to be absent from the literature, so we supply it with a complete proof (in which we use without explanation the definition of the G-subdifferential in a separable Banach space).

Proposition 2.5

Let X and Y be separable Banach spaces, let \(A: X\rightarrow Y\) be a bounded linear operator, and let \(\varphi \) be a function on Y which is Lipschitz continuous in a neighborhood of a certain \({\overline{y}}=A{\overline{x}}\). Set \(f(x) = \varphi (Ax)\). Then

$$\begin{aligned} \partial _G f({\overline{x}})\subset A^*\partial _G \varphi ({\overline{y}}). \end{aligned}$$

Proof

Take an x sufficiently close to \({\overline{x}}\), and suppose that \(x^*\in \partial ^-f(x)\). Setting \(y=Ax\), we deduce that for all \(h\in X\) and \(v\in Y\)

$$\begin{aligned} \langle x^*,h\rangle \le \varphi ^- (y;Ah)\le \varphi ^-(y;v) + K\Vert Ah-v\Vert , \end{aligned}$$

where K is the Lipschitz constant of \(\varphi \) near \({\overline{y}}\) and, therefore, also a global Lipschitz constant of the lower Dini derivative of \(\varphi \) at y:

$$\begin{aligned} \varphi ^-(y;v)=\liminf _{t\rightarrow 0}t^{-1}(\varphi (y+tv)- \varphi (y)). \end{aligned}$$

Set \(g(u,z)= K\Vert Au-z\Vert \). As \(y=Ax\), we see that \(g^-((x,y);(h,v))=K\Vert Ah-v\Vert \) for all \(h,\ v\). Note also that g is a convex continuous function. Hence, its lower directional derivative coincides with the usual directional derivative and the inequality above can be rewritten as

$$\begin{aligned} (x^*,0)\in \partial ^-(\varphi +g)(x,y). \end{aligned}$$

We also have the obvious equality for the subdifferential of g:

$$\begin{aligned} \partial g(u,y)=\{(A^*v^*,-v^*): v^*\in K\partial \Vert \cdot \Vert (Au-y) \}. \end{aligned}$$

Both \(\varphi \) and g are Lipschitz functions, so we can apply the fuzzy sum rule (see, e.g., [17], Proposition 4.33) and find, given \(\varepsilon >0\) and weak\(^*\)-neighborhoods U and V of zero in \(X^*\) and \(Y^*\), some \(u\in X\) \(\varepsilon \)-close to x, some \(z_1,z_2\in Y\) \(\varepsilon \)-close to y, and \(y^*\in \partial ^-\varphi (z_1)\), \((u^*,z^*)\in \partial g(u,z_2)\) such that the inclusions \(u^*=A^*v^*\in x^*+U\) and \(y^* - v^*\in V\) hold for some \(v^*\) with \(\Vert v^*\Vert \le K\). As both \(\varphi \) and g are Lipschitz, we can be sure that actually \(u^*\in x^*+(U\cap \rho B)\) and \(y^*-v^*\in V\cap (\rho B)\) for a sufficiently large \(\rho \) that does not depend on the choice of U and V. In other words, \(A^*v^*=x^*+ w^*\) and \(y^*-v^*=z^*\) for some \(w^*\in U\) and \(z^*\in V\) whose norms do not exceed \(\rho \).

Assume now that \(x^*\in \partial _Gf({\overline{x}})\). Then there are \(x_m\) converging to \({\overline{x}}\) and \(x_m^*\in \partial ^-f(x_m)\) weak\(^*\) converging to \(x^*\). Let now \(x_1,x_2,\ldots \) and \(y_1,y_2,\ldots \) be countable dense subsets of X and Y. Take \(U_m=\{u^*\in X^*:\; |\langle u^*,x_i\rangle |<m^{-1}, \; i=1,\ldots m\}\) and \(V_m=\{z^*:\; |\langle z^*,y_i\rangle | <m^{-1},\; i=1,\ldots m \}\). Let finally \(u_m,z_{1m},z_{2m}\) and \(u_m^*,y_{m}^*, v_m^*, w_m^*, z_m^*\) satisfy the above relations with \(x=x_m\), \(U=U_m\) and \(V=V_m\), that is, \(y_m=Ax_m\) and

$$\begin{aligned} \begin{array}{l} \Vert u_m-x_m\Vert<m^{-1},\quad \Vert z_{1m}- y_m\Vert< m^{-1},\quad \Vert z_{2m}- y_m\Vert < m^{-1};\\ u_m^*=A^*v_m^*,\quad w_m^*= u_m^*-x_m^*\in U_m;\\ y_m^*\in \partial ^-\varphi (z_{1m}),\quad (u_m^*,z_m^*)\in \partial g(u_m,z_{2m}),\quad y_m^*-v_m^*= z_m^*\in V_m. \end{array} \end{aligned}$$

Then \(u_m\rightarrow {\overline{x}}\), \(z_{im}\rightarrow {\overline{y}}\) and all the sequences of functionals, being bounded, can be assumed weak\(^*\) convergent. Obviously, \(w_m^*\) and \(z_m^*\) converge to zero by the Banach–Steinhaus theorem. Therefore, \(u_m^*\) weak\(^*\) converges to \(x^*\), while \(y_{m}^*\) and \(v_m^*\) weak\(^*\) converge to some \(y^*\). It follows that \(y^*\in \partial _G\varphi ({\overline{y}})\) and \(x^*=A^*y^*\), as claimed. \(\square \)

Remark 2.1

If A is an operator onto Y, the proposition remains valid if \(\varphi \) is just lower semicontinuous (and finite at \({\overline{y}}\)). Indeed, in this case, given \(x\in X\) and \(v\in Y\), there is a \(u\in X\) such that \(Au=v\) and \(\Vert x-u\Vert \le K\Vert A(x-u)\Vert \), where K is any number greater than the inverse of the Banach constant of A (the maximal radius of a ball in Y covered by the A-image of the unit ball in X). It follows that

$$\begin{aligned} d((x,\alpha ),\mathrm{epi}~f)\le K d((Ax,\alpha ),\mathrm{epi}~\varphi )\le K\big ( d((y,\alpha ),\mathrm{epi}~\varphi )+ \Vert y- Ax\Vert \big ) \end{aligned}$$

for any \((x,y,\alpha )\in X\times Y\times \mathbb {R}\), and the result follows from Theorem 7.19 of [17].

Remark 2.2

The previous results remain valid for a function \(g=f\circ F\) with a continuously differentiable \(F: X\rightarrow Y\) if as A we take the derivative of F at the point of interest.

2.2 Lipschitz Properties of Set-Valued Mappings

We denote by \(F: Q\rightrightarrows P\) a set-valued mapping from Q to P. If both Q and P are topological spaces, then F is said to be a closed mapping if its graph is closed in the product topology. The expressions like “closed-valued mapping,” “convex-valued mapping,” etc., need no explanation.

Speaking about Lipschitz properties of maps, we of course assume that both the domain and the range spaces are at least metric. Here we mainly deal with mappings between Euclidean spaces and shall use the corresponding notation and language from this point on. The simplest and most convenient Lipschitz property of a set-valued map to work with occurs when the mapping is Lipschitz with respect to the Hausdorff metric in the range space. This, however, is a very restrictive property, especially when the values of the mapping are unbounded sets.

A weaker property, which also has the advantage of being local, was introduced by Aubin: F is said to be pseudo-Lipschitz, or to have the Aubin property, at \((\bar{x},\bar{y})\in \mathrm{Graph}~F\) if there are \(r>0\), \(R>0\) and \(\varepsilon >0\) such that

$$\begin{aligned} F(x)\cap B({\overline{y}},r)\subset F(x')+ R\Vert x-x'\Vert B,\quad \forall \; x,x'\in B({\overline{x}},\varepsilon ). \end{aligned}$$

We shall call R a Lipschitz constant of F at \((\bar{x},\bar{y})\). The following simple proposition offers a nice metric characterization of the Aubin property.

Proposition 2.6

Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) and \((\bar{x},\bar{y})\in \mathrm{Graph}~F\). Then the following two statements are equivalent:

  1. (i)

    F has the Aubin property at \((\bar{x},\bar{y})\);

  2. (ii)

    there are \(r>0,\ R>0\) and \(\varepsilon >0\) such that the function \(x\rightarrow d(y,F(x))\) is R-Lipschitz in the \(\varepsilon \)-ball around \({\overline{x}}\) for any y with \(\Vert y-{\overline{y}}\Vert \le r\).

The implication \((ii)\rightarrow (i)\) is straightforward. The opposite implication is well known (see, e.g., [15], Exercise 9.37). Note also that the \(\varepsilon \) and R in the two properties coincide, while the r in (ii) may be smaller.
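For completeness, here is the one-line argument for \((ii)\rightarrow (i)\): if \(x,x'\in B({\overline{x}},\varepsilon )\) and \(y\in F(x)\cap B({\overline{y}},r)\), then \(d(y,F(x'))\le d(y,F(x))+ R\Vert x-x'\Vert = R\Vert x-x'\Vert \), which for a closed-valued F gives \(y\in F(x')+R\Vert x-x'\Vert B\) (in general, the inclusion holds with any \(R'>R\)).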

We shall also use a noninfinitesimal version of the pseudo-Lipschitz property with a priori fixed constants r, R and \(\varepsilon \). The following fundamental result shows that the part of the mapping associated with the property admits a lifting to a bounded-valued mapping which is Lipschitz with respect to the Hausdorff metric.

Proposition 2.7

Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) be a set-valued mapping with closed graph. Let an \({\overline{x}}\) and a closed set \(C\subset F({\overline{x}})\) be given such that for some \(r>0\), \(R>0\), \(\varepsilon >0\) and \(\eta >0\) the relations

$$\begin{aligned} F(x)\cap (u+rB)\subset F(x')+ R\Vert x-x'\Vert B,\quad F(x)\cap B(u,(1-\eta )r)\ne \emptyset \end{aligned}$$

hold for any \(u\in C\) and \(x,x'\in B({\overline{x}},\varepsilon )\). Set

$$\begin{aligned} \Gamma (x)= \{(\lambda ,y):\; \lambda \in [0,1],\; y=u+\lambda (z-u),\; u\in C,\; z\in F(x),\; \Vert z-u\Vert \le (1-\lambda \eta )r\}. \end{aligned}$$

Then there is a \(c=c(\eta ,r)\) such that \(\Gamma \) is cR-Lipschitz with respect to the Hausdorff metric in the \(\varepsilon \)-neighborhood of \({\overline{x}}\).

This is a slight generalization of a result of Clarke ([1], Lemma 1) in which C is a one-point set. We omit the proof as it repeats almost word for word the original proof by Clarke.

Lipschitz continuity of \(d(y,F(\cdot ))\), on the other hand, offers a convenient way to compute normal cones to the graph of F which naturally appear in necessary optimality conditions in problems involving set-valued mappings.

Proposition 2.8

([9], Proposition 1) Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^n\) be a closed-valued mapping having the Aubin property at some \((\bar{x},\bar{y})\in \mathrm{Graph}~F\). Then the (limiting) normal cone to \(\mathrm{Graph}~F\) at \((\bar{x},\bar{y})\) is generated by the limiting subdifferential of the function \((x,y)\rightarrow d(y,F(x))\) at \((\bar{x},\bar{y})\).

The last result we have to mention offers a geometric characterization of the Aubin property.

Proposition 2.9

Let \(F: \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) be a set-valued mapping with closed graph that has the Aubin property at \((\bar{x},\bar{y})\in \mathrm{Graph}~F\) with Lipschitz constant R. Let \((q,p)\in N(\mathrm{Graph}~F,(\bar{x},\bar{y}))\). Then \(\Vert q\Vert \le R\Vert p\Vert \).

This is a simple and well-known fact whose proof, however, does not seem to be explicitly available in the existing literature. The inequality is straightforward for Fréchet normal cones (or regular normals in the terminology of [15]). The Fréchet normal cone to the graph of F at (x, y) is defined as the collection of pairs (q, p) such that \(\langle q,h\rangle -\langle p,v\rangle \le o(\Vert h\Vert +\Vert v\Vert )\) if h and v are sufficiently small and \(y+v\in F(x+h)\). By the Aubin property, given an h, there is a v with \(y+v\in F(x+h)\) satisfying \(\Vert v\Vert \le R\Vert h\Vert \). If now we take \(h=\lambda q\) with small \(\lambda >0\), then we get \(\lambda \Vert q\Vert ^2\le \langle p,v\rangle + o(\lambda \Vert q\Vert )\le \lambda R \Vert p\Vert \Vert q\Vert +o(\lambda \Vert q\Vert )\), and dividing by \(\lambda \Vert q\Vert \) and letting \(\lambda \rightarrow 0\) gives \(\Vert q\Vert \le R\Vert p\Vert \). It remains to take into account that \(N(\mathrm{Graph}~F,(\bar{x},\bar{y}))\) is the outer limit of the Fréchet normal cones at \((x,y)\in \mathrm{Graph}~F\) as \((x,y)\rightarrow (\bar{x},\bar{y})\).

2.3 Measurability

A set-valued mapping \(F:[0,T]\rightrightarrows \mathbb {R}^m\) is measurable if its graph belongs to the sigma-algebra generated by all products \(\varDelta \times Q\), where \(\varDelta \subset [0,T]\) is Lebesgue measurable and \(Q\subset \mathbb {R}^m\) is open. For closed-valued and open-valued mappings, a convenient equivalent definition can be given in terms of the so-called Castaing representation: F is measurable if and only if there is a countable collection \((u_i(\cdot ))\) of measurable selections of F such that for almost every t the set \(\{u_i(t),\; i=1,2,\ldots \}\) is dense in F(t).

A set-valued mapping \(F: [0,T]\times \mathbb {R}^n\rightrightarrows \mathbb {R}^m\) is called measurable (in the standard sense) if the set-valued mapping \(t\rightarrow \mathrm{Graph}~F(t,\cdot )\) is measurable. An extended real-valued function f on \([0,T]\times \mathbb {R}^n\) is measurable (in the standard sense) if so is the epigraphical mapping \(t\rightarrow \mathrm{epi}~f(t,\cdot )=\{(x,\alpha ):\; \alpha \ge f(t,x) \}\).

The most important fact is that all basic operations used in analysis preserve measurability. For details, we refer to [15, 17].

2.4 Relaxation

Consider the functional \(\int _0^TL(t,x(t),\dot{x}(t)){\mathrm{d}}t\), assuming that the integrand L is continuous in x and that \(L(t,x(t),u(t))\) is measurable whenever \(x(\cdot )\) is continuous and \(u(\cdot )\) is measurable.

Proposition 2.10

(relaxation theorem) Let there be given an \(x(\cdot )\in W^{1,1}\) and summable \(\mathbb {R}^n\)-valued functions \(u_1(\cdot ),\ldots ,u_k(\cdot )\). Set \(u_0(t)=\dot{x}(t)\) and assume that there are \(\varepsilon >0\) and a summable c(t) such that for all \(i=0,\ldots ,k\)

$$\begin{aligned} \max _{\Vert x-x(t)\Vert \le \varepsilon }|L(t,x,u_i(t))|\le c(t)\quad \mathrm{a.e.} \end{aligned}$$

Then there is a \(\delta >0\) such that for any \(\alpha _i\ge 0,\ i=1,\ldots ,k\), with \(\sum \alpha _i<\delta \) there is a sequence of \(x_m(\cdot )\in W^{1,1}\) such that \(\dot{x}_m(t)\in \{\dot{x}(t),u_1(t),\ldots ,u_k(t) \}\) for almost every t, \(x_m(\cdot )\) converge weakly in \(W^{1,1}\) to

$$\begin{aligned} w(t) = x(t) +\sum _i\alpha _i\int _0^t(u_i(\tau )-\dot{x}(\tau )){\mathrm{d}}\tau \end{aligned}$$
(1)

and

$$\begin{aligned} \begin{aligned}&\displaystyle \limsup _{m\rightarrow \infty }\displaystyle \int _0^TL(t,x_m(t),\dot{x}_m(t) ){\mathrm{d}}t \le \displaystyle \int _0^TL(t,w(t),\dot{x}(t) ){\mathrm{d}}t\\&\quad +\displaystyle \sum _i\alpha _i \displaystyle \int _0^T\big (L(t,w(t),u_i(t))-L(t,w(t),\dot{x}(t)) \big ){\mathrm{d}}t. \end{aligned} \end{aligned}$$
(2)

This is a simplified version of Bogolyubov’s convexification theorem [20] (see also [4] for more details).

Proof

Take \(\delta < 1\) so small that \(\delta \Vert u_i(\cdot )-\dot{x}(\cdot )\Vert _{L^1}<\varepsilon /2\) for all \(i=1,\ldots ,k\), and let the \(\alpha _i\) satisfy the conditions with the chosen \(\delta \).

Now, given an m, set \(\varDelta _j=[(j-1)T/m,jT/m]\), \(j=1,\ldots ,m\), and let \(\varDelta _{ji}\), \(i=1,\ldots ,k\), be subsets of \(\varDelta _j\) with Lebesgue measure \(\alpha _iT/m\), respectively, such that \(\varDelta _{ji}\cap \varDelta _{ji'}=\emptyset \) if \(i\ne i'\). Define \(x_m(\cdot )\) by

$$\begin{aligned} x_m(0)= x(0);\qquad \dot{x}_m(t)=\left\{ \begin{array}{ll} u_i(t),&{}\quad \mathrm{if}\; t\in \varDelta _{ji}\;\mathrm{for \ some}\; j,i;\\ \dot{x}(t),&{}\quad \mathrm{otherwise}.\end{array}\right. \end{aligned}$$

It is clear that \(\dot{x}_m(\cdot )\) are uniformly integrable, so we may assume that \(x_m(\cdot )\) converge uniformly to some \(w(\cdot )\). We can rewrite the definition of \(\dot{x}_m\) as follows:

$$\begin{aligned} \dot{x}_m(t) = \left( 1-\sum _{i=1}^k\alpha _{im}(t)\right) \dot{x}(t) + \sum \alpha _{im}(t)u_i(t), \end{aligned}$$

where

$$\begin{aligned} \alpha _{im}(t)=\left\{ \begin{array}{ll} 1,&{}\quad \mathrm{if}\; t\in \varDelta _{ji},\;\mathrm{for \ some}\; j=1,\ldots ,m;\\ 0,&{}\quad \mathrm{otherwise}.\end{array}\right. \end{aligned}$$

Then

$$\begin{aligned} L(t,x_m(t),\dot{x}_m(t))= L(t,x_m(t),\dot{x}(t))&+\sum _{i=1}^k\alpha _{im}(t)\big (L(t,x_m(t),u_i(t))\\&- L(t,x_m(t),\dot{x}(t))\big ). \end{aligned}$$

It is clear that every \(\alpha _{im}(\cdot )\) weak\(^*\)-converges in \(L^{\infty }\) (and hence weakly in \(L^1\)) to the function identically equal to \(\alpha _i\). The result is now immediate. \(\square \)
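The chattering construction in the proof is easy to visualize. The following sketch (purely illustrative and not part of the formal development; all data are ad hoc choices) implements it for \(n=1\), \(T=1\), \(x(t)\equiv 0\), a single selection \(u_1(t)\equiv 1\) and weight \(\alpha _1=0.3\), and confirms numerically that \(x_m\) converge uniformly to \(w(t)=\alpha _1 t\):

```python
import numpy as np

T, alpha = 1.0, 0.3   # horizon and weight alpha_1 (ad hoc choices)

def x_m(t, m):
    """Chattering trajectory: on each interval [(j-1)/m, j/m] the derivative
    equals u_1 = 1 on the initial piece of measure alpha/m and equals the
    original derivative 0 on the rest of the interval."""
    j, s = divmod(t * m, 1.0)            # j full intervals passed, s in [0,1)
    return (j * alpha + min(s, alpha)) / m

for m in (10, 100, 1000):
    grid = np.linspace(0.0, T, 2001)
    err = max(abs(x_m(t, m) - alpha * t) for t in grid)
    print(f"m = {m:5d}:  sup_t |x_m(t) - w(t)| = {err:.2e}")
```

The printed deviation decays like \(m^{-1}\), while the derivatives \(\dot{x}_m\) keep switching between the two fields, which is exactly the mechanism behind (1) and (2).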

3 Statements of the Problems and Main Results

The principal object to be studied in the paper is the generalized Bolza problem

$$\begin{aligned} (P)\qquad \mathrm{minimize}\quad J(x(\cdot ))= \varphi (x(\cdot ))+\int _0^TL(t,x(t),\dot{x}(t)){\mathrm{d}}t,\quad x(\cdot )\in W^{1,1}, \end{aligned}$$

with \(\varphi \) being a function on the space C[0, T], not necessarily a traditional function of the end points (x(0), x(T)).

In what follows, we shall fix some \({\overline{x}}(\cdot )\in W^{1,1}\) which will be assumed a local minimizer of J in one or another sense. Here are the basic hypotheses on the functions \(\varphi \) and L:

(H\(_1\)) \(\varphi \) is Lipschitz in a neighborhood of \({\overline{x}}(\cdot )\) in C[0, T], that is, there are \(K>0\) and \(\overline{\varepsilon }>0\) such that

$$\begin{aligned} |\varphi (x(\cdot ))- \varphi (x'(\cdot ))|\le K\Vert x(\cdot )-x'(\cdot )\Vert ; \end{aligned}$$

if \(x(\cdot ), x'(\cdot )\) are \(\overline{\varepsilon }\)-close to \({\overline{x}}(\cdot )\) in C[0, T].

(H\(_2\)) L(t, x, y) is lower semicontinuous in (x, y) and measurable in the standard sense, and there are \(\overline{\varepsilon }>0\), measurable \(r(t)>0\), summable \(R(t)\ge 0\) and \(k(t)\ge 0\) and a measurable set-valued mapping \(D(\cdot ): [0,T]\rightrightarrows \mathbb {R}^n\) such that for almost every \(t\in [0,T]\) the function \(L(t,\cdot ,\cdot )\) is continuous on \(B({\overline{x}}(t),\overline{\varepsilon })\times D(t)\) and the relations

$$\begin{aligned} \begin{array}{l} |L(t,x,y)-L(t,x',y)|\le R(t)\Vert x-x'\Vert ,\quad \forall x,x'\in B({\overline{x}}(t),\overline{\varepsilon }),\; y\in B(\dot{{\overline{x}}}(t),r(t));\\ |L(t,x,y)|\le k(t),\quad \forall x\in B({\overline{x}}(t),\overline{\varepsilon }),\; y\in B(\dot{{\overline{x}}}(t),\overline{\varepsilon }) \end{array} \end{aligned}$$
(3)

hold a.e. on [0, T].

Note that the conditions do not exclude the possibility that L is extended real-valued, in particular, that \(L(t,x,y)= \infty \) if \(y\not \in D(t)\), even if x is close to \({\overline{x}}(t)\). It should also be taken into account that \(\inf r(t)\) can be zero and L need not be Lipschitz with respect to the last argument.

Theorem 3.1

Assume (H\(_1\)) and (H\(_2\)). If \({\overline{x}}(\cdot )\) is a local minimizer of J on \(W^{1,1}\), then there are an \(\mathbb {R}^n\)-valued Radon measure \(\nu \in \partial _G\varphi ({\overline{x}}(\cdot ))\), an \(\mathbb {R}^n\)-valued function p(t) of bounded variation, continuous from the left, and a summable \(\mathbb {R}^n\)-valued function q(t) such that

$$\begin{aligned} p(t) = -\int _t^Tq(s){\mathrm{d}}s - \int _t^T{\mathrm{d}}\nu (s),\qquad p(0)=0, \end{aligned}$$
(4)

and the following conditions are satisfied almost everywhere on [0, T]:

$$\begin{aligned} q(t)\in \mathrm{conv}~\{q: \; (q,p(t))\in \partial L(t,\cdot ,\cdot )({\overline{x}}(t),\dot{{\overline{x}}}(t))\} \end{aligned}$$
(5)

(the Euler inclusion) and

$$\begin{aligned} L(t,{\overline{x}}(t),u)- L(t,{\overline{x}}(t),\dot{{\overline{x}}}(t))-\langle p(t),u-\dot{{\overline{x}}}(t)\rangle \ge 0 \quad \mathrm{if}\; u\in D(t) \end{aligned}$$
(6)

(the Weierstrass condition).

An interesting (and perhaps the most interesting) case corresponds to \(D(t)\equiv \mathbb {R}^n\). It will be clear from the proof that in this case the Euler inclusion (5) is a necessary condition for a weak minimum in (P), that is, for a minimum on a set of \(x(\cdot )\) that are \(W^{1,\infty }\)-close to \({\overline{x}}(\cdot )\). Looking ahead, we just say that the proof of this fact is noticeably simpler and needs no reference to the relaxation theorem.

We deduce from the theorem that in this case the Euler inclusion (5) and the Weierstrass condition (6) together give a necessary condition for a strong minimum in the classical sense, that is, when there is an \(\varepsilon >0\) such that \(J(x(\cdot ))\ge J({\overline{x}}(\cdot ))\) for all \(x(\cdot )\in W^{1,1}\) satisfying \(\Vert x(t)-{\overline{x}}(t)\Vert <\varepsilon ,\; \forall \ t\). Another point we wish to mention in this connection is that L(t, x, y) has to satisfy the Lipschitz property w.r.t. x only for y in a possibly small neighborhood of \(\dot{{\overline{x}}}(t)\).

Applying the theorem to the nonsmooth version of the standard Bolza problem:

$$\begin{aligned} \mathrm{minimize}\quad J(x(\cdot ))= \ell (x(0),x(T))+\int _0^TL(t,x(t),\dot{x}(t)){\mathrm{d}}t,\quad x(\cdot )\in W^{1,1} \end{aligned}$$

(under the assumption that \(D(t)\equiv \mathbb {R}^n\)), we get the following extension of the classical necessary conditions.

Theorem 3.2

Assume that \(\ell \) is Lipschitz near \(({\overline{x}}(0),{\overline{x}}(T))\) and L satisfies (H\(_2\)) with \(D(t)\equiv \mathbb {R}^n\). If \({\overline{x}}(\cdot )\) is a weak local minimum in the problem, then there is a \(p(\cdot )\in W^{1,1}\) satisfying the Euler inclusion

$$\begin{aligned} \dot{p}(t) \in \mathrm{conv}~\{q:\; (q,p(t))\in \partial L(t,\cdot ,\cdot )({\overline{x}}(t),\dot{{\overline{x}}}(t) ) \} \end{aligned}$$
(7)

a.e. on [0, T] and the end points condition

$$\begin{aligned} (p(0),-p(T))\in \partial \ell ({\overline{x}}(0),{\overline{x}}(T)). \end{aligned}$$
(8)

If, moreover, \({\overline{x}}(\cdot )\) is a strong minimum, then for almost every t the Weierstrass condition (6) holds for all \(u\in \mathbb {R}^n\) with the same \(p(\cdot )\).

Proof

As \(\varphi (x(\cdot ))=\ell (x(0),x(T))\) is a function of terminal values of \(x(\cdot )\), it is an easy matter to see that \(\nu \in \partial _G\varphi (x(\cdot ))\) if and only if there are \(g,h\in \mathbb {R}^n\) such that \((g,h)\in \partial \ell (x(0),x (T))\) and the action of \(\nu \) on elements of C([0, T]) is defined by

$$\begin{aligned} \int _0^Tu(t)\nu ({\mathrm{d}}t)= \langle h,u(T)\rangle + \langle g,u(0)\rangle . \end{aligned}$$

Applying Theorem 3.1, we therefore find some \((g,h)\in \partial \ell ({\overline{x}}(0),{\overline{x}}(T))\) such that for some q(t) and \(p(t)= -\int _t^T\nu ({\mathrm{d}}s) -\int _t^Tq(s){\mathrm{d}}s= -h -\int _t^Tq(s){\mathrm{d}}s\) (for \(t>0\)) we get (if we redefine p(t) at zero by continuity: \(p(0 )=-h-\int _0^Tq(t){\mathrm{d}}t\)) \(p(0)-g=0\), hence (8). \(\square \)

Let us now turn to the optimal control problem:

$$\begin{aligned} (OC)\qquad \begin{array}{l} \mathrm{minimize}\quad \ell (x(0),x(T)),\\ \mathrm{s.t.}\quad \dot{x}(t)\in F(t,x(t))\;\; \mathrm{a.e.\ on}\; [0,T],\\ \qquad \quad g(t,x(t))\le 0,\quad \forall \; t\in [0,T],\\ \qquad \quad (x(0),x(T))\in S. \end{array} \end{aligned}$$

Here we assume that there are \({\overline{x}}(\cdot )\in W^{1,1}\) and \(\overline{\varepsilon }>0\) such that

  • (H\(_3\)) \(\ell \) is Lipschitz near \(({\overline{x}}(0),{\overline{x}}(T))\), \(S\subset \mathbb {R}^n\times \mathbb {R}^n\) is closed;

  • (H\(_4\)) g is upper semicontinuous on the set \(\{(t,x):\; t\in [0,T],\; \Vert x-{\overline{x}}(t)\Vert \le \overline{\varepsilon }\}\) and there is a \(K>0\) such that for \(x,x'\in B({\overline{x}}(t),\overline{\varepsilon })\)

    $$\begin{aligned} |g(t,x)- g(t,x')|\le K\Vert x-x'\Vert \quad \mathrm{a.e.\ on}\; [0,T]. \end{aligned}$$
  • (H\(_5\)) F is closed-valued and measurable in the standard sense and there are a measurable \({\overline{r}}(t)>0\) bounded away from zero, a summable \({\overline{R}}(t)\ge 0\) and an \(\eta \in ]0,1[\) such that the relations

    $$\begin{aligned}&F(t,x)\cap B(\dot{{\overline{x}}}(t),{\overline{r}}(t))\subset F(t,x')+{\overline{R}}(t)\Vert x-x'\Vert B,\nonumber \\&F(t,x)\cap B(\dot{{\overline{x}}}(t),(1-\eta ){\overline{r}}(t))\ne \emptyset \end{aligned}$$
    (9)

    hold for all \( x,x'\in B({\overline{x}}(t),\overline{\varepsilon })\).

The first of the relations is a nonlocal extension of Aubin’s pseudo-Lipschitz property. In one or another form, it was used in all results on the maximum principle for differential inclusions starting with [12]. The necessity of the second condition in (9) under weaker versions of the first was observed by Clarke in [1].

We need more notation to state the theorem. Let U(t) be the closure of the collection of all \(u\in F(t,{\overline{x}}(t))\) such that

$$\begin{aligned} F(t,x)\cap B(u,r)\subset F(t,x')+ R\Vert x-x'\Vert B,\quad F(t,x)\cap B(u,(1-\eta )r)\ne \emptyset ,\quad \forall \; x,x'\in B({\overline{x}}(t),\varepsilon ) \end{aligned}$$

for some \(r>0,\ R>0,\ \eta >0,\ \varepsilon >0\) (depending on u). Finally, set \(\varDelta (x(\cdot ))= \{t:\; g(t,x(t))=\max _{s\in [0,T]}g(s,x(s)) \}\) and

$$\begin{aligned} \partial _C^{>}g(t,x)= \mathrm{conv}~\{\limsup \partial g(t_m,x_m):\; t_m\rightarrow t,\ x_m\rightarrow x, \ g(t_m,x_m)> g(t,x) \}. \end{aligned}$$

Theorem 3.3

We posit (H\(_3\))–(H\(_5\)) and assume that \({\overline{x}}(\cdot )\in W^{1,1}\) is a local solution of (OC). Then there are \(\lambda \ge 0\), a nonnegative measure \(\mu \) on [0, T] supported on \(\varDelta ({\overline{x}}(\cdot ))\), a measurable \(\mathbb {R}^n\)-valued function q(t) satisfying \(\Vert q(t)\Vert \le {\overline{R}}(t)\) a.e., an \(\mathbb {R}^n\)-valued function p(t) of bounded variation, continuous from the left, and a \(\mu \)-measurable selection \(\gamma (\cdot )\) of the set-valued mapping \(t\rightarrow {\partial }_C^{>}g(t,{\overline{x}}(t))\) such that \( {\mathrm{d}}p(t) = q(t){\mathrm{d}}t +\gamma (t)\mu ({\mathrm{d}}t)\) and the following four conditions are satisfied:

  1. (i)

    \(\lambda + \Vert p(\cdot )\Vert + \mu ([0,T])=1\) (nontriviality);

  2. (ii)

    \((p(0),-p(T)-\gamma (T)\mu (\{T\}))\in \lambda \partial \ell ({\overline{x}}(0),{\overline{x}}(T)) + N(S,({\overline{x}}(0),{\overline{x}}(T)))\) (transversality);

  3. (iii)

    \(q(t)\in \mathrm{conv}~\{q:\; (q,p(t))\in N(\mathrm{Graph}~F(t,\cdot ),({\overline{x}}(t),\dot{{\overline{x}}}(t)))\}\) a.e. on [0, T] (Euler–Lagrange inclusion);

  4. (iv)

    \(\langle p(t),y-\dot{{\overline{x}}}(t)\rangle \le 0\), \(\forall \; y\in U(t)\) a.e. on [0, T] (maximum principle).

By (H\(_5\)), \(F(t,\cdot )\) has the Aubin property at every \(u\in F(t,{\overline{x}}(t))\) with \(\Vert u-\dot{{\overline{x}}}(t)\Vert <{\overline{r}}(t)\). Thus, replacing U(t) in the statement by \(B(\dot{{\overline{x}}}(t),{\overline{r}}(t))\), we get an extension of Clarke’s “stratified maximum principle” to optimal control problems with state constraints. It seems appropriate at this point to add a few words concerning the evolution of the assumptions on the Lipschitz properties of F. In [7, 8, 11], the mapping was assumed globally Lipschitz in the state variable with the Lipschitz constant being a summable function of the time variable. In [12], this property was weakened and replaced by a certain global version of the Aubin pseudo-Lipschitz property with a linear growth of the Lipschitz constant as a function of the radius of the ball on which the mapping is considered. This assumption (also applied in [9, 14]), although still fairly restrictive, made possible a meaningful treatment of differential inclusions with unbounded right-hand sides. Finally, Clarke’s stratified maximum principle showed that an arbitrary rate of growth works as well.

Remark 3.1

There are certain differences in the formulations of the maximum principle for problems with state constraints in the literature. The above statement is structured along the lines of the maximum principle proved in [4] (Theorem 1 of §5.2). On the other hand, the statement of the maximum principle for (OC) proved in [2] looks different at first glance. But it is not difficult to notice that the function p(t) here and the function q(t) in [2] coincide up to a delicate difference: the first is continuous from the left and may be discontinuous at the left end of the interval, while the second is continuous from the right and may be discontinuous at the right end of the interval.

4 Proof of Theorem 3.1

Without loss of generality, we can assume in the proof that \({\overline{x}}(t)\equiv 0\).

  1. 1.

    Let \(u_i(\cdot )\), \(i=1,\ldots ,k\), be bounded measurable selections of \(D(\cdot )\) (that is, \(u_i(t)\in D(t)\) a.e.) such that for some \(\varepsilon >0\) the functions \(\max _{\Vert x-{\overline{x}}(t)\Vert \le \varepsilon } L(t,x,u_i(t))\) are summable. Then the \(u_i(\cdot )\) satisfy the conditions of Proposition 2.10 with any \(x(\cdot )\in W^{1,1}\) close to \({\overline{x}}(\cdot )\). It follows immediately from the proposition that the vector \(({\overline{x}}(\cdot ),0,\ldots ,0 )\) is a weak local minimizer of the functional

    $$\begin{aligned} \hat{J}(x(\cdot ),\alpha _1,\ldots ,\alpha _k)= & {} \varphi (w(\cdot ))+ \displaystyle \int _0^TL(t,w(t),\dot{x}(t) ){\mathrm{d}}t\\&+\displaystyle \sum _i\alpha _i^+ \displaystyle \int _0^T\big (L(t,w(t),u_i(t))-L(t,w(t),\dot{x}(t)) \big ){\mathrm{d}}t, \end{aligned}$$

    with \(w(\cdot )\) defined by (1), subject to the constraint \(\Vert \dot{x}(t)-\dot{{\overline{x}}}(t)\Vert \le r(t)\) a.e. (By saying that \(({\overline{x}}(\cdot ),0,\ldots ,0)\) is a weak minimizer of \(\hat{J}\), we mean that there is an \(\varepsilon >0\) such that \(\hat{J}(x(\cdot ),\alpha _1,\ldots ,\alpha _k)\ge \hat{J}({\overline{x}}(\cdot ),0,\ldots ,0)\) whenever \(\Vert x(\cdot )\Vert _{1,\infty }<\varepsilon \) and \(0\le \alpha _i<\varepsilon \).) Consider the space \(Z=\mathbb {R}^n\times L^2\times L^2\times \mathbb {R}^k\) and let the operator \(\Lambda : Z\rightarrow W^{1,2}\) associate with every \((a,x(\cdot ),y(\cdot ),\alpha )\in Z\), where \(\alpha =(\alpha _1,\ldots ,\alpha _k)\), the function

    $$\begin{aligned} \Lambda (a,y(\cdot ),\alpha _1,\ldots ,\alpha _k)(t)= a+\int _0^t\Big (y(\tau )+\sum _i\alpha _i(u_i(\tau )-y(\tau ))\Big ){\mathrm{d}}\tau . \end{aligned}$$

    Set

    $$\begin{aligned} \tilde{L}(t,x,y)= \left\{ \begin{array}{ll} L(t,x,y),&{}\quad \mathrm{if}\;\Vert x\Vert \le \overline{\varepsilon },\; \Vert y\Vert \le \min \{\overline{\varepsilon },r(t)\};\\ \infty ,&{} \quad \mathrm{otherwise};\end{array}\right. \end{aligned}$$
    $$\begin{aligned} g_{i\varepsilon }(t)= \max _{\Vert x\Vert ,\Vert y\Vert \le \varepsilon }(L(t,x,u_i(t))-L(t,x,y)); \end{aligned}$$

    and consider the following four functionals on Z (where of course \(z=(a,x(\cdot ),y(\cdot ),\alpha _1,\ldots ,\alpha _k)\) ):

    $$\begin{aligned} I_1(z)= & {} \varphi (\Lambda (a,y(\cdot ),\alpha _1,\ldots ,\alpha _k) );\\ I_2(z)= & {} \displaystyle \int _0^T \tilde{L}(t,x(t),y(t)){\mathrm{d}}t;\\ I_3(z)= & {} \displaystyle \int _0^TR(t)\Big \Vert x(t) - a-\displaystyle \int _0^t\Big (y(\tau )+\displaystyle \sum _i\alpha _i(u_i(\tau )-y(\tau )) \Big ){\mathrm{d}}\tau \Big \Vert {\mathrm{d}}t;\\ I_4(z)= & {} \displaystyle \sum _i\alpha _i\displaystyle \int _0^Tg_{i\varepsilon }(t){\mathrm{d}}t + K\displaystyle \sum _i\alpha _i^-, \end{aligned}$$

    and set \(I=I_1+I_2+I_3+I_4\). It is an easy matter to see that \(\hat{J}(0,\ldots ,0)= I(0,\ldots ,0)\) and (setting \(w(t) = a+\int _0^ty(\tau ){\mathrm{d}}\tau \))

    $$\begin{aligned} \hat{J}(w(\cdot ), \alpha _1,\ldots ,\alpha _k)\le I(a,x(\cdot ),y(\cdot ),\alpha _1,\ldots ,\alpha _k). \end{aligned}$$

    if \(\Vert a\Vert +\int _0^T\Vert y(t)\Vert {\mathrm{d}}t\le \overline{\varepsilon }\), \(\Vert x(t)\Vert \le \overline{\varepsilon }\) for all t, \(\Vert y(t)\Vert \le \min \{\overline{\varepsilon },r(t)\}\) a.e., \(\alpha _i\ge 0\) and K is sufficiently large. It follows that zero is a local minimum of I(z) in Z. Therefore,

    $$\begin{aligned} 0\in \partial _pI(0). \end{aligned}$$
  2. 2.

    Analysis of this inclusion is the next step of the proof. The functionals \(I_j\) are lower semicontinuous and uniformly lower semicontinuous near zero in Z since all \(I_j\) except \(I_2\) satisfy the Lipschitz condition. By Proposition 2.1, for any given \(\delta >0\) there are \(z_j=(a_j,x_j(\cdot ),y_j(\cdot ),\alpha _{1j},\ldots ,\alpha _{kj})\in Z\) and \(z_j^*=(b_j,x_j^*,y_j^*,\beta _{1j},\ldots ,\beta _{kj})\in Z^*\), \(j=1,2,3,4\), such that \(|I_j(z_j)-I_j(0)|<\delta \) and

    $$\begin{aligned}&(b_j,x_j^*,y_j^*,\beta _{1j},\ldots ,\beta _{kj})\in \partial _pI_j(z_j);\nonumber \\&\Vert a_j\Vert +\Vert x_j(\cdot )\Vert +\Vert y_j(\cdot )\Vert +\displaystyle \sum _{i=1}^k|\alpha _{ij}|<\delta ;\nonumber \\&\left\| \displaystyle \sum _{j=1}^4 z_j^*\right\| =\max \left\{ \left\| \displaystyle \sum _{j=1}^4 b_j\right\| ,\left\| \displaystyle \sum _{j=1}^4 x_j^*\right\| ,\left\| \displaystyle \sum _{j=1}^4 y_j^*\right\| ,\left| \displaystyle \sum _{j=1}^4\beta _{1j}\right| ,\ldots ,\left| \sum _{j=1}^4\beta _{kj}\right| \right\} <\delta \nonumber \\ \end{aligned}$$
    (10)

    Finding estimates for \(\partial _pI_j\) does not require much effort.

    • Let \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _GI_1(z)\). Then \(x^*=0\) as \(I_1\) does not depend on \(x(\cdot )\), and if \(w(\cdot )= \Lambda (z)\), then there is a measure \(\nu \in \partial _G\varphi (w(\cdot ))\) such that

      $$\begin{aligned} b= & {} \int _0^T\nu ({\mathrm{d}}t),\quad \langle y^*,v(\cdot )\rangle = (1-\sum \alpha _i)\int _0^T\langle \nu (t),v(t)\rangle {\mathrm{d}}t,\\ \beta _i= & {} \int _0^T\langle \nu (t),u_i(t)-y(t)\rangle {\mathrm{d}}t, \end{aligned}$$

      where we have set \(\nu (t)= \int _t^T\nu ({\mathrm{d}}s)\). Indeed, \(I_1\) can be viewed as a composition of \(\Lambda \) and the restriction of \(\varphi \) to \(W^{1,2}\). Denote for a moment this restriction by \(\tilde{\varphi }\). \(\Lambda \) is a smooth mapping and its derivative at zero (and hence at all nearby points) is onto. By Proposition 2.4, \(\partial _GI_1(z)= \Lambda '(z)^*\partial _G\tilde{\varphi }(\Lambda (z))\). On the other hand, \(\tilde{\varphi }\) is the composition of the embedding of \(W^{1,2}\) into C([0, T]) and \(\varphi \). Therefore, by Proposition 2.5, \(\partial _G\tilde{\varphi }(x(\cdot ))\) is contained in the set of restrictions of elements of \(\partial _G\varphi (x(\cdot ))\) to \(W^{1,2}\). It remains to recall that the proximal subdifferential is contained in the G-subdifferential.

    • If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_2(z)\), then \(b=0\), all \(\beta _i=0\), and there are \(\lambda (\cdot )\) and \(\mu (\cdot )\) belonging to \(L^2\) such that \((\lambda (t),\mu (t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x(t),y(t))\) almost everywhere and

      $$\begin{aligned} \langle x^*, h(\cdot )\rangle + \langle y^*,v(\cdot )\rangle =\int _0^T(\langle \lambda (t),h(t)\rangle +\langle \mu (t),v(t)\rangle ){\mathrm{d}}t \end{aligned}$$

      for all \(h(\cdot )\) and \(v(\cdot )\) in \(L^2\). This is immediate from Proposition 2.2.

    • If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_3(z)\), then there is a \(\xi (\cdot )\in L^2\) with \(\Vert \xi (t)\Vert \le R(t)\) a.e. such that

      $$\begin{aligned} b= & {} - \displaystyle \int _0^T\xi (t){\mathrm{d}}t,\quad \langle x^*,h(\cdot )\rangle =\displaystyle \int _0^T\langle \xi (t),h(t)\rangle {\mathrm{d}}t,\\ \langle y^*,v(\cdot )\rangle= & {} -\left( 1-\sum _i\alpha _i\right) \displaystyle \int _0^T\langle \eta (t),v(t)\rangle {\mathrm{d}}t,\quad \beta _i=-\displaystyle \int _0^T\langle \eta (t),u_i(t)-y(t)\rangle {\mathrm{d}}t, \end{aligned}$$

      where \(\eta (t) = \int _t^T\xi (s){\mathrm{d}}s\). Indeed, \(I_3\) is a composition of the convex lsc functional \(\int _0^TR(t)\Vert x(t)\Vert {\mathrm{d}}t\) and a smooth mapping from Z into \(L^2\). For such compositions, all subdifferentials coincide and are equal to the composition of the adjoint of the derivative of the inner mapping and the convex subdifferential of the outer function.

    • If \((b,x^*,y^*,\beta _1,\ldots ,\beta _k)\in \partial _pI_4(z)\), then there are \(\rho _i\in [-K,0]\) such that

      $$\begin{aligned} b=0,\quad x^*=0,\quad y^*= 0,\quad \beta _i= \int _0^Tg_{i\varepsilon }(t){\mathrm{d}}t + \rho _i. \end{aligned}$$

      Taking (10) into account, we conclude that for some \(z_j ,\ j=1,2,3,4,\) satisfying the second inequality in (10) there are a regular \(\mathbb {R}^n\)-valued Radon measure \(\nu \in \partial _G\varphi (z_1)\), measurable \((\lambda (t),\mu (t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x_2(t),y_2(t))\) and \(\xi (t)\) satisfying \(\Vert \xi (t)\Vert \le R(t)\) and \(\rho _{i}\in [-K,0]\) such that

      $$\begin{aligned} \begin{aligned}&\left\| \displaystyle \int _0^T(\nu ({\mathrm{d}}t) - \xi (t){\mathrm{d}}t)\right\|<\delta ; \\&\displaystyle \int _0^T\left\| \lambda (t)+\xi (t)\right\| {\mathrm{d}}t<\delta ;\\&\displaystyle \int _0^T\Big \Vert (1-\sum _i\alpha _{1i})\nu (t)+\mu (t)-(1-\sum _i\alpha _{3i})\Big (\displaystyle \int _t^T \xi (\tau ){\mathrm{d}}\tau \Big ) \Big \Vert {\mathrm{d}}t<\delta ;\\&\left| \displaystyle \int _0^T\left( \left( 1-\sum _i\alpha _{1i}\right) \langle \nu (t),u_i(t)-y_1(t)\rangle \right. \right. \\&\left. \left. \qquad -\left( 1-\sum _i\alpha _{3i}\right) \langle \eta (t),u_i(t)-y_3(t)\rangle + \,g_{i\varepsilon }(t)\right) {\mathrm{d}}t+\rho _i\right| <\delta ,\quad i=1,\ldots ,k. \end{aligned} \end{aligned}$$
      (11)
  3. 3.

    Taking \(\delta _m\rightarrow 0\), we shall find sequences of \(z_{jm}\) converging to zero as \(m\rightarrow \infty \) and of \(\nu _m\), \((\lambda _m(\cdot ),\mu _m(\cdot ))\), \(\xi _m(\cdot )\) and \(\rho _{im}\in [-K,0]\), \(i=1,\ldots ,k\) such that

    • \(\nu _m\in \partial _G\varphi (z_{1m})\);

    • \((\lambda _m(t),\mu _m(t))\in \partial _p\tilde{L}(t,\cdot ,\cdot )(x_{2m}(t),y_{2m}(t))\);

    • \(\Vert \xi _m(t)\Vert \le R(t)\) almost everywhere;

    • for every m (11) holds with \(\delta = \delta _m\), \(\nu =\nu _m\) etc.

    Setting \(q_m(t) = -\xi _m(t)\) and \(p_m(t)= -\nu _m(t) -\int _{t}^{T}q_m(\tau ){\mathrm{d}}\tau \), we rewrite (11) with some \(\gamma _m\rightarrow 0\) and \(y_m(\cdot )\rightarrow 0\) as follows:

    $$\begin{aligned} \begin{aligned}&\Vert p_m(0)\Vert<\gamma _m;\\&\displaystyle \int _0^T\Vert \lambda _m (t) - q_m(t)\Vert {\mathrm{d}}t<\gamma _m;\\&\displaystyle \int _0^T\Vert \mu _m(t) - p_m(t)\Vert {\mathrm{d}}t<\gamma _m;\\&\Big |\displaystyle \int _0^T(-\langle p_m(t) ,u_i(t)-y_m(t)\rangle + g_{i\varepsilon }(t)){\mathrm{d}}t+\rho _{im}\Big |<\gamma _m,\quad i=1,\ldots ,k. \end{aligned} \end{aligned}$$
    (12)

    As \(\varphi \) is Lipschitz near zero, the total variations of \(\nu _m\) are uniformly bounded and we may assume that \(\nu _m\) weak\(^*\) converges to some \(\nu \in \partial _G\varphi (0)\). The latter means, in particular, that \(\nu _m(t)\) converge to \(\nu (t)\) at every point of continuity of the latter, that is, everywhere except maybe countably many points. Finally, as all \(q_m(\cdot )\) are bounded by the same summable function R(t), this sequence is relatively compact in the weak topology of \(L^1\). By the Eberlein–Smulian theorem, we can assume that the sequence weakly converges to a certain q(t). Hence, the integrals \(\int _t^Tq_m(s){\mathrm{d}}s\) converge uniformly to \(\int _t^Tq(s){\mathrm{d}}s\), the total variations of \(p_m(\cdot )\) are uniformly bounded, and \(p_m(t)\) converge to

    $$\begin{aligned} p(t) = -\int _t^Tq(s){\mathrm{d}}s - \nu (t), \end{aligned}$$

    at every t at which \(\nu (t)\) is continuous. The second and the third relations in (12) imply that the distance from \((q_m(t),p_m(t))\) to \(\partial _pL(t,\cdot )(x_m(t),y_m(t))\) goes to zero for almost every t. On the other hand, by definition of the limiting subdifferential, almost everywhere on [0, T]

    $$\begin{aligned} \limsup _{m\rightarrow \infty }\partial _p L(t,\cdot )(x_m(t),y_m(t)) \subset \partial L(t,\cdot )(0,0). \end{aligned}$$

    By the Mazur theorem, a certain sequence \(\tilde{q}_m(\cdot )\) of convex combinations of the \(q_m(\cdot )\) converges to \(q(\cdot )\) almost everywhere. Hence the distance from \((\tilde{q}_m(t),p_m(t))\) to the set \(\{(\tilde{q},p): \tilde{q}\in \mathrm{conv}~\{q: \; (q,p)\in \partial L(t,\cdot )(0,0) \} \}\) goes to zero as \(m\rightarrow \infty \). The set \(\partial L(t,\cdot )(0,0)\) is closed and its projection to the q-space is bounded as \(L(t,\cdot ,y)\) is Lipschitz near zero. Therefore, for any p the set \(\mathrm{conv}~\{q: \; (q,p)\in \partial L(t,\cdot )(0,0) \}\) is also closed, and we conclude that (q(t), p(t)) belongs to this set for almost every t. The equality \(p(0)=0\) follows from the first relation in (12). This concludes the proof of (4) and (5). Finally, as the sequences \((\rho _{im})\) are uniformly bounded, we can assume that each of them converges to some \(\rho _i\le 0\). It follows from the last relation in (12) that

    $$\begin{aligned} \int _0^T(g_{i\varepsilon }(t) -\langle p(t),u_i(t)\rangle ){\mathrm{d}}t=-\rho _i\ge 0. \end{aligned}$$

    This is true for any \(\varepsilon >0\), and when \(\varepsilon \rightarrow 0\) the functions \(g_{i\varepsilon }(t)\) converge decreasingly to \(L(t,0,u_i(t))-L(t,0,0)\) and we get eventually that

    $$\begin{aligned} \int _0^T\big (L(t,0,u_i(t)) - L(t,0,0)-\langle p(t),u_i(t)\rangle \big ){\mathrm{d}}t\ge 0,\; i=1,\ldots ,k. \end{aligned}$$
    (13)
  4. 4.

    We can now conclude the proof. First we note that there is a countable collection \({{\mathcal {U}}}\) of bounded measurable selections of \(D(\cdot )\) such that for every \(u(\cdot )\in {{\mathcal {U}}}\) the function \(\max _{\Vert x\Vert \le \overline{\varepsilon }}L(t,x,u(t))\) is summable and for almost every t the set \(\{u(t): u(\cdot )\in {{\mathcal {U}}}\}\) is dense in D(t). Indeed, as \(D(\cdot )\) is a measurable set-valued mapping, there is a countable collection \(\{v_i(\cdot ),\; i=1,2,\ldots \}\) of measurable selections of \(D(\cdot )\) whose values for almost every t form a dense subset of D(t). Take \(\varepsilon =\overline{\varepsilon }/2\). Then \(\theta _i(t) = \max _{\Vert x\Vert \le \varepsilon }L(t,x,v_i(t))\) is finite-valued by (H\(_2\)). Hence, for every \(j=1,2,\ldots \) there is a subset \(\varDelta _{ij}\) of measure not smaller than \(T-j^{-1}\) on which \(v_i\) is bounded and \(\theta _i(\cdot )\) is summable. Set

    $$\begin{aligned} u_{ij}(t)=\left\{ \begin{array}{ll} v_i(t),&{}\quad \mathrm{if}\; t\in \varDelta _{ij};\\ 0,&{} \quad \mathrm{otherwise}.\end{array}\right. \end{aligned}$$

    Then \({{\mathcal {U}}}=\{u_{ij}:\; i,j=1,2,\ldots \}\) is a collection with the required properties.

As we have seen, for any finite collection U of \(u(\cdot )\in L^1\) with \(u(t)\in D(t)\) a.e. satisfying the conditions of Proposition 2.10, the set \(\Lambda (U)\) of triples \((p(\cdot ),q(\cdot ),\nu )\) satisfying (4), (5) and such that (13) holds for all \(u(\cdot )\in U\) is nonempty. As \(\Vert q(t)\Vert \le R(t)\) a.e. and the total variations of \(\nu \) are bounded by the Lipschitz constant of \(\varphi \), the sets \(\Lambda (U)\) are compact in the product of the weak\(^*\)-topologies for functions of bounded variation (that is, for \(p(\cdot )\) and \(\nu \)) and the weak \(L^1\)-topology (for \(q(\cdot )\)). It is also clear that \(\Lambda (U')\subset \Lambda (U)\) if \(U\subset U'\). Therefore, the intersection of all \(\Lambda (U)\) over finite \(U\subset {{\mathcal {U}}}\) is nonempty. It is also clear that (13) holds for the restrictions of \(u(\cdot )\) to any measurable subset of [0, T] (by which we mean the function equal to u(t) on the subset and to zero outside).

The standard measurable selection arguments now allow us to deduce that this may happen only if for almost every t the inequality \(L(t,0,u(t)) - L(t,0,0)-\langle p(t),u(t)\rangle \ge 0\) holds for every \(u(\cdot )\in {{\mathcal {U}}}\). This in turn implies (6), since for almost every t the values u(t), \(u(\cdot )\in {{\mathcal {U}}}\), are dense in D(t) and \(L(t,{\overline{x}}(t),\cdot )\) is continuous on D(t). This completes the proof of the theorem. \(\square \)

5 Optimal Control: Proof of Theorem 3.3

As was mentioned in the introduction, the reduction of the optimal control problem (OC) to the generalized Bolza problem (P) is based on a certain optimality alternative. Here is its statement.

Proposition 5.1

(Optimality alternative; [17], Theorem 7.39) Let X be a complete metric space and M a closed subset of X. Let also f be a function on X that attains a local minimum on M at some \({\overline{x}}\in M\). If f is Lipschitz near \({\overline{x}}\), then for any lsc nonnegative function \(\varphi \) on X equal to zero at \({\overline{x}}\) the following alternative holds: either there is a \(K>0\) such that \(d(x,M)\le K\varphi (x)\) for all x in a neighborhood of \({\overline{x}}\), so that \(f+K_0\varphi \) has a local minimum at \({\overline{x}}\) for some \(K_0>0\) (not smaller than K times the Lipschitz constant of f), or there is a sequence \((x_n)\rightarrow \overline{x}\) of elements of \(X\backslash M\) such that for each n the inequality \(\varphi (x) + n^{-1}d(x,x_n)\ge \varphi (x_n)\) holds for all \(x\in X\).

This fact, which is a simple consequence of Ekeland’s principle, was already used in [9]. The nice feature of the approach based on the optimality alternative is that there is actually no need to verify whether the distance estimate is satisfied or not.
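For the reader’s convenience, here is a sketch of the derivation. If no K as in the first option exists, there are \(y_n\rightarrow {\overline{x}}\) with \(n^2\varphi (y_n)< d(y_n,M)\); in particular, \(y_n\not \in M\) and \(\varphi (y_n)\rightarrow 0\). If \(\varphi (y_n)=0\), we can take \(x_n=y_n\). Otherwise, applying Ekeland’s principle to \(\varphi \) with \(\varepsilon =\varphi (y_n)\) and \(\lambda = n\varphi (y_n)\), we get \(x_n\) with \(\varphi (x_n)\le \varphi (y_n)\) and \(d(x_n,y_n)\le n\varphi (y_n)< n^{-1}d(y_n,M)\) (whence \(x_n\not \in M\) and \(x_n\rightarrow {\overline{x}}\)) such that \(\varphi (x)+n^{-1}d(x,x_n)\ge \varphi (x_n)\) for all \(x\in X\), which is precisely the second option.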

We shall also need the following well-known technical fact. Set for \(x(\cdot )\in C[0,T]\)

$$\begin{aligned} \eta (x(\cdot ))=\max _{t\in [0,T]}g(t,x(t)),\qquad \varDelta (x(\cdot ))=\{t\in [0,T]:\; g(t,x(t))=\eta (x(\cdot )) \}, \end{aligned}$$

and let \(\bar{\partial }_Cg(t,x)\) be the convex hull of the collection of \(u\in \mathbb {R}^n\) for which there are sequences \(t_m\rightarrow t\), \(x_m\rightarrow x\) and \(u_m\rightarrow u\) such that \(u_m\in \partial g(t_m,\cdot )(x_m)\).

Proposition 5.2

([19], Theorem 2.8.2; [21], Example 2.5.2) Assume (H\(_4\)). Then for any \(\nu \in \partial _C\eta ({\overline{x}}(\cdot ))\) there are a probability measure \(\mu \) supported on the set \(\varDelta ({\overline{x}}(\cdot ))\) and a \(\mu \)-measurable selection \(\gamma (\cdot )\) of the set-valued mapping \(t\rightarrow \bar{\partial }_Cg(t,\cdot )({\overline{x}}(t))\) such that \(\nu ({\mathrm{d}}t) = \gamma (t)\mu ({\mathrm{d}}t)\).

Recall that in the statement of the theorem we use a more precise object, namely, \(\partial _C^>\). It is an easy matter to see that

$$\begin{aligned} \partial _C^{>}g(t,x)= \mathrm{conv}~\{\limsup \bar{\partial }_Cg(t_m,x_m):\; t_m\rightarrow t,\ x_m\rightarrow x, \ g(t_m,x_m)> g(t,x) \}. \end{aligned}$$

So let us pass to the proof of the theorem. Again, we can assume that \({\overline{x}}(t)\equiv 0\). We also assume that \(\ell ({\overline{x}}(0),{\overline{x}}(T))=0\). Then (OC) can be equivalently rewritten as

$$\begin{aligned}&\mathrm{minimize}\quad \varphi (x(\cdot ))= \max \{\ell (x(0),x(T)),\displaystyle \max _t g(t,x(t)) \}, \\&\mathrm{s.t.}\; \dot{x}\in F(t,x),\; (x(0),x(T))\in S. \end{aligned}$$

We shall work with this form of (OC) in the proof. Finally, let V be a neighborhood of zero in \(W^{1,1}\) such that \(\varphi (x(\cdot ))\ge 0=\varphi (0)\) for all feasible \(x(\cdot )\in V\).

  1. 1.

    We shall first prove the theorem under the assumption that \(d(y,F(t,\cdot ))\) is \({\overline{R}}(t)\)-Lipschitz for all \(y\in \mathbb {R}^n\) (which is the case when \(F(t,\cdot )\) is \({\overline{R}}(t)\)-Lipschitz in the Hausdorff metric). Set

    $$\begin{aligned} \psi (x(\cdot ))= d((x(0),x(T)),S)+\int _0^Td(\dot{x}(t),F(t,x(t))){\mathrm{d}}t. \end{aligned}$$

    Clearly, \(\psi \) is nonnegative and \(\psi (0)=0\), so we can apply the optimality alternative. Denote by M the feasible set in (OC): \(M=\{x(\cdot )\in W^{1,1}:\; \psi (x(\cdot ))=0 \}\). Then either there is a \(K>0\) such that \(d(x(\cdot ),M)\le K\psi (x(\cdot ))\) for all \(x(\cdot )\) in a neighborhood of zero, so that there is a \(K_0\) such that \(\varphi +K_0\psi \) attains an unconditional local minimum at zero (regular case), or there are \(x_m(\cdot )\in W^{1,1}\), \(m=1,2,\ldots \), converging to zero and such that \(\psi (x_m(\cdot ))>0\) and

    $$\begin{aligned}&\psi (x(\cdot )) +m^{-1}(\Vert x(0)-x_m(0)\Vert +\int _0^T\Vert \dot{x}(t)-\dot{x}_m(t)\Vert {\mathrm{d}}t)>\psi (x_m(\cdot )),\\&\quad \forall \; x(\cdot )\ne x_m(\cdot ). \end{aligned}$$

    (singular case). In the regular case, we shall actually consider the problem with slightly different cost functions:

    $$\begin{aligned} \varphi _{m}(x(\cdot ))= \max \{\ell (x(0),x(T))+m^{-2},\displaystyle \max _t g(t,x(t)) \},\quad m=1,2,\ldots . \end{aligned}$$

    Then \(0=\varphi (0)\le \varphi (x(\cdot ))\le \varphi _m(x(\cdot ))\le \varphi (x(\cdot ))+ m^{-2}\) for all feasible \(x(\cdot )\in V\). If m is so large that V contains the \(m^{-1}\)-ball around zero, then by Ekeland's principle there is an \(x_{m}(\cdot )\in W^{1,1}\) feasible in (OC) and such that \(\Vert x_{m}(\cdot )\Vert _{1,1}\le m^{-1}\) and

    $$\begin{aligned} \varphi _{m}(x(\cdot )) + m^{-1}\Vert x(\cdot )- x_{m}(\cdot )\Vert _{1,1}\ge \varphi _{m}(x_{m}(\cdot ))=a_{m}\quad \end{aligned}$$

    for all feasible \(x(\cdot )\in V\). Clearly, \(a_{m}>0\) for all sufficiently large m. Otherwise, \(\ell (x_{m}(0),x_{m}(T))\) would be strictly smaller than zero and \(g(t,x_{m}(t))\le 0\) for all t, which contradicts our assumption that the minimal value of the cost function on V in the original formulation of (OC) is zero. Since \(x_{m}(\cdot )\rightarrow 0\) as \(m\rightarrow \infty \), the inequality \(d(x(\cdot ),M)\le K\psi (x(\cdot ))\) holds for \(x(\cdot )\) in a neighborhood of \(x_{m}(\cdot )\). The Lipschitz constants of \(\varphi \) and \(\varphi _m\) coincide; hence, \(x_{m}(\cdot )\) is an unconditional local minimum of \(\varphi _{m}+K_0\psi \). Summarizing, we conclude that there is a \(\lambda _0\in \{0,1\}\) (\(\lambda _0=1\) in the regular case and \(\lambda _0=0\) otherwise) such that for sufficiently large \(m>0\) the functional

    $$\begin{aligned} J_{m}(x(\cdot ))= & {} \lambda _0\varphi _{m}(x(\cdot ))+K_0\Big (\displaystyle \int _0^Td(\dot{x}(t), F(t,x(t))){\mathrm{d}}t+ d((x(0),x(T)),S) \Big )\\&+\, m^{-1}\Big (\Vert x(0)-x_{m}(0)\Vert +\displaystyle \int _0^T\Vert \dot{x}(t)-\dot{x}_{m}(t)\Vert {\mathrm{d}}t \Big ) \end{aligned}$$

    attains a local minimum at some \(x_{m}(\cdot )\) (feasible in the regular case and not belonging to M in the singular case) with \(\Vert x_{m}(\cdot )\Vert _{1,1}<m^{-1}\). In either case, it is an easy matter to see that \(J_m\) satisfies the hypotheses (H\(_1\)) and (H\(_2\)), and we can apply Theorem 3.1. Let us start with the regular case \(\lambda _0=1\). Note first that, as follows from Propositions 2.3 and 5.2, for any \(\nu \) belonging to the G-subdifferential of \(\varphi _m\) at \(x(\cdot )\) there are \(\lambda \in [0,1]\), a positive measure \(\mu \) with \(\mu ([0,T])=1-\lambda \) supported on the set \(\varDelta _m\) where \(g(t,x_m(t))=\varphi _m(x_m(\cdot ))\), a \(\mu \)-measurable selection \(\gamma (t)\) of the set-valued mapping \(\bar{\partial }_Cg(t,\cdot )(x(t))\) and a pair of vectors \((g,h)\) in the subdifferential of \(\lambda \ell (\cdot )+ d(\cdot ,S)\) at (x(0), x(T)) such that for any continuous \(\mathbb {R}^n\)-valued u(t)

    $$\begin{aligned} \int _0^Tu(t)\nu ({\mathrm{d}}t)=\lambda (\langle h,u(T)\rangle +\langle g,u(0)\rangle )+\int _0^T\langle u(t),\gamma (t)\rangle \mu ({\mathrm{d}}t). \end{aligned}$$

    Thus, applying Theorem 3.1 to \(J_{m}\) and \(x_{m}(\cdot )\), we shall find \(\lambda _m\in [0,1]\), \(g_m,h_m,\mu _m,\gamma _m(\cdot )\), a function \(p_{m}(\cdot )\) of bounded variation and a summable \(q_{m}(\cdot )\) satisfying \(\Vert q_m(t)\Vert \le {\overline{R}}(t)\) a.e. such that \((g_m,h_m)\in \lambda _m\partial (\lambda _0\ell +K_0d(\cdot ,S))(x_m(0),x_m(T))\), \(\mu _m\) is a positive measure with \(\mu _m([0,T])=1-\lambda _m\) supported on the set \(\varDelta (x_m(\cdot ))\) on which \(g(t,x_m(t))\) attains its maximum, \(\gamma _m(t)\in {\bar{\partial }}_Cg(t,\cdot )(x_m(t))\) and the relations

    $$\begin{aligned} \begin{aligned}&q_m(t)\in \mathrm{conv}~\{q:\; (q,p_m(t))\in \partial L_m(t,\cdot ,\cdot )(x_m(t),\dot{x}_m(t))\};\\&p_m(t) = -h_m-\displaystyle \int _t^Tq_m(s){\mathrm{d}}s-\displaystyle \int _t^T\gamma _m(s)d\mu _m(s),\quad p_m(0)= g_m;\\&\langle p_m(t),u-\dot{x}_m(t)\rangle \ge 0, \quad \mathrm{for\ all }\; u\in F(t,x_m(t)) \end{aligned} \end{aligned}$$
    (14)

    hold for almost every t. Here we have set \(L_m(t,x,y)= K_0d(y,F(t,x))+m^{-1}\Vert y-\dot{x}_m(t)\Vert \). We may assume that \(\lambda _m\) and \((g_m,h_m)\) converge to some \(\lambda \in [0,1]\) and \((g,h)\in \lambda \partial (\ell +d(\cdot ,S))(0,0)\). Furthermore, since \(g(t,x(t))\) is upper semicontinuous, the sets \(\varDelta (x_{m}(\cdot ))\) are closed and the excess of \(\varDelta (x_{m}(\cdot ))\) over \(\varDelta (0)\) goes to zero as \(x_m(\cdot )\) goes to zero uniformly. Therefore, the \(\mu _{m}\) converge weak\(^*\) (along a subsequence) to a nonnegative measure \(\mu \) supported on \(\varDelta (0)\) and such that \(\mu ([0,T])=1-\lambda \), whence (i). By (H\(_4\)), \(\Vert \gamma _{m}(t)\Vert \le K\), so the sequence \((\gamma _m(\cdot ))\) is weakly compact in \(L^1\), which implies (in view of the upper semicontinuity of generalized gradients of a Lipschitz function) that for almost every t the value at t of any weak limit point of this sequence is contained in \(\bar{\partial }_Cg(t,\cdot )(0)\). If \(\mu _m\ne 0\), we necessarily have \(\max _tg(t,x_{m}(t))=a_{m}>0\), whence the appearance of \(\partial _C^{>}g(t,\cdot )(0)\) in the limit. The sequence \((q_m(\cdot ))\) is weakly compact in \(L^1\) (as the \(q_m(\cdot )\) are bounded by a single summable function), and we can assume that it converges weakly to some \(q(\cdot )\), in which case \(p_m(\cdot )\) converge almost everywhere to

    $$\begin{aligned} p(t)= -h -\int _t^Tq(s){\mathrm{d}}s -\int _t^T\gamma (s)\mu ({\mathrm{d}}s) \end{aligned}$$

    with \(p(0)=g\). We have \(p(T)= -h-\gamma (T)\mu (\{T\})\), and (ii) follows from the second relation in (14) (recall that the normal cone to a set is generated by the subdifferential of the distance function to the set at the same point). As \(\Vert \dot{x}_m(t)\Vert \rightarrow 0\) almost everywhere, the last relation in (14) implies (iv). Finally, a certain sequence of convex combinations of \(q_m(\cdot )\) converges to \(q(\cdot )\) almost everywhere, and, as follows from the first relation in (14), \(q(t)\in \mathrm{conv}~\{q:\; (q,p(t))\in \partial L(t,\cdot ,\cdot )(0,0) \}\), with \(L(t,x,y)=K_0d(y,F(t,x))\), whence (iii). This completes the proof in the regular case. In the singular case, when \(\lambda _0=0\), the same arguments work in a substantially simpler situation, as the off-integral term reduces to \(d((x(0),x(T)),S)+m^{-1}\Vert x(0)-x_m(0)\Vert \). The only difference is the proof of (i). In the singular case, \(\psi (x_m(\cdot ))>0\). This means that for any m either \(d((x_m(0),x_m(T)),S)>0\), in which case \(\max \{\Vert g_m\Vert ,\Vert h_m\Vert \}=1\), or \(d(\dot{x}_m(t),F(t,x_m(t)))>0\) on a set of positive measure, in which case \(\Vert p_m(t)\Vert \ge 1-m^{-1}\) at some point. Thus, in either case, \(\Vert p_m(t)\Vert \ge 1-m^{-1}\) at least at one point. On the other hand, as \(\mu _m=0\) in the singular case, the \(p_m(\cdot )\) are continuous functions converging uniformly, so that \(p(\cdot )\ne 0\). This completes the proof of the theorem for the case when \(d(y,F(t,\cdot ))\) is \({\overline{R}}(t)\)-Lipschitz for any y.
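    A word about the convex combinations used above: this is Mazur's lemma. Since the \(q_m(\cdot )\) converge weakly in \(L^1\) to \(q(\cdot )\), there are convex combinations

    $$\begin{aligned} \tilde{q}_m(\cdot )=\sum _{k\ge m}\alpha _k^m q_k(\cdot ),\qquad \alpha _k^m\ge 0,\quad \sum _{k\ge m}\alpha _k^m=1, \end{aligned}$$

    converging to \(q(\cdot )\) in the norm of \(L^1\), hence almost everywhere along a subsequence; together with the convexity of the set on the right-hand side of the first relation in (14) and the upper semicontinuity of the subdifferential, this yields the inclusion for q(t) stated above.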

  2.

    Let us turn to the proof under the stated conditions. First we apply Proposition 2.7 to \(F=F(t,\cdot )\), \(C=\{0\}\), \(r={\overline{r}}(t)\), \(R={\overline{R}}(t)\), \(\varepsilon =\overline{\varepsilon }\) and \(\eta \in [\bar{\eta },1)\). Let \(\Gamma _0(t,x)\) be the corresponding mapping into \(\mathbb {R}\times \mathbb {R}^n\). It is obviously measurable. By (H\(_5\)), \({\overline{r}}(t)\) is bounded away from zero, that is, there is an \({\overline{r}}>0\) such that \({\overline{r}}(t)\ge {\overline{r}}\) for almost all t. Fix some positive \(r<{\overline{r}}\), \(R>0\), \(\eta \in [\bar{\eta },1)\) and \(\varepsilon \in (0,\overline{\varepsilon })\), and let C(t) be the set of all \(u\in F(t,0)\) such that (9’) holds with the chosen \(r,R,\eta \) and \(\varepsilon \). Clearly, C(t) is a closed set (which in principle can be empty). Again, it is not difficult to verify that \(C(\cdot )\) is a measurable mapping. (Consider the functions

    $$\begin{aligned} g(t,u)= & {} \sup \{\mathrm{ex}(F(t,x)\cap B(u,r),F(t,x')+R\Vert x'-x\Vert B):\; x,x'\in \varepsilon B\};\\ h(t,u)= & {} \max \{ d(F(t,x),B(u,(1-\eta )r)):\; x\in \varepsilon B\} \end{aligned}$$

    (where \(\mathrm{ex}(A,B)\) is the excess of A over B and d(A, B) is the distance between A and B) and the sets of \(u\in F(t,0)\) where both are equal to zero.)
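    In other words, C(t) admits the representation (a standard measurability argument, recorded here for completeness)

    $$\begin{aligned} C(t)=F(t,0)\cap \{u:\; g(t,u)=0\}\cap \{u:\; h(t,u)=0\}, \end{aligned}$$

    and since g and h are measurable in t and continuous in u, C(t) is an intersection of measurable set-valued mappings with closed values and is therefore measurable.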

We shall slightly modify C(t): let \(C'(t)=\{0\}\) for t in a set \(Q_{\varepsilon }\) of small positive measure, say smaller than \(\varepsilon \), and \(C'(t)=C(t)\) otherwise. Let \(\Gamma '(t,x)\) be the mapping into \(\mathbb {R}\times \mathbb {R}^n\) corresponding to \(C'(t)\) and the chosen \(r,R,\varepsilon \) and \(\eta \) according to Proposition 2.7. Set \(\Gamma (t,x)=\Gamma _0(t,x)\cup \Gamma '(t,x)\) and consider the problem

$$\begin{aligned} \begin{array}{l} \mathrm{minimize}\quad \varphi (x(\cdot )),\\ \mathrm{s.t.}\quad (\dot{\xi },\dot{x})\in \Gamma (t,x),\; (x(0),x(T))\in S,\; \xi (0)=0,\; \xi (T)=T. \end{array} \end{aligned}$$

Then \(\xi (t)=t\) for any feasible pair \((\xi (\cdot ),x(\cdot ))\), which means that \(x(\cdot )\) is feasible in (OC). Hence, (t, 0) is a local minimum in the problem, and we can apply the result obtained in the first part of the proof to get a necessary optimality condition. It follows that there are \(\lambda \ge 0\), a nonnegative measure \(\mu \) supported on the set \(\varDelta (0)\), a \(\mu \)-measurable \(\gamma (t)\in \partial _C^>g(t,0)\), an \(\mathbb {R}^{n+1}\)-valued function \((\kappa (t),p(t))\) of bounded variation and a summable \((\omega (t),q(t))\) such that

$$\begin{aligned} \begin{aligned}&\lambda +\Vert (\kappa (\cdot ),p(\cdot ))\Vert +\mu ([0,T])=1;\\&(p(0),-p(T)-\gamma (T)\mu (\{T\}))\in \lambda \partial \ell (0,0)+ N(S,(0,0)); \end{aligned} \end{aligned}$$
(15)

and almost everywhere on [0, T]

$$\begin{aligned} \begin{aligned}&\dot{\kappa }(t)=\omega (t),\quad {\mathrm{d}}p(t)= q(t){\mathrm{d}}t+\gamma (t)\mu ({\mathrm{d}}t);\\&(\omega (t),q(t))\in \mathrm{conv}~\{(\omega ,q):\; ((\omega ,q),(\kappa (t),p(t)))\in \partial G(t,\cdot ,\cdot )((t,0),(1,0))\};\\&\omega (t)(\chi -1) + \langle p(t),u\rangle \le 0,\quad \forall \; (\chi ,u)\in \Gamma (t,0), \end{aligned} \end{aligned}$$
(16)

where \(G(t,(\xi ,x),(\sigma ,y))=K_0d((\sigma ,y),\Gamma (t,(\xi ,x)))\) and \(K_0\) is a certain positive number. Since \(\Gamma \) does not depend on \(\xi \), we deduce that \(\omega (t)\equiv 0\), and therefore \(\kappa (t)\equiv \mathrm{const}=\kappa \). We claim that actually \(\kappa =0\) and the second relation in (16) reduces to

$$\begin{aligned} q(t)\in \mathrm{conv}~\{q:\; (q,p(t))\in K_0\partial L(t,\cdot ,\cdot )(0,0)\}, \end{aligned}$$
(17)

where we have set \(L(t,x,y)= d(y,F(t,x))\). This would prove the theorem for U(t) replaced by the union of \({\overline{r}}(t) B\) and C(t).

Let \(t\in Q_{\varepsilon }\) and \(((0,q),(\kappa ,p(t)))\in \partial G(t,\cdot ,\cdot )((t,0),(1,0))\). Recall that \(\Gamma (t,\cdot )=\Gamma _0(t,\cdot )\) for such t. Choose further small positive \(\delta \) and \(\theta \) to ensure that \((1-\eta ){\overline{r}}>3\delta \) and \(\theta {\overline{R}}(t) <\delta \). The latter implies by (H\(_5\)) that \(d(0,F(t,x))<\delta \) if \(\Vert x\Vert <\theta \).

We have \(d((\sigma ,y),\Gamma _0(t,x))= \inf \{|\sigma -\nu |+\Vert y-\nu z\Vert :\; z\in F(t,x),\; \Vert z\Vert \le (1-\sigma \nu ){\overline{r}}(t) \}\). However, if \(1-\sigma <\delta \), \(\Vert y\Vert <\delta \), \(\Vert x\Vert <\theta \), then the inequality \(\Vert z\Vert \le (1-\sigma \nu ){\overline{r}}(t)\) is automatically satisfied for \((\nu ,z)\) realizing the infimum. Hence, \(d((\sigma ,y),\Gamma _0(t,x))= \inf \{|\sigma -\nu |+\Vert y-\nu z\Vert :\; z\in F(t,x) \}\). On the other hand, we can be sure that \(d((y/\nu ),F(t,x))<1\) for \((\nu ,y)\) close to (1, 0). Therefore, the infimum is attained with \(\nu =\sigma \) and is equal to \(\nu d((y/\nu ),F(t,x))= \nu L(t,x,y/\nu )\).
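One way to check the last claim (a sketch based on our reading of the preceding estimates): at almost every point of differentiability of the Lipschitz function \(\nu \mapsto \nu d(y/\nu ,F(t,x))\) we have, with some \(\zeta \), \(\Vert \zeta \Vert \le 1\),

$$\begin{aligned} \Big |\frac{{\mathrm{d}}}{{\mathrm{d}}\nu }\big (\nu \, d(y/\nu ,F(t,x))\big )\Big | =\big |d(y/\nu ,F(t,x))-\langle \zeta ,y/\nu \rangle \big | \le d(y/\nu ,F(t,x))+\Vert y\Vert /\nu <1 \end{aligned}$$

for \((\nu ,y)\) close to (1, 0) (as \(d(0,F(t,x))<\delta \)). Hence \(\nu \mapsto \nu d(y/\nu ,F(t,x))\) has Lipschitz constant smaller than one near \(\nu =\sigma \), and \(|\sigma -\nu |+\nu d(y/\nu ,F(t,x))\) can only grow as \(\nu \) moves away from \(\sigma \).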

Set \(g_t(\nu ,x,y)= \nu L(t,x,y/\nu )\). We can view \(g_t\) as a composition \(h_t\circ A\), where \(h_t(\nu ,x,y)= \nu L(t,x,y)\) and \(A(\nu ,x,y)= (\nu ,x,y/\nu )\) is a smooth map. Applying Proposition 2.5 (with a reference to Remark 2.2), we get the equality \(\partial g_t(1,0,0)= \{0\}\times \partial L(t,\cdot ,\cdot )(0,0)\), which proves the claim.
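In more detail (our computation): \(A(1,0,0)=(1,0,0)\), and the derivative of A at this point is the identity because

$$\begin{aligned} \frac{\partial }{\partial \nu }\Big (\frac{y}{\nu }\Big )\Big |_{\nu =1,\,y=0}=-\frac{y}{\nu ^2}\Big |_{y=0}=0, \qquad \frac{\partial }{\partial y}\Big (\frac{y}{\nu }\Big )\Big |_{\nu =1}=I. \end{aligned}$$

So the chain rule gives \(\partial g_t(1,0,0)=\partial h_t(1,0,0)\), while the product rule applied to \(h_t(\nu ,x,y)=\nu L(t,x,y)\) at a point where \(L(t,0,0)=d(0,F(t,0))=0\) leaves only \(\{0\}\times \partial L(t,\cdot ,\cdot )(0,0)\).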

Denote by \(\Lambda \) the collection of Lagrange multipliers \(\lambda ,p(\cdot ),q(\cdot ),\nu \) (where \(\nu ({\mathrm{d}}t)=\gamma (t)\mu ({\mathrm{d}}t)\)) satisfying (15) (with \(\kappa (\cdot )=0\)) and (17). By Proposition 2.9, \(d|p|(t)\le {\overline{R}}(t)|p|(t){\mathrm{d}}t+d|\nu |(t)\), where |p| and \(|\nu |\) stand for the variations of p and \(\nu \). The total variation of \(\nu \) is bounded by the Lipschitz constant of \(\varphi \); hence, by Gronwall's lemma, the total variation of \(p(\cdot )\) is also bounded and (again by Proposition 2.9) \(\Vert q(t)\Vert \le {\overline{R}}(t)|p|([0,T])\). This means that \(\Lambda \) is a compact set if \(p(\cdot )\) and \(\nu \) are considered with the weak\(^*\)-topology of measures and \(q(\cdot )\) with the weak topology of \(L^1\).
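For completeness, here is the estimate behind the reference to Gronwall's lemma, written, under our reading of the above inequality for \(d|p|\), in integral form:

$$\begin{aligned} \Vert p(t)\Vert \le \Vert p(T)\Vert +|\nu |([0,T])+\int _t^T{\overline{R}}(s)\Vert p(s)\Vert {\mathrm{d}}s, \end{aligned}$$

whence, by the backward form of Gronwall's lemma,

$$\begin{aligned} \Vert p(t)\Vert \le \big (\Vert p(T)\Vert +|\nu |([0,T])\big )\exp \Big (\int _0^T{\overline{R}}(s){\mathrm{d}}s\Big ), \end{aligned}$$

and \({\overline{R}}(\cdot )\) is summable, so \(\Vert p(\cdot )\Vert \), and with it the total variation of p, is bounded by a constant independent of the multipliers.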

Let now \(r_i\rightarrow 0\), \(\eta _i\rightarrow 0\) (both decreasingly), \(R_i\rightarrow \infty \) (increasingly) and \(\varepsilon _i\rightarrow 0\) (decreasingly) satisfy \(\varepsilon _iR_i<(1-\eta _i)r_i\), and let \(C_i'(t)\) be the corresponding subsets of F(t, 0). If the \(Q_{\varepsilon _i}\) are chosen to guarantee that \(Q_{\varepsilon _{i+1}}\subset Q_{\varepsilon _i}\) (which is of course possible), then \(C_i(t)\subset C_{i+1}(t)\) and every \(u\in U(t)\) belongs to \(C_i'(t)\) if i is sufficiently large. As we have seen, for any i the set \(\Lambda _i\) of multipliers guaranteeing that the conclusion of the theorem holds with U(t) replaced by \(({\overline{r}}(t)B)\cup C_i(t)\) in (iv) is nonempty. Since \(\Lambda _{i+1}\subset \Lambda _i\) and all \(\Lambda _i\) are compact, \(\cap \Lambda _i\ne \emptyset \). This completes the proof of the theorem.

Remark 5.1

The only purpose of introducing the functionals \(\varphi _m\) in the regular case was to get more precise information about the location of the values of \(\gamma (\cdot )\). If we know a priori that \(\overline{\partial }_C g(t,\cdot )\) and \(\partial _C^>g(t,\cdot )\) coincide (e.g., when \(g(t,\cdot )\) is differentiable and the derivative is continuous in both variables), the proof becomes noticeably shorter.

6 An Open Problem

If we know a priori that (OC) is regular, that is, the distance to the feasible set from any \(x(\cdot )\) in a \(W^{1,1}\)-neighborhood of \({\overline{x}}(\cdot )\) is majorized by \(K(\int _0^Td( \dot{x}(t),F(t,x(t))){\mathrm{d}}t + d((x(0),x(T)),S))\), then the conclusion of the theorem can be significantly strengthened. Indeed, regularity would allow us to reduce the problem to unconstrained minimization of

$$\begin{aligned} \varphi (x(\cdot )) + K_0\left( \int _0^Td(\dot{x}(t),F(t,x(t))){\mathrm{d}}t + d((x(0),x(T)),S)\right) . \end{aligned}$$

Applying Theorem 3.1 to this functional, we would deduce that the conclusion of the theorem is valid with U(t) replaced by the set of \(u\in F(t,{\overline{x}}(t))\) such that the function \(d(u,F(t,\cdot ))\) is continuous in the \(\overline{\varepsilon }\)-neighborhood of \({\overline{x}}(t)\). The question is whether such a replacement can also be justified in the singular case. A positive answer would allow us to extend (iv) to those \(u\in F(t,{\overline{x}}(t))\) for which the function \(d(u,F(t,\cdot ))\) is continuous near \({\overline{x}}(t)\).

A similar extension is possible for optimal control problems in the classical Pontryagin form, in which the right-hand side of the equation is continuous with respect to the state variable (see, e.g., [22] and references therein).

7 Conclusions

As was stressed in the introduction, the purpose of the paper has been not just to prove new and stronger necessary optimality conditions but also to show how a heavily constrained optimal control problem can be equivalently reduced to an unconstrained Bolza problem with a simply structured integrand and off-integral term. It should be emphasized once again that such a reduction can be performed and effectively used for further analysis only within the framework of modern variational analysis, which makes it possible to work with nonsmoothness and set-valuedness. I believe that the approach has strong potential and can be effectively used far beyond first-order optimality conditions, in particular, to study second-order conditions for a strong minimum in optimal control and the Hamilton–Jacobi theory.