1 Introduction

In this paper, we solve a fundamental question in optimization theory concerning different criteria for regularity in a calculus of variations context and some of their consequences for first- and second-order necessary conditions. In particular, our main result allows us to weaken the usual assumption of strong normality found in the literature while preserving the corresponding conditions on the sets of extremals and critical directions.

The calculus of variations problem we shall be concerned with is a fixed endpoint problem of Lagrange posed over piecewise smooth functions and involving inequality and equality isoperimetric constraints. In a recent paper (see [1]), we studied different assumptions found in the literature under which, for that problem, second-order necessary conditions are well known, and showed that all those assumptions are equivalent to a form of strong normality defined in [2]. In this paper, we shall prove, in particular, that those second-order conditions for the problem under consideration can be derived under a much weaker assumption.

One can find in the literature different references treating this and more general problems, some set in an optimal control context. In particular, let us mention some differences between our main result, which corresponds to an improvement of Theorem 2.2, and the fundamental research developed in [3,4,5,6,7,8]. In those references, one of the main distinctive features is that the necessary conditions obtained make sense, and are derived, without a priori assumptions on the normality of the extremal under consideration. The excellent survey given in [5] provides a full account of these results as well as others related to the inverse function theorem. The paper [6] continues an investigation begun in [3, 4, 8], extending some of the results to the case of mixed constraints. On the other hand, the second-order conditions derived in those references (see, for example, [5, Theorem 2.1]) are expressed in terms of the maximum of a quadratic form, over certain Lagrange multipliers, for all elements of the so-called critical cone which, for our problem, corresponds to the set of tangential constraints with respect to the original set of inequality and equality constraints. In contrast, Theorem 2.4 provides second-order conditions on the set of tangential constraints with respect to a subset of the previous one, under an assumption of normality relative not to a set defined only by equality constraints for active indices, as in Theorem 2.2, but to a set which properly contains it.

Other references in the subject provide different approaches to the derivation of second-order necessary conditions. Some deal with implicit function theorems (see, for example, [2, 9,10,11,12]), the maximum of a quadratic form for different types of minima [13,14,15], or new notions of conjugacy and critical directions [16,17,18,19].

The paper is organized as follows. In Sect. 2, we pose the problem we shall deal with and state two well-known results on first- and second-order necessary conditions. Both results require a standard assumption of “strong normality” and, based on a particular notion of regularity, we show how they can be easily established. In the same section, we introduce a more general notion of normality and state that the previous, standard second-order conditions hold under the weaker assumption of normality relative to a subset of the set of isoperimetric constraints, one which takes into account the sign of the corresponding Lagrange multipliers. This is the main result of the paper. A result on necessary conditions in terms of a more general notion of regularity is then proved and, as explained at the end of that section, our main result turns out to be a simple consequence of the fact that regularity relative to a set S is implied by normality relative to the same set. Section 3 is devoted to proving this fact by introducing the notion of properness relative to a set. Finally, in Sect. 4, we provide a simple example which illustrates the usefulness of our result and for which the classical theory on second-order conditions cannot be applied.

2 The Problem and Main Results

Let us begin by stating the problem. The given data correspond to an interval \(T:=[t_0,t_1]\) in \(\mathbb {R}\), two points \(\xi _0\), \(\xi _1\) in \(\mathbb {R}^n\), functions L and \(L_\gamma \) mapping \(T \times \mathbb {R}^n\times \mathbb {R}^n\) to \(\mathbb {R}\), and scalars \(b_\gamma \) in \(\mathbb {R}\) \((\gamma =1,\ldots ,q)\). Denote by X the space of piecewise \(C^1\) functions mapping T to \(\mathbb {R}^n\), called arcs or trajectories, and let \(X_e\) be the set of all arcs x satisfying the endpoint constraints \(x(t_0)=\xi _0\) and \(x(t_1)=\xi _1\). The admissible arcs are the elements of

$$\begin{aligned} S:=\{x \in X_e : I_\alpha (x) \le 0 \ (\alpha \in R),\ I_\beta (x) = 0 \ (\beta \in Q)\}, \end{aligned}$$

where \(R=\{ 1,\ldots ,r \}\), \(Q=\{ r+1,\ldots ,q \}\), and

$$\begin{aligned} I_\gamma (x) = b_\gamma + \int _{t_0}^{t_1}L_\gamma (t,x(t),{\dot{x}}(t))\mathrm{d}t \quad (x \in X,\ \gamma =1,\ldots ,q). \end{aligned}$$

The problem we shall deal with, which we label (P), is that of minimizing I on S, where

$$\begin{aligned} I(x) = \int _{t_0}^{t_1}L(t,x(t),{\dot{x}}(t))\mathrm{d}t \quad (x \in X). \end{aligned}$$

An arc x is said to solve (P), if x is admissible and \(I(x) \le I(y)\) for all admissible arcs y. For any \(x \in X\), the notation \(({{\tilde{x}}}(t))\) represents \((t,x(t),{\dot{x}}(t))\) and we assume that L, \(L_\gamma \) are \(C^2\).

Given an arc x consider the first variation of I along x given by

$$\begin{aligned} I^\prime (x;y):= \int _{t_0}^{t_1}\{ L_x({{\tilde{x}}}(t))y(t) + L_{\dot{x}}({{\tilde{x}}}(t)){\dot{y}}(t) \} \mathrm{d}t \quad (y \in X) \end{aligned}$$

and the second variation of I along x given by

$$\begin{aligned} I^{\prime \prime }(x;y):= \int _{t_0}^{t_1}2\varOmega (t,y(t),{\dot{y}}(t))\mathrm{d}t \qquad (y \in X) \end{aligned}$$

where, for all \((t,y,{\dot{y}}) \in T \times \mathbb {R}^n\times \mathbb {R}^n\),

$$\begin{aligned} 2\varOmega (t,y,{\dot{y}}) :=\langle y , L_{xx}({{\tilde{x}}}(t))y \rangle + 2\langle y , L_{x{\dot{x}}}({{\tilde{x}}}(t)){\dot{y}}\rangle + \langle {\dot{y}}, L_{{\dot{x}}{\dot{x}}}({{\tilde{x}}}(t)){\dot{y}}\rangle . \end{aligned}$$

The first and second variations of other integrals such as \(I_\gamma \) are defined in a similar way. Define the set of admissible variations as

$$\begin{aligned} Y:= \{ y \in X : y(t_0) = y(t_1) = 0 \}. \end{aligned}$$
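For concreteness, the functionals and their variations above can be approximated numerically. The following sketch is illustrative only: it uses a hypothetical one-dimensional instance (the integrand, arc, and variation below are placeholders, not the paper's abstract data), computing the variations as derivatives of \(\epsilon \mapsto I(x+\epsilon y)\) by central differences.

```python
import math

# Illustrative only: a hypothetical instance with n = 1 and T = [0, 1].

def trap(f, lo, hi, n=20000):
    """Composite trapezoid rule, standing in for the integrals I, I_gamma."""
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

def I(x, xdot, L):
    """I(x) = integral over T of L(t, x(t), xdot(t)) dt."""
    return trap(lambda t: L(t, x(t), xdot(t)), 0.0, 1.0)

def first_variation(x, xdot, y, ydot, L, h=1e-3):
    """I'(x; y) = d/d(eps) of I(x + eps*y) at eps = 0 (central difference)."""
    Ip = I(lambda t: x(t) + h * y(t), lambda t: xdot(t) + h * ydot(t), L)
    Im = I(lambda t: x(t) - h * y(t), lambda t: xdot(t) - h * ydot(t), L)
    return (Ip - Im) / (2 * h)

def second_variation(x, xdot, y, ydot, L, h=1e-3):
    """I''(x; y) = d^2/d(eps)^2 of I(x + eps*y) at eps = 0 (central difference)."""
    Ip = I(lambda t: x(t) + h * y(t), lambda t: xdot(t) + h * ydot(t), L)
    Im = I(lambda t: x(t) - h * y(t), lambda t: xdot(t) - h * ydot(t), L)
    return (Ip - 2 * I(x, xdot, L) + Im) / h ** 2

# Placeholder data: L = xdot^2/2, arc x(t) = t, variation y(t) = t(1-t) in Y
L_ = lambda t, x, xd: 0.5 * xd ** 2
fv = first_variation(lambda t: t, lambda t: 1.0,
                     lambda t: t * (1 - t), lambda t: 1 - 2 * t, L_)
sv = second_variation(lambda t: t, lambda t: 1.0,
                      lambda t: t * (1 - t), lambda t: 1 - 2 * t, L_)
# Analytically, I'(x; y) = int_0^1 ydot dt = 0 and I''(x; y) = int_0^1 ydot^2 dt = 1/3
```

The computed values agree with the closed forms obtained from the formulas for \(I^\prime\) and \(2\varOmega\) above.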

2.1 Strong Normality and p-Regularity

For first- and second-order necessary conditions, one is usually interested in proving linear independence on Y of the first variations \(I^\prime _\gamma (x_0;y)\) (for all \(\gamma \in I_a (x_0) \cup Q\)) of \(I_\gamma \) along an admissible arc \(x_0\), where

$$\begin{aligned} I_a(x_0) = \{ \alpha \in R : I_\alpha (x_0) = 0 \} \end{aligned}$$

denotes the set of active indices at \(x_0\). This is equivalent to the existence of \(y_\gamma \in Y\) \((\gamma \in I_a(x_0) \cup Q)\) such that the matrix of first variations is nonsingular, that is,

$$\begin{aligned} \det \bigl ( I^\prime _\beta (x_0;y_\gamma ) \bigr ) \not = 0 \quad (\beta ,\gamma \in I_a(x_0) \cup Q). \end{aligned}$$

Let us call this property strong normality or, as it will become apparent from the theory to follow, normality relative to \(S_0\), where

$$\begin{aligned} S_0:= S_0(x_0) = \{ x \in X_e : I_\gamma (x) = 0 \ (\gamma \in I_a(x_0) \cup Q) \}. \end{aligned}$$

Definition 2.1

We shall call a pair \((x_0,\lambda ) \in S \times \mathbb {R}^q\) an extremal, if the following conditions hold:

  i. \(\lambda _\alpha \ge 0\) and \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. If \(J(x):= I(x) + \sum _1^q \lambda _\gamma I_\gamma (x)\), then \(J^\prime (x_0;y)=0\) for all \(y \in Y\).

Condition (ii), as it is well known, is equivalent to the existence of \(c \in \mathbb {R}^n\) such that

$$\begin{aligned} F_{\dot{x}}({{\tilde{x}}}(t)) = \int _{t_0}^t F_x({{\tilde{x}}}(s))\mathrm{d}s + c \quad (t \in T) \end{aligned}$$

where \(F = L + \sum _1^q \lambda _\gamma L_\gamma \) (see [2]). Denote by \({{\mathcal {E}}}\) the set of all extremals.
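The integrated form of condition (ii) can be checked directly along a candidate extremal. A minimal numerical sketch, for a hypothetical integrand \(F(t,x,{\dot{x}}) = {\dot{x}}^2/2 + x\) (chosen only for illustration, not the paper's data; its Euler equation is \({\ddot{x}} = 1\), with extremal \(x(t) = t^2/2\) on \([0,1]\)):

```python
import math

# Hypothetical integrand F(t, x, xdot) = xdot^2/2 + x; Euler's equation gives
# xddot = 1, so x(t) = t^2/2 is an extremal on [0, 1].
x    = lambda t: 0.5 * t ** 2
xdot = lambda t: t

F_x    = lambda t: 1.0        # dF/dx along the extremal
F_xdot = lambda t: xdot(t)    # dF/dxdot along the extremal

def int_Fx(t, n=2000):
    """Trapezoid approximation of int_0^t F_x(s) ds."""
    if t == 0.0:
        return 0.0
    h = t / n
    return h * (0.5 * (F_x(0.0) + F_x(t)) + sum(F_x(i * h) for i in range(1, n)))

# F_xdot(t) - int_0^t F_x ds should equal the same constant c for every t
cs = [F_xdot(t) - int_Fx(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

For this integrand the difference is identically the constant \(c = 0\), as the integrated Euler equation requires.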

We have the following two well-known results on necessary conditions (see, for example, [2]). Both require a strong normality assumption on the solution to the problem.

Theorem 2.1

If \(x_0\) solves (P) and is strongly normal, then there exists a unique \(\lambda \in \mathbb {R}^q\) such that \((x_0,\lambda ) \in {{\mathcal {E}}}\).

Theorem 2.2

Suppose \(\exists \lambda \in \mathbb {R}^q\) such that \((x_0,\lambda ) \in {{\mathcal {E}}}\). If \(x_0\) solves (P) and is strongly normal, then \(J^{\prime \prime }(x_0;y) \ge 0\) for all \(y \in Y\) satisfying

a. \(I^\prime _{\alpha }(x_0;y) \le 0 \ (\alpha \in I_a(x_0),\ \lambda _\alpha =0)\);

b. \(I^\prime _{\beta }(x_0;y) = 0 \ (\beta \in R \ \hbox {with}\ \lambda _\beta > 0, \ \hbox {or}\ \beta \in Q)\).

Let us briefly explain how, based on a particular notion of regularity, these two results can be easily established. We begin by stating the following well-known property of linear functionals on real vector spaces (see [2, 20]).

Lemma 2.1

Let X be a real vector space. Suppose \(L, L_i\) are linear functionals on X \((i \in A \cup B, \ \hbox {where}\ A=\{1,\ldots ,p\},\ B=\{p+1,\ldots ,m\})\), and

$$\begin{aligned} {{\mathcal {R}}}=\{x \in X : L_\alpha (x) \le 0 \ (\alpha \in A),\ L_\beta (x) = 0 \ (\beta \in B)\}. \end{aligned}$$

If \(L(x) \ge 0\) for all \(x\in {{\mathcal {R}}}\), then \(\exists \{\lambda _i\}_1^m\) such that \(\lambda _\alpha \ge 0\) \((\alpha \in A)\) and such that \(L(x) + \sum _1^m \lambda _iL_i(x) = 0\) \((x \in X)\). If \(\{L_i\}_1^m\) is linearly independent, then \(\{\lambda _i\}_1^m\) is unique.

Now, define the set of tangential constraints at \(x_0 \in S\) by

$$\begin{aligned} {{\mathcal {R}}}_S(x_0) = \left\{ y \in Y : I^\prime _\alpha (x_0;y) \le 0 \ (\alpha \in I_a(x_0)),\ I^\prime _\beta (x_0;y) = 0 \ (\beta \in Q) \right\} \end{aligned}$$

and the set of (positive) curvilinear tangents of S at \(x_0\) by

$$\begin{aligned} C_S(x_0)= & {} \{ y \in Y : \exists \delta >0 \ \hbox {and}\ x(\cdot ,\epsilon ) \in S \ (0 \le \epsilon < \delta ) \ \hbox {with}\ x(t,0) = x_0(t),\\ x_\epsilon (t,0)= & {} y(t) \ (t \in T)\}. \end{aligned}$$

Clearly, \(C_S(x_0) \subset {{\mathcal {R}}}_S(x_0)\) since, for all \(y \in C_S(x_0)\),

$$\begin{aligned} {{d} \over {\mathrm{d}\epsilon }} I_\gamma (x(\cdot ,\epsilon ))\biggl |_{\epsilon =0} = \int _{t_0}^{t_1}\{ L_{\gamma x}({{\tilde{x}}}_0(t))y(t) + L_{\gamma {\dot{x}}}({{\tilde{x}}}_0(t)){\dot{y}}(t) \} \mathrm{d}t = I^\prime _\gamma (x_0;y). \end{aligned}$$

If the converse holds, that is, if the two sets coincide, the arc \(x_0\) will be called p-regular. Note that, in this event, if \(x_0\) solves the problem, \(y \in {{\mathcal {R}}}_S(x_0)\), and \(\delta > 0\) and x are as in the definition of \(C_S(x_0)\), then \(g(\epsilon ):=I(x(\cdot ,\epsilon ))\) has a local minimum at \(\epsilon = 0\) and so \(g^\prime (0) = I^\prime (x_0;y) \ge 0\). In Theorem 2.1, the existence and uniqueness of \(\lambda _1,\ldots ,\lambda _q\) satisfying (i) and (ii) in the definition of \({{\mathcal {E}}}\) then follow from Lemma 2.1. Finally, by an application of the implicit function theorem, one can show that strong normality implies p-regularity.

For Theorem 2.2, consider the subset of S given by

$$\begin{aligned} S_1:= & {} S_1(\lambda ) = \{x \in X_e : I_\alpha (x) \le 0 \ (\alpha \in R, \ \lambda _\alpha =0), \\ I_\beta (x)= & {} 0 \ (\beta \in R \ \hbox {with}\ \lambda _\beta >0, \ \hbox {or}\ \beta \in Q)\} \end{aligned}$$

and note that \(S_1 = \{ x \in S : J(x) = I(x) \}\) and \({{\mathcal {R}}}_{S_1}(x_0)\) is precisely the set of all \(y \in Y\) satisfying (a) and (b) of that theorem, that is,

$$\begin{aligned} {{\mathcal {R}}}_{S_1}(x_0)= & {} \{ y \in Y : I^\prime _{\alpha }(x_0;y) \le 0 \ (\alpha \in I_a(x_0),\ \lambda _\alpha =0),\\ I^\prime _{\beta }(x_0;y)= & {} 0 \ (\beta \in R \ \hbox {with}\ \lambda _\beta >0, \ \hbox {or}\ \beta \in Q)\}. \end{aligned}$$

Since strong normality relative to S is equivalent to strong normality relative to \(S_1\), our assumption implies that \({{\mathcal {R}}}_{S_1}(x_0) = C_{S_1}(x_0)\). Thus, for any y in \({{\mathcal {R}}}_{S_1}(x_0)\), there exist \(\delta \) and x as in the definition of \(C_{S_1}(x_0)\) and so, if we define \(g(\epsilon ):=I(x(\cdot ,\epsilon ))\) as before, then \(g(\epsilon ) = J(x(\cdot ,\epsilon ))\) by definition of \(S_1\), and therefore, as one readily verifies, \(J^{\prime \prime }(x_0;y) = g^{\prime \prime }(0) \ge 0\).

Let us emphasize that, in both proofs (which can be seen with full detail in, for example, [1, 2]), as well as in other proofs found in the literature (with a different but, as explained in [1], equivalent assumption in an optimal control context in [2, 9, 10]), the assumption of strong normality is basic. This notion, as mentioned before, is related to the set \(S_0\) when a different, more general definition of normality is adopted.

2.2 Normality and Regularity Relative to S

An arc \(x_0\) will be said to be normal relative to S, if \(\lambda =0\) is the only solution of

  i. \(\lambda _\alpha \ge 0\) and \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. \(\sum _1^q \lambda _\gamma I^\prime _\gamma (x_0;y)=0\) for all \(y \in Y\).

To understand the origin of this definition, let us state the following well-known set of first-order conditions given in [2].

Theorem 2.3

If \(x_0\) solves (P), then there exist \(\lambda _0 \ge 0\) and \(\lambda _1,\ldots ,\lambda _q\) not all zero, such that

  i. \(\lambda _\alpha \ge 0\) and \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. If \(J_0(x) := \lambda _0 I(x) + \sum _1^q \lambda _\gamma I_\gamma (x)\), then \(J^\prime _0(x_0;y) = 0\) for all \(y \in Y\).

Clearly, if \(x_0\) is a solution to the problem and is normal relative to S then, necessarily, \(\lambda _0 > 0\) in Theorem 2.3 and the multipliers can be chosen so that \(\lambda _0=1\). In this event, the conclusion of Theorem 2.1 follows except possibly for the uniqueness of the multipliers. Normality relative to S will be called weak normality.

Note that this notion, applied to \(S_0\), is equivalent to strong normality since, in view of the definition, \(x_0\) is normal relative to \(S_0\) if and only if \(\lambda =0\) is the only solution of

  i. \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. \(\sum _1^q \lambda _\gamma I^\prime _\gamma (x_0;y)=0\) for all \(y \in Y\).

This notion of normality can, of course, be applied also to the set \(S_1\). An arc \(x_0\) is normal relative to \(S_1\), if \(\mu =0\) is the only solution of

  i. \(\mu _\alpha \ge 0\) and \(\mu _\alpha I_\alpha (x_0)=0\) \((\alpha \in R,\ \lambda _\alpha =0)\).

  ii. \(\sum _1^q \mu _\gamma I^\prime _\gamma (x_0;y)=0\) for all \(y \in Y\).

A fundamental question posed in the literature (see also [9, 11, 12, 17,18,19] for other problems in optimal control) is if, in Theorem 2.2, the assumption of strong normality can be replaced with that of normality relative to \(S_1\), without altering the set \({{\mathcal {R}}}_{S_1}(x_0)\) of critical directions where the second-order conditions hold, that is, the set of all \(y \in Y\) satisfying conditions (a) and (b) of Theorem 2.2. In other words, the question is whether the following theorem is valid or not.

Theorem 2.4

Suppose \(\exists \lambda _1,\ldots ,\lambda _q\) such that

  i. \(\lambda _\alpha \ge 0\) and \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. If \(J(x):= I(x) + \sum _1^q \lambda _\gamma I_\gamma (x)\), then \(J^\prime (x_0;y)=0\) for all \(y \in Y\).

If \(x_0\) solves (P) and is normal relative to \(S_1 = \{ x \in S : J(x) = I(x) \}\), then \(J^{\prime \prime }(x_0;y) \ge 0\) for all \(y \in Y\) satisfying

a. \(I^\prime _{\alpha }(x_0;y) \le 0 \ (\alpha \in I_a(x_0),\ \lambda _\alpha =0)\);

b. \(I^\prime _{\beta }(x_0;y) = 0 \ (\beta \in R \ \hbox {with}\ \lambda _\beta > 0, \ \hbox {or}\ \beta \in Q)\).

We shall pose, and solve, this question by means of a notion of regularity slightly different from the previous one, defined in terms not of curvilinear but of sequential tangents. Based on the weak norm on X,

$$\begin{aligned} \Vert x\Vert := \sup _{t \in T} \bigl \{ |x(t)|^2 + |{\dot{x}}(t)|^2 \bigr \}^{1/2}, \end{aligned}$$

let us introduce the notion of tangent cone, which corresponds to a generalization of the one given by Hestenes [2, 20] for finite-dimensional spaces (see also [21] for equivalent definitions). In what follows, the letter q, used as a sequence index, should not be confused with the number of isoperimetric constraints.

We shall say that a sequence \(\{x_q\} \subset X\) converges to \(x_0\) in the direction y if y is a unit arc, \(x_q \not = x_0\), and

$$\begin{aligned} \lim _{q \rightarrow \infty } \Vert x_q - x_0\Vert = 0,\quad \lim _{q \rightarrow \infty } {{x_q - x_0} \over {\Vert x_q - x_0\Vert }} = y. \end{aligned}$$

The tangent cone of S at \(x_0\), which we shall denote by \(T_S(x_0)\), is the cone determined by the unit arcs \(y \in Y\) for which there exists a sequence \(\{x_q\}\) in S converging to \(x_0\) in the direction y.

Note that, equivalently, \(T_S(x_0)\) is the set of all \(y \in Y\) for which there exist sequences \(\{x_q\}\) in S and \(\{\epsilon _q\}\) of positive numbers such that

$$\begin{aligned} \lim _{q \rightarrow \infty } \epsilon _q = 0,\quad \lim _{q \rightarrow \infty } {{x_q - x_0} \over {\epsilon _q}} = y. \end{aligned}$$
(1)

This follows since, if \(\{x_q\}\) and \(\{\epsilon _q\}\) satisfy (1), then we have

$$\begin{aligned} \lim _{q \rightarrow \infty } x_q = x_0,\quad \lim _{q \rightarrow \infty } {{\Vert x_q - x_0 \Vert } \over {\epsilon _q}} = \Vert y\Vert . \end{aligned}$$

Hence, if \(y \not \equiv 0\), then \(\Vert x_q - x_0\Vert \not = 0\) for large values of q and

$$\begin{aligned} \lim _{q \rightarrow \infty } {{x_q - x_0} \over {\Vert x_q - x_0 \Vert }} = \lim _{q \rightarrow \infty } {{x_q - x_0} \over {\epsilon _q}} \ \lim _{q \rightarrow \infty } {{\epsilon _q} \over {\Vert x_q - x_0 \Vert }} = {{y} \over {\Vert y \Vert }}. \end{aligned}$$

Therefore, if y is a unit arc and there exist \(\{x_q\} \subset S\) and \(\{\epsilon _q > 0\}\) satisfying (1), then we can choose \(\epsilon _q = \Vert x_q - x_0\Vert \) in (1).

A fundamental property satisfied by this norm is that, as shown in [2], if \(\{x_q\}\) converges to \(x_0\) in the direction y, then

$$\begin{aligned} \lim _{q \rightarrow \infty } {{I(x_q) - I(x_0)} \over {\Vert x_q - x_0\Vert }}=I^\prime (x_0;y). \end{aligned}$$
(2)

Similarly, for the second variation,

$$\begin{aligned} \lim _{q \rightarrow \infty } {{I(x_q) - I(x_0) - I^\prime (x_0;x_q-x_0)} \over {\Vert x_q - x_0\Vert ^2}} = {\textstyle {{1}\over {2}}}I^{\prime \prime }(x_0;y). \end{aligned}$$
(3)
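As a hypothetical illustration (not the paper's data), the limits (2) and (3) can be observed numerically along the simple sequence \(x_q = x_0 + \epsilon _q y\); by the equivalence noted above, the quotients may be taken with \(\epsilon _q\) in place of \(\Vert x_q - x_0\Vert \) when y is not normalized. A minimal sketch:

```python
import math

def trap(f, lo, hi, n=20000):
    """Composite trapezoid rule over [lo, hi]."""
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

# Hypothetical functional I(x) = int_0^1 x(t)^2 dt, arc x0(t) = t, variation
# y(t) = t(1 - t); analytically I'(x0; y) = 1/6 and I''(x0; y) = 1/15.
I  = lambda x: trap(lambda t: x(t) ** 2, 0.0, 1.0)
x0 = lambda t: t
y  = lambda t: t * (1 - t)

I0      = I(x0)
Iprime  = 1.0 / 6.0    # int_0^1 2*x0(t)*y(t) dt
Isecond = 1.0 / 15.0   # int_0^1 2*y(t)^2 dt

quots1, quots2 = [], []
for eps in (1e-1, 1e-2, 1e-3):
    xq = lambda t, e=eps: x0(t) + e * y(t)          # x_q = x0 + eps*y in S-like family
    Iq = I(xq)
    quots1.append((Iq - I0) / eps)                      # mimics relation (2)
    quots2.append((Iq - I0 - eps * Iprime) / eps ** 2)  # mimics relation (3)
```

The first quotients approach \(I^\prime (x_0;y)\) and the second ones \({\textstyle {{1}\over {2}}}I^{\prime \prime }(x_0;y)\), as (2) and (3) predict.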

The first of these relations clearly implies that \(T_S(x_0) \subset {{\mathcal {R}}}_S(x_0)\), and we shall say that \(x_0 \in S\) is a regular arc of S if \(T_S(x_0) = {{\mathcal {R}}}_S(x_0)\). In terms of this notion, we obtain the following first- and second-order necessary conditions.

Theorem 2.5

If \(x_0\) solves (P) and is a regular arc of S, then \(\exists \lambda \in \mathbb {R}^q\) such that \((x_0,\lambda ) \in {{\mathcal {E}}}\).

Theorem 2.6

Suppose \(\exists \lambda \in \mathbb {R}^q\) such that \((x_0,\lambda ) \in {{\mathcal {E}}}\). If \(x_0\) solves (P) and is a regular arc of \(S_1=\{ x \in S : J(x) = I(x) \}\), then \(J^{\prime \prime }(x_0;y) \ge 0\) for all \(y \in {{\mathcal {R}}}_{S_1}(x_0)\).

To prove these results, suppose \(x_0\) solves (P). Clearly, by (2), \(I^\prime (x_0;y) \ge 0\) for all \(y \in T_S(x_0)\). By regularity relative to S, this holds on \({{\mathcal {R}}}_S(x_0)\), and the conclusion of Theorem 2.5 follows by Lemma 2.1. Now, if \(y \in T_{S_1}(x_0)\) is a unit arc and \(\{x_q\} \subset S_1\) a sequence converging to \(x_0\) in the direction y then, since \(J(x) = I(x)\) on \(S_1\), we have \(J(x_q) \ge J(x_0)\), and therefore, by (3), \(J^{\prime \prime }(x_0;y) \ge 0\). The result then follows by regularity relative to \(S_1\).

Note that the only difference between Theorems 2.4 and 2.6 is the assumption imposed on the extremal. For the former, we have normality relative to \(S_1\) while, for the latter, it is regularity relative to \(S_1\). Therefore, Theorem 2.4 will be established, as a simple consequence of Theorem 2.6, if normality (relative to S) implies regularity (relative to S). Thus, this result would allow us to weaken the usual assumption of strong normality while preserving the conditions on the sets of extremals and critical directions. This is the fundamental question in optimization theory, mentioned at the beginning of the introduction, which we shall now solve.

3 Normality, Regularity and Properness

Recall that an arc \(x_0 \in S\) is called regular relative to S if \({{\mathcal {R}}}_S(x_0) \subset T_S(x_0)\) (implying equality), and normal relative to S if \(\lambda =0\) is the only solution to

  i. \(\lambda _\alpha \ge 0\) and \(\lambda _\alpha I_\alpha (x_0)=0\) \((\alpha \in R)\).

  ii. \(\sum _1^q \lambda _\gamma I^\prime _\gamma (x_0;y)=0\) for all \(y \in Y\).

The purpose of this section is to prove that normality implies regularity. Let us begin with a new concept which, as we shall see below, characterizes normality relative to S.

Definition 3.1

We call \(x_0 \in S\) proper relative to S if

a. \(\{ I^\prime _\beta (x_0;y) : \beta \in Q \}\) is linearly independent on Y.

b. \(\exists y \in Y\) such that \(I^\prime _\alpha (x_0;y) < 0\) \((\alpha \in I_a(x_0))\) and \(I^\prime _\beta (x_0;y) = 0\) \((\beta \in Q)\).

To prove that properness and normality are equivalent, we shall make use of the following auxiliary result on linear functionals (see [2, 20]).

Lemma 3.1

Let \(L_1,\ldots ,L_m\) be linear forms on X, a real vector space, and let

$$\begin{aligned} {{\mathcal {R}}}=\{x \in X : L_\alpha (x)\le 0 \ (\alpha \in A),\ L_\beta (x) = 0 \ (\beta \in B)\} \end{aligned}$$

where \(A=\{1,\ldots ,p\}\) and \(B=\{p+1,\ldots ,m\}\). Suppose that \(L_i(x) = 0\) for all \(x \in {{\mathcal {R}}}\) and \(i \in A \cup B\). Then there exist \(\lambda _\alpha > 0\) \((\alpha \in A)\) and \(\lambda _\beta \in \mathbb {R}\) \((\beta \in B)\) such that \(\sum _1^m \lambda _iL_i(x) = 0\) \((x \in X)\).

Proposition 3.1

Let \(x_0 \in S\). Then \(x_0\) is normal relative to S \(\Leftrightarrow \) \(x_0\) is proper relative to S.

Proof

Without loss of generality, assume that all constraints are active at \(x_0\), that is, \(R = I_a(x_0)\).

“\(\Leftarrow \)”: Suppose \(x_0\) is a proper arc of S. Let \(\lambda \) satisfy (i) and (ii) above. If \(R = \emptyset \), then \(\lambda = 0\). If \(R \ne \emptyset \), choose a trajectory y satisfying 3.1(b). We have

$$\begin{aligned} 0 = \sum _1^q \lambda _\gamma I^\prime _\gamma (x_0;y) = \sum _1^r \lambda _\alpha I^\prime _\alpha (x_0;y),\quad \lambda _\alpha \ge 0 \ (\alpha \in R) \end{aligned}$$

and, since \(I^\prime _\alpha (x_0;y) < 0\) \((\alpha \in R)\), we have \(\lambda _\alpha = 0\) \((\alpha \in R)\). Thus, by (ii), for all \(y \in Y\),

$$\begin{aligned} \sum _{\beta \in Q} \lambda _\beta I^\prime _\beta (x_0;y) = 0 \end{aligned}$$

and, by 3.1(a), \(\lambda _\beta = 0\) \((\beta \in Q)\).

“\(\Rightarrow \)”: Suppose \(x_0\) is a normal arc of S. Clearly, 3.1(a) holds. Without loss of generality, \(R \ne \emptyset \). Define

$$\begin{aligned} C = \{\alpha \in R : I^\prime _\alpha (x_0;y) < 0 \ \hbox {for some}\ y \in {{\mathcal {R}}}_S(x_0)\} \end{aligned}$$

and let \(D := R \sim C = \{\alpha \in R : \alpha \not \in C\}\), so that \(I^\prime _\gamma (x_0;y) = 0\) for all \(\gamma \in D \cup Q\) and \(y \in {{\mathcal {R}}}_S(x_0)\). Let

$$\begin{aligned} V:= & {} \{x \in X_e : I_\alpha (x) \le 0 \ (\alpha \in D),\ I_\beta (x) = 0 \ (\beta \in Q) \},\\ V_0:= & {} \{x \in X_e : I_\gamma (x) = 0 \ (\gamma \in D \cup Q) \} \end{aligned}$$

and consider their corresponding sets of tangential constraints at \(x_0\):

$$\begin{aligned} {{\mathcal {R}}}_V(x_0)= & {} \{y \in Y : {I^\prime _\alpha }(x_0;y) \le 0 \ (\alpha \in D),\ {I^\prime _\beta }(x_0;y)= 0 \ (\beta \in Q)\},\\ {{\mathcal {R}}}_{V_0}(x_0)= & {} \{y \in Y : {I^\prime _\gamma }(x_0;y) = 0 \ (\gamma \in D \cup Q)\}. \end{aligned}$$

We claim that \({{\mathcal {R}}}_V(x_0) = {{\mathcal {R}}}_{V_0}(x_0)\). To prove it, consider the following subset of S:

$$\begin{aligned} {{\tilde{S}}}= \{x \in X_e : I_\alpha (x) \le 0 \ (\alpha \in C),\ I_\gamma (x) = 0 \ ( \gamma \in D \cup Q) \} \end{aligned}$$

and the tangential constraints at \(x_0\) associated with \({{\tilde{S}}}\):

$$\begin{aligned} {{\mathcal {R}}}_{{\tilde{S}}}(x_0)=\{y \in Y : {I^\prime _\alpha }(x_0;y) \le 0 \ (\alpha \in C),\ {I^\prime _\gamma }(x_0;y)= 0 \ (\gamma \in D \cup Q) \}. \end{aligned}$$

Clearly \({{\mathcal {R}}}_S(x_0) = {{\mathcal {R}}}_{{\tilde{S}}}(x_0)\). Without loss of generality, \(C \not = \emptyset \) for, otherwise, \({{\mathcal {R}}}_{{\tilde{S}}}(x_0) = {{\mathcal {R}}}_{V_0}(x_0)\) and \({{\mathcal {R}}}_S(x_0) = {{\mathcal {R}}}_V(x_0)\). Select, for each \(\alpha \in C\), a trajectory \(y_\alpha \in {{\mathcal {R}}}_S(x_0)\) such that \(I^\prime _\alpha (x_0;y_\alpha ) < 0\), and set \({{\hat{y}}}:=\sum _{\alpha \in C} y_\alpha \). Note that

$$\begin{aligned} I^\prime _\alpha (x_0;{{\hat{y}}}) < 0 \quad (\alpha \in C) \quad \hbox {and} \quad I^\prime _\gamma (x_0;{{\hat{y}}}) = 0 \quad (\gamma \in D \cup Q). \end{aligned}$$

Let \(y \not \equiv 0\) in \({{\mathcal {R}}}_V(x_0)\), and let \(\epsilon > 0\) be such that

$$\begin{aligned} {I^\prime _\alpha }(x_0;y+\epsilon {{\hat{y}}}) = {I^\prime _\alpha }(x_0;y) + \epsilon {I^\prime _\alpha }(x_0;{{\hat{y}}}) \le 0 \quad (\alpha \in C). \end{aligned}$$

Note that

$$\begin{aligned} {I^\prime _\alpha }(x_0;y+\epsilon {{\hat{y}}}) = {I^\prime _\alpha }(x_0;y) \le 0 \quad (\alpha \in D),\quad {I^\prime _\beta }(x_0;y+\epsilon {{\hat{y}}}) = 0 \quad (\beta \in Q) \end{aligned}$$

and, therefore, \(y + \epsilon {{\hat{y}}}\in {{\mathcal {R}}}_S(x_0) = {{\mathcal {R}}}_{{\tilde{S}}}(x_0)\). We conclude that

$$\begin{aligned} {I^\prime _\gamma }(x_0;y+\epsilon {{\hat{y}}}) = {I^\prime _\gamma }(x_0;y) = 0 \quad (\gamma \in D \cup Q) \end{aligned}$$

and so \(y \in {{\mathcal {R}}}_{V_0}(x_0)\). This proves the claim.

Now, by Lemma 3.1, \(\exists \lambda _\alpha > 0\) \((\alpha \in D)\) and \(\lambda _\beta \in \mathbb {R}\) \((\beta \in Q)\), such that, for all \(y \in Y\),

$$\begin{aligned} \sum _{\alpha \in D} \lambda _\alpha I^\prime _\alpha (x_0;y) + \sum _{\beta \in Q} \lambda _\beta I^\prime _\beta (x_0;y) = 0. \end{aligned}$$

If \(D \not = \emptyset \), we contradict normality. Thus, \(D = \emptyset \) and so, as before, we can find \(y \in Y\) satisfying 3.1(b) by selecting, for each \(\alpha \in R\), a trajectory \(y_\alpha \in {{\mathcal {R}}}_S(x_0)\) such that \(I^\prime _\alpha (x_0;y_\alpha ) < 0\), and setting \(y :=\sum _{\alpha \in R} y_\alpha \). \(\square \)

In the next result, we shall make use of the closedness of the tangent cone. This property can be easily seen as follows. Suppose \(y \not \equiv 0\), \(\{y_q\} \subset T_S(x_0)\) and \(y_q\) converges to y. Since \(y_q/\Vert y_q\Vert \rightarrow y/\Vert y\Vert \), it is sufficient to consider the case in which \(\Vert y\Vert = \Vert y_q\Vert = 1\). For all q, select \(x_q \in S\) such that

$$\begin{aligned} 0< \epsilon _q := \Vert x_q - x_0 \Vert< {{1} \over {q}} \quad \hbox {and}\quad \biggl \Vert {{x_q - x_0} \over {\epsilon _q}} - y_q \biggr \Vert < {{1} \over {q}}. \end{aligned}$$

We then have

$$\begin{aligned} \biggl \Vert {{x_q - x_0} \over {\epsilon _q}} - y \biggr \Vert \le \biggl \Vert {{x_q - x_0} \over {\epsilon _q}} - y_q \biggr \Vert + \Vert y_q - y \Vert < {{1} \over {q}} + \Vert y_q - y \Vert . \end{aligned}$$

Hence, (1) holds, and so \(y \in T_S(x_0)\), as was to be proved.

Theorem 3.1

If \(x_0\) is a proper arc of S, then it is a regular arc of S.

Proof

Assume, without loss of generality, that \(R = I_a(x_0)\). By 3.1(b), \(R=C\), that is, \(D = \emptyset \). From the theory for equality constraints (see, for example, [1, 2]), 3.1(a) implies that \(x_0\) is regular relative to

$$\begin{aligned} V_0 = \{x \in X_e : I_\beta (x) = 0 \ (\beta \in Q) \}, \end{aligned}$$

that is,

$$\begin{aligned} T_{V_0}(x_0) = {{\mathcal {R}}}_{V_0}(x_0) = \{ y \in Y : I^\prime _\beta (x_0;y) = 0 \ (\beta \in Q) \}. \end{aligned}$$

Assume \(R \not =\emptyset \) since, otherwise, \(V_0 = S\). Define

$$\begin{aligned} K_S(x_0) := \{y \in Y : I^\prime _\alpha (x_0;y) < 0 \ (\alpha \in R),\ I^\prime _\beta (x_0;y)= 0 \ (\beta \in Q)\}. \end{aligned}$$

We claim that \(K_S(x_0) \subset T_S(x_0)\). To prove it, let \(y \in K_S(x_0)\) and observe that, since \(y \in {{\mathcal {R}}}_{V_0}(x_0) = T_{V_0}(x_0)\), there exist sequences \(\{x_q\}\) in \(V_0\) and \(\{\epsilon _q > 0\}\) such that (1) holds. Therefore,

$$\begin{aligned} \lim _{q \rightarrow \infty } {{I_\gamma (x_q)} \over {\epsilon _q}} = \lim _{q \rightarrow \infty } {{I_\gamma (x_q) - I_\gamma (x_0)} \over {\epsilon _q}} = I^\prime _\gamma (x_0;y) \quad (\gamma =1,\ldots ,q). \end{aligned}$$

Since \(I^\prime _\alpha (x_0;y) < 0\) \((\alpha \in R)\), \(\exists N\) such that, for all \(q \ge N\), \(I_\alpha (x_q) < 0\) \((\alpha \in R)\). Since \(\{x_q\} \subset V_0\), it follows that, for \(q \ge N\), \(x_q \in S\). Hence, \(y \in T_S(x_0)\), and this proves the claim. Finally, let \({{\hat{y}}}\in {{\mathcal {R}}}_S(x_0)\) and let y satisfy 3.1(b), so that \(y \in K_S(x_0)\). Then, for all \(\epsilon > 0\), \({{\hat{y}}}+ \epsilon y \in K_S(x_0) \subset T_S(x_0)\) and, since \(T_S(x_0)\) is closed, \({{\hat{y}}}\in T_S(x_0)\). \(\square \)

We have proved that normality implies regularity. In particular, this implies, as explained before, that Theorem 2.4 is true.

4 An Example and Applications

In this section, we provide a simple example that illustrates one of the main consequences of our result. For this example, the classical theory cannot be applied since, for the extremal \((x_0,\lambda )\) under consideration, \(x_0\) is not strongly normal. Therefore, Theorem 2.2 yields no information. However, by an application of Theorem 2.4, we can conclude that \(x_0\) is not a solution to the problem. This example shows a clear advantage of the main result of the paper over previous ones found in the literature. We end the paper with some comments on possible applications of the results obtained.

Example 4.1

Let \(a = \pi /2\) and consider the problem of minimizing

$$\begin{aligned} I(x) = {{1} \over {2}} \int _0^a \left\{ x^2(t) - {\dot{x}}^2(t)\right\} \mathrm{d}t \end{aligned}$$

subject to \(x(0) = 1\), \(x(a) = -1\),

$$\begin{aligned} \int _0^a x(t)\mathrm{d}t \le 0 \quad \hbox {and}\quad \int _0^a \{x(t) + {\dot{x}}(t)\}\mathrm{d}t \le -2. \end{aligned}$$

In this case, \(T = [0,a]\), \(n=1\), \(r=2\), \(\xi _0 = 1\), \(\xi _1=-1\),

$$\begin{aligned} L(t,x,{\dot{x}}) = (x^2 - {\dot{x}}^2)/2, \ L_1(t,x,{\dot{x}}) = x,\ L_2(t,x,{\dot{x}}) = x + {\dot{x}},\ b_1=0,\ b_2=2. \end{aligned}$$

Consider the function

$$\begin{aligned} F(t,x,{\dot{x}}):= & {} L(t,x,{\dot{x}}) + \sum _1^2 \lambda _\alpha L_\alpha (t,x,{\dot{x}}) \\= & {} {{x^2 - {\dot{x}}^2} \over {2}} + \lambda _1x + \lambda _2(x+{\dot{x}}) \end{aligned}$$

and note that \(F_{\dot{x}}(t,x,{\dot{x}}) = - {\dot{x}}+ \lambda _2\) and \(F_x(t,x,{\dot{x}}) = x + \lambda _1 + \lambda _2\). Euler’s equation is, therefore, given by \({{\ddot{x}}}(t) + x(t) + \lambda _1 + \lambda _2 = 0\), whose general solution is

$$\begin{aligned} x(t) = c_1 \sin t + c_2 \cos t - (\lambda _1 + \lambda _2). \end{aligned}$$

The constraints \(x(0) = 1\) and \(x(a) = -1\) imply that \(c_1 = \lambda _1 + \lambda _2 - 1\) and \(c_2 = \lambda _1 + \lambda _2 + 1\).
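The computation above can be checked numerically. The following sketch is our own illustrative verification, not part of the original development (the sample multipliers \(\lambda _1, \lambda _2\) are arbitrary); it confirms by central differences that the general solution satisfies Euler's equation, and that the stated values of \(c_1\) and \(c_2\) meet the endpoint conditions:

```python
import math

a = math.pi / 2

def x(t, lam1, lam2, c1, c2):
    # General solution of Euler's equation  x'' + x + lam1 + lam2 = 0
    return c1 * math.sin(t) + c2 * math.cos(t) - (lam1 + lam2)

def euler_residual(t, lam1, lam2, c1, c2, h=1e-4):
    # Central-difference approximation of  x'' + x + lam1 + lam2
    xdd = (x(t + h, lam1, lam2, c1, c2)
           - 2 * x(t, lam1, lam2, c1, c2)
           + x(t - h, lam1, lam2, c1, c2)) / h ** 2
    return xdd + x(t, lam1, lam2, c1, c2) + lam1 + lam2

# Arbitrary sample multipliers, for illustration only
lam1, lam2 = 0.3, -0.7

# Constants fixed by the endpoint constraints x(0) = 1, x(a) = -1
c1 = lam1 + lam2 - 1
c2 = lam1 + lam2 + 1

print(x(0, lam1, lam2, c1, c2))   # 1.0 up to roundoff
print(x(a, lam1, lam2, c1, c2))   # -1.0 up to roundoff
print(max(abs(euler_residual(t, lam1, lam2, c1, c2))
          for t in (0.1, 0.5, 1.0, 1.4)))  # ~0 (truncation error only)
```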

Let us consider the arc

$$\begin{aligned} x_0(t):= \cos t - \sin t \quad (t \in [0,a]) \end{aligned}$$

and let \(\lambda = (\lambda _1,\lambda _2) \equiv (0,0)\). Clearly, \(x_0\) is admissible since it satisfies the endpoint constraints and \(I_1(x_0) = I_2(x_0) = 0\). Moreover, in view of the above argument, \((x_0,\lambda )\) is an extremal with \(c_1=-1\) and \(c_2=1\). Observe now that, for this particular multiplier, we have

$$\begin{aligned} S = \{ x \in X_e : I_\alpha (x) \le 0 \ (\alpha = 1,2) \} = S_1. \end{aligned}$$
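As a side check, the admissibility of \(x_0\) can be verified numerically. The sketch below is our own illustration, assuming (as the values \(b_1 = 0\), \(b_2 = 2\) suggest) that the constraint functionals take the form \(I_\alpha (x) = \int _0^a L_\alpha \,\mathrm{d}t + b_\alpha \) with the constraint \(I_\alpha (x) \le 0\):

```python
import math

a = math.pi / 2
b1, b2 = 0.0, 2.0  # constants of the isoperimetric constraints

def x0(t):
    return math.cos(t) - math.sin(t)

def x0dot(t):
    return -math.sin(t) - math.cos(t)

def trapz(f, lo, hi, n=100_000):
    # Composite trapezoidal rule on [lo, hi]
    h = (hi - lo) / n
    return ((f(lo) + f(hi)) / 2 + sum(f(lo + i * h) for i in range(1, n))) * h

I1 = trapz(x0, 0.0, a) + b1
I2 = trapz(lambda t: x0(t) + x0dot(t), 0.0, a) + b2
print(abs(I1) < 1e-6, abs(I2) < 1e-6)  # True True: both constraints are active
```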

Also, by definition, \(x_0\) will be normal relative to \(S = S_1\) if \(\mu _1 = \mu _2 = 0\) is the only solution to

(i) \(\mu _1 \ge 0\), \(\mu _2 \ge 0\);

(ii) \(\mu _1 I^\prime _1(x_0;y) + \mu _2 I^\prime _2(x_0;y) = 0\) for all \(y \in Y\).

That this is indeed the case follows since, for all \(y \in Y\),

$$\begin{aligned} \mu _1 I^\prime _1(x_0;y) + \mu _2 I^\prime _2(x_0;y)= & {} \mu _1 \int _0^a y(t) \mathrm{d}t + \mu _2 \int _0^a \{ y(t) + {\dot{y}}(t) \}\mathrm{d}t\\= & {} (\mu _1 + \mu _2) \int _0^a y(t)\mathrm{d}t, \end{aligned}$$

where we have used that \(\int _0^a {\dot{y}}(t)\mathrm{d}t = y(a) - y(0) = 0\) for all \(y \in Y\); hence, (i) and (ii) imply \(\mu \equiv 0\). On the other hand, \(x_0\) is not strongly normal since, without imposing condition (i), (ii) admits nonnull solutions such as \(\mu \equiv (1,-1)\). Therefore, we cannot invoke Theorem 2.2. However, if we define

$$\begin{aligned} y(t) = {\left\{ \begin{array}{ll} -t &{} \hbox {if}\, t \in [0,a/2]\\ t - a &{} \hbox {if}\, t \in [a/2,a]\\ \end{array}\right. } \end{aligned}$$

then \(y \in Y\) with \(I^\prime _1(x_0;y) \le 0\), \(I^\prime _2(x_0;y) \le 0\) and

$$\begin{aligned} J^{\prime \prime }(x_0;y)= & {} \int _0^a \{ y^2(t) - {\dot{y}}^2(t) \} \mathrm{d}t \\= & {} \int _0^{a/2} (t^2 - 1)\mathrm{d}t + \int _{a/2}^a \{(t - a)^2 - 1\}\mathrm{d}t \\= & {} {{a^3} \over {12}} - a < 0. \end{aligned}$$
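A quick numerical sanity check of this evaluation (our own sketch; the trapezoidal rule stands in for exact integration, and each linear branch of y is integrated separately to respect the kink at \(t = a/2\)):

```python
import math

a = math.pi / 2

def y(t):
    # The piecewise-linear variation defined above; y(0) = y(a) = 0
    return -t if t <= a / 2 else t - a

def trapz(f, lo, hi, n=100_000):
    # Composite trapezoidal rule on [lo, hi]
    h = (hi - lo) / n
    return ((f(lo) + f(hi)) / 2 + sum(f(lo + i * h) for i in range(1, n))) * h

# On each branch ydot(t)**2 = 1, so the integrand y**2 - ydot**2 is y(t)**2 - 1
J = (trapz(lambda t: y(t) ** 2 - 1.0, 0.0, a / 2)
     + trapz(lambda t: y(t) ** 2 - 1.0, a / 2, a))
print(J < 0)  # True: the second variation is negative along y
```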

By Theorem 2.4, the admissible arc \(x_0\), with \((x_0,\lambda )\) an extremal, is not a solution to the problem.
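The failure of strong normality can likewise be illustrated numerically: for every \(y \in Y\) the derivative term integrates to zero, so \(I^\prime _1(x_0;y)\) and \(I^\prime _2(x_0;y)\) coincide and \(\mu \equiv (1,-1)\) annihilates their combination. In the sketch below, the particular choice \(y(t) = \sin 2t\), which vanishes at both endpoints, is ours and serves for illustration only:

```python
import math

a = math.pi / 2

def trapz(f, lo, hi, n=100_000):
    # Composite trapezoidal rule on [lo, hi]
    h = (hi - lo) / n
    return ((f(lo) + f(hi)) / 2 + sum(f(lo + i * h) for i in range(1, n))) * h

# Illustrative variation in Y: vanishes at t = 0 and t = a = pi/2
y = lambda t: math.sin(2 * t)
ydot = lambda t: 2 * math.cos(2 * t)

I1p = trapz(y, 0.0, a)                          # I'_1(x0; y)
I2p = trapz(lambda t: y(t) + ydot(t), 0.0, a)   # I'_2(x0; y)

print(abs(I1p - I2p) < 1e-6)             # True: the two first variations agree
print(abs(1 * I1p + (-1) * I2p) < 1e-6)  # True: mu = (1, -1) solves (ii)
```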

Both the theory of necessary conditions for problems in the calculus of variations involving isoperimetric constraints and its applications to real problems have a long history. Perhaps the best known problems of this type include that of determining a closed curve of given length which encloses maximum area, and the shape of a flexible rope of uniform density that hangs at rest with its endpoints fixed. Quoting [22], “the study of the [calculus of variations] problem (and its numerous variants) is over three centuries old, yet its interest has not waned. Its applications are numerous in geometry and differential equations, in mechanics and physics, and in areas as diverse as engineering, medicine, economics, and renewable resources. It is not surprising, then, that modeling and numerical analysis play a large role in the subject today.”

This paper, however, focuses on one of the crucial mathematical issues: second-order necessary conditions for optimality. For the fundamental aspect of applications to real problems, we refer the reader to [22] for problems in elasticity and acoustics, [23,24,25] in economics and management, [26, 27] in physics and engineering, [28] in biology, and [29] in medicine (seeking, for example, the optimal dose to inject to a patient during a therapy).

5 Conclusions

This paper shows how the classical theory of second-order necessary conditions for the isoperimetric problem of Lagrange in the calculus of variations, involving inequality and equality constraints, can be substantially improved. The new assumption, under which the conditions are obtained, deals with the notion of normality relative to a set of constraints which takes into account the sign of the corresponding Lagrange multipliers, instead of the usual set defined only by equality constraints for active indices. The proof provided is based on the relation between the three notions appearing in the title of the paper. It is shown that normality is equivalent to properness which, in turn, implies regularity. It is of interest to see whether these notions, and the conditions obtained, can be generalized to isoperimetric problems in optimal control.