1 Introduction

It is likely that many mathematicians and users have been disoriented by the abundance of concepts in nonsmooth analysis. The following quotation from F.H. Clarke’s review of the book [45] attests to that: “In recent years, the subject has known a period of intense abstract development which has led to a rather bewildering array of competing and unclearly related theories.” Even leaving aside some important approaches such as Demyanov’s quasidifferentials [7, 8], Jeyakumar and Luc’s generalized Jacobians [20], and Warga’s derivate containers [46], and focusing on subdifferentials, one is faced with very different constructs. It may be useful to be aware of their specific features in order to choose the concept adapted to the problem one has to solve. That is the point of view adopted in the author’s book [38]. It appears there that even if the approaches are different, a number of important properties are shared by all usual subdifferentials, at least on suitable spaces.

Thus, several researchers have endeavored to give a synthetic approach to generalized derivatives by proposing a list of properties that can be taken as axioms, allowing one to ignore the various constructions. Such a list may be more or less complete depending on the needs or the aims. Among the many attempts in this direction, we quote the recent papers [18, 41] as they gather the most complete lists to date; see also their references.

Among the results using a general subdifferential \(\partial\), we note mean value theorems [36], characterizations of convexity [5, 6] or generalized convexity [33, 35, 37, 42], and optimality conditions using Chaney type second derivatives [40]. It is the purpose of the present paper to show that a general approach using an arbitrary subdifferential can also be workable for general second-order derivatives.

It is amazing that second-order derivatives can be considered for functions that are not even differentiable. In fact, many approaches can be adopted and it is a second aim of the present paper to relate such derivatives. Since the methods are so diverse (primal, dual, using second-order expansions or not), such an aim is not obvious. In fact, several results are already known; we leave them aside. For instance, for the links between parabolic second derivatives and epi-derivatives, we refer to [13–15, 30, 39] and [45, Sect. 13.J]. We do not consider generalized hessians and generalized Jacobians of the derivative, which have received much attention during the last decades (see [29] and its references). We also leave aside second-order derivatives involving perturbations of the nominal point as in [4, 25], [45, Proposition 13.56], [47], and second-order derivatives aimed at convex functions [10–12].

2 Calculus of Second-Order Epi-derivatives

The (lower) second-order epi-derivative of \(f: X \rightarrow \overline{\mathbb{R}}\) at \(x \in {f}^{-1}(\mathbb{R})\) relative to \({x}^{{\ast}} \in {X}^{{\ast}}\) was introduced by Rockafellar in [43]; it is given by

$$\displaystyle{\forall u \in X\ \ \ \ \ \ \ \ \ \ {f}^{{\prime\prime}}(x,{x}^{{\ast}},u):=\liminf _{ (t,{u}^{{\prime}})\rightarrow (0_{+},u)} \frac{2} {{t}^{2}}(f(x + t{u}^{{\prime}}) - f(x) -\langle {x}^{{\ast}},t{u}^{{\prime}}\rangle ).}$$
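As an illustration of this definition (a worked example that is not taken from the source), one may take \(X = \mathbb{R}\), \(f = \vert \cdot \vert\), \(x = 0\) and \({x}^{{\ast}} \in [-1,1]\). The computation below is only a sketch under these assumptions; the symbol \(\iota _{C}\) for the indicator function of a set C is introduced here for convenience.

```latex
% Worked example (illustrative assumption): f(x) = |x| on X = R, at x = 0.
% For t > 0 and |x^*| <= 1 the second-order quotient is nonnegative:
\[
\frac{2}{t^{2}}\bigl(f(0 + tu') - f(0) - x^{\ast}\,tu'\bigr)
  = \frac{2}{t}\bigl(|u'| - x^{\ast}u'\bigr) \;\geq\; 0 .
\]
% Case |x^*| < 1: if u != 0 the bracket tends to |u| - x^* u > 0, so the
% quotient tends to +infinity; if u = 0 the choice u' = 0 gives the value 0.
\[
f''(0,x^{\ast},u) = \iota_{\{0\}}(u) :=
\begin{cases}
0 & \text{if } u = 0,\\
+\infty & \text{if } u \neq 0,
\end{cases}
\qquad |x^{\ast}| < 1 .
\]
% Case x^* = 1: the bracket vanishes for u' >= 0 and tends to 2|u| > 0 if u < 0.
\[
f''(0,1,u) = \iota_{[0,+\infty)}(u).
\]
```

In this example the second-order epi-derivative is an extended-real-valued function (an indicator function) rather than a quadratic form, which is typical of the nonsmooth case.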

We also write \(f_{x,{x}^{{\ast}}}^{{\prime\prime}}(u) = {f}^{{\prime\prime}}(x,{x}^{{\ast}},u).\) When the directional derivative \({f}^{{\prime}}(x)\) of f at x exists, we write \(f_{x}^{{\prime\prime}}\) instead of \(f_{x,{x}^{{\ast}}}^{{\prime\prime}}\) with \({x}^{{\ast}}:= {f}^{{\prime}}(x)\). Obviously, if for all u ∈ X one has \({f}^{{\prime\prime}}(x,{x}^{{\ast}},u) > -\infty,\) then \({x}^{{\ast}} \in \partial _{D}f(x)\), the directional subdifferential of f at x. Moreover, if f is twice directionally differentiable at x in the sense that there exist some \({x}^{{\ast}}:= {f}^{{\prime}}(x) \in {X}^{{\ast}}\) and some \({D}^{2}f(x) \in {L}^{2}(X,X; \mathbb{R})\) such that for all u ∈ X one has

$$\displaystyle\begin{array}{rcl} f^{\prime}(x)(u)& =& \lim _{(t,{u}^{{\prime}})\rightarrow (0_{+},u)}\frac{f(x + t{u}^{{\prime}}) - f(x)} {t}, {}\\ {D}^{2}f(x)(u,u)& =& \lim _{ (t,{u}^{{\prime}})\rightarrow (0_{+},u)}\frac{f(x + t{u}^{{\prime}}) - f(x) -\langle {x}^{{\ast}},t{u}^{{\prime}}\rangle } {{t}^{2}/2}, {}\\ \end{array}$$

then \({f}^{{\prime\prime}}(x,{x}^{{\ast}},u) = {D}^{2}f(x)(u,u).\) More generally, let us say that a continuous linear map \(H: X \rightarrow {X}^{{\ast}}\) is a directional subhessian of f at \((x,{x}^{{\ast}}) \in X \times {X}^{{\ast}}\) if there exists a function \(r: \mathbb{R} \times X \rightarrow \mathbb{R}\) such that \(\lim _{(t,{u}^{{\prime}})\rightarrow (0_{+},u)}{t}^{-2}r(t,{u}^{{\prime}}) = 0\) for all u ∈ X and

$$\displaystyle{\forall {u}^{{\prime}}\in X,\ \forall t \in \mathbb{P}\ \ \ \ \ \ \ \ \ f(x + t{u}^{{\prime}}) \geq f(x) +\langle {x}^{{\ast}},t{u}^{{\prime}}\rangle + {t}^{2}\langle H{u}^{{\prime}},{u}^{{\prime}}\rangle + r(t,{u}^{{\prime}}).}$$

If there exists a function \(s: X \rightarrow \mathbb{R}\) such that r(t, u) = s(tu) for all \((t,u) \in \mathbb{R} \times X\) and \({\left \Vert x\right \Vert }^{-2}s(x) \rightarrow 0\) as x → 0 in X∖{0} we say that H is a firm subhessian or a Fréchet subhessian. Then one sees that H is a directional subhessian of f at \((x,{x}^{{\ast}})\) if, and only if, for all u ∈ X one has

$$\displaystyle{{f}^{{\prime\prime}}(x,{x}^{{\ast}},u) \geq \langle Hu,u\rangle.}$$

We refer to [32] for the links between subhessians and conjugacy and to [19] for calculus rules for (limiting) subhessians.

Second-order epi-derivatives are useful to get second-order optimality conditions (see [43–45]).

The calculus rules for such derivatives are not simple (see [16, 17, 27, 31, 34, 45]). However, some estimates can be easily obtained and have some usefulness. Clearly, if f = g + h and if \({x}^{{\ast}} = {y}^{{\ast}} + {z}^{{\ast}}\), one has

$$\displaystyle{{f}^{{\prime\prime}}(x,{x}^{{\ast}},u) \geq {g}^{{\prime\prime}}(x,{y}^{{\ast}},u) + {h}^{{\prime\prime}}(x,{z}^{{\ast}},u)}$$

for all \(u \in \mathrm{ dom}g_{x,{y}^{{\ast}}}^{{\prime\prime}}\cap \mathrm{ dom}h_{x,{z}^{{\ast}}}^{{\prime\prime}}.\) Suppose \(f:= h \circ g\), where g: X → Y is twice directionally differentiable at x and \(h: Y \rightarrow \overline{\mathbb{R}}\) is finite at y := g(x). Then, given \({y}^{{\ast}} \in {Y }^{{\ast}}\) and \({x}^{{\ast}}:= {y}^{{\ast}}\circ {g}^{{\prime}}(x)\), one has

$$\displaystyle{\ {f}^{{\prime\prime}}(x,{x}^{{\ast}},u) \geq {h}^{{\prime\prime}}(y,{y}^{{\ast}},{g}^{{\prime}}(x)u) +\langle {y}^{{\ast}},{D}^{2}g(x)(u,u)\rangle,}$$

as follows from the fact that

$$\displaystyle{w(t,{u}^{{\prime}}):= \frac{2} {{t}^{2}}(g(x+t{u}^{{\prime}})-g(x)-t{g}^{{\prime}}(x){u}^{{\prime}}-\frac{{t}^{2}} {2} {D}^{2}g(x)(u,u))\mathop{ \rightarrow }\limits_{ (t,{u}^{{\prime}}) \rightarrow (0_{ +},u)}0,}$$

so that \({g}^{{\prime}}(x){u}^{{\prime}} + (1/2)t({D}^{2}g(x)(u,u) + w(t,{u}^{{\prime}})) \rightarrow {g}^{{\prime}}(x)u\) as \((t,{u}^{{\prime}}) \rightarrow (0_{+},u)\).

Let us consider the relationships of such derivatives with graphical derivatives of the subdifferential. Recall that the (outer) graphical derivative at (x, y) ∈ gph(F) of a multimap F: X ⇉ Y between two normed spaces is the multimap DF(x, y) whose graph is the (weak) tangent cone at (x, y) to the graph of F. In other terms,

$$\displaystyle{\forall u \in X\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ DF(x,y)(u):= \text{seq}\mathrm{-weak-}\limsup _{(t,{u}^{{\prime}})\rightarrow (0_{+},u)}\varDelta _{t}F(x,y)({u}^{{\prime}}),}$$

where

$$\displaystyle{\varDelta _{t}F(x,y)({u}^{{\prime}}):= \frac{1} {t} (F(x + t{u}^{{\prime}}) - y).}$$

We say that F has a graphical derivative at (x, y) or that F is (weakly) proto-differentiable at (x, y) if the sequential weak limsup of \(\varDelta _{t}F(x,y)\) as \(t \rightarrow 0_{+}\) coincides with \(\liminf _{t\rightarrow 0_{+}}\varDelta _{t}F(x,y).\)

For \(f: X \rightarrow \overline{\mathbb{R}}\) finite at x and \({x}^{{\ast}} \in \partial f(x)\), let us introduce the differential quotient

$$\displaystyle{\forall u \in X\ \ \ \ \ \ \ \ \ \varDelta _{t}^{2}f_{ x,{x}^{{\ast}}}(u):= \frac{2} {{t}^{2}}\left [f(x + tu) - f(x) -\langle {x}^{{\ast}},tu\rangle \right ].}$$

We say that \(f: X \rightarrow \overline{\mathbb{R}}\) finite at x has a second-order epi-derivative at \((x,{x}^{{\ast}})\) with \({x}^{{\ast}} \in \partial f(x)\) (in the sense of Mosco) if the sequential weak limsup as \(t \rightarrow 0_{+}\) of the epigraph of \(\varDelta _{t}^{2}f_{x,{x}^{{\ast}}}\) coincides with the liminf as \(t \rightarrow 0_{+}\) of the epigraph of \(\varDelta _{t}^{2}f_{x,{x}^{{\ast}}}.\)

The function \(f: X \rightarrow \overline{\mathbb{R}}\) is said to be paraconvex around x if it is lower semicontinuous, finite at x and if there exist c > 0,  r > 0 such that the function \(f + (c/2){\left \Vert \cdot \right \Vert }^{2}\) is convex on the ball B(x, r) with center x and radius r.
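To fix ideas about paraconvexity, here is a small one-dimensional illustration (an assumed example, not from the source): the function \(x\mapsto \vert x\vert - {x}^{2}\) is paraconvex around every point, whereas \(x\mapsto -\vert x\vert\) is not paraconvex around 0.

```latex
% Illustrative examples on X = R (assumptions made for this sketch).
% (i) f(x) = |x| - x^2 is lower semicontinuous and, with c = 2,
\[
f(x) + \frac{c}{2}\,x^{2} = |x| - x^{2} + x^{2} = |x| ,
\]
% which is convex on every ball, so f is paraconvex around every x.
% (ii) g(x) = -|x|: for any c > 0 put g_c(x) := -|x| + (c/2) x^2. Then for
% 0 < eps < 2/c one has g_c(eps) = g_c(-eps) = -eps + c eps^2 / 2 < 0, hence
\[
g_{c}(0) = 0 \;>\; \tfrac{1}{2}\,g_{c}(\varepsilon) + \tfrac{1}{2}\,g_{c}(-\varepsilon),
\]
% so g_c fails the midpoint convexity inequality on every ball centered at 0.
```

The quadratic term can compensate for negative curvature that is bounded below, but not for a concave kink.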

Assertion (a) of the next proposition completes [45, Lemma 13.39]. Assertion (b) extends [45, Theorem 13.40] to the infinite dimensional case and [9] to a nonconvex case. Our assumptions are slightly different from the assumptions of [23] dealing with primal lower-nice functions, a class that is larger than the class of paraconvex functions but requires a more complex analysis. See also [3, 21–23, 28].

Proposition 1.

  1. (a)

    For any function \(f: X \rightarrow \overline{\mathbb{R}}\) on a normed space X that is finite at x ∈ X and for all \({x}^{{\ast}} \in \partial f(x)\) one has

    $$\displaystyle{ D(\partial f)(x,{x}^{{\ast}})(u) = \text{seq}\mathrm{-weak-}\limsup _{ (t,{u}^{{\prime}})\rightarrow (0_{+},u)}\partial \left (\frac{1} {2}\varDelta _{t}^{2}f_{ x,{x}^{{\ast}}}\right )({u}^{{\prime}}). }$$
    (1)
  2. (b)

    If X is a Hilbert space and if f is paraconvex around x, then ∂f has a graphical derivative \(D(\partial f)(x,{x}^{{\ast}})\) at \((x,{x}^{{\ast}})\) if, and only if, f has a second-order epi-derivative \(f_{x,{x}^{{\ast}}}^{{\prime\prime}}\) at \((x,{x}^{{\ast}})\). Then

    $$\displaystyle{D(\partial f)(x,{x}^{{\ast}})(u) = \frac{1} {2}\partial f_{x,{x}^{{\ast}}}^{{\prime\prime}}(u).}$$

Let us observe that when f has a second-order epi-derivative at \((x,{x}^{{\ast}})\) that is quadratic and continuous, this relation ensures that \(D(\partial f)(x,{x}^{{\ast}})\) is the linear and continuous map from X into \({X}^{{\ast}}\) corresponding to this quadratic form.
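The relation of assertion (b) can also be checked by hand in a nonquadratic case. The following sketch assumes \(X = \mathbb{R}\), \(f = \vert \cdot \vert\) (convex, hence paraconvex), x = 0 and \(\vert {x}^{{\ast}}\vert < 1\); it is an illustration added here, not part of the source, and \(\iota _{\{0\}}\) denotes the indicator function of {0}.

```latex
% Assumed example: f = |.| on R, x = 0, |x^*| < 1.
% Near (0, x^*) the graph of the subdifferential is the vertical segment
% {0} x [-1, 1], whose tangent cone at (0, x^*) is the line {0} x R:
\[
D(\partial f)(0,x^{\ast})(u) =
\begin{cases}
\mathbb{R} & \text{if } u = 0,\\
\varnothing & \text{if } u \neq 0 .
\end{cases}
\]
% On the other hand f''(0, x^*, .) is the indicator function of {0}, whose
% subdifferential is R at 0 and empty elsewhere, so that
\[
\tfrac{1}{2}\,\partial f_{0,x^{\ast}}^{\prime\prime}(u)
 = \tfrac{1}{2}\,\partial \iota_{\{0\}}(u)
 = D(\partial f)(0,x^{\ast})(u)
 \qquad \text{for all } u \in \mathbb{R}.
\]
```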

Proof.

  1. (a)

    We first observe that the calculus rules of subdifferentials ensure that

    $$\displaystyle{\partial \left (\frac{1} {2}\varDelta _{t}^{2}f_{ x,{x}^{{\ast}}}\right )(\cdot ) =\varDelta _{t}(\partial f)(x,{x}^{{\ast}})(\cdot ).}$$

    Passing to the (sequential weak) limsup as t → 0+ yields equality on the graphs and relation (1).

  2. (b)

    Changing f into \(g:= f + (c/2){\left \Vert \cdot \right \Vert }^{2}\) for some appropriate c > 0 and observing that \((c/2){\left \Vert \cdot \right \Vert }^{2}\) is twice continuously differentiable, we reduce the proof to the case where f is convex. Now we note that for all t > 0 we have \((0,0) \in \partial (\frac{1} {2}\varDelta _{t}^{2}f_{ x,{x}^{{\ast}}})\) and that \((\varDelta _{t}^{2}f_{x,{x}^{{\ast}}})(0) = 0.\) Thus we can apply the equivalence of Attouch’s theorem ([1, Theorem 3.66]) and get the conclusion.

 □ 

3 Epi-derivatives and Regularization

Let us consider the interplay between second-order epi-differentiation and the Moreau regularization (or Moreau envelope). The Moreau envelope \({}^{r}f:= e_{r}f\) of a function \(f: X \rightarrow \mathbb{R}_{\infty }:= \mathbb{R} \cup \{\infty \}\) on a Hilbert space X, for a parameter r > 0, is given by

$$\displaystyle{(e_{r}f)(x) =\inf _{w\in X}\left (f(w) + \frac{1} {2r}{\left \Vert x - w\right \Vert }^{2}\right ).}$$
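A classical instance of this definition, recalled here as an illustration (it is not discussed in the source), is the absolute value on \(X = \mathbb{R}\), whose Moreau envelope is the so-called Huber function. The minimizer is written \(p_{r}(x)\), in accordance with the notation used later in the text.

```latex
% Assumed example: f = |.| on X = R, r > 0.
% The infimum over w of |w| + (x - w)^2 / (2r) is attained at the
% soft-thresholding point
\[
p_{r}(x) = \operatorname{sign}(x)\,\max\bigl(|x| - r,\,0\bigr),
\]
% which yields
\[
(e_{r}f)(x) =
\begin{cases}
\dfrac{x^{2}}{2r} & \text{if } |x| \leq r,\\[2mm]
|x| - \dfrac{r}{2} & \text{if } |x| > r,
\end{cases}
\qquad
\nabla (e_{r}f)(x) = \frac{1}{r}\bigl(x - p_{r}(x)\bigr) =
\begin{cases}
x/r & \text{if } |x| \leq r,\\
\operatorname{sign}(x) & \text{if } |x| > r .
\end{cases}
\]
```

Although f has a kink at 0, its envelope is of class \({C}^{1}\) with a Lipschitzian derivative, which is the regularizing effect exploited in this section.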

It is well known (see [2, Proposition 12.29], [38, Theorem 4.124] for instance) that if f is paraconvex around x ∈ X and quadratically minorized in the sense that for some \(a \in \mathbb{R},\) \(b \in \mathbb{R}\) one has \(f \geq b - a{\left \Vert \cdot \right \Vert }^{2},\) then there exists some \(\overline{r} > 0\) such that for all \(r \in ]0,\overline{r}[\) the Moreau envelope \(e_{r}f\) is differentiable at x with gradient \(\nabla (e_{r}f)(x) = x_{r}:= (1/r)(x - p_{r})\) where \(p_{r}:= p_{r}(x) \in X\) is such that \((e_{r}f)(x) = f(p_{r}) + (1/2r){\left \Vert x - p_{r}\right \Vert }^{2}\). The following result has been obtained in [9, Theorem 4.3] for a lower semicontinuous, proper, convex function f under the additional assumption that f is twice epi-differentiable at \(p_{r}\). The method of [9, Theorem 4.3] uses duality and in particular Attouch’s theorem, whereas we take a direct approach. We denote by \(J: X \rightarrow {X}^{{\ast}}\) the Riesz isomorphism characterized by \(\langle J(x),w\rangle =\langle x\mid w\rangle\), where ⟨⋅ ∣⋅ ⟩ is the scalar product of X.

Proposition 2.

Let X be a Hilbert space and let \(f: X \rightarrow \overline{\mathbb{R}}\) be paraconvex around x ∈ X and quadratically minorized. Then, there exists some \(\overline{r} > 0\) such that for all \(r \in ]0,\overline{r}[\) the Moreau envelope \(e_{r}f\) is differentiable at x with gradient \(\nabla (e_{r}f)(x) = x_{r}:= (1/r)(x - p_{r})\) where \(p_{r}:= p_{r}(x) \in X\) is such that \((e_{r}f)(x) = f(p_{r}) + (1/2r){\left \Vert x - p_{r}\right \Vert }^{2}.\) Moreover, the sequential weak (lower) second derivative of the Moreau envelope of f is related to the Moreau envelope of the (lower) sequential weak second derivative of f by the following relation in which \(x_{r}^{{\ast}}:= J(x_{r}) \in \partial f(p_{r})\) is the derivative of \(e_{r}f\) at x:

$$\displaystyle{\forall u \in X\ \ \ \ \ \ \ \ \ \ \ \ \frac{1} {2}{(e_{r}f)}^{{\prime\prime}}(x,x_{ r}^{{\ast}},u) = e_{ r}\left (\frac{1} {2}{f}^{{\prime\prime}}(p_{ r},x_{r}^{{\ast}},\cdot )\right )(u).}$$

In particular, when f is lower semicontinuous, proper, convex, and twice epi-differentiable at \(p_{r}\) for \(x_{r}^{{\ast}}\), the function \({(e_{r}f)}^{{\prime\prime}}(x,x_{r}^{{\ast}},\cdot )\) is continuous (and even of class \({C}^{1}\)): in such a case \((1/2){f}^{{\prime\prime}}(p_{r},x_{r}^{{\ast}},\cdot )\) is lower semicontinuous, proper, convex, so that \(e_{r}((1/2){f}^{{\prime\prime}}(p_{r},x_{r}^{{\ast}},\cdot ))\) is of class \({C}^{1}\). For the proof we need a result of independent interest about the interchange of minimization with a sequential weak lower epi-limit.

Lemma 1.

Let M be a metric space, let S ⊂ M, \(\overline{s} \in \mathrm{ cl}(S)\) and let Z be a reflexive Banach space endowed with its weak topology. Let \(g: S \times Z \rightarrow \overline{\mathbb{R}}\) be coercive in its second variable, uniformly for s ∈ S ∩ V, where V is a neighborhood of \(\overline{s}\) in M. Then

$$\displaystyle{\liminf _{s(\in S)\rightarrow \overline{s}}(\inf _{z\in Z}g(s,z)) =\inf _{z\in Z}(\text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}}))}$$

Proof.

The inequality

$$\displaystyle{\liminf _{s(\in S)\rightarrow \overline{s}}(\inf _{z\in Z}g(s,z)) \leq \inf _{z\in Z}(\text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}}))}$$

stems from the fact that for all \(\overline{z} \in Z\), \((z_{n}) \rightarrow \overline{z}\) for the weak topology and \((s_{n}) \rightarrow \overline{s}\) in S one has \(\inf _{z\in Z}g(s_{n},z) \leq g(s_{n},z_{n})\) for all \(n \in \mathbb{N}\), hence \(\liminf _{n}(\inf _{z\in Z}g(s_{n},z)) \leq \liminf _{n}g(s_{n},z_{n}).\)

Let \(r >\liminf _{s(\in S)\rightarrow \overline{s}}(\inf _{z\in Z}g(s,z)).\) Given a sequence \((\varepsilon _{n}) \rightarrow 0_{+}\) one can find some \(s_{n} \in B(\overline{s},\varepsilon _{n}) \cap S\), \(z_{n} \in Z\) such that \(r > g(s_{n},z_{n})\). We may suppose \(B(\overline{s},\varepsilon _{n}) \subset V\) for all n. The coercivity assumption ensures that \((z_{n})\) is bounded. Taking a subsequence if necessary, we may suppose \((z_{n})\) has a weak limit z. Then we get

$$\displaystyle{r \geq \liminf _{n}g(s_{n},z_{n}) \geq \text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}})}$$

and \(r \geq \inf _{z\in Z}(\text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}})).\) □ 
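The uniformity of the coercivity assumption cannot simply be dropped. The following one-dimensional sketch is an illustration added here (not from the source), under the reading that uniform coercivity means that the sublevel sets \(\{z: g(s,z) \leq r\}\) are bounded uniformly in s, which is how the assumption is used in the proof above.

```latex
% Assumed example: M = R, S = (0, +infinity), \bar{s} = 0, Z = R,
\[
g(s,z) := (sz - 1)^{2} .
\]
% Each g(s, .) is coercive, but not uniformly in s: the minimizer z = 1/s
% escapes to infinity as s -> 0+. Left-hand side of the lemma:
\[
\liminf_{s \to 0_{+}} \Bigl(\inf_{z \in \mathbb{R}} g(s,z)\Bigr)
 = \liminf_{s \to 0_{+}} g(s,1/s) = 0 .
\]
% Right-hand side: if (s, z') -> (0, z), then s z' -> 0 and g(s, z') -> 1, so
\[
\inf_{z \in \mathbb{R}} \Bigl(\liminf_{(s,z') \to (0,z)} g(s,z')\Bigr) = 1 .
\]
```

Only the inequality established in the first part of the proof survives in this example, and it is strict.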

For \((t,u) \in \mathbb{P} \times X\), assuming the classical fact that \({}^{r}f = e_{r}f\) is differentiable at x with derivative \(x_{r}^{{\ast}}\), we adopt the simplified notation

$$\displaystyle{\varDelta _{t}^{2}(e_{ r}f)_{x}(u):= \frac{2} {{t}^{2}}((e_{r}f)(x + tu) - (e_{r}f)(x) -\langle x_{r}^{{\ast}},tu\rangle ).}$$

Then we have the following exchange property generalizing [45, Lemma 13.39].

Lemma 2.

With the preceding notation, one has

$$\displaystyle{\frac{1} {2}\varDelta _{t}^{2}(e_{ r}f)_{x} = e_{r}\left (\frac{1} {2}\varDelta _{t}^{2}f_{ p_{r},x_{r}^{{\ast}}}\right ).}$$

Proof.

Setting \(w = p_{r} + tz\) with \(t \in \mathbb{P}\), z ∈ X, for all u ∈ X we have \(\langle x_{r}^{{\ast}},tu\rangle = \frac{t} {r}\langle x - p_{r}\mid u\rangle\), \((e_{r}f)(x) = f(p_{r}) + \frac{1} {2r}{\left \Vert x - p_{r}\right \Vert }^{2}\),

$$\displaystyle\begin{array}{rcl} (e_{r}f)(x + tu)& =& \inf _{w\in X}\left (f(w) + \frac{1} {2r}{\left \Vert x + tu - w\right \Vert }^{2}\right ) {}\\ & =& \inf _{z\in X}\left (f(p_{r} + tz) + \frac{1} {2r}{\left \Vert x - p_{r} + t(u - z)\right \Vert }^{2}\right ), {}\\ \end{array}$$
$$\displaystyle\begin{array}{rcl} \frac{1} {2r}{\left \Vert x - p_{r} + t(u - z)\right \Vert }^{2} - \frac{1} {2r}{\left \Vert x - p_{r}\right \Vert }^{2} - \frac{t} {r}\langle x - p_{r}\mid u\rangle & & {}\\ = -\frac{t} {r}\langle x - p_{r}\mid z\rangle + \frac{{t}^{2}} {2r}{\left \Vert u - z\right \Vert }^{2}& & {}\\ \end{array}$$

hence

$$\displaystyle\begin{array}{rcl} & & \frac{1} {2}\varDelta _{t}^{2}(e_{ r}f)_{x}(u) {}\\ & & = \frac{1} {{t}^{2}}\inf _{w\in X}\left [f(w) - f(p_{r}) + \frac{1} {2r}{\left \Vert x + tu - w\right \Vert }^{2} - \frac{1} {2r}{\left \Vert x - p_{r}\right \Vert }^{2} -\langle x_{ r}^{{\ast}},tu\rangle \right ] {}\\ & & = \frac{1} {{t}^{2}}\inf _{z\in X}\left [f(p_{r} + tz) - f(p_{r}) - \frac{t} {r}\langle x - p_{r}\mid z\rangle + \frac{{t}^{2}} {2r}{\left \Vert u - z\right \Vert }^{2}\right ] {}\\ & & =\inf _{z\in X}\left [ \frac{1} {{t}^{2}}(f(p_{r} + tz) - f(p_{r}) -\langle x_{r}^{{\ast}},tz\rangle ) + \frac{1} {2r}{\left \Vert u - z\right \Vert }^{2}\right ] {}\\ & & = e_{r}\left (\frac{1} {2}\varDelta _{t}^{2}f_{ p_{r},x_{r}^{{\ast}}}\right )(u). {}\\ \end{array}$$

Proof of Proposition 2. The first assertion is deduced from the convex case by considering \(f + c{\left \Vert \cdot \right \Vert }^{2}\) for some appropriate c > 0 (see [2], [45, Theorem 2.26]). For the second one, in Lemma 1 let us set Z := X, \(M:= \mathbb{R} \times X\), \(S:= \mathbb{P} \times X\), \(s:= (t,{u}^{{\prime}})\), \(\overline{s}:= (0,u)\), g := h + k with

$$\displaystyle\begin{array}{rcl} h(s,z)&:=& \varDelta _{t}^{2}f(p_{ r},x_{r}^{{\ast}},z):= (1/{t}^{2})(f(p_{ r} + tz) - f(p_{r}) -\langle x_{r}^{{\ast}},tz\rangle ), {}\\ k(s,z)&:=& (1/2r){\left \Vert {u}^{{\prime}}- z\right \Vert }^{2}. {}\\ \end{array}$$

It remains to show that for all z ∈ Z one has

$$\displaystyle{ \text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}}) = \frac{1} {2}{f}^{{\prime\prime}}(p_{ r},x_{r}^{{\ast}},z) + \frac{1} {2r}{\left \Vert u - z\right \Vert }^{2}. }$$
(2)

For all sequences \((t_{n}) \rightarrow 0_{+}\), \((u_{n}) \rightarrow u\), \((z_{n}) \rightarrow z\) in the weak topology of Z = X, one has

$$\displaystyle\begin{array}{rcl} \liminf _{n}g((t_{n},u_{n}),z_{n})& \geq & \liminf _{n}h((t_{n},u_{n}),z_{n}) +\liminf _{n}k((t_{n},u_{n}),z_{n}) {}\\ & \geq & \frac{1} {2}{f}^{{\prime\prime}}(p_{ r},x_{r}^{{\ast}},z) + \frac{1} {2r}{\left \Vert u - z\right \Vert }^{2}. {}\\ \end{array}$$

Let \((t_{n}) \rightarrow 0_{+}\), \((z_{n}) \rightarrow z\) (in the weak topology) be such that \((\varDelta _{t_{n}}^{2}f(p_{r},x_{r}^{{\ast}},z_{n})) \rightarrow \frac{1} {2}{f}^{{\prime\prime}}(p_{ r},x_{r}^{{\ast}},z).\) Setting \(u_{n}:= u + z_{n} - z\), we note that \((u_{n}) \rightarrow u\) in the weak topology and \({\left \Vert u_{n} - z_{n}\right \Vert }^{2} ={ \left \Vert u - z\right \Vert }^{2}\) for all n, so that

$$\displaystyle\begin{array}{rcl} \frac{1} {2}{f}^{{\prime\prime}}(p_{ r},x_{r}^{{\ast}},z) + \frac{1} {2r}{\left \Vert u - z\right \Vert }^{2}& =& \lim _{ n}h((t_{n},u_{n}),z_{n}) +\lim _{n}k((t_{n},u_{n}),z_{n}) {}\\ & =& \lim _{n}g((t_{n},u_{n}),z_{n}) \geq \text{seq} -\liminf _{(s,{z}^{{\prime}})(\in S\times Z)\rightarrow (\overline{s},z)}g(s,{z}^{{\prime}}).{}\\ \end{array}$$

Since the reverse inequality stems from the preceding estimate, we get relation (2).  □ 
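The relation of Proposition 2 can be verified by hand on a simple nonsmooth case. The sketch below assumes \(X = \mathbb{R}\), \(f = \vert \cdot \vert\) and \(\vert x\vert < r\); it is an illustration added here rather than part of the source, it uses the classical closed form of the Moreau envelope of the absolute value, and \(\iota _{\{0\}}\) denotes the indicator function of {0}.

```latex
% Assumed example: f = |.| on X = R and |x| < r.
% Classical closed form: (e_r f)(w) = w^2/(2r) for |w| <= r, hence
% p_r = 0 and x_r^* = x/r with |x_r^*| < 1.
% Left-hand side of the relation of Proposition 2:
\[
\tfrac{1}{2}(e_{r}f)''(x,x_{r}^{\ast},u)
 = \lim_{(t,u') \to (0_{+},u)} \frac{1}{t^{2}}
   \Bigl(\frac{(x + tu')^{2}}{2r} - \frac{x^{2}}{2r} - \frac{x}{r}\,tu'\Bigr)
 = \frac{u^{2}}{2r} .
\]
% Right-hand side: since |x_r^*| < 1, one has (1/2) f''(0, x_r^*, .) = iota_{0}, so
\[
e_{r}\bigl(\tfrac{1}{2}f''(0,x_{r}^{\ast},\cdot)\bigr)(u)
 = \inf_{z \in \mathbb{R}} \Bigl(\iota_{\{0\}}(z) + \frac{(u - z)^{2}}{2r}\Bigr)
 = \frac{u^{2}}{2r} .
\]
```

Both sides coincide, and the resulting function of u is of class \({C}^{1}\) even though the second-order epi-derivative of f itself takes the value +∞.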

4 Second-Order Derivatives via Coderivatives

Let us recall that the coderivative of a multimap (or set-valued map) F: U ⇉ V between two normed spaces at \((\overline{u},\overline{v}) \in F\) (identified with its graph gph(F)) is the multimap \({D}^{{\ast}}F(\overline{u},\overline{v}): {V }^{{\ast}}\rightrightarrows {U}^{{\ast}}\) defined by

$$\displaystyle{{D}^{{\ast}}F(\overline{u},\overline{v})({v}^{{\ast}}):=\{ {u}^{{\ast}}\in {U}^{{\ast}}: ({u}^{{\ast}},-{v}^{{\ast}}) \in N(F,(\overline{u},\overline{v}))\}.}$$

If \(\varphi: U \rightarrow V\) is a map of class C 1 between two normed spaces, the coderivative of \(\varphi\) at \((\overline{u},\varphi (\overline{u})),\) denoted by \({D}^{{\ast}}\varphi (\overline{u})\) rather than \({D}^{{\ast}}\varphi (\overline{u},\varphi (\overline{u}))\), is

$$\displaystyle{{D}^{{\ast}}\varphi (\overline{u}) {=\varphi }^{{\prime}}{(\overline{u})}^{T}: {V }^{{\ast}}\rightarrow {U}^{{\ast}},}$$

a single-valued (linear) map rather than a multimap. When \(\varphi = {f}^{{\prime}}\) is the derivative of a function \(f: U \rightarrow \mathbb{R}\) (so that \(V = {U}^{{\ast}}\)), denoting by \({D}^{2}f: U \rightarrow L(U,{U}^{{\ast}})\) the derivative of \(\varphi:= {f}^{{\prime}},\) we obtain that \({D}^{{\ast}}{f}^{{\prime}}(\overline{u}) = {D}^{2}f{(\overline{u})}^{T}\) maps \({U}^{{\ast}{\ast}}\) into \({U}^{{\ast}}\) and

$$\displaystyle{{D}^{{\ast}}{f}^{{\prime}}(\overline{u})({u}^{{\ast}{\ast}}) = {u}^{{\ast}{\ast}}{\circ \varphi }^{{\prime}}(\overline{u}) = {u}^{{\ast}{\ast}}\circ {D}^{2}f(\overline{u}) \in {U}^{{\ast}}.}$$

Denoting by \(A\mapsto {A}^{b}\) the isomorphism from \(L(U,{U}^{{\ast}})\) onto the space \({L}^{2}(U,U; \mathbb{R})\) of continuous bilinear forms on U, we see that the restriction of \({\partial }^{2}f(\overline{u}):= {D}^{{\ast}}{f}^{{\prime}}(\overline{u})\) to U considered as a subspace of \({U}^{{\ast}{\ast}}\) is \({D}^{2}f{(\overline{u})}^{b}:\)

$$\displaystyle{\forall u \in U\ \ \ \ \ \ \ \ \ \ \ \ {\partial }^{2}f(\overline{u})(u):= {D}^{{\ast}}{f}^{{\prime}}(\overline{u})(u) = {D}^{2}f{(\overline{u})}^{b}(u,\cdot ) \in {U}^{{\ast}}.}$$
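For a concrete check of this formula, one may consider a quadratic form on \(U = {\mathbb{R}}^{n}\). The sketch below is an illustration added here under the assumptions that U, \({U}^{{\ast}}\) and \({U}^{{\ast}{\ast}}\) are identified with \({\mathbb{R}}^{n}\) and that A is a symmetric matrix; the normal cone to a linear subspace is taken to be its orthogonal complement.

```latex
% Assumed example: f(u) = (1/2) <Au, u> on U = R^n with A symmetric,
% so that f'(u) = Au and gph f' = { (u, Au) : u in R^n } is a linear subspace.
% Its normal cone at any point is the orthogonal complement of the subspace:
\[
N\bigl(\operatorname{gph} f',(\overline{u},A\overline{u})\bigr)
 = \{(u^{\ast},w) : \langle u^{\ast},u\rangle + \langle w,Au\rangle = 0
      \ \ \forall u \in \mathbb{R}^{n}\}
 = \{(-Aw,\,w) : w \in \mathbb{R}^{n}\} .
\]
% Hence (u^*, -v) is normal if and only if u^* = Av, that is,
\[
\partial^{2}f(\overline{u})(v) = D^{\ast}f'(\overline{u})(v) = \{Av\}
 = \{D^{2}f(\overline{u})^{b}(v,\cdot)\}.
\]
```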

Calculus rules for coderivatives are presented in [24, 26]. In [27] calculus rules for coderivatives of subdifferentials are given in the finite dimensional case. Here we look for extensions to the infinite dimensional case.

Since the coderivative of a multimap is defined through the normal cone to its graph, it is natural to expect that calculus rules for second-order derivatives of functions depend on calculus rules for normal cones under images and inverse images. Let us recall such rules (see [38, Proposition 2.108, Theorem 2.111]).

Proposition 3.

Let U, V, W be Banach spaces, and let j: U → V, k: U → W be maps of class \({C}^{1}\), E ⊂ U, H ⊂ W, \(\overline{u} \in E\), \(\overline{v}:= j(\overline{u}) \in V,\) \(\overline{w}:= k(\overline{u}) \in W.\)

If \(E:= {k}^{-1}(H)\) and if \({k}^{{\prime}}(\overline{u})(U) = W\), then \(N(E,\overline{u}) = {k}^{{\prime}}{(\overline{u})}^{T}(N(H,\overline{w})).\)

If F := j(E), then \(N(F,\overline{v}) \subset {({j}^{{\prime}}{(\overline{u})}^{T})}^{-1}(N(E,\overline{u})).\)

If \(F:= j({k}^{-1}(H))\) and if \({k}^{{\prime}}(\overline{u})(U) = W\), then, for all \({v}^{{\ast}}\in N(F,\overline{v})\), there exists a unique \({w}^{{\ast}}\in N(H,\overline{w})\) such that \({j}^{{\prime}}{(\overline{u})}^{T}({v}^{{\ast}}) = {k}^{{\prime}}{(\overline{u})}^{T}({w}^{{\ast}}).\)

For the second assertion it is not necessary to suppose U and V are complete. Moreover, the differentiability assumptions on j and k can be adapted to the specific subdifferentials that are used. But since we wish to adopt a general approach, we ignore such refinements.

Let us consider a composite function \(f:= h \circ g\), where X, Y are Banach spaces, \(g: X_{0} \rightarrow Y\) is of class \({C}^{2}\) on some open subset \(X_{0}\) of X, with a derivative \({g}^{{\prime}}(x)\) that is surjective for x around \(\overline{x} \in X_{0}\), and \(h: Y \rightarrow \overline{\mathbb{R}}\) is lower semicontinuous and finite around \(g(\overline{x})\). For all usual subdifferentials one has the formula

$$\displaystyle{\partial f(x) = {g}^{{\prime}}{(x)}^{T}(\partial h(g(x))),}$$

where \({g}^{{\prime}}{(x)}^{T}\) is the transpose of the derivative \({g}^{{\prime}}(x)\) of g at x.

In order to compute \({\partial }^{2}f(\overline{x},{\overline{x}}^{{\ast}}):= {D}^{{\ast}}\partial f(\overline{x},{\overline{x}}^{{\ast}}),\) where \({\overline{x}}^{{\ast}}\in \partial f(\overline{x})\), \({\overline{x}}^{{\ast}} = {g}^{{\prime}}{(\overline{x})}^{T}({\overline{y}}^{{\ast}})\) with \({\overline{y}}^{{\ast}} \in \partial h(g(\overline{x}))\), let us denote by F (resp. H) the graph of ∂ f (resp. ∂ h) and set

$$\displaystyle\begin{array}{rcl} E&:=& \{(x,{y}^{{\ast}}) \in X \times {Y }^{{\ast}}:\ (g(x),{y}^{{\ast}}) \in H\}, {}\\ j&:=& I_{X} \times {g}^{{\prime}}{(x)}^{T}: (x,{y}^{{\ast}})\mapsto (x,{y}^{{\ast}}\circ {g}^{{\prime}}(x)), {}\\ k&:=& g \times I_{{Y }^{{\ast}}}: (x,{y}^{{\ast}})\mapsto (g(x),{y}^{{\ast}}). {}\\ \end{array}$$

Then, for \(U:= X \times {Y }^{{\ast}}\), \(V:= X \times {X}^{{\ast}}\), \(W:= Y \times {Y }^{{\ast}}\), one has

$$\displaystyle{E = {k}^{-1}(H),\ \ \ \ \ F = j(E).}$$

Proposition 4.

With the preceding assumptions, one has \({x}^{{\ast}}\in {\partial }^{2}f(\overline{x},{\overline{x}}^{{\ast}})({x}^{{\ast}{\ast}})\) if, and only if, for \({y}^{{\ast}{\ast}} = {g}^{{\prime}}{(\overline{x})}^{TT}({x}^{{\ast}{\ast}})\) and some \({y}^{{\ast}}\in {\partial }^{2}h(g(\overline{x}),{\overline{y}}^{{\ast}})({y}^{{\ast}{\ast}})\) , one has

$$\displaystyle{ {x}^{{\ast}} = {g}^{{\prime}}{(\overline{x})}^{T}({y}^{{\ast}}) +\langle {x}^{{\ast}{\ast}},{\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(\cdot ))\rangle. }$$
(3)

Note that for all u ∈ X one has \({D}^{2}g(\overline{x})(u) \in L(X,Y )\) and \({\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(u)) \in {X}^{{\ast}}.\) If \({x}^{{\ast}{\ast}}\) is the image of some x ∈ X through the canonical injection of X into \({X}^{{\ast}{\ast}}\), then for \(y:= {g}^{{\prime}}(\overline{x})(x),\) \({y}^{{\ast}}\in {\partial }^{2}h(g(\overline{x}),{\overline{y}}^{{\ast}})(y)\), one gets

$$\displaystyle{{x}^{{\ast}} = {y}^{{\ast}}\circ {g}^{{\prime}}(\overline{x}) +{ \overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(x)),}$$

a formula akin to the classical formula of the twice differentiable case.
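To see the link with the classical case, here is a one-dimensional sanity check added for illustration (it is not part of the source). It assumes \(X = Y = \mathbb{R}\), g and h of class \({C}^{2}\) and \({g}^{{\prime}}(\overline{x})\neq 0\), so that \(\partial h = \{{h}^{{\prime}}\}\) and the second-order subdifferential of h reduces to multiplication by \({h}^{{\prime\prime}}\).

```latex
% Assumed smooth case: X = Y = R, g and h of class C^2, g'(\bar{x}) != 0.
% Data of the formula: \bar{y}^* = h'(g(\bar{x})), y = g'(\bar{x}) x,
% y^* = h''(g(\bar{x})) y. Then
\[
x^{\ast} = y^{\ast}\,g'(\overline{x}) + \overline{y}^{\ast}\,g''(\overline{x})\,x
 = \Bigl(h''(g(\overline{x}))\,g'(\overline{x})^{2}
        + h'(g(\overline{x}))\,g''(\overline{x})\Bigr)\,x
 = (h \circ g)''(\overline{x})\,x ,
\]
% which is the usual second derivative of a composition applied to x.
```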

Proof.

The derivatives of j and k at \(\overline{u}:= (\overline{x},{\overline{y}}^{{\ast}})\) are given by

$$\displaystyle\begin{array}{rcl} {j}^{{\prime}}(\overline{x},{\overline{y}}^{{\ast}})(x,{y}^{{\ast}})& =& (x,{\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(x)) + {y}^{{\ast}}\circ {g}^{{\prime}}(\overline{x})), {}\\ {k}^{{\prime}}(\overline{x},{\overline{y}}^{{\ast}})(x,{y}^{{\ast}})& =& ({g}^{{\prime}}(\overline{x})x,{y}^{{\ast}}), {}\\ \end{array}$$

so that \({k}^{{\prime}}(\overline{x},{\overline{y}}^{{\ast}})(X \times {Y }^{{\ast}}) = Y \times {Y }^{{\ast}}\) and we can apply the last assertion of Proposition 3. Thus, the relation \({j}^{{\prime}}{(\overline{u})}^{T}({v}^{{\ast}}) = {k}^{{\prime}}{(\overline{u})}^{T}({w}^{{\ast}})\) with \({v}^{{\ast}}:= ({x}^{{\ast}},-{x}^{{\ast}{\ast}})\), \({w}^{{\ast}}:= ({y}^{{\ast}},-{y}^{{\ast}{\ast}})\) can be transcribed as

$$\displaystyle{\langle ({x}^{{\ast}},-{x}^{{\ast}{\ast}}),(x,{\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(x)) + {y}^{{\ast}}\circ {g}^{{\prime}}(\overline{x}))\rangle =\langle ({y}^{{\ast}},-{y}^{{\ast}{\ast}}),({g}^{{\prime}}(\overline{x})x,{y}^{{\ast}})\rangle.}$$

for all \((x,{y}^{{\ast}}) \in X \times {Y }^{{\ast}}\). Taking successively x = 0 and then \({y}^{{\ast}} = 0\), we get

$$\displaystyle\begin{array}{rcl} & & \forall {y}^{{\ast}}\in {Y }^{{\ast}}\ \ \ \ \ \ \ \ \ \ \ \ \langle {x}^{{\ast}{\ast}},{g}^{{\prime}}{(\overline{x})}^{T}{y}^{{\ast}}\rangle =\langle {y}^{{\ast}{\ast}},{y}^{{\ast}}\rangle, {}\\ & & \forall x \in X\ \ \ \ \ \ \ \ \ \ \langle {x}^{{\ast}},x\rangle -\langle {x}^{{\ast}{\ast}},{\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(x))\rangle =\langle {y}^{{\ast}},{g}^{{\prime}}(\overline{x})x\rangle {}\\ \end{array}$$

or \({g}^{{\prime}}{(\overline{x})}^{TT}({x}^{{\ast}{\ast}}) = {y}^{{\ast}{\ast}}\), \({x}^{{\ast}} = {g}^{{\prime}}{(\overline{x})}^{T}({y}^{{\ast}}) +\langle {x}^{{\ast}{\ast}},{\overline{y}}^{{\ast}}\circ ({D}^{2}g(\overline{x})(\cdot ))\rangle.\) □ 

5 Coderivative of the Gradient Map of a Moreau Envelope

Let us turn to the coderivative of the gradient map of the Moreau envelope of a paraconvex function f on a Hilbert space X such that for some \(a \in \mathbb{R},\) \(b \in \mathbb{R}\) one has \(f \geq b - a{\left \Vert \cdot \right \Vert }^{2}\). We recalled that there exists some \(\overline{r} > 0\) such that for all \(r \in ]0,\overline{r}[\) the Moreau envelope \(e_{r}f = {}^{r}f\) is differentiable with gradient \(\nabla (e_{r}f)(x) = (1/r)(x - p_{r}(x))\) where \(p_{r}(x) \in X\) is such that \(e_{r}(x) = f(p_{r}(x)) + (1/2r){\left \Vert x - p_{r}(x)\right \Vert }^{2}\) and \(x - p_{r}(x) \in r{\partial }^{\nabla }f(p_{r}(x))\), where the subgradient \({\partial }^{\nabla }g(x)\) of g at x is the inverse image of ∂g(x) under the Riesz isomorphism \(J: X \rightarrow {X}^{{\ast}}\). Here we abridge \((e_{r}f)(x)\) into \(e_{r}(x)\), whereas we write \(p_{r}(x)\) instead of \(p_{r}\) in order to take into account the dependence on x. We have to apply calculus rules for coderivatives. We do not need the full picture of [24, 26]. A simple lemma is enough. Here we take the coderivatives associated with the directional, the firm (Fréchet), the limiting, and the Clarke subdifferentials, and we simply write \({D}^{{\ast}}\) instead of \(D_{D}^{{\ast}}\), \(D_{F}^{{\ast}}\), \(D_{L}^{{\ast}}\), \(D_{C}^{{\ast}}\), respectively.

Lemma 3.

  1. (a)

    If \({F}^{-1}: Y \rightrightarrows X\) is the inverse multimap of F: X ⇉ Y, then for all \((y,x) \in {F}^{-1}\) and all \({x}^{{\ast}} \in {X}^{{\ast}}\) one has \({y}^{{\ast}}\in {D}^{{\ast}}{F}^{-1}(y,x)({x}^{{\ast}})\) if and only if \(-{x}^{{\ast}}\in {D}^{{\ast}}F(x,y)(-{y}^{{\ast}}).\)

  2. (b)

    \({u}^{{\ast}}\in {D}^{{\ast}}(-F)(x,-y)({v}^{{\ast}})\) if, and only if, \({u}^{{\ast}}\in {D}^{{\ast}}F(x,y)(-{v}^{{\ast}})\) .

  3. (c)

    If F := I + G, where G: X ⇉ X and I is the identity map, then for all \({v}^{{\ast}} \in {X}^{{\ast}}\) one has \({D}^{{\ast}}F(x,y)({v}^{{\ast}}) = {v}^{{\ast}} + {D}^{{\ast}}G(x,y - x)({v}^{{\ast}}).\)

Proof.

We give the proof in the case where the subdifferential is the directional subdifferential \(\partial _{D}\). The cases of the firm, the limiting, and the Clarke subdifferentials are also simple applications of the definitions.

  1. (a)

    We observe that \(N({F}^{-1},(y,x)) = N{(F,(x,y))}^{-1}.\) Thus, for \({x}^{{\ast}}\in {X}^{{\ast}}\) one has \({y}^{{\ast}}\in {D}^{{\ast}}{F}^{-1}(y,x)({x}^{{\ast}})\) if, and only if, \((-{x}^{{\ast}},{y}^{{\ast}}) \in N(F,(x,y)),\) if, and only if, \(-{x}^{{\ast}}\in {D}^{{\ast}}F(x,y)(-{y}^{{\ast}}).\)

  2. (b)

    The equivalence is a consequence of the fact that

    $$\displaystyle{({u}^{{\ast}},-{v}^{{\ast}}) \in N(\mathrm{gph}(-F),(x,-y))\Longleftrightarrow({u}^{{\ast}},{v}^{{\ast}}) \in N(\mathrm{gph}(F),(x,y)).}$$
  3. (c)

    It is easy to see that v ∈ DF(x, y)(u) for (u, v) ∈ X × X if, and only if, there exists w ∈ DG(x, y − x)(u) such that v = u + w, if, and only if, v − u ∈ DG(x, y − x)(u). Then \(({u}^{{\ast}},-{v}^{{\ast}}) \in N(F,(x,y))\) if, and only if, for all u ∈ X, w ∈ DG(x, y − x)(u), one has

    $$\displaystyle{\langle {u}^{{\ast}},u\rangle +\langle -{v}^{{\ast}},u + w\rangle \leq 0,}$$

    if, and only if, \(({u}^{{\ast}}- {v}^{{\ast}},-{v}^{{\ast}}) \in N(G,(x,y - x))\) or \({u}^{{\ast}} = {v}^{{\ast}} + ({u}^{{\ast}}- {v}^{{\ast}}) \in {v}^{{\ast}} + {D}^{{\ast}}G(x,y - x)({v}^{{\ast}}).\)

 □ 

Applying the preceding lemma to the map \(p_{r} = {(I + r{\partial }^{\nabla }f)}^{-1}\), we get, since \(\nabla e_{r}(x) = {r}^{-1}(x - p_{r}(x))\)

$$\displaystyle\begin{array}{rcl} {u}^{{\ast}}\in {D}^{{\ast}}\nabla e_{ r}(x)({v}^{{\ast}})& \Leftrightarrow & {u}^{{\ast}}\in {r}^{-1}({v}^{{\ast}} + {D}^{{\ast}}p_{ r}(x)(-{v}^{{\ast}})), {}\\ {w}^{{\ast}}\in {D}^{{\ast}}p_{ r}(x)(-{v}^{{\ast}})& \Leftrightarrow & {v}^{{\ast}}\in {D}^{{\ast}}(I + r{\partial }^{\nabla }f)(p_{ r}(x),x)(-{w}^{{\ast}}) {}\\ & \Leftrightarrow & {v}^{{\ast}}\in -{w}^{{\ast}} + r{D}^{{\ast}}{\partial }^{\nabla }f(p_{ r}(x),x)(-{w}^{{\ast}}). {}\\ \end{array}$$

Therefore, \({u}^{{\ast}}\in {D}^{{\ast}}\nabla e_{r}(x)({v}^{{\ast}})\) if, and only if, for \({w}^{{\ast}}:= r{u}^{{\ast}}- {v}^{{\ast}}\in {D}^{{\ast}}p_{r}(x)(-{v}^{{\ast}})\) one has \({v}^{{\ast}} + {w}^{{\ast}}\in r{D}^{{\ast}}{\partial }^{\nabla }f(p_{r}(x),x)(-{w}^{{\ast}})\) or \(r{u}^{{\ast}}\in r{D}^{{\ast}}{\partial }^{\nabla }f(p_{r}(x),x)(-{w}^{{\ast}})\) or \({u}^{{\ast}}\in {D}^{{\ast}}{\partial }^{\nabla }f(p_{r}(x),x)({v}^{{\ast}}- r{u}^{{\ast}}).\) We have obtained an expression of \({D}^{{\ast}}\nabla (e_{r}f)\) in terms of \({D}^{{\ast}}{\partial }^{\nabla }f\). We state it as follows.

Proposition 5.

If f is paraconvex around x and such that for some \(a \in \mathbb{R},\) \(b \in \mathbb{R}\) one has \(f \geq b - a{\left \Vert \cdot \right \Vert }^{2}\), then for r > 0 small enough, one has \({u}^{{\ast}}\in {D}^{{\ast}}\nabla (e_{r}f)(x)({v}^{{\ast}})\) if, and only if, \({u}^{{\ast}}\in {D}^{{\ast}}{\partial }^{\nabla }f(p_{r}(x),x)({v}^{{\ast}}- r{u}^{{\ast}})\), that is,

$$\displaystyle{{D}^{{\ast}}\nabla (e_{ r}f)(x) = {(rI + {({D}^{{\ast}}{\partial }^{\nabla }f(p_{ r}(x),x))}^{-1})}^{-1}.}$$
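As a sanity check of this formula, one can work out a smooth convex example. The sketch below is an illustration added here, assuming \(X = \mathbb{R}\) and \(f(x) = (a/2){x}^{2}\) with a > 0 (so f is convex and bounded below by 0); in this linear setting every coderivative reduces to multiplication by a scalar.

```latex
% Assumed example: X = R, f(x) = (a/2) x^2 with a > 0.
% The subgradient map is linear, p -> a p, so its coderivative at any point
% of its graph is multiplication by a. Moreover
\[
p_{r}(x) = \frac{x}{1 + ra},
\qquad
\nabla (e_{r}f)(x) = \frac{1}{r}\bigl(x - p_{r}(x)\bigr) = \frac{a\,x}{1 + ra},
\qquad
D^{\ast}\nabla (e_{r}f)(x)(v^{\ast}) = \frac{a}{1 + ra}\,v^{\ast} .
\]
% The characterization of Proposition 5 reads u^* = a (v^* - r u^*), that is,
\[
u^{\ast} = \frac{a}{1 + ra}\,v^{\ast},
\qquad\text{in accordance with}\qquad
\bigl(rI + (aI)^{-1}\bigr)^{-1} = \Bigl(r + \frac{1}{a}\Bigr)^{-1} I
 = \frac{a}{1 + ra}\,I .
\]
```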

6 Conclusion

The choice of notation and terminology is not a simple matter. It has some importance in terms of use. The famous example of the quarrel between Leibniz and Newton shows that the choice of the mathematician who is often considered as the first discoverer of a concept is not always the choice that remains in use. As for terminology, in mathematics as in other sciences, both names of scientists and descriptive names are in use. The difference is that mathematicians prefer ordinary names such as “field,” “group,” and “ring” to sophisticated names derived from Greek or Latin. As for notation, there is a trade-off between simplicity and clarity. Both are desirable, but avoiding ambiguity is crucial. Nonetheless, local abuses of notation are often tolerated. Here, because our aim was limited to a few glances at second-order generalized derivatives, we have avoided a heavy notation involving upper epi-limits and we have often omitted the mention that weak convergence is involved. That is not always suitable.

Let us observe that the use of a generic subdifferential avoids delicate choices of notation for subdifferentials. In his first contributions the author endeavored to use symbols like \({f}^{0}(x,\cdot )\), f (x, ⋅ ) rather than letters like \({f}^{C}(x,\cdot )\) to denote generalized derivatives. But the disorder among such symbols led him to adopt a more transparent choice in [38] and elsewhere. Also, he tried to combine the advantages of descriptive terms and authors’ names. More importantly, he chose to adopt the simplest notation for simple, fundamental objects such as Fréchet normal cones \(N_{F}\) and Fréchet subdifferentials \(\partial _{F}\) that are used in most proofs rather than burden them with decorations such as \(\hat{N}\) or \(\overline{N}\), usually kept for completions or limiting constructions. The fact that limiting constructions can be performed by using different convergences supports that choice.

It would be interesting to decide whether the influences of dominant philosophies (behaviorism in America, Cartesianism, Kantianism, positivism, existentialism, structuralism in Europe) play some role in the different choices present in the mathematical literature. But that question is outside the scope of the present contribution.