1 Introduction

A principal goal of variational analysis is the search for generalized critical points of nonsmooth functions \(f :\mathbf{R}^n \rightarrow \mathbf{R}\). For example, given a locally Lipschitz function \(f\), we might be interested in points \(x \in \mathbf{R}^n\) having zero in the “Clarke generalized gradient” (or “subdifferential”) \(\partial _c f(x)\), a set consisting of convex combinations of limits of gradients of \(f\) at points near \(x\) [14].

Adding a linear perturbation, we might seek critical points of the function \(x \mapsto f(x) - v^T x\) for a given vector \(v \in \mathbf{R}^m\), or, phrased in terms of the graph of the subdifferential mapping \(\partial _c f\), solutions to the inclusion

$$\begin{aligned} (x,v) \in \text{ gph}\,\partial _c f. \end{aligned}$$

More generally, given a smooth function \(G :\mathbf{R}^m \rightarrow \mathbf{R}^n\), we might be interested in solutions \((x,y) \in \mathbf{R}^m \times \mathbf{R}^n\) to the system

$$\begin{aligned} (G(x),y) \in \text{ gph}\,\partial _c f ~~\text{ and}~~ \nabla G(x)^* y = v \end{aligned}$$
(1)

(where \(*\) denotes the adjoint). Such systems arise naturally when we seek critical points of the composite function \(x \mapsto f(G(x)) - v^T x\).

Generalized critical points of smooth functions \(f\) are, of course, simply the critical points in the classical sense. However, the more general theory is particularly interesting to optimization specialists, because critical points of continuous convex functions are just minimizers [34, Proposition 8.12], and more generally, for a broader class of functions (for instance, those that are Clarke regular [14]), a point is critical exactly when the directional derivative is non-negative in every direction.

The system (1) could, in principle, be uninformative if the graph \(\text{ gph}\,\partial _c f\) is large. In particular, if the dimension (appropriately defined) of the graph is larger than \(n\), then we could not typically expect the system to be a very definitive tool, since it involves \(m+n\) variables constrained by only \(m\) linear equations and the inclusion. Such examples are not hard to construct: indeed, there exists a function \(f :\mathbf{R}\rightarrow \mathbf{R}\) with Lipschitz constant one and with the property that its Clarke subdifferential is the interval \([-1,1]\) at every point [32]. Alarmingly, in a precise mathematical sense, this property is actually typical for such functions [10].

Optimization theorists often consider subdifferentials that are smaller than Clarke’s, the “limiting” subdifferential \(\partial f\) being a popular choice [11, 15, 28, 34]. However, the Clarke subdifferential can be easier to approximate numerically (see [12]), and in any case the potential difficulty posed by functions with large subdifferential graphs persists with the limiting subdifferential [7].

Notwithstanding this pathology, concrete functions \(f :\mathbf{R}^n \rightarrow \mathbf{R}\) encountered in practice have subdifferentials \(\partial _c f\) whose graphs are, in some sense, small and this property can be useful, practically. For instance, Robinson [31] considers algorithmic aspects of functions whose subdifferential graphs are everywhere locally Lipschitz homeomorphic to an open subset of \(\mathbf{R}^n\). As above, dimensional considerations suggest reassuringly that this property should help the definitive power of critical point systems like (1), and Robinson furthermore argues that it carries powerful computational promise. An example of the applicability of Robinson’s techniques is provided by Minty’s theorem, which states that the graph of the subdifferential of a proper, lower semicontinuous, convex function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) is Lipschitz homeomorphic to \(\mathbf{R}^n\) [27].

When can we be confident that a function has a subdifferential graph that is, by some definition, small? The study of classes of functions that are favorable for subdifferential analysis, in particular excluding the pathological examples above, is well-developed. The usual starting point is a unification of smooth and convex analysis, arriving at such properties as amenability [34, Chapter 10.F.], prox-regularity [30], and cone-reducibility [6, Section 3.4.4]. Using Minty’s theorem, Poliquin and Rockafellar [30] showed that prox-regular functions, in particular, have small subdifferentials in the sense of Robinson. Aiming precisely at a class of functions with small subdifferentials (in fact minimal in the class of upper semicontinuous mappings with nonempty compact convex images), [8] considers “essential strict differentiability”.

In this work we take a different, very concrete approach. We focus on the dimension of the subdifferential graph, unlike the abstract minimality results of [8], but we consider the class of semi-algebraic functions—those functions whose graphs are semi-algebraic, meaning composed of finitely-many sets, each defined by finitely-many polynomial inequalities—and prove that such functions have small subdifferentials in the sense of dimension: the Clarke subdifferential has \(n\)-dimensional graph. This result subsumes neither the simple case of a smooth function, nor the case of a convex function, neither of which is necessarily semi-algebraic. Nonetheless, it has a certain appeal: semi-algebraic functions are common, they serve as an excellent model for “concrete” functions in variational analysis [22], and in marked contrast with many other classes of favorable functions, such as amenable functions, they may not even be Clarke regular. Furthermore, semi-algebraic functions are easy to recognize (as a consequence of the Tarski–Seidenberg theorem on preservation of semi-algebraicity under projection). For instance, observe that the spectral radius function on \(n\times n\) matrices is neither Lipschitz nor convex, but it is easy to see that it is semi-algebraic.

To illustrate our results, consider the critical points of the function \(x \!\mapsto \! f(x) \!-\! v^T x\) for a semi-algebraic function \(f :\mathbf{R}^n \rightarrow [-\infty ,+\infty ]\). As a consequence of the subdifferential graph being small, we show that for a generic choice of the vector \(v\), the number of critical points is finite. More precisely, there exists a number \(N\), and a semi-algebraic set \(S \subset \mathbf{R}^n\) of dimension strictly less than \(n\), such that for all vectors \(v\) outside \(S\), there exist at most \(N\) critical points. A result of a similar flavor can be found in [23], where criticality of so called “constraint systems” is considered. Specifically, [23] shows that if a semi-algebraic constrained minimization problem is “normal”, then it has only finitely many critical points. Furthermore, it is shown that normality is a generic property. To contrast their approach to ours, we should note that [23] focuses on perturbations to the constraint structure, whereas we address linear perturbations to the function itself.

To be concrete, we state our results for semi-algebraic functions. Analogous results, with essentially identical proofs, hold for functions definable in an “o-minimal structure” and, more generally, for “tame” functions. (In the case of tame functions, “finiteness” of critical points should be replaced by “local isolation” in Proposition 4.3 and Corollaries 4.4, 5.8, 5.9.) In particular, our results hold for globally subanalytic functions, discussed in [36]. For a quick introduction to these concepts in an optimization context, see [22].

2 Preliminaries

2.1 Variational analysis

In this section, we summarize some of the fundamental tools used in variational analysis and nonsmooth optimization. We refer the reader to the monographs of Rockafellar–Wets [34], Borwein–Zhu [11], Mordukhovich [28, 29], and Clarke–Ledyaev–Stern–Wolenski [15], for more details. Unless otherwise stated, we follow the terminology and notation of [34].

Consider the extended real line \(\overline{\mathbf{R}}:=\mathbf{R}\cup \{-\infty \}\cup \{+\infty \}\). We say that an extended-real-valued function is proper if it is never \(\{-\infty \}\) and is not always \(\{+\infty \}\).

For a function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\), we define the domain of \(f\) to be

$$\begin{aligned} \text{ dom}\, f:=\{x\in \mathbf{R}^n: f(x)<+\infty \}, \end{aligned}$$

and we define the epigraph of \(f\) to be

$$\begin{aligned} \text{ epi}\, f:= \{(x,r)\in \mathbf{R}^n\times \mathbf{R}: r\ge f(x)\}. \end{aligned}$$

A set-valued mapping \(F\) from \(\mathbf{R}^n\) to \(\mathbf{R}^m\), denoted by \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\), is a mapping from \(\mathbf{R}^n\) to the power set of \(\mathbf{R}^m\). Thus for each point \(x\in \mathbf{R}^n, \, F(x)\) is a subset of \(\mathbf{R}^m\). For a set-valued mapping \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\), we define the domain of \(F\) to be

$$\begin{aligned} \text{ dom}\, F:=\{x\in \mathbf{R}^n:F(x)\ne \emptyset \}, \end{aligned}$$

and we define the graph of \(F\) to be

$$\begin{aligned} \text{ gph}\, F:=\{(x,y)\in \mathbf{R}^n\times \mathbf{R}^m:y\in F(x)\}. \end{aligned}$$

Definition 2.1

Consider a set-valued mapping \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\).

  1. (1)

    \(F\) is outer semicontinuous at a point \({\bar{x}}\in \mathbf{R}^n\) if for any sequence of points \(x_r\in \mathbf{R}^n\) converging to \(\bar{x}\) and any sequence of points \(y_r\in F(x_r)\) converging to \(\bar{y}\), we must have \(\bar{y}\in F(\bar{x})\).

  2. (2)

    \(F\) is inner semicontinuous at \(\bar{x}\) if for any sequence of points \(x_r\in \mathbf{R}^n\) converging to \(\bar{x}\) and any point \(\bar{y}\in F(\bar{x})\), there exists a sequence \(y_r\in \mathbf{R}^m\) converging to \(\bar{y}\) such that \(y_r\in F(x_r)\) for all \(r\).

If both properties hold, then we say that \(F\) is continuous at \(\bar{x}\).

Definition 2.2

Consider a set \(S\subset \mathbf{R}^n\) and a point \(\bar{x}\in S\). The regular normal cone to \(S\) at \(\bar{x}\), denoted \(\hat{N}_S(\bar{x})\), consists of all vectors \(v \in \mathbf{R}^n\) such that

$$\begin{aligned} \langle v,x-\bar{x} \rangle \le o(|x-\bar{x}|) \mathrm for x\in S, \end{aligned}$$

where we denote by \(o(|x-\bar{x}|) \mathrm for x\in S\) a term with the property that

$$\begin{aligned} \frac{o(|x-\bar{x}|)}{|x-\bar{x}|}\rightarrow 0 \end{aligned}$$

when \(x\stackrel{S}{\rightarrow } \bar{x}\) with \(x\ne \bar{x}\).

Given a closed set \(S\), the mapping \(x\mapsto \hat{N}_S(x)\) does not necessarily have a closed graph. To correct for that, the following definition is introduced.

Definition 2.3

Consider a set \(S\subset \mathbf{R}^n\) and a point \(\bar{x}\in S\). The limiting normal cone to \(S\) at \(\bar{x}\), denoted \(N_S(\bar{x})\), consists of all \(v\in \mathbf{R}^n\) such that there are sequences \(x_r\stackrel{S}{\rightarrow } \bar{x}\) and \(v_r\rightarrow v\) with \(v_r\in \hat{N}_S(x_r)\).

For a set \(S\subset \mathbf{R}^n\), we denote its topological closure by \(\text{ cl}\, S\) and its convex hull by \(\text{ conv}\, S\).

Definition 2.4

Consider a set \(S\subset \mathbf{R}^n\) and a point \(\bar{x}\in S\). The Clarke normal cone to \(S\) at \(\bar{x}\), denoted \(N_S^c(\bar{x})\), is defined by

$$\begin{aligned} N_S^c(\bar{x})=\mathrm cl conv N_S(\bar{x}). \end{aligned}$$

We summarize some simple facts about normal cones that we will need.

Theorem 2.5

Consider a set \(S\subset \mathbf{R}^n\) and a point \(\bar{x}\in S\).

  1. (1)

    \(\hat{N}_S(\bar{x})\subset N_S(\bar{x}) \subset N_S^c(\bar{x})\).

  2. (2)

    \(N_S(\bar{x}), \, \hat{N}_S(\bar{x})\), and \(N_S^c(\bar{x})\) are closed cones. \(\hat{N}_S(\bar{x})\) and \(N_S^c(\bar{x})\) are, in addition, convex.

  3. (3)

    For a set \(F\subset \mathbf{R}^n\) containing \(\bar{x}\) such that \(S\subset F\), we have \(\hat{N}_F(\bar{x})\subset \hat{N}_S(\bar{x})\).

Definition 2.6

(Clarke regularity of sets) A set \(S\subset \mathbf{R}^n\) is said to be Clarke regular at a point \(\bar{x}\in S\) if it is locally closed at \(\bar{x}\) and every limiting normal vector to \(S\) at \(\bar{x}\) is a regular normal vector, that is \(N_S(\bar{x})=\hat{N}_S(\bar{x})\).

Given any set \(S\subset \mathbf{R}^n\) and a mapping \(f:S\rightarrow \widetilde{S}\), where \(\widetilde{S}\subset \mathbf{R}^m\), we say that \(f\) is smooth if for each point \(\bar{x}\in S\), there is a neighbourhood \(U\) of \(\bar{x}\) and a \(\mathbf {C}^1\) mapping \(\hat{f}:\mathbf{R}^n\rightarrow \mathbf{R}^m\) that agrees with \(f\) on \(S\cap U\). If a smooth function \(f\) is bijective and its inverse is also smooth, then we say that \(f\) is a diffeomorphism.

What we call smooth is usually referred to as \(\mathbf {C}^1\) smooth. Since in this work we will not need higher order of smoothness, no ambiguity should arise.

Definition 2.7

([25, Proposition 8.12]) Consider a set \(M\subset \mathbf{R}^n\). We say that \(M\) is a manifold (or “embedded submanifold”) of dimension \(r\) if for each point \(\bar{x}\in M\), there is an open neighbourhood \(U\) around \({\bar{x}}\) such that \(M\cap U=F^{-1}(0)\), where \(F:U\rightarrow \mathbf{R}^{n-r}\) is a smooth map with \(\nabla F(\bar{x})\) of full rank. In this case, we call \(F\) a local defining function for \(M\) around \(\bar{x}\).

Theorem 2.8

([34, Example 6.8]) If \(M\) is a manifold, then for every point \(\bar{x}\in M\), the manifold \(M\) is Clarke regular at \(\bar{x}\) and \(N_M(\bar{x})\) is equal to the normal space to \(M\) at \(\bar{x}\), in the sense of differential geometry.

Normal cones allow us to study geometric objects. We now define subdifferentials, which allow us to analyze behaviour of functions.

Definition 2.9

Consider a function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) and a point \(\bar{x}\in \mathbf{R}^n\) where \(f\) is finite. The regular, limiting, and Clarke subdifferentials of \(f\) at \(\bar{x}\), respectively, are defined by

$$\begin{aligned} \hat{\partial }f(\bar{x})&= \{v\in \mathbf{R}^n: (v,-1)\in \hat{N}_{\text{ epi}\, f}(\bar{x},f(\bar{x}))\},\\ \partial f(\bar{x})&= \{v\in \mathbf{R}^n: (v,-1)\in N_{\text{ epi}\, f}(\bar{x},f(\bar{x}))\},\\ \partial _c f(\bar{x})&= \{v\in \mathbf{R}^n: (v,-1)\in N^c_{\text{ epi}\, f}(\bar{x},f(\bar{x}))\}. \end{aligned}$$

For \(x\) such that \(f(x)\) is not finite, we follow the convention that \(\hat{\partial }f(x)=\partial f(x)=\partial _c f(x)=\emptyset \).

Definition 2.10

(Subdifferential regularity) A function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) is called subdifferentially regular at \(\bar{x}\) if \(f(\bar{x})\) is finite and \(\text{ epi}\, f\) is Clarke regular at \((\bar{x},f(\bar{x}))\) as a subset of \(\mathbf{R}^n\times \mathbf{R}\).

Theorem 2.11

([34, Exercise 8.8, Corollary 10.9]) Consider the function \(h=f+g\), where \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) is finite at \(\bar{x}\) and \(g:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) is smooth on a neighbourhood of \(\bar{x}\). Then we have

$$\begin{aligned} \hat{\partial }h(\bar{x})= \hat{\partial }f(\bar{x})+\nabla g(\bar{x}),~~ \partial h(\bar{x})=\partial f(\bar{x})+\nabla g(\bar{x}). \end{aligned}$$

Furthermore, \(h\) is subdifferentially regular at \(\bar{x}\) if and only if \(f\) is subdifferentially regular at \(\bar{x}\).

For a set \(S\subset \mathbf{R}^n\), we define \(\delta _S:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) to be a function that is \(0\) on \(S\) and \(+\infty \) elsewhere. We call \(\delta _S\) the indicator function of the set \(S\).

Theorem 2.12

([34, Exercise 8.14]) Consider the indicator function \(\delta _S\) of a set \(S\subset \mathbf{R}^n\). Then we have

$$\begin{aligned} \partial \delta _S(\bar{x})=N_S(\bar{x}),~~ \hat{\partial }\delta _S(\bar{x})=\hat{N}_S(\bar{x}). \end{aligned}$$

Furthermore, \(\delta _S\) is subdifferentially regular at \(\bar{x}\) if and only if \(S\) is Clarke regular at \(\bar{x}\).

2.2 Semi-algebraic geometry

A semi-algebraic set \(S\subset \mathbf{R}^n\) is a finite union of sets of the form

$$\begin{aligned} \{x\in \mathbf{R}^n: P_1(x)=0,\ldots ,P_k=0, Q_1(x)<0,\ldots , Q_l(x)<0\}, \end{aligned}$$

where \(P_1,\ldots ,P_k,Q_1,\ldots ,Q_l\) are polynomials in \(n\) variables. In other words, \(S\) is a union of finitely many sets, each defined by finitely many polynomial equalities and inequalities. A map \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) is semi-algebraic if \(\text{ gph}\, F\subset \mathbf{R}^{n+m}\) is a semi-algebraic set. Semi-algebraic sets enjoy many nice structural properties. We discuss some of these properties in this section. See the monographs of Basu-Pollack-Roy [1], Lou van den Dries [37], and Shiota [36]. For a quick survey, see the article of van den Dries-Miller [38] and the surveys of Coste [16, 17]. Unless otherwise stated, we follow the notation of [38] and [17].

A fundamental fact about semi-algebraic sets is provided by the Tarski–Seidenberg Theorem [17, Theorem 2.3]. It states that the image of any semi-algebraic set \(S\subset \mathbf{R}^n\), under a projection to any linear subspace of \(\mathbf{R}^n\), is a semi-algebraic set. From this result, it follows that a great many constructions preserve semi-algebraicity. In particular, for a semi-algebraic function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\), it is easy to see that the set-valued mappings \(\hat{\partial } f, \, \partial f\), and \(\partial _c f\) are semi-algebraic. See for example [22, Proposition 3.1].

The most striking and useful fact about semi-algebraic sets is that they can be partitioned into finitely many semi-algebraic manifolds that fit together in a regular pattern. The particular stratification that we are interested in is defined below.

Definition 2.13

Consider a semi-algebraic set \(Q\) in \(\mathbf{R}^n\). A Whitney stratification of \(Q\) is a finite partition of \(Q\) into semi-algebraic manifolds \(M_i\) (called strata) with the following properties:

  1. (1)

    For distinct \(i\) and \(j\), if \(M_i\cap \text{ cl}\,{M_j} \ne \emptyset \), then \(M_i\subset \text{ cl}\, {M_j}\setminus M_j\).

  2. (2)

    For any sequence of points \((x_k)\) in a stratum \(M_j\) converging to a point \(x\) in a stratum \(M_i\), if the corresponding normal vectors \(y_k \in N_{M_j}(x_k)\) converge to a vector \(y\), then \(y \in N_{M_i}(x)\).

Observe that property 1 of Definition 2.13 gives us topological information on how the strata fit together, while property 2 gives us control over how sharply the strata fit together. Property 1 is called the frontier condition and property 2 is called Whitney condition (a). We should note that Whitney stratification, as defined above, is normally referred to as \(\mathbf {C}^1\)-Whitney stratification. Furthermore, Whitney condition (a) is usually stated somewhat differently. The equivalence is noted in [21]. One simple example of this type of a stratification to keep in mind throughout the discussion is the partition of a polytope into its open faces.

Definition 2.14

Given finite collections \(\{B_i\}\) and \(\{C_j\}\) of subsets of \(\mathbf{R}^n\), we say that \(\{B_i\}\) is compatible with \(\{C_j\}\) if for all \(B_i\) and \(C_j\), either \(B_i\cap C_j=\emptyset \) or \(B_i\subset C_j\).

As discussed above, the following theorem is true.

Theorem 2.15

([38, Theorem 4.8]) Let \(Q, C_1,\ldots ,C_l\) be semi-algebraic sets in \(\mathbf{R}^n\). Then \(Q\) admits a Whitney stratification that is compatible with \(C_1,\ldots ,C_l\).

The notion of a stratification being compatible with some predefined sets might not look natural; in fact, it is crucial since this property enables us to construct refinements of stratifications.

We will have occasion to use the following result.

Theorem 2.16

([38, Theorem 4.8]) Consider a semi-algebraic set \(S\) in \(\mathbf{R}^n\) and a semi-algebraic map \(f:S\rightarrow \mathbf{R}^m\). Let \(\mathcal A \) be a finite collection of semi-algebraic subsets of \(S\) and \(\mathcal B \) a finite collection of semi-algebraic subsets of \(\mathbf{R}^m\). Then there exists a Whitney stratification \(\mathcal A ^{\prime }\) of \(S\) that is compatible with \(\mathcal A \) and a Whitney stratification \(\mathcal B ^{\prime }\) of \(\mathbf{R}^m\) compatible with \(\mathcal B \) such that for every stratum \(Q\in \mathcal A ^{\prime }\), we have that the restriction \(f|_Q\) is smooth and \(f(Q)\in \mathcal B ^{\prime }\).

In particular, it follows that semi-algebraic maps are “generically” (in a sense about to be made clear) smooth.

Definition 2.17

Let \(A\subset \mathbf{R}^n\) be a nonempty semi-algebraic set. Then we define the dimension of \(A, \, \dim A\), to be the maximal dimension of a stratum in any Whitney stratification of \(A\). We adopt the convention that \(\dim \emptyset =-\infty \).

It can be easily shown that the dimension does not depend on the particular stratification. See [37, Chapter 4] for more details.

Theorem 2.18

Let \(A\) and \(B\) be nonempty semi-algebraic sets in \(\mathbf{R}^n\). Then the following hold.

  1. (1)

    If \(A\subset B\), then \(\dim A\le \dim B\).

  2. (2)

    \(\dim A=\dim \text{ cl}\,{A}\).

  3. (3)

    \(\dim (\text{ cl}\,{A}\setminus A)< \dim A\).

  4. (4)

    If \(f:A\rightarrow \mathbf{R}^n\) is a semi-algebraic mapping, then \(\dim f(A)\le \dim A\). If \(f\) is one-to-one, then \(\dim f(A)=\dim A\). In particular, semi-algebraic homeomorphisms preserve dimension.

  5. (5)

    \(\dim A\cup B= \max \{\dim A, \dim B\}\).

  6. (6)

    \(\dim A\times B=\dim A+\dim B\).

We will need the following simple proposition.

Proposition 2.19

Consider a Whitney stratification \(\{M_i\}\) of a semi-algebraic set \(Q\subset \mathbf{R}^n\). Let \(M_j\) be a stratum of maximal dimension. Then for any point \(\bar{x}\in M_j\), there exists a neighbourhood \(B\subset \mathbf{R}^n\) around \(\bar{x}\) so that

$$\begin{aligned} B\cap Q= B\cap M_j. \end{aligned}$$

Proof

Assume otherwise. Then there is a sequence \(x_r\in Q\) converging to \(\bar{x}\) with \(x_r\notin M_j\). Since there are finitely many strata, we can assume that the whole sequence is contained in some stratum \(M\). It follows that \(\bar{x}\) lies in the closure of \(M\). By the frontier condition of the Whitney stratification, it must be that \(\dim M_j< \dim M\), which is a contradiction since the stratum \(M_j\) was chosen to have maximal dimension. \(\square \)

A set \(U\subset \mathbf{R}^n\) is said to be “generic”, if it is large in some precise mathematical sense, depending on context. Two popular choices are that of \(U\) being a full-measure set, meaning its complement has Lebesgue measure zero, and that of \(U\) being topologically generic, meaning it contains a countable intersection of dense open sets. In general, these notions are very different. However for semi-algebraic sets, the situation simplifies drastically. Indeed, if \(U\subset \mathbf{R}^n\) is a semi-algebraic set, then the following are equivalent.

  • \(U\) is full-measure.

  • \(U\) is topologically generic.

  • The dimension of \(U^{c}\) is strictly smaller than \(n\).

We will say that a certain property holds for a generic vector \(v\in \mathbf{R}^n\) if the set of vectors for which this property holds is generic in the sense just described. Generic properties of semi-algebraic optimization problems will be discussed in Sect. 4.

Definition 2.20

Let \(A\subset \mathbf{R}^m\) be a semi-algebraic set. A continuous semi-algebraic mapping \(p:A\rightarrow \mathbf{R}^n\) is semi-algebraically trivial over a semi-algebraic set \(C\subset \mathbf{R}^n\) if there is a semi-algebraic set \(F\) and a semi-algebraic homeomorphism \(h:p^{-1}(C)\rightarrow C\times F\) such that \(p|_{p^{-1}(C)}=\mathrm{proj}\circ h\), or in other words the following diagram commutes:

figure a1

We call \(h\) a semi-algebraic trivialization of \(p\) over \(C\).

Henceforth, we use the symbol \(\cong \) to indicate that two semi-algebraic sets are semi-algebraically homeomorphic.

Remark 2.21

If \(p\) is trivial over some semi-algebraic set \(C\), then we can decompose \(p|_{p^{-1}(C)}\) into a homeomorphism followed by a simple projection. Also, since the homeomorphism \(h\) in the definition is surjective and \(p|_{p^{-1}(C)}=\mathrm{proj}\circ h\), it follows that \(h(p^{-1}(c))= \{c\}\times F\) for any \(c\in C\). Thus for any point \(c\in C\), we have \(p^{-1}(c)\cong F\) and \(p^{-1}(C)\cong C\times p^{-1}(c)\).

The following is a simple example of semi-algebraic triviality.

Example 2.22

We follow the notation of Definition 2.20. Consider the semi-algebraic function \(p:\mathbf{R}\rightarrow \mathbf{R}\) defined by \(p(x)=x^2\). Now consider the semi-algebraic mapping

$$\begin{aligned} h:\mathbf{R}\!\setminus \!\!\{0\}\rightarrow \mathbf{R}_{++}\times \{\pm 1\}, \qquad x\mapsto (x^2,\text{ sgn}\, x). \end{aligned}$$

It is easy to check that \(h\) is a semi-algebraic homeomorphism, and furthermore we have \(p=\mathrm{proj}\circ h\) when restricted to \(\mathbf{R}\setminus \{0\}\). Thus \(h\) is a semi-algebraic trivialization of \(p\) over \(\mathbf{R}_{++}\).

Definition 2.23

In the notation of Definition 2.20, a trivialization \(h\) is compatible with a semi-algebraic set \(B\subset A\) if there is a semi-algebraic set \(H\subset F\) such that \(h(B\cap p^{-1}(C))= C\times H\).

If \(h\) is a trivialization over \(C\) then, certainly, for any set \(B\subset A\) we know \(h\) restricts to a homeomorphism from \(B\cap p^{-1}(C)\) to \(h(B\cap p^{-1}(C))\). The content of the definition above is that if \(p\) is compatible with \(B\), then \(h\) restricts to a homeomorphism between \(B\cap p^{-1}(C)\) and \(C\times H\) for some semi-algebraic set \(H\subset F\). Here is a simple example.

Example 2.24

Let the semi-algebraic functions \(p\) and \(h\) be as defined in Example 2.22. Now notice that \(h(\mathbf{R}_{++}\cap p^{-1}(\mathbf{R}_{++}))= \mathbf{R}_{++}\times \{+1\}\). Thus \(h\) is compatible with \(\mathbf{R}_{++}\).

The following result will be used extensively in the rest of this work. See [37, Chapter 9, Theorem 1.2] for more details.

Theorem 2.25

(Hardt triviality) Let \(A\subset \mathbf{R}^n\) be a semi-algebraic set and \(p:A\rightarrow \mathbf{R}^m\), a continuous semi-algebraic mapping. There is a finite partition of the image \(p(A)\) into semi-algebraic sets \(C_1,\ldots , C_k\) such that \(p\) is semi-algebraically trivial over each \(C_i\). Moreover, if \(Q_1,\ldots ,Q_l\) are semi-algebraic subsets of \(A\), we can require each trivialization \(h_i:p^{-1}(C_i)\rightarrow C_i\times F_i\) to be compatible with all \(Q_j\).

Example 2.26

Consider the following elaboration on Example 2.22. Let the semi-algebraic functions \(p\) and \(h\) be defined as in Example 2.22. We saw that \(h\) is a semi-algebraic trivialization of \(p\) over \(\mathbf{R}_{++}\). Let \(f:\{0\}\rightarrow \{0\}\times \{0\}\) be the zero map. Observe \(f\) is a semi-algebraic trivialization of \(p\) over \(\{0\}\). Thus \(\{\mathbf{R}_{++}, \{0\}\}\) is a partition of \(p(\mathbf{R})\) guaranteed to exist by Theorem 2.25.

Given a continuous semi-algebraic function \(p\), Theorem 2.25 states that we can partition the image of \(p\) into semi-algebraic sets \(C_1,\ldots ,C_k\), so that for each index \(i=1,\ldots ,k\), the restricted mapping \(p|_{p^{-1}(C_i)}\) has a very simple form. By applying Theorem 2.25 to various naturally occurring mappings, many interesting results can be obtained. See [37, Chapter 9] for more details. In particular, by applying this theorem to the projection map we can break up semi-algebraic sets into simple building blocks that have product structure and analyze each one separately. This type of reasoning leads to the following corollary.

Corollary 2.27

Let \(F :\mathbf{R}^n \rightrightarrows \mathbf{R}^m\) be a semi-algebraic set-valued mapping. Then there exists a partition of the domain of \(F\) into semi-algebraic sets \(X_1,X_2,\ldots , X_k\) with the following properties:

  1. (1)

    For each index \(i=1,2,\ldots k\), there exists a semi-algebraic set \(Y_i \subset \mathbf{R}^m\) and a semi-algebraic homeomorphism \(\theta _i :\text{ gph}\,F|_{X_i} \rightarrow X_i \times Y_i\) satisfying

    $$\begin{aligned} \theta _i(\{x\} \times F(x)) = \{x\} \times Y_i~~ \text{ for} \text{ all}~ x \in X_i. \end{aligned}$$

    Consequently, for all \(x \in X_i\), we have \(F(x) \cong Y_i\) and

    $$\begin{aligned} \text{ gph}\,F|_{X_i}\cong X_i\times F(x). \end{aligned}$$
  2. (2)

    If in addition, \(\widetilde{F} :\mathbf{R}^n \rightrightarrows \mathbf{R}^m\) is another semi-algebraic set-valued mapping with \(\widetilde{F}(x)\subset F(x)\), then we may also require that for each index \(i=1,2,\ldots ,k\), there exists a semi-algebraic set \(\widetilde{Y}_i \subset Y_i\), such that \(\theta _i(\text{ gph}\,\widetilde{F}|_{X_i})=X_i \times \widetilde{Y}_i\). Consequently, for all \(x \in X_i\), we have \(\widetilde{F}(x) \cong \widetilde{Y}_i\) and

    $$\begin{aligned} \text{ gph}\,\widetilde{F}|_{X_i}\cong X_i\times \widetilde{F}(x). \end{aligned}$$

Proof

Assume that we are given semi-algebraic set-valued maps \(F\) and \(\widetilde{F}\) such that \(\widetilde{F}(x)\subset F(x)\) for all \(x\in \mathbf{R}^n\). If \(\widetilde{F}\) was not given, proceed with the proof with \(\widetilde{F}(x)=\emptyset \) for all \(x\in \mathbf{R}^n\). Consider \(\text{ gph}\,F\subset \mathbf{R}^n\times \mathbf{R}^m\). Let \(p:\text{ gph}\,F\rightarrow \mathbf{R}^n\) be the projection onto the first \(n\) coordinates. By applying Theorem 2.25 to \(p\), we get a partition of the domain of \(F\) into semi-algebraic sets \(X_1,X_2,\ldots , X_k\) such that \(p\) is semi-algebraically trivial over each \(X_i\) and each trivialization is compatible with \(\text{ gph}\,\widetilde{F}\). Thus there exist semi-algebraic sets \(Y_1,Y_2,\ldots ,Y_k \subset \mathbf{R}^m\) and \(\widetilde{Y_1},\widetilde{Y_2},\ldots ,\widetilde{Y_k} \subset \mathbf{R}^m\) with \(\widetilde{Y_i}\subset Y_i\), such that for each \(i\), there is a semi-algebraic homeomorphism \(\theta _i:p^{-1}(X_i)\rightarrow X_i\times Y_i\), where we have

$$\begin{aligned} \theta _i(\text{ gph}\,\widetilde{F}\cap p^{-1}(X_i))&= X_i\times \widetilde{Y_i},\nonumber \\ \mathrm{proj}_{X_i}\circ \theta _i&= p|_{p^{-1}(X_i)}. \end{aligned}$$
(2)

Observe that \(p^{-1}(X_i)=\text{ gph}\,F|_{X_i}\) and since \(\text{ gph}\, \widetilde{F}\) is contained in \(\text{ gph}\, F\), it follows that \(\text{ gph}\,\widetilde{F}\cap p^{-1}(X_i)=\text{ gph}\,\widetilde{F}|_{X_i}\). Thus to summarize, we have

$$\begin{aligned} \text{ gph}\,F|_{X_i}&\cong X_i\times Y_i,\\ \nonumber \text{ gph}\,\widetilde{F}|_{X_i}&\cong X_i\times \widetilde{Y_i}. \end{aligned}$$
(3)

Finally, from (2) and (3), it follows that for all points \(x \in X_i\), we have

$$\begin{aligned} \theta _i(\{x\}\times F(x))=\{x\}\times Y_i, \end{aligned}$$

completing the proof. \(\square \)

The following proposition appears in [2, 3]; as observed there, this result is an easy and important consequence of Theorem 2.25, and even though we will not have occasion to use it in this work, we include it and its proof below as an elegant illustration.

Proposition 2.28

Let \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) be a semi-algebraic set-valued mapping. Then there exists a finite partition of the domain of \(F\) into semi-algebraic sets \(X_1,\ldots ,X_k\), such that for each index \(i=1,\ldots ,k\), the restricted mapping \(F|_{X_i}\) is inner semicontinuous. If in addition, the mapping \(F\) is compact-valued, then we can also require the restricted mapping \(F|_{X_i}\) to be outer semicontinuous for each index \(i=1,\ldots ,k\). (In fact, the partition guaranteed to exist by Corollary 2.27 is one such partition.)

Proof

Applying Corollary 2.27 to the mapping \(F\), we get a finite partition of the domain of \(F\) into semi-algebraic sets \(X_1,\ldots ,X_k\), so that, in particular, property 1 of the corollary holds. To see the inner semicontinuity of the restricted map \(F|_{X_i}\), consider any point \((\bar{x}, \bar{y}) \in \text{ gph}\,F|_{X_i}\), and any sequence of points \(x_r \rightarrow \bar{x}\) in the set \(X_i\). We want to construct a sequence of points \(y_r \in F(x_r)\) converging to \(\bar{y}\). Notice that \(\theta _i(\bar{x},\bar{y}) = (\bar{x}, \hat{y})\) for some point \(\hat{y} \in Y_i\). Since \((x_r,\hat{y}) \rightarrow (\bar{x},\hat{y})\), we deduce \(\theta _i^{-1}(x_r,\hat{y}) \rightarrow \theta _i^{-1}(\bar{x},\hat{y}) = (\bar{x},\bar{y})\). But for each index \(r\), we know \(\theta _i^{-1}(x_r,\hat{y}) = (x_r,y_r)\) for some point \(y_r \in F(x_r)\), so the result follows.

Assume now that \(F\) is compact-valued. Consider any point \(\bar{x}\in X_i\) and any sequence of points \((x_r,y_r)\rightarrow (\bar{x},\bar{y})\), where \(\bar{y}\) is some point in \(\mathbf{R}^m\) and \(y_r\in F(x_r)\) for each \(r\). We want to argue that \(\bar{y}\) is in \(F(\bar{x})\). Consider the sequence \((\bar{x},\mathrm{proj}_{Y_i}(\theta _i(x_r,y_r)))\). Observe that this sequence is contained in \(\{\bar{x}\}\times Y_i\), which is a compact set since it is homeomorphic to \(F(\bar{x})\). Thus, without loss of generality, we can assume that \((\bar{x},\mathrm{proj}_{Y_i}(\theta _i(x_r,y_r)))\) converges to \((\bar{x}, \hat{y})\) for some point \(\hat{y}\in Y_i\). So we have

$$\begin{aligned} (x_r,y_r)=\theta _i^{-1}(x_r,\mathrm{proj}_{Y_i}(\theta _i(x_r,y_r)))\rightarrow \theta _i^{-1}(\bar{x}, \hat{y})\in \{\bar{x}\}\times F(\bar{x}). \end{aligned}$$

By the uniqueness of the limit, we must have \(\bar{y}\in F(\bar{x})\). \(\square \)

As a consequence of Proposition 2.28, it follows that any semi-algebraic set-valued mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\) is generically inner semicontinuous. If, in addition, \(F\) is compact-valued, then \(F\) is generically continuous. In fact, we can do better. If we require the mapping \(F\) just to be closed-valued, then we can still partition its domain into semi-algebraic sets \(X_1,\ldots ,X_k\), such that for each index \(i=1,\ldots ,k\), the restricted mapping \(F|_{X_i}\) is continuous. To see this, we need the following theorem that appears in [34, Theorem 5.55], and is attributed to [13, 24, 35]. Recall that given a topological space \(X\), a subset \(A\) of \(X\) is meager if it is a union of countably many nowhere dense subsets of X.

Theorem 2.29

(Kuratowski) Consider a set \(X\subset \mathbf{R}^n\) and a closed-valued set-valued mapping \(F:X\rightrightarrows \mathbf{R}^m\). Assume that \(F\) is either outer semicontinuous or inner semicontinuous relative to \(X\). Then the set of points where \(F\) fails to be continuous relative to \(X\) is meager in \(X\).

It is easy to see that if a semi-algebraic set \(S\) is meager in another semi-algebraic set \(X\), then the dimension of \(S\) is strictly less than the dimension of \(X\) (see [4] for more details).

Proposition 2.30

Let \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) be a semi-algebraic closed-valued set-valued mapping. Then there exists a finite partition of the domain of \(F\) into semi-algebraic sets \(X_1,\ldots ,X_k\), such that for each index \(i=1,\ldots ,k\), the restricted mapping \(F|_{X_i}\) is continuous.

Proof

Applying Proposition 2.28 to the mapping \(F\), we get a partition of the domain of \(F\) into semi-algebraic sets \(X_1,\ldots ,X_k\), so that the restricted map \(F|_{X_i}\) is inner semicontinuous. Fix some set \(X_i\). Let \(S^0:=X_i\) and let \(S^1\subset X_i\) be the set of points at which \(F|_{S^0}\) fails to be continuous. By Theorem 2.29, it follows that \(\dim S^1<\dim S^0\). Now by applying this argument inductively, we can create a sequence of semi-algebraic sets \(S^0\supset \cdots \supset S^k\), for some integer \(k\), such that the collection \(\{S_j\setminus S_{j+1}\}^{k-1}_{j=0}\) is a partition of \(X_i\) and \(F\) is continuous when restricted to each \(S_j\setminus S_{j+1}\). By applying this argument to all the sets \(X_i\), for \(i=1,\ldots ,k\), we get the result. \(\square \)

Remark 2.31

In fact, it is shown in Daniilidis–Pang [18] that closed-valued semi-algebraic maps are generically strictly continuous (see [34] for the definition). Their proof of this rather stronger result requires more sophisticated tools.

Finally, we have the following result:

Theorem 2.32

([38, Theorem 4.4]) Let \(A\) be a semi-algebraic subset of \(\mathbf{R}^n\times \mathbf{R}^m\). There is an integer \(\beta \) such that for every point \(x\in \mathbf{R}^n\), the number of connected components of the set \(A_x=\{y\in \mathbf{R}^m:(x,y)\in A\}\) is no greater than \(\beta \).

The following is a simple special case of Theorem 2.32. We record it here for convenience.

Remark 2.33

Let \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) be a semi-algebraic mapping. Applying Theorem 2.32 to \(\text{ gph}\,F\subset \mathbf{R}^n\times \mathbf{R}^n\), we deduce that there is an integer \(\beta \) such that for every \(x\in \mathbf{R}^n\), the number of connected components of \(F(x)\) is no greater than \(\beta \).

3 Main results

Definition 3.1

Consider a Whitney stratification \(\mathcal A \) of a semi-algebraic set \(Q\subset \mathbf{R}^n\). We define the normal bundle \(N_\mathcal{A }\) associated with the stratification \(\mathcal A \) to be the union of the normal bundles of each stratum, that is

$$\begin{aligned} N_\mathcal{ A }=\bigcup _{M\in \mathcal A } \text{ gph}\,N_M =\bigcup _{M\in \mathcal A } \{(x,y)\in \mathbf{R}^n\times \mathbf{R}^n: x\in M,\, y\in N_M(x)\}. \end{aligned}$$

In the definition above, since there are finitely many strata and for each stratum \(M\in \mathcal A \), the semi-algebraic set \(\text{ gph}\,N_M\) is \(n\)-dimensional, we deduce that the normal bundle \(N_\mathcal{A }\) is a semi-algebraic set of dimension \(n\).

Proposition 3.2

Consider a semi-algebraic set \(Q \subset \mathbf{R}^n\) and suppose it admits a Whitney stratification \(\mathcal A =\{M_i\}\). Then for any stratum \(M_i\) and any point \(\bar{x}\in M_i\), the Clarke normal cone, \(N^c_Q(\bar{x})\), is contained in the normal space, \(N_{M_i}(\bar{x})\). Consequently, the inclusion \(\text{ gph}\,N_Q^c\subset N_\mathcal{A }\) holds and so the graph of the Clarke normal cone has dimension no greater than \(n\).

Proof

Observe that for any stratum \(M_j\), we have the inclusion \(M_j\subset Q\). Hence for any point \(x\in M_j\), the inclusion

$$\begin{aligned} \hat{N}_Q(x)\subset \hat{N}_{M_j}(x)=N_{M_j}(x) \end{aligned}$$
(4)

holds. Now fix some stratum \(M_i\) and a point \(\bar{x}\in M_i\). We claim that the limiting normal cone \(N_Q(\bar{x})\) is contained in \(N_{M_i}(\bar{x})\). To see this, consider a vector \(v\in N_Q(\bar{x})\). By definition of the limiting normal cone, there exist sequences \((x_r)\) and \((v_r)\) such that \(x_r\stackrel{Q}{\rightarrow } \bar{x}\) and \(v_r\rightarrow v\) with \(v_r\in \hat{N}_Q(x_r)\). Since there are finitely many strata, we can assume that there is some stratum \(M_j\) such that the entire sequence \((x_r)\) is contained in \(M_j\). From (4), we deduce \(\hat{N}_Q(x_r)\subset N_{M_j}(x_r)\), and hence \(v_r\in N_{M_j}(x_r)\). Therefore by Whitney condition (a), we have \(v\in N_{M_i}(\bar{x})\). Since \(v\) was arbitrarily chosen from \(N_Q(\bar{x})\), we deduce \(N_Q(\bar{x})\subset N_{M_i}(\bar{x})\) and thus \(N_Q^{c}(\bar{x})=\mathrm cl conv N_Q(\bar{x})\subset N_{M_i}(\bar{x})\), as we needed to show. \(\square \)

Shortly, we will generalize this result to the graph of the Clarke subdifferential. We need the following simple result. We provide a proof for completeness.

Proposition 3.3

Let \(A\subset \mathbf{R}^n\) be a semi-algebraic set and \(p:A\rightarrow \mathbf{R}^m\), a continuous semi-algebraic mapping. Let \(D\) be the image set of the mapping \(p\). Then we have the inequality,

$$\begin{aligned} \dim D+ \min _{x\in D}\dim p^{-1}(x) \le \dim A\le \dim D+\max _{x\in D}\dim p^{-1}(x). \end{aligned}$$

In particular, if there exists an integer \(k\) such that the set \(p^{-1}(x)\) is \(k\)-dimensional for every point \(x\in D\), then the equality,

$$\begin{aligned} \dim A= \dim D +k, \end{aligned}$$

holds.

Proof

Applying Theorem 2.25 to the mapping \(p\), we obtain a finite partition of the set \(D\) into semi-algebraic sets \(\{C_i\}\) such that \(p^{-1}(C_i)\cong C_i\times p^{-1}(c)\), for any \(c\in C_i\). Let \(C_i\) be a partitioning set satisfying \(\dim A = \dim p^{-1}(C_i)\), and let \(c\) be any point in \(C_i\). Then we have

$$\begin{aligned} \dim A = \dim p^{-1}(C_i)=\dim C_i+\dim p^{-1}(c) \le \dim D+\max _{x\in D}\dim p^{-1}(x). \end{aligned}$$

Let \(C_j\) be a partitioning set satisfying \(\dim D = \dim C_j\), and let \(c\) be any point in \(C_j\). Then we obtain

$$\begin{aligned} \dim A \ge \dim p^{-1}(C_j)=\dim C_j+\dim p^{-1}(c) \ge \dim D+\min _{x\in D}\dim p^{-1}(x), \end{aligned}$$

as we needed to show. \(\square \)

We record the following simple and intuitive corollary for reference.

Corollary 3.4

Let \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) be a semi-algebraic set-valued mapping and let \(D:=\text{ dom}\,F\). Then we have the inequality,

$$\begin{aligned} \dim D+\min _{x\in D}\dim F(x)\le \dim \text{ gph}\,F\le \dim D+\max _{x\in D}\dim F(x). \end{aligned}$$

In particular, if there exists an integer \(k\) such that the set \(F(x)\) is \(k\)-dimensional for every point \(x\in D\), then the equality,

$$\begin{aligned} \dim \text{ gph}\,F= \dim D +k, \end{aligned}$$

holds.

Proof

This is an easy consequence of Corollary 2.27. \(\square \)

Theorem 3.5

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a semi-algebraic function. Then the graph of the Clarke subdifferential, \(\text{ gph}\, \partial _c f\), has dimension no greater than \(n\).

Proof

Let \(F:=\text{ epi}\,f\) and

$$\begin{aligned} A:= \{(x,r,y)\in \mathbf{R}^{n}\times {\mathbf{R}}\times \mathbf{R}^{n+1}:((x,r),y)\in \text{ gph}\,N_F^c,\, r=f(x),\, y_{n+1}< 0\}. \end{aligned}$$

Using Proposition 3.2, we see

$$\begin{aligned} \dim A\le \dim \text{ gph}\,N_F^c\le n+1. \end{aligned}$$
(5)

Consider the continuous semi-algebraic map

$$\begin{aligned} \phi :A\rightarrow \mathbf{R}^n\times \mathbf{R}^n \end{aligned}$$
$$\begin{aligned} (x,f(x),y)\mapsto \left(x,\pi \Big (\frac{y}{|y_{n+1}|}\Big )\right), \end{aligned}$$

where \(\pi :\mathbf{R}^{n+1}\rightarrow \mathbf{R}^n\) is the canonical projection onto the first \(n\) coordinates. Observe that the image of \(\phi \) is exactly the graph of the Clarke subdifferential \(\partial _c f\). Furthermore, for any pair \((x,v)\in \text{ gph}\,\partial _c f\), we have

$$\begin{aligned} \phi ^{-1}(x,v)=\{x\}\times \{f(x)\}\times \mathbf{R}_{+}(v,-1), \end{aligned}$$

and hence \(\dim \phi ^{-1}(c)=1\) for any point \(c\) in the image of \(\phi \). By Proposition 3.3, we deduce

$$\begin{aligned} \dim \text{ gph}\,\partial _c f+1=\dim A\le n+1, \end{aligned}$$

where the last inequality follows from (5). Hence, we obtain \(\dim \text{ gph}\,\partial _c f\le n\), as we needed to show. \(\square \)

Shortly we will show that for a proper semi-algebraic function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\), both \(\text{ gph}\, \partial _c f\) and \(\text{ gph}\, \partial f\) have dimension exactly equal to \(n\). In the case that the domain of \(f\) is full-dimensional, this fact is easy to show. The argument is as follows. By Theorem 2.16, the domain of \(f\) can be partitioned into semi-algebraic manifolds \(\{X_i\}\) such that \(f|_{X_i}\) is smooth. Let \(X_i\) be the manifold of maximal dimension. Observe that for \(x\in X_i\), we have \(\partial f(x)=\{\nabla f(x)\}\) and it easily follows that \(\dim \text{ gph}\, \partial f|_{X_i}=n\). Thus we have

$$\begin{aligned} n\le \dim \text{ gph}\, \partial f\le \dim \text{ gph}\, \partial _c f\le n, \end{aligned}$$

where the last inequality follows from Theorem 3.5, and hence there is equality throughout. The argument just presented no longer works when the domain of \(f\) is not full-dimensional. A slightly more involved argument is required. We record the following simple observation for reference.

Proposition 3.6

Consider a smooth manifold \(M\subset \mathbf{R}^n\) and a smooth real-valued function \(f:M\rightarrow \mathbf{R}\). Define a function \(h:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) agreeing with \(f\) on \(M\) and equaling plus infinity elsewhere. Then \(h\) is subdifferentially regular throughout \(M\). Furthermore, at any point \(\bar{x}\in M\), we have

$$\begin{aligned} \partial h(\bar{x})=N_M(\bar{x})+\nabla g(\bar{x}), \end{aligned}$$

where \(g:\mathbf{R}^n\rightarrow \mathbf{R}\) is any smooth function agreeing with \(f\) on \(M\) on a neighbourhood of \(\bar{x}\). Consequently, \(\partial h(\bar{x})\) is nonempty with dimension \(n-\dim M\).

Proof

Observe that near the point \(\bar{x}\), we have \(h=\delta _M+g\). Combining Theorem 2.8 and Theorem 2.12, we have that the function \(\delta _M\) is subdifferentially regular at \(\bar{x}\). By Theorem 2.11, it follows that \(h\) is subdifferentially regular at \(\bar{x}\) and

$$\begin{aligned} \partial h(\bar{x})=\partial \delta _M(\bar{x})+\nabla g(\bar{x})=N_M(\bar{x})+\nabla g(\bar{x}), \end{aligned}$$

as we needed to show. \(\square \)

Theorem 3.7

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a proper semi-algebraic function. Then the graphs of the regular, limiting, and Clarke subdifferentials have dimension exactly \(n\).

Proof

We know

$$\begin{aligned} \dim \text{ gph}\, \hat{\partial } f\le \dim \text{ gph}\, \partial f\le \dim \text{ gph}\, \partial _c f\le n, \end{aligned}$$

where the last inequality follows from Theorem 3.5. Thus if we show that the dimension of \(\text{ gph}\, \hat{\partial } f\) is no less than \(n\), we will be done. With that aim, applying Theorem 2.16 to the function \(f\), we obtain a Whitney stratification \(\{M_i\}\) of the domain of \(f\) such that for every stratum \(M_i\), the restriction \(f|_{M_i}\) is smooth. Let \(M_j\) be a stratum of \(\text{ dom}\, f\) of maximal dimension.

Now consider the function \(h:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\), which agrees with \(f\) on \(M_j\) and is plus infinity elsewhere. By Proposition 2.19, the functions \(h\) and \(f\) coincide on a neighbourhood of \(\bar{x}\). Applying Proposition 3.6, we deduce that \(f\) is subdifferentially regular at \(\bar{x}\) and \(\partial f(\bar{x})\) is nonempty with dimension \(n-\dim M_j\). Since the point \(\bar{x}\) was arbitrarily chosen from \(M_j\), we deduce \(\dim \hat{\partial } f(x)=n-\dim M_j\) for any point \(x\in M_j\). Thus applying Corollary 3.4 to the semi-algebraic set-valued map \(\hat{\partial } f|_{M_j}\), we deduce \(\dim \text{ gph}\,\hat{\partial } f|_{M_j} = \dim M_j+n-\dim M_j=n\), and hence the result follows. \(\square \)

More refined, local versions of Theorem 3.7 are investigated in [19].

Theorem 3.5 shows that for a semi-algebraic function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\), the Clarke subdifferential \(\partial _c f\) is small in a dimensional sense. If \(f\) is also Lipschitz, it is small in another sense, that we now discuss: we relate our results to the notion of a minimal cusco ( Convex Upper Semicontinuous nonempty compact valued set-valued mapping), introduced in [8]. To that effect, consider a set \(A\subset \mathbf{R}^n\) and a set-valued mapping \(F:A\rightrightarrows \mathbf{R}^m\). We say that \(F\) is upper semicontinuous at some point \(\bar{x}\in A\) if every open set \(U\) containing \(F(\bar{x})\) also contains \(F(z)\) for all points \(z\in A\) close to \(\bar{x}\). If a map is closed-valued and upper semicontinuous, then it is outer semicontinuous. On the other hand, if a map is outer semi-continuous and locally bounded, then it is upper semicontinuous. See [11, Section 5.1.4] for more details. In particular, for a Lipschitz function \(f\), the Clarke subdifferential \(\partial _c f\) is upper semi-continuous [14, Proposition 2.1.5]. The mapping \(F:A\rightrightarrows \mathbf{R}^m\) is said to be a cusco if it is upper-semicontinuous on \(A\) and \(F(x)\) is a nonempty compact convex set for each point \(x\in A\). A minimal cusco is a cusco, whose graph does not strictly contain the graph of any other cusco. Let \(U\subset \mathbf{R}^n\) be an open set and consider a semi-algebraic locally Lipschitz function \(f:U\rightarrow \mathbf{R}\). It follows by a direct application of [8, Corollary 2.2] and generic smoothness of \(f\) that the the set-valued mapping \(\partial _c f\) is, in fact, a minimal cusco.

It is tempting to think that in the semi-algebraic setting, the graph of an arbitrary minimal cusco should have small dimension. However, it is not hard to see that this is not the case. For instance, we will now exhibit a semi-algebraic minimal cusco \(F:\mathbf{R}^3\rightrightarrows \mathbf{R}^3\), whose graph is \(4\)-dimensional. Thus semi-algebraic minimal cuscos with low dimensional graphs, such as the Clarke subdifferential of a semi-algebraic locally Lipschitz function \(f\) defined on an open set, are somewhat special.

To simplify notation, we let \([y<0,z<0]\) be an alias for the set \(\{(x,y,z)\in \mathbf{R}^3:y<0,z<0\}\) and we reserve analogous notation for relaters ‘\(>\)’ and ‘\(=\)’. Consider the semi-algebraic set-valued mapping \(F:\mathbf{R}^3\rightrightarrows \mathbf{R}^3\), defined as follows

$$\begin{aligned}&F|_{[y>0, z>0]}=\{(0,0,0)\}, ~F|_{[y<0, z>0]}=\{(0,0,1)\},\\&F|_{[y<0,z<0]}=\{(0,1,0)\}, ~F|_{[y>0, z<0]}=\{(1,0,0)\},\\&F|_{[y>0, z=0]}=\text{ conv}\,\{(0,0,0),(1,0,0)\}, ~F|_{[y=0, z>0]}=\text{ conv}\,\{(0,0,1),(0,0,0)\},\\&F|_{[y<0,z=0]}=\text{ conv}\,\{(0,1,0),(0,0,1)\},~F|_{[y=0, z<0]}=\text{ conv}\,\{(1,0,0),(0,1,0)\},\\&F|_{[y=0,z=0]}= \text{ conv}\,\{(0,0,0),(0,0,1),(0,1,0),(1,0,0)\}. \end{aligned}$$

It is easy to verify that \(F\) is indeed a minimal cusco with a \(4\)-dimensional graph. In particular, Theorem 3.7 implies that \(F\) is not the Clarke subdifferential mapping \(\partial _c f\) for any semi-algebraic function \(f:\mathbf{R}^3\rightarrow \overline{\mathbf{R}}\).

4 Consequences

Definition 4.1

Consider a function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\). We say that a point \(x\in \mathbf{R}^n\) is Clarke-critical for the function \(f\) if \(0\in \partial _{c}f(x)\), and we call such a critical point \(x\) nondegenerate if the stronger property \(0 \in \text{ ri}\, \partial _c f(x)\) holds.

Recall that for a proper convex function \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) and a point \(\bar{x}\in \text{ dom}\, f\), the subdifferentials \(\hat{\partial }f(\bar{x}), \, \partial f(\bar{x})\), and \(\partial _c f(\bar{x})\) all coincide and are equal to the convex subdifferential of \(f\) at \(\bar{x}\). So in this case, the notions of Clarke-criticality and Clarke-nondegeneracy reduce to more familiar notions from Convex Analysis. The importance of nondegeneracy for the sensitivity analysis of convex functions is well known: in [26], for example, it is an underlying assumption for a pioneering conceptual approach to superlinearly convergent convex minimization algorithms. Consider the following largely classical theorem (see [4, Proposition 1] and [20]).

Theorem 4.2

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a proper convex function. Consider the collection of perturbed functions \(h_v(x)=f(x)-\langle v,x\rangle \), parametrized by vectors \(v\in \mathbf{R}^n\). Then for a full measure set of vectors \(v\in \mathbf{R}^n\), the function \(h_v\) has at most one minimizer, which furthermore is nondegenerate.

Shortly, we will prove that a natural analogue of Theorem 4.2 holds for arbitrary semi-algebraic functions, with no assumption of convexity. We will then reference an example of a locally Lipschitz function that is not semi-algebraic, and for which the conclusion of our analogous result fails, thus showing that the assumption of semi-algebraicity is not superfluous. In what follows, for a set \(S\), the number of elements in \(S\) will be denoted by \(S^{\#}\). We begin with the following simple proposition.

Proposition 4.3

Let \(F:\mathbf{R}^n\rightrightarrows \mathbf{R}^m\) be a semi-algebraic set-valued mapping whose graph has dimension no greater than \(n\). Then there exists \(\beta \in \mathbb N \) such that for a generic set of points \(c\in \mathbf{R}^n\), we have \(F(c)^{\#}\le \beta \).

Proof

Let \(D = \text{ dom}\, F\). If the dimension of \(D\) is strictly less than \(n\), then we are done since then the complement of \(D\) is a set satisfying the claimed property with \(\beta =0\). Thus assume that \(D\) has dimension \(n\). Applying Corollary 2.27 to the mapping \(F\), we get a finite partition of \(D\) into semi-algebraic sets \(\{C_i\}\), such that

$$\begin{aligned} \text{ gph}\,F|_{C_i}\cong C_i\times F(c) \end{aligned}$$

for any \(c\in C_i\). Let \(C_i\) be a partitioning set of maximal dimension. So \(\dim C_i=n\) and we have

$$\begin{aligned} n\ge \dim \text{ gph}\,F|_{C_i}=n+\dim F(c) \end{aligned}$$

For any \(c\in C_i\). Thus \(\dim F(c)=0\) and since \(F(c)\) is a semi-algebraic set, it must be finite. Since this argument holds for any \(C_i\) of maximal dimension, we have that for a generic vector \(c\), the set \(F(c)\) is finite. Observe that if \(F(c)\) is a finite non-empty set, then \(F(c)^{\#}\) is equal to the number of connected components of \(F(c)\). By Remark 2.33, there exists \(\beta \in \mathbb N \) such that for all \(c\in \mathbf{R}^n\), the number of connected components of \(F(c)\) is no greater than \(\beta \). So in particular, for generic \(c\), we have \(F(c)^{\#}\le \beta \). \(\square \)

Corollary 4.4

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a semi-algebraic function and consider the collection of perturbed functions \(h_v(x)=f(x)-\langle v,x\rangle \), parametrized by vectors \(v\in \mathbf{R}^n\). Then there exists a positive integer \(\beta \), such that for generic \(v\in \mathbf{R}^n\), the number of Clarke-critical points of the perturbed function \(h_v\) is no greater than \(\beta \).

Proof

Observe

$$\begin{aligned} 0\in \partial _{c}h_v(x)\Leftrightarrow v\in \partial _{c}f(x)\Leftrightarrow x\in (\partial _{c}f)^{-1}(v). \end{aligned}$$

Thus the set \((\partial _{c}f)^{-1}(v)\) is equal to the set of Clarke-critical points of the function \(h_v\). By Theorem 3.5, we have \(\dim \text{ gph}\, \partial _c f\le n\), hence \(\dim \text{ gph}\, (\partial _c f)^{-1}\le n\). Applying Theorem 4.3 to \((\partial _c f)^{-1}\), we deduce that there exists a positive integer \(\beta \), such that for generic \(v\), we have \(((\partial _c f)^{-1}(v))^{\#}\le \beta \). The result follows. \(\square \)

Corollary 4.5

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a semi-algebraic function and consider the collection of perturbed functions \(h_v(x)=f(x)-\langle v, x\rangle \), parametrized by vectors \(v\in \mathbf{R}^n\). Then for generic \(v\in \mathbf{R}^n\), every Clarke-critical point of the function \(h_v\) is nondegenerate.

Corollary 4.5 follows immediately from the observation

$$\begin{aligned} 0 \in \text{ ri}\, \partial _c h_v(x)\Leftrightarrow v\in \text{ ri}\, \partial _c f(x), \end{aligned}$$

and the following result.

Corollary 4.6

Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a semi-algebraic function. Then for generic \(v\in \mathbf{R}^n\), we have that

$$\begin{aligned} x\in (\partial _c f)^{-1}(v) \Longrightarrow v\in \text{ ri}\, \partial _c f(x). \end{aligned}$$

Proof

Let \(D = \text{ dom}\, \partial _c f\). Consider the semi-algebraic set-valued mapping

$$\begin{aligned} \widetilde{F}:\mathbf{R}^n\rightrightarrows \mathbf{R}^n, ~~x\mapsto \mathrm rb \partial _c f(x). \end{aligned}$$

Our immediate goal is to show that the dimension of \(\text{ gph}\,\widetilde{F}\) is no greater than \(n-1\). Observe that for each \(x\in \mathbf{R}^n\), we have \(\widetilde{F}(x)\subset \partial _c f(x)\). Applying Corollary 2.27 to the mapping \(\partial _c f\), we get a finite partition of \(D\) into semi-algebraic sets \(\{X_i\}\), such that

$$\begin{aligned} \text{ gph}\,\partial _c f|_{X_i}\cong X_i\times \partial _c f(x) \end{aligned}$$

and

$$\begin{aligned} \text{ gph}\,\widetilde{F}|_{X_i}\cong X_i\times \widetilde{F}(x) \end{aligned}$$

for any \(x\in X_i\) (for each \(i\)). By Theorem 3.5, we have that

$$\begin{aligned} n\ge \dim \text{ gph}\,\partial _c f|_{X_i} = \dim X_i+ \dim \partial _c f(x). \end{aligned}$$

Since \(\widetilde{F}(x)=\mathrm rb \partial _c f(x)\), it follows that

$$\begin{aligned} \dim \widetilde{F}(x) \le \dim \partial _c f(x)-1. \end{aligned}$$

Therefore

$$\begin{aligned} \dim \text{ gph}\,\widetilde{F}|_{X_i}= \dim X_i+ \dim \widetilde{F}(x)\le \dim X_i+\dim \partial _c f(x)-1\le n-1. \end{aligned}$$

Thus

$$\begin{aligned} \dim \text{ gph}\, \widetilde{F}= \dim \Big (\bigcup _i \text{ gph}\, \widetilde{F}|_{X_i}\Big )\le n-1. \end{aligned}$$

And so if we let

$$\begin{aligned} \pi :\text{ gph}\, \widetilde{F}\rightarrow \mathbf{R}^n \end{aligned}$$

be the projection onto the last \(n\) coordinates, we deduce that \(\dim \pi (\text{ gph}\, \widetilde{F})\le n-1\). Finally, observe

$$\begin{aligned} \pi (\text{ gph}\,\widetilde{F}) ~=~ \Big \{ v \in \mathbf{R}^n : v \in \text{ rb}\,\partial _c f(x)~ \text{ for} \text{ some}~ x \in \mathbf{R}^n \Big \}, \end{aligned}$$

and so the result follows. \(\square \)

Remark 4.7

Observe that if a convex function has finitely many minimizers then, in fact, it has a unique minimizer. Thus, for a proper convex semi-algebraic function, Corollaries 4.4 and 4.5 reduce to Theorem 4.2.

Remark 4.8

In Corollaries 4.4 and 4.5, if the function \(f\) is not semi-algebraic, then the results of these corollaries can fail. In fact, these results can fail even if the function \(f\) is locally Lipschitz continuous. For instance, there is a locally Lipschitz function \(f:\mathbf{R}\rightarrow \mathbf{R}\) such that \(\partial _c f(x)=[-x,x]\) for every \(x\in \mathbf{R}\). See the article of Borwein-Moors-Wang [9]. For all \(v\in \mathbf{R}\), the perturbed function \(h_v\) has infinitely many critical points, and for all \(v\in \mathbf{R}\setminus \{0\}\), the function \(h_v\) has critical points that are degenerate.

5 Composite optimality conditions

Consider a composite optimization problem \(\displaystyle \min _x\, g(F(x))\). It is often computationally more convenient to replace the criticality condition \(0\in \partial (g\circ F)(x)\) with the potentially different condition \(0\in \nabla F(x)^{*}\partial g(F(x))\), related to the former condition by an appropriate chain rule. See for example the discussion of Lagrange multipliers in [33]. Thus it is interesting to study the graph of the set-valued mapping \(x\mapsto \nabla F(x)^{*}\partial g(F(x))\).

5.1 Dimensional analysis of the chain rule

The following is a standard result in subdifferential calculus.

Theorem 5.1

[34, Theorem 10.6] Consider a function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) and a smooth mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\). Then at any point \(\bar{x}\in \mathbf{R}^n\), one has

$$\begin{aligned} \hat{\partial } (g\circ F)(\bar{x}) \supset \nabla F(\bar{x})^{*}\hat{\partial }g(F(\bar{x})). \end{aligned}$$

Now assuming that the functions \(g\) and \(F\) in the theorem above are semi-algebraic, we immediately deduce, using Theorem 3.5, that the dimension of the graph of the mapping \(x\mapsto \nabla F(\bar{x})^{*}\hat{\partial }g(F(\bar{x}))\) is at most \(n\).

One can ask what happens more generally in the case of the limiting and Clarke subdifferentials. It is well known that the inclusion

$$\begin{aligned} \partial (g\circ F)(\bar{x}) \supset \nabla F(\bar{x})^{*}\partial g(F(\bar{x})) \end{aligned}$$

is only guaranteed to hold under certain conditions [34, Theorem 10.6]. The Clarke case is similar [14, Theorem 2.3.10]. Hence, a priori, the dimension of the graph of the set-valued mapping \(x\mapsto \nabla F(x)^{*}\partial g(F(x))\) is unclear. In this section, we will show that if \(g\) is lower semicontinuous, then this dimension is no greater than \(n\) and we will derive some consequences.

The proofs of Proposition 3.2 and Theorem 3.5 are self contained and purely geometric. There is, however, an alternative approach using [5, Proposition 4], which will be useful for us. We state this proposition now. We denote the linear subspace of \(\mathbf{R}^n\) parallel to a nonempty convex set \(S\subset \mathbf{R}^n\) by \(\text{ par}\,S\).

Proposition 5.2

[5, Proposition 4] Consider a proper, lower semicontinuous, semi-algebraic function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\). Then there exists a Whitney stratification \(\{M_i\}\) of the domain of \(g\) such that for each stratum \(M_i\) and for any point \(x\in M_i\), the inclusion \(\text{ par}\,\partial _c g(x)\subset N_{M_i}(x)\) holds.

Before proceeding, we record the following special case of Theorem 5.1. Consider a smooth function \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\) and a nonempty set \(Q\subset \mathbf{R}^m\). Consider any point \(\bar{x}\in \mathbf{R}^n\). Applying Theorem 5.1 to the indicator function of \(Q\), we deduce

$$\begin{aligned} \hat{N}_{F^{-1}(Q)}(\bar{x})\supset \nabla F(\bar{x})^{*}\hat{N}_{Q}(F(\bar{x})). \end{aligned}$$

If we let \(Q=F(X)\), for some set \(X\subset \mathbf{R}^n\), then we obtain

$$\begin{aligned} \hat{N}_{F^{-1}(F(X))}(\bar{x})\supset \nabla F(\bar{x})^{*}\hat{N}_{F(X)}(F(\bar{x})). \end{aligned}$$
(6)

Theorem 5.3

Consider a proper, lower semicontinuous, semi-algebraic function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) and a smooth semi-algebraic mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\). Then the graph of the semi-algebraic set-valued mapping \(x\mapsto \nabla F(x)^{*}\partial _c g(F(x))\) has dimension no greater than \(n\).

Proof

Consider the Whitney stratification \(\{M_i\}\) of \(\text{ dom}\,g\) that is guaranteed to exist by applying Proposition 5.2 to the function \(g\). Now applying Theorem 2.16 to the mapping \(F\), we obtain a Whitney stratification \(\{X_i\}\) of \(\mathbf{R}^n\) and a Whitney stratification \(\{K_j\}\) of \(\mathbf{R}^m\) compatible with \(\{M_i\}\) such that for each index \(i\), we have \(F(X_i)=K_j\) for some index \(j\). Fix some stratum \(X\) and a point \(\bar{x}\in X\). If \(F(X)\) is not a subset of the domain of \(g\), then clearly \(\nabla F(\cdot )^{*}\partial _c g(F(\cdot ))|_X\equiv \emptyset \). Hence, we only consider \(X\) such that \(F(X)\subset \text{ dom}\,g\). Let \(M\) be the stratum satisfying \(F(X)\subset M\). Observe by our choice of the stratification \(\{M_i\}\), we have

$$\begin{aligned} \nabla F(\bar{x})^{*}\partial _c g(F(\bar{x}))\subset \nabla F(\bar{x})^{*}v+ \nabla F(\bar{x})^{*}N_M(F(\bar{x})), \end{aligned}$$

for some vector \(v\in \mathbf{R}^m\). Hence we have the inclusions

$$\begin{aligned} \text{ par}\,\nabla F(\bar{x})^{*}\partial _c g(F(\bar{x}))\subset \nabla F(\bar{x})^{*}N_M(F(\bar{x}))\subset \nabla F(\bar{x})^{*}N_{F(X)}(F(\bar{x})), \end{aligned}$$
(7)

where the last inclusion follows since the manifold \(F(X)\) is a subset of \(M\). Combining (6) and (7), we obtain

$$\begin{aligned} \text{ par}\,\nabla F(\bar{x})^{*}\partial _c g(F(\bar{x}))\subset \hat{N}_{F^{-1}(F(X))}(\bar{x})\subset N_X(\bar{x}), \end{aligned}$$

where the last inclusion follows since the manifold \(X\) is a subset of \(F^{-1}(F(X))\). So we deduce

$$\begin{aligned} \dim \nabla F(\bar{x})^{*}\partial _c g(F(\bar{x}))\le n-\dim X. \end{aligned}$$

Since the point \(\bar{x}\) was arbitrarily chosen from \(X\), we conclude, using Corollary 3.4, the inequality \(\dim \text{ gph}\,\nabla F(\cdot )^{*}\partial _c g(F(\cdot ))|_X\le n\). Taking the union over the strata \(\{X_i\}\) yields

$$\begin{aligned} \dim \text{ gph}\,\nabla F(\cdot )^{*}\partial _c g(F(\cdot ))\le n, \end{aligned}$$

as we claimed. \(\square \)

Observe that Theorem 5.3 is a generalization of Theorem 3.5. This can easily be seen by taking \(F\) to be the identity map in Theorem 5.3.

Remark 5.4

Consider a proper, lower semicontinuous, semi-algebraic function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) and a smooth semi-algebraic mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\) satisfying \(\text{ dom}\,g\circ F= F^{-1}(\text{ dom}\,g)\ne \emptyset \). A natural question, in line with Theorem 3.7, is whether the graph of the mapping \(x\mapsto \nabla F(x)^{*}\partial _c g(F(x))\) has dimension exactly \(n\). In fact, there is no hope for that to hold generally. For instance, it is possible to have \(\partial _c g(y)=\emptyset \) for every point \(y\) in the image of \(F\). This example, however motivates the following easy proposition.

Proposition 5.5

Consider a proper, lower semicontinuous, semi-algebraic function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) and a smooth semi-algebraic mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\). Assume that the set \(F^{-1}(\text{ dom}\,\hat{\partial }g)\) has a nonempty interior. Then the graph of the set-valued mapping \(\nabla F(\cdot )^{*}\hat{\partial } g(F(\cdot ))\) has dimension exactly \(n\). Analogous results hold in the limiting and Clarke cases.

Proof

We show the theorem only for the case of the regular subdifferential. The proof remains unchanged for the other cases. Clearly, as a result of Theorem 5.3, it is sufficient to show that the inequality, \(\dim \text{ gph}\,\nabla F(\cdot )^{*}\hat{\partial } g(F(\cdot ))\ge n\), holds. To that effect, consider an open set \(N\) contained in the interior of the set \(F^{-1}(\text{ dom}\,\hat{\partial }g)\). Then for any point \(x\in N\), the set \(\hat{\partial } g(F(x))\) is nonempty. Hence, the the map \(x\mapsto \nabla F(x)^{*}\hat{\partial } g(F(x))\) has nonempty values on \(N\). In particular, letting \(\pi :\mathbf{R}^n\times \mathbf{R}^n\rightarrow \mathbf{R}^n\) be the projection onto the first \(n\) coordinates, we see

$$\begin{aligned} \pi (\text{ gph}\,\nabla F(\cdot )^{*}\hat{\partial } g(F(\cdot ))|_N)=N. \end{aligned}$$

Hence we conclude \(\dim \text{ gph}\,\nabla F(\cdot )^{*}\hat{\partial } g(F(\cdot ))|_N\ge \dim N=n\), thus completing the proof. \(\square \)

The following easy result, which we state without proof, gives a different approach.

Proposition 5.6

Consider a proper, continuous, semi-algebraic function \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) and a smooth semi-algebraic mapping \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\). If the constraint qualification of [34, Theorem 10.6] holds at some point \(x\in F^{-1}(\text{ dom}\,g)\), then the graph of the set-valued mapping \(\nabla F(\cdot )^{*}\partial g(F(\cdot ))\) has dimension exactly \(n\). An analogous statement holds in the Clarke case.

5.2 Consequences

Let \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\) be a smooth mapping and \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) a proper lower semicontinuous function. (For simplicity, here we assume that the mapping \(F\) is defined on all of \(\mathbf{R}^n\). However the whole section extends immediately to a mapping \(F\) defined only on an open subset \(U\subset \mathbf{R}^n\).) Consider the following collection of composite minimization problems, parametrized by vectors \(v\in \mathbf{R}^n\).

$$\begin{aligned} (P(v))\qquad \min _{x\in \mathbf{R}^n} g(F(x))-\langle v,x \rangle \end{aligned}$$

For a point \(\bar{x}\) to be a minimizer for \(P(v)\), the inclusion \(v\in \partial (g\circ F)(\bar{x})\) must necessarily hold. As discussed in the beginning of the section, it is often more convenient to replace this condition with the potentially different condition \(v\in \nabla F(\bar{x})^{*}\partial g(F(\bar{x}))\). This motivates the following definition.

Definition 5.7

We say that a point \(x\) is Clarke critical for the problem \((P(v))\) if the inclusion \(v\in \nabla F(x)^{*}\partial _c g(F(x))\) holds, and we call such a critical point \(x\) nondegenerate for the problem \((P(v))\) if the stronger property \(v\in \text{ ri}\,\nabla F(x)^{*}\partial _c g(F(x))\) holds.

We are now in position to state a natural generalization of Corollaries 4.4 and 4.5.

Corollary 5.8

Let \(F:\mathbf{R}^n\rightarrow \mathbf{R}^m\) be a semi-algebraic smooth function and \(g:\mathbf{R}^m\rightarrow \overline{\mathbf{R}}\) a proper lower semicontinuous semi-algebraic function. Consider the following collection of optimization problems, parametrized by vectors \(v\in \mathbf{R}^n\).

$$\begin{aligned} (P(v))\qquad \min _{x\in \mathbf{R}^n} g(F(x))-\langle v, x\rangle \end{aligned}$$

Then there exists a positive integer \(\beta \), such that for a generic vector \(v\in \mathbf{R}^n\), the number of Clarke-critical points for the problem \((P(v))\) is no greater than \(\beta \). Furthermore, for a generic vector \(v\in \mathbf{R}^n\), every Clarke-critical point for the problem \((P(v))\) is nondegenerate.

Proof

Observe that by Theorem 5.3, the graph of the mapping \(x\mapsto \nabla F(x)^{*}\partial _c g(F(x))\) has dimension no greater than \(n\). The proof now proceeds along the same lines as the proofs of Corollaries 4.4 and 4.5. \(\square \)

Observe that Corollaries 4.4 and 4.5 can be considered as special cases of Corollary 5.8, in which the map \(F\) is the identity map.

A noteworthy illustration of Corollary 5.8 is the problem of constrained minimization, which we discuss now. Let \(f:\mathbf{R}^n\rightarrow \overline{\mathbf{R}}\) be a semi-algebraic function and \(D\subset \mathbf{R}^n\) a closed semi-algebraic set. Consider the following collection of constrained minimization problems, parametrized by vectors \(v\in \mathbf{R}^n\).

$$\begin{aligned} (P^{\prime }(v))\qquad&\min&f(x)-\langle v,x \rangle \\&\text{ s.t.}\,&x\in D \end{aligned}$$

Observe that \((P^{\prime }(v))\) is equivalent to the problem \(\displaystyle \min _{x\in \mathbf{R}^n} g(F(x))-\langle v, x\rangle \), where we define \(F(x)=(x,x)\) and \(g(x,y)=f(x)+\delta _D(y)\).

Hence, in the sense of composite minimization, it is easy to check that a point \(x\in D\) is Clarke critical for the problem \((P^{\prime }(v))\) if \(v\in \partial _c f(x)+N_D^c(x)\), and such a critical point \(x\) is nondegenerate for the problem \((P^{\prime }(v))\) if the stronger property \(v\in \text{ ri}\,\partial _c f(x)+\text{ ri}\,N_D^c(x)\) holds.

Corollary 5.9

Let \(f:\mathbf{R}^n\rightarrow \mathbf{R}\) be a semi-algebraic function and let \(D\) be a closed, semi-algebraic set. Consider the following collection of optimization problems, parametrized by vectors \(v\in \mathbf{R}^n\).

$$\begin{aligned} (P(v))\qquad&\min&f(x)-\langle v, x\rangle \\&\text{ s.t.}\,&x\in D \end{aligned}$$

Then there exists a positive integer \(\beta \), such that for a generic vector \(v\in \mathbf{R}^n\), the number of Clarke-critical points for the problem \((P(v))\) is no greater than \(\beta \). Furthermore, for a generic vector \(v\in \mathbf{R}^n\), every Clarke-critical point for the problem \((P(v))\) is nondegenerate.

Proof

This follows directly from Corollary 5.8. \(\square \)