1 Introduction

The cardinality constraint, also known as the sparsity constraint, has attracted growing interest in various disciplines due to the “curse of dimensionality” in the big data era. This paper treats a general cardinality-constrained optimization model called cardinality-constrained cone programming (CCCP), which minimizes a general continuously differentiable function subject to the cardinality constraint together with a general nonlinear cone constraint. The CCCP problem provides a unified framework for a rich variety of cardinality-constrained problems arising in many applications, including the subset selection problem in regression [1, 2], portfolio optimization problems [3,4,5,6,7], and linear and nonlinear compressed sensing in signal processing [8,9,10].

The CCCP problem is generally NP-hard due to the cardinality function, so convex and well-behaved nonconvex relaxation strategies are typically employed to handle it with various numerical algorithms (see, e.g., [11,12,13,14,15,16,17] and references therein). Most existing theoretical results therefore concern conditions guaranteeing the exactness of such relaxations, such as the restricted isometry property (RIP) [10], while fewer results directly address the optimality analysis of the cardinality-constrained problem itself. Such results are mainly found in [18,19,20,21,22,23,24,25,26]. For example, with a single cardinality constraint, Beck and Eldar [19] presented three types of first-order necessary optimality conditions based upon the notions of stationarity and coordinatewise optimality. Extended analyses for the cardinality and the so-called symmetric set constraints were carried out in [20, 25] in a similar manner. For a relatively more general case, in which the feasible region is characterized by the cardinality constraint and nonlinear equality and inequality constraints, Lu and Zhang [24] established a first-order necessary optimality condition by employing the Robinson constraint qualification (RCQ) from variational analysis. Such a model was recently reformulated as a mixed-integer program [21] and as a mathematical program with complementarity constraints [22], and optimality analysis was then built upon the corresponding reformulated models.
Besides the tool of constraint qualifications (CQs) as employed in [21, 22, 24], the notions of tangent cones and normal cones have been utilized to characterize optimality conditions for cardinality-constrained problems: the Bouligand tangent cone and the proximal and Mordukhovich normal cones to the cardinality-constrained set were discussed in [18], the Bouligand and Clarke tangent cones and corresponding normal cones to the intersection of the cardinality-constrained set and the nonnegative orthant in [26], and the Fréchet and Mordukhovich normal cones to the intersection of the cardinality-constrained set and any given convex polyhedron in [23].

Our purpose is to extend the first- and the second-order optimality conditions to the more general and unified CCCP model by proposing some tailored forms of the RCQ. These CQs are well designed to ensure decomposition properties of the tangent cone and the normal cone to the intersection of the cardinality-constrained set and the nonlinear cone constrained set, which will play a vital role in establishing the desired optimality conditions.

The remainder of the paper is organized as follows. The CCCP model and its related fundamental properties will be introduced in Sect. 2. The restricted forms of the RCQ and the strict RCQ tailored for the CCCP problem will be proposed and discussed in Sect. 3. The first- and the second-order optimality conditions are then proposed in Sects. 4 and 5, respectively. Concluding remarks are drawn in Sect. 6.

2 Preliminaries

In this paper, we focus on the following CCCP problem:

$$\begin{aligned} \min ~~f(x)~~~~~s.t. ~~G(x)\in K,~\Vert x\Vert _0\le s, \end{aligned}$$
(1)

where \(f{:}\,\mathbb {R}^n \rightarrow \mathbb {R}\) and \(G{:}\, \mathbb {R}^n \rightarrow \mathbb {R}^m\) are continuously differentiable functions, K is a nonempty, closed and convex cone in \(\mathbb {R}^m\), \(\Vert x\Vert _0\) denotes the cardinality of the vector \(x\in \mathbb {R}^n \), i.e., the number of nonzero elements of x, and \(s\in ~]0,n[\) is a given positive integer.
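To make the two constraints in (1) concrete, the feasibility test is straightforward to implement once K is specialized. The sketch below is purely illustrative and assumes \(K=\mathbb {R}^m_+\); the helper name is hypothetical:

```python
import numpy as np

def is_feasible(x, G, s, tol=1e-10):
    """Feasibility test for (1), illustrated with K = R^m_+:
    G(x) must lie in K (componentwise nonnegative here) and ||x||_0 <= s."""
    cone_ok = bool(np.all(G(x) >= -tol))     # G(x) in K
    card_ok = np.count_nonzero(x) <= s       # ||x||_0 <= s
    return cone_ok and card_ok

# Example: G(x) = (x1 + x2 - 1, x3), s = 2
G = lambda x: np.array([x[0] + x[1] - 1.0, x[2]])
print(is_feasible(np.array([1.0, 0.0, 0.5]), G, 2))  # True
print(is_feasible(np.array([1.0, 2.0, 0.5]), G, 2))  # False (three nonzeros)
```

Other choices of K (a second-order cone, the semidefinite cone) only change the cone membership test.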

Several specific models of the form (1) arise from statistical regression, finance, signal processing and others; see, e.g., the subset selection in regression [27], the cardinality-constrained principal component analysis [28] and its relaxation [29], the cardinality-constrained semidefinite programming [30], the cardinality-constrained nonnegative matrix factorization [31], etc.

For convenience, we denote the feasible set of (1) as \(\Omega :=\Phi \cap S\) with

$$\begin{aligned} \Phi :=\{x\in \mathbb {R}^n{:}\, G(x)\in K\}~\text{ and }~S:=\{x\in \mathbb {R}^n{:}\,\Vert x\Vert _0\le s\}. \end{aligned}$$
(2)

Notation used throughout the paper is listed as follows. For any given \(x\in \mathbb {R}^n\), we denote \(\mathrm {supp}(x):=\{i=1,\ldots ,n{:}\, x_i\ne 0\}\). For an index set \(J\subseteq \{1,2,\ldots ,n\}\), |J| is the cardinality of J and \(\bar{J}:=\{1,\ldots ,n\}\backslash J\). \(\mathbb {R}_J^n:=\mathrm{span}\{{e}_i{:}\, i\in J\}\) denotes the subspace of \(\mathbb {R}^{n}\) spanned by \(\{{e}_i{:}\,i\in J\}\), where \({e}_i\) is the i-th unit vector in \(\mathbb {R}^{n}\). \(\mathcal {J}:=\{J\subseteq \{1,2,\ldots ,n\}{:}\,|J|=s\}\). For \(x^*\in S\), we denote \(\Gamma ^*:=\mathrm {supp}(x^*)\) and \(\mathcal {J}^*:=\big \{J\in \mathcal {J}{:}\, J\supseteq \Gamma ^*\big \}\). For any given set \(\Omega \subseteq \mathbb {R}^n\), \(x'\xrightarrow {\Omega } x\) means \(x'\rightarrow x\) with \(x'\in \Omega \). For a cone \(K\subseteq \mathbb {R}^n\), \(K^{\circ }:=\{d\in \mathbb {R}^n{:}\,\langle d,x\rangle \le 0, \forall x\in K\}\) is its (negative) polar cone. The distance from a point \(x\in \mathbb {R}^n\) to a set \(\Omega \subseteq \mathbb {R}^n\) is denoted \(\hbox {dist}(x,\Omega )=\inf _{y\in \Omega }\Vert x-y\Vert \). Denote by \(G^{-1}(A):=\{x\in \mathbb {R}^n{:}\, G(x)\in A\}\) the preimage of a set \(A\subseteq \mathbb {R}^m\) under G, by \(\hbox {DG}(x){:}\, \mathbb {R}^n \rightarrow \mathbb {R}^m\) the derivative of G at x, and \(\hbox {DG}(x)\mathbb {R}^n=\{\hbox {DG}(x)y{:}\,y\in \mathbb {R}^n\}\). The support function of a set \(\Omega \) is denoted \(\sigma (x,\Omega )=\sup \{\langle x,y\rangle {:}\,y\in \Omega \}\). \(cl(\Omega )\) and \(\hbox {int}(\Omega )\) stand for the closure and the interior of \(\Omega \), respectively. Let \(x\in \Omega \). The tangent (also called contingent or Bouligand) cone and the Clarke (also called regular) tangent cone to \(\Omega \) at x are denoted \({T}_{\Omega }(x)\) and \({T}^C_{\Omega }(x)\), respectively (see, e.g., [32, Definition 6.1] and [32, Definition 6.25]).
The normal cones in the sense of Fréchet (also called regular), Clarke, and Mordukhovich (also called limiting or basic) are denoted \({\hat{N}}_{\Omega }(x)\), \({N}^C_{\Omega }(x)\) and \({N}_{\Omega }(x)\), respectively (see, e.g., [32, Definition 6.3] and [33]). For a nonempty and closed set \(\Omega \subseteq \mathbb {R}^{n}\) and a point \(x\in \Omega \), the outer second-order tangent set to \(\Omega \) at x in the direction d is denoted \({T}^2_{\Omega }(x,d)\) (see, e.g., [32, Definition 13.11]).
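Two of these notations are easy to compute in practice: \(\mathrm {supp}(x)\) and \(\hbox {dist}(x,S)\). Since S is a union of the subspaces \(\mathbb {R}_J^n\), the nearest point of S to x keeps the s largest-magnitude entries of x (hard thresholding); the following sketch (illustrative only) uses this fact:

```python
import numpy as np

def supp(x, tol=0.0):
    """supp(x): indices of the nonzero entries of x."""
    return {i for i, xi in enumerate(x) if abs(xi) > tol}

def dist_to_S(x, s):
    """dist(x, S) for S = {x : ||x||_0 <= s}: the nearest point in S keeps
    the s largest-magnitude entries (hard thresholding), so the distance is
    the Euclidean norm of the discarded entries."""
    x = np.asarray(x, dtype=float)
    if np.count_nonzero(x) <= s:
        return 0.0
    small = np.sort(np.abs(x))[:-s]   # the n - s smallest magnitudes
    return float(np.linalg.norm(small))

print(supp([1.0, 0.0, -2.0]))        # {0, 2}
print(dist_to_S([3.0, 4.0, 1.0], 2)) # 1.0
```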

By utilizing the observation \(S=\bigcup _{J\in \mathcal {J}}\mathbb {R}_J^n\), we have the following properties on the tangent cones, normal cones and outer second-order tangent set to S.

Lemma 2.1

[18, Theorems 3.9 and 3.15], [26, Theorems 2.1 and 2.2] Let \(x^*\in S\). We have \({N}^C_{S}(x^*)=\mathbb {R}_{\overline{\Gamma }^*}^n\), \({T}^C_{S}(x^*) =\mathbb {R}_{\Gamma ^*}^n\), and

$$\begin{aligned}&{\hat{N}}_{S}(x^*)=\left\{ \begin{array}{*{20}c} \mathbb {R}_{\overline{\Gamma }^*}^n, &{} if ~~|\Gamma ^*|=s,\\ \{0\}, &{} if ~~|\Gamma ^*|<s, \end{array}\right. ~~~~~~~~{T}_{S}(x^*) =\left\{ \begin{array}{*{20}c} \mathbb {R}_{\Gamma ^*}^n, &{} if ~~|\Gamma ^*|=s,\\ \underset{J\in \mathcal {J}^*}{\bigcup }\mathbb {R}_{J}^n, &{} if ~~|\Gamma ^*|<s.\end{array}\right. \end{aligned}$$
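The case distinction in Lemma 2.1 translates into a simple combinatorial membership test for \({T}_{S}(x^*)\): when \(|\Gamma ^*|=s\), a direction d must be supported on \(\Gamma ^*\); when \(|\Gamma ^*|<s\), the supports of d and \(x^*\) must fit into a common index set of size s. A small illustrative sketch:

```python
import numpy as np

def in_T_S(x_star, d, s):
    """Membership test for T_S(x*) per Lemma 2.1: if |Gamma*| = s the cone is
    R^n_{Gamma*}, so supp(d) must lie in Gamma*; if |Gamma*| < s it is the
    union of R^n_J over J in J*, so supp(d) and Gamma* must fit into a
    common J with |J| = s, i.e., |supp(d) | Gamma*| <= s."""
    gamma = {i for i, v in enumerate(x_star) if v != 0}
    sd = {i for i, v in enumerate(d) if v != 0}
    if len(gamma) == s:
        return sd <= gamma
    return len(sd | gamma) <= s

x = np.array([1.0, 0.0, 0.0])   # Gamma* = {0}, take s = 2
print(in_T_S(x, np.array([0.0, 1.0, 0.0]), 2))  # True: supp(d) | Gamma* = {0, 1}
print(in_T_S(x, np.array([0.0, 1.0, 1.0]), 2))  # False: {0, 1, 2} has 3 > s elements
```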

Proposition 2.1

Let \(x^*\in S\) and \(d\in {T}^{}_{S}(x^*)\). Then

$$\begin{aligned} {T}^2_{S}(x^*,d)={T}^{}_{S}(x^*)\cap {T}^{}_{S}(d). \end{aligned}$$

Proof

The desired result follows from

$$\begin{aligned} {T}^2_{S}(x^*,d)&= {T}^2_{\bigcup _{J\in \mathcal {J}}\mathbb {R}_J^n}(x^*,d) =\bigcup _{J\in \mathcal {J}}{T}^2_{\mathbb {R}_J^n}(x^*,d) =\bigcup _{J\in \mathcal {J}^*}{T}^2_{\mathbb {R}_J^n}(x^*,d)\\ &=\bigcup _{J\in \mathcal {J}^*,\, d\in \mathbb {R}_J^n}\big (T_{\mathbb {R}_J^n}(x^*)+\mathbb {R}d\big ) =\bigcup _{J\in \mathcal {J}^*,\, d\in \mathbb {R}_J^n}\big (\mathbb {R}_J^n+\mathbb {R}d\big ) =\bigcup _{J\in \mathcal {J}^*,\, d\in \mathbb {R}_J^n}\mathbb {R}_J^n =\bigcup _{|J|=s,\, J\supseteq \Gamma ^*\cup \mathrm {supp}(d)}\mathbb {R}_J^n\\ &=\Big (\bigcup _{|J|=s,\, J\supseteq \Gamma ^*}\mathbb {R}_J^n\Big )\cap \Big (\bigcup _{|J|=s,\, J\supseteq \mathrm {supp}(d)}\mathbb {R}_J^n\Big ) ={T}^{}_{S}(x^*)\cap {T}^{}_{S}(d), \end{aligned}$$

where the second equality is from [34, Proposition 3.37], the third equality follows from the fact that for any \(\mathbb {R}_J^n\) satisfying \(x^*\notin \mathbb {R}_J^n\) (i.e., \(J\notin \mathcal {J}^*\)), \({T}^2_{\mathbb {R}_J^n}(x^*,d)=\emptyset \), and the fourth equality is from [32, Proposition 13.12]. This completes the proof. \(\square \)

It is easy to verify that, for any \(d\in {T}^{}_{S}(x^*)\), \({T}^C_S(x^*)\subseteq {T}^2_{S}(x^*,d)\).
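Following the union expression \(\bigcup _{|J|=s,\,J\supseteq \Gamma ^*\cup \mathrm {supp}(d)}\mathbb {R}_J^n\) appearing in the proof of Proposition 2.1, membership of w in \({T}^2_{S}(x^*,d)\) also reduces to a combinatorial test; the following sketch (illustrative only) implements it:

```python
def in_T2_S(x_star, d, w, s):
    """Test w in T^2_S(x*, d) via the union formula in the proof of
    Proposition 2.1: w belongs to the set iff some index set J with |J| = s
    contains supp(x*), supp(d) and supp(w) simultaneously."""
    union = {i for i, v in enumerate(x_star) if v != 0}
    union |= {i for i, v in enumerate(d) if v != 0}
    union |= {i for i, v in enumerate(w) if v != 0}
    return len(union) <= s

# x* = e1, d = e2, s = 2: directions w supported on {0, 1} remain admissible
print(in_T2_S([1, 0, 0], [0, 1, 0], [1, 1, 0], 2))  # True
print(in_T2_S([1, 0, 0], [0, 1, 0], [1, 0, 1], 2))  # False: needs 3 indices
```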

3 Restricted Robinson Constraint Qualification

In order to get a decomposed form for the tangent cone and the normal cone to \(\Phi \cap S \), we will introduce restricted variants of the classical RCQ and the strict RCQ tailored for the constraint system of (1). Recall that the RCQ for \(\Phi \) holds at \(x^*\in \Phi \) if \(0\in \hbox {int}\{G(x^*)+\hbox {DG}(x^*)\mathbb {R}^n-K\}.\) According to [34, Corollary 2.98], the RCQ is equivalent to

$$\begin{aligned} \hbox {DG}(x^*)\mathbb {R}^n-T_K(G(x^*))=\mathbb {R}^m. \end{aligned}$$
(3)

By invoking [34, Lemma 3.27], this further leads to

$$\begin{aligned} \hat{N}_{\Phi }(x^*)=[T_{\Phi }(x^*)]^{\circ }=\hbox {DG}(x^*)^{\top }N_K(G(x^*)). \end{aligned}$$
(4)
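For the special case \(K=\{0\}\) (pure equality constraints), \(T_K(G(x^*))=\{0\}\) and condition (3) reduces to the classical requirement that \(\hbox {DG}(x^*)\) have full row rank, which is easy to check numerically. A sketch under this assumption:

```python
import numpy as np

def rcq_equality(DG, tol=1e-10):
    """RCQ check (3) specialized to K = {0}: T_K(G(x*)) = {0}, so (3)
    reads DG(x*) R^n = R^m, i.e., DG(x*) has full row rank."""
    DG = np.asarray(DG, dtype=float)
    return np.linalg.matrix_rank(DG, tol=tol) == DG.shape[0]

print(rcq_equality([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]))  # True: rank 2 = m
print(rcq_equality([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]]))  # False: dependent rows
```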

Some other essential properties of the RCQ are reviewed as follows.

Lemma 3.1

[34, Proposition 2.90, Corollary 2.91] Suppose that the RCQ holds at some \(x^*\in \Phi \). Then \(T_{\Phi }(x^*)=\{d\in \mathbb {R}^n{:}\,\hbox {DG}(x^*)d\in T_K(G(x^*))\}\), and \(\hbox {dist}(d,T_{\Phi }(x^*))=o(\Vert d\Vert ),~~\forall ~x^*+d\in \Phi \).

Recall that the strict RCQ for \(\Phi \) holds at \((x^*,\lambda ^*)\) with \(\lambda ^*\in N_K(G(x^*))\) if \( 0\in \hbox {int}\{G(x^*)+\hbox {DG}(x^*)\mathbb {R}^n-K_0\}\) with \(K_0:=\{y\in K{:}\,\langle \lambda ^*,y-G(x^*)\rangle =0\}. \)

For \(\Phi \cap S\), we introduce restricted forms of RCQ and strict RCQ as below.

Definition 3.1

Let \(0\ne x^*\in \Phi \cap S\), and \(\lambda ^*\in N_K(G(x^*))\). We say that the restricted RCQ (RRCQ) with respect to \(\Phi \cap S\) holds at \(x^*\) if

$$\begin{aligned} 0\in \hbox {int}\{G(x^*)+\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-K\}, \end{aligned}$$
(5)

and the restricted strict RCQ (RSRCQ) to \(\Phi \cap S\) holds at \((x^*,\lambda ^*)\) if

$$\begin{aligned} 0\in \hbox {int}\{G(x^*)+\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-K_0\}, \end{aligned}$$
(6)

where \(K_0=\{y\in K{:}\,\langle \lambda ^*,y-G(x^*)\rangle =0\}\).
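Again specializing to \(K=\{0\}\), condition (5) strengthens (3): the columns of \(\hbox {DG}(x^*)\) indexed by \(\Gamma ^*\) alone must span \(\mathbb {R}^m\). The sketch below (illustrative; the helper name is hypothetical) makes the difference from the RCQ visible:

```python
import numpy as np

def rrcq_equality(DG, gamma, tol=1e-10):
    """RRCQ check (5) specialized to K = {0}: (5) becomes
    DG(x*) R^n_{Gamma*} = R^m, i.e., the columns of DG(x*) indexed by
    Gamma* = supp(x*) alone must span R^m (compare with the full-rank
    requirement of the RCQ over all columns)."""
    M = np.asarray(DG, dtype=float)
    A = M[:, sorted(gamma)]
    return np.linalg.matrix_rank(A, tol=tol) == M.shape[0]

DG = np.array([[1.0, 0.0, 3.0],
               [0.0, 2.0, 1.0]])
print(rrcq_equality(DG, {0, 1}))  # True: columns 0 and 1 span R^2
print(rrcq_equality(DG, {0}))     # False: a single column cannot span R^2
```

In the second call the RCQ itself still holds (the full matrix has rank 2), illustrating that the RRCQ is the stronger requirement.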

Properties of the proposed CQs are discussed as follows.

Proposition 3.1

If the RRCQ holds at some \(0\ne x^*\in \Phi \cap S\), then

$$\begin{aligned} T_{\Phi \cap S}(x^*)= T_{\Phi }(x^*)\cap T_S(x^*). \end{aligned}$$
(7)

Proof

Analogous to the equivalence between the RCQ and (3), the RRCQ (5) is equivalent to \(\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T_K(G(x^*))=\mathbb {R}^m\), which, since \(\mathbb {R}^n_{\Gamma ^*}\subseteq \mathbb {R}^n_{J}\), yields \(\hbox {DG}(x^*)\mathbb {R}^n_{J}-T_K(G(x^*))=\mathbb {R}^m\) for any \( J\in \mathcal {J}^*\). Taking polars on both sides, it follows from [32, Corollary 11.25] that \([\hbox {DG}(x^*)^{\top }]^{-1}\mathbb {R}^n_{\bar{J}}\cap N_K(G(x^*))=\{0\}\), and hence

$$\begin{aligned} \mathbb {R}^n_{\bar{J}}\cap \hbox {DG}(x^*)^{\top }N_K(G(x^*))=\{0\}. \end{aligned}$$
(8)

Combining with (4) and the fact \(N_{\mathbb {R}^n_J}(x^*)=\mathbb {R}^n_{\bar{J}}\), it follows from [32, Theorem 6.42] that

$$\begin{aligned} T_{\Phi \cap \mathbb {R}^n_{J}}(x^*)= T_{\Phi }(x^*)\cap \mathbb {R}^n_{J},~\forall J\in \mathcal {J}^*. \end{aligned}$$
(9)

The desired result then holds from

$$\begin{aligned} T_{\Phi \cap S}(x^*)= & {} T_{\bigcup _{J\in \mathcal {J}^*}\Phi \cap \mathbb {R}^n_{J}}(x^*) =\bigcup _{J\in \mathcal {J}^*} T_{\Phi \cap \mathbb {R}^n_{J}}(x^*) =\bigcup _{J\in \mathcal {J}^*}T_{\Phi }(x^*)\cap \mathbb {R}^n_{J} \nonumber \\= & {} T_{\Phi }(x^*)\cap \bigcup _{J\in \mathcal {J}^*} \mathbb {R}^n_{J}=T_{\Phi }(x^*)\cap T_S(x^*). \end{aligned}$$
(10)

\(\square \)

Lemma 3.2

Let \(0\ne x^*\in \Phi \cap S\) and \(\lambda ^*\in N_K(G(x^*))\), and suppose that the RSRCQ holds at \((x^*,\lambda ^*)\). Then, for any \(\lambda \in N_K(G(x^*))\), \(\big (\hbox {DG}(x^*)^{\top }\lambda \big )_{\Gamma ^*}=\big (\hbox {DG}(x^*)^{\top }\lambda ^*\big )_{\Gamma ^*}\) implies \(\lambda =\lambda ^*.\)

Proof

Denote \(A:= \hbox {DG}(x^*)\), \(r:= |\Gamma ^*|\), and \(A_{\Gamma ^*}\) as the submatrix consisting of columns of A indexed by \(\Gamma ^*\). For any given \(\lambda \in N_K(G(x^*))\) satisfying \(\big (A^{\top }\lambda \big )_{\Gamma ^*}=\big (A^{\top }\lambda ^*\big )_{\Gamma ^*}\), denote \(\mu :=\lambda -\lambda ^*\). Then, \(A_{\Gamma ^*}^\top \mu =0\). The RSRCQ implies that for any \(y\in \mathbb {R}^m\) and any sufficiently small \(\epsilon >0\), there exist \(h\in \mathbb {R}^r\) and \(k_0\in K_0\) such that \(\epsilon y=G(x^*)+A_{\Gamma ^*}h-k_0.\) Thus,

$$\begin{aligned} \langle \mu ,G(x^*)-k_0 \rangle =\langle \mu , \epsilon y-A_{\Gamma ^*}h\rangle =\epsilon \langle \mu ,y \rangle -\langle A_{\Gamma ^*}^\top \mu ,h\rangle =\epsilon \langle \mu ,y \rangle . \end{aligned}$$

On the other hand, as \(k_0\in K_0\), we have

$$\begin{aligned} \langle \mu ,G(x^*)-k_0 \rangle =\langle \lambda ,G(x^*)-k_0 \rangle -\langle \lambda ^*,G(x^*)-k_0 \rangle =\langle \lambda ,G(x^*)-k_0 \rangle \ge 0. \end{aligned}$$

The arbitrariness of y immediately implies \(\mu =0\). \(\square \)

Proposition 3.2

Let \(0\ne x^*\in \Phi \cap S\) and \(\lambda ^*\in N_K(G(x^*))\), and suppose that the RSRCQ holds at \((x^*,\lambda ^*)\). Then

$$\begin{aligned} \hat{N}_{\Phi \cap S}(x^*)=\hat{N}_{\Phi }(x^*)+\hat{N}_{S}(x^*). \end{aligned}$$
(11)

Proof

It is known from (10) and Lemma 3.1 that

$$\begin{aligned} T_{\Phi \cap S}(x^*) =\bigcup _{J\in \mathcal {J}^*}T_{\Phi }(x^*)\cap \mathbb {R}^n_{J}=\bigcup _{J\in \mathcal {J}^*}\left\{ d\in \mathbb {R}^n_{J}{:}\,\hbox {DG}(x^*)d\in T_K(G(x^*))\right\} . \end{aligned}$$
(12)

Now we claim that

$$\begin{aligned} \big (T_{\Phi }(x^*)\cap \mathbb {R}^n_{J}\big )^{\circ }=\hat{N}_{\Phi }(x^*)+\mathbb {R}^n_{\bar{J}}. \end{aligned}$$
(13)

From [32, Corollary 11.25] and (4), we have

$$\begin{aligned} \big (T_{\Phi }(x^*)\cap \mathbb {R}^n_{J}\big )^{\circ }=cl\big [\hat{N}_{\Phi }(x^*)+\mathbb {R}^n_{\bar{J}}\big ]=cl\big [\hbox {DG}(x^*)^\top N_K(G(x^*))+\mathbb {R}^n_{\bar{J}}\big ]. \end{aligned}$$
(14)

Under the RCQ, \(\hbox {DG}(x^*)^{\top }N_K(G(x^*))\) is a nonempty, closed and convex cone. Consider any \(z_1\in 0^+(\hbox {DG}(x^*)^{\top }N_K(G(x^*)))\) and \(z_2\in 0^+(\mathbb {R}^n_{\bar{J}})=\mathbb {R}^n_{\bar{J}}\) satisfying \(z_1+z_2=0\). Then \(z_1\in \hbox {DG}(x^*)^{\top }N_K(G(x^*))\) and \(z_1=-z_2\in \mathbb {R}^n_{\bar{J}}\), so that \(z_1\in \hbox {DG}(x^*)^{\top }N_K(G(x^*))\cap \mathbb {R}^n_{\bar{J}}.\) From (8), we obtain \(z_1=z_2=0\). Therefore, \(z_1\) and \(z_2\) belong to the lineality spaces of \(cl(\hbox {DG}(x^*)^{\top }N_K(G(x^*)))\) and \(cl({\mathbb {R}^n_{\bar{J}}})\), respectively. From [35, Corollary 9.1.1],

$$\begin{aligned} cl\big [\hbox {DG}(x^*)^{\top }N_K(G(x^*))+\mathbb {R}^n_{\bar{J}}\big ]=\hbox {DG}(x^*)^{\top }N_K(G(x^*))+\mathbb {R}^n_{\bar{J}}. \end{aligned}$$

Together with (14), we have proved the claim in (13).

If \(\Vert x^*\Vert _0=s\), then \(T_{\Phi \cap S}(x^*)=T_{\Phi }(x^*)\cap \mathbb {R}^n_{\Gamma ^*}\). By (13), we have

$$\begin{aligned} \hat{N}_{\Phi \cap S}(x^*)=(T_{\Phi \cap S}(x^*))^{\circ }=\hat{N}_{\Phi }(x^*)+\mathbb {R}^n_{\overline{\Gamma }^*}=\hat{N}_{\Phi }(x^*)+\hat{N}_{S}(x^*). \end{aligned}$$

If \(0<\Vert x^*\Vert _0<s\), from (10) and (13), we have

$$\begin{aligned} \hat{N}_{\Phi \cap S}(x^*)= & {} \big (T_{\Phi \cap S}(x^*) \big )^{\circ } =\big ( \bigcup _{J\in \mathcal {J}^*}T_{\Phi }(x^*)\cap \mathbb {R}^n_{J} \big )^{\circ }\\= & {} \bigcap _{J\in \mathcal {J}^*}\big (T_{\Phi }(x^*)\cap \mathbb {R}^n_{J} \big )^{\circ }=\bigcap _{J\in \mathcal {J}^*}\big (\hat{N}_{\Phi }(x^*)+ \mathbb {R}^n_{\bar{J}}\big ). \end{aligned}$$

We next show that \(\bigcap _{J\in \mathcal {J}^*}\big (\hat{N}_{\Phi }(x^*)+ \mathbb {R}^n_{\bar{J}}\big )=\hat{N}_{\Phi }(x^*)\); it suffices to show \(\bigcap _{J\in \mathcal {J}^*}\big (\hat{N}_{\Phi }(x^*)+ \mathbb {R}^n_{\bar{J}}\big )\subseteq \hat{N}_{\Phi }(x^*).\) Let \(v\in \bigcap _{J\in \mathcal {J}^*}\big (\hat{N}_{\Phi }(x^*)+ \mathbb {R}^n_{\bar{J}}\big )\). Picking any two index sets \(J'\), \(J^{''} \in \mathcal {J}^*\), we have \(v=\hbox {DG}(x^*)^{\top }\lambda '+\xi '=\hbox {DG}(x^*)^{\top }\lambda ''+\xi ^{''}\), with \(\lambda ',~\lambda ^{''} \in N_K(G(x^*))\), \(\xi '\in \mathbb {R}^n_{\bar{J'}}\) and \(\xi ^{''}\in \mathbb {R}^n_{\bar{J^{''}}}\). Since \(\bigcap _{J\in \mathcal {J}^*}J=\Gamma ^*\), we obtain

$$\begin{aligned} v_{\Gamma ^*}=\big (\hbox {DG}(x^*)^{\top }\lambda '\big )_{\Gamma ^*}=\big (\hbox {DG}(x^*)^{\top }\lambda ^{''}\big )_{\Gamma ^*}. \end{aligned}$$

From the RSRCQ and Lemma 3.2, we have \(\lambda '=\lambda ^{''}\). Since \(\bigcup _{J\in \mathcal {J}^*}J=\{1,\ldots ,n\}\), it follows that \(v\in \hbox {DG}(x^*)^{\top }N_K(G(x^*))\). By invoking Lemma 2.1, we obtain (11).\(\square \)

It is worth pointing out that, when \(K=\mathbb R^l_+ \times \{0\}^{m-l}\), the RRCQ is exactly the CC-MFCQ proposed in [22], and the RSRCQ is implied by the CC-LICQ in [22]. Some weak CQs, which are generalized versions of the Mangasarian–Fromovitz, Abadie and Guignard constraint qualifications (abbreviated GMFCQ, GACQ and GGCQ, respectively), are proposed in [36] for a general minimization problem whose feasible region is \(\{x\in \mathbb R^n{:}\, F(x)\in \Lambda \}\) with some continuously differentiable function F and some nonempty and closed set \(\Lambda \). Since the constraint system of (1) can be rewritten as \(\big (G(x),x\big )\in K\times \bigcup _{J\in \mathcal {J}}\mathbb {R}^n_J=:K(s)\), where K(s) is evidently nonempty and closed, the corresponding GMFCQ can easily be shown to be weaker than the RRCQ. The relationship among all the above CQs is illustrated in Fig. 1.

Fig. 1: The relationship among the above CQs, where * means the case of \(K=\mathbb {R}^l_+\times \{0\}^{m-l}\)

All functions involved in the constraint systems discussed in the aforementioned CQs are continuously differentiable. Functions without continuous differentiability are also considered in [37, 38] to define the weakest CQs for equality and inequality constrained systems. In [38], the involved functions are called \(\mathcal {C}\)-differentiable functions, and the corresponding CQ is accordingly called the CCQ. For problems with differentiable equality and inequality constraints, an analogous implication scheme holds.

It would be a challenging task to extend this CCQ to cone constrained systems with a cardinality constraint, or to consider the CCCP problem where the involved functions are not continuously differentiable.

4 First-Order Optimality Conditions

Denote the Lagrangian function of problem (1) as

$$\begin{aligned} L(x,\lambda ):=f(x)+\langle \lambda ,G(x)\rangle , ~(x,\lambda )\in \mathbb {R}^n\times \mathbb {R}^m. \end{aligned}$$
(15)

Lagrangian multiplier sets in the sense of Fréchet and Clarke are defined as

$$\begin{aligned}&\Lambda ^B(x^*):=\{\lambda \in \mathbb {R}^m{:}\, -\nabla _xL(x^*,\lambda )\in \hat{N}_S(x^*),\lambda \in N_K(G(x^*))\}, \end{aligned}$$
(16)
$$\begin{aligned}&\Lambda ^C(x^*):=\{\lambda \in \mathbb {R}^m{:}\, -\nabla _xL(x^*,\lambda )\in N^C_S(x^*),\lambda \in N_K(G(x^*))\}, \end{aligned}$$
(17)

respectively. Obviously, \(\Lambda ^B(x^*)\subseteq \Lambda ^C(x^*)\).
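When \(K=\{0\}\) (so that \(N_K(G(x^*))=\mathbb {R}^m\)) and \(|\Gamma ^*|=s\) (so that \(\hat{N}_S(x^*)=\mathbb {R}_{\overline{\Gamma }^*}^n\) by Lemma 2.1), membership in \(\Lambda ^B(x^*)\) reduces to the linear system \(\big (\nabla f(x^*)+\hbox {DG}(x^*)^{\top }\lambda \big )_{\Gamma ^*}=0\). A sketch of this computation (illustrative only; the helper name is hypothetical):

```python
import numpy as np

def frechet_multiplier(grad_f, DG, gamma):
    """Compute a multiplier in (16) for K = {0} (so N_K(G(x*)) = R^m)
    and |Gamma*| = s, where (16) reduces to the linear system
    (grad f(x*) + DG(x*)^T lam)_{Gamma*} = 0. Returns lam if the system
    is solvable exactly, and None otherwise."""
    idx = sorted(gamma)
    A = np.asarray(DG, dtype=float)[:, idx]   # columns of DG(x*) indexed by Gamma*
    b = -np.asarray(grad_f, dtype=float)[idx]
    lam, *_ = np.linalg.lstsq(A.T, b, rcond=None)
    return lam if np.allclose(A.T @ lam, b) else None

# grad f(x*) = (1, 2, 0), DG(x*) = [[1, 0, 1], [0, 1, 1]], Gamma* = {0, 1}
lam = frechet_multiplier([1.0, 2.0, 0.0],
                         [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], {0, 1})
print(lam)  # [-1. -2.]
```

Theorem 4.1 (iii) below identifies a condition, the RSRCQ, under which such a multiplier is unique.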

Theorem 4.1

(First-order necessary optimality condition) Let \(x^*\ne 0\) be a locally optimal solution of (1), and \(\lambda ^*\in N_K(G(x^*))\).

  1. (i)

    If the RRCQ holds at \(x^*\), then \(d=0\) is an optimal solution of

    $$\begin{aligned} \min _{d\in \mathbb {R}^n} \nabla f(x^*)^{\top }d ~~\mathrm {s.t.} ~~d\in T_{S}(x^*)~ \mathrm {and}~\hbox {DG}(x^*)d\in T_K(G(x^*)). \end{aligned}$$
    (18)
  2. (ii)

    If the RRCQ holds at \(x^*\), then \(\Lambda ^C(x^*)\) is a nonempty, convex and compact set.

  3. (iii)

    If the RSRCQ holds at \((x^*,\lambda ^*)\), then \(\Lambda ^B(x^*)\) is a singleton.

Proof

(i) By Proposition 3.1 and Lemma 3.1, the RRCQ implies

$$\begin{aligned} T_{\Omega }(x^*)=T_{\Phi }(x^*)\cap T_{S}(x^*)=\{d\in T_S(x^*){:}\,\hbox {DG}(x^*)d\in T_K(G(x^*))\}. \end{aligned}$$

Utilizing the first-order necessary condition \(\nabla f(x^*)^{\top }d\ge 0\) for all \(d\in T_{\Omega }(x^*)\) at the local minimizer \(x^*\), we immediately get (i).

(ii) Consider the following convex problem

$$\begin{aligned} \min _{d\in \mathbb {R}^n} \nabla f(x^*)^{\top }d ~~\mathrm {s.t.} ~~d\in T^C_{S}(x^*)~ \mathrm {and}~\hbox {DG}(x^*)d\in T_K(G(x^*)). \end{aligned}$$
(19)

Obviously, \(d=0\) is an optimal solution of (19), since \( T^C_{S}(x^*)\subseteq T_S(x^*)\), and the RCQ of (19) holds by hypothesis. Note that the dual of (19) takes the form

$$\begin{aligned} \max _{\lambda }\{0:-\nabla _xL(x^*,\lambda )\in N^C_S(x^*), \lambda \in N_K(G(x^*)) \}. \end{aligned}$$

The strong duality theory for convex programming implies (ii) immediately.

(iii) Under Proposition 3.2 and (4), we have

$$\begin{aligned} \hat{N}_{\Omega }(x^*)=\hat{N}_{\Phi }(x^*)+\hat{N}_{S}(x^*)=(\hbox {DG}(x^*))^{\top }N_K(G(x^*))+\hat{N}_{S}(x^*). \end{aligned}$$

Additionally, it is known from (i) that \(-\nabla f(x^*)\in \hat{N}_{\Omega }(x^*)\). Thus, there always exists some \(\lambda \in N_K(G(x^*))\) such that

$$\begin{aligned} -\nabla _x L(x^*,\lambda )=-\nabla f(x^*)-(\hbox {DG}(x^*))^{\top }\lambda \in \hat{N}_{S}(x^*), \end{aligned}$$

which indicates that \(\Lambda ^B(x^*)\ne \emptyset \). For any \(\lambda '\in \Lambda ^B(x^*)\), it follows from Lemma 2.1 and (16) that \(\big (\hbox {DG}(x^*)^{\top }\lambda \big )_{\Gamma ^*}=\big (\hbox {DG}(x^*)^{\top }\lambda '\big )_{\Gamma ^*}=-\big (\nabla f(x^*)\big )_{\Gamma ^*}.\) Immediately, we have \(\lambda '=\lambda \) by invoking Lemma 3.2. This completes the proof. \(\square \)

The converse implications of conditions (ii) and (iii) in Theorem 4.1 do not hold, as the following two examples illustrate.

Example 4.1

Consider the problem \(\min _{x\in \mathbb {R}^2}\left\{ x_2~~s.t.~ x_2= 0,~\Vert x\Vert _0\le 1\right\} .\) Obviously, \(x^*=(1,0)^\top \) is an optimal solution, and the RRCQ does not hold at \(x^*\) since \((0,1)(\mathbb {R},0)^{\top }-\{0\}\ne \mathbb {R}\). Moreover, \(\Lambda ^C(x^*)={\mathbb {R}}\) is nonempty and convex but not compact, which shows that the conclusion of (ii) in Theorem 4.1 may fail in the absence of the RRCQ.

Example 4.2

Consider the problem

$$\begin{aligned} \min _{x\in \mathbb {R}^3}\left\{ f(x)=x_1^2~~s.t. ~G(x)=(g_{i,j}(x))\in \mathbb {S}_+^3,~\Vert x\Vert _0\le 2\right\} ,\end{aligned}$$
(20)

where \(g_{1,1}(x)=x_1\), \(g_{i,4-i}(x)=x_2\), \(i=1,2,3\), \(g_{3,3}(x)=1-x_3\) and others are zero. \(\mathbb {S}_+^3\) is the cone consisting of all 3-by-3 symmetric positive semidefinite matrices. Evidently, the optimal solution set of (20) is \(\{(0,0,x_3){:}\,x_3\le 1\}\). Let \(x^*=(0,0,1)^{\top }\). Then \(\Gamma ^*=\{3\}\) and \(\Lambda ^B(x^*) =\{ O_{3\times 3}\}\). For any given \(Z\in N_{{\mathbb {S}}^3_+}(G(x^*))\), denote \(K_0:=\left\{ Y\in {\mathbb {S}}^3_+:\langle Z,Y-G(x^*)\rangle =0\right\} ={\mathbb {S}}^3_+\cap Z^{\perp }.\) Thus,

$$\begin{aligned} \mathcal{M}:=G(x^*)+\hbox {DG}(x^*){\mathbb {R}}^3_{\Gamma ^*}-K_0=\left\{ t\mathrm {e}_3\mathrm {e}_3^{\top }-Y: t\in {\mathbb {R}}, Y\in {\mathbb {S}}^3_+\cap Z^{\perp }\right\} . \end{aligned}$$

Note that for any \(U\in \mathcal{M}\), the submatrix \(U_{[1:2,1:2]}\in -{\mathbb {S}}^2_+\). This immediately implies that \(0\notin \hbox {int}\, \mathcal{M}\), which indicates that the RSRCQ does not hold at \((x^*,Z)\) for any \(Z\in N_{{\mathbb {S}}^3_+}(G(x^*))\), even though \(\Lambda ^B(x^*)\) is a singleton.

Next, we proceed to establish first-order sufficient optimality conditions. We first define a growth condition characterizing local minimizers.

Definition 4.1

We say that the \(\gamma \)-growth (\(\gamma >0\)) condition holds for a function f at a point \(x^*\), if there exist a constant \(c>0\) and a neighborhood \(\mathcal {N}(x^*)\) of \(x^*\) such that the following inequality holds:

$$\begin{aligned} f(x)\ge f(x^*)+c\Vert x-x^*\Vert ^{\gamma },~~\forall x\in \Omega \cap \mathcal {N}(x^*). \end{aligned}$$

For a feasible point \(x^*\in \Omega \) and \(\eta \ge 0\), define the set

$$\begin{aligned} \mathcal {F}_{\eta }(x^*):=\{d\in T_S(x^*){:}\, \hbox {dist}[\hbox {DG}(x^*)d, T_K(G(x^*))]\le \eta \Vert d\Vert \}. \end{aligned}$$
(21)

\(\mathcal {F}_{\eta }(x^*)\) is a closed cone. If \(\eta =0\), then \(\mathcal {F}_{0}(x^*)=\{d\in T_S(x^*){:}\, \hbox {DG}(x^*)d\in T_K(G(x^*))\}\) is the feasible set of problem (18).
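For \(K=\mathbb {R}^m_+\), \(T_K(y)\) only restricts the components with \(y_i=0\), so the distance in (21) is the norm of the negative parts of \(\hbox {DG}(x^*)d\) on the active components. The sketch below is illustrative and checks only the cone part of (21); the \(T_S(x^*)\) part, tested as in Lemma 2.1, is omitted:

```python
import numpy as np

def dist_to_TK(v, y, tol=1e-10):
    """dist(v, T_K(y)) for K = R^m_+: at y in K, the tangent cone only
    constrains the active components (y_i = 0), where v_i >= 0 is required;
    projecting clips those components, so the distance is the norm of
    their negative parts."""
    v = np.asarray(v, dtype=float)
    active = np.abs(np.asarray(y, dtype=float)) <= tol
    return float(np.linalg.norm(np.minimum(v[active], 0.0)))

def in_F_eta(d, DGd, y, eta):
    """Cone part of the test (21): dist(DG(x*)d, T_K(G(x*))) <= eta ||d||."""
    return dist_to_TK(DGd, y) <= eta * np.linalg.norm(d)

y = np.array([0.0, 1.0])            # first component of G(x*) is active
print(dist_to_TK([-3.0, -5.0], y))  # 3.0: only the active component contributes
print(in_F_eta([1.0, 0.0], [-0.5, 0.0], y, 0.6))  # True: 0.5 <= 0.6
```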

Theorem 4.2

(First-order sufficient optimality condition) Let \(x^*\) be a feasible point of the problem (1). The following assertions hold.

  1. (i)

    If there exist constants \(\alpha >0\) and \(\eta >0\) such that

    $$\begin{aligned} \nabla f(x^*)^{\top }d\ge \alpha \Vert d\Vert ,~~\forall d\in \mathcal {F}_{\eta }(x^*), \end{aligned}$$
    (22)

    then the first-order growth condition holds at \(x^*\).

  2. (ii)

    Suppose that RRCQ is satisfied at \(x^*\). Set \(\eta =0\). If there exists an \(\alpha >0\) such that (22) holds, then the first-order growth condition holds at \(x^*\).

Proof

(i) Consider a feasible point \(x^*+d\in \Phi \cap S\) for a sufficiently small \(d\in \mathbb {R}^n\). We first show that \(d\in T_S(x^*)\). Assume \(d_i\ne 0\). Then \((x^*+d)_i\ne 0\) obviously holds if \(x^*_i=0\), and also holds if \(x^*_i\ne 0\) provided \(\Vert d\Vert <\min \{|x^*_i|{:}\, i\in \Gamma ^*\}\). This means that \(\hbox {supp}(d)\subseteq \hbox {supp}(x^*+d)\), so that \(\Vert d\Vert _0\le \Vert x^*+d\Vert _0\le s\). Furthermore, we assert that \(\Gamma ^*\subseteq \hbox {supp}(d)\). In fact, if there were an \(i_0\in \Gamma ^*\) with \(d_{i_0}= 0\), then we could let \(s+1-|\Gamma ^*|\) elements of \(d_{\overline{\Gamma }^*}\) be nonzero; in such a case, \(\Vert d\Vert _0=s\) but \(\Vert x^*+d\Vert _0\ge s+1\), which contradicts \(x^*+d\in S\). Hence, \(d\in T_S(x^*)\).

It follows from the Taylor expansion \(G(x^*+d)=G(x^*)+\hbox {DG}(x^*)d+o(\Vert d\Vert )\) and \(G(x^*+d)\in K\subseteq G(x^*)+T_K(G(x^*))\) that \(\hbox {DG}(x^*)d+o(\Vert d\Vert )\in T_K(G(x^*))\). Then it yields that

$$\begin{aligned} \hbox {dist}(\hbox {DG}(x^*)d,T_K(G(x^*)))\le \Vert \hbox {DG}(x^*)d-\hbox {DG}(x^*)d-o(\Vert d\Vert )\Vert =o(\Vert d\Vert ). \end{aligned}$$
(23)

So \(d\in \mathcal {F}_{\eta }(x^*)\) for sufficiently small d. By (22), we obtain

\(~~~~~~~~f(x^*+d)=f(x^*)+ \nabla f(x^*)^{\top }d+o(\Vert d\Vert )\ge f(x^*)+\alpha \Vert d\Vert +o(\Vert d\Vert ),\)

which proves (i).

(ii) Using a similar argument as in (i), for a feasible point \(x^*+d\in \Omega \) with a sufficiently small \(d\in \mathbb {R}^n\), we have \(d\in T_S(x^*)\). Suppose \(d\in \mathbb {R}^n_J\) for some \(J\in \mathcal {J}^*\). From \(G(x^*+d)\in K\), we also have (23).

Under (5), we have \(\hbox {DG}(x^*)\mathbb {R}^n_J-T_K(G(x^*))=\mathbb {R}^m.\) Lemma 3.1 yields that

\(\hbox {dist}\big (d, (\hbox {DG}(x^*))^{-1}T_K(G(x^*))\big )\le c\cdot \hbox {dist}(\hbox {DG}(x^*)d,T_K(G(x^*)))=o(\Vert d\Vert ). \)

Then there is a \(d'\in (\hbox {DG}(x^*))^{-1}T_K(G(x^*))\) such that \(\Vert d-d'\Vert =o(\Vert d\Vert )\). From \(\Vert d-d'\Vert =o(\Vert d\Vert )\), we have \(\hbox {supp}(d')\subseteq \hbox {supp}(d)\): otherwise, if \(d'_i\ne 0\) but \(d_i=0\), then \(\Vert d-d'\Vert \ge |d'_i|\), which contradicts \(\Vert d-d'\Vert =o(\Vert d\Vert )\). Then \(d'\in \mathbb {R}^n_J\) from \(d\in \mathbb {R}^n_J\). Therefore, there exists a \(d'\in \mathbb {R}^n_J\subseteq T_S(x^*)\) satisfying \(\hbox {DG}(x^*)d'\in T_K(G(x^*))\) and \(\Vert d-d'\Vert =o(\Vert d\Vert )\). By (22), we have \( \nabla f(x^*)^{\top }d'\ge \alpha \Vert d'\Vert .\)

Using the first-order Taylor expansion of f,

$$\begin{aligned} \begin{aligned} f(x^*+d)=&f(x^*)+ \nabla f(x^*)^{\top }d+o(\Vert d\Vert )\\ =&f(x^*)+ \nabla f(x^*)^{\top }d'+ \nabla f(x^*)^{\top }(d-d')+o(\Vert d\Vert )\\ \ge&f(x^*)+\alpha \Vert d'\Vert + \nabla f(x^*)^{\top }(d-d')+o(\Vert d\Vert )\\ \ge&f(x^*)+\alpha (\Vert d\Vert -\Vert d-d'\Vert )+ \nabla f(x^*)^{\top }(d-d')+o(\Vert d\Vert )\\ =&f(x^*)+\alpha \Vert d\Vert +o(\Vert d\Vert ). \end{aligned} \end{aligned}$$

Then the first-order growth condition holds. \(\square \)

5 Second-Order Optimality Conditions

In this section, we assume that functions \(f(\cdot )\) and \(G(\cdot )\) in problem (1) are twice continuously differentiable on \(\Omega \). The linearized problem of (1) is defined by (18) and the critical cone is

$$\begin{aligned} C(x^*):=\{d\in T_S(x^*): \hbox {DG}(x^*)d\in T_K(G(x^*)), \nabla f(x^*)^{\top }d=0\}. \end{aligned}$$
(24)

For some \(d, w\in \mathbb {R}^n\), consider a path \(x(\cdot ){:}\, \mathbb {R}_+\rightarrow S\) of the form:

$$\begin{aligned} x(t)=x^*+td+\frac{1}{2}t^2w+r(t) \end{aligned}$$

such that \(r(t)=o(t^2)\). Then, by using the second-order Taylor expansion of G at \(x^*\), we obtain

$$\begin{aligned} G(x(t))=G(x^*)+t\hbox {DG}(x^*)d+\frac{1}{2}t^2\big [\hbox {DG}(x^*)w+D^2G(x^*)(d,d)\big ]+o(t^2), \end{aligned}$$
(25)

where \(D^2G(x^*)(d, d)=[D^2G(x^*)d]d\) is the quadratic form corresponding to the second-order derivative \(D^2G(x^*)\) of G at \(x^*\). It follows from the definition of the outer second-order tangent set that, for some \(t_n\downarrow 0\), \(\hbox {dist}(G(x(t_n)),K)=o(t_n^2)\) if and only if

$$\begin{aligned} \hbox {DG}(x^*)w+D^2G(x^*)(d,d)\in T_K^2(G(x^*),\hbox {DG}(x^*)d). \end{aligned}$$
(26)
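The expansion behind (26) can be checked numerically for a concrete smooth map: the second-order model of G along the path x(t) should differ from G(x(t)) by \(o(t^2)\). The sketch below uses the sample map \(G(x)=(x_1^2,\,x_1x_2)\), an assumption purely for illustration:

```python
import numpy as np

# Sample smooth map G(x) = (x1^2, x1*x2) and its derivatives (illustrative).
G  = lambda x: np.array([x[0]**2, x[0]*x[1]])
DG = lambda x: np.array([[2*x[0], 0.0], [x[1], x[0]]])
D2G_dd = lambda x, d: np.array([2*d[0]**2, 2*d[0]*d[1]])  # quadratic form D^2G(x)(d,d)

x0 = np.array([1.0, 2.0]); d = np.array([1.0, -1.0]); w = np.array([0.5, 0.0])

def residual_ratio(t):
    """|| G(x(t)) - second-order model || / t^2 along x(t) = x0 + t d + t^2 w / 2;
    by the second-order Taylor expansion this ratio tends to 0 as t -> 0."""
    xt = x0 + t*d + 0.5*t**2*w
    model = G(x0) + t*(DG(x0) @ d) + 0.5*t**2*(DG(x0) @ w + D2G_dd(x0, d))
    return np.linalg.norm(G(xt) - model) / t**2

for t in (1e-1, 1e-2, 1e-3):
    print(residual_ratio(t))  # decreases roughly linearly in t
```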

Lemma 5.1

Suppose that \(x^*\) is a locally optimal solution of problem (1) and RRCQ holds at \(x^*\). Then for any \(d\in C(x^*)\) and \(w\in T_S^2(x^*,d)\) satisfying (26), it follows that

$$\begin{aligned} \nabla f(x^*)^{\top }w+\langle d,\nabla ^2f(x^*)d\rangle \ge 0. \end{aligned}$$
(27)

Proof

Consider \(d\in C(x^*)\) and \(w\in T_S^2(x^*,d)=T_S(x^*)\cap T_S(d)\) satisfying (26). Then there exist a sequence \(t_n\downarrow 0\) and some \(J\in \mathcal {J}^*\) such that \(\bar{x}(t_n): =x^*+t_nd+\frac{1}{2}t_n^2w\in \mathbb {R}^n_J\) and \(\hbox {dist}(G(\bar{x}(t_n)),K)=o(t_n^2).\) Therefore, from the RRCQ and Lemma 3.1, we obtain \( \hbox {dist}(\bar{x}(t_n),G^{-1}(K))\le c\cdot \hbox {dist}(G(\bar{x}(t_n)),K)=o(t_n^2). \) By taking the \(o(t_n^2)\) correction term within \(\mathbb {R}^n_J\), we have

\(~~~~~~~~x(t_n)=\bar{x}(t_n)+o(t_n^2)=x^*+t_nd+\frac{1}{2}t_n^2w+o(t_n^2)\in G^{-1}(K)\cap \mathbb {R}^n_J,\)

i.e., the points \(x(t_n)\) are feasible.

By the second-order Taylor expansion of f at \(x^*\), we have

$$\begin{aligned} \begin{aligned} f(x(t_n))=&f\left( x^*+t_nd+\frac{1}{2}t_n^2w+o(t_n^2)\right) \\ =&f(x^*)+\frac{1}{2}t_n^2[ \nabla f(x^*)^{\top }w+\langle d,\nabla ^2f(x^*)d\rangle ]+o(t_n^2). \end{aligned} \end{aligned}$$

Combining this with \( f(x(t_n))\ge f(x^*)\), we obtain (27). \(\square \)

Immediately, for any \(d\in C(x^*)\), the optimal value of the following problem is nonnegative

$$\begin{aligned} \min _w\left\{ \nabla f(x^*)^{\top }w+\langle d,\nabla ^2f(x^*)d\rangle :\hbox {DG}(x^*)w+D^2G(x^*)(d,d)\in T_K^2\big (G(x^*),\hbox {DG}(x^*)d\big ),\ w\in T_S^2(x^*,d)\right\} \end{aligned}$$

(28)

Theorem 5.1

(Second-order necessary optimality condition) Suppose \(x^*\ne 0\) is a locally optimal solution of (1) and the RRCQ holds at \(x^*\). Then for any \(d\in C(x^*)\) and any convex set \(\mathcal {T}(d)\subseteq T_K^2\big (G(x^*),\hbox {DG}(x^*)d \big ),\) the following inequality holds:

$$\begin{aligned} \sup _{\lambda \in \Lambda ^C(x^*)} \big \{\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle -\sigma \big (\lambda ,\mathcal {T}(d)\big )\big \}\ge 0. \end{aligned}$$
(29)

Proof

It follows from \(T_{S}^2(x^*,d)=T_S(x^*)\cap T_S(d)\) for any \(d\in T_S(x^*)\) that \(\mathbb {R}^n_{\Gamma ^*}\subseteq T_{S}^2(x^*,d)\). Let \(T(d):=cl\{\mathcal {T}(d)+T_K(G(x^*))\}\), the topological closure of the sum of two convex sets; hence T(d) is convex.

Since \(\mathcal {T}(d)\subseteq T_K^2\big (G(x^*),\hbox {DG}(x^*)d \big )\) and \(\hbox {DG}(x^*)d\in T_K(G(x^*))\), by [34, Proposition 3.34] and the closedness of \(T_K^2\big (G(x^*),\hbox {DG}(x^*)d \big )\), we have

$$\begin{aligned} T(d)\subset T_K^2\big (G(x^*),\hbox {DG}(x^*)d \big ). \end{aligned}$$

Clearly, if we replace \(T_S^2(x^*,d)\) and \(T_K^2\big (G(x^*),\hbox {DG}(x^*)d \big )\) in (28) by their respective subsets \(\mathbb {R}^n_{\Gamma ^*}\) and T(d), the optimal value of the resulting problem is greater than or equal to that of (28). Hence the optimal value of the problem

$$\begin{aligned} \min _w\left\{ \nabla f(x^*)^{\top }w\,{+}\,\langle d,\nabla ^2f(x^*)d\rangle :\hbox {DG}(x^*)w\,{+}\,D^2G(x^*)(d,d)\in T(d),w\in \mathbb {R}^n_{\Gamma ^*}\right\} \end{aligned}$$
(30)

is nonnegative as well. Problem (30) is linear, and its dual is

$$\begin{aligned} \max _{\lambda \in \Lambda ^C(x^*)} \big \{\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle -\sigma (\lambda ,T(d))\big \}. \end{aligned}$$
(31)

Indeed, the Lagrangian function of (30) is

$$\begin{aligned}&\mathcal {L}(w,\mu ,\lambda )\\&\quad = \nabla f(x^*)^{\top }w+\langle d,\nabla ^2f(x^*)d\rangle +\langle \lambda ,\hbox {DG}(x^*)w\rangle +\langle \lambda ,D^2G(x^*)(d,d)\rangle +\langle \mu ,w\rangle \\&\quad =\nabla f(x^*)^{\top }w+\langle d,\nabla ^2f(x^*)d\rangle +\langle \lambda ,\hbox {DG}(x^*)w\rangle +\langle d,[D^2G(x^*)d]^{\top }\lambda \rangle +\langle \mu ,w\rangle \\&\quad = \langle \nabla _xL(x^*,\lambda )+\mu , w\rangle +\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle . \end{aligned}$$

Thus, \(\inf _w \mathcal {L}(w, \mu , \lambda )=\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle \) if \(\nabla _xL(x^*,\lambda )+\mu =0\), and \(-\infty \) otherwise.

Moreover, \(\sigma ((\mu ,\lambda ),\mathbb {R}^n_{\Gamma ^*}\times T(d))=\sigma (\mu ,\mathbb {R}^n_{\Gamma ^*})+\sigma (\lambda ,T(d))=\infty \) for any \(\mu \notin \mathbb {R}^n_{\overline{\Gamma }^*}=N^C_S(x^*)\) or \(\lambda \notin (T_K(G(x^*)))^{\circ }=N_K(G(x^*))\). Therefore, the effective domain of the parametric dual of (30) is contained in \(\Lambda ^C(x^*)\), and the duality follows.
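In more detail, the dual step may be sketched via conjugate duality as follows (our reading of the argument, following [34]): the dual objective of (30) is

$$\begin{aligned} g(\mu ,\lambda )=\inf _w\mathcal {L}(w,\mu ,\lambda )-\sigma (\mu ,\mathbb {R}^n_{\Gamma ^*})-\sigma (\lambda ,T(d)), \end{aligned}$$

which is finite only if \(\nabla _xL(x^*,\lambda )+\mu =0\), \(\mu \in \mathbb {R}^n_{\overline{\Gamma }^*}\) and \(\lambda \in N_K(G(x^*))\), that is, only if \(\lambda \in \Lambda ^C(x^*)\); eliminating \(\mu \) then yields (31).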

For any \(z=(z_1,z_2)\in \mathbb {R}^n_{\Gamma ^*}\times T(d)=cl\{\mathbb {R}^n_{\Gamma ^*}\times \mathcal {T}(d)+\mathbb {R}^n_{{\Gamma }^*}\times T_K(G(x^*))\}\), we have \(z+\mathbb {R}^n_{{\Gamma }^*}\times T_K(G(x^*))\subset \mathbb {R}^n_{\Gamma ^*}\times T(d)\), and in particular \(z_2+ T_K(G(x^*))\subseteq T(d)\). Moreover, the RRCQ implies that \(\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T_K(G(x^*))=\mathbb {R}^m.\) Thus \(z_2+\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T(d)=\mathbb {R}^m, \) and hence \(\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T(d)=\mathbb {R}^m.\) Therefore, there exists \(w^*\in \mathbb {R}^n_{\Gamma ^*}\) such that \(\hbox {DG}(x^*)w^*+D^2G(x^*)(d,d)\in T(d)\); that is, problem (30) has a feasible solution \(w^*\). Moreover, from the RRCQ, we have

$$\begin{aligned} 0\in \hbox {int}\{\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T_K(G(x^*))\} \subseteq \hbox {int}\{\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T(d)\}, \end{aligned}$$

where the last inclusion is due to \(T_K(G(x^*))\subseteq T(d)\). From the feasibility of \(w^*\), it holds that \(\hbox {DG}(x^*)w^*+D^2G(x^*)(d,d)\in T(d)\). Then

$$\begin{aligned} \hbox {int}\{\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}-T(d)\}&=\hbox {int}\{\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}+\hbox {DG}(x^*)w^*+D^2G(x^*)(d,d)-T(d)\}\\&=\hbox {int}\{\hbox {DG}(x^*)\mathbb {R}^n_{\Gamma ^*}+D^2G(x^*)(d,d)-T(d)\}, \end{aligned}$$

where the last equality follows from the feasibility of \(w^*\) and \(\mathbb {R}^n_{\Gamma ^*}+w^*=\mathbb {R}^n_{\Gamma ^*}\). Therefore, the RCQ for problem (30) holds. Consequently, by [34, Theorem 2.165], there is no duality gap between (30) and its dual (31), and hence the optimal value of (31) is nonnegative. Since \(\mathcal {T}(d)\subset T(d)\), we have \(\sigma (\lambda ,\mathcal {T}(d))\le \sigma (\lambda ,T(d))\), and hence (29) follows. \(\square \)

Let \(\mathcal {A}_{K,M}(d):=\mathcal {A}_{K,M}(G(x^*),\hbox {DG}(x^*)d)\) with \(M:=\hbox {DG}(x^*)\) be the upper second-order approximation set for K at the point \(G(x^*)\) in the direction \(\hbox {DG}(x^*)d\in T_K(G(x^*))\) (see [34, Definition 3.82]). We have the following second-order sufficient condition for the CCCP.

Theorem 5.2

(Second-order sufficient optimality condition) Suppose that \(x^*\ne 0\) is a feasible solution with \(\Lambda ^B(x^*)\ne \emptyset \), and the RRCQ holds at \(x^*\). Then the second-order growth condition holds at \(x^*\) if

$$\begin{aligned} \sup _{\lambda \in \Lambda ^B(x^*)} \big \{\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle -\sigma (\lambda , \mathcal {A}_{K,M}(d))\big \}>0,~\forall d\in C(x^*)\setminus \{0\}. \end{aligned}$$
(32)

Proof

By invoking Proposition 3.1, the desired result can be obtained by mimicking the proof of [34, Theorem 3.83]. \(\square \)

It is worth mentioning that, when comparing the two second-order optimality conditions in Theorems 5.1 and 5.2, besides the change from the inequality in (29) to the strict inequality in (32), the multiplier set over which the supremum is taken becomes smaller, since \(\Lambda ^B(x^*)\subseteq \Lambda ^C(x^*)\). Consequently, (32) implies (29).
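For each fixed \(d\in C(x^*)\setminus \{0\}\), this implication can be recorded as a one-line chain (our sketch; it uses \(\Lambda ^B(x^*)\subseteq \Lambda ^C(x^*)\) together with the inclusion of \(\mathcal {T}(d)\subseteq T_K^2(G(x^*),\hbox {DG}(x^*)d)\) in the closure of \(\mathcal {A}_{K,M}(d)\), so that \(\sigma (\lambda ,\mathcal {T}(d))\le \sigma (\lambda ,\mathcal {A}_{K,M}(d))\)):

$$\begin{aligned} \sup _{\lambda \in \Lambda ^C(x^*)}\big \{\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle -\sigma (\lambda ,\mathcal {T}(d))\big \}\ge \sup _{\lambda \in \Lambda ^B(x^*)}\big \{\langle d,\nabla ^2_{xx}L(x^*,\lambda )d\rangle -\sigma (\lambda ,\mathcal {A}_{K,M}(d))\big \}>0, \end{aligned}$$

so (29) holds whenever (32) does.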

6 Conclusions

In this paper, we studied first- and second-order optimality conditions for the CCCP problem, which provides a unified framework for most existing cardinality-constrained optimization problems. By introducing the RRCQ and the RSRCQ, we decomposed the tangent and normal cones to the feasible region and thereby established first-order optimality conditions. Second-order optimality conditions were then derived by exploiting the outer second-order tangent set of the cardinality-constrained set. These results rely heavily on the tailored constraint qualifications and on the special structure of the cardinality-constrained set. An interesting topic for future research is whether optimality conditions can be obtained when the objective function and the closed convex cone in the general CCCP model enjoy special structures.