1 Introduction

Consider the cardinality-constrained optimization problem

$$\begin{aligned} \min _x \ f(x) \quad \text {s.t.} \quad x \in X, \Vert x \Vert _0 \le \kappa \end{aligned}$$
(1.1)

with a set \(X \subseteq {\mathbb {R}}^n\) described by some standard constraints

$$\begin{aligned} X := \left\{ x \in {\mathbb {R}}^n \mid g_i(x) \le 0 \ (i = 1, \ldots , m), \ h_i(x) = 0 \ (i = 1, \ldots , p) \right\} \end{aligned}$$
(1.2)

and \(\Vert x \Vert _0\) denoting the number of nonzero components of the vector x. Throughout the paper, we implicitly assume that \(f, g_i, h_i : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) are continuously differentiable and that the parameter \(\kappa \) is less than n (otherwise, the cardinality constraint imposes no restriction on the solution vector).
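For illustration purposes only (this is not part of the formal development), the quantity \(\Vert x \Vert _0\) and the cardinality constraint can be checked numerically; the tolerance `tol` and the function names are our own choices to account for floating-point arithmetic:

```python
import numpy as np

def l0_norm(x, tol=1e-12):
    """Number of (numerically) nonzero components of x, i.e. ||x||_0."""
    return int(np.sum(np.abs(np.asarray(x, dtype=float)) > tol))

def satisfies_cardinality(x, kappa, tol=1e-12):
    """Check the sparsity constraint ||x||_0 <= kappa from problem (1.1)."""
    return l0_norm(x, tol) <= kappa
```

For example, `l0_norm([1.0, 0.0, -2.0])` counts two nonzero entries, so this vector satisfies the constraint for every \(\kappa \ge 2\).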

Problem (1.1) has many important applications including portfolio optimization problems with constraints on the number of assets [7], the subset selection problem in regression [16], or the compressed sensing technique applied in signal processing [9]. It is, however, very difficult to solve due to the presence of the cardinality constraint \(\Vert x \Vert _0 \le \kappa \). It is usually viewed as a mixed-integer problem since it can be reformulated in such a way by introducing suitable binary variables, see [7]. Methods for the solution of cardinality-constrained optimization problems therefore typically apply or adapt techniques from discrete optimization and often consider only special instances of our problem (1.1), see [6, 7, 10, 17, 24, 26, 29] and references therein for a couple of examples. A notable exception is the paper [3] where the authors use ideas from continuous optimization in order to find suitable stationary points of the problem (1.1) with \(X = {\mathbb {R}}^{n}\). Since the particular structure of the additional constraints described by X plays a central role in the subsequent analysis, our results are completely different from those presented in [3].

Here we follow an observation which is very close to the one made in [11] in the context of the related class of sparse optimization problems. Shortly afterwards, this observation was also made (independently) by the authors of the accompanying paper [8] for cardinality-constrained problems: It is not difficult to see that \(x^*\) is a solution of (1.1) with X given by (1.2) if and only if there exists a suitable vector \(y^*\) such that \((x^*, y^*)\) is a solution of the mixed-integer program

$$\begin{aligned} \begin{array}{rcl} \displaystyle {\min _{x,y} \ f(x)} &{} \text {s.t.} &{} \displaystyle {g_i(x) \le 0 \quad \forall i = 1, \ldots , m,} \\ &{} &{} \displaystyle {h_i(x) = 0 \quad \forall i = 1, \ldots , p,} \\ &{} &{} \displaystyle {e^T y \ge n - \kappa ,} \\ &{} &{} \displaystyle {y_i \in \{0, 1\} \quad \forall i = 1, \ldots , n,} \\ &{} &{} \displaystyle {x_i y_i = 0 \quad \forall i = 1, \ldots , n,} \end{array} \end{aligned}$$

where \(e := (1, \ldots , 1)^T \in {\mathbb {R}}^n\) denotes the all-ones vector. Relaxing the binary constraints \(y_i \in \{0, 1\} \) in the standard way, we arrive at the following continuous optimization problem:

$$\begin{aligned} \begin{array}{rcl} \displaystyle {\min _{x,y} \ f(x)} &{} \text {s.t.} &{} \displaystyle {g_i(x) \le 0 \quad \forall i = 1, \ldots , m,} \\ &{} &{} \displaystyle {h_i(x) = 0 \quad \forall i = 1, \ldots , p,} \\ &{} &{} \displaystyle {e^T y \ge n - \kappa ,} \\ &{} &{} \displaystyle {0 \le y_i \le 1 \quad \forall i = 1, \ldots , n,} \\ &{} &{} \displaystyle {x_i y_i = 0 \quad \forall i = 1, \ldots , n,} \end{array} \end{aligned}$$
(1.3)

which is sometimes called the relaxed problem due to its derivation. As pointed out in the accompanying paper [8], there is still a very close relation between the two problems (1.1) and (1.3): \( x^*\) is a solution (global minimum) of (1.1) with X as in (1.2) if and only if there exists a vector \(y^*\) such that the pair \((x^*, y^*)\) solves (1.3). Furthermore, every local minimum of (1.1), where the feasible set is given by (1.2), yields a local minimum of (1.3), and the converse is also true under suitable conditions, see [8] for more details.
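The direction from (1.1) to (1.3) is constructive: given a point x satisfying the cardinality constraint, the choice \(y_i = 0\) where \(x_i \ne 0\) and \(y_i = 1\) where \(x_i = 0\) yields a feasible pair, since then \(e^T y = n - \Vert x \Vert _0 \ge n - \kappa \). A small sketch of this construction (function names and tolerance are ours, and only the y-related constraints of (1.3) are checked):

```python
import numpy as np

def companion_y(x, tol=1e-12):
    """Given x with ||x||_0 <= kappa, build y with x_i * y_i = 0 componentwise:
    y_i = 0 where x_i != 0 and y_i = 1 where x_i = 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > tol, 0.0, 1.0)

def feasible_for_relaxation(x, y, n, kappa, tol=1e-12):
    """Check the y-related constraints of the relaxed problem (1.3):
    e^T y >= n - kappa, 0 <= y <= 1, and x_i * y_i = 0 for all i."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return (np.sum(y) >= n - kappa - tol
            and np.all(y >= -tol) and np.all(y <= 1 + tol)
            and np.all(np.abs(x * y) <= tol))
```

For x = (3, 0, 0) and \(\kappa = 1\), the construction yields y = (0, 1, 1) with \(e^T y = 2 = n - \kappa \).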

Hence (1.3) provides a reformulation of the difficult cardinality-constrained optimization problem as a minimization problem in continuous variables. This reformulation has been used in [8] to obtain suitable methods for the solution of problem (1.1); these methods, of course, cannot be expected to find a global minimum, but they compute an appropriate stationary point in an efficient way, see also the extensive numerical results presented in [11] for sparse optimization problems. Here we want to exploit the relationship between the two problems (1.1) and (1.3) in order to obtain suitable optimality conditions for the original cardinality-constrained problem.

There is, however, still a difficulty since the reformulation (1.3) typically does not satisfy the standard constraint qualifications known for nonlinear programs; hence the usual KKT theory cannot be applied, at least not directly. In addition, the reformulation (1.3) resembles a mathematical program with complementarity constraints (MPCC), for which a rich theory has been developed in the meantime, cf. [14, 20] for some background material on MPCCs.

However, direct application of most MPCC-tailored constraint qualifications is not possible in general. Furthermore, and most interestingly, it turns out that a direct inspection of the problem (1.3) yields results that are much stronger than the corresponding ones known for general MPCCs. Observations of this kind have already been given in [8] and will also arise in this paper, cf. Sect. 5 for a more detailed discussion.

The paper is organized in the following way: We first recall some basic definitions and a result on polar cones in Sect. 2. We then derive suitable constraint qualifications tailored to the cardinality-constrained problem (1.1) in Sect. 3 and provide sufficient conditions for these constraint qualifications. The main results are given in Sect. 4 where we use the previously introduced constraint qualifications to prove that a strong (KKT-like) stationarity result holds under very mild conditions (much less restrictive than for the corresponding result known for MPCCs). A detailed comparison of our results for the cardinality-constrained problem with the corresponding results known for MPCCs is given in Sect. 5.

Notation: \(e_i\) denotes the i-th unit vector in \( {\mathbb {R}}^n\). We denote by \({\mathbb {R}}_+ := [0, \infty )\) and \({\mathbb {R}}_{-} := (-\infty , 0]\) the sets of nonnegative and nonpositive numbers, respectively. In this paper, we use the term positive linear independence in the following sense: we say that a finite set of vectors \(a_i \ (i \in I)\) and \(b_j \ (j \in J)\) is positively linearly independent if there are no nonzero multipliers \((\lambda , \mu )\) with \(\lambda _i \ge 0 \ (i \in I)\) such that

$$\begin{aligned} \sum _{i \in I} \lambda _i a_i + \sum _{j \in J} \mu _j b_j = 0; \end{aligned}$$

otherwise these vectors are called positively linearly dependent. Note that, from the context, it should usually be clear which coefficients are sign-constrained and which coefficients are not. Following standard terminology in the optimization community, we call an affine-linear function simply linear.
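Positive linear dependence can also be tested numerically: either the vectors \(b_j\) alone are linearly dependent (the case \(\lambda = 0\)), or, after the normalization \(\sum _{i \in I} \lambda _i = 1\), a linear feasibility problem has a solution. The following sketch is our own formulation (not part of the paper) and uses `scipy.optimize.linprog`; the matrices A and B carry the vectors \(a_i\) and \(b_j\) as columns:

```python
import numpy as np
from scipy.optimize import linprog

def positively_linearly_dependent(A, B):
    """A: columns a_i (sign-constrained multipliers), B: columns b_j (free).
    True iff some nonzero (lam, mu) with lam >= 0 satisfies A@lam + B@mu = 0."""
    n, m = A.shape
    p = B.shape[1]
    # Case 1: lam = 0 -- then we need a nonzero mu in the null space of B,
    # which exists iff the columns of B are linearly dependent.
    if p > 0 and np.linalg.matrix_rank(B) < p:
        return True
    # Case 2: some lam_i > 0 -- by scaling we may normalize sum(lam) = 1,
    # which turns the question into an LP feasibility problem.
    A_eq = np.vstack([np.hstack([A, B]),
                      np.hstack([np.ones(m), np.zeros(p)])])
    b_eq = np.append(np.zeros(n), 1.0)
    bounds = [(0, None)] * m + [(None, None)] * p
    res = linprog(np.zeros(m + p), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.status == 0  # status 0: feasible (optimal), status 2: infeasible
```

For instance, \(a_1 = e_1\) and \(a_2 = -e_1\) in \({\mathbb {R}}^2\) are positively linearly dependent (take \(\lambda = (1,1)\)), whereas \(a_1 = e_1, a_2 = e_2\) are positively linearly independent.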

2 Preliminaries

This section presents some preliminary material that will play an essential role in our subsequent analysis. To this end, we recall the definitions of several constraint qualifications for standard nonlinear programs and a result which simplifies the calculation of the cones that play a central role later on.

Let us begin by considering a nonlinear program in the standard form

$$\begin{aligned} \begin{array}{rcl} \displaystyle {\min _{x} \ f(x)} &{} \text {s.t.} &{} \displaystyle {g_i(x) \le 0 \quad \forall i = 1, \ldots , m,} \\ &{} &{} \displaystyle {h_i(x) = 0 \quad \forall i = 1, \ldots , p,} \end{array} \end{aligned}$$
(2.1)

Let X denote the feasible set of this program. Then the (Bouligand) tangent cone of X at a feasible point \(x^* \in X\) is defined by

$$\begin{aligned} {\mathscr {T}}_X(x^*) := \left\{ d \in {\mathbb {R}}^n \ \big | \ \exists \{ x^k\} \subseteq X, \exists \{t_k\} \downarrow 0: x^k \rightarrow x^* \text { and } \lim _{k \rightarrow \infty } \frac{x^k-x^*}{t_k} = d \right\} . \end{aligned}$$

On the other hand, the corresponding linearization cone of X at \(x^* \in X\) is given by

$$\begin{aligned} {\mathscr {L}}_X(x^*) := \left\{ d \in {\mathbb {R}}^n \mid \nabla g_i(x^*)^T d \le 0 \ (i: g_i(x^*) = 0), \ \nabla h_i(x^*)^T d = 0 \ (i = 1, \ldots , p) \right\} . \end{aligned}$$

As part of our subsequent analysis, we will derive suitable expressions for the tangent and linearization cones corresponding to the specially structured nonlinear program from (1.3).

We also recall that, given an arbitrary cone \({\mathscr {C}} \subseteq {\mathbb {R}}^n \), its polar cone is defined by

$$\begin{aligned} {\mathscr {C}}^\circ := \left\{ v \in {\mathbb {R}}^n \mid v^T d \le 0 \ \forall d \in {\mathscr {C}} \right\} . \end{aligned}$$

Note that the polar cone is sometimes also denoted by \({\mathscr {C}}^*\) in the literature, see, for example, [23].

In the special case of a polyhedral convex cone, there is a simple duality relation known from convex analysis, see, e.g., [4, Proposition B.16].

Lemma 2.1

Let the cones

$$\begin{aligned} {\mathscr {C}}_1 := \left\{ d \in {\mathbb {R}}^n \mid a_i^T d \le 0 \ (i = 1, \ldots , m), \ b_i^T d = 0 \ (i = 1, \ldots , p) \right\} \end{aligned}$$

and

$$\begin{aligned} {\mathscr {C}}_2 := \left\{ v \in {\mathbb {R}}^n \ \big | \ v = \sum _{i=1}^m \lambda _i a_i + \sum _{i=1}^p \mu _i b_i, \ \lambda _i \ge 0 \ (i = 1, \ldots , m) \right\} \end{aligned}$$

be given. Then \({\mathscr {C}}_1 = {\mathscr {C}}_2^\circ \) and \({\mathscr {C}}_2 = {\mathscr {C}}_1^\circ \).
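Lemma 2.1 can be sanity-checked on a toy instance by sampling: every element of \({\mathscr {C}}_2\) must have a nonpositive inner product with every element of \({\mathscr {C}}_1\). The following illustrative script (the instance and the sampling scheme are our own choices) verifies this polarity for \(a_1 = e_1, a_2 = e_2, b_1 = e_3\) in \({\mathbb {R}}^3\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: C1 = {d | d_1 <= 0, d_2 <= 0, d_3 = 0},
#           C2 = {lam_1 e_1 + lam_2 e_2 + mu e_3 | lam >= 0, mu free}.
a = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
b = [np.array([0.0, 0.0, 1.0])]

def sample_C1():
    """Random element of C1: nonpositive first two coordinates, third zero."""
    d = -rng.random(3)
    d[2] = 0.0
    return d

def sample_C2():
    """Random element of C2: nonnegative lam, unconstrained mu."""
    lam = rng.random(2)
    mu = rng.normal()
    return lam[0] * a[0] + lam[1] * a[1] + mu * b[0]

# C2 = C1° requires v^T d <= 0 for all v in C2 and d in C1.
polar_ok = all(float(sample_C2() @ sample_C1()) <= 1e-12 for _ in range(1000))
```

Sampling can, of course, only corroborate the inclusion-type statements of the lemma, not prove them.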

Based on the previously introduced notions, we are able to restate the definitions of several constraint qualifications known for standard nonlinear programs.

Definition 2.2

Let \(x^*\) be a feasible point of the nonlinear program (2.1). Then we say that \(x^*\) satisfies the

  1. (a)

    linear independence constraint qualification (LICQ) if the gradient vectors

    $$\begin{aligned} \nabla g_i(x^*) \ (i: g_i(x^*) = 0), \ \nabla h_i(x^*) \ (i = 1, \ldots , p) \end{aligned}$$

    are linearly independent;

  2. (b)

    Mangasarian–Fromovitz constraint qualification (MFCQ) if the gradient vectors \(\nabla h_i(x^*)\) \((i = 1, \ldots , p)\) are linearly independent and, in addition, there exists a vector \(d \in {\mathbb {R}}^n\) such that \(\nabla h_i(x^*)^T d = 0\) \((i = 1, \ldots , p)\) and \(\nabla g_i(x^*)^T d < 0\) \((i: g_i(x^*) = 0)\) hold;

  3. (c)

    constant rank constraint qualification (CRCQ) if for any subsets \(I_1 \subseteq \{i : g_i(x^*) = 0\}\) and \(I_2 \subseteq \{1, \ldots , p\}\) such that the gradient vectors

    $$\begin{aligned} \nabla g_i(x) \ (i \in I_1), \ \nabla h_i(x) \ (i \in I_2) \end{aligned}$$

    are linearly dependent in \(x = x^*\), they remain linearly dependent for all x in a neighborhood of \(x^*\);

  4. (d)

    constant positive linear dependence condition (CPLD) if for any subsets \(I_1 \subseteq \{i : g_i(x^*) = 0\}\) and \(I_2 \subseteq \{1, \ldots , p\}\) such that the gradient vectors

    $$\begin{aligned} \nabla g_i(x) \ (i \in I_1) \quad \text {and} \quad \nabla h_i(x) \ (i \in I_2) \end{aligned}$$

    are positively linearly dependent in \(x = x^*\), they are linearly dependent for all x in a neighborhood of \(x^*\);

  5. (e)

    Abadie constraint qualification (ACQ) if \({{\mathscr {T}}}_X (x^*) = {{\mathscr {L}}}_X(x^*)\) holds;

  6. (f)

    Guignard constraint qualification (GCQ) if \({{\mathscr {T}}}_X(x^*)^\circ = {{\mathscr {L}}}_X(x^*)^\circ \) holds.

Most of these constraint qualifications are well-known from the literature, see, e.g., [2, 18]. One exception might be the CPLD condition which was introduced in [22] and later shown to be a constraint qualification in [1]. The following implications hold between these conditions:

$$\begin{aligned} \begin{array}{ccccccc} \text {LICQ} &{} \Longrightarrow &{} \text {MFCQ} &{} &{} &{} &{} \\ \Downarrow &{} &{} \Downarrow &{} &{} &{} &{} \\ \text {CRCQ} &{} \Longrightarrow &{} \text {CPLD} &{} \Longrightarrow &{} \text {ACQ} &{} \Longrightarrow &{} \text {GCQ} \end{array} \end{aligned}$$

Almost all of these implications follow directly from the corresponding definitions. The only exception is that CPLD implies ACQ, but this observation can be derived from [1, 5]. It is clear from the above diagram that LICQ is the strongest and GCQ is the weakest constraint qualification among those considered here. In fact, it is possible to show that (in a certain sense), GCQ is the weakest possible constraint qualification which guarantees that a local minimum of the program (2.1) is also a stationary point, see [2] for more details.

3 Abadie- and Guignard-type constraint qualifications

Though being one of the weakest constraint qualifications for optimization problems, standard ACQ is usually violated at a feasible point of the cardinality-constrained optimization problem (1.1), see Example 3.6 for a counterexample. We therefore introduce a problem-tailored modification of the standard ACQ in this section which is satisfied under much weaker assumptions than the usual ACQ condition. In a similar way, we also develop a variant of the standard GCQ condition. Quite surprisingly, however, this modified GCQ condition turns out to be equivalent to the usual GCQ assumption. We call these modified ACQ- and GCQ-conditions CC-ACQ and CC-GCQ (CC = cardinality constraints).

3.1 Derivation of Abadie- and Guignard-type constraint qualifications

Let Z be the feasible set of the continuous optimization problem (1.3), and let the pair \((x^*, y^*) \in Z\) be any feasible point. We then introduce the following index sets:

$$\begin{aligned} I_g(x^*) &:= \{i \in \{1, \ldots , m\} \mid g_i(x^*) = 0\}, \\ I_0(x^*) &:= \{i \in \{1, \ldots , n\} \mid x_i^* = 0\}, \\ I_{\pm 0} (x^*, y^*) &:= \{i \in \{1, \ldots , n\} \mid x_i^* \ne 0, \ y_i^* = 0\}, \\ I_{00} (x^*,y^*) &:= \{i \in \{1, \ldots , n\} \mid x_i^* = 0, \ y_i^* = 0\}, \\ I_{0+} (x^*,y^*) &:= \{i \in \{1, \ldots , n\} \mid x_i^* = 0, \ y_i^* \in (0,1)\}, \\ I_{01} (x^*, y^*) &:= \{i \in \{1, \ldots , n\} \mid x_i^* = 0, \ y_i^* = 1\}. \end{aligned}$$

Note that the subscripts are used to indicate the signs of the variables \(x_i^*\) and \(y_i^*\). The definitions of these index sets immediately show that we have the partitions

$$\begin{aligned} \{1, \ldots , n\} = I_0(x^*) \cup I_{\pm 0} (x^*, y^*) \end{aligned}$$

and

$$\begin{aligned} I_0(x^*) = I_{00} (x^*,y^*) \cup I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*). \end{aligned}$$

We further note that the index set \(I_{\pm 0} (x^*, y^*)\) could alternatively be called \( I_{\pm } (x^*)\) only, since \( x_i^* \ne 0\) together with the assumed feasibility of \((x^*,y^*)\) immediately yields \(y_i^* = 0\). However, we prefer to use double indices also for this index set since this makes it easier to remember the sign distribution of the two components \(x_i^*\) and \(y_i^*\).
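In computations, these index sets are straightforward to extract from a feasible pair. A possible sketch (our own helper, using 0-based indices and a tolerance `tol`, with `g_vals` holding the values \(g_i(x^*)\)):

```python
import numpy as np

def index_sets(x, y, g_vals, tol=1e-12):
    """Active and sign index sets from Sect. 3.1 for a feasible pair (x, y)
    of the relaxed problem (1.3). Indices are 0-based."""
    x, y, g_vals = (np.asarray(v, dtype=float) for v in (x, y, g_vals))
    I_g   = {i for i, gi in enumerate(g_vals) if abs(gi) <= tol}
    I_0   = {i for i, xi in enumerate(x) if abs(xi) <= tol}
    I_pm0 = {i for i in range(len(x)) if abs(x[i]) > tol and abs(y[i]) <= tol}
    I_00  = {i for i in I_0 if abs(y[i]) <= tol}
    I_0p  = {i for i in I_0 if tol < y[i] < 1 - tol}
    I_01  = {i for i in I_0 if abs(y[i] - 1) <= tol}
    return I_g, I_0, I_pm0, I_00, I_0p, I_01
```

For the point \(x^* = (0,0)^T\), \(y^* = (0,1)^T\) used later in Example 3.6, this yields \(I_{00} = \{1\}\) and \(I_{01} = \{2\}\) (in the paper's 1-based numbering).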

Using these index sets, the linearization cone of Z at \((x^*, y^*)\) is given by

$$\begin{aligned} {\mathscr {L}}_Z(x^*,y^*) = \big \{ (d_x, d_y) \in {\mathbb {R}}^n \times {\mathbb {R}}^n \mid \ {} & \nabla g_i(x^*)^T d_x \le 0 \ (i \in I_g(x^*)), \\ & \nabla h_i(x^*)^T d_x = 0 \ (i = 1, \ldots , p), \\ & e^T d_y \ge 0 \text { if } e^T y^* = n - \kappa , \\ & (d_y)_i \ge 0 \ (i: y_i^* = 0), \ (d_y)_i \le 0 \ (i: y_i^* = 1), \\ & y_i^* (d_x)_i + x_i^* (d_y)_i = 0 \ (i = 1, \ldots , n) \big \}. \end{aligned}$$

An alternative representation of the linearization cone, which is better suited for our purposes, is given in the following result; its proof is rather obvious and therefore omitted.

Lemma 3.1

Let \((x^*, y^*) \in Z\) be a feasible point of the program (1.3). Then the linearization cone of Z at \((x^*, y^*)\) is given by

$$\begin{aligned} {\mathscr {L}}_Z(x^*,y^*) = \big \{ (d_x, d_y) \mid \ {} & \nabla g_i(x^*)^T d_x \le 0 \ (i \in I_g(x^*)), \\ & \nabla h_i(x^*)^T d_x = 0 \ (i = 1, \ldots , p), \\ & e^T d_y \ge 0 \text { if } e^T y^* = n - \kappa , \\ & (d_x)_i = 0 \ (i \in I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*)), \\ & (d_y)_i = 0 \ (i \in I_{\pm 0} (x^*,y^*)), \\ & (d_y)_i \le 0 \ (i \in I_{01} (x^*,y^*)), \ (d_y)_i \ge 0 \ (i \in I_{00} (x^*,y^*)) \big \}. \end{aligned}$$

In order to derive suitable constraint qualifications which take into account the particular structure of the nonlinear program (1.3), we now introduce the CC-linearization cone (CC = cardinality constraints) by

$$\begin{aligned} {\mathscr {L}}_Z^{CC} (x^*,y^*) := \big \{ (d_x, d_y) \mid \ {} & \nabla g_i(x^*)^T d_x \le 0 \ (i \in I_g(x^*)), \\ & \nabla h_i(x^*)^T d_x = 0 \ (i = 1, \ldots , p), \\ & e^T d_y \ge 0 \text { if } e^T y^* = n - \kappa , \\ & (d_x)_i = 0 \ (i \in I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*)), \\ & (d_y)_i = 0 \ (i \in I_{\pm 0} (x^*,y^*)), \\ & (d_y)_i \le 0 \ (i \in I_{01} (x^*,y^*)), \ (d_y)_i \ge 0 \ (i \in I_{00} (x^*,y^*)), \\ & (d_x)_i (d_y)_i = 0 \ (i \in I_{00} (x^*,y^*)) \big \}. \end{aligned}$$

Comparing this definition with the representation of the standard linearization cone from Lemma 3.1, it turns out that the only difference is that we included the last line into the CC-linearization cone. In particular, we therefore have

$$\begin{aligned} {\mathscr {L}}_Z^{CC} (x^*, y^*) \subseteq {\mathscr {L}}_Z (x^*, y^*). \end{aligned}$$
(3.1)

While the linearization cone is, by definition, a polyhedral convex cone, simple examples such as Example 3.6 show that both the tangent cone \({\mathscr {T}}_Z(x^*, y^*)\) and the CC-linearization cone \({\mathscr {L}}_Z^{CC} (x^*,y^*)\) are, in general, nonconvex; typically, they are the union of finitely many polyhedral convex cones.

This observation can be made precise by introducing certain subsets of the feasible set Z. Recall that \((x^*, y^*) \in Z\) still denotes a given (fixed) feasible point of the program (1.3). For an arbitrary subset \(I \subseteq I_{00} (x^*,y^*)\), we then define the restricted feasible sets

$$\begin{aligned} Z_I := \big \{ (x,y) \mid \ {} & g_i(x) \le 0 \quad \forall i = 1, \ldots , m, \\ & h_i(x) = 0 \quad \forall i = 1, \ldots , p, \\ & e^T y \ge n - \kappa , \\ & x_i = 0, \ y_i \in [0,1] \quad \forall i \in I_{0+} (x^*, y^*) \cup I_{01} (x^*, y^*) \cup I, \\ & y_i = 0 \quad \forall i \in I_{\pm 0} (x^*, y^*) \cup \left( I_{00} (x^*, y^*) {\setminus } I \right) \big \}, \end{aligned}$$

i.e., we split the bi-active index set \(I_{00} (x^*,y^*)\) into the two sets I and \(I_{00} (x^*,y^*){\setminus }I\) and require that \(x_i = 0\) on the first set and \(y_i = 0\) on the second set. In particular, we therefore have \(Z_I \subseteq Z\) for all \(I \subseteq I_{00} (x^*,y^*)\). Furthermore, we have the following result showing that the tangent cone is indeed the union of finitely many polyhedral convex cones.
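The splitting of the bi-active set into I and \(I_{00} (x^*,y^*) {\setminus } I\) produces \(2^{|I_{00}|}\) branches, which can be enumerated directly; a small sketch (our own helpers, 0-based indices):

```python
from itertools import chain, combinations

def subsets(I00):
    """All 2^|I_00| subsets I of the bi-active set I_00 (cf. Proposition 3.2)."""
    I00 = sorted(I00)
    return [set(c) for c in chain.from_iterable(
        combinations(I00, r) for r in range(len(I00) + 1))]

def branch_fixings(I, I00):
    """Variables fixed to zero in the restricted set Z_I:
    x_i = 0 on I, and y_i = 0 on I_00 \\ I."""
    return {"x_zero": set(I), "y_zero": set(I00) - set(I)}
```

For \(I_{00} = \{1, 2\}\) (in 0-based indexing \(\{0, 1\}\)) this gives the four branches \(I = \emptyset , \{1\}, \{2\}, \{1,2\}\).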

Proposition 3.2

Let \((x^*,y^*) \in Z\) be feasible for the program (1.3). Then the tangent cone and its polar satisfy the following equations:

  1. (a)

    \({\mathscr {T}}_Z (x^*, y^*) = \bigcup _{I \subseteq I_{00} (x^*,y^*)} {\mathscr {T}}_{Z_I} (x^*, y^*)\).

  2. (b)

    \({\mathscr {T}}_Z (x^*, y^*)^\circ = \bigcap _{I \subseteq I_{00} (x^*,y^*)} {\mathscr {T}}_{Z_I} (x^*, y^*)^\circ \).

Proof

Statement (a) was given in [8]. Formally, that paper assumes that \(g_i\) and \(h_i\) are linear. However, a simple inspection of that proof shows that possibly nonlinear functions \(g_i\) and \(h_i\) do not change anything. Furthermore, part (b) then follows from (a) and [2, Theorem 3.1.9]. \(\square \)

A similar representation can be shown for the CC-linearization cone.

Proposition 3.3

Let \((x^*,y^*) \in Z\) be feasible for (1.3). Then the CC-linearization cone and its polar satisfy the following equations:

  1. (a)

    \({\mathscr {L}}_Z^{CC} (x^*, y^*) = \bigcup _{I \subseteq I_{00} (x^*,y^*)} {\mathscr {L}}_{Z_I} (x^*, y^*)\).

  2. (b)

    \({\mathscr {L}}_Z^{CC} (x^*, y^*)^\circ = \bigcap _{I \subseteq I_{00} (x^*,y^*)} {\mathscr {L}}_{Z_I} (x^*, y^*)^\circ \).

Statement (b), once again, follows from part (a) by applying [2, Theorem 3.1.9] to the nonempty cones \({\mathscr {L}}_{Z_I} (x^*, y^*)\). On the other hand, the first statement can be obtained relatively easily by exploiting the structure of the linearization cone \({\mathscr {L}}_{Z_I}\) for an arbitrary index set \(I \subseteq I_{00} (x^*,y^*)\) given by

$$\begin{aligned} {\mathscr {L}}_{Z_I}(x^*,y^*) = \big \{ (d_x, d_y) \mid \ {} & \nabla g_i(x^*)^T d_x \le 0 \ (i \in I_g(x^*)), \\ & \nabla h_i(x^*)^T d_x = 0 \ (i = 1, \ldots , p), \\ & e^T d_y \ge 0 \text { if } e^T y^* = n - \kappa , \\ & (d_x)_i = 0 \ (i \in I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*) \cup I), \\ & (d_y)_i = 0 \ (i \in I_{\pm 0} (x^*,y^*) \cup (I_{00} (x^*,y^*) {\setminus } I)), \\ & (d_y)_i \le 0 \ (i \in I_{01} (x^*,y^*)), \ (d_y)_i \ge 0 \ (i \in I) \big \}. \end{aligned}$$
(3.2)

We therefore skip the details of the proof.

Since the tangent cone is always a subset of the corresponding linearization cone, Propositions 3.2 and 3.3 immediately yield the inclusions

$$\begin{aligned} {\mathscr {T}}_Z(x^*,y^*) = \bigcup _{I \subseteq I_{00}(x^*,y^*)} {\mathscr {T}}_{Z_I}(x^*,y^*) \subseteq \bigcup _{I \subseteq I_{00}(x^*,y^*)} {\mathscr {L}}_{Z_I}(x^*,y^*) = {\mathscr {L}}_Z^{CC} (x^*,y^*). \end{aligned}$$
(3.3)

Together with (3.1), we therefore get the following result.

Proposition 3.4

The inclusions

$$\begin{aligned} {\mathscr {T}}_Z(x^*,y^*) \subseteq {\mathscr {L}}_Z^{CC} (x^*,y^*) \subseteq {\mathscr {L}}_Z (x^*,y^*) \end{aligned}$$

hold for any feasible point \((x^*,y^*) \in Z\).

Recall that the standard ACQ requires \({\mathscr {T}}_Z (x^*,y^*) = {\mathscr {L}}_Z (x^*,y^*)\). However, taking into account that the tangent cone is usually nonconvex as a finite union of polyhedral convex cones, whereas the linearization cone is polyhedral convex by its definition, it follows that the ACQ assumption typically does not hold in our context. On the other hand, the CC-linearization cone is also nonconvex, and Proposition 3.4 indeed motivates the following modifications of standard ACQ and standard GCQ.

Definition 3.5

Let \((x^*,y^*) \in Z\) be feasible for the program (1.3). Then we say that

  1. (a)

    CC-ACQ holds at \((x^*,y^*)\) if \({\mathscr {T}}_Z (x^*,y^*) = {\mathscr {L}}_Z^{CC} (x^*,y^*)\) holds.

  2. (b)

    CC-GCQ holds at \((x^*, y^*)\) if \({\mathscr {T}}_Z(x^*,y^*)^\circ = {\mathscr {L}}_Z^{CC} (x^*,y^*)^\circ \) holds.

Note that CC-ACQ implies CC-GCQ, whereas the converse is not true in general. Moreover, standard ACQ also implies CC-ACQ, whereas the following example shows that CC-ACQ might hold also in situations where standard ACQ is violated.

Example 3.6

Consider the simplest possible cardinality-constrained problem

$$\begin{aligned} \min f(x) \quad \text {s.t.}\quad \Vert x\Vert _0 \le 1 \end{aligned}$$

with \(x \in {\mathbb {R}}^2\) and the corresponding relaxed problem

$$\begin{aligned} \begin{array}{rcl} \displaystyle {\min _{x,y} \ f(x)} &{} \text {s.t.} &{} \displaystyle {e^T y \ge 1,} \\ &{} &{} \displaystyle {0 \le y_i \le 1 \quad \forall i = 1, 2,} \\ &{} &{} \displaystyle {x_i y_i = 0 \quad \forall i = 1, 2.} \end{array} \end{aligned}$$

If we choose the feasible point \((x^*,y^*)\) with \(x^* = (0,0)^T\) and \(y^* = (0,1)^T\) and denote the feasible set of the relaxed problem by Z, we obtain

$$\begin{aligned} {\mathscr {T}}_Z(x^*,y^*) &= \left\{ (d_x,d_y) \mid (d_x)_2 = 0, \ d_y = 0\right\} \\ &\qquad \cup \left\{ (d_x,d_y) \mid d_x = 0, \ (d_y)_1 \ge 0, \ (d_y)_2 \le 0, \ (d_y)_1+(d_y)_2 \ge 0\right\} , \\ {\mathscr {L}}_Z(x^*,y^*) &= \left\{ (d_x,d_y) \mid (d_x)_2 = 0, \ (d_y)_1 \ge 0, \ (d_y)_2 \le 0, \ (d_y)_1+(d_y)_2 \ge 0\right\} , \end{aligned}$$

where the tangent cone \({\mathscr {T}}_Z(x^*,y^*)\) and the linearization cone \({\mathscr {L}}_Z(x^*,y^*)\) can be calculated, e.g., using Proposition 3.2 and Lemma 3.1, respectively. Here, \({\mathscr {T}}_Z(x^*,y^*)\) is nonconvex, more precisely, it is the union of two polyhedral convex cones, and consequently \({\mathscr {T}}_Z(x^*,y^*) \subsetneq {\mathscr {L}}_Z(x^*,y^*)\). Hence standard ACQ is violated in this example. On the other hand, a simple computation shows that the CC-linearization cone \({\mathscr {L}}_Z^{CC} (x^*,y^*)\) and the tangent cone \({\mathscr {T}}_Z (x^*,y^*)\) coincide, hence CC-ACQ holds.
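The gap between the two cones can be exhibited by an explicit direction. The following sketch (the membership tests and the witness direction are our own) encodes the two branches of \({\mathscr {T}}_Z(x^*,y^*)\) and the cone \({\mathscr {L}}_Z(x^*,y^*)\) from above, and checks a direction lying in the latter but not in the former:

```python
def in_tangent_cone(dx, dy, tol=1e-12):
    """Membership in T_Z(x*,y*) of Example 3.6: union of the two branches."""
    branch1 = abs(dx[1]) <= tol and abs(dy[0]) <= tol and abs(dy[1]) <= tol
    branch2 = (abs(dx[0]) <= tol and abs(dx[1]) <= tol
               and dy[0] >= -tol and dy[1] <= tol and dy[0] + dy[1] >= -tol)
    return branch1 or branch2

def in_linearization_cone(dx, dy, tol=1e-12):
    """Membership in the (convex) linearization cone L_Z(x*,y*) of Example 3.6."""
    return (abs(dx[1]) <= tol and dy[0] >= -tol and dy[1] <= tol
            and dy[0] + dy[1] >= -tol)

# The direction d = (d_x, d_y) with d_x = (1, 0), d_y = (1, -1) lies in
# L_Z(x*,y*) but in neither branch of T_Z(x*,y*): standard ACQ fails here.
witness = ((1.0, 0.0), (1.0, -1.0))
```

Note that the witness mixes a nonzero x-part with a nonzero y-part, which is exactly what the union of the two branches forbids.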

Example 3.6 shows that CC-ACQ is indeed weaker than standard ACQ. Since Proposition 3.4 implies that

$$\begin{aligned} {\mathscr {L}}_Z (x^*,y^*)^\circ \subseteq {\mathscr {L}}_Z^{CC} (x^*,y^*)^\circ \subseteq {\mathscr {T}}_Z (x^*,y^*)^\circ , \end{aligned}$$
(3.4)

it follows that standard GCQ implies CC-GCQ, and it is rather tempting to believe that CC-GCQ is also strictly weaker than standard GCQ. Surprisingly, however, it turns out that CC-GCQ and standard GCQ coincide. This is a consequence of the following result.

Theorem 3.7

Let \((x^*,y^*) \in Z\) be feasible for (1.3). Then \({\mathscr {L}}_Z(x^*,y^*)^\circ = {\mathscr {L}}_Z^{CC}(x^*,y^*)^\circ \).

Proof

In view of (3.4), we only have to show that \({\mathscr {L}}_Z^{CC} (x^*,y^*)^\circ \subseteq {\mathscr {L}}_Z (x^*,y^*)^\circ \). To verify this inclusion, let us first calculate the corresponding polar cones. Using the representation of \({\mathscr {L}}_Z(x^*,y^*)\) given in Lemma 3.1 and applying Lemma 2.1, we immediately obtain

$$\begin{aligned} {\mathscr {L}}_Z(x^*,y^*)^\circ = \big \{ w = (w_x,w_y) \ \big | \ {} & w_x = \sum _{i \in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i \in I_{01} \cup I_{0+}} \gamma _i e_i, \\ & w_y = -\delta e + \sum _{i \in I_{\pm 0} \cup I_{00} \cup I_{01}} \nu _i e_i, \\ & \lambda _i \ge 0 \ \text {for all } i \in I_g(x^*), \\ & \delta \ge 0 \text { if } e^T y^* = n-\kappa \text { and } \delta = 0 \text { otherwise}, \\ & \nu _i \ge 0 \ \text {for all } i \in I_{01}(x^*,y^*), \\ & \nu _i \le 0 \ \text {for all } i \in I_{00}(x^*,y^*) \big \}. \end{aligned}$$

In order to calculate the polar cone of \({\mathscr {L}}_Z^{CC}(x^*,y^*)\), we use the formula from Proposition 3.3. Using the representation of the linearization cones \({\mathscr {L}}_{Z_I}(x^*,y^*)\) stated in (3.2) and applying once again Lemma 2.1, we obtain for all \(I \subseteq I_{00}(x^*,y^*)\)

$$\begin{aligned} {\mathscr {L}}_{Z_I}(x^*,y^*)^\circ = \big \{ w = (w_x,w_y) \ \big | \ {} & w_x = \sum _{i \in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i \in I_{01} \cup I_{0+} \cup I} \gamma _i e_i, \\ & w_y = -\delta e + \sum _{i \in I_{\pm 0} \cup I_{00} \cup I_{01}} \nu _i e_i, \\ & \lambda _i \ge 0 \ \text {for all } i \in I_g(x^*), \\ & \delta \ge 0 \text { if } e^T y^* = n-\kappa \text { and } \delta = 0 \text { otherwise}, \\ & \nu _i \ge 0 \ \text {for all } i \in I_{01}(x^*,y^*), \\ & \nu _i \le 0 \ \text {for all } i \in I \big \}. \end{aligned}$$

Now, take an arbitrary element \(w = (w_x, w_y) \in {\mathscr {L}}_Z^{CC} (x^*, y^*)^\circ \). Then \(w \in {\mathscr {L}}_{Z_I} (x^*,y^*)^\circ \) for all \(I \subseteq I_{00} (x^*,y^*)\). In particular, taking \(I = \emptyset \), we obtain suitable scalars \(\hat{\lambda }_i\), \(\hat{\mu }_i\), \(\hat{\delta }\), \(\hat{\gamma }_i\), \(\hat{\nu }_i\) such that the above representation holds with \(I = \emptyset \) so that, in particular, we have

$$\begin{aligned} \hat{\gamma }_i = 0 \quad \forall i \in I_{00} (x^*,y^*). \end{aligned}$$

On the other hand, taking \(I = I_{00} (x^*,y^*)\), we get possibly different coefficients \(\tilde{\lambda }_i\), \(\tilde{\mu }_i\), \(\tilde{\delta }\), \(\tilde{\gamma }_i\), \(\tilde{\nu }_i\) such that the above representation holds with \(I = I_{00} (x^*,y^*)\) which, in particular, yields

$$\begin{aligned} \tilde{\nu }_i \le 0 \quad \forall i \in I_{00} (x^*,y^*). \end{aligned}$$

Since none of the constraints occurring in \(Z_I\) depends on both x and y, the partial derivatives with respect to x and y are completely independent of each other. Therefore, defining \(\lambda _i, \mu _i, \delta , \gamma _i, \nu _i\) by

$$\begin{aligned} \lambda _i &:= \hat{\lambda }_i \quad \forall i \in I_g(x^*), \\ \mu _i &:= \hat{\mu }_i \quad \forall i = 1, \ldots , p, \\ \gamma _i &:= \hat{\gamma }_i \quad \forall i \in I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*) \cup I_{00} (x^*,y^*), \end{aligned}$$

i.e., using the scalars of the first representation for the derivatives with respect to the x-vector, and

$$\begin{aligned} \delta &:= \tilde{\delta }, \\ \nu _i &:= \tilde{\nu }_i \quad \forall i \in I_{00} (x^*,y^*) \cup I_{01} (x^*,y^*) \cup I_{\pm 0} (x^*,y^*), \end{aligned}$$

i.e., using the coefficients of the second representation for the derivatives with respect to the y-vector, we altogether obtain

$$\begin{aligned} w_x &= \sum _{i \in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i \in I_{0+} \cup I_{01} \cup I_{00}} \gamma _i e_i, \\ w_y &= - \delta e + \sum _{i \in I_{00} \cup I_{01} \cup I_{\pm 0}} \nu _i e_i \end{aligned}$$

with \(\lambda _i \ge 0\) for all \(i \in I_g(x^*)\), \(\delta \ge 0\) and \(\delta = 0\) if we have \(e^T y^* > n - \kappa \), \(\gamma _i = 0\) for all \(i \in I_{00} (x^*,y^*)\), \(\nu _i \le 0\) for all \(i \in I_{00}(x^*,y^*)\), and \(\nu _i \ge 0\) for all \(i \in I_{01} (x^*,y^*)\). This shows that \(w = (w_x, w_y) \in {\mathscr {L}}_Z (x^*,y^*)^\circ \), and therefore completes the proof. \(\square \)

Note that the previous proof exploits the structure of the polar cones \({\mathscr {L}}_{Z_I} (x^*,y^*)^\circ \) only for the two extreme cases \(I = \emptyset \) and \(I = I_{00} (x^*, y^*)\). Motivated by the terminology coined in [13], the property \({\mathscr {L}}_Z(x^*,y^*)^\circ = {\mathscr {L}}_Z^{CC}(x^*,y^*)^\circ \) might be called the CC-intersection property.

Using Theorem 3.7 together with the definition of CC-GCQ, we immediately get the following consequence.

Corollary 3.8

Let \((x^*,y^*) \in Z\) be feasible for (1.3). Then CC-GCQ holds at \((x^*,y^*)\) if and only if GCQ holds there.

3.2 Sufficient conditions for CC-ACQ

This section presents some conditions which imply that CC-ACQ (hence also CC-GCQ and thus standard GCQ) holds. A first and very simple result is contained in our next lemma.

Lemma 3.9

Let \((x^*,y^*) \in Z\) be feasible for the program (1.3), and assume that each of the restricted feasible sets \(Z_I\) with \(I \subseteq I_{00}(x^*,y^*)\) satisfies the standard ACQ. Then CC-ACQ (hence also GCQ) holds at \((x^*,y^*)\).

Proof

The statement follows immediately from (3.3), since the only inclusion in that formula is now an equality due to the assumed ACQ condition. \(\square \)

Since linear constraints automatically satisfy the standard ACQ condition, it follows that CC-ACQ also holds in the case of linear functions \(g_i, h_i\).

Corollary 3.10

Let \((x^*,y^*) \in Z\) be feasible for the program (1.3), and assume that the functions \(g_i\) and \(h_i\) are all linear. Then CC-ACQ (hence also GCQ) holds at \((x^*,y^*)\).

Our next aim is to show that the assertion of Corollary 3.10 may also hold for possibly nonlinear functions \(g_i\) and \(h_i\) under suitable assumptions. To this end, we first recall a number of other tailored constraint qualifications that were introduced in [8] for cardinality-constrained optimization problems. To motivate these definitions, let \((x^*,y^*)\) be a feasible point of the relaxed program (1.3), and define the corresponding tightened nonlinear program TNLP(\(x^*\)):

$$\begin{aligned} \min _x f(x) \quad \text {s.t.}\quad g(x) \le 0,\ h(x) = 0,\ x_i = 0 \ (i \in I_0(x^*)). \end{aligned}$$
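For illustration, feasibility for TNLP(\(x^*\)) can be checked pointwise; the sketch below (our own helper, not part of the formal development) takes callables g and h returning the vectors of constraint values and the zero index set \(I_0(x^*)\) in 0-based indexing:

```python
import numpy as np

def tnlp_feasible(x, g, h, I0, tol=1e-8):
    """Feasibility for TNLP(x*): g(x) <= 0, h(x) = 0, and x_i = 0 for i in I_0(x*)."""
    x = np.asarray(x, dtype=float)
    return (np.all(g(x) <= tol)
            and np.all(np.abs(h(x)) <= tol)
            and all(abs(x[i]) <= tol for i in I0))
```

For instance, with the single constraint \(g_1(x) = x_1 - 1\), no equality constraints, and \(I_0(x^*) = \{2\}\), the point \(x = (0.5, 0)^T\) is feasible for TNLP(\(x^*\)) while \(x = (0.5, 0.3)^T\) is not.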

We then say that \((x^*,y^*)\) satisfies a constraint qualification for the relaxed problem (1.3) if \(x^*\) satisfies the corresponding standard constraint qualification for TNLP (\(x^*\)). In this way, we obtain the following definitions.

Definition 3.11

Let \((x^*,y^*)\) be feasible for the relaxed problem (1.3). Then \((x^*,y^*)\) satisfies

  1. (a)

    CC-LICQ if the gradients

    $$\begin{aligned} \nabla g_i(x^*) \ (i \in I_g(x^*)), \ \nabla h_i(x^*) \ (i=1, \ldots , p), \ e_i \ (i \in I_0(x^*)) \end{aligned}$$

    are linearly independent;

  2. (b)

    CC-MFCQ if the gradients

    $$\begin{aligned} \nabla g_i(x^*) \ (i \in I_g(x^*)) \quad \text {and} \quad \nabla h_i(x^*) \ (i=1, \ldots , p), \ e_i \ (i \in I_0(x^*)) \end{aligned}$$

    are positively linearly independent;

  3. (c)

    CC-CRCQ if for any subsets \(I_1 \subseteq I_g(x^*)\), \(I_2 \subseteq \{1, \ldots , p\}\), and \(I_3 \subseteq I_0(x^*)\) such that the gradients

    $$\begin{aligned} \nabla g_i(x) \ (i \in I_1), \ \nabla h_i(x) \ (i \in I_2), \ e_i \ (i \in I_3) \end{aligned}$$

    are linearly dependent in \(x=x^*\), they remain linearly dependent in a neighborhood of \(x^*\);

  4. (d)

    CC-CPLD if for any subsets \(I_1 \subseteq I_g(x^*)\), \(I_2 \subseteq \{1, \ldots , p\}\), and \(I_3 \subseteq I_0(x^*)\) such that the gradients

    $$\begin{aligned} \nabla g_i(x) \ (i \in I_1) \quad \text {and} \quad \nabla h_i(x) \ (i \in I_2), \ e_i \ (i \in I_3) \end{aligned}$$

    are positively linearly dependent in \(x=x^*\), they are linearly dependent in a neighborhood of \(x^*\).

Note that all these constraint qualifications depend on the vector \(x^*\) only, and not on the vector pair \((x^*, y^*)\). Hence these conditions may be viewed as constraint qualifications for the original cardinality constrained optimization problem (1.1).

We claim that the following implications among all these constraint qualifications hold:

$$\begin{aligned} \begin{array}{ccccccc} \text {CC-LICQ} &{} \Longrightarrow &{} \text {CC-MFCQ} &{} &{} &{} &{} \\ \Downarrow &{} &{} \Downarrow &{} &{} &{} &{} \\ \text {CC-CRCQ} &{} \Longrightarrow &{} \text {CC-CPLD} &{} \Longrightarrow &{} \text {CC-ACQ} &{} \Longrightarrow &{} \text {CC-GCQ} \end{array} \end{aligned}$$

Note that these implications are the direct counterparts of those known for the corresponding standard constraint qualifications, cf. Sect. 2. Most of these implications (therefore) follow directly from the corresponding definitions. The only nontrivial part is that CC-CPLD implies CC-ACQ. To verify this statement, we begin with a preliminary result.

Lemma 3.12

Let \((x^*,y^*) \in Z\) be feasible for the relaxed program (1.3), and suppose that CC-CPLD holds at \((x^*,y^*)\). Then, for any \(I \subseteq I_{00} (x^*,y^*)\), standard CPLD is satisfied for the restricted feasible set \(Z_I\).

Proof

Consider a fixed index set \(I \subseteq I_{00} (x^*,y^*)\). The corresponding feasible set \(Z_I\) then has constraints that either depend on x or on y, but never on both. Consequently, it suffices to verify CPLD for the constraints depending on x and on y separately. Since all constraints depending on y are linear, they satisfy CRCQ and thus also CPLD. It therefore suffices to show that the standard CPLD condition holds for those constraints that depend on x only. We can restrict ourselves to the gradient vectors that arise by taking the partial derivatives with respect to the x-variables only, since the partial derivatives with respect to the y-variables are in this case all zero. More precisely, this means that we have to show that for all subsets \(I_1 \subseteq I_g(x^*), I_2 \subseteq \{1, \ldots , p\}\), and \(I_3 \subseteq I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*) \cup I\) such that the gradient vectors

$$\begin{aligned} \nabla g_i(x^*) \ (i \in I_1), \quad \nabla h_i(x^*) \ (i \in I_2), \quad \text {and} \quad e_i \ (i \in I_3) \end{aligned}$$

are positively linearly dependent, they are linearly dependent in a whole neighborhood of \(x^*\). However, since \(I_3\) may, in particular, be viewed as a subset of \(I_0(x^*)\), this statement follows immediately from the definition of CC-CPLD. \(\square \)

Similar to the previous result, one can also show that CC-CRCQ implies that a piecewise CRCQ condition holds, by which we mean that standard CRCQ holds for all sets \(Z_I, I \subseteq I_{00} (x^*,y^*)\). On the other hand, CC-LICQ does not imply piecewise LICQ. In fact, piecewise LICQ would require that the following gradients (with respect to x and y) are linearly independent for all subsets \(I \subseteq I_{00} (x^*,y^*)\):

$$\begin{aligned}&\left( {\begin{array}{c}\nabla g_i(x^*)\\ 0\end{array}}\right) \ (i \in I_g(x^*)), \ \left( {\begin{array}{c}\nabla h_i(x^*)\\ 0\end{array}}\right) \ (i = 1, \ldots , p), \ \left( {\begin{array}{c}0\\ -e\end{array}}\right) \ (\text { if } e^T y^* = n - \kappa ), \\&\quad \left( {\begin{array}{c}e_i\\ 0\end{array}}\right) \ (i \in I_{0+} \cup I_{01} \cup I), \ \left( {\begin{array}{c}0\\ -e_i\end{array}}\right) \ (i \in I), \ \left( {\begin{array}{c}0\\ e_i\end{array}}\right) \ (i \in I_{01}), \ \left( {\begin{array}{c}0\\ e_i\end{array}}\right) \ (i \in I_{\pm 0} \cup (I_{00}{\setminus }I)). \end{aligned}$$

While CC-LICQ implies that those gradients which have nonzero entries with respect to the x-part are linearly independent, it is clear that the other gradients are linearly dependent whenever \(e^T y^* = n - \kappa \) holds and the set \(I_{0+} := I_{0+} (x^*,y^*)\) is empty, a situation that typically holds at a solution of the cardinality-constrained optimization problem.
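To make this dependence explicit: if \(I_{0+} = \emptyset \), the feasible indices are partitioned as \(\{1, \ldots , n\} = I_{01} \cup I_{\pm 0} \cup I_{00}\), and since \(e = \sum _{i=1}^n e_i\), one may write

$$\begin{aligned} \left( {\begin{array}{c}0\\ -e\end{array}}\right) = \sum _{i \in I} \left( {\begin{array}{c}0\\ -e_i\end{array}}\right) - \sum _{i \in I_{01}} \left( {\begin{array}{c}0\\ e_i\end{array}}\right) - \sum _{i \in I_{\pm 0} \cup (I_{00} {\setminus } I)} \left( {\begin{array}{c}0\\ e_i\end{array}}\right) , \end{aligned}$$

so that, whenever \(e^T y^* = n - \kappa \) makes the gradient \(\left( {\begin{array}{c}0\\ -e\end{array}}\right) \) appear in the list above, the gradients with vanishing x-part are indeed linearly dependent.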

In a similar way, it is also possible to see that CC-MFCQ does not, in general, imply a piecewise MFCQ condition. We skip the corresponding details, especially since this observation will not be used subsequently.

Lemma 3.12 allows us to state the following generalization of Corollary 3.10.

Theorem 3.13

Let \((x^*,y^*) \in Z\) be feasible for the program (1.3), and assume that CC-CPLD holds. Then CC-ACQ (hence also GCQ) holds at \((x^*,y^*)\).

Proof

Since CC-CPLD holds at \((x^*,y^*)\), it follows from Lemma 3.12 that standard CPLD holds for each of the feasible sets \(Z_I, I \subseteq I_{00} (x^*,y^*)\). But standard CPLD implies that standard ACQ holds for each of the feasible sets \(Z_I\). The statement therefore follows immediately from Lemma 3.9. \(\square \)

4 Stationarity conditions

This section shows that, at every local minimum \((x^*,y^*)\) of the relaxed program (1.3) at which a suitable CC-constraint qualification holds, certain KKT-type optimality conditions are satisfied. We distinguish two optimality conditions here: one is called strong stationarity (and is equivalent to the standard KKT conditions), the other M-stationarity. We first note that strong stationarity provides a necessary optimality condition under the CC-GCQ assumption. Under certain assumptions, strong stationarity is also a sufficient condition for a local minimum. The slightly weaker M-stationarity condition is therefore a necessary optimality condition under CC-GCQ, too. This type of stationary point arises quite naturally as a limit point within the algorithmic framework in [8].

We begin by stating the two stationarity concepts that will be used in our analysis. To shorten the notation, we sometimes abbreviate index sets such as \(I_{00}:=I_{00}(x^*,y^*)\) when the reference point \((x^*,y^*)\) is clear from the context.

Definition 4.1

Let \((x^*, y^*)\) be feasible for the relaxed program (1.3). Then \((x^*, y^*)\) is called

(a)

    S-stationary (S = strong) if there exist multipliers \(\lambda \in {\mathbb {R}}^m, \mu \in {\mathbb {R}}^p\), and \(\gamma \in {\mathbb {R}}^n\) such that the following conditions hold:

    $$\begin{aligned}&\nabla f(x^*) + \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i\in I_{01}\cup I_{0+}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*). \end{aligned}$$
(b)

    M-stationary (M = Mordukhovich) if there exist multipliers \(\lambda \in {\mathbb {R}}^m, \mu \in {\mathbb {R}}^p\), and \(\gamma \in {\mathbb {R}}^n\) such that the following conditions hold:

    $$\begin{aligned}&\nabla f(x^*) + \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i\in I_{01}\cup I_{0+}\cup I_{00}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*). \end{aligned}$$

The notions of S- and M-stationarity are motivated by a similar terminology used in the context of MPCCs, see Sect. 5 for more details. It is clear from the definition that M-stationarity is implied by S-stationarity. We further note that S-stationarity was shown to be equivalent to the usual KKT conditions of the relaxed program (1.3) in [8].
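Both conditions in Definition 4.1 are linear in the multipliers, so checking them at a given point amounts to a linear feasibility problem. The following Python sketch (our illustration, not part of the paper; the helper name and interface are ours) decides whether such multipliers exist; the caller passes the gradients of the active constraints and the index set on which the \(\gamma _i\) may be nonzero (\(I_{01} \cup I_{0+}\) for S-stationarity, additionally \(I_{00}\) for M-stationarity):

```python
import numpy as np
from scipy.optimize import linprog

def multipliers_exist(grad_f, grads_g, grads_h, gamma_idx, n):
    """Feasibility LP: do lambda >= 0 (active inequalities), mu free, gamma free
    exist with grad_f + sum_i lambda_i grads_g[i] + sum_i mu_i grads_h[i]
    + sum_{i in gamma_idx} gamma_i e_i = 0?"""
    cols, bounds = [], []
    for g in grads_g:                      # lambda_i >= 0, one per active g_i
        cols.append(np.asarray(g, float)); bounds.append((0, None))
    for h in grads_h:                      # mu_i free
        cols.append(np.asarray(h, float)); bounds.append((None, None))
    for i in gamma_idx:                    # gamma_i free, acting on e_i
        e = np.zeros(n); e[i] = 1.0
        cols.append(e); bounds.append((None, None))
    if not cols:                           # no multipliers available at all
        return bool(np.allclose(grad_f, 0.0))
    A = np.column_stack(cols)
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=-np.asarray(grad_f, float),
                  bounds=bounds, method="highs")
    return res.status == 0                 # status 0 = feasible multipliers found
```

For instance, with \(\nabla f(x^*) = (-1, 1)\), a single active gradient \((0, -2)\), and \(\gamma \)-indices \(\{0, 1\}\) (0-based), the LP is feasible; restricting the \(\gamma \)-indices to \(\{1\}\) makes it infeasible.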

As a consequence of the previous section, we can show that a local minimum satisfying CC-GCQ is a strongly stationary point of the relaxed program (1.3).

Theorem 4.2

Let \((x^*,y^*)\) be a local minimum of (1.3) such that CC-GCQ holds at \((x^*,y^*)\). Then \((x^*,y^*)\) is an S-stationary point.

Proof

Since CC-GCQ holds at \((x^*,y^*)\) by assumption, it follows from Corollary 3.8 that standard GCQ holds at \((x^*,y^*)\). Under standard GCQ, however, the usual KKT conditions are necessary optimality conditions at the local minimum \((x^*,y^*)\), see, e.g., [2]. On the other hand, these KKT conditions are shown to be equivalent to strong stationarity in [8]. Hence the assertion follows. \(\square \)

Since, by Corollary 3.10, CC-GCQ holds if both \(g_i\) and \(h_i\) are linear, we re-obtain the following result from [8] as a special case of our theory.

Corollary 4.3

Assume that all functions \(g_i\) and \(h_i\) are linear, and let \((x^*,y^*)\) be a local minimum of the corresponding relaxed program (1.3). Then \((x^*, y^*)\) is an S-stationary point.

For standard nonlinear programs it is well-known that any KKT point yields a global minimum provided that we have a convex program. The cardinality-constrained program is, of course, nonconvex, therefore we cannot expect a result of this kind. However, under a convexity-type condition, the following result shows that every strongly stationary point yields a local minimum of the relaxed program (1.3). This observation is similar to one in the MPCC-setting, see [28].

Theorem 4.4

Assume that f and each \(g_i\) are convex and each \(h_i\) is linear. Let \((x^*,y^*)\) be an S-stationary point of the relaxed program (1.3). Then \((x^*,y^*)\) is a local minimum of this program.

Proof

Let \((x,y)\) be an arbitrary feasible point of the relaxed program (1.3). Using the convexity of f and \(g_i\), the linearity of \(h_i\), and the S-stationarity conditions, we then obtain

$$\begin{aligned} f(x)&\ge f(x^*) + \nabla f(x^*)^T (x - x^*) \\&= f(x^*) - \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*)^T (x - x^*) - \sum _{i=1}^p \mu _i \nabla h_i(x^*)^T (x - x^*) \\&\quad - \sum _{i\in I_{01}\cup I_{0+}} \gamma _i (x_i - x_i^*) \\&\ge f(x^*) - \sum _{i\in I_{01}\cup I_{0+}} \gamma _i x_i , \end{aligned}$$

where the last inequality uses \(\lambda _i \nabla g_i(x^*)^T (x - x^*) \le \lambda _i \big ( g_i(x) - g_i(x^*) \big ) = \lambda _i g_i(x) \le 0\) for all \(i \in I_g(x^*)\), \(\nabla h_i(x^*)^T (x - x^*) = h_i(x) - h_i(x^*) = 0\) for all i, and \(x_i^* = 0\) for all \(i \in I_{01} \cup I_{0+}\). Since we take the last sum only over the indices \(i \in I_{01}(x^*,y^*) \cup I_{0+}(x^*,y^*)\), it follows that, in a sufficiently small neighborhood of \((x^*,y^*)\), we still have \(y_i \ne 0\) for all \(i \in I_{01}(x^*,y^*) \cup I_{0+}(x^*,y^*)\), hence the feasibility of the pair \((x,y)\) yields \(x_i = 0\). Consequently, we have \(f(x) \ge f(x^*)\) for all \((x,y)\) in a sufficiently small neighborhood of \((x^*,y^*)\). \(\square \)

Note that the previous proof shows that the S-stationary point \((x^*,y^*)\) is actually a global minimum of the relaxed program (1.3) under the convexity-type assumption provided that \(\gamma _i = 0\) for all indices i such that \(y_i^* \ne 0\). Of course, this assumption is usually not satisfied.

5 Comparison with MPCCs

This section gives a detailed comparison between (a special class of) cardinality-constrained optimization problems on the one hand and MPCCs on the other hand. Despite several similarities, it turns out that both problems have substantially different properties. In particular, this justifies treating cardinality-constrained problems separately.

An MPCC is an optimization problem of the form

$$\begin{aligned} \min _x \ f(x) \quad \text {s.t.} \quad&g_i(x) \le 0 \quad \forall i = 1, \ldots , m,\nonumber \\&h_i(x) = 0 \quad \forall i = 1, \ldots , p,\\&G_i(x) \ge 0, \ H_i(x) \ge 0, \ G_i(x) H_i(x) = 0 \quad \forall i = 1, \ldots , q, \nonumber \end{aligned}$$
(5.1)

see [14, 20] for more information on this problem class.

In the special case where some of the inequality constraints \(g(x) \le 0\) correspond to nonnegativity constraints \(x \ge 0\), say \(g(x) = \left( \tilde{g}(x), - x \right) \) for a suitable function \(\tilde{g}\), the cardinality-constrained problem (1.3) is (after a redefinition of g by setting \(g := \tilde{g}\)) of the form

$$\begin{aligned} \min _{x,y} \ f(x) \quad \text {s.t.} \quad&g_i(x) \le 0 \quad \forall i = 1, \ldots , m,\nonumber \\&h_i(x) = 0 \quad \forall i = 1, \ldots , p,\nonumber \\&e^T y \ge n - \kappa ,\\&y_i \le 1 \quad \forall i = 1, \ldots , n, \nonumber \\&x_i \ge 0, \ y_i \ge 0, \ x_i y_i = 0 \quad \forall i = 1, \ldots , n, \nonumber \end{aligned}$$
(5.2)

i.e., it is an MPCC in the variables \((x, y)\) with \(G_i(x,y) := x_i, H_i(x,y) := y_i\) and the constraint \(e^T y \ge n - \kappa \) being part of the standard inequality constraints in (5.1). The situation \(x \ge 0\) occurs, for example, in portfolio optimization, see [7].
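The correspondence behind (5.2) between the cardinality constraint and the complementarity system can be illustrated by a small sketch (the helper `certificate_y` is ours, not from the paper): for a given x with \(\Vert x \Vert _0 \le \kappa \), choosing \(y_i = 1\) exactly where \(x_i = 0\) yields a feasible y.

```python
import numpy as np

def certificate_y(x, kappa):
    """||x||_0 <= kappa iff some y satisfies e^T y >= n - kappa,
    0 <= y <= 1, x_i * y_i = 0. Try the natural choice y_i = 1 where x_i = 0."""
    x = np.asarray(x, dtype=float)
    y = (x == 0).astype(float)             # indicator of the zero components of x
    feasible = bool(y.sum() >= len(x) - kappa) and bool(np.all(x * y == 0))
    return y, feasible

# ||x||_0 = 2 <= kappa = 2: the certificate exists
y, ok = certificate_y([0.0, 3.0, 0.0, -1.0], kappa=2)
```

If \(\Vert x \Vert _0 > \kappa \), no feasible y exists at all, and in particular this natural choice fails the constraint \(e^T y \ge n - \kappa \).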

In order to compare the results obtained in this paper for cardinality constrained problems with those known for MPCCs, let us state the corresponding definitions for MPCCs, see [19, 25, 27] for some discussion and a derivation of these stationarity concepts.

Definition 5.1

Let \(x^*\) be feasible for (5.1). Then \(x^*\) is called

(a)

    W-stationary (W = weakly), if there are multipliers \(\lambda \in {\mathbb {R}}^m, \mu \in {\mathbb {R}}^p, \gamma \in {\mathbb {R}}^q\) and \(\nu \in {\mathbb {R}}^q\) such that the following conditions hold:

    $$\begin{aligned}&\nabla f(x^*) + \sum _{i: g_i(x^*)=0} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) - \sum _{i: G_i(x^*)=0} \gamma _i \nabla G_i(x^*) \\&\qquad - \sum _{i: H_i(x^*)=0} \nu _i \nabla H_i(x^*) = 0, \\&\lambda _i \ge 0 \quad \forall i : g_i(x^*)=0; \end{aligned}$$
(b)

    C-stationary (C = Clarke), if it is W-stationary and, in addition, it holds that \(\gamma _i \nu _i \ge 0\) for all i such that \(G_i(x^*) = 0\) and \(H_i(x^*) = 0\);

(c)

    M-stationary (M = Mordukhovich), if it is W-stationary and, in addition, for all i such that \(G_i(x^*) = 0\) and \(H_i(x^*) = 0\), we either have \(\gamma _i, \nu _i \ge 0\) or \(\gamma _i \nu _i = 0\);

(d)

    S-stationary (S = strongly), if it is W-stationary and, in addition, it holds that \(\gamma _i, \nu _i \ge 0\) for all i such that \(G_i(x^*) = 0\) and \(H_i(x^*) = 0\).

Note that S-stationarity implies M-stationarity, M-stationarity implies C-stationarity, and C-stationarity implies W-stationarity. Furthermore, there exist examples which show that each of these implications is strict, i.e., none of these concepts coincides for general MPCCs.

Now, there are two different ways to look at the cardinality-constrained problem (5.2) with nonnegativity constraints on the variables x: One way is to view this as a special cardinality-constrained problem with the nonnegativity constraints \(x \ge 0\) as additional inequality constraints, and the other way is to view this problem as an MPCC, with the nonnegativity constraints being a part of the complementarity conditions. Taking the first point of view, we write down the S- and M-stationarity conditions (in the sense of Definition 4.1) in the following result.

Lemma 5.2

Let \((x^*,y^*)\) be feasible for (5.2). Then \((x^*,y^*)\) is

(a)

    S-stationary if and only if there exist suitable multipliers such that the following conditions hold:

    $$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \gamma _i\le 0 \quad \forall i \in I_{00}(x^*,y^*). \end{aligned}$$
(b)

    M-stationary if and only if there exist suitable multipliers such that the following conditions hold:

    $$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*). \end{aligned}$$

Proof

(a) Applying the S-stationarity conditions from Definition 4.1 to (5.2) gives

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) - \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \lambda ^+_i e_i + \sum _{i=1}^p \mu _i \nabla h_i(x^*) \\&\qquad + \sum _{i\in I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \lambda _i^+ \ge 0 \quad \forall i \in I_{00}(x^*,y^*) \cup I_{0+}(x^*,y^*) \cup I_{01}(x^*,y^*). \end{aligned}$$

Replacing \(\gamma _i - \lambda _i^+\) by a new \(\gamma _i\) for all \(i \in I_{0+} (x^*,y^*) \cup I_{01} (x^*,y^*)\) and setting \(\gamma _i := - \lambda _i^+\) for all \(i \in I_{00} (x^*,y^*)\) yields the desired statement.

(b) Writing down the M-stationarity conditions from Definition 4.1 for (5.2) yields

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) - \sum _{i \in I_{00}\cup I_{0+}\cup I_{01}} \lambda _i^+ e_i + \sum _{i=1}^p \mu _i \nabla h_i(x^*) \\&\qquad + \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \lambda _i^+ \ge 0 \quad \forall i \in I_{00}(x^*,y^*) \cup I_{0+}(x^*,y^*) \cup I_{01} (x^*,y^*). \end{aligned}$$

Replacing \(\gamma _i - \lambda _i^+\) by a new \(\gamma _i\) for each index \(i \in I_{00}(x^*,y^*) \cup I_{0+}(x^*,y^*) \cup I_{01} (x^*,y^*)\) gives the desired representation of M-stationarity. \(\square \)

Note the close relation between the S- and M-stationarity conditions for the cardinality-constrained problem (1.3) and the corresponding stationarity conditions for the specially structured problem (5.2) involving nonnegativity constraints: S-stationarity differs only in the last sum, where now also the bi-active index set \(I_{00}\) is included (with some sign constraints on the corresponding multipliers). As for M-stationarity, there is no difference at all, even though the problem itself is different!

Our definition of M- and S-stationarity for cardinality-constrained problems was motivated by the corresponding concepts for MPCCs, and indeed in the special case (5.2), the definitions turn out to be identical.

Lemma 5.3

Let \((x^*,y^*)\) be feasible for (5.2). Then the point \((x^*,y^*)\) is S-stationary (M-stationary) in the sense of Definition 4.1 if and only if it is S-stationary (M-stationary) in the sense of Definition 5.1.

Proof

(a) We first verify the statement for S-stationary points. Using the same index sets as before, the S-stationarity conditions from Definition 5.1 read

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) - \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\qquad -\delta e + \sum _{i \in I_{01}} \nu _i e_i - \sum _{i \in I_{\pm 0} \cup I_{00}} \nu _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \gamma _i\ge 0, \ \nu _i \ge 0 \quad \forall i \in I_{00}(x^*,y^*),\\&\quad \delta \ge 0 \quad \text {if} \quad e^Ty^* = n-\kappa , \quad \text {else} \quad \delta = 0,\\&\quad \nu _i\ge 0 \quad \forall i \in I_{01}(x^*,y^*). \end{aligned}$$

Since the conditions on \(\delta \) and \(\nu \) can always be satisfied by choosing \(\delta = 0\), \(\nu = 0\), the S-stationarity conditions from Definition 5.1 hold if and only if the following conditions hold:

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) - \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \gamma _i\ge 0 \quad \forall i \in I_{00}(x^*,y^*). \end{aligned}$$

Using Lemma 5.2 and replacing \(\gamma _i\) by \(- \gamma _i\) everywhere, these are precisely the S-stationarity conditions from Definition 4.1.

(b) We next verify the corresponding statement for M-stationary points. The proof is completely analogous to the one for S-stationary points, but we state it here since it yields an interesting observation that is stated formally in the subsequent Remark 5.4.

Let us first write down the M-stationarity conditions for problem (5.2) in the sense of Definition 5.1:

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) - \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\qquad -\delta e + \sum _{i \in I_{01}} \nu _i e_i - \sum _{i \in I_{\pm 0} \cup I_{00}} \nu _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*),\\&\quad \gamma _i\ge 0, \nu _i \ge 0 \quad \text {or} \quad \gamma _i \nu _i = 0 \quad \forall i \in I_{00}(x^*,y^*),\\&\quad \delta \ge 0 \quad \text {if} \quad e^Ty^* = n-\kappa , \quad \text {else} \quad \delta = 0,\\&\quad \nu _i\ge 0 \quad \forall i \in I_{01}(x^*,y^*). \end{aligned}$$

Using once again that the conditions on \(\delta \) and \(\nu \) can always be satisfied by choosing \(\delta = 0\), \(\nu = 0\), it follows that there exist multipliers satisfying the M-stationarity conditions from Definition 5.1 if and only if there exist multipliers such that the following simplified conditions hold:

$$\begin{aligned}&\nabla f(x^*)+ \sum _{i\in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) - \sum _{i\in I_{00}\cup I_{0+}\cup I_{01}} \gamma _i e_i = 0, \\&\quad \lambda _i \ge 0 \quad \forall i \in I_g(x^*). \end{aligned}$$
(5.3)

But these are precisely the M-stationarity conditions given in Lemma 5.2 (note that we can change the sign of the \(\gamma _i\) without loss of generality). \(\square \)

A simple inspection of part (b) in the previous proof shows that the M-stationarity conditions of problem (5.2) in the sense of Definition 5.1 are satisfied by some multipliers if and only if the C-stationarity conditions hold for this problem, and this, in turn, is also equivalent to the satisfaction of the W-stationarity conditions. Hence we have the following observation.

Fig. 1: Illustration of Example 5.5

Remark 5.4

The M-, C-, and W-stationary points in the sense of Definition 5.1 coincide for the particular MPCC (5.2) that arises from our cardinality-constrained problem in the case where all variables are assumed to be nonnegative. In view of Lemma 5.3, this means that, in this particular situation, the M-stationary points in the sense of Definition 4.1 are also the same as the W- and C-stationary points in the sense of Definition 5.1.

On the other hand, S- and M-stationarity are different concepts. This is shown by the following example.

Example 5.5

Consider the cardinality-constrained problem (1.1), (1.2) with \(n = 2\), \(\kappa = 1\), objective function \( f(x) := x_2 - x_1 \) and feasible set \(X := \{x \mid x \ge 0, \ x_1^2 + (x_2 - 1)^2 \le 1\}\), see Fig. 1. The unique solution of this problem is \(x^* := (0,0)\). There exist different corresponding optimal y-parts. For example, taking \(y^* := (0,1)\), it is easy to see that \((x^*,y^*)\) is not an S-stationary point; in particular, using Theorem 4.2, it follows that this problem does not satisfy GCQ, hence CC-GCQ is also violated. However, \((x^*,y^*)\) is M-stationary, for example, one may take \(\lambda = 0, \gamma _1 = 1, \gamma _2 = -1\) to see that the conditions from Lemma 5.2 (b) hold.

On the other hand, \(x^*\) together with \(y^* := (1,0)\) is also optimal, and this pair turns out to be S-stationary. Indeed, a simple calculation shows that, for example, the multipliers \(\gamma _1 := 1, \gamma _2 := -1,\) and \(\lambda := 0\) satisfy the S-stationarity conditions from Lemma 5.2 (a).
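The multiplier choices in Example 5.5 can be checked by direct computation (a small sketch using only the example's data; nothing here is new):

```python
import numpy as np

# Example 5.5: f(x) = x2 - x1, g(x) = x1^2 + (x2 - 1)^2 - 1, x* = (0, 0)
grad_f = np.array([-1.0, 1.0])   # gradient of f at x*
grad_g = np.array([0.0, -2.0])   # gradient of the active constraint g at x*

# y* = (0, 1): component 1 lies in I_00, component 2 in I_01.
# M-stationarity (Lemma 5.2 (b)) with lambda = 0 and gamma = (1, -1):
lam = 0.0
gamma = np.array([1.0, -1.0])
residual = grad_f + lam * grad_g + gamma   # gamma_i e_i summed over I_00 and I_01
m_stationary = bool(np.allclose(residual, 0.0))

# S-stationarity at this y* would allow gamma only on component 2 (I_01):
# the first component of grad_f + lam * grad_g equals -1 for every lam >= 0,
# so the S-stationarity equation cannot be satisfied there.
first_component = (grad_f + lam * grad_g)[0]
```
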

We next investigate the relation between our CC-linearized cone and the MPCC-linearized cone, which for a point \(x^*\) feasible for the MPCC (5.1) is defined by

$$\begin{aligned} {\mathscr {L}}^{MPCC}(x^*) := \big \{ d \in {\mathbb {R}}^n \mid \ &\nabla g_i(x^*)^T d \le 0 \quad (i: g_i(x^*) = 0), \\&\nabla h_i(x^*)^T d = 0 \quad (i = 1, \ldots , p), \\&\nabla G_i(x^*)^T d = 0 \quad (i: G_i(x^*) = 0, \ H_i(x^*) > 0), \\&\nabla H_i(x^*)^T d = 0 \quad (i: G_i(x^*) > 0, \ H_i(x^*) = 0), \\&\nabla G_i(x^*)^T d \ge 0, \ \nabla H_i(x^*)^T d \ge 0, \\&\big ( \nabla G_i(x^*)^T d \big ) \big ( \nabla H_i(x^*)^T d \big ) = 0 \quad (i: G_i(x^*) = H_i(x^*) = 0) \big \}, \end{aligned}$$

see, e.g., [12, 21].

Lemma 5.6

Let \((x^*,y^*)\) be feasible for (5.2). Then \({\mathscr {L}}_Z^{MPCC}(x^*,y^*) = {\mathscr {L}}_Z^{CC}(x^*,y^*)\).

Proof

For \((x^*,y^*)\) feasible for (5.2), we have \(G_i(x,y) = x_i\) and \(H_i(x,y) = y_i\), so the MPCC-linearized cone is of the form

$$\begin{aligned} {\mathscr {L}}_Z^{MPCC}(x^*,y^*) = \big \{ (d,w) \mid \ &\nabla g_i(x^*)^T d \le 0 \quad (i \in I_g(x^*)), \\&\nabla h_i(x^*)^T d = 0 \quad (i = 1, \ldots , p), \\&e^T w \ge 0 \quad (\text {if } e^T y^* = n - \kappa ), \quad w_i \le 0 \quad (i \in I_{01}), \\&d_i = 0 \quad (i \in I_{0+} \cup I_{01}), \quad w_i = 0 \quad (i \in I_{\pm 0}), \\&d_i \ge 0, \ w_i \ge 0, \ d_i w_i = 0 \quad (i \in I_{00}) \big \}, \end{aligned}$$

which is exactly the same as the CC-linearized cone \({\mathscr {L}}_Z^{CC} (x^*,y^*)\). \(\square \)

Consequently, for points \((x^*,y^*)\) feasible for (5.2), the CC-constraint qualifications CC-ACQ and CC-GCQ coincide with their MPCC counterparts MPCC-ACQ and MPCC-GCQ.

However, even though there are close connections between cardinality-constrained problems and MPCCs in the case \(x \ge 0\), the results we have proven in this paper are quite different from what can be shown for general MPCCs. We summarize some of the major differences between general MPCCs and cardinality-constrained optimization problems in the following remark.

Remark 5.7

(a)

    Standard GCQ holds for MPCCs under the MPCC-LICQ assumption, but already under the MPCC-MFCQ condition it can be violated, see, e.g., [25]. The different behavior we observe for cardinality constrained problems is due to the equality \({\mathscr {L}}_Z(x^*,y^*)^\circ = {\mathscr {L}}_Z^{CC}(x^*,y^*)^\circ \) established in the proof of Theorem 3.7. This so-called intersection property, which implies CC-GCQ = GCQ, is not satisfied for general MPCCs.

(b)

    As a consequence of the previous remark, it follows that S-stationarity is a necessary optimality condition under the fairly strong MPCC-LICQ condition, but neither under the MPCC-MFCQ nor under any weaker MPCC constraint qualification. Recall that this is very much in contrast to the situation for cardinality-constrained problems.

(c)

    Even if all functions involved in the MPCC are linear, a local minimum is, in general, only an M-stationary point. A counterexample from [25] shows that the stronger S-stationarity cannot be obtained without further assumptions.

(d)

    While W-, C-, M-, and S-stationarity are four different stationarity concepts that arise in different contexts for general MPCCs, it turns out that these four stationarity conditions reduce to only two when applied to the particular MPCC (5.2) that results from the cardinality-constrained problem in the case where it includes nonnegativity constraints.

(e)

    For MPCCs, it is known that MPCC-LICQ implies a piecewise LICQ condition, whereas this is not true in our setting, cf. the corresponding discussion after Lemma 3.12.

(f)

    Finally, we would like to stress that the popular MPCC-LICQ condition is likely to be violated for the problem (5.2). To this end, let \((x^*,y^*)\) be a solution satisfying \(y_i^* \in \{0, 1\}\) for all \(i = 1, \ldots , n\) and such that the cardinality constraint \(e^T y \ge n - \kappa \) is active; note that this situation is very likely to hold at a solution. Then the gradient of the cardinality constraint is obviously linearly dependent on the gradients which one obtains from the activity of the constraints \(y_i \ge 0\) and \(y_i \le 1\). Hence MPCC-LICQ is violated in this situation; for the same reason, also MPCC-MFCQ does not hold.
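The dependence in (f) can be written out explicitly: if \(y_i^* \in \{0,1\}\) for all i, the bound constraints \(y_i \ge 0\) (active where \(y_i^* = 0\)) and \(y_i \le 1\) (active where \(y_i^* = 1\)) contribute the gradients \(-e_i\) and \(e_i\), respectively, and since \(e = \sum _{i=1}^n e_i\),

$$\begin{aligned} -e = \sum _{i :\, y_i^* = 0} (-e_i) - \sum _{i :\, y_i^* = 1} e_i , \end{aligned}$$

so the gradient \(-e\) of the active cardinality constraint \(e^T y \ge n - \kappa \) is a linear combination of the other active gradients.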

6 Final remarks

In this paper, we exploited the relation between the cardinality-constrained optimization problem and a suitable nonlinear program to define some problem-tailored constraint qualifications which were then used to prove a KKT-type optimality condition under fairly mild conditions. Like for standard nonlinear programs, these constraint qualifications depend on the feasible set, but not directly on the objective function.

There are some recent contributions to MPCCs where optimality conditions are derived under certain assumptions which involve both the constraints and the particular objective function, cf. [15]. In this way, one might still get relatively strong optimality conditions even for problems with difficult constraints. It might therefore be an interesting future research topic to see whether one can also obtain similar results for cardinality-constrained problems, possibly under weaker assumptions than in the MPCC-setting by taking into account the particular structure of our reformulated cardinality-constrained problem.