1 Introduction

Sparse solutions of optimization problems, i.e. solutions with a limited number of nonzero elements, are required in many areas including image and signal processing, mathematical statistics, machine learning, inverse environmental modeling and others. The approach proposed in this paper is motivated mainly by applications of these problems in portfolio selection theory, where sparse solutions are popular due to lower fluctuations of the future (out-of-sample) portfolio performance, cf. DeMiguel et al. [19], and also due to the reduction of transaction costs, which grow with the number of assets included in the portfolio. A standard way to ensure sparsity of the solutions is to impose a cardinality constraint bounding the number of nonzero elements of the solution. While some studies employ a penalty on the \(l_1\)-norm of the asset weight vector or its alternatives, e.g., Fastrich et al. [25], others consider explicit cardinality constraints. The portfolio optimization problem resulting from the latter can be viewed as a mixed-integer problem and is considered computationally challenging. Examples of solution techniques include exact branch-and-bound methods, e.g., Borchers and Mitchell [8], Bertsimas and Shioda [4]; exact branch-and-cut methods, e.g., Bienstock [6]; heuristic algorithms, e.g., Chang et al. [14]; and relaxation algorithms, e.g., Shaw et al. [55], Murray and Shek [44] and Burdakov et al. [10, 11]. Portfolio optimization problems with sparsity have been one of the main motivations to study optimization problems with cardinality constraints; see [3, 7, 13, 21, 28, 29, 52, 57, 59] and the references therein for some more ideas.

In this paper, we follow the approach from [11, 12, 28] and reformulate the cardinality constrained problem as a continuous optimization problem with orthogonality constraints. The resulting structure is similar to mathematical programs with complementarity constraints (MPCC); see [41, 45] and the references therein for some background. Thus one can try to adapt solution methods originally designed for MPCCs to the continuous reformulation of cardinality constrained optimization problems. One popular class of algorithms for MPCCs are the so-called regularization or relaxation methods, see for example [18, 35, 36, 40, 53, 56]. In the recent paper [11], the regularization method from [36] was adapted to cardinality constrained problems. Here, we focus on the older regularization method from [53], which was suggested for MPCCs by Scholtes in 2001. Although the theoretical properties of this method are weaker than those of most other regularization methods, a numerical study on a collection of MPCC test problems in [34] showed that the Scholtes regularization performs very well in practice. Hence, in the theoretical part of this paper, we analyze the convergence properties of an adaptation of the Scholtes regularization method to problems with cardinality constraints. As it turns out, for cardinality constrained problems this regularization has stronger convergence properties than those known for MPCCs. This is somewhat surprising, since the method studied in [11] retains the same properties known from MPCCs. In fact, for cardinality constrained problems, the Scholtes-type regularization seems to have better convergence properties than the one discussed in [11]; more details are given in Sect. 3. Furthermore, we show that the regularized problems possess better properties than the original problem in terms of constraint qualifications.

In portfolio optimization, two basic types of decision-making frameworks are adopted: utility maximization and return-risk trade-off analysis; see, e.g., Levy [39] for properties of and relations between these two approaches. In the latter, it is important to quantify the risk inherent in the system under consideration. In optimization problems governed by uncertain inputs such as rates of return, typically represented as random variables, the risk is explicitly quantified by a risk measure. In return-risk analysis, widely used both in theory and practice, an investor faces a trade-off between expected return and associated risk. In his pioneering work in 1952, Markowitz [42] adopted variance as the measure of risk in his mean-variance analysis. Many alternatives have been introduced since then. Nowadays, Value-at-Risk (VaR), which measures the maximum loss that can be expected during a specific time horizon with a probability \(\beta \) (\(\beta \) close to 1), is widely used in the banking and insurance industry as a downside risk measure. Despite its popularity, VaR lacks some important mathematical properties. Artzner et al. [2] presented an axiomatic definition of risk measures and coined the term coherent risk measure for measures with certain reasonable properties. Conditional Value-at-Risk (CVaR), the mean value of the losses that exceed VaR, exhibits favorable mathematical properties such as coherence, which implies convexity. Rockafellar and Uryasev [50, 51] proposed to minimize CVaR when optimizing a portfolio so as to reduce the risk of high losses without prior computation of the corresponding VaR, obtaining VaR as a by-product. Their CVaR minimization formulation usually results in convex or even linear programs, which proved attractive for financial optimization and risk management in practice due to their tractability for larger real-life instances. For each of these risk measures, one can formulate corresponding mean-risk portfolio optimization problems.
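As a concrete illustration of the two measures, the sketch below computes the standard empirical estimators on a loss sample. This is our own toy example; the paper itself works with distributional VaR and CVaR under moment information, not with these sample estimators:

```python
import numpy as np

def empirical_var_cvar(losses, beta=0.95):
    """Empirical VaR and CVaR of a loss sample at confidence level beta.

    VaR is the beta-quantile of the losses; CVaR is the mean of those
    losses that are at least as large as VaR.
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, beta)
    cvar = losses[losses >= var].mean()
    return var, cvar

losses = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0]
var, cvar = empirical_var_cvar(losses, beta=0.8)
# CVaR never falls below VaR, since it averages the tail beyond VaR.
```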

Regardless of the risk measure used, these models are strongly dependent on the underlying distribution and its parameters, which are typically unknown and have to be estimated, cf. Fabozzi et al. [24]. Investors usually face the so-called estimation risk as they rely on a limited amount of data to estimate the input parameters. Portfolios constructed using these estimators perform very poorly in terms of their out-of-sample mean and variance as the resulting portfolio weights fluctuate substantially over time, cf. e.g., Michaud [43] and Chopra and Ziemba [16]. As some reformulations of mean-risk portfolio problems depend on the assumption of normality, poor performance can also be caused by deviations of the empirical distribution of returns from normality. One can thus also consider distribution ambiguity in the sense that no knowledge of the return distribution for risky assets is assumed, while the mean and variance/covariance are assumed to be known. For these reasons, we examine portfolio policies based on robust estimators. Robust portfolio selection deals with eliminating the impacts of estimation risk and/or distribution ambiguity. Goldfarb and Iyengar [32] studied the robust portfolio in the mean-variance framework. Instead of precise information on the mean and the covariance matrix of asset returns, they introduced special types of uncertainty, such as box uncertainty and ellipsoidal uncertainty. They also considered the robust VaR portfolio selection problem by assuming a normal distribution. Chen et al. [15] minimized the worst-case CVaR risk measure over all distributions with fixed first and second moment information. The reader is referred to El Ghaoui et al. [23] and Popescu [49] for other studies on portfolio optimization with distributional robustness. Paç and Pınar [46] extended Chen et al. [15] to the case where a risk-free asset is also included and distributional robustness is complemented with ellipsoidal mean return ambiguity.
Other choices of the ambiguity set for VaR and CVaR are considered, e.g., by Tütüncü and Koenig [58], Pflug and Wozabal [48], Zhu and Fukushima [60], DeMiguel and Nogales [20] and Delage and Ye [17]. For a survey of recent approaches to construct robust portfolios, we refer to Kim et al. [38]. Despite the vast literature on robust portfolio optimization and many works on sparse portfolio optimization, there are only a few works that concern both sparse and robust portfolios, cf. e.g., Bertsimas and Takeda [5].

As an application of the general problem class, we consider the cardinality constrained minimization of VaR and CVaR under normality of asset returns as well as their robust counterparts under distribution ambiguity. We assume that both first and second order moments of the random returns are known. The resulting problem can then be solved using the following four approaches: solving a mixed-integer reformulation with GUROBI, solving the continuous reformulation directly with SNOPT, applying the Scholtes-type regularization, and applying the regularization from Burdakov et al. [11]. We perform a numerical experiment based on randomly generated test examples from the literature to compare these four approaches. A similar numerical study has been reported in Burdakov et al. [11] for a cardinality constrained (and non-robust) mean-variance model, where the objective function was convex quadratic and the standard constraints were linear. Here, we investigate the investment problems with VaR and CVaR as introduced above, which leads to a more complicated convex objective function. In order to solve the resulting problem with GUROBI, the objective function has to be reformulated using a second-order cone constraint.

The paper is organized as follows: we start Sect. 2 with a brief background on the continuous reformulation of cardinality constraints and the related optimality conditions and constraint qualifications. Section 3 is then devoted to the adaptation of the Scholtes regularization method and the analysis of its properties. In Sect. 4, we introduce the risk measures and define investment problems with a condition on portfolio sparsity. Section 5 finally provides an extensive numerical comparison of all four aforementioned solution approaches.

A few words on notation: By \(e \in \mathbb R^n\) we denote the vector with all components equal to one. For two vectors \(x,y \in \mathbb R^n\) the vector \(x \circ y \in \mathbb R^n\) denotes the componentwise (Hadamard) product of x and y. For a vector \(x \in \mathbb {R}^n\) we denote the support and its cardinality by

$$\begin{aligned} {{\mathrm{supp}}}(x) := \{i =1,\ldots ,n |x_i \ne 0\} \quad \text {and} \quad \Vert x\Vert _0 := |{{\mathrm{supp}}}(x)|. \end{aligned}$$
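In code, these two quantities are straightforward (a small sketch for illustration; it uses 0-based indices, while the paper counts components from 1):

```python
def supp(x):
    """Support of x: the set of indices with a nonzero component."""
    return {i for i, xi in enumerate(x) if xi != 0}

def l0_norm(x):
    """The "l0-norm" ||x||_0: the number of nonzero components of x."""
    return len(supp(x))

x = [0.0, 1.5, 0.0, -2.0]
# supp(x) = {1, 3} and ||x||_0 = 2
```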

2 Cardinality constrained problems and a continuous reformulation

In this section we want to provide the necessary background on cardinality constrained optimization problems and the continuous reformulation used in [11, 12, 28]. Since the continuous reformulation has similarities to MPCCs, we also have to introduce suitable optimality conditions and constraint qualifications.

Let us consider a general cardinality constrained optimization problem

$$\begin{aligned} \min _x f(x) \quad {\text {s.t.}}\quad g(x) \le 0, \ h(x) = 0, \ \Vert x\Vert _0 \le \kappa , \end{aligned}$$
(1)

where \(f:\mathbb {R}^n \rightarrow \mathbb {R}\), \(g:\mathbb {R}^n \rightarrow \mathbb {R}^m\), and \(h:\mathbb {R}^n \rightarrow \mathbb {R}^p\) are assumed to be continuously differentiable. To simplify notation, we use the index sets

$$\begin{aligned} I_g(x) := \{i |g_i(x) = 0\} \qquad \text {and} \qquad I_0(x) := \{i |x_i = 0\} \end{aligned}$$

in the following.

Before we introduce the continuous reformulation on which our analysis is based, let us mention an alternative mixed integer reformulation, which was used for example in [6] and which we will use in our numerical comparison to solve the portfolio optimization problems with GUROBI. In case lower and upper bounds \(l \le x \le u\) on the variable x are known, problem (1) can be reformulated using binary decision variables into

$$\begin{aligned} \begin{aligned} \min _{x,z} \,\,\,&f(x) \\ \text {s.t. }&g(x) \le 0, \ h(x) = 0, \\&z \in \{0, 1\}^n,\\&l \circ z \le x \le u \circ z,\\&e^\top z \le \kappa . \end{aligned} \end{aligned}$$
(2)

If \(x_i\) is nonzero, then the corresponding \(z_i\) must be equal to one, and by the reformulated cardinality constraint \(e^\top z \le \kappa \) this can happen at most \(\kappa \) times. However, since Bienstock [6] showed the problem to be NP-complete even for simple instances of cardinality constrained problems, solving (2) can be computationally very demanding even with specialized global solution techniques.
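The coupling between x and z can be checked mechanically. The following sketch (with hypothetical bounds l, u, not taken from the paper) verifies the constraints of (2) and illustrates that \(z_i = 0\) forces \(x_i = 0\):

```python
import numpy as np

def feasible_for_mip(x, z, l, u, kappa):
    """Check the coupling constraints of the mixed-integer model (2):
    z binary, l∘z <= x <= u∘z and e^T z <= kappa."""
    x, z, l, u = map(np.asarray, (x, z, l, u))
    return (
        set(np.unique(z)) <= {0, 1}
        and np.all(l * z <= x) and np.all(x <= u * z)
        and z.sum() <= kappa
    )

# If z_i = 0, the bounds collapse to 0 <= x_i <= 0, so ||x||_0 <= e^T z <= kappa.
l, u = np.array([-1.0, -1.0, -1.0]), np.array([1.0, 1.0, 1.0])
assert feasible_for_mip([0.5, 0.0, 0.0], [1, 0, 0], l, u, kappa=1)
assert not feasible_for_mip([0.5, 0.2, 0.0], [1, 0, 0], l, u, kappa=1)
```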

Thus, we instead consider the following continuous reformulation of (1) introduced in Burdakov et al. [11]:

$$\begin{aligned} \begin{aligned} \min _{x,y} \,\,\,&f(x) \\ \text {s.t. }&g(x) \le 0, \ h(x) = 0, \\&0 \le y \le e,\\&x \circ y =0,\\&e^\top y \ge n-\kappa . \end{aligned} \end{aligned}$$
(3)

Here, in contrast to the previous reformulation, whenever \(x_i\) is nonzero, the corresponding \(y_i\) has to be equal to zero. Due to the reformulated cardinality constraint \(e^\top y \ge n-\kappa \), this can again occur at most \(\kappa \) times. Note that this problem is closely related to a mathematical program with complementarity constraints (MPCC) due to the “half-complementarity” constraints \(y \ge 0, x \circ y = 0\). If the additional constraint \(x \ge 0\) is present, as in the examples considered above, problem (3) in fact is an MPCC. Consequently, (3) is a nonconvex problem, even when the original cardinality constrained problem (except for the cardinality constraint, of course) was convex. Thus, one cannot in general expect to obtain global minima. But if one is, for example, interested in obtaining local solutions or good starting points for a global method, this approach can be useful.
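The correspondence between the two formulations can be made explicit: from any x satisfying the cardinality constraint, a feasible counterpart y for (3) is obtained by putting \(y_i = 0\) on the support of x and \(y_i = 1\) elsewhere. A small sketch of this construction:

```python
import numpy as np

def complement_variable(x, kappa):
    """Given x with ||x||_0 <= kappa, build y such that (x, y) satisfies
    0 <= y <= e, x∘y = 0 and e^T y >= n - kappa from problem (3)."""
    x = np.asarray(x, dtype=float)
    if np.count_nonzero(x) > kappa:
        raise ValueError("x violates the cardinality constraint")
    y = (x == 0).astype(float)          # y_i = 1 exactly where x_i = 0
    assert np.all(x * y == 0) and y.sum() >= x.size - kappa
    return y

y = complement_variable([0.0, 1.5, 0.0, -2.0], kappa=2)
# y = [1, 0, 1, 0] with e^T y = 2 = n - kappa
```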

For MPCCs, in addition to the KKT conditions (or strong stationarity), several weaker optimality conditions such as Mordukhovich- and Clarke-stationarity are known; see for example [12] for precise definitions. In contrast, it was shown in [11, 12] that M-stationarity and all weaker concepts coincide for the continuous reformulation (3) and that strong and M-stationarity can be reduced to the following conditions.

Definition 1

Let \( (x^*, y^*) \) be feasible for (3). Then \( (x^*, y^*) \) is called

  (a)

    S-stationary (S = strong) if there exist multipliers \( \lambda \in \mathbb {R}^m, \mu \in \mathbb {R}^p \), and \( \gamma \in \mathbb {R}^n \) such that the following conditions hold:

    $$\begin{aligned} \nabla f(x^*) + \sum _{i=1}^m \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i=1}^n \gamma _i e_i = 0,&\\ \lambda _i \ge 0, \ \lambda _i g_i(x^*) = 0 \quad \forall \;\;i = 1, \ldots , m, \\ \gamma _i = 0 \quad \forall \;\; i \text { such that } y_i^*=0. \end{aligned}$$
  (b)

    M-stationary (M = Mordukhovich) if there exist multipliers \( \lambda \in \mathbb {R}^m, \mu \in \mathbb {R}^p \), and \( \gamma \in \mathbb {R}^n \) such that the following conditions hold:

    $$\begin{aligned} \nabla f(x^*) + \sum _{i=1}^m \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i=1}^n \gamma _i e_i = 0,&\\ \lambda _i \ge 0, \ \lambda _i g_i(x^*) = 0 \quad \forall \;\;i = 1, \ldots , m, \\ \gamma _i = 0 \quad \forall \;\;i \text { such that } x_i^* \ne 0. \end{aligned}$$

Note that S-stationarity is equivalent to the KKT conditions of (3) and still depends on the artificial variable y. In contrast, the M-stationarity condition is slightly weaker but independent of the artificial variable y. Following the idea from [11, 12], one can also define constraint qualifications for (3) depending on the original variable x only. This leads for example to the following version of the Mangasarian–Fromowitz constraint qualification:

Definition 2

Let \((x^*,y^*)\) be feasible for (3). Then \((x^*,y^*)\) satisfies the cardinality constrained Mangasarian–Fromowitz constraint qualification (CC-MFCQ) if the gradients

$$\begin{aligned} \nabla g_i(x^*)\ (i\in I_g(x^*))\quad \text {and}\quad \nabla h_i(x^*)\ (i=1,\dots ,p),\ e_i\ (i\in I_0(x^*)) \end{aligned}$$

are positively linearly independent, i.e. if one cannot find multipliers \(\lambda \ge 0\) and \(\mu , \gamma \) such that \((\lambda , \mu , \gamma ) \ne 0\) and

$$\begin{aligned} \sum _{i \in I_g(x^*)} \lambda _i \nabla g_i(x^*) + \sum _{i=1}^p \mu _i \nabla h_i(x^*) + \sum _{i\in I_0(x^*)} \gamma _i e_i = 0. \end{aligned}$$

It was shown in [12] that every local minimum of (3), where CC-MFCQ or a weaker CC constraint qualification holds, is an S-stationary point. This result differs from what is known for general MPCCs, where S-stationarity of local minima can only be guaranteed under an MPCC analogue of the linear independence constraint qualification. Under MPCC-MFCQ and all weaker MPCC constraint qualifications, local minima of an MPCC can only be guaranteed to be M-stationary.

In this paper we are interested in portfolio optimization as an application, see Sect. 4 for details. The resulting optimization problems turn out to be convex, except for the cardinality constraint of course. For this special class of cardinality constrained optimization problems, it is known that S-stationarity of a point \((x^*,y^*)\) implies that it is a local minimum, see [12]:

Theorem 1

Consider (3), where \(f, g_i:\mathbb {R}^n \rightarrow \mathbb {R}\) are convex and \(h:\mathbb {R}^n \rightarrow \mathbb {R}^p\) is linear. Then every S-stationary point \((x^*,y^*)\) is a local minimum of (3).

3 Properties of a Scholtes-type regularization method

Currently, there exist many different solution approaches for MPCCs, see [41, 45] for an introduction. One popular class of algorithms for MPCCs are so-called regularization methods, see [18, 35, 36, 40, 53, 56], where one replaces the original, difficult problem by a sequence of simpler nonlinear programs (NLP), whose feasible set shrinks to the original one in the limit. In this section, we briefly introduce the regularization method by Burdakov et al. [11] for cardinality constrained problems. Then we adapt the regularization method from Scholtes [53] to cardinality constrained problems and analyze its convergence properties as well as the regularized subproblems.

In [11] the idea from Kanzow and Schwartz [36] was adapted to cardinality constrained problems: The corresponding regularized problem was obtained by replacing the constraints \(y \ge 0, x \circ y = 0\) in (3) by the inequalities

$$\begin{aligned} \varPhi ^+(x,y; t) \le 0,\quad \varPhi ^-(x,y; t) \le 0,\quad y \ge 0, \end{aligned}$$
(4)

where \(\varPhi ^+_i(x,y;t) = \varphi (x_i,y_i;t)\) and \(\varPhi ^-_i(x,y;t) = \varphi (-\,x_i,y_i;t)\) with

$$\begin{aligned} \varphi (a,b;t) = {\left\{ \begin{array}{ll} (a-t)(b-t) &{}\quad \text{ if } \; a+b \ge 2t, \\ -\,\frac{1}{2}\left[ (a-t)^2 + (b-t)^2 \right] &{}\quad \text{ if } \; a+b < 2t. \end{array}\right. } \end{aligned}$$

It is not difficult to see that for \(t \ge 0\) the inequality \(\varphi (a,b;t) \le 0\) is equivalent to \(\min \{a,b\} \le t\). If \(x \ge 0\) is known, as in the considered application in portfolio optimization, one can of course eliminate the constraint \(\varPhi ^-(x,y; t) \le 0\).
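This equivalence is easy to check numerically. Below is a small sketch of φ together with a brute-force verification on a grid (our own illustration):

```python
import itertools

def phi(a, b, t):
    """The regularization function from Kanzow and Schwartz [36]."""
    if a + b >= 2 * t:
        return (a - t) * (b - t)
    return -0.5 * ((a - t) ** 2 + (b - t) ** 2)

# For t >= 0, phi(a, b, t) <= 0 should hold iff min(a, b) <= t.
t = 0.3
grid = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
for a, b in itertools.product(grid, grid):
    assert (phi(a, b, t) <= 0) == (min(a, b) <= t)
```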

In this paper, we want to adapt the regularization method introduced by Scholtes [53] for MPCCs to (3). Although this regularization technique is one of the oldest and the theoretical results known for MPCCs are weaker than those known for example for the regularization from [36], it is numerically still very successful for MPCCs, see [34].

For a regularization parameter \(t > 0\) we consider the regularized problem

$$\begin{aligned} \begin{aligned} \text {NLP({ t})}:\min _{x,y} \,\,\,&f(x) \\ \text {s.t. }&g(x) \le 0, \quad h(x) = 0, \\&0 \le y \le e,\\&-te \le x \circ y \le te,\\&e^\top y \ge n-\kappa . \end{aligned} \end{aligned}$$
(5)

Again it is easy to see that NLP(0) corresponds to the original problem (3) and that one can eliminate the constraint \(x \circ y \ge -te\) if \(x \ge 0\) is known. A comparison of the feasible sets for a pair \((x_i,y_i)\) under both regularization techniques is given in Fig. 1.

Fig. 1 Illustration of the constraints \(0 \le y_i \le 1, \ x_iy_i = 0\) and the two regularizations. a Complementarity constraints. b Kanzow–Schwartz regularization. c Scholtes regularization
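To illustrate how the regularization is used in practice, the following is a minimal sketch of the homotopy loop: each NLP(\(t^k\)) is solved by a general-purpose SQP method, warm-started from the previous solution, for a decreasing sequence \(t^k \downarrow 0\). The toy instance (n = 3, κ = 1, a separable quadratic objective, no further constraints g, h) and the choice of scipy's SLSQP solver are our own illustrative assumptions, not the setup used in the paper's experiments.

```python
import numpy as np
from scipy.optimize import minimize

n, kappa = 3, 1
c = np.array([1.0, 0.5, 0.1])   # toy objective: min ||x - c||^2 s.t. ||x||_0 <= kappa

def solve_nlp_t(t, z0):
    """Solve one regularized problem NLP(t) with variables z = (x, y)."""
    constraints = [
        # e^T y >= n - kappa
        {"type": "ineq", "fun": lambda z: z[n:].sum() - (n - kappa)},
        # -t e <= x∘y <= t e
        {"type": "ineq", "fun": lambda z: t - z[:n] * z[n:]},
        {"type": "ineq", "fun": lambda z: t + z[:n] * z[n:]},
    ]
    bounds = [(None, None)] * n + [(0.0, 1.0)] * n      # 0 <= y <= e
    res = minimize(lambda z: np.sum((z[:n] - c) ** 2), z0, method="SLSQP",
                   bounds=bounds, constraints=constraints,
                   options={"maxiter": 200})
    return res.x

# Warm-started homotopy for t^k ↓ 0; the start (x, y) = (c, (n-kappa)/n e)
# is feasible for the first regularized problem.
z = np.concatenate([c, np.full(n, (n - kappa) / n)])
for t in [1.0, 0.3, 0.1, 0.03, 0.01, 3e-3, 1e-3, 3e-4, 1e-4]:
    z = solve_nlp_t(t, z)

x, y = z[:n], z[n:]
# In the limit, (x, y) is (approximately) feasible for (3), so x has at
# most kappa components of non-negligible size.
```

Note that nothing here exploits the structure of the problem; the point of Theorem 2 below is that the limit of such a sequence of KKT points is S-stationary under CC-MFCQ.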

Now consider a sequence of regularized problems NLP(\(t^k\)) for \(t^k \downarrow 0\) and assume that we can calculate a sequence of corresponding KKT points \((x^k,y^k)\), which is converging to some limit \((x^*,y^*)\). Then one easily sees that the limit \((x^*,y^*)\) is feasible for the original problem (3). However, the question remains whether \((x^*,y^*)\) is some kind of stationary point of (3), too. In the classical MPCC case, one can only prove that the limit of such a sequence is a C-stationary point and there exist examples illustrating that this result is sharp. Since C-stationary points and M-stationary points coincide for optimization problems with cardinality constraints, cf. [12], one would assume that we obtain M-stationarity of the limit here. However, it turns out that the limit is in fact even S-stationary.

This result is all the more surprising since it was shown in [11] that the adaptation of the Kanzow–Schwartz regularization retains the convergence properties known from MPCCs, i.e. it converges to M-stationary points for both MPCCs and cardinality constrained problems. To be precise, however, the results for the Kanzow–Schwartz regularization were shown under a constant positive linear dependence constraint qualification, which is weaker than the MFCQ condition used for the Scholtes regularization here.

Theorem 2

Let \((t^k)_{k}\) be a sequence with \(t^k>0\) for all \(k\in \mathbb {N}\) and \(t^k\downarrow 0\). Let \((x^k,y^k)_{k}\) be a sequence of KKT points of NLP(\(t^k\)) converging to \((x^*,y^*)\). If CC-MFCQ holds at \((x^*,y^*)\), then \((x^*,y^*)\) is an S-stationary point of (3).

Proof

Note that \((x^*,y^*)\) is a feasible point of (3). Since \((x^k,y^k)_{k}\) is a sequence of KKT points of NLP(\(t^k\)), there are multipliers \((\lambda ^k,\mu ^k,{{\tilde{\gamma }}}^k,\delta ^k,\nu ^k)\) for all \(k\in \mathbb {N}\) such that

$$\begin{aligned}&\nabla f(x^k)+\sum \limits _{i=1}^m\lambda _i^k \nabla g_i(x^k)+\sum \limits _{i=1}^p\mu _i^k\nabla h_i(x^k)+\sum \limits _{i=1}^n{{\tilde{\gamma }}}_i^ky_i^ke_i = 0,&\end{aligned}$$
(6a)
$$\begin{aligned}&-\delta ^ke+\sum \limits _{i=1}^n\nu _i^ke_i+\sum \limits _{i=1}^n{{\tilde{\gamma }}}_i^kx_i^ke_i = 0,&\end{aligned}$$
(6b)
$$\begin{aligned}&\lambda _i^k{\left\{ \begin{array}{ll} \ge 0, &{}\quad \text {if }\; g_i(x^k)=0,\\ = 0, &{}\quad \text {otherwise}, \end{array}\right. } \quad \forall \; i=1,\dots ,m, \end{aligned}$$
(6c)
$$\begin{aligned}&\delta ^k{\left\{ \begin{array}{ll} \ge 0, &{}\quad \text {if }\; e^\top y^k = n-\kappa ,\\ = 0, &{} \quad \text {otherwise}, \end{array}\right. }&\end{aligned}$$
(6d)
$$\begin{aligned}&{{\tilde{\gamma }}}_i^k{\left\{ \begin{array}{ll} \ge 0,&{}\quad \text {if }\; x_i^k\cdot y_i^k=t^k,\\ \le 0,&{}\quad \text {if }\;x_i^k\cdot y_i^k=-\,t^k,\\ =0,&{}\quad \text {otherwise,} \end{array}\right. } \quad \forall \;i=1,\dots ,n, \end{aligned}$$
(6e)
$$\begin{aligned}&\nu _i^k{\left\{ \begin{array}{ll} \le 0,&{}\quad \text {if }\; y_i^k=0,\\ \ge 0,&{}\quad \text {if }\;y_i^k=1,\\ =0,&{}\quad \text {otherwise,} \end{array}\right. }\quad \forall \; i=1,\dots ,n. \end{aligned}$$
(6f)

Let us first have a closer look at the KKT conditions (6). A componentwise inspection of Eq. (6b) yields

$$\begin{aligned} \delta ^k=\nu _i^k+{{\tilde{\gamma }}}_i^kx_i^k \end{aligned}$$

for all \(i=1,\dots ,n\). The sign restrictions on \({{\tilde{\gamma }}}^k\) imply \({{\tilde{\gamma }}}_i^k\cdot x_i^k\ge 0\). Assuming there is an index \(i\in \{1,\dots ,n\}\) with \(\nu _i^k<0\), it follows that \(y_i^k=0\) and then using (6e) also \({{\tilde{\gamma }}}_i^k=0\). Thus the above equation yields \(0>\nu _i^k=\delta ^k\ge 0\) which is a contradiction. Consequently we have

$$\begin{aligned} \nu _i^k\ge 0\quad \forall \; i=1,\dots ,n. \end{aligned}$$
(7)

In case \(\delta ^k>0\) we have

$$\begin{aligned} 0<\delta ^k=\nu _i^k+{{\tilde{\gamma }}}_i^k x_i^k \end{aligned}$$
(8)

for all \(i=1,\dots ,n\). Then \(\nu _i^k>0\) or \({{\tilde{\gamma }}}_i^kx_i^k>0\) has to hold for all \(i=1,\dots ,n\), which is true if and only if

$$\begin{aligned} y_i^k=1\quad \text {or}\quad y_i^k=\frac{t^k}{|x_i^k|}. \end{aligned}$$
(9)

For all \(k \in \mathbb {N}\) define

$$\begin{aligned} \gamma _i^k:= {{\tilde{\gamma }}}_i^ky_i^k \quad \forall \;i=1,\dots ,n. \end{aligned}$$

Boundedness of the multipliers \((\lambda ^k,\mu ^k,\gamma ^k)_{k}\): We show this by contradiction. Thus, assume that \(\lim _{k\rightarrow \infty } \Vert (\lambda ^k,\mu ^k,\gamma ^k) \Vert =\infty \). Then the sequence

$$\begin{aligned} \left( \frac{(\lambda ^k,\mu ^k,\gamma ^k)}{\Vert (\lambda ^k,\mu ^k,\gamma ^k) \Vert }\right) _{k\in \mathbb {N}} \end{aligned}$$

is bounded and without loss of generality let it converge to some limit

$$\begin{aligned} 0\not =({{\bar{\lambda }}},{{\bar{\mu }}},{{\bar{\gamma }}}):=\lim \limits _{k\rightarrow \infty }\frac{(\lambda ^k,\mu ^k,\gamma ^k)}{\Vert (\lambda ^k,\mu ^k,\gamma ^k) \Vert }. \end{aligned}$$

Clearly, \({{\bar{\lambda }}}\ge 0\). Further, for all i such that \(g_i(x^*) < 0\) we have \(g_i(x^k) < 0\) and thus also \(\lambda _i^k = 0\) for all k sufficiently large. That is, we have \({{\mathrm{supp}}}({{\bar{\lambda }}})\subseteq I_g(x^*)\).

Next, to proceed with obtaining a contradiction, let us show that \({{\mathrm{supp}}}({{\bar{\gamma }}}) \subseteq I_0(x^*)\). Assume, to the contrary, that there is an index \(j\in \{1,\dots ,n\}\), such that \(x_j^*\not =0\) and \({{\bar{\gamma }}}_j\not =0\). Then we have \(y_j^*=0\) and consequently

$$\begin{aligned} x_j^k\not =0,\quad y_j^k < 1 \end{aligned}$$

for sufficiently large k. Since \({{\bar{\gamma }}}_j\not =0\) we have \(\gamma _j^k\not =0\) and hence \({{\tilde{\gamma }}}_j^k\not =0\) for all k sufficiently large. This implies \(\delta ^k=\nu _j^k+{{\tilde{\gamma }}}_j^kx_j^k>0\) and thus \(\delta ^k=\nu _i^k+{{\tilde{\gamma }}}_i^kx_i^k > 0\) for all \(i=1,\dots ,n\). Due to the KKT conditions, \(\delta ^k > 0\) is only possible if

$$\begin{aligned} e^\top y^k=n-\kappa \end{aligned}$$
(10)

for sufficiently large k. Furthermore, for sufficiently large k, \({{\tilde{\gamma }}}^k_j \ne 0\) implies

$$\begin{aligned} 0 < y_j^k=\frac{t^k}{|x^k_j|} \,\, \text { and } \,\, \nu _j^k=0{.} \end{aligned}$$
(11)

As \(y_j^k\rightarrow y_j^*=0\) and \(y_j^k > 0\) hold for k sufficiently large, the sequence \((y_j^k)_k\) is strictly monotonically decreasing (at least on a suitable subsequence). Moreover, since \(e^\top y^k=n-\kappa \) for all k sufficiently large (and \(y^k\) is a finite-dimensional vector), strict monotone decrease of \((y_j^k)_k\) implies the existence of an index l such that the sequence \((y_l^k)_k\) is strictly monotonically increasing (possibly on a suitable subsequence) and compensates the decrease of \((y_j^k)_k\) in such a way that \(e^\top y^k=n-\kappa \) is preserved. Thus, we have

$$\begin{aligned} y_l^*> 0,\ x_l^* = 0 \quad \text {and}\quad y_l^k<1,\ \nu _l^k=0\ \text {for sufficiently large }k. \end{aligned}$$

Invoking (9) and \(y_l^k<1\) for sufficiently large k, we have

$$\begin{aligned} y_l^k=\frac{t^k}{|x_l^k|}. \end{aligned}$$
(12)

Since \(\nu _j^k=\nu _l^k=0\), relations (8), (11) and (12) imply

$$\begin{aligned} \frac{|\gamma _j^k|}{|\gamma _l^k|}=\frac{|{{\tilde{\gamma }}}_j^k\cdot y_j^k|}{|{{\tilde{\gamma }}}_l^k\cdot y_l^k|} =\frac{\left| \frac{\delta ^k}{x_j^k}\cdot \frac{t^k}{|x_j^k|}\right| }{\left| \frac{\delta ^k}{x_l^k}\cdot \frac{t^k}{|x_l^k|}\right| } =\left( \frac{x_l^k}{x_j^k}\right) ^2 \xrightarrow [k\rightarrow \infty ]{} \left( \frac{x_l^*}{x_j^*}\right) ^2 = 0. \end{aligned}$$

This leads to the contradiction

$$\begin{aligned} 0\not =|{{\bar{\gamma }}}_j|=\lim \limits _{k\rightarrow \infty }\frac{|\gamma _j^k|}{\Vert (\lambda ^k,\mu ^k,\gamma ^k) \Vert } \le \lim \limits _{k\rightarrow \infty }\frac{|\gamma _j^k|}{|\gamma _l^k|}=0, \end{aligned}$$

which concludes the proof of \({{\mathrm{supp}}}({{\bar{\gamma }}})\subseteq I_0(x^*)\).

Now, dividing (6a) by \(\Vert (\lambda ^k,\mu ^k,\gamma ^k) \Vert \) and taking the limit \(k\rightarrow \infty \) yields

$$\begin{aligned} \sum \limits _{i\in I_g(x^*)}{{\bar{\lambda }}}_i \nabla g_i(x^*)+\sum \limits _{i=1}^p{{\bar{\mu }}}_i\nabla h_i(x^*)+\sum \limits _{i\in I_0(x^*)}{{\bar{\gamma }}}_ie_i = 0. \end{aligned}$$

However, this, together with \({\bar{\lambda }}\ge 0\), \({{\mathrm{supp}}}(\bar{\lambda }) \subseteq I_g(x^*)\), \({{\mathrm{supp}}}({{\bar{\gamma }}}) \subseteq I_0(x^*)\), and \(({{\bar{\lambda }}},{{\bar{\mu }}},{{\bar{\gamma }}})\not =0\), is in contradiction with the assumption of CC-MFCQ at \((x^*, y^*)\). Thus, the sequence of multipliers \((\lambda ^k,\mu ^k,\gamma ^k)_{k}\) is bounded and without loss of generality we can assume that the whole sequence \((\lambda ^k,\mu ^k,\gamma ^k)_{k}\) converges to some limit

$$\begin{aligned} (\lambda ^*,\mu ^*,\gamma ^*):=\lim \limits _{k\rightarrow \infty }(\lambda ^k,\mu ^k,\gamma ^k). \end{aligned}$$

Taking the limit in (6a) as \(k\rightarrow \infty \), we obtain

$$\begin{aligned} \nabla f(x^*)+\sum \limits _{i\in I_g(x^*)}\lambda ^*_i \nabla g_i(x^*)+\sum \limits _{i=1}^p\mu ^*_i\nabla h_i(x^*)+\sum \limits _{i=1}^n\gamma ^*_ie_i = 0. \end{aligned}$$

S-stationarity of \((x^*,y^*)\) together with the multipliers \(\lambda ^*,\mu ^*,\gamma ^*\): Using analogous arguments as previously, we have \(\lambda ^* \ge 0\) and \({{\mathrm{supp}}}(\lambda ^*) \subseteq I_g(x^*)\).

To complete the proof, it remains to show that \(y_i^* = 0\) implies \( \gamma ^*_i = 0\). Again, to the contrary, assume that there exists an index j such that \(y_j^* = 0\) and \( \gamma ^*_j \ne 0\). This implies \( \gamma ^k_j = {{\tilde{\gamma }}}^k_j y^k_j \ne 0\) for all k sufficiently large, and the sequence \((y^k_j)_k\) with

$$\begin{aligned} 0 < y^k_j = \frac{t^k}{|x^k_j|}, \end{aligned}$$

is strictly monotonically decreasing to zero (at least on a suitable subsequence). Thus, we have \(x_j^k \ne 0\) and \(\nu ^k_j = 0\) for all k sufficiently large, which together with \(\gamma ^k_j \ne 0\) implies \(\delta ^k = {{\tilde{\gamma }}}^k_jx^k_j > 0\) and \(e^\top y^k = n-\kappa \) for all k sufficiently large. Analogously to the previous part of the proof, there has to exist an index l such that \((y^k_l)_k\) is strictly increasing and

$$\begin{aligned} 0< y^k_l = \frac{t^k}{|x^k_l|}. \end{aligned}$$

This implies \(y^k_l \rightarrow y^*_l > 0\) and \(\nu ^k_l = 0\), and thus \(\delta ^k = {{\tilde{\gamma }}}^k_l x^k_l\) and \(x^k_l \ne 0\) for all k sufficiently large. Finally, this together with \(\gamma _i^k = {{\tilde{\gamma }}}^k_i y^k_i\) for all i yields

$$\begin{aligned} \frac{|\gamma ^*_j|}{|\gamma ^*_l|} = \lim \limits _{k\rightarrow \infty }\frac{|\gamma _j^k|}{|\gamma _l^k|} = \lim \limits _{k\rightarrow \infty }\frac{|{{\tilde{\gamma }}}_j^ky_j^k|}{|{{\tilde{\gamma }}}_l^ky_l^k|} = \lim \limits _{k\rightarrow \infty }\frac{\left| \frac{\delta ^k}{x_j^k}y_j^k\right| }{\left| \frac{\delta ^k}{x_l^k}y_l^k\right| } = \lim \limits _{k\rightarrow \infty }\left( \frac{y_j^k}{y_l^k}\right) ^2 = \left( \frac{y_j^*}{y_l^*}\right) ^2 = 0, \end{aligned}$$

which contradicts \( \gamma ^*_j \ne 0\). This completes the proof. \(\square \)

In the MPCC case it was shown that limit points of the Scholtes regularization are C-stationary not only if one computes exact KKT points of the regularized problems but also if one only computes approximate KKT points, see [37]. An interesting question is of course whether the Scholtes-type regularization for cardinality constrained problems inherits this favorable property. Unfortunately, the above line of argument cannot be transferred directly to a sequence of approximate KKT points. The proof relies heavily on the complementarity between the inequality constraints and their corresponding multipliers to show that \({{\mathrm{supp}}}(\gamma ) \subseteq I_0(x^*)\), see Eqs. (9) and (12). Thus, at the moment, the convergence properties of the Scholtes-type regularization in the inexact case remain an open question.

The previous proof differs slightly from the one typically used in the MPCC case, see e.g., [34], since CC-MFCQ does not demand positive linear independence of the gradients of the constraints with respect to y. This is the reason why we normalized only the multipliers \(\lambda ^k, \mu ^k, \gamma ^k\) corresponding to constraints on x in the previous proof. The drawback of this approach is that the verification of the correct support of the limit of \(\gamma ^k\) becomes lengthier. The central idea of exploiting the fact \(e^\top y^k = n- \kappa \) was borrowed from [1], where a similar structure was used to reformulate chance constrained optimization problems.

Instead of normalizing only the multipliers \(\lambda ^k, \mu ^k, \gamma ^k\) one can also normalize all multipliers \(\lambda ^k, \mu ^k, \gamma ^k, \nu ^k, \delta ^k\). This simplifies the verification of the correct support of \(\gamma ^k\) but makes it harder to obtain a contradiction to CC-MFCQ. We follow this route in the proof of the next result, where we show that the regularized problems NLP(t) indeed have better properties than the original problem (3).

Theorem 3

Let \((x^*,y^*)\) be feasible for (3) and CC-MFCQ hold there. Then there is a neighborhood \({\mathcal {U}}\) of \((x^*,y^*)\) such that for all \(t > 0\) the standard MFCQ for NLP(t) holds at every \((x,y)\in {\mathcal {U}}\) feasible for NLP(t).

Proof

Let us assume that the claim is false. Then there exist sequences \((x^k,y^k)_k \rightarrow (x^*,y^*)\) and \((t^k)_k\) with \(t^k > 0\) such that \((x^k,y^k)\) is feasible for NLP(\(t^k\)) but MFCQ is violated there. Consequently, we can find multipliers \((\lambda ^k,\mu ^k,{{\tilde{\gamma }}}^k,\delta ^k,\nu ^k)_k\) such that

$$\begin{aligned} (\lambda ^k,\mu ^k,{{\tilde{\gamma }}}^k,\delta ^k,\nu ^k) \not =0\quad \forall \; k\in \mathbb {N}, \end{aligned}$$
(13)

along with condition

$$\begin{aligned} \sum \limits _{i=1}^m\lambda _i^k \nabla g_i(x^k)+\sum \limits _{i=1}^p\mu _i^k\nabla h_i(x^k)+\sum \limits _{i=1}^n{{\tilde{\gamma }}}_i^ky_i^ke_i = 0, \end{aligned}$$
(14)

and (6b)–(6f) are satisfied. This means the relevant gradients are positively linearly dependent at \((x^k,y^k)\), hence MFCQ is violated. Analogously to the proof of Theorem 3.1, we can show that \(\nu _i^k < 0\) cannot occur, thus we have to consider \(\nu _i^k \ge 0\) only.

Now, let us define \(\gamma _i^k:={{\tilde{\gamma }}}_i^ky_i^k\) for all \(i=1,\dots ,n\) and all \(k\in \mathbb {N}\). Because for all \(i=1,\dots ,n\) and all \(k\in \mathbb {N}\) we have \({{\tilde{\gamma }}}_i^k\not =0\) only if \(y_i^k\not =0\), we immediately obtain

$$\begin{aligned} {{\mathrm{supp}}}(\gamma ^k) = {{\mathrm{supp}}}({{\tilde{\gamma }}}^k) \quad \forall \;k \in \mathbb {N}. \end{aligned}$$

Using (6e) we have

$$\begin{aligned} {{\tilde{\gamma }}}_i^kx_i^k= {\left\{ \begin{array}{ll} \gamma _i^k\cdot \frac{x_i^k}{y_i^k},&{}\quad \text {if }\; {|x_i^k\cdot y_i^k|=t_k},\\ 0,&{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

Therefore, we can write Eqs. (14) and (6b) for all \(k\in \mathbb {N}\) as

$$\begin{aligned}&\sum \limits _{i=1}^m\lambda _i^k \nabla g_i(x^k)+\sum \limits _{i=1}^p\mu _i^k\nabla h_i(x^k)+\sum \limits _{i=1}^n\gamma _i^ke_i = 0, \end{aligned}$$
(15)
$$\begin{aligned}&{\delta ^k = {\left\{ \begin{array}{ll} \nu _i^k + \gamma _i^k\frac{x_i^k}{y_i^k} &{}\quad \text {if }\; |x_i^k\cdot y_i^k|=t_k, \\ \nu _i^k &{}\quad \text {otherwise}. \end{array}\right. }} \end{aligned}$$
(16)

Due to the sign restrictions in (6e) we know \({{\tilde{\gamma }}}_i^kx_i^k\ge 0\) for all \(i=1,\dots ,n\) and all \(k\in \mathbb {N}\). Hence we also have

$$\begin{aligned} \gamma _i^kx_i^k={{\tilde{\gamma }}}_i^kx_i^ky_i^k\ge 0\quad \forall \; i=1,\dots ,n,\quad \forall \; k\in \mathbb {N}. \end{aligned}$$
(17)

We proceed to deduce a contradiction with CC-MFCQ at \((x^*,y^*)\). Since by assumption \((\lambda ^k,\mu ^k,{{\tilde{\gamma }}}^k,\delta ^k,\nu ^k) \not =0\) for all \(k\in \mathbb {N}\), we can choose the multipliers without loss of generality such that \(\Vert (\lambda ^k,\mu ^k,\gamma ^k,\delta ^k,\nu ^k) \Vert =1\) for all \(k\in \mathbb {N}\) and that the whole sequence converges to some limit

$$\begin{aligned} 0\not =(\lambda ,\mu ,\gamma ,\delta ,\nu ):=\lim \limits _{k\rightarrow \infty }(\lambda ^k,\mu ^k,\gamma ^k,\delta ^k,\nu ^k). \end{aligned}$$

We have \(\lambda \ge 0\). Since for all i such that \(g_i(x^*) < 0\) we know \(g_i(x^k) < 0\) and thus \(\lambda _i^k = 0\) for all k sufficiently large, we have

$$\begin{aligned} {{\mathrm{supp}}}(\lambda )\subseteq I_g(x^*). \end{aligned}$$
(18)

We will prove \({{\mathrm{supp}}}(\gamma )\subseteq I_0(x^*)\) by contradiction. To this end we assume that there is an index \(j\in \{1,\dots ,n\}\) such that \(\gamma _j\not =0\) and \(x_j^*\not =0\). This implies \(|x_j^k\cdot y_j^k|=t_k\) for sufficiently large k and, by the feasibility of \((x^*,y^*)\) for (3), \(y_j^*=0\). Therefore we know \(x_j^k\not =0\) and \(y_j^k>0\) for k sufficiently large and \(y_j^k\rightarrow 0\) for \(k\rightarrow \infty \). Thus we have \(y_j^k<1\) and hence \(\nu _j^k=0\) for sufficiently large k. Keeping in mind (17), it follows from the j-th row of (16) that

$$\begin{aligned} \delta ^k=\nu _j^k+\gamma _j^k\frac{x_j^k}{y_j^k}=\gamma _j^k\frac{x_j^k}{y_j^k}\longrightarrow \infty \quad (k\rightarrow \infty ). \end{aligned}$$

Because \((\lambda ^k,\mu ^k,\gamma ^k,\delta ^k,\nu ^k)_{k }\) is convergent, this is a contradiction. Consequently we have

$$\begin{aligned} {{\mathrm{supp}}}(\gamma )\subseteq I_0(x^*). \end{aligned}$$
(19)

It remains to show that \((\lambda ,\mu ,\gamma )\not =0\). We prove this also by contradiction. Assume that \((\lambda ,\mu ,\gamma )=0\). Since \((\lambda ,\mu ,\gamma ,\delta ,\nu )\not =0\), this implies \((\delta ,\nu )\not =0\). Due to \(\nu ^k \ge 0\) and (17) combined with (16), we know \(\delta ^k \ge \max _{i=1,\ldots ,n}\nu _i^k\), and thus \((\delta ,\nu )\not =0\) implies \(\delta > 0\) and \(\delta ^k > 0\) for all k sufficiently large. This is only possible if \(e^\top y^k = n - \kappa \) for all k large. For all i with \(y_i^* > 0\) we know \(x_i^* = 0\), and thus \(\gamma = 0\) implies

$$\begin{aligned} 0 < \delta = \lim _{k \rightarrow \infty } \left( \nu _i^k + \gamma _i^k \frac{x_i^k}{y_i^k}\right) = \nu _i. \end{aligned}$$

Hence, for all i such that \(y_i^* > 0\) we know \(y_i^k = 1\) for all k sufficiently large and therefore \(y_i^* = 1\). Due to \(e^\top y^k = n- \kappa < n\) for all k large, there exists at least one index j such that \(y_j^k = 0\) for all k large and consequently \(\nu _j^k = 0\) and \(|x_j^k \cdot y_j^k| \ne t_k\). This, however, implies \(\delta ^k = 0\), a contradiction.

Thus the assumption \((\lambda ,\mu ,\gamma )=0\) is false and we have

$$\begin{aligned} (\lambda ,\mu ,\gamma )\not =0. \end{aligned}$$

Using (18) and (19), it follows from (15) for \(k\rightarrow \infty \) that

$$\begin{aligned} \sum \limits _{i\in I_g(x^*)}\lambda _i \nabla g_i(x^*)+\sum \limits _{i=1}^p\mu _i\nabla h_i(x^*)+\sum \limits _{i\in I_0(x^*)}\gamma _ie_i = 0. \end{aligned}$$
(20)

Since \((\lambda ,\mu ,\gamma )\not =0\) and \(\lambda \ge 0\), this is a contradiction to CC-MFCQ. \(\square \)

Regarding the existence of a solution of NLP(t), as well as the existence of limit points of a sequence of KKT points of NLP(\(t^k\)) as \(t^k\downarrow 0\), we refer the reader to [9]. In that article, second-order optimality conditions for (3) are established and used to address these points.

4 Robust risk measures for portfolio optimization under distribution ambiguity

In this section we want to provide an application for the abstract cardinality constrained optimization problems discussed in the previous sections. To do so, we consider the following portfolio optimization problem:

$$\begin{aligned} \begin{aligned} \min _x \,\,\,&r(x) \\ \text {s.t. }&e^\top x = 1, \\&0 \le x \le u,\\&\Vert x\Vert _0 \le \kappa . \end{aligned} \end{aligned}$$
(21)

Here, we consider a market with n risky financial assets. The disposable wealth is to be allocated into a portfolio \(x \in \mathbb {R}^n\), where each component \(x_i\) denotes the fraction of disposable wealth invested in the i-th asset for \(i=1, \dots , n\). We do not allow short-sales, i.e. we assume \(x \ge 0\). For numerical purposes, we assume that there exist upper bounds \(u \ge 0\) on the possible investment. If no upper bounds are present, one can use \(u = e\) due to the budget constraint \(e^\top x = 1\), which demands that the whole disposable budget be invested. Additionally, we introduce the cardinality constraint \(\Vert x\Vert _0 \le \kappa \), i.e. one may invest in at most \(\kappa \) assets. Naturally, we assume that \(\kappa < n\).

For a vector x of allocations to n risky assets and a random vector \(\xi \) of return rates for these assets, we consider the following loss function

$$\begin{aligned} \ell (x,\xi ) = - x^\top \xi . \end{aligned}$$

Assume that \(\xi \) follows a probability distribution \(\pi \) from the ambiguity (uncertainty) set \(D = \{\pi \mid E_\pi [\xi ] = \mu ,\ {{\mathrm{\textit{Cov}}}}_\pi (\xi ) = Q \succ 0 \}\) of distributions with expected value \(\mu \) and positive definite covariance matrix Q. Note that Markowitz [42] considered the variance \(\sigma ^2(x) = x^\top Q x\) as the risk measure associated with a portfolio x.

In the 1990s, the investment bank J.P. Morgan adapted the quantile risk measure (quantile premium principle), long used by actuaries, to investment banking, giving rise to Value-at-Risk (VaR). Associated with a confidence level \(\beta \) and a portfolio x, VaR is defined as

$$\begin{aligned} {{\mathrm{VaR}}}_\beta (x) = \min \{z |P_\pi ( \ell (x, \xi ) \le z) \ge \beta \}. \end{aligned}$$

Artzner et al. [2] defined a coherent risk measure as one satisfying monotonicity, translation invariance, subadditivity and positive homogeneity. It is known that VaR is not a coherent risk measure, as it fails subadditivity. On the other hand, the Conditional Value-at-Risk (CVaR), as introduced by Rockafellar and Uryasev [50], turns out to be a convex and coherent risk measure. CVaR at level \(\beta \) is defined as the expected loss given that the loss exceeds \({{\mathrm{VaR}}}_\beta (x)\). Alternatively, Rockafellar and Uryasev [50] showed that CVaR and VaR can be computed simultaneously by minimizing the following auxiliary function with respect to \(\alpha \in \mathbb R\)

$$\begin{aligned} F_\beta (x,\alpha ) = \alpha + \frac{1}{1-\beta }E[(\ell (x,\xi )-\alpha )_+], \end{aligned}$$

where \((v)_+ = \max \{0, v\}\). Thus,

$$\begin{aligned} {{\mathrm{CVaR}}}_\beta (x) = \min _\alpha F_\beta (x, \alpha ) \end{aligned}$$

and \({{\mathrm{VaR}}}_\beta (x)\) is the left endpoint of the interval \({{\mathrm{argmin}}}_\alpha F_\beta (x, \alpha )\).
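For an equiprobable loss sample, the identity \({{\mathrm{CVaR}}}_\beta (x) = \min _\alpha F_\beta (x, \alpha )\) and the characterization of \({{\mathrm{VaR}}}_\beta (x)\) as the left endpoint of the argmin interval can be checked directly, since \(\alpha \mapsto F_\beta (x,\alpha )\) is piecewise linear and convex with breakpoints at the observed losses. The following is a minimal Python sketch using only the standard library; the function name is ours, not taken from any cited implementation:

```python
import random

def var_cvar(losses, beta):
    """Empirical VaR and CVaR of an equiprobable loss sample via the
    auxiliary function F(alpha) = alpha + E[(loss - alpha)_+] / (1 - beta)."""
    S = len(losses)

    def F(alpha):
        return alpha + sum(max(z - alpha, 0.0) for z in losses) / ((1.0 - beta) * S)

    # F is piecewise linear and convex, so its minimum is attained at a
    # breakpoint, i.e. at one of the observed losses; VaR is the left
    # endpoint of the (possibly non-degenerate) argmin interval.
    cvar = min(F(a) for a in losses)
    var = min(a for a in losses if abs(F(a) - cvar) <= 1e-9)
    return var, cvar

random.seed(0)
losses = [random.gauss(0.0, 1.0) for _ in range(2000)]
var90, cvar90 = var_cvar(losses, 0.90)
```

When \((1-\beta )S\) is an integer, the computed CVaR coincides with the average of the \((1-\beta )S\) worst losses, and the computed VaR with the empirical \(\beta \)-quantile of the sample.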

Let us assume normality of the returns \(\xi \). Denote by \(\phi \) and \(\varPhi \) the density and the cumulative distribution function of the standard normal distribution, respectively. Following Fabozzi et al. [24], originating in Rockafellar and Uryasev [50], the Value-at-Risk can be expressed as

$$\begin{aligned} {{\mathrm{VaR}}}_\beta (x) = \zeta _\beta \sqrt{x^\top Q x} - \mu ^\top x, \end{aligned}$$
(22)

where \(\zeta _\beta = -\varPhi ^{-1}(1-\beta )\), and assuming \(\beta > 0.5\), the Conditional Value-at-Risk reduces to

$$\begin{aligned} {{\mathrm{CVaR}}}_\beta (x) = \eta _\beta \sqrt{x^\top Q x} - \mu ^\top x, \end{aligned}$$
(23)

where \(\eta _\beta = \frac{-\int _{-\infty }^{\varPhi ^{-1}(1-\beta )}t\phi (t)\,dt}{1-\beta } = \frac{\phi (\varPhi ^{-1}(1-\beta ))}{1-\beta }\); the second equality follows from \(\phi '(t) = -t\phi (t)\).

Further, we consider the worst case (robust) VaR for a fixed x with respect to the ambiguity set D defined as

$$\begin{aligned} {{\mathrm{RVaR}}}_\beta (x) = \sup _{\pi \in D} {{\mathrm{VaR}}}_\beta (x). \end{aligned}$$

Analogously, we consider the worst case (robust) CVaR for a fixed x with respect to set D defined as

$$\begin{aligned} {{\mathrm{RCVaR}}}_\beta (x) = \sup _{\pi \in D} {{\mathrm{CVaR}}}_\beta (x) = \sup _{\pi \in D} \min _\alpha F_\beta (x, \alpha ). \end{aligned}$$

Based on Chen et al. [15, proof of Theorem 2.9], further generalized in Paç and Pınar [46] using Shapiro [54, Theorem 2.4], we obtain the following representations of the robust risk measures under distribution ambiguity,

$$\begin{aligned} {{\mathrm{RVaR}}}_\beta (x) = \frac{2\beta -1}{2\sqrt{\beta (1-\beta )}} \sqrt{x^\top Q x} - \mu ^\top x \end{aligned}$$
(24)

and

$$\begin{aligned} {{\mathrm{RCVaR}}}_\beta (x) = \sqrt{\frac{\beta }{1-\beta }} \sqrt{x^\top Q x} - \mu ^\top x. \end{aligned}$$
(25)

In the following section we consider cardinality constrained portfolio selection models for each of the risk measures (22)–(25) replacing the general risk function r(x) in (21).
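The coefficients of \(\sqrt{x^\top Q x}\) in (22)–(25), i.e. the quantities collected in Table 1, are simple closed-form functions of \(\beta \). The following Python sketch computes them using only the standard library (the helper name risk_constants is ours); the expression for \(\eta _\beta \) uses the identity \(-\int _{-\infty }^{a} t\phi (t)\,dt = \phi (a)\):

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # phi = STD_NORMAL.pdf, Phi^{-1} = STD_NORMAL.inv_cdf

def risk_constants(beta):
    """Coefficients of sqrt(x'Qx) in the risk measures (22)-(25)."""
    q = STD_NORMAL.inv_cdf(1.0 - beta)                             # Phi^{-1}(1-beta)
    zeta = -q                                                      # VaR under normality, (22)
    eta = STD_NORMAL.pdf(q) / (1.0 - beta)                         # CVaR under normality, (23)
    rvar = (2.0 * beta - 1.0) / (2.0 * sqrt(beta * (1.0 - beta)))  # robust VaR, (24)
    rcvar = sqrt(beta / (1.0 - beta))                              # robust CVaR, (25)
    return zeta, eta, rvar, rcvar

zeta, eta, rvar, rcvar = risk_constants(0.95)
```

For \(\beta = 0.95\) this gives \(\zeta _\beta \approx 1.645\), \(\eta _\beta \approx 2.063\), and robust coefficients of approximately 2.065 and \(\sqrt{19} \approx 4.359\); the robust coefficients dominate their nominal counterparts, as expected.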

The approach introduced in this paper can also be applied to investment problems with CVaR under a finite discrete distribution of the random returns. Consider S equiprobable scenarios \(\xi ^s\) of the random rates of return \(\xi \), for example \(S=1000\), which can be obtained, e.g., by simulation, leading to the so-called sample (average) approximation technique. Let

$$\begin{aligned} X = \{x:\ \mu ^\top x \ge \rho ,\ e^\top x = 1,\ 0 \le x \le {u},\ \Vert x\Vert _0 \le \kappa \} \end{aligned}$$

denote the set of feasible portfolio weights including the sparsity and minimal expected return. Then, the CVaR problem can be formulated as

$$\begin{aligned} \begin{aligned} \min _{x, \alpha , v} \,\,\,&\alpha + \frac{1}{(1-\beta )S} \sum _{s=1}^{S} v_s\\ \quad {\text {s.t.}}\quad&v_s \ge - x^\top \xi ^s - \alpha ,\ s = 1, \dots , S,\\&v_{s} \ge 0,\ s = 1, \dots , S,\\&x \in X, \end{aligned} \end{aligned}$$
(26)

where the nonnegative auxiliary variables \(v_s\) are used instead of the positive parts, see Rockafellar and Uryasev [50, 51] for details. The only nonlinear part of the problem is the sparsity constraint, which can be replaced by the regularizations. However, our numerical experiments revealed serious numerical difficulties for problems with more than 50 assets and 1000 scenarios. In particular, the applied solver SNOPT (see [30, 31]) struggled with precision as the regularization parameter t decreased. Therefore we decided not to include these numerical results here and consider this issue a topic for future research, which will employ a decomposition approach.

5 Numerical comparison of different solution methods

In this section, we compare the performance of the Scholtes regularization method introduced in this paper for cardinality constrained optimization problems with the Kanzow–Schwartz regularization from [11], the direct application of an NLP solver to the continuous reformulation (3), and the solution of (2) with a mixed-integer solver. We test all four approaches on the investment problems described in the previous section with the VaR and CVaR measures under the normality assumption as well as the robust VaR and CVaR. Moreover, we consider each problem for several levels of \(\beta \); in particular, we select \(\beta \in \{0.9, 0.95, 0.99 \}\). Table 1 contains the values of the corresponding quantiles and generalized quantiles, which appear in the exact reformulations of the risk measures.

Table 1 Quantiles and generalized quantiles as defined in (22)–(25)

For the cardinality constraint we set \(\kappa =10\). We use 90 simulated instances with mean vectors and covariance matrices, which were already employed by [26] and are freely available at the website [27]. The generation of the data was described by [47]. For each number of assets \(n = 200, 300\) and 400, the dataset includes 30 different problems.

We compare the performance of the following solution approaches:

GUROBI_60: Solve the mixed-integer formulation (2) using the commercial mixed-integer solver GUROBI, version 6.5, with a time limit of 60 s and start vectors \(x^0 = 0\), \(z^0 = e\).

GUROBI_300_40: Same as above but with a time limit of 300 s and a node limit of 40.

Relaxation_01: Solve the continuous reformulation (3) directly using the sparse SQP solver SNOPT, version 7.5, with start vectors \(x^0 = 0\), \(y^0 = e\).

Relaxation_00: Same as above but with start vectors \(x^0 = 0\), \(y^0 = 0\).

Scholtes_01: Solve a sequence of Scholtes regularizations (5) using SNOPT with start vectors \(x^0 = 0\), \(y^0 = e\).

Scholtes_00: Same as above but with start vectors \(x^0 = 0\), \(y^0 = 0\).

KanzowSchwartz_01: Solve a sequence of Kanzow–Schwartz regularizations (4) using SNOPT with start vectors \(x^0 = 0\), \(y^0 = e\).

KanzowSchwartz_00: Same as above but with start vectors \(x^0 = 0\), \(y^0 = 0\).

All computations were done in MATLAB R2014a. Before we discuss the results, let us state a few details on the implementation of the respective solution approaches:

More information on the solver GUROBI and its various options can be found at [33]. To be able to solve the mixed-integer problem (2) with GUROBI, we had to reformulate it in the following form:

$$\begin{aligned} \begin{aligned} \min _{x,z,w,v} \,\,\,&c_\beta v - \mu ^\top x \\ \text {s.t. }&e^\top x = 1,\\&0 \le x \le u \circ z,\\&z \in \{0, 1\}^n,\\&e^\top z \le \kappa ,\\&v \ge 0,\\&w = Q^{\frac{1}{2}} x,\\&v^2 \ge w^\top w, \end{aligned} \end{aligned}$$

where \(c_\beta \) is the respective constant from Table 1 for the different risk measures. Since we used \(x^0 = 0\) as start vector, we also used \(w^0 = 0\) and \(v^0 = 0\).

Note that GUROBI is a global solver, i.e. it tries to verify that a candidate solution is indeed a global minimum. Since the other solution approaches do not provide any guarantee of finding a global solution, we set the option mipfocus to 1 in order to encourage GUROBI to find good solutions fast. Additionally, we initially set the option timelimit to 60 s. However, we observed that GUROBI sometimes spent the whole time looking for a feasible solution without moving to the branch-and-bound tree. Thus we increased timelimit to 300 s and added a limit on the maximal number of computed nodes, \(\texttt {nodelimit} = 40\), to obtain results less dependent on slight variations in computation time.

The continuous reformulation (3) and the regularized problems (4) and (5) were all solved using the sparse SQP method SNOPT, see [30, 31] for more information. We started both regularization methods with an initial parameter \(t^0 = 1\) and decreased \(t^k\) in each iteration by a factor of 0.01. Both regularization methods were terminated if either \(\Vert x^k \circ y^k\Vert _\infty \le 10^{-6}\) or \(t^k < 10^{-8}\).
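The outer loop shared by both regularization methods is simple to state. The following Python sketch mirrors the parameter schedule and stopping rules just described; solve_nlp is a hypothetical callback standing in for a call to SNOPT on the regularized problem with parameter \(t^k\), returning a KKT point \((x^k, y^k)\):

```python
def scholtes_loop(solve_nlp, t0=1.0, sigma=0.01, comp_tol=1e-6, t_min=1e-8):
    """Outer regularization loop: solve NLP(t^k) for the decreasing sequence
    t^{k+1} = sigma * t^k, stopping once the complementarity residual
    ||x o y||_inf <= comp_tol or t^k < t_min."""
    t = t0
    while True:
        x, y = solve_nlp(t)  # hypothetical NLP solver call (SNOPT in our experiments)
        comp = max(abs(xi * yi) for xi, yi in zip(x, y))
        if comp <= comp_tol or t < t_min:
            return x, y, t
        t *= sigma

# Toy illustration: a "solver" whose complementarity residual shrinks with t.
x, y, t_final = scholtes_loop(lambda t: ([0.5 * t], [1.0]))
```

Note that the termination test is applied after each solve, so the loop also stops gracefully when the complementarity residual never drops below the tolerance.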

The constraints \(e^\top x = 1\) and \(0 \le x \le u\) were usually satisfied in the solutions \(x^*\) found by all methods (except for GUROBI, which occasionally did not return a feasible solution at all, see below). In order to check whether the cardinality constraint \(\Vert x\Vert _0 \le \kappa \) is also satisfied, we counted the number of components with \(x^*_i > 10^{-6}\).
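This feasibility check can be sketched as a small helper of our own, using the same \(10^{-6}\) threshold for nonzero components:

```python
def check_portfolio(x, u, kappa, tol=1e-6):
    """Verify a computed portfolio against the constraints of (21):
    budget e'x = 1, bounds 0 <= x <= u, cardinality ||x||_0 <= kappa.
    Components above tol count as nonzero."""
    budget_ok = abs(sum(x) - 1.0) <= tol
    bounds_ok = all(-tol <= xi <= ui + tol for xi, ui in zip(x, u))
    nnz = sum(1 for xi in x if xi > tol)
    return budget_ok and bounds_ok and nnz <= kappa
```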

Table 2 contains results for a particular problem with 400 assets (pard400-e-400). We can see that GUROBI running for 60 s was not able to provide a feasible solution for the problem with \({{\mathrm{RVaR}}}_{0.99}\). The Scholtes regularization starting from \(x^0 = 0\), \(y^0= e\) was not successful for \({{\mathrm{RCVaR}}}_{0.95}\). However, in all other cases the Scholtes regularization starting from \(x^0 = 0\), \(y^0 = e\) provided the best solution with a runtime of around 1 s. In Table 2 we also report the relative gap

$$\begin{aligned} \text {relative gap} = (f-f_\text {best})/f_\text {best}, \end{aligned}$$

where f is the objective value obtained by an algorithm and \(f_\text {best}\) denotes the lowest objective value for a problem computed by any of the compared algorithms.

Table 2 Results for a problem with 400 assets (pard400-e-400)

Summary results for all problems are reported in Tables 3, 4 and 5. For each risk measure, level \(\beta \), number of assets and algorithm, we report the following descriptive statistics over the 30 problem instances: the average relative gap with respect to the minimal objective value, the average computation time (in seconds), the number of cases in which the algorithm found the best solution, and the number of cases in which the result was infeasible with respect to the sparsity or orthogonality constraints. All computations were done on two computers with comparable performance indicators. Nonetheless, the given computation times should only be used for a qualitative comparison of the methods.

Table 3 Results for 30 instances with 200 assets
Table 4 Results for 30 instances with 300 assets
Table 5 Results for 30 instances with 400 assets

If we consider the initial value \(x^0 = 0\), \(y^0 = e\), it can be observed that the best results were obtained by the Scholtes regularization (Alg. 5). Whenever the results of this regularization were feasible, they corresponded to the best obtained solutions. However, in a few cases the portfolios obtained by the Scholtes regularization, and in more cases those obtained by the Kanzow–Schwartz regularization (Alg. 7), were infeasible. Moreover, SNOPT applied directly to the continuous reformulation (Alg. 3) performed poorly, showing an average relative gap greater than 100%.

To further investigate this behavior, we changed the starting point of the continuous reformulation and of both regularizations to \(x^0 = 0\), \(y^0 = 0\). In this case, the obtained optimal values were slightly worse for both regularizations, but the number of infeasible solutions was reduced. Moreover, for the starting point \(x^0 = 0\), \(y^0 = 0\), the behavior of the continuous reformulation approach (Alg. 4) improved significantly, making it fully comparable with the regularizations.

Figures 2, 3 and 4 present performance plots introduced by Dolan and Moré [22] for each problem size and algorithm. We identified the minimal objective value for each problem found by any of the eight considered algorithms and then compared it with the remaining objective values using the ratio actual objective value/minimal objective value. The graphs report the relative number of problems (y-value) for which this ratio is lower than or equal to the x-value. We prefer algorithms with a curve close to the upper-left corner, i.e. those which produce good and feasible solutions. Since infeasible runs are assigned an infinite objective value, not all algorithm curves reach the upper bound 1. This is the case for the regularized problems with \(x^0 = 0\), \(y^0 = e\) for all problem sizes. For the largest problems with 400 assets, even GUROBI with the 60 s limit and the Kanzow–Schwartz regularization starting from \(x^0 = 0\), \(y^0 = 0\) were not able to reach the upper bound 1.

Fig. 2

Performance plot of the objective function values for \(n = 200\) assets

Fig. 3

Performance plot of the objective function values for \(n = 300\) assets

Fig. 4

Performance plot of the objective function values for \(n = 400\) assets

6 Conclusions

We adapted the Scholtes regularization method to optimization problems with cardinality constraints and proved its convergence to S-stationary points, which is a stronger result than the corresponding one known for MPCCs. Additionally, we verified that the regularized problems have better properties than the original one in terms of constraint qualifications. We discussed several risk measures for portfolio optimization under the assumption of normality and under distribution ambiguity. Finally, we compared several solution algorithms applied to these portfolio optimization problems and showed that the Scholtes regularization can keep up with the commercial solver GUROBI, at least when fast, possibly only locally optimal, solutions are sought.

Sufficient conditions for the existence and convergence of the iterates of the Scholtes-type regularization method can be found in [9]. Future research will be devoted to developing a global solution strategy based on multiple starting points and combinations of the proposed methods. A further subject of future research is the convergence behavior of the regularization method if one only computes a sequence of inexact KKT points.