1 Introduction

In many decision-making problems, such as those arising in relief network design, homeland security budget allocation, and financial management, there are multiple random performance measures of interest. In such problems, comparing the potential decisions requires specifying preference relations among random vectors, where each dimension of a vector corresponds to a performance measure (or decision criterion). Moreover, it is often crucial to take decision makers’ risk preferences into account. Incorporating stochastic multivariate preference relations into optimization models is a fairly recent research area. The existing models feature benchmarking preference relations as constraints, requiring the decision-based random vectors to be preferred (according to the specified preference rules) to some benchmark random vectors. The literature mainly focuses on multivariate risk-averse preference relations based on second-order stochastic dominance (SSD) or conditional value-at-risk (CVaR).

The SSD relation has received significant attention due to its correspondence with risk-averse preferences [10]. In this regard, the majority of existing studies on optimization models with multivariate risk constraints extend the univariate SSD rule to the multivariate case. In this line of research, scalar-based preferences are extended to vector-valued random variables by considering a family of linear scalarization functions and requiring that all scalarized versions of the random vectors conform to the specified univariate preference relation. Scalarization coefficients can be interpreted as weights representing the subjective importance of each decision criterion. Thus, the scalarization approach is closely related to the weighted-sum method, which is widely used in multicriteria decision making (see, e.g., [8]). In such decision-making situations, enforcing a preference relation over a family of scalarization vectors allows the representation of a wider range of views and the differing opinions of multiple experts (for motivating discussions see, e.g., [15]). Dentcheva and Ruszczyński [5] consider linear scalarization with all nonnegative coefficients (this set can equivalently be restricted to the unit simplex), and provide a theoretical background for multivariate SSD-constrained problems. On the other hand, Homem-de-Mello and Mehrotra [12] and Hu et al. [14] allow arbitrary polyhedral and convex scalarization sets, respectively.

Optimization models with univariate SSD constraints can be formulated as linear programs with a potentially large number of scenario-dependent variables and constraints (see, e.g., [4, 16, 18]). While efficient cut generation methods can be employed to solve such large-scale linear programs [6, 9, 20], enforcing these constraints for infinitely many scalarization vectors poses additional challenges. For finite probability spaces, Homem-de-Mello and Mehrotra [12] show that infinitely many risk constraints (associated with polyhedral scalarization sets) reduce to finitely (typically exponentially) many scalar-based risk constraints in the SSD case, naturally leading to a finitely convergent cut generation algorithm. However, such an algorithm is computationally demanding, as it requires the iterative solution of non-convex (difference of convex functions) cut generation subproblems. The authors formulate the cut generation problem as a binary mixed-integer program (MIP) by linearizing the piecewise linear shortfall terms, and develop a branch-and-cut algorithm. They also propose concavity and convexity inequalities, and a big-M improvement method within the branch-and-cut tree to strengthen the MIP. However, it appears that for practical applications the authors directly solve the MIP formulation of the cut generation problem [13, 14]. In another line of work, Dentcheva and Wolfhagen [7] use methods from difference of convex (DC) programming to perform cut generation for the multivariate SSD-constrained problem. The authors also provide a finite representation of the multivariate SSD relation when the decisions are taken in a finite-dimensional space, even if the probability space is not finite.

A few studies [1, 11] consider the multivariate SSD relation based on multidimensional utility functions instead of using scalarization functions. The resulting models enforce stricter dominance relations (than those based on the scalarization approach), but they can be formulated as linear programs, and hence are computationally more tractable. On the other hand, the scalarization approach allows us to use univariate SSD constraints, which are less conservative than the multivariate version, and also offers the flexibility to control the degree of conservatism by varying the scalarization sets. However, the scalarization-based multivariate SSD relation can still be overly conservative in practice and can lead to infeasible formulations. As an alternative, Noyan and Rudolf [17] propose the use of constraints based on coherent risk measures, which provide sufficient flexibility to yield feasible problem formulations while still capturing a broad range of risk preferences. In particular, they focus on the widely applied risk measure CVaR, and replace the multivariate SSD relation by a collection of multivariate CVaR constraints at various confidence levels. This is a very natural relaxation due to the well-known fact that the univariate SSD relation is equivalent to a continuum of \({\text {CVaR}}\) inequalities [4]; we note that a similar idea also led to a cutting plane algorithm for optimization models with univariate SSD constraints [2]. Noyan and Rudolf [17] define the multivariate CVaR constraints based on polyhedral scalarization sets; as a result, their modeling approach strikes a good balance between tractability and flexibility. They show that, as for the SSD-constrained counterpart, it is sufficient to consider finitely many scalarization vectors, and they propose a finitely convergent cut generation algorithm. The corresponding cut generation problem has a DC programming structure, as in the SSD case, with similar MIP reformulations involving big-M type constraints. In addition, the authors utilize alternative optimization representations of CVaR to develop MIP formulations of the cut generation problem for the polyhedral CVaR-constrained problem.

Despite these algorithmic developments, solving the MIP formulations of the cut generation problems becomes a computational bottleneck as the number of scenarios increases. According to the results presented in Hu et al. [13] and Noyan and Rudolf [17], cut generation generally accounts for at least 90–95 % of the total solution time. The DC functions encountered in the cut generation problems have a polyhedral structure that can be exploited to devise enhanced and easy-to-implement models. Motivated by these observations, this paper contributes to the literature by providing more effective and easy-to-implement methods to solve the cut generation problems arising in optimization under multivariate polyhedral SSD and CVaR constraints. For SSD-constrained problems, the cut generation problems naturally decompose by scenarios, and the main difficulty is the weakness of the MIP formulation involving big-M type constraints. A similar difficulty arises in CVaR-constrained problems; in this case, however, an additional challenge stems from the combinatorial structure required to identify the \(\alpha \)-quantile of the decision-based random variables. Therefore, this study is mainly dedicated to developing computationally efficient methods for the multivariate CVaR-constrained models, although we also describe how our results can be applied in the SSD case. As in previous studies, we focus on finite probability spaces, and our approaches can naturally be used in a framework based on sample average approximation.

In the next section, we present the general forms of the optimization models featuring multivariate polyhedral risk preferences as constraints. In Sect. 3, we study the cut generation problem arising in CVaR-constrained models. We give a new MIP formulation and several classes of valid inequalities that improve it. In addition, we propose variable fixing methods that are highly effective for certain classes of problems. The cut generation problem involves the epigraph of a piecewise linear concave function, which we refer to as a reverse concave set. We give the complete linear description of this non-convex substructure. In Sect. 4, we give analogous results for SSD-constrained models. We emphasize that the reverse concave sets featured in the CVaR and SSD cut generation problems are fundamental sets that may appear in other problems. In Sect. 5, we present our computational experiments on two data sets: a previously studied budget allocation problem and a set of randomly generated test instances. Our results show that the proposed methods lead to more effective cut generation-based algorithms for solving multivariate risk-constrained optimization models. We conclude the paper in Sect. 6.

2 Optimization with multivariate risk constraints

In this section, we present the general forms of the optimization models featuring multivariate CVaR and SSD constraints based on polyhedral scalarization. Before proceeding, we need to make a note of some conventions used throughout the paper. Larger values of random variables, as well as larger values of risk measures, are considered to be preferable. In this context, risk measures are often referred to as acceptability functionals, since higher values indicate less risky random outcomes. The set of the first n positive integers is denoted by \([n]= \{1,\ldots ,n\}\), while the positive part of a number \(x\in {\mathbbm {R}}\) is denoted by \([x]_+ = \max \{x, 0\}\). Throughout this paper, we assume that all random variables are defined on some finite probability spaces, and simplify our exposition accordingly when possible.

We consider a decision making problem where the multiple random performance measures associated with the decision vector \({\mathbf {z}}\) are represented by the random outcome vector \(G({\mathbf {z}})\). Let \((\varOmega ,2^\varOmega ,\mathcal {P})\) be a finite probability space with \(\varOmega =\{\omega _1,\ldots ,\omega _n\}\) and \(\mathcal {P}(\omega _i)=p_i\). The set of feasible decisions is denoted by Z and the random outcomes are determined according to the mapping \(G:Z\times \varOmega \rightarrow {\mathbbm {R}}^d\). Let \(f:Z\rightarrow {\mathbbm {R}}\) be a continuous objective function and \(C\subset {\mathbbm {R}}_+^d\) be a polytope of scalarization vectors. Considering the interpretation of the scalarization vectors and the fact that larger outcomes are preferred, we naturally assume that \(C\subseteq \{{\mathbf {c}}\in {\mathbbm {R}}_+^d:~\sum _{i\in [d]}c_i= 1\}\). Given the benchmark (reference) random outcome vector \({\mathbf {Y}}\) and the confidence level \(\alpha \in (0,1]\), the optimization problems involving the multivariate polyhedral CVaR and SSD constraints take, respectively, the following forms:

$$\begin{aligned} \left( {\mathbf {G-MCVaR}}\right) {}&\,&\max \quad&f({\mathbf {z}}) \nonumber \\&\,&\text {s.t.}\quad&{\text {CVaR}}_{\alpha }({\mathbf {c}}^\top G({\mathbf {z}})) \ge {\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {Y}}),&\forall ~{\mathbf {c}}\in C,\\&\,&\,&{\mathbf {z}}\in Z.\nonumber \end{aligned}$$
(1)
$$\begin{aligned} \left( {\mathbf {G-MSSD}}\right) {}&\,&\max \quad&f({\mathbf {z}}) \nonumber \\&\,&\text {s.t.}\quad&{\mathbf {c}}^\top G({\mathbf {z}})\succeq _{_{(2)}} {\mathbf {c}}^\top {\mathbf {Y}},&\forall ~{\mathbf {c}}\in C, \\&\,&\,&{\mathbf {z}}\in Z,\nonumber \end{aligned}$$
(2)

where \(X\succeq _{_{(2)}}Y\) denotes that the univariate random variable X dominates Y in the second order. While \({\mathbf {Y}}\) is allowed to be defined on a probability space different from \(\varOmega \), it is often constructed from a benchmark decision \(\bar{{\mathbf {z}}}\in Z\), i.e., \({\mathbf {Y}}=G(\bar{{\mathbf {z}}})\). For ease of exposition, we present the formulations with a single multivariate risk constraint. However, we can also consider multiple benchmarks, multiple confidence levels, and varying scalarization sets.

According to the results on finite representations of the scalarization polyhedra, it is sufficient to consider finitely many scalarization vectors in (1) and (2). However, these vectors correspond to the vertices of some higher-dimensional polyhedra, and therefore, there are still potentially exponentially many scalarization-based risk constraints. A natural approach is to solve relaxations of the above problems obtained by replacing the set C with a finite subset (possibly even empty), which is then iteratively augmented with newly generated scalarization vectors. In this spirit, at each iteration of such a cut generation algorithm, given a current decision vector, we attempt to find a scalarization vector for which the corresponding risk constraint [of the form (1) or (2)] is violated. The resulting cut generation problem is the main focus of our study.

3 Cut generation for optimization with multivariate CVaR constraints

In this section, we first briefly describe the cut generation problem arising in optimization problems of the form \(\left( {\mathbf {G-MCVaR}}\right) \). Then we proceed to discuss the existing mathematical programming formulations of this cut generation problem, which constitute a basis for our new developments. The rest of the section is dedicated to the proposed, computationally more effective formulations and methods.

Consider an iteration of the cut generation-based algorithm (proposed in Noyan and Rudolf [17]), and let \({\mathbf {X}}=G({\mathbf {z}}^*)\) be the random outcome vector associated with the decision vector \({\mathbf {z}}^*\) obtained by solving the current relaxation of \(\left( {\mathbf {G-MCVaR}}\right) {}\). The aim is either to find a vector \({\mathbf {c}}\in C\) for which the corresponding univariate CVaR constraint (1) is violated, or to show that no such vector exists. In this regard, we solve the cut generation problem at confidence level \(\alpha \in (0,1]\) of the general form

$$\begin{aligned} \left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) {}\quad \quad \min \limits _{{\mathbf {c}}\in C} {\text {CVaR}}_{\alpha }\left( {\mathbf {c}}^\top {\mathbf {X}}\right) -{\text {CVaR}}_{\alpha }\left( {\mathbf {c}}^\top {\mathbf {Y}}\right) . \end{aligned}$$

Observe that \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) involves the minimization of the difference of two concave functions, because \({\text {CVaR}}_{\alpha }(X)\), given by the optimization representation of Rockafellar and Uryasev [19]

$$\begin{aligned} {\text {CVaR}}_{\alpha }(X)=\max \left\{ \eta -\frac{1}{\alpha }\mathbbm {E}\left( [\eta -X]_+\right) {:}~\eta \in {\mathbbm {R}}\right\} , \end{aligned}$$
(3)

is a concave function of a scalar-based random variable X. It is well known that the maximum in definition (3) is attained at the \(\alpha \)-quantile, also known as the value-at-risk at confidence level \(\alpha \) and denoted by \({\text {VaR}}_\alpha (X)\). If the optimal objective value of \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) is non-negative, it follows that \({\mathbf {z}}^*\) is an optimal solution of \(\left( {\mathbf {G-MCVaR}}\right) {}\). Otherwise, there exists an optimal solution \({\mathbf {c}}^*\in C\) for which the corresponding constraint \({\text {CVaR}}_{\alpha }({{\mathbf {c}}^*}^\top {\mathbf {X}})\ge {\text {CVaR}}_{\alpha }({{\mathbf {c}}^*}^\top {\mathbf {Y}})\) is violated by the current solution.

Note that we can easily calculate the realizations of the random outcome \({\mathbf {X}}=G({\mathbf {z}}^*)\) given the decision vector \({\mathbf {z}}^*\). In the rest of the paper, we focus on solving the cut generation problems given two d-dimensional random vectors \({\mathbf {X}}\) and \({\mathbf {Y}}\) with realizations \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_n\) and \({\mathbf {y}}_1,\ldots ,{\mathbf {y}}_m\), respectively. Let \(p_1,\ldots ,p_n\) and \(q_1,\ldots ,q_m\) denote the corresponding probabilities.
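For concreteness, the following minimal Python sketch (our illustration, with hypothetical data) computes \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) and \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) for a fixed scalarization vector \({\mathbf {c}}\), using the quantile definition of VaR and formula (3); the helper is reused in the sketches that follow.

```python
import numpy as np

def var_cvar(vals, probs, alpha):
    """VaR and CVaR of a discrete random variable with the given realizations.
    Larger outcomes are preferred, so the risk sits in the left tail."""
    order = np.argsort(vals)
    cum = np.cumsum(probs[order])
    var = vals[order][np.searchsorted(cum, alpha)]  # smallest value with cum. prob >= alpha
    cvar = var - (probs * np.maximum(var - vals, 0.0)).sum() / alpha
    return var, cvar

# hypothetical data: n = 8 scenarios, d = 3 criteria, fixed scalarization c
rng = np.random.default_rng(1)
n, d, alpha = 8, 3, 0.3
x = rng.uniform(0.0, 10.0, (n, d))       # realizations x_1, ..., x_n of X
p = np.full(n, 1.0 / n)                  # probabilities p_1, ..., p_n
c = np.array([0.5, 0.3, 0.2])
scal = x @ c                             # realizations of c^T X
var, cvar = var_cvar(scal, p, alpha)

# sanity check against (3): restrict eta to the realizations of c^T X
cvar3 = max(eta - (p * np.maximum(eta - scal, 0.0)).sum() / alpha for eta in scal)
assert np.isclose(cvar, cvar3)
print(f"VaR = {var:.4f}, CVaR = {cvar:.4f}")
```

The final check already reflects the key observation of Sect. 3.1 below: the maximum in (3) may be restricted to the realizations of \({\mathbf {c}}^\top {\mathbf {X}}\).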

3.1 Existing mathematical programming formulations

In this section, we present one of the existing mathematical programming formulations of \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \). The second nonlinear term (\(-{\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {Y}})\)) in \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) can be expressed with linear inequalities and continuous variables because it involves the maximization of a piecewise linear concave function (see (3)). What makes \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) difficult to solve is the minimization of the first concave term (\({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\)). Using two alternative optimization representations of CVaR, Noyan and Rudolf [17] first formulate \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) as a (generally nonconvex) quadratic program. Then, instead of dealing with the quadratic problem, the authors propose MIP formulations, which are potentially more tractable.

Note that for finite probability spaces, \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})={\mathbf {c}}^\top {\mathbf {x}}_k\) for at least one \(k\in [n]\), implying

$$\begin{aligned} {\text {CVaR}}_{\alpha }\left( {\mathbf {c}}^\top {\mathbf {X}}\right)&= {\text {VaR}}_{\alpha }\left( {\mathbf {c}}^\top {\mathbf {X}}\right) -\frac{1}{\alpha }\sum \limits _{i\in [n]}p_i[{\text {VaR}}_{\alpha }\left( {\mathbf {c}}^\top {\mathbf {X}}\right) -{\mathbf {c}}^\top {\mathbf {x}}_i]_+ \end{aligned}$$
(4)
$$\begin{aligned}&=\max \limits _{k\in [n]}\left\{ {\mathbf {c}}^\top {\mathbf {x}}_k-\frac{1}{\alpha }\sum \limits _{i\in [n]}p_i[{\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\right\} . \end{aligned}$$
(5)

This key observation leads to the following formulation of \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) [17]:

$$\begin{aligned} \left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) {}&\,&\min \quad&\mu -\eta +\frac{1}{\alpha }\sum \limits _{l\in [m]} q_lw_{l}\nonumber \\&\,&\text {s.t.}\quad&w_l\ge \eta -{\mathbf {c}}^\top {\mathbf {y}}_l,&\qquad \quad \forall ~l\in [m], \end{aligned}$$
(6)
$$\begin{aligned}&\,&\,&{\mathbf {c}}\in C,~{\mathbf {w}}\in {\mathbbm {R}}_+^m,\end{aligned}$$
(7)
$$\begin{aligned}&\,&\,&\mu \ge {\mathbf {c}}^\top {\mathbf {x}}_k-\frac{1}{\alpha }\sum \limits _{i\in [n]} p_iv_{ik},&\qquad \quad \forall ~k\in [n],\end{aligned}$$
(8)
$$\begin{aligned}&\,&\,&v_{ik}-\delta _{ik}={\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i,&\qquad \quad \forall ~i\in [n],~k\in [n],\end{aligned}$$
(9)
$$\begin{aligned}&\,&\,&v_{ik}\le M_{ik}\beta _{ik},&\qquad \forall ~i\in [n],~k\in [n],\end{aligned}$$
(10)
$$\begin{aligned}&\,&\,&\delta _{ik}\le \hat{M}_{ik}(1-\beta _{ik}),&\qquad \quad \forall ~i\in [n],~k\in [n],\end{aligned}$$
(11)
$$\begin{aligned}&\,&\,&\beta _{ik}\in \{0,1\},&\qquad \quad \forall ~i\in [n],~k\in [n],\end{aligned}$$
(12)
$$\begin{aligned}&\,&\,&{\mathbf {v}}\in {\mathbbm {R}}_+^{n \times n},\quad \varvec{\delta }\in {\mathbbm {R}}_+^{n \times n}. \end{aligned}$$
(13)

Here, the continuous variables \(\eta \) and \({\mathbf {w}}\) together with the linear inequalities (6) are used to express \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {Y}})\) according to (3). On the other hand, \(\mu \) represents \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) according to the relation (5), which can be incorporated into the model using the following non-convex constraint

$$\begin{aligned} \mu \ge {\mathbf {c}}^\top {\mathbf {x}}_k-\frac{1}{\alpha }\sum \limits _{i\in [n]}p_i\left[ {\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i\right] _+, \quad \forall k\in [n]. \end{aligned}$$

This non-convex constraint corresponds to the epigraph of a piecewise linear concave function, and the variables \(v_{ik}\) and \(\delta _{ik}\) are introduced to linearize the shortfall terms \([{\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\). In addition, \(M_{ik}\) and \(\hat{M}_{ik}\) are sufficiently large constants (big-M coefficients) that make constraints (10) and (11) redundant whenever \(\beta _{ik}=1\) or \(\beta _{ik}=0\), respectively. Due to constraints (10)–(13), at most one of the variables \(v_{ik}\) and \(\delta _{ik}\) can be positive. Then, constraint (9) ensures that \(v_{ik}=[{\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\) for all pairs of i and k. A similar linearization is used for the SSD case described in Sect. 4.

Remark 3.1

(Big-M Coefficients) It is well known that the choice of the big-M coefficients is crucial in obtaining stronger MIP formulations. In \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) {}\), we can set

$$\begin{aligned} M_{ik}= & {} \max \left\{ \max \limits _{{\mathbf {c}}\in C}~\left\{ {\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i\right\} ,0\right\} \text { and }\\ \hat{M}_{ik}= & {} M_{ki}=\max \left\{ \max \limits _{{\mathbf {c}}\in C}~\left\{ {\mathbf {c}}^\top {\mathbf {x}}_i-{\mathbf {c}}^\top {\mathbf {x}}_k\right\} ,0\right\} . \end{aligned}$$

These parameters can easily be obtained by solving very simple LPs. Furthermore, in practical applications, the dimension of the decision vector \({\mathbf {c}}\) and the number of vertices of the polytope C would be small; e.g., in the homeland security problem in our computational study \(d=4\). Suppose that the vertices of the polytope C are known and given as \(\{\hat{{\mathbf {c}}}_1,\ldots ,\hat{{\mathbf {c}}}_N\}\). Then, \(M_{ik}=\max \{\max _{j\in [N]}\hat{{\mathbf {c}}}_j^\top ({\mathbf {x}}_k-{\mathbf {x}}_i),0\}.\)
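As an illustration of Remark 3.1, the following sketch (continuing the snippet above, with a hypothetical vertex set V of C) computes all \(M_{ik}\) by enumeration over the vertices:

```python
# continuing the sketch above; hypothetical vertices of C (here N = 3, d = 3)
V = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [1/3, 1/3, 1/3]])

diff = x[None, :, :] - x[:, None, :]           # diff[i, k, :] = x_k - x_i
# M[i, k] = M_{ik} = max{ max_j  c_hat_j^T (x_k - x_i), 0 }
M = np.maximum(np.einsum('ikl,jl->ikj', diff, V).max(axis=2), 0.0)
M_row, M_col = M.max(axis=1), M.max(axis=0)    # M_{i*} and M_{*i} (see Sect. 3.2.2)
```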

In the special case when all the outcomes of \({\mathbf {X}}\) are equally likely, Noyan and Rudolf [17] propose an alternate MIP formulation which involves only O(n) binary variables instead of \(O(n^2)\). We refer the reader to [17] for the complete formulation of this special MIP, which is referred to as \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \) in our study. In the next section, we develop new formulations and methods based on integer programming approaches. We focus only on the general-probability case; it turns out that even these general formulations perform better than \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) {}\), as we show in Sect. 5.

3.2 New developments

In this section, we first propose several simple improvements to the existing MIP formulations. Then, we introduce a MIP formulation based on a new representation of VaR. We propose valid inequalities that strengthen the resulting MIPs. We also give the complete linear description of the linearization polytope of a non-convex substructure appearing in the new formulation.

3.2.1 Computational enhancements

We first present valid inequalities based on the bounds for \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\), and then describe two approaches to reduce the number of variables and constraints of \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \).

Bounds on \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). Suppose that we have a lower bound \(L_\mu \) and an upper bound \(U_\mu \) for \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). Then, \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \) can be strengthened using the following valid inequalities:

$$\begin{aligned} L_\mu \le \mu \le U_\mu . \end{aligned}$$
(14)

For example, consider two discrete random variables \(X_{\min }\) and \(X_{\max }\) with realizations \(\min _{{\mathbf {c}}\in C}\{{\mathbf {c}}^\top {\mathbf {x}}_i\},~i\in [n]\), and \(\max _{{\mathbf {c}}\in C}\{{\mathbf {c}}^\top {\mathbf {x}}_i\},~i\in [n]\), respectively. The random variable \(X_{\min }\) is no larger than \({\mathbf {c}}^\top {\mathbf {X}}\) with probability one for any \({\mathbf {c}}\in C\). Similarly, \(X_{\max }\) is no smaller than \({\mathbf {c}}^\top {\mathbf {X}}\) with probability one for any \({\mathbf {c}}\in C\). Therefore, we can set \(L_\mu \) and \(U_\mu \) as \({\text {CVaR}}_{\alpha }(X_{\min })\) and \({\text {CVaR}}_{\alpha }(X_{\max })\), respectively. Note that the calculation of the realizations of \(X_{\min }\) and \(X_{\max }\) requires solving 2n small (d-dimensional) LPs (one minimization and one maximization per scenario).
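A sketch of this bounding scheme, continuing the snippets above: when the vertices of C are known (Remark 3.1), the small LPs reduce to an enumeration over the columns of the scalarization matrix.

```python
# continuing the sketches above: scalarized realizations over the vertices of C
sc = x @ V.T                                    # sc[i, j] = c_hat_j^T x_i
x_min, x_max = sc.min(axis=1), sc.max(axis=1)   # realizations of X_min and X_max
L_mu = var_cvar(x_min, p, alpha)[1]             # lower bound in (14)
U_mu = var_cvar(x_max, p, alpha)[1]             # upper bound in (14)
L = var_cvar(x_min, p, alpha)[0]                # VaR bounds, used for the
U = var_cvar(x_max, p, alpha)[0]                # preprocessing described below
```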

Variable reduction using symmetry. We observe the symmetric relation between the \(\varvec{\delta }\) and \({\mathbf {v}}\) variables (\(\delta _{ik}=v_{ki}\) for all pairs of \(i\in [n]\) and \(k\in [n]\)), and substitute \(v_{ki}\) for \(\delta _{ik}\) to obtain a more compact formulation. In this regard, we only need to define \(\beta _{ik}\) for \(i,k \in [n]\) such that \(i<k\), and write constraints (9)–(11) for \(i,k \in [n]:~i<k\). Furthermore, we substitute \(M_{ki}\) for \(\hat{M}_{ik}\), and let \(v_{kk}=0\) in (8). We refer to the resulting simplified MIP as \(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \); the number of binary variables and constraints (9)–(11) associated with the shortfall terms is reduced by half. Furthermore, the linearization polytope defined by (9)–(13) can be strengthened using valid inequalities. In Sect. 4.2, we study the linearization polytope corresponding to \([{\mathbf {c}}^\top {\mathbf {x}}_k-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\) for a given pair \(i,k\in [n]\). This substructure also arises in the cut generation problems with multivariate SSD constraints.

Preprocessing. Let K be a set of scenarios for which \({\mathbf {c}}^\top {\mathbf {x}}_k\) cannot be equal to \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) for any \({\mathbf {c}}\in C\). Preprocessing methods can be used to identify the set K, which would allow us to enforce constraint (8) for a reduced set of scenarios \(k\in \bar{K}:=[n]{\setminus } K\). This would also result in a reduced number of variables and constraints (9)–(13) that are used to represent the shortfall terms. In particular, we need to define the variables \(v_{ik}\) only for all \(k \in \bar{K}, i\in [n]\) and for \(i \in \bar{K}, k\in {K}\). In addition, we define variables \(\beta _{ik}\) and constraints (9)–(11) for \(i,k\in \bar{K}, i<k\) and for \(k \in \bar{K}, i\in {K}\) (note that due to the elimination of some of the v variables, the symmetry argument does not hold for the latter condition, so we do not have the restriction that \(i<k\) unless \(i,k\in \bar{K}\)). We refer to the resulting more compact MIP, which also involves (14), as \(\left( {\mathbf {RSMIP}}\_{\mathbf {CVaR}}\right) \).

Next, we elaborate on how to identify K, which yields the reduced set of scenarios \(\bar{K}\). Recall that we focus on the left tail of the probability distributions; for example, under equal probabilities, \({\text {VaR}}_{b/n}({\mathbf {c}}^\top {\mathbf {X}})\) is the bth smallest realization of \({\mathbf {c}}^\top {\mathbf {X}}\), where b is a small integer. Thus, scalarized realizations \({\mathbf {c}}^\top {\mathbf {x}}_k\) that are guaranteed to be relatively large cannot correspond to \({\text {VaR}}_{b/n}({\mathbf {c}}^\top {\mathbf {X}})\). In line with this observation, we use the next proposition to identify the set \(\bar{K}=[n]{\setminus } K\).

Proposition 3.1

Suppose that the parameters \(M_{ki}\) are calculated as described in Remark 3.1. For a scenario index \(k\in [n]\), let \(L_k=\{i \in [n]{\setminus } {k}{:}~M_{ki}=0\}\) and \(H_k=\{i \in [n] {\setminus } {k}{:}~M_{ik}=0\}\). If \(\sum _{i \in L_k}p_i \ge \alpha \) then \({\mathbf {c}}^\top {\mathbf {x}}_k= {\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) cannot hold for any \({\mathbf {c}}\in C\), implying \(k \in {K}\). Moreover, \(i \in {K}\) for all \(i \in H_k\).

Proof

Note that for any \(k\in [n]\) and \(i\in L_k\), \(M_{ki}=0\) implies that \({\mathbf {c}}^\top {\mathbf {x}}_i\le {\mathbf {c}}^\top {\mathbf {x}}_k\) for all \({\mathbf {c}}\in C\). Thus, the first claim immediately follows from the following VaR definition: Let \({\mathbf {c}}^\top {\mathbf {x}}_{(1)}\le {\mathbf {c}}^\top {\mathbf {x}}_{(2)}\le \cdots \le {\mathbf {c}}^\top {\mathbf {x}}_{(n)}\) denote an ordering of the realizations of \({\mathbf {c}}^\top {\mathbf {X}}\) for a given \({\mathbf {c}}\). Then, for a given confidence level \(\alpha \in (0,1]\),

$$\begin{aligned} {\text {VaR}}_\alpha \left( {\mathbf {c}}^\top {\mathbf {X}}\right) ={\mathbf {c}}^\top {\mathbf {x}}_{(k)}, \text { where } k=\min \left\{ j \in [n]:\sum _{i\in [j]}p_{(i)}\ge \alpha \right\} . \end{aligned}$$
(15)

Similarly, the second claim holds because \(L_k\subseteq L_i\) for all \(i \in H_k\). \(\square \)

Note that if for some \(k\in [n]\) we have non-empty sets \(L_k\) or \(H_k\), then we can employ variable fixing by letting \(\beta _{ik}=1\), \(\beta _{ki}=0\) for \(i\in L_k\) and \(\beta _{ik}=0\), \(\beta _{ki}=1\) for \(i\in H_k\). Another method can utilize the bounds on \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) while identifying the set \(\bar{K}\). Suppose that we have a lower bound L and an upper bound U for \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). If \(\max _{{\mathbf {c}}\in C}{\mathbf {c}}^\top {\mathbf {x}}_k <L\) or \(\min _{{\mathbf {c}}\in C}{\mathbf {c}}^\top {\mathbf {x}}_k > U\), then \(k \notin \bar{K}\). Similar to the case of CVaR, we can calculate the bounds L and U using the random variables \(X_{\min }\) and \(X_{\max }\): \(L={\text {VaR}}_{\alpha }(X_{\min })\) and \(U={\text {VaR}}_{\alpha }(X_{\max })\).
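The following sketch, continuing the snippets above, applies Proposition 3.1 together with the VaR-bound test to identify K; the variable names are ours, for illustration only.

```python
# continuing the sketches above (M, sc, p, alpha, L, U)
n = M.shape[0]
K = set()
for k in range(n):
    L_k = [i for i in range(n) if i != k and M[k, i] == 0.0]  # c^T x_i <= c^T x_k on C
    H_k = [i for i in range(n) if i != k and M[i, k] == 0.0]  # c^T x_i >= c^T x_k on C
    if sum(p[i] for i in L_k) >= alpha:                       # Proposition 3.1
        K.add(k)
        K.update(H_k)
# VaR-bound test: k cannot attain VaR if its scalarized value always lies
# below L or always lies above U
K |= {k for k in range(n) if sc[k].max() < L or sc[k].min() > U}
K_bar = sorted(set(range(n)) - K)                             # scenarios kept in (8)
```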

In our numerical study, we observed that the above methods can significantly improve the computational performance (see Sect. 5).

3.2.2 An alternative model based on a new representation of VaR

When the realizations depend on a decision, we cannot know their ordering in advance. While the structure of the objective function makes it easy to express VaR in the context of VaR or CVaR maximization, in our cut generation problem we need a new representation of VaR. Recall that we can use the classical definition of CVaR for the second CVaR term appearing in the objective function of \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \), but for the first CVaR term we need alternative representations of CVaR to develop new, computationally more efficient solution methods. The main challenge is to express \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\), which depends on \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). The next theorem provides a set of inequalities to calculate \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) when \({\mathbf {c}}\) is a decision vector. Before proceeding, we first introduce some big-M coefficients. Throughout the paper, we use the notation M to emphasize that the associated parameter is used in a big-M type variable upper bounding (VUB) constraint [see, e.g., \(M_{ik}\) defined in Remark 3.1 as the maximum possible value of \(v_{ik}=[{\mathbf {c}}^\top ({\mathbf {x}}_k-{\mathbf {x}}_i)]_+\) over all \({\mathbf {c}}\in C\), used in the VUB constraint (10)]. Let \(M_{i*}=\max _{k \in [n]}M_{ik}\) be the maximum possible value of \([{\mathbf {c}}^\top ({\mathbf {x}}_k-{\mathbf {x}}_i)]_+\) taken over all \(k\in [n]\) for a given \(i\in [n]\). Similarly, let \(M_{*i}=\max _{k \in [n]}M_{ki}\) for \(i\in [n]\). Finally, let \(\tilde{M}_\ell =\max \{c_\ell :~{\mathbf {c}}\in C\}\) for \(\ell \in [d]\) be the maximum possible value of \(c_\ell \) (note that \(\tilde{M}_\ell \le 1\) because C is a subset of the unit simplex).

Theorem 3.1

Suppose that \({\mathbf {X}}\) is a random vector with realizations \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_n\) and corresponding probabilities \(p_i,~i \in [n]\). For a given confidence level \(\alpha \) and any decision vector \({\mathbf {c}}\in C\), the equality \(z={\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) holds if and only if there exists a vector \((z,\varvec{\beta },\varvec{\zeta },{\mathbf {u}})\) satisfying the following system:

$$\begin{aligned}&z\le {\mathbf {c}}^\top {\mathbf {x}}_{i}+\beta _iM_{i*},&\qquad \quad i\in [n], \end{aligned}$$
(16)
$$\begin{aligned}&z\ge {\mathbf {c}}^\top {\mathbf {x}}_{i}-(1-\beta _i)M_{*i},&\qquad \quad i\in [n],\end{aligned}$$
(17)
$$\begin{aligned}&\sum \limits _{i\in [n]} p_i\beta _i \ge \alpha ,\end{aligned}$$
(18)
$$\begin{aligned}&\sum \limits _{i\in [n]}p_i\beta _i - \sum \limits _{i\in [n]}p_i u_i \le \alpha -\epsilon , \end{aligned}$$
(19)
$$\begin{aligned}&z=\sum \limits _{i\in [n]}\varvec{\zeta }_i^\top {\mathbf {x}}_i,\end{aligned}$$
(20)
$$\begin{aligned}&\zeta _{i\ell }\le \tilde{M}_\ell u_i,&\qquad \quad i\in [n],~\ell \in [d],\end{aligned}$$
(21)
$$\begin{aligned}&\sum \limits _{i\in [n]}\zeta _{i\ell }=c_\ell ,&\qquad \quad \ell \in [d],\end{aligned}$$
(22)
$$\begin{aligned}&\sum \limits _{i\in [n]} u_i=1,\end{aligned}$$
(23)
$$\begin{aligned}&u_i \le \beta _i,&\qquad \quad i\in [n],\end{aligned}$$
(24)
$$\begin{aligned}&\varvec{\beta }\in \{0,1\}^n,\quad \varvec{\zeta }\in {\mathbbm {R}}_+^{n \times d},\quad {\mathbf {u}}\in \{0,1\}^n. \end{aligned}$$
(25)

In constraint (19), \(\epsilon \) is a sufficiently small positive constant that ensures that the left-hand side is strictly smaller than \(\alpha \).

Proof

Suppose that \(z={\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) for a decision vector \({\mathbf {c}}\in C\). Let \(\pi \) be a permutation describing a non-decreasing ordering of the realizations of the random vector \({\mathbf {c}}^\top {\mathbf {X}}\), i.e., \({\mathbf {c}}^\top {\mathbf {x}}_{\pi (1)}\le \cdots \le {\mathbf {c}}^\top {\mathbf {x}}_{\pi (n)}\). Defining

$$\begin{aligned} k^*=\min \left\{ k\in [n]{:}~\sum \limits _{i\in [k]} p_{\pi (i)} \ge \alpha \right\} \quad \text {and}\quad K^*=\left\{ \pi (1),\ldots ,\pi \left( k^*\right) \right\} , \end{aligned}$$
(26)

and using (15) we have \(z={\mathbf {c}}^\top {\mathbf {x}}_{\pi (k^*)}\). Then, a feasible solution of (16)–(25) can be obtained as follows:

$$\begin{aligned} \beta _i=\left\{ \begin{array}{l@{\quad }l}1&{}i\in K^*\\ 0&{}\text {otherwise}\end{array} \right. ,\quad u_i=\left\{ \begin{array}{l@{\quad }l}1&{}i=k^*\\ 0&{}\text {otherwise}\end{array} \right. ,\quad \zeta _{i\ell }=\left\{ \begin{array}{l@{\quad }l}c_\ell &{}i=k^*\\ 0&{}\text {otherwise.}\end{array} \right. \end{aligned}$$

For the reverse implication, let us consider a feasible solution \((z,\varvec{\beta },\varvec{\zeta },{\mathbf {u}})\) of (16)–(25) and let \(\bar{K}=\{i \in [n]{:}~\beta _i=1\}\). To prove our claim, it is sufficient to show that there exists a permutation \(\pi \) where \(\bar{K}=K^*\) and \(z={\mathbf {c}}^\top {\mathbf {x}}_{\pi (k^*)}={\mathbf {c}}^\top {\mathbf {x}}_{\bar{k}}\) for a scenario index \(\bar{k}\in \mathop {\hbox {arg max}}_{i\in \bar{K}}\{{\mathbf {c}}^\top {\mathbf {x}}_i\}\) [\(K^*\) and \(k^*\) are defined as in (26)].

We first focus on the intermediate set of linear inequalities (16)–(19), (23)–(24), and the quadratic equality

$$\begin{aligned}&z=\sum \limits _{i\in [n]}u_i{\mathbf {c}}^\top {\mathbf {x}}_i. \end{aligned}$$
(27)

By the definition of \(\bar{K}\) and inequalities (16)–(17) we have \(z\le {\mathbf {c}}^\top {\mathbf {x}}_{i}\), \(i\in [n] {\setminus } \bar{K}\), and \(z\ge {\mathbf {c}}^\top {\mathbf {x}}_{i}\), \(i\in \bar{K}\). Since \(\beta _i=0\) for all \(i\in [n]{\setminus } \bar{K}\), (24) ensures that \(u_i=0\) for all \(i\in [n]{\setminus } \bar{K}\). Then, (23) and (24) guarantee that \(z=\sum _{i\in \bar{K}}u_i{\mathbf {c}}^\top {\mathbf {x}}_i={\mathbf {c}}^\top {\mathbf {x}}_{\bar{k}}\) for a scenario index \(\bar{k}\) such that \({\mathbf {c}}^\top {\mathbf {x}}_{\bar{k}}=\max _{i\in \bar{K}}\{{\mathbf {c}}^\top {\mathbf {x}}_i\}\). Thus, \(u_i=1\) for \(i=\bar{k}\), and 0, otherwise. Then, from (18) and (19), \({\mathcal {P}}({\mathbf {c}}^\top {\mathbf {X}}\le z)=\sum _{i\in \bar{K}} p_i\ge \alpha \) and \(\sum _{i\in \bar{K}{\setminus } {\bar{k}}}p_i <\alpha \). It follows that, according to the definition in (15), \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})={\mathbf {c}}^\top {\mathbf {x}}_{\bar{k}}=z\).

Since \({\mathbf {c}}\) is a decision vector, equality (27) involves quadratic terms of the form \(u_ic_\ell \). First observe that \(u_ic_\ell =c_\ell ,~\ell \in [d]\), for exactly one scenario index i, implying \(\sum _{i \in [n]}u_ic_\ell =c_\ell ,~\ell \in [d]\), at any feasible solution satisfying (16)–(19), (23)–(25), and (27). Therefore, it is easy to show that we can linearize the \(u_ic_\ell \) terms by replacing them with the new decision variables \(\zeta _{i\ell }\in {\mathbbm {R}}_+\) in (27) to obtain (20), and enforcing the additional constraints (21)–(22). This completes our proof. \(\square \)

Corollary 3.1

The cut generation problem \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) is equivalent to the following optimization problem, referred to as \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) {}:\)

$$\begin{aligned} \min \quad&z-\frac{1}{\alpha }\sum \limits _{i\in [n]}p_iv_{i} -\eta +\frac{1}{\alpha }\sum \limits _{l\in [m]} q_lw_{l} \end{aligned}$$
(28)
$$\begin{aligned} \text {s.t.} \quad&(6){-}(7), (16){-}(25),\nonumber \\&v_{i}-\delta _{i}=z-{\mathbf {c}}^\top {\mathbf {x}}_i,&\qquad \quad i\in [n], \end{aligned}$$
(29)
$$\begin{aligned}&v_{i}\le M_{i*}\beta _{i},&\qquad \quad i\in [n],\end{aligned}$$
(30)
$$\begin{aligned}&\delta _{i}\le M_{*i}(1-\beta _{i}),&\qquad \quad i\in [n],\end{aligned}$$
(31)
$$\begin{aligned}&{\mathbf {v}}\in {\mathbbm {R}}_+^{n},\quad \varvec{\delta }\in {\mathbbm {R}}_+^{n},\end{aligned}$$
(32)
$$\begin{aligned}&L\le z\le U. \end{aligned}$$
(33)

Proof

We represent \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {Y}})\) in \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) using the classical formulation (3). On the other hand, we express \({\text {CVaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\) using the formula (4), i.e., \({\text {CVaR}}_\alpha ({\mathbf {c}}^\top {\mathbf {X}})=z-\frac{1}{\alpha }\sum _{i\in [n]}p_i[z-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\), where \(z={\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\), and ensure the exact calculation of z for any \({\mathbf {c}}\in C\) by enforcing (16)–(25), from Theorem 3.1. Then, by simple manipulation and linearizing the terms \([z-{\mathbf {c}}^\top {\mathbf {x}}_i]_+=:v_i\) using (29)–(32), we obtain the desired formulation. \(\square \)
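For concreteness, the following self-contained PuLP sketch (our illustration, not the implementation of [17] or of our computational study) assembles \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) for a small hypothetical instance. It takes C to be the entire unit simplex, so the big-M values of Remark 3.1 reduce to componentwise maxima over the unit vectors, and uses the trivial bounds in (33).

```python
import numpy as np
import pulp

# hypothetical data; C is the entire unit simplex in this sketch
rng = np.random.default_rng(0)
n, m, d, alpha, eps = 6, 6, 3, 0.25, 1e-6
x = rng.uniform(0.0, 10.0, (n, d))        # realizations of X = G(z*)
y = rng.uniform(0.0, 10.0, (m, d))        # realizations of the benchmark Y
p_i, q_l = 1.0 / n, 1.0 / m               # equal probabilities

# big-M values of Remark 3.1; the vertices of the unit simplex are e_1,...,e_d
M = np.maximum((x[None, :, :] - x[:, None, :]).max(axis=2), 0.0)  # M[i, k] = M_{ik}
M_row = M.max(axis=1).tolist()            # M_{i*}
M_col = M.max(axis=0).tolist()            # M_{*i}

prob = pulp.LpProblem("NewMIP_CVaR", pulp.LpMinimize)
c = pulp.LpVariable.dicts("c", range(d), lowBound=0)
z = pulp.LpVariable("z", lowBound=float(x.min()), upBound=float(x.max()))  # (33)
eta = pulp.LpVariable("eta")
w = pulp.LpVariable.dicts("w", range(m), lowBound=0)
v = pulp.LpVariable.dicts("v", range(n), lowBound=0)
delta = pulp.LpVariable.dicts("delta", range(n), lowBound=0)
beta = pulp.LpVariable.dicts("beta", range(n), cat=pulp.LpBinary)
u = pulp.LpVariable.dicts("u", range(n), cat=pulp.LpBinary)
zeta = pulp.LpVariable.dicts("zeta", (range(n), range(d)), lowBound=0)

cx = {i: pulp.lpSum(float(x[i, j]) * c[j] for j in range(d)) for i in range(n)}

# objective (28)
prob += (z - pulp.lpSum(p_i * v[i] for i in range(n)) / alpha
         - eta + pulp.lpSum(q_l * w[l] for l in range(m)) / alpha)
prob += pulp.lpSum(c[j] for j in range(d)) == 1                         # (7): c in C
for l in range(m):                                                      # (6)
    prob += w[l] >= eta - pulp.lpSum(float(y[l, j]) * c[j] for j in range(d))
for i in range(n):
    prob += z <= cx[i] + M_row[i] * beta[i]                             # (16)
    prob += z >= cx[i] - M_col[i] * (1 - beta[i])                       # (17)
    prob += u[i] <= beta[i]                                             # (24)
    for j in range(d):
        prob += zeta[i][j] <= u[i]               # (21); Mtilde_j = 1 on the simplex
prob += pulp.lpSum(p_i * beta[i] for i in range(n)) >= alpha            # (18)
prob += pulp.lpSum(p_i * (beta[i] - u[i]) for i in range(n)) <= alpha - eps  # (19)
prob += z == pulp.lpSum(float(x[i, j]) * zeta[i][j]
                        for i in range(n) for j in range(d))            # (20)
for j in range(d):
    prob += pulp.lpSum(zeta[i][j] for i in range(n)) == c[j]            # (22)
prob += pulp.lpSum(u[i] for i in range(n)) == 1                         # (23)
for i in range(n):
    prob += v[i] - delta[i] == z - cx[i]                                # (29)
    prob += v[i] <= M_row[i] * beta[i]                                  # (30)
    prob += delta[i] <= M_col[i] * (1 - beta[i])                        # (31)

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("objective:", pulp.value(prob.objective))
print("c*:", [round(c[j].value(), 4) for j in range(d)])
# a negative objective certifies a violated CVaR cut at c*; otherwise none exists
```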

Note that there are O(n) binary variables in \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) compared to \(O(n^2)\) binary variables in \(\left( {\mathbf {RSMIP}}\_{\mathbf {CVaR}}\right) \). We next describe valid inequalities, which we refer to as ordering inequalities, to strengthen the formulation \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \).

Proposition 3.2

Suppose that the parameters \(M_{ki}\) are calculated as described in Remark 3.1. For a scenario index \(k\in [n]\), let \(L_k=\{i \in [n] {\setminus } {k}{:}~M_{ki}=0\}\) and \(H_k=\{i \in [n]{\setminus } {k}{:}~M_{ik}=0\}\). Then the following sets of inequalities are valid for \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \):

$$\begin{aligned} \beta _{k}\le \beta _{i},\quad k \in [n],~i \in L_k, \end{aligned}$$
(34)

or equivalently,

$$\begin{aligned} \beta _{i}\le \beta _{k},\quad k \in [n],~i\in H_k. \end{aligned}$$
(35)

Proof

If \(i\in L_k\), then \(M_{ki}=\max _{{\mathbf {c}}\in C}[{\mathbf {c}}^\top ({\mathbf {x}}_i-{\mathbf {x}}_k)]_+=0\). In other words, \({\mathbf {c}}^\top {\mathbf {x}}_k\ge {\mathbf {c}}^\top {\mathbf {x}}_i\) for all \({\mathbf {c}}\in C\). Now if \(z >{\mathbf {c}}^\top {\mathbf {x}}_k\) for some \({\mathbf {c}}\in C\), then \(\beta _{k}=1\). Because \({\mathbf {c}}^\top {\mathbf {x}}_k\ge {\mathbf {c}}^\top {\mathbf {x}}_i\), we also have \(\beta _{i}=1\). On the other hand, if \(z <{\mathbf {c}}^\top {\mathbf {x}}_i\) for some \({\mathbf {c}}\in C\), then \(\beta _{i}=0\). Because \( z< {\mathbf {c}}^\top {\mathbf {x}}_i\le {\mathbf {c}}^\top {\mathbf {x}}_k\), we also have \(\beta _{k}=0\). Thus, inequality (34) is valid. The validity proof of inequality (35) follows similarly. \(\square \)

Introducing inequalities (34) or (35) into \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) provides us with a stronger formulation. When the number of such inequalities is large, we may opt to introduce them only for a selected set of scenarios. For example, we fix the values of a subset of the \(\beta _i\) variables using preprocessing methods when possible, and introduce the ordering inequalities for those that cannot be fixed. The trivial variable fixing sets \(\beta _i=0\) or \(\beta _i=1\) for all \(i\in [n]\) such that \(M_{i*}=0\) or \(M_{*i}=0\), respectively. In addition, we propose a more elaborate variable fixing, which relies on Proposition 3.1 to identify the scenarios for which the corresponding realizations are too large to be equal to \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). Suppose we show that k is among such scenarios, i.e., \(k\notin \bar{K}\). Then, at any feasible solution we have \(\beta _k=0\), and consequently, \(\beta _i=0\) for all \(i \in H_k\). One can also employ variable fixing by using the bounds on \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\): let \(\beta _i=1\) if \(\max _{{\mathbf {c}}\in C}{\mathbf {c}}^\top {\mathbf {x}}_i < L\), and let \(\beta _i=0\) if \(\min _{{\mathbf {c}}\in C}{\mathbf {c}}^\top {\mathbf {x}}_i > U\). We note that the proposed ordering inequalities and variable fixing methods can also be applied to other relevant MIP formulations involving \(\beta _i\) decisions. In such MIPs, e.g., \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \), the set \(\{k \in [n]{:}~\beta _k=1\}\) corresponds to the realizations that are less than or equal to \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). A sketch of these reductions is given below.
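Continuing the earlier sketches, the ordering pairs for (34) and the fixed \(\beta \) variables can be extracted as follows; in \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) each pair (k, i) becomes the constraint \(\beta _k\le \beta _i\).

```python
# continuing the sketches above (M, M_row, M_col, sc, L, U)
pairs = [(k, i) for k in range(n) for i in range(n)
         if i != k and M[k, i] == 0.0]           # i in L_k: add beta_k <= beta_i
fix_zero = {i for i in range(n) if M_row[i] == 0.0 or sc[i].min() > U}  # beta_i = 0
fix_one  = {i for i in range(n) if M_col[i] == 0.0 or sc[i].max() < L}  # beta_i = 1
# in a model such as the PuLP sketch above, these translate into
#   prob += beta[k] <= beta[i]   for (k, i) in pairs not touching fixed variables,
#   prob += beta[i] == 0 (resp. == 1)   for i in fix_zero (resp. fix_one)
```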

3.2.3 Linearization of \([z-{\mathbf {x}}^\top {\mathbf {c}}]_+\) in \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \)

Consider the convex function \(g(z,{\mathbf {c}})=[z-{\mathbf {x}}_i^\top {\mathbf {c}}]_+:=\max \{0,z-{\mathbf {x}}_i^\top {\mathbf {c}}\}\) for \((z,{\mathbf {c}})\in {\mathbbm {R}}_+^{d+1}\) with \(\sum _{j\in [d]} c_j=1\) and a given \(i\in [n]\); such terms appear in (4) with \(z={\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). Using formula (4) in \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \) leads to a concave minimization. Therefore, we study the linearization of the set (referred to as a reverse concave set) corresponding to the epigraph of \(-g(z,{\mathbf {c}})\), given by (29)–(32) in \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \). We propose valid inequalities that give a complete linear description of this linearization set for a given \(i\in [n]\). As a result, these valid inequalities can be used to strengthen the formulation \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) (as will be shown in our computational study in Sect. 5).

Throughout this subsection, we drop the scenario indices and focus on the linearization of one term of the form \([z-{\mathbf {x}}^\top {\mathbf {c}}]_+\). Due to the translation invariance of CVaR, we assume without loss of generality that all the realizations of \({\mathbf {X}}\) are non-negative. Therefore, \(x_j\ge 0, j\in [d]\). This implies the nonnegativity of \(z={\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\), since \({\mathbf {c}}\ge {\mathbf {0}}\). In addition, to avoid trivial cases, we assume that \(x_j>0\) for some \(j\in [d]\), because otherwise, we can let \(z=v\) and \(\delta =0\). We are interested in the polytope defined by

$$\begin{aligned}&v-\delta =z-\sum _{j\in [d]} x_jc_j,&\end{aligned}$$
(36)
$$\begin{aligned}&v\le M_v\beta ,&\end{aligned}$$
(37)
$$\begin{aligned}&\delta \le M_\delta (1-\beta ),&\end{aligned}$$
(38)
$$\begin{aligned}&\sum _{j\in [d]} c_j=1,&\end{aligned}$$
(39)
$$\begin{aligned}&{\mathbf {c}},v,\delta \ge 0, \end{aligned}$$
(40)
$$\begin{aligned}&\beta \in \{0,1\},&\end{aligned}$$
(41)
$$\begin{aligned}&0\le z\le \bar{U} .&\end{aligned}$$
(42)

For now, we let \(\bar{U} =\max _{s\in [n],k\in [d]} \{x_{sk}\}\), i.e., the largest component of \({\mathbf {x}}_s\) over all \(s\in [n]\), which is a trivial upper bound on \({\text {VaR}}_{\alpha }({\mathbf {c}}^\top {\mathbf {X}})\). Also let \(M_v=\bar{U} -\min _{k\in [d]}\{x_k\}\) be the big-M coefficient for the variable \(v=[z-\sum _{j\in [d]} x_jc_j]_+\), and \(M_\delta =\max _{k\in [d]}\{x_k\}\) be the big-M coefficient for the variable \(\delta =[\sum _{j\in [d]} x_jc_j-z]_+\). Let \({\mathcal {Q}}=\{({\mathbf {c}}, v, \delta ,\beta ,z): (36){-}(42)\}\).

First, we characterize the extreme points of \(\text {conv}({\mathcal {Q}})\). Throughout, we let \(e_k\) denote the d-dimensional unit vector with 1 in the kth entry and zeroes elsewhere.

Proposition 3.3

The extreme points \((c,v, \delta ,\beta ,z)\) of \(\text {conv}({\mathcal {Q}})\) are as follows:

\(\mathbf{QEP1}_k\): \((e_k,0, x_k, 0,0)\) for all \(k\in [d]\) with \(x_k>0\),

\(\mathbf{QEP2}_k\): \((e_k,0, 0,0, x_k)\) for all \(k\in [d]\),

\(\mathbf{QEP3}_k\): \((e_k,0, 0,1, x_k)\) for all \(k\in [d]\),

\(\mathbf{QEP4}_k\): \((e_k,\bar{U} -x_k, 0,1, \bar{U} )\) for all \(k\in [d]\) with \(x_k<\bar{U} \).

Proof

First, note that, from the definitions of \(\bar{U}, M_v,\) and \(M_\delta \), we have \(x_k\le M_\delta \le \bar{U} \), and \(0\le \bar{U} -x_k\le M_v\) for all \(k\in [d]\). Hence, points QEP1 \(_k\)QEP4 \(_k\) are feasible and they cannot be expressed as a convex combination of any other feasible points of \(\text {conv}({\mathcal {Q}})\). Finally, observe that any other feasible point with \(0<c_j<1\) for some \(j\in [d]\) cannot be an extreme point, because it can be written as a convex combination of QEP1 \(_k\)QEP4 \(_k\). \(\square \)

Note that if \(x_k=0\) for some \(k\in [d]\), then QEP1 \(_k\) is equivalent to QEP2 \(_k\). Therefore, we only define QEP1 \(_k\) for \(k\in [d]\) with \(x_k>0\). Similarly, if \(x_k=\bar{U} \) for some \(k\in [d]\), then QEP4 \(_k\) is equivalent to QEP3 \(_k\). Therefore, we only define QEP4 \(_k\) for \(k\in [d]\) with \(x_k<\bar{U} \).

Next we give valid inequalities for \({\mathcal {Q}}\).

Proposition 3.4

For \(k\in [d]\), the inequality

$$\begin{aligned} v\le \sum _{j\in [d]}[x_k-x_j]_+ c_j+\left( \bar{U} -x_k\right) \beta \end{aligned}$$
(43)

is valid for \({\mathcal {Q}}\). Similarly, for \(k\in [d]\), the inequality

$$\begin{aligned} \delta \le \sum _{j\in [d]}[x_j-x_k]_+ c_j+x_k (1-\beta ) \end{aligned}$$
(44)

is valid for \({\mathcal {Q}}\).

Proof

First, we prove the validity of inequality (43). If \(\beta =0\), then \(v=0\) from (37). Because \({\mathbf {c}}\ge {\mathbf {0}}\), inequality (43) holds trivially. If \(\beta =1\), then \(\delta =0\) from (38). Thus, for any \(k\in [d]\),

$$\begin{aligned} v-\delta =v&=z-\sum _{j\in [d]} x_jc_j+x_{k}\left( \sum _{j\in [d]} c_j-1\right) = z+\sum _{j\in [d]}(x_k-x_j)c_j-x_k\\&\le \sum _{j\in [d]}[x_k-x_j]_+c_j+\bar{U} -x_k=\sum _{j\in [d]}[x_k-x_j]_+c_j+(\bar{U} -x_k)\beta , \end{aligned}$$

where the last inequality follows from (42). Thus, inequality (43) is valid.

Next, we prove the validity of inequality (44). If \(\beta =1\), then \(\delta =0\) from (38). Because \({\mathbf {c}}\ge {\mathbf {0}}\), inequality (44) holds trivially. If \(\beta =0\), then \(v=0\) from (38). Thus, for any \(k\in [d]\),

$$\begin{aligned} \delta&=\sum _{j\in [d]} x_jc_j-z\le \sum _{j\in [d]}(x_j-x_k)c_j+x_k\le \sum _{j\in [d]}[x_j-x_k]_+c_j+x_k(1-\beta ). \end{aligned}$$

Hence, inequality (44) is valid. \(\square \)
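As a quick, randomized sanity check of Proposition 3.4 (ours, on a hypothetical instance), one can sample feasible points of \({\mathcal {Q}}\) and verify (43)–(44):

```python
import numpy as np

# sample c on the unit simplex and z in [0, Ubar], then set v, delta, beta
rng = np.random.default_rng(2)
x = np.array([2.0, 5.0, 9.0])                 # nonnegative realizations, d = 3
d, Ubar = x.size, float(x.max())
for _ in range(1000):
    c = rng.dirichlet(np.ones(d))
    z = rng.uniform(0.0, Ubar)
    s = float(x @ c)
    v, delta = max(z - s, 0.0), max(s - z, 0.0)
    beta = 1 if v > 0 else 0                  # either value works when v = delta = 0
    for k in range(d):
        assert v <= np.maximum(x[k] - x, 0.0) @ c + (Ubar - x[k]) * beta + 1e-9
        assert delta <= np.maximum(x - x[k], 0.0) @ c + x[k] * (1 - beta) + 1e-9
print("inequalities (43)-(44) held on all sampled points")
```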

Theorem 3.2

\(\text {conv}({\mathcal {Q}})\) is completely described by equalities (36) and (39), and inequalities (40), (43), and (44).

Proof

Let \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\) denote the index set of extreme point optimal solutions to the problem \(\min \{\varvec{\gamma }^\top {\mathbf {c}}+\gamma ^v v + \gamma ^\delta \delta + \gamma ^\beta \beta +\gamma ^z z: ({\mathbf {c}},v,\delta ,\beta ,z)\in \text {conv}({\mathcal {Q}})\}\), where \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta ,\gamma ^z)\in {\mathbbm {R}}^{d+4}\) is an arbitrary objective vector, not perpendicular to the smallest affine subspace containing \(\text {conv}({\mathcal {Q}})\). In other words, \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta ,\gamma ^z)\ne \lambda (-{\mathbf {x}},-1,1,0,1)\) and \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\ne \lambda ({\mathbf {1}},0,0,0,0)\) for \(\lambda \in {\mathbbm {R}}\). Therefore, the set of optimal solutions is not \(\text {conv}({\mathcal {Q}})\) (\(\text {conv}({\mathcal {Q}})\ne \emptyset \)). We prove the theorem by giving an inequality among (40), (43), and (44) that is satisfied at equality by \(({\mathbf {c}}^\kappa ,v^\kappa ,\delta ^\kappa ,\beta ^\kappa , z^\kappa )\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\) for the given objective vector. Then, since \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\) is arbitrary, for every facet of \(\text {conv}({\mathcal {Q}})\), there is an inequality among (40), (43), and (44) that defines it. Throughout the proof, without loss of generality, we assume that \(x_1\le x_2\le \cdots \le x_d\). We consider all possible cases.

Case A Suppose that \(\gamma ^\beta \ge 0\). Without loss of generality we can assume that \(\gamma ^\delta =0\) by adding \(\gamma ^\delta (v-\delta -z+\sum _{j\in [d]} x_jc_j)\) to the objective. From Eq. (36) the added term is equal to zero, and so this operation does not change the set of optimal solutions. Furthermore, we can also assume that \(\gamma _j\ge 0\) for all \(j\in [d]\) without loss of generality by subtracting \(\gamma _{k^*} (\sum _{j\in [d]} c_j)\) from the objective, where \(k^*:=\arg \min \{\gamma _j, j\in [d]\}\). From Eq. (39), the subtracted term is a constant (\(\gamma _{k^*}\)), and so this operation does not change the set of optimal solutions. Therefore, for the case that \(\gamma ^\beta \ge 0\), we assume that \(\gamma ^\delta =0\), \(\gamma _j\ge 0\) for all \(j\in [d]\), and \(\gamma _{k^*}=0\). Under these assumptions, we can express the cost of each extreme point solution (denoted by \(C(\cdot )\)) given in Proposition 3.3:

C(QEP1 \(_k\)) \(=\gamma _k\) for \(k\in [d]\) with \(x_k>0\),

C(QEP2 \(_k\)) \(=\gamma _k+\gamma ^z x_k\) for \(k\in [d]\),

C(QEP3 \(_{k}\)) \(=\gamma _k+\gamma ^z x_k +\gamma ^\beta \) for \(k\in [d]\),

C(QEP4 \(_{k}\)) \(=\gamma _k+\gamma ^z \bar{U} +\gamma ^\beta +\gamma ^v(\bar{U} -x_k)\) for \(k\in [d]\) with \(x_k<\bar{U} \).

Note that QEP1 \(_k\) for \(k\in [d]\) with \(x_k>0\) are the only extreme points with \(\delta >0\), and QEP4 \(_k\) for \(k\in [d]\) with \(x_k<\bar{U} \) are the only extreme points with \(v>0\). We use this observation in the following cases we consider.

(i) \(\gamma ^z<0\). In this case, C(QEP2 \(_k\))\(<C\)(QEP1 \(_k\)) for all \(k\in [d]\) with \(x_k>0\). Therefore, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\).

(ii) \(\gamma ^z\ge 0\). In this case, C(QEP1 \(_k\))\(\le C\)(QEP2 \(_k\))\(\le C\)(QEP3 \(_k\)) for all \(k\in [d]\). Note that C(QEP4 \(_k\))\(=C\)(QEP3 \(_k\))\(+(\gamma ^z+\gamma ^v)(\bar{U} -x_k), k\in [d]\). Therefore, if \(\gamma ^z+\gamma ^v>0\), then C(QEP4 \(_k\))\(>C\)(QEP3 \(_k\)) for all \(k\in [d]\), and hence extreme points QEP4 \(_k, k\in [d]\) are never optimal. As a result, \(v^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). So we can assume that \(\gamma ^z+\gamma ^v\le 0\). Because \(\gamma ^z\ge 0\), we must have \(\gamma ^v\le 0\). Let \(\phi _k := \gamma ^z \bar{U} +\gamma ^\beta +\gamma ^v(\bar{U} -x_k)\) for \(k\in [d]\). Therefore, C(QEP4\(_{k}\)) \(=\gamma _k+\phi _k\). Note that \(\phi _1\le \phi _2\le \cdots \le \phi _d\) because \(x_1\le x_2\le \cdots \le x_d\le \bar{U} \) and \(\gamma ^v\le 0\) by assumption. If \(\phi _1>0\), then \(\phi _k>0\) and so C(QEP4 \(_k\))\(>C\)(QEP1 \(_k\)) for all \(k\in [d]\). Therefore, extreme points QEP4 \(_k, k\in [d]\) are never optimal. Hence, \(v^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). Similarly, if \(\phi _d<0\), then \(\phi _k<0\) for all \(k\in [d]\). Therefore, extreme points QEP1 \(_k, k\in [d]\) are never optimal. Hence, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). As a result, we can assume that \(\phi _1\le 0\) and \(\phi _d\ge 0\). If there exists \(j\in [d]\) such that \(\gamma _j>0\) and \(\gamma _j+\phi _j>0\), then C(QEP1 \(_{k^*}\))\(=0<C\)(QEP1 \(_j\))\(\le C\)(QEP2 \(_j\))\(\le C\)(QEP3 \(_j\))\(<C\)(QEP4 \(_j\)). Hence, \(c_j^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). As a result, we can assume that either \(\gamma _k=0\) or \(\gamma _k+\phi _k\le 0\) for all \(k\in [d]\). If there exists \(j\in [d]\) such that \(\gamma _j>0\) and \(\gamma _j+\phi _j<0=C\)(QEP1 \(_{k^*}\)), then extreme points QEP1 \(_k, k\in [d]\) are never optimal. Hence, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). As a result, we can assume that for every \(k\in [d]\), either \(\gamma _k=0\) or \(\gamma _k+\phi _k=0\).

(a) If \(\gamma ^\beta >0\), then the optimal extreme point solutions are QEP1 \(_j\) for all \(j\in [d]\) such that \(\gamma _j=0\); QEP2 \(_j\) for all \(j\in [d]\) such that \(\gamma _j=0\) if \(\gamma ^z=0\); and QEP4 \(_k\) for all \(k\in [d]\) such that \(\gamma _k+\phi _k=0\). Let \(k' :=\max \{j\in [d]: \phi _j\le 0\}\). Note that \(\phi _j>0\) for \(j>k'\) by definition, which implies that \(\gamma _j+\phi _j>0\). Therefore, we must have \(\gamma _j=0\) for \(j> k'\). Then inequality (43) for \(k'\) holds at equality for all optimal solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\).

(b) If \(\gamma ^\beta =0\) and \(\gamma ^z>0\), then the optimal extreme point solutions are QEP1 \(_j\) for all \(j\in [d]\) such that \(\gamma _j=0\) and QEP4 \(_k\) for all \(k\in [d]\) such that \(\gamma _k+\phi _k=0\). Then inequality (43) for \(k' \) holds at equality for all optimal solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\).

(c) The only case left to consider is if \(\gamma ^\beta =\gamma ^z=0\). In this case, because we assume that \(\gamma ^v\le 0\), there are two cases to consider. If \(\gamma ^v=0\), then \(\phi _k=0\) for all \(k\in [d]\) and we must have \(\gamma _k=0\) for all \(k\in [d]\), which contradicts our initial assumption that \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\ne \lambda ({\mathbf {1}},0,0,0,0)\) for any \(\lambda \in {\mathbbm {R}}\). Therefore, we must have \(\gamma ^v<0\). In this case, \(\phi _k<0\) for all \(k\in [d]\). Suppose there exists \(k^*\in [d]\) (with \(\gamma _{k^*}=0\)) such that \(x_{k^*}<\bar{U} \). Then, C(QEP4 \(_{k^*}\))\(<0=C\)(QEP1 \(_{k^*}\)). Because C(QEP1 \(_{k^*}\))\(\le C\)(QEP1 \(_j\)) for all \(j\in [d]\), extreme points QEP1 \(_j, j\in [d]\) are never optimal. Hence, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). The only case left to consider is when \(x_k=\bar{U} \) for all k with \(\gamma _k=0\). In this case, inequality (43) for \(k^*\) holds at equality for all optimal solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). This completes the proof of Case A.

Case B Suppose that \(\gamma ^\beta < 0\). As before, we can assume that \(\gamma _j\ge 0\) for all \(j\in [d]\), and that \(\gamma _{k^*}=0\) for some \(k^*\in [d]\). Finally, we can assume that \(\gamma ^v=0\) by subtracting \(\gamma ^v (v-\delta -z+\sum _{j\in [d]} x_jc_j)\) from the objective. Under these assumptions, we can express the cost of each extreme point solution (denoted by \(C(\cdot )\)) given in Proposition 3.3:

C(QEP1 \(_k\)) \(=\gamma _k+\gamma ^\delta x_k\) for \(k\in [d]\) with \(x_k>0\),

C(QEP2 \(_k\)) \(=\gamma _k+\gamma ^z x_k\) for \(k\in [d]\),

C(QEP3 \(_{k}\)) \(=\gamma _k+\gamma ^z x_k +\gamma ^\beta \) for \(k\in [d]\),

C(QEP4 \(_{k}\)) \(=\gamma _k+\gamma ^z \bar{U} +\gamma ^\beta \) for \(k\in [d]\) with \(x_k< \bar{U} \).

Note that due to the assumption that \(\gamma ^\beta < 0\), C(QEP2 \(_k\))\(>C\)(QEP3 \(_k\)) for all \(k\in [d]\), so the extreme points QEP2 \(_k, k\in [d]\) are never optimal under these cost assumptions. We use this observation in the cases considered below.

  (i)

    \(\gamma ^z>0\). In this case, C(QEP4 \(_k\))\(>C\)(QEP3 \(_k\)) for all \(k\in [d]\). (Recall that QEP4 \(_k\) exists for some \(k\in [d]\) only if \(\bar{U} >x_k\).) So the extreme points QEP4 \(_k, k\in [d]\) are never optimal under these cost assumptions. Hence, \(v^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\).

  (ii)

    \(\gamma ^z\le 0\). If \(\gamma ^z\le \gamma ^\delta \), then C(QEP1 \(_k\))\(>C\)(QEP3 \(_k\)) for all \(k\in [d]\). Therefore, extreme points QEP1 \(_k, k\in [d]\) are never optimal. Hence, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). As a result, we can assume that \(\gamma ^\delta <\gamma ^z \le 0\) and C(QEP4 \(_k\))\(\le C\)(QEP3 \(_k\)) for all \(k\in [d]\). Note that because \(\gamma ^\delta <0\), \(0>\gamma ^\delta x_1\ge \gamma ^\delta x_2\ge \cdots \ge \gamma ^\delta x_d\). In addition, \(\min _{k\in [d]}\{C\)(QEP4 \(_k\))\(\}=C\)(QEP4 \(_{k^*}\))\(=\gamma ^z \bar{U} +\gamma ^\beta \). If \(\gamma ^\delta x_d> \gamma ^z \bar{U} +\gamma ^\beta \), then extreme points QEP1 \(_k, k\in [d]\) are never optimal. Hence, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). So we can assume that \(\gamma ^\delta x_d\le \gamma ^z \bar{U} +\gamma ^\beta \). If \(\gamma ^\delta x_1 < \gamma ^z \bar{U} +\gamma ^\beta \), then extreme points QEP4 \(_k, k\in [d]\) are never optimal. Hence, \(v^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). So we can assume that \(\gamma ^\delta x_1 \ge \gamma ^z \bar{U} +\gamma ^\beta \). Let \(\bar{k}:=\min \{j\in [d]: \gamma ^\delta x_j \le \gamma ^z \bar{U} +\gamma ^\beta \}\). If there exists \(j\ge \bar{k}\) such that C(QEP1 \(_j\))\(=\gamma _j+\gamma ^\delta x_j<\gamma ^z \bar{U} +\gamma ^\beta =C\)(QEP4 \(_{k^*}\))\(\le C\)(QEP4 \(_{k}\)) for all \(k\in [d]\), then extreme points QEP4 \(_k, k\in [d]\) are never optimal. Hence, \(v^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). So we can assume that \(\gamma _j+\gamma ^\delta x_j=\gamma ^z \bar{U} +\gamma ^\beta \) for all \(j\ge \bar{k}\). Under these assumptions, the optimal solutions are QEP1 \(_j\) for \(j\ge \bar{k}\); QEP4 \(_k\) for \(k\in [d]\) such that \(\gamma _k=0\); and QEP3 \(_k\) for \(k\in [d]\) such that \(\gamma _k=0\) if \(\gamma ^z=0\). Then inequality (44) for \(\bar{k}\) holds at equality for all optimal solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta , \gamma ^z)\). This completes the proof. \(\square \)

Note that in the definition of the set \({\mathcal {Q}}\), we used weaker bounds on \(v,\delta \) and z than are available using the improvements proposed in Sect. 3. In particular, we can let \(z\le U\), where U is the upper bound on VaR obtained by using the quantile information (as described in Sect. 3.2.1); in most cases, \(U<\bar{U} \). Then, we simply update inequality (43) as

$$\begin{aligned} v\le \sum _{j\in [d]}[x_k-x_j]_+ c_j+(U-x_k) \beta . \end{aligned}$$
(45)

In addition, we can let \(z\ge L\), using the lower bound information on VaR, and typically \(L>0\). If this is the case, then we can define new variables \(z'=z-L\) and \(\delta '=\delta -L\), and let \(M'_z=\bar{U} -L\) and \(M'_\delta =M_\delta -L\), and obtain a linearization polytope of the same form as \({\mathcal {Q}}\) in the \(({\mathbf {c}}, v,\delta ',\beta ,z')\) space. The updated inequality (44) in the original space becomes

$$\begin{aligned} \delta \le \sum _{j\in [d]}[x_j-x_k]_+ c_j+(x_k-L) (1-\beta ). \end{aligned}$$
(46)

Therefore, our results hold for \(L>0\) with this translation of variables.

Finally, from Sect. 3, we know that \(v\le M_{i*}\beta \) and \(\delta \le M_{*i}(1-\beta )\) for the given scenario \(i\in [n]\) for which the linearization polytope is written. Again, in most cases, \(M_{i*}\le M_v\) and \( M_{*i}\le M_\delta \). In this case, we cannot have \(c_k=1\) and \(z=L\) for k such that \(x_k-L>M_{*i}\), because otherwise \(\delta =[\sum _{j\in [d]} c_jx_j-z]_+= x_k-L>M_{*i}\), which violates the constraint \(\delta \le M_{*i}(1-\beta )\). Hence, for all k with \(x_k-L>M_{*i}\), if \(c_k>0\) and \(z=L\), then we must have \(c_\ell =1-c_k\) for some \(\ell \in [d]\) with \( x_\ell -L< M_{*i}\), so that \(\delta =M_{*i}\) in such an extreme point solution. In this case, we can construct an equivalent polyhedron where we let \(x_k^\ell =M_{*i}+L\) for all \(k\in [d]\) such that \(x_k-L>M_{*i}\) and \(\ell \in [d]\) such that \( x_\ell -L< M_{*i}\). Similarly, we cannot have \(c_k=1\) and \(z=U\) for k such that \(U-x_k>M_{i*}\), because otherwise \(v=[z-\sum _{j\in [d]} c_jx_j]_+= U- x_k>M_{i*}\), which violates the constraint \(v\le M_{i*}\beta \). If \(c_k>0\) for k with \(U-x_k>M_{i*}\), then we must have \(c_\ell =1-c_k\) for some \(\ell \in [d]\) with \(U-x_\ell < M_{i*}\), and \(v=M_{i*}\) in such an extreme point solution. In this case, we can construct an equivalent polyhedron where we let \(\bar{x}_k^\ell =U-M_{i*}\) for all \(k\in [d]\) such that \(U-x_k>M_{i*}\) and \(\ell \in [d]\) such that \(U-x_\ell < M_{i*}\). The resulting polyhedron satisfies the bound assumptions in the definition of \({\mathcal {Q}}\), and the non-trivial inequalities that define its convex hull are given by inequality (45) for \(k\in [d]\) such that \(U-x_k\le M_{i*}\), and inequality (46) for \(k\in [d]\) such that \(x_k-L\le M_{*i}\). Note that after this update, inequality (45) for \(k\in [d]\) such that \(U-x_k= M_{i*}\) reduces to \(v\le M_{i*}\beta \), and inequality (46) for \(k\in [d]\) such that \(x_k-L= M_{*i}\) reduces to \(\delta \le M_{*i}(1-\beta )\). Translating back to the original space of variables and re-introducing the scenario indices, we have the following corollary.

Corollary 3.2

For \(i\in [n],\) consider the polyhedron \({\mathcal {Q}}'_i=\{({\mathbf {c}}, v_i, \delta _i,\beta _i,z)\in {\mathbbm {R}}_+^{d+4}: (29){-}(31),(33), (39), \beta _i\in \{0,1\} \}\). Then \(\text {conv}({\mathcal {Q}}'_i)\) is completely described by adding inequalities

$$\begin{aligned}&v_i\le \sum \limits _{j\in [d]}[x_{ik}-x_{ij}]_+ c_j+(U-x_{ik}) \beta _i,\quad \forall ~k\in [d]{:}~U-x_{ik} < M_{i*}, \end{aligned}$$
(47)
$$\begin{aligned}&\delta _i \le \sum \limits _{j\in [d]}[x_{ij}-x_{ik}]_+ c_j+(x_{ik}-L) (1-\beta _i), \quad \forall ~k\in [d]:x_{ik}-L < M_{*i} \end{aligned}$$
(48)

to the original constraints (29)–(31), (33), and (39).

In this section and in Sect. 4.2, we derive valid inequalities and convex hull descriptions using only the condition that C is the unit simplex. However, because any scalarization set of interest can be assumed, without loss of generality, to be contained in the unit simplex, the presented inequalities remain valid even if there are additional constraints on the scalarization vectors, i.e., even if C is a strict subset of the unit simplex.

4 Cut generation for optimization with multivariate SSD constraints

In this section, we study the cut generation problem arising in optimization problems of the form \(\left( {\mathbf {G-MSSD}}\right) {}\). As in Sect. 3, we focus on solving the cut generation problems given two d-dimensional random vectors \({\mathbf {X}}\) and \({\mathbf {Y}}\) with realizations \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_n\) and \({\mathbf {y}}_1,\ldots ,{\mathbf {y}}_m\), respectively. Let \(p_1,\ldots ,p_n\) and \(q_1,\ldots ,q_m\) denote the corresponding probabilities, and let C be a polytope of scalarization vectors.

The random vector \({\mathbf {X}}\) is said to dominate \({\mathbf {Y}}\) in polyhedral linear second order with respect to C if and only if

$$\begin{aligned}&{\mathbbm {E}}\left( \left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {X}}\right] _+\right) \le {\mathbbm {E}}\left( \left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {Y}}\right] _+\right) ,\quad \forall ~l\in [m],~{\mathbf {c}}\in C,\text { or equivalently, }\nonumber \\&\quad \sum _{i\in [n]}p_i\left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i\right] _+\le \sum _{k \in [m]}q_k\left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {y}}_k\right] _+,\quad \forall ~l\in [m],~{\mathbf {c}}\in C. \end{aligned}$$
(49)
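For intuition, condition (49) is straightforward to check numerically for a fixed scalarization vector under finite probability spaces. The following minimal sketch (the function name and tolerance are our own illustrative choices) returns the benchmark indices for which the scalarized SSD inequality fails:

```python
import numpy as np

def ssd_violations(c, X, Y, p, q, tol=1e-9):
    """Return the benchmark indices l for which the scalarized SSD
    inequality in (49) is violated by the fixed scalarization vector c.
    X: (n, d) realizations with probabilities p; Y: (m, d) with q."""
    sX, sY = X @ c, Y @ c                        # scalarized realizations c^T x_i, c^T y_k
    violated = []
    for l, target in enumerate(sY):              # target = c^T y_l
        lhs = p @ np.maximum(target - sX, 0.0)   # E[c^T y_l - c^T X]_+
        rhs = q @ np.maximum(target - sY, 0.0)   # E[c^T y_l - c^T Y]_+
        if lhs > rhs + tol:
            violated.append(l)
    return violated   # empty list: c^T X dominates c^T Y in SSD for this c
```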

As discussed in Sect. 2, Homem-de-Mello and Mehrotra [12] show that for finite probability spaces it is sufficient to consider a finite subset of scalarization vectors, obtained as projections of the vertices of m polyhedra. Specifically, each polyhedron corresponds to a realization of the benchmark random vector \({\mathbf {Y}}\) and is given by \(P_l=\{({\mathbf {c}},{\mathbf {w}})\in C\times {\mathbbm {R}}_+^m:~w_k\ge {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {y}}_k,~k\in [m]\}\) for \(l\in [m]\). Thus, \(\left( {\mathbf {G-MSSD}}\right) {}\) can be reformulated as an optimization problem with exponentially many constraints, and solved using a delayed constraint generation algorithm [12]. The SSD constraints corresponding to a subset of the scalarization vectors are initially present in the formulation. Then, given a solution to this intermediate relaxed problem, a cut generation problem is solved to identify whether there is a constraint violated by the current solution.

Due to the structure of the SSD relation (49), a separate cut generation problem is defined for each realization of the benchmark random vector. Thus, in contrast to the CVaR-constrained models, the number of cut generation problems depends on the number of benchmark realizations. The cut generation problem associated with the lth realization of the benchmark vector \({\mathbf {Y}}\) is given by

$$\begin{aligned} \left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) {}\quad \quad \min _{{\mathbf {c}}\in C}~ \sum _{k \in [m]}q_k\left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {y}}_k\right] _+-\sum _{i\in [n]}p_i\left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i\right] _+. \end{aligned}$$

4.1 Existing mathematical programming approaches

Note that \(\left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) \) involves minimizing a difference of convex functions. Dentcheva and Wolfhagen [7] use methods from DC programming to solve this problem directly. As in the case of univariate SSD constraints [3], we can easily linearize the first type of shortfall terms featured in the objective function:

$$\begin{aligned} \min \left\{ \sum _{k \in [m]}q_kw_k-\sum _{i\in [n]}p_i\left[ {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i\right] _+{:}~({\mathbf {c}},{\mathbf {w}}) \in P_l\right\} , \end{aligned}$$
(50)

which results in a concave minimization problem with potentially many local minima. If the optimal objective function value of (50) is negative, then there is a scalarization vector for which the SSD condition associated with the lth realization is violated. Note that it is crucial to solve the cut generation problem exactly for the correct execution of the solution method for \(\left( {\mathbf {G-MSSD}}\right) {}\): if we obtain a local minimum with a nonnegative objective value while the global minimum is negative, then we might overlook a violated cut and stop the algorithm prematurely.

The methods based on DC programming and concave minimization may not fully utilize the polyhedral nature of the objective and the constraints. In addition, DC methods can only guarantee local optimality. The main challenge in the cut generation problem (50) is to linearize the second type of shortfalls appearing in the objective function. In this regard, Homem-de-Mello and Mehrotra [12] introduce additional variables and constraints, and obtain the following MIP formulation of \(\left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) \) associated with the lth realization of the benchmark vector \({\mathbf {Y}}\):

$$\begin{aligned} \left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) {}\quad \min \quad&\sum _{k\in [m]}q_kw_k-\sum _{i\in [n]}p_iv_i\nonumber \\ \text {s.t.}\quad&w_k\ge {\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {y}}_k,&\qquad \quad \forall ~k\in [m], \end{aligned}$$
(51)
$$\begin{aligned}&{\mathbf {w}}\in {\mathbbm {R}}_+^m,\end{aligned}$$
(52)
$$\begin{aligned}&v_i-\delta _i={\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i,&\qquad \quad \forall ~i\in [n], \end{aligned}$$
(53)
$$\begin{aligned}&v_i \le M_i \beta _i,&\qquad \quad \forall ~i\in [n], \end{aligned}$$
(54)
$$\begin{aligned}&\delta _i \le \hat{M}_i (1-\beta _i),&\qquad \quad \forall ~i\in [n], \end{aligned}$$
(55)
$$\begin{aligned}&{\mathbf {c}}\in C,\quad {\mathbf {v}}\in {\mathbbm {R}}^n_+,\quad \varvec{\delta }\in {\mathbbm {R}}^n_+,\quad \varvec{\beta }\in \{0,1\}^n. \end{aligned}$$
(56)

Here we can set \(M_i=\max \{\max _{{\mathbf {c}}\in C}~\{{\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i\},0\}\) and \(\hat{M}_i =-\min \{\min _{{\mathbf {c}}\in C}\{{\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i\},0\}.\) This formulation guarantees that \(v_i=[{\mathbf {c}}^\top {\mathbf {y}}_l-{\mathbf {c}}^\top {\mathbf {x}}_i]_+\) for all \(i\in [n]\).
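For a general polyhedral scalarization set, each of these big-M coefficients can be computed by a small LP. Below is a sketch for \(C=\{{\mathbf {c}}\ge {\mathbf {0}}: A_C{\mathbf {c}}\le b_C,~\sum _{j\in [d]}c_j=1\}\), where the arrays `A_C` and `b_C` are illustrative placeholders; over the plain unit simplex, the two optima reduce to the largest and smallest components of \({\mathbf {y}}_l-{\mathbf {x}}_i\), so no LP is needed in that case:

```python
import numpy as np
from scipy.optimize import linprog

def big_ms(y_l, x_i, A_C, b_C):
    """Compute M_i and M^_i for the scenario pair (y_l, x_i) over
    C = {c >= 0 : A_C c <= b_C, sum(c) = 1} by solving two small LPs."""
    a = y_l - x_i
    d = a.size
    A_eq, b_eq = np.ones((1, d)), np.array([1.0])          # sum_j c_j = 1
    hi = linprog(-a, A_ub=A_C, b_ub=b_C, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, None)] * d, method="highs")   # max  a^T c
    lo = linprog(a, A_ub=A_C, b_ub=b_C, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, None)] * d, method="highs")   # min  a^T c
    M_i = max(-hi.fun, 0.0)      # hi.fun = -(max_{c in C} a^T c)
    M_hat_i = -min(lo.fun, 0.0)  # lo.fun =   min_{c in C} a^T c
    return M_i, M_hat_i
```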

The authors also propose concavity and convexity cuts to strengthen the formulation \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \). However, the concavity cuts require the complete enumeration of a set of edge directions (which may be exponential in number), and the solution of a system of linear equations based on this enumeration; hence, they may not be practicable. In addition, the convexity cuts require the solution of another cut generation LP in a higher dimension. Indeed, in their computational study, Hu et al. [13] do not utilize these cuts and solve \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \) directly. They also note that this step is the bottleneck, taking over 90 % of the total solution time, and that it needs to be improved.

4.2 New developments

We begin by presenting an analogue of Proposition 3.2, which provides valid ordering inequalities that strengthen the formulation \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \). Then, we study the structure of a generalization of the linearization polytope defined by (53)–(56) for a given \(l\in [m]\) and \(i\in [n]\). We give two classes of valid inequalities analogous to those in Proposition 3.4 for this polytope. Furthermore, we show that these inequalities are enough to give the complete linear description when added to the formulation with \(C=\{{\mathbf {c}}\in {\mathbbm {R}}_+^d: \sum _{j\in [d]} c_j=1\}\).

Lemma 4.1

The ordering inequalities (34)–(35) are also valid for \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \) given the lth realization of the benchmark random vector \({\mathbf {Y}}\).

This claim immediately follows from the trivial observation that z can be replaced by \({\mathbf {c}}^\top {\mathbf {y}}_l\) in (29) (and also in the proof of Proposition 3.2) for any \(l\in [m]\). Next we give a polyhedral study of the set defining the linearization of the piecewise linear convex shortfall terms.

Linearization of \([{\mathbf {a}}^\top {\mathbf {c}}]_+\) in \(\left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) \). For a given vector \({\mathbf {a}}\in {\mathbbm {R}}^d\), consider the convex function \(h(\mathbf {c})=[{\mathbf {a}}^\top {\mathbf {c}}]_+:=\max \{0, {\mathbf {a}}^\top {\mathbf {c}}\}\) for \(\mathbf {c}\in {\mathbbm {R}}_+^d\) such that \(\sum _{j\in [d]} c_j=1\). This function appears in the cut generation problems for optimization under multivariate risk given in (50), where \({\mathbf {a}}={\mathbf {y}}_l-{\mathbf {x}}_i\) for some \(l\in [m]\) and \(i\in [n]\). An MIP linearizing this term is given in \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \). Therefore, we study the linearization of the set (also a reverse concave set) corresponding to the epigraph of \(-h({\mathbf {c}})\). (Note that this structure also appears in the cut generation problem for CVaR (9)–(13), where we let \({\mathbf {a}}={\mathbf {x}}_k-{\mathbf {x}}_i\), for \(i,k \in [n]\).) We propose valid inequalities that give a complete linear description of this linearization set for a given \(i\in [n]\). As a result, these valid inequalities can be used to strengthen the formulations involving such linearization terms.

Let \(D^+=\{j\in [d]: a_j\ge 0\}\) and \(D^-=\{j\in [d]: a_j< 0\}\). Due to the nature of the cut generation problems, we can assume that \(D^+\ne \emptyset \) and \(D^-\ne \emptyset \) (otherwise, we can fix the corresponding binary variables). Without loss of generality, we assume that \(D^+=\{1,\ldots ,d_1\}\) with \(a_1\le a_2\le \cdots \le a_{d_1}\), and \(D^-=\{d_1+1,\ldots ,d\}\) with \(-a_{d_1+1}\le -a_{d_1+2}\le \cdots \le -a_{d}\).

In this subsection, we drop the scenario indices, and study the polytope given by

$$\begin{aligned}&v-\delta =\sum _{j\in [d]} a_jc_j,&\end{aligned}$$
(57)
$$\begin{aligned}&v\le \bar{M}_v \beta ,&\end{aligned}$$
(58)
$$\begin{aligned}&\delta \le \bar{M}_\delta (1-\beta ),&\end{aligned}$$
(59)
$$\begin{aligned}&\sum _{j\in [d]} c_j=1,&\end{aligned}$$
(60)
$$\begin{aligned}&{\mathbf {c}},v,\delta \ge 0, \end{aligned}$$
(61)
$$\begin{aligned}&\beta \in \{0,1\}, \end{aligned}$$
(62)

where \(\bar{M}_v=a_{d_1}\) is the big-M coefficient associated with the variable \(v=[\sum _{j\in [d]} a_jc_j]_+\), and \( \bar{M}_\delta =-a_d\) is the big-M coefficient associated with the variable \(\delta =[\sum _{j\in [d]} -a_jc_j]_+\).

Let \({\mathcal {S}}=\{({\mathbf {c}}, v, \delta ,\beta ): (57){-}(62)\}\). First, we characterize the extreme points of \(\text {conv}({\mathcal {S}})\). Recall that \(e_k\) denotes the d-dimensional unit vector with 1 in the kth entry and zeroes elsewhere.

Proposition 4.1

The extreme points \(({\mathbf {c}},v, \delta ,\beta )\) of \(\text {conv}({\mathcal {S}})\) are as follows:

EP1 \(_k\) :

\((e_k, a_k, 0,1)\) for all \(k\in D^+\),

EP2 \(_\ell \) :

\((e_\ell , 0,-a_\ell ,0)\) for all \(\ell \in D^-\),

EP3 \(_{k,\ell }\) :

\(\left( \frac{-a_\ell }{a_k-a_\ell } e_k+\frac{a_k}{a_k-a_\ell } e_\ell , 0, 0,1\right) \) for all \(k\in D^+\) and \(\ell \in D^-\),

EP4 \(_{k,\ell }\) :

\(\left( \frac{-a_\ell }{a_k-a_\ell } e_k+\frac{a_k}{a_k-a_\ell } e_\ell , 0, 0,0\right) \) for all \(k\in D^+\) and \(\ell \in D^-\).

Proof

First, note that, from the definitions of \(\bar{M}_v, \bar{M}_\delta \), \(D^+\) and \(D^-\), we have \(0\le a_k\le \bar{M}_v\) for all \(k\in D^+\) and \(0<-a_\ell \le \bar{M}_\delta \) for \(\ell \in D^-\). Hence, the points EP1 \(_k\) and EP2 \(_\ell \) are feasible, and they cannot be expressed as a convex combination of any other feasible points of \(\text {conv}({\mathcal {S}})\). Finally, observe that any extreme point of \(\text {conv}({\mathcal {S}})\) with \(0<c_k<1\) for some \(k\in D^+\) must have \(c_\ell =1-c_k\) for some \(\ell \in D^-\) with \(c_k a_k + c_\ell a_\ell = 0=v=\delta \). In this case, we can have either \(\beta =0\) or \(\beta =1\). As a result, we obtain the extreme points EP3 \(_{k,\ell }\) and EP4 \(_{k,\ell }\). This completes the proof. \(\square \)

Next we give valid inequalities for \({\mathcal {S}}\).

Proposition 4.2

For \(k=1,\ldots , d_1\), the inequality

$$\begin{aligned} v\le \sum _{j=1}^{d_1}[a_j-a_k]_+ c_j+a_k \beta \end{aligned}$$
(63)

is valid for \({\mathcal {S}}\). Similarly, for \(k=d_1+1,\ldots , d\), the inequality

$$\begin{aligned} \delta \le \sum _{j=d_1+1}^{d}[a_k-a_j]_+ c_j-a_k (1-\beta ) \end{aligned}$$
(64)

is valid for \({\mathcal {S}}\).

Proof

If \(\beta =0\), then \(v=0\) from (58). Because \({\mathbf {c}}\ge {\mathbf {0}}\), inequality (63) holds trivially. If \(\beta =1\), then \(\delta =0\) from (59). Thus, for any \(k=1,\ldots ,d_1\),

$$\begin{aligned} v-\delta =v&=\sum _{j\in [d]} a_jc_j\le \sum _{j=1}^{d_1} a_jc_j=\sum _{j=1}^{d_1}(a_j-a_k)c_j+a_k\sum _{j=1}^{d_1}c_j\\&\le \sum _{j=1}^{d_1}[a_j-a_k]_+ c_j+a_k= \sum _{j=1}^{d_1}[a_j-a_k]_+ c_j+a_k\beta , \end{aligned}$$

where the last inequality follows from (60).

To see the validity of inequality (64), note that equality (57) can be rewritten as \(\delta -v=\sum _{j\in [d]} (-a_j)c_j\). Thus, we obtain an equivalent set where v and \(\delta \), and \(D^+\) and \(D^-\) are interchanged. \(\square \)
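As a numerical sanity check, the feasibility of the extreme points listed in Proposition 4.1 and the validity of inequalities (63) and (64) can be verified by direct enumeration; a small sketch follows (the data vector `a` is an arbitrary illustrative choice, ordered as assumed above):

```python
import numpy as np

a = np.array([0.2, 0.5, 1.0, -0.3, -0.8])   # D+ = {0,1,2}, D- = {3,4} (0-based)
d1, d, eps = 3, 5, 1e-12

def extreme_points():
    pts = []                                 # each point: (c, v, delta, beta)
    for k in range(d1):                      # EP1_k
        pts.append((np.eye(d)[k], a[k], 0.0, 1))
    for l in range(d1, d):                   # EP2_l
        pts.append((np.eye(d)[l], 0.0, -a[l], 0))
    for k in range(d1):                      # EP3_{k,l} and EP4_{k,l}
        for l in range(d1, d):
            c = (-a[l] * np.eye(d)[k] + a[k] * np.eye(d)[l]) / (a[k] - a[l])
            pts.append((c, 0.0, 0.0, 1))
            pts.append((c, 0.0, 0.0, 0))
    return pts

for c, v, delta, beta in extreme_points():
    assert abs(v - delta - a @ c) <= eps and abs(c.sum() - 1) <= eps  # (57), (60)
    for k in range(d1):                      # inequality (63)
        assert v <= np.maximum(a[:d1] - a[k], 0) @ c[:d1] + a[k] * beta + eps
    for k in range(d1, d):                   # inequality (64)
        assert delta <= np.maximum(a[k] - a[d1:], 0) @ c[d1:] - a[k] * (1 - beta) + eps
print("inequalities (63)-(64) hold at all extreme points")
```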

Remark 4.1

Inequality (58) is a special case of (63) with \(k=d_1\), and inequality (59) is a special case of (64) with \(k=d\).

Remark 4.2

Note that \(\beta \ge 0\) is implied by inequality (58), and \(\beta \le 1\) is implied by (59).

Remark 4.3

Consider a related set, \({\mathcal {T}}\), where constraint (60) is relaxed to \(\sum _{j\in [d]} c_j\le 1\). This set can be written in the form of the set \({\mathcal {S}}\) with \({\mathbf {c}}\in {\mathbbm {R}}^{d+1}\), where \(D=\{0,\ldots ,d\}\), and \(a_0=0\). In this case, inequality (63) for \(k=0\) is given by \(v\le \sum _{j=1}^{d_1}a_j c_j\).

Theorem 4.1

\(\text {conv}({\mathcal {S}})\) is completely described by equalities (57) and (60), and inequalities (61), (63), and (64).

Proof

Let \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) denote the index set of extreme point optimal solutions to the problem \(\min \{\varvec{\gamma }^\top {\mathbf {c}}+\gamma ^v v + \gamma ^\delta \delta + \gamma ^\beta \beta : ({\mathbf {c}},v,\delta ,\beta )\in \text {conv}({\mathcal {S}})\}\), where \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\in {\mathbbm {R}}^{d+3}\) is an arbitrary objective vector that is not perpendicular to the smallest affine subspace containing \(\text {conv}({\mathcal {S}})\); in other words, \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\ne \lambda ({\mathbf {a}},-1,1,0)\) and \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\ne \lambda ({\mathbf {1}},0,0,0)\) for any \(\lambda \in {\mathbbm {R}}\). Therefore, the set of optimal solutions is a proper subset of \(\text {conv}({\mathcal {S}})\) (note that \(\text {conv}({\mathcal {S}})\ne \emptyset \)). We prove the theorem by giving an inequality among (61), (63), and (64) that is satisfied at equality by \(({\mathbf {c}}^\kappa ,v^\kappa ,\delta ^\kappa ,\beta ^\kappa )\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) for the given objective vector. Then, because \((\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) is arbitrary, every facet of \(\text {conv}({\mathcal {S}})\) is defined by an inequality among (61), (63), and (64). We consider all possible cases.

Case A Suppose that \(\gamma ^\beta \ge 0\). Without loss of generality we can assume that \(\gamma ^\delta =0\) by adding \(\gamma ^\delta (v-\delta -\sum _{j\in [d]} a_jc_j)\) to the objective. From Eq. (57) the added term is equal to zero, and so this operation does not change the set of optimal solutions. Furthermore, we can also assume that \(\gamma _j\ge 0\) for all \(j\in [d]\) without loss of generality by subtracting \(\gamma _{\min } (\sum _{j\in [d]} c_j)\) from the objective, where \(\gamma _{\min }:=\min _{j\in [d]}\{\gamma _j\}\). From Eq. (60), the subtracted term is the constant \(\gamma _{\min }\), and so this operation does not change the set of optimal solutions. Note also that after this update \(\gamma _{\min }=0\). Therefore, for the case that \(\gamma ^\beta \ge 0\), we assume that \(\gamma ^\delta =0\) and \(\gamma _{\min }=0\). Under these assumptions, we can express the cost of each extreme point solution (denoted by \(C(\cdot )\)) given in Proposition 4.1:

C(EP1 \(_k\)):

\(=\gamma _k+\gamma ^v a_k +\gamma ^\beta \) for \(k\in D^+\),

C(EP2 \(_\ell \)):

\(=\gamma _\ell \) for \(\ell \in D^-\),

C(EP3 \(_{k,\ell }\)):

\(=\gamma _k\frac{-a_\ell }{a_k-a_\ell } +\gamma _\ell \frac{a_k}{a_k-a_\ell } +\gamma ^\beta \) for \(k\in D^+\) and \(\ell \in D^-\),

C(EP4 \(_{k,\ell }\)):

\(=\gamma _k\frac{-a_\ell }{a_k-a_\ell } +\gamma _\ell \frac{a_k}{a_k-a_\ell } \) for \(k\in D^+\) and \(\ell \in D^-\).

Let \(k^*\in \arg \min \{\gamma _j: j\in D^+\}\) and \(\ell ^*\in \arg \min \{\gamma _j: j\in D^-\}\). Note that \(\min \{\gamma _{k^*}, \gamma _{\ell ^*}\}=\gamma _{\min }=0\). Observe that C(EP2 \(_\ell \))\(<C\)(EP4 \(_{k,\ell }\)) for \(k\in D^+\) and \(\ell \in D^-\) if \(\gamma _\ell <\gamma _k\). On the other hand, if \(\gamma _\ell >\gamma _k\), then C(EP2 \(_\ell \))\(>C\)(EP4 \(_{k,\ell }\)) for \(k\in D^+\) and \(\ell \in D^-\). Also, the only extreme points for which \(\delta >0\) are EP2 \(_\ell \) for \(\ell \in D^-\), and the only extreme points for which \(v>0\) are EP1 \(_k\) for \(k \in D^+\) with \(a_k>0\). We use these observations in the cases considered below.

  (i)

    \(\gamma _{\ell ^*}=0<\gamma _{k^*}\). In this case, EP4 \(_{k,\ell }\) cannot be an optimal solution for any \(k\in D^+\) and \(\ell \in D^-\). Furthermore, because of the assumption that \(\gamma ^\beta \ge 0\), EP3 \(_{k,\ell }\) cannot be an optimal solution for any \(k\in D^+\) and \(\ell \in D^-\) either.

    (a)

      If there exists \(j\in D^+\) such that C(EP1 \(_j\))=\(\gamma _j+\gamma ^v a_j +\gamma ^\beta >0=C\)(EP2 \(_{\ell ^*}\)), then \(c_j^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta \le 0\) for all \(k\in D^+\). Now suppose that \(\gamma _j+\gamma ^v a_j +\gamma ^\beta <0\) for some \(j\in D^+\). In this case, C(EP1 \(_j\))\(<C\)(EP2 \(_\ell \)) for all \(\ell \in D^-\). Therefore, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta = 0\) for all \(k\in D^+\).

    (b)

      If there exists \(j\in D^-\) such that C(EP2 \(_j\))=\(\gamma _j>0=C\)(EP2 \(_{\ell ^*}\)), then \(c_j^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma _\ell =0\) for all \(\ell \in D^-\). In summary, for the case that \(\gamma ^\beta \ge 0\) and \(\gamma _{\ell ^*}=0<\gamma _{k^*}\), we have \(\gamma _k+\gamma ^v a_k +\gamma ^\beta = 0\) for all \(k\in D^+\) and \(\gamma _\ell =0\) for all \(\ell \in D^-\). In this case, the set \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) is given by EP1 \(_k\) for all \(k\in D^+\) and EP2 \(_\ell \) for all \(\ell \in D^-\). Inequality (63) for \(k=1\) is tight for all these extreme point optimal solutions. Hence, the proof is complete for this case.

  (ii)

    \(\gamma _{\ell ^*}>\gamma _{k^*}=0\). Recall that, in this case, C(EP4 \(_{k^*,\ell }\))\(<C\)(EP2 \(_\ell \)) for all \(\ell \in D^-\). Therefore, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). Hence, the proof is complete for this case.

  (iii)

    \(\gamma _{\ell ^*}=\gamma _{k^*}=0\).

    (a)

      If there exists \(j\in D^-\) such that \(\gamma _j>0\), then \(c_j^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma _\ell =0\) for all \(\ell \in D^-\).

    (b)

      Suppose that \(\gamma _j+\gamma ^v a_j +\gamma ^\beta <0\) for some \(j\in D^+\). In this case, EP1 \(_j\) has a strictly better objective value than EP2 \(_\ell \), EP3 \(_{k,\ell }\), and EP4 \(_{k,\ell }\) for all \(k\in D^+\) and \(\ell \in D^-\). Therefore, \(\delta ^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta \ge 0\) for all \(k\in D^+\). If there exists \(j\in D^+\) such that \(\gamma _j>0\) and \(\gamma _j+\gamma ^v a_j +\gamma ^\beta >0\), then \(c_j^\kappa =0\) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that at least one of the conditions \(\gamma _k=0\) or \(\gamma _k+\gamma ^v a_k +\gamma ^\beta = 0\) holds for all \(k\in D^+\). Let \(D_0^+=\{j\in D^+: \gamma _j=0\}\) and \(D_1^+=D^+ {\setminus } D_0^+\). Note that \(k^*\in D_0^+\) and \(\gamma _k+\gamma ^v a_k +\gamma ^\beta = 0\) for all \(k\in D_1^+\).

    (c)

      Suppose that \(\gamma _k=0\) for all \(k\in D^+\) (i.e., \(D_1^+=\emptyset \)). Recall that we also have \(\gamma _\ell =0\) for all \(\ell \in D^-\), \(\gamma ^\delta =0\) and \(\gamma ^\beta \ge 0\). If \(\gamma ^\beta = 0\), then \(\gamma ^v\) cannot equal 0 (otherwise all feasible solutions are optimal); in fact, \(\gamma ^v>0\) because we showed that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta \ge 0\) for all \(k\in D^+\). Then \(v^\kappa =0 \) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma ^\beta > 0\). If \(\gamma ^v\ge 0\), then EP1 \(_k\) is not optimal for any \(k\in D^+\). Therefore, \(v^\kappa =0 \) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). So we can assume that \(\gamma ^v< 0\). Because we showed that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta \ge 0\) for all \(k\in D^+\), and we assume that \(\gamma _k=0\) for all \(k\in D^+\), we have \(\gamma ^\beta \ge -\gamma ^v a_{d_1}\). If \(\gamma ^v a_{d_1} +\gamma ^\beta > 0\), then EP1 \(_k\) is not optimal for any \(k\in D^+\). Therefore, \(v^\kappa =0 \) for all \(\kappa \in O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\), and we can assume that \(\gamma ^v a_{d_1} +\gamma ^\beta = 0\). In this case, inequality (63) for \(k=d_1\) holds at equality for the set of all optimal extreme solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) (namely, EP1 \(_k\) for \(k\in D^+\) with \(a_k=a_{d_1}\), EP2 \(_\ell \) and EP4 \(_{j,\ell }\) for all \(j\in D^+\) and \(\ell \in D^-\)).

    (d)

      There exists \(k\in D^+\) such that \(\gamma _k>0\) (i.e., \(D_1^+\ne \emptyset \)). In this case, for \(k\in D_1^+\), \(\gamma _k=-\gamma ^v a_k-\gamma ^\beta >0\). Because \(\gamma ^\beta \ge 0\), we must have \(\gamma ^v< 0\) and \(a_k>0\) for \(k\in D_1^+\). In this case, we cannot have \(\gamma ^\beta =0\) (unless \(a_j=0\) for all \(j\in D_0^+\)), because otherwise \(\gamma _{j}+\gamma ^v a_{j} +\gamma ^\beta < 0\) for \(j\in D_0^+\) with \(a_j>0\) violating the condition in part (b) that \(\gamma _k+\gamma ^v a_k +\gamma ^\beta \ge 0\) for all \(k\in D^+\). So \(\gamma ^\beta >0\) and EP3 \(_{j,\ell }\) is not optimal for any \(j\in D^+, \ell \in D^-\). Let \(k_1=\min \{j \in D_1^+\}\), then we must have \(k\in D_1^+\) for all \(k\in D^+\) with \( k>k_1\). In this case, the set of all optimal solutions is given by EP1 \(_k\) for \(k\in D_1^+\), EP2 \(_\ell \) and EP4 \(_{j,\ell }\) for all \(j\in D_0^+\) and \(\ell \in D^-\), where the optimal objective value is zero. Then inequality (63) for \(k=k_1\) holds at equality for the set of all optimal extreme solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\). The last case to consider is that \(a_j=0\) for all \(j\in D_0^+\) and hence \(\gamma ^\beta =0\). In this case, inequality (63) for \(k=k^*\) holds at equality for the set of all optimal extreme solutions \(O(\varvec{\gamma },\gamma ^v, \gamma ^\delta , \gamma ^\beta )\) (namely, EP1 \(_k\) for \(k\in D^+\), EP2 \(_\ell \), EP3 \(_{j,\ell }\) and EP4 \(_{j,\ell }\) for all \(j\in D_0^+\) and \(\ell \in D^-\)).

Case B Suppose that \(\gamma ^\beta < 0\). Without loss of generality we can assume that \(\gamma ^v=0\) by subtracting \(\gamma ^v (v-\delta -\sum _{j\in [d]} a_jc_j)\) from the objective. From Eq. (57), the subtracted term is equal to zero, and so this operation does not change the set of optimal solutions. As argued in the proof of the validity of (64), equality (57) can be rewritten as \(\delta -v=\sum _{j\in [d]} (-a_j)c_j\). Thus, we obtain an equivalent set where v and \(\delta \), and \(D^+\) and \(D^-\) are interchanged. Thus, the proof is complete, using the same arguments as in Case A and inequalities (64). \(\square \)

In line with the above analysis, we introduce \(a_{ij}=({\mathbf {y}}_l-{\mathbf {x}}_i)_j\), \(D_i^+=\{j\in [d]: a_{ij}\ge 0\}\) and \(D_i^-=\{j\in [d]: a_{ij}< 0\}\) for all \(i\in [n]\). Then, an enhanced MIP formulation of \(\left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) \) for the lth realization of \({\mathbf {Y}}\) is obtained by replacing (54)–(55) in \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \) with the following constraints:

$$\begin{aligned}&v_i \le \sum _{j \in D^+_i}[a_{ij}-a_{ik}]_+ c_j+a_{ik} \beta _i,&\qquad \quad \forall ~ i\in [n],k \in D^+_i, \end{aligned}$$
(65)
$$\begin{aligned}&\delta _i \le \sum _{j \in D^-_i}[a_{ik}-a_{ij}]_+ c_j-a_{ik} (1-\beta _i),&\qquad \quad \forall ~i\in [n],k \in D^-_i. \end{aligned}$$
(66)
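For concreteness, the following sketch assembles this enhanced formulation for a single benchmark realization using the PuLP modeling library with its bundled CBC solver. All sizes, names, and random data are illustrative assumptions; scenarios whose vector \({\mathbf {a}}_i\) has components of only one sign are handled by fixing the corresponding variables, as noted earlier:

```python
import numpy as np
import pulp

rng = np.random.default_rng(0)
n, m, d, l = 6, 4, 3, 0                       # illustrative sizes; l: benchmark index
X, Y = rng.uniform(size=(n, d)), rng.uniform(size=(m, d))
p, q = [1.0 / n] * n, [1.0 / m] * m           # equal scenario probabilities
A = Y[l] - X                                  # rows a_i = y_l - x_i

prob = pulp.LpProblem("enhanced_MIP_SSD_l", pulp.LpMinimize)
c = [pulp.LpVariable(f"c_{j}", lowBound=0) for j in range(d)]
w = [pulp.LpVariable(f"w_{k}", lowBound=0) for k in range(m)]
v = [pulp.LpVariable(f"v_{i}", lowBound=0) for i in range(n)]
dlt = [pulp.LpVariable(f"delta_{i}", lowBound=0) for i in range(n)]
b = [pulp.LpVariable(f"beta_{i}", cat="Binary") for i in range(n)]

prob += pulp.lpSum(q[k] * w[k] for k in range(m)) \
        - pulp.lpSum(p[i] * v[i] for i in range(n))       # objective of (MIP_SSD_l)
prob += pulp.lpSum(c) == 1                                 # C: unit simplex
for k in range(m):                                         # constraints (51)
    prob += w[k] >= pulp.lpSum(float(Y[l, j] - Y[k, j]) * c[j] for j in range(d))
for i in range(n):
    prob += v[i] - dlt[i] == pulp.lpSum(float(A[i, j]) * c[j] for j in range(d))  # (53)
    Dp = [j for j in range(d) if A[i, j] >= 0]
    Dm = [j for j in range(d) if A[i, j] < 0]
    if not Dp:
        prob += v[i] == 0                     # a_i < 0 componentwise: no shortfall side
    if not Dm:
        prob += dlt[i] == 0                   # a_i >= 0 componentwise: no excess side
    for k in Dp:                              # inequalities (65)
        prob += v[i] <= pulp.lpSum(float(max(A[i, j] - A[i, k], 0)) * c[j]
                                   for j in Dp) + float(A[i, k]) * b[i]
    for k in Dm:                              # inequalities (66)
        prob += dlt[i] <= pulp.lpSum(float(max(A[i, k] - A[i, j], 0)) * c[j]
                                     for j in Dm) - float(A[i, k]) * (1 - b[i])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(prob.objective))             # negative: a violated cut exists for this l
```

Note that, by the analogue of Remark 4.1, the instances of (65) and (66) with the largest \(a_{ik}\) and smallest \(a_{ik}\) reduce to the big-M constraints (54)–(55), so the sketch above does not need them separately.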

5 Computational study

The goals of our computational study are two-fold. In the first part, we demonstrate that the methods developed in Sect. 3.2—including variable fixing, bounding, and incorporating valid inequalities—are effective in solving \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \). In the second part, we perform a similar analysis for the methods presented in Sect. 4 for \(\left( {\mathbf {CutGen}}\_{\mathbf {SSD}}\right) \).

All the optimization problems are modeled with the AMPL mathematical programming language. All runs were executed on 4 threads of a Lenovo(R) workstation with two Intel® Xeon® 2.30 GHz E5-2630 CPUs and 64 GB memory running on Microsoft Windows 8.1 Pro x64 Edition. All reported times are elapsed times, and the time limit is set to 5400 s. CPLEX 12.2 is invoked with its default set of options and parameters. If optimality is not proven within the time allotted, we record both the best lower bound on the optimal objective value (retrieved from CPLEX and denoted by \({\text {LB}}\)) and the best available objective value (denoted by \({\text {UB}}\)). In cut generation problems, the optimal objective value can take any value, including 0, and so in order to provide more insight, we calculate two types of relative optimality gap: \({\text {G}}_{{\text {1}}}=|{\text {LB}}-{\text {UB}}|/(|{\text {UB}}|)\) and \({\text {G}}_{{\text {2}}}=|{\text {LB}}-{\text {UB}}|/(|{\text {LB}}|)\). It is easy to see that the maximum of \({\text {G}}_{{\text {1}}}\) and \({\text {G}}_{{\text {2}}}\) is an upper bound on the actual relative optimality gap; we do not report \({\text {G}}_{{\text {1}}}\) when \(|{\text {UB}}|=0\) or when CPLEX yields a trivial lower bound of \(-\infty \).
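Expressed as a small helper (ours, for concreteness), this reporting convention reads:

```python
def optimality_gaps(LB, UB):
    """Relative gaps G1 and G2; None marks the cases we do not report."""
    g1 = abs(LB - UB) / abs(UB) if UB != 0 and LB != float("-inf") else None
    g2 = abs(LB - UB) / abs(LB) if LB not in (0.0, float("-inf")) else None
    return g1, g2   # max(g1, g2) upper-bounds the true relative optimality gap
```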

We would like to remind the reader that during a cut generation-based algorithm, the solution procedure of the cut generation problem is allowed to terminate early without finding the most violated cut. However, when such a heuristic procedure cannot find a violated cut, it is still required to prove that the optimal objective function value is non-negative. Therefore, in our experiments we opt for solving the cut generation problem to optimality.

5.1 Generation of the problem instances

In this section, we describe two sets of data used for our computational experiments.

5.1.1 Homeland security budget allocation

We test the computational effectiveness of our proposed methods on a homeland security budget allocation (HSBA) problem presented in Hu et al. [13] for optimization under multivariate polyhedral SSD constraints. We follow the related data generation scheme described in Noyan and Rudolf [17], where the polyhedral SSD constraints are replaced by the \({\text {CVaR}}\)-based ones. The main problem is to allocate a fixed budget to ten urban areas in order to prevent, respond to, and recover from national disasters. The risk share of each area is based on four criteria: property losses, fatalities, air departures, and average daily bridge traffic. The penalty for allocations under the risk share is expressed by a budget misallocation function associated with each criterion, and these functions are used as the multiple random performance measures of interest. In order to be consistent with our convention of preferring larger values, we construct random outcome vectors of interest from the negative of the budget misallocation functions associated with four criteria. Two different benchmarks are considered: one based on average government allocations by the Department of Homeland Security’s Urban Areas Security Initiative, and one based on suggestions in the RAND report by Willis et al. [21]. The scalarization polytope is of the form \(C=\left\{ {\mathbf {c}}\in {\mathbbm {R}}^4{:}~\Vert {\mathbf {c}}\Vert _1=1, c_i\ge c^*_i-\frac{\theta }{3}\right\} \), where \({\mathbf {c}}^*\in {\mathbbm {R}}^4\) is a center satisfying \(\Vert {\mathbf {c}}^*\Vert _1=1\), and \(\theta \in [0,1]\) is a constant for which \(\frac{\theta }{3}\le \min _{i\in \{1,\ldots ,4\}}c^*_i\) holds. We consider the “base case” with \(\theta =0.25\) and \({\mathbf {c}}^*=\left( \frac{1}{4},\frac{1}{4},\frac{1}{4},\frac{1}{4}\right) \), unless otherwise stated. We refer the reader to Hu et al. [13] and Noyan and Rudolf [17] for more details on the data generation.

For this set of instances, Noyan and Rudolf [17] report computational results with the formulation \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \)—developed for the multivariate CVaR-constrained problem under the special case of equal probabilities. For example, for the largest problem instances with 500 scenarios and \(\alpha =0.05\) (resp., \(\alpha =0.01\)), on average, two (resp., 1.6) cut generation problems need to be solved, taking 14,386 (resp., 11,507) seconds (around 99.8 % of the overall solution time). We note that in the initialization step of the algorithm, four risk constraints are additionally generated based on the vertices of C. Similarly, for the multivariate SSD-constrained problems, Hu et al. [13] report that for the largest test problems with 300 scenarios, only one cut generation problem is solved, taking 1318 s (96 % of the overall solution time). Since the cut generation is the main bottleneck, in our computational study we focus only on solving the cut generation problems. Hence, unlike the existing studies, we also explain how we obtain the realizations of the random vector \({\mathbf {X}}\). In accordance with the existing studies, the risk constraints associated with the vertices of the scalarization polytope C are initially added to the intermediate relaxed problem. In the base case, the polytope C is a three-dimensional simplex with the vertices \(\hat{{\mathbf {c}}}_1,\ldots , \hat{{\mathbf {c}}}_4\), where the ith element of \(\hat{{\mathbf {c}}}_i\) is equal to 0.5, and the other elements are 0.5/3. We solve the master problem once, and use its optimal solution to calculate the realizations of the associated 4-dimensional random vector \({\mathbf {X}}\). Note that it is clear how to obtain the realizations of the random vector \({\mathbf {Y}}\), since the benchmark allocations are given.
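Since C is a translated and scaled unit simplex, its vertices can be generated mechanically; a short sketch of this computation under the base-case parameters (variable names are ours):

```python
import numpy as np

theta, c_star, d = 0.25, np.full(4, 0.25), 4   # base case
lb = c_star - theta / 3                        # lower bounds c_i >= c*_i - theta/3
slack = 1.0 - lb.sum()                         # = 4*theta/3 here
vertices = [lb + slack * e for e in np.eye(d)] # each vertex raises one coordinate
# base case: every vertex has one entry 0.5 and the remaining entries 0.5/3
```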

5.1.2 Randomly generated data

To further analyze the computational performance of the proposed methods, we consider a different type of problem (inspired by Dentcheva and Wolfhagen [7]):

$$\begin{aligned} \max \left\{ f({\mathbf {z}}){:}~{\mathbf {R}}{\mathbf {z}}\succcurlyeq {\mathbf {Y}}, \quad {\mathbf {z}}\in {\mathbbm {R}}_+^{100}\right\} , \end{aligned}$$

where \({\mathbf {R}}: \varOmega \mapsto [0,1]^{d\times 100}\) is a random matrix and the relation \(\succcurlyeq \) represents a stochastic multivariate preference relation. In our setup, the relation \(\succcurlyeq \) represents \(\succcurlyeq ^C_{{\text {CVaR}}_{\alpha }}\) and \(\succcurlyeq ^C_{(2)}\) for the multivariate polyhedral CVaR and SSD relations, respectively. We assume that the benchmark vector \({\mathbf {Y}}\) takes the form \(\bar{{\mathbf {R}}}\bar{{\mathbf {z}}}\), where \(\bar{{\mathbf {R}}}\) is also a \(d\times 100\)-dimensional random matrix and \(\bar{{\mathbf {z}}}\in {\mathbbm {R}}_+^{100}\) is a given benchmark decision. The entries of the matrices \({\mathbf {R}}\) and \(\bar{{\mathbf {R}}}\) are independently generated from the uniform distribution on the interval [0, 1]. Since we focus directly on solving the associated cut generation problems, we also randomly generate the decision vectors \({\mathbf {z}}\) and \(\bar{{\mathbf {z}}}\); in particular, they are independently and uniformly generated from the interval [100, 500]. This data generation scheme directly provides us with the realizations of the two d-dimensional random vectors \({\mathbf {X}}={\mathbf {R}}{\mathbf {z}}\) and \({\mathbf {Y}}=\bar{{\mathbf {R}}}\bar{{\mathbf {z}}}\).
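A direct transcription of this scheme in Python (the scenario counts n and m below are illustrative) is as follows:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, m, d, dim_z = 200, 200, 4, 100              # illustrative sizes
R    = rng.uniform(size=(n, d, dim_z))         # scenario-wise realizations of R
Rbar = rng.uniform(size=(m, d, dim_z))         # ... and of R-bar
z    = rng.uniform(100.0, 500.0, size=dim_z)   # randomly drawn decision z
zbar = rng.uniform(100.0, 500.0, size=dim_z)   # benchmark decision z-bar
X = R @ z        # (n, d): realizations x_1, ..., x_n of X = Rz
Y = Rbar @ zbar  # (m, d): realizations y_1, ..., y_m of Y = R-bar z-bar
```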

5.2 Computational performance: cut generation for \(\left( {\mathbf {G-MCVaR}}\right) \)

First, we study the effectiveness of alternative MIP formulations for \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) \). In these experiments, we assume that each scenario is equally likely, and consider confidence levels of the form \(\alpha =k/n\). For an arbitrary confidence level \(\bar{\alpha }\), we calculate k as \(\lceil \bar{\alpha } n\rceil \). In Table 1, we present our experiments on the performances of four alternative formulations: (1) the MIP model—\(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \)—developed for the special case of equal probabilities [17], (2) the MIP model—\(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \)—for general probabilities presented in Noyan and Rudolf [17], (3) the more compact model—\(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \)—proposed in Sect. 3.2.1, and (4) the new model—\(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \)—proposed in Sect. 3.2.2. We report the results averaged over two instances (based on Government and RAND benchmarks) for each combination of \(\alpha \) and n. We see that the new formulation using the VaR representation is highly effective in reducing the solution time for these instances. Problem instances that are not solvable within the time limit of 5400 s with the existing formulation \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \) and its enhancement \(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \) are now solved within 6 min, with the exception of one instance (HSBA data, \(n=1000, \alpha =0.05\)), which is also solved well within the time limit. We observe that \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \) terminates at the root node for large instances with no integer feasible solution available. This may be due to the large size of the formulation (a quadratic number of binary variables). In contrast, \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) contains a linear number of binary variables. What is also surprising is that even the formulation \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \), which uses more information due to the equal probability assumption, is not able to solve many of the instances. For the HSBA data set, \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \) has inferior performance when compared to \(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \) for problems with 300 or more scenarios. On the other hand, for the random data set (described in Sect. 5.1.2), \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \) performs better than \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \) and \(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \). However, it still cannot solve larger instances with 500 or more scenarios. In contrast, \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) solves these problems within a few minutes. We would also like to note that the total time spent on preprocessing for \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) (calculation of the parameters \(L, U, M_{ik}, M_{i*}, M_{*i}, H_k\)), which is not included in the times reported, is negligible. Therefore, we can conclude that \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) is a better formulation than the existing formulation \(\left( {\mathbf {MIP}}\_{\mathbf {Special}}\right) \), \(\left( {\mathbf {MIP}}\_{\mathbf {CVaR}}\right) \) and its enhancement \(\left( {\mathbf {SMIP}}\_{\mathbf {CVaR}}\right) \).

Table 1 Computational performance of the alternative MIPs for \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) {}\)
Table 2 Computational performance of the enhanced MIPs for \(\left( {\mathbf {CutGen}}\_{\mathbf {CVaR}}\right) {}\)—base polytope: HSBA instances
Table 3 Effectiveness of the valid inequalities for \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) {}\)—unit simplex: random instances (\(\alpha =0.01\))

Next we study the effectiveness of various classes of valid inequalities and preprocessing strategies described in Sects. 3.2.2 and 3.2.3. Note that when we test the performance of a class of inequalities, we add all inequalities a priori to the formulation, because there are polynomially many of them. We consider two sets of data as before, one with the HSBA data (Table 2), and one with the randomly generated data (Table 3). In Tables 2 and 3, the relative improvements and optimality gaps are given as percentages, and all presented results are averaged over the two instances with different benchmarks. In the first two columns of Table 2, we compare the performance of \(\left( {\mathbf {RSMIP}}\_{\mathbf {CVaR}}\right) \), which is the original formulation enhanced with variable reduction due to symmetry, variable fixing, and bounding, against the new formulation \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) without any enhancements. In the third column of Table 2, we report the performance of \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) with variable fixing and bounding. Finally, in the fourth column, we report the performance of \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) with variable fixing, bounding, and ordering inequalities (34). Comparing the first two columns of Table 2, we see that fixing and bounding the variables are highly effective strategies, and as a result \(\left( {\mathbf {RSMIP}}\_{\mathbf {CVaR}}\right) \) outperforms \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \). However, it cannot solve the larger instances within the time limit, and in general stops with a large relative optimality gap. On the other hand, when these strategies are also applied to \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \), all test instances are solved within the time limit, as observed from the third column. The reduction in solution time comparing columns 2 and 3 can be attributed to the large reduction in the binary variables due to variable fixing; fewer than 7 % and 17 % of the binary variables remain in the formulation for instances with \(\alpha =0.01\) and \(\alpha =0.05\), respectively. The reduction in binary variables is primarily a result of the observation in Proposition 3.1. We did not observe any additional fixing based on the bounds on VaR in our experiments. Finally, from the last column, we see that the ordering inequalities are highly effective: when used in addition to fixing and bounding, they yield the best performance among all settings. Because a large number of variables are fixed and a relatively large number of ordering relations (34) across scenarios exist in these instances, we did not see much benefit from inequalities (47)–(48). We note that this behavior is highly data-dependent, as we see in Table 3. In this table, we compare different settings in the first three columns: (1) \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) without any enhancements, (2) \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) with fixing, bounding, and ordering inequalities (34), and (3) \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) with fixing, bounding, and all classes of cuts ((34) and (47)–(48)). We do not report our detailed results for \(\left( {\mathbf {NewMIP}}\_{\mathbf {CVaR}}\right) \) with fixing and bounding, because the conclusions are similar to those in Table 2.
For these instances, while a significant number of binary variables can be fixed, the percentage of remaining variables is higher than for the HSBA data. In this case, the setting with all enhancements and valid inequalities yields the best performance in most cases, with close to a 50 % reduction in solution time for several instances. Inequalities (47)–(48) are useful in the most difficult cases, when added on top of the setting with all other improvements. Overall, with this setting, all instances are solved within the time limit, with far fewer branch-and-bound (B&B) nodes explored.

Table 4 Effectiveness of fixing and ordering inequalities for \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) {}\)—Unit simplex: Random instances

5.3 Computational performance: cut generation for \(\left( {\mathbf {G-MSSD}}\right) \)

In Table 4, we report our computational experiments with the randomly generated data described in Sect. 5.1.2 to illustrate the effectiveness of the strategies proposed for multivariate SSD-constrained optimization problems. Recall that the cut generation problems decompose by benchmark realizations for SSD. In these experiments, we solve the cut generation problem for \(\lceil m/20\rceil \) of the benchmark realizations. Because we solve multiple cut generation problems for each setting, we let \(n\in \{200, 300, 500\}\). For each setting, we generate two instances and report their average statistics. We report the average and the standard deviation of the solution times taken over all tested benchmark realizations for a given setting. We compare the performance of two formulations: \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \), and \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \) with variable fixing and ordering inequalities. In the first column, we report the elapsed time statistics (in seconds) for \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \) without any computational enhancements. From the standard deviation columns, we observe a high variability in the solution times. In fact, the minimum solution times are a few seconds, whereas the maximum solution times reach the time limit of 5400 s. We also report the number of instances that were not solved within the time limit under the column “# Unslvd”.

Note that unlike the CVaR case, which benefits from additional information on VaR for fixing variables, in the SSD case not many binary variables can be fixed. On average, over 65 % of the binary variables remain in the formulation. Next, we analyze the performance of the ordering inequalities (34), in addition to fixing, reported in the second column. In the last column of Table 4, we report the average number of ordering inequalities added to the formulation \(\left( {\mathbf {MIP}}\_{\mathbf {SSD}}_{l }\right) \). We observe that the ordering inequalities are highly effective, as they reduce the average solution time significantly, enabling the solution of all instances within the time limit. We also tested the performance of the formulation with inequalities (65)–(66) on these instances, but observed that it does not perform better than the version with ordering inequalities. In our experience, ordering inequalities, when a large number of them exist, are preferable because they are sparse and they provide information on the realizations under multiple scenarios. In contrast, inequalities (65)–(66) are denser with very small coefficients for the instances tested, and they provide information on the correct calculation of the nonlinear shortfall term for a single scenario at a time. As a result, if the number of ordering relations (34) across scenarios is much larger than the number of inequalities (65)–(66) (given by the number of remaining scenarios multiplied by d), then it is preferable to use only the ordering inequalities in a brute-force method that adds all inequalities a priori to the formulation. Alternatively, a branch-and-cut method can be implemented, with a more elaborate cut management system, so as to benefit from both types of cuts. Furthermore, inequalities (65)–(66) can be strengthened using the ordering relation information for a scenario under which the realization is known to be smaller than the realization under another scenario. On the other hand, when the number of ordering relations is relatively small, the additional information provided by inequalities (65)–(66) could be more useful (see Table 3 for the performance of the analogue of inequalities (65)–(66) for the CVaR case).

6 Conclusions

In this paper, we develop alternative mixed-integer programming formulations and solution methods for cut generation problems arising in a class of stochastic optimization problems that features benchmarking constraints based on multivariate polyhedral conditional value-at-risk. We propose a mixed-integer programming formulation of the cut generation problem that involves a new representation of value-at-risk. We show that this new formulation is highly effective in solving the cut generation problems. In addition, we describe computational enhancements involving variable fixing and bounding. Furthermore, we give a class of valid inequalities, which establish a relative order between scenario-dependent binary variables when possible. Finally, we give the convex hull description of a polytope describing the linearization of a non-convex substructure arising in this cut generation problem. Our computational results illustrate the effectiveness of our proposed models and methods for the CVaR-constrained optimization problems. In addition, we show that the proposed computational enhancements can be adapted to cut generation problems for multivariate polyhedral SSD-constrained optimization. We give the convex hull description of a polytope describing the linearization of a non-convex substructure arising in the SSD cut generation problem for each benchmark realization. However, these inequalities need to be further strengthened to improve their practical performance. One possible area of future research is to study the intersection of these linearization polytopes for two or more different realizations of the random vector of interest.