1 Introduction

1.1 The robust model

In the last decade, much progress has been made in the field of robust linear optimization, that is, in finding worst-case solutions under uncertain side conditions. A wide spectrum of models and methods has been proposed. Recent developments in theory and practice are reviewed by Gabrel et al. (2014). For a systematic collection of the most significant ones, the reader is referred to Bertsimas et al. (2011). An important part of the literature uses risk measures to quantify the uncertainty of violations of side conditions; see, e.g., Mosler and Bazovkin (2014), Bertsimas and Brown (2009), and Natarajan et al. (2009). Such risk measures are flexible and allow an immediate interpretation. They can be properly selected and tuned to control the relevant sources of uncertainty. Then, essentially, the goal function is optimized under the restriction that the risk of violation stays within acceptable bounds.

The present paper contributes to this strand. We generalize the approach of Mosler and Bazovkin (2014), where an optimization under a single constraint was solved, to much more general restrictions. In doing so, for each linear restriction of a given linear program a so-called uncertainty set of parameters is constructed. It consists of all possible values of the unknown coefficients that are acceptable at a specified risk level of constraint violation. We employ multivariate coherent distortion risk measures, which are extensions of the usual univariate coherent distortion risk measures: the uncertainty sets of these measures are convex bodies and turn out to coincide with weighted-mean (WM) trimmed regions. WM regions, as recently developed by Dyckerhoff and Mosler (2011), describe a multivariate distribution by regions of different depth (= centrality). They can be exactly calculated in any dimension; see Bazovkin and Mosler (2012).

Various other notions of uncertainty sets have been proposed in the recent literature; a review in the context of portfolio optimization is given in Fabozzi et al. (2010). These approaches define uncertainty sets like confidence sets, describing the uncertainty in the parameters of parametric distributional assumptions [e.g., Delage and Ye (2010)], or by special functionals [e.g., \(\varphi \)-divergences in Ben-Tal et al. (2013)]. In our approach we avoid such assumptions and represent the uncertainty of constraint violations in a purely nonparametric way, viz. by depth-based central regions. To obtain a numerical solution of a given optimization problem, we propose an algorithm that employs the well-developed geometric machinery for calculating central regions.

To begin with, we consider the following linear program,

$$\begin{aligned} {\mathbf {c}}^\prime {\mathbf {x}}\longrightarrow \min \quad s.t. \ \tilde{\mathbf {A}}{\mathbf {x}}\ge \tilde{\mathbf {b}}\,, \end{aligned}$$
(1)

and assume that \(\tilde{\mathbf {A}}\) is an \(m\times d\) matrix having stochastic entries and that \(\tilde{\mathbf {b}}\in \mathbb {R}^m\) may be stochastic, too. From now on, we will mark random variables with a tilde: For example, \(\tilde{\mathbf {b}}\) will denote the stochastic right-hand side vector, while \({\mathbf {b}}\) is used for a deterministic value of it.

The linear program (1) is called a stochastic linear optimization problem, or stochastic linear program (SLP). Of course, whether the stochastic side condition is satisfied for some \({\mathbf {x}}\) depends on the realizations of \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {b}}\). Thus an optimal solution to (1) becomes a random vector itself. One possibility to cope with this randomness is to quantify the degree of violation of the side condition by some risk measure and to substitute the condition by a bound on that measure.

Risk measures are widely used in financial mathematics, where they assess the ex-ante risk of a financial position [see, e.g., Föllmer and Schied (2004)]. The uncertain position is generally modeled by a random vector \({\mathbf {Y}}\). A univariate risk measure basically provides the value \({\mathbf {c}}\) of a monetary deposit that, being added to the given uncertain position \({\mathbf {Y}}\), cancels its risk. The latter means that the location of the distribution of \({\mathbf {Y}}+{\mathbf {c}}\) satisfies certain requirements that are set by, say, a regulator. Similarly, an \(m\)-variate risk measure maps an \(m\)-variate random vector to a deterministic \(m\)-vector of monetary deposits.

For example, if \({\mathbf {Y}}\) is univariate, the regulator may require that some \(\alpha \)-quantile \(Q_Y(\alpha )\) of \({\mathbf {Y}}\) be non-negative. If we add the constant \(-Q_Y(\alpha )\) to \({\mathbf {Y}}\), which can be interpreted as an insurance deposit, we make the condition hold. In other words, only the worst \(\alpha \cdot 100\,\%\) of outcomes of the insured position are expected to be negative. This risk measure is called the value-at-risk at \(\alpha \) (V@R\(_\alpha \)) and is widely used in finance. Many risk measures have been proposed and investigated in the literature, each controlling a particular aspect of the outcome distribution. Another popular univariate risk measure is the expected shortfall. Others are the expected minimum and the entropic risk measure. These measures transfer easily from financial losses to losses incurred by violating the side conditions of a general stochastic program.

In what follows, the stochastic side condition in (1) is handled by means of risk constraints,

$$\begin{aligned}&\rho ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}}) \le {\mathbf {0}}\,, \text {resp.} \end{aligned}$$
(2)
$$\begin{aligned}&\rho ^1(\tilde{\mathbf {A}}_j {\mathbf {x}}-b_j) \le 0\,, j=1\dots m\,, \end{aligned}$$
(3)

where \(\tilde{\mathbf {A}}_j\) denotes the \(j\)-th row of \(\tilde{\mathbf {A}}\), and \(\rho ^m\) is a risk measure taking values in \(\mathbb {R}^m\), \(m\ge 1\). An SLP that minimizes \({\mathbf {c}}'{\mathbf {x}}\) subject to the restrictions (2) or (3) is called a risk-constrained stochastic linear program.

Alternatively, in place of a risk constraint, the probability of satisfying all restrictions may be controlled by a joint chance constraint,

$$\begin{aligned} \text {Prob}[\tilde{\mathbf {A}}{\mathbf {x}}- {\mathbf {b}}\ge {\mathbf {0}}] \ge 1-\alpha \,. \end{aligned}$$
(4)

Limiting the violation probability by some fixed \(\alpha \), one obtains the well-known chance-constrained linear program. Sometimes, neglecting possible stochastic dependencies among the constraints, separate chance constraints,

$$\begin{aligned} \text {Prob}[\tilde{\mathbf {A}}_j {\mathbf {x}}- b_j\ge 0] \ge 1 - \alpha _j\,, \quad j=1\dots m\,, \end{aligned}$$
(5)

are considered. Here, a chance constraint limits the risk of each single side condition by imposing a maximum probability \(\alpha _j\) on its violation.

Obviously, when using the constraints (2) or (3), the choice of the risk measure is a crucial point. In this paper we consider not a single risk measure but a comprehensive class of such measures, the coherent distortion risk measures.

Definition 1

(Coherent distortion risk measure) Let \(r:[0,1]\rightarrow [0,1]\) be a given function that is increasing and concave with \(r(0)=0\) and \(r(1)=1\). For any univariate random variable \(Y\) define

$$\begin{aligned} \rho (Y)= - \int _0^1 Q_{Y}(t) dr(t)\,, \end{aligned}$$
(6)

provided the integral exists. Here again, \(Q_Y\) denotes the quantile function of \(Y\). Then \(\rho \) is called a coherent distortion risk measure, and \(r\) its weight generating function.

If \(Y\) has an empirical distribution on the ordered values \(y_{(1)}, y_{(2)},\dots , y_{(n)}\), this definition becomes

$$\begin{aligned} \rho (Y)= - \sum _{i=1}^n w_i\, y_{(i)}\,, \end{aligned}$$
(7)

with weights \(w_i=r\big (\frac{i}{n}\big )-r\big (\frac{i-1}{n}\big )\) decreasing in \(i\). Coherent distortion risk measures possess certain desired properties: monotonicity, translation invariance, law invariance, positive homogeneity and subadditivity. The last two properties imply a most important postulate of risk measurement, coherence, which means that the risk measure decreases under diversification [cf. Delbaen (2002)]. An example of a coherent distortion risk measure is the expected shortfall, which is defined by choosing

$$\begin{aligned} r(t)= \left\{ \begin{array}{ll} t/\alpha &{}\quad \hbox {if}\, 0\le t<\alpha , \\ 1 &{} \quad \hbox {if}\, \alpha \le t\le 1. \end{array}\right. \end{aligned}$$
(8)

Note that the value-at-risk V@R\(_\alpha \) satisfies (6) with \(r(t)=0\) for \(t<\alpha \) and \(r(t)=1\) otherwise, which is not a concave function. The V@R\(_\alpha \) is therefore an example of a non-coherent distortion risk measure.
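For illustration, the following short sketch (in Python; the helper names are ours) computes the weights \(w_i\) of (7) from a given weight generating function \(r\) and evaluates \(\rho \) on an empirical sample, using the expected shortfall generator (8) as an example:

```python
import numpy as np

def distortion_weights(r, n):
    """Weights w_i = r(i/n) - r((i-1)/n), i = 1, ..., n, cf. (7)."""
    grid = np.arange(n + 1) / n
    return np.diff(r(grid))

def rho(y, r):
    """Empirical coherent distortion risk measure: rho(Y) = -sum_i w_i * y_(i)."""
    y = np.sort(np.asarray(y, dtype=float))   # order statistics y_(1) <= ... <= y_(n)
    return -np.dot(distortion_weights(r, len(y)), y)

def es_generator(alpha):
    """Weight generating function (8) of the expected shortfall at level alpha."""
    return lambda t: np.minimum(t / alpha, 1.0)

y = np.array([-2.0, 0.5, 1.0, 3.0, -1.5])      # a toy sample of outcomes
print(rho(y, es_generator(0.4)))               # expected shortfall at alpha = 0.4, here 1.75
```

Since \(r\) is concave, the weights produced in this way decrease in \(i\), so the largest weights fall on the worst outcomes.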

In a more general context, a risk measure can be regarded as a quality measure [cf. Kall and Mayer (2010)]. Our choice of the quality measure, besides its generality, possesses a clear interpretation and always generates a convex program. Later we will demonstrate that our approach in fact suits an even more general robust program that copes not only with linear stochastic restrictions, but also with restrictions of a robust polyhedral type, which include robust conic restrictions as a special case.

In applications, information about the stochastic parameters has to be extracted from data. Here we assume that a sample of coefficient matrices \({{\mathbf {A}}^1,\ldots ,{\mathbf {A}}^n \in \mathbb {R}^{m\times d}}\) has been observed and that the solution of the SLP is based on these data. The data are regarded as an empirical distribution giving equal mass \(\frac{1}{n}\) to \({\mathbf {A}}^1,\ldots ,{\mathbf {A}}^n\), and with these data the SLP is called an empirical risk-constrained SLP. Similarly, when the right-hand side is stochastic as well, a joint sample of \((\tilde{\mathbf {A}}, \tilde{\mathbf {b}})\) is considered.

The next subsection outlines the single-constraint algorithm of Mosler and Bazovkin (2014) and provides preliminaries for its extension.

1.2 The single-constraint problem

In the case of a single stochastic constraint (\(m=1\)) the SLP (1) specializes to

$$\begin{aligned} {\mathbf {c}}^\prime {\mathbf {x}}\longrightarrow \min \quad s.t. \; \tilde{\mathbf {a}}'{\mathbf {x}}\ge b, \end{aligned}$$
(9)

where \(\tilde{\mathbf {a}}\) is a random vector of parameters in \(\mathbb {R}^d\). To solve (9), an uncertainty set \(\mathcal{U}\) is constructed that yields a robust linear program:

$$\begin{aligned} {\mathbf {c}}^\prime {\mathbf {x}}\longrightarrow \min \quad s.t. \; {\mathbf {a}}'{\mathbf {x}}\ge b\; \text {for all } {\mathbf {a}}\in \mathcal{U}. \end{aligned}$$
(10)

The uncertainty set \(\mathcal{U}\) consists of all parameters that are taken into account by the decision maker in order to make his or her risk acceptable. It was shown there that the solution of (10) corresponds to a parameter \({\mathbf {a}}\) that lies on the surface of \(\mathcal{U}\).

To transform (9) into (10) we have to construct the uncertainty set \(\mathcal{U}\), depending on the data and the risk measure. For a distortion risk measure \(\rho \) on an empirical distribution it turns out that the single restriction (3) can be equivalently formulated as the restriction in (10) with \(\mathcal{U}\) being a so-called weighted-mean region whose weights depend on \(\rho \).

For an empirical distribution on \({\mathbf {a}}^1, \ldots , {\mathbf {a}}^n\in \mathbb {R}^d\), a WM region is a polytope in \(\mathbb {R}^d\), defined as

$$\begin{aligned} D_{{\mathbf {w}}_\alpha }({\mathbf {a}}^1,\ldots ,{\mathbf {a}}^n)={{\mathrm{\,conv}}}\left\{ \sum _{j=1}^n w_{\alpha , j}{\mathbf {a}}^{\pi (j)}\,:\, \pi \, \text {permutation of}\, \{1,\dots ,n\}\,\right\} . \end{aligned}$$
(11)

Here \({\mathbf {w}}_\alpha =[w_{\alpha ,1}, \dots , w_{\alpha ,n}]'\) is a vector of ordered non-negative weights that add up to 1. Additionally, for all \(k=1,\dots ,n\), the sum \(\sum _{j=1}^kw_{\alpha ,j}\) has to increase with \(\alpha \). Any such family of weight vectors \(\{{\mathbf {w}}_\alpha \}_{0\le \alpha \le 1}\) defines a notion of WM regions. There are many notions of weighted-mean trimmed regions, among them well-known families like the zonoid regions and the expected convex hull regions. For example, the zonoid regions are given by

$$\begin{aligned} w_{\alpha ,j}=\left\{ \begin{array}{c@{\quad }l} \frac{1}{n\alpha }\,&{}\text {if}\; j>n-\lfloor n\alpha \rfloor , \\ \frac{n\alpha -\lfloor n\alpha \rfloor }{n\alpha }\,&{}\text {if}\; j=n-\lfloor n\alpha \rfloor , \\ 0\,&{}\quad \text {if}\; j<n-\lfloor n\alpha \rfloor , \end{array}\right. \end{aligned}$$

\(0\le \alpha \le 1\). For details on WM regions and their extension to general probability measures, see Dyckerhoff and Mosler (2011) and Dyckerhoff and Mosler (2012). The connection between univariate coherent distortion risk measures and WM regions is established in the following Proposition:

Proposition 1

(Mosler and Bazovkin (2014)) Let \(\rho \) be a univariate coherent distortion risk measure and let the random vector \(\tilde{\mathbf {a}}\) have an empirical distribution on \({\mathbf {a}}^1,\dots ,{\mathbf {a}}^n \in \mathbb {R}^d\). Then there exists a weight vector \({\mathbf {w}}_\alpha \) such that \(D_{{\mathbf {w}}_\alpha }\) is a WM region and it holds

$$\begin{aligned}&\{{\mathbf {x}}\in \mathbb {R}^d\,|\, \rho (\tilde{{\mathbf {a}}}^\prime {\mathbf {x}}-b)\le 0\} \end{aligned}$$
(12)
$$\begin{aligned}&=\{ {\mathbf {x}}\in \mathbb {R}^d\,|\, {{\mathbf {a}}}^\prime {\mathbf {x}}\ge b\ \forall {\mathbf {a}}\in D_{{\mathbf {w}}_\alpha }({\mathbf {a}}^1,\dots ,{\mathbf {a}}^n) \}. \end{aligned}$$
(13)

Specifically, this Proposition states that the class of univariate restrictions (3) involving coherent distortion risk measures corresponds to weighted-mean trimmed regions in \(\mathbb {R}^d\) as uncertainty sets. Calculating \(\mathcal{U}\) turns out to be computationally feasible due to the direct connection between the uncertainty set and a trimmed region, which can be efficiently determined by the algorithm of Bazovkin and Mosler (2012).
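As a small numerical illustration of Proposition 1 (a sketch for a tiny sample, where the permutation vertices in (11) can be enumerated directly; the tolerance handling is ours), one may check that \(\rho (\tilde{{\mathbf {a}}}^\prime {\mathbf {x}}-b)\le 0\) holds exactly when \({\mathbf {a}}^\prime {\mathbf {x}}\ge b\) for every vertex of the WM region:

```python
import numpy as np
from itertools import permutations

def wm_vertices(points, w):
    """Vertex candidates of the WM region (11): weighted sums over all permutations."""
    pts = np.asarray(points, dtype=float)
    return np.array([w @ pts[list(p)] for p in permutations(range(len(pts)))])

def rho_empirical(z, w):
    """rho(Z) = -sum_i w_i z_(i), ascending order statistics, decreasing weights w."""
    return -np.dot(w, np.sort(z))

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 2))                    # sample a^1, ..., a^5 in R^2
w = np.array([0.5, 0.5, 0.0, 0.0, 0.0])        # expected shortfall weights for alpha = 0.4
b = 0.1
verts = wm_vertices(a, w)                      # the WM region is the convex hull of verts

for _ in range(1000):                          # random candidate solutions x
    x = rng.normal(size=2)
    risk_ok = rho_empirical(a @ x - b, w) <= 1e-9      # the risk constraint (12)
    set_ok = np.all(verts @ x >= b - 1e-9)             # the uncertainty-set form (13)
    assert risk_ok == set_ok
print("equivalence of (12) and (13) confirmed on this toy example")
```

The brute-force enumeration of permutations is of course only feasible for very small \(n\); for realistic samples the region is computed by the algorithm of Bazovkin and Mosler (2012) mentioned above.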

According to (7), the risk measure \(\rho \) defines the vector \({\mathbf {w}}_\alpha \) uniquely and, by this, the trimmed region \(D_{{\mathbf {w}}_\alpha }\). For instance, for the expected shortfall at level \(\alpha \) the formula (8) generates the following weights:

$$\begin{aligned} w_{j}=\left\{ \begin{array}{cl} \frac{1}{n\alpha }\,&{}\quad \text {if}\; j<\lceil n\alpha \rceil ,\\ \frac{n\alpha -\lfloor n\alpha \rfloor }{n\alpha }\,&{}\quad \text {if}\; j=\lceil n\alpha \rceil ,\\ 0\,&{}\quad \text {if}\; j>\lceil n\alpha \rceil , \end{array}\right. \end{aligned}$$

and we see that the obtained weight vector is equivalent, up to reversing the order of its components, to the one defining the zonoid regions. Obviously, knowing \(n\) and the parameter \(\alpha \), we can calculate \({\mathbf {w}}_\alpha \) in advance and use it as an input for the algorithm.

The further steps of the algorithm of Mosler and Bazovkin (2014) essentially rest on searching for an intersection of a ray (in the direction of \({\mathbf {c}}\)) with the uncertainty set \(\mathcal{U}=D_{{\mathbf {w}}_\alpha }\). The solution is characterized by the normal to the facet of \(\mathcal{U}\) that is intersected by the line in direction of the vector \({\mathbf {c}}\).

The goal of this paper is to develop a similar approach to (2) and more general SLPs. Specifically, we

  1. generalize the analysis of the single-constraint SLP to multiple risk constraints (\(m\ge 2\));

  2. construct a geometric algorithm to solve the multi-constraint problem (if constraints cannot be compensated by each other, i.e. in the unsubstitutability case, the algorithm operates in the same dimension \(d\) as the single-constraint procedure does);

  3. extend the robust multi-constraint linear optimization to robust polyhedral optimization (which covers the substitutability case);

  4. estimate the uncertainty set and the robust solution consistently.

The remaining material is organized as follows. The construction of the solution for the multi-constraint SLP is described in Sect. 2. The formal algorithm is given in the subsequent Sect. 3. In the same section the algorithm is extended to a program with stochastic right-hand side. The consistency of the solution for generally distributed data is proven in Sect. 3.5. Finally, Sect. 4 is devoted to a discussion of the algorithm and an application to supervised learning.

2 Multiple constraints

2.1 A general model

Consider the SLP (1) with \(m\ge 2\) constraints and deterministic right-hand side \({\mathbf {b}}\). We aim at generalizing Proposition 1 and eliminating uncertainty by a robust linear program as follows:

$$\begin{aligned} {\mathbf {c}}^\prime {\mathbf {x}}\longrightarrow \min \quad s.t. \ {\mathbf {A}}{\mathbf {x}}\ge {\mathbf {b}}\;\; \text {for all } {\mathbf {A}}:\delta ({\mathbf {A}})\in \mathcal{U}\,, \end{aligned}$$
(14)

where \(\delta ({\mathbf {A}}) = ({\mathbf {A}}_1,...,{\mathbf {A}}_m)^\prime \in \mathbb {R}^{m\cdot d}\) is the vectorized matrix \({\mathbf {A}}\).

Again, a proper uncertainty set \(\mathcal{U}\) has to be constructed. In doing so, we specify a risk measure that is not a single vector but a whole set of such vectors in \(\mathbb {R}^m\). Such a set-valued risk measure is interpreted as the set of deterministic vectors in \(\mathbb {R}^m\) which, when being added to the random variable \(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}}\), cannot completely shift its \(\alpha \)-central region out of the negative orthant \(\mathbb {R}^m_-\); in other words, vectors which cannot, at the given acceptance level, guarantee that a strictly negative outcome of the shifted random variable is avoided.

Cascos and Molchanov (2007) have shown that certain multivariate risk measures correspond to trimmed regions of the considered distribution. In particular, the \(m\)-variate set-valued analogue \(\mu ^{m}\) of a spectral risk measure is defined through a WM region \(D_{{\mathbf {w}}_\alpha }\) in the following way:

$$\begin{aligned} \mu ^{m}(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}}) = - \Big (D_{{\mathbf {w}}_\alpha }(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}}) \oplus \mathbb {R}^m_{+}\Big )\subset \mathbb {R}^m, \end{aligned}$$
(15)

where \(\oplus \) denotes the Minkowski sum.

We should mention that spectral risk measures are equivalent to coherent distortion risk measures. It is easy to see that in dimension one \(\mu ^1(\tilde{\mathbf {a}}'{\mathbf {x}}-b)\) is the half-line bounded above by \(\rho ^1(\tilde{\mathbf {a}}'{\mathbf {x}}-b)\), where \(\rho ^1\) is a coherent distortion measure of a univariate risk. Thus, (15) can be regarded as a set-valued extension of a univariate distortion risk measure to multiple dimensions. In this paper, we use \(\mu \) for denoting set-valued, and \(\rho \) for denoting vector-valued or real-valued measures.

If a linear program (1) has more than one stochastic constraint, we must consider not only that some or all of them may be violated, but also that the degree of violation of one restriction may be offset against that of another. That is, the decision maker, in evaluating a possible solution, may compensate the missing strictness of one constraint by the fact that another constraint or a group of constraints is satisfied more strictly. In this sense, the degrees to which single constraints are satisfied are regarded as substitutable by the decision maker, and his or her task in selecting a solution includes some kind of diversification regarding the constraints. Note that this possible value compensation between the constraints has nothing to do with a potential stochastic dependency among the parameters of different constraints.

To include the possibility of value substitution in our model, we introduce a multivariate utility (= negative loss) function \(u:\mathbb {R}^m\rightarrow \mathbb {R}\) that evaluates the violations \(v_1,\dots , v_m\) of the \(m\) constraints. Consider \(\mathcal{F} = \{{\mathbf {v}}:u({\mathbf {v}})\ge 0\}\) as the set of admissible violations. If \(u\) is a quasiconcave function, \(\mathcal{F}\) is convex. Later we will specialize \(\mathcal{F}\) to be a convex polyhedron, see (17). The marginals may or may not substitute each other, which affects the form of \(\mathcal{F}\).

Using \(\mathcal{F}\), we rewrite the joint risk constraint (2) as follows:

$$\begin{aligned} -\mu ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})\subset \mathcal{F}. \end{aligned}$$
(16)

The above is illustrated in Fig. 1 in a simplified representation. There the orange dotted trimmed region \(D_{{\mathbf {w}}_\alpha }(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})\), augmented by the Minkowski sum with \(\mathbb {R}^m_{+}\) (the red dashed lines), forms the set \( -\mu ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})\), which, in turn, should be contained in the green set \(\mathcal{F}\).

Fig. 1 Illustration of the measure \(\mu ^m\) on a plane

Generally, the requirement is that the set \(-\mu ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})\) be contained in the set of all violation vectors \({\mathbf {v}}\in \mathbb {R}^m\) that are admitted. If substitution between constraints is possible, the level of substitutability may vary from full substitutability to unsubstitutability. It is easy to show that the first extremal case leads to an equivalent single-constraint SLP.

Full substitutability means that \(u\) is additive, \(u({\mathbf {v}})=\sum _j u_j(v_j)\), and marginal utilities \(u_j\) are linear. In this case we obtain

$$\begin{aligned} \mathcal{F}_\text {sub} = \left\{ {\mathbf {v}}:u({\mathbf {v}})=\sum _j u_j(v_j)=\sum _j{k_j\cdot v_j}= {\mathbf {k}}'{\mathbf {v}}\ge 0\right\} \end{aligned}$$

with some \({\mathbf {k}}\ge {\mathbf {0}}\). This reduces to a problem with a single constraint \(({\mathbf {k}}'\tilde{\mathbf {A}}){\mathbf {x}}\ge {\mathbf {k}}'{\mathbf {b}}\). Here the admissible set \(\mathcal{F}_\text {sub}\) is a halfspace bordered by the hyperplane passing through the origin with normal \({\mathbf {k}}\). In the second extreme case (unsubstitutability) the admissible set is the positive orthant, \(\mathcal{F}_\text {unsub}=\mathbb {R}^m_+\). Solving the SLP in this case will be the subject of Sect. 3. In the intermediate case of partial substitutability, \(\mathcal{F}\) is a set contained in \(\mathcal{F}_\text {sub}\) and containing \(\mathcal{F}_\text {unsub}\).

To sum up, for obtaining a general solution of the multi-constraint SLP we have to consider different levels of substitutability among the violations of the constraints. It turns out that the general substitutability case can be reduced to unsubstitutability via a special transformation of the model. In the next subsection we define this transformation and demonstrate this fact. After that, to complete the task, we solve the SLP under unsubstitutability.

2.2 The general substitutability case

Now we consider the case that violations of constraints can be balanced against each other. Let us assume that the set \(\mathcal{F}\) of feasible violations is a convex polyhedron characterized by \(K\) linear inequalities,

$$\begin{aligned} \mathcal{F} = \left\{ {\mathbf {y}}\in \mathbb {R}^m:{\mathbf {p}}_k'{\mathbf {y}}\ge d_k, \quad k=1\ldots K\right\} \end{aligned}$$
(17)

with some \({\mathbf {p}}_1,\ldots ,{\mathbf {p}}_K\in \mathbb {R}^m_+\) and \(d_1,\dots ,d_K\in \mathbb {R}\), that is, \(\mathcal{F}\) is an upper convex polytope. Obviously, this assumption includes the extreme cases of full substitutability and unsubstitutability, and can be seen as an approximation for the intermediate cases.

Proposition 2

Let \(\rho ^1(Z)\) denote the upper border of the halfline \(\mu ^1(Z)\subset \mathbb {R}\). Then it holds: \( -\mu ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}}) \subset \mathcal{F}\) if and only if

$$\begin{aligned} \rho ^1({\mathbf {p}}_k'\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {p}}_k'{\mathbf {b}})\le -d_k\,, \quad \text {for all} \;\; k=1\dots K. \end{aligned}$$
(18)

Proof

Let \(h^m_S\) denote the support function of a set \(S\subset \mathbb {R}^m\). For any \({\mathbf {y}}\in {-\mu ^{m}(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})}\) and \(k=1\dots K\) we obtain:

$$\begin{aligned} {\mathbf {p}}_k'{\mathbf {y}}&\ge \min _{{\mathbf {z}}\in -\mu ^{m}(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})} \{{\mathbf {p}}_k'\cdot {\mathbf {z}}\}\\&= h^m_{-\mu ^{m}(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})}(-{\mathbf {p}}_k)\\&= h^1_{{\mathbf {p}}_k' \cdot \mu ^{m}(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})}(1) = h^1_{\mu ^{1}({\mathbf {p}}_k'\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {p}}_k'{\mathbf {b}})}(1)\\&= - \rho ^1({\mathbf {p}}_k'\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {p}}_k'{\mathbf {b}})\,. \end{aligned}$$

Consequently, we have \({\mathbf {y}}\in \mathcal{F}\) if \(- \rho ^1({\mathbf {p}}_k'\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {p}}_k'{\mathbf {b}})\ge d_k\,.\) Hence the “if” part of the proposition is proved. On the other hand, there exists some \({\mathbf {y}}\in -\mu ^m(\tilde{\mathbf {A}}{\mathbf {x}}-{\mathbf {b}})\) so that the first-line inequality is met with equality. Hence the “only if” part holds, too. \(\square \)

Proposition 2 leads to the following Theorem, which for every general model provides an equivalent unsubstitutability model:

Theorem 1

The SLP with the \(m\)-fold risk constraint (16) and \(\mathcal{F}\) defined by (17) is equivalent to an SLP with \(K\) unsubstitutable constraints

$$\begin{aligned} {\mathbf {p}}_k'\tilde{\mathbf {A}}{\mathbf {x}}\ge {\mathbf {p}}_k'{\mathbf {b}}\quad \text {for} \;\; k=1\dots K\,. \end{aligned}$$
(19)

Proof

According to Proposition 2, the SLP with joint risk constraint (16) is equivalent to the SLP with the risk constraints (18). On the other hand, an SLP with the unsubstitutable constraints (19), whose violation risks are bounded constraint-wise by \(-d_k\) as in (3), produces exactly the same risk constraints (18). \(\square \)

Theorem 1 enables us to determine a generalized uncertainty set in the multi-constraint case. Notice that we can discriminate between the \(K\) constraints by choosing different distortion risk measures \(\rho ^1\) for each of them. Keeping in mind the one-to-one correspondence (7) between such a measure and a weight vector \({\mathbf {w}}_{\alpha }\), we may use a weight vector \({\mathbf {w}}_{\alpha _j}\) for the \(j\)-th constraint. For example, it can be the expected shortfall with proper \(\alpha _1,\dots ,\alpha _K\) chosen for each constraint separately. However, clearly, we are not limited to using only the expected shortfall for all constraints.

Corollary 1

The uncertainty set \(\mathcal{U}\) of the matrix \(\tilde{\mathbf {A}}\) in an SLP with risk constraint (16) and admissible set of violations \(\mathcal{F}\) (17) equals

$$\begin{aligned} \mathcal{U}= \left\{ {\mathbf {A}}:\delta ([{\mathbf {p}}_k'{\mathbf {A}}]_{k=1\dots K})\in \mathsf{X}_{j=1..K} D_{{\mathbf {w}}_{\alpha _j}}({\mathbf {p}}_j'\tilde{\mathbf {A}})\right\} \,. \end{aligned}$$
(20)

Proof

Here \(\mathsf{X}\) denotes the \(K\)-fold Cartesian product of WM regions. Due to Theorem 1, we can transform the general model into a model with unsubstitutability. In this case the single uncertainty sets do not affect each other. In turn, each single uncertainty set is calculated as for the single-constraint SLP. \(\square \)
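To make the construction in Corollary 1 concrete, consider the following short sketch (Python, illustrative names only; the WM-region routine of Bazovkin and Mosler (2012) is not reproduced here). For each row \({\mathbf {p}}_k\) of (17) it forms the transformed empirical sample \({\mathbf {p}}_k'{\mathbf {A}}^1,\dots ,{\mathbf {p}}_k'{\mathbf {A}}^n\in \mathbb {R}^d\) that feeds the \(k\)-th WM region \(D_{{\mathbf {w}}_{\alpha _k}}\), together with the transformed right-hand sides \({\mathbf {p}}_k'{\mathbf {b}}\) of (19):

```python
import numpy as np

def transformed_samples(A_sample, P):
    """data[k, i, :] = p_k' A^i  --  the empirical sample feeding the k-th WM region."""
    return np.einsum('km,imd->kid', np.asarray(P, float), np.asarray(A_sample, float))

rng = np.random.default_rng(1)
A_sample = rng.normal(size=(4, 3, 2))          # n = 4 observations of a 3 x 2 matrix A
P = np.array([[1.0, 0.0, 0.0],                 # p_1: first original constraint alone
              [0.0, 0.5, 0.5]])                # p_2: constraints 2 and 3 substitute each other
b = np.array([2.0, 1.0, 1.0])                  # deterministic right-hand side of (1)

data = transformed_samples(A_sample, P)        # data[k] is the input of the k-th WM region
rhs = P @ b                                    # transformed right-hand sides p_k' b, cf. (19)
print(data.shape, rhs)                         # (2, 4, 2)  [2. 1.]
```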

2.3 The equivalent unsubstitutability case

We see that \(\mathcal{U}\) splits up into \(K\) parts, which can be calculated individually. This leads to the following key representation theorem:

Theorem 2

The SLP (1) with admissable set \(\mathcal{F}\) satisfying (17) and joint risk constraint (16) is equivalent to the following problem:

$$\begin{aligned} {\mathbf {c}}^\prime {\mathbf {x}}\longrightarrow \min \quad s.t. \ {\mathbf {A}}{\mathbf {x}}\ge {\mathbf {b}}\;\; \text {for all } {\mathbf {A}}\in \mathcal{U}, \end{aligned}$$
(21)

where \(\mathcal{U}\) is defined as in Corollary 1.

Proof

Follows from Theorem 1, Corollary 1 and Proposition 1. \(\square \)

Note that if all \(d_k\) equal zero, we get the general risk-constrained robust conic program. Besides this, again, note that unsubstitutability does not imply stochastic independence of the constraints.

3 Unsubstitutable violations risks: the optimal solution

3.1 Generalizing the single-constraint approach

By Theorem 1 the general risk-constrained SLP is reduced to an SLP with unsubstitutable constraints. Therefore, it suffices to construct an algorithm for solving the latter SLP. Moreover, Theorem 2 reformulates the stochastic constraints in terms of an uncertainty set.

In this subsection we pursue the following idea: we want to reformulate our problem in a way that makes it similar to the single-constraint case. In doing so, we first define a convolution set that plays the same role as the uncertainty set in the single-constraint SLP algorithm. Then, we show how to construct the analogue of the optimization line. Having proved the equivalence of these elements, we are able to formulate the generalization of the algorithm to the multi-constraint case.

Let us write \(\mathcal{X}_i\) for the feasible set generated by the constraint \(i\), \(i=1\dots K,\) and \(\mathcal{X}\) for the common feasible set,

$$\begin{aligned} \mathcal{X} = \bigcap _{i=1}^{K}{\mathcal{X}_i}. \end{aligned}$$
(22)

We aim at solving the SLP by a geometric procedure in the parameter space. For each constraint, the parameter space has dimension \(d\), while the uncertainty set \(\mathcal{U}\) lives in \(\mathbb {R}^{d\cdot K}\). We will construct a set \(\mathcal{G}\subset \mathbb {R}^d\) such that \(\mathcal{X}\) can be rewritten as follows:

$$\begin{aligned} \mathcal{X}=\left\{ {\mathbf {x}}\in \mathbb {R}^d: {\mathbf {A}}{\mathbf {x}}\ge {\mathbf {b}}\; \text {whenever}\; \delta ({\mathbf {A}})\in \mathcal{U}\right\} = \left\{ {\mathbf {x}}\in \mathbb {R}^d: {\mathbf {a}}'{\mathbf {x}}\ge 1\;\forall {\mathbf {a}}\in \mathcal{G}\right\} . \end{aligned}$$
(23)

\(\mathcal{G}\) is obtained by convolving the general uncertainty set \(\mathcal{U}\) into \(\mathbb {R}^d\); we will call \(\mathcal{G}\) the parameter convolution set. \(\mathcal{U}\) is then decomposed into the sets \(\mathcal{U}_i, i=1,\dots , K\), that can be separately calculated. This dimension-reducing construction is possible because the constraints are not substitutable.

The proper way here is to represent \(\mathcal{G}\) as an image of \(\mathcal{X}\) in the parameter space. According to (22), all \(\mathcal{X}_i\) are combined in one space. Combining the \(\mathcal{U}_i\) in the parameter space becomes possible if any parameter \({\mathbf {a}}\) that is contained in several uncertainty sets corresponds to the same halfspace \(X_{\mathbf {a}}=\{{\mathbf {x}}:{\mathbf {a}}'{\mathbf {x}}\ge b\}\) in each of them. This condition holds if the right-hand sides \(b_i\) are the same for all constraints. If \({\mathbf {b}}> {\mathbf {0}}\), we multiply each set \(\mathcal{U}_i\) by \(\frac{1}{b_i}\) and obtain \(b=1\), without changing the set of feasible solutions. If \({\mathbf {b}}\ngtr {\mathbf {0}}\), we apply the transform (26), which is described in the next subsection.

\(\mathcal{G}\) contains the union of \(\frac{1}{b_i}\mathcal{U}_{i},{i=1\dots K}\). Thus, according to (22) and similar to Lemma 1 in Mosler and Bazovkin (2014):

$$\begin{aligned} \mathcal{X} = \bigcap _{{\mathbf {a}}\in \mathcal{G}}{X_{\mathbf {a}}} = \bigcap _{{\mathbf {a}}\in {{\mathrm{\,ext}}}\mathcal{G}}{X_{\mathbf {a}}}, \end{aligned}$$

where \({{\mathrm{\,ext}}}\mathcal{G}\) denotes the set of extreme points of \(\mathcal{G}\), which for a convex body corresponds to its set of vertices.

This proves that \(\mathcal{G}\) has similar properties to the uncertainty set of the single-constraint SLP. In particular, any convex combination of two points in \(\mathcal{G}\) belongs to \(\mathcal{G}\). We obtain:

$$\begin{aligned} \mathcal{G} = \mathrm{conv }\left\{ \bigcup _{i=1}^K\frac{1}{b_i}\cdot \mathcal{U}_{i}\right\} . \end{aligned}$$
(24)

This transformation of \(\mathcal{X}\) is akin to polar duality [see, e.g., Grünbaum (2003)]. In our robust problem we need no explicit representation of \(\mathcal{X}\) and profit from the ready-made machinery of WM regions, which allows the direct construction of \(\mathcal{G}\) in the parameter space.
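As a small illustration of (24) (a sketch under the assumption that each \(\mathcal{U}_i\) is given by a modest list of vertices; function names are ours), the convolution set \(\mathcal{G}\) and its facet representation \(\{({\mathbf {n}}_j;d_j)\}\), which is the input of the search step in Sect. 3.3, can be computed with a standard convex hull routine:

```python
import numpy as np
from scipy.spatial import ConvexHull

def convolution_set(vertex_sets, b):
    """G = conv( union_i (1/b_i) U_i ), cf. (24), with every U_i given by its vertices.
    Returns facet normals n_j and offsets d_j such that n_j' a >= d_j for all a in G."""
    scaled = [np.asarray(V, float) / b_i for V, b_i in zip(vertex_sets, b)]
    hull = ConvexHull(np.vstack(scaled))
    # scipy stores facets as  normal' a + offset <= 0  (outward normals), hence the signs
    return -hull.equations[:, :-1], hull.equations[:, -1]

# toy example: two uncertainty sets in R^2 given by their vertices, right-hand sides b = (2, 1)
U1 = np.array([[2.0, 1.0], [3.0, 1.0], [2.0, 2.0]])
U2 = np.array([[0.5, 1.5], [1.5, 1.5], [1.0, 2.5]])
normals, offsets = convolution_set([U1, U2], b=[2.0, 1.0])
print(normals.shape, offsets.shape)            # one (n_j, d_j) pair per facet of G
```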

The next step is constructing the optimization line \(\varphi \), which, as in the single-constraint case, is the line passing through the origin in direction \({\mathbf {c}}\). Actually, \(\varphi \) is the locus of points that are dual to hyperplanes with normal \({\mathbf {c}}\). However, combining the \(\mathcal{U}_i\) in one parameter space requires an individual transform of each constraint's parameter space, thus resulting in different lines \(\varphi _i\). Observe that

  • \(\varphi _i \equiv s \cdot \varphi _i\) for any \(s\ne 0\),

  • \(\varphi _i \equiv \varphi \) for all \(i\).

Hence all lines \(\varphi _i\) coincide with \(\varphi \). Further, they are invariant to the affine transform in (24). Therefore, the search for the optimum on \(\mathcal{G}\) proceeds exactly as the search on \(\mathcal{U}\) in the single-constraint SLP.

In concluding this subsection, we turn to the deterministic case, where each \(\mathcal{U}_{i}\) degenerates to a one-point set \(\{{\mathbf {a}}_i\}\). The steps of our procedure stay the same but allow some simplifications. For example, calculating \(\mathcal{G}\) according to (24) reduces to a simple quickhull routine.

Fig. 2 Alternative to the simplex algorithm

In many practical tasks one has to consider some additional deterministic constraints, in particular, those of nonnegativity type. In such a setting, a group of trivial constraints \(x_k\ge 0, k\in \mathcal{J}\), is mapped to a finite cloud of points in the parameter space without calculating the WM region. The latter implies that such constraints do not significantly influence the algorithm’s computational complexity.

3.2 Relaxing the right-hand side

In this subsection we show that the model (1) of the SLP also covers the case of a random coefficient vector \({\mathbf {b}}\), denoted by \(\tilde{\mathbf {b}}\), as the restriction \(\tilde{\mathbf {A}}{\mathbf {x}}\ge \tilde{\mathbf {b}}\) is equivalent to

$$\begin{aligned} \left[ \begin{array}{ll} \tilde{\mathbf {A}}&\quad {\mathbf {1}}-\tilde{\mathbf {b}}\end{array} \right] \left[ \begin{array}{l} {\mathbf {x}}\\ 1 \end{array} \right] \ge {\mathbf {1}}, \end{aligned}$$
(25)

where \({\mathbf {1}}\) is a \(K\)-vector of ones. By this we obtain an SLP with deterministic right-hand side equal to \({\mathbf {1}}\). Now the solution vector has \(d+1\) components, the last of which is fixed to \(1\). Geometrically, the solution is sought at the intersection of the feasible set with the \(d\)-dimensional hyperplane \(\{({\mathbf {x}}',x_{d+1})'\in \mathbb {R}^{d+1}:x_{d+1}=1\}\), that is, the problem is actually solved on a convex polytope of affine dimension \(d\).

Formally speaking, we have to solve an SLP of the form (1) with an additional equality constraint. For this we propose a similar geometrical approach. The idea is to relax the equality to a regular inequality constraint, thus obtaining the SLP in canonical form. The vector \({\mathbf {c}}\) is modified accordingly.

First, we replace \({\mathbf {c}}\) with the vector \(\bar{{\mathbf {c}}}\) that equals the normal of the facet lying on the hyperplane \(\{({\mathbf {x}}',1)'\in \mathbb {R}^{d+1}\}\). By this, we get a non-unique solution set, which also contains the sought-for optimum \({\mathbf {x}}^*\). Looking simultaneously at the parameter space, we see an intersection of \(\varphi \) with \(\mathcal{G}\) (constructed as in Sect. 3.1) at the point \((0,\dots 0,1)^\prime \). To get rid of this non-uniqueness, we perform a small rotation of \(\bar{{\mathbf {c}}}\) towards \({\mathbf {x}}^*\). The latter amounts to rotating \(\bar{{\mathbf {c}}}\) towards \(({\mathbf {c}}^\prime ,0)^\prime \), that is, to some new vector \({{\mathbf {c}}}_\epsilon =(1-\epsilon )\bar{{\mathbf {c}}}+\epsilon \cdot ({\mathbf {c}}^\prime ,0)^\prime \) with some \(0<\epsilon < 1\).

Proposition 3

Let \(\tilde{\mathbf {b}}\) be a stochastic vector of dimension \(K\). Then (1) is equivalent to the following model: There exists \(\bar{\epsilon }>0\) such that

$$\begin{aligned} {{\mathbf {c}}}_\epsilon ^\prime \left[ \begin{array}{l} {\mathbf {x}}\\ x_{d+1} \end{array}\right] \longrightarrow \min \end{aligned}$$
(26a)
$$\begin{aligned} s.t. \ \left[ \begin{array}{ll} \tilde{\mathbf {A}}&{}\quad {\mathbf {1}}-\tilde{\mathbf {b}}\\ \mathbf {0}^\prime &{}\quad 1 \end{array} \right] \left[ \begin{array}{l} {\mathbf {x}}\\ x_{d+1} \end{array} \right] \ge {\mathbf {1}}, \end{aligned}$$
(26b)

where \({{\mathbf {c}}}_\epsilon {=}(1-\epsilon )(0,\dots 0,1)^\prime +\epsilon \cdot ({\mathbf {c}}^\prime ,0)^\prime \), \(0<\epsilon < \bar{\epsilon }\).

Proof

The constraints of (1) are equivalent to (25). Obviously, if \(x_{d+1}=1\) holds, (26b) is also equivalent to (25). To show that there exists a small \(\epsilon > 0\) such that the last inequality in (26b) necessarily turns into an equality and, as a consequence, (26b) transforms into (25), we first set \(\epsilon \) to zero. The corresponding program has a trivial non-unique solution, because the vector \({{\mathbf {c}}}_\epsilon \) equals \(\bar{{\mathbf {c}}}\), which was shown above to pick out a whole facet of the feasible set as the solution set.

A small rotation of \({{\mathbf {c}}}_\epsilon \) shifts the solution to the border of this facet while keeping it on the facet. This rotation is given by setting \(\epsilon >0\), i.e. combining the initial vector with the augmented vector \({\mathbf {c}}\), namely \(({\mathbf {c}}^\prime ,0)^\prime \). Both facts together guarantee that \(x_{d+1}\) is fixed to \(1\) while optimizing (26) with \({{\mathbf {c}}}_\epsilon \). In this situation, (26b) reduces to (25). Moreover, the objective function turns into \(\epsilon {\mathbf {c}}'{\mathbf {x}}+1-\epsilon \), which is, up to a constant, equivalent to the objective of (1). This proves the equivalence of (26) and (1). \(\square \)
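A minimal sketch of the pretransformation in Proposition 3 (assuming an observed joint sample \(({\mathbf {A}}^i,{\mathbf {b}}^i)\), \(i=1,\dots ,n\); the names and the choice of \(\epsilon \) are illustrative):

```python
import numpy as np

def augment(A_sample, b_sample, c, eps=1e-3):
    """Data of (26): for every observation i the (K+1) x (d+1) matrix [[A^i, 1 - b^i], [0', 1]],
    the right-hand side vector of ones, and the tilted objective c_eps."""
    A_sample = np.asarray(A_sample, float)          # shape (n, K, d)
    b_sample = np.asarray(b_sample, float)          # shape (n, K)
    n, K, d = A_sample.shape
    top = np.concatenate([A_sample, (1.0 - b_sample)[:, :, None]], axis=2)   # rows [A^i, 1 - b^i]
    last = np.zeros((n, 1, d + 1))
    last[:, 0, -1] = 1.0                                                     # deterministic row (0', 1)
    A_aug = np.concatenate([top, last], axis=1)                              # shape (n, K+1, d+1)
    c_eps = (1 - eps) * np.r_[np.zeros(d), 1.0] + eps * np.r_[c, 0.0]
    return A_aug, np.ones(K + 1), c_eps

rng = np.random.default_rng(2)
A_aug, rhs, c_eps = augment(rng.normal(size=(5, 2, 3)), rng.normal(size=(5, 2)),
                            c=np.array([1.0, 0.0, 2.0]))
print(A_aug.shape, rhs, c_eps)                      # (5, 3, 4), ones, tilted objective
```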

Let us now view the procedure shown above in the parameter space (see Fig. 3). The additional \(\mathcal{U}_{K+1}\) is just the point \((0,\dots 0,1)^\prime \). The virtual optimization vector \({{\mathbf {c}}}_\epsilon \) generates a line \({\varphi }_\epsilon = \epsilon \cdot \varphi + (1-\epsilon )\cdot \{{\mathbf {x}}: x_j=0, j=1\dots d\}\), that is, an affine combination of \(\varphi \) and the axis passing through the point \((0,\dots 0,1)^\prime \). \({\varphi }_\epsilon \) necessarily intersects the cone (part of the surface of \(\mathcal{G}\)) having \((0,\dots 0,1)^\prime \) as its apex. The respective facet of \({\mathcal{G}}\) determines the optimum \({\mathbf {x}}^*\) similarly to step (D.c) of the algorithm in Sect. 3.3 below.

Fig. 3 Adding a dimension

3.3 The algorithm

Prerequisites:

In solving the general robust polyhedral optimization problem with an arbitrary \({\mathcal{F}}\), we first modify our set of constraints and the vector \({\mathbf {b}}\) according to Theorem 1, thus obtaining the \(K\times d\) matrix \(\tilde{\mathbf {A}}\) and the \(K\)-dimensional vector \({\mathbf {b}}\).

Without loss of generality, the algorithm is applied to a minimization problem with all constraints of the same type “\(\ge \)”. If either \({\mathbf {b}}\notin \mathbb {R}_+^K\) or \({\mathbf {b}}\) is stochastic, the pretransformation of Proposition 3 should be applied first. In the sequel both modifications (including that for stochastic right-hand side \(\tilde{\mathbf {b}}\)) are assumed to be done if necessary. Hence, we have to solve an SLP of form (26),

$$\begin{aligned}&{{\mathbf {c}}}_\epsilon ^\prime \left[ \begin{array}{r} {\mathbf {x}}\\ x_{d+1} \end{array} \right] \longrightarrow \min \\&s.t. \ \left[ \begin{array}{ll} \tilde{\mathbf {A}}&{}\quad {\mathbf {1}}-\tilde{\mathbf {b}}\\ \mathbf {0}^\prime &{}\quad 1 \end{array} \right] \left[ \begin{array}{l} {\mathbf {x}}\\ x_{d+1} \end{array} \right] \ge {\mathbf {1}}\,, \end{aligned}$$

with \({{\mathbf {c}}}_\epsilon {=}(1-\epsilon )(0,\dots 0,1)^\prime +\epsilon \cdot ({\mathbf {c}}^\prime ,0)^\prime \), \(0<\epsilon < \bar{\epsilon }\), where \(\bar{\epsilon }\) is a small positive constant.

Input:

  • a vector \({\mathbf {c}}\in \mathbb {R}^d\) of coefficients of the goal function,

  • a combined sample from \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {b}}\): \(({\mathbf {A}}^i,{\mathbf {b}}^i)\), \(i=1,\dots ,n\),

  • a distortion risk measure \(\mu ^K\), given by a name or an explicit weight vector.

Output:

  • A part of the convolution set \({\mathcal{G}}\) that contains the facet determining the optimal solution,

  • the optimal solution \({\mathbf {x}}^*\).

Steps:

  A. Construct the individual uncertainty sets:

    a. Determine, by the algorithm of Bazovkin and Mosler (2012), \(\{\mathcal{U}_j\}_{j=1\dots K}\), i.e. the WM regions \(\{D_{{\mathbf {w}}_{\alpha _j}}\}_{j=1\dots K}\) of each random row vector \(\tilde{\mathbf {A}}_j\), having an empirical distribution on its \(n\) observed values, \({j=1\dots K}\).

    b. Add the uncertainty set for the \((K+1)\)-th deterministic constraint in (26b): \(\mathcal{U}_{K+1}=\{({\mathbf {0}}',1)'\}\).

    c. For each nonnegativity restriction \(x_j\ge 0\), add the point-set \(\mathcal{N}_j=\{({\mathbf {e}}_j',1)'\}\) to the collection of uncertainty sets (up to \(d\) such point-sets), where \({\mathbf {e}}_j\) is the \(j\)-th unit vector in \(\mathbb {R}^d\).

  B. Calculate the convolution set \({\mathcal{G}}\), represented by its facets \(\{({\mathbf {n}}_j;d_j)\}_{j\in G}\):

    a. Take the representation of \(\{\mathcal{U}_i\}_{i=1\dots K}\) by their vertices and put them into the same space after having rescaled them according to (24).

    b. Calculate the convex hull of the resulting point set using the standard quickhull (Barber et al. 1996) or some divide-and-conquer algorithm (Grünbaum 2003).

  C. Impose the optimization ordering on the space of parameters, creating the dual representation of the optimization vector \({\mathbf {c}}\). It is a line \(\varphi \) connecting the origin \(({\mathbf {0}}',0)'\) and the point \(\left( \epsilon \cdot {\mathbf {c}}',1-\epsilon \right) '\), with a small \(\epsilon >0\).

  D. Search for a facet \(H_{j_*}\) of \({\mathcal{G}}\) that is intersected by \(\varphi \) (see Fig. 4). Its dual defines the sought-for optimal solution \({\mathbf {x}}^*\):

    a. Define a set of facets \(\mathcal{G}_{sel}\) to be analysed: this may be either \({\mathcal{G}}\) itself or the part of it where the intersection is expected; \(\mathcal{G}_{sel}=\{ ({\mathbf {n}}_j,d_j) : j\in J_{sel}\}\).

    b. Take some \({\mathbf {u}}=\lambda {\mathbf {c}}, \lambda \ge 0\), outside the augmented \({\mathcal{G}}\). Find \(j_* = \arg \underset{j}{\max }\left\{ \frac{d_j}{{\mathbf {u}}'{\mathbf {n}}_j}\right\} _{j\in J_{sel}}\).

    c. \({\mathbf {x}}^* = \frac{1}{d_{j_*}} \cdot {\mathbf {n}}_{j_*}\) is the optimal solution.

    d. If there is no intersection, the solution is at infinity.

    e. If \(({\mathbf {0}}',1)'\in \mathcal{G}\), there is no solution.

Fig. 4 Obtaining the solution using the convex hull computation routine

For efficient calculation of the intersection of \(\varphi \) with \({\mathcal{G}}\) we can apply an approximate procedure which converges to the precise solution. The procedure finds a facet of some scaled \(\mathcal{U}_i\) that is closest from outside to the sought-for facet of \({\mathcal{G}}\). Obviously, if this facet is part of some uncertainty set, the obtained solution is optimal.
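For illustration, step D can be sketched as follows on a toy convolution set in the plane (assumptions: the origin lies outside \(\mathcal{G}\), the ray in direction \({\mathbf {c}}\) intersects \(\mathcal{G}\), and \(\mathcal{G}\) is given directly by its vertices; the cross-check against a generic LP solver is only for demonstration):

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.optimize import linprog

def facets(G_vertices):
    """Inward facet representation of G = conv(G_vertices): n_j' a >= d_j for all a in G."""
    hull = ConvexHull(np.asarray(G_vertices, float))
    return -hull.equations[:, :-1], hull.equations[:, -1]

def step_D(normals, offsets, c, tol=1e-12):
    """Step D: j* = argmax d_j / (c' n_j) over facets facing the origin; x* = n_{j*} / d_{j*}."""
    denom = normals @ c
    ok = (denom > tol) & (offsets > tol)            # facets separating the origin from G
    if not ok.any():
        return None                                 # no intersection: solution at infinity (D.d)
    t = np.full(len(offsets), -np.inf)
    t[ok] = offsets[ok] / denom[ok]
    j_star = int(np.argmax(t))
    return normals[j_star] / offsets[j_star]

G_vertices = np.array([[1.0, 0.2], [2.0, 0.3], [1.5, 1.5], [0.8, 1.0]])
c = np.array([1.0, 1.0])
normals, offsets = facets(G_vertices)
x_star = step_D(normals, offsets, c)

# cross-check: min c'x  s.t.  a'x >= 1 for every vertex a of G, cf. (23)
lp = linprog(c, A_ub=-G_vertices, b_ub=-np.ones(len(G_vertices)), bounds=[(None, None)] * 2)
print(x_star, c @ x_star, lp.fun)                   # both objective values coincide
```

In higher dimensions only the facets in the selected region \(\mathcal{G}_{sel}\) need to be generated, in line with the complexity discussion of the next subsection.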

3.4 Complexity

The complexity of the algorithm is primarily determined by the convex hull routine. Before analyzing the general algorithm, we take a look at its deterministic counterpart. In solving a deterministic LP we have to calculate the convex hull of \(K+1\) points, which has a worst-case complexity of \(\mathcal{O}\left( K \log K +K^{\left[ \frac{d}{2}\right] }\right) \); see Chazelle (1993). The naive linear search on the set of \(n\) facets has a complexity of \(\mathcal{O}(nd)\). Consequently, the worst-case complexity of our algorithm amounts to \(\mathcal{O}\left( d\cdot K^{\left[ \frac{d}{2}\right] }\right) \), which is polynomial in \(K\) for fixed dimension \(d\). Recall [e.g., Borgwardt (2001)] that no variant of the simplex method is known to solve an LP in polynomial worst-case time; only ellipsoid and interior-point methods achieve polynomial complexity.

For the general algorithm it is very natural to use “divide and conquer” algorithms, which construct the convex hull of the whole data set out of convex hulls of subsets of the data. Such procedures have the best complexities in dimensions 2 and 3, namely \(\mathcal{O}(n\log n)\). For example, the standard quickhull algorithm has complexity between \(\mathcal{O}(n\log n)\) and \(\mathcal{O}(n^2)\), depending on the input. However, to our knowledge, such “divide and conquer” algorithms are available in the literature only for dimensions up to 5 [see, e.g., Buckley (1988)].

In fact, we need not calculate the whole convex hull, because we are interested only in the normal to the hyperplane at the intersection (even the actual facet itself is not of interest to us). That is why we can substantially reduce the complexity by cutting off the region of the intersection by an ellipsoid or a cylinder with axis \(\varphi \). The worst-case complexity of the general algorithm is also polynomial, however of a high order, which can cause difficulties for large-scale problems.

3.5 Robust SLP for generally distributed coefficients

In the previous sections we assumed the random parameters \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {b}}\) to follow an empirical distribution based on observations \(\{{\mathbf {A}}^1,\dots ,{\mathbf {A}}^n\}\) and \(\{{\mathbf {b}}^1,\dots ,{\mathbf {b}}^n\}\). Now we want to consider an SLP (1), where \((\tilde{\mathbf {A}}, \tilde{\mathbf {b}})\) follows a general probability distribution \(P\) and realizations are randomly sampled from this distribution.

Actually, Mosler and Bazovkin (2014) have shown that the individual uncertainty set \(\mathcal{U}_j\) is a consistent estimator of its population counterpart. Convergence holds almost surely in the Hausdorff sense and is based on the law of large numbers for weighted-mean regions [Dyckerhoff and Mosler (2012)]. Here, our general algorithm constructs \(\mathcal{G}_n\) as the convex hull of the union of individual uncertainty sets, which, obviously, also converges almost surely in the Hausdorff sense to \({\mathcal{G}}\):

Proposition 4

\(\mathcal{G}_n\) converges to \(\mathcal{G}\) almost surely in the Hausdorff sense.

Also the cutting point, where the line \(\varphi \) hits the convolution set \({\mathcal{G}}\), is consistently estimated by our algorithm. A potential complication lies in the fact that the surface of \(\mathcal{G}\) is, in general, not smooth. That is why the optimal solution \({\mathbf {x}}^*\), which is defined by the tangent hyperplane at the cutting point, can be ambiguous. However, even in such situations, the algorithm automatically selects a unique facet of \({\mathcal{G}}\) that determines \({\mathbf {x}}^*\).

4 Conclusion and an application to supervised learning

A new geometric algorithm is proposed for robust linear optimization under distortion risk constraints. The algorithm constructs an uncertainty set in the parameter space, which measures the risk arising from non-deterministic parameters in the original linear constraints. The randomness may affect the coefficient matrix \({\mathbf {A}}\) as well as the right-hand side \({\mathbf {b}}\). In our setting a multivariate coherent distortion risk measure is applied to the joint distribution of the parameters. This results in uncertainty sets for each single constraint, which are so-called weighted-mean trimmed regions. A multi-constraint uncertainty set then comes out as the convex hull of the union of rescaled single-constraint uncertainty sets. It is determined by calculating the relevant parts of weighted-mean trimmed regions, which is done by the algorithm of Bazovkin and Mosler (2012). Note that the uncertainty set need not be determined from a sample; alternatively, it can be introduced explicitly by the optimizer. In this case the algorithm starts with step C.

The algorithm can be applied to multi-constraint as well as single-constraint problems. Also, as a special case, deterministic linear optimization problems can be solved by the algorithm. To cope with substitution in evaluating the violation of different constraints, a variant of the model is introduced, which is referred to as robust polyhedral optimization.

At this point it is reasonable to compare the present framework with that of chance-constrained problems. Such SLPs (5) with individual constraints are extremely sensitive to distributional assumptions: in most cases the program turns out to be non-convex [see, e.g., Kall and Mayer (2010)] and computationally intractable. Tractable results have so far been obtained only for elliptically distributed random coefficients. With a joint chance constraint (4) the difficulty increases further. A straightforward approach, which distributes the common violation probability equally among the individual constraints, tends to give poor results, especially if the constraints are stochastically dependent. An approximate solution based on the Bonferroni inequality has been improved by Chen and Sim (2009) and Chen et al. (2010), who propose more efficient bounds for the individual probabilities of constraint violation using results from order statistics. But altogether these approximate methods still lack strong interpretation and universality. In contrast, our approach leads to a natural decomposition of (15), while remaining jointly constrained. Any stochastic dependency among the parameters (and, thus, the constraints) is fully accommodated.

A powerful way to add flexibility to our model is to vary the \(\alpha \)'s in (20) if, for instance, the expected shortfall is used. This is practical, since some constraints can be more tolerant to violations while others, on the contrary, are rather strict or even exact (e.g., a non-negativity requirement).

We conclude the investigation with an application of our optimization model and algorithm to classification problems. Our procedure can be applied to supervised machine learning as a robust alternative to the support vector machine (SVM). The basic problem is: two classes of points in Euclidean \(d\)-space are given, \(Q_1=\{{\mathbf {x}}_1,\dots ,{\mathbf {x}}_{n_1}\}\) and \(Q_2=\{{\mathbf {y}}_1,\dots ,{\mathbf {y}}_{n_2}\}\). A rule has to be constructed by which any new point \({\mathbf {x}}\) is classified into one of the classes \(Q_1\) and \(Q_2\). The classical SVM of Vapnik (1998) determines a hyperplane that discriminates the two classes linearly in a higher-dimensional space and serves as a separator for classifying new points. Technically, this approach results in a convex quadratic program. To tackle the problem in a robust way, the literature mostly proposes methods that replace each point by a neighbourhood of it; see, e.g., Ben-Tal et al. (2009). In contrast, we do not consider single points of the training classes as uncertain, but the classes as a whole, and regard the points as a sample from the random variables defining the classes. Besides having a better interpretation, we can obviously expect to get fewer constraints. In fact, it turns out that the robust SVM can be represented as a robust linear program with two risk constraints. To obtain this representation, we start with the following linear program:

$$\begin{aligned}&\quad C\rightarrow \max _{\varvec{\omega },C}\\&s.t. \left\{ \begin{aligned} \varvec{\omega }'{\mathbf {x}}_i+b&\ge C,\ \ \quad {\mathbf {x}}_i\in Q_1,\\ \varvec{\omega }'{\mathbf {y}}_j+b&\le -C,\ \ \quad {\mathbf {y}}_j\in Q_2. \end{aligned}\right. \end{aligned}$$
(SVM1)

Generally, the solution of (SVM1) does not coincide with the solution of the usual quadratic program. Like Vapnik's SVM, (SVM1) produces a central separating hyperplane that lies between the classes. However, it does not necessarily maximize the Euclidean margin between the support vectors, as the Euclidean distance is not the only possible criterion here. Note that the Euclidean distance is not easily interpreted when the data have been transformed into a higher-dimensional feature space, as is done by the SVM.
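For orientation, (SVM1) can be sketched as a linear program as follows; note that the box normalization \(\Vert \varvec{\omega }\Vert _\infty \le 1\) used here to keep the LP bounded is our own assumption and not part of the original formulation:

```python
import numpy as np
from scipy.optimize import linprog

def svm1(Q1, Q2):
    """Sketch of (SVM1): maximize C s.t. w'x + b >= C on Q1 and w'y + b <= -C on Q2.
    The box constraint |w_k| <= 1 is an added normalization; without it the LP is unbounded."""
    Q1, Q2 = np.asarray(Q1, float), np.asarray(Q2, float)
    d = Q1.shape[1]
    obj = np.r_[np.zeros(d + 1), -1.0]                       # variables (w, b, C); minimize -C
    A_ub = np.vstack([np.c_[-Q1, -np.ones(len(Q1)), np.ones(len(Q1))],    # C - w'x - b <= 0
                      np.c_[ Q2,  np.ones(len(Q2)), np.ones(len(Q2))]])   # w'y + b + C <= 0
    b_ub = np.zeros(len(Q1) + len(Q2))
    bounds = [(-1.0, 1.0)] * d + [(None, None), (None, None)]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d], res.x[d], res.x[d + 1]                 # w, b, C

rng = np.random.default_rng(3)
Q1 = rng.normal(loc=[2.0, 2.0], size=(30, 2))
Q2 = rng.normal(loc=[-2.0, -2.0], size=(30, 2))
w, b, C = svm1(Q1, Q2)
print(w, b, C)          # classify a new point z by the sign of w'z + b
```

A positive optimal \(C\) indicates that the two training classes are linearly separable; the stochastic reformulation below then adds risk control over both classes.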

To control the quality of our proposed solution, we rewrite the model (SVM1) as a stochastic linear program:

$$\begin{aligned}&\quad {\mathbf {0}}'\cdot {\mathbf {y}}-1\cdot z \rightarrow \min _{{\mathbf {y}},z}\\&s.t. \left\{ \begin{aligned} \tilde{\mathbf {a}}_1'\cdot {\mathbf {y}}&-1\cdot z \ge&-1,\\ (-\tilde{\mathbf {a}}_2)'\cdot {\mathbf {y}}&-1\cdot z \ge&1\,, \end{aligned}\right. \\&\quad \text {where}\quad \tilde{\mathbf {a}}_1\sim Q_1, \ \tilde{\mathbf {a}}_2\sim Q_2. \end{aligned}$$
(SVM2)

\(\tilde{\mathbf {a}}\sim Q\) means that \(\tilde{\mathbf {a}}\) has an empirical distribution on the finite set \(Q\). Following our approach, we next remove the negative values in \({\mathbf {b}}=(-1\ 1)'\). After applying the transformation of Proposition 3, we get:

$$\begin{aligned}&[\begin{array}{lll} {\mathbf {0}}' \;\;&\epsilon \;\;&1-\epsilon \end{array}]\ (\begin{array}{lll} {\mathbf {y}}' \;\;&z \;\;&y_{d+1} \end{array})' \rightarrow \min _{{\mathbf {y}},z,y_{d+1}}\\&s.t. \left[ \begin{array}{l@{\quad }l@{\quad }l} \tilde{\mathbf {a}}_1'\quad &{} -1\quad &{} 2 \\ -\tilde{\mathbf {a}}_2'\quad &{} -1 \quad &{} 0\\ {\mathbf {0}}' \quad &{} 0 \quad &{} 1 \end{array}\right] \cdot \left( \begin{array}{l} {\mathbf {y}}\\ z\\ y_{d+1} \end{array}\right) \ge {\mathbf {1}}\,, \\&\quad \text {where}\quad \tilde{\mathbf {a}}_1\sim Q_1, \ \tilde{\mathbf {a}}_2\sim Q_2. \end{aligned}$$
(SVM3)

The origin \({\mathbf {0}}\) must not be situated between the classes; otherwise we may obtain an infinite solution. We extend this application as follows: to control the width of the margin we make \({\mathbf {b}}\) stochastic (instead of fixing it at \((-1\ 1)'\)). The more uncertain \({\mathbf {b}}\) is, the wider the margin of the separating hyperplane.

A soft margin is introduced as usual; see Vapnik (1998). However, in contrast to the classical approach, the additional margin variable \(\xi \) arises quite naturally in our stochastic linear program:

$$\begin{aligned}&\quad {\mathbf {0}}'\cdot {\mathbf {y}}-1\cdot z + M\cdot \xi \rightarrow \min _{{\mathbf {y}},z,\xi }\\&s.t. \left\{ \begin{aligned} \tilde{\mathbf {a}}_1'\cdot {\mathbf {y}}&-1\cdot z +\xi \ge&-1,\\ (-\tilde{\mathbf {a}}_2)'\cdot {\mathbf {y}}&-1\cdot z +\xi \ge&1\,, \end{aligned}\right. \\&\quad \text {where}\quad \tilde{\mathbf {a}}_1\sim Q_1, \ \tilde{\mathbf {a}}_2\sim Q_2. \end{aligned}$$
(SVM-soft)

Our soft-margin model has the advantage that, if we are unsure about the proper class labels of the training points, we can introduce a random coefficient for \(\xi \) that describes the level of certainty in labeling. It is also clear that we can use the kernel trick here, because the inner product and the induced norm are sufficient for all calculations in the algorithm.

Further, our robust optimization model and the new algorithm can be applied in many other fields of operations research. In particular, they are well suited to formalize problems in supply chain management, such as inventory management. This will be the topic of future work.