Distributionally robust optimization with decision dependent ambiguity sets

Luo, Fengqiao; Mehrotra, Sanjay

doi:10.1007/s11590-020-01574-3

Distributionally robust optimization with decision dependent ambiguity sets

Original Paper
Open access
Published: 08 April 2020

Volume 14, pages 2565–2594, (2020)
Cite this article

Download PDF

You have full access to this open access article

Optimization Letters Aims and scope Submit manuscript

Distributionally robust optimization with decision dependent ambiguity sets

Download PDF

Fengqiao Luo¹ &
Sanjay Mehrotra¹

7265 Accesses
42 Citations
Explore all metrics

Abstract

We study decision dependent distributionally robust optimization models, where the ambiguity sets of probability distributions can depend on the decision variables. These models arise in situations with endogenous uncertainty. The developed framework includes two-stage decision dependent distributionally robust stochastic programming as a special case. Decision dependent generalizations of five types of ambiguity sets are considered. These sets are based on bounds on moments, covariance matrix, Wasserstein metric, Phi-divergence and Kolmogorov–Smirnov test. For the finite support case, we use linear, conic or Lagrangian duality to give reformulations of these models with a finite number of constraints. Reformulations are also given for the continuous support case for moment, covariance, Wasserstein and Kolmogorov–Smirnov based models. These reformulations allow solutions of such problems using global optimization techniques within the framework of a cutting surface algorithm. The importance of decision dependence in the ambiguity set is demonstrated with the help of a numerical example modeling simultaneous determination of order quantity and selling price for a newsvendor problem.

Distributionally Robust Optimization with Moment Ambiguity Sets

Article Open access 01 December 2022

Data-driven Stochastic Programming with Distributionally Robust Constraints Under Wasserstein Distance: Asymptotic Properties

Article 05 August 2020

Data-driven stochastic optimization for distributional ambiguity with integrated confidence region

Article Open access 01 March 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The uncertain characteristics of a system’s performance often depend on its design decisions. This type of uncertainty is called endogenous uncertainty. For example in a newsvendor model product demand function may depend on its selling price [1]. Additional examples of decision problems with endogeneous uncertainty from finance, resource management, process design, and network design are given in Sect. 2. The goal of this paper is to present decision dependent ambiguity frameworks to model problems involving endogenous uncertainty. The main contribution is in showing that the dualization of a certain inner problem continues to be applicable in this more general setting. This dualization has a unique advantage for the problems under consideration. It allows application of algorithms from nonlinear global optimization to solve the resulting reformulations.

Specifically, we study the optimization problems in which the ambiguity set of distributions may depend on the decisions in the following modeling framework:

Here $\varvec{x}$ is the vector of decision variables with the feasible set $X\subseteq {\mathbb {R}}^n$, and $\varvec{\xi }$ is the vector of uncertain model parameters, which is defined on a measurable space $({\varXi },{\mathcal {F}})$; ${\varXi }$ is the support in ${\mathbb {R}}^d$, and ${\mathcal {F}}$ is a $\sigma $-algebra. We may allow $\xi $ to also depend on $\varvec{x}$, i.e., replace $h(\varvec{x},\varvec{\xi })$ in ($\hbox {D}^3\hbox {RO}$) with $h(\varvec{x},\varvec{\xi }(\varvec{x}))$ but we have not done so here for notation simplicity. For a given ${\varvec{x}}$, the ambiguity set ${\mathcal {P}}(\varvec{x})$ of the unknown probability distribution depends on the decision variables ${\varvec{x}}$, and ${\mathcal {P}}({\varvec{x}})\subseteq {\mathcal {P}}({\varXi },{\mathcal {F}})$, where ${\mathcal {P}}({\varXi },{\mathcal {F}})$ is the set of probability distributions defined on $({\varXi },{\mathcal {F}})$. The function $f(\varvec{x})$ is the deterministic part of the objective with no uncertain parameters. Keeping this function in ($\hbox {D}^3\hbox {RO}$) allows us to consider decision models involving two-stage decision making.

We denote the inner problem $\mathop {\text {max}}\nolimits _{P\in {\mathcal {P}}(\varvec{x})}\; {\mathbb {E}}_P[h(\varvec{x},\varvec{\xi })]$ as ($\hbox {D}^3\hbox {RO}$)-inner. Note that if $h(\varvec{x},\xi )$ is a recourse function in a two-stage stochastic program, i.e.,

$$\begin{aligned} h(\varvec{x},\varvec{\xi })=\;\underset{\varvec{y}\in {\mathbb {R}}^q}{\text {min}}\;\; g(\varvec{x},\varvec{y},\varvec{\xi }),\quad \text {s.t.}\, \psi _i({\varvec{x}},\varvec{y},\varvec{\xi })\ge 0 \quad \forall i\in [m], \end{aligned}$$

(1)

where $g(\varvec{x},\varvec{y},\varvec{\xi })$ and $\psi _i({\varvec{x}},\varvec{y},\varvec{\xi })$ are bounded and continuous functions of ${\varvec{x}}$, $\varvec{y}$ and $\varvec{\xi }$, then ($\hbox {D}^3\hbox {RO}$) becomes a two-stage decision dependent distributionally robust stochastic program (TSD$^3$SP), which is an important application of ($\hbox {D}^3\hbox {RO}$). We assume that the minimization problem in (1) is feasible for any ${\varvec{x}}\in X$ and $\varvec{\xi }\in {\varXi }$, and $h({\varvec{x}},\varvec{\xi })$ is finite. In other words, we assume that (D$^3$SP) has complete recourse [2]. Moreover, we assume a certain Slater-type condition for the set ${\mathcal {P}}({\varvec{x}})$, as needed.

The ambiguity set ${\mathcal {P}}({\varvec{x}})$ can be constructed in many different ways. The reformulations given in this chapter are for the decision dependent generalizations of the most common types of ambiguity sets proposed in the distributionally robust optimization literature [3]. In Sect. 3 we investigate the reformulation of ($\hbox {D}^3\hbox {RO}$) for five different possible specifications of ${\mathcal {P}}(\varvec{x})$: (i) the ambiguity sets defined by using component-wise moment inequalities and bounds on the scenario probabilities [4, 5]; (ii) ambiguity sets defined by using the mean vector and covariance matrix inequalities [6]; (iii) ambiguity sets defined by using the Wasserstein metric [7,8,9,10]; (iv) ambiguity sets defined by using $\phi $-divergence [11,12,13,14,15,16]; and (v) the ambiguity sets defined by using the multi-variate Kolmogorov–Smirnov test [17]. The reformulations are given in Sects. 3.1 to 3.5, respectively. The choice of the specific ambiguity set when modeling a problem is context dependent. A choice depends on the data being represented by the set, and the needs of the modeler. Moments based sets might be natural when their estimates are available from a prior statistical analysis. However, Wasserstein-distance based sets have an underlying polyhedral structure that makes them more amenable to algorithm development within the linear programming or linear conic duality frameworks. Total variance based sets specify bounds on the probabilities of each scenario more naturally. Kolmogorov–Smirnov test is commonly used in hypothesis testing that compares probability distributions.

The basic concept used in arriving at the reformulations of ($\hbox {D}^3\hbox {RO}$)-inner is to use linear programming duality or conic duality as needed in the specific settings. Lagrangian duality is used for the situations where considering the saddle point problem appears more suitable.

We note that the computational complexity of the reformulated problem is not the main motivation of this paper. Our goal is towards studying the modeling frameworks that more realistically represent the underlying phenomenon. Here we also do not focus on developing any new (possibly more efficient) algorithms, as that is left for future studies. In general, we refer to the global optimization techniques for solving the non-convex optimization problems resulting from our reformulations [18]. Moreover, to simplify the presentation, we consider the finite support case in the main text. Reformulations (without proof) are also given for the ambiguity sets (i)–(iii) allowing for continuous support. In these cases semi-infinite programming reformulation of the corresponding models are given. These semi-infinite reformulations allow the use of a cutting surface algorithm, given in Sect. 4.1, to solve the semi-infinite problems. The models having first-stage binary variables allow for the use of a reformulation technique that reformulates the product of a binary variable with a continuous variable (Lagrange multiplier). The central idea behind this technique is given in Sect. 4.2. A newsvendor example illustrating the reformulations and the relevance of decision dependent ambiguity is given in Sect. 5. The construction of the decision dependent parameters appearing in the specification of the ambiguity set is discussed briefly when making the concluding remarks.

2 Literature review on optimization with decision dependent uncertainty

The endogenous uncertainty has been considered in dynamic programming [19], stochastic programming [20], robust optimization [21, 22], with applications in financial market modeling [23,24,25], resource management [26], stochastic traffic assignment [27], oil (natural gas) exploration [28,29,30], and robust network design [31, 32].

In the framework of stochastic optimization, the endogenous uncertainty affects the underlying probability distribution and the scenario tree. Jonsbråten et al. [33] studied stochastic programming problems with decision dependent scenario distributions, where the distribution is indexed by a Boolean vector. They provided an implicit enumeration algorithm for solving these problems based on a branch-and-bound scheme. This model and the proposed branch-and-bound method was applied to an optimal selection and sequencing of oil well exploration problem under reservoir capacity uncertainty [28]. Ahmed [31] investigated a class of single stage stochastic programs with discrete candidate probability distributions that are based on Luce’s choice axiom [34]. The decision affects utility functions of the choices, and hence the probability distribution [35]. These types of problems arise from network design and server selection applications [31]. It is shown that stochastic programs of this class can be reformulated as 0-1 hyperbolic programs. Viswanath et al. [32] investigated a two-stage shortest path problem in a stochastic network which arises from disaster relief services. Here the first stage investment decisions can reduce the failure probability of links in the network, and a shortest path is identified based on the post-event network. Held et al. [36] developed a heuristic algorithm to solve a two-stage stochastic network interdiction problem, where the interdictor is the first-stage decision maker whose objective is to maximize the probability that the minimum path length exceeds a certain value after the interdiction. The interdictor’s decision changes the network topology and the uncertainty description. The structure of single-stage stochastic programming with a decision dependent probability distribution was also studied by [37]. Lee et al. [38] investigated a newsvendor model under decision dependent uncertainty, where sequential decisions are made after a re-estimation of the demand distribution. They provide conditions under which the estimation and decision process converges.

Grossmann et al. [20, 30] developed a disjunctive programming reformulation for multistage decision dependent stochastic programs. They investigated this problem with finitely many scenarios, where exogenous and endogenous uncertain parameters are involved. In their models, endogenous parameters are resolved after the operational decisions are made (e.g., a facility is installed or an investment is made). A branch-and-bound algorithm is developed to solve the disjunctive program by branching on the logic variables involved in the disjunctive clauses [20, 39], and a lower bound is obtained at each node by solving a Lagrangian dual sub-problem [40]. More solution strategies for the disjunctive program are given in [41]. This framework is applied to model and solve the offshore oil or gas field infrastructure multi-stage planning problem with uncertainty in estimating parameters that are not immediately realized [29]. The framework is also applied to optimize process network synthesis problems with yield uncertainty that can be reduced by investing in pilot plants [42]. Tarhan et al. [43] developed a computational strategy that combines global optimization and outer-approximation to solve multistage nonlinear mixed-integer programs with decision dependent uncertainty.

Decision dependent uncertainty is also considered in the framework of robust optimization by letting the uncertainty set depend on the decision variables. Spacey et al. [44] studied a problem of minimizing the run time of a computer program by assigning code segments to execution locations where the scheduling of code segment execution depends on the assignment. In robust combinatorial optimization, decision dependent uncertainty set is used to ensure the same relative protection level of all binary decision vectors [21]. To model a robust task scheduling problem with uncertainty in the processing time, Vujanic et al. [45] proposed a decision dependent uncertainty set as a Minkowski sum of some static sets such that the uncertain completion time interval of a task can naturally depend on the starting time of the task. Hu et al. [1] studied a newsvendor model where the product demand may depend on the selling price. Since the analytical relationship between the demand and the selling price is unknown, they construct a family of decreasing and convex functions from historical data as the functional ambiguity set of the true demand function, and solve the functionally robust optimization problem of this model using a univariate reformulation. Nohadani et al. [22] investigated robust linear programs with decision dependent budget-type uncertainty (RLP-DDU). They showed that this problem is NP-hard even in the case where the uncertainty set is a polyhedron and the decision dependence is affine. RLP-DDU can be reformulated as a mixed integer linear program (MILP), if the decision variables affect uncertain variables by controlling the upper bounds of the uncertain variables. This concept is demonstrated in a robust shortest-path problem, where the uncertainty is resolved progressively when approaching the destination.

3 Reformulation of (D$^3$RO)

We investigate the dual of the inner problem of ($\hbox {D}^3\hbox {RO}$) under the assumption that the probability distributions are specified over a finite support on ${\varXi }$. Reformulations for the case where ${\varXi }$ is continuous are also given for the first three cases. We make the following assumption to simplify the presentation of our key observations:

Assumption 1

Every $P\in {\mathcal {P}}({\varvec{x}})$ has a decision independent finite support ${\varXi }:=\{ \varvec{\xi }^k \}^{N}_{k=1}$, $\forall {\varvec{x}}\in X$, for a fixed N.

The finite support assumption is commonly used when developing computational algorithms for solving general two-stage stochastic optimization. The finite support framework arises either due to an application of sample average approximation approach when solving problem having parameters with continuous support, or in a data-driven setting when the data provides the nominal samples for the model. For a more general model, the set ${\varXi }$ can be a compact set in the Euclidean space. The reformulations given in Sects. 3.1–3.3 that have finitely many constraints due to Assumption 1 (specified by $\varvec{\xi }^k$, $k=1,\ldots ,N$) become semi-infinite programs, where the constraints are parameterized over the continuous support ${\varXi }$. Dual formulations for finite support case provide a bridge to understanding dual formulations for the case where the distributions have continuous support. In the finite support case, because of LP duality, often we do not need Slater-type conditions. However, these conditions are required for the more general case. It is important to be able to see such distinctions when considering applications. Moreover, all reformulations adapt to the case where $\varvec{\xi }^k$ is replaced with $\varvec{\xi }^k({\varvec{x}})$.

An interpretation of Assumption 1 is as follows. In this case, the candidate probability distributions in ${\mathcal {P}}({\varvec{x}})$ are represented as a vector $\varvec{w}\in {\mathbb {R}}^N$ such that $w_i({\varvec{x}})$ is the probability assigned to scenario $\varvec{\xi }^i$ ($i\in [N]$) and $\Vert \varvec{w}({\varvec{x}}) \Vert _1=1$ for all ${\varvec{x}}\in X$. By judicious specification of ${\varvec{x}}$, the support of the distribution (turning a scenario on or off) may also be allowed to change with x by forcing certain scenarios to have zero probability in the specification of the set ${\mathcal {P}}({\varvec{x}})$. Therefore, this finite support framework allows for a great deal of generality.

3.1 Ambiguity sets defined by simple measure and moment inequalities

We consider the moment robust set defined as follows:

(2)

where ${\mathcal {M}}({\varXi },{\mathcal {F}})$ is the set of positive measures defined on $({\varXi },{\mathcal {F}})$, $\nu _1({\varvec{x}}),\nu _2({\varvec{x}})\in {\mathcal {M}}({\varXi },{\mathcal {F}})$ are two given measures for a fixed ${\varvec{x}}$ that are lower and upper bounds of candidate probability measures, and $\varvec{f}:=[f_1(\varvec{\xi }),\dotsc ,f_m(\varvec{\xi })]$ is a vector of moment functions. The ordering “$\preceq $” is defined as follows: For any two measures (not necessarily probability measures) ${\mathcal {M}}_1$ and ${\mathcal {M}}_2$ defined on $({\varXi },{\mathcal {F}})$, ${\mathcal {M}}_1\preceq {\mathcal {M}}_2$ is equivalent to ${\mathcal {M}}_1(S)\le {\mathcal {M}}_2(S)\;\;\forall S\in {\mathcal {F}}$. If Assumption 1 is satisfied, then ${\mathcal {M}}_1\preceq {\mathcal {M}}_2$ is equivalent to ${\mathcal {M}}_1(\varvec{\xi }^k)\le {\mathcal {M}}_2(\varvec{\xi }^k)\;\;\forall k\in [N]$. To ensure that P is a probability distribution, we set $l_1({\varvec{x}})=u_1({\varvec{x}})=1$ and $f_1(\varvec{\xi })=1$ in the above definition of ${\mathcal {P}}^{SM}({\varvec{x}})$. For any $\varvec{\xi }\in {\varXi }$, let $\varvec{\xi }:=[\xi _1,\ldots ,\xi _d]$. When standard moments are used, the ith ($i\in [m]$) entry of $\varvec{f}$ has the form: $f_i(\varvec{\xi }):=(\xi _1)^{k_{i1}}\cdot (\xi _2)^{k_{i2}} \cdots (\xi _d)^{k_{id}}$, where $k_{ij}$ is a nonnegative integer indicating the power of $\xi _j$ for the ith moment function. The framework also allows the use of generalized moments by choosing alternative base functions. Note that the constraint $\int _{{\varXi }}f_1(\xi )P(d\xi )\in [l_1({\varvec{x}}),u_1({\varvec{x}})]$ in (2) is used to ensure that P is a probability distribution. The ambiguity set (2) is a generalization of the set in [46] for the decision dependent case. The following theorem gives a reformulation of ($\hbox {D}^3\hbox {RO}$) with moment robust ambiguity set ${\mathcal {P}}^{SM}(\varvec{x})$.

Theorem 3.1

Let Assumption 1 hold, and therefore the ambiguity set (2) be given by ${\mathcal {P}}^{SM}_0({\varvec{x}})=\{\varvec{p}\in {\mathbb {R}}^N: \;\varvec{l}({\varvec{x}})\le \sum ^N_{k=1}p_kf(\varvec{\xi }^k)\le \varvec{u}({\varvec{x}}),\;{\underline{p}}_k({\varvec{x}})\le p_k\le {\overline{p}}_k({\varvec{x}})\;\;\forall k\in [N]\}$, where ${\underline{p}}_k({\varvec{x}})$ and ${\overline{p}}_k({\varvec{x}})$ are given lower and upper bounds of each $p_k$. If for any ${\varvec{x}}\in X$, $h({\varvec{x}},\varvec{\xi }^k)$ is finite for any $k\in [N]$ and the ambiguity set ${\mathcal {P}}^{SM}_0({\varvec{x}})$ is nonempty, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{SM}_0({\varvec{x}})$ can be reformulated as the following nonlinear program:

$$\begin{aligned} \begin{aligned} \underset{\varvec{x},\varvec{\alpha },\varvec{\beta },\varvec{\gamma },\varvec{\mu }}{min }&\quad f(\varvec{x}) + \varvec{\alpha }^T\varvec{l}({\varvec{x}}) + \varvec{\beta }^T\varvec{u}({\varvec{x}}) +\varvec{\gamma }^T\underline{\varvec{p}}({\varvec{x}}) + \varvec{\mu }^T\overline{\varvec{p}}({\varvec{x}}) \\ s.t.&\quad (\varvec{\alpha }+\varvec{\beta })^T \varvec{f}(\varvec{\xi }^k) + \gamma _k + \mu _k \ge h(\varvec{x},\varvec{\xi }^k) \quad \forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{\alpha },\varvec{\beta }\in {\mathbb {R}}^{m},\; \; \varvec{\gamma }, \varvec{\mu }\in {\mathbb {R}}^N,\; \varvec{\alpha }\le \varvec{0},\; \varvec{\beta }\ge \varvec{0},\; \varvec{\gamma }\le 0, \; \varvec{\mu }\ge 0. \end{aligned} \end{aligned}$$

(3)

Proof

Under Assumption 1, the inner problem of ($\hbox {D}^3\hbox {RO}$) becomes the following linear program:

$$\begin{aligned}&\underset{\varvec{p}\in {\mathbb {R}}^{N}}{\text {max}}\;\; \sum ^{N}_{k=1} p_k h(\varvec{x},\varvec{\xi }^k) \nonumber \\&\text {s.t.}\,\quad \varvec{l}(\varvec{x})\le \sum ^{N}_{k=1} p_k\varvec{f}(\varvec{\xi }^k) \le \varvec{u}(\varvec{x}), \nonumber \\&\qquad \quad {\underline{p}}_k({\varvec{x}}) \le p_k\le {\overline{p}}_k({\varvec{x}}) \;\;\forall k\in [N]. \end{aligned}$$

(4)

Based on the hypothesis of the theorem, the above linear program is feasible for any ${\varvec{x}}\in X$, and its objective value is bounded. We take the dual of (4) using strong duality and combining the dual problem with the outer problem to get the desired reformulation. $\square $

A reformulation for the two-stage case is given in the following corollary.

Corollary 3.1

If $h(\cdot ,\cdot )$ is a recourse function defined in (1), and the ambiguity set ${\mathcal {P}}^{SM}_0({\varvec{x}})$ is non-empty for any ${\varvec{x}}\in X$, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{SM}_0(\varvec{x})$ can be formulated as follows:

$$\begin{aligned} \begin{aligned} \underset{\varvec{x},\varvec{y},\varvec{\alpha },\varvec{\beta },\varvec{\gamma },\varvec{\mu }}{min }&\quad f(\varvec{x}) + \varvec{\alpha }^T\varvec{l}({\varvec{x}}) + \varvec{\beta }^T\varvec{u}({\varvec{x}}) +\varvec{\gamma }^T\underline{\varvec{p}}({\varvec{x}}) + \varvec{\mu }^T\overline{\varvec{p}}({\varvec{x}}) \\ s.t.&\quad (\varvec{\alpha }+\varvec{\beta })^T \varvec{f}(\varvec{\xi }^k) + \gamma _k + \mu _k \ge g(\varvec{x},\varvec{y}^k,\varvec{\xi }^k) \quad \forall k\in [N] \\&\quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{\alpha },\varvec{\beta }\in {\mathbb {R}}^{m},\; \; \varvec{\gamma }, \varvec{\mu }\in {\mathbb {R}}^N,\; \varvec{\alpha }\le \varvec{0},\; \varvec{\beta }\ge \varvec{0},\; \varvec{\gamma }\le 0, \; \varvec{\mu }\ge 0. \end{aligned} \end{aligned}$$

(5)

Proof

Recall that the complete recourse assumption ensures that $h({\varvec{x}},\varvec{\xi }^k)$ is finite of any given ${\varvec{x}}$. One can take dual of the inner problem and use strong duality as in the proof of Theorem 3.1. $\square $

Theorem 3.3, which is an analogue of Theorem 3.1 for the continuous case, can be proved using conic duality theory in functional spaces.

Theorem 3.2

(conic duality in functional spaces [47]) Let X and Y be linear spaces, $C\subset X$ and $K\subset Y$ be convex cones, $b\in Y$ and $A:\;X\rightarrow Y$ be a linear mapping. The spaces X and Y are paired with linear spaces (dual spaces) $X^{\prime }$ and $Y^{\prime }$, respectively, in the sense that bilinear forms $\langle \cdot ,\cdot \rangle :\;X^{\prime }\times X\rightarrow {\mathbb {R}}$ and $\langle \cdot ,\cdot \rangle :\;Y^{\prime }\times Y\rightarrow {\mathbb {R}}$ are defined. Consider the following pair of conic linear optimization problem and its dual:

$$\begin{aligned}&\underset{x\in C}{inf }\; \langle c, x\rangle \quad s.t. \, Ax+b\in K, \end{aligned}$$

(P)

$$\begin{aligned}&\underset{y\in -K^*}{sup } \langle y, b \rangle \quad s.t. \, A^*y+c\in C^*, \end{aligned}$$

(D)

where $C^*$ is the polar (positive dual) cone of C, $K^*$ is the polar cone of K, and $A^*$ is the adjoint mapping of A. Suppose that X and Y are Banach spaces, the cones C and K are closed and $\langle c,\cdot \rangle $ and $A:\;X\rightarrow Y$ are continuous and $b\in ri [A(C)-K]$. Then $val (P)=val (D)$, and an optimal solution of (D) exists.

Theorem 3.3

Let ${\varXi }$ be a closed and bounded set in the Euclidean space. Let the probability measure P and the positive measures $\nu _1(x),\nu _2(x)$ be defined on the measurable space $({\varXi },{\mathcal {F}})$, where the $\sigma $-algebra ${\mathcal {F}}$ contains all singleton subsets, i.e., $\{\xi \}\in {\mathcal {F}}$ for all $\xi \in {\varXi }$. If for any $x\in {\varXi }$, the inner problem of ($\hbox {D}^3\hbox {RO}$) with the ambiguity set ${\mathcal {P}}^{SM}(x)$ has a non-empty relative interior, then the inner problem has the following dual reformulation:

$$\begin{aligned} \begin{aligned} \underset{\alpha ,\beta ,\gamma ,\mu }{min }&\alpha ^{\top }l(x) + \beta ^{\top }u(x) +\int _{{\varXi }}\gamma ^{\top }(\xi )\nu _1(x,\xi )d\xi + \int _{{\varXi }}\mu ^{\top }(\xi )\nu _2(x,\xi )d\xi \\ s.t.&\quad (\alpha +\beta )^{\top } f(\xi ) + \gamma (\xi ) + \mu (\xi ) = h(x,\xi ) \quad \forall \xi \in {\varXi }, \\&\quad x\in X,\; \alpha ,\beta \in {\mathbb {R}}^{m},\; \; \gamma (\xi ), \mu (\xi )\in {\mathbb {R}}\;\forall \xi \in {\varXi },\; \alpha \le 0,\; \beta \ge 0,\; \\&\quad \gamma (\xi )\le 0, \; \mu (\xi )\ge 0. \end{aligned} \end{aligned}$$

Furthermore, the strong duality holds and a dual optimal solution exists.

Proof

For notational convenience, we omit the x variable in the proof. Let $V^+({\varXi },{\mathcal {F}})$ be the cone of of positive measures on $({\varXi },{\mathcal {F}})$. The inner problem of the original model is a conic linear program in a functional space, which we rewrite as follows:

$$\begin{aligned} \begin{aligned}&\underset{P}{\text {sup}}\;\;{\mathbb {E}}_P[h(\xi )] \\&\text {s.t.}\,\nu _1\preceq P \preceq \nu _2, \\&\qquad l_i\le \int _{{\varXi }} f_i(\xi ) P(d\xi ) \le u_i, \quad i\in [m], \end{aligned} \end{aligned}$$

(6)

where $P\in V^+({\varXi },{\mathcal {F}})$. We write a dual of (6) as follows:

$$\begin{aligned}&\underset{\alpha ,\beta ,\gamma ,\mu }{\text {inf}}\;\; \alpha ^{\top }l+\beta ^{\top }u+\int _{{\varXi }}\gamma (\xi )d\nu _1(\xi ) +\int _{{\varXi }}\mu (\xi )d\nu _2(\xi ) \nonumber \\&\text {s.t.}\, (\alpha +\beta )^{\top }f(\xi )+\gamma (\xi )+\mu (\xi )= h(\xi )\qquad \forall \xi \in {\varXi }, \nonumber \\&\qquad \alpha ,\beta \in {\mathbb {R}}^m,\;\alpha \le 0,\;\beta \ge 0,\;\gamma (\xi )\le 0,\;\mu (\xi )\ge 0 \;\forall \xi \in {\varXi }. \end{aligned}$$

(7)

We now verify that the conditions in Theorem 3.2 hold for (6) and (7). The primal decision variable P is an element in the the following Banach space of finite signed measures over $({\varXi },{\mathcal {F}})$:

The Banach space ${\mathcal {M}}({\varXi },{\mathcal {F}})$ is equipped with the norm $\Vert \mu \Vert =|\mu |(X)$, where $\mu =\mu ^+-\mu ^-$, $|\mu |=\mu ^++\mu ^-$. The $\mu ^+$, $\mu ^-$ are the positive and negative components of $\mu $. The dual decision variable $[\alpha ,\beta ,\gamma ,\mu ]$ is an element in the Banach space ${\mathbb {R}}^m\times {\mathbb {R}}^m\times L\times L$, where L is a Banach space defined as

The primal problem can be rewritten as the following linear conic optimization problem:

$$\begin{aligned} \begin{aligned}&\underset{P\in {\mathcal {M}}({\varXi },{\mathcal {F}})}{\text {sup}}\;\langle h, P \rangle \\&\text {s.t.}\, \langle f_i, P \rangle -l_i \in {\mathbb {R}}_+ \quad i\in [m], \qquad -\langle f_i, P \rangle +u_i \in {\mathbb {R}}_+ \quad i\in [m], \\&\qquad P-\nu _1 \in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \qquad -P+\nu _2\in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \qquad P\in {\mathcal {M}}^+({\varXi },{\mathcal {F}}), \end{aligned} \end{aligned}$$

(8)

where $\langle \psi , P\rangle =\int _{{\varXi }} \psi dP$. The linear mapping $A:\;{\mathcal {M}}({\varXi },{\mathcal {F}})\rightarrow {\mathbb {R}}^m\times {\mathbb {R}}^m \times {\mathcal {M}}({\varXi },{\mathcal {F}})\times {\mathcal {M}}({\varXi },{\mathcal {F}})$ is given as:

$$\begin{aligned} A\circ P:=[\langle f, P\rangle ,\;-\langle f, P\rangle ,\;P,\;-P]. \end{aligned}$$

The dual problem can be rewritten as the following linear conic optimization problem:

$$\begin{aligned} \begin{aligned}&\underset{\begin{array}{c} [\alpha ,\beta ,\gamma ,\mu ]\in \\ {\mathbb {R}}^m\times {\mathbb {R}}^m\times L\times L \end{array}}{\text {inf}}\;\langle [l,u,\nu _1,\nu _2], [\alpha ,\beta ,\gamma ,\mu ] \rangle \\&\text {s.t.}\, (\alpha +\beta )^{\top }f(\xi )+\gamma (\xi )+\mu (\xi )-h(\xi )\in \{0\} \quad \forall \xi \in {\varXi }, \\&\qquad [\alpha ,\beta ,\gamma ,\mu ]\in {\mathbb {R}}^{m}_-\times {\mathbb {R}}^{m}_+\times L^-\times L^+, \end{aligned} \end{aligned}$$

(9)

where and .

By definition, it is clear that the cones ${\mathcal {M}}^+({\varXi },{\mathcal {F}})$ and ${\mathbb {R}}^m_+\times {\mathbb {R}}^m_+ \times {\mathcal {M}}^+({\varXi },{\mathcal {F}})\times {\mathcal {M}}^+({\varXi },{\mathcal {F}})$ are closed. To show that the linear functional $\langle h, \cdot \rangle $ and the linear mapping A is continuous, it suffices to show that if $P_n\rightarrow 0$, we have $\langle h, P_n\rangle \rightarrow 0$ and $A\circ P_n\rightarrow 0$ as $n\rightarrow \infty $. Indeed, if $P_n\rightarrow 0$, we have

$$\begin{aligned} |\langle h, P_n\rangle |= & {} \left|\int _{{\varXi }}h(\xi )dP_n(\xi ) \right|\le \text {max}_{\xi \in {\varXi }}h(\xi )\cdot P_n({\varXi }) \rightarrow 0, \text {and similarly,}\, |\langle f_i, P_n \rangle | \rightarrow 0. \end{aligned}$$

Therefore, $\langle h, P_n\rangle \rightarrow 0$ and $A\circ P_n \rightarrow 0$ as $n\rightarrow \infty $. Since the primal problem has a non-empty relative interior, by Theorem 3.2, the strong duality holds and the dual problem has an optimal solution. $\square $

3.2 Covariance matrix based ambiguity set

We now consider a distributional ambiguity set with multi-variate bounds defined as follows:

(10)

This set is a generalization of the set used in [6] for the decision dependent case. In the finite support case this set is written as follows:

(11)

Note that in a special case of (10), $\varvec{\mu }({\varvec{x}})$ and $\varvec{Q}({\varvec{x}})$ may not depend on ${\varvec{x}}$ and only $\alpha ({\varvec{x}})$, $\beta ({\varvec{x}})$ depend on the decision variables. In this case, the decision only affects the size of the ambiguity set. Note that in ${\mathcal {P}}^{DY}_0({\varvec{x}})$ we can also allow the probabilities $p_i$ to depend on ${\varvec{x}}$ as in ${\mathcal {P}}^{DY}({\varvec{x}})$. This possible generalization is ignored here for simplicity. The following theorem gives a reformulation of ($\hbox {D}^3\hbox {RO}$) with the ambiguity set ${\mathcal {P}}^{DY}_0(\varvec{x})$. The following theorem is based on Lemma 1 in [6] for the finite support and decision dependent case.

Theorem 3.4

Let Assumption 1 hold. Suppose that Slater’s constraint qualification conditions are satisfied, i.e., for any $\varvec{x}\in X$, there exist a vector $\varvec{p}^{\prime }$ in the interior of ${\mathcal {P}}^{DY}({\varvec{x}})$. Then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{MB}({\varvec{x}})$ can be reformulated as:

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s,\varvec{u},\varvec{z},\varvec{Y}}{min }&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ s.t.&\quad s - (\varvec{\xi }^{i})^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }^i) \quad \forall \; i\in [N], \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y}\in {\mathbb {R}}^{N\times N}, \; \varvec{Y} \succeq 0, \end{aligned}\nonumber \\ \end{aligned}$$

(12)

where ${\mathcal {N}}_{SOC }$ is a second order cone defined as , $\varvec{z}:=[z_0,\;\varvec{z}_1]$ with $z_0\in {\mathbb {R}}$ and $\varvec{z}_1\in {\mathbb {R}}^d$, and $A\bullet B=Tr(A^TB)$ for matrices A and B.

Proof

With the ambiguity set ${\mathcal {P}}^{DY}_0$ the inner problem of ($\hbox {D}^3\hbox {RO}$) is given as follows:

$$\begin{aligned} \underset{\varvec{p},\tau }{\text {max}}&\quad \sum ^{N}_{i=1} p_i h({\varvec{x}},\varvec{\xi }^i)&\nonumber \\ \text {s.t.}&\quad \sum ^{N}_{i=1} p_i = 1,&:\; s \in {\mathbb {R}}, \nonumber \\&\quad \varvec{\tau } = \sum ^{N}_{i=1} p_i \varvec{\xi }^i,&:\; \varvec{u}\in {\mathbb {R}}^d, \nonumber \\&\quad (\varvec{\tau }-\varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}(\varvec{\tau }-\varvec{\mu }({\varvec{x}})) \le \alpha ({\varvec{x}}),&:\; \varvec{z}\in {\mathcal {N}}_{\text {SOC}}, \nonumber \\&\quad \sum ^{N}_{i=1} p_i(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T \preceq \beta ({\varvec{x}})\varvec{Q}({\varvec{x}}),&:\; \varvec{Y}\succeq 0, \nonumber \\&\quad \varvec{p}:=[p_1,\ldots ,p_{N}]\in {\mathbb {R}}^{N}_+,&\forall i\in [N]. \end{aligned}$$

(13)

Under the Slater’s condition using the strong duality for semi-definite programming, we reformulate the dual problem as:

$$\begin{aligned} \begin{aligned} \underset{s,\varvec{u},\varvec{z},\varvec{Y}}{\text {min}}&\quad s + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} + \big [\sqrt{\alpha ({\varvec{x}})}, -\left( \varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }({\varvec{x}})\right) ^T \big ]\varvec{z} \\ \text {s.t.}&\quad s + \varvec{u}^T\varvec{\xi }^i +(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^i-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\quad \varvec{u} + \varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 = 0, \\&\quad \varvec{z}\in {\mathcal {N}}_{\text {SOC}},\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y}\succeq 0. \end{aligned}\quad \end{aligned}$$

(14)

Substituting (14) in ($\hbox {D}^3\hbox {RO}$), we obtain (12). $\square $

Corollary 3.2

If $h(\cdot ,\cdot )$ is a recourse function defined in (1) and the ambiguity set ${\mathcal {P}}^{DY}_0({\varvec{x}})$ is non-empty for any ${\varvec{x}}\in X$, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{DY}_0({\varvec{x}})$ can be reformulated as follows:

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s, \varvec{y}, \varvec{u},\varvec{z},\varvec{Y}}{min }&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ s.t.&\quad s - (\varvec{\xi }^{k})^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }^k-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }^k-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge g({\varvec{x}},\varvec{y}^k,\varvec{\xi }^k) \quad \\&\qquad \forall \; k\in [N], \\&\quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y} \succeq 0. \end{aligned}\nonumber \\ \end{aligned}$$

(15)

The following theorem, which is an analogue of Theorem 3.4, can be proved using conic duality.

Theorem 3.5

Let ${\varXi }$ be a closed and bounded set in the Euclidean space. Suppose the $\sigma $-algbra $\mathcal {F}$ contains all singleton sets, i.e., $\{\xi \}\in \mathcal {F}$ for all $\xi \in {\varXi }$. Suppose the following Slater-type conditions are satisfied: For all ${\varvec{x}}\in X$, there exists a probability measure $P\in {\mathcal {P}}({\varXi },{\mathcal {F}})$ satisfying $({\mathbb {E}}_P[\varvec{\xi }] - \varvec{\mu }({\varvec{x}}))^T\varvec{Q}({\varvec{x}})^{-1}({\mathbb {E}}_P[\varvec{\xi }] - \varvec{\mu }({\varvec{x}})) < \alpha ({\varvec{x}})$ and ${\mathbb {E}}_P[(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))(\xi -\varvec{\mu }({\varvec{x}}))^T]\prec \beta ({\varvec{x}}) \varvec{Q}({\varvec{x}})$. Then problem ($\hbox {D}^3\hbox {RO}$) with the ambiguity set ${\mathcal {P}}^{DY}({\varvec{x}})$ can be reformulated as

$$\begin{aligned} \begin{aligned} \underset{{\varvec{x}},s,\varvec{u},\varvec{z},\varvec{Y}}{\text {min}}&\quad f(\varvec{x}) + s + [\sqrt{\alpha ({\varvec{x}})}, -(\varvec{Q}({\varvec{x}})^{-1/2}\varvec{\mu }(\varvec{x}))^T]\varvec{z} + \beta ({\varvec{x}})\varvec{Q}({\varvec{x}})\bullet \varvec{Y} \\ \text {s.t.}&\quad s - \varvec{\xi }^T\varvec{Q}({\varvec{x}})^{-1/2}\varvec{z}_1 + (\varvec{\xi }-\varvec{\mu }({\varvec{x}}))(\varvec{\xi }-\varvec{\mu }({\varvec{x}}))^T\bullet \varvec{Y} \ge h({\varvec{x}},\varvec{\xi }) \quad \forall \; \varvec{\xi }\in {\varXi }, \\&\quad \varvec{x}\in X,\; \varvec{z}:=[z_0,\varvec{z}_1]\in {\mathcal {N}}_{SOC },\; \varvec{Y}\in {\mathbb {R}}^{d\times d}, \; \varvec{Y} \succeq 0. \end{aligned} \end{aligned}$$

Proof

For the notational convenience, we omit the ${\varvec{x}}$ argument in all functions and parameters in the proof. Let $\mathcal {M}^+({\varXi },\mathcal {F})$ be the cone of positive measures defined on the measurable space $({\varXi },\mathcal {F})$. The inner problem of ($\hbox {D}^3\hbox {RO}$) under the uncertainty set ${\mathcal {P}}^{DY}$ is written as:

$$\begin{aligned} \begin{aligned}&\text {max}\;\;{\mathbb {E}}_P[h(\xi )] \\&\text {s.t.}\,({\mathbb {E}}_P[\xi ]-\mu )^{\top }Q^{-1}({\mathbb {E}}_P[\xi ]-\mu )\le \alpha , \\&\qquad {\mathbb {E}}_P[(\xi -\mu )(\xi -\mu )^{\top }]\preceq \beta Q, \\&\qquad {\mathbb {E}}_P[1]=1,\; P\in \mathcal {M}^+({\varXi },\mathcal {F}). \end{aligned} \end{aligned}$$

(16)

Define the linear mapping $\mathcal {A}_1:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}$ as $\mathcal {A}_1P:=\int _{{\varXi }}1dP(\xi )$, define the linear mapping $\mathcal {A}_2:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}^{m+1}$ as $\mathcal {A}_2P:=[0,Q^{-1/2}\int _{{\varXi }}\xi dP(\xi )]$, and define the linear mapping $\mathcal {A}_3:\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}^{m\times m}$ as $\mathcal {A}_3P:=-\int _{{\varXi }}(\xi -\mu )(\xi -\mu )^{\top }dP(\xi )$. Define the linear functional $\langle h,\cdot \rangle :\;\mathcal {M}^+({\varXi },\mathcal {F})\longmapsto {\mathbb {R}}$ as $\langle h,P\rangle :=\int _{{\varXi }}h(\xi )dP(\xi )$. Let $b = [\sqrt{\alpha },\;0]$ be a constant vector in ${\mathbb {R}}^{m+1}$. Then (16) can be reformulated as the following conic linear program:

$$\begin{aligned} \begin{aligned}&\text {max}\;\;\langle h, P\rangle \\&\text {s.t.}\, \mathcal {A}_1P-1=0, \\&\qquad \mathcal {A}_2P+b \in \mathcal {K}_{SOC}, \\&\qquad \mathcal {A}_3P+\beta Q\in \mathcal {K}_{SD}, \\&\qquad P\in \mathcal {M}^+({\varXi },\mathcal {F}), \end{aligned} \end{aligned}$$

(17)

where $\mathcal {K}_{SOC}$ and $\mathcal {K}_{SD}$ are the second-order cone and the positive semi-definite cone, respectively. Applying the duality theory [47] of the conic linear program in the functional space, we can obtain the dual of (16) as given in the statement of the theorem. The slater-type condition is equivalent to that there exists a probability measure $P^{\prime }$ such that $\mathcal {A}_2P^{\prime }+b\in \text {int}(\mathcal {K}_{SOC})$, and $\mathcal {A}_3P^{\prime }+\beta Q\in \text {int}(\mathcal {K}_{SD})$. Based on (23) in [47], the statement after (23), and Proposition 2.9 of [47], the strong duality holds. $\square $

3.3 Ambiguity sets defined by Wasserstein metric

Instead of using moment based definitions of the ambiguity set, we may define this set using a statistical distance, such as the Wasserstein metric. We now study the ($\hbox {D}^3\hbox {RO}$) problem with a decision dependent ambiguity set defined using the $L_1$-Wasserstein metric as follows:

(18)

where $P_0$ is a nominal probability distribution, and ${\mathcal {W}}(\cdot ,\cdot ):\;{\mathcal {P}}({\varXi },{\mathcal {F}})\times {\mathcal {P}}({\varXi },{\mathcal {F}})\rightarrow {\mathbb {R}}$ is the $L_1$-Wasserstein metric defined in [48]:

$$\begin{aligned} {\mathcal {W}}(P_1, P_2):= \underset{K\in {\mathcal {S}}(P_1,P_2)}{\text {inf}} \int _{{\varXi }\times {\varXi }} \Vert \varvec{s}_1 - \varvec{s}_2 \Vert K(d\varvec{s}_1\times d \varvec{s}_2), \end{aligned}$$

(19)

where ${\mathcal {S}}(P_1,P_2):=\big \{ K\in {\mathcal {P}}({\varXi }\times {\varXi },{\mathcal {F}}\times {\mathcal {F}}):\; K(A\times {\varXi }) = P_1(A),\; K({\varXi }\times A) = P_2(A),\; \forall A\in {\mathcal {F}} \big \}$ is the set of all joint probability distributions whose marginals are $P_1$ and $P_2$, and $\Vert \cdot \Vert $ is an arbitrary norm defined on ${\mathbb {R}}^d$. The ambiguity set (18) is a generalization of the one considered in [7,8,9,10] for the decision dependent case. As a special case of (18), under Assumption 1, ${\mathcal {P}}^W({\varvec{x}})$ is written as:

$$\begin{aligned} {\mathcal {P}}^W({\varvec{x}})=\left\{ \varvec{p}\in {\mathbb {R}}^N\;|\; {\mathcal {W}}(\varvec{p},\hat{\varvec{p}})\le r({\varvec{x}}),\; \sum ^N_{i=1}p_i=1,\; p_i\ge 0,\; \forall i\in [N] \right\} , \end{aligned}$$

(20)

where $\hat{\varvec{p}}$ is a given empirical probability distribution on ${\varXi }$, and the Wasserstein metric can be simplified as

$$\begin{aligned}&{\mathcal {W}}(\varvec{p},\hat{\varvec{p}})=\left\{ \underset{\varvec{w}}{\text {min}}\; \sum ^N_{i=1}\sum ^N_{j=1}\Vert \varvec{\xi }^i-\varvec{\xi }^j\Vert w_{ij}\;\; \big \vert \; \sum ^N_{j=1}w_{ij}=p_i \; \forall i\in [N], \right. \\&\left. \quad \sum ^N_{i=1}w_{ij}={\hat{p}}_j\; \forall j\in [N],\; w_{ij}\ge 0\;\forall i,j\in [N] \right\} , \end{aligned}$$

where ${\hat{p}}_j$ is the probability of scenario $\varvec{\xi }^j$. The following theorem gives a reformulation of ($\hbox {D}^3\hbox {RO}$) for the ambiguity set (20).

Theorem 3.6

Let Assumption 1 hold. If for any ${\varvec{x}}\in X$, $h({\varvec{x}},\varvec{\xi }^i)$ is finite for any $i\in [N]$ and the ambiguity set (20) is nonempty, then ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set (20) can be reformulated as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma ,\eta }{min }\;\; f(\varvec{x})-\sum ^N_{i=1}{\hat{p}}_i\beta _i - r(\varvec{x})\gamma + \eta \\&\quad s.t. \, -\alpha _i-\mu _i + \eta \ge h(\varvec{x},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \quad \varvec{x}\in X, \;\; \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0,\;\eta \in {\mathbb {R}}\quad \\&\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$

(21)

Proof

Since ${\varXi }$ is finite, the ($\hbox {D}^3\hbox {RO}$)-inner problem with ambiguity set ${\mathcal {P}}^W(\varvec{x})$ can be formulated as the following linear program:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{p},\varvec{w}}{\text {max}}\;\;\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i \\&\;\text {s.t.}\, \sum ^N_{j=1}w_{ij}=p_i \qquad \forall i\in [N], \\&\qquad \sum ^N_{i=1}w_{ij}={\hat{p}}_j \qquad \forall j\in [N], \\&\qquad \sum ^N_{i=1}\sum ^N_{j=1} \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert w_{ij} \le r(\varvec{x}), \\&\qquad \sum ^N_{i=1}p_i=1,\;\; p_i\ge 0,\;w_{ij}\ge 0\;\; \forall i\in [N],\;\forall j\in [N], \end{aligned} \end{aligned}$$

(22)

where $\varvec{w}$ is a joint probability distribution with two marginal distributions given by $\varvec{p}$ and $\hat{\varvec{p}}$, respectively. The dual of the above linear program is:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma ,\eta }{\text {min}}\;\; -\sum ^N_{i=1}\hat{\varvec{p}}_i\beta _i - r(\varvec{x})\gamma + \eta \\&\quad \text {s.t.}\, -\alpha _i-\mu _i + \eta \ge h(\varvec{x},\varvec{\xi }^i) \qquad \forall i\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i - \varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \quad \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0,\;\eta \in {\mathbb {R}}\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$

(23)

After substituting (23) into ($\hbox {D}^3\hbox {RO}$), we obtain the desired reformulation (21). $\square $

A reformulation for the two-stage case is given in the following corollary.

Corollary 3.3

If $h(\cdot ,\cdot )$ is a recourse function defined in (1), and the ambiguity set ${\mathcal {P}}^W({\varvec{x}})$ is non-empty for any ${\varvec{x}}\in X$, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{W}({\varvec{x}})$ can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{y},\varvec{\alpha },\varvec{\beta },\varvec{\mu },\varvec{\lambda },\gamma }{min }\;\; f(\varvec{x})-\sum ^N_{k=1}\hat{\varvec{p}}_k\beta _k - r(\varvec{x})\gamma \\&\quad s.t. \, -\alpha _k-\mu _k\ge g(\varvec{x},\varvec{y}^k,\varvec{\xi }^k) \qquad \forall k\in [N], \\&\qquad \quad -\alpha _i+\beta _j + \Vert \varvec{\xi }^i-\varvec{\xi }^j\Vert \gamma + \lambda _{ij} \le 0 \qquad \forall i\in [N],\;\forall j\in [N], \\&\qquad \qquad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad \varvec{x}\in X,\;\; \alpha _i\in {\mathbb {R}},\;\beta _i\in {\mathbb {R}},\;\mu _i\ge 0,\;\lambda _{ij}\ge 0,\;\gamma \le 0\quad \forall i\in [N],\;\forall j\in [N]. \end{aligned} \end{aligned}$$

(24)

A generalization of (21) of ($\hbox {D}^3\hbox {RO}$) for the continuous support case is given in Appendix A.

3.4 Ambiguity sets defined using $\phi $-divergence

We now study the ($\hbox {D}^3\hbox {RO}$) problem using a decision dependent ambiguity set defined using the notion of $\phi $-divergence:

$$\begin{aligned} {\mathcal {P}}^{\phi }(\varvec{x}):= \big \{ P\in {\mathcal {P}}({\varXi },{\mathcal {F}}):\; {\mathcal {D}}_{\phi }(P||P_0)\le \eta (\varvec{x}) \big \}, \end{aligned}$$

(25)

where ${\mathcal {D}}_{\phi }(P||P_0)=\int _{{\varOmega }}\phi \left( \frac{dP}{dP_0}\right) dP_0$, and $\phi $ is a non-negative and convex function. This type of ambiguity set is a generalization of the one considered in [11, 12, 14,15,16, 49] for the decision dependent case. Under Assumption 1, and letting ${\hat{p}}_i$ be the probability of scenario $\varvec{\xi }_i$, the ambiguity set (25) is written as:

$$\begin{aligned} {\mathcal {P}}^{\phi }(\varvec{x})= \left\{ P=\sum ^N_{i=1}p_i\delta _{\varvec{\xi }^i}:\; \sum ^N_{i=1}{\hat{p}}_i\phi (p_i/{\hat{p}}_i) \le \eta (\varvec{x}), \; \sum ^N_{i=1}p_i=1, \; p_i\ge 0 \;\;\forall i\le N \right\} .\nonumber \\ \end{aligned}$$

(26)

Two reformulations of ($\hbox {D}^3\hbox {RO}$) with ambiguity set ${\mathcal {P}}^{\phi }(\varvec{x})$ are given in the following theorem.

Theorem 3.7

Let Assumption 1 hold, and $\phi $ be a non-negative convex function. Assume that the following Slater’s condition is satisfied for every $\varvec{x}\in X$: there exist a $\varvec{p}\in {\mathbb {R}}^N$ such that $p_i>0$, $\sum ^N_{i=1}p_i=1$ and $\sum ^N_{i=1}{\hat{p}}_i\phi (p_i/{\hat{p}}_i)<\eta (\varvec{x})$. Then ($\hbox {D}^3\hbox {RO}$) with the ambiguity set ${\mathcal {P}}^{\phi }(\varvec{x})$ can be reformulated as the following semi-infinite program:

$$\begin{aligned} \begin{aligned}&\underset{{\varvec{x}},\varvec{p},\alpha ,\beta ,\varvec{\lambda },z}{min } \;\; z \\&\;\;s.t. \, z\ge \sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) \\&\qquad \qquad + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i \qquad \forall \varvec{p}\in S,\\&\qquad \quad \varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _i\ge 0,\; p_i\ge 0\;\;\forall i\in [N], \end{aligned} \end{aligned}$$

(27)

where $S=\{\varvec{p}\in {\mathbb {R}}^N:\; \sum ^N_{i=1}p_i=1,\;p_i\ge 0\;\;\forall i\in [N]\}$. Alternatively, (27) also has the reformulation:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{p},\alpha ,\beta ,\varvec{\lambda }}{min }\;\; f(\varvec{x}) + \sum ^N_{i=1}p_ih\big (\varvec{x},\varvec{\xi }^i\big ) + \alpha \Big (\frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big )\\&\qquad + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}\lambda _ip_i \\&\;\;s.t. \,\ \alpha \phi ^{\prime }(Np_i) +\beta + h(\varvec{x},\varvec{\xi }^i) + \lambda _i = 0 \qquad \forall i\in [N], \\&\qquad \quad \varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _i\ge 0,\; p_i\ge 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$

(28)

Proof

The ($\hbox {D}^3\hbox {RO}$) problem can be written as $\underset{\varvec{x}\in X}{\text {min}}\;\; f(\varvec{x}) + {\varPhi }(\varvec{x})$, where the function ${\varPhi }(\varvec{x})$ is the optimal objective of the following optimization problem:

$$\begin{aligned} \begin{aligned} {\varPhi }(\varvec{x})=\;&\underset{\varvec{p}}{\text {max}}\;\sum ^N_{i=1}p_ih\big (\varvec{x},\varvec{\xi }^i\big ) \\&\text {s.t.}\, \frac{1}{N}\sum ^N_{i=1}\phi (Np_i) \le \eta (\varvec{x}), \quad \sum ^N_{i=1}p_i=1, \\&\qquad p_i\ge 0 \;\;\forall i\le N, \;\; \varvec{x}\in X. \end{aligned} \end{aligned}$$

(29)

Since $\phi $ is convex, (29) is a convex program with respect to the decision variable $\varvec{p}$. For a fixed $\varvec{x}\in X$, the Lagrangian dual of (29) is written as follows:

$$\begin{aligned} \begin{aligned}&\underset{\alpha ,\beta ,\varvec{\lambda }}{\text {min}}\; \underset{\varvec{p}}{\text {max}}\;{\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda }) \\&\text {s.t.}\, \alpha \le 0,\; \beta \in {\mathbb {R}}, \; \lambda _i\ge 0,\quad \forall i\in [N], \end{aligned} \end{aligned}$$

(30)

where ${\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda })=\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i$, and $\alpha $, $\beta $, $\varvec{\lambda }$ are the Lagrange multipliers. Since Slater’s condition is satisfied for any $\varvec{x}\in X$, strong duality holds. The inner maximization problem of (30) is equivalent to $\{\text {max}\;z,\;\; \text {s.t. }z\ge {\mathcal {L}}(\varvec{p};\alpha ,\beta ,\varvec{\lambda })\;\;\forall \varvec{p}\in S\}$, which gives the reformulation (27).

Note that the inner problem of (30) is an unconstrained convex optimization problem. Using the KKT optimality conditions we have:

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial p_i}=\alpha \phi ^{\prime }(Np_i)+h(\varvec{x},\xi ^i)+\beta +\lambda _i=0,\qquad \forall i\in [N]. \end{aligned}$$

(31)

After substituting the expression of the Lagrangian in (30), adding the optimality condition and using strong duality, we obtain the reformulation given in (28). $\square $

We remark that in the cutting-surface algorithm (Algorithm 1) given in Sect. 4, the separation problem is specified over the set S, and it maximize $\sum ^N_{i=1}h(\varvec{x},\varvec{\xi }^i)p_i + \alpha \Big ( \frac{1}{N}\sum ^N_{i=1}\phi (Np_i)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{i=1}p_i-1\Big ) + \sum ^N_{i=1}p_i\lambda _i$ over $\varvec{p}\in S$, where ${\varvec{x}},\alpha ,\beta ,\varvec{\lambda }$ are the current solution obtained from solving the master problem at an iteration. Since $\alpha \le 0$, and $\phi $ is a convex function, the separation problem is maximizing a concave function, which is a convex optimization problem.

A reformulation for the two-stage stochastic optimization case is given in the following corollary.

Corollary 3.4

If $h(\cdot ,\cdot )$ is a recourse function defined in (1) and the ambiguity set ${\mathcal {P}}^{\phi }({\varvec{x}})$ is non-empty for any ${\varvec{x}}\in X$, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set ${\mathcal {P}}^{\phi }({\varvec{x}})$ can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\begin{array}{c} \varvec{x},\varvec{y},z,\varvec{p},\\ \alpha ,\beta ,\varvec{\lambda } \end{array}}{min }\;\; z \\&\;\;s.t. \,\ z\ge f(\varvec{x}) + \sum ^N_{k=1}p_kg\big (\varvec{x},\varvec{y}^k,\xi ^k\big )\\&\qquad \qquad + \alpha \Big (\frac{1}{N}\sum ^N_{k=1}\phi (Np_k)-\eta (\varvec{x}) \Big ) + \beta \Big (\sum ^N_{k=1}p_k-1\Big ) + \sum ^N_{k=1}\lambda _kp_k \; \forall \varvec{p}\in S, \\&\qquad \quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad z\in {\mathbb {R}},\;\varvec{x}\in X,\;\alpha \le 0,\; \beta \in {\mathbb {R}},\; \lambda _k\ge 0,\; p_k\ge 0\;\;\forall k\in [N], \end{aligned} \end{aligned}$$

(32)

where $S=\{\varvec{p}\in {\mathbb {R}}^N:\; \sum ^N_{k=1}p_k=1,\;p_k\ge 0\;\;\forall k\in [N]\}$.

3.5 Ambiguity sets defined based on the Kolmogorov–Smirnov test

The K–S distance was used by [17] in defining an ambiguity set for a data-driven distributionally robust optimization model. For two univariate probability distributions $P_1$ and $P_2$, let $F_1$ and $F_2$ be their cumulative distribution functions. The Kolmogorov–Smirnov (KS) distance is defined as:

$$\begin{aligned} D(P_1,P_2)=\underset{s}{\text {sup}}\; |F_1(s)-F_2(s)|. \end{aligned}$$

(33)

We now study the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set defined based on the KS-distance. Note that although (33) is defined for a univariate random variable, this definition can be directly generalized for the probability distribution of a random vector with a finite support. Specifically, under Assumption 1, let $P_0=\sum ^N_{i=1}{\hat{p}}_i\delta _{\varvec{\xi }^i}$ be an empirical probability distribution, where $\delta _{\varvec{\xi }^i}$ is the indicator function. The KS-distance between a discrete probability distribution $P=\sum ^N_{i=1}p_i\delta _{\varvec{\xi }^i}$ and $P_0$ can be written as:

$$\begin{aligned} D(P,P_0)=\underset{k\in [N]}{\text {sup}}\; \Big | \sum ^k_{i=1}p_i - \sum ^k_{i=1}{\hat{p}}_i \Big |. \end{aligned}$$

(34)

The decision dependent ambiguity set of probability distributions is constructed using the KS-distance as follows:

$$\begin{aligned} {\mathcal {P}}^{KS}(\varvec{x})=\left\{ \varvec{p}\in {\mathbb {R}}^{N}:\;\; \underset{k\in [N]}{\text {sup}}\;\Big |\sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i \Big |\le \eta (\varvec{x}), \;\; \sum ^N_{i=1}p_i=1,\;\; p_i\ge 0\;\;\forall i\in [N] \right\} .\nonumber \\ \end{aligned}$$

(35)

A reformulation of the ($\hbox {D}^3\hbox {RO}$) problem is given in the following theorem.

Theorem 3.8

Let Assumption 1 hold. If for any ${\varvec{x}}\in X$, $h({\varvec{x}},\varvec{\xi }^k)$ is finite for any $k\in [N]$ and the ambiguity set (35) is nonempty, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set (35) can be reformulated as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\lambda ,\varvec{\alpha },\varvec{\beta },\varvec{\gamma }}{min }\; f(\varvec{x})+\lambda + \sum ^N_{i=1}\sum ^N_{k=1}(\alpha _k+\beta _k){\hat{p}}_i +\sum ^N_{k=1}(\alpha _k-\beta _k)\eta (\varvec{x}) \\&\quad s.t. \, \lambda + \sum ^N_{k=i}(\alpha _k+\beta _k) + \gamma _i \ge h(\varvec{x},\xi ^i) \qquad \forall i\in [N], \\&\qquad \quad \lambda \in {\mathbb {R}},\; \alpha _i\ge 0,\; \beta _i\le 0,\; \gamma _i\le 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$

(36)

Proof

The ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set (35) can be written as:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x}}{\text {min}}\;f(\varvec{x}) + \underset{\varvec{p}}{\text {max}}\;\sum ^N_{i=1}h(\varvec{x},\xi ^i)p_i \\&\text {s.t.}\, \underset{k\in [N]}{\text {sup}}\;\Big |\sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i \Big |\le \eta (\varvec{x}), \\&\qquad \sum ^N_{i=1}p_i=1, \quad p_i\ge 0 \quad \forall i\in [N]. \end{aligned} \end{aligned}$$

(37)

Note that the inner problem of (37) can be reformulated as the following linear program:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{p}}{\text {max}}\; \sum ^N_{i=1}h(\varvec{x},\xi ^i)p_i \\&\text {s.t.}\, \sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i\le \eta (\varvec{x}) \qquad \forall k\in [N], \\&\qquad \sum ^k_{i=1}p_i-\sum ^k_{i=1}{\hat{p}}_i\ge -\eta (\varvec{x}) \qquad \forall k\in [N], \\&\qquad \sum ^N_{i=1}p_i=1,\quad p_i\ge 0 \quad \forall i\in [N]. \end{aligned} \end{aligned}$$

(38)

After taking the dual of the above linear program and combining it with the outer problem, we obtain (36). $\square $

A reformulation for the two-stage stochastic optimization case is given in the following corollary.

Corollary 3.5

If $h(\cdot ,\cdot )$ is a recourse function defined in (1) and the ambiguity set is given by (35). If for any ${\varvec{x}}\in X$, $h({\varvec{x}},\varvec{\xi }^k)$ is finite for any $k\in [N]$ and the ambiguity set (35) is nonempty, then the ($\hbox {D}^3\hbox {RO}$) problem with the ambiguity set (35) can be reformulated as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{y},\lambda ,\varvec{\alpha },\varvec{\beta },\varvec{\gamma }}{min }\; f(\varvec{x})+\lambda + \sum ^N_{i=1}\sum ^N_{k=1}(\alpha _k+\beta _k){\hat{p}}_i +\sum ^N_{k=1}(\alpha _k-\beta _k)\eta (\varvec{x}) \\&\quad s.t. \, \lambda + \sum ^N_{k=i}(\alpha _k+\beta _k) + \gamma _i \ge g(\varvec{x},\varvec{y}^i,\xi ^i) \qquad \forall i\in [N], \\&\qquad \quad \psi _i(\varvec{x},\varvec{y}^k,\varvec{\xi }^k)\ge 0 \quad \forall i\in [m],\;\forall k\in [N], \\&\qquad \quad {\varvec{x}}\in X,\; \lambda \in {\mathbb {R}},\; \alpha _i\ge 0,\; \beta _i\le 0,\; \gamma _i\le 0\;\;\forall i\in [N]. \end{aligned} \end{aligned}$$

(39)

The reformulation (36) of ($\hbox {D}^3\hbox {RO}$) with the ambiguity set defined using the K–S distance can be generalized for the case where the support ${\varXi }$ is continuous. The details of this generalization are given in Appendix B.

4 Solution approaches

4.1 Models with continuous support

For models with continuous and compact support ${\varXi }$, reformulations given in Sect. 3.1 (without the measure bound constraint) Sect. 3.2, Appendix A, B, and that in Sect. 3.4 are semi-infinite programs (gen-SIP):

$$\begin{aligned} \begin{array}{cll} \underset{x}{\text {min}} &{} \displaystyle f(x) &{} \\ \text {s.t.} &{}\displaystyle g(x,t)\le 0, &{} \forall t\in T, \\ &{} x\in X, \end{array} \end{aligned}$$

(gen-SIP)

by appropriately specifying g(x, t). Here $X\subseteq {\mathbb {R}}^{k_1}$ and $T\subseteq {\mathbb {R}}^{k_2}\times {\mathbb {Z}}^{k_3}$. A cutting-surface algorithm from [10] (Algorithm 1) is now presented to solve such problems. We assume to have an oracle that can solve a deterministic non-convex optimization problem with finitely many constraints. The basic cutting-surface algorithm solves a finite constraint relaxation problem at each iteration:

$$\begin{aligned} \underset{x\in X}{\text {min}}\;\{ f(x):\; \text {s.t.}\, g(x,t)\le 0, \; t\in T^{\prime } \}, \end{aligned}$$

(40)

where $T^{\prime }$ are finite samples from T. This relaxation is called the master problem. The algorithm generates desired samples as the algorithm progresses. These samples are identified from the support by solving a separation problem. See for example, the algorithm given in [10]. Specifically, a new constraint is added at each iteration of the master problem. This constrained is identified by finding a $t\in T$ for which the incumbent solution from the master problem is violated. Given the incumbent solution ${\hat{x}}\in X$, the following separation problem

$$\begin{aligned} \underset{t\in T}{\text {max}}\; g({\hat{x}}, t), \end{aligned}$$

(41)

is solved for the purposes of generating the next constraint. The algorithm outputs an $\varepsilon $-optimal solution (Definition 4.1) to (gen-SIP) in a finite number of iterations (Theorem 4.1).

Definition 4.1

For a general semi-infinite program in the form of (gen-SIP), a point $x_0\in X$ is an $\varepsilon $-feasible solution of (gen-SIP) if $\underset{t\in T}{\text {max}}\; g(x_0,t)\le \varepsilon $. A point $x_0\in X$ is an $\varepsilon $-optimal solution of (gen-SIP) if $x_0$ is an $\varepsilon $-feasible solution of (gen-SIP) and $f(x_0)\le \text {Val}$(gen-SIP).

Theorem 4.1

(Theorem 3.2 in [10]) If $X\times T$ is compact, and g(x, t) is continuous on $X\times T$, then Algorithm 1 terminates in finitely many iterations and returns an $\varepsilon $-optimal solution of (gen-SIP).

We note that the oracle problem (41) in the cutting-surface algorithm is simply a function evaluation problem for the finite support case. In this case the algorithm can be adapted by sequentially adding constraints as violated inequalities are identified. This may be useful when N is large.

4.2 Models with binary decision variables and finite support

The reformulations developed in Sect. 3 have nonlinear terms that are products of dual variables and primal variables. These reformulations are in general non-convex programming problems. A solution approach is needed to handle this nonlinearity. However, when applying the reformulations to problems where first-stage decision variables $\varvec{x}$ are binary and the dual variables/Lagrange multipliers are bounded, the nonlinear terms may become a product of a binary variable and a bounded continuous variable. For example, this is the case when in Sect. 3.1 the expressions $\varvec{l}({\varvec{x}})$, $\varvec{u}({\varvec{x}})$, ${\underline{p}}_k({\varvec{x}})$, ${\underline{p}}_k({\varvec{x}})$ are linear functions of ${\varvec{x}}$.

Such terms can be linearized as follows. Consider a bilinear term xy where $x\in \{0,1\}$ is a binary variable and y is a bounded continuous variable with the bounds $a\le y\le b$. We introduce a continuous variable z and the following constraints to represent xy:

$$\begin{aligned} ax\le z\le bx,\qquad y-b(1-x) \le z\le y-a(1-x). \end{aligned}$$

(42)

When $x=1$, the second constraint of (42) is equivalent to $z=y$, and the first constraint $a\le z\le b$ becomes redundant. When $x=0$, the first constraint of (42) is equivalent to $z=0$, and the second constraint $y-b\le z\le y-a$ becomes redundant. With the above linearization technique, problems where the nonlinear terms in the dual formulations in Sects. 3.1–3.4 involve a bilinear product of continuous and binary variables, and $h({\varvec{x}}\varvec{\xi })$ is convex in ${\varvec{x}}$ can be reformulated as a mixed 0-1 convex program. As an example, in [50], we investigate a distributionally-robust service center location problem with decision dependent utilities. We show that this problem can be reformulated as a two-stage stochastic mixed 0-1 second-order cone program using such as technique.

5 Illustrative numerical example

We now use a numerical example to illustrate the relevance of incorporating endogenous ambiguity in a distributionally-robust optimization model. Specifically, we consider a distributionally-robust newsvendor model:

$$\begin{aligned} \underset{x\in {\mathcal {X}},q\in {\mathcal {Q}}}{\text {max}}\;\underset{P\in {\mathcal {P}}(x)}{\text {min}}\;{\mathbb {E}}_P[x\cdot \text {min}(q,D)]-cq, \end{aligned}$$

(DNV)

where c is the unit cost, D is the stochastic demand, x is the selling price, and q is the number of units ordered/purchased by the vendor. The variables x and q are decision variables. The sets ${\mathcal {X}}$ and ${\mathcal {Q}}$ are the feasible sets of selling price and order quantity. The probability distribution of the demand D depends on the selling price. The set ${\mathcal {P}}(x)$ is a decision dependent ambiguity set of the probability distribution of D. The objective is to maximize the risk-averse profit. Product demand is a decreasing function of the selling price. For every candidate selling price $x_i$, there is a nominal deterministic demand value ${\widehat{D}}_i$ corresponding to $x_i$. In the numerical illustration, we consider a finite support ${\varXi }$ for D which is given by ${\varXi }=\{d_i\}^8_{i=1}$ where $d_1=5,\; d_2=10,\; d_3=15,\; d_4=20,\; d_5=25,\; d_6=30,\; d_7=35,\; d_8=40$. The set ${\mathcal {X}}$ of candidate selling price is given by ${\mathcal {X}}=\{x_1,x_2,x_3,x_4\}$ where $x_1=1.2,\; x_2=1.4,\; x_3=1.6,\; x_4=1.8$. The set ${\mathcal {Q}}$ of order quantities is given by ${\mathcal {Q}}=\{q_i\}^8_{i=1}$ where $q_i=d_i$ for all $i\in \{1,\ldots ,8\}$. The cost is $c=1.0$. The nominal demand values are given by ${\widehat{D}}_1=d_8,\;{\widehat{D}}_2=d_6,\;{\widehat{D}}_3=d_4,\;{\widehat{D}}_4=d_2$. Let ${\mathcal {P}}_i={\mathcal {P}}(x_i)$ for $i\in \{1,\ldots ,4\}$. The nominal optimal solution (when no ambiguity exits) is $x^*=x_2=1.4$, $q^*=q_6=30$, and the nominal optimal value is $V^*=11.30$. We consider different ways to define ${\mathcal {P}}_i$ and study the optimal solution of (DNV) under these definitions. In the current illustration, we enumerate all the possible combinations of $(x,q)\in {\mathcal {X}}\times {\mathcal {Q}}$ and solve (DNV) to determine the optimal solution. A future work addresses this problem through a systematic algorithm development [51].

5.1 Definition of ${\mathcal {P}}_i$ using mean and variance

Consider the case of defining ${\mathcal {P}}_i$ using mean and variance as described in Sect. 3.2. Specifically, the ${\mathcal {P}}_i$ is defined as

(43)

where $K=8$. Similarly, we can enumerate all the possible combinations of $(x,q)\in {\mathcal {X}}\times {\mathcal {Q}}$ and solve (DNV) in every case to determine the optimal solution. In the case that $(x,q)=(x_i,q_j)$ the objective value of (DNV) is given by

$$\begin{aligned} \begin{aligned} V(x_i,q_j)=\;&\text {min}\; \sum ^K_{k=1}p_k x_i\cdot \text {min}(q_j,d_k) - cq_j \\&\text {s.t.}\, \sum ^K_{k=1} p_k = 1, \\&\qquad {\widehat{D}}_i-\delta _i \le \sum ^K_{k=1}p_kd_k\le {\widehat{D}}_i+\delta _i, \\&\qquad \sum ^K_{k=1}p_k(d_k-{\widehat{D}}_i)^2\le \gamma _i, \\&\qquad p_k\ge 0\qquad \forall k\in [K]. \end{aligned} \end{aligned}$$

(44)

The optimization problem in (44) is a linear program of the decision variables $\varvec{p}$. In the first setting of the ambiguity, we set $\delta _i=2$ and $\gamma _i=5$ for $i\in \{1,2,3,4\}$. The optimal solution is $x^*=x_2=1.4$, $q^*=q_6=30$, and the optimal value is $V^*=10.60$. In the second setting of the ambiguity, we increase the ambiguity by setting $\delta _i=3$ and $\gamma _i=10$ for $i\in \{1,2,3,4\}$. The optimal solution becomes $x^*=x_2=1.4$, $q^*=q_5=25$, and the optimal value is $V^*=9.30$. Note that although the optimal selling price has not changed, the optimal order quantity changes with decision dependent ambiguity.

5.2 Definition of ${\mathcal {P}}_i$ using the Wasserstein metric

The ${\mathcal {P}}_i$ can be defined using the Wasserstein metic (20). Specifically, let ${\mathcal {P}}_i$ be defined as

$$\begin{aligned} {\mathcal {P}}_i=\{\varvec{p}\in {\mathbb {R}}^K\;|\; {\mathcal {W}}(\varvec{p},\hat{\varvec{p}}^i)\le r_i,\; \sum ^K_{k=1}p_k=1,\; p_k\ge 0,\; \forall k\in [K] \} \qquad \forall i\in [I],\qquad \end{aligned}$$

(45)

where $I=4$, $K=8$, and $\hat{\varvec{p}}^i$ is a nominal probability distribution on ${\varXi }$ satisfying

$$\begin{aligned} {\hat{p}}^i_k = \left\{ \begin{array}{ll} 1 &{} \text {if}\,d_k={\widehat{D}}_i, \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$

(46)

In the case that $(x,q)=(x_i,q_j)$ the objective value of (DNV) is given by

$$\begin{aligned} \begin{aligned} V(x_i,q_j)=\;&\text {min}\; \sum ^K_{k=1}p_k x_i\cdot \text {min}(q_j,d_k) - cq_j \\&\text {s.t.}\, \sum ^K_{k=1} p_k = 1, \\&\qquad \sum ^K_{k=1} t_{kl} = {\hat{p}}^i_l \qquad \forall l\in [K], \\&\qquad \sum ^K_{l=1} t_{kl} = p_k \qquad \forall k\in [K], \\&\qquad \sum ^{K}_{k=1}\sum ^K_{l=1}|d_k-d_l|t_{kl}\le r_i, \\&\qquad p_k\ge 0,\; t_{kl}\ge 0\qquad \forall k,l\in [K], \end{aligned} \end{aligned}$$

(47)

Notice that (47) is a linear program of the decision variables $p_k$ and $t_{kl}$ ($k,l\in [K]$). In the first setting of the ambiguity, we set $r_i=0.5$ for $i\in \{1,2,3,4\}$. The optimal solution is $x^*=x_2=1.4$, $q^*=q_6=30$, and the optimal value is $V^*=11.30$. In the second setting of the ambiguity, we increase the ambiguity of the demand at the price $x_2$ by setting $r_2=0.6$ and $r_i=0.5$ for $i\in \{1,3,4\}$. The optimal solution becomes $x^*=x_3=1.6$, $q^*=q_4=20$, and the optimal value is $V^*=11.20$. Note that in this case both the optimal selling price and the order quantity changes with the ambiguity.

6 Concluding remarks

We have established a framework for reformulating the distributionally robust optimization problems with important types of decision dependent ambiguity sets. These ambiguity sets contain decision dependent parameters. For example, the moment robust ambiguity set (10) contains parameters $\alpha ({\varvec{x}})$, $\beta ({\varvec{x}})$, $\varvec{\mu }({\varvec{x}})$ and $\varvec{Q}({\varvec{x}})$, which are functions of the decision ${\varvec{x}}$. We now briefly discuss the estimation of these functions using a data-driven approach. Ambiguity sets for $\varvec{\xi }$ under an arbitrary decision ${\varvec{x}}$ can be constructed if such information is available from past decisions, or if it is possible for us to experiment with trial decisions $\{{\varvec{x}}^i\}^k_{i=1}$ and collect samples of the random vector $\varvec{\xi }$ under each decision ${\varvec{x}}^i$. From these samples we can establish the analytical relation between the parameters in defining the ambiguity set and the decision using statistical learning models. We can subsequently extrapolate this analytical relation to a general decision ${\varvec{x}}$ to obtain an empirical decision dependent ambiguity set description.

The goal of this chapter was to show that it is possible to extend the dual formulations in DRO even when the ambiguity sets are decision dependent. The analysis suggests that the situations for which DRO models admit a dual reformulation also allow for dual reformulations for the decision dependent case. The reformulated models are generally non-convex optimization problems requiring further investigation towards developing efficient algorithms for the specific situations. The non-convex optimization problems may have further structure when additional assumptions on decision dependent parameters and the feasible set X are imposed. This structure may be exploited for further refined reformulations and the development of efficient algorithms.

References

Hu, J., Li, J., Mehrotra, S.: A data driven functionally robust approach for coordinating pricing and order quantity decisions with unknown demand function. Oper. Res. (2015). http://www.optimization-online.org/DB_HTML/2015/07/5016.html(to appear)
Birge, J., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (1997)
MATH Google Scholar
Rahimian, H., Mehrotra, S.: Distributionally-robust optimization: a review (2019). Available on optimization online. http://www.optimization-online.org/DB_FILE/2019/08/7332.pdf
Bertsimas, D., Doan, X.V., Natarajan, K., Teo, C.: Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35(3), 580–602 (2010)
Article MathSciNet Google Scholar
Mehrotra, S., Zhang, H.: Models and algorithms for distributionally robust least squares problems. Math. Program. 146, 123–141 (2014)
Article MathSciNet Google Scholar
Delage, E., Ye, Y.: Distributional robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612 (2010)
Article MathSciNet Google Scholar
Gao, R., Kleywegt, A.J.: Distributionally robust stochastic optimization with Wasserstein distance (2016). https://arxiv.org/pdf/1604.02199.pdf
Esfahan, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171, 115–166 (2018)
Article MathSciNet Google Scholar
Shafieezadeh Abadeh, S., Mohajerin Esfahani, P.M., Kuhn, D.: Distributionally robust logistic regression. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 1576–1584. Curran Associates, Inc. (2015). http://papers.nips.cc/paper/5745-distributionally-robust-logistic-regression.pdf
Luo, F.Q., Mehrotra, S.: Decomposition algorithms for distributionally robust optimization using Wasserstein metric (2017). http://www.optimization-online.org/DB_HTML/2017/04/5946.html
Ben-Tal, A., Hertog, D., Waegenaere, A.D., Melenberg, B., Rennen, G.: Robust solutions of optimization problems affected by uncertain probabilities. Manag. Sci. 59, 341–357 (2013)
Article Google Scholar
Calafiore, G.C.: Ambiguous risk measures and optimal robust portfolios. SIAM J. Optim. 18(3), 853–877 (2007)
Article MathSciNet Google Scholar
Jiang, R., Guan, Y.: Data-driven chance constrained stochastic program. Math. Program. 158, 291–327 (2016)
Article MathSciNet Google Scholar
Love, D.K., Bayraksan, G.: Phi-divergence constrained ambiguous stochastic programs for data-driven optimization (2016). http://www.optimization-online.org/DB_HTML/2016/03/5350.html
Wang, Z., Glynn, P.W., Ye, Y.: Likelihood robust optimization for data-driven problems. Comput. Manag. Sci. 13, 241–261 (2016)
Article MathSciNet Google Scholar
Yanıkoğlu, İ., Hertog, D.: Safe approximations of ambiguous chance constraints using historical data. INFORMS J. Comput. 25(4), 666–681 (2013)
Article MathSciNet Google Scholar
Bertsimas, D., Gupta, V., Kallus, N.: Data-driven robust optimization. Math. Program. 167, 235–292 (2018)
Article MathSciNet Google Scholar
Floods, C.A., Pardalos, P.M.: Recent Advances in Global Optimization. Princeton University Press, Princeton (2014)
Book Google Scholar
Webster, M., Santen, N., Parpas, P.: An approximate dynamic programming framework for modeling global climate policy under decision-dependent uncertainty. Comput. Manag. Sci. 9, 339–362 (2012)
Article MathSciNet Google Scholar
Goel, V., Grossmann, I.E.: A class of stochastic programs with decision dependent uncertainty. Math. Program. 108, 355–394 (2006)
Article MathSciNet Google Scholar
Poss, M.: Robust combinatorial optimization with variable budgeted uncertainty. J. Oper. Res. 11, 75–92 (2013)
Article MathSciNet Google Scholar
Nohadani, O., Sharma, K.: Optimization under decision-dependent uncertainty. SIAM J. Optim. 28, 1773–1795 (2018)
Article MathSciNet Google Scholar
Kurz, M.: The Kesten–Stigum model and the treatment of uncertainty in equilibrium theory. In: Balch, M.S., McFadden, D.L., Wu, S.Y. (eds.) Essays on Economic Behavior Under Uncertainty. North Holland, Amsterdam (1974)
Google Scholar
Kurz, M.: Symposium: rational beliefs and endogenous uncertainty. Econ. Theory 8, 383–397 (1996)
Article Google Scholar
Kurz, M., Motolese, M.: Endogenous uncertainty and market volatility. Econ. Theory 17, 497–544 (2001)
Article MathSciNet Google Scholar
Tsur, Y., Zemel, A.: Endangered aquifers: groundwater management under threats of catastrophic events. Water Resour. Res. 40, W06S20 (2004)
Google Scholar
Shao, H., Lam, W.H.K., Tam, M.L.: A reliability-based stochastic traffic assignment model for network with multiple user classes under uncertainty in demand. Netw. Spat. Econ. 6, 173–204 (2006)
Article MathSciNet Google Scholar
Jonsbråten, T.W.: Optimization models for petroleum field exploitation. Ph.D. thesis, Norwegian School of Economics and Business Administration, Bergen, Norway (1998)
Tarhan, B., Grossmann, I.E., Goel, V.: Stochastic programming approach for planning of offshore oil or gas field infrastructure under decision-dependent uncertainty. Ind. Eng. Chem. Res. 48, 3078–3097 (2009)
Article Google Scholar
Goel, V., Grossmann, I.E.: A stochastic programming approach to planning of offshore gas filed developments under uncertainty in reserves. Comput. Chem. Eng. 108, 1409–1429 (2004)
Article Google Scholar
Ahmed, S.: Strategic planning under uncertainty: stochastic integer programming approaches. Ph.D. thesis, University of Illinois at Urbana-Champaign, Urbana, IL, USA (2000)
Viswanath, K., Peeta, S., Salman, F.S.: Investing in the links of a stochastic network to minimize expected shortest path length. Technical report, Purdue University (2004)
Jonsbråten, T.W., Wets, R.J.B., Woodruff, D.L.: A class of stochastic programs with decision dependent random elements. Ann. Oper. Res. 82, 83–106 (1998)
Article MathSciNet Google Scholar
Luce, R.D.: The choice axiom after twenty years. J. Math. Psychol. 15(3), 215–233 (1977)
Article MathSciNet Google Scholar
Hu, J., Mehrotra, S.: Robust decision making over a set of random targets or risk-averse utilities with an application to portfolio optimization. IIE Trans. 47, 358–372 (2015)
Article Google Scholar
Held, H., Woodruff, D.L.: Heuristics for multi-stage interdiction of stochastic networks. J. Heuristics 11, 483–500 (2005)
Article Google Scholar
Dupačová, J.: Optimization under exogenous and endogenous uncertainty. In: Mathematical Methods in Economics (2006)
Lee, S., de Mello, T.H., Kleywegt, A.J.: Newsvendor-type models with decision-dependent uncertainty. Math. Methods Oper. Res. 76, 189–221 (2012)
Article MathSciNet Google Scholar
Goel, V., Grossmann, I.E., El-Bakry, A.S., Mulkay, E.L.: A novel branch and bound algorithm for optimal development of gas fields under uncertainty in reserves. Comput. Chem. Eng. 30, 1076–1092 (2006)
Article Google Scholar
Goel, V., Grossmann, I.E.: A Lagrangian duality based branch and bound for solving linear stochastic programs with decision dependent uncertainty. In: Puigjaner, L., Espuña , A. (eds.) European Symposium on Computer Aided Process Engineering, vol. 15. Elsevier Science B. V. (2005)
Gupta, V., Grossmann, I.E.: Solution strategies for multistage stochastic programming with endogenous uncertainty. Comput. Chem. Eng. 35, 2235–2247 (2011)
Article Google Scholar
Tarhan, B., Grossmann, I.E.: A multistage stochastic programming approach with strategies for uncertainty reduction in the synthesis of process networks with uncertain yields. Comput. Chem. Eng. 32, 766–788 (2008)
Article Google Scholar
Tarhan, B., Grossmann, I.E., Goel, V.: Computational strategies for non-convex multistage minlp models with decision-dependent uncertainty and gradual uncertainty resolution. Ann. Oper. Res. 203, 141–166 (2013)
Article MathSciNet Google Scholar
Spacey, S.A., Wiesemann, W., Kuhn, D., Luk, W.: Robust software partitioning with multiple instantiation. INFORMS J. Comput. 24(3), 500–515 (2012)
Article MathSciNet Google Scholar
Vujanic, R., Goulart, P., Morari, M.: Robust optimization of schedules affected by uncertain events. J. Optim. Theory Appl. 171, 1033–1054 (2016)
Article MathSciNet Google Scholar
Mehrotra, S., Papp, D.: A cutting surface algorithm for semi-infinite convex programming with an application to moment robust optimization. SIAM J. Optim. 24(4), 1670–1697 (2015)
Article MathSciNet Google Scholar
Shapiro, A.: On duality theory of conic linear problems. In: Goberna, M.Á., López, M.A. (eds.) Semi-InfiniteProgramming. Nonconvex Optimization and Its Applications, vol. 57, pp. 135–165. Springer, Boston (2001)
Google Scholar
Givens, C.R., Shortt, R.M.: A class of wasserstein metrics for probability distributions. Mich. Math. J. 31(2), 231–240 (1984)
Article MathSciNet Google Scholar
Jiang, R., Guan, Y.: Risk-averse two-stage stochastic program with distributional ambiguity. Oper. Res. 66, 1390–1405 (2018)
Article MathSciNet Google Scholar
Luo, F.Q., Mehrotra, S.: Distributionally robust service center location problem with decision dependent utilities. Technical report, Northwestern University (2019)
Rahimian, H., Mehrotra, S.: Multi-product newsvendor problem with decision dependent demand. Working paper, Northwestern University (2019)

Download references

Acknowledgements

This research was supported by the Office of Naval Research Grant N00014-18-1-2097-P00001. We thank Hamed Rahimian for identifying several hypothetical errors in an earlier draft of this chapter.

Author information

Authors and Affiliations

Department of Industrial Engineering and Management Science, Northwestern University, Evanston, IL, USA
Fengqiao Luo & Sanjay Mehrotra

Authors

Fengqiao Luo
View author publications
You can also search for this author in PubMed Google Scholar
Sanjay Mehrotra
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fengqiao Luo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Reformulation of ($\hbox {D}^3\hbox {RO}$) with Wasserstein metric and continuous support of random parameters

We study the reformulation of ($\hbox {D}^3\hbox {RO}$) with Wasserstein metric and continuous support of random parameters. In contrast to the case studied in Sect. 3.3, we do not assume that Assumption 1 holds. As a consequence, the support ${\varXi }$ of the decision dependent random parameters $\xi $ can be continuous. Suppose at a decision $\varvec{x}_0$, we have observed N samples of the random variable $\varvec{\xi }$, written as $\{\varvec{\xi }^i\}^N_{i=1}$. We construct an empirical distribution as: $P_0=\sum ^N_{i=1}\frac{1}{N}\delta _{\varvec{\xi }^i}$. Setting the empirical distribution as the center of the Wasserstein ball, we can define the decision dependent ambiguity set as:

$$\begin{aligned} {\mathcal {P}}^{W}_C({\varvec{x}}):=\{P\in {\mathcal {P}}({\varXi },{\mathcal {F}})\;|\;{\mathcal {W}}(P,P_0)\le r({\varvec{x}})\}, \end{aligned}$$

(48)

where ${\mathcal {W}}(P,P_0)$ is defined in (19). The reformulation of ($\hbox {D}^3\hbox {RO}$) with the ambiguity set ${\mathcal {P}}^{W}_C({\varvec{x}})$ is given by the following theorem.

Theorem A.1

The ($\hbox {D}^3\hbox {RO}$) model with the ambiguity set ${\mathcal {P}}^{W}_C({\varvec{x}})$ can be reformulated as the following semi-infinite program if ${\mathcal {P}}^{W}_C({\varvec{x}})$ is non-empty for any $x\in X$:

$$\begin{aligned} \begin{aligned} \underset{\varvec{x},\varvec{v}}{min }&\quad f(\varvec{x}) + \frac{1}{N}\sum ^{N}_{i=1} v_i + r(\varvec{x})\cdot v_{N+1}&\\ s.t.&\quad h(\varvec{x},\varvec{s}) - v_i - v_{N+1}\cdot \Vert \varvec{s}-\varvec{\xi }^i\Vert \le 0,&\forall \varvec{s}\in {\varXi }, \forall i\in [N] \\&\quad \varvec{x}\in {\mathcal {X}}, \;\; v_1,\ldots , v_{N}\in {\mathbb {R}}, \quad v_{N+1} \ge 0.&\end{aligned} \end{aligned}$$

(49)

Proof

From Theorem 3.6 of [10], the inner problem of $\hbox {D}^3\hbox {RO}$ with the ambiguity set defined by the Wasserstein metric (48) is equivalent to the following conic linear program:

$$\begin{aligned} \begin{aligned} \underset{\mu }{\text {max}}&\quad \int _{\varvec{s}\in {\varXi }} h(\varvec{x},\varvec{s}) \mu (d\varvec{s}\times {\varXi })&\\ \text {s.t.}&\quad \mu ({\varXi }\times \{\varvec{\xi }^i\}) = 1/N,&\forall i\in [N] \\&\quad \mu ({\varXi }\times {\varXi }^{\prime }) = 0, \\&\quad \sum _{i\in [N]} \int _{\varvec{s}\in {\varXi }} \Vert \varvec{s}-\varvec{\xi }^i\Vert \mu (d\varvec{s}\times \{\varvec{\xi }^i\})\le r(\varvec{x}),&\\&\quad \mu \succeq 0,&\end{aligned} \end{aligned}$$

(50)

where ${\varXi }^{\prime }:={\varXi }\setminus \{\varvec{\xi }^i\}^{N}$, and $\mu \succeq 0$ denotes that $\mu $ is a positive measure. Based on Theorem 3.7 of [10], we can apply the conic duality theory from [47] to (50), and obtain the following dual formulation of (50):

$$\begin{aligned} \begin{aligned} \underset{\varvec{v}}{\text {min}}&\quad \displaystyle \frac{1}{N}\sum ^{N}_{i=1} v_i + r(\varvec{x})\cdot v_{N+1}&\\ \text {s.t.}&\quad h(\varvec{x},\varvec{s}) - v_i - v_{N+1}\cdot \Vert \varvec{s}-\varvec{\xi }^i\Vert \le 0&\forall \varvec{s}\in {\varXi }, \forall i\in [N], \\&\quad v_1,\ldots , v_{N}\in {\mathbb {R}}, \quad v_{N+1} \ge 0.&\end{aligned} \end{aligned}$$

(51)

After combining (51) with the outer minimization problem over x, we obtain the desired reformulation (49). $\square $

Reformulation of ($\hbox {D}^3\hbox {RO}$) with K–S distance and continuous support of the random parameters

We now investigate the reformulation of ($\hbox {D}^3\hbox {RO}$) with the ambiguity set defined by K–S distance where the support ${\varXi }$ is not finite. We assume that ${\varXi }$ is contained in a hyper-rectangle $[\varvec{a},\varvec{b}]:=[a_1,b_1]\times \ldots \times [a_d,b_d]$. The definition of K–S distance (33) can be generalized for two multivariate cumulative distribution functions as follows:

$$\begin{aligned} D(P_1,P_2)=\underset{\varvec{s}\in {\mathbb {R}}^d}{\text {sup}}\; |F_1(\varvec{s})-F_2(\varvec{s})|, \end{aligned}$$

(52)

where $\varvec{s}=[s_1,\ldots ,s_d]$ and the cumulative function $F_i$ ($i=1,2$) is defined as: $F_i(\varvec{s})=\int ^{s_1}_{-\infty }\dots \int ^{s_d}_{-\infty }P_1(x_1,\ldots ,x_d)dx_1\ldots dx_d$. Suppose for a ${\varvec{x}}_0\in X$, we have observed N samples of the random vector $\varvec{\xi }$, written as $\{\varvec{\xi }^i\}^N_{i=1}$. We define the empirical distribution as $P_0:=\sum ^N_{i=1}\frac{1}{N}\delta _{\varvec{\xi }_i}$ and denote the cumulative distribution function of $P_0$ as $F_0$. Let ${\mathcal {P}}([\varvec{a},\varvec{b}],{\mathcal {B}})$ denote the set of probability distributions on [a, b] with the Borel sigma algebra ${\mathcal {B}}$. For any $P\in {\mathcal {P}}([\varvec{a},\varvec{b}],{\mathcal {B}})$, let $F^P$ denote the cumulative distribution function of P. The decision dependent ambiguity set based on K–S distance can be constructed as:

$$\begin{aligned} {\mathcal {P}}^{KS}_C({\varvec{x}}):=\big \{P\in {\mathcal {P}}([\varvec{a},\varvec{b}],{\mathcal {B}}):\; \underset{\varvec{s}\in [\varvec{a},\varvec{b}]}{\text {sup}}\;|F^P(\varvec{s})-F_0(\varvec{s})|\le \alpha ({\varvec{x}}) \big \}. \end{aligned}$$

(53)

We now reformulate the ambiguity set (53) into finitely many expectation constraints of indicator functions by partitioning the hyper-rectangle $[\varvec{a},\varvec{b}]$ into hyper-rectangular cells. Specifically, let $\xi ^i_k$ be the k-th component of the i-th observed sample. For each component k ($k\in [d]$), we sort the observed samples $\{\xi ^i_k\}^N_{i=1}$ based on the k-th component in the ascending order, and suppose that the sorted sample components are labeled as $\{\xi ^{[i]}_k\}^N_{i=1}$ such that $a_k<\xi ^{[1]}_k<\xi ^{[2]}_k<\dots<\xi ^{[N]}_k<b_k$. Let us divide each interval $[a_k, b_k]$ into $N+1$ sub-intervals as: $I^k_0=[a_k, \xi ^{[1]}_k)$, $I^k_i=[\xi ^{[i]}_k, \xi ^{[i+1]}_k)$ for $i\in [N-1]$ and $I^k_{N}=[\xi ^{[N]}_k, b_k]$, and create an N-dimensional grid based on the sub-intervals for each dimension k to partition $[\varvec{a},\varvec{b}]$ into $(N+1)^d$ sub-rectangular cells. Based on this partition and using the convention that $\xi ^{[0]}_k=a_k$ for $k\in [d]$, the reference CDF can be written as:

$$\begin{aligned} F_0(\varvec{s})=\frac{N_{j_1j_2\dots j_d}}{N}\qquad \text {for } \varvec{s}\in I^1_{j_1}\times I^2_{j_2}\times \dots \times I^d_{j_d}, \quad j_r\in \{0,1,\dots ,N\},\quad r\in [d],\nonumber \\ \end{aligned}$$

(54)

where $N_{j_1j_2\dots j_d}$ is the number of observed samples within the hyper-rectangle $[a_1, \xi ^{[j_1]}_1]\times [a_2, \xi ^{[j_2]}_2]\times \dots \times [a_d, \xi ^{[j_d]}_d]$. For simplicity of notations, we let $I_{j_1j_2\dots j_d}:=I^1_{j_1}\times I^2_{j_2}\times \dots \times I^d_{j_d}$, then the ambiguity set (53) can be reformulated as

$$\begin{aligned}&{\mathcal {P}}^{KS}_C({\varvec{x}})=\left\{ P\in {\mathcal {P}}([\varvec{a},\varvec{b}],{\mathcal {B}}):\; \left| P(\varvec{s}\in I_{j_1j_2\dots j_d}) - \frac{N_{j_1\dots j_d}}{N} \right| \right. \nonumber \\&\quad \left. \le \alpha ({\varvec{x}})\,\text {for}\, j_r\in \{0,1,\dots ,N\},\; r\in [d] \right\} . \end{aligned}$$

(55)

Reformulation of the ($\hbox {D}^3\hbox {RO}$) with the ambiguity set (53) is given in the following theorem:

Theorem B.1

If $h({\varvec{x}},\varvec{s})$ is continuous in $\varvec{s}\in [\varvec{a},\varvec{b}]$ for any $\varvec{x}\in X$ and (53) is non-empty for any $x\in X$, then the ($\hbox {D}^3\hbox {RO}$) with the ambiguity set (53) can be reformulated as the following semi-infinite program:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{x},\varvec{\overline{\lambda }},\varvec{\underline{\lambda }}}{min }\;\; f({\varvec{x}}) + \sum ^N_{j_1=0}\dots \sum ^N_{j_d=0}\left( \frac{N_{j_1\dots j_d}}{N} + \alpha ({\varvec{x}}) \right) \overline{\lambda }_{j_1\dots j_d} \\&\qquad \qquad + \sum ^N_{j_1=0}\dots \sum ^N_{j_d=0}\left( \frac{N_{j_1\dots j_d}}{N} - \alpha ({\varvec{x}}) \right) \underline{\lambda }_{j_1\dots j_d} + \gamma \\&s.t. \;\; \overline{\lambda }_{j_1\dots j_d} + \underline{\lambda }_{j_1\dots j_d} + \gamma \ge h({\varvec{x}},\varvec{s}) \qquad \forall \varvec{s}\in cl (I_{j_1\dots j_d}),\;\forall j_r\in \{0,1,\dots ,N\},\; r\in [d], \\&\qquad \gamma \in {\mathbb {R}},\;\overline{\lambda }_{j_1\dots j_d}\le 0,\; \underline{\lambda }_{j_1\dots j_d}(\varvec{s})\ge 0,\;\; \forall j_r\in \{0,1,\dots ,N\},\; r\in [d]. \end{aligned} \end{aligned}$$

(56)

Proof

Note that the probability $P(\varvec{s}\in I_{j_1j_2\dots j_d})$ in (55) can be written as the expectation of the indicator function $\varvec{1}_{I_{j_1j_2\dots j_d}}(\varvec{s})$ with respect to P. The inner problem of ($\hbox {D}^3\hbox {RO}$) can be reformulated as the following conic linear program:

$$\begin{aligned} \begin{aligned}&\underset{P\in {\mathcal {P}}([\varvec{a},\varvec{b}],{\mathcal {B}})}{\text {max}}\;\; {\mathbb {E}}_P[h({\varvec{x}},\varvec{s})]&\\&\qquad \text {s.t.}\,\quad {\mathbb {E}}_P[\varvec{1}_{I_{j_1\dots j_d}}(\varvec{s})] \le \frac{N_{j_1\dots j_d}}{N} + \alpha ({\varvec{x}}) \quad \forall j_r\in \{0,1,\dots ,N\},\;\; r\in [d],&\overline{\lambda }_{j_1\dots j_d}\ge 0 \\&\qquad \qquad \quad {\mathbb {E}}_P[\varvec{1}_{I_{j_1\dots j_d}}(\varvec{s})] \ge \frac{N_{j_1\dots j_d}}{N} - \alpha ({\varvec{x}}) \quad \forall j_r\in \{0,1,\dots ,N\},\;\; r\in [d],&\underline{\lambda }_{j_1\dots j_d}\le 0 \\&\qquad \qquad \quad {\mathbb {E}}_P[\varvec{1}_{[\varvec{a},\varvec{b}]}(\varvec{s})] = 1,&\gamma \in {\mathbb {R}}. \end{aligned}\nonumber \\ \end{aligned}$$

(57)

Applying conic duality [47] to (57) we obtain the following dual problem of (57):

$$\begin{aligned} \begin{aligned}&\underset{\varvec{\overline{\lambda }},\varvec{\underline{\lambda }},\gamma }{\text {min}}\;\; \sum ^N_{j_1=0}\dots \sum ^N_{j_d=0}\left( \frac{N_{j_1\dots j_d}}{N} + \alpha ({\varvec{x}}) \right) \overline{\lambda }_{j_1\dots j_d}\\&\quad +\, \sum ^N_{j_1=0}\dots \quad \sum ^N_{j_d=0}\left( \frac{N_{j_1\dots j_d}}{N} - \alpha ({\varvec{x}}) \right) \underline{\lambda }_{j_1\dots j_d} + \gamma \\&\text {s.t.}\,\quad (\overline{\lambda }_{j_1\dots j_d} + \underline{\lambda }_{j_1\dots j_d} )\varvec{1}_{I_{j_1\dots j_d}}(\varvec{s}) + \gamma \varvec{1}_{[\varvec{a},\varvec{b}]} \ge h({\varvec{x}},\varvec{s}) \qquad \forall \varvec{s}\in [\varvec{a},\varvec{b}], \\&\qquad \quad \gamma \in {\mathbb {R}},\;\overline{\lambda }_{j_1\dots j_d}\ge 0,\; \underline{\lambda }_{j_1\dots j_d}(\varvec{s})\le 0,\;\; \forall j_r\in \{0,1,\dots ,N\},\; r\in [d]. \end{aligned} \end{aligned}$$

(58)

Note that by partitioning the range of the vector $\varvec{s}$, the first constraint of (58) can be reformulated as the following semi-infinite constraints:

$$\begin{aligned} \overline{\lambda }_{j_1\dots j_d} + \underline{\lambda }_{j_1\dots j_d} + \gamma \ge h({\varvec{x}},\varvec{s}) \qquad \forall \varvec{s}\in I_{j_1\dots j_d},\;\forall j_r\in \{0,1,\dots ,N\},\; r\in [d].\nonumber \\ \end{aligned}$$

(59)

Since $h({\varvec{x}},\varvec{s})$ is continuous in $\varvec{s}\in [\varvec{a},\varvec{b}]$ for any ${\varvec{x}}\in X$, we can replace $I_{j_1\dots j_d}$ with the closure $\text {cl}(I_{j_1\dots j_d})$ in the above semi-infinite constraints. Then by Proposition 2.8(iii) of [47], the optimal objective of (57) equals the optimal objective of (58). After combining (58) with the outer minimization problem over ${\varvec{x}}\in X$, we can reformulate ($\hbox {D}^3\hbox {RO}$) into (56). $\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Luo, F., Mehrotra, S. Distributionally robust optimization with decision dependent ambiguity sets. Optim Lett 14, 2565–2594 (2020). https://doi.org/10.1007/s11590-020-01574-3

Download citation

Received: 12 April 2019
Accepted: 21 March 2020
Published: 08 April 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s11590-020-01574-3

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Distributionally robust optimization with decision dependent ambiguity sets

Abstract

Similar content being viewed by others

Distributionally Robust Optimization with Moment Ambiguity Sets

Data-driven Stochastic Programming with Distributionally Robust Constraints Under Wasserstein Distance: Asymptotic Properties

Data-driven stochastic optimization for distributional ambiguity with integrated confidence region

1 Introduction

2 Literature review on optimization with decision dependent uncertainty

3 Reformulation of (D\(^3\)RO)

Assumption 1

3.1 Ambiguity sets defined by simple measure and moment inequalities

Theorem 3.1

Proof

Corollary 3.1

Proof

Theorem 3.2

Theorem 3.3

Proof

3.2 Covariance matrix based ambiguity set

Theorem 3.4

Proof

Corollary 3.2

Theorem 3.5

Proof

3.3 Ambiguity sets defined by Wasserstein metric

Theorem 3.6

Proof

Corollary 3.3

3.4 Ambiguity sets defined using \(\phi \)-divergence

Theorem 3.7

Proof

Corollary 3.4

3.5 Ambiguity sets defined based on the Kolmogorov–Smirnov test

Theorem 3.8

Proof

Corollary 3.5

4 Solution approaches

4.1 Models with continuous support

Definition 4.1

Theorem 4.1

4.2 Models with binary decision variables and finite support

5 Illustrative numerical example

5.1 Definition of \({\mathcal {P}}_i\) using mean and variance

5.2 Definition of \({\mathcal {P}}_i\) using the Wasserstein metric

6 Concluding remarks

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Reformulation of (\(\hbox {D}^3\hbox {RO}\)) with Wasserstein metric and continuous support of random parameters

Theorem A.1

Proof

Reformulation of (\(\hbox {D}^3\hbox {RO}\)) with K–S distance and continuous support of the random parameters

Theorem B.1

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation