1 Introduction

Stochastic dominance is a fundamental concept in decision theory and economics [26]. For two random variables \(\xi _1(\omega )\) and \(\xi _2(\omega )\) defined on the probability space \((\Omega ,\mathcal{F},P)\) with finite expected values, \(\xi _1(\omega )\) is said to dominate \(\xi _2(\omega )\) in the second order, denoted by \(\xi _1(\omega )\succeq _{2}\xi _2(\omega )\), if

$$\begin{aligned} \int \limits _{-\infty }^{\eta } P\{\omega \in \Omega : \xi _1(\omega )\le t\} dt \le \int \limits _{-\infty }^{\eta } P\{\omega \in \Omega : \xi _2(\omega )\le t\} dt, \quad \forall \eta \in \mathbb R . \end{aligned}$$
(1)

By changing the order of integration in (1) (which yields the identity \(\int _{-\infty }^{\eta } P\{\omega \in \Omega : \xi _i(\omega )\le t\} dt=\mathbb{E }_P[(\eta -\xi _i(\omega ))_{+}]\)), the condition for second order dominance can be reformulated as:

$$\begin{aligned} \mathbb{E }_P[(t-\xi _1(\omega ))_{+}]\le \mathbb{E }_P[(t-\xi _2(\omega ))_{+}], \qquad \forall t\in \mathbb R . \end{aligned}$$
(2)

We use the notation \((\cdot )_{+}\) to denote the positive part, that is, \((x)_+ := \max \{x,0\}\).
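
As an illustration (added here; the two normal distributions and the grid of thresholds below are arbitrary choices, not taken from the paper), the following Python sketch estimates \(\mathbb{E }_P[(t-\xi _i(\omega ))_{+}]\) by sample averaging on a finite grid of \(t\) and checks condition (2) empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative samples: xi1 has the larger mean and the smaller spread,
# so xi1 is expected to dominate xi2 in the second order.
xi1 = rng.normal(loc=1.0, scale=0.5, size=100_000)
xi2 = rng.normal(loc=0.0, scale=1.0, size=100_000)

def expected_shortfall(t, xi):
    """Sample-average estimate of E[(t - xi)_+]."""
    return np.maximum(t - xi, 0.0).mean()

t_grid = np.linspace(-4.0, 4.0, 81)   # finite grid standing in for "all t"
lhs = np.array([expected_shortfall(t, xi1) for t in t_grid])
rhs = np.array([expected_shortfall(t, xi2) for t in t_grid])

print("xi1 dominates xi2 in the second order (on the grid):", bool((lhs <= rhs).all()))
```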

In this paper, we consider the following stochastic program with stochastic second order dominance (SSD) constraints:

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\mathbb{E }_P[f(x,\xi (\omega ))]\\ \text{ s.t.}&G(x,\xi (\omega ))\succeq _{2} Y(\xi (\omega )),\\&x\in X, \end{array} \end{aligned}$$
(3)

where \(X\) is a nonempty compact subset of \(\mathbb R ^n\), \(\xi :\Omega \rightarrow \Xi \) is a vector of random variables defined on the probability space \((\Omega ,\mathcal{F},P)\) with support set \(\Xi \subset \mathbb R ^m\), \(f, G: \mathbb R ^{n}\times \Xi \rightarrow \mathbb R \) are Lipschitz continuous functions and for every \(\xi \in \Xi \), \(G(\cdot ,\xi ): \mathbb R ^{n}\rightarrow \mathbb R \) is concave; \(Y(\xi (\omega ))\) is a random variable, and \(\mathbb{E }_P[\cdot ]\) denotes the expected value with respect to the probability (\(P\)) distribution of \(\xi \). For simplicity of discussion, we make a blanket assumption that \(f\) and \(G\) are \(P\)-integrable.

Using the equivalent formulation of the second order dominance constraint (2), problem (3) can be written as a stochastic semi-infinite programming (SSIP) problem:

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\mathbb{E }_P[f(x,\xi (\omega ))]\\ \text{ s.t.}&\mathbb{E }_P[(t-G(x,\xi (\omega )))_{+}]\le \mathbb{E }_P[(t-Y(\xi (\omega )))_{+}], \quad \forall t\in \mathbb R ,\\&x\in X. \end{array} \end{aligned}$$
(4)

Stochastic optimization models with SSD constraints were introduced by Dentcheva and Ruszczyński [8, 9]. Over the past few years, there has been increasing discussion of this subject covering optimization theory, numerical methods and practical applications, see [7–12, 17, 19, 22] and references therein.

It is well-known that the SSIP problem above does not satisfy Slater’s constraint qualification, a condition that is often required for a stable numerical method. Consequently, a relaxed form of the SSIP problem has been proposed:

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\mathbb{E }_P[f(x,\xi (\omega ))]\\ \text{ s.t.}&\mathbb{E }_P[H(x,t,\xi (\omega ))]\le 0, \quad \forall t\in T,\\&x\in X, \end{array} \end{aligned}$$
(5)

where

$$\begin{aligned} H(x,t,\xi (\omega )):=(t-G(x,\xi (\omega )))_{+}-(t-Y(\xi (\omega )))_{+} \end{aligned}$$

and \(T\) is a compact subset of \(\mathbb R \). In the literature [8–10], \(T\) is a closed interval or the union of a finite number of closed intervals in \(\mathbb R \).

Our focus in this paper is on the stability analysis of problem (5). Specifically, we are concerned with the impact of changes in the probability measure \(P\) on both optimal values and optimal solutions. The analysis is inspired by a recent work [7] on the stability and sensitivity analysis of optimization problems with first order stochastic dominance constraints and is in line with the traditional stability analysis in the literature of deterministic nonlinear programming and stochastic programming [14, 15, 20, 21, 29, 31, 32, 35, 36].

From a practical viewpoint, this kind of stability analysis is motivated by the fact that in applications either the probability distribution \(P\) is not known or a closed form expression for the expected value of the underlying random functions w.r.t. \(P\) is difficult to obtain, and consequently the probability measure/distribution may have to be approximated. Stability analysis of problem (5) focuses on the impact of a perturbation of \(P\) on the optimal value and optimal solutions [29]. A particularly interesting case is when the probability measure is approximated by an empirical probability measure; in that case, the expected values of the underlying functions are approximated through sample averaging. The contributions of this paper can be summarized as follows.

  • In Sect. 2, we carry out stability analysis for problem (5). Specifically, we consider the case when the underlying probability measure \(P\) is approximated by a set of probability measures under a pseudometric. By exploiting an error bound in semi-infinite programming due to Gugat [13], we show under the Slater condition that the feasible solution set mapping is Lipschitz continuous, the optimal solution set mapping is upper semicontinuous, and the optimal value function is Lipschitz-like (calm). Moreover, when the objective function satisfies a certain growth condition, we quantify the upper semicontinuity of the optimal solution set mapping. This complements the existing research [7], which focuses on the stability analysis of stochastic optimization problems with first order dominance constraints.

  • In Sect. 3, we consider the special case when the probability measure \(P\) is approximated by an empirical probability measure [an approach also known as sample average approximation (SAA)] and present a detailed analysis of the convergence of optimal solutions and stationary points obtained from solving the sample average approximated optimization problems. Specifically, we show an exponential rate of convergence of optimal solutions and almost sure convergence of stationary points as the sample size increases. SAA is a popular method in stochastic programming, but there seem to be few discussions of SAA for stochastic programs with SSD constraints. The only exception is a recent work by Hu et al. [19], which discusses the cutting plane method for sample average approximated optimization problems with SSD constraints.

  • Our convergence analysis is carried out through an exact penalization of (5) [see (11)]. The penalty formulation may provide a potential numerical framework for solving (5). During the revision of this paper, some progress in this regard has been made, see [24].

Throughout this paper, we use the following notation. For vectors \(a,b\in \mathbb R ^n\), \(a^Tb\) denotes the scalar product, \(\Vert \cdot \Vert \) denotes the Euclidean norm of a vector, \(\Vert \cdot \Vert _{\infty }\) denotes the maximum norm of continuous functions defined over compact set \(T\). \(d(x, \mathcal{D}):=\inf _{x^{\prime }\in \mathcal{D}}\Vert x-x^{\prime }\Vert \) denotes the distance from a point \(x\) to a set \(\mathcal{D}\). For two compact sets \(\mathcal{C}\) and \(\mathcal{D}\),

$$\begin{aligned} \mathbb D (\mathcal{C},\mathcal{D}) :=\sup _{x\in \mathcal{C}}d(x, \mathcal{D}) \end{aligned}$$

denotes the deviation of \(\mathcal{C}\) from \(\mathcal{D}\) and \(\mathbb H (\mathcal{C},\mathcal{D}) := \max (\mathbb D (\mathcal{C},\mathcal{D}),\mathbb D (\mathcal{D},\mathcal{C})) \) denotes the Hausdorff distance between \(\mathcal{C}\) and \(\mathcal{D}\). Moreover, \(\mathcal{C}+\mathcal{D}\) denotes the Minkowski addition of the two sets, that is, \(\{C+D: C\in \mathcal{C}, D\in \mathcal{D}\}\), \(B(x,\gamma )\) denotes the closed ball with center \(x\) and radius \(\gamma \), \(\mathcal{B}\) denotes the closed unit ball in the respective space.
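
For finite point sets, the deviation and the Hausdorff distance can be computed directly from these definitions; the short Python sketch below (an illustration added here, with arbitrary example sets) does exactly that.

```python
import numpy as np

def deviation(C, D):
    """Deviation of C from D: sup_{x in C} d(x, D), for finite point sets."""
    return max(np.linalg.norm(D - c, axis=1).min() for c in C)

def hausdorff(C, D):
    """Hausdorff distance: the larger of the two deviations."""
    return max(deviation(C, D), deviation(D, C))

# Arbitrary example sets in R^2; each row is a point.
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = np.array([[0.1, 0.0], [1.0, 0.5]])

print("deviation of C from D:", deviation(C, D))
print("Hausdorff distance   :", hausdorff(C, D))
```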

2 Stability analysis

Let \(\fancyscript{P}(\Omega )\) denote the set of all Borel probability measures. For \(Q\in \fancyscript{P}(\Omega )\), let \(\mathbb{E }_Q[\xi ]=\int _{\Omega }\xi (\omega ) d Q(\omega )\) denote the expected value of random variable \(\xi \) with respect to the distribution of \(Q\). Assuming \(Q\) is close to \(P\) under some metric to be defined shortly, we investigate in this section the following optimization problem:

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\mathbb{E }_Q[f(x,\xi (\omega ))]\\ \text{ s.t.}&\mathbb{E }_Q[H(x,t,\xi (\omega ))]\le 0, \quad \forall t\in T,\\&x\in X, \end{array} \end{aligned}$$
(6)

which is regarded as a perturbation of (5). Specifically, we study the relationship between the perturbed problem (6) and the true problem (5) in terms of optimal values and optimal solutions when \(Q\) is close to \(P\). Of course, we restrict our discussion to the probability measure \(Q\) such that the expected values of the underlying functions in (6) are well defined for all \(x\in X\).

Let us start by introducing a distance function for the set \(\fancyscript{P}(\Omega )\), which is appropriate for our problem. Define the set of functions:

$$\begin{aligned} {\fancyscript{G}}:=\{g(\cdot ):= H(x,t,\cdot ): x\in X, t\in T\}\cup \{g(\cdot ):= f(x,\cdot ): x\in X\}. \end{aligned}$$

The distance function for the elements in set \(\fancyscript{P}(\Omega )\) is defined as:

$$\begin{aligned} {\fancyscript{D}}(P,Q):= \sup _{g\in {\fancyscript{G}}}\left| \mathbb{E }_P[g]-\mathbb{E }_Q[g]\right|. \end{aligned}$$

This type of distance is considered by Römisch [35, Section 2.2] for the stability analysis of stochastic programming and is called a pseudometric. It is well-known that \({\fancyscript{D}}\) is nonnegative, symmetric and satisfies the triangle inequality, see [35, Section 2.1] and the references therein. Throughout this section, we use the following notation:

$$\begin{aligned} \mathcal{F}(Q)&:= \left\{ x\in X: \mathbb{E }_Q[H(x,t,\xi )]\le 0, \quad \forall t\in T\right\} \!,\\ {\vartheta }(Q)&:= \inf \left\{ \mathbb{E }_Q[f(x,\xi )]: x\in \mathcal{F}(Q)\right\} \!,\\ S_{opt}(Q)&:= \left\{ x\in \mathcal{F}(Q): {\vartheta }(Q)=\mathbb{E }_Q[f(x,\xi )]\right\} \!,\\ {\fancyscript{P}}_{\fancyscript{G}}(\Omega )&:= \left\{ Q\in {\fancyscript{P}(\Omega )}: -\infty < \inf _{g(\xi )\in {\fancyscript{G}}}\mathbb{E }_Q[g(\xi )]\; \text{ and}\; \sup _{g(\xi )\in {\fancyscript{G}}}\mathbb{E }_Q[g(\xi )]<\infty \right\} \!. \end{aligned}$$

It is easy to observe that for \(P,Q\in {\fancyscript{P}}_{\fancyscript{G}}(\Omega )\), \({\fancyscript{D}}(P,Q)<\infty \). Throughout this section, the perturbed probability measure \(Q\) in problem (6) is taken from \( {\fancyscript{P}}_{\fancyscript{G}}(\Omega )\).
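
To make the pseudometric concrete, the Python sketch below approximates \({\fancyscript{D}}(P,Q)\) for two empirical measures by maximizing \(|\mathbb{E }_P[g]-\mathbb{E }_Q[g]|\) over finite grids of the index sets; the specification of \(f\), \(G\), \(Y\) and of all grids is an illustrative assumption, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative specification (an assumption): scalar x and xi,
#   G(x, xi) = x + xi (concave in x),  Y(xi) = xi - 1,  f(x, xi) = (x + 1.5 - 0.1*xi)**2.
def H(x, t, xi):
    return np.maximum(t - (x + xi), 0.0) - np.maximum(t - (xi - 1.0), 0.0)

def f(x, xi):
    return (x + 1.5 - 0.1 * xi) ** 2

# Two empirical measures, each supported on an i.i.d. sample.
sample_P = rng.normal(1.0, 1.0, size=2000)
sample_Q = rng.normal(1.2, 1.1, size=2000)

X_grid = np.linspace(-2.0, 2.0, 41)   # stands in for the compact set X
T_grid = np.linspace(-3.0, 3.0, 61)   # stands in for the compact set T

# sup over g in {H(x,t,.): x in X, t in T} U {f(x,.): x in X} of |E_P[g] - E_Q[g]|,
# restricted to the finite grids.
gaps = [abs(f(x, sample_P).mean() - f(x, sample_Q).mean()) for x in X_grid]
gaps += [abs(H(x, t, sample_P).mean() - H(x, t, sample_Q).mean())
         for x in X_grid for t in T_grid]

print("grid estimate of the pseudometric D(P, Q):", max(gaps))
```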

In what follows, we discuss the continuity of the optimal solution set mapping and the optimal value function of problem (6). We do so by applying Klatte’s [20, 21] earlier stability results for parametric nonlinear programming, which were used by Dentcheva et al. [7] for the stability analysis of optimization problems with first order dominance constraints. A key condition in Klatte’s stability results is the pseudo-Lipschitz property of the feasible set mapping. Here we derive this property by exploiting an important error bound for semi-infinite programming established by Gugat in [13]. To this end, we need to introduce some definitions and technical results, most of which are translated from deterministic semi-infinite programming in [13].

Definition 2.1

Problem (5) is said to satisfy a weak Slater condition, if there exist positive numbers \(\alpha \) and \(M\) such that for any \(x\in X\) with \(\max _{t\in T}(\mathbb{E }_P[H(x,t,\xi )])_{+}\in (0, M)\) there exists a point \(x^*\) with \(\mathbb{E }_P[H(x^*,t,\xi )]< \max _{t\in T}(\mathbb{E }_P[H(x,t,\xi )])_{+}\) for all \(t\in T\) and

$$\begin{aligned} \Vert x-x^*\Vert \le \alpha \left[ \max \limits _{t\in T} (\mathbb{E }_P[H(x,t,\xi )])_{+}-\max \limits _{t\in T}\mathbb{E }_P[H(x^*,t,\xi )] \right]. \end{aligned}$$

Definition 2.2

Problem (5) is said to satisfy a strong Slater condition, if there exists a positive number \(\gamma \) such that for any feasible point \(x\) satisfying \(\mathbb{E }_P[H(x,t,\xi )]=0\) for some \(t\in T\) there exists a point \(z(x)\) with \(\mathbb{E }_P[H(z(x),t,\xi )]<0\) for all \(t\in T\) and

$$\begin{aligned} \Vert x -z(x)\Vert \le \gamma \min _{t\in T}\left(-\mathbb{E }_P[H(z(x),t,\xi )]\right). \end{aligned}$$

 

Definition 2.3

Problem (5) is said to satisfy a Slater condition if there exist a positive number \(\bar{\delta }\) and a point \(\bar{x}\in X\) such that

$$\begin{aligned} \max _{t\in T} \mathbb{E }_P[H(\bar{x},t, \xi )]\le -\bar{\delta }. \end{aligned}$$

Note that the strong Slater condition implies that the weak Slater condition holds for any \(M>0\) and \(\alpha =\gamma \), where \(M\) is given in Definition 2.1. Since \(X\) is a compact set, the Slater condition implies the strong Slater condition, and the positive number \(\gamma \) in Definition 2.2 can then be estimated by

$$\begin{aligned} \gamma :=\sup _{x\in X}\frac{\Vert x-\bar{x}\Vert }{\min _{t\in T}-\mathbb{E }_P[H(\bar{x},t,\xi )]}. \end{aligned}$$
(7)

See [13, Propositions 1 and 2] for details about the relationship.

Proposition 2.4

Assume problem (5) satisfies the Slater condition. Let \(\bar{\delta }\) be given as in Definition 2.3. Then there exists a positive number \(\epsilon \) (\(\epsilon \le \bar{\delta }/2\)) such that for any \(Q\in B(P,\epsilon )\)

$$\begin{aligned} \max _{t\in T} \mathbb{E }_Q[H(\bar{x},t,\xi )]\le -\bar{\delta }/2, \end{aligned}$$

where \(\bar{x}\) is given as in Definition 2.3, that is, the perturbed problem (6) satisfies the Slater condition.

 

Proof

By the definition of pseudometric distance \(\fancyscript{D}\),

$$\begin{aligned} \sup _{t\in T}\left|\mathbb{E }_P[H( x,t,\xi )]-\mathbb{E }_Q[H(x,t,\xi )]\right|\le {\fancyscript{D}}(Q,P), \qquad \forall x\in X. \end{aligned}$$

Let \(Q\in B(P,\bar{\delta }/2)\). Then

$$\begin{aligned} \sup _{t\in T}\mathbb{E }_Q[H(\bar{x},t,\xi )]&\le \sup _{t\in T}\mathbb{E }_P[H( \bar{x},t,\xi )] + \sup _{t\in T}\left|\mathbb{E }_P[H( \bar{x},t,\xi )]-\mathbb{E }_Q[H(\bar{x},t,\xi )]\right|\\&\le -\bar{\delta }+\bar{\delta }/2\\&= -\bar{\delta }/2. \end{aligned}$$

The proof is complete. \(\square \)

By Gugat [13, Lemmas 3 and 6] and Proposition 2.4, we can obtain the following uniform error bound for the feasible set mapping \(\mathcal{F}(Q)\) of problem (6).

Lemma 2.5

Assume problem (5) satisfies the Slater condition. Then there exist positive numbers \(\epsilon \) and \(\beta \) such that for any \(Q\in B(P,\epsilon )\), the following error bound holds:

$$\begin{aligned} d(x,\mathcal{F}(Q))\le \beta \left\Vert(\mathbb{E }_Q[H(x,t,\xi )])_{+}\right\Vert_{\infty }, \quad \forall x\in X, \end{aligned}$$

where \(\mathcal{F}(Q)\) denotes the feasible set of problem (6).

 

Proof

Let \(\epsilon \) be given as in Proposition 2.4. For any fixed \(Q\in B(P,\epsilon )\), we have by Gugat [13, Lemmas 3 and 6] that

$$\begin{aligned} d(x,\mathcal{F}(Q))\le \gamma (Q)\left\Vert(\mathbb{E }_Q[H(x,t,\xi )])_{+}\right\Vert_{\infty }\!, \end{aligned}$$

where

$$\begin{aligned} \gamma (Q):=\sup _{x\in X}\frac{\Vert x-\bar{x}\Vert }{\min _{t\in T}-\mathbb{E }_Q[H(\bar{x},t,\xi )]}, \end{aligned}$$

and \(\bar{x}\) is given in Definition 2.3. By Proposition 2.4, for \(Q\in B(P,\epsilon )\), \( \mathbb{E }_Q[H(\bar{x},t,\xi )]\le -\bar{\delta }/2, \) where \(\bar{\delta }\) is given in Definition 2.3. This gives

$$\begin{aligned} \gamma (Q)\le \frac{2}{\bar{\delta }}\max _{x^{\prime },x^{\prime \prime }\in X}\Vert x^{\prime }-x^{\prime \prime }\Vert . \end{aligned}$$

The conclusion follows by setting \(\beta :=\frac{2}{\bar{\delta }}\max _{x^{\prime },x^{\prime \prime }\in X}\Vert x^{\prime }-x^{\prime \prime }\Vert \), which is finite by the boundedness of \(X\). \(\square \)

Note that this kind of error bound is well known in convex programming; see the pioneering work by Robinson [30]. Here we follow the recent development in [13], as our problem falls into the framework of semi-infinite programming.

Proposition 2.6

Assume that problem (5) satisfies the Slater condition. Then

  1. (i)

    the solution set \(S_{opt}(P)\) is nonempty and compact;

  2. (ii)

    the graph of the feasible set mapping \(\mathcal{F}(\cdot )\) is closed;

  3. (iii)

    there exists a positive number \(\epsilon \) such that the feasible set mapping \(\mathcal{F}(Q)\) is Lipschitz continuous on \( B(P,\epsilon )\), that is,

    $$\begin{aligned} \mathbb H \left(\mathcal{F}({Q_1}),\mathcal{F}({Q_2})\right)\le \beta {\fancyscript{D}}({Q_1},{Q_2}),\qquad \forall {Q_1},{Q_2}\in B(P,\epsilon ), \end{aligned}$$

    where \(\beta \) is as defined in Lemma 2.5.

 

Proof

Part (i) follows from the Slater condition, compactness of \(X\) and the continuity of \(f\).

Part (ii). Let \(t\in T\) be fixed. It follows by virtue of [35, Propositions 3 and 4] that the function \((x,Q)\mapsto \mathbb{E }_Q[H(x,t,\xi )]\) is lower semicontinuous on \(X\times ({\fancyscript{P}}_{\fancyscript{G}}(\Omega ), {\fancyscript{D}})\). Let \(Q_N\rightarrow Q\), \(x^N\in \mathcal{F}(Q_N)\) and \(x^N\rightarrow x^*\). By Fatou’s lemma,

$$\begin{aligned} \mathbb{E }_Q[H(x^*,t,\xi )]\le \liminf _{N\rightarrow \infty }\mathbb{E }_{Q_N}[H(x^N,t,\xi )]\le 0,\quad \forall t\in T, \end{aligned}$$

which implies that \(x^*\in \mathcal{F}(Q)\).

Part (iii). Let \(\epsilon \) be given by Lemma 2.5 and \({Q_1}, {Q_2}\in B(P,\epsilon )\). Observe that for any \(x\in \mathcal{F}({Q_1})\), \((\mathbb{E }_{Q_1}[H(x,t,\xi )])_{+}=0\), for all \(t\in T\). By Lemma 2.5, there exists a positive constant \(\beta \) such that for any \(x\in \mathcal{F}({Q_1})\)

$$\begin{aligned} d(x,\mathcal{F}({Q_2}))&\le \beta \left\Vert\left(\mathbb{E }_{Q_2}[H(x,t,\xi )]\right)_{+}\right\Vert_{\infty }\\&= \beta \left(\max _{t\in T}\left(\mathbb{E }_{Q_2}[H(x,t,\xi )]\right)_{+}-\max _{t\in T}\left(\mathbb{E }_{Q_1}[H(x,t,\xi )]\right)_{+}\right)\\&\le \beta \max _{t\in T}\left((\mathbb{E }_{Q_2}[H(x,t,\xi )])_{+}-(\mathbb{E }_{Q_1}[H(x,t,\xi )])_{+}\right)\\&\le \beta \max _{t\in T}\left|\mathbb{E }_{Q_2}[H(x,t,\xi )]-\mathbb{E }_{Q_1}[H(x,t,\xi )]\right|\\&\le \beta \max _{(x,t)\in X\times T}\left|\mathbb{E }_{Q_2}[H(x,t,\xi )]-\mathbb{E }_{Q_1}[H(x,t,\xi )]\right|\\&\le \beta {\fancyscript{D}}({Q_1},{Q_2}), \end{aligned}$$

which implies \(\mathbb D (\mathcal{F}({Q_1}),\mathcal{F}({Q_2}))\le \beta {\fancyscript{D}}({Q_1},{Q_2})\). In the same manner, we can show that for any \(x\in \mathcal{F}({Q_2})\),

$$\begin{aligned} d(x,\mathcal{F}({Q_1}))&\le \beta \left(\max _{t\in T}\left(\mathbb{E }_{Q_1}[H(x,t,\xi )]\right)_{+}-\max _{t\in T}\left(\mathbb{E }_{Q_2}[H(x,t,\xi )]\right)_{+}\right)\\&\le \beta {\fancyscript{D}}({Q_2},{Q_1}), \end{aligned}$$

which yields \(\mathbb D (\mathcal{F}({Q_2}),\mathcal{F}({Q_1}))\,\,{\le }\,\, \beta {\fancyscript{D}}({Q_1},{Q_2}).\) Summarizing the discussions above, we have

$$\begin{aligned} \mathbb H \left(\mathcal{F}({Q_1}),\mathcal{F}({Q_2})\right)&= \max \left\{ \mathbb D \left(\mathcal{F}({Q_1}),\mathcal{F}({Q_2})\right) ,\mathbb D \left(\mathcal{F}({Q_2}),\mathcal{F}({Q_1})\right)\right\} \\&\le \beta {\fancyscript{D}}({Q_1},{Q_2}). \end{aligned}$$

The proof is complete. \(\square \)

Recall that a set-valued mapping \(\Gamma :\mathbb R ^m \rightrightarrows \mathbb R ^n\) is said to be upper semi-continuous at \(y\) in the sense of Berge if for any \(\epsilon >0\), there exists a number \(\delta >0\) such that

$$\begin{aligned} \Gamma (y^{\prime })\subseteq \Gamma (y)+\epsilon \mathcal{B}, \quad \forall y^{\prime }\in y+\delta \mathcal{B}, \end{aligned}$$

where \(\mathcal{B}\) denotes the closed unit ball in the respective space. It is said to be Lipschitz continuous near \(y\) if there exists a constant \(L\) such that

$$\begin{aligned} \mathbb H (\Gamma (y^{\prime }), \Gamma (y^{\prime \prime }))\le L\Vert y^{\prime }-y^{\prime \prime }\Vert , \quad \forall y^{\prime },y^{\prime \prime }\in y+\delta \mathcal{B}. \end{aligned}$$

See [34, page 368].

Proposition 2.6 (iii) says that the feasible set mapping of problem (6) is Lipschitz continuous with respect to probability measure over set \(B(P,\epsilon )\). Using this property, we are ready to establish our main stability results.

Theorem 2.7

Assume that problem (5) satisfies the Slater condition. Assume also that the Lipschitz modulus of \(f(x,\xi )\) w.r.t. \(x\) is bounded by an integrable function \(\kappa (\xi )>0\). Then

  1. (i)

    there exists \(\epsilon ^{\prime }>0\) such that the optimal solution set of problem (6), denoted by \(S_{opt}(Q)\), is not empty for \( Q\in B(P,\epsilon ^{\prime })\);

  2. (ii)

    \(S_{opt}(\cdot )\) is upper semi-continuous at point \(P\) in the sense of Berge;

  3. (iii)

    there exist positive numbers \(\epsilon ^*\) and \(L^*\) such that the optimal value function of problem (6), denoted by \({\vartheta }(Q)\), is continuous at point \(P\) and satisfies the following Lipschitz-like (calmness) estimate:

    $$\begin{aligned} |{\vartheta }(Q)-{\vartheta }(P)|\le L^* {\fancyscript{D}}(Q,P),\qquad \forall Q\in B(P,\epsilon ^*). \end{aligned}$$

 

Proof

Under the Slater condition, it follows from Proposition 2.6 that there exists a positive number \(\epsilon \) such that the feasible set mapping \(\mathcal{F}(\cdot )\) is Lipschitz continuous on \( B(P,\epsilon )\). The rest follows straightforwardly from [21, Theorem 1] ([29, Theorem 2.3] or [7, Theorem 2.1] in stochastic programming). The proof is complete. \(\square \)

Theorem 2.7 says that the optimal solution set mapping \(S_{opt}(\cdot )\) is nonempty near \(P\) and upper semi-continuous at \(P\). In order to quantify the upper semi-continuity of \(S_{opt}(\cdot )\), we need a growth condition on the objective function of problem (5) in a neighborhood of \(S_{opt}(P)\). Instead of imposing a specific growth condition, here we consider a general growth function

$$\begin{aligned} \Psi (\nu ):=\min \{\mathbb{E }_P[f(x,\xi )]-s^*: d(x, S_{opt}(P))\ge \nu , x\in X\} \end{aligned}$$
(8)

of problem (5), and the associated function

$$\begin{aligned} \widetilde{\Psi }(v):=v+\Psi ^{-1}(2v), \end{aligned}$$

where \(s^*\) denotes the optimal value of problem (5) and \(\nu ,v\in \mathbb R _+\). This kind of growth function is well known, see for instance [29, 34]. The following corollary quantifies the upper semicontinuity of \(S_{opt}(\cdot )\) near \(P\).

Corollary 2.8

Let the assumptions of Theorem 2.7 hold. Then there exist positive constants \(L\) and \(\epsilon \) such that

$$\begin{aligned} \emptyset \ne S_{opt}(Q)\subseteq S_{opt}(P)+\widetilde{\Psi }\left(L{\fancyscript{D}}(Q,P)\right){\mathcal{B}}, \end{aligned}$$

for any \(Q\in B(P,\epsilon )\), where \(\mathcal{B}\) denotes the closed unit ball.

We omit the proof as it is similar to that of [29, Theorem 2.4]. See also [34, Theorem 7.64] for earlier discussions about functions \(\Psi (\cdot )\) and \(\widetilde{\Psi }(\cdot )\). Discussions on a particular form of \(\widetilde{\Psi }\) can be found in [4, 39] when the growth is of second order. We will come back to this in Lemma 3.8.

3 Empirical probability measure

In this section, we consider a special case when the probability measure \(P\) is approximated by a sequence of empirical measures \(P_N\) defined as

$$\begin{aligned} P_N := \frac{1}{N}\sum \limits _{k=1}^N {\small \mathrm{1\!\!\!I}}_{\xi ^k}(\omega ), \end{aligned}$$

where \(\xi ^1, \ldots , \xi ^N\) is an independent and identically distributed sampling of \(\xi \) and

$$\begin{aligned} {\small \mathrm{1\!\!\!I}}_{\xi ^k}(\omega ) := \left\{ \begin{array}{l@{\quad }l} 1,&\text{ if}\; \xi (\omega ) = \xi ^k,\\ 0,&\text{ if} \; \xi (\omega ) \ne \xi ^k. \end{array} \right. \end{aligned}$$

In this case

$$\begin{aligned} \mathbb{E }_{P_N}[f(x,\xi )] = \frac{1}{N} \sum \limits _{k=1}^N f(x,\xi ^k) \end{aligned}$$

and

$$\begin{aligned} \mathbb{E }_{P_N}[H(x,t,\xi )]=\frac{1}{N} \sum \limits _{k=1}^N H(x,t,\xi ^k). \end{aligned}$$

It follows from the classical law of large numbers in statistics that \(\mathbb{E }_{P_N}[f(x,\xi )]\) and \(\mathbb{E }_{P_N}[H(x,t,\xi )]\) converge to \(\mathbb{E }_P[f(x,\xi )]\) and \(\mathbb{E }_P[H(x,t,\xi )]\) respectively as \(N\) increases. This kind of approximation is well known in stochastic programming under various names such as sample average approximation, the Monte Carlo method, sample path optimization, stochastic counterpart, etc.; see [16, 33, 41, 43] and the references therein.

For the simplicity of notation, we use \(f_N(x)\) and \(H_N(x,t)\) to denote \(\mathbb{E }_{P_N}[f(x,\xi )]\) and \(\mathbb{E }_{P_N}[H(x,t,\xi )]\). Consequently we consider the following approximation of problem (5):

$$\begin{aligned} \begin{array}{ll} \displaystyle \min \limits _{x}&\displaystyle f_N(x):=\frac{1}{N}\sum \limits _{k=1}^{N}f(x,\xi ^k)\\ \displaystyle \text{ s.t.}&\displaystyle H_N(x,t):=\frac{1}{N}\sum \limits _{k=1}^{N}H(x,t,\xi ^k)\le 0, \quad \forall t\in T,\\&\displaystyle x\in X. \end{array} \end{aligned}$$
(9)

We call (9) the SAA problem and (5) the true problem.
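
As a concrete illustration of the SAA construction, the sketch below forms \(f_N\) and \(H_N\) from an i.i.d. sample for a toy specification of \(f\), \(G\) and \(Y\) (all chosen here purely for illustration, not taken from the paper) and evaluates them at a point.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative specification (an assumption): scalar x and xi,
#   f(x, xi) = (x + 1.5 - 0.1*xi)**2,  G(x, xi) = x + xi (concave in x),  Y(xi) = xi - 1.
def f(x, xi):
    return (x + 1.5 - 0.1 * xi) ** 2

def H(x, t, xi):
    return np.maximum(t - (x + xi), 0.0) - np.maximum(t - (xi - 1.0), 0.0)

xi_sample = rng.normal(1.0, 1.0, size=1000)   # xi^1, ..., xi^N

def f_N(x):
    """Sample average f_N(x) = (1/N) sum_k f(x, xi^k)."""
    return f(x, xi_sample).mean()

def H_N(x, t):
    """Sample average H_N(x, t) = (1/N) sum_k H(x, t, xi^k)."""
    return H(x, t, xi_sample).mean()

x0, t0 = 0.5, 1.0
print("f_N(x0)     =", f_N(x0))
print("H_N(x0, t0) =", H_N(x0, t0))
```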

Assuming that we can obtain an optimal solution, denoted by \(x^N\), by solving the SAA problem, we analyze the convergence of \(x^N\) as the sample size increases. The analysis would be very complicated if it were carried out on (9) directly because the constraints of the SAA problem depend on the sampling. To get around this difficulty, as well as the infinite number of constraints, we consider a reformulation of both the true and the SAA problem through exact penalization of the stochastic constraints in the objective. In doing so, the feasible sets of the penalized problems are deterministic and we only need to analyze the convergence of the objective functions.

For the simplicity of notation, let

$$\begin{aligned} h(x,t):= \max \left\{ \mathbb{E }_P[H(x,t,\xi )],0\right\} , \qquad \theta (x):=\max _{t\in T}h(x,t). \end{aligned}$$
(10)

It is easy to observe that

$$\begin{aligned} h(x,t) = (\mathbb{E }_P[H(x,t,\xi )])_+ \; \text{ and} \; \theta (x) = \Vert (\mathbb{E }_P[H(x,t,\xi )])_+\Vert _{\infty }. \end{aligned}$$

Consider the exact penalization:

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\psi (x,\rho ):=\mathbb{E }_P[f(x,\xi )]+\rho \theta (x)\\ \text{ s.t.}&x\in X, \end{array} \end{aligned}$$
(11)

where \( \rho > 0\) is a penalty parameter. This kind of penalization is well documented in the literature, see for instance [28, 42]. In what follows, we establish the equivalence between (5) and (11) in the sense of optimal solutions. We do so by exploiting the error bound established in Lemma 2.5 and a well-known result by Clarke [5, Proposition 2.4.3]. We need the following assumptions.

Assumption 3.1

\(f(x,\xi )\) and \(G(x,\xi )\) are locally Lipschitz continuous w.r.t. \(x\) and their Lipschitz modulus are bounded by an integrable function \(\kappa (\xi )>0\).

 

Theorem 3.2

Assume that the true problem (5) satisfies the Slater condition. Under Assumption 3.1, there exists a positive number \(\bar{\rho }\) such that for any \(\rho >\bar{\rho }\), the sets of optimal solutions of problems (5) and (11), denoted by \(S_{opt}\) and \(X_{opt}\) respectively, coincide.

 

Proof

Under the Slater condition, it follows by Lemma 2.5 that there exists a constant \(\beta >0\) such that

$$\begin{aligned} d(x, \mathcal{F}(P))\le \beta \left\Vert(\mathbb{E }_P[H(x,t,\xi )])_{+}\right\Vert_{\infty } =\beta \theta (x), \quad \forall x\in X. \end{aligned}$$

Let \(C:=\mathbb{E }_P[\kappa (\xi )]\). Under Assumption 3.1, \(C<\infty \) and the Lipschitz modulus of \(\mathbb{E }_P[f(x,\xi )]\) is bounded by \(C\). Let \(\rho \) be a positive constant such that \(\rho >\beta C\). Clarke’s exact penalty function theorem [5, Proposition 2.4.3] ensures that the two optimal solution sets, \(S_{opt}\) and \(X_{opt}\), coincide. This shows the existence of a positive constant \(\bar{\rho }:=\beta C\). The proof is complete. \(\square \)

We now move on to discuss the exact penalization of the SAA problem (9). Let

$$\begin{aligned} h_N(x,t):= (H_N(x,t))_+ = \max \left\{ H_N(x,t),0\right\} \end{aligned}$$

and

$$\begin{aligned} \theta _N(x):=\max _{t\in T}h_N(x,t) =\Vert (H_N(x,t))_+\Vert _{\infty }. \end{aligned}$$
(12)

Consider the SAA penalty problem

$$\begin{aligned} \begin{array}{ll} \min \limits _{x}&\psi _N(x,\rho _N):=f_N(x)+\rho _N\theta _N(x)\\ \text{ s.t.}&x\in X, \end{array} \end{aligned}$$
(13)

where \(\rho _N > 0\) is a penalty parameter.
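
To see how the pieces fit together numerically, the sketch below (using the same illustrative specification as in the earlier sketches, an assumption rather than the paper’s model) discretizes \(T\), forms \(\theta _N\) and the penalized objective \(\psi _N(x,\rho _N)\) of (13), and minimizes it over a grid of \(X\). In this toy instance the dominance constraint essentially amounts to \(x\ge -1\), while the unpenalized objective prefers \(x\approx -1.4\); for a sufficiently large penalty parameter the computed minimizer therefore sits near the boundary \(x\approx -1\), illustrating the exactness of the penalty.

```python
import numpy as np

rng = np.random.default_rng(3)

# Same illustrative specification as in the previous sketch (an assumption):
#   f(x, xi) = (x + 1.5 - 0.1*xi)**2,  G(x, xi) = x + xi,  Y(xi) = xi - 1.
def f(x, xi):
    return (x + 1.5 - 0.1 * xi) ** 2

def H(x, t, xi):
    return np.maximum(t - (x + xi), 0.0) - np.maximum(t - (xi - 1.0), 0.0)

xi_sample = rng.normal(1.0, 1.0, size=2000)
T_grid = np.linspace(-3.0, 3.0, 61)    # discretization of the compact set T
X_grid = np.linspace(-2.0, 2.0, 201)   # discretization of the compact set X
rho_N = 10.0                           # penalty parameter (illustrative value)

def theta_N(x):
    """theta_N(x) = max_{t in T} (H_N(x, t))_+, approximated on the grid."""
    return max(max(H(x, t, xi_sample).mean(), 0.0) for t in T_grid)

def psi_N(x):
    """Penalized SAA objective psi_N(x, rho_N) = f_N(x) + rho_N * theta_N(x)."""
    return f(x, xi_sample).mean() + rho_N * theta_N(x)

x_best = min(X_grid, key=psi_N)
print("approximate minimizer of the penalized SAA problem:", round(float(x_best), 3))
print("psi_N(x_best) =", psi_N(x_best), "  theta_N(x_best) =", theta_N(x_best))
```

The grid search over \(X\) and the grid over \(T\) are only meant to keep the illustration self-contained; in practice one would use the discretization or cutting plane schemes cited earlier.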

Under Assumption 3.1, we know from [37, Section 6, Proposition 7] that the sample average \(H_N(x,t)\) converges to \(\mathbb{E }_P[H(x,t,\xi )]\) uniformly over compact set \(X\times T\) almost surely. Since the true problem (5) satisfies the Slater condition, there exists a sufficiently large \(N^*\) (depending on \(\omega \)) such that for \(N\ge N^*\)

$$\begin{aligned} H_N(\bar{x},t) \le -\bar{\delta }/2, \quad \forall t\in T, \end{aligned}$$

where \(\bar{x}\) and \(\bar{\delta }\) are given in Definition 2.3. Subsequently, by Lemma 2.5, we obtain that for any \(N\ge N^*\),

$$\begin{aligned} d(x,\mathcal{F}_N)\le \beta \left\Vert(H_N( x,t))_{+}\right\Vert_{\infty } =\beta \theta _N(x), \quad \forall x\in X, \end{aligned}$$
(14)

where \(\mathcal{F}_N\) denotes the feasible set of problem (9).

Proposition 3.3

Assume that the true problem (5) satisfies the Slater condition. Then there exist positive numbers \(\rho ^*\) and \(N^*\) (depending on \(\omega \)) such that for \(\rho >\rho ^*\) and \(N\ge N^*\), the sets of optimal solutions of problems (9) and (13), denoted by \(S_{opt}^N\) and \(X_{opt}^N\) respectively, coincide.

 

Proof

Following the discussions above, there exist a positive constant \(\beta \) and a sufficiently large positive integer \(N_1\) (depending on \(\omega \)) such that (14) holds for any \(N\ge N_1\). Let \(C_N\) denote the Lipschitz modulus of the function \(f_N(x)\). By [5, Proposition 2.4.3], for any \(\rho >\beta C_N\), the two optimal solution sets, \(S_{opt}^N\) and \(X_{opt}^N\), coincide. Moreover, under Assumption 3.1, \(C_N\) converges to the Lipschitz modulus of \(\mathbb{E }_P[f(x,\xi )]\), which is bounded by \(C:=\mathbb{E }_P[\kappa (\xi )]\). This implies that there exists a positive integer \(N_2\ge N_1\) such that \(C_N< C+1\) for \(N\ge N_2\), that is, \(C_N\) is bounded almost surely. The conclusion follows by taking \(\rho ^*= \beta (C+1)\) and \(N^*=\max \{N_1, N_2\}.\) \(\square \)

3.1 Optimal solution

Assuming that, for every fixed sampling, we can obtain an optimal solution, denoted by \(x^N\), from solving the SAA problem (9), we analyze the convergence of \(x^N\) as the sample size \(N\) increases. We do so by establishing uniform convergence of the objective function of problem (13) to the objective function of problem (11). Asymptotic convergence analysis of optimal values and optimal solutions is well known in stochastic programming. Our analysis differs from those in the literature in that it is carried out through exact penalization.

Proposition 3.4

Let Assumption 3.1 hold. Then

  1. (i)

    \(\psi (x,\rho )\) and \(\psi _N(x,\rho _N)\), \(N=1,2, \ldots ,\) are Lipschitz continuous;

  2. (ii)

    if \(\rho _N \rightarrow \rho \) as \(N\rightarrow \infty \), then \(\psi _N(x,\rho _N)\) converges to \(\psi (x,\rho )\) with probability 1 uniformly over \(X\).

 

Proof

Part (i). Under Assumption 3.1, \(\mathbb{E }_P[H(x,t,\xi )]\) and \(H_N(x,t)\) are Lipschitz continuous with respect to \((x,t)\). Since \(T\) is a compact set, by [27, Theorem 3.1], \(\theta (x)\) and \(\theta _N(x)\) are Lipschitz continuous. Together with the Lipschitz continuity of \(\mathbb{E }_P[f(x,\xi )]\) and \(f_N(x)\), we conclude that \(\psi (x,\rho )\) and \(\psi _N(x,\rho _N)\) are Lipschitz continuous.

Part (ii). By Assumption 3.1 and the compactness of \(X\), it is not difficult to show that \(f(x,\xi )\) and \(H(x,t,\xi )\) are dominated by an integrable function. The uniform convergence of \(f_N(x)\) to \(\mathbb{E }_P[f(x,\xi )]\) and \(H_N(x,t)\) to \(\mathbb{E }_P[H(x,t,\xi )]\) follows from classical uniform law of large numbers for random functions, see e.g. [37, Section 6, Proposition 7]. Since \(\rho _N\rightarrow \rho \), it suffices to show the uniform convergence of \(\theta _N(x)\) to \(\theta (x)\). By definition,

$$\begin{aligned} \max _{x\in X}\left|\theta _N(x)-\theta (x)\right|&= \max _{x\in X}\left|\max _{t\in T}(\max \{H_N(x,t),0\})-\max _{t\in T}(\max \{\mathbb{E }_P[H(x,t,\xi )],0\})\right|\nonumber \\&\le \max _{(x,t)\in X\times T} \left|\max \{H_N(x,t),0\}-\max \{\mathbb{E }_P[H(x,t,\xi )],0\}\right|\nonumber \\&\le \max _{(x,t)\in X\times T} \left|H_N(x,t)-\mathbb{E }_P[H(x,t,\xi )]\right|. \end{aligned}$$
(15)

This along with the uniform convergence of \(H_N(x,t)\) to \(\mathbb{E }_P[H(x,t,\xi )]\) over \(X\times T\) gives rise to the assertion. The proof is complete. \(\square \)

 

Assumption 3.5

Let \(f(x,\xi )\) and \(H(x,t,\xi )\) be defined as in (5). The following hold.

  1. (a)

    for every \(x\in X\), the moment generating function

    $$\begin{aligned} M_x(\tau ):=\mathbb{E }_P\left[e^{\tau \left(f(x,\xi ) - \mathbb{E }_P[f(x,\xi )]\right)} \right] \end{aligned}$$

    of random variable \(f(x,\xi ) - \mathbb{E }_P[f(x,\xi )]\) is finite valued for all \(\tau \) in a neighborhood of zero;

  2. (b)

    for every \((x,t)\in X\times T\), the moment generating function

    $$\begin{aligned} M_{(x,t)}(\tau ):=\mathbb{E }_P\left[e^{\tau \left(H(x,t,\xi ) - \mathbb{E }_P[H(x,t,\xi )]\right)} \right] \end{aligned}$$

    of random variable \(H(x,t,\xi ) - \mathbb{E }_P[H(x,t,\xi )]\) is finite valued for all \(\tau \) in a neighborhood of zero;

  3. (c)

    let \(\kappa (\xi )\) be given as in Assumption 3.1. The moment generating function \(M_{\kappa }(\tau )\) of \(\kappa (\xi )\) is finite valued for all \(\tau \) in a neighborhood of \(0\).

Assumption 3.5 (a) and (b) mean that the random variables \(f(x,\xi ) - \mathbb{E }_P[f(x,\xi )]\) and \(H(x,t,\xi ) - \mathbb{E }_P[H(x,t,\xi )]\) do not have heavy tailed distributions. In particular, they hold if the random variable \(\xi \) has a bounded support set. Note that under Assumption 3.1, the Lipschitz modulus of \(H(x,t,\xi )\) is bounded by \(1+\kappa (\xi )\). Assumption 3.5 (c) implies that the moment generating function of \(1+\kappa (\xi )\) is finite valued for \(\tau \) close to \(0\) because \(\mathbb{E }_P[e^{(1+\kappa (\xi ))\tau }]=e^{\tau }\mathbb{E }_P[e^{\kappa (\xi )\tau }]=e^{\tau }M_\kappa (\tau )\).

Proposition 3.6

Let Assumptions 3.1 and 3.5 hold and let \(\rho _N\rightarrow \rho \). Then \(\psi _N(x,\rho _N)\) converges to \(\psi (x,\rho )\) with probability approaching \(1\) at an exponential rate, that is, for any \(\alpha >0\), there exist positive constants \(C(\alpha )\) and \(K(\alpha )\), independent of \(N\), such that

$$\begin{aligned} \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \psi _N(x,\rho _N)- \psi (x,\rho )\right| \ge \alpha \right\} \le C(\alpha )e^{-NK(\alpha )} \end{aligned}$$

for \(N\) sufficiently large.

 

Proof

By definition

$$\begin{aligned}&\mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \psi _N(x,\rho _N)- \psi (x,\rho )\right| \ge \alpha \right\} \\&\quad = \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| f_N(x)+\rho _N\theta _N(x)-(\mathbb{E }_P[f(x,\xi )]+\rho \theta (x))\right| \ge \alpha \right\} \\&\quad \le \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| f_N(x)-\mathbb{E }_P[f(x,\xi )]\right| \ge \alpha /2 \right\} \!+\!\mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \rho _N\theta _N(x)-\rho \theta (x)\right| \ge \alpha /2 \right\} . \end{aligned}$$

Under Assumption 3.5, it follows from [41, Theorem 5.1] that the first term on the right hand side of the inequality above converges to zero at an exponential rate. In the same manner, we can obtain uniform exponential convergence of \(H_N(x,t)\) to \(\mathbb{E }_P[H(x,t,\xi )]\) and hence \(\theta _N(x)\) to \(\theta (x)\) taking into account that \(\rho _N\rightarrow \rho \). The proof is complete. \(\square \)
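
As a crude empirical illustration of this kind of tail bound (restricted, for speed, to the first term \(\sup _{x\in X}|f_N(x)-\mathbb{E }_P[f(x,\xi )]|\) and using the same toy \(f\) as in the earlier sketches, an assumption rather than the paper’s model), one can estimate the deviation probability by replication for increasing \(N\) and watch it decay.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy specification (an assumption): f(x, xi) = (x + 1.5 - 0.1*xi)**2 with xi ~ N(1, 1),
# so E_P[f(x, xi)] = (x + 1.4)**2 + 0.01 in closed form.
X_grid = np.linspace(-2.0, 2.0, 81)
true_Ef = (X_grid + 1.4) ** 2 + 0.01

alpha, n_reps = 0.1, 200

for N in (25, 100, 400, 1600):
    exceed = 0
    for _ in range(n_reps):
        xi = rng.normal(1.0, 1.0, size=N)
        # f_N evaluated on the whole x-grid for this replication
        f_N = ((X_grid[:, None] + 1.5 - 0.1 * xi[None, :]) ** 2).mean(axis=1)
        if np.max(np.abs(f_N - true_Ef)) >= alpha:
            exceed += 1
    print(f"N = {N:5d}   estimated Prob{{sup |f_N - Ef| >= {alpha}}} ~ {exceed / n_reps:.3f}")
```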

 

Remark 3.7

Similar to the discussions in [41], we may estimate the sample size. To see this, let us strengthen the conditions in Assumption 3.5 (a) and (b) to the following:

  • There exists a constant \(\varrho >0\) such that for every \(x\in X\),

    $$\begin{aligned} \mathbb{E }_P\left[e^{\tau \left(f(x,\xi )- \mathbb{E }_P[f(x,\xi )]\right)} \right] \le e^{\varrho ^2\tau ^2/2},\quad \forall \tau \in \mathbb R \end{aligned}$$
    (16)

    and for every \((x,t)\in X\times T\),

    $$\begin{aligned} \mathbb{E }_P\left[e^{\tau \left(H(x,t,\xi )- \mathbb{E }_P[H(x,t,\xi )]\right)} \right] \le e^{\varrho ^2\tau ^2/2}, \quad \forall \tau \in \mathbb R . \end{aligned}$$
    (17)

Note that equality holds in (16) and (17) if the random variables \(f(x,\xi )-\mathbb{E }_P[f(x,\xi )]\) and \(H(x,t,\xi )- \mathbb{E }_P[H(x,t,\xi )]\) follow a normal distribution with variance \(\varrho ^2\); see the discussion in [41, page 410]. Let \(\alpha _1\) be a small positive number and \(\beta _1\in (0,1)\). It follows from (5.14) and (5.15) in [41] that for

$$\begin{aligned} N\ge N_1(\alpha _1,\beta _1):=\frac{O(1)\varrho ^2}{\alpha _1^2}\left[n\log \left(\frac{O(1)D_1\mathbb{E }_P[\kappa _1(\xi )]}{\alpha _1}\right) + \log \left(\frac{1}{\beta _1}\right)\right],\qquad \end{aligned}$$
(18)

where \(O(1)\) is a generic constant, we have that

$$\begin{aligned} \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| f_N(x)-\mathbb{E }_P[f(x,\xi )]\right| \ge \alpha _1 \right\} \le \beta _1, \end{aligned}$$
(19)

where \(\kappa _1(\xi )\) is the global Lipschitz modulus of \(f(\cdot ,\xi )\) over \(X\), \(D_1 := \sup _{x^{\prime },x^{\prime \prime }\in X} \Vert x^{\prime }-x^{\prime \prime }\Vert \). Likewise, for given positive numbers \(\alpha _2\) and \(\beta _2\in (0,1)\), when

$$\begin{aligned} N\ge N_2(\alpha _2,\beta _2) := \frac{O(1)\varrho ^2}{\alpha _2^2} \left[n\log \left(\frac{O(1)D_2\mathbb{E }_P[\kappa _2(\xi )]}{\alpha _2}\right) + \log \left(\frac{1}{\beta _2}\right)\right],\qquad \end{aligned}$$
(20)

we have

$$\begin{aligned} \mathrm{{Prob}}\left\{ \max _{(x,t)\in X\times T} \left|H_N(x,t)-\mathbb{E }_P[H(x,t,\xi )]\right| \ge \alpha _2 \right\} \le \beta _2, \end{aligned}$$
(21)

where \(\kappa _2(\xi )\) is the global Lipschitz modulus of \(H(\cdot ,\cdot ,\xi )\) over \(X\times T\),

$$\begin{aligned} D_2 := \sup _{w^{\prime },w\in X\times T} \Vert w^{\prime }-w\Vert \le D_1+ \sup _{t^{\prime },t^{\prime \prime }\in T} \Vert t^{\prime }-t^{\prime \prime }\Vert . \end{aligned}$$

Let \(\alpha >0\) be a positive number and \(\beta \in (0,1)\). Observe that

$$\begin{aligned}&\!\!\!\mathrm{{Prob}}\left\{ \max _{x\in X} \left|\psi _N(x,\rho _N)\!-\!\psi (x,\rho )\right| \!\ge \! \alpha \right\} \le \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| f_N(x)\!-\!\mathbb{E }_P[f(x,\xi )]\right|\!\ge \!\alpha /2 \right\} \nonumber \\&\quad + \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \rho _N\theta _N(x)\!-\!\rho \theta (x)\right| \ge \alpha /2 \right\} . \end{aligned}$$
(22)

Let \(N_3\) be sufficiently large such that \(\rho _N\le 2\rho \) and

$$\begin{aligned} (\rho _N-\rho )\sup _{x\in X} |\theta (x)| \le \frac{\alpha }{4}. \end{aligned}$$

Then it is easy to verify that for \(N\ge N_3\)

$$\begin{aligned}&\mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \rho _N\theta _N(x)-\rho \theta (x)\right| \ge \alpha /2 \right\} \nonumber \\&\quad \le \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| \theta _N(x)-\theta (x)\right| \ge \frac{\alpha }{8\rho }\right\} \nonumber \\&\quad \le \mathrm{{Prob}}\left\{ \max _{(x,t)\in X\times T} \left|H_N(x,t)-\mathbb{E }_P[H(x,t,\xi )]\right| \ge \frac{\alpha }{8\rho } \right\} . \end{aligned}$$
(23)

The last inequality is due to (15). Let

$$\begin{aligned} N(\alpha ,\beta ) := \max \left\{ N_1\left(\frac{\alpha }{2},\beta _1\right), N_2\left(\frac{\alpha }{8\rho },\beta _2\right),N_3\right\} , \end{aligned}$$
(24)

where \(\beta _1,\beta _2\in (0,1)\) and \(\beta _1+\beta _2=\beta \). Combining (19), (21), (22) and (23), we have for \(N\ge N(\alpha ,\beta )\)

$$\begin{aligned}&\mathrm{{Prob}}\left\{ \max _{x\in X} \left|\psi _N(x,\rho _N)-\psi (x,\rho )\right| \ge \alpha \right\} \\&\quad \le \mathrm{{Prob}}\left\{ \sup _{x\in X} \left| f_N(x)-\mathbb{E }_P[f(x,\xi )]\right| \ge \alpha /2 \right\} \nonumber \\&\qquad +\, \mathrm{{Prob}}\left\{ \max _{(x,t)\in X\times T} \left|H_N(x,t)-\mathbb{E }_P[H(x,t,\xi )]\right| \ge \frac{\alpha }{8\rho } \right\} \nonumber \\&\quad \le \beta _1 + \beta _2\\&\quad = \beta . \end{aligned}$$

The discussion above shows that for given \(\alpha \) and \(\beta \), we can obtain sample size \(N(\alpha ,\beta )\) such that when \(N\ge N(\alpha ,\beta )\)

$$\begin{aligned} \mathrm{{Prob}}\left\{ \max _{x\in X} \left|\psi _N(x,\rho _N)-\psi (x,\rho )\right| \ge \alpha \right\} \le \beta . \end{aligned}$$
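
As a rough back-of-the-envelope illustration of (18), (20) and (24) (with the generic \(O(1)\) constants simply set to 1 and all other inputs chosen arbitrarily; none of these values comes from the paper), the bounds can be evaluated as follows.

```python
import math

def sample_size(alpha, beta, varrho, n, diam, lip_mean, big_o=1.0):
    # Generic bound of the form (18)/(20):
    #   N >= (O(1)*varrho^2/alpha^2) * [ n*log(O(1)*D*E[kappa]/alpha) + log(1/beta) ]
    return (big_o * varrho ** 2 / alpha ** 2) * (
        n * math.log(big_o * diam * lip_mean / alpha) + math.log(1.0 / beta)
    )

# Illustrative inputs (assumptions): alpha = 0.1 split as in (24), beta = 0.05 split as
# beta_1 = beta_2 = 0.025, varrho = 1, dimension n = 10, D_1 = D_2 = 5, E[kappa_i] = 2, rho = 10.
alpha, rho = 0.1, 10.0
N1 = sample_size(alpha / 2, 0.025, varrho=1.0, n=10, diam=5.0, lip_mean=2.0)
N2 = sample_size(alpha / (8 * rho), 0.025, varrho=1.0, n=10, diam=5.0, lip_mean=2.0)

# N(alpha, beta) in (24) is the maximum of N1, N2 and the deterministic index N3
# (ignored here for the illustration).
print("N1 ~", round(N1), "  N2 ~", round(N2), "  N(alpha, beta) ~", round(max(N1, N2)))
```
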

In what follows, we translate the uniform exponential convergence established in Proposition 3.6 into that of optimal solutions. We need the following intermediate stability result.

Lemma 3.8

Let \(\phi :\mathbb R ^m \rightarrow \mathbb R \) be a continuous function, let \(X \subseteq \mathbb R ^m\) be a closed set, and let \(\varphi :\mathbb R ^m \rightarrow \mathbb R \) be a continuous perturbation of \(\phi \). Consider the following constrained minimization problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min&\phi (x) \\ \text{ s.t.}\,&x\in X, \end{array} \end{aligned}$$
(25)

and its perturbation

$$\begin{aligned} \begin{array}{l@{\quad }l} \displaystyle \min&\varphi (x) \\ \text{ s.t.}\,&x\in X. \end{array} \end{aligned}$$
(26)

Let \(X^*_{\phi }\) and \(X^*_{\varphi }\) denote the set of optimal solutions to (25) and (26) respectively. Then

  1. (i)

    for any \(\epsilon >0\), there exists a \(\delta >0\) (depending on \(\epsilon \)) such that

    $$\begin{aligned} \mathbb D (X^*_{\varphi },X^*_{\phi }) \le \epsilon , \end{aligned}$$
    (27)

    when

    $$\begin{aligned} \sup _{x\in X}|\varphi (x)-\phi (x)|\le \delta ; \end{aligned}$$
  2. (ii)

    if, in addition, there exists a positive constant \(\varsigma \) such that

    $$\begin{aligned} \phi (x) \ge \min _{x\in X} \phi (x) +\varsigma d(x,X_\phi ^*)^2, \quad \forall x\in X, \end{aligned}$$
    (28)

    then

    $$\begin{aligned} \mathbb D (X^*_{\varphi },X^*_{\phi }) \le \sqrt{\frac{3}{\varsigma }\sup _{x\in X}|\varphi (x)-\phi (x)|}. \end{aligned}$$
    (29)

 

Proof

The results are a minor extension of [6, Lemma 3.2], which deals with the case when \(X^*_{\phi }\) is a singleton, and are also similar to [34, Theorem 7.64]. Here we provide a proof for completeness.

Part (i). Let \(\epsilon \) be a fixed small positive number and \(\phi ^*\) the optimal value of (25). Define

$$\begin{aligned} R(\epsilon ) :=\inf _{\{x\in X, d(x,X^*_{\phi })\ge \epsilon \}}\phi (x)-\phi ^*. \end{aligned}$$
(30)

Then \(R(\epsilon )>0\). Let \(\delta :=R(\epsilon )/3\) and \(\varphi \) be such that \(\sup _{x\in X} |\varphi (x)-\phi (x)|\le \delta \). Then for any \(x\in X\) with \(d(x,X^*_{\phi })\ge \epsilon \) and for any fixed \(x^*\in X^*_{\phi }\),

$$\begin{aligned} \varphi (x)-\varphi (x^*)\ge \phi (x)-\phi (x^*)-2\delta \ge R(\epsilon )/3> 0, \end{aligned}$$

which implies that \(x\) is not an optimal solution to (26). This is equivalent to \(d(x,X^*_{\phi })< \epsilon \) for all \(x\in X^*_{\varphi }\), that is, \(\mathbb D (X^*_{\varphi },X^*_{\phi }) \le \epsilon \).

Part (ii). Under condition (28), it is easy to derive that \(R(\epsilon )\ge \varsigma \epsilon ^2\). Let

$$\begin{aligned} \epsilon :=\sqrt{\frac{3}{\varsigma }\sup _{x\in X}|\varphi (x)-\phi (x)|}. \end{aligned}$$

From Part (i), we immediately arrive at (29). The proof is complete. \(\square \)
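
As a quick one-dimensional sanity check of the bound (29) (the function, the interval and the perturbation below are constructed purely for illustration), take \(\phi (x)=x^2\) on \(X=[-1,1]\), which satisfies (28) with \(\varsigma =1\) and \(X^*_{\phi }=\{0\}\), and perturb it by a small oscillation.

```python
import numpy as np

X = np.linspace(-1.0, 1.0, 20001)        # fine grid standing in for X = [-1, 1]

phi = X ** 2                             # satisfies (28) with varsigma = 1, X*_phi = {0}
varphi = X ** 2 + 0.01 * np.sin(25 * X)  # continuous perturbation of phi

x_star_phi = X[np.argmin(phi)]           # = 0
x_star_varphi = X[np.argmin(varphi)]     # minimizer of the perturbed problem

deviation = abs(x_star_varphi - x_star_phi)          # D(X*_varphi, X*_phi); both singletons
bound = np.sqrt(3.0 * np.max(np.abs(varphi - phi)))  # sqrt(3/varsigma * sup|varphi - phi|)

print("deviation:", float(deviation), "  bound from Lemma 3.8 (ii):", float(bound))
```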

 

Remark 3.9

We have a few comments on Lemma 3.8.

  1. (i)

    Condition (28) is known as a second order growth condition. Using this condition, Shapiro [38] developed a variational principle which gives a bound for \(d(x,X^*_{\phi })\) in terms of the maximum Lipschitz constant of \(\varphi -\phi \) over \(X\), see [38, Lemma 4.1] and [39, Proposition 2.1]. Both the second order growth condition and the variational principle have been widely used for stability and asymptotic analysis in stochastic programming, see [4, 38, 39]. Our claim in Lemma 3.8 (ii) strengthens the variational principle in that our bound for \(d(x,X^*_{\phi })\) is \(\sqrt{\frac{3}{\varsigma }\sup _{x\in X}|\varphi (x)-\phi (x)|}\), which tends to zero when the maximum Lipschitz constant of \(\varphi (x)-\phi (x)\) over \(X\) goes to zero and \(\varphi (x_0)-\phi (x_0)=0\) at some point \(x_0\in X\).

  2. (ii)

    Lemma 3.8 (ii) may be extended to a general case when \(R(\epsilon )\) is monotonically increasing on \(\mathbb R _+\). In such a case, we may set

    $$\begin{aligned} \epsilon :=R^{-1}\left(3\sup _{x\in X}|\varphi (x)-\phi (x)|\right) \end{aligned}$$

    and obtain from Lemma 3.8 (i) that

    $$\begin{aligned} \mathbb D (X^*_{\varphi },X^*_{\phi })\le R^{-1}\left(3\sup _{x\in X}|\varphi (x)-\phi (x)|\right). \end{aligned}$$

 

Theorem 3.10

Assume that problem (5) satisfies the Slater condition. Let \(\{\rho _N\}\) be a sequence of positive numbers such that \(\rho _N\rightarrow \rho \), where \(\rho \) is given in Theorem 3.2. Then

  1. (i)

    with probability 1

    $$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb D \left(X^N_{opt}, X_{opt}\right)=0, \end{aligned}$$
    (31)

    where \(X_{opt}\) and \(X^N_{opt}\) denote the sets of optimal solutions of problems (11) and (13) respectively. Moreover, if Assumption 3.5 holds, then the convergence rate is exponential, that is, for any \(\alpha >0\), there exist positive constants \(C_1(\alpha )\) and \(K_1(\alpha )\), independent of \(N\), such that

    $$\begin{aligned} \mathrm{{Prob}}\left\{ \mathbb D \left(X^N_{opt}, X_{opt}\right) \ge \alpha \right\} \le C_1(\alpha )e^{-NK_1(\alpha )} \end{aligned}$$

    for \(N\) sufficiently large.

  2. (ii)

    If the objective function of the true penalty problem (11) satisfies the second order growth condition:

    $$\begin{aligned} \psi (x,\rho ) \ge \min _{x\in X} \psi (x,\rho ) +\varsigma d(x,X_{opt})^2, \quad \forall x\in X, \end{aligned}$$
    (32)

    where \(\varsigma \) is a positive constant, then \(C_1(\alpha ) = C(\frac{1}{3}\varsigma \alpha ^2)\) and \(K_1(\alpha ) = K(\frac{1}{3}\varsigma \alpha ^2)\) where \(C(\alpha )\) and \(K(\alpha )\) are given in Proposition 3.6.

  3. (iii)

    Let \(N(\alpha ,\beta )\) be defined as in (24). For \(N\ge N(\frac{1}{3}\varsigma \alpha ^2,\beta )\), we have

    $$\begin{aligned} \mathrm{{Prob}}\left\{ \mathbb D \left(X_{opt}^N, X_{opt}\right) \ge \alpha \right\} \le \beta , \end{aligned}$$

    where \(\beta \in (0,1)\).

 

Proof

The almost sure convergence follows straightforwardly from Lemma 3.8 and Proposition 3.4, which asserts that \(\psi _N(x,\rho _N)\) converges to \(\psi (x,\rho )\) uniformly over \(X\). Next, we show the exponential convergence. By Lemma 3.8, for any \(\alpha >0\), there exists \(\varepsilon (\alpha )\) such that if

$$\begin{aligned} \sup _{x\in X} |\psi _N(x,\rho _N)-\psi (x,\rho )|\le \varepsilon (\alpha ), \end{aligned}$$

then \(\mathbb D ( X_{opt}^N, X_{opt}) \le \alpha \). Subsequently,

$$\begin{aligned} \mathrm{{Prob}}\left\{ \mathbb D \left( X_{opt}^N, X_{opt}\right) \ge \alpha \right\} \le \mathrm{{Prob}}\left\{ \sup _{x\in X} |\psi _N(x,\rho _N)-\psi (x,\rho )|\ge \varepsilon (\alpha ) \right\} . \end{aligned}$$

By Proposition 3.6 and the formula above, there exist positive constants \(C_1(\alpha )\) and \(K_1(\alpha )\), independent of \(N\), such that

$$\begin{aligned} \mathrm{{Prob}}\left\{ \mathbb D \left(X_{opt}^N, X_{opt}\right) \ge \alpha \right\} \le C_1(\alpha )e^{-NK_1(\alpha )}, \end{aligned}$$

for \(N\) sufficiently large.

Part (ii). Under the second order growth condition, it is easy to derive that \(R(\epsilon )\ge \varsigma \epsilon ^2\), where \(R(\epsilon )\) is given in Lemma 3.8. By (29) in Lemma 3.8 (ii),

$$\begin{aligned} \mathrm{{Prob}}\left\{ \mathbb D \left(X_{opt}^N, X_{opt}\right) \ge \alpha \right\}&\le \mathrm{{Prob}}\left\{ \sqrt{\frac{3}{\varsigma }\sup _{x\in X} |\psi _N(x,\rho _N)-\psi (x,\rho )| } \ge \alpha \right\} \\&= \mathrm{{Prob}}\left\{ \sup _{x\in X} |\psi _N(x,\rho _N)-\psi (x,\rho )| \ge \frac{1}{3}\varsigma \alpha ^2 \right\} . \end{aligned}$$

The rest follows from Part (i).

Part (iii) follows from (24) and Part (ii). The proof is complete. \(\square \)

Let us make some comments on the second order growth condition (32). Since \(G(\cdot ,\xi )\) is assumed to be concave, it is easy to verify that \(\theta (x)\) is a convex function. If \(f(\cdot ,\xi )\) is convex for almost every \(\xi \), then \(\psi (\cdot ,\rho )\) is convex. The second order growth condition is fulfilled if the latter happens to be strongly convex.

3.2 Stationary point

We now move on to investigate the case when we only obtain a stationary point, rather than an optimal solution, from solving the penalized sample average approximation problem (13). This is motivated by our wish to address the case when \(f(x,\xi )\) is not convex w.r.t. \(x\). Convergence analysis of SAA stationary sequences has been well documented, see [43] and the references therein. Our analysis here differs from those in the literature in two ways. (a) We analyze the convergence of SAA stationary points to their true counterparts rather than to the so-called weak stationary points of the true problem [43]; the analysis is based on an approximation of the Clarke subdifferential of the expected value of a random function rather than of the expected value of the Clarke subdifferential of a random function. Note that this kind of subdifferential approximation can be traced back to the earlier work by Birge and Qi [3] and Artstein and Wets [2]. (b) We provide an effective approach to tackle the specific challenges and complications arising from the second order dominance constraints.

We start by defining the stationary points of (11) and (13). Let \(h(x,t)=(\mathbb{E }_P[H(x,t,\xi )])_{+}\) be defined as in (10). For any fixed \(x\in X\), let \(T^*(x)\) denote the set of \(\bar{t}\in T\) such that \(h(x,\bar{t})=\max _{t\in T} h(x,t)\). Since \(G(\cdot ,\xi )\) is concave, \(\mathbb{E }_P[H(x,t,\xi )]\) is convex in \(x\) and hence Clarke regular (see [5, Proposition 2.3.6]). By [5, Proposition 2.3.12]

$$\begin{aligned} \partial _x h(x,t)=\left\{ \begin{array}{l@{\quad }l} 0,&\mathbb{E }_P[H(x,t,\xi )]<0,\\ \text{ conv}\{0,\partial _x\mathbb{E }_P[H(x,t,\xi )]\},&\mathbb{E }_P[H(x,t,\xi )]=0,\\ \partial _x\mathbb{E }_P[H(x,t,\xi )],&\mathbb{E }_P[H(x,t,\xi )]>0. \end{array}\right. \end{aligned}$$
(33)

Here and later on, “conv” denotes the convex hull of a set. Since \(h(\cdot ,t)\) is convex for each \(t\), \(T\) is a compact set and, for every \(x\), \(h(x,\cdot )\) is continuous on \(T\), by the Levin–Valadier theorem (see [37, Section 2, Theorem 51]),

$$\begin{aligned} \partial \theta (x)= \text{ conv}\left\{ \bigcup _{t\in T^*(x)}\partial _xh(x,t)\right\} . \end{aligned}$$
(34)

Let

$$\begin{aligned} \mathcal{T}_{X}(x)=\liminf \limits _{t\rightarrow 0,\; X \ni x^{\prime }\rightarrow x}\frac{1}{t}(X-x^{\prime }) \end{aligned}$$

denote the tangent cone of \(X\) at point \(x\), and \(\mathcal{N}_{X}(x)\) the Clarke normal cone to \(X\) at \(x\), that is, for \(x\in X\),

$$\begin{aligned} \mathcal{N}_{X}(x)=\left\{ \zeta \in \mathbb R ^n\,:\; \zeta ^Td\le 0, \quad \forall d\in \mathcal{T}_{X}(x)\right\} , \end{aligned}$$

and \(\mathcal{N}_{X}(x)=\emptyset \) if \(x\not \in X\). A point \(x\in X\) is said to be a stationary point of the penalized minimization problem (11) if

$$\begin{aligned} 0\in \partial \mathbb{E }_P[f(x,\xi )]+\rho \partial \theta (x) +\mathcal{N}_{X}(x). \end{aligned}$$

Likewise, for any fixed \(x\in X\), let \(T^N(x)\) denote the set of \(\bar{t}\in T\) such that \(h_N(x,\bar{t})=\max _{t\in T} h_N(x,t)\). Then

$$\begin{aligned} \partial _x h_N(x,t)=\left\{ \begin{array}{l@{\quad }l} 0,&H_N(x,t)<0,\\ \text{ conv}\{0,\partial _xH_N(x,t)\},&H_N(x,t)=0,\\ \partial _xH_N(x,t),&H_N(x,t)>0 \end{array}\right. \end{aligned}$$
(35)

and

$$\begin{aligned} \partial \theta _N(x)= \text{ conv}\left\{ \bigcup _{t\in T^N(x)}\partial _xh_N(x,t)\right\} . \end{aligned}$$
(36)

A point \( x\in X\) is said to be a stationary point of the penalized SAA problem (13) if

$$\begin{aligned} 0\in \partial f_N(x)+\rho _N\partial \theta _N(x) +\mathcal{N}_{X}(x). \end{aligned}$$
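
To illustrate these objects numerically (with the same toy specification used in the earlier sketches, an assumption rather than the paper’s model), the sketch below assembles subgradients of \(\theta _N\) following (35)–(36) on a discretized \(T\) and checks the stationarity condition at a candidate point; since the point lies in the interior of \(X\), the normal cone is \(\{0\}\) and the check reduces to \(0\in \partial f_N(x)+\rho _N\partial \theta _N(x)\).

```python
import numpy as np

rng = np.random.default_rng(4)

# Same illustrative specification as in the earlier sketches (an assumption):
#   f(x, xi) = (x + 1.5 - 0.1*xi)**2,  G(x, xi) = x + xi,  Y(xi) = xi - 1,
# scalar decision x with X = [-2, 2] and T discretized below.
xi = rng.normal(1.0, 1.0, size=2000)
T_grid = np.linspace(-3.0, 3.0, 61)
rho_N = 10.0

def H_N(x, t):
    return np.mean(np.maximum(t - (x + xi), 0.0) - np.maximum(t - (xi - 1.0), 0.0))

def dH_N(x, t):
    # One subgradient of the convex function H_N(., t) at x:
    # d/dx (t - x - xi_k)_+ = -1 on {t - x - xi_k > 0} and 0 elsewhere.
    return -np.mean(t - (x + xi) > 0.0)

def theta_N_subgradient_range(x, tol=1e-10):
    # Collect subgradients of theta_N at x following (35)-(36): take the t's that
    # attain max_t (H_N(x, t))_+ and, for each, subgradients of h_N(., t).
    vals = np.array([max(H_N(x, t), 0.0) for t in T_grid])
    active = T_grid[vals >= vals.max() - tol]
    subgrads = []
    for t in active:
        if H_N(x, t) < -tol:       # (35): H_N < 0  ->  subdifferential {0}
            subgrads.append(0.0)
        elif H_N(x, t) > tol:      # (35): H_N > 0  ->  subgradient of H_N(., t)
            subgrads.append(dH_N(x, t))
        else:                      # (35): H_N = 0  ->  conv{0, subgradient}
            subgrads.extend([0.0, dH_N(x, t)])
    return min(subgrads), max(subgrads)   # both endpoints belong to partial theta_N(x)

x = -1.0                                  # candidate point, interior of X = [-2, 2]
df = np.mean(2.0 * (x + 1.5 - 0.1 * xi))  # gradient of the smooth function f_N at x
lo, hi = theta_N_subgradient_range(x)

# At an interior point the normal cone is {0}; since partial theta_N(x) is convex,
# 0 lying in [df + rho_N*lo, df + rho_N*hi] certifies the stationarity condition above.
print("stationary at x = -1?", df + rho_N * lo <= 0.0 <= df + rho_N * hi)
```
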

Assumption 3.11

\(f(x,\xi )\) and \(G(x,\xi )\) are locally Lipschitz continuous w.r.t. \(x\) and \(\xi \), and their Lipschitz moduli w.r.t. \(x\) are bounded by an integrable function \(\kappa (\xi )\) for every \(x\in \mathbb R ^n\).

It is easy to observe that Assumption 3.11 is stronger than Assumption 3.1. Over the past few years, there have been extensive discussions (see [43]) on the convergence of SAA stationary points to the so-called weak stationary points of the true problem, which are defined through the expected value of the subdifferential of the underlying functions of the true problem in the first order optimality condition. A stationary point is a weak stationary point but not vice versa. Convergence of SAA stationary points to a weak stationary point of the true problem can be proved under Assumption 3.1, but convergence to a stationary point of the true problem requires Assumption 3.11.

Consider a sequence of functions \(\{f_N(x)\}\) defined on \(\mathbb R ^n\). Recall that \(f_N\) is said to epiconverge to a function \(f\) if and only if the epigraph of \(f_N\) converges to the epigraph of \(f\). A necessary and sufficient condition for \(f_N\) to epiconverge to \(f\) is that for every \(x\in \mathbb R ^n\),

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \lim \inf _N f_N(x^N)\ge f(x)&\text{ for} \text{ every} \text{ sequence}\,x^N\rightarrow x; \\ \lim \sup _N f_N(x^N)\le f(x)&\text{ for} \text{ some} \text{ sequence}\,x^N\rightarrow x. \end{array} \right. \end{aligned}$$

See [34] for details. For set-valued mappings \(\Gamma _N,\Gamma :\mathbb R ^m \rightrightarrows \mathbb R ^n\), \(\Gamma _N\) is said to converge graphically to \(\Gamma \) if the graph of \(\Gamma _N\) converges to that of \(\Gamma \).

Proposition 3.12

Let \(\theta (x)\) and \(\theta _N(x)\) be defined as in (10) and (12) respectively. Let \(\{x^N\}\subset X\) be a sequence which converges to \(x^*\) almost surely. Under Assumption 3.11

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb D (\partial \theta _N(x^N),\partial \theta (x^*)) =0 \end{aligned}$$
(37)

almost surely.

 

Proof

Since \(G(\cdot ,\xi )\) is a concave function for any \(\xi \in \Xi \), \(H(x,t,\xi )\) is a convex function with respect to \(x\) over \(X\) and so are \(h_N(x,t)\), \(h(x,t)\), \(\theta (x)\) and \(\theta _N(x)\). Under Assumption 3.11, it follows by [37, Section 6, Proposition 7] that \(H_N(x,t)\) converges to \(\mathbb{E }_P[H(x,t,\xi )]\) uniformly over any compact subset of \(\mathbb R ^n\times \mathbb R \) almost surely. Subsequently, it is easy to verify that \(\theta _N(x)\) converges to \(\theta (x)\) uniformly over any compact subset of \(\mathbb R ^n\), which implies, via [34, Proposition 7.15], that \(\theta _N\) epiconverges to \(\theta \) almost surely. By Attouch’s theorem ([34, Theorem 12.35]), the latter convergence implies that \(\partial \theta _N\) converges to \(\partial \theta \) graphically over \(X\) and hence (37). The proof is complete. \(\square \)

 

Theorem 3.13

Let \(\{x^N\}\) be a sequence of KKT points of problem (13) and let \(x^*\) be an accumulation point. Suppose: (a) Assumption 3.11 holds; (b) for every \(\xi \in \Xi \), \(f(\cdot ,\xi )\) is Clarke regular on \(X\); (c) the probability space is nonatomic. If \(\rho _N\rightarrow \rho \), then with probability 1, \(x^*\) is a stationary point of the true penalty problem (11).

 

Proof

By taking a subsequence if necessary we may assume for simplicity that \(x^N\) converges to \(x^*\). Observe first that for any compact sets \(A,B,C,D\subseteq \mathbb R ^m\),

$$\begin{aligned} \mathbb D (A+C,B+D)&\le \mathbb D (A+C,B+C)+\mathbb D (B+C,B+D)\nonumber \\&\le \mathbb D (A,B)+\mathbb D (C,D), \end{aligned}$$
(38)

where the first inequality follows from the triangle inequality and the second inequality follows from the definition of \(\mathbb D \). Using the inequality (38), we have

$$\begin{aligned}&\mathbb D \left(\partial f_N(x^N)+\rho _N\partial \theta _N(x^N), \partial \mathbb{E }_P[f(x^*,\xi )]+\rho \partial \theta (x^*) \right)\nonumber \\&\quad \le \mathbb D \left(\partial f_N(x^N), \partial \mathbb{E }_P[f(x^*,\xi )]\right)\\&\quad +\mathbb D \left( \rho _N\partial \theta _N(x^N), \rho \partial \theta (x^*)\right). \end{aligned}$$

In Proposition 3.12 we have shown that \(\mathbb D (\partial \theta _N(x^N),\partial \theta (x^*))\rightarrow 0\) almost surely; since \(\rho _N\rightarrow \rho \) and \(\partial \theta (x^*)\) is a bounded set (\(\theta \) being convex and finite valued), this implies

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb D \left( \rho _N\partial \theta _N(x^N), \rho \partial \theta (x^*)\right)=0. \end{aligned}$$

In what follows, we show

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb D \left(\partial f_N(x^N), \partial \mathbb{E }_P[f(x^*,\xi )]\right) =0. \end{aligned}$$
(39)

Under the Clarke regularity condition (b),

$$\begin{aligned} \partial \mathbb{E }_P[f(x^*,\xi )] = \mathbb{E }_P[\partial _x f(x^*,\xi )], \end{aligned}$$

where \(\mathbb{E }_P[\partial _x f(x^*,\xi )]\) denotes the Aumann integral [1] of the Clarke subdifferential. Under Assumption 3.11, \(\mathbb{E }_P[\partial _x f(x^*,\xi )]\) is well defined; see for instance [1] and [44, Proposition 2.2]. Moreover, the Clarke regularity implies

$$\begin{aligned} \partial f_N(x^N) = \frac{1}{N}\sum \limits _{k=1}^N \partial _x f(x^N,\xi ^k). \end{aligned}$$

Therefore it suffices to show that

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb D \left(\frac{1}{N}\sum \limits _{k=1}^N \partial _x f(x^N,\xi ^k), \mathbb{E }_P[\partial _x f(x^*,\xi )]\right) =0 \end{aligned}$$
(40)

almost surely. Under Assumption 3.11, we have by virtue of [40, Theorem 2],

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{x\in X} \mathbb D \left(\frac{1}{N}\sum \limits _{k=1}^N \partial _x f(x,\xi ^k), \mathbb{E }_P[\partial _x^\delta f(x,\xi )]\right) =0 \end{aligned}$$

almost surely, where \(\delta \) is a positive number which can be arbitrarily small and

$$\begin{aligned} \partial _x^\delta f(x,\xi ) = \bigcup _{x^{\prime }\in B(x,\delta )} \partial _x f(x^{\prime },\xi ). \end{aligned}$$

This implies

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb D \left(\frac{1}{N}\sum \limits _{k=1}^N \partial _x f(x^N,\xi ^k), \mathbb{E }_P[\partial _x^\delta f(x^*,\xi )]\right) =0\; \end{aligned}$$

almost surely. Combining this with the KKT condition satisfied by \(x^N\) for problem (13), the convergence \(\mathbb D ( \rho _N\partial \theta _N(x^N), \rho \partial \theta (x^*))\rightarrow 0\) and the outer semicontinuity of the normal cone mapping \(\mathcal{N}_{X}(\cdot )\), we obtain

$$\begin{aligned} 0\in \mathbb{E }_P[ \partial _x^\delta f(x^*,\xi )]+\rho \partial \theta (x^*) +\mathcal{N}_{X}(x^*). \end{aligned}$$

By [18, Theorem 2.5] (or [25, Theorem 1.43 (iii)]),

$$\begin{aligned} \mathop {\overline{\lim }}_{\delta \downarrow 0} \mathbb{E }_P[ \partial _x^\delta f(x^*,\xi )] \subset \mathbb{E }_P\left[\mathop {\overline{\lim }}_{\delta \downarrow 0} \partial _x^\delta f(x^*,\xi )\right]= \mathbb{E }_P[\partial _x f(x^*,\xi )]. \end{aligned}$$

The last equality is due to the fact that \(\mathop {\overline{\lim }}_{\delta \downarrow 0} \partial _x^\delta f(x^*,\xi ) = \partial _x f(x^*,\xi )\), which follows from the upper semicontinuity of the Clarke subdifferential. Using (38) and the discussion above, we can easily obtain (40) and hence

$$\begin{aligned} 0\in \mathbb{E }_P[ \partial _x f(x^*,\xi )]+\rho \partial \theta (x^*) +\mathcal{N}_{X}(x^*)=\partial \mathbb{E }_P[ f(x^*,\xi )]+\rho \partial \theta (x^*) +\mathcal{N}_{X}(x^*). \end{aligned}$$

This shows that \(x^*\) is a stationary point of the true penalty problem (11). The proof is complete. \(\square \)
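As a side note, the elementary inequality (38) used at the beginning of the proof is easy to test numerically. The sketch below (our own sanity check, not part of the argument) verifies it on random finite point sets in \(\mathbb R ^2\), with \(\mathbb D (A,B)\) taken as the one-sided deviation \(\sup _{a\in A}d(a,B)\) and Minkowski sums of finite sets formed pairwise.

```python
# Random sanity check of inequality (38): D(A + C, B + D) <= D(A, B) + D(C, D)
# for finite point sets in R^2, where D(A, B) = max_{a in A} min_{b in B} ||a - b||
# is the one-sided deviation (an assumption on the meaning of the paper's D).
import numpy as np

def deviation(A, B):
    """One-sided deviation D(A, B) = max_{a in A} min_{b in B} ||a - b||."""
    diffs = A[:, None, :] - B[None, :, :]
    return np.max(np.min(np.linalg.norm(diffs, axis=2), axis=1))

def minkowski_sum(A, B):
    """Minkowski sum of two finite point sets, formed by pairwise addition."""
    return (A[:, None, :] + B[None, :, :]).reshape(-1, A.shape[1])

rng = np.random.default_rng(2)
for _ in range(1000):
    A, B, C, D = (rng.normal(size=(rng.integers(1, 6), 2)) for _ in range(4))
    lhs = deviation(minkowski_sum(A, C), minkowski_sum(B, D))
    rhs = deviation(A, B) + deviation(C, D)
    assert lhs <= rhs + 1e-12, (lhs, rhs)
print("inequality (38) held on all random instances")
```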

Note that the Clarke regularity condition used in the theorem may be replaced by other conditions. For instance, if the Lipschitz modulus in Assumption 3.11 is bounded by a positive constant and, for a small positive constant \(\tau _0\), \(\frac{1}{\tau }(f(x+\tau u,\xi )-f(x,\xi ))\) is uniformly continuous w.r.t. \(\xi \) for all \(x\in X\), \(u\in \mathbb R ^n\) with \(\Vert u\Vert \le 1\) and \(\tau \in (0,\tau _0]\), then we may apply [23, Lemma 5.2] to show

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{x\in X} \mathbb D (\partial f_N(x), \partial \mathbb{E }_P[f(x,\xi )]) = 0 \end{aligned}$$

and hence (39). We omit the details.

It is also natural to ask whether a stationary point of problem (11) is a stationary point of problem (5). To answer this question, we need to consider the first order optimality conditions for the latter problem. Let us assume that problem (5) satisfies the Slater condition, that \(X\) is a compact set and that the Lipschitz modulus of \(f(x,\xi )\) w.r.t. \(x\) is bounded by an integrable function \(\kappa (\xi )>0\). We consider the following optimality conditions:

$$\begin{aligned} \left\{ \begin{array}{ll} 0\in \partial \mathbb{E }_P[f(x,\xi )]+\lambda \partial \theta (x),\\ \lambda >0,\\ \mathbb{E }_P[H(x,t,\xi )]\le 0, \quad \forall t\in T,\\ x\in X. \end{array}\right. \end{aligned}$$
(41)

We say a point \(x^*\) is a stationary point of (5) if there exists \(\lambda ^*>0\) such that \((x^*,\lambda ^*)\) satisfies (41). To justify this definition, we show that every local optimal solution to problem (5) satisfies (41) along with some positive number \(\lambda \). Moreover, in the case when \(\mathbb{E }_P[f(x,\xi )]\) is a convex function, a point satisfying optimality conditions (41) is a global optimal solution to problem (5). In what follows, we verify these claims. Let \(\hat{x}\) be a local minimizer of (5) and let

$$\begin{aligned} \gamma (P):=\sup _{x\in X}\frac{\Vert x-\bar{x}\Vert }{\min _{t\in T}\{-\mathbb{E }_P[H(\bar{x},t,\xi )]\}}, \end{aligned}$$

where \(\bar{x}\) is given in Definition 2.3. Then, for \(\rho > \gamma (P)\mathbb{E }_P[\kappa (\xi )]\), \(\hat{x}\) is a local optimal solution of (11), which shows that \((\hat{x}, \rho )\) satisfies optimality condition (41). Conversely, suppose that \(x^*\) is a stationary point, that is, there exists a positive number \(\lambda ^*\) such that \((x^*,\lambda ^*)\) satisfies optimality conditions (41). If \(\mathbb{E }_P[f(x,\xi )]\) is a convex function, then it is easy to see that \(x^*\) is a global optimal solution of (11) with \(\rho =\lambda ^*\). Since \(x^*\) is a feasible point of (5), it is not difficult to verify that \(x^*\) is a global optimal solution of problem (5).
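To give a feel for the penalty threshold \(\gamma (P)\mathbb{E }_P[\kappa (\xi )]\), the following sketch estimates it by sampling on a toy instance constructed by us (it is not the problem studied in this paper): \(G(x,\xi )=x^{\top }\xi \) with \(\xi _1\sim N(0.08,0.05^2)\) and \(\xi _2\sim N(0.05,0.2^2)\), benchmark \(Y(\xi )=\xi _2\), \(T=[-0.2,0.2]\) discretized on a grid, \(X=[0,1]^2\) approximated by a grid, a hand-picked Slater point \(\bar{x}=(1,0)\), and objective \(f(x,\xi )=-x^{\top }\xi \), whose Lipschitz modulus in \(x\) is \(\kappa (\xi )=\Vert \xi \Vert \).

```python
# Rough illustration (our own toy construction, not the paper's procedure) of the
# penalty threshold gamma(P) * E_P[kappa(xi)].  All model choices are ours:
# G(x, xi) = x'xi, Y(xi) = xi_2, T = [-0.2, 0.2], X = [0, 1]^2, x_bar = (1, 0),
# f(x, xi) = -x'xi with Lipschitz modulus kappa(xi) = ||xi||.
import numpy as np

rng = np.random.default_rng(3)
xi = rng.normal(loc=[0.08, 0.05], scale=[0.05, 0.2], size=(20000, 2))
Y = xi[:, 1]                                            # toy benchmark Y(xi)
T = np.linspace(-0.2, 0.2, 81)                          # discretization of T
x_bar = np.array([1.0, 0.0])                            # hand-picked Slater point

def EH(x):
    """Sample estimate of E_P[H(x, t, xi)] for every t on the grid T."""
    Gx = xi @ x
    return np.mean(np.maximum(T[None, :] - Gx[:, None], 0.0)
                   - np.maximum(T[None, :] - Y[:, None], 0.0), axis=0)

slater_gap = -EH(x_bar).max()                           # min_t (-E_P[H(x_bar, t, xi)])
grid = np.stack(np.meshgrid(np.linspace(0, 1, 21),
                            np.linspace(0, 1, 21)), axis=-1).reshape(-1, 2)
gamma_P = np.max(np.linalg.norm(grid - x_bar, axis=1)) / slater_gap
rho_threshold = gamma_P * np.mean(np.linalg.norm(xi, axis=1))   # gamma(P) * E[kappa(xi)]
print(slater_gap, gamma_P, rho_threshold)
```

The estimate is crude, but it makes the dependence visible: the denominator of \(\gamma (P)\) is the Slater gap \(\min _{t\in T}\{-\mathbb{E }_P[H(\bar{x},t,\xi )]\}\), so a nearly binding Slater point inflates the required penalty parameter.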

Note that Dentcheva and Ruszczyński [12] introduced first order optimality conditions for a class of semi-infinite programming problems arising from optimization problems with stochastic second order dominance constraints. Let \(\fancyscript{M}(T)\) denote the set of regular countably additive measures on \(T\) and \(\fancyscript{M}_+(T)\) its subset of positive measures. Consider the following Lagrange function of (5):

$$\begin{aligned} \fancyscript{L}(x,\mu ) = \mathbb{E }_P[f(x,\xi )] + \int \limits _T \mathbb{E }_P[H(x,t,\xi )] \mu (dt), \end{aligned}$$

where \(\mu \in \fancyscript{M}_+(T)\). Under the so-called differential constraint qualifications, Dentcheva and Ruszczyński showed that if a point \(x^*\) is a local optimal solution of problem (5), then there exists \(\mu ^*\in \fancyscript{M}_+(T)\) such that

$$\begin{aligned} \left\{ \begin{array}{ll} 0\in \partial _x \fancyscript{L}(x^*,\mu ^*)+\mathcal{N}_{X}(x^*)=\partial \mathbb{E }_P[f(x^*,\xi )] + \int _T \partial _x \mathbb{E }_P[H(x^*,t,\xi )]\mu ^*(dt) + \mathcal{N}_{X}(x^*),\\ \mathbb{E }_P[H(x^*,t,\xi )]\le 0, \quad \forall t\in T,\\ \int _T \mathbb{E }_P[H(x^*,t,\xi )]\mu ^*(dt) =0,\\ x^*\in X, \end{array} \right. \end{aligned}$$
(42)

see [12, Theorem 4] for details and [12, Definition 2] for the definition of the differential constraint qualification. Note that optimality conditions (42) can alternatively be characterized by convex functions defined over \(\mathbb R \); this is done by representing the integral w.r.t. the measure \(\mu \) through convex functions via the Riesz representation theorem, see [8, 9] for details. It remains an open question whether there is a relationship between (41) and (42), or between (41) and the equivalent conditions of (42) in [8, 9]; this will be the focus of our future work.
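For intuition (an illustrative remark of ours, not a result of [12]), if the multiplier measure \(\mu ^*\) happens to be supported on finitely many points \(t_1,\ldots ,t_K\in T\) with weights \(\mu _k\ge 0\), then conditions (42) collapse to an ordinary finite-dimensional KKT system:

$$\begin{aligned} \left\{ \begin{array}{ll} 0\in \partial \mathbb{E }_P[f(x^*,\xi )] + \sum \limits _{k=1}^K \mu _k\,\partial _x \mathbb{E }_P[H(x^*,t_k,\xi )] + \mathcal{N}_{X}(x^*),\\ \mathbb{E }_P[H(x^*,t,\xi )]\le 0, \quad \forall t\in T,\\ \mu _k\ge 0,\quad \mu _k\,\mathbb{E }_P[H(x^*,t_k,\xi )]=0, \quad k=1,\ldots ,K,\\ x^*\in X, \end{array} \right. \end{aligned}$$

since feasibility, nonnegativity of the weights and \(\int _T \mathbb{E }_P[H(x^*,t,\xi )]\mu ^*(dt)=0\) force \(\mu _k\mathbb{E }_P[H(x^*,t_k,\xi )]=0\) for each \(k\). This is the form in which a comparison with (41) seems most tractable.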