1 Introduction

The absolute value equation (AVE) is to find a vector \(x\in \Re ^{n}\) such that

$$\begin{aligned} Ax+B|x| = b, \end{aligned}$$
(1)

where \(A\in \Re ^{n\times n}, B\in \Re ^{n\times n}, b\in \Re ^{n}\) and |x| denotes the component-wise absolute value of x. If \(B =-I\), where I denotes an identity matrix, then AVE (1) reduces to the special form:

$$\begin{aligned} Ax-|x| = b. \end{aligned}$$
(2)

The main significance of AVE arises from its wide applications in different fields of mathematics and engineering. For example, the general linear complementarity problem [3] (including linear programs), quadratic programs, bimatrix games and some other mathematical programming problems can be formulated as AVE; see [13] for more information. In particular, if \(B = 0,\) then AVE (1) reduces to the system of linear equations \(Ax = b\), the most fundamental problem in scientific computing. Over the last ten years, AVE (1) and its special form (2) have been extensively studied from the viewpoints of both theoretical analysis and numerical algorithms; see [2, 9,10,11,12, 20, 26, 29, 30, 32].

In this paper, we mainly study a class of stochastic absolute value equations (SAVE), namely, to find a vector \(x\in \Re ^{n}\) such that

$$\begin{aligned} A(\omega )x+B(\omega )|x| = b(\omega ), \end{aligned}$$
(3)

where \(\omega \in {\Omega }\) and \(({\Omega },\mathcal {F},P)\) is a probability space, \(A(\omega )\in \Re ^{n\times n}\), \(B(\omega )\in \Re ^{n\times n}\) and \(b(\omega )\in \Re ^{n}\) for each \(\omega \in {\Omega }\). If \({\Omega }\) only contains one realization, then SAVE (3) would reduce to AVE (1). One of our motivations for studying SAVE comes from the fact that the well-known stochastic linear complementarity problem (SLCP)

$$\begin{aligned} u\ge 0,~v\ge 0,~v=M(\omega )u+q(\omega ),~u^Tv=0 \end{aligned}$$
(4)

with \(M(\omega )\in \Re ^{n\times n}\) and \(q(\omega )\in \Re ^{n}\) can be formulated as a SAVE (see Sect. 2 below). In the past few decades, due to its widespread applications, the SLCP has been extensively studied. Here we only list a few works that are closely related to this paper. For example, Chen and Fukushima [4] introduced the Expected Residual Minimization (ERM) formulation for the SLCP and studied some properties of this new reformulation. Chen et al. [6] further proved that the solutions of the ERM formulation are robust for the monotone SLCP. Zhang and Chen [33] investigated a smooth projected gradient method and applied it to solve the SLCP. Zhou and Caccetta [34] proposed a feasible semismooth Newton method for solving a class of SLCPs with finitely many realizations. More research and discussions on the SLCP can be found in [7, 16, 17] and the references therein. For the stochastic absolute value equation, Du [8] proved that the SAVE \(A(\omega )x-|x|=b(\omega )\), a special case of SAVE (3), is equivalent to a stochastic bilinear program and to a stochastic generalized linear complementarity problem.

Due to the randomness of \(\omega \), we cannot expect a single vector x to satisfy SAVE (3) for all \(\omega \in {\Omega }\). Therefore, in order to obtain a reasonable solution of (3), we need to search for suitable deterministic reformulations of SAVE. It is well known that the Expected Residual Minimization (ERM) formulation is one of the most effective techniques for dealing with stochastic optimization problems. The ERM formulation was first proposed by Chen and Fukushima [4] for solving the SLCP, where the objective is to minimize an expected residual defined by NCP functions. Subsequently, the ERM formulation has been further studied for stochastic variational inequalities [5], stochastic complementarity problems [14], stochastic generalized complementarity problems [18] and stochastic second-order cone complementarity problems [15, 19, 31].

Our goal is to study the deterministic ERM formulation of SAVE (3) and then propose a smoothing gradient method to find its approximate solution. The main contributions of this paper can be summarized as follows.

(i) We first show that the SLCP can be reformulated as a SAVE; the converse also holds provided that \(A(\omega )+B(\omega )\) is nonsingular for each \(\omega \in {\Omega }\).

(ii) We introduce the ERM formulation for SAVE and its Monte Carlo sample average approximation. The existence of optimal solutions to the ERM problem and its discrete approximation is established when \(\{A({\bar{\omega }}), B({\bar{\omega }})\}\) is an \(R_0\) pair for some observation \({\bar{\omega }}\). Our results show that the solutions of the approximate ERM problem converge to those of the ERM problem with probability one, and that the optimal values of the sample average approximations achieve a uniform exponential convergence rate.

(iii) Under some regularity conditions, we prove that the solutions of the ERM problem are robust in the sense that they may have minimal sensitivity with respect to random parameter variations in SAVE. This also indicates, from a new perspective, that ERM is an effective reformulation of SAVE.

(iv) In general, SAVE is nonsmooth due to the presence of absolute values. However, for a special class of SAVE problems, by exploiting their structure we provide a smooth (even twice continuously differentiable) residual and further study the convergence of the stationary points.

(v) For a general SAVE, we propose a new smooth approximation formulation by simultaneously employing sampling and smoothing techniques (the number of samples also serves as a smoothing factor). This new smooth approximation allows us to use a smoothing gradient method to solve SAVE. Our numerical experiments indicate that the algorithm is very effective.

The rest of the paper is organized as follows. In the next section, we discuss the relationship between SAVE and SLCP. In Sect. 3, we study the ERM formulation for SAVE and investigate the boundedness of its level sets, the existence of optimal solutions and the exponential convergence rate. The robustness of the solutions to the ERM problem is given in Sect. 4. Section 5 is devoted to a smoothing reformulation for a class of special SAVEs. Section 6 proposes a smoothing gradient method for solving general SAVEs. Conclusions are drawn in Sect. 7.

2 Relations of SAVE and SLCP

Theorem 1

The SLCP (4) can be formulated as a SAVE (3) with \(A(\omega )=I+M(\omega )\), \(B(\omega )=I-M(\omega )\) and \(b(\omega )=2q(\omega ).\)

Proof

Note that SLCP (4) can be written as

$$\begin{aligned} u\ge 0,~v\ge 0,~(I+M(\omega ))(v-u)+(I-M(\omega ))(v+u)=2q(\omega ),~u^Tv=0. \end{aligned}$$

Let \(x:=v-u\). Then, for any \(i=1,...,n\), since \(u_i\ge 0, v_i\ge 0\) and \(u_iv_i=0\), we have \(x_i^2=(v_i-u_i)^2=(v_i+u_i)^2\) and so \(|x_i|=v_i+u_i\). This implies that \(|x|=v+u\). So, SLCP (4) reduces to SAVE (3) with \(A(\omega )=I+M(\omega )\), \(B(\omega )=I-M(\omega )\) and \(b(\omega )=2q(\omega )\). \(\square \)

Theorem 2

If the matrix \(A(\omega )+B(\omega )\) is nonsingular for each \(\omega \in {\Omega }\), then SAVE (3) is equivalent to SLCP (4).

Proof

According to Theorem 1, we only need to show that SAVE (3) can be formulated as a SLCP (4). Note that SAVE (3) can be written as

$$\begin{aligned} (A(\omega )+B(\omega ))\frac{x+|x|}{2}+(A(\omega )-B(\omega ))\frac{x-|x|}{2}=b(\omega ). \end{aligned}$$

Since \(A(\omega )+B(\omega )\) is nonsingular for each \(\omega \in {\Omega }\), the above equation is equivalent to

$$\begin{aligned} \frac{|x|+x}{2}=(A(\omega )+B(\omega ))^{-1}(A(\omega )-B(\omega ))\frac{|x|-x}{2} +(A(\omega )+B(\omega ))^{-1}b(\omega ).\nonumber \\ \end{aligned}$$
(5)

By setting \(v:=\frac{|x|+x}{2},~u:=\frac{|x|-x}{2},~M(\omega ):=(A(\omega )+B(\omega ))^{-1}(A(\omega )-B(\omega ))\) and \(q(\omega ):=(A(\omega )+B(\omega ))^{-1}b(\omega ),\) equation (5) is reformulated as SLCP (4); note that \(u\ge 0\), \(v\ge 0\) and \(u^Tv=0\) hold automatically by the definitions of u and v. \(\square \)

Remark 1

Theorem 1 provides an equivalent way to solve SLCP (4). Specifically, we may first find a solution x by solving the following SAVE

$$\begin{aligned} (I+M(\omega ))x+(I-M(\omega ))|x|=2q(\omega ). \end{aligned}$$

Then setting \(v:=\frac{|x|+x}{2}\) and \(u:=\frac{|x|-x}{2}\) yields a solution (u, v) for SLCP (4).
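To make Remark 1 concrete, the following minimal NumPy sketch builds the SAVE data \(A=I+M\), \(B=I-M\), \(b=2q\) from a single realization (M, q) and recovers (u, v) from a vector x; the sample matrices, the placeholder vector x and the function names are illustrative choices of ours, not data from the paper.

```python
import numpy as np

def slcp_to_save(M, q):
    """Build the SAVE data of Theorem 1: A = I + M, B = I - M, b = 2q."""
    n = M.shape[0]
    I = np.eye(n)
    return I + M, I - M, 2.0 * q

def recover_uv(x):
    """Recover the complementary pair u = (|x| - x)/2, v = (|x| + x)/2."""
    return (np.abs(x) - x) / 2.0, (np.abs(x) + x) / 2.0

# usage with a hypothetical 2x2 realization (M, q)
M = np.array([[2.0, 1.0], [0.0, 3.0]])
q = np.array([-1.0, -2.0])
A, B, b = slcp_to_save(M, q)
x = np.array([1.0, 0.5])     # placeholder vector, not an actual SAVE solution
u, v = recover_uv(x)
print(np.allclose(v - u, x), np.allclose(u * v, 0.0))  # x = v - u and u_i v_i = 0 by construction
```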

3 ERM Formulation for SAVE

Let

$$\begin{aligned} \textrm{F}({x},\omega ):=A(\omega )x+B(\omega )|x|- b(\omega ). \end{aligned}$$
(6)

We introduce the following ERM model for the SAVE (3):

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}{\Theta }({x}):={\mathbb {E}}[\Vert \textrm{F}({x},\omega )\Vert ^2], \end{aligned}$$
(7)

where \({\mathbb {E}}\) denotes the expectation with respect to the random variable \(\omega \). To proceed, we first give the assumption required in this section.

Assumption A Let \(\omega \) be a continuous random variable. Let \(A(\omega ), B(\omega )\) and \(b(\omega )\) be continuous functions of \(\omega \) satisfying

$$\begin{aligned} & {\mathbb {E}}[(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )^2]\\ & \quad =\int _{\Omega }(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )^2\rho (\omega )d\omega <+\infty , \end{aligned}$$

where \(\rho :{\Omega }\rightarrow \Re _+\) denotes the continuous probability density function of \(\omega \).

To analyze the existence of optimal solutions to the ERM problem (7), we introduce the following definition of \(R_0\) pair.

Definition 1

For the given matrices \(A,B\in \Re ^{n\times n}\), we define \(\{A,B\}\) to be an \(R_0\) pair if

$$\begin{aligned} Ax+B|x| = 0~\Longrightarrow ~x=0. \end{aligned}$$

The definition of \(R_0\) pair is motivated by the original definition of \(R_0\) matrix for the LCP [4]. Obviously, if AVE (1) is uniquely solvable for any \(b\in \Re ^n\), then \(\{A,B\}\) is an \(R_0\) pair. According to existing results on the unique solvability of AVE (1), \(\{A,B\}\) is an \(R_0\) pair if one of the following conditions holds:

(a) \(\sigma _{\max }(|B|)<\sigma _{\min }(A)\), where \(\sigma _{\max }\) and \(\sigma _{\min }\) denote the maximal and minimal singular value respectively [25];

(b) The inequality \(|Ax|\le |B||x|\) only has trivial solutions [24];

(c) The interval matrix \([A -|B|, A+|B|]\) is regular [30];

(d) \(\{A+B, A- B\}\) has the column \(\mathcal {W}\)-property [20].

More sufficient conditions for the unique solvability of AVE (1) can be found in [26].
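As a numerical illustration of condition (a), the following NumPy sketch (our own check, with hypothetical matrices) verifies \(\sigma _{\max }(|B|)<\sigma _{\min }(A)\) directly from the singular values.

```python
import numpy as np

def is_R0_by_singular_values(A, B):
    """Check sufficient condition (a): sigma_max(|B|) < sigma_min(A)."""
    sigma_min_A = np.linalg.svd(A, compute_uv=False)[-1]
    sigma_max_absB = np.linalg.svd(np.abs(B), compute_uv=False)[0]
    return sigma_max_absB < sigma_min_A

A = np.array([[4.0, 0.0], [1.0, 5.0]])
B = np.array([[1.0, 0.5], [0.0, -1.0]])
print(is_R0_by_singular_values(A, B))  # True here, so Ax + B|x| = 0 forces x = 0
```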

Lemma 1

If \(\{A({\bar{\omega }}),B({\bar{\omega }})\}\) is an \(R_0\) pair for some \({\bar{\omega }}\in {\Omega }\), then there exists a closed sphere \(\mathcal {B}({\bar{\omega }},\delta ):=\{\omega |~\Vert \omega -{\bar{\omega }}\Vert \le \delta \}\) with \(\delta > 0\) such that \(\{A({\omega }),B({\omega })\}\) is an \(R_0\) pair for every \(\omega \in \mathcal {B}({\bar{\omega }},\delta )\cap {\Omega }\).

Proof

Assume that the lemma is not true. Then there is a sequence \(\{\omega ^k\}\subset {\Omega }\) satisfying \(\lim \limits _{k\rightarrow \infty }\omega ^k={\bar{\omega }}\) and, for each k, there exists \(x^k\in \Re ^{n}\) such that

$$\begin{aligned} A(\omega ^k)x^k+B(\omega ^k)|x^k| = 0,~~x^k\ne 0. \end{aligned}$$

Let \(t^k:=\frac{x^k}{\Vert x^k\Vert }.\) Then \(\Vert t^k\Vert =1\) and

$$\begin{aligned} A(\omega ^k)t^k+B(\omega ^k)|t^k| = 0. \end{aligned}$$

This implies the existence of \({\bar{t}}\in \Re ^{n}\) such that

$$\begin{aligned} \Vert {\bar{t}}\Vert =1,~~A({\bar{\omega }}){\bar{t}}+B({\bar{\omega }})|{\bar{t}}| = 0. \end{aligned}$$

This leads to a contradiction with the assumption that \(\{A({\bar{\omega }}),B({\bar{\omega }})\}\) is an \(R_0\) pair. \(\square \)

Theorem 3

Suppose that there exists some \({\bar{\omega }}\in {\Omega }\) such that \(\rho ({\bar{\omega }})>0\) and \(\{A({{\bar{\omega }}}),B({{\bar{\omega }}})\}\) is an \(R_0\) pair. Then for any \(\gamma >0\), the level set defined by

$$\begin{aligned} \mathcal {L}(\gamma ):=\{ {x}\in \Re ^{n}|{\Theta }({x}) \le \gamma \} \end{aligned}$$

is bounded.

Proof

According to the continuity of \(\rho \) and Lemma 1, there exist a constant \({\bar{\rho }}>0\) and a closed sphere \(\mathcal {B}({\bar{\omega }},\delta )\) such that \(\rho (\omega )\ge {\bar{\rho }}\) and \(\{A({\omega }),B({\omega })\}\) is an \(R_0\) pair for any \(\omega \in \bar{\mathcal {B}}:=\mathcal {B}({\bar{\omega }},\delta )\cap {\Omega }\). Suppose on the contrary that there exist a constant \(\gamma > 0\) and a sequence \(\{{x}^k\}\subset \mathcal {L}(\gamma )\) such that \(\lim \limits _{k\rightarrow \infty }\Vert {x}^k\Vert =+\infty \). The continuity of \(A(\omega ), B(\omega )\) and \(b(\omega )\) ensures that for every k there exists an \(\omega ^k\in \bar{\mathcal {B}}\) such that

$$\begin{aligned} \Vert \textrm{F}({x}^k,\omega ^k)\Vert =\min _{\omega \in \mathcal {{\bar{B}}}}\Vert \textrm{F}({x}^k,\omega )\Vert . \end{aligned}$$

Hence

$$\begin{aligned} \gamma \ge {\Theta }({x}^k)\ge \int _{\mathcal {{\bar{B}}}} \Vert \textrm{F}({x}^k,\omega )\Vert ^2\rho (\omega )d\omega \ge \Vert \textrm{F} ({x}^k,\omega ^k)\Vert ^2{\bar{\rho }}\int _{\mathcal {{\bar{B}}}}d\omega , \end{aligned}$$

which together with \(\int _{\mathcal {{\bar{B}}}}d\omega >0\) implies that \(\{\Vert \textrm{F}({x}^k,\omega ^k)\Vert \}\) is bounded. So, there exists a constant \(C>0\) such that

$$\begin{aligned} \Vert \textrm{F}({x}^k,\omega ^k)\Vert =\Vert A(\omega ^k)x^k+B(\omega ^k)|x^k|- b(\omega ^k)\Vert \le C. \end{aligned}$$
(8)

Since the sequences \(\big \{\frac{{x}^k}{\Vert {x}^k\Vert }\big \}\) and \(\{\omega ^k\}\) are bounded, we can assume by taking a subsequence if necessary that they are convergent and denote the limits by \({x}^*:=\lim \limits _{k\rightarrow \infty }\frac{{x}^k}{\Vert {x}^k\Vert }\) and \(\omega ^*:=\lim \limits _{k\rightarrow \infty }\omega ^k.\) Clearly, \(\Vert {x}^*\Vert =1\) and \(\omega ^*\in \mathcal {{\bar{B}}}\). By (8), we have

$$\begin{aligned} \bigg \Vert A(\omega ^k)\frac{x^k}{\Vert {x}^k\Vert }+B(\omega ^k)\bigg |\frac{x^k}{\Vert {x}^k\Vert }\bigg |- \frac{b(\omega ^k)}{\Vert {x}^k\Vert }\bigg \Vert \le \frac{C}{\Vert {x}^k\Vert }. \end{aligned}$$
(9)

Since \(\{b(\omega ^k)\}\) is bounded due to the continuity of \(b(\omega )\), by letting k tend towards infinity in (9), it holds that

$$\begin{aligned} A(\omega ^*){x}^*+B(\omega ^*)|{x}^*|=0. \end{aligned}$$

Hence \({x}^*= 0\) because \(\{A({\omega ^*}),B({\omega ^*})\}\) is an \(R_0\) pair. This yields a contradiction with \(\Vert {x}^*\Vert =1\). The proof is complete. \(\square \)

As is well known, for an integrable function \(\varphi :{\Omega }\rightarrow \Re \), the Monte Carlo sampling estimate for \(\mathbb {E}[\varphi (\omega )]\) is obtained by taking independently and identically distributed random samples \({\Omega }_k:=\{\omega ^1,...,\omega ^{N_k}\}\) from \({\Omega }\) and letting

$$\begin{aligned} \mathbb {E}[\varphi (\omega )]\approx \frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\varphi (\omega ^i). \end{aligned}$$

The strong law of large numbers guarantees that this procedure converges with probability one (abbreviated by “w.p.1" below), that is,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\varphi (\omega ^i)=\mathbb {E}[\varphi (\omega )],~\mathrm {w.p.1}, \end{aligned}$$
(10)

where \(N_k\rightarrow +\infty \) as \(k\rightarrow \infty \).

By generating independently and identically distributed random samples \({\Omega }_k=\{\omega ^1,...,\omega ^{N_k}\}\) from \({\Omega }\), we can obtain the following approximation of the ERM problem (7):

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}{\Theta }_k({x}):=\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\Vert \textrm{F}({x},\omega ^i)\Vert ^2. \end{aligned}$$
(11)

According to (6), the function \({\Theta }_k({x})\) can be further written as

$$\begin{aligned} {\Theta }_k({x})=\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\Vert A(\omega ^i)x+B(\omega ^i)|x|- b(\omega ^i)\Vert ^2. \end{aligned}$$
(12)
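For concreteness, a minimal Python sketch of the sample average objective (12) is given below; the data layout (lists A_list, B_list, b_list holding the sampled realizations) is an assumption made for illustration.

```python
import numpy as np

def theta_k(x, A_list, B_list, b_list):
    """Sample average objective (12): (1/N_k) * sum_i ||A_i x + B_i |x| - b_i||^2."""
    absx = np.abs(x)
    total = 0.0
    for A_i, B_i, b_i in zip(A_list, B_list, b_list):
        r = A_i @ x + B_i @ absx - b_i
        total += r @ r
    return total / len(A_list)
```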

We now give the existence of global optimal solutions to (7) and (11). To this end, we denote the set of optimal solutions to the ERM problem (7) by \(\mathcal {S}^*\) and that of the approximate ERM problem (11) by \(\mathcal {S}_k^*\), respectively.

Theorem 4

Let Assumption A hold. Suppose that there exists an \({\bar{\omega }}\in {\Omega }\) such that \(\rho ({\bar{\omega }})>0\) and \(\{A({\bar{\omega }}),B({\bar{\omega }})\}\) is an \(R_0\) pair. Then \(\mathcal {S}_k^*\) is nonempty and bounded for all sufficiently large k. Let \(x^k\in \mathcal {S}^*_k\) for each such k and let \(x^*\) be an accumulation point of the sequence \(\{x^k\}\). Then \(x^*\in \mathcal {S}^*\) with probability one.

Proof

By Lemma 1, there exist an index \({\bar{k}} > 0\) and a closed sphere \(\mathcal {B}({\bar{\omega }},\delta )\) such that for all \(k\ge {\bar{k}}\), \({\Omega }_k\cap \mathcal {B}({\bar{\omega }},\delta )\) is nonempty and \(\{A(\omega ),B(\omega )\}\) is an \(R_0\) pair for each \(\omega \in {\Omega }_k\cap \mathcal {B}({\bar{\omega }},\delta )\). Hence, by following an argument similar to that of Theorem 3, we can show that \(\mathcal {S}^*_k\) is nonempty and bounded for \(k\ge {\bar{k}}\).

Next, we prove \(x^*\in \mathcal {S}^*\). In fact, for any fixed \({x}\in \Re ^n\),

$$\begin{aligned} \Vert \textrm{F}({x},\omega )\Vert= & \Vert A(\omega )x+B(\omega )|x|- b(\omega )\Vert \nonumber \\\le & (\Vert A(\omega )\Vert +\Vert B(\omega )\Vert )\Vert {x}\Vert +\Vert b(\omega )\Vert \nonumber \\\le & (\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )(\Vert {x}\Vert +1). \end{aligned}$$
(13)

This together with Assumption A implies that the function \(\Vert \textrm{F}({x},\omega )\Vert ^2\) is integrable over \({\Omega }\). Thus, according to (10), for any fixed \({x}\in \Re ^n\),

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }{\Theta }_k({x})={\Theta }({x}),~\mathrm {w.p.1}. \end{aligned}$$
(14)

We now assume, by taking a subsequence if necessary, that \(\lim \limits _{k\rightarrow \infty }{x}^k={x}^*\). Pick a constant \(\zeta >0\) satisfying \({\Theta }({x}^*)<\zeta \). Since

$$\begin{aligned} \Vert \textrm{F}({x},\omega )\Vert ^2\le (\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )^2(\Vert {x}\Vert +1)^2, \end{aligned}$$

and \(\mathbb {E}[(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )^2]<+\infty \) by Assumption A, by [22, Theorem 16.8], we know that \({\Theta }({x})={\mathbb {E}}[\Vert \textrm{F}({x},\omega )\Vert ^2]\) is continuous at any \(x\in \mathbb {R}^n.\) Hence \({\Theta }({x}^k)\le \zeta \) for all sufficiently large k. Since the level set \(\mathcal {L}(\zeta )= \{{x}\in \Re ^n|{\Theta }({x}) \le \zeta \}\) is closed and bounded by Theorem 3, the constant

$$\begin{aligned} {\tilde{C}}:=\max \{\Vert {x}\Vert ~|{x}\in \mathcal {L}(\zeta )\} \end{aligned}$$
(15)

is well-defined. For all sufficiently large k and any \(\omega \in {\Omega }\), we have

$$\begin{aligned} & |\Vert A(\omega )x^k+B(\omega )|x^k|- b(\omega )\Vert -\Vert A(\omega )x^*+B(\omega )|x^*|- b(\omega )\Vert |\nonumber \\\le & \Vert A(\omega )\Vert \Vert x^k-x^*\Vert +\Vert B(\omega )\Vert \Vert |x^k|-|x^*|\Vert \nonumber \\\le & (\Vert A(\omega )\Vert +\Vert B(\omega )\Vert )\Vert {x}^k-{x}^*\Vert \nonumber \\\le & (\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )\Vert {x}^k-{x}^*\Vert . \end{aligned}$$
(16)

Thus, by (13), (15) and (16), it follows from (12) that

$$\begin{aligned} & |{\Theta }_k({x}^k)-{\Theta }_k({x}^*)|\nonumber \\\le & \frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\big |\Vert A(\omega ^i)x^k+B(\omega ^i)|x^k|- b(\omega ^i)\Vert ^2\nonumber \\ & -\Vert A(\omega ^i)x^*+B(\omega ^i)|x^*|- b(\omega ^i)\Vert ^2\big |\nonumber \\\le & \frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}2({\tilde{C}}+1)(\Vert A(\omega ^i)\Vert +\Vert B(\omega ^i)\Vert +\Vert b(\omega ^i)\Vert )^2\Vert {x}^k-{x}^*\Vert \nonumber \\\rightarrow & 0,~\mathrm {w.p.1}, \end{aligned}$$
(17)

where the limit holds because \(\{{x}^k\}\) converges to \({x}^*\) and

$$\begin{aligned} & \lim \limits _{k\rightarrow \infty }\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}(\Vert A(\omega ^i)\Vert +\Vert B(\omega ^i)\Vert +\Vert b(\omega ^i)\Vert )^2\nonumber \\= & \mathbb {E}[(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert +\Vert b(\omega )\Vert )^2]<+\infty ,~\mathrm {w.p.1} \end{aligned}$$

by Assumption A and (10). Further, putting (14) and (17) together yields

$$\begin{aligned} |{\Theta }_k({x}^k)-{\Theta }({x}^*)|\le |{\Theta }_k({x}^k)-{\Theta }_k({x}^*)|+|{\Theta }_k({x}^*)-{\Theta }({x}^*)|\rightarrow 0,~\mathrm {w.p.1}. \end{aligned}$$
(18)

On the other hand, since \({x}^k\) is the optimal solution of the problem (11), we have

$$\begin{aligned} {\Theta }_k({x}^k)\le {\Theta }_k({x}),~~\forall \, {x}\in \Re ^{n}. \end{aligned}$$

This together with (14) and (18) gives

$$\begin{aligned} {\Theta }({x}^*)=\lim \limits _{k\rightarrow \infty }{\Theta }_k({x}^k)\le \lim \limits _{k\rightarrow \infty }{\Theta }_k({x}) ={\Theta }({x}),~\forall \, {x}\in \Re ^{n} \end{aligned}$$

with probability one. Hence \(x^*\in \mathcal {S}^*\) with probability one. \(\square \)

The following result shows that the optimal values of the sample average approximations exhibit a uniform exponential convergence rate.

Theorem 5

Suppose that the support set \({\Omega }\) is a compact set. Let \({x}^k\) be an optimal solution of (11) for each k and \({x}^*\) be an accumulation point of the sequence \(\{{x}^k\}\). Then, for every \(\varepsilon >0\), there exist positive constants \(D(\varepsilon )\) and \(\beta (\varepsilon )\), independent of \(N_k\), such that

$$\begin{aligned} {\textrm{Prob}}\left\{ |{\Theta }({x}^k)-{\Theta }(x^*)|\ge \varepsilon \right\} \le D(\varepsilon )e^{-N_k\beta (\varepsilon )}. \end{aligned}$$

Proof

We assume again without loss of generality that \(\{{x}^k\}\) itself converges to \({x}^*\). Let \(\mathcal {Z}\) be a compact set that contains the whole sequence \(\{{x}^k\}.\) Then \({x}^*\in {\mathcal {Z}}\). First, since \(\mathcal {Z}\) and \({\Omega }\) are compact sets, by the continuity of \(\textrm{F}\), there exists a constant \(c>0\) such that

$$\begin{aligned} \Vert \textrm{F}({x},\omega )\Vert \le c,~\forall ({x},\omega )\in \mathcal {Z}\times {\Omega }. \end{aligned}$$
(19)

So for every \({x}\in \mathcal {Z}\), the moment generating function \(\mathbb {E}\left[ e^{t(\Vert \textrm{F}({x},\omega )\Vert ^2-\mathbb {E}[\Vert \textrm{F}({x},\omega )\Vert ^2])}\right] \) of the random variable \(\Vert \textrm{F}({x},\omega )\Vert ^2-\mathbb {E}[ \Vert \textrm{F}({x},\omega )\Vert ^2 ]\) is finite-valued for all t near zero. Second, for any \(({x}',\omega ),({x},\omega )\in \mathcal {Z}\times {\Omega }\), it follows from (19) that

$$\begin{aligned} |\Vert \textrm{F}({x}',\omega )\Vert ^2-\Vert \textrm{F}({x},\omega )\Vert ^2|\le & 2c|\Vert \textrm{F}({x}',\omega )\Vert -\Vert \textrm{F}({x},\omega )\Vert |\nonumber \\\le & 2c\Vert \textrm{F}({x}',\omega )-\textrm{F}({x},\omega )\Vert \nonumber \\= & 2c\Vert A(\omega )(x'-x)+B(\omega )(|x'|-|x|)\Vert \nonumber \\\le & 2c(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert )\Vert {x}'-{x}\Vert . \end{aligned}$$
(20)

Let \(\kappa (\omega ):=2c(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert ).\) Then \(\kappa (\omega )\) is bounded on \({\Omega }\) due to the continuity of \(A(\omega )\) and \(B(\omega )\). This ensures that \(\mathbb {E}[\kappa (\omega )]<+\infty \) and that the moment generating function \(\mathbb {E}[e^{t\kappa (\omega )}]\) of the random variable \(\kappa (\omega )\) is finite-valued for all t near zero. Thus, the conditions (C1)–(C3) assumed in [28, Theorem 5.1] are all satisfied. Therefore, according to [28, Theorem 5.1], for every \(\varepsilon >0\), there exist positive constants \(D(\varepsilon )\) and \(\beta (\varepsilon )\), independent of \(N_k\), such that

$$\begin{aligned} \text {Prob}\Bigg \{\sup _{{x}\in \mathcal {Z}}\bigg |\frac{1}{N_k}\sum _{\omega ^i\in {\Omega }_k}\Vert \textrm{F}({x},\omega ^i)\Vert ^2- \mathbb {E}[\Vert \textrm{F}({x},\omega )\Vert ^2]\bigg |\ge \varepsilon /2\Bigg \}\le \frac{1}{2}D(\varepsilon )e^{-N_k\beta (\varepsilon )}, \end{aligned}$$

i.e.,

$$\begin{aligned} \text {Prob}\left\{ \sup \limits _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }({x})|\ge \varepsilon /2\right\} \le \frac{1}{2}D(\varepsilon )e^{-N_{k}\beta (\varepsilon )}. \end{aligned}$$
(21)

For each k, since \({x}^k\) is an optimal solution of (11), we have \({\Theta }_k({x}^k)\le {\Theta }_k({x}^*)\) and hence

$$\begin{aligned} {\Theta }_k({x}^k)-{\Theta }(x^*)= & {\Theta }_k({x}^k)-{\Theta }_k({x}^*)+{\Theta }_k({x}^*)-{\Theta }(x^*)\nonumber \\\le & {\Theta }_k({x}^*)-{\Theta }(x^*)\nonumber \\\le & \sup _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }(x)|. \end{aligned}$$
(22)

On the other hand, by Theorem 4, \({x}^*\) is an optimal solution of (7) with probability one. So for each k, \({\Theta }(x^*)\le {\Theta }(x^k)\) with probability one and hence

$$\begin{aligned} {\Theta }_k({x}^k)-{\Theta }(x^*)= & {\Theta }_k({x}^k) -{\Theta }(x^k)+{\Theta }(x^k)-{\Theta }(x^*)\nonumber \\\ge & {\Theta }_k({x}^k)-{\Theta }(x^k)\nonumber \\\ge & -\sup _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }(x)|. \end{aligned}$$
(23)

Combining (22) and (23) together yields

$$\begin{aligned} |{\Theta }_k({x}^k)-{\Theta }(x^*)|\le \sup _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }(x)|. \end{aligned}$$

This together with (21) gives

$$\begin{aligned} \text {Prob}\left\{ |{\Theta }_k({x}^k)-{\Theta }(x^*)|\ge \varepsilon /2\right\}\le & \text {Prob} \left\{ \sup \limits _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }({x})|\ge \varepsilon /2\right\} \nonumber \\\le & \frac{1}{2}D(\varepsilon )e^{-N_{k}\beta (\varepsilon )}. \end{aligned}$$
(24)

Since

$$\begin{aligned} |{\Theta }({x}^k)-{\Theta }(x^*)|\le |{\Theta }({x}^k)-{\Theta }_k(x^k)|+|{\Theta }_k({x}^k)-{\Theta }(x^*)|, \end{aligned}$$

it follows from (21) and (24) that

$$\begin{aligned} & {\textrm{Prob}}\left\{ |{\Theta }({x}^k)-{\Theta }(x^*)|\ge \varepsilon \right\} \nonumber \\\le & {\textrm{Prob}}\left\{ |{\Theta }({x}^k)-{\Theta }_k(x^k)|\ge \varepsilon /2\right\} +{\textrm{Prob}}\left\{ |{\Theta }_k({x}^k)-{\Theta }(x^*)|\ge \varepsilon /2\right\} \nonumber \\\le & \text {Prob}\left\{ \sup \limits _{{x}\in \mathcal {Z}}|{\Theta }({x})-{\Theta }_k({x})|\ge \varepsilon /2\right\} +{\textrm{Prob}}\left\{ |{\Theta }_k({x}^k)-{\Theta }(x^*)|\ge \varepsilon /2\right\} \nonumber \\\le & D(\varepsilon )e^{-N_{k}\beta (\varepsilon )}. \end{aligned}$$

This completes the proof. \(\square \)

At the end of this section, we give some discussion of the computational complexity of the Monte Carlo approximation method. Denote \(\pi (x,\omega ):=\Vert \textrm{F}({x},\omega )\Vert ^2-\mathbb {E}[\Vert \textrm{F}({x},\omega )\Vert ^2]\). We further assume that the moment generating function \(\mathbb {E}\left[ e^{t\pi (x,\omega )}\right] \) of the random variable \(\pi (x,\omega )\) satisfies the following condition: there exists \(\sigma >0\) such that for all \(x\in \mathbb {R}^n,\)

$$\begin{aligned} \mathbb {E}\left[ e^{t(\pi (x,\omega ))}\right] \le \textrm{exp}({\sigma ^2t^2/2}), \ \forall t \in \mathbb {R}. \end{aligned}$$
(25)

This condition was introduced by Shapiro and Xu [28]. Note that the right-hand side of (25) is exactly the moment generating function of a normal distribution with variance \(\sigma ^2\). Hence, inequality (25) holds with equality when \(\pi (x,\omega )\) has a normal distribution. In addition, this condition is also satisfied if \(\pi (x,\omega )\) has a sub-Gaussian distribution.

From the proof of Theorem 5, we know that the conditions (C1)–(C3) assumed in [28, Theorem 5.1] are all satisfied. Moreover, by (20) we have

$$\begin{aligned} |\mathbb {E}[\Vert \textrm{F}({x}',\omega )\Vert ^2]-\mathbb {E} [\Vert \textrm{F}({x},\omega )\Vert ^2]|\le \mathbb {E}[2c(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert )]\Vert {x}'-{x}\Vert , \ \forall x',x\in \mathcal {Z}. \end{aligned}$$

Let \(L:=\mathbb {E}[2c(\Vert A(\omega )\Vert +\Vert B(\omega )\Vert )]\). Clearly, L is finite because \(A(\omega )\) and \(B(\omega )\) are continuous on the compact set \({\Omega }\). Denote by \(D:=\textrm{sup}_{x',x\in \mathcal {Z}}\Vert x'-x\Vert \) the diameter of \(\mathcal {Z}\). Then, according to (5.14) in [28], we know that for \(\alpha \in (0,1)\), if the sample size \(N_k\) satisfies

$$\begin{aligned} N_k\ge \frac{\mathcal {O}(1)\sigma ^2}{\varepsilon ^2}\left[ n\textrm{log}\left( \frac{\mathcal {O}(1)DL}{\varepsilon }\right) +\textrm{log}\left( \frac{2}{\alpha }\right) \right] , \end{aligned}$$
(26)

then

$$\begin{aligned} \textrm{Prob}\Bigg \{\sup _{{x}\in \mathcal {Z}}\bigg |\frac{1}{N_k}\sum _{\omega ^i\in {\Omega }_k}\Vert \textrm{F}({x},\omega ^i)\Vert ^2- \mathbb {E}[\Vert \textrm{F}({x},\omega )\Vert ^2]\bigg |\ge \varepsilon /2\Bigg \}\le \frac{\alpha }{2}, \end{aligned}$$

i.e.,

$$\begin{aligned} \text {Prob}\left\{ \sup \limits _{{x}\in \mathcal {Z}}|{\Theta }_k({x})-{\Theta }({x})|\ge \varepsilon /2\right\} \le \frac{\alpha }{2}. \end{aligned}$$

Then by following the proof of Theorem 5, we can obtain

$$\begin{aligned} {\textrm{Prob}}\left\{ |{\Theta }({x}^k)-{\Theta }(x^*)|\ge \varepsilon \right\} \le \alpha . \end{aligned}$$

This means that \(x^k\) is an \(\varepsilon \)-optimal solution with probability at least \(1-\alpha \). Hence, to obtain an \(\varepsilon \)-optimal solution, the computational complexity of the Monte Carlo approximation method is \(\mathcal {O}\left( \frac{n}{\varepsilon ^2}\textrm{log}\frac{1}{\varepsilon }\right) \) by (26). For the computational complexity of Monte Carlo based methods, more research and discussions can be found in [1, 27].

4 Robustness of ERM Problem

In this section, we discuss the robustness of solutions of the ERM problem (7). This indicates from a new perspective that ERM is an effective reformulation of SAVE. To this end, we first consider the error bound conditions for the original AVE (1). Let

$$\begin{aligned} \textrm{F}({x}):=Ax+B|x|- b. \end{aligned}$$

It is easy to see that \(\textrm{F}(x)\) is globally Lipschitz continuous and semismooth on \(\Re ^n\). Hence, \(\textrm{F}(x)\) is differentiable almost everywhere by Rademacher’s theorem. Let \(D_{\textrm{F}}\) be the set of points at which \(\textrm{F}\) is differentiable. The Clarke generalized Jacobian of \(\textrm{F}\) at x, denoted by \(\partial \textrm{F}(x)\), is defined by

$$\begin{aligned} \partial \textrm{F}(x):=\textrm{conv}\bigg \{\lim _{{x}^k\rightarrow x} \textrm{F}'({x}^k)|~ {x}^k \in D_{\textrm{F}}\bigg \}, \end{aligned}$$

where conv denotes the convex hull and \(\textrm{F}'({x})\) is the Jacobian whenever \(\textrm{F}({x})\) is differentiable at x. The mapping \(\textrm{F}({x})\) is said to be BD-regular at x if all subgradients in \(\partial \textrm{F}(x)\) are nonsingular.

Denote the solution set of AVE (1) as \({X}^*\), that is,

$$\begin{aligned} {X}^*:=\{x\in \Re ^n| Ax+B|x|=b\}. \end{aligned}$$

Assume that \({X}^*\) is nonempty. A function \(r: \Re ^n\rightarrow \Re _+\) is called a residual function of AVE (1) if \(r(x)\ge 0\) for all x and

$$\begin{aligned} r(x) = 0\Longleftrightarrow x\in {X}^*. \end{aligned}$$

With the help of residual functions, the AVE problem is transformed into finding the roots of a system of nonlinear equations. The error bound property plays an important role in both theoretical and numerical analysis. Recall that a residual function r has a local error bound for AVE (1) if there exist constants \(\kappa > 0\) and \(\delta >0\) such that for each \(x\in \{x\in \Re ^n|\textrm{dist}(x, {X}^*)\le \delta \}\),

$$\begin{aligned} \kappa r(x)\ge \textrm{dist}(x, {X}^*), \end{aligned}$$
(27)

where \(\textrm{dist}(x, {X}^*):=\textrm{inf}_{y\in {X}^*}\Vert x-y\Vert .\) The residual function r has a global error bound for AVE (1) if (27) holds for all \(x\in \Re ^n\).

Let

$$\begin{aligned} r(x):=\Vert Ax+B|x|- b\Vert . \end{aligned}$$
(28)

Clearly r(x) is a residual function of the AVE (1).

Lemma 2

Let r(x) be defined by (28). If \(\{A,B\}\) is an \(R_0\) pair, then the level set

$$\begin{aligned} \mathcal {L}(\theta ):=\{ {x}\in \Re ^{n}| r({x}) \le \theta \} \end{aligned}$$

is compact for any \(\theta >0.\)

Proof

By arguments similar to those in the proof of Theorem 3, we can prove that \(\mathcal {L}(\theta )\) is bounded for all \(\theta >0.\) This, together with the continuity of r, yields the desired result. \(\square \)

Theorem 6

Suppose that \(\{A,B\}\) is an \(R_0\) pair and \(\textrm{F}(x)\) is BD-regular at any \(x\in {X}^*\). Then r(x) provides a global error bound for AVE (1), that is, there exists a constant \(\xi > 0\) such that

$$\begin{aligned} \xi r(x)\ge \textrm{dist}(x, {X}^*), ~\forall \, x\in \Re ^n. \end{aligned}$$

Proof

We first show that the residual r(x) provides a local error bound for AVE (1). In fact, if this conclusion is not true, there must exist a sequence \(\{x^k\}\) such that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{ r(x^k)}{\textrm{dist}(x^k, {X}^*)}=0, \end{aligned}$$

which in turn implies \( r(x^k)\rightarrow 0\) as \(k\rightarrow \infty \). Hence there is a constant \(\theta >0\) such that \( r({x^k}) \le \theta \) for all \(k\ge 0\). By Lemma 2, \(\{x^k\}\) has an accumulation point \(x^*\), and we assume without loss of generality that \(x^k\rightarrow x^*\) as \(k\rightarrow \infty .\) Since r(x) is continuous, \( r(x^*)=0\), i.e., \(x^*\in {X}^*\). Thus

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{ r(x^k)}{\Vert x^k-x^*\Vert }\le \lim \limits _{k\rightarrow \infty }\frac{ r(x^k)}{\textrm{dist}(x^k, {X}^*)}=0. \end{aligned}$$
(29)

Because \(\textrm{F}(x)\) is semismooth and BD-regular at \(x^*,\) according to [23, Proposition 3], there exist \(\varrho > 0\) and \(\delta > 0\) such that \( r(x)\ge \varrho \Vert x-x^*\Vert \) for all x satisfying \(\Vert x-x^*\Vert \le \delta \), which contradicts (29). Thus, r(x) provides a local error bound for AVE (1).

Next, we show that r(x) further provides a global error bound for AVE (1). Suppose on the contrary that for any \(k>0\), there exists \(x^k\) such that

$$\begin{aligned} {\textrm{dist}(x^k, {X}^*)}>k r(x^k). \end{aligned}$$
(30)

Choose a fixed solution \({\hat{x}}\in {X}^*\). Then

$$\begin{aligned} \Vert x^k-{\hat{x}}\Vert \ge {\textrm{dist}(x^k, {X}^*)}>k r(x^k). \end{aligned}$$
(31)

We claim that there exist \(K>0\) and \(\delta > 0\) such that \( r(x^k)>\delta \) for every \(k>K\). If not, then for any integer \(K>0\) and any \(\delta >0\), there exists \(k>K\) such that \( r(x^k)\le \delta .\) By (30) and the property of local error bound of r(x), we have

$$\begin{aligned} \frac{\delta }{k}{\textrm{dist}(x^k, {X}^*)}>\delta r(x^k) \ge \delta {\textrm{dist}(x^k, {X}^*)}. \end{aligned}$$

It implies \(\delta /k>1\), leading to a contradiction as \(k\rightarrow +\infty .\) Thus, by (31) we have \(\Vert x^k-{\hat{x}}\Vert >k\delta \) for every \(k>K\), which further yields \(\Vert x^k\Vert \rightarrow +\infty \) as \(k\rightarrow + \infty .\) Since the sequence \(\big \{\frac{{x}^k}{\Vert {x}^k\Vert }\big \}\) is bounded, we assume by taking a subsequence if necessary that \(\lim \limits _{k\rightarrow \infty }\frac{{x}^k}{\Vert {x}^k\Vert }={x}^*.\) It then follows from (31) that

$$\begin{aligned} 1&=\lim _{k\rightarrow \infty }\frac{\Vert x^k-{\hat{x}}\Vert }{\Vert x^k\Vert } \ge \lim _{k\rightarrow \infty }k\frac{ r(x^k)}{\Vert x^k\Vert } \\&=\lim _{k\rightarrow \infty }k\bigg \Vert A\frac{x^k}{\Vert x^k\Vert }+B\bigg |\frac{x^k}{\Vert x^k\Vert }\bigg |- \frac{b}{\Vert x^k\Vert }\bigg \Vert , \end{aligned}$$

which gives

$$\begin{aligned} \lim _{k\rightarrow \infty }\bigg \Vert A\frac{x^k}{\Vert x^k\Vert }+B\bigg |\frac{x^k}{\Vert x^k\Vert }\bigg |- \frac{b}{\Vert x^k\Vert }\bigg \Vert =\Vert Ax^*+B|x^*|\Vert =0, \end{aligned}$$

i.e., \(Ax^*+B|x^*|=0.\) Since \(\{A,B\}\) is an \(R_0\) pair, then \(x^*=0\), a contradiction with \(\Vert x^*\Vert =1\). This completes the proof. \(\square \)

Let us discuss the relation between the solution of the ERM problem (7) and that of (3). For a fixed \(\omega \in {\Omega }\), denote by \({X}^*(\omega )\) the solution set of the SAVE (3), that is,

$$\begin{aligned} {X}^*(\omega ):=\{x\in \Re ^n| A(\omega )x+B(\omega )|x|=b(\omega )\}. \end{aligned}$$

We assume that \({X}^*(\omega )\) is nonempty.

Theorem 7

Let \({\Omega }=\{\omega ^1,...,\omega ^N\}\). For each \(\omega \in {\Omega }\), suppose that \(\textrm{F}(x,\omega )\) defined by (6) is BD-regular at any \(x\in {X}^*(\omega )\) and \(\{A(\omega ),B(\omega )\}\) is an \(R_0\) pair. Then there is a constant \(\eta >0\) such that

$$\begin{aligned} \mathbb {E}[\textrm{dist}(x,{X}^*(\omega ))]\le \eta \sqrt{{\Theta }(x)},\quad \forall \, x\in \Re ^n. \end{aligned}$$

Proof

By Theorem 6, there exists a constant \(\eta > 0\) such that

$$\begin{aligned} \textrm{dist}(x,{X}^*(\omega ^i))\le \eta \Vert A(\omega ^i)x+B(\omega ^i)|x|- b(\omega ^i)\Vert , \ \ \forall i\in \{1,...,N\}. \end{aligned}$$

Therefore

$$\begin{aligned} \mathbb {E}[\textrm{dist}(x,{X}^*(\omega ))]&\le \eta \mathbb {E}[\Vert A(\omega )x+B(\omega )|x|- b(\omega )\Vert ]\\&\le \eta \sqrt{\mathbb {E}[\Vert A(\omega )x+B(\omega )|x|- b(\omega )\Vert ^2]}\\&=\eta \sqrt{{\Theta }(x)}, \end{aligned}$$

where the second inequality comes from the Schwarz inequality. \(\square \)

Theorem 7 indicates that, for an optimal solution \(x^*\) of ERM problem (7), it holds that

$$\begin{aligned} \mathbb {E}[\text {dist}(x^*, {X}^*(\omega ))]\le \eta \sqrt{{\Theta }(x^*)}. \end{aligned}$$
(32)

Unlike an error bound for AVE (1), the left-hand side of inequality (32) is in general positive at a solution of the ERM problem (7). Nevertheless, inequality (32) shows that the expected distance from \(x^*\) to the solution set \({X}^*(\omega )\) is bounded, up to the constant \(\eta \), by \(\sqrt{{\Theta }(x^*)}\). This property tells us that the solutions of the ERM problem (7) may have minimal sensitivity with respect to the random variables in SAVE (3). In this sense, the solutions of the ERM problem (7) are robust for SAVE.

In our previous paper [30], we proved that the AVE (1) is uniquely solvable for any \(b\in \Re ^{n}\) if the interval matrix \([A -|B|, A+|B|]\) is regular. For the definitions of the interval matrix and its regularity, one may refer to [21, 30] for more detailed information. Based on Theorem 7 and regularity conditions, we obtain the following result.

Theorem 8

Let \({\Omega }=\{\omega ^1,...,\omega ^N\}\). Suppose that the interval matrix \([A(\omega )-|B(\omega )|, A(\omega )+|B(\omega )|]\) is regular for each \(\omega \in {\Omega }\). Then there is a constant \(\lambda >0\) such that

$$\begin{aligned} \mathbb {E}[\Vert x-x^*(\omega )\Vert ]\le \lambda \sqrt{{\Theta }(x)},~\forall \, x\in \Re ^n, \end{aligned}$$

where \(x^*(\omega )\) denotes the unique solution of SAVE (3) for each fixed \(\omega \in {\Omega }\).

Proof

For a fixed \(\omega \in {\Omega }\), by [30, Theorem 2.2], SAVE (3) is uniquely solvable and so \(\{A(\omega ),B(\omega )\}\) is an \(R_0\) pair. Thus, according to Theorem 7, we only need to prove that \(\textrm{F}(x,\omega )\) defined by (6) is BD-regular at \(x^*(\omega )\). In fact, for \(\omega \in {\Omega }\), every element V of \(\partial \textrm{F}(x,\omega )\) takes the form

$$\begin{aligned} V=A(\omega )+B(\omega )D_x, \end{aligned}$$

where \(D_x=\textrm{diag}(d_x)\) with

$$\begin{aligned} (d_x)_i:=\left\{ \begin{array}{ll} 1, & \text{ if }~x_i>0,\\ \tau , & \text{ if }~x_i=0,\\ -1, & \text{ if }~x_i<0,\\ \end{array} \right. \end{aligned}$$

and \(\tau \) takes any value in the interval \([-1, 1]\). Since \(D_x=\textrm{diag}(d_x)\in [-I,I]\), we have \(B(\omega )D_x\in [-|B(\omega )|, |B(\omega )|]\) and hence \(A(\omega )+B(\omega )D_x\in [A(\omega )-|B(\omega )|, A(\omega )+|B(\omega )|]\). This, together with the regularity of \([A(\omega )-|B(\omega )|, A(\omega )+|B(\omega )|]\), implies that \(A(\omega )+B(\omega )D_x\) is nonsingular, i.e., V is nonsingular. This completes the proof. \(\square \)

5 A Class of Special SAVE

We consider in this section the following SAVE:

$$\begin{aligned} A(\omega )x-|x| = b(\omega ). \end{aligned}$$
(33)

This SAVE is a special case of SAVE (3), obtained by setting \(B(\omega )=-I\); see [8]. Due to the presence of the absolute value, SAVE (33) is nonsmooth. However, it can be equivalently transformed into a smooth SAVE by exploiting its special structure. For \(i=1,...,n\), denote by \(a_i(\omega )^T\) the i-th row of \(A(\omega )\). Then

$$\begin{aligned} A(\omega )x-|x|=b(\omega )\Longleftrightarrow & |x|=A(\omega )x-b(\omega )\\\Longleftrightarrow & |x_i|=a_i(\omega )^Tx-b_i(\omega )\\\Longleftrightarrow & |x_i|^3=(a_i(\omega )^Tx-b_i(\omega ))^3\\\Longleftrightarrow & |x|^3=(A(\omega )x-b(\omega ))^3, \end{aligned}$$

where \(\alpha ^3:=(\alpha _1^3,...,\alpha _n^3)^T\) for \(\alpha =(\alpha _1,...,\alpha _n)^T\in \Re ^n\). Thus, solving SAVE (33) is equivalent to solving the following SAVE:

$$\begin{aligned} (A(\omega )x-b(\omega ))^3 -|x|^3=0. \end{aligned}$$
(34)

Since the function \(f(t)=|t|^3\) is twice continuously differentiable on \(\Re \) with \(f'(t)=3|t|t\) and \(f''(t)=6|t|\), SAVE (34) is smooth. Let

$$\begin{aligned} \tilde{\textrm{F}}({x},\omega ):=(A(\omega )x-b(\omega ))^3 -|x|^3. \end{aligned}$$

The corresponding ERM model for SAVE (34) takes the form

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}\tilde{{\Theta }}({x}):={\mathbb {E}}[\Vert \tilde{\textrm{F}}({x},\omega )\Vert ^2], \end{aligned}$$
(35)

and its sample average approximation is

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}\tilde{{\Theta }}_k({x}):=\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\Vert \tilde{\textrm{F}}({x},\omega ^i)\Vert ^2. \end{aligned}$$
(36)

Since the function \(\tilde{\textrm{F}}(\cdot ,\omega )\) is twice continuously differentiable on \(\Re ^n\), the problems (35) and (36) are both smooth and hence standard smooth optimization algorithms are applicable; a sketch of the residual and its gradient is given below. The limiting behavior of the stationary points of (36) is then established in Theorem 9.
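The following Python sketch (with names of our own choosing) implements the smooth residual (34) and the gradient of one summand \(\Vert \tilde{\textrm{F}}({x},\omega ^i)\Vert ^2\) of (36); the Jacobian formula \(J(x)=3\,\textrm{diag}((Ax-b)^2)A-3\,\textrm{diag}(|x|x)\) follows from \(f'(t)=3|t|t\).

```python
import numpy as np

def f_tilde(x, A, b):
    """Smooth residual (34) for one realization: (Ax - b)^3 - |x|^3 (componentwise)."""
    return (A @ x - b) ** 3 - np.abs(x) ** 3

def grad_sq_f_tilde(x, A, b):
    """Gradient of ||f_tilde(x)||^2, i.e. 2 J(x)^T f_tilde(x) with
    J(x) = 3 diag((Ax - b)^2) A - 3 diag(|x| * x)."""
    r = A @ x - b
    J = 3.0 * (r ** 2)[:, None] * A - 3.0 * np.diag(np.abs(x) * x)
    return 2.0 * J.T @ f_tilde(x, A, b)
```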

Theorem 9

Let the support set \({\Omega }\) be a compact set. Assume that \(x^k\) is a stationary point of (36) and \(x^*\) is an accumulation point of \(\{x^k\}\). Then \(x^*\) is a stationary point of (35) with probability one.

Proof

For the sake of simplicity, we assume without loss of generality that \(\lim \limits _{k\rightarrow \infty }x^k=x^*\). Let \(\mathcal {D}\) be a compact and convex set containing the whole sequence \(\{x^k\}\). For each k, since \(x^k\) is a stationary point of (36), we have

$$\begin{aligned} \frac{2}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}(x^k,\omega ^i){{\tilde{F}}}(x^k,\omega ^i)=0. \end{aligned}$$
(37)

By the continuity of \({{\tilde{F}}}, \nabla _x{{\tilde{F}}}\) and \(\nabla ^2_x {{\tilde{F}}}_j (j = 1,...,n)\) on the compact set \(\mathcal {D}\times {\Omega }\), there exists a constant \({M}>0\) such that

$$\begin{aligned} \Vert {{\tilde{F}}}(x,\omega )\Vert \le {M},~~\Vert \nabla _x{{\tilde{F}}}(x,\omega )\Vert \le {M}, \end{aligned}$$
(38)
$$\begin{aligned} \Vert \nabla ^2_x{{\tilde{F}}}_j(x,\omega )\Vert \le {M} (j = 1,...,n)~~~~~ \end{aligned}$$
(39)

hold for every \((x,\omega )\in \mathcal {D}\times {\Omega }.\) Therefore

$$\begin{aligned} & \bigg |\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}_j(x^k,\omega ^i)^T{{\tilde{F}}}(x^k,\omega ^i)- \frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}_j(x^*,\omega ^i)^T{{\tilde{F}}}(x^*,\omega ^i)\bigg |\\\le & \frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\Vert \nabla _x{{\tilde{F}}}_j(x^k,\omega ^i)\Vert \Vert {{\tilde{F}}}(x^k,\omega ^i)-{{\tilde{F}}}(x^*,\omega ^i)\Vert \\ & +\frac{1}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\Vert \nabla _x{{\tilde{F}}}_j(x^k,\omega ^i)-\nabla _x{{\tilde{F}}}_j(x^*,\omega ^i)\Vert \Vert {{\tilde{F}}}(x^*,\omega ^i)\Vert \\\le & \frac{{M}}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\int _0^1\bigg (\Vert \nabla _x{{\tilde{F}}}\big (tx^k+(1-t)x^*,\omega ^i\big )\Vert _F\\ & +\Vert \nabla _x^2{{\tilde{F}}}_j\big (tx^k+(1-t)x^*,\omega ^i\big )\Vert _F\bigg )\Vert x^k-x^*\Vert dt\\\le & 2{M}^2 \Vert x^k-x^*\Vert \\\rightarrow & 0~~\text{ as }~k\rightarrow +\infty , \end{aligned}$$

where the second inequality comes from the mean-value theorem and (38), and the third inequality follows from (38) and (39). Thus

$$\begin{aligned} & \lim \limits _{k\rightarrow \infty }\frac{2}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}(x^k,\omega ^i){{\tilde{F}}}(x^k,\omega ^i)\nonumber \\= & \lim \limits _{k\rightarrow \infty }\frac{2}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}(x^*,\omega ^i){{\tilde{F}}}(x^*,\omega ^i)\nonumber \\= & 2\mathbb {E}[\nabla _x{{\tilde{F}}}(x^*,\omega ){{\tilde{F}}}(x^*,\omega )]\nonumber \\= & \mathbb {E}[\nabla _x\Vert {{\tilde{F}}}(x^*,\omega )\Vert ^2],~\mathrm {w.p.1}. \end{aligned}$$
(40)

Moreover, it follows from (38) that

$$\begin{aligned} \Vert \nabla _x{{\tilde{F}}}(x,\omega ){{\tilde{F}}}(x,\omega )\Vert \le {M}^2,~~\forall \, (x,\omega )\in \mathcal {D}\times {\Omega }. \end{aligned}$$

According to [22, Theorem 16.8], we obtain

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\frac{2}{N_k}\sum \limits _{\omega ^i\in {\Omega }_k}\nabla _x{{\tilde{F}}}(x^k,\omega ^i){{\tilde{F}}}(x^k,\omega ^i)= \nabla \mathbb {E}[\Vert {{\tilde{F}}}(x^*,\omega )\Vert ^2],~\mathrm {w.p.1}. \end{aligned}$$
(41)

Then, by letting \(k\rightarrow \infty \) in (37) and taking (40) and (41) into account, we have

$$\begin{aligned} \nabla {\tilde{{\Theta }}}(x^*)=\nabla \mathbb {E}[\Vert {{\tilde{F}}}(x^*,\omega )\Vert ^2]=0,~\mathrm {w.p.1}. \end{aligned}$$

Hence \(x^*\) is a stationary point of (35) with probability one. \(\square \)

6 A Smoothing Gradient Method

In this section, we present a smoothing gradient method to solve SAVE (3) and give some numerical examples to illustrate the effectiveness of the method. For the ERM problem (7), we use the well-known sample average approximation (SAA) method to approximate \({\Theta }({x})\), which gives

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}\tilde{{\Theta }}({x}):= & \frac{1}{N}\sum _{i=1}^N\Vert \textrm{F}({x},\omega ^i)\Vert ^2\nonumber \\= & \frac{1}{N}\sum _{i=1}^N\Vert A(\omega ^i)x+B(\omega ^i)|x|- b(\omega ^i)\Vert ^2, \end{aligned}$$
(42)

where \(\omega ^i~(i=1,...,N)\) are sample points generated from the given distribution. Since the function \(\textrm{F}({x},\omega )=A(\omega )x+B(\omega )|x|- b(\omega )\) is nonsmooth with respect to x, smooth gradient-type methods cannot be applied directly to problem (42). To overcome this difficulty, we consider the following smoothing function:

$$\begin{aligned} \textrm{F}_N({x},\omega )=A(\omega )x+B(\omega )\sqrt{x^2+1/N}- b(\omega ), \end{aligned}$$

where N is the number of sample points and

$$\begin{aligned} \sqrt{x^2+1/N}:=\bigg (\sqrt{x_1^2+1/N},...,\sqrt{x_n^2+1/N}\bigg )^T. \end{aligned}$$

Obviously, \(\textrm{F}_N({x},\omega )\) is smooth with respect to x and \(\lim \limits _{N\rightarrow \infty }\textrm{F}_N({x},\omega )=\textrm{F}({x},\omega )\) for each \(x\in \Re ^{n}\). By using \(\textrm{F}_N({x},\omega )\), we define the following smooth approximation of the problem (42):

$$\begin{aligned} \min \limits _{{x}\in \Re ^{n}}{\Theta }_N({x}):= & \frac{1}{N}\sum \limits _{i=1}^N\Vert \textrm{F}_N({x},\omega ^i)\Vert ^2\nonumber \\= & \frac{1}{N}\sum \limits _{i=1}^N\Vert A(\omega ^i)x+B(\omega ^i)\sqrt{x^2+1/N}- b(\omega ^i)\Vert ^2. \end{aligned}$$
(43)

It is easy to see that if \({x}^*\) is a global optimal solution of problem (43), then \({x}^*\) approximates a global optimal solution of problem (42) when N is sufficiently large. Moreover, \({\Theta }_N({x})\) is smooth with respect to x and its gradient is

$$\begin{aligned} \nabla _x{\Theta }_N({x})=\frac{2}{N}\sum \limits _{i=1}^N(A(\omega ^i)+B(\omega ^i)T_x)^T(A(\omega ^i)x+B(\omega ^i)\sqrt{x^2+1/N}- b(\omega ^i)) \end{aligned}$$
(44)

with \(T_x:=\textrm{diag}\Big (\frac{x_i}{\sqrt{x_i^2+1/N}}\Big ).\) A Python sketch of \({\Theta }_N\) and \(\nabla _x{\Theta }_N\) is given below; we then present a smoothing gradient method to solve problem (43).
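Below is a minimal Python sketch of the smoothed objective (43) and its gradient (44); the data layout (lists of sampled realizations) and the function names are assumptions made for illustration.

```python
import numpy as np

def theta_N(x, A_list, B_list, b_list):
    """Smoothed SAA objective (43)."""
    N = len(A_list)
    s = np.sqrt(x ** 2 + 1.0 / N)          # smoothed |x|
    return sum(np.sum((A @ x + B @ s - b) ** 2)
               for A, B, b in zip(A_list, B_list, b_list)) / N

def grad_theta_N(x, A_list, B_list, b_list):
    """Gradient (44) of the smoothed objective."""
    N = len(A_list)
    s = np.sqrt(x ** 2 + 1.0 / N)
    Tx = np.diag(x / s)                    # T_x = diag(x_i / sqrt(x_i^2 + 1/N))
    g = np.zeros_like(x, dtype=float)
    for A, B, b in zip(A_list, B_list, b_list):
        g += (A + B @ Tx).T @ (A @ x + B @ s - b)
    return 2.0 * g / N
```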

Algorithm A:

(A smoothing gradient method)

Step 0 :

Given \(x^0\in \Re ^n, \sigma , \rho \in (0,1)\) and \(\epsilon > 0.\) Set \(k:=0.\)

Step 1 :

If \(\Vert \nabla _{x}{\Theta }_N({x}^k)\Vert \le \epsilon \), then stop.

Step 2 :

Compute the search direction \(d_k=-\nabla _x{\Theta }_N({x}^k).\)

Step 3 :

Let the step-size \(\alpha _k:=\rho ^{l_k}\), where \(l_k\) is the smallest nonnegative integer l satisfying

$$\begin{aligned} {\Theta }_N(x^k+ \rho ^{l}d_k)\le (1-\sigma \rho ^{2l}) {\Theta }_N(x^k). \end{aligned}$$
Step 4 :

Set \(x^{k+1}:= x^k+\alpha _kd_k.\) Set \(k:=k+1\) and go to Step 1.
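A compact Python sketch of Algorithm A, built on the theta_N and grad_theta_N sketches above, is given below; the iteration cap and the backtracking floor are our own safeguards and not part of the algorithm as stated.

```python
import numpy as np

def smoothing_gradient_method(x0, A_list, B_list, b_list,
                              rho=0.2, sigma=1e-3, eps=1e-5, max_iter=10000):
    """Algorithm A: steepest descent on Theta_N with the line search of Step 3."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_theta_N(x, A_list, B_list, b_list)
        if np.linalg.norm(g) <= eps:                       # Step 1: stopping test
            break
        d = -g                                             # Step 2: search direction
        theta = theta_N(x, A_list, B_list, b_list)
        alpha = 1.0                                        # Step 3: backtracking, alpha = rho^l
        while (theta_N(x + alpha * d, A_list, B_list, b_list)
               > (1.0 - sigma * alpha ** 2) * theta and alpha > 1e-16):
            alpha *= rho
        x = x + alpha * d                                  # Step 4: update iterate
    return x
```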

Theorem 10

For any positive integer N, let \(\{x^k\}\) be the iteration sequence generated by Algorithm A. Then any accumulation point \(x^*\) of \(\{x^k\}\) satisfies \(\nabla _{x}{\Theta }_N({x}^*)= 0.\)

Proof

Since \(x^*\) is an accumulation point of \(\{x^k\}\), there exists an infinite subsequence \(\{x^k\}_{k\in K}\subset \{x^k\}\) such that \(\lim \limits _{k\in K,k\rightarrow \infty } x^k=x^*\). Since \(\{ {\Theta }_N(x^k)\}\) is monotonically decreasing, it is convergent, and by the continuity of \({\Theta }_N\) we have \(\lim \limits _{k\rightarrow \infty } {\Theta }_N(x^k)={\Theta }_N({x}^*).\) Note that if \({\Theta }_N({x}^*)=0\), then by (43) we have, for \(i=1,...,N,\)

$$\begin{aligned} A(\omega ^i)x^*+B(\omega ^i)\sqrt{(x^*)^2+1/N}- b(\omega ^i)=0, \end{aligned}$$

which together with (44) gives \(\nabla _{x}{\Theta }_N({x}^*)= 0.\) Now we assume \({\Theta }_N({x}^*)>0\). Since

$$\begin{aligned} {\Theta }_N(x^{k+1})\le (1-\sigma \alpha _k^2) {\Theta }_N(x^k), \end{aligned}$$

it follows from \(\lim \limits _{k\rightarrow \infty } {\Theta }_N(x^k)={\Theta }_N({x}^*)>0\) that \(\lim \limits _{k\rightarrow \infty } \alpha _k=0\). Moreover, by the line search criterion in Step 3, we have, for all sufficiently large \(k\in {K}\),

$$\begin{aligned} {\Theta }_N(x^k+\rho ^{-1}\alpha _k d_k)> (1-\sigma (\rho ^{-1}\alpha _k)^2) {\Theta }_N(x^k), \end{aligned}$$

i.e.,

$$\begin{aligned} \frac{ {\Theta }_N(x^k+\rho ^{-1}\alpha _k d_k)- {\Theta }_N(x^k)}{\rho ^{-1}\alpha _k}>- \sigma \rho ^{-1}\alpha _k {\Theta }_N(x^k). \end{aligned}$$

Since \( {\Theta }_N(x)\) is continuously differentiable everywhere, by letting \(k\rightarrow \infty \) with \(k\in {K}\) on both sides of the above inequality, we have \(\nabla _{x}{\Theta }_N({x}^*)^{T}d^{*}\ge 0,\) where \(d^*=-\nabla _{x}{\Theta }_N({x}^*)\). It follows that \(-\Vert \nabla _{x}{\Theta }_N({x}^*)\Vert ^2\ge 0\) and so \(\nabla _{x}{\Theta }_N({x}^*)=0.\) We complete the proof. \(\square \)

In the following, we give some numerical examples to illustrate the effectiveness of Algorithm A. All experiments are carried out on a PC with a 12th Gen Intel(R) Core(TM) i9-12900 2.40 GHz CPU and 64.0 GB of RAM. The codes are written in MATLAB and run in the MATLAB R2018a environment. In our experiments, we choose \(\rho = 0.2,~\sigma = 10^{-3},~\epsilon =10^{-5}\). For comparison, we also apply the smoothing gradient method proposed by Du [8] to these examples. The main difference between the two methods is that, for the generated samples \(\{\omega ^1,...,\omega ^{N}\}\), our method uses the fixed smoothing parameter \(\mu =\frac{1}{N}\), while Du's method [8] uses a smoothing parameter \(\mu =\mu _k\) that is updated at every iteration.

In the following tables, N-SGM denotes Algorithm A, SGM denotes the smoothing gradient method proposed in [8], N is the number of sample points, IT denotes the number of iterations, CPU denotes the CPU time in seconds and \(x^k\) denotes the solution obtained when the algorithms terminate.

Example 1

Consider SAVE (3), in which

$$\begin{aligned} A(\omega )=\left( \begin{array}{cccccc} 2+\omega & 1\\ 2& 1+\omega \\ \end{array} \right) ,~~ B(\omega )=\left( \begin{array}{cccccc} 3+\omega & -1\\ 0& 2\omega \\ \end{array} \right) ,~~b(\omega )=\left( \begin{array}{cccccc} 5+2\omega \\ 4+6\omega \\ \end{array} \right) . \end{aligned}$$

This SAVE has a solution \(x^*=(1, 2)^T\). We generate samples \(\omega ^i, i = 1, 2,...,N,\) which obey the uniform distribution on [0, 1]. We take \(x^0=\textrm{rand}(2,1)\) as the starting point. Numerical results are given in Table 1.

Table 1 Numerical results of Example 1
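For reference, Example 1 can be assembled and solved with the sketches above as follows; the sample size, the random seed and the random starting point mirror the setup \(x^0=\textrm{rand}(2,1)\) but are otherwise illustrative choices.

```python
import numpy as np

# data of Example 1, sampled with omega^i ~ U(0, 1); N and the seed are illustrative
rng = np.random.default_rng(0)
N = 1000
omegas = rng.uniform(0.0, 1.0, size=N)
A_list = [np.array([[2 + w, 1.0], [2.0, 1 + w]]) for w in omegas]
B_list = [np.array([[3 + w, -1.0], [0.0, 2 * w]]) for w in omegas]
b_list = [np.array([5 + 2 * w, 4 + 6 * w]) for w in omegas]

x0 = rng.random(2)                 # starting point, analogous to x^0 = rand(2,1)
x = smoothing_gradient_method(x0, A_list, B_list, b_list)
print(x)                           # should approach x^* = (1, 2)^T as N grows
```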

Example 2

Consider SAVE (3), in which \(\omega =(\omega _1,\omega _2,\omega _3,\omega _4)^T\) and

$$\begin{aligned} A(\omega )=\left( \begin{array}{cccccc} 1+\omega _1& 1& 0& 0\\ -1& 2+\omega _2& 1& 0\\ 0& -1& 3+\omega _3& 1\\ 0& 0& -1& 4+\omega _4\\ \end{array} \right) , \\ B(\omega )=\left( \begin{array}{cccccc} \omega _1& 0& 0& 0\\ 0& \omega _2& 0& 0\\ 0& 0& \omega _3& 0\\ 0& 0& 0& \omega _4\\ \end{array} \right) ,~~b(\omega )=\left( \begin{array}{cccccc} 2\omega _1+2\\ 2\omega _2+2\\ 2\omega _3+3\\ 2\omega _4+3\\ \end{array} \right) , \end{aligned}$$

where \(\omega _1\) and \(\omega _2\) are uniformly distributed, \(\omega _3\) is exponentially distributed and \(\omega _4\) is normally distributed, with the following parameters:

$$\begin{aligned}\omega _1\sim U(-0.5,0.5),~\omega _2\sim U(-1,1),~\omega _3\sim Exp(1.5),~\omega _4\sim N(0,5).\end{aligned}$$

This SAVE has a solution \(x^*=(1, 1, 1, 1)^T\). We take \(x^0=\textrm{rand}(4,1)\) as the starting point. Numerical results are given in Table 2.

Table 2 Numerical results of Example 2

Example 3

Consider SLCP (4), in which \(\omega =(\omega _1,\omega _2,\omega _3,\omega _4)^T\) and

$$\begin{aligned} M(\omega )=\left( \begin{array}{cccccc} \omega _1& 2& 2& 2\\ 0& \omega _2& 2& 2\\ 0& 0& \omega _3& 2\\ 0& 0& 0& \omega _4\\ \end{array} \right) ,~~ q(\omega )=\left( \begin{array}{cccccc} -1\\ -1\\ -1\\ -\omega _4\\ \end{array} \right) , \end{aligned}$$

where \(\omega _1\) is uniformly distributed, \(\omega _2\) is exponentially distributed, and \(\omega _3\) and \(\omega _4\) are normally distributed, with the following parameters:

$$\begin{aligned} \omega _1\sim U(-1,1),~\omega _2\sim Exp(2),~\omega _3\sim N(0,1),~\omega _4\sim N(0,2). \end{aligned}$$

This SLCP has the solution \(u^*=(0,0,0,1)^T\) and \(v^*=(1,1,1,0)^T\). By Remark 1, to obtain the solution of this SLCP, we may solve the SAVE (3) with

$$\begin{aligned} A(\omega )=I+M(\omega ),~~B(\omega )=I-M(\omega ),~~b(\omega )=2q(\omega ). \end{aligned}$$
(45)

We take \(x^0=\textrm{rand}(4,1)\) as the starting point and the numerical results are given in Table 3.

Table 3 Numerical results of Example 3

Example 4

Consider SLCP (4), in which \(\omega =(\omega _1,\omega _2,\omega _3,\omega _4,\omega _5,\omega _6)^T\) and

$$\begin{aligned} M(\omega )=\left( \begin{array}{cccccc} 1& 1& \omega _1& \omega _2\\ 1& 1& \omega _3& \omega _4\\ 2-\omega _5& 0& 2& 0\\ 0& 2-\omega _6& 0& 2\\ \end{array} \right) ,~~ q(\omega )=\left( \begin{array}{cccccc} -2\\ -2\\ \omega _5\\ \omega _6\\ \end{array} \right) , \end{aligned}$$

where \(\omega _1\) and \(\omega _2\) are uniformly distributed, \(\omega _3\) is exponentially distributed, and \(\omega _4\), \(\omega _5\) and \(\omega _6\) are normally distributed, with the following parameters:

$$\begin{aligned} \omega _1\sim U(-0.5,0.5),~\omega _2\sim U(-1,1),~\omega _3\sim Exp(2), \\ \omega _4\sim N(0,2),~\omega _5\sim N(0,4),~\omega _6\sim N(0,6). \end{aligned}$$

This SLCP has the solution \(u^*=(1,1,0,0)^T\) and \(v^*=(0,0,2,2)^T\). As in Example 3, we solve SAVE (3) with

$$\begin{aligned} A(\omega )=I+M(\omega ),~~B(\omega )=I-M(\omega ),~~b(\omega )=2q(\omega ). \end{aligned}$$

We take \(x^0=(1.1,1,1)^T\) as the starting point and the numerical results are given in Table 4.

Table 4 Numerical results of Example 4

From the numerical results listed in Tables 1, 2, 3 and 4, we observe that, as the sample size N increases, both N-SGM and SGM converge stably; that is, there is a clear trend of convergence toward the solutions of the test problems. Moreover, N-SGM has some advantages over SGM, since the former converges faster as the sample size N increases.

7 Conclusions

We have established the relationship between SAVE (3) and SLCP (4). We have presented the ERM formulation (7) for SAVE and given its Monte Carlo approximation. Results on the existence of solutions, the exponential convergence rate and the robustness of solutions have been derived. In particular, we have discussed a class of special SAVEs (33) and obtained a result on the convergence of stationary points. Finally, we have proposed a smoothing gradient method to solve SAVE and given some numerical examples to illustrate its effectiveness. Although Gaussian/sub-Gaussian distributions satisfy condition (25), it is worth studying the computational complexity of the Monte Carlo approximation method when the underlying distribution of the random vector is not Gaussian/sub-Gaussian or is even unknown in advance. Establishing the computational complexity of the Monte Carlo approximation method for SAVE under weaker conditions is therefore an important but challenging topic, and we leave it for future research.