1 Introduction

All stochastic variational models inherently involve a “dynamic” component that takes into account decisions made over time and/or space, where the decisions depend on the information that becomes available as the decision process evolves. So far, the models proposed for stochastic variational inequalities have either bypassed this particular feature or not made it explicit. Various “stochastic” extensions of variational inequalities have been proposed in the literature, but relatively little attention has been paid to the ‘dynamics’ of the decision, or solution, process that is germane to all stochastic variational problems: stochastic programs, stochastic optimal control, stochastic equilibrium models in economics or finance, stochastic games, and so on. The “dynamics” of the model considered here are of limited scope. What is essential is that it makes a distinction between two families of variables: (i) those that are of the “here-and-now” type and cannot depend on the outcome of random events to be revealed at some future time or place and (ii) those that are allowed to depend on these outcomes. Our restriction to the two-stage model allows for a more detailed exposition and analysis as well as the development of computational guidelines, implemented here in a specific instance. By analogy with the terminology used for stochastic programming models, one might be tempted to refer to such a class of problems as stochastic variational inequalities with recourse but, as we shall see from the formulation and examples, that would not quite capture the full nature of the variables, mostly because the decision variables aren’t necessarily chosen sequentially. We shall refer to this “short horizon” version as a two-stage stochastic variational inequality. In principle, the generalization to a multistage model is conceptually straightforward, although a careful description of the model can become rather delicate, involved and technically nontrivial; a broad view of multistage models, as well as some of their canonical features, is provided in [59].

We consider the two-stage stochastic variational inequality: Given the (induced) probability space \((\Xi \subset {\mathbb {R}}^N ,\mathcal{A},P)\), find a pair \(\big (x \in {\mathbb {R}}^{n_1}, u: \Xi \rightarrow {\mathbb {R}}^{n_2} \,\mathcal{A}\)-measurable\(\big ),\) such that the following collection of variational inequalities is satisfied:

$$\begin{aligned} -\mathbb {E}[ G\big ({\varvec{\xi }}, x, u_{\pmb {\xi }}\big )] \in \;&N_{D}(x) \\ -F\big ({\varvec{\xi }}, x, u_{\pmb {\xi }}\big ) \in _{a.s.}&N_{C_{\pmb {\xi }}}\big (H(x,u_{\pmb {\xi }})\big ) \end{aligned}$$

with

  • \(G:(\Xi ,{\mathbb {R}}^{n_1}, {\mathbb {R}}^{n_2}) \rightarrow {\mathbb {R}}^{n_1}\) a vector-valued function, continuous with respect to \((x,u)\) for all \(\xi \in \Xi \), \(\mathcal{A}\)-measurable and integrable with respect to \(\xi \).

  • \(N_{D}(x)\) the normal cone to the nonempty closed-convex set \(D\subset {\mathbb {R}}^{n_1}\) at \(x \in {\mathbb {R}}^{n_1}.\)

  • \(F:(\Xi ,{\mathbb {R}}^{n_1}, {\mathbb {R}}^{n_2}) \rightarrow {\mathbb {R}}^{n_2}\) a vector-valued function, continuous with respect to \((x,u)\) for all \(\xi \in \Xi \) and \(\mathcal{A}\)-measurable with respect to \(\xi \).

  • \(N_{C_\xi }\big (v \big )\) the normal cone to the nonempty closed-convex set \(C_\xi \subset {\mathbb {R}}^{n_2}\) at \(v \in {\mathbb {R}}^{n_2}\); the random set \(C_{\pmb {\xi }}\) is \(\mathcal{A}\)-measurable.

  • \(H: ({\mathbb {R}}^{n_1}, {\mathbb {R}}^{n_2}) \rightarrow {\mathbb {R}}^{n_2}\) a continuous vector-valued function.

The definition of the normal cone yields the following, somewhat more explicit, but equivalent formulation:

$$\begin{aligned}&\text {find } \bar{x}\in D \text { and } \bar{u}: \Xi \rightarrow {\mathbb {R}}^{n_2}, \;\mathcal{A}\text {-measurable, such that } H(\bar{x}, \bar{u}_{\pmb {\xi }}) \in _{a.s.} C_{\pmb {\xi }}\text { and}\\&\quad \langle \mathbb {E}[G({\varvec{\xi }}, \bar{x}, \bar{u}_{\pmb {\xi }})], x - \bar{x} \rangle \ge 0, \; \forall \, x\in D,\\&\quad \langle F({\varvec{\xi }}, \bar{x}, \bar{u}_{\pmb {\xi }}), v-H(\bar{x},\bar{u}_{\pmb {\xi }}) \rangle \ge 0, \;\; \forall v \in C_{\pmb {\xi }}, \;\; P{\hbox { -}}a.s.\end{aligned}$$

The model assumes that the uncertainty can be described by a random vector \({\varvec{\xi }}\) with known distribution P and a two-stage decision process: (i) x to be chosen before the value \(\xi \) of \({\varvec{\xi }}\) is revealed (observed) and (ii) u to be selected with full knowledge of the realization \(\xi \). From the decision-maker’s viewpoint, the problem can be viewed as choosing a pair \((x,\xi \mapsto u_\xi )\) where u depends on the events that might occur, or in other words, is \(\mathcal{A}\)-measurable. It is noteworthy that in our formulation of the stochastic variational inequality this pair is present, in one way or another, in each one of our examples. We find it advantageous to work here with this slightly more explicit model where the first collection of inclusions suggests the presence of (expected) equilibrium constraints; in [59], taking a somewhat more comprehensive approach, it is shown how these “equilibrium constraints” can be incorporated in a global, possibly more familiar, variational inequality of the same type as the second inclusion.

This paper is organized as follows. In Sect. 2, we review some fundamental examples and applications that are special cases of this model. In the last two examples in Sect. 2, we concentrate our attention on stochastic traffic flow problems, with the accent placed on obtaining implementable solutions. To do this we show how an alternative formulation, based on Expected Residual Minimization (ERM), might actually be better adapted to coming up with solutions that are of immediate interest for the initial design or capacity expansion of traffic networks. In Sect. 3, we develop the basic properties of such a model; in Sect. 4, we lay out the theory that justifies deriving solutions via a sample average approximation approach; and finally, in Sect. 5, we describe an algorithmic procedure based on the Douglas–Rachford splitting method, which is then illustrated by numerical experimentation involving a couple of “classical-test” networks. One of the basic goals of this article is to delineate the relationships between various formulations of stochastic variational inequalities as well as to show how the solution-type desired might also lead us to work with some variants that might not fit perfectly the canonical model.

2 Stochastic variational inequalities: examples

When \({\varvec{\xi }}\) is discretely distributed with finite support, i.e., \(\Xi = \big \lbrace \xi ^1,\ldots ,\xi ^L \big \rbrace \) with probabilities \(p_\ell = P(\lbrace \xi ^\ell \rbrace )\), one can write the problem as:

$$\begin{aligned} -\sum _{\ell =1}^L p_\ell \, G\big (\xi ^\ell , x, u^\ell \big ) \in \;&N_{D}(x) \\ -F\big (\xi ^\ell , x, u^\ell \big ) \in \;&N_{C_{\xi ^\ell }}\big (H(x,u^\ell )\big ), \quad \ell = 1,\ldots ,L. \end{aligned}$$

When expressed in this form, we are just dealing with a, possibly extremely large, deterministic variational inequality over the set \(X\times \Xi \). How large depends on the cardinality \(|\Xi |\) of the support, and this immediately raises the question of the design of solution procedures for large-scale variational inequalities. The difficulty in finding a solution might also depend on how challenging it is to compute \(\mathbb {E}[G({\varvec{\xi }},x,u_\xi )]\), even when G doesn’t depend on u.

Of course, both the choice of x and the realization \(\xi \) of the random components of the problem influence the ‘upcoming’ environment, so we can also think of the state of the system as being determined by the pair \((\xi ,x)\); occasionally it will be convenient, mostly for technical reasons, to view the u-decision as a function of the state, i.e., \((\xi , x) \mapsto u(\xi ,x)\), cf. Sect. 3.

On the other hand, various generalizations are possible:

  (a)

    One could also have D depend on \({\varvec{\xi }}\) and x, in which case we are dealing with a random convex-valued mapping \((\xi ,x) \mapsto D(\xi ,x)\subset {\mathbb {R}}^{n_1}\), and one would, usually, specify the continuity properties of D with respect to \({\varvec{\xi }}\) and x; the analysis then enters the domain of stochastic generalized equations and can get rather delicate [45, 46, 63]. Here, we restrict our analysis to the case when D is independent of \({\varvec{\xi }}\) and x.

  (b)

    Another extension is the case when there are further global type constraints on the choice of the functions u, for example, constraints involving \(\mathbb {E}[u_\xi ]\) or equilibrium-type constraints [9, 47].

  (c)

    The formulation can be generalized to a two-stage stochastic quasi-variational inequality or generalized Nash equilibrium. For example, the second-stage variational inequality problem can be defined as follows:

    $$\begin{aligned} -F\big ({\varvec{\xi }}, x, u_{\pmb {\xi }}\big ) \in _{a.s.} \;N_{C_{\pmb {\xi }}(x)}(H(x,u_{\pmb {\xi }})), \end{aligned}$$

    where the set \(C_{\pmb {\xi }}\) depends on x [21]. However, this generalization is not trivial and deserves a separate study. In this paper, we restrict our analysis to the case when \(C_{\pmb {\xi }}\) is independent of x.

In order to illustrate the profusion of problems covered by this formulation, we go through a series of examples, including some that are presently understood, in the literature, to fall under the “stochastic variational inequality” heading.

2.1 One-stage examples

Example 2.1

Single-stage problems. This is by far the class of problems that, so far, has attracted the most interest in the literature. In terms of our notation, it reads

$$\begin{aligned} \text {find}\;\bar{x} \in D \subset {\mathbb {R}}^n \mathop {\mathrm{\;such\; that\;}}\nolimits -\mathbb {E}[G({\varvec{\xi }},\bar{x})] \in N_D(\bar{x}), \end{aligned}$$

where D is a closed convex set, possibly bounded and often polyhedral; for all \(x\in D\), the vector-valued mapping \(\xi \mapsto G(\xi ,x) \in {\mathbb {R}}^n\) is integrable and \(x \mapsto \mathbb {E}[G({\varvec{\xi }},x)]\) is continuous. Especially in the design of solution procedures, it is convenient to rely on the alternative formulation,

$$\begin{aligned} \text {find}\;\bar{x} \in D \subset {\mathbb {R}}^n \mathop {\mathrm{\;such\; that\;}}\nolimits \;\forall \, x\in D, \langle \mathbb {E}[G({\varvec{\xi }},\bar{x})], x - \bar{x} \rangle \ge 0. \end{aligned}$$

Detail. It’s only with some hesitation that one should refer to this model as a “stochastic variational inequality.” Although certain parameters of the problem are random variables and the solution will have to take this into account, this is essentially a deterministic variational inequality with the added twist that evaluating \(\mathbb {E}[G({\varvec{\xi }},x)]\), usually a multidimensional integral, requires relying on quadrature approximation schemes. Typically, P is then approximated by a discrete distribution with finite support, obtained via a cleverly designed approximation scheme or as the empirical distribution derived from a sufficiently large sample. So far, no cleverly designed approximation scheme has been proposed, although the approach used by Pennanen and Koivu in the stochastic programming context might also prove to be effective in this situation [48, 49]. To a large extent the work has been focused on deriving convergence of the solutions of approximating variational inequalities where P has been replaced by a sequence of empirical probability measures \(P^\nu \), generated from independent samples of \({\varvec{\xi }}\): \(\xi ^1, \xi ^2, \ldots ,\xi ^\nu \). The approximating problem reads:

$$\begin{aligned} \text {find}\;x^\nu \in D \subset {\mathbb {R}}^n \mathop {\mathrm{\;such\; that\;}}\nolimits - \nu ^{-1} \sum \limits _{k=1}^\nu G(\xi ^k,x^\nu ) \in N_D(x^\nu ). \end{aligned}$$

Two basic questions then become:

  (i)

    Will the solutions \(x^\nu \) converge to a solution of the given problem when the number of samples gets arbitrarily large?

  (ii)

    Can one find bounds that characterize the error, i.e., can one measure the distance between \(x^\nu \) and the (set of) solution(s) to the given variational inequality?

There is a non-negligible literature devoted to these questions which provides satisfactory answers under not-too-stringent additional restrictions, cf. [5, 25, 28, 30, 31, 32, 35, 38, 40, 41, 61, 66]. \(\square \)
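As an illustration of question (i), the following minimal sketch, under assumptions chosen purely for illustration — an affine \(G(\xi ,x) = Mx + q_\xi \) with M positive definite, so that the sample-average VI is strongly monotone, and a box D — solves the approximating problem by the classical projected fixed-point iteration \(x \leftarrow \mathop {\mathrm{prj}}_D(x - t\,\hat{G}(x))\), whose fixed points are exactly the solutions of the VI:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nu = 3, 1000                          # dimension and sample size (hypothetical)

M = np.array([[2.0, 0.5, 0.0],           # positive definite => strongly monotone VI
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
q_samples = rng.normal(loc=1.0, scale=0.5, size=(nu, n))

def G_hat(x):
    # sample average nu^{-1} sum_k G(xi^k, x) with G(xi, x) = M x + q_xi
    return M @ x + q_samples.mean(axis=0)

def proj_D(x, lo=-1.0, hi=1.0):
    # D is taken to be a box, so the projection is a componentwise clip
    return np.clip(x, lo, hi)

# x^nu solves the approximating VI iff x^nu = proj_D(x^nu - t * G_hat(x^nu)), t > 0
x, t = np.zeros(n), 0.2
for _ in range(500):
    x = proj_D(x - t * G_hat(x))
print("approximate SAA solution x^nu:", x)
```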

In [27] Gürkan, Özge and Robinson rely on solving a variational inequality of this type to price an American call option with strike price K for an instrument paying a dividend d at some time \(\tau \le T\), with T the expiration date. The problem comes down to calculating the expectation of the option-value based on whether the option is exercised, or not, at time \(\tau ^{\scriptscriptstyle -}\) just before the price of the (underlying) instrument drops by d; otherwise the price follows a geometric Brownian motion, cf. [27, Section 3] and the references therein. The authors rely on the first order optimality conditions, generating a one-stage stochastic variational inequality where the vector-valued mapping \(G(\xi , \cdot )\) corresponds to the gradient of the option-value along a particular price-path. It turns out that, contrary to an approach based on a sample-path average approximation of these (step-function) option-values [27, Figure 1], the sample average approximation of their gradients is quite well-behaved [27, Figure 2]. \(\square \)

Example 2.2

Stochastic linear complementarity problem. The stochastic linear complementarity version of the one-stage stochastic variational inequality,

$$\begin{aligned} 0 \le _{a.s.} M_\xi x + q_\xi \;\perp _{a.s.}\; x \ge 0, \end{aligned}$$

where some or all of the elements of the matrix M and the vector q are potentially stochastic, was analyzed by Chen and Fukushima [8], who suggested in the process a solution procedure based on Expected Residual Minimization (ERM).

Details. Their residual function can be regarded as a relative of the gap function used by Facchinei and Pang to solve deterministic variational inequalities [21]. More specifically,

$$\begin{aligned} \mathbb {E}[\Vert \Phi ({\varvec{\xi }}, x)\Vert ^2] \;\text {with} \; \Phi (\xi ,x) = \begin{pmatrix} \varphi \big ((M_\xi x + q_{\xi })_1, x_1\big ) \\ \vdots \\ \varphi \big ((M_\xi x + q_\xi )_n, x_n\big ) \end{pmatrix} \end{aligned}$$

is minimized for \(x\in {\mathbb {R}}^n_{\scriptscriptstyle +}\), where \(\varphi : {\mathbb {R}}^2 \rightarrow {\mathbb {R}}\) is such that \(\varphi (a,b) = 0\) if and only if \((a,b)\in {\mathbb {R}}^2_+\), \(ab = 0\); for example, the min-function \(\varphi (a,b) = \min \big \lbrace a,b \big \rbrace \) and the Fischer–Burmeister function \(\varphi (a,b) = (a+b) - \sqrt{a^2 + b^2}\) satisfy these conditions. Quite sharp existence results are obtained, in particular, when only the vector q is stochastic. They are complemented by convergence results when the probability measure P is replaced by discrete empirical measures generated via (independent) sampling. The question of the convergence of stationary points is raised in [8, Remark 3.1].

The stochastic linear complementarity problem gets recast as a (stochastic) optimization problem where one is confronted with an objective function defined as a multi-dimensional integral. Convergence of the solutions of the discretized problems can then be derived by appealing to the law of large numbers for random lsc (lower semicontinuous) functions [2, 37].

The solution provided by the ERM-reformulated problem

$$\begin{aligned} \min _{x\ge 0} \mathbb {E}[\Vert \Phi ({\varvec{\xi }}, x)\Vert ^2] \end{aligned}$$

doesn’t, strictly speaking, solve the originally formulated complementarity problem for every realization \(\xi \); this could only occur if the optimal value turns out to be 0, or equivalently, if one could find a solution that satisfies for (almost) all \(\xi \), the system \(0 \le _{a.s.} M_\xi x + q_\xi \;\perp _{a.s.}\; x \ge 0\).
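To make the sampled ERM formulation concrete, here is a minimal numerical sketch, with all data hypothetical: it builds the sample-average objective with the (smooth, except at the origin) Fischer–Burmeister residual and hands it to a bound-constrained quasi-Newton method.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, nu = 4, 300                       # dimension and sample size (hypothetical)
samples = [(np.eye(n) + 0.1 * rng.standard_normal((n, n)),
            rng.standard_normal(n)) for _ in range(nu)]   # (M_xi, q_xi)

def fb(a, b):
    # Fischer-Burmeister function: zero iff a >= 0, b >= 0, a*b = 0
    return a + b - np.sqrt(a ** 2 + b ** 2)

def erm_objective(x):
    # sample average of ||Phi(xi, x)||^2, standing in for the expectation
    return np.mean([np.sum(fb(M @ x + q, x) ** 2) for M, q in samples])

res = minimize(erm_objective, x0=np.ones(n), bounds=[(0.0, None)] * n)
print("ERM point:", res.x, " residual value:", res.fun)
```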

The residual function \(\Vert \Phi (\xi , x)\Vert \) can be considered as a cost function which measures the loss at the event \(\xi \) under the decision x. The ERM formulation minimizes the expected value of this loss over all possible occurrences of a failure of the equilibrium. Recently, Xie and Shanbhag constructed tractable robust counterparts as an alternative to the ERM approach [67]. \(\square \)

2.2 Two-stage examples

Our first two examples in this subsection haven’t been considered in the literature so far, but in some ways they best motivate our formulation, cf. Sect. 1, in the same way that optimality conditions for (deterministic) linear and nonlinear optimization problems lead us to a rich class of variational inequalities [21].

Example 2.3

(Optimality conditions for a stochastic program) Not unexpectedly, the optimality conditions for a stochastic program with recourse (two-stage) lead immediately to a two-stage stochastic variational inequality. Here, we only develop this for a well-behaved simple (linear) recourse problem; to deal with more general formulations one has to rely on the full duality theory developed in [53, 59]. The class of stochastic programs considered are of the following (classical) type:

$$\begin{aligned} \min \; \langle c,x^1 \rangle + \mathbb {E}[Q({\varvec{\xi }},x^1)] \quad \text{ subject } \text{ to } \quad Ax^1 \ge b, \;x^1\ge 0, \end{aligned}$$

with

$$\begin{aligned} Q(\xi ,x^1) = \inf \big \lbrace \langle q_\xi ,x^2 \rangle {\,\big \vert \,}W_\xi x^2 \ge d_\xi - T_\xi x^1, \; x^2\ge 0 \big \rbrace , \end{aligned}$$

where the matrices and vectors subscripted by \(\xi \) indicate that they depend on the realization of a (global) random variable \({\varvec{\xi }}\) with support \(\Xi \); for any fixed \(\xi \): \(x^1\in {\mathbb {R}}^{n_1}, x^2_\xi \in {\mathbb {R}}^{n_2}, A\in {\mathbb {R}}^{m_1\times n_1}, b\in {\mathbb {R}}^{m_1}, q_\xi \in {\mathbb {R}}^{n_2}, W_\xi \in {\mathbb {R}}^{m_2\times n_2}, d_\xi \in {\mathbb {R}}^{m_2}\) and \(T_\xi \in {\mathbb {R}}^{m_2\times n_1}.\)

Detail. Let’s assume relatively complete recourse: for all \(x^1\) satisfying the (explicit) constraints \(C^1 = \big \lbrace x^1\in {\mathbb {R}}^{n_1} {\,\big \vert \,}Ax^1 \ge b, \,x^1\ge 0 \big \rbrace \) and all \(\xi \in \Xi \), one can always find a feasible recourse \(x^2_\xi \), i.e., \(Q(\xi ,x^1) < \infty \) on \(\Xi \times C^1\), and

Strict feasibility: for some \(\varepsilon >0\), arbitrarily small, one can find \(x^1\in C^1\) and \(\tilde{x}^2\in \mathcal{L}_{n_2}^{\scriptscriptstyle \infty }\) such that \(\tilde{x}^2 \ge _{a.s.} 0\) and for almost all \(\xi \): \(W_\xi \tilde{x}^2_\xi > \tilde{\varepsilon }+ d_\xi - T_\xi x^1\), where \(\tilde{\varepsilon }\) is simply an \(m_2\) dimensional vector of \(\varepsilon \)’s.

For a problem of this type, \(\big (\bar{x}^1, \bar{x}^2)\in {\mathbb {R}}^{n_1}_{\scriptscriptstyle +}\times \mathcal{L}^{\scriptscriptstyle \infty }_{n_2,{\scriptscriptstyle +}}\) is an optimal solution [52, 54] if and only if

  (a)

    it is feasible, i.e., satisfies the constraints,

  (b)

    \(\exists \,\) multipliers \((y^1\in {\mathbb {R}}^{m_1}, y^2\in \mathcal{L}^1_{m_2})\) such that

    $$\begin{aligned} 0 \le y^1 \perp A\bar{x}^1 -b \ge 0, \quad 0 \le _{a.s.} y^2_\xi \perp _{a.s.} T_\xi \bar{x}^1 + W_\xi \bar{x}^2_\xi -d_\xi \ge _{a.s.} 0, \end{aligned}$$
  (c)

    \(\big (\bar{x}^1, \bar{x}^2)\) minimize

    $$\begin{aligned} \mathbb {E}[\langle c - A^\top y^1 - T_\xi ^\top y^2_\xi , x^1 \rangle + \langle q_\xi - W_\xi ^\top y^2_\xi , x^2_\xi \rangle ] \end{aligned}$$

    for \(x^1\in {\mathbb {R}}^{n_1}_{\scriptscriptstyle +}\) and \(x^2 \in \mathcal{L}^{\scriptscriptstyle \infty }_{n_2,{\scriptscriptstyle +}}\).

This means that under these assumptions, the double pairs

$$\begin{aligned} \big (x^1, x^2), \;\big (y^1,y^2) \in \big ({\mathbb {R}}_+^{n_1} \times \mathcal{L}^{\scriptscriptstyle \infty }_{n_2,{\scriptscriptstyle +}} \big ) \times \big ({\mathbb {R}}_+^{m_1}\times \mathcal{L}^1_{m_2,{\scriptscriptstyle +}}\big ) \end{aligned}$$

must satisfy the stochastic variational inequality:

$$\begin{aligned} 0 \le y^1&\perp Ax^1 -b \ge 0,\\ 0 \le x^1&\perp c- A^\top y^1 - \mathbb {E}[T_\xi ^\top y^2_\xi ] \ge 0 \end{aligned}$$

and

$$\begin{aligned} 0\le _{a.s.} y^2_\xi&\perp _{a.s.} T_\xi x^1+W_\xi x^2_\xi -d_\xi \ge _{a.s.} 0, \\ 0\le _{a.s.} x^2_\xi&\perp _{a.s.} q_\xi - W_\xi ^\top y^2_\xi \ge _{a.s.} 0. \end{aligned}$$

In terms of our general formulation in Sect. 1, the first pair of inequalities define the function G and the set \(D={\mathbb {R}}^{n_1}_{\scriptscriptstyle +}\times {\mathbb {R}}^{m_1}_{\scriptscriptstyle +}\) whereas the second pair define F and the random convex set \(C_{\pmb {\xi }}\) with \((x^1, y^1)\) corresponding to x and \((x^2, y^2)\) corresponding to u; \(C_\xi ={\mathbb {R}}^{n_2}_{\scriptscriptstyle +}\times {\mathbb {R}}^{m_2}_{\scriptscriptstyle +}\). Of course, this can also be viewed as a stochastic complementarity problem albeit, in general, an infinite dimensional one. When the probability distribution P has finite support, one can remove the “\(a.s.\)” in the second pair of inequalities and it’s a problem involving only a finite number of variables and inequalities, but this finite number might be truly considerable. \(\square \)
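When, as above, P has finite support with probabilities \(p_1,\ldots ,p_L\), the four complementarity blocks can be checked (or handed to a complementarity solver) through their componentwise min-residual, which vanishes exactly at solutions. A minimal sketch, with all data hypothetical and the scenario-dependent quantities passed as lists:

```python
import numpy as np

def extensive_form_residual(x1, y1, x2, y2, p, A, b, c, q, W, d, T):
    """Componentwise min-residual of the finite-support complementarity
    system above; it is zero iff all four blocks hold. The second-stage
    data q, W, d, T and variables x2, y2 are lists indexed by scenario;
    all data are hypothetical placeholders."""
    L = len(p)
    blocks = [np.minimum(y1, A @ x1 - b),
              np.minimum(x1, c - A.T @ y1
                         - sum(p[l] * T[l].T @ y2[l] for l in range(L)))]
    for l in range(L):
        blocks.append(np.minimum(y2[l], T[l] @ x1 + W[l] @ x2[l] - d[l]))
        blocks.append(np.minimum(x2[l], q[l] - W[l].T @ y2[l]))
    return np.concatenate(blocks)
```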

Example 2.4

A Walras equilibrium problem. In some ways, this example is an extension of the preceding one except that it doesn’t lead to a stochastic complementarity problem but to a stochastic variational inequality that might not have the wished-for monotonicity properties for \(\mathbb {E}[G({\varvec{\xi }},\cdot )]\) and F.

Detail. We consider a stripped down version of the GEI-model (General Equilibrium with Incomplete Markets), but even this model has a variety of immediate applications in finance, international commodity trading, species interaction in ecological models, \(\dots \). The major difference from the extensive (economic) literature devoted to the GEI-model is the absence of a so-called financial market that allows agents to enter into contracts involving the delivery of goods at some future date.

Again, \({\varvec{\xi }}\) provides the description of the uncertainty about future events. We are dealing with a finite number of (individual) agents \(i\in \mathcal{I}\). Each agent, endowed with vectors of goods \(e_i^1\in {\mathbb {R}}^L\) (here-and-now) and \(e_{\xi ,i}^2\) (in the future), chooses its consumption plan, \(c^{1\star }\) here-and-now and \(c^{2\star }_\xi \) after observing \(\xi \), so as to maximize its expected utility \(v^1_i(c^1)+\mathbb {E}[v_i^2(c^2_\xi )]\), where the utility functions \((v_i^1,v_i^2)\) are continuous, concave functions in \((c^1,c^2)\) on closed convex sets \(C_i^1 \subset {\mathbb {R}}^{n_1}_{\scriptscriptstyle +}\) and \(C_i^2 \subset {\mathbb {R}}^{n_2}_{\scriptscriptstyle +}\) respectively and \(v_i^2\) is \(\mathcal{A}\)-measurable with respect to \(\xi \). One often refers to \(C^1_i\) and \(C_i^2\) as agent-i’s survival sets; in some situations it would be appropriate to let \(C^2_i\) also depend on \(\xi \), but this wouldn’t significantly affect our ensuing development. Each agent can also engage in here-and-now activities \(y\in {\mathbb {R}}^{m_i}\) that will use up a vector of goods \(T^1_i y\) which, in turn, will generate, possibly depending on \(\xi \), goods \(T^2_{\xi ,i}y\) in the future; a simple example could be the saving of some goods and a more elaborate one would involve “home production.” The market process allows each agent to exchange its (modified) endowments \((e^1_i - T^1_i y, e^2_{\xi ,i}+T^2_{\xi ,i}y)\) for its consumption at the prevalent market prices \(p^1\) in the here-and-now market and \(p^2_\xi \) in the future market. Thus, given \((p^1, p^2_{\pmb {\xi }})\), each agent will aim, selfishly, to maximize its expected rewards taking into account only its survival and budgetary limitations: choose \(\big (c_i^{1\star }, (c_{\xi ,i}^{2\star },\,\xi \in \Xi ))\) that solves the following stochastic program with recourse:

$$\begin{aligned} \max \quad&v^1_i(c^1)+\mathbb {E}[v_i^2(c^2_{\pmb {\xi }})] \\ \text{ subject } \text{ to } \quad&\, \langle p^1, c^1 + T^1_i y\rangle \le \langle p^1, e^1_i \rangle , \\&\langle p^2_{\pmb {\xi }}, c^2_{\pmb {\xi }}\rangle \le _{a.s.}\; \langle p^2_{\pmb {\xi }}, e^2_{{\pmb {\xi }},i} + T^2_{{\pmb {\xi }},i} y \rangle , \\&c^1\in C_i^1, \; y\in {\mathbb {R}}^{m_i}_{\scriptscriptstyle +}, \; c^2_{\pmb {\xi }}\in _{a.s.} C_i^2, \;i\in \mathcal{I}. \end{aligned}$$

The Walras equilibrium problem is to find a (nonnegative) price system \(\big (p^1, (p^2_\xi ,\,\xi \in \Xi )\big )\) that will clear the here-and-now market and the future markets, i.e., such that the total demand does not exceed the available supply:

$$\begin{aligned} \sum _{i\in \mathcal{I}} (e^1_i -c_i^{1\star }) \ge 0, \qquad \sum _{i\in \mathcal{I}} (e^2_{\xi ,i} -c_{\xi ,i}^{2\star }) \ge 0, \;\forall _{a.s.}\,\xi \in \Xi . \end{aligned}$$

Since the budgetary constraints of the agents are positively homogeneous with respect to the price system, up to a possible rescaling after a solution is found, one can, without loss of generality, restrict the prices \(p^1\) and \(p^2_\xi \) for each \(\xi \) to the unit simplex, i.e., a compact set.

At this point, by combining the optimality conditions associated with the individual agents’ problems with the “clearing the market” conditions, it’s possible to translate the equilibrium problem into an equivalent two-stage stochastic variational inequality. Unfortunately, so far, our assumptions don’t guarantee the existence of a solution of this variational inequality. At this stage, it’s convenient to proceed with the following assumptions that are common in the extensive literature devoted to the GEI-model:

  • the utility functions \(v^1_i\) and \(v^2_i\) are upper semicontinuous, strictly concave,

  • the agents’ endowments are such that \(e_i^1\in \mathop {\mathrm{int}}C_i^1\) and, for all \(\xi \in \Xi \), \(e_{\xi ,i}^2\in \mathop {\mathrm{int}}C_i^2\);

The GEI-literature makes these assumptions to be able to rely on differential-topology methodology to obtain a “generic” existence proof. It’s rather clear that the implications of these assumptions are quite stringent. In particular, they imply that every agent must be endowed, in all situations, with a minimal amount, potentially infinitesimal, of every good and that this agent will be interested, possibly also minimally, in acquiring some more of every good. Not only do these conditions yield existence [34], and not just generic existence, but they also imply that Walras’ Law must be satisfied, i.e., the following complementarity conditions involving equilibrium prices and excess supply will hold:

$$\begin{aligned} p^1&\perp e_i^1-c^{1\star }_i - T^1_i y_i^\star \ge 0\\ p^2_\xi&\perp e_{\xi ,i}^2+ T^2_{\xi ,i} y_i^\star -c^{2\star }_{\xi ,i} \ge 0, \;\; \forall _{a.s.}\,\xi \in \Xi . \end{aligned}$$

Moreover, it means that the agents’ problems are stochastic programs with relatively complete recourse, so that their optimality conditions can be stated in terms of \(\mathcal{L}^1\)-multipliers, cf. [53, 55]; note that here we haven’t restricted ourselves to a situation where the description of the uncertainty is limited to a finite number of events.

For any equilibrium price system \(\big (p^1\in \Delta , (p^2_\xi \in \Delta , \xi \in \Xi )\big )\), with \(\Delta =\{p \, | \, \sum ^L_{i=1} p_i \le 1, p \ge 0 \}\) the unit simplex in \({\mathbb {R}}^L\), the agents’ consumption plans must satisfy the following optimality conditions: the pair \(\big ((c_i^{1\star }, y_i^\star ), (c_{\xi ,i}^{2\star }, \xi \in \Xi )\big )\) is an optimal solution for agent-i if and only if

  (a)

    it satisfies the budgetary constraints,

  (b)

    \(\exists \) multipliers \(\big (\lambda _i^1\in {\mathbb {R}}, (\lambda _{\cdot ,i}^2\in \mathcal{L}^1)\big )\) such that

    $$\begin{aligned} 0 \le \lambda _i^1 &\perp \langle p^1, e^1_i - T_i^1 y_i^\star - c_i^{1\star } \rangle \ge 0, \\ 0 \le _{a.s.} \lambda ^2_{\xi ,i} &\perp _{a.s.} \langle p^2_\xi , e^2_{\xi ,i} + T^2_{\xi ,i} y_i^\star - c^{2\star }_{\xi ,i} \rangle \ge _{a.s.} 0, \end{aligned}$$
  (c)

    and

    $$\begin{aligned} c_i^{1\star } \in&\;\mathop {\mathrm{argmax}}\limits \nolimits _{c^1\in C^1_i} v^1_i(c^1) - \lambda ^1_i \langle p^1, c^1 \rangle , \\ c^{2\star }_{\xi ,i} \in \,&\;\mathop {\mathrm{argmax}}\limits \nolimits _{c^2\in C^2_i} v^2_i(c^2) - \lambda _{\xi ,i}^2 \langle p^2_\xi , c^2 \rangle , \; \forall _{a.s.}\, \xi \in \Xi ,\\ y_i^\star \in&\;\mathop {\mathrm{argmax}}\limits \nolimits _{y\in {\mathbb {R}}_{\scriptscriptstyle +}^{m_i}} -\lambda ^1_i \langle p^1, T_i^1 y \rangle + \mathbb {E}[\lambda ^2_{\xi ,i} \langle p^2_\xi , T^2_{\xi ,i} y \rangle ]. \end{aligned}$$

Assuming the utility functions are also differentiable, these last conditions can be translated into the first order optimality conditions for these programs:

$$\begin{aligned}&\big (\nabla v^1_i(c_i^{1\star })-\lambda ^1_i p^1\big ) \in N_{C^1_i}(c_i^{1\star }) \\&\quad \big (\nabla v^2_i(c_{{\pmb {\xi }},i}^{2\star }) - \lambda _{{\pmb {\xi }},i}^2 p^2_{\pmb {\xi }}\big ) \in _{a.s.} N_{C^2_i}(c_{{\pmb {\xi }},i}^{2\star }) \\&\quad 0 \le y_i^\star \perp \big (\lambda ^1_i (T_i^1)^\top p^1 - \mathbb {E}[\lambda ^2_{{\pmb {\xi }},i} (T^2_{{\pmb {\xi }},i})^\top p^2_{\pmb {\xi }}]\big ) \ge 0. \end{aligned}$$

In conjunction with Walras’ law, we can regroup these conditions so that they fit the pattern of our two-stage formulation in Sect. 1: find \(x = \big ((p^1,c^1_i, y_i, \lambda ^1_i), i\in \mathcal{I}\big )\) and \(u_{\pmb {\xi }}= \big ((p^2_{\pmb {\xi }}, c^2_{{\pmb {\xi }},i}, \lambda ^2_{{\pmb {\xi }},i}), i\in \mathcal{I}\big )\) such that for all \(i\in \mathcal{I}\):

$$\begin{aligned}&\langle p^1, e_i^1 - c^1_i - T^1_i y_i\rangle \ge 0 \quad \quad \text { (feasibility)}\\&\quad 0 \le \lambda _i^1 \perp \langle p^1, e^1_i - T_i^1 y_i - c_i^1 \rangle \ge 0 \quad \quad \;\text { (multipliers complementarity)}\\&\quad \big (\nabla v^1_i(c_i^1)-\lambda ^1_i p^1\big )\in N_{C^1_i}(c_i^1) \; \quad \quad (c^1\text {-optimality)}\\&\quad 0 \le y_i \perp \big (\lambda ^1_i (T_i^1)^\top p^1 - \mathbb {E}[\lambda ^2_{{\pmb {\xi }},i} (T^2_{{\pmb {\xi }},i})^\top p^2_{\pmb {\xi }}]\big )\ge 0 \quad \quad \; (y\text {-optimality)}\\&\quad 0 \le e_i^1-c^1_i - T^1_i y_i \perp p^1 \in \Delta \;\; \quad \quad \text {(Walras' law)}\\&\quad \sum _{i\in \mathcal{I}} (e^1_i -c_i^1) \ge 0 \; \quad \quad \text { (clearing the market)} \end{aligned}$$

and

$$\begin{aligned}&\langle p^2_{\pmb {\xi }}, e^2_{{\pmb {\xi }},i} + T^2_{{\pmb {\xi }},i}y_i -c^2_{{\pmb {\xi }},i} \rangle \ge _{a.s.} 0\; \quad \quad \text { (feasibility)} \\&\quad 0 \le _{a.s.} \lambda ^2_{{\pmb {\xi }},i} \perp _{a.s.} \langle p^2_{\pmb {\xi }}, e^2_{{\pmb {\xi }},i} + T^2_{{\pmb {\xi }},i} y_i - c^2_{{\pmb {\xi }},i} \rangle \ge _{a.s.} 0 \quad \text { (multipliers complementarity)}\\&\quad \big (\nabla v^2_i(c_{{\pmb {\xi }},i}^2) - \lambda _{{\pmb {\xi }},i}^2 p^2_{\pmb {\xi }}\big ) \in _{a.s.} N_{C^2_i}(c_{{\pmb {\xi }},i}^{2}) \quad \quad \; (c^2\text { -optimality)}\\&\quad 0 \le _{a.s.} e_{\xi ,i}^2+ T^2_{{\pmb {\xi }},i} y_i - c^2_{\xi ,i} \perp _{a.s.} p^2_{\pmb {\xi }}\in _{a.s.} \Delta \;\; \quad \quad \text {(Walras' law)} \\&\quad \sum _{i\in \mathcal{I}} (e^2_{{\pmb {\xi }},i} -c_{{\pmb {\xi }},i}^2) \ge _{a.s.} 0. \;\quad \quad \text { (clearing the market)} \end{aligned}$$

One approach to designing solution procedures for such a potentially whopping stochastic variational inequality is a straightforward one relying, for example, on the PATH solver [18]. Notwithstanding the capabilities of this excellent package, it is bound to be quickly overwhelmed by the size of this problem even when the number of agents and potential \(\xi \)-events is still quite limited. An approach based on decomposition is bound to be indispensable. One could rely on a per-agent decomposition first laid out in [24] and used in a variety of applications, cf. for example [50]. Another approach is a scenario (event) based decomposition, relying on the Progressive Hedging algorithm [56, 57] and an approximation scheme as developed in [17]. Finally, one should also be able to expand the splitting algorithm elaborated in Sect. 5 to allow for an agent/scenario decomposition expected to be reasonably competitive. \(\square \)

The last two examples are stochastic variational inequalities that arise in connection with transportation/communication problems in which one must take into account uncertainties about some of the parameters of the problem. To fix terminology and notation, we begin with a brief description of the deterministic canonical model, the so-called Wardrop equilibrium model; for excellent and thorough surveys of this model, see [14, 44], and as far as possible we follow their overall framework. There is no straightforward generalization to the “stochastic version,” which is bound to depend very much on the motivation, or in other words, on the type of solution one is interested in. Here, we are basically interested in problems where the uncertainty comes from the components of the underlying structure (network) or the demand volume. Cominetti [13] and the references therein consider an interesting set of alternative questions primarily related to uncertainty in the users’ information and behavior.

Consider an oriented network \(\mathcal{N}=\big (\mathcal{G}\text { (nodes)}, \mathcal{A}\text { (arcs)}\big )\) together with, for each arc a, a maximum flow capacity \(c_a\ge 0\), and a demand \(h_{od}\) for each origin(o)-destination(d) pair. \(R_{od}\) denotes the set of all (acyclic) routes r connecting o to d, with N the arcs(a)/routes(r) incidence matrix, i.e., \(N_{a,r} = 1\) if arc \(a\in r\). A route-flow \(f = \big \lbrace f_r, r \in \cup _{od} R_{od}\big \rbrace \) results in an arc-flow \(x = \big \lbrace x_a = \langle N_a, f \rangle , \;a\in \mathcal{A}\big \rbrace \). The travel time on route r, assumed to be additive, is \(\sum _{a\in r} t_a(x_a)\), where \(t_a(\cdot )\), a congestion-dependent function, specifies the travel time on arc a. Let

$$\begin{aligned} C = \Big \lbrace (x_a =\langle N_a,f\rangle \le c_a,\, a\in \mathcal{A}) {\,\Big \vert \,}\, f \ge 0, \sum _{r\in R_{od}} f_r = h_{od} \;\forall \; od\text {-pairs} \Big \rbrace \end{aligned}$$

be the polyhedral set of the arc-flows satisfying the flow conservation constraints. One refers to \(x^\star = N f^\star \in C\) as a Wardrop equilibrium if

$$\begin{aligned} \forall \, od\text {-pairs}, \forall \, r\in R_{od} \text { with } f_r^\star > 0,\; \sum _{a\in r} t_a(x_a^\star ) \text { is the minimal } od\text { -travel-time,} \end{aligned}$$

i.e., flow travels only on shortest routes. A minor variation, due to the capacity constraints on the arcs, of the authoritative argument of Beckmann et al. [4] shows that these feasibility and equilibrium conditions can be interpreted as the first order optimality conditions of the convex program

$$\begin{aligned} \min \sum _{a\in \mathcal{A}} \int _0^{x_a} t_a(z)\,dz, \; x = (x_a,\,a\in \mathcal{A})\in C. \end{aligned}$$

Indeed, \(x^\star \) is optimal if and only if it satisfies the variational inequality:

$$\begin{aligned} \sum _{a\in \mathcal{A}} t_a(x_a^\star )(x_a-x_a^\star )\ge 0,\qquad \forall \; x\in C. \end{aligned}$$
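As a concrete illustration of the Beckmann reformulation, the following minimal sketch solves the convex program on a hypothetical two-arc parallel network (one od-pair, demand 4, travel times \(t_1(x)=1+x\) and \(t_2(x)=2+x/2\)); at the computed flows the two route travel times coincide, as Wardrop’s condition requires.

```python
import numpy as np
from scipy.optimize import minimize

h = 4.0                                        # od-demand (hypothetical)
t1 = lambda x: 1.0 + x                         # arc travel times
t2 = lambda x: 2.0 + 0.5 * x

def beckmann(x):
    # sum over arcs of the integral of t_a from 0 to x_a
    return (x[0] + 0.5 * x[0] ** 2) + (2.0 * x[1] + 0.25 * x[1] ** 2)

res = minimize(beckmann, x0=[h / 2, h / 2], bounds=[(0, None)] * 2,
               constraints=[{"type": "eq", "fun": lambda x: x[0] + x[1] - h}])
x = res.x
print("equilibrium flows:", x)                 # roughly (2, 2)
print("route travel times:", t1(x[0]), t2(x[1]))   # equal at equilibrium: 3, 3
```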

Example 2.5

Prevailing flow analysis. The main motivation in [10], which by the way is probably the first article to introduce a(n elementary) formulation of a two-stage stochastic variational inequality, was to determine the steady flow f that would prevail given an established network whose origin(o)-destination(d) demands as well as arc capacities are subject to (stochastic) perturbations. The individual users would make adjustments to their steady-equilibrium routes, depending on these perturbations \(\xi \), by choosing a recourse decision \(u_\xi \) that ends up being the “nearest feasible” route to their steady route. Although one would actually like to place as few restrictions as possible on the class of recourse functions \((\xi ,f) \mapsto u(\xi ,f)\), from a modeling as well as a computational viewpoint, one might be satisfied with some restrictions, or the application might dictate these restrictions as being the only really implementable “recourse” decisions; in [10] “nearest feasible” was interpreted as meaning the projection on the feasible region.

Detail. Recasting the problem considered in [10] in our present notation, it would read: find \((f,u_\xi )\) such that for all \(\xi \in \Xi \),

$$\begin{aligned} -F\big (\xi , f, u_\xi \big ) \in N_{C_\xi }(u_\xi ) \end{aligned}$$

and

$$\begin{aligned} C_\xi = \big \lbrace u \in {\mathbb {R}}^{n_2} {\,\big \vert \,}Au = b_\xi , u\ge 0 \big \rbrace . \end{aligned}$$

This problem comes with no additional variational inequality, i.e., of the type \(-\mathbb {E}[G(\cdot )] \in N_D(\cdot )\). Actually, in [10], it’s assumed that one can restrict the choice of f to a set \(D\subset {\mathbb {R}}^n_{\scriptscriptstyle +}\) which guarantees that for all \(\xi \), one can find \(u_\xi \) so that the corresponding variational inequality is solvable. This is a non-trivial assumption and it is only valid if we expect the perturbations of both the od-demands and the capacities to be relatively small, i.e., to be essentially interpretable as “noise” affecting these uncertain quantities. \(\square \)

Example 2.6

Capacity expansion. Arc capacity expansion, \(\big (c_a \rightarrow c_a + x_a, a\in \mathcal{A}\big )\), is being considered for an existing, or to-be-designed, network (traffic, data transmission, high-speed rail, \(\dots \)). Only a limited pool of resources is available, and this determines a number of constraints on the choice of x. To stay in tune with our general formulation of the two-stage model and provide a wide margin of flexibility in the formulation of these x-constraints, we express them as a variational inequality \(\langle G(x), x'-x\rangle \ge 0\) for all \(x'\in D\), where D is a closed convex subset of \({\mathbb {R}}^L\), \(G: {\mathbb {R}}^L \rightarrow {\mathbb {R}}^L\) is a continuous vector-valued mapping and \(L = |\mathcal{A}|\). The overall objective is to determine the arcs where capacity expansion will be most useful, i.e., will respond as well as possible to the “average” needs of the users of this network (minimal travel times, rapid connections, \(\dots \)). We interpret this to mean that the network flow will be at, or at least seek, a Wardrop equilibrium based on the information available: the od-pair demands and the arc capacities, both of which are subject to stochastic fluctuations. Taking our cue from the deterministic version described earlier, given a capacity expansion x and an environment \(\xi \in \Xi \) affecting both demands and capacities, a solution \(\big (u_{\xi ,a}^\star , a\in \mathcal{A}\big )\) of the following variational inequality would yield a Wardrop equilibrium:

$$\begin{aligned} \sum _{a\in \mathcal{A}} t_a(\xi ,u_{\xi ,a}^\star )(u_a -u_{\xi ,a}^\star ) \ge 0, \quad \forall \, u = (u_a, a\in \mathcal{A}) \in C_\xi \end{aligned}$$

with

$$\begin{aligned} C_\xi =\Big \lbrace u \le c_\xi {\,\Big \vert \,}\, u = Nf,\; f \ge 0,\; \sum _{r\in R_{od}} f_r = h_{\xi ,od} \;\forall \; od\text {-pairs} \Big \rbrace . \end{aligned}$$

Our two-stage stochastic variational inequality, cf. Sect. 1, would take the form:

$$\begin{aligned}&\text {find } \big (x^\star , u^\star :\Xi \times D\rightarrow {\mathbb {R}}^L, \mathcal{A}\text {-measurable in } \xi \big ) \mathop {\mathrm{\;such\; that\;}}\nolimits \\&\qquad \qquad \qquad \qquad -G(x^\star ) \in N_D(x^\star ) \\&\qquad \qquad -F({\varvec{\xi }}, u^\star ({\varvec{\xi }},x^\star )) \in _{a.s.} N_{C_{\pmb {\xi }}}(u^\star ({\varvec{\xi }},x^\star )), \end{aligned}$$

where

$$\begin{aligned} F(\xi ,u)&= \big (t_a(\xi ,u)\big )_{a\in \mathcal{A}}. \end{aligned}$$

Detail. However, finding the solution to this particular stochastic variational inequality might not provide us with a well-thought-out solution to the network design problem: find optimal arc-capacity extensions x that would result in expected minimal changes in the traffic flow when there are arbitrary, possibly significant, perturbations in the demands and the capacities. These concerns lead us to a modified formulation of this problem which, rather than simply asking for a (feasible) solution of the preceding stochastic variational inequality, takes into account a penalty cost associated with the appropriate recourse decisions when users are confronted with specific scenarios \(\xi \in \Xi \). A “recourse cost” needs to be introduced. It’s then also natural, and convenient computationally, to rely on an approach that allows for the possibility that in unusual circumstances some of the preceding variational inequalities might actually fail to be satisfied. This will be handled via a residual function. This application-oriented reformulation results in a recasting of the problem and brings it closer to what has been called an Expected Residual Minimization or ERM-formulation. The analysis and the proposed solution procedure in the subsequent sections are devoted to this re-molded problem. Although this ERM-reformulation, see (4) below and Sect. 5, might usually only provide an approximate solution of the preceding variational inequality, it provides a template for the design of optimal networks (traffic, communication, \(\dots \)) coming with both structural and demand uncertainties, as argued in the next section.   \(\square \)

3 An expected residual minimization formulation

We proceed with an ERM-formulation that in some instances might provide a solution which better suits the overall objective of the decision maker, cf. Sect. 5. Our focus will be on a particular class of stochastic variational inequalities which, in particular, include a number of traffic/communication problems (along the lines of Example 2.6):

$$\begin{aligned}&\text { find } \big (\bar{x}\in {\mathbb {R}}^n, \bar{u}:\Xi \times D \rightarrow {\mathbb {R}}^n, \, \mathcal{A}\text {-measurable in } \xi \big ) \mathop {\mathrm{\;such\; that\;}}\nolimits \\&\quad -G(\bar{x}) \in N_D(\bar{x}), \;\; \\&\quad -F({\varvec{\xi }}, \bar{u}({\varvec{\xi }}, \bar{x})) \quad \in _{a.s.} N_{C_{\pmb {\xi }}}(\bar{u}({\varvec{\xi }},\bar{x})), \end{aligned}$$

where

$$\begin{aligned} \forall \xi \in \Xi :\;\; C_\xi \subset {\mathbb {R}}^n, \text { closed, convex}, \quad \xi \mapsto C_\xi \; \mathcal{A}\text {-measurable}, \end{aligned}$$

D is closed and convex, and \(G: {\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) and \(F:\Xi \times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\), as in Sect. 1, are continuous vector-valued functions in x and u respectively for all \(\xi \in \Xi \), and F is \(\mathcal{A}\)-measurable in \(\xi \) for each \(u\in {\mathbb {R}}^n\).

The dependence of F on x is assimilated in u by letting it depend on the state of the system determined by \(({\varvec{\xi }},x)\); it is thus convenient in the ensuing analysis to view F as just a function of \({\varvec{\xi }}\) and u. A convenient interpretation is to think of \(x_a\) as the “average” flow that would pass through arc a when uncertainties aren’t taken into account whereas, when confronted with scenario \(\xi \), the actual flow would turn out to be \(u_{\xi ,a}\). Let’s denote by \(y_\xi = \big (y_{\xi ,a} = u_{\xi ,a} - x_a, \;a\in \mathcal{A}\big )\) the “recourse” required to adapt \(x_a\) to the events/scenario \(\xi \); this will come at an expected cost (time delay) to the users. Since \(\bar{x}\) is conceivably a Wardrop equilibrium, these adjustments will usually result in less desirable solutions. Such flow adjustments will come at a cost, say \(\frac{1}{2}\langle y_\xi , H y_\xi \rangle \), i.e., a generalized version of a least-squares cost proportional to the recourse decisions. This means that H, usually but not necessarily a diagonal matrix, would be positive definite. So, rather than viewing our decision variables as being \(u_\xi \), we can equivalently formulate the recourse decisions in terms of the \(y_\xi \), but now at some cost, and define \(u_\xi = x + y_\xi \). In fact, let us go one step further and enrich our model by treating these \(y_\xi \) as activities that result in a flow adjustment \(Wy_\xi \) and, hence, \(u_\xi = x + Wy_\xi \); whenever \(W=I\), the \(y_\xi \) just compensate for deviations from x.

When dealing with a deterministic variational inequality \(-G(x) \in N_D(x)\), it is well known that its solution can be found by solving the following optimization problem

$$\begin{aligned} \min \theta (x) \text { subject to } x\in D, \end{aligned}$$

where \(\theta \) is a residual (gap-type) function, i.e., such that \(\theta \ge 0\) on D and \(\theta (x) = 0\) if and only if \(x\in D\) solves the variational inequality. In this article, we work with the following residual function, to which one usually refers as the regularized gap function [26]

$$\begin{aligned} \theta (x) = \mathop {\mathrm{max}}\nolimits _{v\in D} \langle x-v, G(x) \rangle - \frac{\alpha }{2}\Vert x-v \Vert ^2 \end{aligned}$$
(1)

for some \(\alpha > 0\). In terms of the overall framework of Sect. 1, this type of residual function will be attached to the inclusion \(-G(x):=-\mathbb {E}[G(\xi ,x)] \in N_D(x)\). The corresponding optimization problem reads

$$\begin{aligned} \min _{x\in D} \big [\theta (x) = \max _{v\in D} \langle x-v, \mathbb {E}[G(\xi ,x)]\rangle - \frac{\alpha }{2}\Vert x-v \Vert ^2\big ]. \end{aligned}$$
(2)
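Evaluating \(\theta \) only requires a projection onto D: the inner maximand in (1) is a concave quadratic in v, maximized over D at \(v = \mathop {\mathrm{prj}}_D\big (x - \frac{1}{\alpha }G(x)\big )\), the same projection argument that reappears in (5)–(6) below. A minimal sketch, assuming for illustration that D is a box and that the expectation in (2) has already been computed or sample-averaged:

```python
import numpy as np

alpha = 1.0
lo, hi = 0.0, 1.0                       # D = [lo, hi]^n, a hypothetical box

def proj_D(x):
    return np.clip(x, lo, hi)           # projection onto the box D

def regularized_gap(x, Gx):
    # inner max in (1): v -> <x - v, G(x)> - (alpha/2)||x - v||^2 is a
    # concave quadratic, maximized over D at v = proj_D(x - G(x)/alpha)
    v = proj_D(x - Gx / alpha)
    return Gx @ (x - v) - 0.5 * alpha * np.sum((x - v) ** 2)

# theta(x) >= 0 on D, and theta(x) = 0 exactly when x solves the VI
x = np.array([0.25, 0.75])
Gx = np.array([0.4, -0.2])              # stands in for E[G(xi, x)]
print(regularized_gap(x, Gx))
```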

When dealing with the second inclusions \(-F(\xi ,u(\xi ,x)) \in N_{C_\xi }(u(\xi ,x))\) for P-almost all \(\xi \in \Xi \), one could similarly introduce a collection of residual functions \(\tilde{\theta }(\xi ,u)\) whose properties would be similar to those of \(\theta \) and ask that, with probability 1, \(\tilde{\theta }(\xi ,u(\xi ,x)) = 0\) if and only if the function \((\xi , x) \mapsto u(\xi ,x)\) satisfies the corresponding stochastic variational inequality. This certainly might be appropriate in certain instances, but in a variety of problems one might want to relax this condition and replace it by a weaker criterion, namely that the Expected Residual \(\mathbb {E}[\tilde{\theta }(\xi ,u(\xi ,x))]\) be minimal. A somewhat modified definition of a residual function will be more appropriate when dealing with this “collection of variational inequalities”; we adopt here the one introduced in [10].

Definition 3.1

(SVI-Residual function) Given a closed, convex set \(D\subset {\mathbb {R}}^n\) and the random vector \({\varvec{\xi }}\) defined on \((\Xi ,\mathcal{A},P)\), let us consider the following collection of variational inequalities (SVI):

$$\begin{aligned} \text { find } \big (\bar{x}\in {\mathbb {R}}^n, \bar{u}:\Xi \times D&\rightarrow {\mathbb {R}}^n, \ \mathcal{A}\text {-measurable in}~ \xi \big ) \mathop {\mathrm{\;such\; that\;}}\nolimits \\ -F\big ({\varvec{\xi }},\bar{u}({\varvec{\xi }},\bar{x})\big )&\in _{a.s.} N_{C_{{\varvec{\xi }}}}(\bar{u}({\varvec{\xi }},\bar{x})). \end{aligned}$$

A function \(r: \Xi \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}_+\) is a residual function for these inclusions if the following conditions are satisfied:

  (i)

    For P-almost all \(\xi \in \Xi \), \(r(\xi , u)\ge 0\) for all \( u\in C_\xi \),

  (ii)

    For any \(u:\Xi \times D\rightarrow {\mathbb {R}}^n\) \(\mathcal{A}\)-measurable, it holds that

    $$\begin{aligned} -F\big (\xi ,u(\xi ,x) \big ) \in _{a.s.} N_{C_\xi }(u(\xi ,x)) \Leftrightarrow r(\xi , u(\xi ,x))=_{a.s.}0 \ \mathrm{and }\ u(\xi ,x)\in _{a.s.} C_\xi . \end{aligned}$$

The stochastic variational inequality in this definition is in line with the second \(a.s.\)-inclusion in Example 2.6. We will work with a concrete example of an SVI-residual function: the regularized gap residual function with \(\alpha >0\),

$$\begin{aligned} r(\xi ,u)=\max _{v\in C_\xi } \Big \lbrace \langle u-v, F(\xi , u)\rangle -\frac{\alpha }{2}\Vert u-v\Vert ^2\Big \rbrace ; \end{aligned}$$
(3)

We will show in Theorem 3.3 that the above function satisfies the two conditions in Definition 3.1. The use of residual functions leads us to seek a solution of the stochastic program

$$\begin{aligned} \begin{array}{ll} \mathop {\mathrm{min}}\nolimits _{x\in D} &{} \theta (x)+ \lambda \mathbb {E}[r(\xi ,u(\xi ,x)) +Q(\xi , x)]\\ \text {where} &{} u(\xi , x)= x+Wy_\xi ^*, \quad Q(\xi , x) = \frac{1}{2} \langle y^*_\xi , Hy^*_\xi \rangle , \quad \forall _{a.s.}\ \xi \in \Xi ,\\ &{}y^*_\xi =\mathop {\mathrm{argmin}}\limits \big \lbrace \frac{1}{2} \langle y,Hy\rangle {\,\big \vert \,}x + Wy \in C_\xi \big \rbrace . \end{array} \end{aligned}$$
(4)

With H positive definite and \(C_\xi \) convex, \(u(\xi , x)\) is uniquely defined by the unique solution \(y^*_\xi \) of the second stage optimization problem in (4). Moreover, with the residual functions \(\theta \) and r defined in (1) and (3), respectively, the positive parameter \(\lambda \) in (4) allows for a further adjustment of the weight to ascribe to the required recourse decisions and residuals; with \(\lambda \) relatively large, and adjusting \(H\succ 0\) correspondingly, one should end up with a solution that essentially avoids any violation of the collection (SVI) of variational inequalities.
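For a finite sample, the objective of (4) can be evaluated scenario by scenario. The following minimal sketch uses simplifying, purely illustrative assumptions — \(W=I\), H diagonal and \(C_\xi \) a box — under which the second stage solution reduces to a componentwise clip of x onto \(C_\xi \), while r is evaluated through its projection maximizer, cf. (5)–(6) below; adding \(\theta (x)\) from (1)–(2) would complete the objective.

```python
import numpy as np

alpha = 1.0
h = np.array([1.0, 2.0])                  # H = diag(h), positive definite

def F(xi, u):
    # hypothetical scenario-dependent map, e.g. perturbed travel times
    return 1.0 + xi + 0.5 * u

def r_plus_Q(x, xi, lo, hi):
    # second stage with W = I, H diagonal, C_xi the box [lo, hi]^n:
    # y*_xi solves min (1/2)<y, H y> s.t. x + y in C_xi, componentwise
    u = np.clip(x, lo, hi)                # u(xi, x) = x + y*_xi
    y = u - x
    Q = 0.5 * np.sum(h * y ** 2)
    # regularized gap residual (3); the maximizer is the projection (5)-(6)
    v = np.clip(u - F(xi, u) / alpha, lo, hi)
    r = F(xi, u) @ (u - v) - 0.5 * alpha * np.sum((u - v) ** 2)
    return r + Q

rng = np.random.default_rng(2)
x = np.array([1.0, 1.5])
vals = [r_plus_Q(x, xi, lo=0.5 + xi, hi=2.0 + xi)
        for xi in rng.uniform(0.0, 0.5, size=200)]
print("sample average of r + Q at x:", np.mean(vals))
```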

Assumption 3.2

Assume

  (i)

    W has full row rank;

  (ii)

    for almost all \(\xi \), \(C_\xi \subset C^\dagger \), a compact convex set.

For almost all \(\xi \), since \(C_\xi \) is convex and compact, it is easy to show that for these \(\xi \), \(u\mapsto r(\xi ,u)\) is continuous. On the other hand, for each fixed u, it follows from [58, Theorem 14.37] that \(\xi \mapsto r(\xi ,u)\) is measurable. This means that r is a Carathéodory function. In addition, consider for each \(\xi \in \Xi \),

$$\begin{aligned} v(\xi ,u)=\mathop {\mathrm{prj}}\nolimits _{C_\xi }\left( u- \frac{1}{\alpha }F(\xi , u)\right) . \end{aligned}$$
(5)

Recall that \(v(\xi ,u)\) attains the maximum in (3). Clearly, \(u\mapsto v(\xi ,u)\) is continuous, and for each fixed u, the measurability of \(\xi \mapsto v(\xi ,u)\) again follows from [58, Theorem 14.37]. Consequently, also v is a Carathéodory function.

Theorem 3.3

(Residual function) When Assumption 3.2 is satisfied, r is a residual function for our collection (SVI) and, for any \(x \in D\) and almost every \(\xi \in \Xi \), the function \(r(\xi ,u(\xi ,x))+Q(\xi ,x)\) in (4) is finite and nonnegative, with

$$\begin{aligned} v(\xi ,u(\xi ,x))=\mathop {\mathrm{prj}}\nolimits _{C_\xi }\left( u(\xi ,x)- \frac{1}{\alpha }F(\xi , u(\xi ,x))\right) \end{aligned}$$
(6)

as the unique maximizer of the maximization problem in (3).

Proof

Let \(x\in D\) and \(u:\Xi \times D\rightarrow {\mathbb {R}}^n\) be \(\mathcal A\)-measurable in \(\xi \). We now check the two conditions in Definition 3.1. First of all, we see that \(r(\xi , u)\) is nonnegative for all \(u\in C_\xi \) from the definition. Moreover, from the property of the regularized gap function for VI, we have \(r(\xi , u(\xi , x))=0\) and \(u(\xi ,x)\in C_\xi \) if and only if \(u(\xi , x)\) solves the (deterministic) variational inequality \(-F(\xi ,u(\xi ,x))\in N_{C_\xi }(u(\xi ,x))\) for fixed \(\xi \in \Xi \). Thus, it follows that r is a residual function of this collection of variational inequalities \(-F(\xi ,u(\xi ,x))\in _{a.s.} N_{C_\xi }(u(\xi ,x)), \;\xi \in \Xi \).

We next show that for any \(x\in D\), the \(y^*_\xi \) in (4) is, in fact, uniquely defined. To this end, consider \(y=W^\top (WW^\top )^{-1}(\mathop {\mathrm{prj}}_{C_\xi } x -x)\), where \(WW^\top \) is invertible by Assumption 3.2. Then \(Wy= \mathop {\mathrm{prj}}_{C_\xi }x-x\) and \(Wy +x \in C_\xi \) which means that the feasible set of the second stage optimization problem in (4), i.e., \(\big \lbrace y {\,\big \vert \,}x+Wy\in C_\xi \big \rbrace \), is nonempty. Moreover, for almost all \(\xi \in \Xi \), this set is closed and convex since \(C_\xi \) is compact and convex. Consequently, the second stage optimization problem in (4) has a strongly convex objective and a nonempty closed convex feasible set. Therefore, it has a unique minimizer \(y^*_\xi \), and \(\xi \mapsto y^*_\xi \) is measurable thanks to [58, Theorem 14.37].

Finally, for \(u(\xi ,x) = x + Wy^*_\xi \), we recall the well-known fact that the problem

$$\begin{aligned} \max _{v\in C_\xi } \; \langle u(\xi ,x)-v, F(\xi , u(\xi ,x)) \rangle -\frac{\alpha }{2}\Vert u(\xi ,x)-v\Vert ^2 \end{aligned}$$

has a solution, with the unique maximizer given by the projection (6). Thus, the function \(r(\xi , u(\xi ,x))\) in (4) is also finite-valued. Furthermore, since H is positive definite and r is nonnegative, \(r(\xi , u(\xi ,x))+Q(\xi , x)\) has a finite nonnegative value at any \(x \in D\) for almost every \(\xi \in \Xi \). This completes the proof. \(\square \)

Theorem 3.3 means that under Assumption 3.2, problem (4) is a stochastic program with complete recourse, that is, for any x and almost all \(\xi \), the second stage optimization problem in (4) has a solution. In general, this is not an innocuous assumption, but in the context that we are considering it only means that the set D has singled out a set of possible traffic network layouts that guarantee that traffic will keep flowing, possibly in a highly disturbed and arduous manner, whatever the perturbations resulting from the stochastic environment \(\xi \).

We need the following assumption to guarantee that the objective of (4) is finite-valued:

Assumption 3.4

The functions \(F(\xi ,\cdot )\) and \(G(\cdot )\) are continuously differentiable for all \(\xi \in \Xi \) and \(\nabla F\) is \(\mathcal{A}\)-measurable. Moreover, for any compact set \(\Omega \), there are functions \(d, \rho : \Xi \rightarrow {\mathbb {R}}_+\) such that

$$\begin{aligned}\Vert F(\xi , u)\Vert \le d_\xi \ \mathrm{and}\ \Vert \nabla F(\xi , u)\Vert \le \rho _\xi \end{aligned}$$

for all \(u\in \Omega \), where \(d\in \mathcal{L}_1^{\scriptscriptstyle \infty }\) and \(\rho \in \mathcal{L}_1^1\).

Lemma 3.5

Suppose Assumptions 3.2 and 3.4 hold. Then for almost all \(\xi \), r is continuously differentiable and its gradient is given by

$$\begin{aligned} \nabla r(\xi , u)= F(\xi , u)-(\nabla F(\xi , u)-\alpha I)(v(\xi ,u)-u). \end{aligned}$$
(7)

Moreover, for any measurable \(u_\xi \in _{a.s.}C_\xi \), both \(\xi \mapsto r(\xi ,u_\xi )\) and \(\xi \mapsto \nabla r(\xi , u_\xi )\) are not only measurable but actually integrable uniformly in \(u_\xi \). In particular, this means that the objective function in (4) is well defined at any \(x\in D\), and the optimal value of (4) is finite.

Proof

From Theorem 3.2 in [26], we know that \(r(\xi , \cdot )\) is continuously differentiable and that its gradient is given by (7). Moreover, notice that \(r(\xi ,u)\) and its gradient with respect to u are both Carathéodory functions. Hence, the measurability of \(r(\xi , u_\xi )\) and \(\nabla r(\xi , u_\xi )\) for any measurable function \(u_\xi \) follows. We now establish uniform integrability.

Since \(C^\dagger \) is compact, for any \(u\in C^\dagger \) we have \(\Vert u\Vert \le \gamma \) for some \(\gamma >0\). This together with Assumption 3.4 yields \(r(\xi ,u_\xi ) \le 2\gamma d_\xi + \frac{\alpha }{2} (2\gamma )^2\) and

$$\begin{aligned} \Vert \nabla r(\xi ,u_\xi )\Vert&\le \Vert F(\xi , u_\xi )\Vert +\Vert \nabla F(\xi , u_\xi )-\alpha I\Vert \,\Vert v(\xi ,u_\xi )-u_\xi \Vert \\&\le d_\xi + 2\gamma (\rho _\xi + \alpha ) =: d^r_\xi . \end{aligned}$$

This proves uniform integrability.

Finally, let x be feasible for (4) and consider the corresponding \(y^*_\xi \) (which exists and is measurable according to the proof of Theorem 3.3). Set \(\hat{y} = W^\top (WW^\top )^{-1}(\mathop {\mathrm{prj}}_{C_\xi }x - x)\). Then \(W\hat{y} + x \in C_\xi \) and hence, from the definition of \(y^*_\xi \) and \(C^\dagger \), we have \(\langle y^*_\xi ,Hy^*_\xi \rangle \le \langle \hat{y},H\hat{y}\rangle \le c\) for some constant \(c > 0\) (that depends on x but is independent of \(\xi \)). Hence

$$\begin{aligned} \theta (x) + \lambda \mathbb {E}[r(\xi ,u(\xi ,x)) + Q(\xi ,x)] \le \theta (x) + 2\lambda \gamma \mathbb {E}[d_\xi ] + 2\lambda \alpha \gamma ^2 + \frac{\lambda c}{2}, \end{aligned}$$

which implies the well-definedness of the objective in (4) and the finiteness of the optimal value. This completes the proof. \(\square \)

Lemma 3.5 establishes the differentiability of r and the finiteness of the optimal value of (4). However, the objective function of problem (4) involves minimizers of constrained quadratic programs for \(\xi \in \Xi \) and is not necessarily differentiable even when the sample is finite, despite the fact that the function \(r(\xi ,\cdot )\) in (3) is continuously differentiable for almost all \(\xi \in \Xi \).

Below, we adopt the strategy of the L-shaped algorithm for two-stage stochastic optimization problems [6, 36, 65] to obtain a relaxation of (4), whose objective function will be smooth when the sample is finite. We start by rewriting the recourse program. First, observe that the second stage problem is the same as

$$\begin{aligned} \min \frac{1}{2}\langle y,Hy \rangle \text { subject to } Wy = u - x, \;\; u \in C_\xi . \end{aligned}$$

With the substitution \(z = H^{\frac{1}{2}} y\), the above problem is equivalent to

$$\begin{aligned} \min \frac{1}{2}\Vert z\Vert ^2 \text { subject to } W H^{-\frac{1}{2}}z = u - x, \;\; u \in C_\xi . \end{aligned}$$
(8)

Observe that for each fixed u and x, the minimizer of \(\min \big \lbrace \frac{1}{2}\Vert z\Vert ^2 {\,\big \vert \,}WH^{-\frac{1}{2}}z = u - x\big \rbrace \) is

$$\begin{aligned} z = H^{-\frac{1}{2}}W^\top B(u - x) \text { where } B=(WH^{-1}W^\top )^{-1}. \end{aligned}$$
(9)

Plugging this expression back in (8), the second stage problem is further equivalent to

$$\begin{aligned} \min \frac{1}{2}\langle u-x,B(u-x) \rangle \text { subject to } u \in C_\xi , \end{aligned}$$

whose unique solution \(u_\xi ^*\) can be interpreted as a weighted projection of x onto \(C_\xi \) for each \(\xi \in \Xi \). From this and (9), for each \(\xi \in \Xi \), the unique solution \(y^*_\xi \) of the second stage problem is given by

$$\begin{aligned} y^*_\xi = \mathop {\mathrm{argmin}}\limits \left\{ \frac{1}{2} \langle y,Hy \rangle {\,\big \vert \,}x+ Wy \in C_\xi \right\} =H^{-1}W^\top B(u^*_\xi -x). \end{aligned}$$
(10)
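The chain (8)–(10) can be traced numerically. The sketch below assumes toy data (a random positive definite H, a full-row-rank W and a box \(C_\xi \), none of which come from the paper) and verifies that \(y^*_\xi \) from (10) satisfies \(Wy^*_\xi = u^*_\xi - x\), with the weighted projection computed by a bound-constrained solver.

```python
import numpy as np
from scipy.optimize import minimize

# Second-stage solution via (9)-(10) on assumed toy data: H positive definite,
# W of full row rank, C_xi a box; none of this is the paper's actual test data.
rng = np.random.default_rng(1)
n, p = 3, 6                                    # u in R^n, y in R^p, W is n x p
R = rng.standard_normal((p, p))
H = R @ R.T + p * np.eye(p)                    # H positive definite
W = rng.standard_normal((n, p))                # full row rank almost surely
x = rng.standard_normal(n)
lo, hi = -np.ones(n), np.ones(n)               # C_xi = [-1,1]^n (assumed)

Hinv = np.linalg.inv(H)
B = np.linalg.inv(W @ Hinv @ W.T)              # B = (W H^{-1} W^T)^{-1}, cf. (9)

# weighted projection of x onto C_xi: min (1/2)<u-x, B(u-x)> s.t. u in C_xi
obj = lambda u: 0.5 * (u - x) @ B @ (u - x)
jac = lambda u: B @ (u - x)
u_star = minimize(obj, np.clip(x, lo, hi), jac=jac,
                  bounds=list(zip(lo, hi)), method="L-BFGS-B").x

y_star = Hinv @ W.T @ B @ (u_star - x)         # formula (10)
print(np.allclose(W @ y_star, u_star - x))     # True: x + W y* = u* in C_xi
```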

Combining the preceding reformulation of the second stage problem with the idea of the L-shaped algorithm, we are led to the following problem, whose objective is smooth when the sample is finite:

$$\begin{aligned} \begin{array}{ll} \min & \theta (x)+ \lambda \mathbb {E}[r(\xi ,u_\xi )+\frac{1}{2}\langle u_\xi -x,B(u_\xi -x)\rangle ]\\ \text {subject to } & x\in D, \quad u_\xi \in _{a.s.} C_\xi . \end{array} \end{aligned}$$
(11)

It is not hard to see that the optimal value of (11) is no larger than that of (4), since fewer restrictions are imposed on \(u_\xi \) or, equivalently, on \(y_\xi \) in (10). Hence, it follows from Lemma 3.5 that the optimal value of (11) is also finite.

Unlike (4), which depends explicitly only on the finite-dimensional decision variable x, problem (11) also depends explicitly on the measurable function \(\xi \mapsto u_\xi \). However, notice that (11) can be equivalently rewritten as

$$\begin{aligned} \min _{x\in D}\ \ \left\{ \theta (x)+\lambda \min _{ u \in \mathcal{L}^{\scriptscriptstyle \infty }_{n}} \mathbb {E}\left[ r(\xi ,u_\xi ) + \frac{1}{2}\langle u_\xi -x,B(u_\xi -x)\rangle + \delta _{C_\xi }(u_\xi )\right] \right\} , \end{aligned}$$
(12)

where \(\delta _{C_\xi }\) is the indicator function of the set \(C_\xi \), which is zero on the set and \(+\infty \) otherwise. We next recall the following result, which allows the interchange of minimization and integration; it is a consequence of the finiteness of the optimal value of the inner minimization in (12) (thanks to Lemma 3.5), together with Exercise 14.61 and Theorem 14.60 in [58].

Lemma 3.6

Under Assumptions 3.2 and 3.4, for any fixed \(x\in D\), we have

$$\begin{aligned} \min _{u\in \mathcal{L}^{\scriptscriptstyle \infty }_{n}} \mathbb {E}[\Phi (\xi , u_\xi )]=\mathbb {E}[\min _{u\in {\mathbb {R}}^n} \Phi (\xi , u)], \end{aligned}$$
(13)

where

$$\begin{aligned} \Phi (\xi ,u) = r(\xi ,u) + \frac{1}{2}\langle u-x,B(u-x)\rangle + \delta _{C_\xi }(u). \end{aligned}$$

Moreover, for \(\bar{u}\in \mathcal{L}^{\scriptscriptstyle \infty }_{n}\),

$$\begin{aligned} \bar{u} \in \mathop {\mathrm{argmin}}\limits _{u\in \mathcal{L}^{\scriptscriptstyle \infty }_{n}} \mathbb {E}[\Phi (\xi , u_\xi )] \quad \Leftrightarrow \quad \bar{u}_\xi \in \mathop {\mathrm{argmin}}\limits _{u\in {\mathbb {R}}^n} \Phi (\xi , u), \quad \xi \in \Xi , \, a.s. \end{aligned}$$
(14)
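For a finite (discrete) \(\Xi \), the interchange (13)–(14) can be observed directly: minimizing over the stacked scenario-indexed variables gives the same value as averaging the scenario-wise minima. The quadratic stand-in for \(\Phi \), the box constraints and the uniform discrete distribution below are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import minimize

# Finite-sample illustration of the interchange (13): a toy quadratic Phi,
# box constraints and a uniform discrete Xi are assumed for simplicity.
rng = np.random.default_rng(2)
n, N = 3, 5                                    # dimension, number of scenarios
x = rng.random(n)
Bmat = 4.0 * np.eye(n)
centers = rng.random((N, n))                   # scenario data for the r-like term
bnds = [(0.0, 1.0)] * n                        # C_xi = [0,1]^n for every xi

def Phi(j, u):                                 # r-like term + weighted distance
    return np.sum((u - centers[j]) ** 2) + 0.5 * (u - x) @ Bmat @ (u - x)

# right-hand side of (13): expectation of the scenario-wise minima
rhs = np.mean([minimize(lambda u, j=j: Phi(j, u), x, bounds=bnds).fun
               for j in range(N)])
# left-hand side of (13): joint minimization over the stacked (u_1,...,u_N)
lhs = minimize(lambda U: np.mean([Phi(j, U[j*n:(j+1)*n]) for j in range(N)]),
               np.tile(x, N), bounds=bnds * N).fun
print(np.isclose(lhs, rhs))                    # True up to solver tolerance
```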

Using the above result, one can further reformulate (11) equivalently as

$$\begin{aligned} \begin{array}{l} {\displaystyle \min _{x\in D}\varphi (x) := \theta (x)+\lambda \mathbb {E}[\psi (\xi , x)]}\\ {\displaystyle \psi (\xi , x):=\min _{ u \in C_\xi } r(\xi ,u) + \frac{1}{2}\langle u-x,B(u-x)\rangle }, \end{array} \end{aligned}$$
(15)

which is an optimization problem that depends explicitly only on the finite dimensional decision variable x. Moreover, for each \(x\in D\), from (14), we have that \(u^*\in \mathcal{L}_n^\infty \) attains the minimum of the inner minimization problem in (12) if and only if its value at \(\xi \), i.e., \(u^*_\xi \), is a minimizer of the minimization problem defining \(\psi (\xi ,x)\) for almost all \(\xi \in \Xi \). We note that \(\psi (\xi ,x)\) is also a Carathéodory function: the measurability with respect to \(\xi \) for each fixed x follows from [58, Theorem 14.37], while the continuity with respect to x for almost all \(\xi \) follows from the compactness of \(C_\xi \) and the continuity of \((u,x)\mapsto r(\xi ,u) + \frac{1}{2}\langle u-x,B(u-x)\rangle \). In addition, thanks to Lemma 3.5, it is not hard to check that the objective in (15) is finite at any \(x\in D\) under Assumptions 3.2 and 3.4.

We show next that both (4) and (11) (and hence (15)) are solvable, and discuss their relationship.

Theorem 3.7

(Solvability) Suppose Assumptions 3.2 and 3.4 hold. Then problems (4) and (11) are solvable. Let \(\nu _1\) and \(\nu _2\) be the optimal values of (4) and (11), respectively. Then \(\nu _1\ge \nu _2\). Moreover, if for any \(x\in D\) and any y, z with \(x+Wy\), \(x+Wz\in C_\xi \), we have

$$\begin{aligned} |r(\xi , x+Wy)-r(\xi , x+Wz)|\le \frac{1}{2}|\langle y,Hy\rangle -\langle z,Hz\rangle |, \quad \xi \in \Xi , \quad \mathrm{a.s.} \end{aligned}$$
(16)

then the two problems have the same optimal values.

Proof

According to Lemma 3.5 and the preceding discussion, the optimal values of problems (4) and (11) are both finite. We show that these values are attained.

We first show that the optimal value of problem (4) is attained. To this end, consider \(\Vert x\Vert \rightarrow \infty \). Then, from the boundedness of \(C^\dagger \supseteq C_\xi \), it follows that \(\Vert y^*_\xi \Vert \) in (4) must go to \(\infty \) uniformly in \(\xi \) except for a set of measure zero, which implies that \(\mathbb {E}[Q(\xi , x)]\) goes to \(\infty \) in (4). This, together with the nonnegativity of r and \(\theta \), shows that the objective function of (4), as a function of x, is level bounded. Next, we show that the objective is lower semicontinuous. To this end, consider a sequence \(x_k\rightarrow x_0\). From the discussion spanning (8)–(10) and the continuity of weighted projections, we see that \(u(\xi ,x_k)\rightarrow u(\xi ,x_0)\) for almost all \(\xi \in \Xi \). This implies the continuity of \(x\mapsto r(\xi ,u(\xi ,x)) +Q(\xi , x)\), which is a nonnegative function, and Fatou’s lemma gives the lower semicontinuity of \(\mathbb {E}[r(\xi ,u(\xi ,x)) +Q(\xi , x)]\). The lower semicontinuity of the objective of (4) now follows upon recalling that \(\theta (x)\) is continuous. Hence (4) has a solution \(x^*\), from which one can easily construct the second stage solution \(y^*_\xi \).

We now show that the optimal value of problem (11) is attained. Note that from the discussion preceding this theorem, one can equivalently consider problem (15). For this latter problem, observe that

$$\begin{aligned} \varphi (x) \ge \lambda \mathbb {E}[\psi (\xi ,x)]\ge \frac{\lambda }{2}\mathbb {E}[\mathop {\mathrm{min}}\nolimits _{u\in C_\xi }\langle u-x,B(u-x)\rangle ]\ge \frac{\lambda }{2}\mathop {\mathrm{min}}\nolimits _{u\in C^\dagger }\langle u-x,B(u-x) \rangle , \end{aligned}$$

where the first two inequalities follow from the nonnegativity of the residual functions, and the third inequality follows from \(C_\xi \subseteq C^\dagger \). The above relation together with the positive definiteness of B and the compactness of \(C^\dagger \) shows that the objective function of (15) is level bounded. We note also that the objective is lower semicontinuous, which is a consequence of the continuity of \(\theta (x)\), the continuity of \(\psi (\xi ,x)\) in x and Fatou’s lemma. Hence an optimal solution \(x^*\) exists, from which the corresponding \(u^*_\xi \) that minimizes the subproblem defining \(\psi (\xi ,x)\) can be obtained easily. From this and the relationship between the solutions of (11) and (15), one concludes that a solution of problem (11) exists.

Next, we consider the relationship between the optimal values of (4) and (11), which we denote by \(\nu _1\) and \(\nu _2\), respectively. Obviously, \(\nu _1\ge \nu _2\).

Suppose in addition that (16) holds; then \(\nu _1 = \nu _2\). To prove this, let \((x^2, u^2_\xi )\) be a solution of (11) and \(x^1\) be a solution of (4). Set

$$\begin{aligned} y^{2*}_\xi =\mathop {\mathrm{argmin}}\limits \left\{ \, \frac{1}{2} \langle y,Hy \rangle \, \bigg | \, x^2+ Wy \in C_\xi \right\} \end{aligned}$$

and \(y^2_\xi =H^{-1}W^\top B(u^2_\xi -x^2).\) Since \(u^2_\xi =x^2+Wy^2_\xi \in C_\xi \), we have

$$\begin{aligned} \langle y^{2*}_\xi ,Hy^{2*}_\xi \rangle -\langle y^2_\xi ,Hy^2_\xi \rangle \le 0. \end{aligned}$$

Using this inequality, condition (16) and the identity \(\langle y_\xi ^2,Hy_\xi ^2\rangle =\langle u^2_\xi -x^2,B(u^2_\xi -x^2)\rangle \), we obtain

$$\begin{aligned} \nu _1&=\theta (x^1)+ \lambda \mathbb {E}[r(\xi , u(\xi , x^1))+Q(\xi , x^1)]\\ &\le \theta (x^2) + \lambda \mathbb {E}[r(\xi , u(\xi , x^2))+Q(\xi , x^2)]\\ &= \theta (x^2)+ \lambda \mathbb {E}[r(\xi , x^2+Wy_\xi ^{2*})+\frac{1}{2}\langle y^{2*}_\xi ,Hy^{2*}_\xi \rangle ]\\ &\le \theta (x^2) + \lambda \mathbb {E}[r(\xi , u^2_\xi )+\frac{1}{2}\langle y^{2}_\xi ,Hy^{2}_\xi \rangle ]\\ &\quad + \lambda \mathbb {E}[ |r(\xi , x^2+Wy_\xi ^{2*})- r(\xi , x^2+Wy_\xi ^{2})| -\frac{1}{2}|\langle y^{2*}_\xi ,Hy^{2*}_\xi \rangle - \langle y^{2}_\xi ,Hy^{2}_\xi \rangle |]\\ &\le \theta (x^2)+ \lambda \mathbb {E}[ r(\xi , u^2_\xi )+\frac{1}{2}\langle u^2_\xi -x^2,B(u^2_\xi -x^2)\rangle ]=\nu _2. \end{aligned}$$

Therefore, problems (4) and (11) have the same optimal values. \(\square \)

Similarly to Lemma 3.5, we now study the differentiability of the objective function of problem (11). Let

$$\begin{aligned} f(\xi , x, u)=r(\xi ,u) + \frac{1}{2}\langle u-x, B(u-x)\rangle , \end{aligned}$$

where r is given by (3). The proof of the following proposition is similar to that of Lemma 3.5 and is thus omitted.

Proposition 3.8

Suppose that Assumptions 3.2 and 3.4 hold. Then for almost all \(\xi \), \(f(\xi ,\cdot ,\cdot )\) is continuously differentiable and its gradient is given by

$$\begin{aligned} \nabla f(\xi , x, u)=\left( \begin{array}{l} \nabla _{u} f(\xi , x, u)\\ \nabla _x f(\xi , x, u) \end{array}\right) = \left( \begin{array}{l} \nabla r(\xi , u) + B(u-x)\\ B(x-u) \end{array}\right) . \end{aligned}$$
(17)

Moreover, \(f(\xi , x, u_\xi )\) and \(\nabla f(\xi , x, u_\xi )\) are measurable in \(\xi \). Furthermore, for any compact set \(\bar{D}\subseteq D\), there are \(d^f, \rho ^f \in \mathcal{L}_1^1\) such that \(\Vert f(\xi , x, u_\xi )\Vert \le d^f_\xi \) and \(\Vert \nabla f(\xi , x, u_\xi )\Vert \le \rho ^f_\xi \) whenever \(u_\xi \in C_\xi \) and \(x\in \bar{D}\).
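As with (7), formula (17) is easy to sanity-check numerically. The sketch below uses an assumed affine F, an assumed box \(C_\xi \) and \(B\) a multiple of the identity, and compares (17) with central differences; all data are placeholders.

```python
import numpy as np

# Finite-difference check of (17); box C_xi, affine F and all data are
# illustrative assumptions (nabla r is evaluated via formula (7)).
rng = np.random.default_rng(3)
n, alpha = 3, 10.0
M = rng.standard_normal((n, n))
q = rng.standard_normal(n)
Bmat = 4.0 * np.eye(n)
clip = lambda w: np.clip(w, 0.0, 1.0)          # projection onto C_xi

F = lambda u: M @ u + q
v_of = lambda u: clip(u - F(u) / alpha)
r_of = lambda u: F(u) @ (u - v_of(u)) - 0.5 * alpha * np.sum((u - v_of(u))**2)
grad_r = lambda u: F(u) - (M.T - alpha * np.eye(n)) @ (v_of(u) - u)  # (7)

f_of = lambda x, u: r_of(u) + 0.5 * (u - x) @ Bmat @ (u - x)
g_u = lambda x, u: grad_r(u) + Bmat @ (u - x)  # nabla_u f in (17)
g_x = lambda x, u: Bmat @ (x - u)              # nabla_x f in (17)

x0, u0, eps = rng.random(n), rng.random(n), 1e-6
fd_u = np.array([(f_of(x0, u0 + eps*e) - f_of(x0, u0 - eps*e)) / (2*eps)
                 for e in np.eye(n)])
fd_x = np.array([(f_of(x0 + eps*e, u0) - f_of(x0 - eps*e, u0)) / (2*eps)
                 for e in np.eye(n)])
print(np.max(np.abs(fd_u - g_u(x0, u0))), np.max(np.abs(fd_x - g_x(x0, u0))))
```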

In general, a two-stage stochastic VI cannot be reformulated as a convex two-stage stochastic optimization problem. Below, we show that, when \(F(\xi ,\cdot )\) is affine, f partially inherits the convexity familiar from linear two-stage stochastic optimization problems. This proposition will be used in the next section to establish a stronger convergence result for this particular class of problems.

Proposition 3.9

Under Assumptions 3.2 and 3.4, if for \(\xi \in \Xi \), \(F(\xi , x)=M_\xi x+q_\xi \) and \(\inf _{\xi \in \Xi }\lambda _{\min }(B+M_\xi +M_\xi ^\top ) > \alpha \), then \(f(\xi ,x, \cdot )\) is strongly convex for each \(x\in D\), with a strong convexity modulus independent of x and \(\xi \). Moreover, if \(\inf _{\xi \in \Xi }\lambda _{\min }(M_\xi +M_\xi ^\top ) \ge \alpha \), then \(f(\xi ,\cdot ,\cdot )\) is convex.

Proof

For a fixed \(\xi \), notice that the function

$$\begin{aligned} f(\xi , x, u) =\max _{v\in C_\xi } \langle u-v, M_\xi u +q_\xi \rangle -\frac{\alpha }{2}\Vert u-v\Vert ^2+\frac{1}{2}\langle u-x,B(u-x)\rangle \end{aligned}$$

is convex in u if the function

$$\begin{aligned} \varphi (\xi , x, u,v)=\langle u-v,M_\xi u +q_\xi \rangle -\frac{\alpha }{2}\Vert u-v\Vert ^2+\frac{1}{2}\langle u-x,B(u-x) \rangle \end{aligned}$$

is convex in u for any fixed \(v\in {\mathbb {R}}^n\) and \(x\in {\mathbb {R}}^n\). Since

$$\begin{aligned} \nabla _{uu}^2\varphi (\xi ,x, u,v)= M_\xi +M_\xi ^\top -\alpha I + B \end{aligned}$$

is positive definite, \(\varphi (\xi , x, \cdot ,v)\), and thus \(f(\xi , x, \cdot )\), is convex. The independence of the modulus from x can be seen from the fact that \(\nabla ^2_{uu}\varphi (\xi ,x,u,v)\) does not depend on x, while the independence from \(\xi \) can be seen from the assumption on the eigenvalue. Finally, when \(\inf _{\xi \in \Xi }\lambda _{\min }(M_\xi +M_\xi ^\top ) \ge \alpha \), the convexity of \(f(\xi , \cdot , \cdot )\) in (x, u) follows from the positive semi-definiteness of the matrix

$$\begin{aligned} \left( \begin{array}{c@{\quad }c} \nabla ^2_{uu}\varphi (\xi ,x, u,v) & \nabla ^2_{ux}\varphi (\xi ,x, u,v)\\ \nabla ^2_{xu}\varphi (\xi ,x, u,v) & \nabla ^2_{xx}\varphi (\xi ,x, u,v)\end{array}\right) =\left( \begin{array}{cc} M_\xi +M_\xi ^\top -\alpha I + B & -B\\ -B & B \end{array}\right) \in {\mathbb {R}}^{2n\times 2n}. \end{aligned}$$

This completes the proof. \(\square \)
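In practice, the eigenvalue condition of Proposition 3.9 is straightforward to verify on sampled data. The following sketch does so for an assumed family of matrices \(M_\xi \) and the choice \(B = 2\rho I\) used later in Sect. 5.1; the matrices and parameter values are illustrative.

```python
import numpy as np

# Checking inf_xi lambda_min(B + M_xi + M_xi^T) > alpha on a sample of
# scenarios; the matrices M_xi here are illustrative assumptions.
rng = np.random.default_rng(4)
n, alpha, rho = 4, 10.0, 5.0
B = 2 * rho * np.eye(n)                        # B = 2*rho*I, cf. Sect. 5.1
M_samples = [rng.standard_normal((n, n)) + 8.0 * np.eye(n) for _ in range(100)]
worst = min(np.linalg.eigvalsh(B + M + M.T).min() for M in M_samples)
print(worst, worst > alpha)   # True => f(xi, x, .) strongly convex, uniformly
```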

Remark 3.10

In [1], Agdeppa, Yamashita and Fukushima defined a convex ERM formulation for the stochastic linear VI by using the regularized gap function under the assumption that \(\inf _{\xi \in \Xi }\lambda _{\min }(M_\xi +M_\xi ^\top ) > \alpha \). In [10], Chen, Wets and Zhang defined a convex ERM formulation for the stochastic linear VI by using the gap function under the weaker assumption that \(\mathbb {E}[ M_\xi +M^\top _\xi ]\) is positive semi-definite. In this paper, Proposition 3.9 shows that for any stochastic linear VI, we can choose a positive definite matrix B to define a convex optimization problem in the second stage of the generalized ERM formulation (11).

To end this section, we include a corollary concerning the entire objective function of (11).

Corollary 3.11

Suppose that Assumptions 3.2 and 3.4 hold. Then the objective function \(\theta (x)+\lambda \mathbb {E}[f(\xi , x, u_\xi )]\) is well defined at any \(x\in D\) and any measurable function \(u_\xi \in C_\xi \). Moreover, if \(G(x)=\mathbb {E}[M_\xi ] x + \mathbb {E}[q_\xi ]\), then the objective function is convex under all the assumptions of Proposition 3.9.

4 Convergence analysis

In this section, we discuss a sample average approximation (SAA) for (15) (and hence, equivalently, (11)) and derive its convergence. Let \(G(x)=\mathbb {E}[F(\xi , x)]\), let \(\xi ^1, \ldots , \xi ^\nu \) be iid (independent and identically distributed) samples of \({\varvec{\xi }}\), and define

$$\begin{aligned} \begin{array}{l} {\displaystyle G^\nu (x)= \frac{1}{\nu } \sum ^\nu _{i=1} F(\xi ^i,x)}\\ {\displaystyle \theta ^\nu (x)= \max _{v\in D} \langle x-v,G^\nu (x)\rangle -\frac{\alpha }{2}\Vert x-v\Vert ^2} \\ {\displaystyle \psi (\xi ^i, x)=\mathop {\mathrm{min}}\nolimits _{ u \in C_{\xi ^i}} r(\xi ^i,u) + \frac{1}{2} \langle u-x, B(u-x)\rangle }. \end{array} \end{aligned}$$

Note that \(G^\nu \) and \(\theta ^\nu \) are continuous P-a.s. for all \(\nu \), and recall that \(\psi \) is a Carathéodory function; cf. the discussion following (15). We consider the following SAA of (15) (and hence, equivalently, (11)):

$$\begin{aligned} {\displaystyle \mathop {\mathrm{min}}\nolimits _{x\in D}\varphi ^\nu (x) := \theta ^ \nu (x)+ \frac{\lambda }{\nu }\sum ^\nu _{i=1}\psi (\xi ^i, x)}. \end{aligned}$$
(18)
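To fix ideas, the sketch below assembles the SAA objective \(\varphi ^\nu \) in (18) for a small instance in which D and \(C_\xi \) are boxes, \(F(\xi ,\cdot )\) is affine and \(B\) is a multiple of the identity; all of these are illustrative assumptions. Since the inner minimizers need not be unique for this toy data, the outer minimization is done with a derivative-free method for simplicity.

```python
import numpy as np
from scipy.optimize import minimize

# Assembling the SAA objective (18) on toy data: D = C_xi = [0,1]^n and
# F(xi,u) = M_i u + q_i are assumptions made for illustration only.
rng = np.random.default_rng(5)
n, nu = 3, 30
alpha, lam, beta = 10.0, 20.0, 10.0            # beta: B = beta*I (assumed)
Ms = [rng.standard_normal((n, n)) + 6 * np.eye(n) for _ in range(nu)]
qs = [rng.standard_normal(n) for _ in range(nu)]
bnds = [(0.0, 1.0)] * n
clip = lambda w: np.clip(w, 0.0, 1.0)          # projection onto D and C_xi

def gap(u, Fu):                                # regularized gap value, cf. (1)/(3)
    v = clip(u - Fu / alpha)
    return Fu @ (u - v) - 0.5 * alpha * np.sum((u - v) ** 2)

def psi(i, x):                                 # inner minimization defining psi
    f = lambda u: gap(u, Ms[i] @ u + qs[i]) + 0.5 * beta * np.sum((u - x) ** 2)
    return minimize(f, clip(x), bounds=bnds, method="L-BFGS-B").fun

def phi_nu(x):                                 # SAA objective (18)
    G = sum(M @ x + q for M, q in zip(Ms, qs)) / nu      # G^nu(x)
    return gap(x, G) + lam / nu * sum(psi(i, x) for i in range(nu))

# derivative-free outer solve, since psi may be nonsmooth for this toy data
sol = minimize(phi_nu, clip(rng.random(n)), bounds=bnds, method="Powell")
print(sol.fun, sol.x)
```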

Let \(X^*\) and \(X^\nu \) be the solution sets of problems (15) and (18), respectively. We will give sufficient conditions for \(X^*\) and \(X^\nu \) to be nonempty and bounded, and for any cluster point of a sequence \(\{x^\nu \}_{x^\nu \in X^\nu }\) to lie in \(X^*\).

Lemma 4.1

Under Assumptions 3.2 and 3.4, \(X^*\) and \(X^\nu \) are nonempty P-a.s. for all \(\nu \). Moreover, there exists a compact convex set \(\bar{D}\subseteq D\) so that \(X^*\subseteq \bar{D}\) and \(X^\nu \subseteq \bar{D}\) P-a.s. for all \(\nu \).

Proof

From the nonnegativity of \(r(\xi ,u)\), we have for almost all \(\xi \in \Xi \),

$$\begin{aligned} \psi (\xi , x) &= \mathop {\mathrm{min}}\nolimits _{u\in C_\xi }\, r(\xi ,u) + \frac{1}{2} \langle u-x,B(u-x) \rangle \\ &\ge \mathop {\mathrm{min}}\nolimits _{u\in C_\xi }\, \frac{1}{2} \langle u-x, B(u-x)\rangle \\ &\ge \frac{\lambda _B}{2}\mathop {\mathrm{min}}\nolimits _{u\in C^\dagger } \Vert u - x\Vert ^2 \\ &= \frac{\lambda _B}{2} \Vert \mathop {\mathrm{prj}}\nolimits _{C^\dagger } x -x\Vert ^2, \end{aligned}$$

where \(\lambda _B > 0\) is the smallest eigenvalue of the matrix B. On the other hand, recall from Assumption 3.2 that there exists \(\tau > 0\) so that \(\Vert u\Vert \le \tau \) for all \(u\in C^\dagger \), and from Assumption 3.4 that \(\Vert F(\xi ,u)\Vert \le d_\xi \) for some \(d\in \mathcal{L}_1^{\scriptscriptstyle \infty }\). Hence, for a fixed \(x_0 \in D\), we have

$$\begin{aligned} \psi (\xi ^\nu ,x_0) &\le \sup _{u\in C^\dagger } r(\xi ^\nu ,u) + \frac{\Vert B\Vert }{2}(\tau + \Vert x_0\Vert )^2\\ &\le 2\tau d_{\xi ^\nu } + \frac{\alpha }{2}(2\tau )^2 + \frac{\Vert B\Vert }{2}(\tau + \Vert x_0\Vert )^2 \le \gamma _1 \end{aligned}$$

for some constant \(\gamma _1\) P-a.s. for all \(\nu \), since \(d\in \mathcal{L}_1^{\scriptscriptstyle \infty }\). Furthermore, we have

$$\begin{aligned} \theta ^\nu (x_0) &= \left\langle x_0-\mathop {\mathrm{prj}}\nolimits _{D}\left( x_0-\frac{1}{\alpha }G^\nu (x_0)\right) , G^\nu (x_0)\right\rangle \\ &\quad -\frac{\alpha }{2}\left\| x_0-\mathop {\mathrm{prj}}\nolimits _{D}\left( x_0-\frac{1}{\alpha }G^\nu (x_0)\right) \right\| ^2 \le \gamma _2 \end{aligned}$$

P-a.s. for all \(\nu \), since \(F(\cdot ,x_0)\in \mathcal{L}_1^{\scriptscriptstyle \infty }\).

Hence we have

$$\begin{aligned} \begin{aligned}&\{x\in D \, | \, \varphi ^\nu (x) \le \varphi ^\nu (x_0) \} \subseteq \bigg \{x\in D \, \bigg | \, \frac{\lambda }{\nu }\sum _{i=1}^\nu \psi (\xi ^i,x) \le \varphi ^\nu (x_0) \bigg \} \\&\quad \subseteq \bigg \{x\in D \, \bigg | \, \frac{\lambda }{\nu }\sum _{i=1}^\nu \psi (\xi ^i,x) \le \lambda \gamma _1 + \gamma _2 \bigg \}\\&\quad \subseteq \{ x\in D \, | \, \lambda \cdot \lambda _B\Vert \mathop {\mathrm{prj}}\nolimits _{C^\dagger } x -x\Vert ^2\le 2(\lambda \gamma _1 + \gamma _2) \} \end{aligned} \end{aligned}$$
(19)

P-a.s. for all \(\nu \), and that

$$\begin{aligned} \{x\in D \, | \, \varphi (x) \le \varphi (x_0) \} \subseteq \{ x\in D \, | \, \lambda \cdot \lambda _B\Vert \mathop {\mathrm{prj}}\nolimits _{C^\dagger } x -x\Vert ^2\le 2\varphi (x_0) \}. \end{aligned}$$
(20)

These, together with the boundedness of \(C^\dagger \), imply that the level sets of \(\varphi ^\nu (\cdot )\) and \(\varphi (\cdot )\) are bounded and contained in a compact set that is independent of \(\xi \), P-a.s. for all \(\nu \). Moreover, the objective functions of (15) and (18) are lower semicontinuous, and \(\theta \), \(\theta ^\nu \), \(\psi (\xi , x)\) are nonnegative. Hence \(X^*\) and \(X^\nu \) are nonempty P-a.s. for all \(\nu \). Finally, from (19) and (20), one can take \(\bar{D} = \{ x\in D \, | \, \lambda \cdot \lambda _B\Vert \mathop {\mathrm{prj}}\nolimits _{C^\dagger } x -x\Vert ^2\le 2\max \{\varphi (x_0),\lambda \gamma _1 + \gamma _2\} \}\). \(\square \)

Theorem 4.2

(Convergence theorem) Suppose Assumptions 3.2 and 3.4 hold. Then \(\varphi ^\nu \) converges to \(\varphi \) a.s.-uniformly on the compact set \(\bar{D}\) specified in Lemma 4.1. Let \(\{x^\nu \}\) be a sequence of minimizers of problem (18) generated by iid samples. Then \(\{x^\nu \}\) is P-a.s. bounded and any accumulation point \(x^*\) of \(\{x^\nu \}\) as \(\nu \rightarrow \infty \) is P-a.s. a solution of (15).

Proof

From Lemma 4.1, the convex compact set \(\bar{D}\subseteq D\) is such that \(X^*\subseteq \bar{D}\),

$$\begin{aligned} \mathop {\mathrm{min}}\nolimits _{x\in \bar{D}} \varphi (x)=\mathop {\mathrm{min}}\nolimits _{x\in D} \varphi (x) = \mathop {\mathrm{min}}\nolimits _{x\in {\mathbb {R}}^n} \varphi (x) +\delta _D(x) \end{aligned}$$

and, P-a.s. for all \(\nu \),

$$\begin{aligned} \mathop {\mathrm{min}}\nolimits _{x\in \bar{D}} \varphi ^\nu (x)=\mathop {\mathrm{min}}\nolimits _{x\in D} \varphi ^\nu (x)= \mathop {\mathrm{min}}\nolimits _{x\in {\mathbb {R}}^n} \varphi ^\nu (x) +\delta _D(x). \end{aligned}$$

Note that \(\varphi ^\nu (x) = \theta ^\nu (x)+ \frac{\lambda }{\nu }\sum ^\nu _{i=1}\psi (\xi ^i, x)\). To analyze the convergence, we first show that \(\theta ^\nu (x)\) converges to \(\theta (x)\) P-a.s. uniformly on \(\bar{D}\). Since \(\bar{D}\) is a nonempty compact subset of \({\mathbb {R}}^n\), \(F(\xi , \cdot )\) is continuous at x for almost every \(\xi \in \Xi \), \(\Vert F(\xi , x)\Vert \le d_\xi \) for some \(d\in \mathcal{L}_1^{\scriptscriptstyle \infty }\) due to Assumption 3.4, and the sample is iid, we can apply Theorem 7.48 in [62] to claim that \(G^\nu (x)\) converges to G(x) a.s.-uniformly on \(\bar{D}\), that is, for any \(\varepsilon >0\), there is \(\hat{\nu }\) such that for any \(\nu \ge _{a.s.} \hat{\nu }\), one has

$$\begin{aligned} \mathop {\mathrm{sup}}\nolimits _{x\in \bar{D}}\Vert G(x)-G^\nu (x)\Vert < \varepsilon . \end{aligned}$$

Moreover, from the definition of \(\theta \) in (1), we have

$$\begin{aligned} \theta (x) &= \langle x-v(x), G(x) \rangle -\frac{\alpha }{2}\Vert x-v(x)\Vert ^2,\\ \theta ^\nu (x) &= \langle x-v^\nu (x),G^\nu (x) \rangle -\frac{\alpha }{2}\Vert x-v^\nu (x)\Vert ^2, \end{aligned}$$

where

$$\begin{aligned} v(x)=\mathop {\mathrm{prj}}\nolimits _{D}\left( x-\frac{1}{\alpha }G(x)\right) \quad \mathrm{and} \quad v^\nu (x)=\mathop {\mathrm{prj}}\nolimits _{D}\left( x-\frac{1}{\alpha }G^\nu (x)\right) . \end{aligned}$$

Obviously, one has for \(\nu \ge _{a.s.} \hat{\nu }\),

$$\begin{aligned} \mathop {\mathrm{sup}}\nolimits _{x\in \bar{D}}\Vert v(x)-v^\nu (x)\Vert \le \mathop {\mathrm{sup}}\nolimits _{x\in \bar{D}}\frac{1}{\alpha }\Vert G(x)-G^\nu (x)\Vert <\frac{\varepsilon }{\alpha }. \end{aligned}$$

From the boundedness of \(\bar{D}\) and the continuity of G, there is \(\gamma >0\) such that \(\max \{\Vert z\Vert ,\Vert v(z)\Vert ,\Vert G(z)\Vert \}\le \gamma \) for any \(z\in \bar{D}\). Hence we obtain, for \(\nu \ge _{ a.s.} \hat{\nu }\), that \(\Vert v^\nu (z)\Vert \le \gamma + \frac{\varepsilon }{\alpha }\) for any \(z\in \bar{D}\) and

$$\begin{aligned} \mathop {\mathrm{sup}}\nolimits _{x\in \bar{D}}|\theta (x)-\theta ^\nu (x)| &\le \mathop {\mathrm{sup}}\nolimits _{x\in \bar{D}} \Big (\Vert v^\nu (x)-x\Vert \Vert G(x)-G^\nu (x)\Vert \\ &\quad +\Vert v(x)-v^\nu (x)\Vert \Vert G(x)\Vert \\ &\quad + \frac{\alpha }{2}(\Vert x-v(x)\Vert +\Vert x-v^\nu (x)\Vert )\Vert v(x)-v^\nu (x)\Vert \Big )\\ &< \Big (4 +\frac{1}{\alpha }\Big ) \gamma \varepsilon +\frac{3\varepsilon ^2}{2\alpha }. \end{aligned}$$

Hence, \(\theta ^\nu (x)\) converges to \(\theta (x)\) a.s.-uniformly on \(\bar{D}\). Similarly, again by using Theorem 7.48 in [62], we can show that \(\frac{1}{\nu }\sum ^\nu _{i=1}\psi (\xi ^i, x)\rightarrow \mathbb {E}[\psi (\xi ,x)]\) a.s.-uniformly on \(\bar{D}\). Consequently, \(\varphi ^\nu \) converges to \(\varphi \) a.s.-uniformly on \(\bar{D}\) as claimed.

Combining the convergence result with Theorems 7.11, 7.14 and 7.31 in [58], we obtain further that

$$\begin{aligned} \limsup _{\nu \rightarrow \infty } \mathrm{argmin}_{x\in \bar{D}} \varphi ^\nu (x) \subseteq \mathrm{argmin}_{x\in \bar{D}} \varphi (x). \end{aligned}$$

Since \(\bar{D}\) is compact, we conclude that \(\{x^\nu \}\) is a.s. bounded and that any accumulation point \(x^*\) of \(\{x^\nu \}\), as \(\nu \rightarrow \infty \), is a.s. a solution of (15). \(\square \)

It is worth noting that the function

$$\begin{aligned} \varphi (x)=\big (\max _{v\in D} \langle x-v, \mathbb {E}[F(\xi ,x)]\rangle -\frac{\alpha }{2}\Vert x-v\Vert ^2\big )+\lambda \mathbb {E}[\psi (\xi ,x)] \end{aligned}$$

is the sum of an EV formulation for the first stage and an ERM formulation for the second stage of the two-stage stochastic VI. The convergence analysis of SAA for the EV formulation of a single-stage stochastic VI in [62] cannot be directly applied to \(\varphi ^\nu \).

We next present a result in the presence of convexity: when \(F(\xi ,\cdot )\) is an affine mapping, further results can be obtained if a first-order growth condition is satisfied. For the same reason as explained above, the corresponding results in [62] cannot be directly applied.

Theorem 4.3

(Convergence theorem for the convex problem) In addition to Assumptions 3.2 and 3.4, suppose that for \(\xi \in \Xi \), \(F(\xi , u)=M_\xi u+q_\xi \). Let \(\theta (x)\) be defined by (1) with \(G(x) = \mathbb {E}[F(\xi ,x)]\) and suppose \(\inf _{\xi \in \Xi }\lambda _{\mathop {\mathrm{min}}\nolimits }(M_\xi +M_\xi ^\top ) > \alpha \). Then \(\varphi \) and \(\varphi ^\nu \) are strongly convex P-a.s. for all \(\nu \). Let \(\{x^\nu \}\) be a sequence of minimizers of problem (18) with iid samples. Then \(\varphi ^\nu \) converges to \(\varphi \) a.s.-uniformly on the compact set \(\bar{D}\) specified in Lemma 4.1. Moreover, let \(x^*\) be the unique solution of (15) such that

$$\begin{aligned} \varphi (x)\ge \varphi (x^*)+c\Vert x-x^*\Vert , \,\,\, \forall \, x\in \bar{D}, \end{aligned}$$
(21)

where \(c>0\) is a constant. Then a.s., we have \(x^\nu =x^*\) for all sufficiently large \(\nu \).

Proof

From Theorem 3.2 in [26], the functions \(\theta \) and \(\theta ^\nu \) defined via the regularized gap function are strongly convex and continuously differentiable P-a.s. for all \(\nu \). Moreover, the convexity of \(\psi (\xi ,\cdot )\), and hence of each \(\psi (\xi ^i,\cdot )\), follows from the second conclusion of Proposition 3.9. Hence \(\varphi \) and \(\varphi ^\nu \) are strongly convex P-a.s. for all \(\nu \).

The a.s.-uniform convergence of \(\varphi ^\nu \) to \(\varphi \) on \(\bar{D}\) is obtained from Theorem 4.2. Alternatively, we have a much simpler proof thanks to the presence of convexity: since \(\varphi ^\nu (x)\rightarrow \varphi (x)\) at each \(x\in \bar{D}\) a.s., we have \(\varphi ^\nu (x) \rightarrow \varphi (x)\) for every x in any countable dense subset of \(\bar{D}\) a.s., and consequently, using Theorem 10.8 from [51], we can also conclude that \(\varphi ^\nu \) converges to \(\varphi \) a.s.-uniformly on \(\bar{D}\).

Next, assume in addition that (21) holds at the unique solution \(x^*\) of (15). We first recall from [62, Theorem 7.54] that using the convexity and the differentiability of \(\varphi ^\nu (\cdot )\) and \(\varphi (\cdot )\), we have \(\nabla \varphi ^\nu (x^*)\) converging to \(\nabla \varphi (x^*)\) a.s.-uniformly on the unit sphere. Combining this with (21), we have for large enough \(\nu \)

$$\begin{aligned} \langle \nabla \varphi ^\nu (x^*), h \rangle \ge \frac{c}{2}\Vert h\Vert , \quad \quad \forall \, h\in \mathcal{T}_{\bar{D}}(x^*), \end{aligned}$$

where \(\mathcal{T}_{\bar{D}}(x^*)\) denotes the tangent cone to \(\bar{D}\) at \(x^*\). This implies that \(x^*\) is the unique minimizer of \(\varphi ^\nu \) on \(\bar{D}\) and hence \(x^\nu = x^*\). \(\square \)

We close this section with the following result concerning a Lipschitz continuity property of \(\psi \) when \(F(\xi ,\cdot )\) is affine. The conclusion might be useful for future sensitivity analysis.

Proposition 4.4

In addition to Assumptions 3.2 and 3.4, suppose that for \(\xi \in \Xi \), \(F(\xi , u)=M_\xi u+q_\xi \) and \(\inf _{\xi \in \Xi }\lambda _{\mathop {\mathrm{min}}\nolimits }(B+M_\xi +M_\xi ^\top ) > \alpha \). Then for any compact set \(\bar{D}\subseteq D\), there is a measurable function \(\kappa : \Xi \rightarrow {\mathbb {R}}_+\) with \(\kappa \in \mathcal{L}_1^1\) such that for any \(x, z\in \bar{D}\), we have

$$\begin{aligned} |\psi (\xi , x)-\psi (\xi , z)|\le \kappa (\xi )\Vert x-z\Vert . \end{aligned}$$

Proof

Let \(\bar{D}\subseteq D\) be an arbitrary compact set. For any x, \(z\in D\), and \(\xi \in \Xi \), let

$$\begin{aligned} \bar{u}=\mathop {\mathrm{argmin}}\limits _{u\in C_\xi } f(\xi , x, u) \quad \mathrm{and} \quad \bar{w}=\mathop {\mathrm{argmin}}\limits _{w\in C_\xi } f(\xi ,z, w). \end{aligned}$$
(22)

In particular, we have from the definition of \(\psi \) that

$$\begin{aligned} \psi (\xi , x)=r(\xi ,\bar{u})+\frac{1}{2}\langle \bar{u}-x, B(\bar{u}-x) \rangle \quad \mathrm{and} \quad \psi (\xi , z)=r(\xi ,\bar{w})+\frac{1}{2}\langle \bar{w}-z,B(\bar{w}-z) \rangle . \end{aligned}$$
(23)

From Proposition 3.9, \(f(\xi , x, \cdot )\) is strongly convex. Using this, Proposition 3.8, (22) and the first-order optimality conditions, we have

$$\begin{aligned} \langle \nabla r(\xi , \bar{u})+B(\bar{u}-x),\bar{w}-\bar{u} \rangle \ge 0\ \; \mathrm{and}\; \; \langle \nabla r(\xi , \bar{w})+B(\bar{w}-z),\bar{u}-\bar{w}\rangle \ge 0. \end{aligned}$$

Adding these two inequalities, we obtain that

$$\begin{aligned} -\langle \nabla r(\xi , \bar{u})-\nabla r(\xi , \bar{w})+B(\bar{u}-\bar{w}), \bar{u}-\bar{w} \rangle + \langle x-z, B(\bar{u}-\bar{w})\rangle \ge 0. \end{aligned}$$

Combining this with the strong convexity of f with respect to u established in Proposition 3.9, we have further that

$$\begin{aligned} \sigma \Vert \bar{u}-\bar{w}\Vert ^2\le \langle \nabla r(\xi ,\bar{u})-\nabla r(\xi , \bar{w})+B(\bar{u}-\bar{w}), \bar{u}-\bar{w} \rangle \le \Vert B\Vert \Vert x-z\Vert \Vert \bar{u}-\bar{w}\Vert , \end{aligned}$$

where \(\sigma > 0\) is independent of x, z and \(\xi \). Hence the minimizer map is Lipschitz continuous, that is,

$$\begin{aligned} \Vert \bar{u}-\bar{w}\Vert \le \frac{\Vert B\Vert }{\sigma }\Vert x-z\Vert , \end{aligned}$$
(24)

whenever x, \(z\in D\). Next, by the definition of r and Theorem 3.3, we have

$$\begin{aligned} r(\xi ,u) &= \left\langle u-\mathop {\mathrm{prj}}\nolimits _{C_\xi }\left( u-\frac{1}{\alpha }F(\xi , u)\right) , F(\xi , u)\right\rangle \\ &\quad -\frac{\alpha }{2}\left\| u-\mathop {\mathrm{prj}}\nolimits _{C_\xi }\left( u-\frac{1}{\alpha }F(\xi , u)\right) \right\| ^2. \end{aligned}$$

Since the projection and F are Lipschitz continuous, \(\Vert F(\xi ,u)\Vert \le d_\xi \) and \(\Vert \nabla F(\xi ,u)\Vert \le \rho _\xi \) for some \(d\in \mathcal{L}_1^{\scriptscriptstyle \infty }\) and \(\rho \in \mathcal{L}_1^1\), and \(C_\xi \subseteq C^\dagger \) for almost all \(\xi \in \Xi \), there is \(\kappa _1\in \mathcal{L}_1^1\) such that whenever \(u,w\in C_\xi \),

$$\begin{aligned} |r(\xi , u)-r(\xi , w)|\le \kappa _1(\xi )\Vert u-w\Vert . \end{aligned}$$
(25)

Finally, since \(\bar{D}\) and \(C^\dagger \) are bounded, there is \(L>0\) such that for any \(y\in \bar{D}\cup C^\dagger \), we have \(\Vert y\Vert \le L/2.\)

Combining this with (23), (24) and (25), we obtain the following Lipschitz continuity property of \(\psi \) for any x, \(z\in \bar{D}\):

$$\begin{aligned} |\psi (\xi , x)-\psi (\xi , z)|\le \left[ \kappa _1(\xi )\frac{\Vert B\Vert }{\sigma }+L\Vert B\Vert \left( 1+\frac{\Vert B\Vert }{\sigma }\right) \right] \Vert x-z\Vert . \end{aligned}$$

This completes the proof. \(\square \)

Proposition 4.4 provides sufficient conditions for the Lipschitz continuity of \(\psi \) with modulus \(\kappa (\xi )\), which can be used to derive convergence rates. As an illustration, suppose in addition that D is compact. Then, under the assumptions of Proposition 4.4, we have for any \(x_1\), \(x_2\), \(v_1\), \(v_2\in D\) and all \(\xi \in \Xi \) that

$$\begin{aligned}&|\langle x_1 - v_1, F(\xi ,x_1)\rangle - \langle x_2 - v_2, F(\xi ,x_2)\rangle |\\&\quad = |\langle x_1 - v_1, F(\xi ,x_1) - F(\xi ,x_2)\rangle + \langle (x_1 - x_2) - (v_1 - v_2), F(\xi ,x_2)\rangle |\\&\quad = |\langle x_1 - v_1, M_\xi (x_1-x_2)\rangle + \langle (x_1 - x_2) - (v_1 - v_2), M_\xi x_2+q_\xi \rangle |\\&\quad \le 2r_D\Vert M_\xi \Vert \Vert x_1 - x_2\Vert + (r_D\Vert M_\xi \Vert + \Vert q_\xi \Vert )(\Vert x_1 - x_2\Vert + \Vert v_1 - v_2\Vert )\\&\quad \le (3r_D\Vert M_\xi \Vert + \Vert q_\xi \Vert )(\Vert x_1 - x_2\Vert + \Vert v_1 - v_2\Vert )\\&\quad \le \zeta (\xi )\Vert (x_1 - x_2,v_1 - v_2)\Vert , \end{aligned}$$

where \(r_D:= \sup _{x\in D}\Vert x\Vert \) and \(\zeta (\xi ):= \sqrt{2}(3r_D\Vert M_\xi \Vert + \Vert q_\xi \Vert )\).

Thus, if we further assume that

$$\begin{aligned} \mathbb {E}[e^{t(\zeta _\xi + \lambda \kappa _\xi )}] <\infty \quad \text {for all } t \text { close to } 0, \end{aligned}$$
(26)

and for any x, \(v\in D\),

$$\begin{aligned} \mathbb {E}[e^{t( \langle x - v,F(\xi ,x)\rangle + \lambda \psi (\xi ,x)- \mathbb {E}[\langle x - v,F(\xi ,x)\rangle + \lambda \psi (\xi ,x)])}] <\infty \quad \text {for all } t \text { close to } 0, \end{aligned}$$
(27)

then for any \(\varepsilon >0\), there exist positive constants \(c(\varepsilon )\) and \(\beta (\varepsilon )\) such that

$$\begin{aligned} &\mathrm{Prob} \Big (\sup _{x\in D} \big | \varphi ^\nu (x)- \varphi (x)\big |\ge \varepsilon \Big )\\ &\quad = \mathrm{Prob}\bigg (\sup _{x\in D}\bigg |\max _{v\in D}\Big \{\Big \langle x-v,\frac{1}{\nu }\sum _{i=1}^\nu F(\xi ^i,x)\Big \rangle - \frac{\alpha }{2}\Vert x-v\Vert ^2\Big \} + \frac{\lambda }{\nu }\sum _{i=1}^\nu \psi (\xi ^i,x)\\ &\qquad \qquad - \max _{v\in D}\Big \{\langle x-v,G(x)\rangle - \frac{\alpha }{2}\Vert x-v\Vert ^2\Big \} - \lambda \mathbb {E}[\psi (\xi ,x)]\bigg |\ge \varepsilon \bigg )\\ &\quad \le \mathrm{Prob} \bigg (\sup _{x,v\in D} \bigg |\Big \langle x-v,\frac{1}{\nu }\sum _{i=1}^\nu F(\xi ^i,x) - G(x)\Big \rangle + \frac{\lambda }{\nu }\sum _{i=1}^\nu \psi (\xi ^i,x)- \lambda \mathbb {E}[\psi (\xi , x)]\bigg |\ge \varepsilon \bigg )\\ &\quad \le c(\varepsilon )e^{-\nu \beta (\varepsilon )} \end{aligned}$$

for all \(\nu \), where the first inequality follows from the elementary relation \(|\max _{v\in D}g(v) - \max _{v\in D}h(v)|\le \max _{v\in D}|g(v) - h(v)|\) for any functions g and h, and we refer the reader to [62, Theorem 7.65] for the second inequality.

A further convergence statement can be made if we assume in addition that \(\mathbb {E}[(\langle x-v, F(\xi ,x)\rangle -\frac{\alpha }{2}\Vert x-v\Vert ^2+\lambda \psi (\xi ,x))^2]\) is finite for some \((x,v)\in D\times D\) and that \(\inf _{\xi \in \Xi } \lambda _\mathrm{min}(M_\xi +M_\xi ^T)\ge \alpha .\) To see this, note that problem (15) can be written as a minimax stochastic problem

$$\begin{aligned} \min _{x\in D} \max _{v\in D} \hat{\varphi }(x,v)=\mathbb {E}[\langle x-v, F(\xi ,x)\rangle -\frac{\alpha }{2}\Vert x-v\Vert ^2+\lambda \psi (\xi ,x)]. \end{aligned}$$
(28)

Thus, we may use [62, Theorem 5.10] to obtain the following convergence rate of optimal values

$$\begin{aligned} \sqrt{\nu } (\varphi ^\nu (x^\nu ) - \varphi (x^*))\quad \longrightarrow _d \quad z, \end{aligned}$$
(29)

where \(\longrightarrow _d\) denotes convergence in distribution and z is a normally distributed random variable with zero mean and variance \(\sigma ^2=\mathrm{Var}\big [\langle x^*-v^*, F(\xi ,x^*)\rangle -\frac{\alpha }{2}\Vert x^*-v^*\Vert ^2+\lambda \psi (\xi ,x^*)\big ]\), where \((x^*, v^*)\) is the unique solution of (28).

Finally, we note that under the assumptions of Theorem 4.3 and the linear growth condition (21), we have \(x^\nu =x^*\) for all sufficiently large \(\nu \), which gives finite convergence of the SAA solutions.

5 Implementation and experimentation

We begin with a more detailed reformulation of Example 2.6 that is more compatible with our computational approach; in particular, it provides a more explicit version of the flow-conservation equations. In the traffic transportation community, all possible node pairs are implicitly included in the od-collection, even those with \(d=o\) and \(h_{oo} = 0\) when the o-node is simply a transhipment node. We follow the Ferris-Pang multicommodity formulation [23], which associates a (different) commodity with each destination node \(d\in \mathcal{D}\subseteq \mathcal{N}\); the vector \(x^j\in {\mathbb {R}}^{|\mathcal{A}|}\) represents the flow of commodity \(j=1,2,\ldots ,|\mathcal{D}|\), with \(x^j_a\) denoting the flow of commodity j on arc \(a\in \mathcal{A}\).

Let \(V=(v_{ij})\) denote the node-arc incidence matrix with entries \(v_{ij}=1\) if node i has outgoing flow on arc j, \(v_{ij}=-1\) if node i has incoming flow on arc j, and \(v_{ij}=0\) otherwise. The following conditions represent conservation of the flow of each commodity:

$$\begin{aligned} Vx^j=d^j, \quad x^j\ge 0, \,\, j\in \mathcal{D}, \end{aligned}$$
(30)

where \(d^j_i\) is the demand at node \(i\in \mathcal{N}\) for commodity j.

Let

$$\begin{aligned} A &= \left( \begin{array}{ccc} V & &\\ & \ddots & \\ & & V \end{array}\right) \in {\mathbb {R}}^{|\mathcal{D}||\mathcal{N}| \times |\mathcal{D}||\mathcal{A}|}, \\ x &= \left( \begin{array}{c} x^1\\ \vdots \\ x^{|\mathcal{D}|}\end{array}\right) \in {\mathbb {R}}^{|\mathcal{D}||\mathcal{A}|}, \quad b=\left( \begin{array}{c} d^1\\ \vdots \\ d^{|\mathcal{D}|}\end{array}\right) \in {\mathbb {R}}^{|\mathcal{D}||\mathcal{N}|}. \end{aligned}$$

Then (30) can be written as

$$\begin{aligned} Ax=b, \quad x\ge 0. \end{aligned}$$

If we add the capacity constraint \(c_a\) on the total travel flow on each arc a, then the constraints on the arc flows become

$$\begin{aligned} Ax=b, \quad 0\le x, \quad Px \le c, \end{aligned}$$

where \(P=(I,\ldots , I) \in {\mathbb {R}}^{|\mathcal{A}|\times |\mathcal{D}||\mathcal{A}|}\) and I is the \(|\mathcal{A}|\times |\mathcal{A}|\) identity matrix.

Traffic equilibrium models are built from travel demand, the travel capacity of each arc and the travel flows through the nodes. The demand and capacity depend heavily on various uncertain parameters, such as weather, accidents, etc. Let \(\Xi \subseteq {\mathbb {R}}^N\) denote the set of uncertain factors. Let \((d^j_\xi )_i>0\) denote the stochastic travel demand at the ith node for commodity j and \((c_{\xi })_a\) denote the stochastic capacity of arc a.

For a realization of random vectors \(d^j_\xi \in {\mathbb {R}}^{|\mathcal{N}|}\) and \(c_\xi \in {\mathbb {R}}^{|\mathcal{A}|}, \, \xi \in \Xi \), an assignment of flows to all arcs for commodity j is denoted by the vector \(u_\xi ^j\in {\mathbb {R}}^{|\mathcal{A}|}\), whose component \((u_\xi ^j)_a\) denotes the flow on arc a for commodity j.

Fig. 1 The 7-link, 6-path network

The network in Fig. 1, taken from [68], has 5 nodes, 7 arcs and 2 destination nodes \(\{4, 5\}\). Its node-arc incidence matrix is

$$\begin{aligned} V=\left( \begin{array}{rrrrrrr} 0 & 0 & 1 & 1 & 0 & 0 & 0\\ 1 & 0 & -1 & 0 & 1 & 0 & 1\\ 0 & 1 & 0 & -1 & 0 & 1 & -1\\ -1 & 0 & 0 & 0 & 0 & -1 & 0\\ 0 & -1 & 0 & 0 & -1 & 0 & 0 \end{array} \right) \end{aligned}$$

and the matrices and vectors are as follows

$$\begin{aligned} A=\left( \begin{array}{cc} V & 0\\ 0 & V \end{array}\right) \in {\mathbb {R}}^{10\times 14}, \quad u_\xi \in {\mathbb {R}}^{14}, \quad b_\xi \in {\mathbb {R}}^{10}, \quad c_\xi \in {\mathbb {R}}^{7}. \end{aligned}$$
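The block structure of A and P is mechanical to assemble; the snippet below builds them from the incidence matrix V displayed above for the two-commodity case, as a direct transcription of the displayed data.

```python
import numpy as np

# Assembling A and P for the 7-arc network from the node-arc matrix V above
# (|D| = 2 commodities); a direct transcription of the displayed data.
V = np.array([
    [ 0,  0,  1,  1,  0,  0,  0],
    [ 1,  0, -1,  0,  1,  0,  1],
    [ 0,  1,  0, -1,  0,  1, -1],
    [-1,  0,  0,  0,  0, -1,  0],
    [ 0, -1,  0,  0, -1,  0,  0],
])
nD = 2
nN, nA = V.shape
A = np.kron(np.eye(nD), V)                 # block-diagonal: 10 x 14
P = np.tile(np.eye(nA), (1, nD))           # (I, ..., I): total flow per arc
print(A.shape, P.shape)                    # (10, 14) (7, 14)
```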

Given a robust forecast x of the arc flows, let the feasible set for each realization \(\xi \) be

$$\begin{aligned} C_\xi = \{ u_\xi \, | \, Au_\xi = b_\xi , \, 0\le u_\xi , \, Pu_\xi \le c_\xi \}, \end{aligned}$$

where \(Pu_\xi =\sum _{j=1}^{|\mathcal{D}|} u_\xi ^j\) is the total travel flow.

The arc travel time function \(h(\xi , \cdot ):{\mathbb {R}}^{|\mathcal{D}||\mathcal{A}|}\rightarrow {\mathbb {R}}^{|\mathcal{A}|}\) is a stochastic vector-valued map, each of whose entries \(h_a(\xi ,u_\xi )\) is assumed to follow a generalized Bureau of Public Roads (GBPR) function,

$$\begin{aligned} h_a(\xi , u_\xi )=\eta _a+\tau _a\Big (\frac{(Pu_\xi )_a}{(\gamma _\xi )_a}\Big )^{n_a},\quad a=1,\ldots ,|\mathcal{A}|, \end{aligned}$$
(31)

where \(\eta _a, \tau _a, (\gamma _\xi )_a\) and \(n_a\) are given positive parameters, and \((Pu_\xi )_a\) is the total travel flow on arc \(a\in \mathcal{A}\). Let

$$\begin{aligned} F(\xi , u_\xi )=P^Th(\xi ,u_\xi ). \end{aligned}$$

Then

$$\begin{aligned} \nabla _uF(\xi , u_\xi )=P^T\mathrm{diag}\Big (\tau _an_a\frac{(Pu_\xi )^{n_a-1}_a}{(\gamma _\xi )^{n_a}_a}\Big )P, \end{aligned}$$

which is symmetric positive semi-definite for any \(u_\xi \in C_\xi \subseteq {\mathbb {R}}_+^{|\mathcal{D}||\mathcal{A}|}\). One commonly considered special case is \(n_a=1\) for all \(a\in \mathcal{A}\). In this case, \(F(\xi , u_\xi )=M_\xi u_\xi + q\), where

$$\begin{aligned} M_\xi =P^T\mathrm{diag}\left( \frac{\tau _a}{(\gamma _\xi )_a}\right) P \quad \mathrm{and} \quad q=P^T(\eta _1,\ldots , \eta _{|\mathcal{A}|})^T. \end{aligned}$$
(32)

Here, \(\mathrm{rank}(P)=|\mathcal{A}|\), and for any \(\xi \in \Xi \), \(M_\xi \in {\mathbb {R}}^{|\mathcal{D}||\mathcal{A}|\times |\mathcal{D}||\mathcal{A}|}\) is a positive semi-definite matrix. Thus, \(\mathbb {E}[M_\xi ]\) is positive semi-definite. Another commonly considered case is \(n_a=3\) for all \(a\in \mathcal{A}\); see [23] and our numerical experiments in Sect. 5.2.
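The GBPR travel times (31) and the affine case (32) translate directly into code. In the sketch below, the parameter values \(\eta \), \(\tau \) and the single realization of \(\gamma _\xi \) are placeholders, and the final line confirms the identity \(F(\xi ,u)=M_\xi u + q\) for \(n_a = 1\).

```python
import numpy as np

# GBPR travel times (31) and the affine representation (32); eta, tau and the
# realization gamma_xi below are illustrative placeholders.
nD, nA = 2, 7
P = np.tile(np.eye(nA), (1, nD))
eta = np.array([6, 4, 3, 5, 6, 4, 1], dtype=float)
tau = 0.15 * eta
gamma_xi = np.array([15, 15, 30, 30, 15, 15, 15], dtype=float)

def h(u_xi, n_a):                          # arc travel times, cf. (31)
    return eta + tau * ((P @ u_xi) / gamma_xi) ** n_a

def F(u_xi, n_a):                          # F(xi,u) = P^T h(xi,u)
    return P.T @ h(u_xi, n_a)

# For n_a = 1 the map is affine: F(xi,u) = M_xi u + q with M_xi, q as in (32)
M_xi = P.T @ np.diag(tau / gamma_xi) @ P
q = P.T @ eta
u = np.full(nD * nA, 2.0)
print(np.allclose(F(u, 1.0), M_xi @ u + q))    # True
```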

To define a here-and-now solution x, let

$$\begin{aligned} D=\{ x \, | \, Ax = \mathbb {E}[b_\xi ], \, 0\le x, \, Px \le \mathbb {E}[c_\xi ]\} \end{aligned}$$

and \(G(x)=P^T\bar{h}(x)\), where the components of \(\bar{h}\) are defined by

$$\begin{aligned} \bar{h}_a(x)=\eta _a+\tau _a(Px)_a^{n_a}\mathbb {E}[(\gamma _\xi )_a^{-n_a}], \quad a=1,\ldots ,|\mathcal{A}|. \end{aligned}$$

The deterministic VI formulation of Wardrop’s user equilibrium seeks forecast arc flows \(x \in D\) satisfying

$$\begin{aligned} -G(x)\in N_D(x). \end{aligned}$$
(33)

On the other hand, the stochastic VI formulation of Wardrop’s user equilibrium seeks equilibrium arc flows \(u_\xi \in C_\xi \) for a known event \(\xi \in \Xi \) such that

$$\begin{aligned} -F(\xi , u_\xi ) \in _{a.s.} N_{C_\xi }(u_\xi ). \end{aligned}$$
(34)

In general, the variational inequalities (33) and (34) can have multiple solutions. Naturally, a here-and-now solution x should minimize the expected distance to the solutions \(u_\xi \) of (34) over almost all observations \(\xi \in \Xi \). This can be formulated as the following mathematical program with equilibrium constraints [43]:

$$\begin{aligned} \begin{array}{ll} \min & \mathbb {E}[\Vert u_\xi -x\Vert ^2]\\ \text {subject to } & -G(x)\in N_D(x),\quad -F(\xi , u_\xi ) \in _{a.s.} N_{C_\xi }(u_\xi ), \quad \xi \in \Xi . \end{array} \end{aligned}$$
(35)

Recalling the definitions of the residual functions induced by the regularized gap function given in §3, we have

$$\begin{aligned}&-G(x)\in N_D(x) \quad \Leftrightarrow \quad \theta (x)=0 \ \mathrm{and} \ x\in D, \\&\quad -F(\xi , u_\xi ) \in N_{C_\xi }(u_\xi ) \quad \Leftrightarrow \quad r(\xi , u_\xi )=0\ \mathrm{and} \ u_\xi \in C_\xi . \end{aligned}$$

Note that \(\theta (x)\ge 0\) and \(r(\xi , u_\xi )\ge 0\) for \(x\in D\) and \(u_\xi \in C_\xi \), and that both are continuously differentiable. It is natural to consider the following \(l_1\) penalty problem, which trades off the total distance against the violation of the constraints in (35):

$$\begin{aligned} \begin{array}{ll} \min & \frac{1}{\lambda \rho }\theta (x)+ \frac{1}{\rho }\mathbb {E}[r(\xi ,u_\xi )] +\mathbb {E}[\Vert u_\xi -x\Vert ^2]\\ \text {subject to } & x\in D, \quad u_\xi \in _{a.s.} C_\xi , \ \xi \in \Xi , \end{array} \end{aligned}$$
(36)

for some positive numbers \(\lambda \) and \(\rho \). Notice that this is a special case of (11), which was derived in §3 as a relaxation of (4). The discussion above thus provides an alternative interpretation of this problem as an \(l_1\) penalty problem.

Using Lemma 3.6, we see that (36) is further equivalent to the following problem

$$\begin{aligned} \begin{array}{ll} \min _{x\in D} & \theta (x)+ \lambda \mathbb {E}[\psi (\xi , x)]\\ \mathrm{where} & \psi (\xi , x) =\min _{u_\xi \in C_\xi } r(\xi ,u_\xi ) +\rho \Vert u_\xi -x\Vert ^2. \end{array} \end{aligned}$$
(37)

In this problem, for each \(\xi \in \Xi \), the decision \(u_\xi \) depends on the forecast arc flows x. The two-stage stochastic program (37) uses x as the first stage decision variable and \(u_\xi \) as the second stage variable.

In a stochastic environment, \(\xi \) belongs to a set \(\Xi \) representing future states of knowledge. The stochastic optimization approach (36) (equivalently, (37)) seeks a vector \(x^*\) which minimizes the expected residual values together with the recourse cost. The main role of a traffic model is to provide a forecast of future traffic states. The solution of (36) is a “here-and-now” solution which provides a robust forecast and has advantages over other models for long-term planning. Stochastic traffic assignment on path flows has been formulated via stochastic complementarity problems and stochastic variational inequalities in [1, 10, 70].

5.1 The Douglas–Rachford splitting method

In this section, we focus on problem (36), which is a special case of (11) with \(B = 2\rho I\) and \(D = \{x\in {\mathbb {R}}^{|\mathcal{D}||\mathcal{A}|}_+\,|\, Ax=\mathbb {E}[b_\xi ],\, Px\le \mathbb {E}[c_\xi ]\}\). We will discuss an algorithm for solving the following SAA of problem (36), for any fixed \(\nu \):

$$\begin{aligned} \begin{array}{ll} \min & \theta (x) + \frac{\lambda }{\nu }\sum _{i=1}^\nu \left[ r(\xi ^i,u_{\xi ^i}) + \rho \Vert u_{\xi ^i} - x\Vert ^2\right] \\ \text {subject to } & x\in D, \ u_{\xi ^i}\in C_{\xi ^i} \ \ \forall \, i = 1,\ldots ,\nu , \end{array} \end{aligned}$$
(38)

where \(\big \lbrace \xi ^1,\ldots ,\xi ^\nu \big \rbrace \) is an iid sample of size \(\nu \) drawn from \(\Xi \).

Notice that the objective of (38) is highly structured: the decision variables are coupled only through the quadratic term \(\frac{\lambda \rho }{\nu }\sum _{i=1}^\nu \Vert u_{\xi ^i} - x\Vert ^2\); if this quadratic term were absent, the problem would decompose into \(\nu +1\) smaller independent optimization problems that could be solved in parallel. This observation leads us to consider splitting methods, in which the objective is decoupled into two parts that are minimized separately. One such method is the Douglas–Rachford (DR) splitting method. This method was first proposed in [19] for solving feasibility problems and has been extensively studied in the convex setting; see, for example, [3, 20, 29]. Moreover, the global convergence of the method for a class of problems with nonconvex objectives has recently been established in [39].

To apply the DR splitting method, we first note that (38) can be equivalently written as a minimization problem that minimizes the sum of the following two functions:

$$\begin{aligned} f(U,x) &:= \frac{\lambda \rho }{\nu }\sum _{i=1}^\nu \Vert u_{\xi ^i} - x\Vert ^2,\\ g(U,x) &:= \theta (x) + \delta _D(x) + \frac{\lambda }{\nu }\sum _{i=1}^\nu \big (r(\xi ^i,u_{\xi ^i}) + \delta _{C_{\xi ^i}}(u_{\xi ^i})\big ), \end{aligned}$$
(39)

where g is a proper closed function, f is a quadratic function whose gradient is Lipschitz continuous with modulus \(2\lambda \rho \left( 1+\frac{1}{\nu }\right) \), and U is the vector in \({\mathbb {R}}^{\nu |\mathcal{D}||\mathcal{A}|}\) formed by stacking \(u_{\xi ^i}\), i.e.,

$$\begin{aligned} U = \begin{bmatrix} u_{\xi ^1}^T&\cdots&u_{\xi ^\nu }^T \end{bmatrix}^T. \end{aligned}$$

Each iteration of the DR splitting method involves solving regularized optimization problems related to f, g and the current iterates, as well as updating the “dual” variables. The DR splitting method applied to (38) is presented in (40), where, for notational convenience, we denote by V (resp., Z) the vector in \({\mathbb {R}}^{\nu |\mathcal{D}||\mathcal{A}|}\) formed by stacking \(v_i\) (resp., \(z_i\)), i.e.,

$$\begin{aligned} V = \begin{bmatrix} v_{1}^T&\cdots&v_{\nu }^T \end{bmatrix}^T,\ \ Z = \begin{bmatrix} z_{1}^T&\cdots&z_{\nu }^T \end{bmatrix}^T. \end{aligned}$$
$$\begin{aligned} (V^{k+1},y^{k+1})&= \mathop {\mathrm{argmin}}\limits _{V,y}\, \left\{ f(V,y) + \frac{1}{2\mu }\Vert (V,y)-(Z^k,\zeta ^k)\Vert ^2\right\} ,\\ (U^{k+1},x^{k+1})&\in \mathop {\mathrm{argmin}}\limits _{U,x}\, \left\{ g(U,x) + \frac{1}{2\mu }\Vert (U,x)-2(V^{k+1},y^{k+1})+(Z^k,\zeta ^k)\Vert ^2\right\} ,\\ (Z^{k+1},\zeta ^{k+1})&= (Z^k,\zeta ^k) + (U^{k+1},x^{k+1}) - (V^{k+1},y^{k+1}), \end{aligned}$$
(40)

where \(\mu > 0\) is a step-size parameter.

The global convergence of the sequence generated by this method to stationary points of (38) follows immediately from [39, Theorem 1], [39, Theorem 4] and our choice of the step-size parameter \(\mu \). However, notice that the subproblems for the (U, x)-update are nonconvex in general. Consequently, in practice, it is hard to compute global minimizers for these subproblems, and typical optimization solvers are only guaranteed to return a stationary point satisfying the first-order optimality conditions. Fortunately, a closer look at the proofs of [39, Theorem 1] and [39, Theorem 4] reveals that, for the convergence proofs to remain valid, one only needs to update (U, x) so that

  1. \((U^{k+1},x^{k+1})\) is a first-order stationary point of the subproblems in the kth iteration; and

  2. the subproblem objective values in the kth iteration at \((U^{k+1},x^{k+1})\) are smaller than those at \((U^k,x^k)\).

Thus, in practice, one can apply any optimization solver that employs a descent algorithm to solve the nonconvex subproblems corresponding to the (U, x)-update approximately. On the other hand, the (V, y)-update can be carried out explicitly and efficiently by exploiting the arrow-shaped structure of the Hessian of the corresponding quadratic optimization problem.

Finally, we discuss a possible termination criterion for the algorithm. To this end, observe from the definitions of f and g in (39) and the optimality conditions for the (V, y)- and (U, x)-subproblems in (40) that

$$\begin{aligned}\begin{aligned} 0&= \nabla f(V^{k+1},y^{k+1}) + \frac{1}{\mu }[(V^{k+1},y^{k+1}) - (Z^k,\zeta ^k)],\\ 0&\in \partial g(U^{k+1},x^{k+1}) + \frac{1}{\mu }[(U^{k+1},x^{k+1}) - 2(V^{k+1},y^{k+1}) + (Z^k,\zeta ^k)]. \end{aligned} \end{aligned}$$

From these, it is not hard to deduce that

$$\begin{aligned} -\frac{1}{\mu }[(U^k,x^k) - (V^k,y^k)] + (\nabla f(U^k,x^k) - \nabla f(V^k,y^k)) \end{aligned}$$
(41)

is an element of \(\nabla f(U^k,x^k) + \partial g(U^k,x^k)\). The algorithm can thus be terminated with an approximate stationary point \((U^k,x^k)\) when the quantity in (41) is small in norm. For instance, one can terminate when

$$\begin{aligned} \left( L + \frac{1}{\mu }\right) \frac{\Vert (U^k,x^k) - (V^k,y^k)\Vert }{\max \{\Vert (U^k,x^k)\Vert ,\Vert (V^k,y^k)\Vert ,1\}} < tol \end{aligned}$$
(42)

for some \(tol > 0\); here, L stands for the Lipschitz continuity modulus of \(\nabla f\).
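The pieces above fit together as in the following minimal sketch of (40), written under toy assumptions (box sets D and \(C_\xi \), affine \(F(\xi ,u)=M_iu+q_i\), all data invented for illustration): the (V, y)-update uses the closed-form arrow-structured solve, the (U, x)-update calls a local solver, which suffices by the inexact-update conditions 1–2 above, and the loop stops via the criterion (42).

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the DR splitting scheme (40) for (38) under toy assumptions:
# D = C_xi = [0,1]^n and F(xi,u) = M_i u + q_i. All data are illustrative.
rng = np.random.default_rng(6)
n, nu = 3, 10
alpha, lam, rho = 10.0, 20.0, 5.0
c = lam * rho / nu                            # coupling weight in f(U,x)
L = 2 * lam * rho * (1 + 1 / nu)              # Lipschitz modulus of grad f
mu = 0.99 * (np.sqrt(1.5) - 1) / L            # step-size, cf. mu_0 in Sect. 5.2
Ms = [rng.standard_normal((n, n)) + 6 * np.eye(n) for _ in range(nu)]
qs = [rng.standard_normal(n) for _ in range(nu)]
bnds = [(0.0, 1.0)] * n
clip = lambda w: np.clip(w, 0.0, 1.0)

def gap(u, Fu):                               # regularized gap value
    v = clip(u - Fu / alpha)
    return Fu @ (u - v) - 0.5 * alpha * np.sum((u - v) ** 2)

def prox_f(Z, zeta):                          # exact, arrow-structured solve
    a = 2 * c * mu / (1 + 2 * c * mu)
    y = (zeta + a * Z.sum(axis=0)) / (1 + a * nu)
    V = (Z + 2 * c * mu * y) / (1 + 2 * c * mu)
    return V, y

def prox_g(Wmat, w):                          # approximate, scenario-separable
    U = np.empty_like(Wmat)
    for i in range(nu):
        fi = lambda u, i=i: lam / nu * gap(u, Ms[i] @ u + qs[i]) \
                            + np.sum((u - Wmat[i]) ** 2) / (2 * mu)
        U[i] = minimize(fi, clip(Wmat[i]), bounds=bnds, method="L-BFGS-B").x
    th = lambda x: gap(x, sum(M @ x + q for M, q in zip(Ms, qs)) / nu) \
                   + np.sum((x - w) ** 2) / (2 * mu)
    x = minimize(th, clip(w), bounds=bnds, method="L-BFGS-B").x
    return U, x

Z, zeta = rng.random((nu, n)), rng.random(n)
for k in range(500):                          # cap; tight tolerances need more
    V, y = prox_f(Z, zeta)
    U, x = prox_g(2 * V - Z, 2 * y - zeta)
    Z, zeta = Z + U - V, zeta + x - y
    diff = np.sqrt(np.sum((U - V) ** 2) + np.sum((x - y) ** 2))
    nrm = max(np.sqrt(np.sum(U**2) + np.sum(x**2)),
              np.sqrt(np.sum(V**2) + np.sum(y**2)), 1.0)
    if (L + 1 / mu) * diff / nrm < 1e-3:      # stopping rule, cf. (42)
        break
print(k, x)
```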

5.2 Numerical simulations

We now report simulation results based on the networks described at the beginning of this section. All code was written and run in MATLAB R2011b.

We consider the case when \(n_a = 3\) and compare the solution of our model against the solution of the EV formulation (2). The latter problem is a nonlinear programming problem with the smooth objective (1). In our experiments below, we choose \(\alpha = 10\) in (1) for the EV formulation and solve the problem approximately by the MATLAB function fmincon using the default tolerance, initialized at the projection of the origin onto the feasible set. The projections onto D (as well as onto the sets \(C_\xi \) considered below) are computed approximately by the MATLAB function lsqlin, using the default tolerance.

For our model, we choose \(\alpha = 10\) in (3), \(\lambda = 20\) and \(\rho = 5\). To initialize the algorithm, we set \(\{z_i^0\}\) to be the approximate solutions to \(\min _{u_{\xi ^i}\in C_{\xi ^i}} r({\xi ^i},u_{\xi ^i})\) for each \(i = 1,\ldots ,\nu \) and \(\zeta ^0\) to be the approximate solution to (2); these problems are solved approximately by fmincon using the same settings as described above. We terminate the algorithm when

$$\begin{aligned} \left( 2\lambda \rho \frac{\nu + 1}{\nu } + \frac{1}{\mu }\right) \frac{\Vert (U^{k},x^k) - (V^{k},y^k)\Vert }{\max \{\Vert (U^k,x^k)\Vert ,\Vert (V^k,y^k)\Vert ,1\}} < tol, \end{aligned}$$
(43)

for some \(tol > 0\). This termination criterion is motivated by (42). To speed up the convergence, we use the same kind of heuristic for choosing \(\mu \) as described in [39, Remark 4] and [64, Remark 2.3]: we set \(\mu = 150\mu _0\), where \(\mu _0 := \frac{0.99\nu }{\nu + 1}\frac{\sqrt{1.5}-1}{2\lambda \rho }\), and update \(\mu = \max \{\mu _0,\mu /2\}\) when either \(\Vert (V^k,y^k)\Vert > 10^{10}\) or

$$\begin{aligned} \Vert (V^k,y^k) - (V^{k-1},y^{k-1})\Vert > \frac{10^6}{k}. \end{aligned}$$

In our experiment below, we use the same free travel time \(\eta \) in (31) as in [68], i.e.,

$$\begin{aligned} \eta = \begin{bmatrix} 6&4&3&5&6&4&1 \end{bmatrix}^T, \end{aligned}$$

and \(\tau = 0.15\eta \). The demand vector and the link capacity both follow beta distributions, as in [10, Example 4.1]. The lower bound for the link capacity \(\gamma _\xi \) is \(\begin{bmatrix} 10&10&20&20&10&10&10 \end{bmatrix}^T\) with mean \(\overline{\gamma }:=\begin{bmatrix} 15&15&30&30&15&15&15 \end{bmatrix}^T\), and the parameters of the beta distribution are \((\alpha ,\beta ) = (2,2)\). We then set \(c_\xi \) in \(C_\xi \) to be \(10\overline{\gamma }\), independent of \(\xi \in \big \lbrace \xi ^1,\ldots ,\xi ^\nu \big \rbrace \). For the demand vector \(b_\xi \), the lower bound \(\underline{b}\) is \(\begin{bmatrix} 150&180 \end{bmatrix}^T\) and we set \(b_\xi = \underline{b} + \begin{bmatrix} 120&96 \end{bmatrix}^T\circ \mathrm{beta}(\alpha ,\beta )\), with the first entry corresponding to the OD pair \(1\rightarrow 4\). The corresponding parameters of the beta distribution are \((\alpha ,\beta ) = (5,1)\) in this case. Furthermore, we set \(tol=5\times 10^{-5}\) in (43) in our experiments for this network.

The computational results are given in Table 1, where we present results for sample sizes \(\nu = 50\), 100 and 150. We denote by \(x^*\) the approximate solution to our model obtained from the DR splitting method and by \(x_\mathrm{EV}\) the solution to (2). For these solutions, we compute their projections \(w_\xi \) onto \(C_\xi \) for a larger sample \(\Upsilon _\nu \) of size \(20\times \nu \) that contains \(\big \lbrace \xi ^1,\ldots ,\xi ^\nu \big \rbrace \), and report \(\mathbf{vio}:= \frac{1}{20\nu }\sum _{\xi \in \Upsilon _\nu }\max _{v\in C_\xi }(Pw_\xi - Pv)^Th(\xi ,w_\xi )\), the CVaR given by

$$\begin{aligned} \mathrm{CVaR} = \inf _{\alpha } \alpha + \frac{1}{2\nu } \sum _{\xi \in \Upsilon _\nu }\left[ \max _{v\in C_\xi }(Pw_\xi - Pv)^Th(\xi ,w_\xi ) - \alpha \right] _+, \end{aligned}$$

and the corresponding minimizer \(\alpha ^*\) at which the above infimum is attained. Clearly, the smaller these three quantities are, the closer \(w_\xi \) is to being optimal for the corresponding instance. Our computational results show that the \(w_\xi \) constructed from our \(x^*\) are, on average, closer to optimal for \(\xi \in \Upsilon _\nu \).
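For reference, the reported CVaR and its minimizer \(\alpha ^*\) can be computed by a direct scan, since the infimum of the piecewise-linear function of \(\alpha \) is attained at one of the sampled losses. In the sketch below, the loss values are placeholders standing in for the actual quantities \(\max _{v\in C_\xi }(Pw_\xi - Pv)^Th(\xi ,w_\xi )\).

```python
import numpy as np

# Evaluating the reported CVaR; losses[j] stands for the per-scenario value
# max_{v in C_xi} (P w_xi - P v)^T h(xi, w_xi), faked here with placeholder data.
nu = 50
rng = np.random.default_rng(7)
losses = rng.gamma(2.0, 1.0, size=20 * nu)     # |Upsilon_nu| = 20*nu samples

def cvar(losses, weight):                      # inf_a a + weight * sum [l - a]_+
    vals = [a + weight * np.maximum(losses - a, 0.0).sum() for a in losses]
    j = int(np.argmin(vals))
    return vals[j], losses[j]                  # (CVaR, minimizer alpha*)

val, a_star = cvar(losses, 1.0 / (2 * nu))     # weight 1/(2 nu) as in the text
print(val, a_star)  # with 20*nu scenarios this is a CVaR at the 90% level
```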

Table 1 Computational results for the network from [68]

We next consider the Nguyen and Dupuis network, which contains 13 nodes, 19 arcs and 2 destinations; see Fig. 2.

Fig. 2 The Nguyen and Dupuis network with 19 links and 25 paths

We use the same free travel time \(\eta \) in (31) as in [69, Table 1], i.e.,

$$\begin{aligned} \eta = \begin{bmatrix} 7&9&9&12&3&9&5&13&5&9&9&10&9&6&9&8&7&14&11 \end{bmatrix}^T, \end{aligned}$$

and \(\tau = 0.15\eta \). Also, as in [10, Example 4.2], the demand vector follows a beta distribution. The lower bound \(\underline{b}\) for the demand vector is \(\begin{bmatrix} 300&700&500&350 \end{bmatrix}^T\), corresponding to the OD pairs \(1\rightarrow 2\), \(1\rightarrow 3\), \(4\rightarrow 2\) and \(4\rightarrow 3\). The parameters of the beta distribution are \((\alpha ,\beta ) = (50,10)\). The demand vector \(b_\xi \) is then generated according to \(\underline{b} + 120\, \mathrm{beta}(\alpha ,\beta )\). The link capacity \(\gamma _\xi \) is generated exactly as in [10, Example 4.2]. We also set \(c_\xi \) in \(C_\xi \) to be \(10\overline{\gamma }\), where \(\overline{\gamma }\) is the mean of the three possible link capacities in [10, Example 4.2] with the probabilities specified there, independent of \(\xi \). Furthermore, we set \(tol=5\times 10^{-4}\) in (43) in our experiments for this network.

The computational results are given in Table 2, where we present results for sample sizes \(\nu = 50\), 100 and 150, as before. We report the same quantities as in Table 1: \(\mathbf{vio}\), the CVaR and the corresponding \(\alpha ^*\). Our computational results again show that the \(w_\xi \) constructed from our \(x^*\) are, on average, closer to optimal for \(\xi \in \Upsilon _\nu \).

Table 2 Computational results for the Nguyen and Dupuis network