1 Introduction

Advances in areas such as computational science and engineering, applied mathematics, software design, and scientific computing have allowed decision makers to optimize complex physics-based systems under uncertainty, such as those modeled using partial differential equations (PDEs) with uncertain inputs. Recent applications of PDE-constrained optimization under uncertainty include oil field development [60], stellarator coil optimization [81], acoustic wave propagation [82], and shape optimization of electrical engines [39]. In the literature on optimization under uncertainty, several approaches have been proposed for obtaining decisions that are resilient to uncertainty, such as robust optimization [11] and stochastic optimization [70]. When the parameter vector is modeled as a random vector with known probability distribution, a common approach is to seek decisions that minimize the expected value of a parameterized objective function. The resulting optimization problem is referred to as a risk-neutral optimization problem. However, evaluating the risk-neutral problem’s objective function requires computing a potentially high-dimensional integral. Furthermore, each evaluation of the parameterized objective function may require the simulation of complex systems of PDEs, adding another challenge to obtaining solutions to risk-neutral PDE-constrained optimization problems.

A common approach for approximating risk-neutral optimization problems is the sample average approximation (SAA) method, yielding the SAA problem. For example, the SAA approach is used in the literature on mathematical programming [38, 70, 73] and on PDE-constrained optimization [29, 43, 66, 81]. The SAA problem’s objective function is the sample average of the parameterized objective function computed using samples of the random vector. To assess the quality of the SAA solutions as approximate solutions to the risk-neutral problem, different error measures have been considered, such as the consistency of the SAA optimal value and of SAA solutions [7, 30, 50, 68, 69, 72], nonasymptotic sample size estimates [18, 67, 70, 71, 73], mean and almost sure convergence rates [9], and confidence intervals for SAA optimal values [27].

A number of results on the SAA approach, such as the consistency properties of SAA solutions and sample size estimates, are based on the compactness of either the feasible set or sets that eventually contain the SAA solutions. The analysis of the SAA approach as applied to PDE-constrained optimization problems is complicated by the fact that the feasible sets are commonly noncompact, such as the set of square integrable functions defined on the interval (0, 1) with values in \([-1,1]\). Moreover, level sets of the SAA objective function may not be contained in a deterministic, compact set, as shown in Appendix A. Our approach for establishing consistency is based on that developed in [72, Chap. 5]. While the consistency results in [72, Chap. 5] are established for finite-dimensional stochastic programs, they do not require the compactness of the feasible set. Instead, they are valid provided that the solution set to the stochastic program and those to the SAA problems are eventually contained in a deterministic, compact set.

We establish the consistency of SAA optimal values and SAA solutions to risk-neutral nonlinear PDE-constrained optimization problems. For analyzing the SAA approach, we construct deterministic, compact subsets of the possibly infinite dimensional feasible sets that contain the solutions to risk-neutral PDE-constrained problems and eventually those to the corresponding SAA problems. This observation allows us to study the consistency using the tools developed in the literature on M-estimation [34, 52] and stochastic programming [70, 72]. Our consistency results are inspired by and based on those established in [70, Sects. 2 and 7] and [72, Chap. 5]. For our construction of these compact sets, we use the fact that many PDE-constrained optimization problems involve compact operators, such as compact embeddings. Moreover, we use first-order optimality conditions and PDE stability estimates. The construction is partly inspired by the computations used to establish higher regularity of solutions to deterministic PDE-constrained optimization problems [56, p. 1305], [76, Sect. 2.15] and a computation made in the author’s dissertation [57, Sect. 3.5] which demonstrates that all SAA solutions to certain linear elliptic optimal control problems are contained in a compact set.

The SAA method as applied to risk-neutral strongly convex PDE-constrained optimization has recently been analyzed in [33, 54, 58, 66]. The authors of [62] apply the SAA scheme to the optimal control of ordinary differential equations with random inputs and demonstrate the epiconvergence of the SAA objective function and the weak consistency of SAA critical points in the sense defined in [64, Definition 3.3.6]. The weak consistency implies that accumulation points of SAA critical points are critical points of the optimal control problem [62, p. 13].

Monte Carlo sampling is one approach to approximating the expected values in the objective functions of stochastic programs. For strongly convex elliptic PDE-constrained optimization problems, quasi-Monte Carlo techniques are analyzed in [28]. Further discretization approaches for expectations are, for example, stochastic collocation [74] and low-rank tensor approximations [23]. Besides risk-neutral PDE-constrained optimization, risk-averse PDE-constrained optimization [3, 17, 43, 44, 55], distributionally robust PDE-constrained optimization [41, 59], robust PDE-constrained optimization [5, 39, 51], and PDE-constrained optimization with chance constraints [16, 20, 21, 26, 75] provide approaches to decision making under uncertainty with PDEs.

1.1 Outline

We introduce notation in Sect. 2 and a class of risk-neutral nonlinear PDE-constrained optimization problems and their SAA problems in Sect. 3. Section 3.1 presents a compact subset that contains the solutions to the risk-neutral problem and eventually those to its SAA problems. We study the consistency of SAA optimal values and solutions in Sect. 3.2. Section 4 discusses the application of our theory to three nonlinear PDE-constrained optimization problems under uncertainty. We summarize our contributions, and discuss some limitations of our approach and open research questions in Sect. 5.

2 Notation and Preliminaries

Throughout the paper, the control space \(U\) is a real, separable Hilbert space and is identified with its dual, that is, we omit writing the Riesz mapping.

Metric spaces are defined over the real numbers and equipped with their Borel sigma-algebra. We abbreviate “with probability one” by w.p. \(1\). Let \((\Theta , {\mathcal {A}}, \mu )\) be a probability space. For two complete metric spaces \(\varLambda _1\) and \(\varLambda _2\), a mapping \(G : \varLambda _1 \times \Theta \rightarrow \varLambda _2\) is a Carathéodory mapping if \(G(\cdot , \theta )\) is continuous for all \(\theta \in \Theta \) and \(G(v, \cdot )\) is measurable for each \(v \in \varLambda _1\). Let \(\varLambda \) be a Banach space. For each \(N \in {\mathbb {N}}\), let \(\Upsilon _N : \Theta \rightrightarrows \varLambda \) be a set-valued mapping and let \(\Psi \subset \varLambda \) be a set. We say that w.p. \(1\) for all sufficiently large N, \(\Upsilon _N \subset \Psi \) if the set \(\{\, \theta \in \Theta :\, \exists \, n(\theta ) \in {\mathbb {N}}\, \forall \, N \ge n(\theta ); \, \Upsilon _N(\theta ) \subset \Psi \}\) is contained in \({\mathcal {A}}\) and occurs w.p. \(1\), that is, if the limit inferior of the sequence \( (\{\, \theta \in \Theta :\,\Upsilon _N(\theta ) \subset \Psi \})_N \) is contained in \({\mathcal {A}}\) and occurs w.p. \(1\) [12, p. 55]. A mapping \(\upsilon : \Theta \rightarrow \varLambda \) is strongly measurable if there exists a sequence of simple mappings \(\upsilon _k : \Theta \rightarrow \varLambda \) such that \(\upsilon _k(\theta ) \rightarrow \upsilon (\theta )\) as \(k \rightarrow \infty \) for all \(\theta \in \Theta \) [35, Definition 1.1.4]. If \(\varLambda \) is separable, then \(\upsilon : \Theta \rightarrow \varLambda \) is strongly measurable if and only if it is measurable [35, Corollary 1.1.2 and Theorem 1.1.6]. The dual to a Banach space \(\varLambda \) is \(\varLambda ^*\) and the norm of \(\varLambda \) is denoted by \(\left\Vert \cdot \right\Vert _{\varLambda }\). 
We use \(\langle \cdot , \cdot \rangle _{{\varLambda }^*\!, \varLambda }\) to denote the dual pairing between \(\varLambda ^*\) and \(\varLambda \). If \(\varLambda \) is a reflexive Banach space, we identify \((\varLambda ^*)^*\) with \(\varLambda \) and write \((\varLambda ^*)^* = \varLambda \). Let \(\varLambda _1\) and \(\varLambda _2\) be real Banach spaces. A linear operator \(\Upsilon : \varLambda _1 \rightarrow \varLambda _2\) is compact if the image \(\Upsilon (\varLambda _0)\) is precompact in \(\varLambda _2\) for each bounded set \(\varLambda _0 \subset \varLambda _1\) [49, Definition 8.1-1]. The operator \(\Upsilon ^* :\varLambda _2^* \rightarrow \varLambda _1^*\) is the (Banach space-)adjoint operator of the linear, bounded mapping \(\Upsilon :\varLambda _1 \rightarrow \varLambda _2\) and is defined by \(\langle \Upsilon ^*v_2, v_1\rangle _{\varLambda _1^*,\varLambda _1} = \langle v_2, \Upsilon v_1\rangle _{\varLambda _2^*,\varLambda _2}\) [49, Definition 4.5-1]. We use \(\varLambda _1 \hookrightarrow \varLambda _2\) to denote a continuous embedding from \(\varLambda _1\) to \(\varLambda _2\), that is, \(\varLambda _1 \subset \varLambda _2\) and the embedding operator \(\iota : \varLambda _1 \rightarrow \varLambda _2\) defined by \(\iota [v] = v\) is continuous [65, Definition 7.15 and Rem. 7.17]. A continuous embedding is compact if the embedding operator is a compact operator [65, Definition 7.25 and Lemma 8.75]. We denote by \(\textrm{D}f\) the Fréchet derivative of \(f\), and use the notation \(\textrm{D}_x f\) and \(f_x\) for partial derivatives with respect to \(x\). Throughout the text, \(D\subset {\mathbb {R}}^d\) is a bounded domain. For \(p \in [1,\infty )\), we denote by \(L^p(D)\) the Lebesgue space of p-integrable functions defined on \(D\) and \(L^\infty (D)\) that of essentially bounded functions. The space \(H^1(D)\) is the space of all \(v \in L^2(D)\) with weak derivatives contained in \(L^2(D)^d\), where \(L^2(D)^d\) is the Cartesian product of \(L^2(D)\) taken \(d\) times. 
We equip \(H^1(D)\) with the norm \(\left\Vert y\right\Vert _{H^1(D)} = (\left\Vert y\right\Vert _{L^2(D)}^2 + \left\Vert \nabla y\right\Vert _{L^2(D)^d}^2)^{1/2}\). The Hilbert space \(H_0^1(D)\) consists of all \(v \in H^1(D)\) with zero boundary traces and is equipped with the norm \(\left\Vert y\right\Vert _{H_0^1(D)} = \left\Vert \nabla y\right\Vert _{L^2(D)^d}\). We define \(H^{-1}(D) = H_0^1(D)^*\). We define Friedrichs’ constant \(C_D\in (0,\infty )\) by \( C_{D} = \sup _{v \in H_0^1(D) \setminus \{0\}}\, \left\Vert v\right\Vert _{L^2(D)}/\left\Vert v\right\Vert _{H_0^1(D)} \). The indicator function \(I_{U_0} : U\rightarrow [0,\infty ]\) of \(U_0\subset U\) is given by \(I_{U_0}(v) = 0\) if \(v \in U_0\) and \(I_{U_0}(v) = \infty \) otherwise. For a convex, lower semicontinuous, proper function \(\chi : U\rightarrow (-\infty ,\infty ]\), the proximity operator \(\textrm{prox}_{\chi } :U\rightarrow U\) of \(\chi \) is defined by (see [10, Definition 12.23])

$$\begin{aligned} \textrm{prox}_{\chi }({v}) = \mathop {\mathrm {arg\,min}}\limits _{w\in U}\, \chi (w) + (1/2)\left\Vert v-w\right\Vert _{U}^2. \end{aligned}$$
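For example, if \(\chi = I_{U_0}\) is the indicator function of a nonempty, closed, convex set \(U_0 \subset U\), then the proximity operator reduces to the metric projection onto \(U_0\):

$$\begin{aligned} \textrm{prox}_{I_{U_0}}({v}) = \mathop {\mathrm {arg\,min}}\limits _{w\in U_0}\, (1/2)\left\Vert v-w\right\Vert _{U}^2 = \textrm{proj}_{U_0}(v). \end{aligned}$$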

3 Risk-Neutral PDE-Constrained Optimization Problem

We consider the risk-neutral PDE-constrained optimization problem

$$\begin{aligned} \min _{u\in U}\, {\mathbb {E}}\left[ J_1(S(u,\xi ),\xi )\right] + \psi (u) + (\alpha /2)\left\Vert u\right\Vert _{U}^2\, \end{aligned}$$
(1)

and its sample average approximation

$$\begin{aligned} \min _{u\in U}\, \frac{1}{N} \sum _{i=1}^NJ_1(S(u,\xi ^i),\xi ^i) + \psi (u) + (\alpha /2)\left\Vert u\right\Vert _{U}^2, \end{aligned}$$
(2)

where \(\alpha > 0\), and \(\xi ^1\), \(\xi ^2, \ldots \) are independent identically distributed \(\Xi \)-valued random elements defined on a complete probability space \((\Omega , {\mathcal {F}}, P)\), each with the same distribution as the random element \(\xi \). Here, \(\xi \) is a random element defined on a probability space and taking values in the sample space \(\Xi \), a complete, separable metric space. We state assumptions on the mappings \(J_1 :Y\times \Xi \rightarrow [0,\infty )\), \(\psi :U\rightarrow [0,\infty ]\), and \(S : U\times \Xi \rightarrow Y\) as well as on the control space \(U\) and state space \(Y\) in Assumptions 1 and 2. Let \(F: U\rightarrow [0,\infty ]\) be the objective function of (1) and let \({\hat{F}}_N : U\rightarrow [0,\infty ]\) be that of (2). Since \(\xi ^1, \xi ^2, \ldots \) are defined on the common probability space \((\Omega ,{\mathcal {F}},P)\), we can view the function \({\hat{F}}_N\) as defined on \(U\times \Omega \). However, we often omit writing the second argument. We often use \(\xi \) to denote a deterministic element in \(\Xi \).
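To fix ideas, the following is a minimal, finite-dimensional sketch of the SAA scheme (2). The scalar “state equation” \(y = \xi u\), the integrand \(J_1(y,\xi ) = (1/2)(y-\xi )^2\), the choice \(\psi = 0\), and the resulting closed-form minimizers are illustrative assumptions; they are not part of the PDE setting considered below.

```python
import numpy as np

# Minimal sketch of the SAA scheme (2) with a scalar control.
# Illustrative toy model (not the PDE setting): S(u, xi) = xi * u,
# J1(y, xi) = (1/2) (y - xi)^2, psi = 0.  Then
#   F(u) = E[(1/2) (xi u - xi)^2] + (alpha/2) u^2
# has the unique minimizer u* = E[xi^2] / (E[xi^2] + alpha), and the
# SAA problem (2) has the same closed form with E[xi^2] replaced by
# the sample average of (xi^i)^2.

rng = np.random.default_rng(0)
alpha = 0.5
u_star = 1.0 / (1.0 + alpha)           # E[xi^2] = 1 for xi ~ N(0, 1)

for N in (10, 100, 10000):
    xi = rng.standard_normal(N)        # i.i.d. samples xi^1, ..., xi^N
    m_N = np.mean(xi**2)               # sample average of (xi^i)^2
    u_hat_N = m_N / (m_N + alpha)      # SAA solution, in closed form
    print(N, abs(u_hat_N - u_star))    # error vanishes as N grows (w.p. 1)
```

With these choices, the SAA solutions converge to the risk-neutral solution as \(N\) grows, which is the kind of consistency statement made precise in Sect. 3.2.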

In the remainder of the section, we impose conditions on the optimization problem (1). Assumptions 1 and 2 ensure that the reduced formulation of the risk-neutral problem (1) and its SAA problem (2) are well-defined.

Assumption 1

  1. (a)

    The space \(U\) is a real, separable Hilbert space, and \(Y\) is a real, separable Banach space.

  2. (b)

    The function \(J_1 : Y\times \Xi \rightarrow [0,\infty )\) is a Carathéodory function, and \(J_1(\cdot , \xi )\) is continuously differentiable for all \(\xi \in \Xi \).

  3. (c)

    The regularization parameter \(\alpha \) is positive, and \(\psi : U\rightarrow [0,\infty ]\) is proper, convex and lower semicontinuous.

The nonnegativity of \(J_1\) and \(\psi \) is fulfilled for many PDE-constrained optimization problems (see Sect. 4). We define the feasible set

$$\begin{aligned} U_\text {ad}= \{ \, u \in U:\, \psi (u)<\infty \}. \end{aligned}$$
(3)

Assumption 2

  1. (a)

    The operator \(E : (Y\times U) \times \Xi \rightarrow Z\) is a Carathéodory mapping, \(E(\cdot ,\cdot , \xi )\) is continuously differentiable for all \(\xi \in \Xi \), and \(Z\) is a real, separable Banach space.

  2. (b)

    For each \((u,\xi ) \in U\times \Xi \), \(S(u,\xi ) \in Y\) is the unique solution to: find \(y \in Y\) with \(E(y,u,\xi ) = 0\).

  3. (c)

    For each \((u,\xi ) \in U\times \Xi \), \(E_y(S(u,\xi ),u,\xi )\) has a bounded inverse.

Assumptions 1 and 2 and the implicit function theorem ensure that \(S(\cdot , \xi ) \) is continuously differentiable on \(U\) for each \(\xi \in \Xi \). Let us define \(\widehat{J}_1: U\times \Xi \rightarrow [0,\infty )\) by

$$\begin{aligned} \widehat{J}_1(u,\xi ) = J_1(S(u,\xi ),\xi ) \end{aligned}$$
(4)

and \(\widehat{J}: U\times \Xi \rightarrow [0,\infty )\) by

$$\begin{aligned} \widehat{J}(u,\xi ) = \widehat{J}_1(u,\xi ) + \psi (u) + (\alpha /2)\left\Vert u\right\Vert _{U}^2. \end{aligned}$$
(5)

Let us fix \(\xi \in \Xi \). Assumptions 1 and 2 allow us to use the adjoint approach [32, Sect. 1.6.2] to compute the gradient of the function \(\widehat{J}_1(\cdot ,\xi )\) defined in (4) at each \(u \in U\). It yields the gradient

$$\begin{aligned} \nabla _u \widehat{J}_1(u,\xi ) = E_u(S(u,\xi ),u,\xi )^*z(u,\xi ), \end{aligned}$$
(6)

where for each \((u,\xi ) \in U\times \Xi \), \(z(u,\xi ) \in Z^*\) is the unique solution to the (parameterized) adjoint equation: find \(z \in Z^*\) with

$$\begin{aligned} E_y(S(u,\xi ),u,\xi )^*z = - \textrm{D}_y J_1(S(u,\xi ),\xi ). \end{aligned}$$
(7)

Assumption 3

The risk-neutral problem (1) has a solution. For each \(N \in {\mathbb {N}}\) and every \(\omega \in \Omega \), the SAA problem (2) has a solution.

We refer the reader to [42, Theorem 1] and [44, Proposition 3.12] for theorems on the existence of solutions to risk-averse PDE-constrained optimization problems.

For some \(u_0 \in U_\text {ad}\) with \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \) and a scalar \(\rho \in (0,\infty )\), we define the set

$$\begin{aligned} V_{\text {ad}}^\rho (u_0) = \{u \in U_\text {ad}:\, (\alpha /2) \left\Vert u\right\Vert _{U}^2 \le {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] + \rho \}. \end{aligned}$$

The existence of such a point \(u_0\) is implied by Assumption 3, for example. Whereas \(U_\text {ad}\) may be unbounded, the set \(V_{\text {ad}}^\rho (u_0)\) is bounded. If \(U_\text {ad}\) is bounded and \(\rho \in (0,\infty )\) is sufficiently large, then \(V_{\text {ad}}^\rho (u_0) = U_\text {ad}\). Each solution to the risk-neutral problem (1) is contained in \(V_{\text {ad}}^\rho (u_0)\), because \(\widehat{J}_1 \ge 0\), \(\psi \ge 0\), and \(u_0 \in U_\text {ad}\).
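In detail, let \(u^*\) be a solution to (1). Since \(\widehat{J}_1 \ge 0\) and \(\psi \ge 0\), and since \(F(u_0) = {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] \), we obtain

$$\begin{aligned} (\alpha /2)\left\Vert u^*\right\Vert _{U}^2 \le F(u^*) \le F(u_0) = {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] \le {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] + \rho . \end{aligned}$$

Moreover, \(F(u^*) < \infty \) yields \(\psi (u^*) < \infty \) and hence \(u^* \in U_\text {ad}\). Thus \(u^* \in V_{\text {ad}}^\rho (u_0)\).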

Assumption 4 allows us to construct compact subsets of the bounded set \(V_{\text {ad}}^\rho (u_0)\).

Assumption 4

  1. (a)

    The linear operator \(K: V\rightarrow U\) is compact, \(V\) is a real, separable Banach space, and \(B_{V_{\text {ad}}^\rho (u_0)} \subset U\) is a bounded, convex neighborhood of \(V_{\text {ad}}^\rho (u_0)\).

  2. (b)

    The mapping \(M :U\times \Xi \rightarrow V\) is a Carathéodory mapping and for all \((u,\xi ) \in U\times \Xi \),

    $$\begin{aligned} \nabla _u \widehat{J}_1(u,\xi ) = K[M(u,\xi )]. \end{aligned}$$
    (8)
  3. (c)

    For some integrable random variable \(\zeta : \Xi \rightarrow [0,\infty )\),

    $$\begin{aligned} \left\Vert M(u,\xi )\right\Vert _{V} \le \zeta (\xi ) \quad \text {for all} \quad (u,\xi )\in B_{V_{\text {ad}}^\rho (u_0)} \times \Xi . \end{aligned}$$
    (9)

Assumption 4 (b) and the gradient formula in (6) yield for all \((u,\xi ) \in U\times \Xi \),

$$\begin{aligned} E_u(S(u,\xi ),u,\xi )^*z(u,\xi ) = K[M(u,\xi )]. \end{aligned}$$

Assumption 4 (c) may be verified using stability estimates for the solution operator and adjoint state. If \(B_{V_{\text {ad}}^\rho (u_0)}\) were unbounded, then Assumption 4 (c) could be violated.

Lemma 1

If Assumptions 1 and 2 hold, then \(\widehat{J}_1 : U\times \Xi \rightarrow [0,\infty )\) is a Carathéodory mapping.

Proof

For each \(\xi \in \Xi \), the implicit function theorem combined with Assumptions 1 and 2 ensures that the mapping \(S(\cdot ,\xi )\) is continuously differentiable. In particular, \(\widehat{J}_1(\cdot ,\xi )\) is continuous. Fix \(u \in U\). The measurability of \(S(u,\cdot )\) follows from [8, Theorem 8.2.9] when combined with Assumptions 1 and 2. Using the definition of \(\widehat{J}_1\) provided in (4), the Carathéodory property of \(J_1\), the measurability of \(S(u,\cdot )\), the separability of \(Y\), and the composition rule [35, Corollary 1.1.11], we find that \(\widehat{J}_1 (u,\cdot )\) is measurable. \(\square \)

We define the expectation function \(F_1 : B_{V_{\text {ad}}^\rho (u_0)} \rightarrow {\mathbb {R}}\) and the sample average function \({\hat{F}}_{1,N} : B_{V_{\text {ad}}^\rho (u_0)} \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} F_1(u) = {\mathbb {E}}\left[ \widehat{J}_1(u,\xi )\right] \quad \text {and} \quad {\hat{F}}_{1,N}(u) = \frac{1}{N} \sum _{i=1}^N\widehat{J}_1(u,\xi ^i). \end{aligned}$$
(10)

Lemma 2

If Assumptions 1-4 hold, then \(F_1\) and \({\hat{F}}_{1,N}\) are continuously differentiable on \(B_{V_{\text {ad}}^\rho (u_0)}\) and for each \(u \in B_{V_{\text {ad}}^\rho (u_0)}\), we have \(\nabla F_1(u) = {\mathbb {E}}\left[ \nabla _u \widehat{J}_1(u, \xi )\right] \) and \(\nabla {\hat{F}}_{1,N}(u) = (1/N)\sum _{i=1}^N \nabla _u \widehat{J}_1(u, \xi ^i)\).

We prove Lemma 2 using Lemma 3.

Lemma 3

If Assumptions 1,2 and 4 hold, then for all \(\xi \in \Xi \), the function \(\widehat{J}_1(\cdot , \xi )\) is continuously differentiable on \(U\), and for all \(u \in B_{V_{\text {ad}}^\rho (u_0)}\), we have

$$\begin{aligned} \widehat{J}_1(u,\xi ) \le \widehat{J}_1(u_0,\xi ) + C_K\zeta (\xi ) \left\Vert u-u_0\right\Vert _{U}, \end{aligned}$$

where \(C_K \in [0,\infty )\) is the operator norm of \(K\).

Proof

Since \(K\) is linear and compact, it is bounded [49, Lemma 8.1-2]. Hence \(C_K\) is finite. For each \(\xi \in \Xi \), \(\widehat{J}_1(\cdot , \xi )\) is continuously differentiable on \(U\) owing to the implicit function theorem and Assumptions 1 and 2. Since \(\psi \ge 0\) and \(\widehat{J}_1 \ge 0\), we have \(u_0 \in V_{\text {ad}}^\rho (u_0)\). Using the mean-value theorem, the convexity of \(B_{V_{\text {ad}}^\rho (u_0)}\), the fact that \(u\), \(u_0 \in B_{V_{\text {ad}}^\rho (u_0)}\), the formula (8), and the estimate (9), we obtain

$$\begin{aligned} \widehat{J}_1(u,\xi )-\widehat{J}_1(u_0,\xi )&\le \sup _{t\in (0,1)}\, \left\Vert \nabla _u\widehat{J}_1(u_0+t(u-u_0),\xi )\right\Vert _{U}\left\Vert u-u_0\right\Vert _{U} \\ {}&\le C_K \zeta (\xi ) \left\Vert u-u_0\right\Vert _{U}. \end{aligned}$$

\(\square \)

Proof

(Proof of Lemma 2) Owing to \(\psi (u_0) \in [0,\infty )\), \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \), and \(\widehat{J}_1 \ge 0\), we have \({\mathbb {E}}\left[ \widehat{J}_1(u_0,\xi )\right] \in [0,\infty )\). Combined with Lemma 3 and \({\mathbb {E}}\left[ \zeta (\xi )\right] < \infty \), we find that \(F_1\) is well-defined on the open set \(B_{V_{\text {ad}}^\rho (u_0)}\). Moreover, \(\widehat{J}_1(\cdot ,\xi )\) is continuously differentiable on \(U\) for all \(\xi \in \Xi \). Combined with Assumption 4 and [25, Lemma C.3], we find that \(F_1\) and \({\hat{F}}_{1,N}\) are Fréchet differentiable on \(B_{V_{\text {ad}}^\rho (u_0)}\) with the asserted derivatives. Using Assumption 4 and the dominated convergence theorem, we obtain the continuity of the Fréchet derivatives on \(B_{V_{\text {ad}}^\rho (u_0)}\). \(\square \)

3.1 Compact Subsets

We define a compact subset of the feasible set \(U_\text {ad}\) that contains the solutions to the risk-neutral problem (1) and eventually those to its SAA problem (2).

Let us define

$$\begin{aligned} W_\text {ad}^\rho = V_{\text {ad}}^\rho (u_0) \cap \overline{\{\textrm{prox}_{\psi /\alpha }({-(1/\alpha ) K[v]}) :\, \, v \in V,\, \left\Vert v\right\Vert _{V} \le {\mathbb {E}}\left[ \zeta (\xi )\right] + \rho \}}^{\left\Vert \cdot \right\Vert _{U}}, \end{aligned}$$
(11)

where \(\overline{U_0}^{\left\Vert \cdot \right\Vert _{U}}\) denotes the \(\left\Vert \cdot \right\Vert _{U}\)-closure of \(U_0 \subset U\).

Lemma 4

If Assumptions 1,2 and 4 hold, then \(W_\text {ad}^\rho \) is a compact subset of \(U_\text {ad}\).

Proof

We first show that the second set on the right-hand side in (11) is compact. Assumption 4 (c) yields \({\mathbb {E}}\left[ \zeta (\xi )\right] < \infty \). Hence the set \(\{v \in V: \, \left\Vert v\right\Vert _{V} \le {\mathbb {E}}\left[ \zeta (\xi )\right] + \rho \}\) is bounded. Thus, its image under the compact operator \(K\) (see Assumption 4 (a)) is precompact. The operator \(\textrm{prox}_{\psi /\alpha }({-(1/\alpha ) \cdot }) : U\rightarrow U\) is continuous, as \(\textrm{prox}_{\psi /\alpha } \) is firmly nonexpansive [10, Proposition 12.28]. Since each continuous function maps precompact sets to precompact ones [49, p. 412], the second set on the right-hand side in (11) is compact. This set is a subset of \(U_\text {ad}\) because \(\textrm{prox}_{\psi /\alpha }({U}) \subset U_\text {ad}\). Since \(V_{\text {ad}}^\rho (u_0)\) is closed, the set \(W_\text {ad}^\rho \) is compact. Owing to \(V_{\text {ad}}^\rho (u_0) \subset U_\text {ad}\), we have \(W_\text {ad}^\rho \subset U_\text {ad}\). \(\square \)

For each \(\omega \in \Omega \), we define

$$\begin{aligned} W_{\text {ad}}^{[N]}(\omega ) = \overline{\{u = \textrm{prox}_{\psi /\alpha } (-(1/\alpha )\nabla _u\hat{F}_{1,N}(u,\omega )): u \in V_{\text {ad}}^\rho (u_0)\}}^{\Vert \cdot \Vert _{U}}. \end{aligned}$$
(12)

Lemma 5

Let Assumptions 1-4 hold. Then the following assertions hold.

  1. 1.

    The set of solutions to (1) is contained in \(W_\text {ad}^\rho \).

  2. 2.

    We have w.p. \(1\) for all sufficiently large N, \(W_\text {ad}^{[N]} \subset W_\text {ad}^\rho \).

Proof

  1. 1.

    Let \(u^*\) be a solution to (1). Since \(\widehat{J}_1 \ge 0\) and \(\psi \ge 0\), we have \(u^* \in V_{\text {ad}}^\rho (u_0)\). Lemma 2 ensures that \(F_1\) is continuously differentiable on \(B_{V_{\text {ad}}^\rho (u_0)}\). Hence \(u^* = \textrm{prox}_{\psi /\alpha }({-(1/\alpha )\nabla F_1(u^*)}) \) (cf. [63, Proposition 3.5] and [53, p. 2092]). Using Assumption 4, in particular the bound in (9), and [35, Proposition 1.2.2], we have

    $$\begin{aligned} \left\Vert {\mathbb {E}}\left[ M(u^*,\xi )\right] \right\Vert _{V} \le {\mathbb {E}}\left[ \left\Vert M(u^*,\xi )\right\Vert _{V}\right] \le {\mathbb {E}}\left[ \zeta (\xi )\right] < \infty . \end{aligned}$$

    Combined with (8) and [35, eq. (1.2)], we find that

    $$\begin{aligned} \nabla F_1(u^*) ={\mathbb {E}}\left[ KM(u^*,\xi )\right] =K{\mathbb {E}}\left[ M(u^*,\xi )\right] . \end{aligned}$$

    Since \(u^* = \textrm{prox}_{\psi /\alpha }({-(1/\alpha )K{\mathbb {E}}\left[ M(u^*,\xi )\right] }) \) and \(\rho > 0\), we have \(u^* \in W_\text {ad}^\rho \) (see (11)).

  2. 2.

    The (strong) law of large numbers ensures \((1/N) \sum _{i=1}^N\zeta (\xi ^i) \rightarrow {\mathbb {E}}\left[ \zeta (\xi )\right] \) w.p. \(1\) as \(N \rightarrow \infty \). Combined with \(\rho > 0\), we deduce the existence of an event \(\Omega _1 \in {\mathcal {F}}\) with \(P(\Omega _1) = 1\) and for each \(\omega \in \Omega _1\), there exists \(n(\omega ) \in {\mathbb {N}}\) such that for all \(N \ge n(\omega )\), we have

    $$\begin{aligned} \frac{1}{N} \sum _{i=1}^N\zeta (\xi ^i(\omega )) \le {\mathbb {E}}\left[ \zeta (\xi )\right] +\rho . \end{aligned}$$

    Fix \(\omega \in \Omega _1\) and let \(N \ge n(\omega )\). Let \(u \in V_{\text {ad}}^\rho (u_0)\) be arbitrary. Using \(V_{\text {ad}}^\rho (u_0) \subset B_{V_{\text {ad}}^\rho (u_0)}\) and Assumption 4, we find that

    $$\begin{aligned} \bigg \Vert {\frac{1}{N}\sum _{i=1}^N M(u,\xi ^i(\omega ))}\bigg \Vert _{V}&\le \frac{1}{N}\sum _{i=1}^N\left\Vert M(u,\xi ^i(\omega ))\right\Vert _{V} \le \frac{1}{N} \sum _{i=1}^N\zeta (\xi ^i(\omega )) \\&\le {\mathbb {E}}\left[ \zeta (\xi )\right] +\rho , \end{aligned}$$

    where the right-hand side is independent of \(u \in V_{\text {ad}}^\rho (u_0)\). Furthermore

    $$\begin{aligned} \nabla {\hat{F}}_{1,N}(u,\omega ) =(1/N) \sum _{i=1}^N \nabla _u \widehat{J}_1(u,\xi ^i(\omega )) =K\bigg ( (1/N) \sum _{i=1}^N M(u,\xi ^i(\omega )) \bigg ). \end{aligned}$$

    We conclude that \(u = \textrm{prox}_{\psi /\alpha }(-(1/\alpha )\nabla \hat{F}_{1,N}(u,\omega )) \in W_{\text {ad}}^\rho \) for each \(u \in V_{\text {ad}}^\rho (u_0)\). Since \(W_\text {ad}^\rho \) is closed (see Lemma 4), we have \(W_\text {ad}^{[N]}(\omega ) \subset W_\text {ad}^\rho \). Hence

    $$\begin{aligned} \Omega _1 \subset \{\omega \in \Omega :\, \exists \, n(\omega ) \in {\mathbb {N}}\quad \forall \, N \ge n(\omega ); \quad W_\text {ad}^{[N]}(\omega ) \subset W_\text {ad}^\rho \}. \end{aligned}$$

    The set on the right-hand side is a subset of \(\Omega \). Since \(\Omega _1 \in {\mathcal {F}}\), \(P(\Omega _1) = 1\) and \((\Omega , {\mathcal {F}}, P)\) is complete, the set on the right-hand side is measurable and hence occurs w.p. \(1\).

\(\square \)

To establish the measurability of the event “for all sufficiently large N, \(W_\text {ad}^{[N]} \subset W_\text {ad}^\rho \),” we used the fact that \((\Omega , {\mathcal {F}}, P)\) is complete. Since this event equals the limit inferior of the sequence \((\{\omega \in \Omega :W_\text {ad}^{[N]}(\omega ) \subset W_\text {ad}^\rho \})_N\), the measurability of the event would also be implied by that of \(\{\omega \in \Omega :W_\text {ad}^{[N]}(\omega ) \subset W_\text {ad}^\rho \}\) for each \(N \in {\mathbb {N}}\) [12, p. 55]. Establishing the measurability of these sets in turn requires the measurability of the set-valued mapping \(W_\text {ad}^{[N]}\). Using [8, Theorem 8.2.8], we can show that \(W_\text {ad}^{[N]}\) is measurable. However, an application of [8, Theorem 8.2.8] requires \((\Omega , {\mathcal {F}}, P)\) to be complete.

3.2 Consistency of SAA Optimal Values and SAA Solutions

We demonstrate the consistency of the SAA optimal value and the SAA solutions. Let \(\vartheta ^*\) and \({\mathscr {S}}\) be the optimal value and the set of solutions to (1), respectively. Moreover, for each \(\omega \in \Omega \), let \({\hat{\vartheta }}_N^*(\omega )\) and \(\hat{{\mathscr {S}}}_N(\omega )\) be the optimal value and the set of solutions to the SAA problem (2), respectively.

We define the distance \(\textrm{dist}({u,{\mathscr {S}}}) \) from \(u \in \hat{{\mathscr {S}}}_N(\omega )\) to \({\mathscr {S}}\) and the deviation \({\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \) between the sets \(\hat{{\mathscr {S}}}_N(\omega )\) and \({\mathscr {S}}\) by

$$\begin{aligned} \textrm{dist}({u,{\mathscr {S}}}) = \inf _{v\in {\mathscr {S}}}\, \left\Vert u-v\right\Vert _{U} \quad \text {and} \quad {\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) = \sup _{u\in \hat{{\mathscr {S}}}_N(\omega )}\, \textrm{dist}({u,{\mathscr {S}}}) . \end{aligned}$$
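For intuition, the following sketch evaluates these quantities for small finite sets in \({\mathbb {R}}^2\) standing in for \(\hat{{\mathscr {S}}}_N(\omega )\) and \({\mathscr {S}}\); the particular sets are illustrative assumptions. Note that the deviation is not symmetric: \({\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \) only measures how far SAA solutions can lie from \({\mathscr {S}}\).

```python
import numpy as np

# dist(u, S) and the deviation D(S_hat, S) from the definitions above,
# evaluated for finite sets of vectors in R^2 (illustrative stand-ins
# for the solution sets in the control space U).
S = [np.array([0.0, 0.0]), np.array([5.0, 0.0])]   # solutions to (1)
S_hat = [np.array([0.1, 0.0])]                     # SAA solutions

def dist(u, sol_set):
    # distance from the point u to the set sol_set
    return min(np.linalg.norm(u - v) for v in sol_set)

def deviation(set_a, set_b):
    # worst-case distance from a point of set_a to the set set_b
    return max(dist(u, set_b) for u in set_a)

print(deviation(S_hat, S))  # 0.1: every SAA solution is near a solution
print(deviation(S, S_hat))  # 4.9: the deviation is not symmetric
```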

Theorem 1

If Assumptions 1-4 hold, then \({\hat{\vartheta }}_N^* \rightarrow \vartheta ^*\) and \({\mathbb {D}}({\hat{{\mathscr {S}}}_N,{\mathscr {S}}}) \rightarrow 0\) w.p. \(1\) as \(N \rightarrow \infty \).

We prepare our proof of Theorem 1, which is based on that of [72, Theorem 5.3].

Lemma 6

If Assumptions 1, 2 and 4 hold, then the function \(\widehat{J}_1\) defined in (4) is a Carathéodory function on \(W_\text {ad}^\rho \times \Xi \). Moreover, \((\widehat{J}_1(u,\xi ))_{u\in W_\text {ad}^\rho }\) is dominated by an integrable function.

Proof

Lemma 4 ensures that \(W_\text {ad}^\rho \) is a compact metric space. Since \(W_\text {ad}^\rho \subset U\) and \(\widehat{J}_1\) is a Carathéodory function on \(U\times \Xi \) (see Lemma 1), the function \(\widehat{J}_1\) is a Carathéodory function on \(W_\text {ad}^\rho \times \Xi \). Lemma 3 ensures that for all \(u \in W_\text {ad}^\rho \subset V_{\text {ad}}^\rho (u_0)\),

$$\begin{aligned} \widehat{J}_1(u,\xi ) \le \widehat{J}_1(u_0,\xi ) + C_K\zeta (\xi ) \sup _{u\in W_\text {ad}^\rho }\, \left\Vert u-u_0\right\Vert _{U}. \end{aligned}$$

The random variable on the right-hand side is integrable owing to the integrability of \(\zeta \) (see Assumption 4 (c)), the boundedness of \(W_\text {ad}^\rho \) (see Lemma 4), \(C_K \in [0,\infty )\), and \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \). Combined with \(\widehat{J}_1 \ge 0\), we find that \((\widehat{J}_1(u,\xi ))_{u \in W_\text {ad}^\rho }\) is dominated by an integrable random variable. \(\square \)

Lemma 7

If Assumptions 1-4 hold, then for each \(N \in {\mathbb {N}}\), the functions \({\hat{\vartheta }}_N^*\) and \({\mathbb {D}}({\hat{{\mathscr {S}}}_N,{\mathscr {S}}}) \) are measurable.

Proof

For each \(\omega \in \Omega \), Assumption 3 ensures that \(\hat{{\mathscr {S}}}_N(\omega )\) is nonempty. The function \(\widehat{J}_1\) is a Carathéodory function on \(U\times \Xi \) according to Lemma 1 and \(\psi \) is lower semicontinuous according to Assumption 1 (c). Hence \({\hat{\vartheta }}_N^*\) is measurable [15, Lemma III.39] and the set-valued mapping \(\hat{{\mathscr {S}}}_N\) is measurable [15, p. 86]. Assumption 3 implies that \({\mathscr {S}}\) is nonempty and, hence, \(\textrm{dist}({\cdot ,{\mathscr {S}}}) \) is (Lipschitz) continuous [4, Theorem 3.16]. For each \(\omega \in \Omega \), \({\hat{F}}_N(\cdot , \omega )\) is lower semicontinuous and hence \(\hat{{\mathscr {S}}}_N(\omega )\) is closed. Thus \({\mathbb {D}}({\hat{{\mathscr {S}}}_N,{\mathscr {S}}}) \) is measurable [8, Theorem 8.2.11]. \(\square \)

Lemma 8

If Assumptions 1-4 hold, then \({\hat{F}}_{N}\) converges to \(F\) w.p. \(1\) uniformly on \(W_\text {ad}^\rho \).

Proof

We first verify the hypotheses of the uniform law of large numbers established in [52, Corollary 4:1] to demonstrate the uniform almost sure convergence of \({\hat{F}}_{1,N}\) to \(F_1\) on \(W_\text {ad}^\rho \).

Lemma 6 ensures that \(\widehat{J}_1\) is a Carathéodory function on \(W_\text {ad}^\rho \times \Xi \) and that \((\widehat{J}_1(u,\xi ))_{u\in W_\text {ad}^\rho }\) is dominated by an integrable function. Moreover, \(W_\text {ad}^\rho \) is a compact metric space (see Lemma 4). Since \(\xi ^1, \xi ^2, \ldots \) are independent identically distributed random elements, the uniform law of large numbers [52, Corollary 4:1] implies that \({\hat{F}}_{1,N}(\cdot ) = (1/N) \sum _{i=1}^N \widehat{J}_1(\cdot ,\xi ^i)\) converges to \(F_1(\cdot ) = {\mathbb {E}}\left[ \widehat{J}_1(\cdot ,\xi )\right] \) w.p. \(1\) uniformly on \(W_\text {ad}^\rho \).

Since \(U_\text {ad}\) is the domain of \(\psi \) and \(\psi \ge 0\), we have \( \psi (u) \in [0,\infty )\) for all \(u \in U_\text {ad}\). Lemma 4 ensures \(W_\text {ad}^\rho \subset U_\text {ad}\). Hence for all \(u \in W_\text {ad}^\rho \),

$$\begin{aligned} {\hat{F}}_N(u) - F(u)&= {\hat{F}}_{1,N}(u) +(\alpha /2)\left\Vert u\right\Vert _{U}^2 + \psi (u) - \big (F_1(u) + (\alpha /2)\left\Vert u\right\Vert _{U}^2 + \psi (u) \big ) \\&= {\hat{F}}_{1,N}(u) - F_1(u). \end{aligned}$$

Therefore, the assertion follows from the above uniform convergence statement. \(\square \)
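The uniform convergence asserted in Lemma 8 can be illustrated numerically on a toy problem. The following sketch is not the PDE setting of this paper; the quadratic integrand, the normal distribution of \(\xi \), and the grid on the compact interval are illustrative assumptions. It approximates \(\sup _{u \in W} |{\hat{F}}_{1,N}(u) - F_1(u)|\) for a Carathéodory integrand with a closed-form expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy integrand J1(u, xi) = (u - xi)^2 / 2: continuous in u, measurable
# in xi; the compact interval W = [0, 1] plays the role of W_ad^rho.
W = np.linspace(0.0, 1.0, 201)
mu, sigma = 0.3, 1.0                       # law of xi (toy choice)
xi = rng.normal(mu, sigma, size=100_000)

# Closed-form expectation: F1(u) = E[(u - xi)^2]/2 = ((u - mu)^2 + sigma^2)/2.
F1 = 0.5 * ((W - mu) ** 2 + sigma ** 2)

def sup_gap(N):
    """sup_{u in W} |F_{1,N}(u) - F1(u)| using the first N samples."""
    m1, m2 = xi[:N].mean(), (xi[:N] ** 2).mean()
    F1N = 0.5 * (W ** 2 - 2.0 * W * m1 + m2)   # sample average, expanded
    return np.abs(F1N - F1).max()

gaps = [sup_gap(N) for N in (100, 1_000, 100_000)]
print(gaps)  # the uniform gap typically shrinks like O(N^{-1/2})
```

The \(O(N^{-1/2})\) decay is typical but not part of Lemma 8, which only asserts almost sure uniform convergence on \(W_\text {ad}^\rho \).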

Lemma 9 demonstrates that the SAA solution set is eventually contained in the compact set \(W_\text {ad}^\rho \).

Lemma 9

If Assumptions 1-4 hold, then w.p. \(1\) for all sufficiently large N, \(\hat{{\mathscr {S}}}_N \subset W_\text {ad}^\rho \).

Proof

First, we show that w.p. \(1\) for all sufficiently large N, \(\hat{{\mathscr {S}}}_N \subset V_{\text {ad}}^\rho (u_0)\). Lemma 6 ensures that \(\widehat{J}_1\) is a Carathéodory function on \(U\times \Xi \). Since \(\widehat{J}\ge 0\), \(u_0 \in U_\text {ad}\), and \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \), the (strong) law of large numbers ensures

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^N \widehat{J}(u_0,\xi ^i) \rightarrow {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] \quad \text {w.p.~1} \quad \text {as}\quad N \rightarrow \infty . \end{aligned}$$

Combined with \(\rho > 0\), we deduce the existence of an event \(\Omega _1 \in {\mathcal {F}}\) such that \(P(\Omega _1) = 1\) and for each \(\omega \in \Omega _1\), there exists \(n_1(\omega ) \in {\mathbb {N}}\) such that for all \(N \ge n_1(\omega )\), we have

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^N \widehat{J}(u_0,\xi ^i(\omega )) \le {\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] + \rho . \end{aligned}$$
(13)

Fix \(\omega \in \Omega _1\) and let \(N \ge n_1(\omega )\). Using \(\psi \ge 0\) and \(\widehat{J}_1 \ge 0\), we have for all \(u_N^* = u_N^*(\omega ) \in \hat{{\mathscr {S}}}_N(\omega )\),

$$\begin{aligned} (\alpha /2) \left\Vert u_N^*\right\Vert _{U}^2&\le \frac{1}{N}\sum _{i=1}^N \widehat{J}_1(u_N^*,\xi ^i(\omega )) + \psi (u_N^*) + (\alpha /2) \left\Vert u_N^*\right\Vert _{U}^2 \\&\le \frac{1}{N}\sum _{i=1}^N \widehat{J}(u_0,\xi ^i(\omega )). \end{aligned}$$

Combined with (13), we find that \(\hat{{\mathscr {S}}}_N(\omega ) \subset V_{\text {ad}}^\rho (u_0)\).

By construction of \(W_\text {ad}^{[N]}\), we have \(\hat{{\mathscr {S}}}_N(\omega ) \cap V_{\text {ad}}^\rho (u_0) \subset W_\text {ad}^{[N]}(\omega )\) for all \(\omega \in \Omega \). Indeed, if \(u_N^*(\omega ) \in \hat{{\mathscr {S}}}_N(\omega ) \cap V_{\text {ad}}^\rho (u_0)\), then we have the first-order optimality condition \(u_N^*(\omega ) = \textrm{prox}_{\psi /\alpha }({-(1/\alpha ) \nabla {\hat{F}}_{1,N}(u_N^*(\omega ),\omega )}) \). Hence \(u_N^*(\omega ) \in W_\text {ad}^{[N]}(\omega )\). Lemma 5 implies that w.p. \(1\) for all sufficiently large N, \(W_\text {ad}^{[N]} \subset W_\text {ad}^\rho \). Hence there exists \(\Omega _2 \in {\mathcal {F}}\) with \(P(\Omega _2) = 1\) and for each \(\omega \in \Omega _2\) there exists \(n_2(\omega ) \in {\mathbb {N}}\) such that for all \(N \ge n_2(\omega )\), \(W_\text {ad}^{[N]}(\omega ) \subset W_\text {ad}^\rho \). Putting together the pieces, we find that for all \(\omega \in \Omega _1 \cap \Omega _2\) and each \(N \ge \max \{n_1(\omega ), n_2(\omega )\}\), we have \(\hat{{\mathscr {S}}}_N(\omega ) \subset W_\text {ad}^\rho \). Since \((\Omega ,{\mathcal {F}}, P)\) is complete and \(P(\Omega _1 \cap \Omega _2) = 1\), we have w.p. \(1\) for all sufficiently large N, \(\hat{{\mathscr {S}}}_N \subset W_\text {ad}^\rho \). \(\square \)

Proof

(Proof of Theorem 1) The proof is based on that of [72, Theorem 5.3]. Lemma 5 yields \({\mathscr {S}} \subset W_\text {ad}^\rho \). Lemma 9 ensures that w.p. \(1\) for all sufficiently large N, \(\hat{{\mathscr {S}}}_N \subset W_\text {ad}^\rho \). Hence, we deduce the existence of an event \(\Omega _1 \in {\mathcal {F}}\) with \(P(\Omega _1) = 1\) and for each \(\omega \in \Omega _1\) there exists \(n(\omega ) \in {\mathbb {N}}\) such that for all \(N \ge n(\omega )\), \(\hat{{\mathscr {S}}}_N(\omega ) \subset W_\text {ad}^\rho \). Lemma 8 ensures that \({\hat{F}}_{N}(\cdot ,\omega )\) converges to \(F(\cdot )\) uniformly on \(W_\text {ad}^\rho \) for almost all \(\omega \in \Omega \). Therefore, there exists \(\Omega _2 \in {\mathcal {F}}\) with \(P(\Omega _2) = 1\) and for each \(\omega \in \Omega _2\), \({\hat{F}}_{N}(\cdot ,\omega )\) converges to \(F(\cdot )\) uniformly on \(W_\text {ad}^\rho \).

We show that \({\hat{\vartheta }}_N^*(\omega ) \rightarrow \vartheta ^*\) as \(N \rightarrow \infty \) for each \(\omega \in \Omega _1 \cap \Omega _2\). Fix \(\omega \in \Omega _1 \cap \Omega _2\). Assumption 3 ensures that \({\mathscr {S}}\) and \(\hat{{\mathscr {S}}}_N(\omega )\) are nonempty for all \(N \in {\mathbb {N}}\). Let \(u^* \in {\mathscr {S}}\) and let \(u_N^*(\omega ) \in \hat{{\mathscr {S}}}_N(\omega )\). Then for all \(N \ge n(\omega )\), we have \(u_N^*(\omega ) \in W_\text {ad}^\rho \) and hence \(|{\hat{\vartheta }}_N^*(\omega )-\vartheta ^*| \le \sup _{u \in W_\text {ad}^\rho }\, |{\hat{F}}_{N}(u,\omega )-F(u)| \) for all \(N \ge n(\omega )\) (cf. [37, pp. 194–195]). We deduce \({\hat{\vartheta }}_N^*(\omega ) \rightarrow \vartheta ^*\) as \(N \rightarrow \infty \).

Next, we show that \({\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \rightarrow 0\) as \(N \rightarrow \infty \) for each \(\omega \in \Omega _1 \cap \Omega _2\). Fix \(\omega \in \Omega _1 \cap \Omega _2\). Since \({\mathscr {S}}\) is nonempty (see Assumption 3), the function \(\textrm{dist}({\cdot ,{\mathscr {S}}}) \) is (Lipschitz) continuous [4, Theorem 3.16]. For each \(N \ge n(\omega )\), the set \(\hat{{\mathscr {S}}}_N(\omega )\) is closed and \(\hat{{\mathscr {S}}}_N(\omega ) \subset W_\text {ad}^\rho \). Hence \(\hat{{\mathscr {S}}}_N(\omega )\) is compact for each \(N \ge n(\omega )\). Therefore, for each \(N \ge n(\omega )\), there exists \(u_N = u_N(\omega ) \in W_\text {ad}^\rho \) with \(\textrm{dist}({u_N,{\mathscr {S}}}) = {\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \). Suppose that \({\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \not \rightarrow 0\). Using the compactness of \(W_\text {ad}^\rho \), we deduce the existence of a subsequence \({\mathcal {N}} = {\mathcal {N}}(\omega )\) of \((n(\omega ),n(\omega )+1, \ldots )\) such that \(\textrm{dist}({u_N,{\mathscr {S}}}) \ge \varepsilon \) for all \(N \in {\mathcal {N}}\) and some \(\varepsilon > 0\), and \(u_N \rightarrow {\bar{u}} \in W_\text {ad}^\rho \) as \({\mathcal {N}} \ni N \rightarrow \infty \). Combined with the fact that \(\textrm{dist}({\cdot ,{\mathscr {S}}}) \) is continuous, we obtain \({\bar{u}} \not \in {\mathscr {S}}\). Hence \(F({\bar{u}}) > \vartheta ^*\). We have

$$\begin{aligned} \liminf _{{\mathcal {N}}\ni N \rightarrow \infty } \,{\hat{F}}_{N}(u_N,\omega ) = \lim _{{\mathcal {N}}\ni N \rightarrow \infty } \big ({\hat{F}}_{N}(u_N,\omega )-F(u_N)\big ) + \liminf _{{\mathcal {N}}\ni N \rightarrow \infty }\, F(u_N). \end{aligned}$$

The uniform convergence implies that the first term on the right-hand side is zero. Since \(F\) is lower semicontinuous on \(U_\text {ad}\) (see Assumption 1 and Lemma 2), \(F({\bar{u}}) > \vartheta ^*\), and \({\hat{F}}_{N}(u_N,\omega ) = {\hat{\vartheta }}_N^*(\omega )\), we find that

$$\begin{aligned} \liminf _{{\mathcal {N}}\ni N \rightarrow \infty } {\hat{\vartheta }}_N^*(\omega ) = \liminf _{{\mathcal {N}}\ni N \rightarrow \infty } \,{\hat{F}}_{N}(u_N,\omega ) = \liminf _{{\mathcal {N}}\ni N \rightarrow \infty }\, F(u_N) \ge F({\bar{u}}) > \vartheta ^*. \end{aligned}$$

This contradicts \({\hat{\vartheta }}_N^*(\omega ) \rightarrow \vartheta ^*\) as \(N \rightarrow \infty \). Hence \({\mathbb {D}}({\hat{{\mathscr {S}}}_N(\omega ),{\mathscr {S}}}) \rightarrow 0\) as \(N \rightarrow \infty \). Combined with Lemma 7 and the fact that \(P(\Omega _1 \cap \Omega _2) = 1\), we obtain the almost sure convergence statements. \(\square \)
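The conclusions of Theorem 1 can be sanity-checked on a one-dimensional toy problem with a closed-form solution. The quadratic integrand, the box constraint, and the parameters below are illustrative assumptions, not the paper's setting; here the SAA minimizer is the projection of a sample mean, so the consistency of SAA optimal values and SAA solutions can be observed directly:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, mu, sigma = 0.1, 0.5, 1.0           # toy regularization and law of xi
xi = rng.normal(mu, sigma, size=200_000)

# Toy risk-neutral problem: min_{u in [0,1]} E[(u - xi)^2]/2 + (alpha/2) u^2.
# The minimizer is the projection of E[xi]/(1 + alpha) onto U_ad = [0, 1].
u_star = np.clip(mu / (1.0 + alpha), 0.0, 1.0)
theta_star = 0.5 * ((u_star - mu) ** 2 + sigma ** 2) + 0.5 * alpha * u_star ** 2

def saa(N):
    """SAA minimizer and SAA optimal value based on the first N samples."""
    u_N = np.clip(xi[:N].mean() / (1.0 + alpha), 0.0, 1.0)
    theta_N = 0.5 * np.mean((u_N - xi[:N]) ** 2) + 0.5 * alpha * u_N ** 2
    return u_N, theta_N

errors = [(abs(u_N - u_star), abs(theta_N - theta_star))
          for u_N, theta_N in (saa(N) for N in (100, 10_000, 200_000))]
print(errors)  # both error sequences typically tend to zero
```

Here the SAA problem has a closed-form solution; in the PDE setting of this paper, each evaluation of the SAA objective instead requires \(N\) state equation solves.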

4 Examples

We present three risk-neutral nonlinear PDE-constrained optimization problems and verify the assumptions made in Sect. 3; to keep the section relatively short, we do not verify Assumption 3 on the existence of solutions.

We use the following facts. (i) The Sobolev spaces \(H_0^1(D)\) and \(H^1(D)\) are separable Hilbert spaces [1, Theorem 3.5]. (ii) If a real Banach space is reflexive and separable, then its dual is separable [1, Theorem 1.14]. (iii) The operator norm of a linear, bounded operator equals that of its (Banach space-)adjoint operator [49, Theorem 4.5-2]. (iv) If \(\varLambda _1\) and \(\varLambda _2\) are real, reflexive Banach spaces and \(\Upsilon :\varLambda _1 \rightarrow \varLambda _2\) is linear and bounded, then \((\Upsilon ^*)^* = \Upsilon \) [6, p. 390] (see also [65, Theorem 8.57]) because we identify \(\varLambda _i\) with its bidual \((\varLambda _i^*)^*\) for \(i \in \{1,2\}\).

4.1 Boundary Optimal Control of a Semilinear State Equation

We consider the risk-neutral boundary optimal control of a parameterized semilinear PDE. Our model problem is based on the deterministic semilinear boundary control problems studied in [14, 32, 36, 76].

We consider

$$\begin{aligned} \min _{u\in L^2(\partial D)}\, (1/2){\mathbb {E}}\left[ \left\Vert S(u,\xi )-y_d\right\Vert _{L^2(D)}^2\right] + (\alpha /2)\left\Vert u\right\Vert _{L^2(\partial D)}^2 + \psi (u), \end{aligned}$$
(14)

where \(\partial D\) is the boundary of \(D\subset {\mathbb {R}}^{2}\) and for each \((u,\xi ) \in L^2(\partial D) \times \Xi \), the state \(S(u,\xi ) \in H^1(D)\) is the weak solution to: find \(y \in H^1(D)\) with

$$\begin{aligned} -\nabla \cdot (\kappa (\xi )\nabla y) + g(\xi ) y + y^3 = b(\xi ) \;\; \text {in} \;\; D, \quad \kappa (\xi ) \partial _\nu y + \sigma (\xi )y = Bu \;\; \text {on} \;\; \partial D, \end{aligned}$$
(15)

where \(\partial _\nu y\) is the normal derivative of y; see [76, p. 31]. For a bounded Lipschitz domain \(D\subset {\mathbb {R}}^d\), we denote by \(L^2(\partial D)\) the space of square integrable functions on \(\partial D\) and by \(L^\infty (\partial D)\) that of essentially bounded functions [6, p. 263]. The space \(L^2(\partial D)\) is a Hilbert space with inner product \(( v, w )_{L^2(\partial D)} = \int _{\partial D} v(x)w(x) \textrm{d}\textrm{H}^{d-1}(x)\), where \(\textrm{H}^{d-1}\) is the \((d-1)\)-dimensional Hausdorff measure on \(\partial D\) [6, Theorem 3.16 and pp. 47, 263 and 267]. The space \(L^2(\partial D)\) is separable [61, Theorem 4.1].

We formulate assumptions on the control problem (14).

  • \(D\subset {\mathbb {R}}^2\) is a bounded Lipschitz domain.

  • \(\kappa \), \(g : \Xi \rightarrow L^\infty (D)\) are strongly measurable and there exist \( \kappa _{\min }\), \( \kappa _{\max }\), \(g_{\min }\), \(g_{\max } \in (0,\infty )\) such that \( \kappa _{\min } \le \kappa (\xi ) \le \kappa _{\max }\) and \(g_{\min } \le g(\xi ) \le g_{\max }\) for all \(\xi \in \Xi \).

  • \(b:\Xi \rightarrow L^2(D)\) and \(\sigma : \Xi \rightarrow L^\infty (\partial D)\) are strongly measurable with \({\mathbb {E}}\left[ \left\Vert b(\xi )\right\Vert _{L^2(D)}^2\right] < \infty \), \({\mathbb {E}}\left[ \left\Vert \sigma (\xi )\right\Vert _{L^\infty (\partial D)}^2\right] < \infty \) and \(\sigma (\xi ) \ge 0\) for all \(\xi \in \Xi \).

  • \(B : L^2(\partial D) \rightarrow L^2(\partial D)\) is a linear, bounded operator.

  • \(y_d \in L^2(D)\), \(\alpha > 0\), and \(\psi : L^2(\partial D) \rightarrow [0,\infty ]\) is proper, convex, and lower semicontinuous.

Throughout the section, we assume these conditions to be satisfied.

We establish Assumption 1. Since the embedding \(H^1(D) \hookrightarrow L^2(D)\) is continuous, the function \(J_1: H^1(D) \rightarrow [0,\infty )\) defined by \(J_1(y) = (1/2)\left\Vert y-y_d\right\Vert _{L^2(D)}^2\) is continuously differentiable. We find that Assumption 1 holds true.

We formulate the weak form of (15) as an operator equation; cf. [36, eq. (2)]. We define \(E: H^1(D) \times L^2(\partial D) \times \Xi \rightarrow H^1(D)^*\) by

$$\begin{aligned} \begin{aligned} \langle E(y,u,\xi ), v \rangle _{{H^1(D)}^*\!, H^1(D)}&= ( \kappa (\xi )\nabla y, \nabla v )_{L^2(D)^{2}} +( g(\xi )y+y^3, v )_{L^2(D)} \\&\quad + ( \sigma (\xi )\tau _{\partial D}[y], \tau _{\partial D}[v] )_{L^2(\partial D)} \\&\quad -( b(\xi ), v )_{L^2(D)} -( Bu, \tau _{\partial D}[v] )_{L^2(\partial D)}, \end{aligned} \end{aligned}$$
(16)

where \(\tau _{\partial D} : H^1(D) \rightarrow L^2(\partial D)\) is the trace operator. We refer the reader to [6, p. 268] for the definition of \(\tau _{\partial D}\). Since \(D\) has a Lipschitz boundary, the trace operator \(\tau _{\partial D}\) is linear and compact [61, Theorem 6.2].

We verify Assumption 2. Using [32, Theorem 1.15], we find that \(E(y,u,\xi ) = 0\) has a unique solution \(S(u,\xi ) \in H^1(D)\) for each \((u,\xi ) \in L^2(\partial D) \times \Xi \). Since the embedding \(H^1(D) \hookrightarrow L^6(D)\) is continuous [32, Theorem 1.14], we have \(y^3 \in L^2(D)\) for each \(y \in H^1(D)\) [32, p. 57] and the mapping \(L^6(D) \ni y \mapsto y^3 \in L^2(D)\) is continuously differentiable [32, p. 76]. We find that \(E(\cdot , \cdot , \xi )\) is continuously differentiable. Now, the Lax–Milgram lemma can be used to show that \(E_y(S(u,\xi ),u,\xi )\) has a bounded inverse. We show that \(E(y,u,\cdot )\) is measurable for each \((y,u) \in H^1(D) \times L^2(\partial D)\). Since \(H^1(D)^*\) is separable, it suffices to show that \(\xi \mapsto \langle E(y,u,\xi ), v \rangle _{{H^1(D)}^*\!, H^1(D)}\) is measurable for each fixed \((y,v,u) \in H^1(D)^2 \times L^2(\partial D)\) [35, Theorem 1.1.6]. We define \(\phi : L^\infty (D) \rightarrow {\mathbb {R}}\) by \(\phi (\nu ) = ( \nu \nabla y, \nabla v )_{L^2(D)^{2}}\). Hölder’s inequality ensures that \(\phi \) is (Lipschitz) continuous. Since \(\xi \mapsto ( \kappa (\xi )\nabla y, \nabla v )_{L^2(D)^{2}}\) is the composition of the continuous function \(\phi \) with \(\kappa \), it is measurable [35, Corollary 1.1.11]. Similar arguments can be used to establish the measurability of the other terms in (16). Hence Assumption 2 holds true.

We establish Assumption 4. Fix \((u,\xi ) \in L^2(\partial D) \times \Xi \). Choosing \(v = S(u,\xi )\) in (16) and using

$$\begin{aligned} ( \kappa (\xi )\nabla y, \nabla y )_{L^2(D)^{2}} +( g(\xi )y, y )_{L^2(D)} \ge \min \{\kappa _{\min },g_{\min }\} \left\Vert y\right\Vert _{H^1(D)}^2 \end{aligned}$$

valid for all \(y \in H^1(D)\), we obtain the stability estimate

$$\begin{aligned} \min \{\kappa _{\min },g_{\min }\} \left\Vert S(u,\xi )\right\Vert _{H^1(D)} \le \left\Vert b(\xi )\right\Vert _{L^2(D)} + C_{\tau _{\partial D}} \left\Vert Bu\right\Vert _{L^2(\partial D)}, \end{aligned}$$
(17)

where \(C_{\tau _{\partial D}}\) is the operator norm of \(\tau _{\partial D}\). For each \((u,\xi ) \in L^2(\partial D) \times \Xi \), let \(z(u,\xi )\) be the unique solution to the adjoint equation: find \(z \in H^1(D)\) with

$$\begin{aligned} ( \kappa (\xi )\nabla z, \nabla v )_{L^2(D)^{2}}&+( g(\xi )z+3S(u,\xi )^2z, v )_{L^2(D)} + ( \sigma (\xi )\tau _{\partial D}[z], \tau _{\partial D}[v] )_{L^2(\partial D)} \\&= -( S(u,\xi )-y_d, v )_{L^2(D)} \quad \text {for all} \quad v \in H^1(D); \end{aligned}$$

cf. [76, eq. (4.54)] and [36, p. 729]. For its solution \(z(u,\xi )\), we obtain

$$\begin{aligned} \min \{\kappa _{\min },g_{\min }\} \left\Vert z(u,\xi )\right\Vert _{H^1(D)} \le \left\Vert S(u,\xi )-y_d\right\Vert _{L^2(D)}. \end{aligned}$$
(18)

Since \(\tau _{\partial D}^*\) is the adjoint operator of \(\tau _{\partial D}\), we have for all \(u \in L^2(\partial D)\) and \(v \in H^1(D)\), \(( Bu, \tau _{\partial D}[v] )_{L^2(\partial D)} = \langle \tau _{\partial D}^*Bu, v \rangle _{{H^1(D)}^*\!, H^1(D)} \). Combined with \(\tau _{\partial D} = (\tau _{\partial D}^*)^*\) and the identity \(E_u(S(u,\xi ),u,\xi ) = -\tau _{\partial D}^*B\) (cf. [32, p. 136]), the gradient formula in (6) yields

$$\begin{aligned} \nabla \widehat{J}_1(u,\xi ) = -B^*\tau _{\partial D}[z(u,\xi )]. \end{aligned}$$

We choose \(K= -B^*\tau _{\partial D}\) and \(M(u,\xi ) = z(u,\xi )\). The operator \(K: H^1(D) \rightarrow L^2(\partial D)\) is compact, as B is linear and bounded and \(\tau _{\partial D}\) is linear and compact [61, Theorem 6.2]. Using [8, Theorem 8.2.9] and the measurability of \(S(u,\cdot )\) (see Lemma 1), we can show that \(z(u,\cdot )\) is measurable for all \(u \in L^2(\partial D)\). The implicit function theorem implies that \(z(\cdot , \xi )\) is continuous for each \(\xi \in \Xi \). Since \(\psi \) is proper, there exists \(u_0 \in L^2(\partial D)\) with \(\psi (u_0) < \infty \). Using Young’s inequality, we have \( \widehat{J}_1(u_0,\xi ) \le \left\Vert y_d\right\Vert _{L^2(D)}^2+ \left\Vert S(u_0,\xi )\right\Vert _{L^2(D)}^2 \). Combined with (17), we find that \({\mathbb {E}}\left[ \widehat{J}_1(u_0,\xi )\right] < \infty \) and hence \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \). Let \(B_{V_{\text {ad}}^\rho (u_0)}\) be an open, bounded ball about zero containing \(V_{\text {ad}}^\rho (u_0)\) and let \(R_{\text {ad}}\) be its radius. We define

$$\begin{aligned} \zeta (\xi ) = \tfrac{1}{\min \{\kappa _{\min },g_{\min }\}} \Big ( \left\Vert y_d\right\Vert _{L^2(D)} + \tfrac{C_{\tau _{\partial D}} C_{B}R_{\text {ad}}+ \left\Vert b(\xi )\right\Vert _{L^2(D)}}{\min \{\kappa _{\min },g_{\min }\}} \Big ), \end{aligned}$$

where \(C_B > 0\) is the operator norm of B. The random variable \(\zeta \) is integrable. Using the stability estimates (17) and (18), we conclude that Assumption 4 holds true.
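The adjoint recipe used above — solve the state equation, solve the adjoint equation, assemble the gradient from the adjoint state — can be sketched for a finite-dimensional analogue. All matrices and data below are illustrative assumptions; the componentwise cubic mimics the semilinearity \(y^3\), and the adjoint-based gradient is verified against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3                                   # toy state/control dimensions
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD "diffusion" matrix
B = 0.3 * rng.standard_normal((n, m))
b = 0.3 * rng.standard_normal(n)
y_d = np.ones(n)

def solve_state(u):
    """Damped Newton's method for E(y, u) = A y + y**3 - B u - b = 0."""
    y = np.zeros(n)
    for _ in range(100):
        r = A @ y + y ** 3 - B @ u - b
        if np.linalg.norm(r) < 1e-13:
            break
        d = np.linalg.solve(A + np.diag(3.0 * y ** 2), r)  # E_y = A + 3 diag(y^2)
        t = 1.0
        while (np.linalg.norm(A @ (y - t * d) + (y - t * d) ** 3 - B @ u - b)
               >= np.linalg.norm(r) and t > 1e-8):
            t *= 0.5                          # simple backtracking safeguard
        y -= t * d
    return y

def J1(u):
    y = solve_state(u)
    return 0.5 * np.sum((y - y_d) ** 2)

def gradient(u):
    """Adjoint-based gradient: solve E_y^T z = -(y - y_d), then grad = E_u^T z."""
    y = solve_state(u)
    Ey = A + np.diag(3.0 * y ** 2)
    z = np.linalg.solve(Ey.T, -(y - y_d))     # discrete adjoint equation
    return -B.T @ z                           # here E_u = -B

u0 = rng.standard_normal(m)
g = gradient(u0)
h = 1e-6
g_fd = np.array([(J1(u0 + h * e) - J1(u0 - h * e)) / (2.0 * h) for e in np.eye(m)])
print(np.max(np.abs(g - g_fd)))  # adjoint and finite-difference gradients agree
```

The same three-step pattern (state solve, adjoint solve, gradient assembly) underlies the gradient formula (6) and the expressions for \(\nabla \widehat{J}_1\) throughout this section.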

4.2 Distributed Control of a Steady Burgers’ Equation

We consider the risk-neutral distributed optimization of a steady Burgers’ equation. Deterministic optimal control problems with the Burgers’ equation are studied, for example, in [19, 78, 79, 80]. We refer the reader to [40, 43, 46, 48] for risk-neutral and risk-averse control of the steady Burgers’ equation.

Let us consider

$$\begin{aligned} \min _{u\in U_\text {ad}}\, (1/2){\mathbb {E}}\left[ \left\Vert S(u,\xi )-y_d\right\Vert _{L^2(0,1)}^2\right] + (\alpha /2)\left\Vert u\right\Vert _{L^2(D_0)}^2, \end{aligned}$$
(19)

where \(D_0 \subset (0,1)\) is a nonempty domain and for all \((u,\xi ) \in L^2(D_0) \times \Xi \), the state \(S(u,\xi ) \in H_0^1(0,1)\) is the weak solution to the steady Burgers’ equation: find \(y \in H_0^1(0,1)\) with

$$\begin{aligned} -\kappa (\xi )y'' + y y' = b(\xi ) + Bu\quad \text {in} \quad (0,1), \quad y(0) = 0, \;\; y(1) = 0, \end{aligned}$$

where \(b: \Xi \rightarrow L^2(0,1)\) and \(\kappa : \Xi \rightarrow (0,\infty )\). As in [78, p. 78], \(B : L^2(D_0) \rightarrow L^2(0,1)\) is defined by \((Bu)(x) = u(x)\) if \(x \in D_0\) and \((Bu)(x) = 0\) otherwise. We consider homogeneous Dirichlet boundary conditions, as they simplify the derivation of a state stability estimate.

The weak form of the steady Burgers’ equation has at least one solution \(S(u,\xi ) \in H_0^1(0,1)\) for each \((u,\xi ) \in L^2(D_0) \times \Xi \) [79, Proposition 3.1]. We assume that the solution \(S(u,\xi )\) is unique to ensure that the reduced formulation (19) is well-defined. A sufficient condition for uniqueness is that \(\kappa (\xi )\) be sufficiently large [79, Proposition 3.1]. We formulate the uniqueness as an assumption.

  • \(\kappa :\Xi \rightarrow {\mathbb {R}}\) is measurable and there exist \(\kappa _{\min }\), \(\kappa _{\max } \in (0,\infty )\) such that \(\kappa _{\min }\le \kappa (\xi ) \le \kappa _{\max }\) for all \(\xi \in \Xi \).

  • \(b: \Xi \rightarrow L^2(0,1)\) is strongly measurable and there exists \(b_{\max } \in (0,\infty )\) such that \(\left\Vert b(\xi )\right\Vert _{L^2(0,1)} \le b_{\max }\) for all \(\xi \in \Xi \).

  • For each \((u,\xi ) \in L^2(D_0) \times \Xi \), the solution \(S(u,\xi ) \in H_0^1(0,1)\) to the weak form of the steady Burgers’ equation is unique.

  • \(y_d \in L^2(0,1)\), \(U_\text {ad}\subset L^2(D_0)\) is nonempty, closed, and convex, and \(\alpha > 0\).

Throughout the section, we assume these conditions to be satisfied.

Let us verify Assumption 1. The constraints in (19) can be modeled using the indicator function \(\psi = I_{U_\text {ad}}\). Since \(U_\text {ad}\) is nonempty, closed, and convex, the function \(I_{U_\text {ad}}\) is proper, convex, and lower semicontinuous [13, Ex. 2.67]. The function \(J_1 : H_0^1(0,1) \rightarrow [0,\infty )\) defined by \(J_1(y) = (1/2)\left\Vert y-y_d\right\Vert _{L^2(0,1)}^2\) is continuously differentiable. Putting together the pieces, we find that Assumption 1 holds true.

We define \(E : H_0^1(0,1) \times L^2(D_0) \times \Xi \rightarrow H^{-1}(0,1)\) by

$$\begin{aligned} \begin{aligned} \langle E(y, u, \xi ), v \rangle _{H^{-1}(0,1), H_0^1(0,1)}&= ( \kappa (\xi )y', v' )_{L^2(0,1)} + ( yy', v )_{L^2(0,1)}\\&\quad - ( b(\xi ), v )_{L^2(0,1)} - ( Bu, v )_{L^2(0,1)}. \end{aligned} \end{aligned}$$

Let \(\iota : H_0^1(0,1) \rightarrow L^2(0,1)\) be the embedding operator of the compact embedding \(H_0^1(0,1) \hookrightarrow L^2(0,1)\). We have \(\langle \iota ^*[Bu], v \rangle _{H^{-1}(0,1), H_0^1(0,1)} = ( Bu, v )_{L^2(0,1)}\) for all \(v \in H_0^1(0,1)\) and \(u \in L^2(D_0)\).

We show that Assumption 2 holds true. The operator E is well-defined [78, pp. 76 and 80] and \(E(\cdot ,\cdot ,\xi )\) is twice continuously differentiable for each \(\xi \in \Xi \) [78, p. 81]. For each \((u,\xi ) \in L^2(D_0) \times \Xi \), \(E_y(S(u,\xi ),u,\xi )\) has a bounded inverse [46, p. A1866]. Using arguments similar to those in Sect. 4.1, we can show that \(E(y,u,\cdot )\) is measurable for each \((y,u) \in H_0^1(0,1) \times L^2(D_0)\). We conclude that Assumption 2 holds true.

Using the gradient formula (6), \((\iota ^*)^* = \iota \), and \(E_u(S(u,\xi ),u,\xi ) = -\iota ^*B\), we find that

$$\begin{aligned} \nabla \widehat{J}_1(u,\xi ) = -B^*\iota [z(u,\xi )], \end{aligned}$$
(20)

where for each \((u,\xi ) \in L^2(D_0) \times \Xi \), \(z(u,\xi ) \in H_0^1(0,1)\) solves the adjoint equation: find \(z \in H_0^1(0,1)\) with

$$\begin{aligned} \kappa (\xi )( z', v' )_{L^2(0,1)} -( S(u,\xi )z', v )_{L^2(0,1)} = -( S(u,\xi )-y_d, v )_{L^2(0,1)} \end{aligned}$$

for all \(v \in H_0^1(0,1)\); cf. [19, pp. 205–206] and [78, p. 83]. Since \(\iota \) is linear and compact, and B is linear and bounded, the operator \(K= -B^* \iota \) is compact [49, Theorem 8.2-5 and p. 427]. We choose \(M(u,\xi ) = z(u,\xi )\).

We establish Assumption 4. Using [8, Theorem 8.2.9] and the measurability of \(S(u,\cdot )\) (see Lemma 1), we can show that \(z(u,\cdot )\) is measurable for all \(u \in L^2(D_0)\). The implicit function theorem can be used to show that \(z(\cdot , \xi )\) is continuous. Hence z is a Carathéodory mapping. Next, we derive an \(H_0^1(0,1)\)-stability estimate for the state. We have \(\left\Vert Bu\right\Vert _{L^2(0,1)} \le \left\Vert u\right\Vert _{L^2(D_0)}\) for all \(u \in L^2(D_0)\). Hence the operator norm of B is less than or equal to one. We have \(\left\Vert v\right\Vert _{L^p(0,1)} \le \left\Vert v\right\Vert _{H_0^1(0,1)}\) for each \(v \in H_0^1(0,1)\) and \(1\le p \le \infty \) [78, Lemma 3.4 on p. 9]. Hence Friedrichs’ constant \(C_D\) satisfies \(C_D\le 1\). Using integration by parts, we have \(( yy', y )_{L^2(0,1)} = 0\) for all \(y \in H_0^1(0,1)\) [78, p. 72]. Choosing \(v = S(u,\xi )\) in the weak form of Burgers’ equation, we obtain

$$\begin{aligned} \kappa _{\min } \left\Vert S(u,\xi )\right\Vert _{H_0^1(0,1)} \le \left\Vert b(\xi )\right\Vert _{L^2(0,1)} + \left\Vert u\right\Vert _{L^2(D_0)}; \end{aligned}$$
(21)

cf. [78, p. 75]. Next, we establish a stability estimate for \(M(u,\xi ) = z(u,\xi )\). Combining the \(L^\infty (0,1)\)-stability estimate established in [78, Lemma 3.4 on p. 83] with \((1+\textrm{e}^{2x})\textrm{e}^{x} \le 2\textrm{e}^{3x}\) valid for all \(x \ge 0\), we obtain

$$\begin{aligned} \left\Vert z(u,\xi )\right\Vert _{L^\infty (0,1)} \le 2\kappa (\xi )^{-1} \textrm{e}^{3\kappa (\xi )^{-1}\left\Vert S(u,\xi )\right\Vert _{L^1(0,1)}} \left\Vert S(u,\xi )-y_d\right\Vert _{L^2(0,1)}. \end{aligned}$$
(22)

Choosing \(v = z(u,\xi )\) in the adjoint equation and using the Hölder and Friedrichs inequalities, and \(C_D\le 1\), we obtain

$$\begin{aligned} \kappa (\xi )\left\Vert z(u,\xi )\right\Vert _{H_0^1(0,1)} \le \left\Vert S(u,\xi )\right\Vert _{L^2(0,1)} \left\Vert z(u,\xi )\right\Vert _{L^\infty (0,1)} +\left\Vert S(u,\xi )-y_d\right\Vert _{L^2(0,1)}. \end{aligned}$$
(23)

Since \(U_\text {ad}\) is nonempty, there exists \(u_0 \in U_\text {ad}\). Combined with (21) and the definition of \(J_1\), we find that \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \). Let \(B_{V_{\text {ad}}^\rho (u_0)}\) be an open, bounded ball about zero containing \(V_{\text {ad}}^\rho (u_0)\) with radius \(R_{\text {ad}}> 0\). We define \(\zeta _1(\xi ) = (1/\kappa _{\min })\big (\left\Vert b(\xi )\right\Vert _{L^2(0,1)} + R_{\text {ad}}+ \left\Vert y_d\right\Vert _{L^2(0,1)}\big )\) and

$$\begin{aligned} \zeta (\xi ) = (1/\kappa _{\min })\zeta _1(\xi )\big ((2/\kappa _{\min })\zeta _1(\xi ) \textrm{e}^{(3/\kappa _{\min })\zeta _1(\xi )} +1\big ). \end{aligned}$$

Combining (20) and the stability estimates (21), (22) and (23), we conclude that Assumption 4 holds true with \(\zeta \) being an essentially bounded random variable.
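The steady Burgers’ state equation above can be discretized to make the stability estimate (21) concrete. The following finite-difference sketch uses illustrative choices (grid size, one fixed viscosity sample \(\kappa (\xi )\), and a lumped source term standing in for \(b(\xi ) + Bu\)); it solves the discrete problem by Newton’s method and checks the discrete analogue of (21):

```python
import numpy as np

n = 200                                    # interior grid points (toy resolution)
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
kappa = 0.25                               # one fixed sample of kappa(xi)
rhs = np.sin(2.0 * np.pi * x)              # b(xi) + Bu lumped into one source

# Central-difference operators with homogeneous Dirichlet conditions.
D2 = (np.eye(n, k=1) - 2.0 * np.eye(n) + np.eye(n, k=-1)) / h ** 2
D1 = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * h)

def residual(y):
    # -kappa y'' + y y' - (b + Bu) at the interior nodes
    return -kappa * (D2 @ y) + y * (D1 @ y) - rhs

y = np.zeros(n)
for _ in range(50):                        # Newton's method
    r = residual(y)
    if np.linalg.norm(r, np.inf) < 1e-11:
        break
    Jac = -kappa * D2 + np.diag(D1 @ y) + np.diag(y) @ D1
    y -= np.linalg.solve(Jac, r)

# Discrete analogue of (21): kappa * |y|_{H_0^1} <= ||b + Bu||_{L^2}.
dy = np.diff(np.concatenate(([0.0], y, [0.0]))) / h
h1_seminorm = np.sqrt(h * np.sum(dy ** 2))
l2_rhs = np.sqrt(h * np.sum(rhs ** 2))
print(kappa * h1_seminorm, "<=", l2_rhs)
```

The inequality holds with slack here because Friedrichs’ constant on \((0,1)\) is in fact smaller than one, while the derivation of (21) only uses \(C_D \le 1\).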

4.3 Distributed Control of a Semilinear State Equation

We consider a distributed control problem with a semilinear state equation based on those considered in [47, Sect. 5] and [45, Sect. 5.2]. Risk-neutral optimization of semilinear PDEs is also studied, for example, in [24, 25]. We refer the reader to [77, Chap. 9] and [76, Chap. 4] for the analysis of deterministic, distributed control problems with semilinear PDEs.

We consider

$$\begin{aligned} \min _{u\in U_\text {ad}}\, (1/2){\mathbb {E}}\left[ \left\Vert (1-S(u,\xi ))_+\right\Vert _{L^2(D)}^2\right] +(\alpha /2)\left\Vert u\right\Vert _{L^2(D)}^2, \end{aligned}$$
(24)

where \((\cdot )_+ = \max \{0,\cdot \}\), \(\alpha > 0\), and \(U_\text {ad}\subset L^2(D)\) is nonempty, closed, and convex. For each \((u,\xi ) \in L^2(D) \times \Xi \), \(S(u,\xi ) \in H^1(D)\) is the solution to: find \(y \in H^1(D)\) with \(E(y,u,\xi ) = 0\), where the operator \(E: H^1(D) \times L^2(D) \times \Xi \rightarrow H^1(D)^*\) is defined by

$$\begin{aligned} \begin{aligned} \langle E(y,u,\xi ), v \rangle _{{H^1(D)}^*\!, H^1(D)} =&( \kappa (\xi )\nabla y, \nabla v )_{L^2(D)^{2}} +( g(\xi )y+y^3, v )_{L^2(D)} \\&\quad - ( B(\xi )[u], v )_{L^2(D)} -( b(\xi ), v )_{L^2(D)}. \end{aligned} \end{aligned}$$
(25)

Let \(\iota : H^1(D) \rightarrow L^2(D)\) be the embedding operator of the compact embedding \(H^1(D) \hookrightarrow L^2(D)\) [32, Theorem 1.14]. For each \(\xi \in \Xi \), we define \(B(\xi ) = \iota {\widetilde{B}}(\xi ) \iota ^*\). The operator \({\widetilde{B}}(\xi ) : H^1(D)^* \rightarrow H^1(D)\) is the solution operator to a parameterized PDE. For each \((f,\xi ) \in H^1(D)^* \times \Xi \), \({\widetilde{B}}(\xi )f \in H^1(D)\) is the solution to: find \(w \in H^1(D)\) with

$$\begin{aligned} ( r(\xi )\nabla w, \nabla v )_{L^2(D)^{2}} + ( w, v )_{L^2(D)} = \langle f, v \rangle _{{H^1(D)}^*\!, H^1(D)} \quad \text {for all} \quad v \in H^1(D). \end{aligned}$$
(26)

Since the embedding \(H^1(D) \hookrightarrow L^2(D)\) is continuous, the operator \(\iota ^*\) is given by \(\langle \iota ^*[u], v \rangle _{{H^1(D)}^*\!, H^1(D)} = ( u, v )_{L^2(D)}\) for all \((u,v) \in L^2(D) \times H^1(D)\) [13, p. 21].

The assumptions stated next ensure the existence and uniqueness of solutions to the PDE defined by the operator in (25) and the well-posedness of the operator \({\widetilde{B}}(\xi )\); see [47, Sects. 3 and 5].

  • \(D\subset {\mathbb {R}}^2\) is a bounded Lipschitz domain.

  • \(\kappa \), \(g : \Xi \rightarrow L^\infty (D)\) are strongly measurable and there exist \( \kappa _{\min }\), \( \kappa _{\max }\), \(g_{\min }\), \(g_{\max } \in (0,\infty )\) such that \(\kappa _{\min } \le \kappa (\xi ) \le \kappa _{\max }\) and \(g_{\min } \le g(\xi ) \le g_{\max }\) for all \(\xi \in \Xi \).

  • \(b:\Xi \rightarrow L^2(D)\) and \(r :\Xi \rightarrow L^\infty (D)\) are strongly measurable and there exist \(b_{\max }\), \(r_{\min }\), \(r_{\max } \in (0,\infty )\) such that \(\left\Vert b(\xi )\right\Vert _{L^2(D)} \le b_{\max }\) and \(r_{\min } \le r(\xi ) \le r_{\max }\) for all \(\xi \in \Xi \).

Throughout the section, we assume these conditions to be satisfied.

Assumption 1 is fulfilled since the function \(J_1 : H^1(D) \rightarrow [0,\infty )\) defined by \(J_1(y) = (1/2)\left\Vert (1-\iota y)_+\right\Vert _{L^2(D)}^2\) is continuously differentiable [47, p. 14]. We have \(\textrm{D}_y J_1(y) = -\iota ^*(1-\iota [y])_+\). Since \(\iota [y] = y\), we have for all \(y \in H^1(D)\),

$$\begin{aligned} \left\Vert \textrm{D}_y J_1(y)\right\Vert _{H^1(D)^*} \le \left\Vert (1-y)_+\right\Vert _{L^2(D)} \le \left\Vert 1\right\Vert _{L^2(D)}+ \left\Vert y\right\Vert _{H^1(D)}. \end{aligned}$$
(27)

For each \(\xi \in \Xi \), the operator \(E(\cdot ,\cdot ,\xi )\) is continuously differentiable [47, p. 14] and for each \((u,\xi ) \in L^2(D) \times \Xi \), \(E_y(S(u,\xi ),u,\xi )\) has a bounded inverse [47, p. 9]. Using arguments similar to those in Sect. 4.1, we can show that \(E(y,u,\cdot )\) is measurable for each \((y,u) \in H^1(D) \times L^2(D)\). We find that Assumption 2 holds true.

We verify Assumption 4. For each \((u,\xi ) \in L^2(D) \times \Xi \), the adjoint state \(z(u,\xi ) \in H^1(D)\) is the solution to: find \(z \in H^1(D)\) with

$$\begin{aligned} ( \kappa (\xi )\nabla z, \nabla v )_{L^2(D)^{2}} +( g(\xi )z+3S(u,\xi )^2z, v )_{L^2(D)} = ( (1-S(u,\xi ))_+, v )_{L^2(D)} \end{aligned}$$

for all \(v \in H^1(D)\). Choosing \(v = z(u,\xi )\) and using (27), we obtain the stability estimate

$$\begin{aligned} \min \{\kappa _{\min },g_{\min }\} \left\Vert z(u,\xi )\right\Vert _{H^1(D)} \le \left\Vert 1\right\Vert _{L^2(D)} + \left\Vert S(u,\xi )\right\Vert _{H^1(D)}. \end{aligned}$$
(28)

Moreover, for all \(f \in H^1(D)\) and \(u \in L^2(D)\), we have the stability estimates (cf. [47, Sects. 3 and 5])

$$\begin{aligned} \begin{aligned} \min \{r_{\min },1\} \left\Vert {\widetilde{B}}(\xi )f\right\Vert _{H^1(D)}&\le \left\Vert f\right\Vert _{H^1(D)^*}, \\ \min \{\kappa _{\min },g_{\min }\}\left\Vert S(u,\xi )\right\Vert _{H^1(D)}&\le \left\Vert \iota ^*[B(\xi )u]+\iota ^*[b(\xi )]\right\Vert _{H^1(D)^*}. \end{aligned} \end{aligned}$$
(29)

Using calculus for adjoint operators [49, p. 235] and \(\iota = (\iota ^*)^*\), we find that \((\iota ^*B(\xi ))^* = \iota {\widetilde{B}}(\xi )^* \iota ^*\iota \). Consequently, the gradient formula (6) yields

$$\begin{aligned} \nabla \widehat{J}_1(u,\xi ) = -\iota [{\widetilde{B}}(\xi )^*\iota ^*\iota z(u,\xi )]. \end{aligned}$$

We choose \(K = - \iota \) and \(M(u,\xi ) = {\widetilde{B}}(\xi )^*\iota ^*\iota z(u,\xi )\). Using the implicit function theorem and [47, Proposition 4.3], we find that z is a Carathéodory mapping. Combined with (29), we obtain that \(M(\cdot , \xi )\) is continuous for each \(\xi \in \Xi \). Fix f, \(v \in H^1(D)^*\). Using [8, Theorem 8.2.9], we can show that \(\xi \mapsto {\widetilde{B}}(\xi )f\) is measurable. Hence \(\xi \mapsto \langle v, {\widetilde{B}}(\xi )f \rangle _{{H^1(D)}^*\!, H^1(D)}\) is measurable [35, Theorem 1.1.6]. Since \(\langle {\widetilde{B}}(\xi )^*v, f \rangle _{H^1(D), H^1(D)^*} = \langle v, {\widetilde{B}}(\xi )f \rangle _{{H^1(D)}^*\!, H^1(D)}\) for all \(\xi \in \Xi \), the mapping \(\xi \mapsto {\widetilde{B}}(\xi )^*v\) is measurable [35, Theorem 1.1.6]. Since \(H^1(D)\) is separable, \(\xi \mapsto {\widetilde{B}}(\xi )^*\) is strongly measurable [35, Theorem 1.1.6]. Combined with the composition rules [35, Proposition 1.1.28 and Corollary 1.1.29], we can show that \(M(u,\cdot )\) is measurable.

Using (29) and the fact that \(U_\text {ad}\) is nonempty, we find that there exists \(u_0 \in U_\text {ad}\) with \({\mathbb {E}}\left[ \widehat{J}(u_0,\xi )\right] < \infty \). Let \(B_{V_{\text {ad}}^\rho (u_0)}\) be an open, bounded ball about zero containing \(V_{\text {ad}}^\rho (u_0)\) and let \(R_{\text {ad}}\) be its radius. We define the random variable

$$\begin{aligned} \zeta (\xi ) = \tfrac{1}{\min \{\kappa _{\min },g_{\min }\}} \bigg (\left\Vert 1\right\Vert _{L^2(D)} + \tfrac{\left\Vert b(\xi )\right\Vert _{L^2(D)} +\tfrac{1}{\min \{r_{\min },1\}}R_{\text {ad}}}{\min \{\kappa _{\min },g_{\min }\}}\bigg ). \end{aligned}$$

Our assumptions and Hölder’s inequality ensure that \(\zeta \) is integrable.

Combining this with the stability estimates (28) and (29), we conclude that Assumption 4 holds.
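In more detail, for each \(u \in V_{\text {ad}}^\rho (u_0)\), chaining (28) and (29) yields (assuming, as the definition of \(\zeta \) suggests, the bound \(\left\Vert \iota ^*[B(\xi )u]\right\Vert _{H^1(D)^*} \le \tfrac{1}{\min \{r_{\min },1\}}\left\Vert u\right\Vert _{L^2(D)}\))

$$\begin{aligned} \left\Vert z(u,\xi )\right\Vert _{H^1(D)} \le \tfrac{1}{\min \{\kappa _{\min },g_{\min }\}} \bigg (\left\Vert 1\right\Vert _{L^2(D)} + \tfrac{\left\Vert b(\xi )\right\Vert _{L^2(D)} + \tfrac{1}{\min \{r_{\min },1\}}\left\Vert u\right\Vert _{L^2(D)}}{\min \{\kappa _{\min },g_{\min }\}}\bigg ) \le \zeta (\xi ), \end{aligned}$$

since \(\left\Vert u\right\Vert _{L^2(D)} \le R_{\text {ad}}\) on \(V_{\text {ad}}^\rho (u_0)\).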

5 Discussion

The analysis of the SAA approach for PDE-constrained optimization under uncertainty is complicated by the fact that the feasible sets are generally noncompact, preventing us from directly applying the consistency results developed in the literature on M-estimation and stochastic programming. Inspired by the consistency results in [70, 72], we constructed compact subsets of the feasible set that contain the solutions to the stochastic programs and eventually those to their SAA problems, allowing us to establish consistency of the SAA optimal values and SAA solutions. To construct such compact sets, we combined the adjoint approach, optimality conditions, and PDE stability estimates. We applied our framework to three risk-neutral nonlinear PDE-constrained optimization problems.
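The consistency behavior established here can be observed on a toy example. The following sketch (not taken from this paper; the problem, distribution, and bounds are illustrative choices) solves the SAA problem for the one-dimensional risk-neutral program \(\min _{u \in [-1,1]} {\mathbb {E}}[(u-\xi )^2]\) with \(\xi \sim N(0.5,1)\), whose exact solution is \(u^* = 0.5\).

```python
import numpy as np

rng = np.random.default_rng(0)

def saa_solution(n, lo=-1.0, hi=1.0):
    # Draw n samples of xi ~ N(0.5, 1) and solve the SAA problem
    #   min_{u in [lo, hi]} (1/n) * sum_i (u - xi_i)^2.
    # The sample-average objective is a quadratic in u, so its unique
    # minimizer over [lo, hi] is the projection of the sample mean.
    xi = rng.normal(loc=0.5, scale=1.0, size=n)
    return float(np.clip(xi.mean(), lo, hi))

# Consistency: SAA solutions approach the risk-neutral solution
# u* = 0.5 as the sample size grows.
errors = [abs(saa_solution(n) - 0.5) for n in (10, 1_000, 100_000)]
```

For this convex toy problem the SAA solution is the projected sample mean, so the law of large numbers directly gives almost sure convergence; the compactness constructions of Sect. 3 are what replace this elementary argument in the infinite-dimensional, nonconvex PDE setting.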

We comment on four limitations of our approach. First, our construction of the compact sets exploits the positivity of the regularization parameter \(\alpha \), which at first restricts our approach to PDE-constrained optimization problems with strongly convex control regularization. However, we can add \((\alpha /2)\left\Vert \cdot \right\Vert _{L^2(D)}^2\) with \(\alpha > 0\) to the objective function, allowing us to establish the consistency of regularized SAA solutions. If \(U_\text {ad}\) is contained in a ball with radius \(r_{\text {ad}} > 0\), \(\varepsilon > 0\), and \(\alpha = 2\varepsilon /r_{\text {ad}}^2\), then solutions to the regularized SAA problem provide \(\varepsilon \)-optimal solutions to the non-regularized SAA problem (2). Second, the analysis developed here demonstrates the consistency of SAA optimal values and SAA optimal solutions, but not of SAA critical points. Since the risk-neutral PDE-constrained optimization problems considered here are generally nonconvex, a consistency analysis of SAA critical points would be desirable. Although these problems and their SAA counterparts are generally nonconvex, significant progress has been made in establishing convexity properties of nonlinear PDE-constrained optimization problems [22, 31] and in developing verifiable conditions that can be used to certify global optimality of critical points [2]. Third, the construction of the compact subsets performed in Sect. 3 exploits the fact that the feasible set (3) of the SAA problems is the same as that of the risk-neutral problem. Therefore, our approach does not allow for a consistency analysis of SAA problems defined by random constraints, such as those resulting from sample-based approximations of expectation constraints [72, pp. 168–170]. Fourth, our analysis does not apply to risk-averse PDE-constrained optimization problems, as it exploits the smoothness of the expectation function.
However, our approach may be generalized to allow for the consistency analysis of risk-averse PDE-constrained programs, such as those defined via the superquantile/conditional value-at-risk [43].
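The \(\varepsilon \)-optimality claim in the first limitation follows from a short calculation. Writing \(\widehat{J}_N\) for the SAA objective (our notation) and letting \(u_\alpha \) minimize \(\widehat{J}_N + (\alpha /2)\left\Vert \cdot \right\Vert _{L^2(D)}^2\) over \(U_{\text {ad}}\), we have for every \(u \in U_{\text {ad}}\)

$$\begin{aligned} \widehat{J}_N(u_\alpha ) \le \widehat{J}_N(u_\alpha ) + \tfrac{\alpha }{2}\left\Vert u_\alpha \right\Vert _{L^2(D)}^2 \le \widehat{J}_N(u) + \tfrac{\alpha }{2}\left\Vert u\right\Vert _{L^2(D)}^2 \le \widehat{J}_N(u) + \tfrac{\alpha }{2} r_{\text {ad}}^2 = \widehat{J}_N(u) + \varepsilon , \end{aligned}$$

where the final equality uses \(\alpha = 2\varepsilon /r_{\text {ad}}^2\).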