1 Introduction

In computational science and engineering, it is common practice to control selected variables of an underlying physical system modelled by partial differential equations (PDEs) in order to drive the simulation results as close as possible to some ideal data or experimental measurements. These are optimal control problems for PDEs, in which a cost functional is minimized subject to a PDE constraint. In practical applications, uncertainties often arise from various sources, for instance PDE coefficients representing physical parameters, computational geometries, external loadings, boundary and initial conditions, etc. Quantifying these uncertainties can be crucial for obtaining meaningful optimal solutions. Deterministic optimal control problems, which do not take uncertainties into account, have been studied from both mathematical and computational perspectives [21, 22, 33, 51]. Stochastic optimal control problems constrained by PDE models with random inputs have been considered only recently, thanks to the development of efficient stochastic computational methods [11, 23, 26, 31, 43, 50].

Several computational challenges arise in solving PDE-constrained stochastic optimal control problems. Firstly, designing efficient and accurate numerical schemes for approximating the optimal solution in the stochastic space is difficult for most PDE models. The Monte Carlo method is one of the simplest and most robust schemes; however, its low convergence rate leads to a heavy computational cost, since a full deterministic optimal control problem has to be solved for every sample. Galerkin projection of the optimal solution onto a suitable (e.g. global polynomial) subspace of the stochastic space has been proven to converge exponentially fast for smooth problems [3, 26]. Unfortunately, the tensor-product projection scheme produces a large-scale tensor system to be solved, bringing further computational difficulties. Another scheme, the stochastic collocation method based on multidimensional interpolation [2, 53], combines the fast convergence of the Galerkin projection with the non-intrusiveness (and thus easy implementation) of Monte Carlo sampling.

Secondly, high-dimensional stochastic problems are a well-recognized computational challenge due to the “curse of dimensionality”. In order to reduce the computational burden, sparse and adaptive algorithms have been employed, exploiting the sparse structure of the numerical approximation and the different importance of each dimension; an example is the (anisotropic) sparse-grid stochastic collocation method [2, 39, 53].

An additional computational challenge in solving PDE-constrained stochastic optimal control problems comes from the ill-conditioning and the coupled structure of the optimality system obtained by a Lagrangian approach [51]. Efficient preconditioning techniques have been developed to solve the optimality system by a one-shot approach [42, 48]; sequential quadratic programming [50] and trust-region iterative algorithms [31] have been applied too. However, when solving the full optimality system is very expensive, only tens or hundreds of full solves are affordable in practice. This makes the approaches introduced above impractical, since the number of samples needed can easily grow far beyond that, especially for high-dimensional problems.

On the other hand, since quantities of interest usually live on a low-dimensional manifold, model order reduction techniques, such as proper orthogonal decomposition or reduced basis methods, may be applied to parametrized optimal control problems; see for example [29, 30, 32, 34, 37]. Reduced basis methods for deterministic parametrized Stokes flow control problems have been treated in [35, 36, 46].

In this paper, we study a stochastic optimal control problem constrained by the Stokes equations with random input data and a distributed control function, which features all the aforementioned computational challenges, besides the additional difficulty due to the saddle point structure of the underlying Stokes model [6]. We develop and analyze a multilevel and weighted reduced basis method, in combination with the stochastic collocation method, to solve the stochastic optimal control problem. More precisely, the (anisotropic) sparse grid stochastic collocation method is applied for the stochastic approximation of the optimal solution in the probability space, while the finite element method is used for the deterministic approximation in the physical space, leading to a large number of finite element optimality systems to solve. To reduce the computational cost, the latter are projected onto an adaptively constructed reduced basis space, leading to a much cheaper reduced optimality system [35, 36]. For the construction of the reduced basis space, we design a multilevel greedy algorithm and propose a weighted a posteriori error bound, which together produce a quasi-optimal snapshot space that approximates well the low-dimensional manifold of the quantities of interest. A global error analysis is carried out for the complete numerical approximation, based on the regularity of the optimal solution, in particular the stochastic regularity obtained for this specific Stokes control problem. Numerical experiments with stochastic dimensions ranging from 10 to 100 are performed to verify the theoretically predicted convergence rates and to demonstrate the efficiency and accuracy of our computational method for large-scale and high-dimensional PDE-constrained optimization problems. In summary, the main contribution of this work is the development of efficient model order reduction techniques to solve stochastic optimal control problems with PDE (Stokes) constraints; we also analyze the stochastic regularity of the optimal solution with respect to the input random variables and the convergence of the associated approximation error, and we obtain a global error estimate for the proposed method. Numerical experiments demonstrate that our method achieves considerable computational savings, particularly for high-dimensional and large-scale problems.

The paper is organized as follows: the stochastic optimal control problem with Stokes constraint is presented in Sect. 2, together with certain assumptions on the random input data; Sect. 3 is devoted to the study of the stochastic regularity of the optimal solution with respect to the random variables; the detailed numerical approximation of the problem is presented in Sect. 4, which provides the basis for the development of the multilevel and weighted reduced basis method in Sect. 5; in Sect. 6, global error estimates are derived, and they are verified by numerical experiments in Sect. 7; concluding remarks are provided in Sect. 8.

2 Problem statement

Let \((\Omega , {\mathfrak {F}},P)\) denote a complete probability space, where \(\Omega \) is a set of outcomes \(\omega \in \Omega \), \({\mathfrak {F}}\) is a \(\sigma \)-algebra of events and \(P:{\mathfrak {F}} \rightarrow [0,1]\) with \(P(\Omega ) = 1\) is a probability measure. A real-valued random variable is a measurable function \(Y: (\Omega , \mathfrak {F}) \rightarrow (\mathbb {R}, \mathfrak {B})\), where \(\mathfrak {B}\) is the Borel \(\sigma \)-algebra on \(\mathbb {R}\). The distribution function of a random variable \(Y:\Omega \rightarrow \Gamma \subset \mathbb {R}\), where \(\Gamma \) is the image of Y, is defined as \(F_Y:\Gamma \rightarrow [0,1]\) with \(F_Y(y) = P(\omega \in \Omega : Y(\omega )\le y)\); if the random variable is continuous, its probability density function \(\rho : \Gamma \rightarrow \mathbb {R}\) is given by \(\rho (y)dy=dF_Y(y)\) [15]. We define the probability Hilbert spaces \( L^2(\Omega ):=\{v:\Omega \rightarrow \mathbb {R}: \int _{\Omega } v^2(\omega ) dP(\omega ) < \infty \} \) and \( L^2_{\rho }(\Gamma ):=\{w:\Gamma \rightarrow \mathbb {R}: \int _{\Gamma }w^2(y)\rho (y)dy < \infty \}, \) equipped with the equivalent norms (noting \(v(\omega ) = w(y(\omega ))\))

$$\begin{aligned} ||v||_{L^2(\Omega )} := \left( \int _{\Omega } v^2(\omega ) dP(\omega )\right) ^{1/2} = \left( \int _{\Gamma } w^2(y) \rho (y)dy\right) ^{1/2} =: ||w||_{L^2_{\rho }(\Gamma )}. \end{aligned}$$
(2.1)
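As a quick numerical illustration of the norm equivalence (2.1), the following sketch (our own, assuming a single uniform random variable \(Y\) on \([0,1]\) and \(w(y) = y^2\); neither choice is from the paper) estimates both integrals and recovers the same value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: Y ~ Uniform(0, 1), so Gamma = [0, 1] and rho(y) = 1;
# take w(y) = y**2, i.e. v(omega) = w(Y(omega)).
w = lambda y: y**2

# Left-hand side of (2.1): integral over the sample space, by Monte Carlo.
Y = rng.uniform(0.0, 1.0, size=500_000)
norm_Omega = np.sqrt(np.mean(w(Y) ** 2))

# Right-hand side of (2.1): weighted integral over Gamma, by a Riemann sum
# on a uniform grid (rho = 1 and interval length 1, so the mean is the integral).
y = np.linspace(0.0, 1.0, 100_001)
norm_Gamma = np.sqrt(np.mean(w(y) ** 2))

print(norm_Omega, norm_Gamma)  # both approximate sqrt(1/5) ~ 0.4472
```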

Let D be an open and bounded physical domain in \(\mathbb {R}^d\) (\(d=2,3\)) with Lipschitz continuous boundary \(\partial D\). Let \(v:D\times \Omega \rightarrow \mathbb {R}\) denote a real-valued random field, i.e. a real-valued random variable defined on \(\Omega \) for each \(x \in D\). We define the Hilbert space \(\mathcal {X}^s(D): = L^2(\Omega )\otimes H^s(D)\), \(s\in \mathbb {N}_0\), equipped with the inner product

$$\begin{aligned} (w,v) = \int _{\Omega } \int _D \sum _{|\alpha |\le s} \partial ^{\alpha }w \partial ^{\alpha } v dxdP(\omega ) \quad \forall w, v \in \mathcal {X}^s(D), \end{aligned}$$
(2.2)

where the partial derivative is defined as \(\partial ^{\alpha } w = \frac{\partial ^{|\alpha |} w }{\partial x_1^{\alpha _1}\cdots \partial x_d^{\alpha _d}}\) with the multi-index \(\alpha =(\alpha _1, \ldots , \alpha _d) \in \mathbb {N}_0^d\) and \(|\alpha | = \alpha _1 + \cdots +\alpha _d\). The associated norm is defined as \(||v||_{\mathcal {X}^s(D)} = \sqrt{(v,v)}\). When \(s=0\), we denote \(H^0(D) \equiv L^2(D)\) and thus \(\mathcal {X}^0(D) \equiv \mathcal {L}^2(D)\) by convention. For a random vector field \(\mathbf {v} = (v_1, \ldots , v_d):D\times \Omega \rightarrow \mathbb {R}^d\), we define the Hilbert space \(\mathcal {X}^{s,d}(D): = \left( L^2(\Omega )\otimes H^s(D)\right) ^d\) (\(=\mathcal {L}^{2,d}(D)\) for \(s=0\)).

2.1 Stochastic Stokes equations

We consider the following stochastic Stokes equations: given a random variable \(\nu :\Omega \rightarrow \mathbb {R}_+\) and random vector fields \(\mathbf {f}:D\times \Omega \rightarrow \mathbb {R}^d\) and \(\mathbf {h}:\partial D_N \times \Omega \rightarrow \mathbb {R}^d\), find a solution \(\{\mathbf {u},p\}:D\times \Omega \rightarrow \mathbb {R}^d\times \mathbb {R}\) such that the following equations hold almost surely (i.e. for almost every \(\omega \in \Omega \)):

$$\begin{aligned} {\left\{ \begin{array}{ll} -\nu (\omega ) \triangle \mathbf {u}(\cdot ,\omega ) + \nabla p(\cdot ,\omega ) = \mathbf {f}(\cdot ,\omega ) &{} \text { in } D,\\ \nabla \cdot \mathbf {u}(\cdot , \omega ) = 0 &{} \text { in } D, \\ \mathbf {u}(\cdot , \omega ) = \mathbf {0} &{} \text { on } \partial D_D,\\ \nu (\omega ) \nabla \mathbf {u}(\cdot , \omega ) \cdot \mathbf {n} - p(\cdot , \omega ) \mathbf {n} = \mathbf {h}(\cdot , \omega ) &{} \text { on } \partial D_N, \end{array}\right. } \end{aligned}$$
(2.3)

where \(\partial D_D\) and \(\partial D_N\) represent the Dirichlet and Neumann boundaries such that \(\partial D_D \cup \partial D_N = \partial D\) and \(\partial D_D \cap \partial D_N = \emptyset \). In particular, we consider a homogeneous Dirichlet boundary condition and a nonhomogeneous Neumann boundary condition.

At any realization \(\omega \in \Omega \), the Stokes Eq. (2.3) is commonly used to quantify the velocity \(\mathbf {u}\) and pressure p of a fluid flow in which advective inertial forces are negligible compared to viscous forces, measured via the kinematic viscosity parameter \(\nu \). This occurs, e.g., for low-speed channel flows and the flow of viscous polymers or micro-organisms [1]. In practice, the viscosity \(\nu \) of many fluids may vary over a wide range, rather than remain a fixed constant, depending on the temperature, the multicomponent nature of the fluid and other factors [18]. Quantification of the body force \(\mathbf {f}\) and the boundary condition \(\mathbf {h}\), for instance by experimental measurement, may also be affected by various noises or uncertainties. Incorporating these different uncertainties leads to the study of the stochastic Stokes equations.

We consider the weak formulation of (2.3): find \(\{\mathbf {u},p\} \in \mathcal {V} \times \mathcal {Q}\) such that

$$\begin{aligned} {\left\{ \begin{array}{ll} a(\mathbf {u}, \mathbf {v}) + b(\mathbf {v}, p) = (\mathbf {f}, \mathbf {v}) + (\mathbf {h},\mathbf {v})_{\partial D_N} &{}\quad \forall \mathbf {v}\in \mathcal {V}, \\ b(\mathbf {u}, q) = 0 &{}\quad \forall q \in \mathcal {Q}, \end{array}\right. } \end{aligned}$$
(2.4)

where \(\mathcal {V} := \left\{ \mathbf {v} \in \mathcal {X}^{1,d}(D): \mathbf {v} = \mathbf {0} \text { on } \partial D_D\right\} \) equipped with the norm \(||\cdot ||_{\mathcal {V}} = ||\cdot ||_{\mathcal {X}^{1,d}}\), \(\mathcal {Q} := L^2(\Omega )\otimes Q(D)\) equipped with the norm \(||\cdot ||_{\mathcal {Q}}:= ||\cdot ||_{\mathcal {L}^2(D)}\), and

$$\begin{aligned} Q(D) :=\left\{ q\in L^2(D): \int _D q dx = 0\right\} . \end{aligned}$$
(2.5)

The bilinear form \(a(\cdot , \cdot ):\mathcal {V} \times \mathcal {V} \rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} a( \mathbf {w}, \mathbf {v}):= \int _{\Omega }\int _D \nu \nabla \mathbf {w} \otimes \nabla \mathbf {v} dxdP(\omega ) = \sum _{i,j=1}^d \int _{\Omega }\int _D \nu \frac{\partial w_i}{\partial x_j} \frac{\partial v_i}{\partial x_j} dxdP(\omega ) \end{aligned}$$
(2.6)

and the bilinear form \(b(\cdot , \cdot ): \mathcal {V} \times \mathcal {Q} \rightarrow \mathbb {R}\) reads

$$\begin{aligned} b(\mathbf {v}, q) = - \int _{\Omega }\int _D (\nabla \cdot \mathbf {v})\, q\, dx dP(\omega ) = -\sum _{i=1}^d \int _{\Omega }\int _D \frac{\partial v_i}{\partial x_i}q dx dP(\omega ). \end{aligned}$$
(2.7)

The stochastic inner products \((\mathbf {f}, \mathbf {v})\) and \((\mathbf {h},\mathbf {v})_{\partial D_N}\) are defined by the formula (2.2) on the domain D and on the Neumann boundary \(\partial D_N\), respectively. We make the following assumptions on the random variable \(\nu \) and the random vector fields \(\mathbf {f}\) and \(\mathbf {h}\).

Assumption 1

The random viscosity \(\nu \) is positive and uniformly bounded from below and from above, i.e. there exist two constants \(0 < \nu _{\min } \le \nu _{\max } < \infty \) such that

$$\begin{aligned} P(\omega : \nu _{\min } \le \nu (\omega ) \le \nu _{\max }) = 1. \end{aligned}$$
(2.8)

The random force field \(\mathbf {f}\) and Neumann boundary field \(\mathbf {h}\) satisfy

$$\begin{aligned} ||\mathbf {f}||_{\mathcal {L}} < \infty \text { and } ||\mathbf {h}||_{\mathcal {H}} < \infty , \end{aligned}$$
(2.9)

where we denote \(\mathcal {L} = \mathcal {L}^{2,d}(D)\) and \(\mathcal {H} = \mathcal {L}^{2,d}(\partial D_N)\) for simplicity.

Assumption 2

The random data \(\nu \), \(\mathbf {f}\) and \(\mathbf {h}\) depend only on a finite number of random variables \(Y(\omega ) = (Y_1(\omega ), \ldots , Y_N(\omega )):\Omega \rightarrow \Gamma = \Gamma _1\times \cdots \times \Gamma _N \subset \mathbb {R}^N\) with probability density function \(\rho = (\rho _1, \dots , \rho _N): \Gamma \rightarrow \mathbb {R}^N\), i.e., with slight abuse of notation, \(\nu (\omega ) = \nu (Y(\omega ))\in \mathbb {R}_+\), \(\mathbf {f}(\cdot , \omega ) = \mathbf {f}(\cdot , Y(\omega )): D \rightarrow \mathbb {R}^d\) and \(\mathbf {h}(\cdot , \omega ) = \mathbf {h}(\cdot , Y(\omega )): \partial D_N \rightarrow \mathbb {R}^d\) almost surely.

Remark 2.1

The random variable \(\nu \) and random vector fields \(\mathbf {f}\) and \(\mathbf {h}\) may not depend on the same random vector Y but on different ones \(Y_{\nu }, Y_{f}, Y_{h}\). For ease of notation, we still use a single random vector \(Y = (Y_{\nu }, Y_{f}, Y_{h})\) with dimension N.

Example 1

For a multicomponent fluid flow, the viscosity is proportional to the contribution of each component [28], which can be described by

$$\begin{aligned} \nu (Y(\omega )) = \sum _{n=1}^N \nu _n Y_n(\omega ) + \nu _0 \left( 1-\sum _{n=1}^N Y_n(\omega )\right) = \nu _0 + \sum _{n=1}^N(\nu _n-\nu _0)Y_n(\omega ), \end{aligned}$$
(2.10)

where \(Y_n\), \(1\le n \le N\), are uniformly distributed in \([0, 1/N]\) and \(\nu _n > 0\), \(0\le n \le N\).
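A minimal numerical sketch of the mixture model (2.10), with illustrative values for \(\nu _0,\dots ,\nu _N\) (our own placeholders, not from the paper), also confirms the uniform bounds required by Assumption 1:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: N = 4 components with viscosities nu_1..nu_4 and a
# background viscosity nu_0; all values below are illustrative only.
nu0 = 1.0
nu = np.array([0.5, 1.5, 2.0, 3.0])
N = len(nu)

def viscosity(Y):
    # Equation (2.10): nu(Y) = nu_0 + sum_n (nu_n - nu_0) * Y_n,
    # with Y_n ~ Uniform(0, 1/N) so that the weights sum to at most 1.
    return nu0 + np.sum((nu - nu0) * Y)

Y = rng.uniform(0.0, 1.0 / N, size=N)
print(viscosity(Y))

# Uniform bounds as in Assumption 1: nu(Y) is a convex combination of
# nu_0, ..., nu_N, hence it stays within [min_n nu_n, max_n nu_n].
samples = rng.uniform(0.0, 1.0 / N, size=(100_000, N))
vals = nu0 + samples @ (nu - nu0)
print(vals.min(), vals.max())
```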

Example 2

Another example, for the random vector field \(\mathbf {h}\), is given by the truncated Karhunen-Loève expansion with \(N+1\) terms [49]

$$\begin{aligned} \mathbf {h}(x, Y(\omega )) = \mathbb {E}[\mathbf {h}](x) + \sum _{n=1}^N \sqrt{\lambda _n}\mathbf {h}_n(x) Y_n(\omega ) \quad x\in \partial D_N, \end{aligned}$$
(2.11)

where \((\lambda _n, \mathbf {h}_n)\) are the eigenpairs of the continuous and bounded covariance function \(\mathbb {C}(x,x') = \mathbb {E}\left[ (\mathbf {h}(x,Y)-\mathbb {E}[\mathbf {h}](x))\cdot (\mathbf {h}(x',Y)-\mathbb {E}[\mathbf {h}](x'))\right] \), and the random variables \(Y_n, 1\le n \le N\), are uncorrelated with zero mean and unit variance, given by [49]

$$\begin{aligned} Y_n(\omega ) = \frac{1}{\sqrt{\lambda _n}}\int _{\partial D_N}\left( \mathbf {h}(x, Y(\omega )) - \mathbb {E}[\mathbf {h}](x)\right) \cdot \mathbf {h}_n(x) dx. \end{aligned}$$
(2.12)
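For intuition, the following sketch builds a discrete analogue of the truncated expansion (2.11)-(2.12), assuming a scalar field on a one-dimensional segment and a squared-exponential covariance kernel; all of these modelling choices are illustrative assumptions, not taken from the paper (whose \(\mathbf {h}\) is vector-valued):

```python
import numpy as np

# Illustrative grid and covariance kernel C(x, x') = exp(-(x - x')**2 / l**2).
x = np.linspace(0.0, 1.0, 200)
l = 0.2
C = np.exp(-((x[:, None] - x[None, :]) ** 2) / l**2)

# Eigenpairs of the covariance operator, approximated by those of the
# covariance matrix weighted by the grid spacing (Nystrom-type quadrature).
dx = x[1] - x[0]
lam, phi = np.linalg.eigh(C * dx)
lam, phi = lam[::-1], phi[:, ::-1]   # sort eigenvalues in descending order
phi /= np.sqrt(dx)                   # normalize eigenfunctions in L^2

N = 10                               # truncation level, cf. (2.11)
rng = np.random.default_rng(2)
Y = rng.standard_normal(N)           # uncorrelated, zero mean, unit variance

h_mean = np.zeros_like(x)            # E[h], taken as zero here for simplicity
h = h_mean + (phi[:, :N] * np.sqrt(lam[:N])) @ Y   # truncated expansion (2.11)
print(lam[:N] / lam[0])              # fast spectral decay justifies truncation
```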

Under Assumption 2, the stochastic Stokes Eq. (2.3) can be viewed as a set of parameterized equations defined in a tensor product of the spatial domain and the parameter space \(D\times \Gamma \). We remark that the Hilbert space \(L^2(\Omega )\) is equivalent to \(L^2_{\rho }(\Gamma )\) and we use the same notation \(\mathcal {L}, \mathcal {H}, \mathcal {V}, \mathcal {Q}\) for the stochastic Hilbert spaces.

2.2 Constrained optimal control problem

We study a distributed optimal control problem constrained by the stochastic Stokes equations. Let us define a cost functional as follows

$$\begin{aligned} \mathcal {J}(\mathbf {u}, p, \mathbf {f}) = \frac{1}{2}||\mathbf {u}-\mathbf {u}_d||^2_{\mathcal {L}} + \frac{1}{2}||p-p_d||^2_{\mathcal {L}^{2}(D)} + \frac{\kappa }{2}||\mathbf {f}||^2_{\mathcal {G}} = \mathbb {E}\left[ \frac{1}{2}\int _D (\mathbf {u}-\mathbf {u}_d)^2 dx + \frac{1}{2}\int _D (p-p_d)^2 dx + \frac{\kappa }{2}\int _D \mathbf {f}^2 dx \right] , \end{aligned}$$
(2.13)

where the first two terms measure the discrepancy between the solution \(\{\mathbf {u}, p\}\in \mathcal {V}\times \mathcal {Q}\) of the stochastic Stokes problem (2.4) and the observational data \(\{\mathbf {u}_d, p_d\} \in L^{2,d}(D)\times Q(D)\), which represent the measurement mean. In the last term, we take \(\mathcal {G} = \mathcal {L}^{2,d}(D)\) as the space for the control function in this work. This term regularizes, in a mathematical sense, the control function \(\mathbf {f}\) through a regularization parameter \(\kappa > 0\), and can also be viewed as a penalization of the control energy. The optimal control problem constrained by the stochastic Stokes problem (2.4) can be formulated as: find an optimal solution \(\{\mathbf {u}^*, p^*, \mathbf {f}^*\}\) such that

$$\begin{aligned} \mathcal {J}(\mathbf {u}^*, p^*, \mathbf {f}^*) = \min \left\{ \mathcal {J}(\mathbf {u}, p, \mathbf {f}): \{\mathbf {u}, p, \mathbf {f}\} \in \mathcal {V}\times \mathcal {Q} \times \mathcal {G} \text { and solve } (2.4)\right\} . \end{aligned}$$
(2.14)

Remark 2.2

In the cost functional, we have used the \(\mathcal {L}^{2,d}(D)\) norm to measure the discrepancy between the velocity field and its measured mean value. Extension to the \(\mathcal {V}\) norm is straightforward, by requiring that the data \(\{\mathbf {u}_d, p_d\}\) possess higher regularity in the spatial domain. Another extension, to stochastic data \(\{\mathbf {u}_d, p_d\}\), can be handled in the same way as in this work, provided the data depend explicitly on a finite-dimensional random vector, i.e. \(\{\mathbf {u}_d, p_d\}(\cdot , \omega ) = \{\mathbf {u}_d, p_d\}(\cdot , Y(\omega ))\).

Remark 2.3

When higher moments (e.g. variance, skewness, etc.) of the observational data \(\{\mathbf {u}_d, p_d\}\) or of the control function \(\mathbf {f}\), or the probability distribution of \(\{\mathbf {u}_d, p_d\}\), are incorporated into the cost functional in more general settings [50], we face essentially nonlinear and fully coupled problems, which will be addressed in future work.

Let the tensor-product Hilbert spaces \(\mathcal {V}\times \mathcal {Q}\) and \(\mathcal {V}\times \mathcal {Q} \times \mathcal {G}\) be equipped with the norm \(||\{\mathbf {v}, q\}||_{\mathcal {V}\times \mathcal {Q}}:=||\mathbf {v}||_{\mathcal {V}} + ||q||_{\mathcal {Q}}\) and \(||\{\mathbf {v}, q, \mathbf {g}\}||_{\mathcal {V}\times \mathcal {Q} \times \mathcal {G}}:= ||\mathbf {v}||_{\mathcal {V}} + ||q||_{\mathcal {Q}} + \sqrt{\kappa }||\mathbf {g}||_{\mathcal {G}}\) for any \(\mathbf {v} \in \mathcal {V}, q \in \mathcal {Q}, \mathbf {g} \in \mathcal {G}\); let the tensor-product Hilbert space \(\mathcal {L} \times \mathcal {Q}\) be equipped with the norm \(||\{\mathbf {v}, q\}||_{ \mathcal {L} \times \mathcal {Q}} := ||\mathbf {v}||_{\mathcal {L}} + ||q||_{\mathcal {Q}}\) for any \(\mathbf {v}\in \mathcal {L}, q\in \mathcal {Q}\).

Let \(\mathcal {A}:(\mathcal {V}\times \mathcal {Q}\times \mathcal {G})\times (\mathcal {V}\times \mathcal {Q}\times \mathcal {G}) \rightarrow \mathbb {R}\) be a compound bilinear form defined as

$$\begin{aligned} \mathcal {A}(\{\mathbf {u}, p, \mathbf {f}\}, \{\mathbf {v}, q, \mathbf {g}\}) = (\mathbf {u}, \mathbf {v}) + (p, q) + \kappa (\mathbf {f}, \mathbf {g}), \end{aligned}$$
(2.15)

and \(\mathcal {B}:(\mathcal {V}\times \mathcal {Q}\times \mathcal {G})\times (\mathcal {V}\times \mathcal {Q}) \rightarrow \mathbb {R}\) be a compound bilinear form defined as

$$\begin{aligned} \mathcal {B}(\{\mathbf {u}, p, \mathbf {f}\}, \{\mathbf {v},q\}) = a(\mathbf {u}, \mathbf {v}) + b(\mathbf {v}, p) + b(\mathbf {u}, q) - (\mathbf {f}, \mathbf {v}). \end{aligned}$$
(2.16)

Following the same procedure as in the deterministic setting [7, 35], it can be proven that the constrained optimal control problem (2.14) is equivalent to the following saddle point problem in the stochastic setting: find \(\{\mathbf {u}, p, \mathbf {f}\} \in \mathcal {V}\times \mathcal {Q} \times \mathcal {G}\) and \(\{\mathbf {u}^a, p^a\} \in \mathcal {V}\times \mathcal {Q}\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \mathcal {A}(\{\mathbf {u}, p, \mathbf {f}\}, \{\mathbf {v}^a, q^a, \mathbf {g}\}) + \mathcal {B}(\{\mathbf {v}^a, q^a, \mathbf {g}\}, \{\mathbf {u}^a,p^a\}) \\ \quad = (\{\mathbf {u}_d, p_d, \mathbf {0}\}, \{\mathbf {v}^a, q^a, \mathbf {g}\}) \quad \forall \{\mathbf {v}^a, q^a, \mathbf {g}\}\in \mathcal {V}\times \mathcal {Q} \times \mathcal {G},\\ \mathcal {B}(\{\mathbf {u}, p, \mathbf {f}\}, \{\mathbf {v},q\}) = (\mathbf {h}, \mathbf {v})_{\partial D_N} \quad \forall \{\mathbf {v}, q\}\in \mathcal {V}\times \mathcal {Q}. \end{array} \right. \end{aligned}$$
(2.17)

Thanks to this equivalence and to the well-posedness of the saddle point problem (2.17), there exists a unique optimal solution to the constrained optimal control problem (2.14). Moreover, the following stability estimates hold by Brezzi's theorem [41]

$$\begin{aligned} ||\{\mathbf {u}, p, \mathbf {f}\}||_{\mathcal {V}\times \mathcal {Q} \times \mathcal {G}} \le \alpha _1 ||\{\mathbf {u}_d, p_d\}||_{ \mathcal {L} \times \mathcal {Q}} + \beta _1||\mathbf {h}||_{\mathcal {H}} \end{aligned}$$
(2.18)

and

$$\begin{aligned} ||\{\mathbf {u}^a, p^a\}||_{\mathcal {V}\times \mathcal {Q}} \le \alpha _2 ||\{\mathbf {u}_d, p_d\}||_{ \mathcal {L} \times \mathcal {Q}} + \beta _2 ||\mathbf {h}||_{\mathcal {H}} \end{aligned}$$
(2.19)

where the constants \(\alpha _1, \beta _1, \alpha _2, \beta _2\) are positive and depend on the continuity and inf-sup constants of the bilinear forms \(\mathcal {A}\) and \(\mathcal {B}\), see the appendix.

3 Stochastic regularity

In this section, we show that under suitable assumptions on the regularity of the viscosity \(\nu :\Gamma \rightarrow \mathbb {R}_+\) and of the boundary data \(\mathbf {h}:\Gamma \rightarrow H\) in the stochastic space \(\Gamma \), the solution \(\{\mathbf {u},p,\mathbf {f},\mathbf {u}^a,p^a\}:\Gamma \rightarrow V\times Q \times G\times V \times Q\) can be analytically extended to a complex region that covers the stochastic space \(\Gamma \). Here and in the following, we denote by L, V, Q, G, H the deterministic Hilbert spaces corresponding to the stochastic counterparts \(\mathcal {L}, \mathcal {V}, \mathcal {Q}, \mathcal {G}, \mathcal {H}\), e.g. \(H = L^{2,d}(\partial D_N)\). The norms of these Hilbert spaces are defined as in the last section, except for V, whose norm is defined as \(||\mathbf {v}||_V := \left( (\nu (\bar{y})\nabla \mathbf {v}, \nabla \mathbf {v}) + (\nu (\bar{y})\mathbf {v}, \mathbf {v})\right) ^{1/2} \; \forall \mathbf {v} \in V\) at a reference value \(\bar{y} \in \Gamma \), e.g. the center of \(\Gamma \).

Let \(\mathbf {k} = (k_1, \dots , k_N) \in \mathbb {N}_0^N\) be an N-dimensional multi-index of non-negative integers, with \(\mathbf {k}! = k_1!\cdots k_N!\), \(|\mathbf {k}| = \sum _{n=1}^N k_n\), \(|\mathbf {k}|! = \prod _{i=1}^{|\mathbf {k}|} i\), and \(\mathbf {r}^{\mathbf {k}} := \prod _{n=1}^N r_n^{k_n}\) for any \(\mathbf {r} = (r_1, \dots , r_N)\in \mathbb {R}^N_+\); let \(\partial _y^{\mathbf {k}}\{\cdot \} = \partial _{y_1}^{k_1} \partial _{y_2}^{k_2}\cdots \partial _{y_N}^{k_N}\{\cdot \}\) represent the \(\mathbf {k}\)-th order partial derivative with respect to the parameter \(y = (y_1, \dots , y_N)\). Let us also define the following constants for ease of notation

$$\begin{aligned} C_{\alpha } = \alpha _1+\alpha _2, C_{\beta } = \beta _1 + \beta _2, C_{\alpha ,\beta } = \max \{\alpha _1+\alpha _2, \beta _1+\beta _2\}, \end{aligned}$$
(3.1)

where \(\alpha _1, \alpha _2, \beta _1, \beta _2\) are the stability constants in (2.18) and (2.19).

We make the following assumption of stochastic regularity on the input data:

Assumption 3

The viscosity \(\nu \in C^{\infty }(\Gamma )\) and the boundary condition \(\mathbf {h} \in C^{\infty }(\Gamma ,H)\); moreover, for every \(y \in \Gamma \) there exists an N-dimensional positive rate vector \(\mathbf {r}= (r_1, \dots , r_N)\in \mathbb {R}^N_+\) such that, for any \(\mathbf {k}\in \mathbb {N}_0^N\), the \(\mathbf {k}\)-th order derivatives satisfy

$$\begin{aligned} C_{\alpha ,\beta } \frac{|\partial _y^{\mathbf {k}}\nu (y)|}{\nu (\bar{y})} \le |\mathbf {k}|!\mathbf {r}^{\mathbf {k}} \text { and } \frac{C_{\beta } ||\partial _y^{\mathbf {k}}\mathbf {h}(y)||_H}{C_{\alpha } ||\{\mathbf {u}_d,p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H} \le |\mathbf {k}|! \mathbf {r}^{\mathbf {k}}. \end{aligned}$$
(3.2)

Remark 3.1

Assumption 3 provides a bound for the growth of the derivatives of the stochastic data, where \(\mathbf {r}\) is closely related to the complex region of analytic extension of the solution, see more details in the following theorems.

Theorem 3.1

Under Assumption 3, the solution \(\{\mathbf {u},p,\mathbf {f},\mathbf {u}^a,p^a\} \in C^{\infty }(\Gamma , V\times Q \times G\times V \times Q)\), and for any \(y\in \Gamma \) its \(\mathbf {k}\)-th order derivative satisfies

$$\begin{aligned} ||\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q} \le C \left( C_{\alpha } ||\{\mathbf {u}_d,p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) |\mathbf {k}|! \, (\hat{r}\mathbf {r})^{\mathbf {k}}, \end{aligned}$$
(3.3)

where \(\hat{r}\mathbf {r} = (\hat{r} r_1, \hat{r} r_2, \ldots , \hat{r} r_N)\) with \(\hat{r} > 1/\log (2)\), and \(C = \max (2,1/(2-e^{1/\hat{r}}))\).

Proof

The semi-weak formulation of the saddle point problem (2.17) reads: find \(\{\mathbf {u}(y),p(y),\mathbf {f}(y)\} \in V \times Q \times G\) and \(\{\mathbf {u}^a(y),p^a(y)\} \in V \times Q\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \mathcal {A}\left( \{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v}^a,q^a,\mathbf {g}\}\right) + \mathcal {B}(\{\mathbf {v}^a,q^a,\mathbf {g}\}, \{\mathbf {u}^a(y),p^a(y)\};y) \\ \quad = (\{\mathbf {u}_d,p_d,\mathbf {0}\},\{\mathbf {v}^a,q^a,\mathbf {g}\}) \quad \forall \{\mathbf {v}^a,q^a,\mathbf {g}\} \in V \times Q \times G, \\ \mathcal {B}(\{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v},q\};y) = (\mathbf {h}(y),\mathbf {v})_{\partial D_N} \quad \forall \{\mathbf {v},q\} \in V \times Q. \end{array} \right. \end{aligned}$$
(3.4)

With a slight abuse of notation, we have used the same symbols \(\mathcal {A}\) and \(\mathcal {B}\) for the bilinear forms in the semi-weak formulation. Note that the bilinear form \(\mathcal {B}\) now depends on the parameter y: in analogy with (2.16), \(\mathcal {B}(\{\mathbf {u}, p, \mathbf {f}\}, \{\mathbf {v},q\};y) = a(\mathbf {u}, \mathbf {v};y) + b(\mathbf {v}, p) + b(\mathbf {u}, q) - (\mathbf {f}, \mathbf {v})\), where the semi-weak bilinear form \(a(\mathbf {u}, \mathbf {v};y) = \int _D\nu (y)\nabla \mathbf {u}\otimes \nabla \mathbf {v} dx\) corresponds to the full-weak bilinear form (2.6). To prove the estimate (3.3) for a general \(\mathbf {k}\in \mathbb {N}^N_0\), we adopt an induction argument.

Step 1. To start, we consider the case \(|\mathbf {k}| = 0\). Application of Brezzi's theorem to the semi-weak problem (3.4) yields the existence of a unique solution \(\{\mathbf {u}(y),p(y),\mathbf {f}(y)\} \in V\times Q\times G\) and \(\{\mathbf {u}^a(y),p^a(y)\} \in V \times Q\) satisfying the following estimates, which correspond to (2.18) and (2.19) for the weak problem (2.17):

$$\begin{aligned} \left\{ \begin{aligned}&||\{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} \le \alpha _1 ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + \beta _1 ||\mathbf {h}(y)||_H, \\&||\{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q} \le \alpha _2 ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + \beta _2 ||\mathbf {h}(y)||_H. \end{aligned} \right. \end{aligned}$$
(3.5)

Adding the second inequality of (3.5) to the first one, we find

$$\begin{aligned}&||\{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q}\nonumber \\&\quad \le C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H, \end{aligned}$$
(3.6)

which verifies the estimate (3.3) by noting that \(|\mathbf {k}|! = 1\), \((\hat{r}\mathbf {r})^{\mathbf {k}} = 1\), and \(1 < C\). Moreover, \(\{\mathbf {u},p,\mathbf {f},\mathbf {u}^a,p^a\} \in C^0(\Gamma , V\times Q \times G\times V \times Q)\) as a consequence of (3.6) and Assumption 3, where \(\mathbf {h} \in C^{\infty }(\Gamma , H)\).

Step 2. When \(|\mathbf {k}| = 1\), i.e., there exists \(n, 1\le n \le N\) such that \(k_n = 1\) and \(k_{n^*} = 0\) for all \(n^*\ne n, 1\le n^* \le N\), we take the partial derivative \(\partial _{y_n}\) on both sides of (3.4), yielding

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \mathcal {A}\left( \partial _{y_n}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v}^a,q^a,\mathbf {g}\}\right) + \mathcal {B}(\{\mathbf {v}^a,q^a,\mathbf {g}\}, \partial _{y_n} \{\mathbf {u}^a(y),p^a(y)\};y) \\ \displaystyle \quad = - (\partial _{y_n}\nu (y) \nabla \mathbf {u}^a(y), \nabla \mathbf {v}^a) \quad \forall \{\mathbf {v}^a,q^a,\mathbf {g}\} \in V \times Q \times G, \\ \displaystyle \mathcal {B}(\partial _{y_n}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v},q\};y) \\ \displaystyle \quad = (\partial _{y_n}\mathbf {h}(y),\mathbf {v})_{\partial D_N} - (\partial _{y_n} \nu (y) \nabla \mathbf {u}(y), \nabla \mathbf {v}) \quad \forall \{\mathbf {v},q\} \in V \times Q. \end{array} \right. \end{aligned}$$
(3.7)

In order to prove the existence of \(\partial _{y_n}\{\mathbf {u}(y),p(y),\mathbf {f}(y), \mathbf {u}^a(y),p^a(y)\}\) as a solution to the problem (3.7), we adopt the difference quotient approach [17]. Let us consider the semi-weak problem (3.4) at \(y+\tau e_n\), where \(e_n\) is an N-dimensional vector whose n-th entry is one and all other entries are zero, and \(\tau \) is a positive parameter small enough that Assumption 1 holds with \(\nu _{\min }\) and \(\nu _{\max }\) replaced by \(\nu _{\min }/2\) and \(2\nu _{\max }\), respectively. By Brezzi's theorem, this problem has a unique solution \(\{\mathbf {u}(y+\tau e_n),p(y+\tau e_n),\mathbf {f}(y+\tau e_n), \mathbf {u}^a(y+\tau e_n), p^a(y+\tau e_n)\} \in V\times Q \times G\times V \times Q\). By subtracting (3.4) at y from (3.4) at \(y+\tau e_n\), dividing both sides by \(\tau \), and suitably rearranging terms, we obtain

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \mathcal {A}\left( \{\mathbf {u}_{\tau }(y),p_{\tau }(y),\mathbf {f}_{\tau }(y)\},\{\mathbf {v}^a,q^a,\mathbf {g}\}\right) + \mathcal {B}(\{\mathbf {v}^a,q^a,\mathbf {g}\}, \{\mathbf {u}_{\tau }^a(y),p_{\tau }^a(y)\};y) \\ \displaystyle \quad = - (\nu _{\tau }(y) \nabla \mathbf {u}^a(y), \nabla \mathbf {v}^a) \quad \forall \{\mathbf {v}^a,q^a,\mathbf {g}\} \in V \times Q \times G, \\ \displaystyle \mathcal {B}(\{\mathbf {u}_{\tau }(y),p_{\tau }(y),\mathbf {f}_{\tau }(y)\},\{\mathbf {v},q\};y) \\ \displaystyle \quad = (\mathbf {h}_{\tau }(y),\mathbf {v})_{\partial D_N} - (\nu _{\tau }(y) \nabla \mathbf {u}(y), \nabla \mathbf {v}) \quad \forall \{\mathbf {v},q\} \in V \times Q, \end{array} \right. \end{aligned}$$
(3.8)

where \(\mathbf {u}_{\tau }(y) := (\mathbf {u}(y+\tau e_n) - \mathbf {u}(y))/\tau \), and \(p_{\tau }(y)\), \(\mathbf {f}_{\tau }(y)\), \(\mathbf {u}_{\tau }^a(y)\), and \(p_{\tau }^a(y)\) are defined in the same way. Note that since the test functions on the right-hand side appear as \(\nabla \mathbf {v}^a\) and \(\nabla \mathbf {v}\), Brezzi's theorem cannot be applied directly to problem (3.8). Instead, a variant of Brezzi's theorem can be employed to guarantee the existence of a unique solution satisfying the following estimates

$$\begin{aligned}&||\{\mathbf {u}_{\tau }(y), p_{\tau }(y), \mathbf {f}_{\tau }(y)\}||_{V\times Q \times G} + ||\{\mathbf {u}^a_{\tau }(y), p^a_{\tau }(y)\}||_{V\times Q} \nonumber \\&\quad \le C_{\alpha } \frac{|\nu _{\tau }(y)|}{\nu (\bar{y})} ||\mathbf {u}^a(y)||_V + C_{\beta } \left( ||\mathbf {h}_{\tau }(y)||_H + \frac{|\nu _{\tau }(y)|}{\nu (\bar{y})} ||\mathbf {u}(y)||_V \right) \nonumber \\&\quad \le C_{\beta } ||\mathbf {h}_{\tau }(y)||_H + C_{\alpha ,\beta } \frac{|\nu _{\tau }(y)|}{\nu (\bar{y})}\left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H \right) , \end{aligned}$$
(3.9)

where \(C_{\alpha } \), \(C_{\beta }\), and \(C_{\alpha ,\beta }\) are defined in (3.1), and we have used (3.6) for the second inequality. We defer the proof of this result to the appendix. Thanks to Assumption 3, by Taylor expansion of \(\nu (y+\tau e_n)\) we find

$$\begin{aligned} C_{\alpha ,\beta }\frac{|\nu _{\tau }(y)|}{\nu (\bar{y})} \le \sum _{k=2} ^{\infty } \tau ^{k-1} \frac{1}{k!} \left( C_{\alpha ,\beta } \frac{\left| \partial _{y_n}^k \nu (y)\right| }{\nu (\bar{y})} \right) \le r_n \sum _{k=3}^{\infty } (\tau r_n)^k, \end{aligned}$$
(3.10)

which is bounded when \(\tau \) is small enough that \(\tau r_n < 1\). Similarly, we have

$$\begin{aligned} C_{\beta } ||\mathbf {h}_{\tau }(y)||_H \le (C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H)r_n \sum _{k=3}^{\infty } (\tau r_n)^k, \end{aligned}$$
(3.11)

which is bounded when \(\tau r_n < 1\). Note also that \(\mathbf {h} \in C^{\infty }(\Gamma , H)\) is differentiable, so that \(\mathbf {h}_{\tau }(y) \rightarrow \partial _{y_n}\mathbf {h}(y)\) as \(\tau \rightarrow 0\). By taking \(\tau \rightarrow 0\) in (3.8), we recover the same right-hand side as in (3.7), and the existence of a solution \(\{\mathbf {u}_{0}(y), p_{0}(y), \mathbf {f}_{0}(y)\} \in V\times Q\times G\) and \(\{\mathbf {u}^a_{0}(y), p^a_{0}(y) \} \in V\times Q\) follows from the weak compactness of bounded sets in \(V\times Q\times G\) and \(V\times Q\). Moreover,

$$\begin{aligned}&||\{\mathbf {u}_{0}(y), p_{0}(y), \mathbf {f}_{0}(y)\}||_{V\times Q \times G} + ||\{\mathbf {u}^a_{0}(y), p^a_{0}(y)\}||_{V\times Q} \nonumber \\&\quad \le C_{\beta } ||\partial _{y_n}\mathbf {h}(y)||_H + C_{\alpha ,\beta } \frac{|\partial _{y_n} \nu (y)|}{\nu (\bar{y})}\left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H \right) \nonumber \\&\quad \le 2(C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H) r_n. \end{aligned}$$
(3.12)

Since the solution of (3.7) (or (3.8) when \(\tau \rightarrow 0\)) is unique, we find the equivalence \(\partial _y^{\mathbf {k}} \{\mathbf {u}(y), p(y), \mathbf {f}(y),\mathbf {u}^a(y),p^a(y)\} = \{\mathbf {u}_{0}(y), p_{0}(y), \mathbf {f}_{0}(y),\mathbf {u}^a_{0}(y), p^a_{0}(y)\}\). Therefore, the estimate (3.3) holds by noting that \(|\mathbf {k}|! = 1\), \(r_n \le \hat{r} r_n = (\hat{r}\mathbf {r})^{\mathbf {k}}\) and \(2 \le C\). Analogously, the continuity \(\mathbf {h} \in C^{\infty }(\Gamma , H)\) implies \(\{\mathbf {u},p,\mathbf {f},\mathbf {u}^a,p^a\} \in C^1(\Gamma , V\times Q \times G\times V \times Q)\).

Step 3. As for more general \(\mathbf {k}\) with \(|\mathbf {k}|>1\), we assume by induction that there exist \(\partial _y^{\mathbf {k}'} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\} \in V\times Q\times G\) and \(\partial _y^{\mathbf {k}'} \{\mathbf {u}^a(y),p^a(y)\} \in V \times Q\) such that the estimate (3.3) holds for every \(\mathbf {k}' \in \Lambda (\mathbf {k})\) with \(\mathbf {k}'\ne \mathbf {k}\), where the multivariate index set \(\Lambda (\mathbf {k})\) is defined as \( \Lambda (\mathbf {k}) := \left\{ \mathbf {k}'\in \mathbb {N}^N_0: k'_n \le k_n, \forall 1 \le n \le N \right\} . \) Let us associate the multi-index \(\mathbf {k}\) with a set of indices \(\mathcal {K}\); e.g., if \(\mathbf {k} = (2,1,0)\), then \(\mathcal {K} = \{1_1,1_2,2_1\}\), i.e. there are two indices for the first dimension (represented by \(1_1\) and \(1_2\)) and one index for the second dimension (represented by \(2_1\)). Let \(\mathcal {P}(\mathcal {K})\) denote the power set of \(\mathcal {K}\). We define a map \(\mathcal {M}: \mathcal {P}(\mathcal {K}) \rightarrow \Lambda (\mathbf {k})\) such that \(\mathcal {M}(\mathcal {S}) = \mathbf {s}\), where \(\mathcal {S}\) is the set of indices associated with the multi-index \(\mathbf {s}\). Taking the \(\mathbf {k}\)-th order partial derivative of problem (3.4) with respect to the parameter y leads, thanks to the general Leibniz rule, to the following problem: find \(\partial _y^{\mathbf {k}}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\} \in V \times Q \times G\) and \(\partial _y^{\mathbf {k}}\{\mathbf {u}^a(y),p^a(y)\} \in V \times Q\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \mathcal {A}\left( \partial _y^{\mathbf {k}}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v}^a,q^a,\mathbf {g}\}\right) + \mathcal {B}(\{\mathbf {v}^a,q^a,\mathbf {g}\}, \partial _y^{\mathbf {k}}\{\mathbf {u}^a(y),p^a(y)\};y) \\ \displaystyle \quad = -\sum _{\mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K}, \mathbf {k}' = \mathcal {M}(\mathcal {S})}(\partial _y^{\mathbf {k}-\mathbf {k}'}\nu (y)\nabla \partial _y^{\mathbf {k}'}\mathbf {u}^a(y),\nabla \mathbf {v}^a) \quad \forall \{\mathbf {v}^a,q^a,\mathbf {g}\} \in V \times Q \times G, \\ \displaystyle \mathcal {B}(\partial _y^{\mathbf {k}}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v},q\};y) = (\partial _y^{\mathbf {k}}\mathbf {h}(y),\mathbf {v})_{\partial D_N} \\ \displaystyle \quad -\sum _{\mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K}, \mathbf {k}' = \mathcal {M}(\mathcal {S})}(\partial _y^{\mathbf {k}-\mathbf {k}'}\nu (y)\nabla \partial _y^{\mathbf {k}'} \mathbf {u}(y),\nabla \mathbf {v})\quad \forall \{\mathbf {v},q\} \in V \times Q. \end{array} \right. \end{aligned}$$
(3.13)

By applying the variant of Brezzi's theorem stated in the appendix to (3.13), we have

$$\begin{aligned}&||\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q} \le C_{\beta } ||\partial _y^{\mathbf {k}} \mathbf {h}(y)||_H + C_{\alpha ,\beta }\nonumber \\&\sum _{\begin{array}{c} \mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K} \\ \mathbf {k}' = \mathcal {M}(\mathcal {S}) \end{array}}\frac{|\partial _y^{\mathbf {k}-\mathbf {k}'}\nu (y)|}{\nu (\bar{y})} \left( ||\partial _y^{\mathbf {k}'} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G}+||\partial _y^{\mathbf {k}'}\{\mathbf {u}^a(y),p^a(y)\}||_{V\times Q}\right) . \end{aligned}$$
(3.14)

The existence of \(\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\} \in V\times Q\times G\) and \(\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\} \in V\times Q\) can be proved by the same difference quotient argument as in Step 2. As for the estimate (3.3), we first prove the following auxiliary estimate

$$\begin{aligned}&||\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q} \nonumber \\&\quad \le \left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) s(k) \mathbf {r}^{\mathbf {k}}, \end{aligned}$$
(3.15)

where \(k := |\mathbf {k}|=k_1 + \cdots +k_N\), and s(k) depends on k according to the following recursive formula,

$$\begin{aligned} s(0) = 1, s(1) = 2, s(k) = k! + \sum _{k'=0}^{k-1} \left( \begin{array}{c} k\\ k' \end{array} \right) s(k'). \end{aligned}$$
(3.16)

In fact, (3.15) holds for \(|\mathbf {k}|=0\) and \(|\mathbf {k}|=1\) due to (3.6) and (3.12). By induction, we assume that the estimate (3.15) holds for every \(\mathbf {k}'\in \Lambda (\mathbf {k})\) and \(\mathbf {k}'\ne \mathbf {k}\), so that (3.14) implies

$$\begin{aligned}&||\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q} \le C_{\beta } ||\partial _y^{\mathbf {k}} \mathbf {h}(y)||_H \nonumber \\&\qquad + C_{\alpha ,\beta } \sum _{\mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K}, \mathbf {k}' = \mathcal {M}(\mathcal {S})} \frac{|\partial _y^{\mathbf {k}-\mathbf {k}'}\nu (y)|}{\nu (\bar{y})} \left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) s(k') \mathbf {r}^{\mathbf {k}'} \nonumber \\&\quad \le \left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) \left( k! \mathbf {r}^{\mathbf {k}} + \sum _{\mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K}, \mathbf {k}' = \mathcal {M}(\mathcal {S})} \mathbf {r}^{\mathbf {k}-\mathbf {k}'} s(k')\mathbf {r}^{\mathbf {k'}}\right) \nonumber \\&\quad = \left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) \left( k!+ \sum _{k'=0}^{k-1} \left( \begin{array}{c} k\\ k' \end{array} \right) s(k') \right) \mathbf {r}^{\mathbf {k}}\nonumber \\&\quad =\left( C_{\alpha } ||\{\mathbf {u}_d, p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H\right) s(k) \mathbf {r}^{\mathbf {k}}, \end{aligned}$$
(3.17)

where we have used the Assumption 3 for the second inequality, the fact that \(\mathbf {r}^{\mathbf {k}} = \mathbf {r}^{\mathbf {k}-\mathbf {k}'} \mathbf {r}^{\mathbf {k}'}\) for any \(\mathbf {k}'\in \Lambda (\mathbf {k})\), and the following relation

$$\begin{aligned} \sum _{\mathcal {S}\in \mathcal {P}(\mathcal {K})\setminus \mathcal {K}, \mathbf {k}' = \mathcal {M}(\mathcal {S})} s({k'}) = \sum _{k'=0}^{k-1} \left( \begin{array}{c} k\\ k' \end{array} \right) s(k'). \end{aligned}$$
(3.18)

It remains to establish a suitable bound for s(k) in order to deduce the estimate (3.3) from the estimate (3.15). Let us define \(t(k) = s(k)/k!\), so that from (3.16) we have

$$\begin{aligned} t(k) = \frac{1}{k!} \left( k! + \sum _{k'=0}^{k-1} \frac{k!}{(k-k')!} \frac{s(k')}{k'!}\right) = 1 + \sum _{k'=0}^{k-1} \frac{t(k')}{(k-k')!} . \end{aligned}$$
(3.19)

We show by induction that when \(\hat{r} > 1/\log (2)\) and \(c_r \ge 1/(2-e^{1/\hat{r}})\), then \(t(k) \le c_r \hat{r}^k\) for all \(k =0, 1, \dots \). In fact, when \(k = 0\), \(t(0) = s(0)/0! = 1 < c_r \); given any \(k > 0\), suppose it holds for all \(k' < k\), then (3.19) yields

$$\begin{aligned} t(k) -1 = \sum _{k'=0}^{k-1} \frac{t(k')}{(k-k')!} = \sum _{k'=1}^{k} \frac{t(k-k')}{k'!} \le c_r\hat{r}^k \sum _{k'=1}^{k} \frac{\hat{r}^{-k'}}{k'!} \le c_r \hat{r}^k \left( e^{\frac{1}{\hat{r}}}-1\right) . \end{aligned}$$
(3.20)

Since \(c_r \hat{r}^k \left( e^{1/{\hat{r}}}-1\right) + 1\le c_r \hat{r}^k\) when \(\hat{r} > 1/\log (2)\) and \(c_r \ge 1/(2-e^{1/\hat{r}})\), we conclude that \(t(k) \le c_r \hat{r}^k\) for any \(k = 0, 1, \ldots \). Therefore, \(s(k) = t(k)k! \le c_r \hat{r}^k k!\), implying that

$$\begin{aligned} s(k) \le c_r \hat{r}^{k}k! = c_r \mathbf {r}_{\hat{r}}^{\mathbf {k}}k!, \end{aligned}$$
(3.21)

where \(\mathbf {r}_{\hat{r}} = (\hat{r}, \dots , \hat{r})\) is the N-dimensional constant rate vector. The proof is concluded by substituting (3.21) into (3.15), noting that \(\mathbf {r}_{\hat{r}}^{\mathbf {k}}\mathbf {r}^{\mathbf {k}} = (\hat{r}\mathbf {r})^{\mathbf {k}}\) and \(c_r \le C \) in (3.3). Finally, the continuity of the \(\mathbf {k}\)-th derivative of the solution follows from that of the input data, as in Step 1 and Step 2. \(\square \)
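The recursion (3.16) and the resulting bound (3.21) can also be checked numerically; the short script below (our own verification, with the illustrative choice \(\hat{r} = 1.5 > 1/\log (2)\)) confirms \(s(k) \le c_r \hat{r}^k k!\) for the first few values of k:

```python
import math

# Numerical check of the bound (3.21): s(k) <= c_r * rhat**k * k!, with s
# defined by the recursion (3.16), rhat > 1/log(2), and c_r = 1/(2 - e**(1/rhat)).
rhat = 1.5                                  # any value > 1/log(2) ~ 1.4427
c_r = 1.0 / (2.0 - math.exp(1.0 / rhat))

s = [1, 2]                                  # s(0) = 1, s(1) = 2
for k in range(2, 15):
    s.append(math.factorial(k)
             + sum(math.comb(k, kp) * s[kp] for kp in range(k)))

for k, sk in enumerate(s):
    bound = c_r * rhat**k * math.factorial(k)
    assert sk <= bound, (k, sk, bound)
print("bound (3.21) verified for k = 0..14")
```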

Let us define a complex region associated with the stability estimate (3.3) as

$$\begin{aligned} \Sigma := \left\{ z \in \mathbb {C}^N: \exists y \in \Gamma \text { such that } \sum _{n=1}^N \hat{r}r_n|z_n-y_n| < 1\right\} . \end{aligned}$$
(3.22)

Then the solution not only has bounded partial derivatives but can also be analytically extended to the complex region \(\Sigma \), as stated in the following theorem:

Theorem 3.2

Under Assumption 3, the solution of the semi-weak saddle point problem (3.4) admits an analytic extension to the region \(\Sigma \) defined in (3.22).

Proof

Given any \(y\in \Gamma \), the Taylor expansion about y of the solution \(\{\mathbf {u},p,\mathbf {f}\}:\Gamma \rightarrow V\times Q\times G\) and \(\{\mathbf {u}^a,p^a\}:\Gamma \rightarrow V\times Q\) of problem (3.4) reads

$$\begin{aligned} \{\mathbf {u}(z),p(z),\mathbf {f}(z)\} = \sum _{\mathbf {k}\in \mathbb {N}^N_0} \frac{\partial _y^{\mathbf {k}}\{\mathbf {u}(y),p(y),\mathbf {f}(y)\}}{\mathbf {k}!}(z-y)^{\mathbf {k}} \end{aligned}$$
(3.23)

and

$$\begin{aligned} \{\mathbf {u}^a(z),p^a(z)\} = \sum _{\mathbf {k}\in \mathbb {N}^N_0} \frac{\partial _y^{\mathbf {k}}\{\mathbf {u}^a(y),p^a(y)\}}{\mathbf {k}!}(z-y)^{\mathbf {k}}, \end{aligned}$$
(3.24)

where \((z-y)^{\mathbf {k}} = \prod _{n=1}^N (z_n-y_n)^{k_n}\). By Theorem 3.1, we have

$$\begin{aligned}&||\{\mathbf {u}(z),p(z),\mathbf {f}(z)\}||_{V\times Q\times G} + ||\{\mathbf {u}^a(z),p^a(z)\}||_{V\times Q} \nonumber \\&\quad \le \sum _{\mathbf {k} \in \mathbb {N}^N_0} \left( ||\partial _y^{\mathbf {k}} \{\mathbf {u}(y),p(y),\mathbf {f}(y)\}||_{V \times Q \times G} + ||\partial _y^{\mathbf {k}} \{\mathbf {u}^a(y),p^a(y)\}||_{V \times Q}\right) \frac{|z-y|^{\mathbf {k}}}{\mathbf {k}!}\nonumber \\&\quad \le C (C_{\alpha } ||\{\mathbf {u}_d,p_d\}||_{L\times Q} + C_{\beta } ||\mathbf {h}(y)||_H ) \sum _{\mathbf {k} \in \mathbb {N}^N_0} |\mathbf {k}|! (\hat{r}\mathbf {r})^{\mathbf {k}} \frac{|z-y|^{\mathbf {k}}}{\mathbf {k}!}. \end{aligned}$$
(3.25)

Upon reordering, we have

$$\begin{aligned} \sum _{\mathbf {k} \in \mathbb {N}^N_0} |\mathbf {k}|! (\hat{r}\mathbf {r})^{\mathbf {k}} \frac{|z-y|^{\mathbf {k}}}{\mathbf {k}!} = \sum _{k=0}^{\infty } \sum _{|\mathbf {k}| = k} \frac{k!}{\mathbf {k}!} \prod _{n=1}^N(\hat{r}r_n|z_n-y_n|)^{k_n}. \end{aligned}$$
(3.26)

By the multinomial theorem, we have

$$\begin{aligned} \sum _{k=0}^{\infty } \sum _{|\mathbf {k}| = k} \frac{k!}{\mathbf {k}!} \prod _{n=1}^N(\hat{r}r_n|z_n-y_n|)^{k_n} = \sum _{k=0}^{\infty } \left( \sum _{n=1}^N \hat{r}r_n|z_n-y_n| \right) ^k, \end{aligned}$$
(3.27)

which converges in the disk \(\mathcal {D}(y) = \{z\in \mathbb {C}^N:\sum _{n=1}^N \hat{r}r_n|z_n-y_n| < 1\}\). Therefore, the Taylor expansions (3.23) and (3.24) converge to \(\{\mathbf {u}(z),p(z),\mathbf {f}(z)\}\) and \(\{\mathbf {u}^a(z),p^a(z)\}\), respectively, in \(\mathcal {D}(y)\). Since this holds for any \(y\in \Gamma \), we conclude that the solution of problem (3.4) can be analytically extended to \(\Sigma \). \(\square \)

4 Numerical approximation

In order to solve the constrained optimization problem (2.14), we introduce a numerical approximation of the equivalent saddle point problem (2.17) in the probability domain \(\Gamma \) by a stochastic collocation method and in the physical domain D by a finite element method.

4.1 Stochastic collocation method

For stochastic problems whose solution is smooth in the probability space, the stochastic collocation method based on sparse grid techniques [2, 38, 39, 53] combines the fast convergence of the stochastic Galerkin method with the non-intrusive structure of the Monte Carlo method. This makes it an efficient method for solving stochastic optimal control problems [11, 31, 50].

Let X denote a general Hilbert space defined in the physical domain D, e.g. \(H^1(D)\). Let \(C(\Gamma ; X)\) be the space of continuous functions with values in X, i.e.

$$\begin{aligned} C(\Gamma ; X):=\left\{ v:\Gamma \rightarrow X : v \text { is continuous and } \max _{y\in \Gamma } ||v(y)||_{X} < \infty \right\} . \end{aligned}$$
(4.1)

Let \(\mathcal {P}_m(\Gamma )\) be the space of polynomials of degree at most m in each coordinate \(y_n, 1 \le n \le N\). Let \(\mathcal {U}^{i_n}:C(\Gamma ;X)\rightarrow \mathcal {P}_{m(i_n)-1}(\Gamma _n)\otimes X\) denote the one-dimensional Lagrangian interpolation operator based on the set of collocation nodes \(\Theta _n^{i_n} = \{y_n^{1}, \dots , y_n^{m(i_n)}\}\), \(1\le n \le N\), defined as

$$\begin{aligned} \mathcal {U}^{i_n} v(y_n) = \sum _{j_n=1}^{m(i_n)} v(y_n^{j_n}) l^{j_n}_n (y_n), \text { with } l^{j_n}_n (y_n) = \prod _{1 \le k \le m(i_n):k\ne j_n} \frac{y_n - y_n^k}{y_n^{j_n} - y_n^k}, \end{aligned}$$
(4.2)

where m(k) depends on the choice of collocation nodes, e.g. \(m(k) = 1\) for \(k = 1\) and \(m(k) = 2^{k-1}+1\) for \(k>1\) [39]. We define the sparse grid Smolyak formula \(\mathcal {S}_q:C(\Gamma ;X)\rightarrow \mathcal {P}_{m(q-N+1)-1}(\Gamma )\otimes X\) as [39]

$$\begin{aligned} {\mathcal {S}}_q v(y) = \sum _{{q-N+1}\le |\mathbf {i}|\le q} (-1)^{q-|\mathbf {i}|} \left( \begin{array}{c} N-1\\ q-|\mathbf {i}| \end{array} \right) \mathcal {I}_{\mathbf {i}}v(y), \quad q = N, N+1, \ldots \end{aligned}$$
(4.3)

where \(\mathbf {i} = (i_1, \ldots , i_N)\in \mathbb {N}^N_+\) with \(|\mathbf {i}| = i_1 + \cdots + i_N\) is a multi-index. The tensor-product interpolation operator \(\mathcal {I}_{\mathbf {i}}: C(\Gamma ;X) \rightarrow \mathcal {P}_{m(\mathbf {i})-1}(\Gamma )\otimes X\) is defined on the set of collocation nodes \(\Theta ^{\mathbf {i}} = \Theta _1^{i_1}\times \cdots \times \Theta _N^{i_N}\) as

$$\begin{aligned} \mathcal {I}_{\mathbf {i}}v(y) = (\mathcal {U}^{i_1}\otimes \cdots \otimes \mathcal {U}^{i_N}) v(y) = \sum _{j_1=1}^{m(i_1)}\cdots \sum _{j_N=1}^{m(i_N)} v(y_1^{j_1}, \ldots , y_N^{j_N})\bigotimes _{n=1}^Nl_n^{j_n}(y_n). \end{aligned}$$
(4.4)
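To fix ideas, here is a minimal sketch of the building blocks (4.2) and (4.4), assuming Clenshaw-Curtis abscissas on \([-1,1]\) and the doubling rule for m stated above; the helper names and the two-dimensional test function are our own illustrative choices:

```python
import numpy as np
from itertools import product

def m(i):
    # Doubling rule for the number of nodes per level, cf. Sect. 4.1.
    return 1 if i == 1 else 2 ** (i - 1) + 1

def cc_nodes(i):
    # Clenshaw-Curtis abscissas on [-1, 1] (a common choice, cf. Remark 4.1).
    if i == 1:
        return np.array([0.0])
    n = m(i)
    return -np.cos(np.pi * np.arange(n) / (n - 1))

def lagrange(nodes, j, y):
    # One-dimensional Lagrange basis polynomial l^j as in (4.2).
    out = np.ones_like(np.asarray(y, dtype=float))
    for k, yk in enumerate(nodes):
        if k != j:
            out *= (y - yk) / (nodes[j] - yk)
    return out

def tensor_interp(v, ivec, y):
    # Tensor-product interpolation I_i v(y) as in (4.4), for y in [-1, 1]^N.
    grids = [cc_nodes(i) for i in ivec]
    val = 0.0
    for jvec in product(*[range(len(g)) for g in grids]):
        node = np.array([g[j] for g, j in zip(grids, jvec)])
        weight = np.prod([lagrange(g, j, yy)
                          for g, j, yy in zip(grids, jvec, y)])
        val += v(node) * weight
    return val

# Illustrative test with a smooth function in N = 2 dimensions.
v = lambda y: np.exp(y[0]) * np.cos(y[1])
y = np.array([0.3, -0.7])
print(tensor_interp(v, (3, 3), y), v(y))   # close, since v is smooth
```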

With the interpolation operators defined in (4.3) and (4.4), we can approximate statistics of interest, e.g. the expectation, by

$$\begin{aligned} \mathbb {E}[{\mathcal {S}}_q v] = \sum _{{q-N+1}\le |\mathbf {i}|\le q} (-1)^{q-|\mathbf {i}|} \left( \begin{array}{c} N-1\\ q-|\mathbf {i}| \end{array} \right) \mathbb {E}[\mathcal {I}_{\mathbf {i}}v], \end{aligned}$$
(4.5)

where \(\mathbb {E}[\mathcal {I}_{\mathbf {i}}v]\) is defined as

$$\begin{aligned} \mathbb {E}[\mathcal {I}_{\mathbf {i}}v] = \sum _{j_1=1}^{m(i_1)}\cdots \sum _{j_N=1}^{m(i_N)} v(y_1^{j_1}, \ldots , y_N^{j_N}) \prod _{n=1}^N w_n^{j_n}, \end{aligned}$$
(4.6)

where the quadrature weights are given by

$$\begin{aligned} w_n^{j_n}= \int _{\Gamma _n} l_n^{j_n}(y_n) \rho _n(y_n)dy_n \quad 1\le j_n \le m(i_n), \; 1\le n \le N. \end{aligned}$$
(4.7)
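Continuing the sketch above, the weights (4.7) can be approximated by quadrature; here we assume a uniform density \(\rho _n = 1/2\) on \(\Gamma _n = [-1,1]\) (an illustrative choice) and reuse the cc_nodes and lagrange helpers from the previous listing:

```python
import numpy as np

def quad_weights(i):
    # Approximate (4.7): w_n^j = int l_n^j(y) rho_n(y) dy over Gamma_n = [-1, 1]
    # with rho_n = 1/2, via a simple Riemann sum on a dense uniform grid
    # (mean value times interval length).
    nodes = cc_nodes(i)
    y = np.linspace(-1.0, 1.0, 20_001)
    return np.array([np.mean(lagrange(nodes, j, y) * 0.5) * 2.0
                     for j in range(len(nodes))])

w = quad_weights(3)
print(w, w.sum())  # the weights sum to int rho_n = 1
```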

Remark 4.1

The accuracy of the stochastic collocation approximation depends on the choice of the collocation nodes. Among the most popular are Clenshaw-Curtis abscissas and Gauss abscissas of the orthogonal polynomials associated with the joint probability density function \(\rho \), e.g. Gauss-Jacobi abscissas for the beta density and Gauss-Hermite abscissas for the normal density; see [9, 39].

By using the difference operator \(\triangle ^{i_n} = \mathcal {U}^{i_n} - \mathcal {U}^{i_n-1}\), with \(\mathcal {U}^{0}=0\), we have an alternative representation of the sparse grid Smolyak formula (4.3) as follows

$$\begin{aligned} {\mathcal {S}}_q v(y) = \sum _{\mathbf {i}\in X(q,N)}(\triangle ^{i_1}\otimes \cdots \otimes \triangle ^{i_N}) v(y) \end{aligned}$$
(4.8)

with the multivariate index set defined as

$$\begin{aligned} X(q,N) :=\left\{ \mathbf {i}\in \mathbb {N}^N_+: \sum _{n=1}^N i_n \le q \right\} , \quad q = N, N+1, \ldots . \end{aligned}$$
(4.9)

Let \(H(q, N) := \{\Theta ^{\mathbf {i}}, \mathbf {i}\in X(q, N)\}\) denote the set of collocation nodes associated with the index set X(q, N); then \(H(q,N) \subset H(q+1,N) \subset \cdots \). The cardinality of H(q, N) grows exponentially with respect to the dimension of the problem [39, 53]. In tackling high-dimensional problems, each dimension may be given its appropriate relevance by applying the anisotropic sparse grid interpolation formula [38]

$$\begin{aligned} {\mathcal {S}}^{\varvec{\alpha }}_q v(y) = \sum _{\mathbf {i}\in X_{\varvec{\alpha }}(q,N)}(\triangle ^{i_1}\otimes \cdots \otimes \triangle ^{i_N}) v(y), \end{aligned}$$
(4.10)

where the anisotropic multivariate index set \(X_{\varvec{\alpha }}(q,N)\) is defined as

$$\begin{aligned} X_{\varvec{\alpha }}(q,N) :=\left\{ \mathbf {i}\in \mathbb {N}^N_+: \sum _{n=1}^N \alpha _n i_n \le \min _{1\le n \le N} \alpha _n q \right\} \!, \, q = N, N+1, \ldots . \end{aligned}$$
(4.11)

Here, \(\varvec{\alpha } = (\alpha _1, \dots , \alpha _N)\) is a positive multivariate weight, which can be obtained by a priori or a posteriori estimates [38], or by a suitable dimension-adaptive algorithm [20]. Similarly, we define the set of collocation nodes \(H_{\alpha }(q,N):=\{\Theta ^{\mathbf {i}}, \mathbf {i}\in X_{\varvec{\alpha }}(q,N)\}\). Note that the isotropic sparse grid interpolation (4.8) is the special case corresponding to \(\varvec{\alpha } = \mathbf {1}\). Evaluation of statistics based on the anisotropic sparse grid stochastic collocation method, e.g. the expectation, is straightforward via the following approximation

$$\begin{aligned} \mathbb {E}[\mathcal {S}^{\varvec{\alpha }}_q v] = \sum _{\mathbf {i}\in X_{\varvec{\alpha }}(q,N)} \mathbb {E}\left[ (\triangle ^{i_1}\otimes \cdots \otimes \triangle ^{i_N}) v\right] . \end{aligned}$$
(4.12)
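For concreteness, the anisotropic index set (4.11) can be enumerated directly; the sketch below (our own illustration, with arbitrary weights) shows how weighting one dimension more heavily prunes the indices that refine it, shrinking the sparse grid:

```python
from itertools import product

def X_alpha(q, N, alpha):
    # Anisotropic multivariate index set (4.11); alpha = (1, ..., 1) recovers
    # the isotropic set X(q, N) of (4.9).
    amin = min(alpha)
    return [i for i in product(range(1, q + 1), repeat=N)
            if sum(a * ii for a, ii in zip(alpha, i)) <= amin * q]

# Illustrative comparison for q = 6, N = 3: penalizing refinement of the
# third dimension reduces the number of multi-indices (hence of nodes).
iso = X_alpha(6, 3, (1.0, 1.0, 1.0))
aniso = X_alpha(6, 3, (1.0, 1.0, 2.0))
print(len(iso), len(aniso))   # e.g. 20 isotropic vs. 7 anisotropic indices
```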

4.2 Finite element method

Given a regular triangulation \(\mathcal {T}_h\) of the physical domain \(\bar{D}\subset \mathbb {R}^d\) with mesh size h, we define the finite element space [41]

$$\begin{aligned} X_h^k := \{v_h \in C^0(\bar{D}): v_h|_K \in \mathbb {P}_k \quad \forall K \in \mathcal {T}_h\}, \quad k \ge 1, \end{aligned}$$
(4.13)

where \(C^0(\bar{D})\) is the space of continuous functions in \(\bar{D}\) and \(\mathbb {P}_k\), \(k \ge 1\), is the space of polynomials of degree less than or equal to k in the variables \(x_1, \dots , x_d\). In particular, we define \(V^k_h := (X_h^k)^d\cap V\), \(Q^m_h := X_h^m\cap Q\), and \(G^l_h := (X_h^l)^d\cap G\) with \(k,m,l\ge 1\) as the finite element approximation spaces corresponding to the Hilbert spaces V, Q and G, respectively, defined in Sect. 3. The semi-weak finite element approximation of the saddle point problem (2.17) reads: for any \(y\in \Gamma \), find \(\{\mathbf {u}_h(y),p_h(y),\mathbf {f}_h(y)\} \in V^k_h \times Q^m_h \times G^l_h\) and \(\{\mathbf {u}_h^a(y),p_h^a(y)\} \in V^k_h \times Q^m_h\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \mathcal {A}\left( \{\mathbf {u}_h(y),p_h(y),\mathbf {f}_h(y)\},\{\mathbf {v}_h^a,q_h^a,\mathbf {g}_h\}\right) + \mathcal {B}(\{\mathbf {v}_h^a,q_h^a,\mathbf {g}_h\}, \{\mathbf {u}_h^a(y),p_h^a(y)\};y) \\ \quad = (\mathbf {u}_d,\mathbf {v}_h^a)+(p_d,q_h^a) \quad \forall \{\mathbf {v}_h^a,q_h^a,\mathbf {g}_h\} \in V^k_h \times Q^m_h \times G^l_h, \\ \mathcal {B}(\{\mathbf {u}_h(y),p_h(y),\mathbf {f}_h(y)\},\{\mathbf {v}_h,q_h\};y) = (\mathbf {h}(y),\mathbf {v}_h)_{\partial D_N} \quad \forall \{\mathbf {v}_h,q_h\} \in V^k_h \times Q^m_h. \end{array} \right. \end{aligned}$$
(4.14)

We use Taylor-Hood elements (\(m=k-1\), \(k\ge 2\)), among many feasible choices [41], which lead to a stable finite element approximation with optimal convergence rate. We set \(l=k\) for the control function space \(G^l_h\), so that \(V_h^k = G_h^l\).

Let the finite element solution of the saddle point problem (4.14) be written as

$$\begin{aligned} \mathbf {u}_h(y) = \sum _{n=1}^{N_v} u_n(y) \varvec{\psi }_n, \quad p_h(y) = \sum _{n=1}^{N_p} p_n(y) \varphi _n, \quad \mathbf {f}_h(y) = \sum _{n=1}^{N_v} f_n(y)\varvec{\psi }_n, \end{aligned}$$
(4.15)

and

$$\begin{aligned} \mathbf {u}^a_h(y) = \sum _{n=1}^{N_v} u^a_n(y) \varvec{\psi }_n, \quad p^a_h(y) = \sum _{n=1}^{N_p} p^a_n(y) \varphi _n, \end{aligned}$$
(4.16)

where \(\varvec{\psi }_n, 1\le n \le N_v\), and \(\varphi _n, 1\le n \le N_p\), denote the bases of the finite element spaces \(V_h^k\) and \(Q_h^m\), respectively. The finite element mass matrices \(M_{v,h}\) and \(M_{p,h}\) are obtained as

$$\begin{aligned} (M_{v,h})_{mn} = (\varvec{\psi }_n,\varvec{\psi }_m), \; 1\le m,n\le N_v; \quad (M_{p,h})_{mn} = (\varphi _n,\varphi _m), \; 1\le m,n\le N_p. \end{aligned}$$
(4.17)

We set \(M_{g,h} =M_{c,h} = M_{v,h}\) since \(k=l\). The mass matrix for the Neumann boundary condition is given by

$$\begin{aligned} (M_{n,h})_{mn} = (\varvec{\psi }_m,\varvec{\psi }_n)_{\partial D_N}, \; 1 \le m,n\le N_v. \end{aligned}$$
(4.18)

The stiffness matrix \(A_h^y\) is obtained as

$$\begin{aligned} (A_h^y)_{mn} = a(\varvec{\psi }_n, \varvec{\psi }_m;y), \; 1 \le m,n\le N_v, \end{aligned}$$
(4.19)

and the matrix \(B_h\) corresponding to the compatibility condition is written as

$$\begin{aligned} (B_h)_{mn} = b(\varvec{\psi }_m, \varphi _n), \; 1\le m \le N_v, 1\le n \le N_p. \end{aligned}$$
(4.20)

Let \(U_h(y) = (u_1(y), \dots , u_{N_v}(y))^T\) denote the coefficient vector of the finite element function \(\mathbf {u}_h(y)\), let \(P_h(y), F_h(y), U^a_h(y), P^a_h(y)\) denote the coefficient vectors of the functions \(p_h(y), \mathbf {f}_h(y), \mathbf {u}^a_h(y), {p^a_h(y)}\), and let \(U_{d,h}, P_{d,h}, H_h(y)\) collect the values of \(\mathbf {u}_{d}, p_{d}, \mathbf {h}(y)\) at the finite element nodes. With this notation, the algebraic formulation of problem (4.14) reads

$$\begin{aligned} \left( \begin{array}{ccccc} M_{v,h} & 0 & 0 & A_h^y & B_h \\ 0 & M_{p,h} & 0 & B_h^T & 0 \\ 0 & 0 & \alpha M_{g,h} & -M_{c,h}^T & 0 \\ A_h^y & B_h & -M_{c,h} & 0 & 0 \\ B_h^T & 0 & 0 & 0 & 0 \end{array} \right) \left( \begin{array}{c} U_h(y) \\ P_h(y) \\ F_h(y) \\ U^a_h(y) \\ P^a_h(y) \end{array} \right) = \left( \begin{array}{c} M_{v,h} U_{d,h} \\ M_{p,h} P_{d,h} \\ 0 \\ M_{n,h} H_h(y) \\ 0 \end{array} \right) . \end{aligned}$$
(4.21)
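As an illustration only (a sketch with our own names, not code from the paper), the block system (4.21) can be assembled from the individual finite element matrices with scipy:

```python
# Assemble the KKT matrix and right-hand side of (4.21) from the blocks
# (4.17)-(4.20); Mv, Mp, Mg, Mc, Mn, A, B are assumed given as sparse matrices.
import numpy as np
import scipy.sparse as sp

def kkt_system(Mv, Mp, Mg, Mc, Mn, A, B, U_d, P_d, H, alpha):
    K = sp.bmat([[Mv,   None, None,        A,    B   ],
                 [None, Mp,   None,        B.T,  None],
                 [None, None, alpha * Mg, -Mc.T, None],
                 [A,    B,   -Mc,          None, None],
                 [B.T,  None, None,        None, None]], format='csr')
    rhs = np.concatenate([Mv @ U_d, Mp @ P_d, np.zeros(Mg.shape[0]),
                          Mn @ H, np.zeros(Mp.shape[0])])
    return K, rhs
```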

Remark 4.2

The matrix of the linear system (4.21) has a large condition number when h or \(\alpha \) is very small, which makes a direct solve unsuitable. Alternatively, we may solve it by MINRES iterations with the help of suitable block-diagonal preconditioners, see e.g. [42, 48] for more details.
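A minimal sketch of this strategy (illustrative only; the block preconditioner is assumed to be supplied by the user) using scipy's MINRES:

```python
# Solve the symmetric saddle-point system K x = rhs by preconditioned MINRES;
# block_solvers is a list of (size, solve) pairs, each applying the inverse of
# one diagonal block of the preconditioner to the matching slice of a vector.
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def solve_optimality_system(K, rhs, block_solvers):
    def apply_prec(x):
        y, offset = np.empty_like(x), 0
        for size, solve in block_solvers:
            y[offset:offset + size] = solve(x[offset:offset + size])
            offset += size
        return y

    P = LinearOperator(K.shape, matvec=apply_prec)
    x, info = minres(K, rhs, M=P)
    assert info == 0, "MINRES did not converge"
    return x
```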

5 Multilevel and weighted reduced basis method

Solving the full system (4.21) at a single sample \(y\in \Gamma \) is already expensive when the number of degrees of freedom of the finite element approximation is large. The task becomes prohibitive when the dimension of the probability space \(\Gamma \) is so high that a large number of samples is needed to obtain accurate statistics of interest. To circumvent this computational obstacle, a reduced basis method has been employed to solve the optimality system in [35, 36, 46]. In this work, we adopt the same approach but propose to use a multilevel greedy algorithm and a weighted a posteriori error bound [10, 12, 16].

5.1 Reduced basis approximation

The idea behind reduced basis approximation is to take “snapshots”, that is, high-fidelity solutions of the underlying PDE model, as basis functions, and then to approximate the solution at a new sample by Galerkin projection onto the pre-selected snapshots [8, 25, 40, 45, 52]. Specific to the finite element problem (4.21), the associated reduced basis problem can be formulated as: for any \(y\in \Gamma \), find \(\{\mathbf {u}_r(y),p_r(y),\mathbf {f}_r(y)\} \in V_{N_r} \times Q_{N_r} \times G_{N_r}\) and \(\{\mathbf {u}_r^a(y),p_r^a(y)\} \in V_{N_r} \times Q_{N_r}\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \mathcal {A}\left( \{\mathbf {u}_r(y),p_r(y),\mathbf {f}_r(y)\},\{\mathbf {v}_r^a,q_r^a,\mathbf {g}_r\}\right) + \mathcal {B}(\{\mathbf {v}_r^a,q_r^a,\mathbf {g}_r\}, \{\mathbf {u}_r^a(y),p_r^a(y)\};y) \\ \quad = (\mathbf {u}_d,\mathbf {v}_r^a)+(p_d,q_r^a) \quad \forall \{\mathbf {v}_r^a,q_r^a,\mathbf {g}_r\} \in V_{N_r} \times Q_{N_r} \times G_{N_r}, \\ \mathcal {B}(\{\mathbf {u}_r(y),p_r(y),\mathbf {f}_r(y)\},\{\mathbf {v}_r,q_r\};y) = (\mathbf {h}(y),\mathbf {v}_r)_{\partial D_N} \quad \forall \{\mathbf {v}_r,q_r\} \in V_{N_r} \times Q_{N_r}, \end{array} \right. \end{aligned}$$
(5.1)

where \(V_{N_r}, Q_{N_r}, G_{N_r}\) are reduced basis spaces constructed as [35, 36]

$$\begin{aligned} G_{N_r} = \text {span}\{\mathbf {f}_h(y^n), 1\le n \le N_r\}; \end{aligned}$$
(5.2)
$$\begin{aligned} Q_{N_r} = \text {span}\{p_h(y^n), p_h^a(y^n), 1\le n \le N_r\}; \end{aligned}$$
(5.3)

and

$$\begin{aligned} V_{N_r} = \text {span}\{\mathbf {u}_h(y^n), T p_h(y^n), \mathbf {u}^a_h(y^n), T p^a_h(y^n), 1\le n \le N_r\}, \end{aligned}$$
(5.4)

where \(T:Q^m_h \rightarrow V^k_h\) is the supremizer operator given by [19, 44, 47]

$$\begin{aligned} (T q_h, \mathbf {v}_h)_A = b(\mathbf {v}_h,q_h) \quad \forall \mathbf {v}_h\in V_h^k, \end{aligned}$$
(5.5)

where \((\cdot ,\cdot )_A\) is the A-scalar product defined at a reference value \(\bar{y}\), e.g. the center of \(\Gamma \),

$$\begin{aligned} (\mathbf {u}, \mathbf {v})_A = a(\mathbf {u}, \mathbf {v}; \bar{y}) \quad \forall \mathbf {u}, \mathbf {v} \in V. \end{aligned}$$
(5.6)
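In algebraic form, (5.5) amounts to one linear solve per pressure snapshot with the matrix of the A-scalar product. The following sketch is illustrative only; A_bar denotes the matrix of (5.6) and B_h the matrix of (4.20).

```python
# Compute supremizer coefficient vectors: A_bar t = B_h q for each snapshot q.
from scipy.sparse.linalg import splu

def supremizers(A_bar, B_h, pressure_snapshots):
    lu = splu(A_bar.tocsc())                  # factorize once, reuse for all q
    return [lu.solve(B_h @ q) for q in pressure_snapshots]
```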

Under such construction of the reduced basis spaces, it can be shown that there exists a unique reduced basis solution to problem (5.1), see e.g. [35, 36].

For the sake of algebraic stability, we perform Gram–Schmidt orthonormalization [47] on the reduced basis spaces \(V_{N_r}\), \(Q_{N_r}\) and \(G_{N_r}\), obtaining orthonormal bases such that \(V_{N_r} = \text {span}\{\varvec{\zeta }_n^v, 1\le n \le 4N_r\}\), \(Q_{N_r} = \text {span}\{\zeta _n^p, 1\le n \le 2N_r\}\) and \(G_{N_r} = \text {span}\{\varvec{\zeta }_n^g, 1\le n \le N_r\}\). Finally, at any \(y\in \Gamma \), we approximate the finite element solution \(\{\mathbf {u}_h(y), p_h(y), \mathbf {f}_h(y)\}\in V_h^k \times Q_h^m\times G_h^l\) in the reduced basis space \(V_{N_r}\times Q_{N_r} \times G_{N_r}\) as

$$\begin{aligned} \mathbf {u}_h(y) = \sum _{n=1}^{4N_r} u_n(y)\varvec{\zeta }^v_n,\quad p_h(y) = \sum _{n=1}^{2N_r} p_n(y) \zeta ^p_n, \quad \mathbf {f}_h(y) = \sum _{n=1}^{N_r} f_n(y)\varvec{\zeta }^g_n, \end{aligned}$$
(5.7)

and the adjoint variables \(\{\mathbf {u}^a_h(y), p^a_h(y)\}\in V_h^k \times Q_h^m\) in \(V_{N_r}\times Q_{N_r}\) as

$$\begin{aligned} \mathbf {u}^a_h(y) = \sum _{n=1}^{4N_r} u^a_n(y)\varvec{\zeta }^v_n,\quad p^a_h(y) = \sum _{n=1}^{2N_r} p^a_n(y) \zeta ^p_n. \end{aligned}$$
(5.8)

Let \(U_r(y) = (u_1(y), \ldots , u_{4N_r}(y))^T\) denote the coefficient vector of the reduced basis approximation, and define \(P_r(y), F_r(y), U^a_r(y)\) and \(P^a_r(y)\) similarly. Let \(\mathcal {Z}^v_{N_r} = (\varvec{\zeta }^v_1, \ldots , \varvec{\zeta }^v_{4N_r})\), \(\mathcal {Z}^p_{N_r} = (\zeta ^p_1, \ldots , {\zeta }^p_{2N_r})\) and \(\mathcal {Z}^g_{N_r} = (\varvec{\zeta }^g_1, \dots , \varvec{\zeta }^g_{N_r})\), by which we define the reduced basis mass matrices as follows: \(M_{v,r} = (\mathcal {Z}^v_{N_r})^T M_{v,h} \mathcal {Z}^v_{N_r}\), \(M_{p,r} = (\mathcal {Z}^p_{N_r})^T M_{p,h} \mathcal {Z}^p_{N_r}\), \(M_{g,r} = (\mathcal {Z}^g_{N_r})^T M_{g,h} \mathcal {Z}^g_{N_r}\), \(M_{c,r} = (\mathcal {Z}^v_{N_r})^T M_{c,h} \mathcal {Z}^g_{N_r}\), \(M_{n,r} = (\mathcal {Z}^v_{N_r})^T M_{n,h} \mathcal {Z}^v_{N_r}\), and the Stokes matrices \(A_r^y\) and \(B_r\) as \(A_r^y = (\mathcal {Z}^v_{N_r})^T A^y_{h} \mathcal {Z}^v_{N_r}\) and \(B_r = (\mathcal {Z}^p_{N_r})^T B_{h}^T \mathcal {Z}^v_{N_r}\). The reduced basis data vectors \(U_{d,r}, P_{d,r}, H_r(y)\) are defined as \(U_{d,r} = (\mathcal {Z}^v_{N_r})^T U_{d,h}, P_{d,r} = (\mathcal {Z}^p_{N_r})^T P_{d,h}, H_{r}(y) = (\mathcal {Z}^v_{N_r})^T H_{h}(y)\). Projecting the finite element system (4.21) onto the reduced basis spaces, we obtain the algebraic formulation of the reduced basis problem

$$\begin{aligned} \left( \begin{array}{ccccc} M_{v,r} & 0 & 0 & A_r^y & B_r^T \\ 0 & M_{p,r} & 0 & B_r & 0 \\ 0 & 0 & \alpha M_{g,r} & -M_{c,r}^T & 0 \\ A_r^y & B_r^T & -M_{c,r} & 0 & 0 \\ B_r & 0 & 0 & 0 & 0 \end{array} \right) \left( \begin{array}{c} U_r(y) \\ P_r(y) \\ F_r(y) \\ U^a_r(y) \\ P^a_r(y) \end{array} \right) = \left( \begin{array}{c} M_{v,r} U_{d,r} \\ M_{p,r} P_{d,r} \\ 0 \\ M_{n,r} H_r(y) \\ 0 \end{array} \right) , \end{aligned}$$
(5.9)

which is a \(13N_r\times 13N_r\) linear system with dense blocks; its numerical solution requires far less computational effort than solving the finite element system (4.21), since \(N_r\) is much smaller than the number of degrees of freedom of the finite element discretization.

5.2 A multilevel greedy algorithm

The efficiency of the reduced basis approximation depends critically on the choice of the reduced bases, and thus on the samples \(y^1, \ldots , y^{N_r}\) selected in the construction of the reduced basis spaces \(V_{N_r}, Q_{N_r}, G_{N_r}\). In order to choose the most representative samples, we propose a multilevel greedy algorithm based on the sparse grid construction of the stochastic collocation method, which also reduces the computational cost of constructing the reduced basis spaces.

To begin, we choose the first sample from the zeroth level of the sparse grid, i.e. \(y^1 \in H(q,N)\) (or \(H_{\varvec{\alpha }}(q, N)\) for the anisotropic sparse grid) with \(q-N = 0\), where only one collocation node is available. We solve the finite element problem (4.21) at \(y^1\) and construct the reduced basis spaces \(V_{1}, Q_{1}, G_{1}\) according to (5.2), (5.3) and (5.4).

Let \(\mathcal {E}_r\) denote the reduced basis approximation error defined as

$$\begin{aligned} \mathcal {E}_r(y) := ||\mathrm {u}_h - \mathrm {u}_r||_{\mathrm {X}}, \end{aligned}$$
(5.10)

where \(\mathrm {X} = V\times Q\times G\times V\times Q\) is the Hilbert space equipped with the norm \(||\{v,q,g,v^a,q^a\}||_{\mathrm {X}} = ||v||_V + ||q||_Q +\sqrt{\kappa }||g||_G + ||v^a||_V+||q^a||_Q\), and \(\mathrm {u}(y):= \{\mathbf {u}(y), p(y), \mathbf {f}(y), \mathbf {u}^a(y), p^a(y)\}\) denotes the solution, with finite element approximation \(\mathrm {u}_h\) and reduced basis approximation \(\mathrm {u}_r\). At each level \(q-N = l, l = 1, 2, \ldots , L\), with prescribed \(L\in \mathbb {N}_+\), we first construct the set of collocation nodes \(H(q,N)\) of the sparse grid and then choose the “most representative” sample \(y^{N_r+1}\) as the one where the solution is worst approximated over the new nodes in the current level of the sparse grid, i.e.

$$\begin{aligned} y^{N_r+1} = \arg \max _{y\in H(q,N) \setminus H(q-1,N)} \triangle _r(y), \end{aligned}$$
(5.11)

where \(\triangle _r(y)\) represents a cheap, sharp and reliable error bound at \(y\in \Gamma \) such that

$$\begin{aligned} c \triangle _r(y) \le \mathcal {E}_r(y) \le \triangle _r(y) \end{aligned}$$
(5.12)

with the constant c as close to 1 as possible. We present the construction of \(\triangle _r(y)\) in the next section. Note that in the hierarchical sparse grid with nested collocation nodes, we have \(H(q-1,N)\subset H(q,N), q\ge N+1\), which provides further computational efficiency since there is no need to evaluate the error at the collocation nodes in the previous level. After updating the reduced basis spaces \(V_{N_r}\), \(Q_{N_r}\) and \(G_{N_r}\) by the finite element solution of problem (4.21) at \(y^{N_r+1}\), we set \(N_r + 1 \rightarrow N_r\) and proceed to choose the next sample until the error \(\mathcal {E}_r(y^{N_r+1})\) is smaller than a prescribed tolerance \(\epsilon _{tol}\). Then we move to the next level \(q-N = l+1\). The multilevel greedy algorithm for construction of the reduced basis space is summarized in Algorithm 1.

Algorithm 1 Multilevel greedy construction of the reduced basis spaces
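The following Python-style sketch summarizes the procedure described above; it is illustrative only, and the callables passed as arguments (full solve, space enrichment, weighted error bound, sparse grid nodes) are placeholders for the ingredients of Sects. 4 and 5, not code from the paper.

```python
def multilevel_greedy(grid_nodes, solve_fe, update_spaces, weighted_bound,
                      L, eps_tol):
    """grid_nodes(l): nodes of sparse grid level l (nested); solve_fe(y): full
    solve of (4.21); update_spaces(S, sol): enrich the reduced basis spaces;
    weighted_bound(S, y): weighted a posteriori error bound at y."""
    y0 = grid_nodes(0)[0]                     # single node at level q - N = 0
    spaces, samples = update_spaces(None, solve_fe(y0)), [y0]
    for level in range(1, L + 1):
        # nested grids: only the nodes new to this level need to be tested
        previous = set(grid_nodes(level - 1))
        candidates = [y for y in grid_nodes(level) if y not in previous]
        while candidates:
            y_star = max(candidates, key=lambda y: weighted_bound(spaces, y))
            if weighted_bound(spaces, y_star) < eps_tol:
                break                         # this level is resolved
            spaces = update_spaces(spaces, solve_fe(y_star))
            samples.append(y_star)
            candidates.remove(y_star)
    return spaces, samples
```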

5.3 A weighted a posteriori error bound

In order to efficiently evaluate a sharp and reliable bound for the reduced basis approximation error, we present a residual-based a posteriori error estimate following [35, 36] and propose using a weighted version of it. At first, the semi-weak saddle point problem (3.4) is recast as an elliptic problem: for any \(y\in \Gamma \), find \(\mathrm {u}(y)\in \mathrm {X}\) such that

$$\begin{aligned} \mathrm {B}(\mathrm {u}(y), \mathrm {v};y) = \mathrm {F}(\mathrm {v};y) \quad \forall \mathrm {v}\in \mathrm {X}, \end{aligned}$$
(5.13)

where the bilinear form \(\mathrm {B}(\cdot , \cdot ;y):\mathrm {X}\times \mathrm {X}\rightarrow \mathbb {R}\) is given by

$$\begin{aligned}&\mathrm {B}(\mathrm {u}(y), \mathrm {v};y) = \mathcal {A}\left( \{\mathbf {u}(y),p(y),\mathbf {f}(y)\},\{\mathbf {v}^a,q^a,\mathbf {g}\}\right) \nonumber \\&+ \mathcal {B}(\{\mathbf {v}^a,q^a,\mathbf {g}\}, \{\mathbf {u}^a(y),p^a(y)\};y) + \mathcal {B}(\{\mathbf {u}(y),p(y),\mathbf {g}(y)\},\{\mathbf {v},q\};y), \end{aligned}$$
(5.14)

and the linear functional

$$\begin{aligned} \mathrm {F}( \mathrm {v};y) = (\mathbf {u}_d, \mathbf {v}^a) + (p_d, q^a) + (\mathbf {h}(y), \mathbf {v})_{\partial D_N}. \end{aligned}$$
(5.15)

Let the reduced basis approximation error be defined as \(\mathrm {e}(y) = \mathrm {u}_h(y) - \mathrm {u}_r(y)\). To seek an error bound for \(\mathrm {e}(y)\), we consider the residual

$$\begin{aligned} \mathrm {R}(\mathrm {v}_h;y):= \mathrm {F}(\mathrm {v}_h;y)-\mathrm {B}(\mathrm {u}_r(y),\mathrm {v}_h;y) \quad \forall \mathrm {v}_h \in \mathrm {X}_h. \end{aligned}$$
(5.16)

By the stability of the bilinear form \(\mathrm {B}\), with inf-sup constant \(\beta _c^h(y)\) in \(\mathrm {X}_h= V_h^k\times Q_h^m\times G_h^l\times V_h^k\times Q_h^m\), and the Babuška theorem [54], we obtain

$$\begin{aligned} ||\mathrm {e}(y)||_{\mathrm {X}_h} \le \frac{||\mathrm {R}(\cdot ;y)||_{\mathrm {X}_h'}}{\beta _c^h(y)}=:\triangle _r(y). \end{aligned}$$
(5.17)

Taking the probability density function \(\rho :\Gamma \rightarrow \mathbb {R}_+\) into account, we replace \(\mathcal {E}_r(y)\) in (5.11) by a weighted a posteriori error bound [12] \(\triangle ^{\rho }_r(y) = \sqrt{\rho (y)}\triangle _r(y)\), which provides a bound for \(\sqrt{\rho (y)}||\mathrm {e}(y)||_{\mathrm {X}_h}\). The error bound \(\triangle ^{\rho }_r(y)\) assigns greater importance to samples with high probability density, leading to a more efficient evaluation of the statistical moments of interest (fewer bases are needed to achieve the same accuracy), see [12] for a proof and illustrative examples. In order to evaluate the error bound (5.17), we need to compute both the constant \(\beta _c^h(y)\) and the dual norm of the residual \(||\mathrm {R}(\cdot ;y)||_{\mathrm {X}_h'}\). For the former, we may apply the successive constraint method [27] to compute a lower bound \(\beta _{c}^{LB}(y)\le \beta _{c}^{h}(y)\) at cheap online cost, or simply use a uniform lower bound \(\beta _{c}^{LB} \le \beta _{c}^{h}(y)\) evaluated at the minimum random viscosity \(\nu _{min}\). As for the latter, we turn to an offline-online decomposition procedure in order to reduce the computational effort in the many-query context.

5.4 Offline-online decomposition

The offline-online decomposition takes advantage of the affine structure of the data, as given in examples (2.10) and (2.11). If the data are provided in non-affine form, e.g. a log-normal Karhunen-Loève expansion [39], we may first apply a weighted empirical interpolation method to obtain an affine decomposition of the data function, see [14] for details and error analysis. Let us assume that the random viscosity and the Neumann boundary condition admit, after possibly performing empirical interpolation [4, 14], the following affine structure

$$\begin{aligned} \nu (y) = \sum _{n=1}^{N_{\nu }} \nu _n \theta ^{\nu }_n(y) \text { and } \mathbf {h}(x,y) = \sum _{n=1}^{N_h} \mathbf {h}_n(x) \theta ^h_n(y) \quad \forall (x,y) \in \partial D_N\times \Gamma , \end{aligned}$$
(5.18)

where \(\theta ^{\nu }_n, 1\le n \le N_{\nu }\), and \(\theta ^{h}_n, 1\le n \le N_h\), are functions of the random vector \(y\in \Gamma \). The matrix \(A_r^y\) and the vector \(H_r(y)\) in (5.9) can then be assembled as

$$\begin{aligned} A_r^y = \sum _{n=1}^{N_{\nu }} A_r^n \theta ^{\nu }_n(y) \text { and } H_r(y) = \sum _{n=1}^{N_h} H_r^n \theta ^h_n(y), \end{aligned}$$
(5.19)

where the deterministic reduced basis matrices \(A_r^n, 1\le n \le N_{\nu }\) are defined as

$$\begin{aligned} A_r^n = (\mathcal {Z}_{N_r}^v)^T A_h^n \mathcal {Z}_{N_r}^v \text { with } (A_h^n)_{ij} = (\nu _n \nabla \varvec{\psi }_i, \nabla \varvec{\psi }_j), 1\le i,j\le N_v, \end{aligned}$$
(5.20)

and the deterministic reduced basis vectors \(H_r^n, 1\le n \le N_h\) are defined as

$$\begin{aligned} H_r^n = (\mathcal {Z}_{N_r}^v)^T H_h^n \text { with } (H_h^n)_{i} = (\mathbf {h}_n, \varvec{\psi }_i)_{\partial D_N}, 1\le i \le N_v. \end{aligned}$$
(5.21)

Accordingly, we decompose the global matrix of the linear system (5.9) as

$$\begin{aligned} \mathrm {B}_r^y = \sum _{n=0}^{N_{\nu }} \theta ^{\nu }_n(y)\, \mathrm {B}_r^n, \quad \text {where } \mathrm {B}_r^0 = \left( \begin{array}{ccccc} M_{v,r} & 0 & 0 & 0 & B_r^T \\ 0 & M_{p,r} & 0 & B_r & 0 \\ 0 & 0 & \alpha M_{g,r} & -M_{c,r}^T & 0 \\ 0 & B_r^T & -M_{c,r} & 0 & 0 \\ B_r & 0 & 0 & 0 & 0 \end{array} \right) \end{aligned}$$
(5.22)

and each \(\mathrm {B}_r^n, 1\le n \le N_{\nu }\), has \(A_r^n\) in the blocks (1, 4) and (4, 1) and zeros in all other blocks. Similarly, we decompose the vector on the right-hand side of the linear system (5.9) as \(\mathrm {F}_r^0 = (M_{v,r} U_{d,r}, M_{p,r} P_{d,r},0, 0,0)^T\) and \(\mathrm {F}_r^n = (0,0,0,M_{n,r}H_r^n,0)^T, 1\le n \le N_h\). Thus, the algebraic formulation of problem (5.13) can be written as: for any \(y\in \Gamma \), find \(\mathrm {U}_r(y) := (U_r(y), P_r(y), F_r(y), U_r^a(y), P_r^a(y))^T \in \mathbb {R}^{13N_r}\) such that

$$\begin{aligned} \left( \sum _{n=0}^{N_{\nu }} \theta ^{\nu }_n(y) \mathrm {B}_r^n \right) \mathrm {U}_r(y)= \sum _{n=0}^{N_h}\theta ^h_n(y) \mathrm {F}_r^n, \end{aligned}$$
(5.23)

where \(\theta _0^{\nu }(y) = 1\) and \(\theta _0^{h}(y) = 1\). Since \(\mathrm {B}_r^n, 1\le n \le N_{\nu }\), and \(\mathrm {F}_r^n, 1\le n \le N_h\), are independent of y, we can assemble them in the offline stage. Given any \(y\in \Gamma \), the reduced basis solution can then be obtained by solving the linear system (5.23), with at most \(O((N_{\nu } + N_h)(13N_r)^2)\) operations for the assembly and \(O((13N_r)^3)\) operations for the solve.
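A minimal sketch of the online stage (illustrative only; B_r and F_r hold the precomputed matrices \(\mathrm {B}_r^n\) and vectors \(\mathrm {F}_r^n\), and theta_nu, theta_h evaluate the affine coefficients, with the \(n=0\) terms equal to one):

```python
# Online solve of (5.23): assemble the y-dependent reduced system and solve it.
import numpy as np

def online_solve(y, B_r, F_r, theta_nu, theta_h):
    lhs = sum(theta_nu(n, y) * B_r[n] for n in range(len(B_r)))  # 13Nr x 13Nr
    rhs = sum(theta_h(n, y) * F_r[n] for n in range(len(F_r)))
    return np.linalg.solve(lhs, rhs)          # dense solve, O((13 Nr)^3)
```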

As for the evaluation of the residual norm \(||\mathrm {R}(\cdot ;y)||_{\mathrm {X}_h'}\), we first seek the Riesz representation [41] of \(\mathrm {R}(\cdot ;y)\) as \(\hat{\mathrm {e}}(y)\in \mathrm {X}_h\) such that

$$\begin{aligned} (\hat{\mathrm {e}}(y), \mathrm {v}_h)_{\mathrm {X}_h} = \mathrm {R}(\mathrm {v}_h;y) \quad \forall \mathrm {v}_h \in \mathrm {X}_h, \end{aligned}$$
(5.24)

so that \(||{\mathrm {R}(\cdot ;y)}||_{\mathrm {X}_h'} = ||\hat{\mathrm {e}}(y)||_{\mathrm {X}_h}\). Let \(\mathrm {B}_n:\mathrm {X}_h\times \mathrm {X}_h \rightarrow \mathbb {R}\) denote the bilinear form on the finite element space corresponding to the matrix \(\mathrm {B}_r^n, 0\le n \le N_{\nu }\), and \(\mathrm {F}_n:\mathrm {X}_h \rightarrow \mathbb {R}\) the linear functional corresponding to the vector \(\mathrm {F}_r^n, 0\le n \le N_h\); then the residual defined in (5.16) can be decomposed as

$$\begin{aligned} \mathrm {R}(\mathrm {v}_h;y) = \sum _{n=0}^{N_h} \theta ^h_n(y)\mathrm {F}_n(\mathrm {v}_h) - \sum _{n=0}^{N_{\nu }} \theta ^{\nu }_n(y)\mathrm {B}_n(\mathrm {u}_r, \mathrm {v}_h) \quad \forall \mathrm {v}_h \in \mathrm {X}_h. \end{aligned}$$
(5.25)

By the Riesz representation theorem, there exist \(\mathrm {f}_n\in \mathrm {X}_h, 0\le n \le N_h\), and \(\mathrm {b}^k_n\in \mathrm {X}_h, 0\le n \le N_{\nu }, 1\le k \le 13N_r\), such that

$$\begin{aligned} (\mathrm {f}_n, \mathrm {v}_h)_{\mathrm {X}_h} = \mathrm {F}_n(\mathrm {v}_h) \text { and } (\mathrm {b}_n^k, \mathrm {v}_h)_{\mathrm {X}_h} = - \mathrm {B}_n(\mathrm {u}_h^k, \mathrm {v}_h) \quad \forall \mathrm {v}_h \in \mathrm {X}_h, \end{aligned}$$
(5.26)

where the reduced basis functions are set as \(\mathrm {u}_h^k = (\varvec{\zeta }^v_k, 0, 0, 0, 0), 1\le k \le 4N_r\), \(\mathrm {u}_h^k = (0, \zeta _{k-4N_r}^p, 0, 0, 0), 4N_r< k \le 6N_r\), \(\mathrm {u}_h^k = (0, 0, \varvec{\zeta }^g_{k-6N_r}, 0, 0), 6N_r< k \le 7N_r\), \(\mathrm {u}_h^k = (0, 0, 0, \varvec{\zeta }^v_{k-7N_r}, 0), 7N_r < k \le 11N_r\), \(\mathrm {u}_h^k = (0, 0, 0, 0, \zeta _{k-11N_r}^p), 11N_r< k \le 13N_r\), where 0 denotes the zero vector of length \(N_v, N_p, N_v, N_v, N_p\) in the first to fifth arguments, respectively. Finally, we obtain the norm \(||\hat{\mathrm {e}}(y)||_{\mathrm {X}_h}\) as

$$\begin{aligned} ||\hat{\mathrm {e}}(y)||^2_{\mathrm {X}_h}&= \sum _{n=0}^{N_h}\sum _{n'=0}^{N_h} \theta _n^h(y)(\mathrm {f}_n, \mathrm {f}_{n'})_{\mathrm {X}_h}\theta _{n'}^h(y) \nonumber \\&\quad + 2\sum _{n=0}^{N_h}\sum _{n'=0}^{N_{\nu }}\sum _{k=1}^{13N_r} \theta _n^h(y)(\mathrm {f}_n, \mathrm {b}_{n'}^k)_{\mathrm {X}_h} (\mathrm {u}_r)_k \theta _{n'}^{\nu }(y)\nonumber \\&\quad + \sum _{n=0}^{N_{\nu }}\sum _{n'=0}^{N_{\nu }}\sum _{k=1}^{13N_r}\sum _{k'=1}^{13N_r} \theta _n^{\nu }(y) (\mathrm {u}_r)_k (\mathrm {b}_n^k, \mathrm {b}_{n'}^{k'})_{\mathrm {X}_h} (\mathrm {u}_r)_{k'} \theta _{n'}^{\nu }(y), \end{aligned}$$
(5.27)

where \((\mathrm {f}_n, \mathrm {f}_{n'})_{\mathrm {X}_h}, 0\le n, n'\le N_h\), \((\mathrm {f}_n, \mathrm {b}_{n'}^k)_{\mathrm {X}_h}, 0\le n \le N_h, 0\le n'\le N_{\nu }, 1\le k \le 13N_r\), and \((\mathrm {b}_n^k, \mathrm {b}_{n'}^{k'})_{\mathrm {X}_h}, 0\le n, n' \le N_{\nu }, 1\le k, k'\le 13N_r\), are independent of y and can be computed and stored in the offline stage, while in the online stage we only need to assemble formula (5.27), which takes \(O(N_h^2+13N_rN_{\nu }N_h + (13N_rN_{\nu })^2)\) operations. Recall that \(N_h\) and \(N_{\nu }\) are the numbers of affine terms of the random Neumann boundary condition and of the viscosity, and \(N_r\) is the number of samples selected in the construction of the reduced basis space; since all of these are small, the error bound can be evaluated rapidly.
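A hedged sketch of the online evaluation of (5.27), with our own names: Gff, Gfb, Gbb store the y-independent inner products computed offline, and th, tn collect the coefficients \(\theta ^h_n(y)\) and \(\theta ^{\nu }_n(y)\), including the \(n=0\) entries.

```python
# Online evaluation of the squared residual norm (5.27).
import numpy as np

def residual_norm_squared(th, tn, u_r, Gff, Gfb, Gbb):
    # th: (Nh+1,), tn: (Nnu+1,), u_r: (K,) with K = 13*Nr
    # Gff: (Nh+1, Nh+1), Gfb: (Nh+1, Nnu+1, K), Gbb: (Nnu+1, K, Nnu+1, K)
    w = np.outer(tn, u_r)                     # w[n, k] = theta_nu_n * (u_r)_k
    val = th @ Gff @ th
    val += 2.0 * np.einsum('n,nmk,mk->', th, Gfb, w)
    val += np.einsum('nk,nkml,ml->', w, Gbb, w)
    return val
```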

6 Error estimates

The global error of the numerical approximation presented in Sects. 4 and 5 comprises three components: the stochastic collocation approximation error [2, 38, 39], the finite element approximation error [41], and the weighted reduced basis approximation error [5, 12, 13], which have been analyzed individually in different contexts. In the following, we provide individual error estimates as well as a global error estimate in the context of the constrained optimization problem (2.14). We remark that, as a result of the truncation of the Karhunen-Loève expansion (2.11) of the Neumann data, there is an additional truncation error contributing linearly to the global error in our particular case (since the solution is linear with respect to this data, see (2.17) or (4.21)). Due to Assumption 2 on the finite dimension of the noise, we will not consider the truncation error further.

6.1 Stochastic collocation approximation error

The error of stochastic collocation approximation of the optimal solution depends on the stochastic regularity of the latter.

We consider the case of a bounded \(\Gamma \); similar results can be obtained in the same way for unbounded \(\Gamma \) as in [2]. Let the complex region \(\Sigma (\Gamma ;\tau )\) be defined as

$$\begin{aligned} \Sigma (\Gamma ;\tau ) := \{z\in \Sigma : \exists \, y \in \Gamma \text { such that } |z_n-y_n| \le \tau _n, 1\le n \le N \}, \end{aligned}$$
(6.1)

where \(\Sigma \) has been defined in (3.22) and \(\tau = (\tau _1, \dots , \tau _N)\), with each element taking the largest possible value (\(\tau _n = 1/(rr_n), 1\le n \le N\)). Thanks to the analytic regularity obtained in Theorem 3.2, we have the following a priori error estimate for the tensor-product stochastic collocation approximation of the optimal solution \(\mathrm {u}:\Gamma \rightarrow \mathrm {X}\) (recall that \(\mathrm {u} = \{\mathbf {u}, p, \mathbf {f}, \mathbf {u}^a, p^a\}\) and \(\mathrm {X} = V\times Q\times G\times V\times Q\)):

$$\begin{aligned} \mathcal {E}_s := ||\mathrm {u} - \mathrm {u}_s||_{C(\Gamma ;\mathrm {X})} \le \sum _{n=1}^N C^{\mathbf {i}}_{n}\exp (-(m(i_n)-1)r_n), \end{aligned}$$
(6.2)

where we denote \(\mathrm {u}_s=\mathcal {I}_{\mathbf {i}}\mathrm {u}\); the constants \(C^{\mathbf {i}}_{n}, 1\le n \le N\) are bounded by [2, 13]

$$\begin{aligned} C^{\mathbf {i}}_{n} \le (1+ \Lambda (m(i_n))) C_n, \text { with } C_n:=\frac{2}{e^{r_n}-1}\left( \max _{z \in \Sigma (\Gamma ;\tau )} ||\mathrm {u}(z)||_{\mathrm {X}} \right) \end{aligned}$$
(6.3)

with the Lebesgue constant \(\Lambda (m) \le 1+ (2/\pi )\log (m+1)\), and the convergence rates

$$\begin{aligned} r_n = \log \left( \frac{2\tau _n}{|\Gamma _n|} + \sqrt{1 + \frac{4\tau _n^2}{|\Gamma _n|^2}}\right) > 1, \quad 1\le n \le N, \end{aligned}$$
(6.4)

where \(\Gamma _n\) is the image of the random variable \(y_n\).

Remark 6.1

In the case of unbounded \(\Gamma \), the convergence rate has been obtained as \(r_n = \tau _n/\delta _n, 1\le n \le N\), with \(\delta _n\) depending on the decay of the probability density function at infinity, e.g. \(\delta _n = 1\) for a normal density function; see [2] for details.

As for the error of the isotropic sparse grid Smolyak interpolation (4.3) with Gauss abscissas, the following error estimate can be proved via (6.2) [39] (denoting \(\mathrm {u}_s = \mathcal {S}_{q}\mathrm {u}\))

$$\begin{aligned} \mathcal {E}_s := ||\mathrm {u} - \mathrm {u}_s||_{C(\Gamma ;\mathrm {X})} \le C_s N_q^{-r}, \end{aligned}$$
(6.5)

where \(N_q\) represents the number of collocation nodes, \(C_s\) depends on the Lebesgue constant but not on \(N_q\) (see [13, 39] for a more explicit expression), and r satisfies (see [13])

$$\begin{aligned} r \ge \frac{e\log (2)\min \{r_n, 1\le n \le N\}}{3+\log (N)}. \end{aligned}$$
(6.6)

As for the error of the anisotropic sparse grid Smolyak interpolation (4.10) based on Gauss abscissas, we have the following estimate [38] (denoting \(\mathrm {u}_s = \mathcal {S}^{\varvec{\alpha }}_{q}\mathrm {u}\))

$$\begin{aligned} \mathcal {E}^{\varvec{\alpha }}_s := ||\mathrm {u} - \mathrm {u}_s||_{C(\Gamma ;\mathrm {X})} \le C_s^{\varvec{\alpha }} N_q^{-r(\varvec{\alpha })}, \end{aligned}$$
(6.7)

where \(C_s^{\varvec{\alpha }}\) depends on the Lebesgue constant but not on \(N_q\), and the algebraic convergence rate \(r(\varvec{\alpha })\) is defined as

$$\begin{aligned} r(\varvec{\alpha }) = \frac{e\log (2)\alpha _{min}}{2\log (2)+ \sum _{n=1}^N\frac{\alpha _{min}}{\alpha _n}}, \end{aligned}$$
(6.8)

where \(\alpha _{min} = \min _{1\le n \le N} \alpha _n\) with the choice \(\alpha _n = r_n/2, 1\le n \le N\), and \(r_n\) is defined in (6.4). Moreover, the error of the expectation of the stochastic optimal solution evaluated by the isotropic or anisotropic sparse grid Smolyak formula is bounded by [13, 39]

$$\begin{aligned} \mathcal {E}_s^e := ||\mathbb {E}[\mathrm {u}] - \mathbb {E}[\mathrm {u}_s]||_{\mathrm {X}} \le ||\mathrm {u} - \mathrm {u}_s||_{L^{2}_{\rho }(\Gamma ;\mathrm {X})} \le C_s^{e} N_q^{-r(\varvec{\alpha })}, \end{aligned}$$
(6.9)

where \(C_s^{e}\) is a constant independent of both the Lebesgue constant and \(N_q\), see [13].
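For a quick numerical check of the rates above, one may evaluate (6.4) and (6.8) directly; the following snippet is illustrative only, with problem-dependent inputs chosen by the user.

```python
# Convergence rates: r_n of (6.4) and r(alpha) of (6.8).
import math

def rate_1d(tau_n, gamma_len):
    x = 2.0 * tau_n / gamma_len
    return math.log(x + math.sqrt(1.0 + x * x))

def rate_anisotropic(alphas):
    a_min = min(alphas)
    return (math.e * math.log(2.0) * a_min
            / (2.0 * math.log(2.0) + sum(a_min / a for a in alphas)))
```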

6.2 Finite element approximation error

By \(\gamma _h\) and \(\epsilon _h\) we denote the continuity and coercivity constants of the bilinear form \(\mathcal {A}\), and by \(\delta _h\) and \(\beta _h\) the continuity and inf-sup constants of the bilinear form \(\mathcal {B}\), in the finite element spaces \(V_h^k, Q_h^m, G_h^l\) with the choice of Taylor-Hood elements. By the Brezzi theorem [41], we have the following estimate for the error \(\mathcal {E}_h\) of the finite element approximation of the solution of the semi-weak saddle point problem (3.4):

$$\begin{aligned} \mathcal {E}_h(y)&:= ||\mathrm {u}(y)-\mathrm {u}_h(y)||_{\mathrm {X}} \nonumber \\&\le C_1^h \inf _{\{\mathbf {v}_h, q_h, \mathbf {g}_h\}\in V^k_h\times Q^m_h\times G^l_h}||\{\mathbf {u}(y), p(y), \mathbf {f}(y)\} - \{\mathbf {v}_h, q_h, \mathbf {g}_h\}||_{V\times Q\times G} \nonumber \\&\quad + C_2^h \inf _{\{\mathbf {v}^a_h, q^a_h\}\in V^k_h\times Q^m_h}||\{\mathbf {u}^a(y), p^a(y)\} - \{\mathbf {v}^a_h, q^a_h\}||_{V\times Q}\nonumber \\&= O(h^k)\left( C_1^h(||\mathbf {u}(y)||_{k+1}+||p(y)||_{k}+\sqrt{\kappa }||\mathbf {f}(y)||_{k+1})\right) \nonumber \\&\quad + O(h^k)\left( C_2^h(||\mathbf {u}^a(y)||_{k+1}+||p^a(y)||_{k})\right) , \end{aligned}$$
(6.10)

where we have chosen \(m=k-1\) and \(l=k\); the constants \(C_1^h\) and \(C_2^h\) are given by

$$\begin{aligned} C_1^h = \left( 1+\frac{\gamma _h}{\epsilon _h}\right) \left( 1+\frac{\gamma _h}{\beta _h}\right) \left( 1+\frac{\delta _h}{\beta _h}\right) \text { and } C_2^h = 1+\frac{\delta _h}{\epsilon _h} + \frac{\delta _h}{\beta _h} + \frac{\gamma _h\delta _h}{\epsilon _h\beta _h}. \end{aligned}$$
(6.11)

Remark 6.2

Equivalently, we may formulate the semi-weak saddle point finite element problem (4.14) as a weakly coercive elliptic problem and apply the Babuška theorem to obtain a similar finite element error estimate.

6.3 Reduced basis approximation error

In addition to the a posteriori error bound \(\triangle _r\) obtained in Sect. 5.3, we present some a priori error estimates for the reduced basis approximation, following [5, 12].

Thanks to the analytic regularity in Theorem 3.2, we have the following a priori error estimate for the reduced basis solution of (5.1) when \( \Gamma \subset \mathbb {R}\) [12]

$$\begin{aligned} \mathcal {E}_r(y) := ||\mathrm {u}_h(y) - \mathrm {u}_r(y)||_{\mathrm {X}} \le C_r \exp (-rN_r), \end{aligned}$$
(6.12)

where r is defined as in (6.4) for a single dimension and the constant \(C_r\) is bounded by

$$\begin{aligned} C_r \le \max _{z \in \Sigma (\Gamma ;\tau )} ||\mathrm {u}(z)||_{\mathrm {X}}. \end{aligned}$$
(6.13)

In the multidimensional case, an error estimate has been obtained via the Kolmogorov N-width, defined for a function \(v:\Gamma \rightarrow X\) with values in an abstract Hilbert space X as [5]

$$\begin{aligned} d_N(\Gamma ;X): = \inf _{X_N\subset X}\sup _{y\in \Gamma }\inf _{w_N\in X_N} ||v(y) - w_N||_X, \end{aligned}$$
(6.14)

where \(X_N\) is an N-dimensional subspace of X. We have the following result for \(\mathcal {E}_r\) [5]: suppose that there exists \(M>0\) such that \(d_0(\Gamma ;\mathrm {X}_h)\le M\) and that there exist two positive constants \(c_1, c_2\) such that

$$\begin{aligned} \text { if } d_{N_r}(\Gamma ;\mathrm {X}_h) \le M \exp (-c_1 N_r^{c_2}) \text { then } \mathcal {E}_r \le c_5 M \exp (-c_3 N_r^{c_4}), \end{aligned}$$
(6.15)

where \(c_4 = c_2/(c_2+1)\), and \(c_3>0, c_5>0\) depend only on \(c_1, c_2\) and on \(c_6>0\), which measures the sharpness of the reduced basis error bound in (5.17), i.e.

$$\begin{aligned} c_6 \triangle _{r}(y) \le ||\mathrm {u}_h(y)-\mathrm {u}_r(y)||_{\mathrm {X}}, \end{aligned}$$
(6.16)

where \(c_6\) plays the role of the constant c appearing in (5.12).

Remark 6.3

The result (6.15) implies that whenever the error of the best possible approximation decays exponentially, the reduced basis error also enjoys exponential decay, with a rate depending on the sharpness (6.16) of the error bound that drives the greedy algorithm.

6.4 Global error estimate

With the individual error estimates presented above, we obtain the global error estimate in the following theorem.

Theorem 6.1

Under Assumptions 1, 2 and 3, for any given \(y\in \Gamma \), the finite element and reduced basis approximations satisfy

$$\begin{aligned} ||\mathrm {u}(y) - \mathrm {u}_r(y)||_{\mathrm {X}} \le \mathcal {E}_h(y) + \mathcal {E}_r(y). \end{aligned}$$
(6.17)

Moreover, the error in the evaluation of the expectation by the stochastic collocation, finite element and weighted reduced basis methods can be bounded by

$$\begin{aligned} ||\mathbb {E}[\mathrm {u}] - \mathbb {E}[\mathrm {u}_r]||_{\mathrm {X}} \le \mathcal {E}_s^e + \max _{y\in H_{\varvec{\alpha }}(q,N)}\mathcal {E}_h(y) + \max _{y\in H_{\varvec{\alpha }}(q,N)}\mathcal {E}_r(y), \end{aligned}$$
(6.18)

where \(\varvec{\alpha } = \mathbf {1}\) when using the isotropic sparse grid stochastic collocation method.

Proof

The proof is straightforward by applying the triangle inequality as follows:

$$\begin{aligned} ||\mathrm {u}(y) - \mathrm {u}_r(y)||_{\mathrm {X}} \le ||\mathrm {u}(y) - \mathrm {u}_h(y)||_{\mathrm {X}} + ||\mathrm {u}_h(y) - \mathrm {u}_r(y)||_{\mathrm {X}} \le \mathcal {E}_h(y) + \mathcal {E}_r(y). \end{aligned}$$
(6.19)

Similarly, we have the error estimate for the expectation of the optimal solution as

$$\begin{aligned} ||\mathbb {E}[\mathrm {u}] - \mathbb {E}[\mathrm {u}_r]||_{\mathrm {X}}&\le ||\mathbb {E}[\mathrm {u}] - \mathbb {E}[\mathrm {u}_s]||_{\mathrm {X}} + ||\mathbb {E}[\mathrm {u}_s] - \mathbb {E}[\mathrm {u}_h]||_{\mathrm {X}} + ||\mathbb {E}[\mathrm {u}_h] - \mathbb {E}[\mathrm {u}_r]||_{\mathrm {X}}\nonumber \\&\le \mathcal {E}_s^e + ||\mathrm {u}_s - \mathrm {u}_h||_{L^2_{\rho }(\Gamma ;\mathrm {X})} + ||\mathrm {u}_h - \mathrm {u}_r||_{L^2_{\rho }(\Gamma ;\mathrm {X})}\nonumber \\&\le \mathcal {E}_s^e + \max _{y\in H_{\varvec{\alpha }}(q,N)}\mathcal {E}_h(y) + \max _{y\in H_{\varvec{\alpha }}(q,N)}\mathcal {E}_r(y). \end{aligned}$$
(6.20)

\(\square \)

We remark that \(\mathcal {E}_r(y)\) is bounded by \(\triangle _r(y)\), which can be explicitly computed at each \(y\in H_{\varvec{\alpha }}(q,N)\).

7 Numerical experiments

In this section, we perform two numerical experiments to test the reduced basis approximation error and the stochastic collocation approximation error with sparse grid techniques in both isotropic and anisotropic settings. The aim is to demonstrate the efficiency of the proposed reduced basis method in solving the constrained optimization problem (2.14). Numerical examples verifying the finite element approximation error in a similar context can be found in [11].

We consider a two-dimensional physical domain \(D = (0,1)^2\). The observation data are set as in [24]: \(\mathbf {u}_d = (u_{d1}, u_{d2})\) and \(p_d = 0\), where \(u_{d1}(x) = \partial _{x_2}(\phi (x_1)\phi (x_2))/10\) and \(u_{d2}(x) = -\partial _{x_1}(\phi (x_1)\phi (x_2))/10\) with \(\phi (\xi ) = (1-\cos (0.8\pi \xi ))(1-\xi )^2, \xi \in [0,1]\). The random viscosity \(\nu \) is given as in (2.10), which can be transformed as

$$\begin{aligned} \nu (y^{\nu }) = \frac{1}{2}\sum _{n=0}^{N_{\nu }}\nu _n + \frac{1}{2N_{\nu }}\sum _{n=1}^{N_{\nu }}(\nu _n-\nu _0)y^{\nu }_n, \end{aligned}$$
(7.1)

where \(y^{\nu }\in \Gamma _{\nu } = [-1,1]^{N_{\nu }}\) corresponds to \(N_{\nu }\) uniformly distributed random variables. We set \(\nu _0 = 0.01\), \(\nu _n = \nu _0/2^n\), and use \(N_{\nu } = 3\) for both the isotropic and anisotropic tests without loss of generality. A homogeneous Dirichlet boundary condition is imposed on the upper, lower and left edges. The random Neumann boundary condition (2.11) is imposed on the right edge; more explicitly, we set \(\mathbf {h}(x,y^h) = (h_1(x_2,y^h),0)\) with

$$\begin{aligned} h_1(x_2, y^h) = \frac{1}{10}\left( \left( \frac{\sqrt{\pi }L}{2}\right) ^{1/2} y_1^h+ \sum _{n=1}^{N_h} \sqrt{\lambda _n} \left( \sin (n\pi x_2)y_{2n}^h+\cos (n\pi x_2)y_{2n+1}^h\right) \right) , \end{aligned}$$
(7.2)

which comes from the truncation of the Karhunen-Loève expansion of a Gaussian covariance field with correlation length \(L=1/16\) [39]; the eigenvalues \(\lambda _n, 1\le n \le N_h\), are given by

$$\begin{aligned} \lambda _n = \sqrt{\pi }L \exp \left( -(n\pi L)^2/4\right) ; \end{aligned}$$
(7.3)

\(y^h_n, 1\le n \le 2N_h+1\), are uncorrelated random variables with zero mean and unit variance (to be specified in the following subsections), independent of \(y^{\nu }\). Therefore, the random input is \(y = (y^{\nu },y^h)\), living in an \(N = N_{\nu }+2N_h+1\) dimensional probability space. For the finite element approximation, we use P1 elements for the pressure space and P2 elements for the velocity and control spaces, with 1342 elements in total. We set the regularization parameter \(\alpha = 0.01\).
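For concreteness, the random inputs (7.1)-(7.3) can be evaluated as in the following sketch (illustrative only; variable names are ours, and the distributions of \(y^h\) differ between the two experiments below):

```python
# Random viscosity (7.1) and Neumann datum (7.2)-(7.3) with N_nu = N_h = 3.
import numpy as np

L_corr, N_h, nu0 = 1.0 / 16.0, 3, 0.01
nus = nu0 / 2.0 ** np.arange(0, 4)            # nu_0, ..., nu_3  (N_nu = 3)
n = np.arange(1, N_h + 1)
lam = np.sqrt(np.pi) * L_corr * np.exp(-(n * np.pi * L_corr) ** 2 / 4.0)

def viscosity(y_nu):                          # y_nu in [-1, 1]^3
    return 0.5 * nus.sum() + (nus[1:] - nus[0]) @ y_nu / (2.0 * len(y_nu))

def neumann_h1(x2, y_h):                      # y_h has 2*N_h + 1 components
    base = np.sqrt(np.sqrt(np.pi) * L_corr / 2.0) * y_h[0]
    series = np.sqrt(lam) * (np.sin(n * np.pi * x2) * y_h[1::2]
                             + np.cos(n * np.pi * x2) * y_h[2::2])
    return (base + series.sum()) / 10.0
```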

7.1 Isotropic case

In the first experiment, we set \(y^h_n, 1\le n \le 2N_h+1\), with \(N_h = 3\) as independent standard normal random variables (thus the total stochastic dimension is \(N = 10\)), and apply the isotropic sparse grid stochastic collocation method with Gauss-Legendre abscissas for the collocation of \(y^{\nu }\) and Gauss-Hermite abscissas for the collocation of \(y^h\). In the multilevel greedy Algorithm 1, we set the tolerances \(\epsilon _{tol} = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4},10^{-5}\) and the interpolation levels \(q-N = 0, 1, 2, 3\) in the isotropic sparse grid Smolyak formula (4.3). A uniform lower bound of the inf-sup constant, \(\beta _{c}^{LB} = 0.1436\) (computed at the minimum viscosity \(\nu _{min}\)), is used. The results for the reduced basis construction are reported in Table 1. The number of collocation nodes in each level is shown in the second row; the number of new bases in each level is reported, together with (in brackets) the number of samples whose weighted error bound \(\triangle ^{\rho }_r\) exceeds the tolerance \(\epsilon _{tol}\) and which may thus be selected to construct new bases (potential samples). From these results we can see that the number of reduced bases is far smaller than the number of collocation nodes. For example, with the smallest tolerance \(\epsilon _{tol} = 10^{-5}\), we only need 1, 10, 22, 14 new bases in the four levels, respectively, resulting in 47 bases in total out of 1581 collocation nodes.

Table 1 The number of samples selected by the multilevel greedy Algorithm 1 with different tolerances \(\epsilon _{tol}\) at each sparse grid level; the value in \((\cdot )\) reports the number of potential/candidate samples

Figure 1 (left) displays the weighted error bound \(\triangle ^{\rho }_{r}\) and the true error of the reduced basis approximation in each level of the construction with tolerance \(\epsilon _{tol} = 10^{-5}\), from which we can see that the error bound is accurate and relatively sharp, providing a good estimate of the true error at cheap computational cost. Note that the error bound and the true error decrease monotonically within one sparse grid level and jump (at \(N_r = 11, 33\)) to a higher value when moving to the next sparse grid level, since new training samples are then tested. On the right of Fig. 1 we plot the expectation error (in the \(L^2_{\rho }(\Gamma ;\mathrm {X})\) norm) due to the reduced basis approximation, using the quadrature formula based on sparse grids of different levels, where the expectation error is defined as (with the reference value computed from the 4th level of the sparse grid)

$$\begin{aligned} \text {exp. error} = |||\mathrm {u_s}||_{L^2_{\rho }(\Gamma ;\mathrm {X})} - ||\mathrm {u}_{r}||_{L^2_{\rho }(\Gamma ;\mathrm {X})}| = |(\mathbb {E}[||\mathrm {u_s}||^2_{\mathrm {X}}])^{1/2} - (\mathbb {E}[||\mathrm {u}_{r}||^2_{\mathrm {X}}])^{1/2}|. \end{aligned}$$
(7.4)

Note that the “true” value of \(||\mathrm {u}||_{L^2_{\rho }(\Gamma ;\mathrm {X})}\) is approximated by the finite element solution \(\mathrm {u}_h\) computed at the deepest level \(q-N = 3\). From this figure, different accuracies for different \(\epsilon _{tol}\) can be observed, implying that decreasing the tolerance for the construction of the reduced basis space results in a more accurate evaluation of the statistics of the solution. How to balance the reduced basis approximation error (through the choice of \(\epsilon _{tol}\)) and the sparse grid quadrature error (through the choice of \(q-N\)) is subject to further investigation.

Fig. 1 Left: weighted error bound \(\triangle _r^{\rho }\) and true error of the reduced basis approximation at the selected samples; right: expectation error at different levels with different tolerances \(\epsilon _{tol}\)

7.2 Anisotropic case

In the second experiment, we solve the constrained optimization problem (2.14) in high-dimensional probability spaces by combining the anisotropic sparse grid techniques with the multilevel weighted reduced basis method. We set \(y^h_n, 1\le n \le 2N_h+1\), in (7.2) with \(N_h = 3, 8, 13, 18, 48\) as uniformly distributed random variables, thus leading to \(N = 10, 20, 30, 40, 100\) stochastic dimensions in total. The weight parameter \(\varvec{\alpha }\) is chosen a priori according to [38] in the following conservative way

$$\begin{aligned} \alpha _n = \frac{1}{2} \log \left( 1+\frac{2\tau _n}{|\Gamma _n|}\right) , \text { with } \tau _n = \frac{1}{4\sqrt{\lambda _n}}, \quad 1\le n \le N_h. \end{aligned}$$
(7.5)
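For illustration (not code from the paper), these weights can be computed directly from the eigenvalues (7.3):

```python
# A priori weights (7.5); lam are the KL eigenvalues and |Gamma_n| = 2 for
# uniform random variables on [-1, 1].
import numpy as np

def weights(lam, gamma_len=2.0):
    tau = 1.0 / (4.0 * np.sqrt(lam))
    return 0.5 * np.log(1.0 + 2.0 * tau / gamma_len)
```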

We remark that for a more general random field, where \(\varvec{\alpha }\) is difficult to obtain from an a priori estimate, we may use an a posteriori estimate obtained by fitting an empirical convergence rate in each dimension [38], or a dimension-adaptive approach that determines the weights automatically [20]. The sparse grid level is chosen as \(q-N = 0, 1, 2, 3, 4\). As the tolerance for the construction of the reduced basis space, we use \(\epsilon _{tol} = 10^{-5}\). The results for the construction of the reduced basis space with different dimensions N and different sparse grid levels \(q-N\) (the results for \(q-N = 0\) are the same as in Table 1 and thus omitted here) are presented in Table 2. Conclusions similar to those drawn for the isotropic case in Table 1 hold for the anisotropic case in Table 2. For example, when \(N = 40\), only 97 samples out of 40479 are used for the construction of the reduced basis space, resulting in only 97 full solves of the finite element problem (4.21) instead of 40479, which considerably reduces the total computational cost. This observation holds even in the 100-dimensional case. Moreover, the number of sparse grid nodes and the number of reduced bases increase as the dimension increases when N is small (see the change from 10 to 40), but stay almost the same when N becomes large (see the change from 40 to 100), which indicates that, out of 100 random variables, the first 40 have the greatest impact on the stochastic optimal solution when the sparse grid level is set at \(q-N = 4\).

Table 2 The number of samples selected by the multilevel greedy Algorithm 1 at each sparse grid level for different dimensions; the value in \((\cdot )\) reports the number of candidate samples for new bases

On the left of Fig. 2, we plot the weighted a posteriori error bound \(\triangle _r^{\rho }\) and the true error of the reduced basis approximation at each sparse grid level with stochastic dimension \(N=100\). We can observe that the error bound is accurate and sharp also in the high-dimensional case, especially when the reduced basis space becomes large. The right of Fig. 2 depicts the expectation error at different sparse grid levels, with the “true” expectation for each stochastic dimension computed in the same way as in the isotropic case; the expectation error converges with an algebraic rate, which verifies the error estimate in Sect. 6. Moreover, the error becomes very small at around \(4 \times 10^4\) nodes for the 100-dimensional problem with the anisotropic sparse grid technique, whereas around \(7\times 10^7\) nodes would be needed by the isotropic sparse grid technique at the same sparse grid level \(q-N=4\). Furthermore, no “plateau” (flattening) of the expectation error appears, unlike in Fig. 1, demonstrating that the multilevel reduced basis method is very efficient in producing accurate statistics of the stochastic optimal solution even though the number of reduced bases shown in Table 2 remains remarkably small (around 97 in high dimensions).

Fig. 2 Left: weighted error bound \(\triangle _r^{\rho }\) and true error of the reduced basis approximation at the selected samples with \(N=100\); right: expectation error for different dimensions

We remark that we did not take the finite element error into consideration in our numerical experiments. In order to balance the contributions of the different errors in the global error estimate (6.17), further efficient adaptive algorithms, not only in the stochastic/parameter space but also in the physical space, need to be developed.

8 Concluding remarks

This paper addressed several computational challenges arising in the solution of stochastic optimal control problems, in particular problems constrained by Stokes equations, by developing new algorithms. The challenges include the curse of dimensionality, the ill-conditioned and coupled optimality system, and the heavy computational cost. We proved that, under suitable assumptions on the regularity of the random input data, the optimal solution is smooth with respect to the parameters and can be analytically extended to a complex region. This result, though obtained only for Stokes equations, can be shown by the same arguments to hold for more general linear systems that satisfy the necessary inf-sup condition. Based on the smoothness of the optimal solution, we developed a computational strategy using adaptive sparse grid and model order reduction techniques for the solution of stochastic optimal control problems. In particular, we proposed a multilevel and weighted reduced basis method, which was demonstrated to be very efficient in two numerical experiments, especially for high-dimensional and large-scale problems requiring a large number of samples and a heavy computational cost for a full solve of the optimization problem at each sample. Further studies on more general statistical cost functionals, on adaptive schemes to balance the various computational errors, and on applications to practical flow control problems are ongoing.