1 Introduction

In this work, we analyze and apply the recent MISC method [22] to the approximation of quantities of interest (outputs) from the solutions of linear elliptic partial differential equations (PDEs) with random coefficients. Such equations arise in many applications in which the coefficients of the PDE are described in terms of random variables/fields due either to a lack of knowledge of the system or to its inherent non-predictability. We focus on the weak approximation of the solution of the following linear elliptic \({{\varvec{y}}}\)-parametric problem:

$$\begin{aligned} {\left\{ \begin{array}{ll} - {\text {div}}(a({{\varvec{x}}},{{\varvec{y}}})\,\nabla u({{\varvec{x}}},{{\varvec{y}}})) = \varsigma ({{\varvec{x}}}) &{} \text {in}\quad {\mathscr {B}} \\ u({{\varvec{x}}},{{\varvec{y}}}) \,=\, 0 &{} \text {on}\quad \partial {\mathscr {B}}. \end{array}\right. } \end{aligned}$$
(1)

Here, \({\mathscr {B}} \subset {\mathbb {R}}^d\) with \(d \in {\mathbb {N}}\) denotes the “physical domain,” and the operators \({\text {div}}\) and \(\nabla \) act with respect to the physical variable, \({{\varvec{x}}}\in {\mathscr {B}}\), only. We assume that \({\mathscr {B}}\) has a tensor structure, i.e., \({\mathscr {B}} = {\mathscr {B}}_1 \times {\mathscr {B}}_2 \times \cdots \times {\mathscr {B}}_D\), with \(D \in {\mathbb {N}}\), \({\mathscr {B}}_i \subset {\mathbb {R}}^{d_i}\) and \(\sum _{i=1}^D d_i = d\), see, e.g., Fig. 1a, b. This assumption simplifies the analysis detailed in the following, although MISC can be applied to more general domains, such as

  • domains obtained by mapping from a reference tensor domain as in Fig. 1c, by suitably extending the approaches in [17, 28];

  • non-tensor domains that can be immersed in a tensor bounding box, \({\mathscr {B}} \subset \hat{{\mathscr {B}}} = \hat{{\mathscr {B}}}_1 \times \hat{{\mathscr {B}}}_2 \times \cdots \times \hat{{\mathscr {B}}}_D\), as in Fig. 1d, whose mesh is obtained as a tensor product of meshes on each component, \(\hat{{\mathscr {B}}}_i\), of the bounding box;

  • domains that admit a structured mesh, i.e., with a regular connectivity, whose level of refinement in each “direction” can be set independently, as in Fig. 1e;

  • domains that can be decomposed in patches satisfying any of the conditions above (observe that the meshes on each patch need not be conforming).

Fig. 1

Examples of physical domains on which MISC can be applied: a and b are within the framework of this work, while treating (c) requires the introduction of a mapping from (b). MISC can also be formulated on non-tensor domains as in (d) and (e), but extending the analysis of the present work to this case is less straightforward and outside the scope of this work

The parameter \({{\varvec{y}}}=\{y_j\}_{j\ge 1}\) in (1) is a random sequence whose components are independent and uniformly distributed random variables. More precisely, each \(y_j\) has support in \([-1,1]\) with measure \(\frac{~{\text {d}}\lambda }{2}\), where \(~{\text {d}}\lambda \) is the standard Lebesgue measure. We further define \(\Gamma = \times _{ j\ge 1} [-1,1]\) (hereafter referred to as the “stochastic domain” or the “parameter space”), with the cylindrical probability measure \(~{\text {d}}\mu = \mathop {\times }_{j \ge 1} \frac{~{\text {d}}\lambda }{2}\), (see, e.g., [5, Chapter 3, Section 5]).

The right-hand side of (1), namely the deterministic function \(\varsigma \), does not play a central role in this work, and it is assumed to be a smooth function of class \(C_0^{\infty }(\overline{{\mathscr {B}}})\), where \(\overline{{\mathscr {B}}}\) denotes the closure of \({\mathscr {B}}\). This regularity requirement can be relaxed, but we keep it to ease the presentation, since our main goal in this work is to track the effect of the regularity of the coefficient a in (1) on the MISC convergence rate. Here, we focus on the following family of diffusion coefficients:

$$\begin{aligned} a({{\varvec{x}}},{{\varvec{y}}}) = e^{\kappa ({{\varvec{x}}},{{\varvec{y}}})}, {\text { with }} \kappa ({{\varvec{x}}},{{\varvec{y}}}) = \sum _{j \ge 1} \psi _j({{\varvec{x}}}) y_j, \end{aligned}$$
(2)

where \(\{\psi _j\}_{j \ge 1}\) is a sequence of functions \(\psi _j \in C^t(\overline{{\mathscr {B}}})\) for \(t\ge 0\) such that \(\left\| \psi _j \right\| _{L^\infty ({\mathscr {B}})} \rightarrow 0\) as \(j \rightarrow \infty \). Hereafter, without loss of generality, we assume that the sequence \( \{ \left\| \psi _j \right\| _{L^\infty ({\mathscr {B}})} \}_{j \ge 1}\) is nonincreasing. Thanks to a straightforward application of the Lax–Milgram lemma, the well-posedness of (1) in the classical Sobolev space, \(V=H^1_0({\mathscr {B}})\), is guaranteed almost surely (a.s.) in \(\Gamma \) if two functions, \(a_{\min }, a_{\max }:\Gamma \rightarrow {\mathbb {R}}\), exist such that

$$\begin{aligned} 0< a_{\min }({{\varvec{y}}}) \le a({{\varvec{x}}},{{\varvec{y}}}) \le a_{\max }({{\varvec{y}}}) <\infty , \quad \forall {{\varvec{x}}}\in {\mathscr {B}}, \quad \text {a.s. in } \Gamma . \end{aligned}$$
(3)
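Condition (3) is easy to probe numerically on a truncated expansion. The sketch below is purely illustrative: the choice \(\psi _j(x) = j^{-2}\sin (j\pi x)\) on \({\mathscr {B}}=[0,1]\) and the truncation at 20 terms are our assumptions, not choices made in this work.

```python
import numpy as np

def kappa(x, y):
    """Truncated log-diffusion kappa(x, y) = sum_j psi_j(x) y_j, with the
    hypothetical choice psi_j(x) = j^(-2) sin(j*pi*x) on B = [0, 1]."""
    j = np.arange(1, len(y) + 1)
    return np.sum(y * j ** -2.0 * np.sin(j * np.pi * x))

def coefficient_bounds(y, x_grid):
    """Grid approximations of a_min(y) and a_max(y) for a = exp(kappa), cf. (2)."""
    a_vals = np.exp([kappa(x, y) for x in x_grid])
    return a_vals.min(), a_vals.max()

rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, size=20)      # y_j ~ U[-1, 1], truncated to 20 terms
x_grid = np.linspace(0.0, 1.0, 201)
a_min, a_max = coefficient_bounds(y, x_grid)
assert 0.0 < a_min <= a_max < float("inf")   # condition (3) on the grid
```

Since \(\sum _j j^{-2} < \infty \), the exponent is uniformly bounded here and the bounds in (3) hold for every realization, not just the sampled one.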

Moreover, the equation is well posed in the Bochner space, \(L^q(\Gamma ;V)\), for some \(q \ge 1\) (see [1, 8] and the following discussion), provided that sufficiently high moments of the functions \(1/a_{\min }\) and \(a_{\max }\) are bounded. The goal of our computation is the approximation of an expected value,

$$\begin{aligned} {\mathbb {E}}\left[ F\right] = {\mathbb {E}}\left[ \Theta (u)\right] \in {\mathbb {R}}, \end{aligned}$$

where \(\Theta \) is a deterministic bounded and linear functional, and \(F({{\varvec{y}}}) = \Theta (u(\cdot ,{{\varvec{y}}}))\) is a real-valued random variable, \(F:\Gamma \rightarrow {\mathbb {R}}\). To this end, we utilize the Multi-index Stochastic Collocation (MISC) method, which we have introduced in a general setting in a previous work [22].

In MISC, we consider a decomposition in terms of tensorized univariate details (i.e., a tensorized hierarchical decomposition), both for the discrete space in which (1) is solved for a fixed value of \({{\varvec{y}}}\in \Gamma \) and for the quadrature operator used to approximate the expected value of \(F\), relying on the well-established theory of sparse-grid approximation of PDEs on the one hand [6, 7, 21, 26, 41] and of sparse-grid quadrature on the other [1, 6, 15, 34, 35, 40]. Taking tensor products of such univariate details yields combined deterministic-stochastic, first-order mixed differences; the MISC estimator of \({\mathbb {E}}\left[ F\right] \) is then built by selecting the most effective mixed differences through an optimization approach inspired by the literature on the knapsack problem (see, e.g., [30]). The knapsack approach was also used in [33] to obtain the so-called quasi-optimal sparse grids for PDEs with stochastic coefficients and in [6, 20] in the context of sparse-grid resolution of high-dimensional PDEs.

The resulting method can be seen as an extension of the sparse-grid combination technique for PDEs with stochastic coefficients, as well as a fully sparse, non-randomized version of the Multi-level Monte Carlo method [2, 9, 16, 27]. In particular, MISC differs from other works in the literature that attempt to optimally combine spatial and stochastic resolution levels [4, 25, 29, 37, 38] in two aspects. First, MISC uses combined deterministic-stochastic, first-order differences, which allows us to exploit not only the regularity of the solution with respect to the spatial variables and the stochastic parameters, but also the mixed deterministic-stochastic regularity whenever available. Second, the MISC estimator is built upon an optimization procedure, whereas the above-mentioned works try to balance the error contributions arising from the deterministic and stochastic components of the method without taking into account the corresponding costs. Finally, MISC can also be seen as a sparse-grid quadrature version of the Multi-index Monte Carlo method that was proposed and analyzed in [23].

In [22], MISC was introduced in a general setting, but the analysis was restricted to problems of type (1) depending on a finite number of random variables, \({{\varvec{y}}}\in \Gamma \subset {\mathbb {R}}^N\), with \(N < \infty \). Here, we provide a complexity analysis of MISC in the more challenging case in which the diffusion coefficient a depends on a countable sequence of random variables, \(\{y_j\}_{j\ge 1}\). Furthermore, we aim at tracking the dependence of the MISC convergence rate on the smoothness of the realizations of a. This new framework requires different tools to prove the complexity of the method: while in [22] we used a “direct counting” argument, i.e., we derived a complexity estimate by explicitly summing the work and the error contributions associated with each mixed difference included in the MISC estimator, here we base our proof on a summability argument and on suitable interpolation estimates in mixed regularity spaces. We mention that in [14] an infinite-dimensional analysis based on a direct counting argument was recently carried out in the case of hyperbolic cross-type index sets that might arise when quasi-optimizing the work contribution of sparse-grid stochastic collocation without spatial discretization.

The rest of this work is organized as follows. Section 2 introduces suitable assumptions and a class of random diffusion coefficients that we consider throughout the work; functional analysis results that are needed for the subsequent analysis of the MISC method are also provided. The MISC method is reviewed in Sect. 3. A complexity analysis of MISC with an infinite number of random variables is carried out in Sect. 4, where we provide a general convergence theorem. In Sect. 5, we discuss the application of MISC to the specific class of diffusion coefficients that we consider here and track the dependence of the convergence rate on the regularity of the diffusion coefficient. Section 6 presents some numerical tests to verify the convergence analysis conducted in the previous section. Finally, Sect. 7 provides some conclusions and final remarks. In the Appendix, we include some technical results on the summability and regularity properties of certain random fields written in terms of their series expansion.

In the following, \({\mathbb {N}}\) denotes the set of nonnegative integers (i.e., including zero), while \({\mathbb {N}}_+\) denotes the set of positive integers. We refer to sequences in \({\mathbb {N}}^{{\mathbb {N}}_+}\) and \({\mathbb {N}}_+^{{\mathbb {N}}_+}\) as “multi-indices.” Moreover, we often use a vector notation for sequences, i.e., we formally treat sequences as vectors in \({\mathbb {N}}^{{\mathbb {N}}_+}\) (or \({\mathbb {R}}^{{\mathbb {N}}_+}\)) and mark them with bold type. We employ the following notation, with the understanding that \(N < \infty \) for actual vectors and \(N=\infty \) for sequences:

  • \(\mathbf{1}\) denotes a vector in \({\mathbb {N}}^N\) whose components are all equal to one;

  • \({{\varvec{e}}}^N_\ell \) denotes the \(\ell \)-th canonical vector in \({\mathbb {R}}^N\), i.e., \(\left( {{\varvec{e}}}^N_\ell \right) _i = 1\) if \(\ell =i\) and zero otherwise; however, for the sake of readability, we often omit the superscript N whenever it is obvious from context. For instance, if \({{\varvec{v}}}\in {\mathbb {R}}^N\), we write \({{\varvec{v}}}- {{\varvec{e}}}_1\) instead of \({{\varvec{v}}}- {{\varvec{e}}}_1^N\);

  • given \({{\varvec{v}}}\in {\mathbb {R}}^N\), \(|{{\varvec{v}}}| = \sum _{i=1}^N v_i\), \(|{{\varvec{v}}}|_0\) denotes the number of nonzero components of \({{\varvec{v}}}\), \(\max ({{\varvec{v}}}) = \max _{i=1,\ldots ,N}v_i\) and \(\min ({{\varvec{v}}}) = \min _{i=1,\ldots ,N}v_i\);

  • \({\mathfrak {L}}_+\) denotes the set of sequences of positive integers in which only finitely many components are larger than one, i.e., \({\mathfrak {L}}_+ = \{ {{\varvec{p}}}\in {\mathbb {N}}_+^{{\mathbb {N}}_+} : |{{\varvec{p}}}-\mathbf{1}|_0 <\infty \}\);

  • given \({{\varvec{v}}}\in {\mathbb {R}}^N\) and \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(f({{\varvec{v}}})\) denotes the vector obtained by applying f to each component of \({{\varvec{v}}}\), \(f({{\varvec{v}}}) = [f(v_1),f(v_2),\ldots ,f(v_N)] \in {\mathbb {R}}^N\);

  • given \({{\varvec{v}}}, {{\varvec{w}}}\in {\mathbb {R}}^N\), the inequality \({{\varvec{v}}}> {{\varvec{w}}}\) holds true if and only if \(v_i > w_i\) \(\forall i=1,\ldots ,N\);

  • given \({{\varvec{v}}}\in {\mathbb {R}}^D\) and \({{\varvec{w}}}\in {\mathbb {R}}^N\), we denote their concatenation by \([{{\varvec{v}}}, {{\varvec{w}}}] = (v_1,\ldots ,v_D,w_1,\ldots , w_N) \in {\mathbb {R}}^{D+N}\);

  • given a set with finite cardinality, \({\mathcal {G}}\subset {\mathbb {N}}_+\), we define the set \({\mathbb {N}}^{\mathcal {G}}= \{{{\varvec{z}}}\in \mathbb {{\mathbb {N}}}^{{\mathbb {N}}_+}: z_j=0,\;\forall j\notin {\mathcal {G}}\}\). We similarly define \({\mathbb {R}}^{\mathcal {G}}\) and \({\mathbb {C}}^{\mathcal {G}}\).
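For concreteness, the finite-dimensional vector operations above can be transcribed directly; a small illustrative sketch (the function names are ours):

```python
def abs_sum(v):
    """|v| = sum of the components of v."""
    return sum(v)

def abs_zero(v):
    """|v|_0 = number of nonzero components of v."""
    return sum(1 for vi in v if vi != 0)

def componentwise(f, v):
    """f(v) = [f(v_1), ..., f(v_N)], f applied to each component."""
    return [f(vi) for vi in v]

def strictly_greater(v, w):
    """v > w holds iff v_i > w_i for every i = 1, ..., N."""
    return all(vi > wi for vi, wi in zip(v, w))

def concat(v, w):
    """[v, w] = (v_1, ..., v_D, w_1, ..., w_N)."""
    return list(v) + list(w)

v = [2, 0, 1]
assert abs_sum(v) == 3 and abs_zero(v) == 2
assert componentwise(lambda t: 2 * t, v) == [4, 0, 2]
assert strictly_greater([2, 3], [1, 2]) and not strictly_greater([2, 1], [1, 2])
assert concat([1, 2], [3]) == [1, 2, 3]
```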

2 Functional Setting

Even though condition (3) ensures a.s. well-posedness of (1) in V, we need to make sure that realizations of u a.s. belong to more regular spaces in order to prove a convergence rate result for MISC. More specifically, owing to classical spatial sparse-grid approximation theory, we need certain conditions on the mixed derivatives of u with respect to the physical coordinates. To this end, we introduce suitable function spaces (tensor products of fractional Sobolev spaces, see also [19]) and then a “shift regularity” assumption (see Assumption A1), i.e., we assume that the regularity of the realizations of u is induced “in a natural way” by the regularity of a, of the forcing, \(\varsigma \), and by the smoothness of the physical domain, \({\mathscr {B}}\). In other words, we rule out “pathological”/“ad hoc” examples in which u is very regular even though the data are not, e.g., when the forcing is chosen such that \(u \in C^q\) for \(q>2\) even in the presence of a domain with corners. First, recall the definition of a fractional Sobolev space for \(l_i \in {\mathbb {R}}_+ \setminus {\mathbb {N}}_+\) and \({\mathscr {B}}_i \subset {\mathbb {R}}^{d_i}\):

$$\begin{aligned}&H^{l_i}({\mathscr {B}}_i) \\&\quad = \left\{ u \in H^{\lfloor l_i\rfloor }({\mathscr {B}}_i) : \sup _{{{{\varvec{\alpha }}}}\in {\mathbb {N}}^{d_i}, |{{{\varvec{\alpha }}}}| = \lfloor l_i \rfloor } \int _{{\mathscr {B}}_i} \int _{{\mathscr {B}}_i} \frac{|D^{{{{\varvec{\alpha }}}}} u({{\varvec{x}}})- D^{{{{\varvec{\alpha }}}}}u({{\varvec{x}}}')|^{2}}{|{{\varvec{x}}}-{{\varvec{x}}}'|^{d_i+ 2(l_i -\lfloor l_i \rfloor )}}d{{\varvec{x}}}d{{\varvec{x}}}' < \infty \right\} , \end{aligned}$$

extending the definition of a standard Sobolev space \(H^{l_i}({\mathscr {B}}_i)\) for \(l_i\) integer. The tensorized fractional Sobolev space can then be defined as

$$\begin{aligned} H^{{{\varvec{l}}}}({\mathscr {B}}) = H^{l_1}({\mathscr {B}}_1) \otimes \cdots \otimes H^{l_D}({\mathscr {B}}_D) \end{aligned}$$

for \({{\varvec{l}}}= \left( l_i \right) _{i=1}^D \in {\mathbb {R}}^{D}_+\). Finally, the mixed fractional Sobolev spaces that we will need for our analysis can be defined for each \({{\varvec{q}}}\in {\mathbb {R}}_+^D\) as

$$\begin{aligned} {\mathcal {H}}^{\mathbf{1}+{{\varvec{q}}}}({\mathscr {B}}) = \bigcap _{j=1}^D H^{{{\varvec{e}}}_j +{{\varvec{q}}}}({\mathscr {B}}). \end{aligned}$$

Observe that, while mixed fractional spaces, \({\mathcal {H}}^{\mathbf{1}+{{\varvec{q}}}}({\mathscr {B}})\), are the proper setting for the forthcoming analysis, we will, for ease of presentation, not look for the most general mixed space in which the solution lives. Instead, we will be content with deducing mixed regularity from inclusions in standard Sobolev spaces, to the point that Assumption A1 will be written in terms of standard Sobolev spaces. For this, we observe that \({\mathcal {H}}^{\mathbf{1}}({\mathscr {B}}) = H^1({\mathscr {B}})\) holds and in general we have the following inclusion result between standard and mixed fractional Sobolev spaces:

$$\begin{aligned} u\in H^{1+r}({\mathscr {B}}) \Rightarrow u \in {\mathcal {H}}^{\mathbf{1}+r{{\varvec{q}}}}({\mathscr {B}}) \quad \text {for } r \in (0,\infty ) \quad \text { and }\quad 0<|{{\varvec{q}}}| \le 1. \end{aligned}$$
(4)

Before stating the shift-regularity assumption on u precisely, we need some more notation and setup. First, observe that this assumption needs to be stated in the complex domain, for reasons that will be made clear later. We therefore extend the diffusion coefficient from \(a(\cdot ,{{\varvec{y}}})\) with \({{\varvec{y}}}\in \Gamma \) to \(a(\cdot ,{{\varvec{z}}})\) with \({{\varvec{z}}}\in {\mathbb {C}}^{{\mathbb {N}}_+}\), so that the corresponding solution of (1), \(u(\cdot ,{{\varvec{z}}})\), becomes an \(H^1_0({\mathscr {B}})\) function taking values in \({\mathbb {C}}\), i.e., \(u(\cdot ,{{\varvec{z}}})\in H^1_0({\mathscr {B}},{\mathbb {C}})\). Since the complex-valued version of problem (1) is well posed as long as there exists \(\delta \) such that \(\mathfrak {Re}\left[ a({{\varvec{x}}},{{\varvec{z}}})\right]> \delta > 0\) for almost every (a.e.) \({{\varvec{x}}}\in {\mathscr {B}}\), and since our approximation method will cover the countable set of parameters \({{\varvec{z}}}\in {\mathbb {C}}^{{\mathbb {N}}_+}\) by multiple subsets of finite cardinality, we define the following region in \({\mathbb {C}}^{\mathcal {G}}\) for a set of finite cardinality, \({\mathcal {G}}\subset {\mathbb {N}}_+\):

$$\begin{aligned} \Sigma _{{\mathcal {G}},\delta } = \left\{ {{\varvec{z}}}\in {\mathbb {C}}^{\mathcal {G}}: \mathfrak {Re}\left[ a({{\varvec{x}}},{{\varvec{z}}})\right] \ge \delta >0 {\text { for a.e. }} {{\varvec{x}}}\in {\mathscr {B}} \right\} . \end{aligned}$$
(5)

We are now ready to state the assumption on the link between the regularity of the coefficient, a, and the regularity of the solution, u.

Assumption A1

(Shift assumption) For a given \({\mathscr {B}}\), let \(\psi _j \in C^t(\overline{{\mathscr {B}}})\) (cf. eq. (2)) and \(\varsigma \in C^\infty _0(\overline{{\mathscr {B}}})\). We assume that there exists r with \(1< r < t\) such that, for any finite set \({\mathcal {G}}\subset {\mathbb {N}}_+\) and any \({{\varvec{z}}}\in \Sigma _{{\mathcal {G}},\delta }\), the following three conditions hold:

  1. \(u(\cdot ,{{\varvec{z}}}) \in H^{1+r}({\mathscr {B}},{\mathbb {C}}) \cap H^1_0({\mathscr {B}},{\mathbb {C}})\);

  2. \(\frac{D u(\cdot ,z_j)}{ D z_j} \in H^{1 + r}({\mathscr {B}},{\mathbb {C}}),\quad \forall j\in {\mathcal {G}}\), where \(\frac{D u(\cdot ,z_j)}{ D z_j}\) denotes the partial complex derivative of u;

  3. \(\left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} \le C(\delta ,s,\varsigma ,{\mathscr {B}})\left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} \), with \(C(\delta ,s,\varsigma ,{\mathscr {B}})\rightarrow \infty \) for \(\delta \rightarrow 0\), for every \(s=1,\ldots ,\lfloor r \rfloor \).

In the following, we will need to ensure that \(\left\| u(\cdot , {{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} \), for \(s > 0\), is uniformly bounded for all \({{\varvec{z}}}\) in certain subregions of the complex plane. Note that this is a stronger condition than what is stated in the previous assumption, where we only assumed pointwise control on the norms of u (i.e., we gave a bound that depends on \({{\varvec{z}}}\)). In particular, we show at the end of this section that the possibility of having such a uniform bound depends on certain summability properties of the diffusion coefficient. Toward this end, we state the following assumption, which also guarantees the well-posedness of Problem (1).

Assumption A2

(Summability of the diffusion coefficient) For every \(s = 0,1,\ldots ,s_{\max } \le r\), define the sequences \({{\varvec{b}}}_{s}=\{b_{s,j}\}_{j \ge 1}\) where

$$\begin{aligned}&b_{s,j} \, =\, \max _{{{\varvec{s}}} \in {\mathbb {N}}^d : |{{\varvec{s}}}| \le s} \left\| D^{{{\varvec{s}}}} \psi _j \right\| _{L^\infty ({\mathscr {B}})} ,\quad j \ge 1. \end{aligned}$$
(6)

We assume that a nondecreasing sequence \(\{p_s\}_{s=0}^{s_{\max }}\) exists such that \(0< p_0 \le \cdots \le p_{s_{\max }}<\frac{1}{2}\) and \({{\varvec{b}}}_{s} \in \ell ^{p_s}\), i.e.,

$$\begin{aligned} \left\| {{\varvec{b}}}_s \right\| _{\ell ^{p_s}} ^{p_s} = \sum _{j \ge 1} b_{s,j}^{p_s} < \infty . \end{aligned}$$
(7)

We observe that with the above assumption, \(b_{s,j} \rightarrow 0^+\) as \(j \rightarrow \infty \) and \(0\le b_{0,j}\le b_{s,j}\) for every \(s=0,1,\ldots ,s_{\max }\). Moreover, given Assumption A2, we have that \({{\varvec{b}}}_0 \in \ell ^1\), which, together with the fact that \(y_j \in [-1,1]\) for \(j \ge 1\), guarantees that condition (3) holds and therefore that (1) is well posed in V a.s. in \(\Gamma \). Incidentally, we observe that the conditions in Assumption A2 are sufficient but not necessary for condition (3) to hold: Indeed, one would only need \({{\varvec{b}}}_{s} \in \ell ^2\) for some integer \(s \ge 1\), see Lemma 15 (and Corollary 16 for a specific example) in the Appendix.
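Whether a truncation of \({{\varvec{b}}}_s\) is consistent with (7) can be probed by monitoring partial sums of \(\sum _j b_{s,j}^{p_s}\). The sketch below assumes, purely for illustration, algebraic decay \(b_{s,j} = j^{-\alpha }\), for which \({{\varvec{b}}}_s \in \ell ^{p_s}\) exactly when \(p_s\,\alpha > 1\):

```python
import numpy as np

def tail_increment(alpha, p, n1, n2):
    """Increment of the partial sum of b_j^p, with b_j = j^(-alpha), between
    truncation levels n1 < n2; it is small iff the series in (7) converges,
    i.e., iff p * alpha > 1."""
    j = np.arange(n1 + 1, n2 + 1, dtype=float)
    return float(np.sum(j ** (-alpha * p)))

# Hypothetical decay b_{s,j} = j^(-3): l^p-summable for p = 0.4, not for p = 0.3.
small = tail_increment(3.0, 0.4, 10_000, 100_000)   # exponent 1.2 > 1: converges
large = tail_increment(3.0, 0.3, 10_000, 100_000)   # exponent 0.9 < 1: diverges
assert small < 1.0 < large
```

The convergent case stabilizes as the truncation grows, while the divergent tail keeps accumulating; this is only a numerical heuristic, of course, not a proof of membership in \(\ell ^{p_s}\).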

As suggested above, the fact that, for a fixed s, the sequence \({{\varvec{b}}}_s\) is \(p_s\)-summable plays a central role in this work. Indeed, if \({{\varvec{b}}}_s\) is \(p_s\)-summable, we show that \(\left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} \) is uniformly bounded with respect to \({{\varvec{z}}}\) in a region of the complex plane whose size is proportional to \(\left\| {{\varvec{b}}}_s \right\| _{\ell ^{p_s}} ^{p_s}\). We use this fact to show convergence of the MISC method, with the convergence rate dictated by both \(p_0\) and \(p_s\). In Theorem 10, we detail how to optimally choose the value of s in the range \(0,1,\ldots ,s_{\max }\); this is the main result of this work. Restricting the range of values of s by requiring \(p_s < \frac{1}{2}\) is not crucial; we could relax this to \(p_{s} < 1\). However, we follow this more stringent assumption because it considerably simplifies some technical steps in the following discussion without affecting the main part of the proof, as we make clear below (see Remark 4). What is important is that \(s_{\max }\) might be strictly smaller than \(\lfloor r \rfloor \) (i.e., it could happen that \({{\varvec{b}}}_r\) is \(p_r\)-summable but with \(p_r > \frac{1}{2}\), or not summable at all); in this case, the line of proof we propose does not fully exploit the regularity of the solution, u.

Example 1

In the numerical section of this work, we consider either \({\mathscr {B}} = [0,1]\), i.e., \(d=D=d_1=1\), or \({\mathscr {B}} = [0,1]^3\), i.e., \(d=D=3\), \(d_i=1\) and \({\mathscr {B}}_i=[0,1]\) for \(i=1,2,3\). In both cases, we consider the following form for \(\kappa ({{\varvec{x}}},{{\varvec{y}}})\):

$$\begin{aligned} \kappa ({{\varvec{x}}},{{\varvec{y}}}) = \sum _{{{\varvec{k}}} \in {\mathbb {N}}^d} A_{{{\varvec{k}}}} \sum _{{{\varvec{\ell }}} \in \{0,1\}^d} y_{{{\varvec{k}}},{{\varvec{\ell }}} } \, \prod _{i=1}^d \left( \cos \left( {\pi } k_i x_i \right) \right) ^{\ell _i} \left( \sin \left( {\pi } k_i x_i \right) \right) ^{1-\ell _i}. \end{aligned}$$
(8)

Observe that it is possible to write \(\kappa \) in the form of (2) using a bijective mapping from \(\{y_{{{\varvec{k}}},{{\varvec{\ell }}}}\}_{{{\varvec{k}}} \in {\mathbb {N}}^d, {{\varvec{\ell }}} \in \{0,1\}^d}\) to \(\{y_j\}_{j \ge 1}\). We also choose the following values for the \(A_{{{\varvec{k}}}}\) coefficients:

$$\begin{aligned} A_{{{\varvec{k}}}} = \sqrt{3}\, 2^{\frac{|{{\varvec{k}}}|_0}{2}}(1 + |{{\varvec{k}}}|^2)^{-\frac{\nu +d/2}{2}}, \end{aligned}$$
(9)

for some \(\nu >0\). We observe that \(\nu \) is a parameter dictating the \({{\varvec{x}}}\)-regularity of the realizations of \(\kappa \), hence of a. Moreover, the parameters \(\nu \) and d govern the \(p_s\)-summability of the sequence \({{\varvec{b}}}_s\) for any s and, as a consequence, the overall convergence of the MISC method, as discussed earlier. Section 5 analyzes the summability properties of the series (8).
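A truncated realization of (8)–(9) in \(d=1\) can be evaluated as follows; the truncation level and the value of \(\nu \) in this sketch are illustrative, not taken from the numerical section.

```python
import numpy as np

def A_coeff(k, nu, d=1):
    """A_k from (9) in d = 1: sqrt(3) * 2^(|k|_0 / 2) * (1 + |k|^2)^(-(nu + d/2)/2),
    where |k|_0 = 0 for k = 0 and 1 otherwise."""
    k0 = 0 if k == 0 else 1
    return np.sqrt(3.0) * 2.0 ** (k0 / 2) * (1.0 + k ** 2) ** (-(nu + d / 2) / 2)

def kappa_1d(x, y_cos, y_sin, nu):
    """Truncated kappa(x, y) from (8) in d = 1: for each frequency k, one
    cosine term (l = 1) and one sine term (l = 0), each carrying its own
    uniform random variable."""
    return sum(
        A_coeff(k, nu) * (yc * np.cos(np.pi * k * x) + ys * np.sin(np.pi * k * x))
        for k, (yc, ys) in enumerate(zip(y_cos, y_sin))
    )

rng = np.random.default_rng(1)
K = 16                                    # truncation level (illustrative)
y_cos = rng.uniform(-1.0, 1.0, K)         # y_{k,l} ~ U[-1, 1]
y_sin = rng.uniform(-1.0, 1.0, K)
a_val = np.exp(kappa_1d(0.3, y_cos, y_sin, nu=2.5))   # a = e^kappa, cf. (2)
assert a_val > 0.0
```

For \(k=0\) the sine term vanishes identically and the cosine term is constant, so the corresponding sine variable is inert; a careful enumeration would skip it, which we omit here for brevity.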

We conclude this preliminary section by making the shape of the above-mentioned regions in the complex plane more precise and showing how their sizes depend on the summability properties of a. In particular, we will exploit the fact that for any finite set \({\mathcal {G}}\subset {\mathbb {N}}_+\), for every \(s=0,1,\ldots ,s_{\max }\) and for any \({{\varvec{z}}}\in {\mathbb {C}}^{\mathcal {G}}\) we have \(\kappa (\cdot ,{{\varvec{z}}})\in C^{s}(\overline{{\mathscr {B}}},{\mathbb {C}})\), \(\left\| \kappa (\cdot ,{{\varvec{z}}}) \right\| _{C^{s}(\overline{{\mathscr {B}}},{\mathbb {C}})} \le \sum _{j\in {\mathcal {G}}}|z_j| b_{s,j}\) and infer, from the multivariate Faà di Bruno formula (see “Appendix 1” and [13]), that \(a(\cdot ,{{\varvec{z}}})\in C^{s}(\overline{{\mathscr {B}}},{\mathbb {C}})\) as well, with the estimate

$$\begin{aligned} \left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} \le \frac{s!}{(\log 2)^s}\left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^0(\overline{{\mathscr {B}}},{\mathbb {C}})} (1+\left\| \kappa (\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}})} )^s, \quad \forall {{\varvec{z}}}\in {\mathbb {C}}^{\mathcal {G}}. \end{aligned}$$
(10)

Next, for a given \(\zeta >1\), let \({\mathcal {E}}_{\zeta }\) denote the polyellipse in the complex plane

$$\begin{aligned} {\mathcal {E}}_{\zeta } = \bigg \{ z \in {\mathbb {C}} : \,\, \mathfrak {Re}\left[ z\right] \le \frac{\zeta + \zeta ^{-1}}{2}\cos {\vartheta }, \,\,\, \mathfrak {Im}\left[ z \right] \le \frac{\zeta - \zeta ^{-1}}{2}\sin {\vartheta }, \,\, \vartheta \in [0, 2\pi ) \bigg \}. \end{aligned}$$

For any sequence \({{\varvec{\zeta }}} = \{\zeta _{j}\}_{j \ge 1}\) with \(\zeta _{j}>1\) for every \(j \ge 1\) and for any finite set, \({\mathcal {G}}\subset {\mathbb {N}}_+\), we introduce the Bernstein polyellipse:

$$\begin{aligned} {\mathcal {E}}_{{{\varvec{\zeta }}}}^{\mathcal {G}}= \{ {{\varvec{z}}}\in {\mathbb {C}}^{{\mathcal {G}}} : z_j \in {\mathcal {E}}_{\zeta _j} \text { for all } j \in {\mathcal {G}}\}. \end{aligned}$$
(11)
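Membership in the Bernstein polyellipse (11) reduces to a componentwise ellipse test, since \({\mathcal {E}}_{\zeta }\) has semi-axes \((\zeta +\zeta ^{-1})/2\) (real) and \((\zeta -\zeta ^{-1})/2\) (imaginary). A minimal sketch for a finite index set, with illustrative values of \(\zeta _j\):

```python
def in_ellipse(z, zeta):
    """Membership in the closed Bernstein ellipse E_zeta, whose semi-axes are
    (zeta + 1/zeta)/2 along the real axis and (zeta - 1/zeta)/2 along the
    imaginary axis (zeta > 1)."""
    a = (zeta + 1.0 / zeta) / 2.0
    b = (zeta - 1.0 / zeta) / 2.0
    return (z.real / a) ** 2 + (z.imag / b) ** 2 <= 1.0

def in_polyellipse(z, zetas):
    """Membership in the polyellipse (11): componentwise ellipse membership
    over the finite index set G."""
    return all(in_ellipse(zj, cj) for zj, cj in zip(z, zetas))

zetas = [2.0, 1.5]                                 # illustrative zeta_j > 1
assert in_polyellipse([1.0 + 0.0j, 0.5 + 0.2j], zetas)
assert not in_polyellipse([2.0 + 1.0j, 0.0j], zetas)
```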

Lemma 1

(Holomorphic complex continuation of u in \(H^1_0({\mathscr {B}};{\mathbb {C}})\) in a Bernstein polyellipse) Consider the sequence \({{\varvec{b}}}_0\) defined in (6). For any \(\delta >0\), let \(E_\delta >2\) be such that

$$\begin{aligned} \frac{\pi }{E_\delta } = -\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1} -\log \delta + \log \cos \left( \frac{\pi }{E_\delta } \right) , \end{aligned}$$

and consider the sequence \({{\varvec{\zeta }}}_0 = \{\zeta _{0,j}\}_{j \ge 1}\), with

$$\begin{aligned} \zeta _{0,j}&= \tau _{0,j} + \sqrt{\tau _{0,j}^2+1} > 1 \end{aligned}$$
(12)
$$\begin{aligned} \tau _{0,j}&= \frac{\pi }{E_\delta } \frac{(b_{0,j})^{p_0-1}}{\left\| {{\varvec{b}}}_0 \right\| _{\ell ^{p_0}} ^{p_0}}, \end{aligned}$$
(13)

with \(p_0\) as in (7). Then, for any finite set \({\mathcal {G}}\subset {\mathbb {N}}_+\), the solution, u, admits a holomorphic complex continuation, \(u:{\mathbb {C}}^{\mathcal {G}}\rightarrow H^{1}_0({\mathscr {B}},{\mathbb {C}})\), in the Bernstein polyellipse, \({\mathcal {E}}_{{{\varvec{\zeta }}}_0}^{\mathcal {G}}\subset \Sigma _{{\mathcal {G}},\delta }\), with

$$\begin{aligned} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_0}^{\mathcal {G}}} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1}({\mathscr {B}})} \le C_{0,u} = \frac{\left\| \varsigma \right\| _{H^{-1}({\mathscr {B}})} }{\delta } <\infty , \end{aligned}$$

with \(C_{0,u}\) independent of \({\mathcal {G}}\).

Proof

It is well known in the literature that \(u:{\mathbb {C}}^{\mathcal {G}}\rightarrow H^{1}_0({\mathscr {B}},{\mathbb {C}})\) is holomorphic in the region \(\Sigma _{{\mathcal {G}},\delta }\) defined in (5) (see, e.g., [1]). To compute the parameters \(\{ \zeta _j\}_{j \in {\mathcal {G}}}\) of a Bernstein polyellipse contained in \(\Sigma _{{\mathcal {G}},\delta }\), we rewrite \(a({{\varvec{x}}},{{\varvec{z}}})\) as

$$\begin{aligned} a({{\varvec{x}}},{{\varvec{z}}})&= \exp \left( \sum _{j \in {\mathcal {G}}} z_j \psi _j({{\varvec{x}}}) \right) \\&= \exp \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Re}\left[ z_j\right] \psi _j({{\varvec{x}}}) \right) \exp \left( \sum _{j \in {\mathcal {G}}} i \mathfrak {Im}\left[ z_j \right] \psi _j({{\varvec{x}}}) \right) \\&= \exp \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Re}\left[ z_j\right] \psi _j({{\varvec{x}}}) \right) \left[ \cos \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Im}\left[ z_j \right] \psi _j({{\varvec{x}}}) \right) \right. \\&\left. \qquad +\, i \sin \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Im}\left[ z_j \right] \psi _j({{\varvec{x}}}) \right) \right] , \end{aligned}$$

so that \(\Sigma _{{\mathcal {G}},\delta }\) can be rewritten as

$$\begin{aligned} \Sigma _{{\mathcal {G}},\delta }= & {} \left\{ {{\varvec{z}}}\in {\mathbb {C}}^{\mathcal {G}}: \exp \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Re}\left[ z_j\right] \psi _j({{\varvec{x}}}) \right) \cos \left( \sum _{j \in {\mathcal {G}}} \mathfrak {Im}\left[ z_j \right] \psi _j({{\varvec{x}}}) \right) \right. \\&\quad \left. \ge \delta \,\, \text { for a.e. } {{\varvec{x}}}\in {\mathscr {B}} \right\} . \end{aligned}$$

Now, for some \(E>2\) to be chosen below, the following two conditions on \({{\varvec{z}}}\) imply that \({{\varvec{z}}}\in \Sigma _{{\mathcal {G}},\delta }\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \cos \Big ( \sum _{j \in {\mathcal {G}}} |\mathfrak {Im}\left[ z_j \right] | \,b_{0,j} \Big ) \ge \displaystyle \cos \left( \frac{\pi }{E} \right) \\ \exp \Big ( -\sum _{j \in {\mathcal {G}}} |\mathfrak {Re}\left[ z_j\right] | \,b_{0,j} \Big ) \ge \displaystyle \frac{\delta }{\cos \left( \frac{\pi }{E} \right) }; \end{array}\right. } \end{aligned}$$

equivalently, we write

$$\begin{aligned} {\left\{ \begin{array}{ll} \sum _{j \in {\mathcal {G}}} |\mathfrak {Im}\left[ z_j \right] | \,b_{0,j} \le \displaystyle \frac{\pi }{E} \\ \sum _{j \in {\mathcal {G}}} |\mathfrak {Re}\left[ z_j\right] | \,b_{0,j} \le \displaystyle -\log \delta + \log \cos \left( \frac{\pi }{E} \right) . \end{array}\right. } \end{aligned}$$

For a fixed value of E, the equations above define a second region, \(\Omega _{{\mathcal {G}},\delta }\), included in \(\Sigma _{{\mathcal {G}},\delta }\). In turn, the previous conditions are verified if the following conditions, which define a hyper-rectangular region, \({\mathcal {R}}_\delta \subset \Omega _{{\mathcal {G}},\delta }\), are verified:

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle |\mathfrak {Im}\left[ z_j \right] | \le \tau _{0,j} = \frac{\pi (b_{0,j})^{p_0-1}}{E \left\| {{\varvec{b}}}_0 \right\| _{\ell ^{p_0}} ^{p_0}}, \\ \displaystyle |\mathfrak {Re}\left[ z_j\right] | \!\le \!1\!+\! w_{0,j}, \quad \text { with } w_{0,j} \!=\! \frac{(b_{0,j})^{p_0-1}}{ \left\| {{\varvec{b}}}_0 \right\| _{\ell ^{p_0}} ^{p_0}} \left( -\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1} \!-\!\log \delta \!+\! \log \cos \left( \frac{\pi }{E} \right) \!\right) \!, \end{array}\right. } \end{aligned}$$

provided that \(\delta \) and E are such that the quantity \(-\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1} -\log \delta + \log \cos \left( \frac{\pi }{E} \right) \) remains positive. Observe that for sufficiently small \(\delta >0\) such E exists, since \(f(E) = \log \cos \left( \frac{\pi }{E} \right) \) is a monotonically increasing function, with \(f(E) \rightarrow -\infty \) for \(E \rightarrow 2\) and \(f(E) \rightarrow 0\) for \(E \rightarrow \infty \), and \(-\log \delta \) is positive for sufficiently small \(\delta \). In particular, for any \(\delta >0\), we choose \(E=E_\delta \) such that \(w_{0,j} = \tau _{0,j}\), which leads to

$$\begin{aligned} \frac{\pi }{E_\delta } = -\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1} -\log \delta + \log \cos \left( \frac{\pi }{E_\delta } \right) . \end{aligned}$$

We observe that with this choice, \(\tau _{0,j}\) (and hence \(w_{0,j}\)) actually does not depend on \({\mathcal {G}}\), therefore we can define the sequence \({{\varvec{\tau }}}_0 = \{\tau _{0,j}\}_{j \ge 1}\).
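Since the residual of the defining relation for \(E_\delta \) is monotone in E, the value can be computed by simple bisection. The following sketch is illustrative only (the values of \(\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1}\) and \(\delta \), and the function name, are our own choices, not part of the method):

```python
import math

def solve_E_delta(b0_l1, delta, lo=2.0 + 1e-9, hi=1e6, tol=1e-12):
    """Bisection for pi/E = -||b0||_{l1} - log(delta) + log(cos(pi/E)), E > 2.

    The residual f tends to +infinity as E -> 2+ (since log cos(pi/E) -> -inf)
    and, for sufficiently small delta, is negative for large E, so a root exists.
    """
    def f(E):
        return math.pi / E + b0_l1 + math.log(delta) - math.log(math.cos(math.pi / E))
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("delta is not small enough for a root in (2, hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with \(\left\| {{\varvec{b}}}_0 \right\| _{\ell ^1} = 1\) and \(\delta = 10^{-3}\), the root lies slightly above \(E = 2\), consistent with the behaviour of \(f\) described above.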

We are now in the position to compute the Bernstein polyellipses that touch the boundary of \({\mathcal {R}}_\delta \) on the real and imaginary axes. For the real axis, we have to enforce

$$\begin{aligned} \frac{\zeta _{j,{\text {real}}}+\zeta _{j,{\text {real}}}^{-1}}{2} = 1+\tau _{0,j} \Rightarrow \zeta _{j,{\text {real}}} = 1 + \tau _{0,j} + \sqrt{(1+\tau _{0,j})^2-1}, \end{aligned}$$

while for the imaginary axis we have to enforce

$$\begin{aligned} \frac{\zeta _{j,{\text {imag}}}-\zeta _{j,{\text {imag}}}^{-1}}{2} = \tau _{0,j} \Rightarrow \zeta _{j,{\text {imag}}} = \tau _{0,j} + \sqrt{\tau _{0,j}^2+1}. \end{aligned}$$

The proof is concluded by observing that \(\zeta _{j,{\text {imag}}} \le \zeta _{j,{\text {real}}}\); hence, the only polyellipse entirely contained in \({\mathcal {R}}_\delta \), and thus in \(\Sigma _{{\mathcal {G}},\delta }\), is the one touching \({\mathcal {R}}_\delta \) on the imaginary axis. This also implies that the bound \(\sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}}^{\mathcal {G}}} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1}({\mathscr {B}})} \le C_{0,u} = \frac{\left\| \varsigma \right\| _{H^{-1}({\mathscr {B}})} }{\delta } <\infty \) holds independently of \({\mathcal {G}}\). \(\square \)
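The two intersection conditions above yield closed-form ellipse parameters, and the inequality \(\zeta _{j,{\text {imag}}} \le \zeta _{j,{\text {real}}}\) invoked in the proof can be checked numerically; a minimal sketch (the sampled values of \(\tau \) are arbitrary):

```python
import math

def zeta_real(tau):
    # solves (zeta + 1/zeta)/2 = 1 + tau: ellipse touching 1 + tau on the real axis
    return 1.0 + tau + math.sqrt((1.0 + tau) ** 2 - 1.0)

def zeta_imag(tau):
    # solves (zeta - 1/zeta)/2 = tau: ellipse touching i*tau on the imaginary axis
    return tau + math.sqrt(tau ** 2 + 1.0)
```

For every \(\tau > 0\) one finds \(\zeta _{\text {imag}}(\tau ) \le \zeta _{\text {real}}(\tau )\), in agreement with the conclusion of the proof.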

Lemma 2

(Holomorphic complex continuation of u in \(H^{1+s}({\mathscr {B}};{\mathbb {C}})\) in a Bernstein polyellipse) For a given \(s=1,2,\ldots ,s_{\max }\), let \({{\varvec{\zeta }}}_s = \{\zeta _{s,j}\}_{j \ge 1}\), with

$$\begin{aligned} \zeta _{s,j}&= \tau _{s,j} + \sqrt{\tau _{s,j}^2+1} > 1, \end{aligned}$$
(14)
$$\begin{aligned} \tau _{s,j}&= \frac{\pi (b_{s,j})^{p_s-1}}{E_\delta \left\| {{\varvec{b}}}_s \right\| _{\ell ^{p_s}} ^{p_s}}, \end{aligned}$$
(15)

with \({{\varvec{b}}}_s\) as in (6), \(p_s\) as in (7), and \(E_\delta \) as in Lemma 1. For any finite set \({\mathcal {G}}\subset {\mathbb {N}}_+\), \(u:{\mathbb {C}}^{\mathcal {G}}\rightarrow H^{1+s}({\mathscr {B}},{\mathbb {C}})\) is holomorphic in the Bernstein polyellipse \({\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}\subset \Sigma _{{\mathcal {G}},\tilde{\delta }}\), with

$$\begin{aligned} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_s}} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}})} \le C_{s,u} = C(\tilde{\delta },s, \varsigma , {\mathscr {B}}) M < \infty , \end{aligned}$$
(16)

where \(M = \frac{s!}{(\log 2)^s} e^{K \frac{\pi }{E_\delta }} \left( 1 + K \frac{\pi }{E_\delta } \right) ^s\), \(K= \left( 2 + \frac{1}{\min _{j \ge 1} \tau _{s,j}^2}\right) ^{1/2}\), \(\tilde{\delta }= e^{- K \frac{\pi }{E_\delta }}\), \(C(\tilde{\delta },s, \varsigma , {\mathscr {B}})\) as in Assumption A1, and \(C_{s,u}\) independent of \({\mathcal {G}}\).

Proof

From Assumption A1, \(u : {\mathbb {C}}^{\mathcal {G}}\rightarrow H^{1+s}({\mathscr {B}},{\mathbb {C}})\) is complex differentiable for every \({{\varvec{z}}}\) in \(\Sigma _{{\mathcal {G}},\varepsilon }\) for any \(\varepsilon > 0\). It is therefore holomorphic in \(\Sigma _{{\mathcal {G}},\varepsilon }\). Similarly to the previous lemma, we look for a region in which we have an a-priori bound on the \(H^{1+s}({\mathscr {B}},{\mathbb {C}})\) norm of u uniformly in \({{\varvec{z}}}\). Again from Assumption A1, this is true in the region

$$\begin{aligned} \Xi _{{\mathcal {G}},\varepsilon }(M) = \{ {{\varvec{z}}}\in {\mathbb {C}}^{\mathcal {G}}: \left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}})} \le M \} \cap \Sigma _{{\mathcal {G}},\varepsilon } \quad \text { for any } \varepsilon >0. \end{aligned}$$

However, contrary to the previous lemma, in this proof, we do not derive the expression of a polyellipse contained in \(\Xi _{{\mathcal {G}},\varepsilon }(M)\), but content ourselves with verifying that the polyellipses, \({\mathcal {E}}_{{\mathcal {G}},{{\varvec{\zeta }}}_s}\), proposed in the statement of the lemma (that we have obtained simply by replacing \(b_{0,j}\) with \(b_{s,j}\) in (13)) satisfy the requirement, i.e., \({\mathcal {E}}_{{\mathcal {G}},{{\varvec{\zeta }}}_s} \subset \Xi _{{\mathcal {G}},\tilde{\delta }}(M)\), for every finite set, \({\mathcal {G}}\subset {\mathbb {N}}_+\), and for a certain \(\tilde{\delta }\) that we specify later to control the coercivity of the problem. To this end, let us consider the univariate polyellipse \({\mathcal {E}}_{\zeta _{s,j}}\). We first prove that this polyellipse is contained in the following complex rectangle:

$$\begin{aligned} {\mathcal {R}}_j = \{ z \in {\mathbb {C}} : |\mathfrak {Re}\left[ z\right] | \le \sqrt{1+\tau _{s,j}^2}, \,\, |\mathfrak {Im}\left[ z \right] | \le \tau _{s,j} \}. \end{aligned}$$

The bound on the imaginary part of z is a consequence of the choice of the polyellipse in (14), similarly to what was done in Lemma 1. For the real part, we compute the point \(z_0\) at which the polyellipse intersects the positive real axis:

$$\begin{aligned} z_0&= \frac{\zeta _{s,j} + \frac{1}{\zeta _{s,j}}}{2} = \frac{\zeta _{s,j}^2 + 1}{2 \zeta _{s,j}} = \frac{\tau _{s,j}^2 + 1 + \tau _{s,j} \sqrt{\tau _{s,j}^2+1}}{\tau _{s,j} +\sqrt{\tau _{s,j}^2+1}} \\&= \left( \tau _{s,j}^2+1 + \tau _{s,j} \sqrt{\tau _{s,j}^2+1} \right) \left( \sqrt{\tau _{s,j}^2+1} - \tau _{s,j} \right) = \sqrt{1+\tau _{s,j}^2}. \end{aligned}$$

Furthermore, we observe that \(|z| \le \sqrt{1 + 2 \tau _{s,j}^2} \le K \tau _{s,j}\) for every \(z \in {\mathcal {R}}_j\) and some \(K>0\); for instance, we could look for the smallest \(\tau _{s,j}\), say \(\tau _{s,j^*}\), choose K accordingly, i.e., such that \((K^2-2)\tau _{s,j^*}^2 \ge 1\), and obtain the value in the statement of the lemma. Next, according to (10) and Assumption A2,

$$\begin{aligned} \left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} \le&\frac{s!}{(\log 2)^s}\left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^0(\overline{{\mathscr {B}}},{\mathbb {C}})} (1+\left\| \kappa (\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} )^s\\ \le&\frac{s!}{(\log 2)^s} e^{\sum _{j \in {\mathcal {G}}} b_{0,j} |z_j|} \left( 1+\sum _{j \in {\mathcal {G}}} b_{s,j} |z_j|\right) ^s \end{aligned}$$

holds. We finish the proof by observing that for every \({{\varvec{z}}}\in {\mathcal {E}}_{{\mathcal {G}},{{\varvec{\zeta }}}_s}\), we have

$$\begin{aligned} \sum _{j \in {\mathcal {G}}} b_{0,j} |z_j| \le \sum _{j \in {\mathcal {G}}} b_{s,j} |z_j| \le K \sum _{j \in {\mathcal {G}}} b_{s,j} \tau _{s,j} = K \frac{\pi }{E_\delta } \sum _{j \in {\mathcal {G}}} b_{s,j} \frac{(b_{s,j})^{p_s-1}}{\left\| {{\varvec{b}}}_s \right\| _{\ell ^{p_s}} ^{p_s}} \le K \frac{\pi }{E_\delta }, \end{aligned}$$

which gives uniform control of the norm of \(\left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} \) within \({\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}\) as required. More precisely, we have

$$\begin{aligned} \left\| a(\cdot ,{{\varvec{z}}}) \right\| _{C^s(\overline{{\mathscr {B}}},{\mathbb {C}})} \le M = \frac{s!}{(\log 2)^s} e^{K \frac{\pi }{E_\delta }} \left( 1 + K \frac{\pi }{E_\delta } \right) ^s, \quad \forall {{\varvec{z}}}\in {\mathcal {E}}_{{\mathcal {G}},{{\varvec{\zeta }}}_s}, \end{aligned}$$

which together with Assumption A1 gives the desired bound on \(\left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}})} \) in (16) and

$$\begin{aligned} \mathfrak {Re}\left[ a({{\varvec{x}}},{{\varvec{z}}})\right] \ge e^{- K \frac{\pi }{E_\delta }} =: \tilde{\delta } > 0. \end{aligned}$$

\(\square \)

The following result from [22, 33] is also needed. Since this result is concerned with the finite-dimensional case, i.e., \({\mathcal {G}}=\{1,2,\ldots ,N\}\) and \({{\varvec{\zeta }}} \in {\mathbb {R}}^N\), we write, for ease of notation, \({\mathcal {E}}_{{{\varvec{\zeta }}}}\) instead of \({\mathcal {E}}_{{{\varvec{\zeta }}}}^{\mathcal {G}}\), i.e., \({\mathcal {E}}_{{{\varvec{\zeta }}}} = \{ {{\varvec{z}}}\in {\mathbb {C}}^N : z_j \in {\mathcal {E}}_{\zeta _j} \text { for } j=1,2,\ldots ,N\}\).

Lemma 3

(Chebyshev expansion of a holomorphic function) Given \(q_j \in {\mathbb {N}}\), let \(\phi _{q_j}\) be the family of Chebyshev polynomials of the first kind on \([-1,1]\), normalized so that \(|\phi _{q_j}(y)|\le 1\) for all \(y \in [-1,1]\), and, for \(N \in {\mathbb {N}}_+\) and any \({{\varvec{p}}}\in {\mathbb {N}}^N\), let \(\Phi _{{\varvec{p}}}({{\varvec{y}}}) =\prod _{j=1}^N \phi _{p_j}(y_j)\). If \(f:[-1,1]^N \rightarrow {\mathbb {R}}\) admits a holomorphic complex extension in a Bernstein polyellipse, \({\mathcal {E}}_{{{\varvec{\zeta }}}}\), for some \({{\varvec{\zeta }}} \in (1, \infty )^N\) and if there exists \(0< C_f < \infty \) such that \(\sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}}} |f({{\varvec{z}}})| \le C_f\), then f admits the following Chebyshev expansion:

$$\begin{aligned} f({{\varvec{y}}})&= \sum _{{{\varvec{p}}}\in {\mathbb {N}}^N} f_{{\varvec{p}}}\Phi _{{\varvec{p}}}({{\varvec{y}}}), \\ f_{{\varvec{p}}}&= \frac{1}{\int _{[-1,1]^N} \Phi _{{\varvec{p}}}^2({{\varvec{y}}}) \varrho _C({{\varvec{y}}}) d{{\varvec{y}}}} \int _{[-1,1]^N} f({{\varvec{y}}}) \Phi _{{\varvec{p}}}({{\varvec{y}}}) \varrho _C({{\varvec{y}}}) d{{\varvec{y}}}, \\&\quad \varrho _C({{\varvec{y}}}) = \prod _{j=1}^N \left( \sqrt{1-y_j^2}\right) ^{-1}, \end{aligned}$$

which converges uniformly in \({\mathcal {E}}_{{{\varvec{\zeta }}}}\). Moreover, the following bound on the coefficients \(f_{{\varvec{p}}}\) holds:

$$\begin{aligned} |f_{{\varvec{p}}}| \le \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}}} |f({{\varvec{z}}})| 2^{|{{\varvec{p}}}|_0} \prod _{j=1}^N \zeta _{j}^{-p_j}, \end{aligned}$$

where \(|{{\varvec{p}}}|_0\) denotes the number of nonzero elements of \({{\varvec{p}}}\).
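The geometric decay \(|f_{{\varvec{p}}}| \lesssim \zeta ^{-p}\) predicted by Lemma 3 can be observed numerically in one dimension. The sketch below uses an illustrative function, \(f(y)=1/(2-y)\), whose only singularity is the pole at \(y=2\), so the largest Bernstein ellipse of holomorphy has parameter \(\zeta = 2+\sqrt{3}\); the NumPy Chebyshev interpolant of moderate degree closely approximates the expansion coefficients:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Pole at y = 2, so f is holomorphic inside the Bernstein ellipse with
# parameter zeta = 2 + sqrt(3); Lemma 3 then predicts |f_p| <~ zeta**(-p).
f = lambda y: 1.0 / (2.0 - y)
zeta = 2.0 + np.sqrt(3.0)
coeffs = Chebyshev.interpolate(f, 30).coef  # close to the expansion coefficients
```

Plotting \(|f_p| \zeta ^{p}\) against p shows a bounded sequence, confirming the sharpness of the geometric rate for this example.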

3 The Multi-index Stochastic Collocation Method

In this section, we introduce approximations of \({\mathbb {E}}\left[ F\right] \) along the deterministic and stochastic dimensions and their decomposition in terms of tensorizations of univariate difference operators. We then recall the so-called mixed difference operators and the construction of the MISC estimator that was first introduced in [22] in a general setting.

3.1 Approximation Along the Deterministic and Stochastic Variables

Tensorized deterministic solver. Let \(\{{\mathbb {T}}_i\}_{i=1}^D\) be the triangulations/meshes of each of the subdomains \(\{{\mathscr {B}}_i\}_{i=1}^D\) composing the domain \({\mathscr {B}}\); denote by \(\{h_i\}_{i=1}^D\) the mesh size on each mesh \({\mathbb {T}}_i\); and let \(\bigotimes _{i=1}^D {\mathbb {T}}_i\) be the mesh for \({\mathscr {B}}\). Then, consider a numerical method for the approximation of the solution of (1) for a fixed value of the random variables, \({{\varvec{y}}}\), based on such a mesh, e.g., finite differences, finite volumes, tensorized finite elements, or h-refined splines, such as those used in the isogeometric context. The values of \(h_i\) are given as functions of a positive integer, \(\alpha \ge 1\), referred to as a “deterministic discretization level”, i.e., \(h_i=h_i(\alpha )\). Observe that, in a more general setting, the mesh size need not be constant over the subdomain \({\mathscr {B}}_i\) and could be, for instance, the result of a grading function intended to refine subregions of \({\mathscr {B}}_i\) as in Fig. 1d (see also [24, Remark 2.2] for further comments on locally refined meshes in the context of Multi-Level Monte Carlo methods). In this work, we restrict ourselves to constant h for ease of presentation. Given a multi-index, \({{{\varvec{\alpha }}}}\in {\mathbb {N}}^D_+\), we denote by \(u^{{{\varvec{\alpha }}}}({{\varvec{x}}},{{\varvec{y}}})\) the approximation of u obtained by setting \(h_i = h_i(\alpha _i)\) and use the notation \(F^{{{\varvec{\alpha }}}}({{\varvec{y}}}) = \Theta [u^{{{\varvec{\alpha }}}}(\cdot ,{{\varvec{y}}})]\). More specifically, in the following we will consider

$$\begin{aligned} h_i=h_{0,i} 2^{-\alpha _i}, \quad \text {for } i=1,\ldots ,D \end{aligned}$$
(17)

and a method obtained by tensorizing piecewise multi-linear finite element spaces on each mesh, \(\{{\mathbb {T}}_i\}_{i=1}^D\), discretizing each \(\{{\mathscr {B}}_i\}_{i=1}^D\).

As already mentioned in the previous section, MISC could also be applied to more general domains, such as those discussed in Fig. 1, as long as some kind of “tensor structure” can be induced from the shape of the domain to the solver of the deterministic problem and the vector \({{{\varvec{\alpha }}}}\) determines the refinement level of each component of such a tensor structure. The reason why we need such a tensor structure will become clear when we introduce the classic sparse-grids approach to solve the problem. For non-tensorial domains, we can always set \(D=1\) and consider an unstructured mesh for the whole domain, \({\mathscr {B}}\), having only one discretization level \(\alpha \in {\mathbb {N}}_+\). In this way, we give up the sparse-grid approach on the deterministic part of the problem and obtain a variant of the Multi-Level Stochastic Collocation method discussed in [37, 38], yet with a different algorithm for combining spatial and stochastic discretizations. See Remark 1 below and [22] for additional discussion on this aspect.

It would be straightforward to extend this setting to discretization methods based on degree elevation rather than on mesh refinement, such as spectral methods, p-refined finite elements or p- and k-refined splines. However, here we limit ourselves to the setting defined above. It would also be possible to include time-dependent problems in this framework, but in this case we might need to take care of possible constraints on discretization parameters, such as CFL conditions; a broader generalization could also include “non-physical” parameters such as tolerances for numerical solvers. Finally, more general problems, e.g., those depending on random variables with probability distributions other than uniform distributions or with uncertain boundary conditions and/or forcing terms could also be addressed with suitable modifications of the MISC methodology.

Tensorized quadrature formulae for expected value approximation. Similarly to what was presented for the deterministic problem, we base our approximation of the expected value of \(F^{{{\varvec{\alpha }}}}({{\varvec{y}}})\) on a tensorization of quadrature formulae over the stochastic domain, \(\Gamma \).

Let \(C^0([-1,1])\) be the set of real-valued continuous functions over \([-1,1]\), \(\beta \in {\mathbb {N}}_+\) be referred to as a “stochastic discretization level”, and \(m: {\mathbb {N}}\rightarrow {\mathbb {N}}\) be a strictly increasing function with \(m(0)=0\) and \(m(1)=1\), that we call a “level-to-nodes function”. At level \(\beta \), we consider a set of \(m(\beta )\) distinct quadrature points in \([-1,1]\), \({\mathscr {P}}^{m(\beta )} = \{y_\beta ^1, y_\beta ^2, \ldots , y_\beta ^{m(\beta )}\} \subset [-1,1]\), and a set of quadrature weights, \({\mathcal {W}}^{m(\beta )}=\{\varpi _\beta ^1, \varpi _\beta ^2, \ldots , \varpi _\beta ^{m(\beta )}\}\). We then define the quadrature operator as

$$\begin{aligned} Q^{m(\beta )} : C^0([-1,1]) \rightarrow {\mathbb {R}}, \quad Q^{m(\beta )}[f] = \sum _{j=1}^{m(\beta )} f(y_\beta ^j) \varpi _\beta ^j. \end{aligned}$$
(18)

The quadrature weights are selected such that \(Q^{m(\beta )}[y^k] = \int _{-1}^1 \frac{y^k}{2} dy, \quad \forall \, k=0,1,\ldots , m(\beta )-1\). The quadrature points are chosen to optimize the convergence properties of the quadrature error (the specific choice of quadrature points is discussed later in this section). In particular, for symmetry reasons, we define the trivial operator \(Q^{1}[f] = f(0),\; \forall f \in C^0([-1,1])\).
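The exactness conditions above determine the weights from the nodes through a linear (Vandermonde) system. A minimal sketch, valid for any set of distinct nodes in \([-1,1]\) (the function name is ours):

```python
import numpy as np

def quad_weights(nodes):
    """Weights such that Q[y^k] = int_{-1}^{1} y^k / 2 dy for k < len(nodes)."""
    n = len(nodes)
    V = np.vander(np.asarray(nodes, dtype=float), n, increasing=True).T
    # moments of the uniform measure dy/2 on [-1, 1]: 1/(k+1) for even k, 0 for odd k
    moments = np.array([1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(n)])
    return np.linalg.solve(V, moments)
```

In practice, dedicated formulas for Clenshaw–Curtis weights are preferred for large node counts (the Vandermonde system becomes ill-conditioned), but the moment-matching construction above is exactly the defining property stated in the text.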

Defining a quadrature operator over \(\Gamma \) is more delicate, since \(\Gamma \) is defined as a countable tensor product of intervals. To this end, we follow [36] and define, for any finitely supported multi-index \({{{\varvec{\beta }}}}\in {\mathfrak {L}}_+\),

$$\begin{aligned} {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}: C^0(\Gamma ) \rightarrow {\mathbb {R}}, \quad {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} = \bigotimes _{j \ge 1} Q^{m(\beta _j)}, \end{aligned}$$

where the j-th quadrature operator is understood to act only on the j-th variable, and the tensor product is well defined since it is composed of finitely many non-trivial factors (see [36] again). In practice, the value of \({\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[f]\) can be obtained by considering the tensor grid \({\mathscr {T}}^{m({{{\varvec{\beta }}}})} = \times _{j\ge 1} {\mathscr {P}}^{m(\beta _j)}\) with cardinality \( \#{\mathscr {T}}^{m({{{\varvec{\beta }}}})} = \prod _{j \ge 1} m(\beta _j)\) and computing

$$\begin{aligned} {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[f] = \sum _{j=1}^{\# {\mathscr {T}}^{m({{{\varvec{\beta }}}})} } f(\widehat{{{\varvec{y}}}}_j) w_j, \end{aligned}$$

where \(\widehat{{{\varvec{y}}}}_j \in {\mathscr {T}}^{m({{{\varvec{\beta }}}})}\) and \(w_j\) are (infinite) products of weights of the univariate quadrature rules. Notice that having \(m(1)=1\) is essential in this construction so that the cardinality of \({\mathscr {T}}^{m({{{\varvec{\beta }}}})}\) is finite for any \({{{\varvec{\beta }}}}\in {\mathfrak {L}}_+\) and \(\varpi _{\beta _j}^1=1\) whenever \(\beta _j=1\). All weights, \(w_j\), are thus bounded.
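Since the trivial factors \(Q^{1}\) leave the value unchanged, in a finite-dimensional implementation the tensorized rule reduces to a weighted sum over the product grid of the non-trivial directions. A minimal sketch with hypothetical univariate rules:

```python
from itertools import product

def tensor_quadrature(rules, f):
    """Apply the tensor product of univariate rules (nodes, weights) to f.

    Only the finitely many non-trivial directions need to be passed in, since
    the trivial factors Q^1 evaluate at the midpoint with unit weight.
    """
    total = 0.0
    for combo in product(*(zip(n, w) for n, w in rules)):
        y = [node for node, _ in combo]
        weight = 1.0
        for _, wj in combo:
            weight *= wj
        total += f(y) * weight
    return total
```

The cardinality of the loop is \(\prod _j m(\beta _j)\), matching the cardinality of \({\mathscr {T}}^{m({{{\varvec{\beta }}}})}\) stated above.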

Coming back to the choice of the univariate quadrature points, it is recommended, for optimal performance, that they be chosen according to the underlying measure, \({\text {d}}\lambda /2\). Moreover, since we aim at a hierarchical decomposition of the operator, \({\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\), it is useful (although not necessary, see e.g., [33]) that the nodes be nested collocation points, i.e., \({\mathscr {P}}^{m(\beta )} \subset {\mathscr {P}}^{m(\beta +1)}\) for any \(\beta \ge 1\). Thus, we consider Clenshaw–Curtis points, which are defined as

$$\begin{aligned} y_\beta ^j=\cos \left( \frac{(j-1) \pi }{m(\beta )-1}\right) , \quad 1\le j \le m(\beta ). \end{aligned}$$
(19)

Clenshaw–Curtis points are nested provided that the level-to-nodes function is defined as

$$\begin{aligned} m(0)=0,\,\,m(1)=1,\,\, m(\beta )=2^{\beta -1}+1 \,\,\text { for } \beta \ge 2. \end{aligned}$$
(20)

We close this section by mentioning that another family of nested points for uniform measures available in the literature is the Leja points, whose performance is equivalent to that of Clenshaw–Curtis points for quadrature purposes. See, e.g., [10, 31, 32, 36] and references therein for definitions and comparison.
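The nestedness of Clenshaw–Curtis points under the doubling rule (20) is easy to verify directly; a minimal check (the function names are ours):

```python
import math

def m(beta):
    """Level-to-nodes function (20)."""
    if beta == 0:
        return 0
    return 1 if beta == 1 else 2 ** (beta - 1) + 1

def cc_points(beta):
    """Clenshaw-Curtis points (19) at stochastic level beta >= 1."""
    n = m(beta)
    if n == 1:
        return [0.0]
    return [math.cos((j - 1) * math.pi / (n - 1)) for j in range(1, n + 1)]
```

Comparing consecutive levels confirms \({\mathscr {P}}^{m(\beta )} \subset {\mathscr {P}}^{m(\beta +1)}\) up to floating-point tolerance, which is what makes the hierarchical decomposition of the next subsection efficient.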

3.2 Construction of the MISC Estimator

It is straightforward to see that a direct approximation, \({\mathbb {E}}\left[ F\right] \approx {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} [F^{{{\varvec{\alpha }}}}]\), is not a viable option in practical cases, due to the well-known “curse of dimensionality” effect. In [22], we proposed to use MISC as a computational strategy to combine spatial and stochastic discretizations in such a way as to obtain an effective approximation scheme for \({\mathbb {E}}\left[ F\right] \).

MISC is based on a classic sparsification approach in which approximations like \({\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[F^{{{\varvec{\alpha }}}}]\) are decomposed into a hierarchy of operators, of which only the most important are retained in the approximation. In more detail, let us denote for brevity \({\mathscr {Q}}^{m({{{\varvec{\beta }}}})} [F^{{{\varvec{\alpha }}}}] = F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\) and introduce the first-order difference operators along the deterministic and stochastic discretization levels, denoted, respectively, by \(\Delta _{i}^{\mathrm{det}}\) with \(1\le i\le D\) and \(\Delta _{j}^{\mathrm{stoc}}\) with \(j \ge 1\):

$$\begin{aligned}&\Delta _{i}^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = {\left\{ \begin{array}{ll} F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} - F_{{{{\varvec{\alpha }}}}-{{\varvec{e}}}_i,{{{\varvec{\beta }}}}}, &{}\quad {\text {if }}\alpha _i> 1,\\ F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} &{}\quad {\text {if }} \alpha _i=1, \end{array}\right. } \\&\Delta _{j}^{\mathrm{stoc}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = {\left\{ \begin{array}{ll} F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} - F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}-{{\varvec{e}}}_j}, &{}\quad {\text {if }}\beta _j > 1,\\ F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} &{}\quad {\text {if }} \beta _j=1. \end{array}\right. } \end{aligned}$$

As a second step, we define the so-called mixed difference operators,

$$\begin{aligned} {{\varvec{\Delta }}}^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]&= \bigotimes _{i=1}^D \Delta _i^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = \Delta _1^{\mathrm{det}} \left[ \, \Delta _2^{\mathrm{det}} \left[ \, \cdots \Delta _D^{\mathrm{det}} \left[ F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \right] \, \right] \, \right] \nonumber \\&= \sum _{{{\varvec{i}}}\in \{0,1\}^D} (-1)^{|{{\varvec{i}}}|} F_{{{{\varvec{\alpha }}}}-{{\varvec{i}}},{{{\varvec{\beta }}}}}, \end{aligned}$$
(21)
$$\begin{aligned} {{\varvec{\Delta }}}^{\mathrm{stoc}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]&= \bigotimes _{j \ge 1} \Delta _j^{\mathrm{stoc}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = \sum _{{{\varvec{j}}}\in \{0,1\}^{{\mathbb {N}}_+}} (-1)^{|{{\varvec{j}}}|} F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}-{{\varvec{j}}}}, \end{aligned}$$
(22)

with the convention that \(F_{{{\varvec{v}}},{{\varvec{w}}}}=0\) whenever a component of \({{\varvec{v}}}\) or \({{\varvec{w}}}\) is zero. Notice that, since \({{{\varvec{\beta }}}}\) has finitely many components larger than 1, the sum on the right-hand side of (22) contains only a finite number of terms. Finally, letting

$$\begin{aligned} {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = {{\varvec{\Delta }}}^{\mathrm{stoc}}[ {{\varvec{\Delta }}}^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]], \end{aligned}$$
(23)

we define the Multi-index Stochastic Collocation (MISC) estimator of \({\mathbb {E}}\left[ F\right] \) as

$$\begin{aligned} {\mathscr {M}}_{{\mathcal {I}}}[F] = \sum _{[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \in {\mathcal {I}}} {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] = \sum _{ [{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \in {\mathcal {I}}} c_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}, \quad c_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} = \sum _{\begin{array}{c} [{{\varvec{i}}},{{\varvec{j}}}] \in \{0,1\}^{D+{\mathbb {N}}} \\ {[}{{{\varvec{\alpha }}}}+{{\varvec{i}}},{{{\varvec{\beta }}}}+{{\varvec{j}}}] \in {\mathcal {I}} \end{array}}(-1)^{|[{{\varvec{i}}},{{\varvec{j}}}]|_0}, \end{aligned}$$
(24)

where \({\mathcal {I}} \subset {\mathbb {N}}_+^{D} \times {\mathfrak {L}}_+\). The second form of the MISC estimator is known as the “combination technique”, since it expresses the MISC approximation as a linear combination of a number of tensor approximations, \(F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), and might be useful for the practical implementation of the method; we observe in particular that many of its coefficients, \(c_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), are zero.
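The combination-technique coefficients in (24) can be computed by inclusion–exclusion over the index set. A sketch for a finite set of multi-indices (the 2D index sets used below are hypothetical examples):

```python
from itertools import product

def combination_coefficients(index_set):
    """c_mu = sum of (-1)^{|j|} over shifts j in {0,1}^n with mu + j in the set."""
    I = set(map(tuple, index_set))
    n = len(next(iter(I)))
    coeffs = {}
    for mu in I:
        c = sum((-1) ** sum(j)
                for j in product((0, 1), repeat=n)
                if tuple(m + k for m, k in zip(mu, j)) in I)
        if c != 0:
            coeffs[mu] = c
    return coeffs
```

For the simplex set \(\{(1,1),(1,2),(2,1)\}\) this yields \(c_{(1,1)} = -1\) and \(c_{(1,2)} = c_{(2,1)} = 1\); for a full tensor box, only the top corner survives, recovering a single tensor approximation. For a downward-closed set the nonzero coefficients sum to one, so constants are reproduced exactly.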

The effectiveness of MISC crucially depends on the choice of the index set, \({\mathcal {I}}\). Given the hierarchical structure of MISC, a natural requirement is that \({\mathcal {I}}\) should be downward-closed, i.e.,

$$\begin{aligned} \forall \,[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \in {\mathcal {I}}, \quad {\left\{ \begin{array}{ll} {[}{{{\varvec{\alpha }}}}- {{\varvec{e}}}_i, {{{\varvec{\beta }}}}] \in {\mathcal {I}} \text { for all } 1 \le i \le D \text { such that } \alpha _i> 1,\\ {[}{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}- {{\varvec{e}}}_j] \in {\mathcal {I}} \text { for all } j \ge 1 \text { such that } \beta _j > 1 \end{array}\right. } \end{aligned}$$

(see also [6, 33, 39]). In addition to this general constraint, in [22] we have detailed a procedure to derive an efficient set, \({\mathcal {I}}\), based on an optimization technique inspired by the Dantzig algorithm for the approximate solution of the knapsack problem (see [30]). In the following, we briefly summarize this procedure and refer to [22] as well as to [3, 6, 33] for a thorough discussion on the similarities between this procedure and the Dantzig algorithm.
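The downward-closedness constraint is cheap to verify programmatically; a minimal sketch over finite multi-index sets (components below 1 are not required, matching the condition above):

```python
def is_downward_closed(index_set):
    """Check that every backward neighbour (component decremented towards 1)
    of every index is also in the set."""
    I = set(map(tuple, index_set))
    return all(mu[:i] + (mu[i] - 1,) + mu[i + 1:] in I
               for mu in I
               for i in range(len(mu))
               if mu[i] > 1)
```

For instance, \(\{(1,1),(2,1),(1,2)\}\) is downward-closed, while \(\{(1,1),(3,1)\}\) is not, since it lacks \((2,1)\).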

The first step of our optimized construction consists of introducing the “work contribution,” \(\Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\), and the “error contribution”, \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\), of each operator, \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\). The work contribution quantifies the computational cost (measured, e.g., as a function of the total number of degrees of freedom, or in terms of computational time) required to add \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\) to \({\mathscr {M}}_{\mathcal {I}}[F]\), i.e.,

$$\begin{aligned} \Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} = {\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}} \cup \{[{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}]\}}\right] - {\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] = {\mathrm {Work}}\left[ {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\right] . \end{aligned}$$
(25)

Similarly, the error contribution measures how much the error, \(|{\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F]| \), would decrease if the operator \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\) were added to \({\mathscr {M}}_{\mathcal {I}}[F]\),

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} = \left| {\mathscr {M}}_{{\mathcal {I}} \cup \{[{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}]\}}[F] - {\mathscr {M}}_{{\mathcal {I}}}[F] \right| = \left| {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] \right| . \end{aligned}$$
(26)

We observe that the following decompositions of the total work and error of the MISC estimator hold:

$$\begin{aligned}&{\mathrm {Work}}\left[ {\mathscr {M}}_{\mathcal {I}}\right] = \sum _{[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \in {\mathcal {I}}} \Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}, \end{aligned}$$
(27)
$$\begin{aligned}&|{\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F]| = \left| {\sum _{[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \notin {\mathcal {I}}} {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]} \right| \le \sum _{[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \notin {\mathcal {I}}} \left| { {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] } \right| = \sum _{[{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \notin {\mathcal {I}}} \Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}. \end{aligned}$$
(28)

Although it would be tempting to define \({\mathcal {I}}\) as the set of couples \([{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}]\) with the largest error contribution, this choice could be far from optimal in terms of computational cost. As suggested in the literature on the knapsack problem (see [30]), the benefit-to-cost ratio should be taken into account in the decision (see also [3, 6, 20, 22, 33]). More precisely, we propose to build the MISC estimator by first assessing the so-called profit of each operator \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\), i.e., the quantity

$$\begin{aligned} P_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} = \frac{\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}}{\Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}}. \end{aligned}$$

Then, we build an index set for the MISC estimator:

$$\begin{aligned} {\mathcal {I}} = {\mathcal {I}}(\epsilon ) =\left\{ [{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}] \in {\mathbb {N}}_+^D \times {\mathfrak {L}}_+\,:\, P_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} \ge \epsilon \right\} , \end{aligned}$$
(29)

for a suitable \(\epsilon >0\). We observe that the obtained set is not necessarily downward-closed; we have to enforce this condition a posteriori. Obviously, \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\) and \(\Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\) are not, in general, at our disposal. In practice, we base the construction of the MISC estimator on a-priori bounds for such quantities. More precisely, we derive a-priori ansatzes for these bounds from theoretical considerations and then fit the constants appearing in the ansatzes with some auxiliary computations. We refer to the entire strategy as a priori/a posteriori.
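The construction of (29), followed by the a-posteriori enforcement of downward-closedness, can be sketched as follows; the profit values below are hypothetical placeholders for the fitted a-priori bounds:

```python
def profit_index_set(profits, eps):
    """Select indices with estimated profit >= eps, then close the set downward."""
    selected = {mu for mu, p in profits.items() if p >= eps}
    frontier = list(selected)
    while frontier:
        mu = frontier.pop()
        for i in range(len(mu)):
            if mu[i] > 1:
                nb = mu[:i] + (mu[i] - 1,) + mu[i + 1:]
                if nb not in selected:
                    selected.add(nb)
                    frontier.append(nb)
    return selected
```

Note that the closure step may add indices whose estimated profit is below the threshold, which is precisely the a-posteriori correction described above.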

Remark 1

We remark that the general form of the MISC estimator (24) is quite broad and includes other related methods (i.e., methods that combine different spatial and stochastic discretization levels to optimize the computational effort) available in the literature, such as the “Multi-Level Stochastic Collocation” method [37, 38] and the “Sparse Composite Collocation” method [4]; see also [25]. The main novelty of the MISC estimator (24)-(29) with respect to such methods is the profit-oriented selection of difference operators. Another difference from [37, 38] is the fact that difference operators in our approach are introduced in both the spatial and stochastic domains. See also [4] for a similar construction, in which no optimization is performed. More details on the comparison between the above-mentioned methods and MISC can be found in [22].

4 Error Analysis of the MISC Method

In this section, we state and prove a convergence theorem for the profit-based MISC estimator built on the multi-index set (29). The theorem relies on a result from the previous work [33], which was proved in the context of sparse-grid approximation of Hilbert-space-valued functions. Since the sparse grid and the MISC constructions are identical, this theorem can be used verbatim here. In particular, it links the summability of the profits to the convergence rate of methods such as MISC and Sparse Grids Stochastic Collocation, i.e., methods based on a sum of difference operators. To use this result, we have to assess the summability properties of the profits. We thus introduce suitable estimates of the error and work contributions, \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\) and \(\Delta W_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), respectively. In particular, the estimate of \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\) depends on the spatial regularity of the solution, on the convergence rate of the method used to solve the deterministic problems, and on the summability property of the Chebyshev expansion of the solution over the parameter space.

Theorem 4

(Convergence of the profit-based MISC estimator, see [33]) If the profits, \(P_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), satisfy the weighted summability condition

$$\begin{aligned} \left( \sum _{ [{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}] \in {\mathbb {N}}_+^D \times {\mathfrak {L}}_+} P_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}^{p} \Delta W_ {{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \right) ^{1/p} =C_P(p) < \infty \end{aligned}$$

for some \(0<p\le 1\), then

$$\begin{aligned} \big | {\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F] \big | \le C_P(p) {\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] ^{1-1/p}, \end{aligned}$$

where \({\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] \) is given by (27).

We begin by introducing an estimate for the size of the work contribution, \(\Delta W_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\). To this end, let \(\Delta W_{{{{\varvec{\alpha }}}}}^{{\mathrm{det}}}={\mathrm {Work}}\left[ {{\varvec{\Delta }}}^{{\mathrm{det}}}[F^{{{{\varvec{\alpha }}}}}]\right] \), i.e., let it be the cost of computing \({{\varvec{\Delta }}}^{{\mathrm{det}}}[F^{{{{\varvec{\alpha }}}}}]\) according to (21).

Assumption A3

(Spatial work contribution) There exist \(\gamma _i \in [1, \infty )\) for \(i=1,\ldots ,D\) and \(C_W > 0\) such that

$$\begin{aligned} \Delta W_{{{{\varvec{\alpha }}}}}^{{\mathrm{det}}} \le C_{W} 2^{ \sum _{i=1}^D \gamma _i d_i \alpha _i }, \end{aligned}$$
(30)

where \(2^{\sum _{i=1}^D d_i \alpha _i}\) is proportional to the number of degrees of freedom in the mesh on level \({{{\varvec{\alpha }}}}\), cf. equation (17), and the rates \(\gamma _i\) depend on the deterministic solver used and on the sparsity structure of the linear system, which might differ on each \({\mathscr {B}}_i\) depending on the chosen discretization.

Lemma 5

(Total work contribution) When using Clenshaw–Curtis points for the discretization over the parameter space, the work contribution, \(\Delta W_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), of each difference operator, \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\), can be bounded as

$$\begin{aligned} \Delta W_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \le C_{W} 2^{\sum _{i=1}^D \gamma _i d_i \alpha _i + |{{{\varvec{\beta }}}}-\mathbf{1}|}, \end{aligned}$$

with \(\gamma _i\) and \(C_W\) as in Assumption A3.

Proof

Combining (25) and (23), we have

$$\begin{aligned} \Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} = {\mathrm {Work}}\left[ {{\varvec{\Delta }}}^{\mathrm{stoc}}[ {{\varvec{\Delta }}}^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]] \right] = {\mathrm {Work}}\left[ {{\varvec{\Delta }}}^{\mathrm{stoc}}[ {{\varvec{\Delta }}}^{\mathrm{det}}[{\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[F^{{{\varvec{\alpha }}}}(\cdot )]]] \right] . \end{aligned}$$

Since the Clenshaw–Curtis points are nested, adding \([{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}]\) to the set \({\mathcal {I}}\) that defines the current MISC estimator amounts to evaluating \(F^{{{\varvec{\alpha }}}}({{\varvec{y}}})\) at the set of “new” points added to the estimator by \({{\varvec{\Delta }}}^{{\mathrm{stoc}}}[\cdot ]\), i.e., \(\times _{j : \beta _j > 1} \left\{ {\mathscr {P}}^{m(\beta _j)} \setminus {\mathscr {P}}^{m(\beta _j-1)} \right\} \), whose cardinality is \(\prod _{j \ge 1} ( m(\beta _j) - m(\beta _j-1))\). The proof is then concluded by observing that the definition of \(m(\beta )\) in (20) immediately gives \(m(\beta _j) - m(\beta _j-1) \le 2^{\beta _j-1}\) and recalling Assumption A3. \(\square \)
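The point-counting argument above can be checked directly. The following minimal sketch assumes the doubling rule \(m(0)=0\), \(m(1)=1\), \(m(\beta )=2^{\beta -1}+1\) from (20) (the function names are ours) and verifies both the per-direction bound \(m(\beta )-m(\beta -1)\le 2^{\beta -1}\) and the resulting bound \(2^{|{{{\varvec{\beta }}}}-\mathbf{1}|}\) on the number of new points:

```python
from itertools import product

def m(beta):
    # doubling rule of (20): m(0) = 0, m(1) = 1, m(beta) = 2^(beta-1) + 1
    return 0 if beta == 0 else (1 if beta == 1 else 2 ** (beta - 1) + 1)

# per-direction count of new nested Clenshaw-Curtis points
for beta in range(1, 12):
    assert m(beta) - m(beta - 1) <= 2 ** (beta - 1)

def new_points(beta_vec):
    """Cardinality of the tensor grid of new points activated by the surplus."""
    card = 1
    for b in beta_vec:
        card *= m(b) - m(b - 1)
    return card

# the total is bounded by 2^{|beta - 1|}, as claimed in Lemma 5
for beta_vec in product(range(1, 6), repeat=3):
    assert new_points(beta_vec) <= 2 ** sum(b - 1 for b in beta_vec)
```

Note that the bound is attained for \(\beta _j \in \{1,2\}\) and becomes loose by a factor of 2 per direction for \(\beta _j \ge 3\), consistent with Remark 2 below.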

Remark 2

We observe that the exponent \({{{\varvec{\beta }}}}-\mathbf{1}\) guarantees that the directions along which no quadrature is actually performed (i.e., \(\beta _j=1\) for any \(j\ge 1\)) do not contribute to the total work.

Next, we prove a sequence of lemmas that allow us to conclude that an analogous estimate holds for the error contribution as well, i.e., that \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\) can be bounded as a product of a term related to the spatial discretization and a term related to the approximation over the parameter space. To this end, we need to introduce the quantity

$$\begin{aligned} {\text {Leb}}_{m(\beta )} = \sup _{f \in C^0([-1,1]), \Vert f\Vert _{L^\infty (-1,1)} = 1} \left| Q^{m(\beta )}[f] - Q^{m(\beta -1)}[f] \right| \quad \forall \beta \in {\mathbb {N}}_+, \end{aligned}$$

where \(Q^{m(\beta )}[\cdot ]\) are the univariate quadrature operators introduced in (18), and observe that \({\text {Leb}}_{1}=1\). Next, let \({\mathbb {L}}= \max _{\beta \ge 1} {\text {Leb}}_{m(\beta )}\), and note that \( {\mathbb {L}}\le 2\) since \(Q^{m(\beta )}\) has positive weights. Moreover, a much smaller bound on \({\mathbb {L}}\) can be obtained for Clenshaw–Curtis points. Indeed, since Clenshaw–Curtis points are nested, we can also bound \({\text {Leb}}_{m(\beta )} \le \widetilde{{\text {Leb}}}_{m(\beta )}\) with

$$\begin{aligned} \widetilde{{\text {Leb}}}_{m(\beta )} = \sum _{y^j_\beta \in {\mathscr {P}}^{m(\beta )} \cap {\mathscr {P}}^{m(\beta -1)}} \left| \varpi _\beta ^j - \varpi _{\beta -1}^j \right| + \sum _{y^j_\beta \in {\mathscr {P}}^{m(\beta )} \setminus {\mathscr {P}}^{m(\beta -1)}} \left| \varpi _{\beta }^j \right| , \end{aligned}$$

and it can be verified numerically that \(\widetilde{{\text {Leb}}}_{m(\beta )}\) is bounded, attains its maximum for \(\beta =3\) and converges to 1 for \(\beta \rightarrow \infty \), see Fig. 2. Therefore, we have \({\mathbb {L}}\le \widetilde{{\text {Leb}}}_{m(3)} \approx 1.067\).

Fig. 2
Numerical evaluation of \(\widetilde{{\text {Leb}}}_{m(\beta )}\) for Clenshaw–Curtis points
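The numerical evaluation reported in Fig. 2 can be reproduced with a short script. Below is a minimal sketch under the following assumptions (ours, consistent with the construction above): the doubling rule \(m(0)=0\), \(m(1)=1\), \(m(\beta )=2^{\beta -1}+1\) for the Clenshaw–Curtis levels, and weights computed by the classical cosine-sum formula, normalized so that they sum to one (quadrature with respect to the uniform probability measure on \([-1,1]\)):

```python
import numpy as np

def m(beta):
    # doubling rule of (20): m(0) = 0, m(1) = 1, m(beta) = 2^(beta-1) + 1
    return 0 if beta == 0 else (1 if beta == 1 else 2 ** (beta - 1) + 1)

def cc_points_weights(n_pts):
    """Clenshaw-Curtis points/weights on [-1, 1], normalized so that the
    weights sum to 1 (quadrature w.r.t. the uniform probability measure)."""
    if n_pts == 0:
        return np.array([]), np.array([])
    if n_pts == 1:
        return np.array([0.0]), np.array([1.0])
    n = n_pts - 1
    x = np.cos(np.pi * np.arange(n_pts) / n)
    w = np.empty(n_pts)
    for i in range(n_pts):
        s = sum((1.0 if 2 * j == n else 2.0) / (4 * j * j - 1)
                * np.cos(2.0 * np.pi * j * i / n) for j in range(1, n // 2 + 1))
        w[i] = (1.0 if i in (0, n) else 2.0) / n * (1.0 - s)
    return x, w / 2.0

def leb_tilde(beta):
    """Shared points contribute |w_fine - w_coarse|; new points contribute |w_fine|."""
    xf, wf = cc_points_weights(m(beta))
    xc, wc = cc_points_weights(m(beta - 1))
    total = 0.0
    for xj, wj in zip(xf, wf):
        shared = np.isclose(xc, xj, atol=1e-12)
        total += abs(wj - wc[shared][0]) if shared.any() else abs(wj)
    return total

vals = {beta: leb_tilde(beta) for beta in range(1, 9)}
assert max(vals, key=vals.get) == 3          # maximum attained at beta = 3
assert abs(vals[3] - 16.0 / 15.0) < 1e-12    # ~ 1.067, as in Fig. 2
```

In this normalization the maximal value is exactly \(16/15 \approx 1.067\), attained at \(\beta = 3\).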

Lemma 6

(Stochastic error contribution) Let \(f:\Gamma \rightarrow {\mathbb {R}}\) and \({{{\varvec{\beta }}}}\in {\mathfrak {L}}_+\), and assume that the quadrature operator, \({\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\), is built with Clenshaw–Curtis abscissae. If there exists a sequence, \({{\varvec{\rho }}}=\{\rho _j\}_{j \ge 1}\), \(\rho _j > 1\) for all j, such that

  1. \(\sum _{j \ge 1} \frac{1}{\rho _j} < \infty \);

  2. there exists \(0< C_f < \infty \) such that, for any finite set \({\mathcal {G}}' \subset {\mathbb {N}}_+\), the restriction of f to \(([-1,1])^{{\mathcal {G}}'}\) admits a holomorphic complex extension to a Bernstein polyellipse, \({\mathcal {E}}_{{{\varvec{\rho }}}}^{{\mathcal {G}}'}\), with \(\sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{{\mathcal {G}}'}} |f({{\varvec{z}}})| \le C_f\).

Then, the set

$$\begin{aligned} {\mathcal {J}} = \left\{ j \ge 1 : \rho _j \le 2 ({\mathbb {L}})^{\frac{1}{3}}\right\} , \end{aligned}$$

has finite cardinality, i.e., \(\#{\mathcal {J}} < \infty \) and

$$\begin{aligned} \left| {{\varvec{\Delta }}}^{{\mathrm{stoc}}}[{\mathscr {Q}}^{m({{{\varvec{\beta }}}})}f] \right|\le & {} C_{{\text {SE}}}({{{\varvec{\rho }}}}) \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| e^{-\sum _{j \ge 1} g(\rho _j) m(\beta _j-1)} \\\le & {} C_{{\text {SE}}}({{{\varvec{\rho }}}}) C_f e^{-\sum _{j \ge 1} g(\rho _j) m(\beta _j-1)}, \end{aligned}$$

holds, where \({\mathcal {G}}\) is the support of \({{{\varvec{\beta }}}}-\mathbf{1}\),

$$\begin{aligned} 0 < g(\rho _j) = {\left\{ \begin{array}{ll} \log \rho _j, &{} {\text {for }} j \in {\mathcal {J}}, \\ \log \rho _j - \log 2 - \frac{1}{3} \log \left( {\mathbb {L}}\right) , &{} {\text {otherwise}}, \end{array}\right. } \end{aligned}$$

and \(C_{{\text {SE}}}({{{\varvec{\rho }}}}) < \infty \) is independent of \({{{\varvec{\beta }}}}\).

Proof

Let \({\mathcal {G}}\) be the support of \({{{\varvec{\beta }}}}-\mathbf{1}\) with cardinality \(\#{\mathcal {G}}< \infty \), \({{\varvec{k}}}\in {\mathbb {N}}_+^{\mathcal {G}}\), and let \(\Phi _{{\mathcal {G}},{{\varvec{k}}}}\) denote the Chebyshev polynomials of the first kind with degree \(k_j\) along \(y_{j}\) for \(j \ge 1\). We observe that \(\Phi _{{\mathcal {G}},{{\varvec{k}}}}\) are equivalent to the \(\#{\mathcal {G}}\)-variate Chebyshev polynomials over \([-1,1]^{\#{\mathcal {G}}}\) thanks to the product structure of the multivariate Chebyshev polynomials and to the fact that \(\phi _0(y)=1\). Next, consider the holomorphic extension of \(f:{\mathbb {C}}^{\mathcal {G}}\rightarrow {\mathbb {C}}\), and its Chebyshev expansion over \(\Phi _{{\mathcal {G}},{{\varvec{k}}}}\) introduced in Lemma 3. Then

$$\begin{aligned} |{{\varvec{\Delta }}}^{{\mathrm{stoc}}}[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} f ]|&= \left| {{\varvec{\Delta }}}^{{\mathrm{stoc}}} \left[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} \left[ \sum _{{{\varvec{k}}}\in {\mathbb {N}}_+^{\mathcal {G}}} f_{{\varvec{k}}}\Phi _{{\mathcal {G}},{{\varvec{k}}}} \right] \right] \right| \\&= \left| \sum _{{{\varvec{k}}}\in {\mathbb {N}}_+^{\mathcal {G}}} f_{{\varvec{k}}}{{\varvec{\Delta }}}^{{\mathrm{stoc}}} \left[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\big [\Phi _{{\mathcal {G}},{{\varvec{k}}}} \big ] \right] \right| \end{aligned}$$

holds. By construction of the hierarchical surplus, we have \({{\varvec{\Delta }}}^{\mathrm{stoc}}[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[\Phi _{{\mathcal {G}},{{\varvec{k}}}}]] = 0\) for all Chebyshev polynomials, \(\Phi _{{\mathcal {G}},{{\varvec{k}}}}\), such that \(\exists \,j \in {\mathcal {G}}: k_j < m(\beta _j -1)\) (i.e., for polynomials that are integrated exactly in at least one direction by both quadrature operators along that direction). Therefore, the previous sum reduces to the multi-indices \({{\varvec{k}}}\ge m({{{\varvec{\beta }}}}- \mathbf{1})\) (componentwise). Furthermore, by the triangle inequality, we have

$$\begin{aligned} |{{\varvec{\Delta }}}^{{\mathrm{stoc}}}[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} f ]| \le \sum _{{{\varvec{k}}}\ge m({{{\varvec{\beta }}}}-\mathbf{1})} | f_{{\varvec{k}}}| \left| {{\varvec{\Delta }}}^{{\mathrm{stoc}}} \Big [{\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\big [\Phi _{{\mathcal {G}},{{\varvec{k}}}} \big ] \Big ] \right| . \end{aligned}$$

Next, using the definitions of \({{\varvec{\Delta }}}^{{\mathrm{stoc}}}\) and \({\text {Leb}}_{m(\beta )}\) and recalling that Chebyshev polynomials of the first kind on \([-1,1]\) are bounded by 1, that \({\text {Leb}}_{m(\beta )} \le \widetilde{{\text {Leb}}}_{m(\beta )} \le 1\) for \(\beta =1,2\) and \({\text {Leb}}_{m(\beta )}\le {\mathbb {L}}\) for \(\beta \ge 3\), we have

$$\begin{aligned} \Big | {{\varvec{\Delta }}}^{{\mathrm{stoc}}} \Big [ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\big [ \Phi _{{\mathcal {G}},{{\varvec{k}}}}\big ] \Big ] \Big |&= \left| \bigotimes _{j \in {\mathcal {G}}} \Delta ^{{\mathrm{stoc}}}_j[ Q^{m(\beta _j)}[\phi _{k_j}]] \right| \nonumber \\&\le \prod _{j \in {\mathcal {G}}} \widetilde{{\text {Leb}}}_{m(\beta _j)} \left\| \phi _{k_j} \right\| _{L^\infty ([-1,1])} \le \prod _{\beta _j \ge 3} {\mathbb {L}}. \end{aligned}$$
(31)

We then bound \(| f_{{\varvec{k}}}|\) by Lemma 3 to obtain

$$\begin{aligned} |{{\varvec{\Delta }}}^{{\mathrm{stoc}}}[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} f ]|&\le \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \Bigg ( \prod _{\beta _j \ge 3} {\mathbb {L}}\Bigg ) \sum _{{{\varvec{k}}}\ge m({{{\varvec{\beta }}}}- \mathbf{1})} \prod _{j \in {\mathcal {G}}} 2^{|k_j|_0} \rho _j^{-k_j} \\&\le \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \Bigg ( \prod _{\beta _j \ge 3} {\mathbb {L}}\Bigg ) \left( \prod _{j \in {\mathcal {G}}} \sum _{k_j \ge m(\beta _j - 1)} 2^{|k_j|_0} \rho _j^{-k_j} \right) \\&= \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \Bigg ( \prod _{\begin{array}{c} j \in {\mathcal {J}} \\ \beta _j \ge 3 \end{array}} {\mathbb {L}}\Bigg ) \Bigg ( \prod _{\begin{array}{c} j \notin {\mathcal {J}} \\ \beta _j \ge 3 \end{array}} {\mathbb {L}}\Bigg )\\&\quad \times \left( \!\prod _{j \in {\mathcal {G}}\cap {\mathcal {J}} } \sum _{k_j \ge m(\beta _j - 1)} 2^{|k_j|_0} \rho _j^{-k_j} \!\right) \left( \!\prod _{j \in {\mathcal {G}}\setminus {\mathcal {J}}} \sum _{k_j \ge m(\beta _j - 1)} 2^{|k_j|_0} \rho _j^{-k_j} \!\right) \!. \end{aligned}$$

Next, we observe that \(|k_j|_0 \le \min \{1, k_j\}\) for \(k_j \ge 0\) and \(1 \le \frac{1}{3}m(\beta _j - 1)\) for all \(\beta _j \ge 3\). Then,

$$\begin{aligned} |{{\varvec{\Delta }}}^{{\mathrm{stoc}}}[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})} f ]|&\le \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| {\mathbb {L}}^{\# {\mathcal {J}}} \Bigg ( \prod _{j \notin {\mathcal {J}}} {\mathbb {L}}^{\frac{1}{3}m(\beta _j-1)} \Bigg )\\&\qquad 2^{\# {\mathcal {J}}} \left( \prod _{j \in {\mathcal {G}}\cap {\mathcal {J}} } \sum _{k_j \ge m(\beta _j - 1)} \rho _j^{-k_j} \right) \left( \prod _{j \in {\mathcal {G}}\setminus {\mathcal {J}}} \sum _{k_j \ge m(\beta _j - 1)} 2^{k_j} \rho _j^{-k_j} \right) \\&\le \left( 2 {\mathbb {L}}\right) ^{\# {\mathcal {J}}} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \left( \prod _{j \in {\mathcal {G}}} \sum _{k_j \ge m(\beta _j - 1)} e^{-g(\rho _j) k_j} \right) \\&= \left( 2 {\mathbb {L}}\right) ^{\# {\mathcal {J}}} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \left( \prod _{j \in {\mathcal {G}}} \frac{1}{1-e^{- g(\rho _j)}}\right) \left( \prod _{j \in {\mathcal {G}}} e^{-g(\rho _j) m(\beta _j-1)}\right) \\&\le C_{{\text {SE}}}({{{\varvec{\rho }}}})\sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\rho }}}}^{\mathcal {G}}} |f({{\varvec{z}}})| \prod _{j \ge 1} e^{-g(\rho _j) m(\beta _j-1)}, \end{aligned}$$

where the last inequality is due to the fact that \(m(\beta _j - 1) = 0\) whenever \(j \notin {\mathcal {G}}\), or equivalently \(\beta _j=1\), and

$$\begin{aligned} \prod _{j \in {\mathcal {G}}} \frac{1}{1-e^{- g(\rho _j)}} \le \prod _{j \ge 1} \frac{1}{1-e^{- g(\rho _j)}}, \end{aligned}$$

since \(g(\rho _j) > 0\) for all \(j\ge 1\). Note that \(C_{{\text {SE}}}({{{\varvec{\rho }}}})\) is independent of \({{{\varvec{\beta }}}}\) and is bounded since

$$\begin{aligned} \begin{aligned} \prod _{j \ge 1} \frac{1}{1-e^{-{g}(\rho _j)}}< \infty&\Longleftrightarrow -\sum _{j \ge 1} \log (1-e^{-{g}(\rho _j)})< \infty \Longleftrightarrow \sum _{j \ge 1} e^{-{g}(\rho _j)}< \infty \\&\Longleftrightarrow \sum _{j \in {\mathcal {J}}} \frac{1}{\rho _j} + \sum _{j \notin {\mathcal {J}}} \frac{2\left( {\mathbb {L}}\right) ^{\frac{1}{3}}}{\rho _j} < \infty , \end{aligned} \end{aligned}$$

which was assumed. Moreover, \(\# {\mathcal {J}} < \infty \): if \({\mathcal {J}}\) were infinite, then, since \(\rho _j^{-1} \ge \left( 2 {\mathbb {L}}^{\frac{1}{3}}\right) ^{-1} > 0\) for every \(j \in {\mathcal {J}}\), the sum \(\sum _{j \in {\mathcal {J}}} \rho _j^{-1}\) would be unbounded, contradicting the first assumption of the lemma, namely \(\sum _{j \ge 1} \rho _j^{-1} < \infty \). \(\square \)
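The two mechanisms used in this proof, the vanishing of the surplus on low-degree Chebyshev polynomials and the geometric decay of the surplus for analytic integrands, can be checked numerically in one dimension. The sketch below uses the doubling rule of (20) and Clenshaw–Curtis weights normalized to the uniform probability measure on \([-1,1]\); the test integrand \(1/(2-y)\) and the tolerances are choices made here for illustration:

```python
import numpy as np

def m(beta):
    # doubling rule of (20)
    return 0 if beta == 0 else (1 if beta == 1 else 2 ** (beta - 1) + 1)

def cc_rule(n_pts):
    # Clenshaw-Curtis points/weights on [-1, 1], weights summing to 1
    if n_pts == 1:
        return np.array([0.0]), np.array([1.0])
    n = n_pts - 1
    x = np.cos(np.pi * np.arange(n_pts) / n)
    w = np.empty(n_pts)
    for i in range(n_pts):
        s = sum((1.0 if 2 * j == n else 2.0) / (4 * j * j - 1)
                * np.cos(2.0 * np.pi * j * i / n) for j in range(1, n // 2 + 1))
        w[i] = (1.0 if i in (0, n) else 2.0) / n * (1.0 - s)
    return x, w / 2.0

def Q(beta, f):
    x, w = cc_rule(m(beta))
    return float(np.dot(w, f(x)))

def surplus(beta, f):
    # univariate surplus: Q^{m(beta)}[f] - Q^{m(beta - 1)}[f]
    return Q(beta, f) - Q(beta - 1, f) if beta > 1 else Q(1, f)

# (i) the surplus annihilates Chebyshev polynomials of degree k < m(beta - 1)
for beta in range(2, 6):
    for k in range(m(beta - 1)):
        assert abs(surplus(beta, np.polynomial.chebyshev.Chebyshev.basis(k))) < 1e-13

# (ii) decay of the surplus for an analytic integrand: f(y) = 1/(2 - y)
# has a pole at y = 2, hence a Bernstein ellipse with rho = 2 + sqrt(3)
f = lambda y: 1.0 / (2.0 - y)
d = [abs(surplus(beta, f)) for beta in range(2, 6)]
assert all(d[i + 1] < d[i] for i in range(len(d) - 1))
```

The observed decay of the surpluses is consistent with the factor \(e^{-g(\rho ) m(\beta -1)}\) in the lemma, and it accelerates with \(\beta \) because \(m(\beta )\) doubles at each level.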

Remark 3

Sharper estimates could be obtained by exploiting the structure of the Chebyshev polynomials when bounding \(\left| {\Delta ^{{\mathrm {stoc}}}[Q^{m(\beta _j)}[\phi _{k_j}]]}\right| \) in (31) (for instance, the fact that \(Q^{m(\beta _j)}[\phi _{k_j}]=0\) whenever \(k_j\) is odd and larger than 1) rather than using the general bound \(\left| \Delta ^{{\mathrm {stoc}}}[ Q^{m(\beta _j)}[\phi _{k_j}]] \right| \le \widetilde{{\text {Leb}}}_{m(\beta _j)} \left\| \phi _{k_j} \right\| _{L^\infty ([-1,1])} \).

We are now almost in a position to prove the estimate on the error contribution (see Lemma 8). Before doing so, we need another auxiliary lemma, which gives conditions for the summability of certain sequences appearing both in the proof of Lemma 8 and in the proof of the main convergence theorem for MISC.

Lemma 7

(Summability of stochastic rates) Recall the definitions of \(\{\zeta _{0,j}\}_{j \ge 1}\) in Lemma 1, of \(\{\zeta _{s,j}\}_{j \ge 1}\) in Lemma 2 and of \(g(\cdot )\) in Lemma 6. Under Assumption A2, for all \(s=0, 1, \ldots , s_{\max }\), the sequences \(\{e^{-g(\zeta _{s,j})}\}_{j \ge 1}\) and \(\left\{ \frac{1}{\zeta _{s,j}}\right\} _{j \ge 1}\) are \(\ell ^{p}\)-summable for \(p \ge \tilde{p}_s = \frac{p_s}{1-p_s}\), with \(\tilde{p}_s < 1\).

Proof

First note that, by definition of \(g(\cdot )\), we have

$$\begin{aligned} \sum _{j \ge 1} e^{-p g(\zeta _{s,j})} \le 2^{p} \left( {\mathbb {L}}\right) ^{p/3} \sum _{j \ge 1} \zeta _{s,j}^{-p}. \end{aligned}$$

Then, from (14)–(15), or (12)–(13) for \(s=0\), we can bound \( 2\tau _{s,j} \le \zeta _{s,j}\) and obtain

$$\begin{aligned} \sum _{j \ge 1} \zeta _{s,j}^{-p} \le 2^{-p} \sum _{j \ge 1} \tau _{s,j}^{-p} = 2^{-p}\left( \frac{\pi }{E_\delta \left\| {{\varvec{b}}}_s \right\| _{\ell ^{p_s}} ^{p_s}} \right) ^{-p} \sum _{j \ge 1} b_{s,j}^{(1-p_s)p}. \end{aligned}$$

From Assumption A2, we know that \({{\varvec{b}}}_s \in \ell ^{p_s}\) with \(p_s < \frac{1}{2}\), and therefore we have the condition

$$\begin{aligned} (1-p_s)p \ge p_s \quad \Longrightarrow \quad p \ge \frac{p_s}{1-p_s} = \tilde{p}_s, \qquad \text {with } \tilde{p}_s < 1. \end{aligned}$$

\(\square \)

Lemma 8

(Total error contribution) Assume that the deterministic problem is solved with a method obtained by tensorizing piecewise multi-linear finite element spaces on each mesh, \(\{{\mathbb {T}}_i\}_{i=1}^D\), discretizing each \({\mathscr {B}}_i\), and let \(h_i\), as in (17), be the mesh size of \({\mathbb {T}}_i\). Then, under Assumptions A1 and A2, the error contribution, \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), of each difference operator, \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}],\) can be bounded as

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \le \min _{s=0,1,\ldots ,s_{\max }} C_s \Delta E_{{{{\varvec{\alpha }}}}}^{{\mathrm{det}}}(s)\Delta E_{{{{\varvec{\beta }}}}}^{{\mathrm{stoc}}}(s), \end{aligned}$$
(32)

for a constant \(C_s < \infty \) independent of \({{{\varvec{\alpha }}}}\) and \({{{\varvec{\beta }}}}\) and

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}}}^{{\mathrm{det}}}(s)&= 2^{-{{{\varvec{\alpha }}}}\cdot {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}})} , \end{aligned}$$
(33)
$$\begin{aligned} \Delta E_{{{{\varvec{\beta }}}}}^{{\mathrm{stoc}}}(s)&= e^{-\sum _{j\ge 1} m(\beta _j-1)g_{s,j}} , \end{aligned}$$
(34)

with \(g_{s,j} = g(\zeta _{s,j})\) as in Lemma 6 and \(r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i=\min \left\{ 1,q_is\right\} \) for \(i=1,\ldots ,D\) with \({{\varvec{q}}}\in {\mathbb {R}}_+^D\) s.t. \(|{{\varvec{q}}}|=1\).

Proof

Combining the definition of \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\), cf. (23), and the definition of \(\Delta ^{{\mathrm{det}}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\), cf. (21), we have

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} = | {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] |&=\left| {{\varvec{\Delta }}}^{\mathrm{stoc}}[ {{\varvec{\Delta }}}^{\mathrm{det}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]]\right| = \left| {{\varvec{\Delta }}}^{\mathrm{stoc}}\left[ \sum _{{{\varvec{j}}}\in \{0,1\}^D} (-1)^{|{{\varvec{j}}}|} F_{{{{\varvec{\alpha }}}}-{{\varvec{j}}},{{{\varvec{\beta }}}}} \right] \right| \\&= \left| {{\varvec{\Delta }}}^{\mathrm{stoc}}\left[ \sum _{{{\varvec{j}}}\in \{0,1\}^D} (-1)^{|{{\varvec{j}}}|} {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[\Theta [u^{{{{\varvec{\alpha }}}}-{{\varvec{j}}}}(\cdot ,{{\varvec{y}}})]]\right] \right| \\&= \left| {{\varvec{\Delta }}}^{\mathrm{stoc}}\left[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}\left[ \Theta \left[ \sum _{{{\varvec{j}}}\in \{0,1\}^D} (-1)^{|{{\varvec{j}}}|} u^{{{{\varvec{\alpha }}}}-{{\varvec{j}}}}(\cdot ,{{\varvec{y}}})\right] \right] \right] \right| \\&= \left| {{\varvec{\Delta }}}^{\mathrm{stoc}}\left[ {\mathscr {Q}}^{m({{{\varvec{\beta }}}})}[\Theta [ {{\varvec{\Delta }}}^{\mathrm{det}}[ u^{{{{\varvec{\alpha }}}}}(\cdot ,{{\varvec{y}}})]]]\right] \right| . \end{aligned}$$

We observe that \(f({{\varvec{y}}}) = \Theta [ {{\varvec{\Delta }}}^{\mathrm{det}}[u^{{{{\varvec{\alpha }}}}}(\cdot ,{{\varvec{y}}})]]\) is a linear combination of some \(u^{{{\varvec{\alpha }}}}\) and that each of these \(u^{{{\varvec{\alpha }}}}\) is an \(H^1_0({\mathscr {B}},{\mathbb {C}})\)-holomorphic function, since the finite element approximations of u are holomorphic in the same complex region as u itself; hence, \(f({{\varvec{y}}})\) is also holomorphic in the same region. Then, thanks to the summability properties in Lemma 7, we can apply Lemma 6 to \(f(\cdot )\) in the polyellipses defined in Lemmas 1 and 2 by \({{\varvec{\zeta }}}_s\) in (14) (or (12) for \(s=0\)) and obtain

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}&\le C_{{\mathrm {SE}}}({{\varvec{\zeta }}}_s) \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}} |\Theta [{{\varvec{\Delta }}}^{{\mathrm{det}}} [u^{{{{\varvec{\alpha }}}}}({{\varvec{x}}},{{\varvec{z}}})]]| e^{- \sum _{j\ge 1} g(\zeta _{s,j}) m(\beta _j-1)} \\&\le C_{{\mathrm {SE}}}({{\varvec{\zeta }}}_s) \left\| \Theta \right\| _{H^{-1}({\mathscr {B}})} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}} \left\| {{\varvec{\Delta }}}^{{\mathrm{det}}} [u^{{{{\varvec{\alpha }}}}}(\cdot ,{{\varvec{z}}})] \right\| _{H^1({\mathscr {B}},{\mathbb {C}})} e^{- \sum _{j \ge 1} g(\zeta _{s,j}) m(\beta _j-1)}, \end{aligned}$$

where \({\mathcal {G}}\subset {\mathbb {N}}_+\) denotes the support of \({{{\varvec{\beta }}}}-\mathbf{1}\). Next, recalling that the spatial discretization consists of piecewise multi-linear finite elements with spatial mesh sizes (17) and combining the a priori bounds on the decay of the difference operators coming from the Combination Technique theory (see, e.g., [19, proof of Theorem 2]) with (4) and the fact that \(u \in H^{1+s}({\mathscr {B}})\) for any \(s=0, 1,\ldots ,s_{\max }\), we have the following bound for every \({{\varvec{z}}}\) in the Bernstein polyellipse, \({\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}\):

$$\begin{aligned} \left\| {{\varvec{\Delta }}}^{{\mathrm{det}}} [u^{{{{\varvec{\alpha }}}}}(\cdot ,{{\varvec{z}}})] \right\| _{H^1({\mathscr {B}},{\mathbb {C}})}&\le C_{CT} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{{\mathcal {H}}^{\mathbf{1}+s{{\varvec{q}}}}({\mathscr {B}},{\mathbb {C}})} 2^{-\sum _{i=1}^D \alpha _i \min \{1,q_i s\}} \end{aligned}$$
(35)
$$\begin{aligned}&\le C_{CT} C_{s,{{\varvec{q}}}}\left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} 2^{-{{{\varvec{\alpha }}}}\cdot {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}})}, \end{aligned}$$
(36)

for \({{\varvec{q}}}\in {\mathbb {R}}_+^D\) s.t. \(|{{\varvec{q}}}|=1\), some \(C_{CT}>0\) independent of u, and where \(C_{s,{{\varvec{q}}}}\) is the embedding constant between \({\mathcal {H}}^{\mathbf{1}+s{{\varvec{q}}}}({\mathscr {B}},{\mathbb {C}})\) and \(H^{1+s}({\mathscr {B}},{\mathbb {C}})\). We then have the following bound:

$$\begin{aligned}&\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \le C_{{\mathrm {SE}}}({{\varvec{\zeta }}}_s) \left\| \Theta \right\| _{H^{-1}({\mathscr {B}})} \\&\quad \times C_{CT}C_{s,{{\varvec{q}}}} \sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} e^{- \sum _j g_{s,j} m(\beta _j-1)} 2^{-{{{\varvec{\alpha }}}}\cdot {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}})}, \end{aligned}$$

where the constant \(C_{{\mathrm {SE}}}({{\varvec{\zeta }}}_s)\) is bounded independently of \({{{\varvec{\beta }}}}\), thanks again to Lemma 6. The proof is then concluded by recalling that \(\sup _{{{\varvec{z}}}\in {\mathcal {E}}_{{{\varvec{\zeta }}}_s}^{\mathcal {G}}} \left\| u(\cdot ,{{\varvec{z}}}) \right\| _{H^{1+s}({\mathscr {B}},{\mathbb {C}})} \le C_{s,u}\) independently of \({{{\varvec{\beta }}}}\) and \({\mathcal {G}}\), due to Lemmas 1 and 2. \(\square \)

Observe that Lemma 8 gives a bound on \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\) that is parametrized by the vector \({{\varvec{q}}}\). The optimal choice of \({{\varvec{q}}}\) will be discussed later, in the proof of the main theorem, Theorem 10.

Remark 4

(Relaxing the simplifying assumption) We now clarify why the assumption \(p_s < \frac{1}{2}\), for \(s=0,1,\ldots ,s_{\max }\), is not essential. Due to a suboptimal choice of \(\zeta _{s,j}\) in Lemma 2 (and Lemma 1 for \(s=0\)), the sequence \(\{e^{-g_{s,j} }\}_{j \ge 1}\) in Lemma 7, which is related to the solution u and which appears in the proof of the MISC convergence theorem, has worse summability \(\tilde{p}_s=\frac{p_s}{1-p_s}\) than the sequence \(\{b_{s,j} \}_{j \ge 1}\), whose summability coefficient is \(p_s\), and which is related to the diffusion coefficient, a. As we see in the main theorem stated below, we need \(\tilde{p}_s<1\) to guarantee convergence of MISC, which implies \(p_s < \frac{1}{2}\). By choosing the polyellipses in Lemmas 1 and 2 by the more elaborate strategy presented in [11], it would be possible to obtain the better summability \(\tilde{p}_s=p_s\) for the sequence \(\{e^{-g_{s,j} }\}_{j \ge 1}\), which would only imply the less stringent condition \(p_s<1\) and a better estimate for the MISC convergence rate. However, for ease of exposition, we maintain the sub-optimal choice, which is enough for the purpose of presenting the argument that proves convergence of MISC. The restriction \(p_s<\frac{1}{2}\) formally prevents us from applying the MISC convergence analysis to diffusion coefficients with low spatial regularity. In practice, we see in Sect. 6 that the convergence estimates are numerically verified beyond this restriction.

Before proving the main theorem of this section, we finally need the following technical lemma.

Lemma 9

(Bounding a sum of double exponentials) For \(a>0\), \(b \ge 2\) and \(0\le c < a b\),

$$\begin{aligned} \sum _{k=1}^\infty e^{-a b^k+c k} \le e^{-ab + \varepsilon (a,b,c)} \end{aligned}$$

holds, where for each fixed c and b, \(\varepsilon (\cdot ,b, c)\) is a monotonically decreasing, strictly positive function with \(\varepsilon (a,b,c) \rightarrow c \text { as } a \rightarrow +\infty \).

Proof

$$\begin{aligned} \sum _{k\ge 1} e^{-a b^k+c k}&= e^{-ab+c} + \sum _{k\ge 2} e^{-a b^k+c k} = e^{-ab+c} + \sum _{k\ge 1} e^{-a b^{k+1}+c(k+1)} \\&= e^{-ab} \left( e^{c}+ e^{c}\sum _{k\ge 1} e^{-a b (b^{k}-1) + ck} \right) . \end{aligned}$$

We observe that for \(b\ge 2\) we have \(b^k-1 \ge k\) for \(k \ge 1\) integer. Therefore, \(e^{-a b (b^{k}-1)} \le e^{-abk}\) and we have

$$\begin{aligned} \sum _{k\ge 1} e^{-a b^k+c k} \le e^{-ab} \left( e^{c}+ e^{c} \sum _{k\ge 1} e^{k(c -a b)} \right) = e^{-ab} \left( e^{c}+ \frac{e^{2c-ab}}{1-e^{c-ab}} \right) . \end{aligned}$$

Then,

$$\begin{aligned} \varepsilon (a,b,c) = \log \left( e^{c}+ \frac{e^{2c-ab}}{1-e^{c-ab}}\right) , \end{aligned}$$

and we finish by verifying that the function, \(\varepsilon \), has the required properties. \(\square \)
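The bound of Lemma 9 is elementary to test numerically. The following sketch evaluates the truncated sum against \(e^{-ab+\varepsilon (a,b,c)}\) for a few admissible triples \((a,b,c)\) (chosen here for illustration, not taken from the paper) and checks the claimed behavior of \(\varepsilon \):

```python
import math

def eps(a, b, c):
    # epsilon(a, b, c) as derived at the end of the proof
    return math.log(math.exp(c) + math.exp(2.0 * c - a * b) / (1.0 - math.exp(c - a * b)))

def lhs(a, b, c, K=60):
    # truncated double-exponential sum; the terms decay super-geometrically,
    # so K = 60 is far beyond machine precision
    return sum(math.exp(-a * b ** k + c * k) for k in range(1, K + 1))

for a, b, c in [(0.5, 2.0, 0.3), (1.0, 2.0, 1.5), (2.0, 3.0, 0.1)]:
    assert 0.0 <= c < a * b
    assert lhs(a, b, c) <= math.exp(-a * b + eps(a, b, c))

# epsilon(., b, c) decreases in a and tends to c as a grows
assert eps(1.0, 2.0, 0.3) > eps(2.0, 2.0, 0.3) > eps(10.0, 2.0, 0.3)
assert abs(eps(30.0, 2.0, 0.3) - 0.3) < 1e-9
```

Note that the first term of the sum alone equals \(e^{-ab+c}\), so the bound is tight up to the correction \(\varepsilon (a,b,c)-c\), which vanishes as \(a \rightarrow \infty \).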

We are now ready to state and prove our main result.

Theorem 10

(MISC convergence theorem) Under Assumptions A1–A3, the profit-based MISC estimator, \({\mathscr {M}}_{\mathcal {I}}\), built using the set \({\mathcal {I}}\) defined in (29), Stochastic Collocation with Clenshaw–Curtis points as in (19)–(20), and a spatial discretization obtained by tensorizing piecewise multi-linear finite element spaces on each mesh, \(\{{\mathbb {T}}_i\}_{i=1}^D\), with mesh sizes \(h_i\) as in (17), for solving the deterministic problems, satisfies, for any \(\delta > 0\),

$$\begin{aligned} \big | {\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F] \big | \le C_{\delta }{\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] ^{-r_{{\mathrm {MISC}}} + \delta }, \end{aligned}$$

for some constant \(C_\delta >0\) that is independent of \({\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] \) and tends to infinity as \(\delta \rightarrow 0\). Moreover, \({\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] \) is given by (27), and

$$\begin{aligned} r_{{\mathrm {MISC}}} = \max _{s=0,\ldots ,s_{\max }} {\left\{ \begin{array}{ll} r_{{\mathrm {det}}}(s), &{}{\text {if }} r_{{\mathrm {det}}}(s) \le \frac{1}{p_s} - 2,\\ \left( \frac{1}{p_0}-2 \right) \left( 1 + \frac{1}{r_{{\mathrm {det}}}(s)} \left( \frac{1}{p_0} - \frac{1}{p_s} \right) \right) ^{-1}, &{} {\text {otherwise,}} \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} r_{{\mathrm {det}}}(s) = \min \left\{ \frac{1}{\max _{i=1,\ldots ,D} \gamma _i d_i}, \frac{s}{\sum _{j=1}^D \gamma _j d_j} \right\} . \end{aligned}$$

Proof

In this proof, we use Theorem 4; therefore, we need to establish the p-summability of the weighted profits for some \(p<1\). To this end, we use Lemma 8. Observe that Lemma 8 provides a family of bounds for each \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), one for each value of s; ideally, we would choose the best s for each \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\). However, this optimization problem is too complex, and we simplify it by assuming that

  • only two values of s will be considered, \(s=0\) and \(s=s^*\) (which will not necessarily coincide with \(s_{\max }\));

  • the optimization between \(s=0\) and \(s=s^*\) will not be carried out individually on each \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), but we will rather take a “convex combination” of the two corresponding estimates and choose the best outcome only at the end of the proof.

To this end, we first rewrite the result of Lemma 8 in a form more suitable for a fixed \(s^* \in \{1,2,\ldots ,s_{\max }\}\); note that, from here on, with a slight abuse of notation, we drop the superscript \(^*\) and simply use s to denote the fixed value. Thus, for such fixed s, consider the statement of Lemma 8, let \(C_E = \max \{C_{0}, C_{s}\}\), \(\chi _{j,s} = g_{s,j} \log _2 e\), and \(\theta _{j,s} = (g_{0,j} - g_{s,j}) \log _2 e\), and combine (32)–(34), obtaining

$$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}&\le \min _{t=\{0,s\}} C_t \Delta E_{{{{\varvec{\alpha }}}}}^{{\mathrm{det}}}(t)\Delta E_{{{{\varvec{\beta }}}}}^{{\mathrm{stoc}}}(t)\\&\le C_E \min _{\eta \in \{0,1\}} 2^{-\eta {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}}) \cdot {{{\varvec{\alpha }}}}} \prod _{j \ge 1} e^{-m(\beta _j-1) [g_{s,j} + (1-\eta )(g_{0,j}-g_{s,j})]} \\&= C_E \min _{\eta \in \{0,1\}} 2^{-\eta {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}}) \cdot {{{\varvec{\alpha }}}}} \prod _{j\ge 1} 2^{-m(\beta _j-1) [g_{s,j} + (1-\eta )(g_{0,j}-g_{s,j})] \log _2 e} \\&= C_E \min _{\eta \in \{0,1\}} 2^{-\eta {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}}) \cdot {{{\varvec{\alpha }}}}} \prod _{j\ge 1} 2^{-m(\beta _j-1) [\chi _{j,s} + (1-\eta )\theta _{j,s}] } \\&= C_E 2^{- \sum _{j\ge 1} m(\beta _j-1) \chi _{j,s}- \max _{\eta \in \{0,1\}}\left( \eta {{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}}) \cdot {{{\varvec{\alpha }}}}+\sum _{j\ge 1} m(\beta _j-1)(1-\eta )\theta _{j,s} \right) } \\&= C_E 2^{ - \sum _{j\ge 1} m(\beta _j-1) \chi _{j,s}- \max \left\{ \sum _{j\ge 1} m(\beta _j-1) \theta _{j,s},\,\,{{\varvec{r}}}_{{\mathrm {FEM}}}(s{{\varvec{q}}}) \cdot {{{\varvec{\alpha }}}}\right\} }. \end{aligned}$$

Here, \({{\varvec{q}}}\in {\mathbb {R}}_+^D\) with \(|{{\varvec{q}}}| = 1\) is arbitrary and will be chosen later. We can now bound the weighted sum of the profits as follows:

(37)

We then investigate under what conditions each of the two factors is finite (the constants, \(C_E\) and \(C_W\), are bounded, cf. Lemmas 5 and 8). Before proceeding, we comment on the equations above. As mentioned at the beginning of the proof, we work in a suboptimal setting: instead of choosing a different s for each \(\Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}\), we restrict ourselves to choosing between only two values, \(s=0\) or a certain \(s>0\) (second line). Observe that the second and third lines are equal since \(2^x\) is a monotone function of x; hence, the minimum is always attained at either \(\lambda =0\) or \(\lambda =1\). However, when switching the order of the sum and the minimum in the fourth line, i.e., bounding the sum by choosing the same \(\lambda \) for every term, allowing a fractional \(\lambda \) gives a tighter bound on the overall sum than restricting to \(\lambda \in \{0,1\}\). Roughly speaking, we mimic the fact that the optimal bound would use a different value of \(\lambda \) for every term by choosing a single overall \(\lambda \) between the two extreme values.

To investigate the conditions under which each of the two factors in (37) is bounded, we begin with the first factor, for which we immediately have, for all \(i=1,\ldots , D\), that

$$\begin{aligned}&{p\left( \lambda r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i+\gamma _i d_i\right) -\gamma _i d_i}> 0 \nonumber \\&\quad \Longrightarrow \quad p > \frac{\gamma _i d_i}{\lambda r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i + \gamma _i d_i } = \left( \lambda \frac{r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i}{\gamma _i d_i} + 1\right) ^{-1}. \end{aligned}$$
(38)

Our goal is to optimize the above constraint for the summability exponent p. To this end, we will minimize the right-hand side of (38), observing that it decreases with respect to \(r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i\). Hence, recalling the dependence of \(r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i\) (cf. Lemma 8) on the vector of weights \({{\varvec{q}}}\), we consider

$$\begin{aligned} r_{{\mathrm {det}}}(s)&= \max _{{{\varvec{q}}}\in {\mathbb {R}}^D_+, |{{\varvec{q}}}|=1}\min _{i=1,\ldots ,D} \frac{r_{{\mathrm {FEM}}}(s{{\varvec{q}}})_i}{\gamma _i d_i} \\&= \max _{{{\varvec{q}}}\in {\mathbb {R}}^D_+, |{{\varvec{q}}}|=1}\min _{i=1,\ldots ,D} \min \left\{ \frac{1}{\gamma _i d_i}, \frac{s q_i}{\gamma _i d_i} \right\} \\&= \max _{{{\varvec{q}}}\in {\mathbb {R}}^D_+, |{{\varvec{q}}}|=1}\min \left( \frac{1}{\max _{i=1,\ldots ,D}\gamma _i d_i}, \min _{i=1,\ldots ,D}\frac{s q_i}{\gamma _i d_i} \right) \\&= \min \left( \frac{1}{\max _{i=1,\ldots ,D} \gamma _i d_i}, \max _{{{\varvec{q}}}\in {\mathbb {R}}^D_+, |{{\varvec{q}}}|=1} \min _{i=1,\ldots ,D} \frac{s q_i}{\gamma _i d_i} \right) \end{aligned}$$

and observe that the second argument of the min is maximized with respect to \({{\varvec{q}}}\) by making \(\frac{s q_i}{\gamma _i d_i}\) constant over i, i.e.,

$$\begin{aligned} q_i = \frac{\gamma _i d_i }{\sum _{j=1}^D \gamma _j d_j} \Rightarrow r_{{\mathrm {det}}}(s) = \min \left\{ \frac{1}{\max _{i=1,\ldots ,D} \gamma _i d_i}, \frac{s}{\sum _{j=1}^D \gamma _j d_j} \right\} . \end{aligned}$$
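As a numerical sanity check of this equalization argument, the following sketch (with hypothetical values of \(s\) and of the products \(\gamma _i d_i\), not taken from the paper) verifies that the weights \(q_i \propto \gamma _i d_i\) attain the maximin value \(s/\sum _j \gamma _j d_j\) and that no randomly sampled feasible \({{\varvec{q}}}\) does better:

```python
import random

# Hypothetical instance with D = 3 "subdomain weights" gamma_i * d_i; the values
# are for illustration only. We check that q_i proportional to gamma_i * d_i
# attains the maximin value s / sum_j(gamma_j * d_j) and that random feasible
# weights with |q| = 1 never do better.
gd = [1.0, 2.0, 0.5]                 # assumed values of gamma_i * d_i
s = 1.5
total = sum(gd)

q_star = [g / total for g in gd]     # proposed optimal weights, |q| = 1
value_star = min(s * q / g for q, g in zip(q_star, gd))
assert abs(value_star - s / total) < 1e-12

random.seed(0)
best_random = 0.0
for _ in range(10000):
    raw = [random.random() for _ in gd]
    q = [x / sum(raw) for x in raw]
    best_random = max(best_random, min(s * qi / g for qi, g in zip(q, gd)))
assert best_random <= value_star + 1e-12
```

The inequality holds because \(\min _i q_i/(\gamma _i d_i) \le (\sum _i q_i)/(\sum _i \gamma _i d_i)\) for any feasible \({{\varvec{q}}}\), with equality exactly at the proposed weights.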

With this optimal choice, then (38) becomes simply

$$\begin{aligned} p > \left( \lambda r_{{\mathrm {det}}}(s) + 1\right) ^{-1}. \end{aligned}$$
(39)

For the second factor, denoting the generic term of the inner sum as \(a_{j,k}\) for brevity and observing that \(a_{j,1} = 1\) for every j, we have

$$\begin{aligned} \prod _{j=1}^\infty \sum _{k=1}^\infty a_{j,k}&\le \prod _{j=1}^\infty \left( 1 + \sum _{k=2}^\infty a_{j,k} \right) = \exp \left( \sum _{j=1}^\infty \log \left( 1 + \sum _{k=2}^\infty a_{j,k} \right) \right) \\&\le \exp \left( \sum _{j=1}^\infty \sum _{k=2}^\infty a_{j,k}\right) . \end{aligned}$$

We thus only have to discuss the convergence of the sum

$$\begin{aligned}&\sum _{j=1}^\infty \sum _{k=2}^\infty 2^{-pm(k-1)[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] +(1-p)(k-1)}\nonumber \\&\quad = \sum _{j=1}^\infty \sum _{k=1}^\infty 2^{-pm(k)[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] +(1-p)k}\nonumber \\&\quad \le \sum _{j=1}^\infty \sum _{k=1}^\infty 2^{-p 2^{k-1}[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] +(1-p)k}, \end{aligned}$$
(40)

where the last step is a consequence of the fact that, for Clenshaw–Curtis points, \(m(k) \ge 2^{k-1}\) for \(k \ge 1\), cf. (20). Moreover, \((1-\lambda ) \theta _{s,j} + \chi _{s,j} \ge 0\). To study the summability of (40), we want to use Lemma 9 to bound the inner sum in (40). First, we rewrite

$$\begin{aligned}&\sum _{k=1}^\infty 2^{-p 2^{k-1}[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] +(1-p)k} \\&\quad = \sum _{k=1}^\infty \exp \left( -p \frac{\log 2}{2}[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] 2^k +(1-p)k \log 2 \right) \\&\quad = \sum _{k=1}^\infty \exp \left( -a2^k +c k \right) \\&\quad \text { with } a = p \frac{\log 2}{2}[(1-\lambda ) \theta _{s,j} + \chi _{s,j}]> 0, \quad c=(1-p)\log 2 > 0, \end{aligned}$$

where we have used the notation in Lemma 9. Note that this lemma holds true under the assumptions that \(a>0\) and \( 0 \le c < 2a\), where the latter has to be verified as follows

$$\begin{aligned}&2a> c \Leftrightarrow p \log 2[(1-\lambda ) \theta _{s,j} + \chi _{s,j}]> (1-p)\log 2 \\&\quad \Leftrightarrow (1-\lambda ) \theta _{s,j} + \chi _{s,j} > \frac{(1-p)}{p}, \end{aligned}$$

which is true whenever

$$\begin{aligned} \chi _{s,j} > r_{{\mathrm{det}}}(s), \end{aligned}$$

due to (39), \(\theta _{s,j} \ge 0\) and \(\lambda \le 1\). Define \(\overline{{\mathcal {J}}} = \{ j \ge 1 \,:\, \chi _{s,j} \le r_{{\mathrm{det}}}(s)\}\) which has a finite cardinality since \(\chi _{s,j} \rightarrow \infty \) as \(j \rightarrow \infty \). Resuming from (40), we have, due to Lemma 9,

$$\begin{aligned} \sum _{j=1}^\infty \sum _{k=1}^\infty 2^{-p 2^{k-1}[(1-\lambda ) \theta _{s,j} + \chi _{s,j}] +(1-p)k}&\le C(\overline{{\mathcal {J}}}) + \sum _{j \notin \overline{{\mathcal {J}}}} \sum _{k=1}^\infty \exp \left( -a2^k +c k \right) \\&\le C(\overline{{\mathcal {J}}}) + \sum _{j \notin \overline{{\mathcal {J}}}} e^{-2a + \varepsilon (a,2,c)}, \end{aligned}$$

where \(C(\overline{{\mathcal {J}}})\) is bounded, since \(\# \overline{{\mathcal {J}}} < \infty \), and \(\varepsilon (a,2,c)\) is a monotonically decreasing function with limit \(c = (1-p) \log 2\) independent of j. Therefore, the previous series converges if and only if

$$\begin{aligned} \sum _{j \notin \overline{{\mathcal {J}}}} e^{- 2a} = \sum _{j \notin \overline{{\mathcal {J}}}} e^{-p \log 2[(1-\lambda ) \theta _{s,j} + \chi _{s,j}]} = \sum _{j \notin \overline{{\mathcal {J}}}} 2^{-p [(1-\lambda ) \theta _{s,j} + \chi _{s,j}]} \end{aligned}$$

converges. Inserting the expression of \(\theta _{s,j}\) and \(\chi _{s,j}\), we get

$$\begin{aligned} \sum _{j \notin \overline{{\mathcal {J}}}} 2^{-p [(1-\lambda ) \theta _{s,j} + \chi _{s,j}]}&= \sum _{j \notin \overline{{\mathcal {J}}}} 2^{-p [(1-\lambda ) (g_{0,j}-g_{s,j}) + g_{s,j}] \log _2 e}\\&= \sum _{j \notin \overline{{\mathcal {J}}}} e^{-p [(1-\lambda ) (g_{0,j}-g_{s,j}) + g_{s,j}]} \\&= \sum _{j \notin \overline{{\mathcal {J}}}} e^{-p (1-\lambda ) g_{0,j}}e^{ - p \lambda g_{s,j}}. \end{aligned}$$
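As an aside, the doubly exponential tail bound provided by Lemma 9 and used above can be illustrated numerically; the values of \(a\) and \(c\) below are assumed for illustration only and satisfy \(0 \le c < 2a\):

```python
import math

# Tail of the series sum_{k>=1} exp(-a*2^k + c*k) for assumed values a = 1,
# c = 0.5 with 0 <= c < 2a: the sum converges and is comparable to its first
# term exp(-2a + c), the remaining terms decaying doubly exponentially.
def tail_sum(a, c, kmax=40):
    return sum(math.exp(-a * 2 ** k + c * k) for k in range(1, kmax + 1))

a, c = 1.0, 0.5
total = tail_sum(a, c)
first = math.exp(-2.0 * a + c)
assert first <= total <= 2.0 * first
```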

After applying the Hölder inequality in the previous summation with exponents \(\eta _1^{-1}+\eta _2^{-1}=1\), we need to simultaneously ensure the boundedness of the following sums:

$$\begin{aligned} \sum _{j \notin \overline{{\mathcal {J}}}} e^{-p (1-\lambda ) g_{0,j} \eta _2} \quad {\text {and}}\quad \sum _{j \notin \overline{{\mathcal {J}}}} e^{ - p \lambda g_{s,j} \eta _1}. \end{aligned}$$

Recalling the summability result in Lemma 7, we understand that the following two conditions must hold:

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle p(1-\lambda )\eta _2 \ge \frac{p_0}{1-p_0} \\ \displaystyle p\lambda \eta _1 \ge \frac{p_s}{1-p_s} \end{array}\right. } \Rightarrow {\left\{ \begin{array}{ll} \displaystyle p \ge \frac{p_0}{1-p_0} \frac{1}{1-\lambda } \frac{1}{\eta _2}\\ \displaystyle p \ge \frac{p_s}{1-p_s}\frac{1}{\lambda }\left( 1 - \frac{1}{\eta _2} \right) , \end{array}\right. } \end{aligned}$$

which closes the discussion of the summability of the second factor of (37) for a fixed s. Recalling the constraint (39) coming from the first factor of (37), we finally have to solve the following optimization problem:

$$\begin{aligned} p >\min _{ \lambda \in [0,1], 1\le \eta _2}\max \left\{ \left( \lambda r_{{\mathrm {det}}}(s) + 1\right) ^{-1}, \frac{p_s}{1-p_s}\frac{1}{\lambda }\left( 1 - \frac{1}{\eta _2} \right) , \frac{p_0}{1-p_0} \frac{1}{1-\lambda } \frac{1}{\eta _2} \right\} \end{aligned}$$
Fig. 3

Illustration of the optimization problem (41). As observed in the proof, \(f_{1}\) is decreasing in \(\lambda \) while \(f_2\) is increasing in \(\lambda \). Left: case 1 of the proof, where the minmax point is \(\lambda =1\). Right: case 2 of the proof, where the minmax point is \(\lambda <1\)

i.e., we have to choose \(\eta _2\) and \(\lambda \) to minimize the lower bound on p. We first optimally select \(\eta _2\) given \(\lambda \), i.e., we take \(\eta _2 = \eta _2^*\) such that

$$\begin{aligned} \frac{p_s}{1-p_s}\frac{1}{\lambda }\left( 1 - \frac{1}{\eta _2^*} \right) = \frac{p_0}{1-p_0} \frac{1}{1-\lambda } \frac{1}{\eta _2^*} \,\, \Rightarrow \,\, \eta _2^* = 1+\frac{1-p_s}{p_s} \frac{p_0}{1-p_0} \frac{\lambda }{1-\lambda }. \end{aligned}$$

Substituting back, we obtain

$$\begin{aligned} \frac{p_s}{1-p_s}\frac{1}{\lambda } \frac{\eta _2^*-1}{\eta _2^*}&= \left( {\frac{1}{p_0} - 1 + \lambda \left( \frac{1}{p_s} - \frac{1}{p_0} \right) } \right) ^{-1}, \end{aligned}$$

so that the minimization problem reads

$$\begin{aligned}&p >\min _{\lambda \in [0,1]} \max \left\{ f_{1}(\lambda ,s),f_2(\lambda , s) \right\} , \nonumber \\&f_{1}(\lambda ,s) = \left( \lambda r_{{\mathrm {det}}}(s) + 1\right) ^{-1}, \quad f_2(\lambda , s) = \left( {\frac{1}{p_0} - 1 + \lambda \left( \frac{1}{p_s} - \frac{1}{p_0} \right) } \right) ^{-1}. \end{aligned}$$
(41)
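The choice of \(\eta _2^*\) and the substitution leading to \(f_2\) can be verified numerically; in the following sketch, the values of \(p_0\), \(p_s\), and \(\lambda \) are hypothetical:

```python
# Check (with hypothetical p0, ps, lam) that eta2* equalizes the two Hölder
# conditions and that substituting it back gives the closed form for f2.
p0, ps, lam = 0.2, 0.4, 0.3
A = ps / (1.0 - ps)        # shorthand for p_s / (1 - p_s)
B = p0 / (1.0 - p0)        # shorthand for p_0 / (1 - p_0)

eta2 = 1.0 + (B / A) * lam / (1.0 - lam)        # eta2* from the text
lhs = (A / lam) * (1.0 - 1.0 / eta2)
rhs = (B / (1.0 - lam)) / eta2
assert abs(lhs - rhs) < 1e-12                   # both conditions coincide at eta2*

closed = 1.0 / (1.0 / p0 - 1.0 + lam * (1.0 / ps - 1.0 / p0))
assert abs(lhs - closed) < 1e-12                # matches f2(lam, s) in (41)
```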

Now, we recall that \(p_0 \le p_s\). Hence, \(f_2(\lambda , s)\) is increasing in \(\lambda \). Conversely, \(f_{1}(\lambda ,s)\) is decreasing in \(\lambda \) since \(r_{{\mathrm {det}}}(s)\) is a positive number. Furthermore, notice that we cannot have \(f_{1}(\lambda ,s)<f_2(\lambda ,s)\) for all \(\lambda \in [0,1]\). Indeed, since \(f_1\) is decreasing and \(f_2\) is increasing, the previous condition is equivalent to \(f_{1}(0, s) \le f_2(0, s)\), i.e., \(1 \le \frac{p_0}{1-p_0}\), i.e., \(p_0 \ge \frac{1}{2}\), which does not satisfy Assumption A2. Note that, in such a case, the lower bound for p in (41) would be minimized for \(\lambda =0\), implying that \(p> \frac{p_0}{1-p_0} \ge 1\), i.e., the method does not converge (cf. the statement of Theorem 4). Thus, we have only two cases (see also Fig. 3):

Case 1 :

\(f_{1}(\lambda ,s)>f_2(\lambda , s)\) for all \(\lambda \in [0,1]\), which means that the convergence of the method is dictated by the spatial discretization. Given that \(f_{1}\) is decreasing and \(f_2\) is increasing, the previous condition is equivalent to \(f_1(1, s) \ge f_2(1, s)\), i.e., \( r_{{\mathrm {det}}}(s) \le \frac{1}{p_s} - 2\). In this case, the lower bound (41) is minimized for \(\lambda =1\), and we have \(p > \left( r_{{\mathrm {det}}}(s) + 1\right) ^{-1} \).

Case 2 :

There exists \(\lambda ^* \in (0,1)\) such that \(f_{1}(\lambda ,s)=f_2(\lambda , s)\). This condition is equivalent to the two conditions

$$\begin{aligned} {\left\{ \begin{array}{ll} f_{1}(0,s) \ge f_2(0, s) \\ f_{1}(1,s) \le f_2(1, s) \end{array}\right. } \Rightarrow {\left\{ \begin{array}{ll} 1 \le \frac{1}{p_0} - 1 \\ r_{{\mathrm {det}}}(s) \ge \frac{1}{p_s} - 2. \end{array}\right. } \end{aligned}$$

Solving for \(\lambda ^*\) yields

$$\begin{aligned} \lambda ^*&= \frac{\frac{1}{p_0}-2}{r_{{\mathrm {det}}}(s) + \frac{1}{p_0} - \frac{1}{p_s}} > 0 \end{aligned}$$

which yields \(p > \overline{p}\), where

$$\begin{aligned} \begin{aligned} \overline{p}&= \left( \left( \frac{\frac{1}{p_0}-2}{r_{{\mathrm {det}}}(s) + \frac{1}{p_0} - \frac{1}{p_s}}\right) r_{{\mathrm {det}}}(s) + 1\right) ^{-1}\\&= \left( 1 + \left( \frac{1}{p_0}-2 \right) \left( 1 + \frac{1}{r_{{\mathrm {det}}}(s)} \left( \frac{1}{p_0} - \frac{1}{p_s} \right) \right) ^{-1}\right) ^{-1}. \end{aligned} \end{aligned}$$
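As a numerical sanity check of Case 2 (with hypothetical values of \(p_0\), \(p_s\), and \(r_{{\mathrm {det}}}(s)\) satisfying its conditions), one can verify that \(\lambda ^*\) equalizes \(f_1\) and \(f_2\) and attains the minimax value \(\overline{p}\):

```python
# Numerical sanity check of Case 2 with hypothetical parameter values
# (p0, ps, r stand for p_0, p_s, r_det(s) and satisfy the Case 2 conditions).
p0, ps, r = 0.2, 0.4, 1.0

def f1(lam):
    return 1.0 / (lam * r + 1.0)

def f2(lam):
    return 1.0 / (1.0 / p0 - 1.0 + lam * (1.0 / ps - 1.0 / p0))

# Case 2 conditions: f1(0) >= f2(0) and f1(1) <= f2(1).
assert f1(0.0) >= f2(0.0) and f1(1.0) <= f2(1.0)

lam_star = (1.0 / p0 - 2.0) / (r + 1.0 / p0 - 1.0 / ps)
assert 0.0 < lam_star < 1.0
assert abs(f1(lam_star) - f2(lam_star)) < 1e-12      # crossing point

# The minimax value p_bar, in both forms given in the text.
p_bar = f1(lam_star)
p_bar_expanded = 1.0 / (1.0 + (1.0 / p0 - 2.0)
                        / (1.0 + (1.0 / r) * (1.0 / p0 - 1.0 / ps)))
assert abs(p_bar - p_bar_expanded) < 1e-12

# Grid search over lambda: no point gives a smaller max(f1, f2).
grid_min = min(max(f1(l / 1000.0), f2(l / 1000.0)) for l in range(1001))
assert p_bar <= grid_min + 1e-9
```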

Since the previous computations were carried out for a fixed s, we can take the minimum over all possible values of s. Then, we can apply Theorem 4 and derive the convergence estimate,

$$\begin{aligned} \big | {\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F] \big | \le C_P(p){\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] ^{1-1/p}, \end{aligned}$$

where \(1-1/p = 1 - \max _{s=0,\ldots , s_{\max }}(1/{\overline{p}}) + \delta \) for any \(\delta > 0\), which we reformulate as

$$\begin{aligned} \big | {\mathbb {E}}\left[ F\right] - {\mathscr {M}}_{\mathcal {I}}[F] \big | \le C_P\left( \frac{1}{1+r_{{\mathrm {MISC}}} - \delta }\right) {\mathrm {Work}}\left[ {\mathscr {M}}_{{\mathcal {I}}}\right] ^{-r_{{\mathrm {MISC}}} + \delta }, \end{aligned}$$

with \(r_{{\mathrm {MISC}}} = \max _{s=0,\ldots , s_{\max }}({1}/{\overline{p}}) -1\). \(\square \)

5 Analysis of Example 1

In this section, we determine the value of \(s_{\max }\) and the sequence \(\{p_s\}_{s=0}^{s_{\max }}\) for Example 1. Since we will work with localized quantities of interest far from the boundary, cf. (45) below, we believe that the effect of the boundary is negligible and that the regularity \(s_{\max }\) is mainly limited by the summability properties of \(\kappa \). See “Appendix 2” for a slightly modified problem on the same domain for which we can prove that the regularity is indeed limited only by the summability properties of \(\kappa \). Let us define the following family of auxiliary functions,

$$\begin{aligned} \Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}({{\varvec{x}}}) = \prod _{i=1}^d\left( \cos \left( {\pi } k_i x_i \right) \right) ^{\ell _i} \left( \sin \left( {\pi } k_i x_i \right) \right) ^{1-\ell _i}, \end{aligned}$$

so that \(\kappa \) from (8) can be written as

$$\begin{aligned} \kappa ({{\varvec{x}}}, {{\varvec{y}}} )&= \sum _{{{\varvec{k}}} \in {\mathbb {N}}^d} A_{{{\varvec{k}}}} \sum _{{{\varvec{\ell }}} \in \{0,1\}^d} y_{{{\varvec{k}}},{{\varvec{\ell }}} } \Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}} ({{\varvec{x}}})\\&= \sum _{j=0}^\infty \sum _{\left\{ {{\varvec{k}}} \in {\mathbb {N}}^d\,:\, |{{\varvec{k}}}| = j\right\} } A_{{{\varvec{k}}}} \sum _{{{\varvec{\ell }}} \in \{0,1\}^d} y_{{{\varvec{k}}},{{\varvec{\ell }}} } \Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}} ({{\varvec{x}}}). \end{aligned}$$

Based on this expression, for \(s \ge 0\), we analyze the summability of \(\left\{ A_{{{\varvec{k}}}} \Vert D^{{{\varvec{s}}}}\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^\infty ({\mathscr {B}})} \right\} \) for \(|{{\varvec{s}}}| = s\) to determine the permissible values of \(p_s\). First, for \(|{{\varvec{s}}}| = s\), observe that for a constant c independent of \({{\varvec{k}}}\) we have

$$\begin{aligned} \Vert D^{{{\varvec{s}}}} \Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^{\infty }({\mathscr {B}})} = \prod _{j=1}^d\left( {\pi }k_j\right) ^{s_j} \le c |{{\varvec{k}}}|^s. \end{aligned}$$
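This bound is elementary to check numerically; the multi-index below is a hypothetical example, and \(c = \pi ^s\) works since each \(k_j \le |{{\varvec{k}}}|\):

```python
import itertools
import math

# Check of the derivative bound with a hypothetical multi-index s = (1, 2, 0):
# prod_j (pi * k_j)^{s_j} <= c * |k|^s with c = pi^s, since k_j <= |k|.
d = 3
s_multi = (1, 2, 0)
s = sum(s_multi)
c = math.pi ** s

ok = all(
    math.prod((math.pi * k[j]) ** s_multi[j] for j in range(d))
    <= c * sum(k) ** s + 1e-9
    for k in itertools.product(range(1, 6), repeat=d)   # all k in {1,...,5}^3
)
assert ok
```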

Then, for all \(s \ge 0\), we have

$$\begin{aligned} \begin{aligned}&\sum _{j=0}^\infty \sum _{\left\{ {{\varvec{k}}} \in {\mathbb {N}}^d\,:\, |{{\varvec{k}}}| = j\right\} }\sum _{{{\varvec{\ell }}} \in \{0,1\}^d} A_{{{\varvec{k}}}}^{p_s} \Vert D^{{{\varvec{s}}}}\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^\infty ({\mathscr {B}})}^{p_s} \\&\quad \le c 2^{d} \sum _{j=0}^\infty \sum _{\left\{ {{\varvec{k}}} \in {\mathbb {N}}^d\,:\, |{{\varvec{k}}}| = j\right\} } 2^{p_s {\frac{|{{\varvec{k}}}|_0}{2}}} |{{\varvec{k}}}|^{p_s s} (1 + |{{\varvec{k}}}|^2)^{-\frac{p_s\left( \nu +\frac{d}{2}\right) }{2}} \\&\quad \le c 2^{d} + c 2^{d+p_s\frac{d}{2}}\sum _{j=1}^\infty \sum _{\left\{ {{\varvec{k}}} \in {\mathbb {N}}^d\,:\, |{{\varvec{k}}}| = j\right\} } j^{-p_s\left( \nu +\frac{d}{2} - s\right) } \\&\quad = c 2^{d} + c \frac{2^{d+p_s\frac{d}{2}}}{(d-1)!} \sum _{j=1}^\infty j^{-p_s\left( \nu +\frac{d}{2} - s\right) } \prod _{i=1}^{d-1} (j+i) \\&\quad = c2^{d} + c \frac{2^{d+p_s\frac{d}{2}}}{(d-1)!} \sum _{j=1}^\infty j^{-p_s\left( \nu +\frac{d}{2} - s\right) + d-1} \left( 1+\frac{d-1}{j}\right) ^{d-1} \\&\quad \le c2^{d} + \frac{c2^{d+p_s\frac{d}{2}} d^{d-1}}{(d-1)!} \sum _{j=1}^\infty j^{-p_s\left( \nu +\frac{d}{2} - s\right) + d-1}. \end{aligned} \end{aligned}$$

From here, we obtain the bound

$$\begin{aligned} p_s > \left( \frac{\nu }{d}+\frac{1}{2} - \frac{s}{d}\right) ^{-1}, \end{aligned}$$
(42)

for all \(s \ge 0\). Moreover, imposing \(p_0 < \frac{1}{2}\) and \(p_{s_{\max }} < \frac{1}{2}\) gives the bounds

$$\begin{aligned} \nu > \frac{3d}{2} \qquad {\text {and}}\qquad s_{\max } < \nu - \frac{3d}{2}, \end{aligned}$$
(43)

respectively. Since \(\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}} \in C^{\infty }({\mathscr {B}})\), the bounds in (43) are the only bounds on the value of \(s_{\max }\). To determine an upper bound on the value of \(r_{{\mathrm{MISC}}}\), up to a small \(\delta \), we set \(\gamma _1=\cdots =\gamma _D=\gamma \) (motivated by the fact that all subdomains \({\mathscr {B}}_i\) are equal), we substitute \(d_i=1\) for \(i=1,\ldots ,d\) and the lower bound of \( p_s\) in Theorem 10 and obtain after simplifying

$$\begin{aligned} \begin{aligned} r_{{\mathrm {MISC}}}&= \max _{s=0,\ldots ,s_{\max }} {\left\{ \begin{array}{ll} \frac{\min (1, \frac{s}{d})}{\gamma } &{} {\text { if }} \frac{\min (1, \frac{s}{d})}{\gamma } \le \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d},\\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \frac{\gamma }{\min (1, \frac{s}{d})}\frac{s}{d} \right) ^{-1} &{} {\text { if }} \frac{\min (1, \frac{s}{d})}{\gamma } \ge \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d}, \end{array}\right. }\\&=\max _{s=0,\ldots ,s_{\max }} {\left\{ \begin{array}{ll} \frac{s}{d \gamma } &{} {\text { if }} \frac{s}{d \gamma } \le \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d} {\text { and }} s \le d,\\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{} {\text { if }} \frac{s}{d \gamma } \ge \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d} {\text { and }} s \le d,\\ \frac{1}{\gamma } &{} {\text { if }} \frac{1}{\gamma } \le \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d} {\text { and }} s \ge d, \\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \frac{s \gamma }{d} \right) ^{-1} &{}{\text { if }} \frac{1}{\gamma } \ge \frac{\nu }{d} - \frac{3}{2} -\frac{s}{d} {\text { and }} s \ge d. \end{array}\right. } \end{aligned} \end{aligned}$$

Before continuing, we discuss the four branches of the previous expression. If \(s_{\max } \le d\), then only the first two branches are valid. Since the rates in these two branches increase with s, the maximum is achieved for \(s=s_{\max }\). If \(s_{\max } \ge d\), then, since the rates in the third and fourth branches decrease with s, the maximum is achieved for \(s=d\). Hence

$$\begin{aligned} \begin{aligned} r_{{\mathrm {MISC}}}&= {\left\{ \begin{array}{ll} \frac{s_{\max }}{d \gamma } &{} {\text { if }} \frac{s_{\max } (1+\gamma )}{d \gamma } \le \frac{\nu }{d} - \frac{3}{2} {\text { and }} s_{\max } \le d,\\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{} {\text { if }} \frac{s_{\max } (1+\gamma )}{d \gamma } \ge \frac{\nu }{d} - \frac{3}{2} {\text { and }} s_{\max } \le d, \\ \frac{1}{\gamma } &{} {\text { if }} \frac{1}{\gamma } \le \frac{\nu }{d} - \frac{3}{2} -1{\text { and }} s_{\max } \ge d, \\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{} {\text { if }} \frac{1}{\gamma } \ge \frac{\nu }{d} - \frac{3}{2} -1 {\text { and }} s_{\max } \ge d, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \frac{1}{\gamma }\left( \frac{\nu }{d} - \frac{3}{2}\right) &{}{\text { if }} \frac{1}{\gamma } \le 0 {\text { and }} \frac{\nu }{d} \le \frac{5}{2},\\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{}{\text { if }} \frac{1}{\gamma } \ge 0 {\text { and }} \frac{\nu }{d} \le \frac{5}{2}, \\ \frac{1}{\gamma } &{}{\text { if }} \ \frac{1}{\gamma } \le \frac{\nu }{d} - \frac{5}{2} {\text { and }} \frac{\nu }{d} \ge \frac{5}{2}, \\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{}{\text { if }} \frac{1}{\gamma } \ge \frac{\nu }{d} - \frac{5}{2} {\text { and }} \frac{\nu }{d} \ge \frac{5}{2}, \end{array}\right. }\\&= {\left\{ \begin{array}{ll} \gamma ^{-1} &{} {\text { if }} \ \frac{\nu }{d} \ge \frac{1}{\gamma } + \frac{5}{2}, \\ \left( \frac{\nu }{d} - \frac{3}{2} \right) \left( 1 + \gamma \right) ^{-1} &{} {\text { if }} \frac{\nu }{d} \le \frac{1}{\gamma } + \frac{5}{2}. \end{array}\right. } \end{aligned} \end{aligned}$$
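The case analysis above can be cross-checked numerically by comparing the final two-branch formula with a direct maximization over \(s\) of the branch rates; the values of \(\nu \), \(d\), and \(\gamma \) below are chosen for illustration only:

```python
# Sketch (not from the paper): evaluate the final two-branch formula for
# r_MISC and cross-check it against a direct maximization over s of the
# branch rates derived above.
def r_misc(nu, d, gamma):
    if nu / d >= 1.0 / gamma + 2.5:
        return 1.0 / gamma
    return (nu / d - 1.5) / (1.0 + gamma)

def r_misc_direct(nu, d, gamma, n=4000):
    s_sup = nu - 1.5 * d            # continuous upper limit for s (up to delta)
    best = 0.0
    for i in range(1, n + 1):
        s = s_sup * i / n
        m = min(1.0, s / d)
        if m / gamma <= nu / d - 1.5 - s / d:
            best = max(best, m / gamma)                         # first branch
        else:
            best = max(best, (nu / d - 1.5) / (1.0 + gamma * (s / d) / m))
    return best

for nu, d, gamma in [(2.0, 1, 1.0), (4.0, 1, 1.0), (5.0, 3, 1.0), (6.0, 1, 0.5)]:
    assert abs(r_misc(nu, d, gamma) - r_misc_direct(nu, d, gamma)) < 1e-2
```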

In Fig. 4, we plot the upper bound of the rate of MISC work complexity, \(r_{{\mathrm{MISC}}}\), based on Theorem 10 and the following analysis variants:

  • Theory This is based on the summability properties of \(\left\{ A_{{{\varvec{k}}}} \Vert D^{{{\varvec{s}}}}\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^\infty ({\mathscr {B}})} \right\} \). We also use the value \(r_{{\mathrm{FEM}},i}(s) = 2 \min \left( 1, \frac{s}{d}\right) \) for all \(i=1,\ldots ,d\). This is motivated by the fact that we expect to double the convergence rate of the underlying FEM since we are considering the convergence of a smooth linear functional of the solution.

  • Square summability Motivated by the arguments in Lemma 15 in the appendix, we believe that our results may be improved by instead considering the summability properties of \(\left\{ A_{{{\varvec{k}}}}^2 \Vert D^{{{\varvec{s}}}}\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^\infty ({\mathscr {B}})}^2 \right\} \) for \(|{{\varvec{s}}}| \le s\). Similar calculations yield the bounds

    $$\begin{aligned} p_s > \left( \frac{2\nu }{d}+1 - \frac{2s}{d}\right) ^{-1}, \end{aligned}$$
    (44)

    and the corresponding conditions, \(\nu > \frac{d}{2}\) and \(s_{\max } < \nu - \frac{d}{2}\).

  • Improved As mentioned in Remark 4, we could in principle make our results sharper by taking \(\tilde{p}_s=p_s\) instead of \(\tilde{p}_s=\frac{p_s}{1-p_s}\). The modifications of Theorem 10 to account for these rates are straightforward. Moreover, when considering square summability, the conditions become \(\nu > 0\) and \(s_{\max } < \nu \).

We also include in Fig. 4 the observed convergence rates of MISC when applied to the example with different values of \(\nu \), as discussed below in Sect. 6, and the observed convergence rate of MISC when applied to the same problem with no random variables and a constant diffusion coefficient, a. In the latter case, MISC reduces to a deterministic combination technique [7]. Note that the rate of MISC with no random variables is an upper bound for the convergence rate of MISC with any \(\nu >0\).

From this figure, we clearly see that the rates predicted by our theory are pessimistic compared to the observed rates and that the suggested analyses using square summability or the improved rates, \(\tilde{p}_s\), might yield sharper bounds on the predicted work rates. On the other hand, we know from our previous work [22, Theorem 1] that the work degrades with increasing \(d\) by a logarithmic factor; in fact, the expected work rate for maximum regularity when the number of random variables is finite is \({{\mathcal {O}}\left( W_{\max }^{-2} \log (W_{\max })^{d-1} \right) }\). This can be seen in Fig. 4 for \(d=3\), since in this case the observed work rate seems to be converging to a value less than 2.

Fig. 4

The upper bound of the MISC rate, \(r_{{\mathrm{MISC}}}\), as predicted in Theorem 10 versus the observed rates when running the example detailed in Sect. 6. Refer to Sect. 5 for an explanation of the different curves. Also included are the observed convergence rates for a few values of \(\nu \) and the observed convergence rate of MISC with no random variable and constant diffusion coefficient, a, as a horizontal line. The latter is referred to as the “deterministic problem” and shows more clearly the effect of the logarithmic factor in the work for \(d>1\), as shown in Fig. 9 and proved in [22, Theorem 1]

6 Numerical Experiments

We now verify the effectiveness of the MISC approximation on some instances of the general elliptic equation (1), as well as the validity of the convergence analysis detailed in the previous sections. In particular, we consider the domain \({\mathscr {B}} = [0,1]^d\) and the family of random diffusion coefficients specified in Example 1. In more detail, we consider a problem with one physical dimension (\(d=1\)) and another with three dimensions (\(d=3\)); in both cases, we set \(\varsigma ({{\varvec{x}}})=1\) and model the diffusion coefficient by the expansion (8) with different values of \(\nu \). Finally, the quantity of interest is a local average defined as

$$\begin{aligned} F({{\varvec{y}}}) = \frac{10}{(\sigma \sqrt{2\pi })^d} \int _{\mathscr {B}} u({{\varvec{x}}},{{\varvec{y}}})\exp \left( -\frac{\Vert {{\varvec{x}}}-{{\varvec{x}}}_0\Vert _2^2}{2\sigma ^2} \right) d{{\varvec{x}}}\end{aligned}$$
(45)

with \(\sigma =0.2\) and location \({{\varvec{x}}}_0=0.3\) for \(d=1\) and \({{\varvec{x}}}_0= [0.3, 0.2,0.6]\) for \(d=3\). The deterministic problems are discretized with a second-order centered finite difference scheme, for which we expect to recover in the numerical experiments the same rate that we would obtain with piecewise multi-linear finite elements on a structured mesh. We choose the mesh sizes as in (17) with \(h_{0,i}=1/3\) for all \(i=1,\ldots ,d\), and the resulting linear system is solved with GMRES to sufficient accuracy. With these values and using the coarsest discretization, \(h_0=1/3\), in all dimensions, the coefficient of variation of the quantity of interest is approximately between \(90\%\) and \(110\%\), depending on the number of dimensions, \(d\), and the particular value of the parameter, \(\nu \), considered below. Finally, the quadrature points on the stochastic domain are the already-mentioned Clenshaw–Curtis points (see (19) and (20)).
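For concreteness, the following sketch approximates (45) for \(d=1\) with a mock solution; the parabolic profile \(u(x) = x(1-x)\) is an assumed placeholder satisfying the homogeneous boundary conditions, whereas in the experiments \(u\) solves (1):

```python
import math

# Midpoint-rule approximation of the quantity of interest (45) for d = 1 with
# a mock solution u(x) = x(1 - x) (an assumed placeholder satisfying the
# homogeneous boundary conditions; the actual u solves the PDE (1)).
sigma, x0 = 0.2, 0.3

def u(x):
    return x * (1.0 - x)

n = 100000
h = 1.0 / n
F = (10.0 / (sigma * math.sqrt(2.0 * math.pi))) * sum(
    u((i + 0.5) * h) * math.exp(-((i + 0.5) * h - x0) ** 2 / (2.0 * sigma ** 2)) * h
    for i in range(n)
)
assert 0.0 < F < 10.0   # a Gaussian-weighted local average of u near x0, scaled by 10
```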

In the plots below, the computational work is compared in terms of the total number of degrees of freedom to avoid discrepancies in running time due to implementation details, i.e., using (27) and Assumption A3. Moreover, we set \(\gamma =1\) in (30), which is motivated by the fact that, for the tolerances we are interested in, we estimate that the cost of solving a linear system with GMRES is linear with respect to the number of degrees of freedom.

In order to evaluate the MISC estimator, we need to build the index set (29). To do that, we must be able to evaluate two quantities for every \({{{\varvec{\alpha }}}}\) and \({{{\varvec{\beta }}}}\): the work contribution, \(\Delta W_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\), and the error contribution, \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\). Evaluating the work contribution is straightforward thanks to Assumption A3 and using \(\gamma =1\). On the other hand, evaluating the error contribution is more involved. We look at two options:

  • “brute-force” evaluation We compute \({{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}]\) for all \(({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}})\) within some “universe” index set and set \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}} = \left| {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] \right| \). Notice that this method is not practical since the cost of constructing the set, \({\mathcal {I}}\), would far dominate the cost of the MISC estimator. However, within some “universe” index set, this method would produce the best possible convergence and serve as a benchmark for other MISC sets within that universe.

  • “a-priori” evaluation We use Lemma 8 to bound \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\). Using these bounds instead of exact values produces quasi-optimal index sets (cf. [3, 33] ). This method in turn requires the estimation of the parameters \(r_{{\mathrm{FEM}}}, \left\{ g_{s, j}\right\} _{j \ge 1}\) for all \(s=0,\ldots ,s_{\max }\). Since we use a second-order centered finite differences scheme and consider the convergence of a quantity of interest, we expect \(r_{{\mathrm{FEM}}}=2 \min \left( 1, \frac{\nu }{d}\right) \) as motivated by the “improved” analysis in the previous section and considering the summability properties of \(\left\{ A_{{{\varvec{k}}}}^2 \Vert D^{{{\varvec{s}}}}\Upsilon _{{{\varvec{k}}}, {{\varvec{\ell }}}}\Vert _{L^\infty ({\mathscr {B}})}^2 \right\} \). This can also be validated numerically in the usual way by fixing all random variables to their expected value and checking the decay of \(\Delta E_{{{{\varvec{\alpha }}}}, \mathbf{1}}\) with respect to \({{{\varvec{\alpha }}}}\).

    On the other hand, estimating \(\left\{ g_{s, j}\right\} _{j \ge 1}\) for \(s=0,\ldots ,s_{\max }\) is more difficult since, in principle, we do not know a priori, for a given \({{{\varvec{\alpha }}}}\) and \({{{\varvec{\beta }}}}\), which value of \(s \in \{0,1,\ldots ,s_{\max }\}\) yields the smallest estimate of \(\Delta E_{{{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}}\). Instead, we use a “simplified” model that was used in [22]:

    $$\begin{aligned} \Delta E_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}} \le C e^{-\sum _{j\ge 1} m(\beta _j-1)\tilde{g}_{j}} 2^{-|{{{\varvec{\alpha }}}}|r_{{\mathrm {FEM}}}}, \end{aligned}$$
    (46)

    where \(\tilde{g}_j\) is some unknown function of \(g_{s, j}\) for all \(s = 0,1,\ldots , s_{\max }\). The rates \(\tilde{g}_j\) can be estimated given \(r_{{\mathrm{FEM}}}\) and a set of evaluations of \(\left| {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] \right| \) for some \(({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}^*\) by solving a least-squares problem to fit the linear model

    $$\begin{aligned} \sum _{j\ge 1} \tilde{g}_j m(\beta _j-1) = -\log \left( \left| {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] \right| \right) - |{{{\varvec{\alpha }}}}| r_{{\mathrm{FEM}}},\qquad {\text {for all }} ({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}^*.\end{aligned}$$

    For our example, these rates are plotted in Figs. 5a and 6a for \(d=1\) and \(d=3\), respectively. In our current implementation, the construction of the optimal MISC set, \({\mathcal {I}}\), is separate from the set \({\mathcal {I}}^*\). However, it is possible in principle to construct an algorithm in which the optimal MISC set, \({\mathcal {I}}\), is constructed iteratively by alternating between estimating rates given a set of indices and evaluating the MISC estimator.
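The least-squares fit described above can be sketched on synthetic data as follows; the point-count formula, the spatial rate, and the rates \(\tilde{g}_j\) below are all assumed for illustration, and the observations are generated from the model itself so that the fit recovers them exactly:

```python
import random

# Minimal sketch of the least-squares fit for the stochastic rates g~_j in (46),
# on synthetic data generated from the linear model itself. The point counts
# m(k) follow the assumed Clenshaw-Curtis growth m(0)=0, m(1)=1, m(k)=2^{k-1}+1.
def m(k):
    return 0 if k == 0 else 1 if k == 1 else 2 ** (k - 1) + 1

J, r_fem = 3, 2.0                      # number of active variables, assumed spatial rate
g_true = [2.0, 1.0, 0.5]               # hypothetical stochastic rates to recover

random.seed(1)
data = []                              # (|alpha|, beta, log|Delta F|) triples
for _ in range(40):
    a = random.randint(1, 4)
    beta = [random.randint(1, 4) for _ in range(J)]
    log_dF = -sum(g_true[j] * m(beta[j] - 1) for j in range(J)) - a * r_fem
    data.append((a, beta, log_dF))

# Normal equations for the model sum_j g_j m(beta_j - 1) = -log|Delta F| - |alpha| r_fem.
A = [[m(b[j] - 1) for j in range(J)] for (_, b, _) in data]
y = [-ld - a * r_fem for (a, _, ld) in data]
AtA = [[sum(row[p] * row[q] for row in A) for q in range(J)] for p in range(J)]
Aty = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(J)]

# Solve the small SPD system by Gaussian elimination (no pivoting needed).
for p in range(J):
    for q in range(p + 1, J):
        f = AtA[q][p] / AtA[p][p]
        AtA[q] = [AtA[q][r] - f * AtA[p][r] for r in range(J)]
        Aty[q] -= f * Aty[p]
g_fit = [0.0] * J
for p in reversed(range(J)):
    g_fit[p] = (Aty[p] - sum(AtA[p][q] * g_fit[q] for q in range(p + 1, J))) / AtA[p][p]
```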

Fig. 5

Example 1, \(d=1\) and \(\nu =2.5\). a A plot of the estimated stochastic rates, \(\tilde{g}_{j}\), that are used in (46). Different markers correspond to different modes multiplying the same value of \(A_{{{\varvec{k}}}}\). b–d Solid lines show the computed approximations of \(\Delta E_{\mathbf{1}, \mathbf{1} + j \tilde{{{{\varvec{\beta }}}}}}, \Delta E_{\mathbf{1} + j \tilde{{{{\varvec{\alpha }}}}}, \mathbf{1}}\) and \(\Delta E_{\mathbf{1} + j \tilde{{{{\varvec{\alpha }}}}}, \mathbf{1} + j \tilde{{{{\varvec{\beta }}}}}}\), respectively, versus the model in (46), represented with dashed lines

Note that, in the current work, there are certain operations whose costs we do not track or compare. The first is the estimation of the stochastic rates, \(\left\{ \tilde{g}_j\right\} _{j\ge 1}\). The second is the construction of the optimal set given estimates of the error and work contributions. We believe that the cost of these two operations can be reduced by using the previously mentioned iterative algorithm, so that they are dominated by the cost of evaluating the MISC estimator. The third is the assembly of the stiffness matrix, whose cost scales linearly with the number of random variables. While this cost is relevant to our discussion, it is usually dominated by the cost of the linear solver, at least for fine enough discretizations.

Finally, we also compare MISC to the Multi-index Monte Carlo (MIMC) method as detailed in [23], for which \({{\mathcal {O}}\left( W_{\max }^{-0.5} \right) }\) convergence can be proved for Example 1 with \(\gamma =1, d\le 3\) and sufficiently large \(\nu \) (see “Appendix 1”). Moreover, when computing errors, we use the result obtained using a well-resolved MISC solution as a reference value.

Fig. 6

Example 1, \(d=3\) and \(\nu =4.5\). a A plot of the estimated stochastic rates, \(\tilde{g}_{j}\), that are used in (46). Here different markers correspond to different modes multiplying the same value of \(A_{{{\varvec{k}}}}\). b–d Solid lines show the computed approximations of \(\Delta E_{\mathbf{1}, \mathbf{1} + j \tilde{{{{\varvec{\beta }}}}}}, \Delta E_{\mathbf{1} + j \tilde{{{{\varvec{\alpha }}}}}, \mathbf{1}}\) and \(\Delta E_{\mathbf{1} + j \tilde{{{{\varvec{\alpha }}}}}, \mathbf{1} + j \tilde{{{{\varvec{\beta }}}}}}\), respectively, versus the model in (46), represented with dashed lines

Fig. 7

Example 1, \(d=1\) and \(\nu =2.5\). This figure shows extreme values of \({{{\varvec{\alpha }}}}\) and \({{{\varvec{\beta }}}}\) included in the MISC set, \({\mathcal {I}}\). Specifically, the left-solid lines are the maximum spatial discretization level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max \left( {{{\varvec{\alpha }}}}\right) \right) \), the left-dashed lines are the maximum quadrature level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max \left( {{{\varvec{\beta }}}}\right) \right) \), the right-solid lines are the index of the last activated random variable, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max _{\beta _j > 1} j\right) \), and the right-dashed lines are the maximum number of jointly activated variables, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( |{{{\varvec{\beta }}}}-\mathbf{1}|_0\right) \)

Fig. 8

Example 1, \(d=3\) and \(\nu =4.5\). This figure shows extreme values of \({{{\varvec{\alpha }}}}\) and \({{{\varvec{\beta }}}}\) included in the MISC set, \({\mathcal {I}}\). Specifically, the left-solid lines are the maximum spatial discretization level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max \left( {{{\varvec{\alpha }}}}\right) \right) \), the left-dashed lines are the maximum quadrature level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max \left( {{{\varvec{\beta }}}}\right) \right) \), the right-solid lines are the index of the last activated random variable, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( \max _{\beta _j > 1} j\right) \), and the right-dashed lines are the maximum number of jointly activated variables, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \left( |{{{\varvec{\beta }}}}-\mathbf{1}|_0\right) \)

Fig. 9

Convergence results of MISC Example 1 with a constant diffusion coefficient, a (left, \(d=1\) and \(\nu =2.5\); right, \(d=3\) and \(\nu =4.5\)). In this case, MISC is equivalent to a deterministic combination technique [7]. These plots show the non-asymptotic effect of the logarithmic factor for \(d>1\) (as discussed in [22, Theorem 1]) on the linear convergence fit in log–log scale

Fig. 10

Convergence results of MISC versus MIMC when applied to Example 1 (left, \(d=1\) and \(\nu =2.5\); right, \(d=3\) and \(\nu =4.5\))

Figures 5b–d and 6b–d compare some computed values of \(\left| {{\varvec{\Delta }}}[F_{{{{\varvec{\alpha }}}},{{{\varvec{\beta }}}}}] \right| \) with the model (46), using the estimated rates \(r_{\mathrm{FEM}}=2 \min \left( 1, \frac{\nu }{d}\right) \) and \(\left\{ \tilde{g}_j\right\} _{j\ge 1}\). These plots show that the model (46) is a good fit in the cases \(d=1, \nu =2.5\) and \(d=3, \nu =4.5\). Similar plots, not reported here, were produced for other values of \(d\) and \(\nu \) and also show a good fit. Figures 7 and 8 show

  • the maximum spatial discretization level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \max ({{{\varvec{\alpha }}}})\),

  • the maximum quadrature level, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \max ({{{\varvec{\beta }}}})\),

  • the index of the last activated random variable, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} \max _{\beta _j > 1} j\),

  • and the maximum number of jointly activated variables, \(\max _{({{{\varvec{\alpha }}}}, {{{\varvec{\beta }}}}) \in {\mathcal {I}}} |{{{\varvec{\beta }}}}-\mathbf{1}|_0\).

These values convey the size of the index set, \({\mathcal {I}}\), used for different values of \(W_{\max }\).
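The four summary statistics listed above are straightforward to extract from an index set. The sketch below (with hypothetical names of our own choosing) represents each multi-index pair as two integer tuples, with inactive directions equal to 1 as in the paper's convention, and mirrors the four definitions directly.

```python
def index_set_extremes(index_set):
    """Summarize a MISC index set given as a list of (alpha, beta) pairs,
    each a tuple of integers with inactive entries equal to 1."""
    max_alpha = max(max(a) for a, _ in index_set)   # max spatial level
    max_beta = max(max(b) for _, b in index_set)    # max quadrature level
    # index (1-based) of the last activated random variable
    last_var = max(max((j + 1 for j, bj in enumerate(b) if bj > 1), default=0)
                   for _, b in index_set)
    # maximum number of jointly activated variables, |beta - 1|_0
    max_joint = max(sum(1 for bj in b if bj > 1) for _, b in index_set)
    return max_alpha, max_beta, last_var, max_joint
```

For instance, for the toy set \(\{((1,2),(1,1,3)),\, ((3,1),(2,2,1))\}\) the helper returns maximum levels 3 and 3, last activated variable 3, and at most 2 jointly activated variables.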

Figure 4 shows the observed convergence rates of MISC for the cases \(d=1\) and \(d=3\) and different values of \(\nu \). The observed rates are better than those predicted by the theory developed in this work, which suggests that further improvement of the theory is possible (see Remark 4). Figures 9 and 10 show in greater detail the observed convergence curves for \(d=1, \nu =2.5\) and \(d=3, \nu =4.5\), together with their respective linear fits in log–log scale.

We recall that, as shown in [22, Theorem 1], the convergence rate of MISC with a finite number of random variables is \({{\mathcal {O}}\left( W_{\max }^{-2} \log (W_{\max })^{d-1} \right) }\). Compare this to the theory presented here, which predicts, as \(\nu \rightarrow \infty \), a convergence rate of \({{\mathcal {O}}\left( W_{\max }^{-2+\epsilon } \right) }\) for any \(\epsilon >0\). However, Fig. 9 shows that even for a problem with \(d=3\) and no random variables, MISC (which, in this case, is equivalent to a deterministic combination technique [7]) has an observed convergence rate closer to \(-1.38\). This is due to the effect of the logarithmic term, which is nonzero for \(d>1\). Accordingly, we should not expect a better convergence rate for \(d=3\) and any finite \(\nu > 0\). This is also numerically validated in Fig. 10, which shows the full convergence curves for \(d=1, \nu =2.5\) and \(d=3, \nu =4.5\).
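The observed rates quoted here can be recovered by a linear fit in log–log scale of the error against the maximum work. A minimal sketch, assuming an error model \(E \approx C\, W_{\max }^{-r}\) (the helper name and argument names are ours):

```python
import numpy as np

def observed_rate(work, error):
    """Fit error ~ C * work^(-r) by linear regression in log-log scale
    and return the observed convergence rate r."""
    slope, _ = np.polyfit(np.log(work), np.log(error), 1)
    return -slope  # the fitted slope is -r
```

Applied to the convergence data of Fig. 9 or 10, such a fit would produce the slope of the dashed linear fits shown there.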

7 Conclusions

In this work, we analyzed the performance of the MISC method when applied to linear elliptic PDEs depending on a countable sequence of random variables. For ease of presentation, we worked on tensor product domains, but the results can be extended to more general domains and non-uniform meshes, as briefly mentioned in Sect. 3. We proved a convergence result using a summability argument that shows that, in certain cases, the convergence of the method is essentially dictated by the convergence properties of the deterministic solver. We then applied the convergence theorem to derive convergence rates for the approximation of the expected value of a functional of the solution of an elliptic PDE with diffusion coefficient described by a random field, tracking the dependence of the convergence rate on the spatial regularity of the realizations of the random field. The theoretical findings are backed up by numerical experiments that show the dependence of the convergence rate on the regularity parameter. Future work includes extending the convergence analysis to higher-order finite element solvers and improving the estimates of the error contribution of each difference operator by taking into account the factorial terms appearing in the estimates for the size of the Chebyshev coefficients, cf. [3, 11]. Moreover, the ideas in [12] can be extended to design an algorithm that iteratively estimates the optimal MISC set by alternating between optimizing the set and evaluating the estimator, ensuring that the work of optimizing the set is dominated by the work of evaluating the MISC estimator.