Toward Machine Wald

Handbook of Uncertainty Quantification


The past century has seen a steady increase in the need to estimate and predict complex systems and to make (possibly critical) decisions with limited information. Although computers have made possible the numerical evaluation of sophisticated statistical models, these models are still designed by humans because there is currently no known recipe or algorithm for dividing the design of a statistical model into a sequence of arithmetic operations. Indeed, enabling computers to think as humans do, especially when faced with uncertainty, is challenging in several major ways: (1) Finding optimal statistical models remains to be formulated as a well-posed problem when information on the system of interest is incomplete and comes in the form of a complex combination of sample data, partial knowledge of constitutive relations, and a limited description of the distribution of input random variables. (2) The space of admissible scenarios, along with the space of relevant information, assumptions, and/or beliefs, tends to be infinite dimensional, whereas calculus on a computer is necessarily discrete and finite. With this purpose, this paper explores the foundations of a rigorous framework for the scientific computation of optimal statistical estimators/models and reviews their connections with decision theory, machine learning, Bayesian inference, stochastic optimization, robust optimization, optimal uncertainty quantification, and information-based complexity.

1.1 Construction of \(\pi \odot \mathbb{D}\)

The construction below works when \(\mathcal{A}\subseteq \mathcal{G}\times \mathcal{M}(\mathcal{X})\) for some Polish subset \(\mathcal{G}\subset \mathcal{F}(\mathcal{X})\) and \(\mathcal{X}\) is Polish. Observe that since \(\mathcal{D}\) is metrizable, it follows from [4, Thm. 15.13] that, for any \(B \in \mathcal{B}(\mathcal{D})\), the evaluation \(\nu \mapsto \nu (B)\), \(\nu \in \mathcal{M}(\mathcal{D})\), is measurable. Consequently, the measurability of \(\mathbb{D}\) implies that the mapping

$$\displaystyle{\widehat{\mathbb{D}}: \mathcal{A}\times \mathcal{B}(\mathcal{D}) \rightarrow \mathbb{R}}$$

defined by

$$\displaystyle{\widehat{\mathbb{D}}{\bigl ((f,\mu ),B\bigr )} := \mathbb{D}(f,\mu )[B],\quad \text{for }(f,\mu ) \in \mathcal{A},B \in \mathcal{B}(\mathcal{D})}$$

is a transition function in the sense that, for fixed \((f,\mu ) \in \mathcal{A}\), \(\widehat{\mathbb{D}}{\bigl ((f,\mu ),\cdot \bigr )}\) is a probability measure, and, for fixed \(B \in \mathcal{B}(\mathcal{D})\), \(\widehat{\mathbb{D}}{\bigl (\cdot ,B\bigr )}\) is Borel measurable. Therefore, by [18, Thm. 10.7.2], any \(\pi \in \mathcal{M}(\mathcal{A})\) defines a probability measure

$$\displaystyle{\pi \odot \mathbb{D} \in \mathcal{M}{\bigl (\mathcal{B}(\mathcal{A}) \times \mathcal{B}(\mathcal{D})\bigr )}}$$

through

$$\displaystyle{ \pi \odot \mathbb{D}\big[A \times B\big] := \mathbb{E}_{(f,\mu )\sim \pi }\big[\mathbb{1}_{A}(f,\mu )\mathbb{D}(f,\mu )[B]\big],\quad \text{for }A \in \mathcal{B}(\mathcal{A}),B \in \mathcal{B}(\mathcal{D}), }$$

where \(\mathbb{1}_{A}\) is the indicator function of the set \(A\):

$$\displaystyle{\mathbb{1}_{A}(f,\mu ) := \begin{cases} 1, & \text{if } (f,\mu ) \in A, \\ 0, & \text{if } (f,\mu ) \notin A. \end{cases}}$$

It is easy to see that \(\pi \) is the \(\mathcal{A}\)-marginal of \(\pi \odot \mathbb{D}\). Moreover, when \(\mathcal{X}\) is Polish, [4, Thm. 15.15] implies that \(\mathcal{M}(\mathcal{X})\) is Polish, and when \(\mathcal{G}\) is Polish, it follows that \(\mathcal{A}\subseteq \mathcal{G}\times \mathcal{M}(\mathcal{X})\) is second countable. Consequently, since \(\mathcal{D}\) is Suslin and hence second countable, it follows from [32, Prop. 4.1.7] that

$$\displaystyle{\mathcal{B}{\bigl (\mathcal{A}\times \mathcal{D}\bigr )} = \mathcal{B}(\mathcal{A}) \times \mathcal{B}(\mathcal{D})}$$

and hence \(\pi \odot \mathbb{D}\) is a probability measure on \(\mathcal{A}\times \mathcal{D}\). That is,

$$\displaystyle{\pi \odot \mathbb{D} \in \mathcal{M}(\mathcal{A}\times \mathcal{D}).}$$

Henceforth, let \(\pi \cdot \mathbb{D}\) denote the corresponding Bayes’ sampling distribution, defined as the \(\mathcal{D}\)-marginal of \(\pi \odot \mathbb{D}\), and note that by (21), one has

$$\displaystyle{ \pi \cdot \mathbb{D}[B] := \mathbb{E}_{(f,\mu )\sim \pi }\big[\mathbb{D}(f,\mu )[B]\big],\quad \text{for }B \in \mathcal{B}(\mathcal{D}). }$$
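The construction of the joint measure and its two marginals can be illustrated on a finite toy model, where every measurability question disappears and the expectation over \(\pi \) becomes a finite sum. The sketch below is only an illustration under that finiteness assumption; the names `prior`, `kernel`, `joint_measure`, and the numerical values are hypothetical and do not come from the text.

```python
from fractions import Fraction as F

# Hypothetical finite stand-in: 3 admissible scenarios (f, mu) and 4 data values d.
prior = [F(1, 2), F(3, 10), F(1, 5)]          # pi, a probability vector on A
kernel = [                                     # kernel[i][j] = D(f_i, mu_i)[{d_j}]
    [F(7, 10), F(2, 10), F(1, 10), F(0)],
    [F(1, 10), F(6, 10), F(2, 10), F(1, 10)],
    [F(0),     F(1, 10), F(3, 10), F(6, 10)],
]

# Transition-function property: every row is a probability measure on D.
assert all(sum(row) == 1 for row in kernel)

# Joint measure on singletons: (pi odot D)[{a_i} x {d_j}] = pi[i] * kernel[i][j].
joint = [[p * q for q in row] for p, row in zip(prior, kernel)]


def joint_measure(A, B):
    """(pi odot D)[A x B] for index sets A (scenarios) and B (data values)."""
    return sum(joint[i][j] for i in A for j in B)


all_a, all_d = range(len(prior)), range(len(kernel[0]))
a_marginal = [joint_measure([i], all_d) for i in all_a]   # recovers pi
d_marginal = [joint_measure(all_a, [j]) for j in all_d]   # pi . D, the sampling distribution
```

Exact rational arithmetic is used so that the marginal identities hold with equality rather than up to rounding.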

Since both \(\mathcal{D}\) and \(\mathcal{A}\) are Suslin, it follows that the product \(\mathcal{A}\times \mathcal{D}\) is Suslin. Consequently, [18, Cor. 10.4.6] asserts that regular conditional probabilities exist for any sub-σ-algebra of \(\mathcal{B}{\bigl (\mathcal{A}\times \mathcal{D}\bigr )}\). In particular, the product theorem of [18, Thm. 10.4.11] asserts that product regular conditional probabilities

$$\displaystyle{{\bigl (\pi \odot \mathbb{D}\bigr )}\vert _{d} \in \mathcal{M}(\mathcal{A}),\quad \text{for }d \in \mathcal{D}}$$

exist and that they are \(\pi \cdot \mathbb{D}\)-a.e. unique.
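In a finite setting the regular conditional probabilities reduce to Bayes' rule, and the almost-everywhere uniqueness becomes visible: the posterior is determined only at data values of positive \(\pi \cdot \mathbb{D}\)-mass. The following sketch assumes such a finite model; `prior`, `kernel`, and `posterior` are hypothetical names chosen for illustration.

```python
from fractions import Fraction as F

prior = [F(1, 2), F(1, 2)]                     # pi on two hypothetical scenarios
kernel = [[F(3, 4), F(1, 4), F(0)],            # kernel[i][j] = D(f_i, mu_i)[{d_j}]
          [F(1, 4), F(1, 4), F(1, 2)]]

joint = [[p * q for q in row] for p, row in zip(prior, kernel)]
d_marginal = [sum(col) for col in zip(*joint)]  # pi . D


def posterior(j):
    """Conditional law of (f, mu) given d = d_j.

    Only defined where (pi . D)[{d_j}] > 0; on null data values any choice
    would do, which is the finite analogue of a.e. uniqueness.
    """
    if d_marginal[j] == 0:
        raise ValueError("conditioning on a (pi . D)-null data value")
    return [joint[i][j] / d_marginal[j] for i in range(len(prior))]


# Disintegration: averaging the posteriors over pi . D recovers the prior.
recovered = [sum(d_marginal[j] * posterior(j)[i] for j in range(3)) for i in range(2)]
```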

1.2 Proof of Theorem 2

If \(\pi ^{\dag }\cdot \mathbb{D}\) is not absolutely continuous with respect to \(\pi \cdot \mathbb{D}\), then there exists \(B \in \mathcal{B}(\mathcal{D})\) such that \((\pi \cdot \mathbb{D})[B] = 0\) and \((\pi ^{\dag }\cdot \mathbb{D})[B] > 0\). Let \(\theta \in \Theta (\pi )\). Define

$$\displaystyle{ \theta _{y}(d) :=\theta (d)1_{B^{c}}(d) + y1_{B}(d). }$$

Then it is easy to see that if \(y\) is in the range of \(\Phi \), then \(\theta _{y} \in \Theta (\pi )\). Now observe that for \(y,z \in \operatorname{Image}(\Phi )\),

$$\displaystyle{ \mathcal{E}(\theta _{y},\pi ^{\dag }) -\mathcal{E}(\theta _{ z},\pi ^{\dag }) = \mathbb{E}_{ (f,\mu,d)\sim \pi ^{\dag }\odot \mathbb{D}}\Bigg[1_{B}(d)\Big(V \big(y - \Phi (f,\mu )\big) - V \big(z - \Phi (f,\mu )\big)\Big)\Bigg] }$$

Hence, for \(V(x) = x^{2}\), it holds true that

$$\displaystyle{ \mathcal{E}(\theta _{y},\pi ^{\dag }) -\mathcal{E}(\theta _{ z},\pi ^{\dag }) =\big [(y-\gamma )^{2} - (z-\gamma )^{2}\big](\pi ^{\dag }\cdot \mathbb{D})[B] }$$

with

$$\displaystyle{ \gamma := \mathbb{E}_{\pi ^{\dag }\odot \mathbb{D}}[\Phi \vert D \in B], }$$

which proves

$$\displaystyle{ \begin{array}{rl} \sup _{\theta _{2}\in \Theta (\pi )}\mathcal{E}(\theta _{2},\pi ^{\dag })& -\inf _{\theta _{1}\in \Theta (\pi )}\mathcal{E}(\theta _{1},\pi ^{\dag }) \geq \sup _{B\in \mathcal{B}(\mathcal{D})\,:\,(\pi \cdot \mathbb{D})[B]=0,\,y,z\in \operatorname{Image}(\Phi )} \\ &\Big[\big(y - \mathbb{E}_{\pi ^{\dag }\odot \mathbb{D}}[\Phi \vert D \in B]\big)^{2} -\big (z - \mathbb{E}_{\pi ^{\dag }\odot \mathbb{D}}[\Phi \vert D \in B]\big)^{2}\Big](\pi ^{\dag }\cdot \mathbb{D})[B]. \end{array} }$$

It remains to establish the upper bound

$$\displaystyle{ \begin{array}{rl} \sup _{\theta _{2}\in \Theta (\pi )}\mathcal{E}(\theta _{2},\pi ^{\dag })& -\inf _{\theta _{1}\in \Theta (\pi )}\mathcal{E}(\theta _{1},\pi ^{\dag }) \leq \big (\mathcal{U}(\mathcal{A}) -\mathcal{L}(\mathcal{A})\big)^{2}\sup _{B\in \mathcal{B}(\mathcal{D})\,:\,(\pi \cdot \mathbb{D})[B]=0}(\pi ^{\dag }\cdot \mathbb{D})[B]. \end{array} }$$
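The quadratic-loss identity behind the lower bound can be checked on a small finite example in which \(\pi \) never produces one data value while \(\pi ^{\dag }\) does. This is a sketch under assumptions: \(\Theta (\pi )\) is read as the set of estimators that are optimal under \(\pi \), the model is finite, and the names `phi`, `kernel`, `prior`, `prior_dag`, `error`, and `modify_on_B` as well as all numbers are hypothetical.

```python
from fractions import Fraction as F

phi = [F(0), F(1), F(3)]                        # Phi(f_i, mu_i), hypothetical values
kernel = [[F(1, 2), F(1, 2), F(0)],             # kernel[i][j] = D(f_i, mu_i)[{d_j}]
          [F(1, 2), F(1, 2), F(0)],
          [F(0),    F(1, 2), F(1, 2)]]
prior = [F(1, 2), F(1, 2), F(0)]                # pi: never produces d_2
prior_dag = [F(1, 4), F(1, 4), F(1, 2)]         # pi^dagger: produces d_2 with mass 1/4
B = [2]                                         # (pi . D)[B] = 0 but (pi^dagger . D)[B] > 0


def error(theta, p):
    """E(theta, p): expected squared error of d_j -> theta[j] under p odot D."""
    return sum(p[i] * kernel[i][j] * (theta[j] - phi[i]) ** 2
               for i in range(len(p)) for j in range(len(theta)))


def modify_on_B(theta, y):
    """theta_y: equal to theta off B and to the constant y on B."""
    return [y if j in B else t for j, t in enumerate(theta)]


# Posterior mean of Phi under pi at d_0 and d_1; the value at d_2 is arbitrary
# because d_2 is a (pi . D)-null data value.
theta = [F(1, 2), F(1, 2), F(0)]

mass_B = sum(prior_dag[i] * kernel[i][j] for i in range(3) for j in B)
gamma = sum(prior_dag[i] * kernel[i][j] * phi[i] for i in range(3) for j in B) / mass_B

y, z = F(0), F(3)                               # both lie in the image of Phi
lhs = error(modify_on_B(theta, y), prior_dag) - error(modify_on_B(theta, z), prior_dag)
rhs = ((y - gamma) ** 2 - (z - gamma) ** 2) * mass_B
```

In this example the two modified estimators have the same error under \(\pi \), since they differ only on a \(\pi \cdot \mathbb{D}\)-null set, yet their errors under \(\pi ^{\dag }\) differ by `rhs`.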

To obtain the right-hand side of (19), observe that (see for instance [29, Sec. 5]) there exists \(B^{{\ast}}\in \mathcal{B}(\mathcal{D})\) such that

$$\displaystyle{ (\pi ^{\dag }\cdot \mathbb{D})[B^{{\ast}}] =\sup _{ B\in \mathcal{B}(\mathcal{D})\,:\,(\pi \cdot \mathbb{D})[B]=0}(\pi ^{\dag }\cdot \mathbb{D})[B] }$$

and (since \(\theta _{2} =\theta _{1}\) on the complement of \(B^{{\ast}}\))

$$\displaystyle{ \begin{array}{rl} \sup _{\theta _{1},\theta _{2}\in \Theta (\pi )}&\big(\mathcal{E}(\theta _{2},\pi ^{\dag }) -\mathcal{E}(\theta _{1},\pi ^{\dag })\big) \\ & =\sup _{\theta _{1},\theta _{2}\in \Theta (\pi )}\mathbb{E}_{(f,\mu,d)\sim \pi ^{\dag }\odot \mathbb{D}}\Bigg[1_{B^{{\ast}}}(d)\Big(V \big(\theta _{2} - \Phi (f,\mu )\big) - V \big(\theta _{1} - \Phi (f,\mu )\big)\Big)\Bigg].\end{array} }$$

We conclude by observing that for \(V(x) = x^{2}\),

$$\displaystyle{ \sup _{\theta _{1},\theta _{2}\in \Theta (\pi )}\Big(V \big(\theta _{2} - \Phi (f,\mu )\big) - V \big(\theta _{1} - \Phi (f,\mu )\big)\Big) \leq \big (\mathcal{U}(\mathcal{A}) -\mathcal{L}(\mathcal{A})\big)^{2}. }$$
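One way to see this last bound is the following short chain. It rests on assumptions not restated in this section: that \(\mathcal{L}(\mathcal{A})\) and \(\mathcal{U}(\mathcal{A})\) denote the infimum and supremum of \(\Phi \) over \(\mathcal{A}\), and that estimators in \(\Theta (\pi )\) take their values in \([\mathcal{L}(\mathcal{A}),\mathcal{U}(\mathcal{A})]\). Under these assumptions, dropping the nonnegative term \(\big(\theta _{1} - \Phi (f,\mu )\big)^{2}\) gives

$$\displaystyle{ \big(\theta _{2} - \Phi (f,\mu )\big)^{2} -\big(\theta _{1} - \Phi (f,\mu )\big)^{2} \leq \big(\theta _{2} - \Phi (f,\mu )\big)^{2} \leq \big (\mathcal{U}(\mathcal{A}) -\mathcal{L}(\mathcal{A})\big)^{2}, }$$

where the second inequality holds because both \(\theta _{2}\) and \(\Phi (f,\mu )\) lie in an interval of length \(\mathcal{U}(\mathcal{A}) -\mathcal{L}(\mathcal{A})\).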

1.3 Conditional Expectation as an Orthogonal Projection

It easily follows from Tonelli’s Theorem that

$$\displaystyle{ \mathbb{E}_{\pi \cdot \mathbb{D}}[h^{2}] = \mathbb{E}_{\pi \odot \mathbb{D}}[h^{2}] = \mathbb{E}_{ (f,\mu )\sim \pi }\mathbb{E}_{\mathbb{D}(f,\mu )}[h^{2}]\,. }$$

By considering the sub-σ-algebra \(\mathcal{A}\times \mathcal{B}(\mathcal{D}) \subset \mathcal{B}(\mathcal{A}\times \mathcal{D}) = \mathcal{B}(\mathcal{A}) \times \mathcal{B}(\mathcal{D})\), it follows from, e.g., [32, Thm. 10.2.9] that \(L_{\pi \cdot \mathbb{D}}^{2}(\mathcal{D})\) is a closed Hilbert subspace of the Hilbert space \(L_{\pi \odot \mathbb{D}}^{2}(\mathcal{A}\times \mathcal{D})\) and the conditional expectation of \(\Phi \) given the random variable \(D\) is the orthogonal projection from \(L_{\pi \odot \mathbb{D}}^{2}(\mathcal{A}\times \mathcal{D})\) to \(L_{\pi \cdot \mathbb{D}}^{2}(\mathcal{D})\).
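The projection property can be observed directly in a finite model: the residual \(\Phi - \mathbb{E}[\Phi \vert D]\) is orthogonal to every function of the data, so the conditional expectation minimizes the squared distance to \(\Phi \) among such functions. The sketch below assumes a finite model with hypothetical names (`phi`, `prior`, `kernel`, `inner`, `sq_dist`) and hypothetical values.

```python
from fractions import Fraction as F

phi = [F(0), F(2), F(4)]                        # Phi(f_i, mu_i), hypothetical values
prior = [F(1, 4), F(1, 2), F(1, 4)]             # pi
kernel = [[F(1), F(0)],                         # kernel[i][j] = D(f_i, mu_i)[{d_j}]
          [F(1, 2), F(1, 2)],
          [F(0), F(1)]]

joint = [[p * q for q in row] for p, row in zip(prior, kernel)]
d_marginal = [sum(col) for col in zip(*joint)]

# Conditional expectation E[Phi | D = d_j], a function of the data only.
cond_mean = [sum(joint[i][j] * phi[i] for i in range(3)) / d_marginal[j]
             for j in range(2)]


def inner(u, v):
    """Inner product in L^2 of pi odot D for functions u(i, j), v(i, j)."""
    return sum(joint[i][j] * u(i, j) * v(i, j) for i in range(3) for j in range(2))


def residual(i, j):
    """Phi minus its conditional expectation, as a function on A x D."""
    return phi[i] - cond_mean[j]


def sq_dist(h):
    """Squared L^2 distance between Phi and a function h of the data alone."""
    def diff(i, j):
        return phi[i] - h[j]
    return inner(diff, diff)


# Orthogonality of the residual to the indicator of each data value.
orthogonality = [inner(residual, lambda i, j, k=k: F(int(j == k))) for k in range(2)]
```

Any other function of the data, for instance `[F(0), F(3)]`, gives a strictly larger value of `sq_dist` than `cond_mean` does, as the Pythagorean identity predicts.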

Owhadi, H., Scovel, C. (2015). Toward Machine Wald. In: Ghanem, R., Higdon, D., Owhadi, H. (eds) Handbook of Uncertainty Quantification. Springer, Cham.

