Some Skorohod-type results

Pratelli, Luca; Rigo, Pietro

doi:10.1007/s10203-024-00466-w

Abstract

Let S be a metric space, $g:S\rightarrow \mathbb {R}$ a Borel function, and $(\mu _n:n\ge 0)$ a sequence of tight probability measures on $\mathcal {B}(S)$. If $\mu _n=\mu _0$ on $\sigma (g)$, there are S-valued random variables $X_n$, all defined on the same probability space, such that $X_n\sim \mu _n$ and $g(X_n)=g(X_0)$ for all $n\ge 0$. Moreover, $X_n\overset{a.s.}{\longrightarrow }X_0$ if and only if $E_{\mu _n}(f\mid g)\,\overset{\mu _0-a.s.}{\longrightarrow }\,E_{\mu _0}(f\mid g)$ for each $f\in C_b(S)$. This result, proved in Pratelli and Rigo (J Theoret Probab 36:372-389, 2023), is the starting point of this paper. Three types of contributions are provided. First, $\sigma (g)$ is replaced by an arbitrary sub-$\sigma $-field $\mathcal {G}\subset \mathcal {B}(S)$. Second, the result is applied to some specific frameworks, including equivalence couplings, total variation distances, and the decomposition of cadlag processes with finite activity. Third, following Hansen et al. (Tempered Bayesian analysis, Unpublished manuscript, 2024), the result is extended to models and kernels. This extension has a fairly natural interpretation in terms of decision theory, mass transportation and statistics.

1 Introduction

Let S be a metric space and $(\mu _n:n\ge 0)$ a sequence of probability measures on $\mathcal {B}(S)$. (Throughout, for any topological space T, we let $\mathcal {B}(T)$ denote the Borel $\sigma $-field on T). We say that $(X_n:n\ge 0)$ is a coupling of $(\mu _n)$ if

The $X_n$ are S-valued random variables, all defined on the same probability space, such that $X_n\sim \mu _n$ for each $n\ge 0$.

The Skorohod representation theorem (SRT) states that, if $\mu _n\rightarrow \mu _0$ weakly and $\mu _0$ has a separable support, there is a coupling $(X_n)$ of $(\mu _n)$ such that $X_n\overset{a.s.}{\longrightarrow }X_0$. This version of SRT is due to Wichura (1970), who reworked the previous versions by Skorohod (1956) and Dudley (1968). We refer to [Dudley (1999), p. 130] and [van der Vaart and Wellner (1996), p. 77] for historical notes, and to Berti et al. (2013) for the case where $\mu _0$ does not have a separable support. Some other related references are (Berti et al. 2007, 2011, 2015; Blackwell and Dubins 1983; Chau and Rasonyi 2017; Cortissoz 2007; Dumav and Stinchombe 2016; Fernique 1988; Hernandez-Ceron 2010; Jakubowski 1998; Sethuraman 2002).

We aim at getting some new results in the spirit of SRT. Our starting point is the following version of SRT, recently proved in Pratelli and Rigo (2023).

Theorem 1

Let T be a separable metric space, $g:S\rightarrow T$ a Borel function, and

$$\begin{aligned} \sigma (g)=\bigl \{g^{-1}(B):B\in \mathcal {B}(T)\bigr \}. \end{aligned}$$

Suppose

$$\begin{aligned} \mu _n\text { is tight and }\mu _n=\mu _0\text { on }\sigma (g)\text { for every }n\ge 0. \end{aligned}$$

Then, on some probability space $(\Omega ,\mathcal {A},\mathbb {P})$, there is a coupling $(X_n)$ of $(\mu _n)$ such that

$$\begin{aligned} g(X_n)=g(X_0)\,\text { for all }n\ge 0. \end{aligned}$$

(1)

In addition to (1), one also obtains $X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0$ if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\,\overset{\mu _0-a.s.}{\longrightarrow }\,E_{\mu _0}(f\mid g)\quad \quad \text {for each bounded continuous }f:S\rightarrow \mathbb {R}. \end{aligned}$$

Here and in the sequel, for any probability $\nu $ on $\mathcal {B}(S)$, the notation $E_\nu (f\mid g)$ stands for the conditional expectation of f given $\sigma (g)$ in the probability space $(S,\mathcal {B}(S),\nu )$. Note that, when g is constant, $\sigma (g)$ reduces to the trivial $\sigma $-field and $E_\nu (f\mid g)=E_\nu (f)=\int f\,d\nu $. Hence, if g is constant and the $\mu _n$ are tight, Theorem 1 reduces to SRT.

This paper provides some extensions of Theorem 1 and investigates some of its consequences. Our results are of three types.

(i)
In Theorem 3, $\sigma (g)$ is replaced by an arbitrary sub-$\sigma $-field $\mathcal {G}\subset \mathcal {B}(S)$. In this case, the coupling $(X_n)$ of $(\mu _n)$ only satisfies
$$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)=0\quad \quad \text {for all }A\in \mathcal {G}\text { and }n\ge 0. \end{aligned}$$
However, in the special case $\mathcal {G}=\sigma (g)$ with g as in Theorem 1, the above condition is equivalent to $g(X_n)=g(X_0)$ a.s. Hence, Theorem 3 actually extends Theorem 1.
(ii)
In Examples 3 and 4, Theorem 1 is applied to some specific frameworks. Example 3 deals with a sequence $(U_n:n\ge 0)$ of cadlag processes with finite activity. Let $U_n^*$ be the continuous part of $U_n$. It is shown that, if $U_n^*\sim U_0^*$ for all n, the $U_n$ admit a common decomposition. Precisely,
$$\begin{aligned} U_n\sim I+J_n\quad \quad \text {for all }n\ge 0, \end{aligned}$$
where the processes I and $J_n$ are defined on the same probability space, I has continuous paths and $J_n$ is a pure jump process. Example 4 is concerned with optimal transport. It is shown that Theorem 1 implies (and slightly improves) a recent duality result on equivalence couplings and total variation distances; see (Jaffe 2023).
(iii)
In Sect. 4, we deal with models and kernels. Let $(\Theta ,\mathcal {H})$ and $(\mathcal {X},\mathcal {E})$ be measurable spaces. A model is a collection $\mathcal {P}=\{P_\theta :\,\theta \in \Theta \}$, indexed by $\Theta $, of probability measures $P_\theta $ on $\mathcal {E}$. A kernel is a model which satisfies a certain measurability condition. Those non-atomic kernels such that
$$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1,\quad \quad \theta \in \Theta , \end{aligned}$$
for some measurable function $h:\mathcal {X}\rightarrow \Theta $, have been recently characterized. Such a characterization, obtained in Hansen et al. (2024), is reported in Theorem 4. Our contribution consists of two versions of Theorem 4. One extends Theorem 4 from kernels to models, while the other is in the spirit of Theorem 1. Unlike Theorem 4, both versions admit a straightforward proof. Obviously, models and kernels are fundamental in probability theory (just think of conditional distributions and Markov processes). But models and kernels play a role in many other frameworks. For instance, in decision theory, a model $\mathcal {P}$ can be regarded as the collection of probability distributions of a state-contingent payoff conditional on a parameter $\theta $. Alternatively, in statistical inference, $\mathcal {P}$ may be viewed as the class of possible probability distributions on the data. Accordingly, Theorem 4 and its two versions admit a natural interpretation in these frameworks. In Sect. 4, this interpretation is discussed and various examples are given.

2 Preliminaries

We briefly recall some well known definitions and results. To this end, we let $(\mathcal {X},\mathcal {E},\mu )$ denote any probability space.

The measurable space $(\mathcal {X},\mathcal {E})$ is said to be a standard Borel space if $\mathcal {X}$ is a Borel subset of a Polish space and $\mathcal {E}=\mathcal {B}(\mathcal {X})$. Similarly, $(\mathcal {X},\mathcal {E})$ is a Radon space if $\mathcal {X}$ is a metric space, $\mathcal {E}=\mathcal {B}(\mathcal {X})$, and each probability measure on $\mathcal {E}$ is tight. A standard Borel space is a Radon space but not conversely. For instance, if $\mathcal {X}$ is a universally measurable, non-Borel subset of a Polish space, then $(\mathcal {X},\mathcal {B}(\mathcal {X}))$ is not a standard Borel space but every probability measure on $\mathcal {B}(\mathcal {X})$ is tight.

A $\mu $-atom is a set $A\in \mathcal {E}$ such that $\mu (A)>0$ and $\mu (\cdot \mid A)$ is 0-1 valued. We say that $(\mathcal {X},\mathcal {E},\mu )$ is a non-atomic probability space, or that $\mu $ is non-atomic, if $\mu $ has no atoms. If $\mathcal {X}$ is a separable metric space and $\mathcal {E}=\mathcal {B}(\mathcal {X})$, then $\mu $ is non-atomic if and only if $\mu \{x\}=0$ for all $x\in \mathcal {X}$.

Let $\mathcal {F}\subset \mathcal {E}$ be a sub-$\sigma $-field. A regular conditional distribution for $\mu $ given $\mathcal {F}$ is a collection $\gamma =\{\gamma (x):x\in \mathcal {X}\}$ such that:

− $\gamma (x)$ is a probability measure on $\mathcal {E}$ for each $x\in \mathcal {X}$;

− $\gamma (\cdot )(A)$ is a version of $E_\mu (1_A\mid \mathcal {F})$ for each $A\in \mathcal {E}$.

If $(\mathcal {X},\mathcal {E})$ is a Radon space, a regular conditional distribution for $\mu $ given $\mathcal {F}$ exists and is $\mu $-a.s. unique.

Finally, to prove Theorem 3 below, we report the following version of SRT; see (Blackwell and Dubins 1983) and [Hernandez-Ceron (2010), pp. 52–54] for a detailed proof.

Theorem 2

(Blackwell and Dubins) Let m be the Lebesgue measure on $\mathcal {B}((0,1))$ and $\Lambda $ the collection of probability measures on $\mathcal {B}(S)$. If S is Polish, there is a Borel map $\Phi :(0,1)\times \Lambda \rightarrow S$ such that

$m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda )\in B\bigr \}=\lambda (B)$ for all $\lambda \in \Lambda $ and $B\in \mathcal {B}(S)$;
$m\bigl \{\beta \in (0,1):\Phi (\beta ,\lambda _n)\rightarrow \Phi (\beta ,\lambda _0)\bigr \}=1$ if $\lambda _n\in \Lambda $ and $\lambda _n\rightarrow \lambda _0$ weakly.

It is easily seen that Theorem 2 is still true if S is a Borel subset of a Polish space (but not necessarily a Polish space).
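For intuition only, consider the special case $S=\mathbb {R}$. There, a map with the two properties listed in Theorem 2 is the classical quantile function $\Phi (\beta ,\lambda )=\inf \{x:\lambda (-\infty ,x]\ge \beta \}$. This is the real-line construction, not the general one of Blackwell and Dubins. The sketch below evaluates it for finitely supported laws; the names `quantile_map` and `pushforward_of_lebesgue`, the dictionary representation of a law, and the midpoint grid standing in for Lebesgue measure are all assumptions of the illustration.

```python
from fractions import Fraction


def quantile_map(beta, law):
    """Phi(beta, law) = inf{x : law((-inf, x]) >= beta}.

    `law` is a dict {point: mass} describing a finitely supported
    probability on the real line (a representation chosen for this sketch).
    """
    cumulative = Fraction(0)
    for point in sorted(law):
        cumulative += law[point]
        if cumulative >= beta:
            return point
    return max(law)  # only reached if the masses do not sum to at least beta


def pushforward_of_lebesgue(law, grid_size=1000):
    """Law of beta -> Phi(beta, law), with Lebesgue measure on (0,1)
    replaced by equal weights on the midpoints of a uniform grid."""
    image = {}
    for k in range(grid_size):
        beta = Fraction(2 * k + 1, 2 * grid_size)
        x = quantile_map(beta, law)
        image[x] = image.get(x, Fraction(0)) + Fraction(1, grid_size)
    return image


# lambda_0 is uniform on {0, 1}; lambda_n is uniform on {1/n, 1 + 1/n}.
law_0 = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}


def law_n(n):
    return {Fraction(1, n): Fraction(1, 2), 1 + Fraction(1, n): Fraction(1, 2)}
```

In this example $\lambda _n\rightarrow \lambda _0$ weakly and, for every fixed $\beta $, the distance between `quantile_map(beta, law_n(n))` and `quantile_map(beta, law_0)` equals $1/n$, in line with the second property.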

3 Theorem 1 and its consequences

This section includes three applications of Theorem 1, outlined in the form of examples, as well as an extension of Theorem 1. We begin with the latter.

Any $\sigma $-field $\mathcal {G}$ over S can be written as $\mathcal {G}=\sigma (g)$ for a suitable function g on S. More precisely, the following result is available.

Lemma 1

For each $\sigma $-field $\mathcal {G}$ over S, there are a measurable space $(T,\mathcal {C})$ and a function $g:S\rightarrow T$ such that

$$\begin{aligned} \mathcal {G}=\bigl \{g^{-1}(C):C\in \mathcal {C}\bigr \}=\sigma (g). \end{aligned}$$

Proof

For each $x\in S$, let H(x) be the $\mathcal {G}$-atom including the point x, that is

$$\begin{aligned} H(x)=\bigl \{y\in S:1_B(y)=1_B(x)\text { for each }B\in \mathcal {G}\bigr \}; \end{aligned}$$

see e.g. (Berti and Rigo 2007) and (Blackwell and Dubins 1975). Define

$$\begin{aligned} T=\bigl \{H(x):x\in S\bigr \}. \end{aligned}$$

Then, T is a partition of S and every element of $\mathcal {G}$ is a union of elements of T. For any $C\subset T$, define $C^*=\bigl \{x\in S:H(x)\in C\bigr \}$. Then, it suffices to let

$$\begin{aligned} \mathcal {C}=\bigl \{C\subset T:C^*\in \mathcal {G}\bigr \}\quad \text {and}\quad g(x)=H(x)\quad \text { for every }x\in S. \end{aligned}$$

$\square $
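The construction in the proof of Lemma 1 can be made concrete when S is finite and $\mathcal {G}$ is generated by finitely many sets. In that case, two points lie in the same $\mathcal {G}$-atom exactly when they belong to the same generating sets, since the sets B with $1_B(x)=1_B(y)$ form a $\sigma $-field. The following sketch, with the hypothetical names `atom_map`, `points` and `generators`, computes $g(x)=H(x)$ on a toy example.

```python
def atom_map(points, generators):
    """Return the dict x -> H(x), where H(x) is the atom containing x of
    the sigma-field generated by `generators` on the finite set `points`."""
    # Membership signature of each point with respect to the generators.
    signature = {x: tuple(x in B for B in generators) for x in points}
    return {
        x: frozenset(y for y in points if signature[y] == signature[x])
        for x in points
    }


# Toy example: S = {0,...,5} and G generated by {0,1,2} and {2,3}.
S = range(6)
generators = [{0, 1, 2}, {2, 3}]
g = atom_map(S, generators)
T = set(g.values())  # the partition of S into G-atoms
```

Here T consists of the four atoms $\{0,1\}$, $\{2\}$, $\{3\}$ and $\{4,5\}$, and each generating set is a union of atoms, as stated in the proof.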

Based on Lemma 1, it is tempting to extend Theorem 1 to an arbitrary sub-$\sigma $-field $\mathcal {G}\subset \mathcal {B}(S)$. This is impossible, however, if Theorem 1 is stated as above.

Example 1

Suppose $\mu _n\{x\}=\mu _0\{x\}=0$ for all $x\in S$ and take $\mathcal {G}$ to be the collection of countable and co-countable subsets of S. In this case, $\mu _n=\mu _0$ on $\mathcal {G}$. However, since $\mathcal {G}$ includes the singletons, any function g such that $\mathcal {G}=\sigma (g)$ is injective, so that $g(X_n)=g(X_0)$ amounts to $X_n=X_0$. Hence, $(\mu _n)$ admits a coupling $(X_n)$ satisfying condition (1) if and only if $\mu _n=\mu _0$ on all of $\mathcal {B}(S)$.

The next result is motivated by the previous comments. In the sequel, for any topological space T, we denote by $C_b(T)$ the collection of real bounded continuous functions on T.

Theorem 3

Fix a sub-$\sigma $-field $\mathcal {G}\subset \mathcal {B}(S)$ and suppose

$$\begin{aligned} \mu _n\text { is tight and }\mu _n=\mu _0\text { on }\mathcal {G}\text { for every }n\ge 0. \end{aligned}$$

Then, on some probability space $(\Omega ,\mathcal {A},\mathbb {P})$, there is a coupling $(X_n)$ of $(\mu _n)$ such that

$$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)=0\,\text { for all }A\in \mathcal {G}\text { and }n\ge 0. \end{aligned}$$

(2)

In addition to (2), one also obtains $X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0$ if and only if

$$\begin{aligned} E_{\mu _n}(f\mid \mathcal {G})\,\overset{\mu _0-a.s.}{\longrightarrow }\,E_{\mu _0}(f\mid \mathcal {G})\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$

(3)

Proof

We just give a sketch of the proof, for it is quite similar to that of Theorem 1.

Since all the $\mu _n$ are tight, S can be assumed to be a Borel subset of a Polish space. Hence, Theorem 2 applies. Moreover, for each $n\ge 0$, we can fix a regular conditional distribution for $\mu _n$ given $\mathcal {G}$, say $\gamma _n=\{\gamma _n(x):x\in S\}$; see Sect. 2.

Let m be the Lebesgue measure on $\mathcal {B}((0,1))$ and $\Phi :(0,1)\times \Lambda \rightarrow S$ the Borel map involved in Theorem 2. Define

$$\begin{aligned} \Omega =(0,1)\times (0,1),\quad \mathcal {A}=\mathcal {B}\Bigl ((0,1)\times (0,1)\Bigr ),\quad \mathbb {P}=m\times m. \end{aligned}$$

For each $n\ge 0$ and $(\alpha ,\beta )\in (0,1)\times (0,1)$, define also

$$\begin{aligned} \phi (\alpha )=\Phi (\alpha ,\mu _0)\quad \text {and}\quad X_n(\alpha ,\beta )=\Phi \Bigl (\beta ,\,\gamma _n[\phi (\alpha )]\Bigr ). \end{aligned}$$

The $X_n$ are S-valued random variables on $(\Omega ,\mathcal {A},\mathbb {P})$. Arguing as in the proof of Theorem 1, it can be shown that $(X_n)$ is a coupling of $(\mu _n)$ and condition (3) is equivalent to $X_n\overset{\mathbb {P}-a.s.}{\longrightarrow }X_0$. Finally, we prove (2). Fix $A\in \mathcal {G}$ and note that

$$\begin{aligned}{} & {} m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}\\\le & {} \min \Bigl \{m\bigl \{\beta :X_n(\alpha ,\beta )\in A\bigr \},\,m\bigl \{\beta :X_0(\alpha ,\beta )\notin A\bigr \}\Bigr \}\\= & {} \min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\quad \quad \text {for all }\alpha \in (0,1). \end{aligned}$$

Since $A\in \mathcal {G}$, then $\gamma _n(x)(A)=1_A(x)$ for $\mu _n$-almost all $x\in S$. Since $\mu _n=\mu _0$ on $\mathcal {G}$, it follows that

$$\begin{aligned} \mu _0\bigl \{x\in S:\,\gamma _n(x)(A)=1_A(x)\bigr \}=1. \end{aligned}$$

Therefore, since $m\circ \phi ^{-1}=\mu _0$, one obtains

$$\begin{aligned} \mathbb {P}(X_n\in A,\,X_0\notin A)= & {} \int _0^1 m\Bigl \{\beta :X_n(\alpha ,\beta )\in A,\,X_0(\alpha ,\beta )\notin A\Bigr \}\,d\alpha \\\le & {} \int _0^1 \min \Bigl \{\gamma _n[\phi (\alpha )](A),\,\gamma _0[\phi (\alpha )](A^c)\Bigr \}\,d\alpha \\= & {} \int \min \Bigl \{\gamma _n(x)(A),\,\gamma _0(x)(A^c)\Bigr \}\,\mu _0(dx)\\= & {} \int \min \Bigl \{1_A(x),\,1_{A^c}(x)\Bigr \}\,\mu _0(dx)=0. \end{aligned}$$

$\square $
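The coupling $X_n(\alpha ,\beta )=\Phi \bigl (\beta ,\gamma _n[\phi (\alpha )]\bigr )$ used in the proof can be traced by hand in a toy setting. The sketch below assumes S is a finite set of integers, $\mathcal {G}=\sigma (g)$ for an explicit function g, and replaces the Blackwell and Dubins map by the real-line quantile function. Lebesgue measure on $(0,1)\times (0,1)$ is replaced by a uniform midpoint grid, which is exact here because all masses are multiples of 1/8. The names `quantile_map`, `conditional_law` and `coupled_value` are hypothetical.

```python
from fractions import Fraction as F


def quantile_map(beta, law):
    """Phi(beta, law): smallest point whose cumulative mass reaches beta."""
    cumulative = F(0)
    for point in sorted(law):
        cumulative += law[point]
        if cumulative >= beta:
            return point
    return max(law)


def conditional_law(law, g, x):
    """gamma(x): `law` conditioned on the fibre {y : g(y) = g(x)}.

    Fibres of zero mass get a point mass at x (an arbitrary convention).
    """
    fibre = {y: m for y, m in law.items() if g(y) == g(x)}
    total = sum(fibre.values())
    if total == 0:
        return {x: F(1)}
    return {y: m / total for y, m in fibre.items()}


def coupled_value(alpha, beta, law_n, law_0, g):
    """X_n(alpha, beta) = Phi(beta, gamma_n[phi(alpha)]),
    where phi(alpha) = Phi(alpha, mu_0)."""
    anchor = quantile_map(alpha, law_0)
    return quantile_map(beta, conditional_law(law_n, g, anchor))


# Toy data: mu_0 and mu_1 give mass 1/2 to each fibre of g, so they
# agree on sigma(g) although they differ on S.
def g(x):
    return x // 2


mu_0 = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}
mu_1 = {0: F(1, 8), 1: F(3, 8), 2: F(1, 2), 3: F(0)}

N = 8
grid = [F(2 * k + 1, 2 * N) for k in range(N)]
pairs = [
    (coupled_value(a, b, mu_0, mu_0, g), coupled_value(a, b, mu_1, mu_0, g))
    for a in grid
    for b in grid
]
```

Each entry of `pairs` is a realization of $(X_0,X_1)$. Counting frequencies over the grid recovers $\mu _0$ and $\mu _1$ as marginals, and every pair has both coordinates in the same fibre of g, which is condition (2) for $\mathcal {G}=\sigma (g)$ in this finite example.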

Theorem 3 extends Theorem 1 to an arbitrary sub-$\sigma $-field $\mathcal {G}\subset \mathcal {B}(S)$. In fact, if g is as in Theorem 1, then $g(X_n)=g(X_0)$ a.s. if and only if $\mathbb {P}(X_n\in A,\,X_0\notin A)=0$ for all $A\in \sigma (g)$.

One more remark on Theorem 3 is in order. If X and Y are S-valued random variables on $(\Omega ,\mathcal {A},\mathbb {P})$ such that

$$\begin{aligned} \mathbb {P}(X\in A,\,Y\notin A)=0\quad \quad \text {for all }A\in \mathcal {G}, \end{aligned}$$

(4)

then, applying (4) to both $A$ and $A^c$,

$$\begin{aligned} \mathbb {P}(X\in A)=\mathbb {P}(X\in A,\,Y\in A)=\mathbb {P}(Y\in A)\quad \quad \text {for all }A\in \mathcal {G}. \end{aligned}$$

Therefore, for any tight probability measures $\mu $ and $\nu $ on $\mathcal {B}(S)$, Theorem 3 yields

$$\begin{aligned} \mu =\nu \text { on }\mathcal {G}\quad \Leftrightarrow \quad &\text {Condition (4) holds for some }X\text { and }Y\\ &\text {such that }X\sim \mu \text { and }Y\sim \nu . \end{aligned}$$

We now turn to some applications of Theorem 1. We begin with an example which is not new but may be useful to make clear the scope of Theorem 1.

Example 2

(Corollary 2 of Pratelli and Rigo (2023)) For each $n\ge 0$, let $U_n$ and $V_n$ be random variables on a probability space $(\Omega _n,\mathcal {A}_n,\mathbb {P}_n)$. Suppose $U_n$ is $S_1$-valued and $V_n$ is $S_2$-valued, where $S_1$ and $S_2$ are metric spaces and $S_1$ is separable. Suppose also that $U_n\sim U_0$ and $(U_n,V_n)$ has a tight probability distribution. Under these conditions, Theorem 1 applies to $S=S_1\times S_2$ and $g(x,y)=x$. It follows that, on a probability space $(\Omega ,\mathcal {A},\mathbb {P})$, there are random variables U and $V_n^*$ such that $(U,V_n^*)\sim (U_n,V_n)$ for all $n\ge 0$. Moreover, $V_n^*\overset{a.s.}{\longrightarrow }V_0^*$ if and only if $E_{\mu _n}(f\mid g)\overset{\mu _0-a.s.}{\longrightarrow }E_{\mu _0}(f\mid g)$ for each $f\in C_b(S_2)$, where $\mu _n$ denotes the probability distribution of $(U_n,V_n)$.

In a nutshell, Example 2 may be summarized as follows. If $U_n\sim U_0$ for all n, the random variables $(U_n,V_n)$ can be replaced by $(U,V_n^*)$. In addition to satisfying $(U,V_n^*)\sim (U_n,V_n)$, the new random variables $(U,V_n^*)$ are all defined on the same probability space and they all have the same first coordinate (that is, U). Using $(U,V_n^*)$ instead of $(U_n,V_n)$ may be useful in various settings, such as mass transportation and stochastic control.

The next example deals with a sequence $U_0,U_1,\ldots $ of cadlag processes indexed by $[0,\infty )$. Using Theorem 1 we prove that, if the continuous part of $U_n$ is distributed as that of $U_0$ for every n, then $U_0,U_1,\ldots $ can be coupled so as to have exactly the same continuous part.

Example 3

(Decomposition of cadlag processes with finite activity) Let D be the set of real cadlag functions on $[0,\infty )$, equipped with the Skorohod topology. Define

$$\begin{aligned} S= & {} \bigl \{x\in D:\sum _{0<s\le t}|\Delta x(s)|<\infty \text { for each }t>0\bigr \}\quad \text {and}\\ g(x)(t)= & {} x(t)-\sum _{0<s\le t}\Delta x(s)\quad \quad \text {for each }x\in S\text { and }t\ge 0, \end{aligned}$$

where $\Delta x(s)=x(s)-x(s-)$ is the jump of x at the point s. In financial econometrics, a cadlag function is said to have finite activity if it has only finitely many jumps on any bounded interval. Hence, in particular, S includes all elements of D with finite activity. In turn, the function g associates every $x\in S$ with its continuous part g(x). It can be shown that $g:S\rightarrow C$ is a Borel map, where C denotes the set of continuous functions on $[0,\infty )$ (we omit the calculations). Moreover, since D is Polish and $S\in \mathcal {B}(D)$, each probability measure on $\mathcal {B}(S)$ is tight.

For each $n\ge 0$, let $U_n$ be a process with paths in S. Suppose $g(U_n)\sim g(U_0)$ for each $n\ge 0$, namely, the continuous parts of the $U_n$ are identically distributed. Then, there are processes I and $J_n$ such that:

I and the $J_n$ are all defined on the same probability space;
$I+J_n\sim U_n$ for all $n\ge 0$;
I has continuous paths while $J_n$ is a pure jump process.

The existence of I and $J_n$ follows from Theorem 1. It suffices to take $\mu _n$ as the probability distribution of $U_n$ and to let

$$\begin{aligned} I=g(X_0)\quad \text {and}\quad J_n=X_n-g(X_n)=X_n-g(X_0). \end{aligned}$$

Note also that $I+J_n\overset{a.s.}{\longrightarrow }I+J_0$ (in the Skorohod topology) if and only if

$$\begin{aligned} E_{\mu _n}(f\mid g)\overset{\mu _0-a.s.}{\longrightarrow }E_{\mu _0}(f\mid g)\quad \quad \text {for each }f\in C_b(S). \end{aligned}$$

Our next example deals with a notion of duality recently introduced by Jaffe (2023). In addition to being theoretically intriguing, this notion is potentially useful in various frameworks, including mathematical finance, decision theory, mass transportation and probability theory.

Example 4

(Equivalence couplings and total variation) To ease the notation, in this example we write $\mathcal {B}$ instead of $\mathcal {B}(S)$. Let $E\subset S\times S$ be a measurable equivalence relation. This means that $E\in \mathcal {B}\otimes \mathcal {B}$ and the relation on S defined as

$$\begin{aligned} x\sim y\quad \Leftrightarrow \quad (x,y)\in E \end{aligned}$$

is reflexive, symmetric and transitive. Say that E is strongly dualizable if there is a sub-$\sigma $-field $\mathcal {C}\subset \mathcal {B}$ such that

$$\begin{aligned} \min _{P\in \Gamma (\mu ,\nu )}(1-P(E))=\sup _{A\in \mathcal {C}}\,|\mu (A)-\nu (A)| \end{aligned}$$

(5)

for all probability measures $\mu $ and $\nu $ on $\mathcal {B}$. Here, $\Gamma (\mu ,\nu )$ is the collection of probability measures on $\mathcal {B}\otimes \mathcal {B}$ with marginals $\mu $ and $\nu $, and the notation “$\min $” asserts that the infimum is actually achieved.

Various conditions for E to be strongly dualizable are in Jaffe (2023); see also (Pratelli and Rigo 2024). One such condition is the following. Define the sub-$\sigma $-field

$$\begin{aligned} \mathcal {U}=\bigl \{A\in \mathcal {B}:1_A(x)=1_A(y)\quad \text {for all }(x,y)\in E\bigr \}. \end{aligned}$$

Then, E is strongly dualizable provided $E\in \mathcal {U}\otimes \mathcal {U}$ and $(S,\mathcal {B})$ is a standard Borel space; see [Jaffe (2023), Theo. 3.13] and [Pratelli and Rigo (2024), Cor. 6]. This result is a consequence of Theorem 1, however, as we now prove. Moreover, the assumption that $(S,\mathcal {B})$ is standard Borel can be weakened.

Suppose $E\in \mathcal {U}\otimes \mathcal {U}$ and $(S,\mathcal {B})$ is a Radon space. Since $E\in \mathcal {U}\otimes \mathcal {U}$,

$$\begin{aligned} E\in \sigma \bigl (A_1\times B_1,\,A_2\times B_2,\,\ldots \bigr ) \end{aligned}$$

for some $A_n,\,B_n\in \mathcal {U}$, $n\ge 1$. Define $\mathcal {G}=\sigma (A_1,\,B_1,\,A_2,\,B_2,\,\ldots )$. Since $\mathcal {G}$ is a countably generated sub-$\sigma $-field of $\mathcal {B}$, there is a Borel function $g:S\rightarrow \mathbb {R}$ such that $\mathcal {G}=\sigma (g)$. Moreover, since $E\in \mathcal {G}\otimes \mathcal {G}$, one obtains

$$\begin{aligned} \bigl \{(x,y):\,g(x)=g(y)\bigr \}\subset E. \end{aligned}$$

Next, fix two probability measures $\mu $ and $\nu $ on $\mathcal {B}$ such that $\mu =\nu $ on $\mathcal {U}$. Since $\mathcal {G}\subset \mathcal {U}$ and $(S,\mathcal {B})$ is a Radon space, $\mu $ and $\nu $ are tight and $\mu =\nu $ on $\mathcal {G}$. Because of Theorem 1, applied with $\mu _0=\mu $ and $\mu _n=\nu $ for $n>0$, there are S-valued random variables $X_0$ and $X_1$ such that $X_0\sim \mu $, $X_1\sim \nu $ and $g(X_0)=g(X_1)$. Denoting by P the probability distribution of $(X_0,X_1)$, it follows that

$$\begin{aligned} P\in \Gamma (\mu ,\nu )\quad \text {and}\quad P(E)\ge P\bigl \{(x,y):\,g(x)=g(y)\bigr \}=1. \end{aligned}$$

Therefore, letting $\mathcal {C}=\mathcal {U}$, equation (5) holds provided $\mu =\nu $ on $\mathcal {U}$. This concludes the proof. In fact, if $\mathcal {C}=\mathcal {U}$, equation (5) holds for all $\mu $ and $\nu $ if and only if it holds for those $\mu $ and $\nu $ such that $\mu =\nu $ on $\mathcal {U}$; see e.g. [Jaffe (2023), Prop. 3.9].
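Equation (5) with $\mathcal {C}=\mathcal {U}$ can be checked directly in a toy case. Assume S is finite with every subset Borel, so that $\mathcal {U}$ consists of the unions of equivalence classes. The sketch below computes the right-hand side of (5) and builds one coupling whose mass outside E equals it. The function `cls`, mapping each point to a label of its equivalence class, and all other names are hypothetical; the coupling matches as much mass as possible inside each class and spreads the remainder as a product, which is one convenient choice among many.

```python
from fractions import Fraction as F


def class_masses(law, cls):
    """Mass assigned by `law` to each equivalence class."""
    masses = {}
    for x, m in law.items():
        masses[cls(x)] = masses.get(cls(x), F(0)) + m
    return masses


def sup_over_saturated_sets(mu, nu, cls):
    """sup |mu(A) - nu(A)| over unions of classes A.

    The supremum is attained at the union of classes c with mu(c) > nu(c).
    """
    mu_c, nu_c = class_masses(mu, cls), class_masses(nu, cls)
    labels = set(mu_c) | set(nu_c)
    return sum(max(mu_c.get(c, F(0)) - nu_c.get(c, F(0)), F(0)) for c in labels)


def equivalence_coupling(mu, nu, cls):
    """A coupling P of (mu, nu), returned as a dict {(x, y): mass}."""
    mu_c, nu_c = class_masses(mu, cls), class_masses(nu, cls)
    labels = set(mu_c) | set(nu_c)
    common = {c: min(mu_c.get(c, F(0)), nu_c.get(c, F(0))) for c in labels}
    # Mass of each point left over after matching common[c] inside class c.
    res_mu = {x: m - common[cls(x)] * m / mu_c[cls(x)] for x, m in mu.items() if m > 0}
    res_nu = {y: m - common[cls(y)] * m / nu_c[cls(y)] for y, m in nu.items() if m > 0}
    gap = sum(res_mu.values())
    P = {}
    for x, mx in mu.items():
        for y, my in nu.items():
            mass = F(0)
            if cls(x) == cls(y) and common[cls(x)] > 0:
                # Matched part: independent within the shared class.
                mass += common[cls(x)] * (mx / mu_c[cls(x)]) * (my / nu_c[cls(y)])
            if gap > 0:
                # Unmatched part: product of the normalized residuals.
                mass += res_mu.get(x, F(0)) * res_nu.get(y, F(0)) / gap
            if mass > 0:
                P[(x, y)] = mass
    return P


# Toy data: classes {0,1} and {2,3}.
def cls(x):
    return x // 2


mu = {0: F(1, 2), 1: F(1, 4), 2: F(1, 4), 3: F(0)}
nu = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}
P = equivalence_coupling(mu, nu, cls)
mass_outside_E = sum(m for (x, y), m in P.items() if cls(x) != cls(y))
```

For these data both `mass_outside_E` and `sup_over_saturated_sets(mu, nu, cls)` equal 1/4. Since $|\mu (A)-\nu (A)|\le 1-P(E)$ for every $A\in \mathcal {U}$ and every coupling P, the constructed coupling attains the minimum in (5) in this example.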

4 Kernels versus models

Let $(\Theta ,\mathcal {H})$ and $(\mathcal {X},\mathcal {E})$ be measurable spaces. To avoid trivialities, we assume

$$\begin{aligned} \text {card}\,(\mathcal {X})>1. \end{aligned}$$

A model is a collection

$$\begin{aligned} \mathcal {P}=\{P_\theta :\,\theta \in \Theta \} \end{aligned}$$

where each $P_\theta $ is a probability measure on $\mathcal {E}$. A model $\mathcal {P}$ is non-atomic if $P_\theta $ is a non-atomic probability measure on $\mathcal {E}$ for each $\theta \in \Theta $. Moreover, $\mathcal {P}$ is measurable if the real valued map $\theta \mapsto P_\theta (A)$ is $\mathcal {H}$-measurable for fixed $A\in \mathcal {E}$. A measurable model is usually called a kernel.

One more definition is needed. Suppose $\mathcal {H}$ includes the singletons. Then, a model $\mathcal {P}$ is said to be orthogonal if there is a measurable function $h:\mathcal {X}\rightarrow \Theta $ such that

$$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1\quad \quad \text {for all }\theta \in \Theta . \end{aligned}$$

Here, measurability of h is meant as $h^{-1}(\mathcal {H})\subset \mathcal {E}$. Orthogonal kernels are investigated in Mauldin et al. (1983) and (Weis 1984). They are involved in many contexts, including ergodic decompositions, Gibbs states, disintegrations and extremal models; see e.g. (Berti and Rigo 2007; Blackwell and Dubins 1975; Dynkin 1978; Farrell 1962; Fölmer 1975; Lauritzen 1974; Maitra 1977). The next example, even if obvious, is useful to frame orthogonal kernels.

Example 5

(An orthogonal kernel) For any real random variables U and V, there is an orthogonal version of the conditional distribution of (U, V) given U. Take in fact $(\Theta ,\mathcal {H})=(\mathbb {R},\mathcal {B}(\mathbb {R}))$, $(\mathcal {X},\mathcal {E})=(\mathbb {R}^2,\mathcal {B}(\mathbb {R}^2))$ and define the function $h(u,v)=u$ for all $(u,v)\in \mathbb {R}^2$. Also, denote by $\pi $ the marginal distribution of U. Any kernel $\mathcal {P}=\{P_\theta :\,\theta \in \Theta \}$ satisfying the equation

$$\begin{aligned} \text {Prob}\bigl (U\in A,\,V\in B\bigr )=\int _A P_\theta (\mathbb {R}\times B)\,\pi (d\theta ),\quad \text {for all }A,\,B\in \mathcal {B}(\mathbb {R}), \end{aligned}$$

is a version of the conditional distribution of (U, V) given U. If $\mathcal {P}$ is one such version, it is well known that

$$\begin{aligned} P_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=P_\theta \bigl (\{\theta \}\times \mathbb {R}\bigr )=1\quad \quad \text {for }\pi \text {-almost all }\theta \in \mathbb {R}; \end{aligned}$$

see e.g. (Berti and Rigo 2007) and (Blackwell and Dubins 1975). Hence, up to modifying $\mathcal {P}$ on a $\pi $-null set, one obtains a kernel $\mathcal {Q}=\{Q_\theta :\,\theta \in \Theta \}$ such that

$$\begin{aligned} \pi \bigl \{\theta :Q_\theta \ne P_\theta \bigr \}=0\quad \text {and}\quad Q_\theta \Bigl (h^{-1}\{\theta \}\Bigr )=1\quad \quad \text {for all }\theta \in \mathbb {R}. \end{aligned}$$

Such a $\mathcal {Q}$ is an orthogonal version of the conditional distribution of (U, V) given U.

In this section, we focus on the following result from (Hansen et al. 2024).

Theorem 4

(Hansen, Maccheroni, Marinacci, Sargent) Let $(\Theta ,\mathcal {H})$ and $(\mathcal {X},\mathcal {E})$ be standard Borel spaces and $\mathcal {P}$ a kernel. Then, $\mathcal {P}$ is non-atomic and orthogonal if and only if, for any other kernel $\mathcal {Q}=\{Q_\theta :\theta \in \Theta \}$, there is a measurable function $f:\mathcal {X}\rightarrow \mathcal {X}$ such that

$$\begin{aligned} Q_\theta =P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

In Theorem 4, measurability of f is meant as $f^{-1}(\mathcal {E})\subset \mathcal {E}$ and $P_\theta \circ f^{-1}$ denotes the probability on $\mathcal {E}$ defined as $P_\theta \circ f^{-1}(A)=P_\theta \bigl (f^{-1}(A)\bigr )$ for all $A\in \mathcal {E}$.

Essentially, Theorem 4 states that a kernel $\mathcal {P}$ is non-atomic and orthogonal if and only if any other kernel $\mathcal {Q}$ is a push forward of $\mathcal {P}$, in the sense that $Q_\theta =P_\theta \circ f^{-1}$ for all $\theta $ and a suitable function f. This characterization may be useful in every framework where kernels play a role, and the list of such frameworks is very long. In probability theory, for instance, kernels are obviously a basic ingredient: just think of conditional distributions or Markov processes. In Bayesian statistical inference, a kernel $\mathcal {P}$ may be viewed as the collection of the distributions on the data conditional on a parameter. In decision theory, $\mathcal {P}$ can be regarded as the collection of the distributions of a state-contingent payoff conditional on a parameter; see e.g. (Hansen et al. 2024). In weak optimal transport, each $P_\theta $ provides information about how the mass taken at $\theta $ is distributed over $\mathcal {X}$; see e.g. (Chone and Kramarz 2021; Chone et al. 2023; Galichon et al. 2014) and references therein. In each of these frameworks, thus, Theorem 4 has some motivation.

The previous remarks are still valid if kernels are replaced by models. In fact, there are several problems where measurability of a kernel is superfluous. We support this claim by three examples.

Example 6

(Classical statistical inference) According to the classical approach to statistics, the two basic ingredients of an inferential problem are a measurable space $(\mathcal {X},\mathcal {E})$ and a model $\mathcal {P}=\{P_\theta :\theta \in \Theta \}$. The set $\mathcal {X}$ is regarded as the sample space and $P_\theta $ is the probability distribution of the data when the value of the parameter is $\theta $. Importantly, the parameter is viewed as an unknown but fixed constant, and there is no reason to integrate over it. Hence, the $\sigma $-field $\mathcal {H}$ is superfluous and measurability of $\mathcal {P}$ is not required. In the language of this paper, $\mathcal {P}$ is a model but not a kernel.

Example 7

(Disintegrations) For any model $\mathcal {P}$, let $\sigma (\mathcal {P})$ denote the $\sigma $-field over $\Theta $ generated by the maps $\theta \mapsto P_\theta (A)$ for all $A\in \mathcal {E}$. One of the main reasons for requiring measurability of a kernel is the need of defining a probability on $\mathcal {E}$ as

$$\begin{aligned} \mu _\pi (A)=\int P_\theta (A)\,\pi (d\theta ),\quad A\in \mathcal {E}, \end{aligned}$$

(6)

where $\pi $ is a given probability on $\mathcal {H}$. Such $\mu _\pi $ cannot be defined if $\mathcal {P}$ is a model but not a kernel. In Bayesian inference, for instance, $\mathcal {P}$ is required to be a kernel and $\pi $ is the prior distribution. This procedure assumes that the $\sigma $-field $\mathcal {H}$ is fixed before $\mathcal {P}$. However, these two steps could be reversed. Precisely, one first selects a model $\mathcal {P}$ and then takes $\mathcal {H}=\sigma (\mathcal {P})$. This actually happens with non-measurable disintegrations. To illustrate, suppose we are given a probability P on $\mathcal {E}$ and a partition $\{A_\theta :\theta \in \Theta \}$ with $A_\theta \in \mathcal {E}$ for all $\theta $. A (non-measurable) disintegration for P is a pair $(\mathcal {P},\pi )$ where $\mathcal {P}$ is a model, $\pi $ a probability on $\sigma (\mathcal {P})$, and

$P_\theta (A_\theta )=1$ for all $\theta \in \Theta $;
$P(A)=\int P_\theta (A)\,\pi (d\theta )$ for all $A\in \mathcal {E}$.

A disintegration is said to be measurable if $\Theta $ is equipped with a $\sigma $-field $\mathcal {H}$ and $\mathcal {P}$ is a kernel. Obviously, the conditions for having a non-measurable disintegration are much more general than those for a measurable disintegration; see e.g. (Berti et al. 2020) and references therein.

Example 8

(Orthogonality preserving models) As noted in Example 7, if $\mathcal {P}$ is a kernel and $\pi $ a probability on $\mathcal {H}$, one can define a probability $\mu _\pi $ on $\mathcal {E}$ via equation (6). A kernel $\mathcal {P}$ is orthogonality preserving if $\mu _{\pi _1}$ and $\mu _{\pi _2}$ are singular whenever $\pi _1$ and $\pi _2$ are singular probabilities on $\mathcal {H}$. It is straightforward to prove that an orthogonal kernel is orthogonality preserving; see (Mauldin et al. 1983). This implication is still valid if kernels are replaced by models. Indeed, in Proposition 7, we will show that a weakly orthogonal model (as defined below) is orthogonality preserving in a suitable sense.

We now extend Theorem 4 from kernels to models. Unlike Theorem 4, the extended version admits a straightforward proof. Moreover, the notion of orthogonality can be weakened.

For any model $\mathcal {P}$, define the $\sigma $-field

$$\begin{aligned} \mathcal {E}_\mathcal {P}=\bigcap _{\theta \in \Theta }\overline{\mathcal {E}}^{P_\theta } \end{aligned}$$

where $\overline{\mathcal {E}}^{P_\theta }$ is the completion of $\mathcal {E}$ with respect to $P_\theta $. Given a function $f:\mathcal {X}\rightarrow \mathcal {X}$, we say that f is measurable if $f^{-1}(\mathcal {E})\subset \mathcal {E}$ and that f is $\mathcal {P}$-measurable if $f^{-1}(\mathcal {E})\subset \mathcal {E}_\mathcal {P}$. Note that f is $\mathcal {P}$-measurable if and only if it is measurable with respect to $P_\theta $ for every $\theta \in \Theta $. We also say that $\mathcal {P}$ is weakly orthogonal if there is a partition $\{A_\theta :\,\theta \in \Theta \}$ of $\mathcal {X}$ such that

$$\begin{aligned} A_\theta \in \mathcal {E}_\mathcal {P}\quad \text {and}\quad P_\theta (A_\theta )=1\quad \text {for each }\theta \in \Theta . \end{aligned}$$

(7)

Here, with a slight abuse of notation, the unique extension of $P_\theta $ to $\mathcal {E}_\mathcal {P}$ is still denoted by $P_\theta $. In this notation, the following result is available.

Theorem 5

Suppose card$\,(\Theta )\le \,\,$card$\,(\mathcal {X})$ and $(\mathcal {X},\mathcal {E})$ is a Radon space. Then, a model $\mathcal {P}$ is non-atomic and weakly orthogonal if and only if, for any other model $\mathcal {Q}$, there is a $\mathcal {P}$-measurable function $f:\mathcal {X}\rightarrow \mathcal {X}$ such that

$$\begin{aligned} Q_\theta =P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

(8)

Proof

If $\mathcal {E}$ does not support non-atomic probability measures, non-atomic models do not exist and condition (8) certainly fails for some choice of $\mathcal {Q}$. Hence, $\mathcal {E}$ can be assumed to support a non-atomic probability measure.

Suppose $\mathcal {P}$ is non-atomic and weakly orthogonal. Fix a model $\mathcal {Q}$ and a partition $\{A_\theta :\,\theta \in \Theta \}$ of $\mathcal {X}$ satisfying condition (7). Given $\theta \in \Theta $, since $Q_\theta $ is tight and $P_\theta $ is a non-atomic probability measure, there is a measurable function $f_\theta :\mathcal {X}\rightarrow \mathcal {X}$ such that $Q_\theta =P_\theta \circ f_\theta ^{-1}$; see [Berti et al. (2007), Theo. 3.1]. For each $x\in \mathcal {X}$, denote by $\theta _x$ the unique $\theta \in \Theta $ such that $x\in A_\theta $. Define a function $f:\mathcal {X}\rightarrow \mathcal {X}$ as

$$\begin{aligned} f(x)=f_{\theta _x}(x)\quad \quad \text {for every }x\in \mathcal {X}. \end{aligned}$$

Fix $\theta \in \Theta $ and $A\in \mathcal {E}$. Then,

$$\begin{aligned} \bigl \{f\in A\bigr \}=\bigl \{f_\theta \in A,\,f=f_\theta \bigr \}\cup \bigl \{f\in A,\,f\ne f_\theta \bigr \}. \end{aligned}$$

Since $\{f\ne f_\theta \}\subset A_\theta ^c$ and $P_\theta (A_\theta ^c)=0$, both sets $\bigl \{f=f_\theta \bigr \}$ and $\bigl \{f\in A,\,f\ne f_\theta \bigr \}$ belong to $\overline{\mathcal {E}}^{P_\theta }$. Since $f_\theta $ is measurable, $\bigl \{f_\theta \in A\bigr \}\in \mathcal {E}$. It follows that

$$\begin{aligned} \bigl \{f\in A\bigr \}\in \,\overline{\mathcal {E}}^{P_\theta }. \end{aligned}$$

Therefore, f is $\mathcal {P}$-measurable. Furthermore,

$$\begin{aligned} Q_\theta =P_\theta \circ f_\theta ^{-1}=P_\theta \circ f^{-1}\quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

Conversely, suppose that, for any model $\mathcal {Q}$, there is a $\mathcal {P}$-measurable function $f:\mathcal {X}\rightarrow \mathcal {X}$ satisfying condition (8). Fix $\theta \in \Theta $ and a non-atomic probability measure $\nu $ on $\mathcal {E}$. Taking $\mathcal {Q}$ such that $Q_\theta =\nu $, condition (8) implies $P_\theta \circ f^{-1}=\nu $ for some f. Hence, $P_\theta $ is non-atomic since $\nu $ is non-atomic and $P_\theta \circ f^{-1}=\nu $. We next prove that $\mathcal {P}$ is weakly orthogonal. Since card$\,(\Theta )\le \,\,$card$\,(\mathcal {X})$, there is an injective function $\phi :\Theta \rightarrow \mathcal {X}$. Letting $Q_\theta =\delta _{\phi (\theta )}$ for each $\theta \in \Theta $, condition (8) yields

$$\begin{aligned} P_\theta \bigl (f^{-1}\{\phi (\theta )\}\bigr )=\delta _{\phi (\theta )}\{\phi (\theta )\}=1 \end{aligned}$$

for some $\mathcal {P}$-measurable function $f:\mathcal {X}\rightarrow \mathcal {X}$. Define $B_\theta =f^{-1}\{\phi (\theta )\}$ and

$$\begin{aligned} D=\Bigl (\bigcup _{\theta \in \Theta } B_\theta \Bigr )^c. \end{aligned}$$

The sets $B_\theta $ belong to $\mathcal {E}_\mathcal {P}$ and are pairwise disjoint since $\phi $ is injective. Moreover, $D\in \mathcal {E}_\mathcal {P}$ since $D\subset B_\theta ^c$ and $P_\theta (B_\theta ^c)=0$ for all $\theta \in \Theta $. Hence, fixing any point $\theta _0\in \Theta $, condition (7) holds with $A_\theta =B_\theta $ for $\theta \ne \theta _0$ and $A_{\theta _0}=B_{\theta _0}\cup D$. $\square $
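
The gluing step $f(x)=f_{\theta _x}(x)$ in the first part of the proof can be traced in a simple case. The following is only a sketch under illustrative assumptions: the unit square, the Lebesgue measure $\lambda $ on $[0,1]$ and the two models below are hypothetical choices, not examples from the references. Let $\mathcal {X}=[0,1]^2$ with its Borel $\sigma $-field $\mathcal {E}$, $\Theta =[0,1]$, and

$$\begin{aligned} P_\theta =\delta _\theta \otimes \lambda ,\quad A_\theta =\{\theta \}\times [0,1],\quad Q_\theta =\lambda \otimes \delta _\theta . \end{aligned}$$

Each $P_\theta $ is non-atomic because $\lambda $ is, and $\{A_\theta :\,\theta \in \Theta \}$ is a partition of $\mathcal {X}$ with $A_\theta \in \mathcal {E}\subset \mathcal {E}_\mathcal {P}$ and $P_\theta (A_\theta )=1$, so that condition (7) holds. For fixed $\theta $, one possible choice of $f_\theta $ is $f_\theta (x_1,x_2)=(x_2,\theta )$. Since $\theta _x=x_1$ for $x=(x_1,x_2)$, the glued function is

$$\begin{aligned} f(x_1,x_2)=f_{x_1}(x_1,x_2)=(x_2,x_1). \end{aligned}$$

Under $P_\theta $, the first coordinate equals $\theta $ almost surely and the second has distribution $\lambda $. Hence,

$$\begin{aligned} P_\theta \circ f^{-1}=\lambda \otimes \delta _\theta =Q_\theta \quad \quad \text {for each }\theta \in \Theta . \end{aligned}$$

In this toy case f happens to be measurable. In general, the proof only guarantees that f is $\mathcal {P}$-measurable.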

We do not know whether the assumption card$\,(\Theta )\le \,\,$card$\,(\mathcal {X})$ can be dropped. By contrast, this assumption is superfluous in Theorem 4. In fact, Theorem 4 is trivially true if $\mathcal {X}$ is countable. Otherwise, if $\mathcal {X}$ is uncountable, card$\,(\Theta )\le \,\,$card$\,(\mathcal {X})$ follows from the fact that $(\Theta ,\mathcal {H})$ and $(\mathcal {X},\mathcal {E})$ are standard Borel spaces.

As noted above, the heuristic interpretation of kernels can be attached to models as well. Thus, Theorem 5 has essentially the same motivations as Theorem 4.

Our next result is actually a mixture of Theorems 1, 4 and 5. Let $\mathcal {P},\,\mathcal {Q}_0,\,\mathcal {Q}_1,\ldots $ be models with $\mathcal {P}$ non-atomic and weakly orthogonal. By Theorem 5, for each $n\ge 0$, there is a $\mathcal {P}$-measurable function $f_n:\mathcal {X}\rightarrow \mathcal {X}$ such that $P_\theta \circ f_n^{-1}=Q_{n,\theta }$ for all $\theta $. We now prove that, if $Q_{n,\theta }=Q_{0,\theta }$ on $\sigma (g)$ for all $\theta $ and a suitable function g, then $f_n$ can be taken such that $g(f_n)=g(f_0)$. Moreover, we give conditions for $f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0$, as $n\rightarrow \infty $, for fixed $\theta \in \Theta $.

Theorem 6

Let $(\mathcal {X},\mathcal {E})$ be a Radon space, $\mathcal {Y}$ a separable metric space, and $g:\mathcal {X}\rightarrow \mathcal {Y}$ a Borel function. Let $\mathcal {P}$ and $\mathcal {Q}_n$ be models, where $n\ge 0$. Suppose $\mathcal {P}$ is non-atomic and weakly orthogonal and

$$\begin{aligned} Q_{n,\theta }=Q_{0,\theta }\quad \text {on }\sigma (g)\text { for all }n\ge 0\text { and }\theta \in \Theta . \end{aligned}$$

Then, there are $\mathcal {P}$-measurable functions $f_n:\mathcal {X}\rightarrow \mathcal {X}$ such that

$$\begin{aligned} P_\theta \circ f_n^{-1}=Q_{n,\theta }\,\text { and }\,g(f_n)=g(f_0)\,\text { for all }n\ge 0\text { and }\theta \in \Theta . \end{aligned}$$

(9)

In addition to (9), for fixed $\theta \in \Theta $, one obtains

$$\begin{aligned} f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0 \end{aligned}$$

whenever

$$\begin{aligned} E_{Q_{n,\theta }}(\varphi \mid g)\,\overset{Q_{0,\theta }-a.s.}{\longrightarrow }\,E_{Q_{0,\theta }}(\varphi \mid g)\quad \quad \text {for each }\varphi \in C_b(\mathcal {X}). \end{aligned}$$

(10)

Proof

Fix $\theta \in \Theta $. By Corollary 4 of Pratelli and Rigo (2023), since $(\mathcal {X},\mathcal {E})$ is Radon and $(\mathcal {X},\mathcal {E},P_\theta )$ is a non-atomic probability space, there are measurable functions $f_{n,\theta }:\mathcal {X}\rightarrow \mathcal {X}$ such that

$$\begin{aligned} Q_{n,\theta }=P_\theta \circ f_{n,\theta }^{-1}\quad \text {and}\quad g(f_{n,\theta })=g(f_{0,\theta })\quad \text {for all }n\ge 0. \end{aligned}$$

Moreover, under condition (10), one also obtains $f_{n,\theta }\overset{P_\theta -a.s.}{\longrightarrow }f_{0,\theta }$.

Next, since $\mathcal {P}$ is weakly orthogonal, there is a partition $\{A_\theta :\,\theta \in \Theta \}$ of $\mathcal {X}$ satisfying condition (7). For all $n\ge 0$ and $x\in \mathcal {X}$, define

$$\begin{aligned} f_n(x)=f_{n,\theta _x}(x) \end{aligned}$$

where $\theta _x$ denotes the unique $\theta \in \Theta $ such that $x\in A_\theta $. Since $g(f_{n,\theta })=g(f_{0,\theta })$ for every $\theta $, one obtains $g(f_n)=g(f_0)$ for all n. Moreover, arguing as in the proof of Theorem 5, the $f_n$ are $\mathcal {P}$-measurable and $Q_{n,\theta }=P_\theta \circ f_{n}^{-1}$ for all n and $\theta $. Finally, fix $\theta \in \Theta $ satisfying condition (10). Since $P_\theta \bigl (f_n=f_{n,\theta }\text { for all }n\ge 0\bigr )=1$ and $f_{n,\theta }\overset{P_\theta -a.s.}{\longrightarrow }f_{0,\theta }$, it follows that $f_n\overset{P_\theta -a.s.}{\longrightarrow }f_0$. $\square $
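
A toy instance may help to see how conditions (9) and (10) interact. It is a sketch under assumed, illustrative choices (the unit square, the Lebesgue measure $\lambda $ on $[0,1]$, and the constants $c_n$ below are hypothetical and not taken from the references). Let $\mathcal {X}=[0,1]^2$, $\mathcal {Y}=\Theta =[0,1]$, $g(y_1,y_2)=y_1$, and

$$\begin{aligned} P_\theta =\delta _\theta \otimes \lambda ,\quad Q_{n,\theta }=\lambda \otimes \delta _{c_n\theta },\quad \text {where }c_0=1\text { and }c_n=\frac{n}{n+1}\text { for }n\ge 1. \end{aligned}$$

Then $Q_{n,\theta }\circ g^{-1}=\lambda $ for all n, so that $Q_{n,\theta }=Q_{0,\theta }$ on $\sigma (g)$. With the partition $A_\theta =\{\theta \}\times [0,1]$ and the choice $f_{n,\theta }(x_1,x_2)=(x_2,c_n\theta )$, the glued functions are

$$\begin{aligned} f_n(x_1,x_2)=(x_2,\,c_n\,x_1),\quad \text {so that}\quad g(f_n)=x_2=g(f_0)\quad \text {and}\quad P_\theta \circ f_n^{-1}=\lambda \otimes \delta _{c_n\theta }=Q_{n,\theta }. \end{aligned}$$

As to condition (10), for each $\varphi \in C_b(\mathcal {X})$, a version of $E_{Q_{n,\theta }}(\varphi \mid g)$ is $y\mapsto \varphi (y_1,c_n\theta )$, and

$$\begin{aligned} \varphi (y_1,c_n\theta )\longrightarrow \varphi (y_1,\theta )\quad \quad \text {as }n\rightarrow \infty \end{aligned}$$

by continuity of $\varphi $. Consistently, $f_n(x)\rightarrow f_0(x)$ for every $x\in \mathcal {X}$, since $c_n\rightarrow 1$.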

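Incidentally, we note that card$\,(\Theta )\le \,\,$card$\,(\mathcal {X})$ under the assumptions of Theorem 6. In fact, if $\{A_\theta :\,\theta \in \Theta \}$ is a partition of $\mathcal {X}$ satisfying condition (7), the sets $A_\theta $ are pairwise disjoint and non-empty (because $P_\theta (A_\theta )=1$), so that selecting one point from each $A_\theta $ yields an injective map from $\Theta $ into $\mathcal {X}$.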

We close the paper by proving a claim made in Example 8.

Proposition 7

Let $\mathcal {P}$ be a model and $\sigma (\mathcal {P})$ the $\sigma $-field defined in Example 7. If $\mathcal {P}$ is weakly orthogonal and $\pi _1$ and $\pi _2$ are singular probabilities on $\sigma (\mathcal {P})$, then

$$\begin{aligned} \mu _{\pi _1}(A)=\mu _{\pi _2}(A^c)=1\quad \quad \text {for some }A\in \mathcal {E}_\mathcal {P}. \end{aligned}$$

Proof

Let $\{A_\theta :\,\theta \in \Theta \}$ be a partition of $\mathcal {X}$ satisfying condition (7). Since $\pi _1$ and $\pi _2$ are singular, there is $B\in \sigma (\mathcal {P})$ such that $\pi _1(B)=\pi _2(B^c)=1$. Define

$$\begin{aligned} A=\bigcup _{\theta \in B}A_\theta . \end{aligned}$$

Then, $A\supset A_\theta $ for $\theta \in B$ and $A\subset A_\theta ^c$ for $\theta \in B^c$. Since $P_\theta (A_\theta )=1$ for all $\theta \in \Theta $, it follows that $A\in \mathcal {E}_\mathcal {P}$. Moreover,

$$\begin{aligned} \mu _{\pi _1}(A)=\int \ P_\theta (A)\,\pi _1(d\theta )=\int _B\ P_\theta (A)\,\pi _1(d\theta )=\int _B 1\,d\pi _1=\pi _1(B)=1, \end{aligned}$$

and similarly $\mu _{\pi _2}(A^c)=1$. $\square $
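
For a concrete reading of Proposition 7, consider the following sketch. The specific model and priors are illustrative assumptions, and it is assumed that $\mu _\pi (A)=\int P_\theta (A)\,\pi (d\theta )$ and that $\sigma (\mathcal {P})$ contains every set of the form $\{\theta :P_\theta (A)=1\}$ with $A\in \mathcal {E}$, as is the case if $\sigma (\mathcal {P})$ is generated by the maps $\theta \mapsto P_\theta (A)$. Let $\mathcal {X}=[0,1]^2$, $\Theta =[0,1]$, $P_\theta =\delta _\theta \otimes \lambda $ with $\lambda $ the Lebesgue measure on $[0,1]$, and $A_\theta =\{\theta \}\times [0,1]$. Let $\pi _1$ be uniform on $[0,1/2]$ and $\pi _2$ uniform on $(1/2,1]$. Then $\pi _1(B)=\pi _2(B^c)=1$ with $B=[0,1/2]$, and

$$\begin{aligned} A=\bigcup _{\theta \in B}A_\theta =[0,1/2]\times [0,1],\quad \quad P_\theta (A)=1_B(\theta ). \end{aligned}$$

Therefore,

$$\begin{aligned} \mu _{\pi _1}(A)=\int 1_B(\theta )\,\pi _1(d\theta )=1\quad \text {and}\quad \mu _{\pi _2}(A^c)=\int 1_{B^c}(\theta )\,\pi _2(d\theta )=1, \end{aligned}$$

that is, $\mu _{\pi _1}$ and $\mu _{\pi _2}$ are concentrated on the left and right halves of the square, respectively.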

References

Berti, P., Rigo, P.: 0–1 laws for regular conditional distributions. Ann. Probab. 35, 649–662 (2007)
Berti, P., Pratelli, L., Rigo, P.: Skorohod representation on a given probability space. Prob. Theo. Relat. Fields 137, 277–288 (2007)
Berti, P., Pratelli, L., Rigo, P.: A Skorohod representation theorem for uniform distance. Prob. Theo. Relat. Fields 150, 321–335 (2011)
Berti, P., Pratelli, L., Rigo, P.: A Skorohod representation theorem without separability. Electr. Commun. Probab. 18, 1–12 (2013)
Berti, P., Pratelli, L., Rigo, P.: Gluing lemmas and Skorohod representations. Electr. Commun. Probab. 20, 1–11 (2015)
Berti, P., Dreassi, E., Rigo, P.: A notion of conditional probability and some of its consequences. Decisions Econ. Financ. 43, 3–15 (2020)
Blackwell, D., Dubins, L.E.: On existence and non-existence of proper, regular, conditional distributions. Ann. Probab. 3, 741–752 (1975)
Blackwell, D., Dubins, L.E.: An extension of Skorohod’s almost sure representation theorem. Proc. Am. Math. Soc. 89, 691–692 (1983)
Chau, H.N., Rasonyi, M.: Skorohod’s representation theorem and optimal strategies for markets with frictions. SIAM J. Control. Optim. 55, 3592–3608 (2017)
Chone, P., Kramarz, F.: Matching workers’ skills and firms’ technologies: From bundling to unbundling, Working Papers 2021-10 CREST (2021)
Chone, P., Gozlan, N., Kramarz, F.: Weak optimal transport with unnormalized kernels. SIAM J. Math. Anal. 55(6), 6039–6092 (2023)
Cortissoz, J.: On the Skorokhod representation theorem. Proc. Amer. Math. Soc. 135, 3995–4007 (2007)
Dudley, R.M.: Distances of probability measures and random variables. Ann. Math. Stat. 39, 1563–1572 (1968)
Dudley, R.M.: Uniform central limit theorems. Cambridge University Press, Cambridge (1999)
Dumav, M., Stinchcombe, M.B.: Skorohod’s representation theorem for sets of probabilities. Proc. Am. Math. Soc. 144, 3123–3133 (2016)
Dynkin, E.B.: Sufficient statistics and extreme points. Ann. Probab. 6, 705–730 (1978)
Farrell, R.H.: Representation of invariant measures. Ill. J. Math. 6, 447–467 (1962)
Fernique, X.: Un modele presque sur pour la convergence en loi. C.R Acad. Sci. Paris Ser. I 306, 335–338 (1988)
Föllmer, H.: Phase transition and Martin boundary, Sem. Prob. IX, Springer. Lect. Notes in Math. 465, 305–317 (1975)
Galichon, A., Henry-Labordere, P., Touzi, N.: A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab. 24, 312–336 (2014)
Hansen, L.P., Maccheroni, F., Marinacci, M., Sargent, T.J.: Tempered Bayesian analysis, Unpublished manuscript (2024)
Hernandez-Ceron, N.: Extensions of Skorohod’s almost sure representation theorem, Master of Science Thesis, University of Alberta, https://era.library.ualberta.ca/items/78ddb981-ca1c-4945-a6e9-8298217d6be6 (2010)
Jaffe, A.Q.: A strong duality principle for equivalence couplings and total variation. Electr. J. Probab. 28, 1–33 (2023)
Jakubowski, A.: The almost sure Skorokhod representation for subsequences in nonmetric spaces. Theo. Probab. Appl. 42, 167–174 (1998)
Lauritzen, S.L.: Sufficiency, prediction and extreme models. Scand. J. Stat. 1, 128–134 (1974)
Maitra, A.: Integral representations of invariant measures. Trans. Am. Math. Soc. 229, 209–225 (1977)
Mauldin, R.D., Preiss, D., Weizsacker, H.V.: Orthogonal transition kernels. Ann. Probab. 11, 970–988 (1983)
Pratelli, L., Rigo, P.: A strong version of the Skorohod representation theorem. J. Theoret. Probab. 36, 372–389 (2023)
Pratelli, L., Rigo, P.: Some duality results for equivalence couplings and total variation. Electr. Commun. Probab. 29, 1–12 (2024)
Sethuraman, J.: Some extensions of the Skorohod representation theorem. Sankhya 64, 884–893 (2002)
Skorohod, A.V.: Limit theorems for stochastic processes. Theo. Probab. Appl. 1, 261–290 (1956)
van der Vaart, A., Wellner, J.A.: Weak convergence and empirical processes. Springer, Cham (1996)
Weis, L.: On the representation of order continuous operators by random measures. Trans. Amer. Math. Soc. 285, 535–563 (1984)
Wichura, M.J.: On the construction of almost uniformly convergent random variables with given weakly convergent image laws. Ann. Math. Stat 41, 284–291 (1970)

Funding

Open access funding provided by Alma Mater Studiorum - Università di Bologna within the CRUI-CARE Agreement.

Author information

Authors and Affiliations

Naval Academy of Livorno, Livorno, Italy
Luca Pratelli
Department of Statistics, University of Bologna, Bologna, Italy
Pietro Rigo

Corresponding author

Correspondence to Pietro Rigo.

Ethics declarations

Conflict of interest

The authors certify that they have no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pratelli, L., Rigo, P. Some Skorohod-type results. Decisions Econ Finan (2024). https://doi.org/10.1007/s10203-024-00466-w

Received: 04 January 2024
Accepted: 17 June 2024
Published: 01 July 2024
DOI: https://doi.org/10.1007/s10203-024-00466-w
