1 Introduction

Any measure on a path space is called a path measure. Some notions about path measures that appear naturally when solving the Schrödinger problem (see (1) below and [5]) are presented and worked out in detail.

Aim of This Article

This paper is about three separate items:

  1. Disintegration of an unbounded measure;

  2. Basic properties of the relative entropy with respect to an unbounded measure;

  3. Positive integration with respect to a Markov measure.

Although items (1) and (2) are mainly about general unbounded measures, we are motivated by their applications to path measures.

In particular, it is shown that when Q is an unbounded path measure, some restriction must be imposed on Q in order to consider conditional expectations such as \(Q(\cdot \vert X_{t})\). This is the content of the notion of conditionable path measure, which is introduced in Definition 3.

Some care is also required when working with the relative entropy with respect to an unbounded reference measure. We also give a detailed proof of the additive property of the relative entropy in Theorem 2. Indeed, we could not find a complete proof of this well-known result in the literature.

Some Notation

Let \(\mathcal{X}\) be a Polish state space equipped with the corresponding Borel σ-field and \(\varOmega = D([0,1],\mathcal{X})\) the space of all càdlàg (right-continuous and left-limited) paths from the unit time interval [0, 1] to \(\mathcal{X}\). Depending on the context, we may only consider \(\varOmega = C([0,1],\mathcal{X})\), the space of all continuous paths. As usual, the σ-field on Ω is generated by the canonical process

$$\displaystyle{ X_{t}(\omega ):=\omega _{t} \in \mathcal{X},\quad \omega = (\omega _{s})_{0\leq s\leq 1} \in \varOmega,\ 0 \leq t \leq 1. }$$

We write M+(Y ) for the set of all nonnegative measures on a space Y, and P(Y ) for the subset of all probability measures. For Q ∈ M+(Y ), the push-forward of Q by a measurable mapping \(\phi: Y \rightarrow \mathcal{X}\) is denoted by ϕ # Q or \(Q_{\phi } \in \mathrm{ M}_{+}(\mathcal{X})\).

Any positive measure Q ∈ M+(Ω) on the path space Ω is called a path measure. For any subset \(\mathcal{T} \subset [0,1]\), we denote \(X_{\mathcal{T}} = (X_{t})_{t\in \mathcal{T}}\) and \(Q_{\mathcal{T}} = (X_{\mathcal{T}})_{\#}Q = Q(X_{\mathcal{T}}\in \cdot ) \in \mathrm{ M}_{+}(\varOmega _{\mathcal{T}})\) the push-forward of Q by \(X_{\mathcal{T}}\) on the set of positive measures on the restriction \(\varOmega _{\mathcal{T}}\) of Ω to \(\mathcal{T}\). In particular, for each \(0 \leq t \leq 1,\ Q_{t} = Q(X_{t} \in \cdot ) \in \mathrm{ M}_{+}(\mathcal{X})\).

Motivation

Take a reference path measure R ∈ M+(Ω) and consider the problem

$$\displaystyle{ H(P\vert R) \rightarrow \mathrm{ min};\qquad P \in \mathrm{ P}(\varOmega ): P_{0} =\mu _{0},P_{1} =\mu _{1} }$$
(1)

of minimizing the relative entropy

$$\displaystyle{H(P\vert R):=\int _{\varOmega }\log \left (\frac{\mathit{dP}} {\mathit{dR}}\right )\,\mathit{dP} \in (-\infty,\infty ]}$$

of P ∈ P(Ω) with respect to R ∈ M+(Ω), among all path probability measures P ∈ P(Ω) whose initial and final marginals P 0 and P 1 equal, respectively, two prescribed probability measures μ 0 and \(\mu _{1} \in \mathrm{ P}(\mathcal{X})\) on the state space \(\mathcal{X}\). This entropy minimization problem has been called the Schrödinger problem since the eighties. It is described in the author’s survey paper [5], where it is exemplified with R a reversible Markov process in the classical sense of [4, Sect. 1.2], for instance the reversible Brownian motion on \(\mathbb{R}^{n}\).
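To fix ideas, here is a minimal discrete sketch (our own, not from the paper) of the relative entropy H(P|R). When the reference "measure" R is not normalized — its total mass may exceed 1, mimicking an unbounded reference — H(P|R) can indeed be negative, consistently with the range (−∞, ∞] above.

```python
import numpy as np

# Discrete sketch of H(P|R) = sum_i P_i log(P_i / R_i).
# P is a probability vector; R is a nonnegative reference vector
# that need not be normalized (mimicking an unbounded reference).
def relative_entropy(P, R):
    P, R = np.asarray(P, dtype=float), np.asarray(R, dtype=float)
    if np.any((P > 0) & (R == 0)):   # P not absolutely continuous w.r.t. R
        return np.inf
    m = P > 0                        # convention 0 log 0 = 0
    return float(np.sum(P[m] * np.log(P[m] / R[m])))

P = [0.5, 0.5, 0.0]
R = [1.0, 1.0, 1.0]                  # reference with total mass 3
print(relative_entropy(P, R))        # -log 2: negative values do occur
```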

If one wants to describe the reversible Brownian motion on \(\mathbb{R}^{n}\) as a measure on the path space \(\varOmega = C([0,1], \mathbb{R}^{n})\), one has to consider an unbounded measure. Indeed, its reversing measure is Lebesgue measure (or any positive multiple of it), and its “law” is

$$\displaystyle{ R =\int _{\mathbb{R}^{n}}\mathcal{W}_{x}(\cdot )\,\mathit{dx} \in \mathrm{ M}_{+}(\varOmega ), }$$

where \(\mathcal{W}_{x} \in \mathrm{ P}(\varOmega )\) stands for the Wiener measure with starting position \(x \in \mathbb{R}^{n}\). Obviously, this path measure has the same unbounded mass as Lebesgue measure. More generally, any path measure Q ∈ M+(Ω) has the same mass as its time-marginal measures \(Q_{t} \in \mathrm{ M}_{+}(\mathcal{X})\) for all t ∈ [0, 1]. In particular, any reversible path measure in M+(Ω) with an unbounded reversing measure in \(\mathrm{M}_{+}(\mathcal{X})\), is also unbounded.

In connection with the Schrödinger problem, the notion of (f, g)-transform of a possibly unbounded Markov measure R is introduced in [5]. It is defined by

$$\displaystyle{ P = f(X_{0})g(X_{1})\,R \in \mathrm{ P}(\varOmega ) }$$
(2)

where f and g are measurable nonnegative functions such that \(E_{R}(f(X_{0})g(X_{1})) = 1\). It is a time-symmetric extension of the usual Doob h-transform. It can be shown that the product form of the Radon-Nikodym derivative f(X 0)g(X 1) implies that P is the solution to the Schrödinger problem with the correct prescribed marginals μ 0 and μ 1 given by

$$\displaystyle{ \left \{\begin{array}{lcl} \mu _{0}(\mathit{dx})& =&f(x)E_{R}(g(X_{1})\mid X_{0} = x)\,R_{0}(\mathit{dx}), \\ \mu _{1}(\mathit{dy})& =&E_{R}(f(X_{0})\mid X_{1} = y)g(y)\,R_{1}(\mathit{dy}). \end{array} \right. }$$
(3)
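Formulas (2) and (3) can be checked on a minimal discrete sketch (our own, with made-up numbers): take R(x, y) = m(x)K(x, y) as the joint law of (X_0, X_1) for a two-state chain with unnormalized initial measure m and transition matrix K, and verify that the marginals of P = f(X_0)g(X_1)R are those prescribed by (3).

```python
import numpy as np

# R(x, y) = m(x) K(x, y): joint "law" of (X_0, X_1) for a two-state chain
# started from the unnormalized initial measure m.
m = np.array([2.0, 3.0])
K = np.array([[0.7, 0.3],
              [0.4, 0.6]])              # rows sum to 1
R = m[:, None] * K

f = np.array([0.1, 0.2])                # arbitrary nonnegative functions
g = np.array([1.0, 3.0])
Z = float((f[:, None] * g[None, :] * R).sum())
f = f / Z                               # rescale so E_R(f(X_0) g(X_1)) = 1
P = f[:, None] * g[None, :] * R         # the (f, g)-transform, a probability

# Formulas (3) for the prescribed marginals:
mu0 = f * (K @ g) * m                   # f(x) E_R(g(X_1)|X_0 = x) R_0(dx)
mu1 = ((f * m) @ K) * g                 # E_R(f(X_0)|X_1 = y) g(y) R_1(dy)
assert np.isclose(P.sum(), 1.0)
assert np.allclose(mu0, P.sum(axis=1))  # initial marginal of P
assert np.allclose(mu1, P.sum(axis=0))  # final marginal of P
```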

Disintegration of an Unbounded Path Measure

One has to be careful when saying that the reversible Brownian motion R ∈ M+(Ω) is Markov. Of course, this means that for all \(0 \leq t \leq 1,\ E_{R}(X_{[t,1]} \in \cdot \mid X_{[0,t]}) = E_{R}(X_{[t,1]} \in \cdot \mid X_{t})\). Similarly, we wrote (3) without hesitation. But the problem is to define properly the conditional expectation with respect to an unbounded measure. This will be the purpose of Sect. 2, where extensions of the conditional expectation are considered and a definition of the Markov property for an unbounded path measure is given. The general theory of conditional expectation is recalled in the appendix (Sect. 5) to emphasize the role of σ-finiteness.

Relative Entropy with Respect to an Unbounded Measure

The relative entropy with respect to a probability measure is well-known. But once an unbounded path measure is at hand, what about the relative entropy with respect to an unbounded measure and its additive property? This is the subject of Sect. 3.

Positive Integration with Respect to a Markov Measure

It is assumed in the (f, g)-transform formula (2) that \(E_{R}(f(X_{0})g(X_{1})) < \infty \) with f, g ≥ 0, while the conditional expectations \(E_{R}(f(X_{0})\mid X_{1})\) and \(E_{R}(g(X_{1})\mid X_{0})\) appear in (3). But the assumption that f(X 0)g(X 1) is R-integrable doesn’t ensure, in general, that f(X 0) and g(X 1) are separately R-integrable, which is a prerequisite for defining properly the conditional expectations \(E_{R}(f(X_{0})\mid X_{1})\) and \(E_{R}(g(X_{1})\mid X_{0})\). However, we need a general setting for the conditional expectations in (3) to be meaningful. This will be presented in Sect. 4, where we take advantage of the positivity of the functions f and g.

2 Disintegration of an Unbounded Path Measure

We often need the following notion, which is a little more restrictive than absolute continuity but matches with it whenever the measures are σ-finite.

Definition 1

Let R and Q ∈ M+(Ω) be two positive measures on some measurable space Ω. One says that Q admits a density with respect to R if there exists a measurable function θ: Ω → [0, ∞) verifying

$$\displaystyle{ \int _{\varOmega }f\,\mathit{dQ} =\int _{\varOmega }f\theta \,\mathit{dR} \in [0,\infty ],\quad \forall f \geq 0\ \mathrm{measurable}. }$$

We write this relation

$$\displaystyle{Q \prec R}$$

and we denote

$$\displaystyle{\theta:= \frac{\mathit{dQ}} {\mathit{dR}}}$$

which is called the Radon-Nikodym derivative of Q with respect to R.

Thanks to the monotone convergence theorem, it is easy to check that if R is σ-finite and θ: Ω → [0, ∞) is a nonnegative measurable function, then

$$\displaystyle{\theta R(A):=\int _{A}\theta \,\mathit{dR},\quad A \in \mathcal{A},}$$

defines a positive measure on the σ-field \(\mathcal{A}\).

Proposition 1

Let R and Q be two positive measures. Suppose that R is σ-finite. The following assertions are equivalent:

  (a)

    Q ≺ R;

  (b)

    Q is σ-finite and Q ≪ R.

Proof

The implication \((b) \Rightarrow (a)\) is the Radon-Nikodym theorem (Theorem 4). Let us show its converse \((a) \Rightarrow (b)\). The absolute continuity Q ≪ R is straightforward. Let us prove that Q is σ-finite. Let (A n ) n ≥ 1 be a σ-finite partition of R. Define for all \(k \geq 1,\ B_{k} =\{ k - 1 \leq \mathit{dQ}/\mathit{dR} < k\}\). The sequence (B k ) k ≥ 1 is also a measurable partition. Hence, \((A_{n} \cap B_{k})_{n,k\geq 1}\) is a countable measurable partition. On the other hand, for any \((n,k),\ Q(A_{n} \cap B_{k}) = E_{R}(\mathbf{1}_{A_{n}\cap B_{k}}\,\mathit{dQ}/\mathit{dR}) \leq kR(A_{n}) < \infty \). Therefore \((A_{n} \cap B_{k})_{n,k\geq 1}\) is a σ-finite partition of Q.

Let Q, R ∈ M+(Ω) be two (possibly unbounded) positive measures on Ω. Let \(\phi:\varOmega \rightarrow \mathcal{X}\) be a measurable mapping from Ω to a Polish (separable, complete metric) space \(\mathcal{X}\) equipped with its Borel σ-field. Although Q ≪ R implies that Q ϕ  ≪ R ϕ , in general we do not have Q ϕ  ≺ R ϕ when Q ≺ R, as the following example shows:

Example 1

The measure R is the uniform probability measure on Ω = [0, 1] × [0, 1],  Q is defined by \(Q(dxdy) = 1/y\,R(dxdy)\) and we denote the canonical projections by \(\phi _{X}(x,y) = x,\ \phi _{Y }(x,y) = y,\ (x,y) \in \varOmega\). We observe that on the one hand R,  Q and \(R_{\phi _{X}}(\mathit{dx}) =\mathrm{ Leb}(\mathit{dx}) = \mathit{dx}\) are σ-finite, but on the other hand, \(Q_{\phi _{X}}\) is given by \(Q_{\phi _{X}}(A) = \left \{\begin{array}{ll} 0 &\mathrm{if\ Leb}(A) = 0,\\ +\infty &\mathrm{otherwise.}\\ \end{array} \right.\) We have \(Q_{\phi _{X}} \ll R_{\phi _{X}}\), but \(Q_{\phi _{X}}\) is not σ-finite. We also see that \(Q_{\phi _{Y }}(\mathit{dy}) = 1/y\,\mathit{dy}\) is σ-finite.

2.1 An Extension of the Conditional Expectation

To easily extend results about the conditional expectation with respect to a bounded measure (in particular Propositions 4 and 5) to a σ-finite measure, it is useful to rely on the following preliminary result.

Lemma 1

Let us assume that R ϕ is σ-finite.

  (a)

    Let \(\gamma: \mathcal{X} \rightarrow (0,1]\) be a measurable function such that γR ϕ is a bounded measure. Then, L 1 (R) ⊂ L 1 (γ(ϕ)R) and for any \(f \in L^{1}(R),\ E_{R}(f\mid \phi ) = E_{\gamma (\phi )R}(f\mid \phi ),R\mathrm{-a.e.}\)

  (b)

    There exists a function γ ∈ L 1 (R ϕ ) such that 0 < γ ≤ 1,R ϕ−a.e. In particular, the measure γ(ϕ)R is bounded and equivalent to R, i.e. for any measurable subset \(A,\ R(A) = 0\;\Longleftrightarrow\;[\gamma (\phi )R](A) = 0\) .

  (c)

    Let Q be another positive measure on Ω such that Q ϕ is σ-finite. Then, there exists a function γ ∈ L 1 (R ϕ + Q ϕ ) such that \(0 <\gamma \leq 1,(R_{\phi } + Q_{\phi })\mathrm{-a.e.}\) In particular, the measures γ(ϕ)R and γ(ϕ)Q are bounded and respectively equivalent to R and Q.

Proof

  • Proof of (a). Denote B ϕ the space of all \(\mathcal{A}(\phi )\)-measurable and bounded functions and \(\gamma B_{\phi }:=\{ h: h/\gamma (\phi ) \in B_{\phi }\} \subset B_{\phi }\). For all f ∈ L 1(R) and h ∈ γ B ϕ ,

    $$\displaystyle\begin{array}{rcl} \int _{\varOmega }hf\,\mathit{dR}& =& \int _{\varOmega } \frac{h} {\gamma (\phi )}f\gamma (\phi )\,\mathit{dR} {}\\ & =& \int _{\varOmega } \frac{h} {\gamma (\phi )}E_{\gamma (\phi )R}(f\mid \phi )\,d(\gamma (\phi )R) {}\\ & =& \int _{\varOmega }\mathit{hE}_{\gamma (\phi )R}(f\mid \phi )\,\mathit{dR}. {}\\ \end{array}$$

    On the other hand, \(\int _{\varOmega }hf\,\mathit{dR} =\int _{\varOmega }\mathit{hE}_{R}(f\mid \phi )\,\mathit{dR}\) so that

    $$\displaystyle{\int _{\varOmega }\mathit{hE}_{R}(f\mid \phi )\,\mathit{dR}_{\mathcal{A}(\phi )} =\int _{\varOmega }\mathit{hE}_{\gamma (\phi )R}(f\mid \phi )\,\mathit{dR}_{\mathcal{A}(\phi )},\quad \forall h \in \gamma B_{\phi }.}$$

In other words, the measures \(E_{R}(f\mid \phi )R_{\mathcal{A}(\phi )}\) and \(E_{\gamma (\phi )R}(f\mid \phi )R_{\mathcal{A}(\phi )}\) match on γ B ϕ . But, since γ(ϕ) > 0, the measures on \(\mathcal{A}(\phi )\) are characterized by their values on γ B ϕ . Consequently, \(E_{R}(f\mid \phi )R_{\mathcal{A}(\phi )} = E_{\gamma (\phi )R}(f\mid \phi )R_{\mathcal{A}(\phi )}\). This completes the proof of statement (a).

  • Proof of (b). It is a particular instance of statement (c), taking Q = 0.

  • Proof of (c). If R and Q are bounded, it is sufficient to take \(\gamma \equiv 1\). Suppose now that R + Q is unbounded. The intersection of two partitions which are respectively σ-finite with respect to R ϕ and Q ϕ is a partition \((\mathcal{X}_{n})_{n\geq 1}\) of \(\mathcal{X}\) which is simultaneously σ-finite with respect to R ϕ and Q ϕ . We assume without loss of generality that \((R_{\phi } + Q_{\phi })(\mathcal{X}_{n}) \geq 1\) for all n. Let us define

    $$\displaystyle{\gamma:=\sum _{n\geq 1} \frac{2^{-n}} {(R_{\phi } + Q_{\phi })(\mathcal{X}_{n})}\mathbf{1}_{\mathcal{X}_{n}}.}$$

    It is a measurable function on \(\mathcal{X}\). As \(\int _{\varOmega }\gamma (\phi )\,d(R + Q) = 1\) and \(0 <\gamma (\phi ) \leq 1,(R + Q)\mathrm{-a.e.},\ \gamma (\phi )(R + Q)\) is a probability measure that is equivalent to R + Q and \(L^{1}(R + Q) \subset L^{1}(\gamma (\phi )(R + Q))\).
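The construction of γ in the proof of (c) can be sketched numerically (our own toy discretization, with made-up cell masses): a countable space cut into finite-mass cells \(\mathcal{X}_n\), with \(\gamma = \sum_n 2^{-n}(R_\phi + Q_\phi)(\mathcal{X}_n)^{-1}\mathbf{1}_{\mathcal{X}_n}\), here with Q = 0 as in statement (b).

```python
import numpy as np

# Toy version of Lemma 1(c): the space is {0,...,9}, split into 5 cells of
# 2 points each; R has finite mass on each cell but "large" total mass.
R = np.arange(1.0, 11.0)                       # point masses 1..10, total 55
cells = [np.array([2 * k, 2 * k + 1]) for k in range(5)]

gamma = np.zeros_like(R)
for n, cell in enumerate(cells, start=1):
    mass = max(R[cell].sum(), 1.0)             # wlog mass >= 1, as in the proof
    gamma[cell] = 2.0 ** (-n) / mass

total = float((gamma * R).sum())
assert 0.0 < total <= 1.0                      # gamma * R is a bounded measure
assert np.all((0.0 < gamma) & (gamma <= 1.0))  # 0 < gamma <= 1 everywhere
```

Since γ > 0 everywhere, the bounded measure γ·R vanishes exactly where R does, i.e. the two measures are equivalent, as the lemma asserts.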

Definition 2 (Extension of the Conditional Expectation)

With Lemma 1, we see that E γ(ϕ)R (⋅ ∣ϕ) is an extension of E R (⋅ ∣ϕ) from L 1(R) to L 1(γ(ϕ)R). We denote

$$\displaystyle{ E_{R}(f\mid \phi ):= E_{\gamma (\phi )R}(f\mid \phi ),\quad f \in L^{1}(\gamma (\phi )R) }$$

where γ is a function the existence of which is ensured by Lemma 1.

Theorem 1

Let R,Q ∈ M+ (Ω) and \(\phi:\varOmega \rightarrow \mathcal{X}\) a measurable mapping with values in the Polish space \(\mathcal{X}\) . We suppose that Q ≺ R, and also that R ϕ and Q ϕ are σ-finite measures on \(\mathcal{X}\) . Then,

  (a)

    E R (⋅∣ϕ) and E Q (⋅∣ϕ) admit respectively a regular conditional probability kernel \(x \in \mathcal{X}\mapsto R(\cdot \mid \phi = x) \in \mathrm{ P}(\varOmega )\) and \(x \in \mathcal{X}\mapsto Q(\cdot \mid \phi = x) \in \mathrm{ P}(\varOmega )\) .

  (b)

    \(Q_{\phi } \prec R_{\phi },\quad \frac{\mathit{dQ}} {\mathit{dR}} \in L^{1}(\gamma (\phi )R)\) and

    $$\displaystyle{\frac{\mathit{dQ}_{\phi }} {\mathit{dR}_{\phi }}(x) = E_{R}\left (\frac{\mathit{dQ}} {\mathit{dR}}\mid \phi = x\right ),\quad \forall x \in \mathcal{X},\ R_{\phi }\mathrm{-a.e.}}$$

    The function γ in the above formulas is the one whose existence is ensured by Lemma  1(c); it also appears in Definition  2.

  (c)

    Moreover, \(Q(\cdot \mid \phi ) \prec R(\cdot \mid \phi ),\ Q\mathrm{-a.e.}\) and

    $$\displaystyle{ \frac{\mathit{dQ}} {\mathit{dR}}(\omega ) = \frac{\mathit{dQ}_{\phi }} {\mathit{dR}_{\phi }}(\phi (\omega ))\frac{\mathit{dQ}(\cdot \mid \phi =\phi (\omega ))} {\mathit{dR}(\cdot \mid \phi =\phi (\omega ))}(\omega ),\quad \forall \omega \in \varOmega,\ Q\mathrm{-a.e.} }$$
    (4)
  (d)

    A more practical formula than (4) is the following one. For any bounded measurable function f, we have

    $$\displaystyle{ E_{Q}(f\mid \phi ) = \frac{E_{R}\left (\frac{\mathit{dQ}} {\mathit{dR}} f\mid \phi \right )} {E_{R}\left (\frac{\mathit{dQ}} {\mathit{dR}} \mid \phi \right )},\quad Q\mathrm{-a.e.} }$$
    (5)

    where no division by zero occurs since \(E_{R}\left (\frac{\mathit{dQ}} {\mathit{dR}} \mid \phi \right ) > 0,Q\mathrm{-a.e.}\)

Identity (4) can also be written more synthetically as

$$\displaystyle{ \frac{\mathit{dQ}} {\mathit{dR}}(\omega ) = \frac{\mathit{dQ}_{\phi }} {\mathit{dR}_{\phi }}(\phi (\omega ))\frac{\mathit{dQ}(\cdot \mid \phi )} {\mathit{dR}(\cdot \mid \phi )}(\omega ),\quad \forall \omega \in \varOmega,\ Q\mathrm{-a.e.} }$$

or more enigmatically as

$$\displaystyle{ \frac{\mathit{dQ}} {\mathit{dR}}(\omega ) = \frac{\mathit{dQ}_{\phi }} {\mathit{dR}_{\phi }}(x)\frac{\mathit{dQ}(\cdot \mid \phi = x)} {\mathit{dR}(\cdot \mid \phi = x)}(\omega ),\quad \forall (\omega,x),\ Q_{\phi }(\mathit{dx})R(d\omega \mid \phi = x)\mathrm{-a.e.} }$$

since we have \(\phi (\omega ) = x,\ Q_{\phi }(\mathit{dx})R(d\omega \mid \phi = x)\)-almost surely.

Proof (Proof of Theorem 1)

If R and Q are bounded measures, this theorem is an immediate consequence of Propositions 4 and 5.

When R ϕ and Q ϕ are σ-finite, we are allowed to invoke Lemma 1: γ(ϕ)R and γ(ϕ)Q are bounded measures and we can apply the bounded case to them. But,

$$\displaystyle{ \frac{\mathit{dQ}} {\mathit{dR}} = \frac{d(\gamma (\phi )Q)} {d(\gamma (\phi )R)}\qquad \mathrm{and}\qquad \frac{\mathit{dQ}_{\phi }} {\mathit{dR}_{\phi }} = \frac{d(\gamma Q_{\phi })} {d(\gamma R_{\phi })}. }$$

This completes the proof of the theorem.
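Formula (5) can be verified on a finite space (a sketch of ours, with Ω a set of pairs (w, x) and φ the second coordinate, all numbers made up):

```python
import numpy as np

# Omega = pairs (w, x) with phi(w, x) = x; conditioning on phi averages
# over w within each column x.
rng = np.random.default_rng(0)
R = rng.random((4, 3)) + 0.1        # R(w, x) > 0
dQdR = rng.random((4, 3)) + 0.1     # density dQ/dR
Q = dQdR * R
f = rng.random((4, 3))              # bounded measurable f

def cond_exp(M, h):                 # E_M(h | phi = x): one value per column x
    return (M * h).sum(axis=0) / M.sum(axis=0)

lhs = cond_exp(Q, f)                                # E_Q(f | phi)
rhs = cond_exp(R, dQdR * f) / cond_exp(R, dQdR)     # right-hand side of (5)
assert np.allclose(lhs, rhs)
```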

2.2 Hilbertian Conditional Expectation

So far, we have considered the conditional expectation of a function f in L 1(R). If the reference measure R is bounded, then L 2(R) ⊂ L 1(R). But if R is unbounded, this inclusion fails and the conditional expectation which we have just built is not defined for every f in L 2(R). This notion extends immediately from \(L^{1}(R) \cap L^{2}(R)\) to L 2(R), by interpreting the fundamental relation (14) in restriction to L 2(R):

$$\displaystyle{ \int _{\varOmega }hf\,\mathit{dR} =\int _{\varOmega }\mathit{hE}_{R}(f\mid \mathcal{A})\,\mathit{dR},\quad \forall h \in B_{\mathcal{A}},f \in L^{1}(R) \cap L^{2}(R), }$$

as a Hilbertian projection. We thus define the operator

$$\displaystyle{E_{R}(\cdot \mid \mathcal{A}): L^{2}(R) \rightarrow L^{2}(R_{ \mathcal{A}})}$$

as an orthogonal projection on the Hilbertian subspace \(L^{2}(R_{\mathcal{A}})\). In particular, when \(\mathcal{A}\) is the σ-field generated by the measurable mapping \(\phi:\varOmega \rightarrow \mathcal{X}\),

$$\displaystyle{ E_{R}(\cdot \mid \phi ): L^{2}(\varOmega,R) \rightarrow L^{2}(\mathcal{X},R_{\phi }) }$$

is specified for any function f ∈ L 2(R) by

$$\displaystyle{ \int _{\varOmega }h(\phi (\omega ))f(\omega )\,R(d\omega ) =\int _{\mathcal{X}}h(x)E_{R}(f\mid \phi = x)\,R_{\phi }(\mathit{dx}),\quad \forall h \in L^{2}(\mathcal{X},R_{\phi }). }$$
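On a finite space this Hilbertian picture is easy to check (our own sketch, with made-up weights): the conditional expectation averages f over each level set of φ, and the residual is R-orthogonal to every φ-measurable function.

```python
import numpy as np

# E_R(. | phi) as the orthogonal projection of L^2(R) onto the functions
# that are constant on each level set {phi = x}.
R = np.array([0.5, 1.5, 2.0, 1.0])       # weights; need not sum to 1
phi = np.array([0, 0, 1, 1])             # phi-value of each point
f = np.array([1.0, 3.0, -2.0, 4.0])

proj = np.empty_like(f)
for x in np.unique(phi):
    lvl = phi == x                       # level set {phi = x}
    proj[lvl] = (R[lvl] * f[lvl]).sum() / R[lvl].sum()

# Orthogonality: f - proj is R-orthogonal to every phi-measurable h
# (it suffices to test indicators of level sets).
for x in np.unique(phi):
    h = (phi == x).astype(float)
    assert abs((R * h * (f - proj)).sum()) < 1e-12
```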

2.3 Conditional Expectation of Path Measures

Now we particularize Ω to be the path space \(D([0,1],\mathcal{X})\) or \(C([0,1],\mathcal{X})\).

Lemma 2

Let Q ∈ M+ (Ω) be a path measure and \(\mathcal{T} \subset [0,1]\) a time subset. For \(Q_{\mathcal{T}}\) to be a σ-finite measure, it is sufficient that there is some \(t_{o} \in \mathcal{T}\) such that \(Q_{t_{o}}\) is a σ-finite measure.

Proof

Let \(t_{o} \in \mathcal{T}\) be such that \(Q_{t_{o}} \in \mathrm{ M}_{+}(\mathcal{X})\) is a σ-finite measure with \((\mathcal{X}_{n})_{n\geq 1}\) an increasing sequence of measurable sets such that \(Q_{t_{o}}(\mathcal{X}_{n}) < \infty \) and \(\cup \mathcal{X}_{n} = \mathcal{X}\). Then, \(Q_{\mathcal{T}}\) is also σ-finite, since \(Q_{\mathcal{T}}(X_{t_{o}} \in \mathcal{X}_{n}) = Q_{t_{o}}(\mathcal{X}_{n})\) for all n and \(\cup _{n\geq 1}[\varOmega _{\mathcal{T}}\cap \{ X_{t_{o}} \in \mathcal{X}_{n}\}] =\varOmega _{\mathcal{T}}\).

Definition 3 (Conditionable Path Measure)

 

  1. A positive measure Q ∈ M+(Ω) is called a path measure.

  2. The path measure Q ∈ M+(Ω) is said to be conditionable if for all t ∈ [0, 1],  Q t is a σ-finite measure on \(\mathcal{X}\).

With Lemma 2, for any conditionable path measure Q ∈ M+(Ω), the conditional expectation \(E_{Q}(\cdot \mid X_{\mathcal{T}})\) is well-defined for any \(\mathcal{T} \subset [0,1]\). This is the reason for this definition.

Even when Q(Ω) = ∞, Proposition 4 tells us that \(Q(\cdot \mid X_{\mathcal{T}})\) is a probability measure. In particular, \(Q(B\mid X_{\mathcal{T}})\) and \(E_{Q}(b\mid X_{\mathcal{T}})\) are bounded measurable functions for any measurable subset B and any measurable bounded function b.

Example 2

Let Q ∈ M+(Ω) be the law of the real-valued process X such that for all 0 ≤ t < 1, X t  = X 0 is distributed with Lebesgue measure and X 1 = 0,  Q-almost everywhere. We see with Lemma 2 that \(Q = Q_{[0,1]}\) is a σ-finite measure since Q 0 is σ-finite. But Q 1 is not a σ-finite measure. Consequently, Q is not a conditionable path measure.

Definition 4 (Markov Measure)

The path measure Q ∈ M+(Ω) is said to be Markov if it is conditionable in the sense of Definition 3 and if for all 0 ≤ t ≤ 1

$$\displaystyle{Q(X_{[t,1]} \in \cdot \mid X_{[0,t]}) = Q(X_{[t,1]} \in \cdot \mid X_{t}).}$$

3 Relative Entropy with Respect to an Unbounded Measure

Let R ∈ M+(Ω) be some σ-finite positive measure on some measurable space Ω. The relative entropy of the probability measure P ∈ P(Ω) with respect to R is loosely defined by

$$\displaystyle{ H(P\vert R):=\int _{\varOmega }\log (\mathit{dP}/\mathit{dR})\,\mathit{dP} \in (-\infty,\infty ],\qquad P \in \mathrm{ P}(\varOmega ) }$$

if P ≪ R and H(P | R) =  otherwise.

In the special case where R is a probability measure, this definition is meaningful.

Lemma 3

We assume that R ∈ P (Ω) is a probability measure.

We have for all P ∈ P (Ω), H(P|R) ∈ [0,∞] and H(P|R) = 0 if and only if P = R.

The function H(⋅|R) is strictly convex on the convex set P (Ω).

Proof

We have \(H(P\vert R) =\int _{\varOmega }h\left (\frac{\mathit{dP}} {\mathit{dR}}\right )\,\mathit{dR}\) with \(h(a) = a\log a - a + 1\) if a > 0 and h(0) = 1. As h ≥ 0, we see that for any P ∈ P(Ω) such that \(P \ll R,\ H(P\vert R) =\int _{\varOmega }h\left (\frac{\mathit{dP}} {\mathit{dR}}\right )\,\mathit{dR} \geq 0\). Hence H(P | R) ∈ [0, ]. Moreover, h(a) = 0 if and only if a = 1. Therefore, H(P | R) = 0 if and only if P = R.

The strict convexity of H(⋅ | R) follows from the strict convexity of h.

If R is unbounded, one must restrict the definition of H(⋅ | R) to some subset of P(Ω) as follows. As R is assumed to be σ-finite, there exists some measurable function W: Ω → [0, ∞) such that

$$\displaystyle{ z_{W}:=\int _{\varOmega }e^{-W}\,\mathit{dR} < \infty. }$$
(6)

Define the probability measure \(R_{W}:= z_{W}^{-1}e^{-W}\,R\) so that \(\log (\mathit{dP}/\mathit{dR}) =\log (\mathit{dP}/\mathit{dR}_{W}) - W -\log z_{W}\). It follows that for any P ∈ P(Ω) satisfying \(\int _{\varOmega }W\,\mathit{dP} < \infty \), the formula

$$\displaystyle{ H(P\vert R):= H(P\vert R_{W}) -\int _{\varOmega }W\,\mathit{dP} -\log z_{W} \in (-\infty,\infty ] }$$

is a meaningful definition of the relative entropy, which is coherent in the following sense. If \(\int _{\varOmega }W'\,\mathit{dP} < \infty \) for another measurable function W′: Ω → [0, ∞) such that \(z_{W'} < \infty \), then \(H(P\vert R_{W}) -\int _{\varOmega }W\,\mathit{dP} -\log z_{W} = H(P\vert R_{W'}) -\int _{\varOmega }W'\,\mathit{dP} -\log z_{W'} \in (-\infty,\infty ]\).
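This coherence is easy to check on a finite space (our own sketch, with random made-up measures): the value \(H(P\vert R_{W}) - \int W\,dP - \log z_{W}\) does not depend on the choice of W, and agrees with the direct formula when the latter makes sense.

```python
import numpy as np

# Discrete check that H(P|R_W) - int W dP - log z_W is independent of W,
# where R_W = e^{-W} R / z_W is the normalized reference measure.
rng = np.random.default_rng(1)
R = rng.random(6) * 10                   # large-mass reference measure
P = rng.random(6); P /= P.sum()          # probability measure, P << R

def H(p, r):                             # plain relative entropy, p > 0 here
    return float(np.sum(p * np.log(p / r)))

def H_via_W(W):
    zW = float(np.sum(np.exp(-W) * R))
    RW = np.exp(-W) * R / zW             # the probability measure R_W
    return H(P, RW) - float(np.sum(W * P)) - np.log(zW)

W1 = rng.random(6); W2 = rng.random(6)   # two different nonnegative W's
assert np.isclose(H_via_W(W1), H_via_W(W2))
assert np.isclose(H_via_W(W1), H(P, R))  # both agree with the direct formula
```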

Therefore, H(P | R) is well-defined for any P ∈ P(Ω) such that \(\int _{\varOmega }W\,\mathit{dP} < \infty \) for some measurable nonnegative function W verifying (6). For any such function, let us define

$$\displaystyle{\mathrm{P}_{W}(\varOmega ):= \left \{P \in \mathrm{ P}(\varOmega );\int _{\varOmega }W\,\mathit{dP} < \infty \right \}}$$

and B W (Ω) the space of measurable functions \(u:\varOmega \rightarrow \mathbb{R}\) such that \(\sup _{\varOmega }\vert u\vert /(1 + W) < \infty \). When Ω is a topological space, we also define the space C W (Ω) of all continuous functions on Ω such that \(\sup _{\varOmega }\vert u\vert /(1 + W) < \infty \).

Proposition 2

Let W be some function which satisfies (6). For all P ∈ PW (Ω),

$$\displaystyle{ \begin{array}{ll} H(P\vert R)& =\sup \left \{\int u\,\mathit{dP} -\log \int e^{u}\,\mathit{dR};u \in B_{W}(\varOmega )\right \} \\ & =\sup \left \{\int u\,\mathit{dP} -\log \int e^{u}\,\mathit{dR};u \in C_{W}(\varOmega )\right \}\end{array} }$$
(7)

and for all P ∈ P (Ω) such that P ≪ R,

$$\displaystyle\begin{array}{rcl} H(P\vert R) =\sup \left \{\int u\,\mathit{dP} -\log \int e^{u}\,\mathit{dR};u:\int e^{u}\,\mathit{dR} < \infty,\int u_{ -}\,\mathit{dP} < \infty \right \}& &{}\end{array}$$
(8)

where \(u_{-} = (-u) \vee 0\) and ∫u dP ∈ (−∞,∞] is well-defined for all u such that \(\int u_{-}\,\mathit{dP} < \infty \).

In (7), when C W (Ω) is invoked, it is implicitly assumed that Ω is a topological space equipped with its Borel σ-field.
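The dual formula (7) can be checked numerically on a finite space (our own sketch): every u gives a lower bound, and the bound is attained at u = log(dP∕dR).

```python
import numpy as np

# Numeric sketch of H(P|R) = sup_u { int u dP - log int e^u dR },
# with the sup attained at u = log(dP/dR).
rng = np.random.default_rng(2)
R = rng.random(5) * 4                    # unnormalized reference measure
P = rng.random(5); P /= P.sum()          # probability, P << R

H = float(np.sum(P * np.log(P / R)))     # direct value of H(P|R)

def dual(u):
    return float(np.sum(u * P) - np.log(np.sum(np.exp(u) * R)))

u_star = np.log(P / R)
assert np.isclose(dual(u_star), H)       # the optimizer attains the value
for _ in range(100):                     # random u's never exceed H
    assert dual(rng.normal(size=5)) <= H + 1e-9
```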

The proof below is mainly a rewriting of the proof of [3, Prop. B.1] in the setting where the reference measure is possibly unbounded.

Proof (Proof of Proposition 2)

Once we have (8), (7) follows by standard approximation arguments.

The proof of (8) relies on the Fenchel inequality for the convex function \(h(t) = t\log t\):

$$\displaystyle{st \leq t\log t + e^{s-1}}$$

for all s ∈ [−∞, ∞),  t ∈ [0, ∞), with the conventions \(0\log 0 = 0,\ e^{-\infty } = 0\) and \(-\infty \times 0 = 0\), which are legitimated by limiting procedures. The equality is attained when \(t = e^{s-1}\).

Taking \(s = u(x),\ t = \frac{\mathit{dP}} {\mathit{dR}}(x)\) and integrating with respect to R leads us to

$$\displaystyle{\int u\,\mathit{dP} \leq H(P\vert R) +\int e^{u-1}\,\mathit{dR},}$$

whose terms are meaningful with values in (−∞, ∞], provided that \(\int u_{-}\,\mathit{dP} < \infty \) and \(\int _{\varOmega }e^{u}\,\mathit{dR} < \infty \). Formally, the case of equality corresponds to \(\frac{\mathit{dP}} {\mathit{dR}} = e^{u-1}\). With the monotone convergence theorem, one sees that it is approached by the sequence \(u_{n} = 1 +\log (\frac{\mathit{dP}} {\mathit{dR}} \vee e^{-n})\), as n tends to infinity. This gives us

$$\displaystyle{H(P\vert R) =\sup \left \{\int u\,\mathit{dP} -\int e^{u-1}\,\mathit{dR};u:\int e^{u}\,\mathit{dR} < \infty,\inf u > -\infty \right \},}$$

which in turn implies that

$$\displaystyle{ H(P\vert R) =\sup \left \{\int u\,\mathit{dP} -\int e^{u-1}\,\mathit{dR};u:\int e^{u}\,\mathit{dR} < \infty,\int u_{ -}\,\mathit{dP} < \infty \right \}. }$$

Now, we take advantage of the unit mass of P ∈ P(Ω): 

$$\displaystyle{\int (u + b)\,\mathit{dP} -\int e^{u+b-1}\,\mathit{dR} =\int u\,\mathit{dP} - e^{b-1}\int e^{u}\,\mathit{dR} + b,\quad \forall b \in \mathbb{R},}$$

and we use the easy identity \(\log a =\inf _{b\in \mathbb{R}}\{\mathit{ae}^{b-1} - b\}\) to obtain

$$\displaystyle{\sup _{b\in \mathbb{R}}\left \{\int (u + b)\,\mathit{dP} -\int e^{u+b-1}\,\mathit{dR}\right \} =\int u\,\mathit{dP} -\log \int e^{u}\,\mathit{dR}.}$$

Whence,

$$\displaystyle\begin{array}{rcl} & & \sup \left \{\int u\,\mathit{dP} -\int e^{u-1}\,\mathit{dR};u:\int e^{u}\,\mathit{dR} < \infty,\int u_{ -}\,\mathit{dP} < \infty \right \} {}\\ & &\quad =\sup \left \{\int (u + b)\,\mathit{dP} -\int e^{u+b-1}\,\mathit{dR};b\,\in \,\mathbb{R},u:\int e^{u}\,\mathit{dR}\,<\,\infty,\int u_{ -}\,\mathit{dP}\,<\,\infty \right \} {}\\ & &\quad =\sup \left \{\int u\,\mathit{dP} -\log \int e^{u}\,\mathit{dR};u:\int e^{u}\,\mathit{dR} < \infty,\int u_{ -}\,\mathit{dP} < \infty \right \}. {}\\ \end{array}$$

This completes the proof of (8).

Let W be a nonnegative measurable function on Ω that verifies (6). Let us introduce the space M W (Ω) of all signed measures Q on Ω such that \(\int _{\varOmega }W\,d\vert Q\vert < \infty \).

Corollary 1

The function H(⋅|R) is convex on the vector space of all signed measures. Its effective domain \(\mathop{\text{dom}}\nolimits H(\cdot \vert R):= \left \{H(\cdot \vert R) < \infty \right \}\) is included in \(\mathrm{P}_{W}(\varOmega )\).

Suppose furthermore that Ω is a topological space. Then, H(⋅|R) is lower semicontinuous with respect to the topology σ(MW (Ω),C W (Ω)).

As a function of its two arguments on \(\mathrm{M}_{W}(\varOmega )\! \times \!\mathrm{ M}_{W}(\varOmega ),\ H(\cdot \mid \cdot )\) is jointly convex and jointly lower semicontinuous with respect to the product topology. In particular, it is a jointly Borel function.

Proof

The first statement follows from (7).

With Proposition 2, we see that H(⋅ | R) is the supremum of a family of affine continuous functions: \(Q\mapsto \int _{\varOmega }u\,\mathit{dQ} -\log \int _{\varOmega }e^{u}\,\mathit{dR}\) indexed by u. Hence, it is convex and lower semicontinuous. The same argument works with the joint arguments.

Let Ω and Z be two Polish spaces equipped with their Borel σ-fields. For any measurable function ϕ: Ω → Z and any measure Q ∈ M+(Ω) we have the disintegration formula

$$\displaystyle{ Q(\cdot ) =\int _{Z}Q(\cdot \mid \phi = z)\,Q_{\phi }(\mathit{dz}) }$$

where we write Q ϕ : = ϕ # Q and \(z \in Z\mapsto Q(\cdot \vert \phi = z) \in \mathrm{ P}(\varOmega )\) is measurable.

Theorem 2 (Additive Property of the Relative Entropy)

We have

$$\displaystyle{H(P\vert R) = H(P_{\phi }\vert R_{\phi }) +\int _{Z}H\Big(P(\cdot \mid \phi = z)\Big\vert R(\cdot \mid \phi = z)\Big)\,P_{\phi }(\mathit{dz}),\quad P \in \mathrm{ P}(\varOmega ).}$$

Proof

By Theorem 1,

$$\displaystyle\begin{array}{rcl} H(P\vert R)& =& \int _{Z}E_{P}\left [\log (\frac{\mathit{dP}} {\mathit{dR}})\mid \phi = z\right ]\,P_{\phi }(\mathit{dz}) =\int _{Z}\log \frac{\mathit{dP}_{\phi }} {\mathit{dR}_{\phi }}(z)\,P_{\phi }(\mathit{dz}) {}\\ & & +\int _{Z}\left [\int _{\varOmega }\log \frac{\mathit{dP}(\cdot \mid \phi = z)} {\mathit{dR}(\cdot \mid \phi = z)}(\omega )\,P(d\omega \mid \phi = z)\right ]\,P_{\phi }(\mathit{dz}) {}\\ \end{array}$$

which is the announced result.

Remarks 1

There are serious measurability problems hidden behind this proof.

  (a)

    The assumption that Z is Polish ensures the existence of kernels \(z\mapsto P(\cdot \mid \phi = z)\) and \(z\mapsto R(\cdot \mid \phi = z)\). On the other hand, we know that for any function u ∈ B W , the mapping \(z \in \mathcal{X}\mapsto E_{P}(u\mid \phi = z) \in \mathbb{R}\) is measurable. Therefore, the mapping \(z \in Z\mapsto P(\cdot \mid \phi = z) \in \mathrm{ P}_{W}(\varOmega )\) is measurable once P W (Ω) is equipped with its cylindrical σ-field, i.e. generated by the mappings \(Q \in \mathrm{ P}_{W}(\varOmega )\mapsto \int _{\varOmega }u\,\mathit{dQ}\) where u describes B W . But this σ-field matches with the Borel σ-field of σ(P W (Ω), C W ) when Ω is metric and separable. As H is jointly Borel (see Corollary 1), it is jointly measurable with respect to the product of the cylindrical σ-fields. Hence, \(z\mapsto H\Big(P(\cdot \mid \phi = z)\Big\vert R(\cdot \mid \phi = z)\Big)\) is measurable.

    Note that in general, the Borel σ-field of σ(P W (Ω), B W ) is too rich to match with the cylindrical σ-field. This is the reason why Ω is assumed to be Polish (completeness doesn’t play any role here).

  (b)

    The relative entropy \(H\Big(P(\cdot \mid \phi = z)\Big\vert R(\cdot \mid \phi = z)\Big)\) inside the second integral of the additive property formula is a function of couples of probability measures. Therefore, with Lemma 3, we know that it is nonnegative in general and that it vanishes if and only if \(P(\cdot \mid \phi = z) = R(\cdot \mid \phi = z)\).

  (c)

    Together with its measurability, which was proved at Remark (a) above, this allows us to give a meaning to the integral \(\int _{Z}H\Big(P(\cdot \mid \phi = z)\Big\vert R(\cdot \mid \phi = z)\Big)\,P_{\phi }(\mathit{dz})\) in [0, ∞].
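The additive property of Theorem 2 can be verified on a finite product space (our own discrete sketch, with φ the second coordinate of a pair and all numbers random):

```python
import numpy as np

# Discrete check of Theorem 2: on pairs (w, z) with phi(w, z) = z, the
# entropy splits into a marginal term plus an averaged conditional term.
rng = np.random.default_rng(3)
R = rng.random((3, 4)) + 0.1             # R(w, z), unnormalized
P = rng.random((3, 4)); P /= P.sum()     # probability measure, P << R

def H(p, r):
    m = p > 0                            # convention 0 log 0 = 0
    return float(np.sum(p[m] * np.log(p[m] / r[m])))

Pz, Rz = P.sum(axis=0), R.sum(axis=0)    # phi-marginals P_phi, R_phi
total = H(P, R)
split = H(Pz, Rz) + sum(
    Pz[z] * H(P[:, z] / Pz[z], R[:, z] / Rz[z]) for z in range(4))
assert np.isclose(total, split)
```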

Let us mention an application of this theorem in the context of the Schrödinger problem (1) where Ω is a path space, see [2, 5]. For any R ∈ M+(Ω) and P ∈ P(Ω), we have

$$\displaystyle{ H(P\vert R) = H(P_{01}\vert R_{01}) +\int _{\mathcal{X}^{2}}H(P^{\mathit{xy}}\vert R^{\mathit{xy}})\,P_{ 01}(dxdy) }$$

where \(Q_{01}:= (X_{0},X_{1})_{\#}Q\) is the law of the endpoint position and \(Q^{\mathit{xy}}:= Q(\cdot \vert X_{0} = x,X_{1} = y)\) is the bridge from x to y under Q. From this additive property formula and Corollary 1, it is easily seen that the solution \(\hat{P}\) of (1) (it is unique, since the entropy is strictly convex) satisfies

$$\displaystyle{ \hat{P}^{\mathit{xy}} = R^{\mathit{xy}},\quad \forall (x,y) \in \mathcal{X}^{2},\hat{P}_{ 01}\mathrm{-a.e.} }$$

and that \(\hat{P}_{01}\) is the unique solution of

$$\displaystyle{ H(\pi \vert R_{01}) \rightarrow \mathrm{ min};\qquad \pi \in \mathrm{ P}(\mathcal{X}^{2}):\pi _{ 0} =\mu _{0},\pi _{1} =\mu _{1} }$$

where π 0 and \(\pi _{1} \in \mathrm{ P}(\mathcal{X})\) are the first and second marginals of \(\pi \in \mathrm{ P}(\mathcal{X}^{2})\).
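The additive property can be checked numerically in a finite discrete setting. The following sketch (the path space, the map ϕ and all weights are illustrative choices, not taken from the text) verifies \(H(P\vert R) = H(P_{\phi }\vert R_{\phi }) +\int H(P(\cdot \mid \phi = z)\vert R(\cdot \mid \phi = z))\,P_{\phi }(\mathit{dz})\) for toy three-step paths with ϕ the endpoint pair:

```python
import math

# Toy path space: binary paths of length 3; phi = endpoint pair (omega_0, omega_2).
paths = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
phi = lambda w: (w[0], w[2])

# An unnormalized positive reference measure R and a probability P (arbitrary weights).
R = {w: 1.0 + 0.5 * w[1] + 0.25 * w[2] for w in paths}
raw = {w: 1.0 + w[0] + 2 * w[1] for w in paths}
Z = sum(raw.values())
P = {w: v / Z for w, v in raw.items()}

def H(p, r, support):
    # relative entropy H(p|r) = sum p log(p/r) on a discrete support
    return sum(p[w] * math.log(p[w] / r[w]) for w in support if p[w] > 0)

# Push-forwards under phi and the normalized conditional measures ("bridges").
zs = sorted(set(map(phi, paths)))
Pphi = {z: sum(P[w] for w in paths if phi(w) == z) for z in zs}
Rphi = {z: sum(R[w] for w in paths if phi(w) == z) for z in zs}

lhs = H(P, R, paths)
rhs = H(Pphi, Rphi, zs)
for z in zs:
    ws = [w for w in paths if phi(w) == z]
    Pz = {w: P[w] / Pphi[z] for w in ws}
    Rz = {w: R[w] / Rphi[z] for w in ws}
    rhs += Pphi[z] * H(Pz, Rz, ws)

assert abs(lhs - rhs) < 1e-9  # additive property holds exactly (up to rounding)
```

Note that R need not be a probability measure here; only the conditional measures are normalized, which mirrors the statement for unbounded reference measures.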

4 Positive Integration with Respect to a Markov Measure

4.1 Integration of Nonnegative Functions

The expectation E R Z of a nonnegative random variable Z with respect to a positive σ-finite measure R is a well-defined notion, even when Z is not R-integrable, in which case one sets \(E_{R}Z = +\infty \). Indeed, with the monotone convergence theorem we have

$$\displaystyle{ E_{R}Z =\lim _{n\rightarrow \infty }E_{R}[\mathbf{1}_{\{\cup _{k\leq n}\varOmega _{k}\}}(Z \wedge n)] \in [0,\infty ] }$$

where (Ω k ) k ≥ 1 is a σ-finite partition of R.

Since \(R(\cdot \mid \mathcal{A})\) is a bounded measure, we see that \(E_{R}(Z\mid \mathcal{A})\) is well defined in [0, ∞]. Moreover, the fundamental formula of the conditional expectation still holds:

$$\displaystyle{ E_{R}[\mathit{aE}_{R}(Z\mid \mathcal{A})] = E_{R}(\mathit{aZ}) }$$

for any nonnegative function \(a \in \mathcal{A}\). To see this, denote \(a_{n} = \mathbf{1}_{\{\cup _{k\leq n}\varOmega _{k}\}}(a \wedge n)\) and \(Z_{n} = \mathbf{1}_{\{\cup _{k\leq n}\varOmega _{k}\}}(Z \wedge n)\). We have \(E_{R}[a_{n}E_{R}(Z_{n}\mid \mathcal{A})] = E_{R}(a_{n}Z_{n})\) for all n ≥ 1. Letting n tend to infinity, we obtain the announced identity with the monotone convergence theorem.
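As a minimal numeric sketch of the truncation procedure (the measure and the integrands below are illustrative assumptions), one can watch the truncated expectations increase towards E R Z for the counting measure on the positive integers, with the σ-finite partition Ω k  = { k}:

```python
import math

# E_R Z := lim_n E_R[1_{union_{k<=n} Omega_k} (Z ∧ n)] for the counting
# measure R on {1, 2, ...} with the sigma-finite partition Omega_k = {k}.
def truncated(Z, n):
    return sum(min(Z(k), n) for k in range(1, n + 1))

# Integrable case: Z(k) = 1/k^2, so E_R Z = pi^2/6.
vals = [truncated(lambda k: 1.0 / k**2, n) for n in (10, 100, 10000)]
assert vals[0] <= vals[1] <= vals[2]          # monotone in n
assert abs(vals[2] - math.pi**2 / 6) < 1e-3   # converges to pi^2/6

# Non-integrable case: Z = 1 gives truncations equal to n, i.e. E_R Z = +infinity.
assert truncated(lambda k: 1.0, 1000) == 1000
```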

4.2 Positive Integration with Respect to a Markov Measure

We present a technical lemma about positive integration with respect to a Markov measure R ∈ M+(Ω). It is an easy result, but a rather practical one: it allows one to work with (f, g)-transforms of Markov processes without assuming unnecessary integrability conditions on f and g.

Lemma 4

Let R ∈ M+ (Ω) be a Markov measure.

  1. (a)

    Let 0 ≤ t ≤ 1 and α,β be nonnegative functions such that \(\alpha \in \mathcal{A}_{[0,t]}\) and \(\beta \in \mathcal{A}_{[t,1]}\) . Then, for any ω outside an R-negligible set:

    1. (i)

      if E R (αβ∣X t )(ω) = 0, we have E R (α∣X t )(ω) = 0 or E R (β∣X t )(ω) = 0;

    2. (ii)

      if E R (αβ∣X t )(ω) > 0, we have \(E_{R}(\alpha \mid X_{t})(\omega ),E_{R}(\beta \mid X_{t})(\omega ) > 0\) and \(E_{R}(\alpha \beta \mid X_{t})(\omega ) = E_{R}(\alpha \mid X_{t})(\omega )E_{R}(\beta \mid X_{t})(\omega ) \in (0,\infty ].\)

  2. (b)

    Let P ∈ M+ (Ω) be a conditionable path measure such that P ≺ R and whose density can be written as \(\frac{\mathit{dP}} {\mathit{dR}} =\alpha \beta\) with α,β nonnegative functions such that \(\alpha \in \mathcal{A}_{[0,t]}\) and \(\beta \in \mathcal{A}_{[t,1]}\) for some 0 ≤ t ≤ 1. Then,

    $$\displaystyle{\left \{\begin{array}{l} E_{R}(\alpha \mid X_{t}),E_{R}(\beta \mid X_{t}) \in (0,\infty ) \\ E_{R}(\alpha \beta \mid X_{t}) = E_{R}(\alpha \mid X_{t})E_{R}(\beta \mid X_{t}) \in (0,\infty )\\ \end{array} \right.\quad P\mathrm{-a.e.}}$$

    (but not R−a.e. in general). Furthermore,

    $$\displaystyle\begin{array}{rcl} & & E_{R}(\alpha \beta \mid X_{t}) \\ & & \quad = \mathbf{1}_{\{E_{R}(\alpha \mid X_{t})<\infty,E_{R}(\beta \mid X_{t})<\infty \}}E_{R}(\alpha \mid X_{t})E_{R}(\beta \mid X_{t}) \in [0,\infty )\quad R\mathrm{-a.e.} {}\end{array}$$
    (9)

As regards (9), even if α β is integrable, it is not true in general that the nonnegative functions α and β are integrable. Therefore, a priori the conditional expectations E R (αX t ) and E R (βX t ) may be infinite.
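The factorization in part (a)(ii) can be observed on a toy Markov chain. In the sketch below (the initial law and transition matrices are arbitrary illustrative choices, and R is taken to be a bounded probability measure so that all conditional expectations are elementary sums), the past-measurable α and future-measurable β factorize given X 1:

```python
# Toy Markov measure R on three-step binary paths: R(a,b,c) = init[a] T1[a][b] T2[b][c].
paths = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
init = [0.3, 0.7]
T1 = [[0.6, 0.4], [0.2, 0.8]]   # transition X_0 -> X_1
T2 = [[0.5, 0.5], [0.9, 0.1]]   # transition X_1 -> X_2
R = {(a, b, c): init[a] * T1[a][b] * T2[b][c] for (a, b, c) in paths}

alpha = lambda w: 1.0 + 2 * w[0] + w[1]   # depends on the past (X_0, X_1) only
beta  = lambda w: 0.5 + w[1] * w[2]       # depends on the future (X_1, X_2) only

def cond(f, s):
    # E_R(f | X_1 = s): R-weighted average over the fibre {X_1 = s}
    mass = sum(R[w] for w in paths if w[1] == s)
    return sum(R[w] * f(w) for w in paths if w[1] == s) / mass

# Markov property: past and future are conditionally independent given X_1.
for s in (0, 1):
    lhs = cond(lambda w: alpha(w) * beta(w), s)
    assert abs(lhs - cond(alpha, s) * cond(beta, s)) < 1e-9
```

In the unbounded σ-finite case, the lemma states that the same factorization survives the truncation-and-limit procedure of the previous subsection.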

Proof

  • Proof of (a). The measure R disintegrates with respect to the initial and final positions:

    $$\displaystyle{ R =\int _{\mathcal{X}}R(\cdot \mid X_{0} = x)\,R_{0}(\mathit{dx}) =\int _{\mathcal{X}}R(\cdot \mid X_{1} = y)\,R_{1}(\mathit{dy}) }$$

    But, R 0 and R 1 are assumed to be σ-finite measures. Let \((\mathcal{X}_{n}^{0})_{n\geq 1}\) and \((\mathcal{X}_{n}^{1})_{n\geq 1}\) be two σ-finite partitions of R 0 and R 1, respectively. We denote \(\varOmega _{n}^{0} =\{ X_{0} \in \cup _{k\leq n}\mathcal{X}_{k}^{0}\},\ \varOmega _{n}^{1} =\{ X_{1} \in \cup _{k\leq n}\mathcal{X}_{k}^{1}\}\) and \(\varOmega _{n} =\varOmega _{ n}^{0} \cap \varOmega _{n}^{1}\).

    As R is Markov, if the functions α and β are integrable, then E R (α∣X t ) and E R (β∣X t ) are well-defined and

    $$\displaystyle{ E_{R}(\alpha \beta \mid X_{t}) = E_{R}(\alpha \mid X_{t})E_{R}(\beta \mid X_{t}). }$$

    Letting n tend to infinity in \(E_{R}[(\alpha \wedge n)(\beta \wedge n)\mathbf{1}_{\varOmega _{n}}\mid X_{t}] = E_{R}((\alpha \wedge n)\mathbf{1}_{\varOmega _{n}^{0}}\mid X_{t})E_{R}((\beta \wedge n)\mathbf{1}_{\varOmega _{n}^{1}}\mid X_{t})\), we obtain \(E_{R}(\alpha \beta \mid X_{t}) = E_{R}(\alpha \mid X_{t})E_{R}(\beta \mid X_{t}) \in [0,\infty ]\). One concludes by remarking that the sequences are increasing.

  • Proof of (b). It is a consequence of the first part of the lemma. As P t is a σ-finite measure, \(\frac{\mathit{dP}_{t}} {\mathit{dR}_{t}}(X_{t}) < \infty,\ R\mathrm{-a.e.}\) (hence, a fortiori P−a.e.). In addition, \(\frac{\mathit{dP}_{t}} {\mathit{dR}_{t}}(X_{t}) > 0,\ P\mathrm{-a.e.}\) (but not R−a.e. in general) and \(\frac{\mathit{dP}_{t}} {\mathit{dR}_{t}}(X_{t}) = E_{R}(\alpha \beta \mid X_{t})\), by Theorem 1(b). Consequently, we are allowed to apply part (ii) of (a) to obtain the identity which holds P−a.e. This identity extends R−a.e., yielding (9). To see this, remark with part (i) of (a) that when the density vanishes, the two terms of the product cannot be simultaneously equal to ∞ and one of them vanishes.

Analogously, one can prove the following extension.

Lemma 5

Let R ∈ M+ (Ω) be a Markov measure.

  1. 1.

    Let 0 ≤ s ≤ t ≤ 1 and two nonnegative functions α,β such that \(\alpha \in \mathcal{A}_{[0,s]},\ \beta \in \mathcal{A}_{[t,1]}\) . Then, for any ω outside an R-negligible set:

    1. (a)

      if E R (αβ∣X [s,t] )(ω) = 0, we have E R (α∣X s )(ω) = 0 or E R (β∣X t )(ω) = 0;

    2. (b)

      if E R (αβ∣X [s,t] )(ω) > 0, we have E R (α∣X s )(ω),E R (β∣X t )(ω) > 0 and \(E_{R}(\alpha \beta \mid X_{[s,t]})(\omega ) = E_{R}(\alpha \mid X_{s})(\omega )E_{R}(\beta \mid X_{t})(\omega ) \in (0,\infty ].\)

  2. 2.

    Let P ∈ M+ (Ω) be a conditionable path measure such that P ≺ R and whose density can be written as \(\frac{\mathit{dP}} {\mathit{dR}} =\alpha \zeta \beta\) with α,ζ and β nonnegative functions such that \(\alpha \in \mathcal{A}_{[0,s]},\zeta \in \mathcal{A}_{[s,t]}\) and \(\beta \in \mathcal{A}_{[t,1]}\) for some 0 ≤ s ≤ t ≤ 1. Then,

    $$\displaystyle{\left \{\begin{array}{l} E_{R}(\alpha \mid X_{s}),E_{R}(\beta \mid X_{t}) \in (0,\infty ) \\ E_{R}(\alpha \beta \mid X_{[s,t]}) = E_{R}(\alpha \mid X_{s})E_{R}(\beta \mid X_{t}) \in (0,\infty )\\ \end{array} \right.\quad P\mathrm{-a.e.}}$$

    (and not R−a.e. in general). In addition,

    $$\displaystyle\begin{array}{rcl} & & E_{R}(\alpha \zeta \beta \mid X_{[s,t]}) {}\\ & & \quad = \mathbf{1}_{\{E_{R}(\alpha \mid X_{s})<\infty,E_{R}(\beta \mid X_{t})<\infty \}}E_{R}(\alpha \mid X_{s})\zeta E_{R}(\beta \mid X_{t}) \in [0,\infty )\quad R\mathrm{-a.e.} {}\\ \end{array}$$

5 Conditional Expectation with Respect to an Unbounded Measure

In standard textbooks, the theory of conditional expectation is presented and developed with respect to a probability measure (or equivalently, a bounded positive measure). However, there are natural unbounded path measures, such as the reversible Brownian motion on \(\mathbb{R}^{n}\), with respect to which a conditional expectation theory is needed. We present the details of this notion in this appendix section. From a measure theoretic viewpoint, this section is about disintegration of unbounded positive measures.

5.1 The Role of σ-Finiteness in the Radon-Nikodym Theorem

The keystone of conditioning is the Radon-Nikodym theorem. In order to emphasize the role of σ-finiteness, we recall a classical proof of this theorem, following von Neumann and Rudin [6]. Let Ω be a space with its σ-field and P, Q, R ∈ M+(Ω) be positive measures on Ω. One says that P is absolutely continuous with respect to R, written P ≪ R, if for every measurable subset \(A \subset \varOmega,\ R(A) = 0 \Rightarrow P(A) = 0\). The measure P is said to be concentrated on the measurable subset C ⊂ Ω if for any measurable subset \(A \subset \varOmega,\ P(A) = P(A \cap C)\). The measures P and Q are said to be mutually singular, written P ⊥ Q, if there exist two disjoint measurable subsets C, D ⊂ Ω such that P is concentrated on C and Q is concentrated on D.

Theorem 3

Let P and R be two bounded positive measures.

  1. (a)

    There exists a unique pair (P a ,P s ) of measures such that \(P = P_{a} + P_{s},\ P_{a} \ll R\) and P s ⊥R. These measures are positive and P a ⊥P s .

  2. (b)

    There is a unique function θ ∈ L 1 (R) such that

    $$\displaystyle{ P_{a}(A) =\int _{A}\theta \,\mathit{dR},\quad \mathrm{for\ any\ measurable\ subset}\ A. }$$

Proof

The uniqueness proofs are easy. Let us begin with (a). Suppose we have two Lebesgue decompositions: \(P = P_{a} + P_{s} = P_{a}' + P_{s}'\). Then, \(P_{a} - P_{a}' = P_{s}' - P_{s},\ P_{a} - P_{a}' \ll R\) and P s ′ − P s  ⊥ R. Hence, \(P_{a} - P_{a}' = P_{s}' - P_{s} = 0\) since Q ≪ R and Q ⊥ R imply that Q = 0. As regards (b), if we have \(P_{a} =\theta R =\theta 'R\), then \(\int _{A}(\theta -\theta ')\,\mathit{dR} = 0\) for any measurable A ⊂ Ω. Therefore \(\theta =\theta ',R\mathrm{-a.e.}\)

Denote \(Q = P + R\). It is a bounded positive measure and for any function f ∈ L 2(Q),

$$\displaystyle{ \vert \int _{\varOmega }f\,\mathit{dP}\vert \leq \int _{\varOmega }\vert f\vert \,\mathit{dQ} \leq \sqrt{Q(\varOmega )}\|f\|_{L^{2}(Q)}. }$$
(10)

It follows that \(f \in L^{2}(Q)\mapsto \int _{\varOmega }f\,\mathit{dP} \in \mathbb{R}\) is a continuous linear form on the Hilbert space L 2(Q). Consequently, there exists g ∈ L 2(Q) such that

$$\displaystyle{ \int _{\varOmega }f\,\mathit{dP} =\int _{\varOmega }fg\,\mathit{dQ},\quad \forall f \in L^{2}(Q). }$$
(11)

Since \(0 \leq P \leq P + R:= Q\), we obtain 0 ≤ g ≤ 1,  Q−a.e. Let us take a version of g such that 0 ≤ g ≤ 1 everywhere. The identity (11) can be rewritten as

$$\displaystyle{ \int _{\varOmega }(1 - g)f\,\mathit{dP} =\int _{\varOmega }fg\,\mathit{dR},\quad \forall f \in L^{2}(Q). }$$
(12)

Let us set \(C:=\{ 0 \leq g < 1\},\ D =\{ g = 1\},\ P_{a}(\cdot ) = P(\cdot \cap C)\) and \(P_{s}(\cdot ) = P(\cdot \cap D)\).

Choosing f = 1 D in (12), we obtain R(D) = 0 so that P s  ⊥ R.

Choosing \(f = (1 + g + \cdots + g^{n})\mathbf{1}_{A}\) with n ≥ 1 and A any measurable subset in (12), we obtain

$$\displaystyle{ \int _{A}(1 - g^{n+1})\,\mathit{dP} =\int _{ A}g(1 + g + \cdots + g^{n})\,\mathit{dR}. }$$

But the sequence of functions \((1 - g^{n+1})\) increases pointwise towards 1 C . Now, by the monotone convergence theorem, we have \(P(A \cap C) =\int _{A}\mathbf{1}_{C}g/(1 - g)\,\mathit{dR}\). This means that P a  = θ R with \(\theta = \mathbf{1}_{\{0\leq g<1\}}g/(1 - g)\).

Finally, we see that θ ≥ 0 is R-integrable since \(\int _{\varOmega }\theta \,\mathit{dR}\,=\,P_{a}(\varOmega )\,\leq \,P(\varOmega )\,<\,\infty \).
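The von Neumann argument can be traced on finite discrete measures, where g = dP∕dQ is an elementary ratio. In this sketch (weights are illustrative; the point 3 carries the singular part), we recover the Lebesgue decomposition P = P a  + P s with P a  = θR:

```python
# Discrete instance of the von Neumann proof: g = dP/dQ with Q = P + R,
# then C = {g < 1}, D = {g = 1}, theta = g/(1-g) on C.
pts = [0, 1, 2, 3]
P = {0: 0.5, 1: 0.3, 2: 0.0, 3: 0.2}
R = {0: 1.0, 1: 0.4, 2: 2.0, 3: 0.0}   # R(3) = 0 while P(3) > 0: singular part on {3}
Q = {x: P[x] + R[x] for x in pts}
g = {x: P[x] / Q[x] for x in pts}

C = [x for x in pts if g[x] < 1]
D = [x for x in pts if g[x] == 1]
theta = {x: (g[x] / (1 - g[x]) if x in C else 0.0) for x in pts}

P_a = {x: (P[x] if x in C else 0.0) for x in pts}   # absolutely continuous part
P_s = {x: (P[x] if x in D else 0.0) for x in pts}   # singular part

assert all(abs(P_a[x] - theta[x] * R[x]) < 1e-9 for x in pts)   # P_a = theta R
assert all(R[x] == 0.0 for x in D)                              # P_s is singular to R
assert abs(sum(P_a.values()) + sum(P_s.values()) - sum(P.values())) < 1e-9
```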

The main argument of this proof is the Riesz representation theorem for the dual of a Hilbert space. As the continuity of the linear form is ensured by Q(Ω) < ∞ at (10), we have crucially used the boundedness of the measures P and R. This can be relaxed by means of the following notion.

Definition 5

The positive measure R is said to be σ-finite if it is either bounded or if there exists a sequence (Ω k ) k ≥ 1 of disjoint measurable subsets which partitions \(\varOmega:\ \sqcup _{k}\varOmega _{k} =\varOmega\) and is such that R(Ω k ) < ∞ for all k.

In such a case it is said that (Ω k ) k ≥ 1 finitely partitions R or that it is a σ-finite partition of R.

Recall that an unbounded positive measure is allowed to take the value +∞. For instance, the measure R which is defined on the trivial σ-field \(\{\varnothing,\varOmega \}\) by \(R(\varnothing ) = 0\) and R(Ω) = ∞ is a genuine positive measure and L 1(R) = { 0}. This situation may seem artificial, but in fact it is not, as can be observed with the following examples.

Examples 1

 

  1. (a)

    The push-forward of Lebesgue measure on \(\mathbb{R}\) by a function which takes finitely many values is a positive measure on the set of these values which charges at least one of them with an infinite mass. Incidentally, this provides us with an example of a σ-finite measure whose push-forward is not σ-finite.

  2. (b)

    Lebesgue measure on \(\mathbb{R}^{2}\) is σ-finite, but its push-forward by the projection on the first coordinate assigns an infinite mass to any non-negligible Borel set.

Theorem 4 (Radon-Nikodym)

Let P and R be two positive σ-finite measures such that P ≪ R. Then, there exists a unique measurable function θ such that

$$\displaystyle{ \int _{\varOmega }f\,\mathit{dP} =\int _{\varOmega }f\theta \,\mathit{dR},\quad \forall f \in L^{1}(P). }$$
(13)

Moreover, P is bounded if and only if θ ∈ L 1 (R).

Proof

Taking the intersection of two partitions which respectively finitely partition R and P, one obtains a countable measurable partition which simultaneously finitely partitions R and P. Theorem 3 applies on each subset of this partition and one obtains the desired result by recollecting the pieces. The resulting function θ need not be integrable anymore, but it is still locally integrable in the sense that it is integrable in restriction to each subset of the partition. We have just extended Theorem 3 to the case where the measures P and R are σ-finite. We conclude by noticing that by Theorem 3 we have: P ≪ R if and only if P s  = 0.
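The gluing step can be sketched numerically: for the unbounded but σ-finite counting measure R on {1, 2, …} and a bounded P, the finite-case theorem applied cell by cell gives θ(k) = P({k})∕R({k}), and (13) holds. (The measures below are illustrative assumptions, and the infinite sums are truncated for the check.)

```python
import math

# Counting measure R on {1, 2, ...} (unbounded, sigma-finite with cells {k})
# and P({k}) = 1/k^2 (bounded). The density is glued cell by cell.
N = 5000                                    # truncation level for the numeric check
Rm = lambda k: 1.0                          # counting measure
Pm = lambda k: 1.0 / k**2
theta = lambda k: Pm(k) / Rm(k)             # Radon-Nikodym density, cell by cell

f = lambda k: math.cos(k)                   # a bounded, hence P-integrable, test function
lhs = sum(f(k) * Pm(k) for k in range(1, N + 1))             # integral of f dP
rhs = sum(f(k) * theta(k) * Rm(k) for k in range(1, N + 1))  # integral of f theta dR
assert abs(lhs - rhs) < 1e-9

# P is bounded here and indeed theta is R-integrable: sum theta(k) = pi^2/6 < infinity.
assert abs(sum(theta(k) for k in range(1, N + 1)) - math.pi**2 / 6) < 1e-3
```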

With respect to the Radon-Nikodym theorem, stepping away from σ-finiteness seems hopeless, as one can guess from the following example. Take \(R =\sum _{x\in [0,1]}\delta _{x}\): the counting measure on Ω = [0, 1], and P the Lebesgue measure on [0, 1]. We see that P ≪ R, but there is no measurable function θ which satisfies (13).

5.2 Conditional Expectation with Respect to a Positive Measure

Let Ω be a space furnished with some σ-field and a sub-σ-field \(\mathcal{A}\). We take a positive measure R ∈ M+(Ω) on Ω and denote \(R_{\mathcal{A}}\) its restriction to \(\mathcal{A}\). The space of bounded measurable functions is denoted by B, while \(B_{\mathcal{A}}\) is the subspace of bounded \(\mathcal{A}\)-measurable functions. The subspace of L 1(R) consisting of the \(\mathcal{A}\)-measurable integrable functions is denoted by \(L^{1}(R_{\mathcal{A}})\).

We take g ≥ 0 in L 1(R). The mapping \(h \in B_{\mathcal{A}}\mapsto \int _{\varOmega }hg\,\mathit{dR}:=\int _{\varOmega }h\,\mathit{dR}_{\mathcal{A}}^{g}\) defines a finite positive measure \(R_{\mathcal{A}}^{g}\) on \((\varOmega,\mathcal{A})\). Clearly, if h ≥ 0 and \(\int _{\varOmega }h\,\mathit{dR} =\int _{\varOmega }h\,\mathit{dR}_{\mathcal{A}} = 0\), then \(\int _{\varOmega }h\,\mathit{dR}_{\mathcal{A}}^{g} = 0\). This means that \(R_{\mathcal{A}}^{g}\) is a finite measure which is absolutely continuous with respect to \(R_{\mathcal{A}}\). If \(R_{\mathcal{A}}\) is assumed to be σ-finite, by the Radon-Nikodym Theorem 4, there is a unique function \(\theta _{g} \in L^{1}(R_{\mathcal{A}})\) such that \(R_{\mathcal{A}}^{g} =\theta _{g}R_{\mathcal{A}}\). We have just obtained \(\int _{\varOmega }hg\,\mathit{dR} =\int _{\varOmega }h\theta _{g}\,\mathit{dR}_{\mathcal{A}} =\int _{\varOmega }h\theta _{g}\,\mathit{dR},\ \forall h \in B_{\mathcal{A}}\). Now, let f ∈ L 1(R), which might not be nonnegative. Considering its decomposition \(f = f_{+} - f_{-}\) into positive and negative parts \(f_{+} = f \vee 0,\ f_{-} = (-f) \vee 0\), and setting \(\theta _{f} =\theta _{f_{+}} -\theta _{f_{-}}\), we obtain

$$\displaystyle{ \int _{\varOmega }hf\,\mathit{dR} =\int _{\varOmega }h\theta _{f}\,\mathit{dR}_{\mathcal{A}} =\int _{\varOmega }h\theta _{f}\,\mathit{dR},\quad \forall h \in B_{\mathcal{A}},f \in L^{1}(R). }$$
(14)

Definition 6 (Conditional Expectation)

It is assumed that \(R_{\mathcal{A}}\) is σ-finite.

For any f ∈ L 1(R), the conditional expectation of f with respect to \(\mathcal{A}\) is the unique (modulo R−a.e.-equality) function

$$\displaystyle{E_{R}(f\mid \mathcal{A}) \in L^{1}(R_{ \mathcal{A}})}$$

which is integrable, \(\mathcal{A}\)-measurable and such that \(\theta _{f} =: E_{R}(f\mid \mathcal{A})\) satisfies (14).

It is essential in this definition that \(R_{\mathcal{A}}\) is assumed to be σ-finite.

Of course,

$$\displaystyle{ E_{R}(f\mid \mathcal{A}) = f,\quad \forall f \in L^{1}(R_{ \mathcal{A}}) }$$
(15)

If, in (14), we take the function \(h =\mathrm{ sign}(E_{R}(f\vert \mathcal{A}))\) which is in \(B_{\mathcal{A}}\), we have

$$\displaystyle{ \int _{\varOmega }\vert E_{R}(f\mid \mathcal{A})\vert \,\mathit{dR}_{\mathcal{A}}\leq \int _{\varOmega }\vert f\vert \,\mathit{dR} }$$
(16)

which expresses that \(E_{R}(\cdot \mid \mathcal{A}): L^{1}(R) \rightarrow L^{1}(R_{\mathcal{A}})\) is a contraction, the spaces L 1 being equipped with their usual norms \(\|\cdot \|_{1}\). With (15), we see that the operator norm of this contraction is 1. Therefore, \(E_{R}(\cdot \mid \mathcal{A}): L^{1}(R) \rightarrow L^{1}(R_{\mathcal{A}})\) is a continuous projection.

Taking h = 1 in (14), we have

$$\displaystyle{ \int _{\varOmega }f(\omega )\,R(d\omega ) =\int _{\varOmega }E_{R}(f\mid \mathcal{A})(\eta )\,R(d\eta ), }$$

which can be written

$$\displaystyle{ E_{R}E_{R}(f\mid \mathcal{A}) = E_{R}(f), }$$
(17)

with the notation E R (f): =  Ω fdR.
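On a finite space with \(\mathcal{A}\) generated by a partition, the conditional expectation is a weighted block average, and (14), (16) and (17) can be verified directly. The weights and the test function below are illustrative choices:

```python
# R: a finite positive measure on {0,...,5}; the sub-sigma-field is generated
# by the partition into blocks {0,1,2} and {3,4,5}.
pts = range(6)
R = {x: [0.5, 1.0, 0.25, 2.0, 0.75, 1.5][x] for x in pts}
block = lambda x: x // 3                      # the A-measurable "coarse" coordinate
f = {x: [1.0, -2.0, 3.0, 0.5, -1.0, 2.5][x] for x in pts}

def condexp(f):
    # E_R(f|A): R-weighted average of f on each block, constant per block
    out = {}
    for b in (0, 1):
        cell = [x for x in pts if block(x) == b]
        mean = sum(R[x] * f[x] for x in cell) / sum(R[x] for x in cell)
        for x in cell:
            out[x] = mean
    return out

Ef = condexp(f)
# (14) with h = indicator of each block
for b in (0, 1):
    cell = [x for x in pts if block(x) == b]
    assert abs(sum(R[x] * f[x] for x in cell) - sum(R[x] * Ef[x] for x in cell)) < 1e-9
# (16): contraction in L^1 norm
assert sum(R[x] * abs(Ef[x]) for x in pts) <= sum(R[x] * abs(f[x]) for x in pts) + 1e-9
# (17): E_R E_R(f|A) = E_R f
assert abs(sum(R[x] * Ef[x] for x in pts) - sum(R[x] * f[x] for x in pts)) < 1e-9
```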

Remark 1

When R is a bounded measure, the mapping \(E_{R}(\cdot \mid \mathcal{A})\) shares the following properties.

  1. (a)

    For all nonnegative \(f \in L^{1}(R),\ E_{R}(f\mid \mathcal{A}) \geq 0,\ R_{\mathcal{A}}\mathrm{-a.e.}\)

  2. (b)

    \(E_{R}(1\mid \mathcal{A}) = 1,\ R_{\mathcal{A}}\mathrm{-a.e.}\)

  3. (c)

    For all f, g ∈ L 1(R) and \(\lambda \in \mathbb{R},\ E_{R}(f +\lambda g\mid \mathcal{A}) = E_{R}(f\mid \mathcal{A}) +\lambda E_{R}(g\mid \mathcal{A}),\ R_{\mathcal{A}}\mathrm{-a.e.}\)

  4. (d)

    For any sequence (f n ) n ≥ 1 in L 1(R) with 0 ≤ f n  ≤ 1, which converges pointwise to 0, we have: \(\lim _{n\rightarrow \infty }E_{R}(f_{n}\mid \mathcal{A}) = 0,\ R_{\mathcal{A}}\mathrm{-a.e.}\)

Except for the “\(R_{\mathcal{A}}\mathrm{-a.e.}\)”, these properties characterize the expectation with respect to a probability measure. They can easily be checked, using (14), as follows.

  1. (i)

    For any nonnegative \(h \in B_{\mathcal{A}}\) and nonnegative f ∈ L 1(R), (14) implies that \(\int _{\varOmega }\mathit{hE}_{R}(f\mid \mathcal{A})\,\mathit{dR}_{\mathcal{A}}\geq 0\), which in turn implies (a).

  2. (ii)

    For any nonnegative \(h \in B_{\mathcal{A}}\), (14) implies that \(\int _{\varOmega }\mathit{hE}_{R}(1\mid \mathcal{A})\,\mathit{dR}_{\mathcal{A}} =\int _{\varOmega }h\,\mathit{dR}_{\mathcal{A}}\), whence (b).

  3. (iii)

    The linearity of \(f\mapsto E_{R}(f\mid \mathcal{A})(\eta )\) comes from the linearity of \(f\mapsto \int _{\varOmega }hf\,\mathit{dR}\) for all \(h \in B_{\mathcal{A}}\). Indeed, for all f, g ∈ L 1(R) and \(\lambda \in \mathbb{R}\), we have \(\int _{\varOmega }\mathit{hE}_{R}(f +\lambda g\mid \mathcal{A})\,\mathit{dR} =\int _{\varOmega }h[E_{R}(f\mid \mathcal{A}) +\lambda E_{R}(g\mid \mathcal{A})]\,\mathit{dR}\), which implies (c).

  4. (iv)

    For any \(h \in B_{\mathcal{A}}\), Fatou’s lemma, (14) and the dominated convergence theorem lead us to \(0 \leq \int _{\varOmega }h\lim _{n\rightarrow \infty }E_{R}(f_{n}\mid \mathcal{A})\,\mathit{dR}_{\mathcal{A}}\leq \liminf _{n\rightarrow \infty }\int _{\varOmega }\mathit{hE}_{R}(f_{n}\mid \mathcal{A})\,\mathit{dR}_{\mathcal{A}} =\lim _{n\rightarrow \infty }\int _{\varOmega }hf_{n}\,\mathit{dR} = 0\). This proves (d).

We used the boundedness of R at items (ii) and (iv), since in this case, bounded functions are integrable.

One could hope that for \(R_{\mathcal{A}}\mathrm{-a.e.}\ \eta\), there exists a probability kernel \(\eta \mapsto R(\cdot \mid \mathcal{A})(\eta )\) which admits \(E_{R}(\cdot \mid \mathcal{A})\) as its expectation. But negligible sets have to be taken into account. Indeed, the \(R_{\mathcal{A}}\)-negligible sets which invalidate these equalities depend on the functions f, g, the real numbers λ and the sequences (f n ) n ≥ 1. Their uncountable union might fail to be measurable, and even when it is measurable, it might have positive measure. Therefore, the σ-field on Ω must not be too rich for such a probability kernel to exist. Let us give a couple of definitions before stating at Proposition 4 that \(R(\cdot \mid \mathcal{A})\) exists in a general setting.

We are looking for a conditional probability measure in the following sense.

Definition 7 (Regular Conditional Probability Kernel)

The kernel \(R(\cdot \mid \mathcal{A})\) is a regular conditional probability if

  1. (a)

    for any \(f \in L^{1}(R),\ E_{R}(f\mid \mathcal{A})(\eta ) =\int _{\varOmega }f\,R(d\omega \mid \mathcal{A})(\eta )\) for \(R_{\mathcal{A}}\)-almost every η; 

  2. (b)

    for \(R_{\mathcal{A}}\)-almost every \(\eta,\ R(\cdot \mid \mathcal{A})(\eta )\) is a probability measure on Ω.

Property (a) was proved at Remark 1 when R is a bounded measure. It is property (b) which requires additional work, even when R is bounded. Proposition 4 provides us with a general setting where such a regular kernel exists. When a regular conditional kernel \(R(\cdot \mid \mathcal{A})\) exists, (17) is concisely expressed as a disintegration formula:

$$\displaystyle{ R(d\omega ) =\int _{\{\eta \in \varOmega \}}R(d\omega \mid \mathcal{A})(\eta )\,R_{\mathcal{A}}(d\eta ) }$$
(18)

Definition 8

Let \(\phi:\varOmega \rightarrow \mathcal{X}\) be a measurable function with values in a measurable space \(\mathcal{X}\). The smallest sub-σ-field on Ω which makes ϕ a measurable function is called the σ-field generated by ϕ. It is denoted by \(\mathcal{A}(\phi )\).

We are going to consider the conditional expectation with respect to \(\mathcal{A}(\phi )\) which is denoted by

$$\displaystyle{E(\cdot \mid \mathcal{A}(\phi )) = E(\cdot \mid \phi ).}$$

Proposition 3

Let \(\mathcal{B}\) be the σ-field on \(\mathcal{X}\) and \(\phi ^{-1}(\mathcal{B}):=\{\phi ^{-1}(B);B \in \mathcal{B}\}\).

  1. 1.

    \(\mathcal{A}(\phi ) =\phi ^{-1}(\mathcal{B})\) .

  2. 2.

    Any \(\mathcal{A}(\phi )\) -measurable function \(g:\varOmega \rightarrow \mathbb{R}\) can be written as

    $$\displaystyle{g =\tilde{ g}\circ \phi }$$

    with \(\tilde{g}: \mathcal{X} \rightarrow \mathbb{R}\) a measurable function.

Proof

  • Proof of (1). First remark that \(\mathcal{A}(\phi )\) is the smallest sub-σ-field on Ω which makes ϕ a measurable function. Consequently, it is the σ-field which is generated by \(\phi ^{-1}(\mathcal{B})\). But it is easy to check that \(\phi ^{-1}(\mathcal{B})\) is a σ-field. Hence, \(\mathcal{A}(\phi ) =\phi ^{-1}(\mathcal{B})\).

  • Proof of (2). Let y ∈ g(Ω). As g is \(\mathcal{A}(\phi )\)-measurable, \(g^{-1}(y) \in \mathcal{A}(\phi )\). By (1), it follows that there exists a measurable subset \(B_{y} \subset \mathcal{X}\) such that \(\phi ^{-1}(B_{y}) = g^{-1}(y)\). Let us set

    $$\displaystyle{\tilde{g}(x) = y,\quad \mathrm{for\ all}\ x \in B_{y}.}$$

    For any ω ∈ g −1(y), we have ϕ(ω) ∈ B y , so that \(g(\omega ) = y =\tilde{ g}(\phi (\omega ))\). But (g −1(y)) y ∈ g(Ω) is a partition of Ω, hence \(g(\omega ) =\tilde{ g}(\phi (\omega ))\) for all ω ∈ Ω.

This proposition allows us to denote

$$\displaystyle{x \in \mathcal{X}\mapsto E_{R}(f\mid \phi = x) \in \mathbb{R}}$$

the unique function in \(L^{1}(\mathcal{X},R_{\phi })\) such that \(E_{R}(f\mid \phi =\phi (\eta )) = E_{R}(f\mid \mathcal{A}(\phi ))(\eta ),\ R\mathrm{-a.e.}\) in η.

Proposition 4

Let R ∈ M+ (Ω) be a bounded positive measure on Ω and \(\phi:\varOmega \rightarrow \mathcal{X}\) a measurable mapping into the Polish (separable, complete metric) space \(\mathcal{X}\) equipped with the corresponding Borel σ-field. Then, E R (⋅∣ϕ) admits a regular conditional probability kernel \(x \in \mathcal{X}\mapsto R(\cdot \mid \phi = x) \in \mathrm{ P}(\varOmega )\) .

Proof

This well-known and technically delicate result can be found in [1, Thm. 10.2.2].

In the setting of Proposition 4, the disintegration formula (18) is

$$\displaystyle{ R(d\omega ) =\int _{\mathcal{X}}R(d\omega \mid \phi = x)\,R_{\phi }(\mathit{dx}). }$$
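In a finite discrete setting, the kernel \(x\mapsto R(\cdot \mid \phi = x)\) is obtained by restricting and normalizing R on each fibre of ϕ, and the disintegration formula can be checked directly. The sketch below uses illustrative weights:

```python
# Toy setting: Omega a four-point space, R a positive measure, phi a map to {0, 1}.
Omega = ["a", "b", "c", "d"]
R = {"a": 0.4, "b": 1.1, "c": 0.7, "d": 0.8}
phi = {"a": 0, "b": 0, "c": 1, "d": 1}

Rphi = {x: sum(R[w] for w in Omega if phi[w] == x) for x in (0, 1)}
kernel = {x: {w: (R[w] / Rphi[x] if phi[w] == x else 0.0) for w in Omega}
          for x in (0, 1)}

# Each R(.|phi = x) is a probability measure on Omega ...
assert all(abs(sum(kernel[x].values()) - 1.0) < 1e-9 for x in (0, 1))
# ... and integrating the kernel against R_phi recovers R (disintegration formula).
for w in Omega:
    mixed = sum(kernel[x][w] * Rphi[x] for x in (0, 1))
    assert abs(mixed - R[w]) < 1e-9

# E_R(f|phi = x) is then the expectation of the kernel, a function on {0, 1}.
f = {"a": 2.0, "b": -1.0, "c": 0.5, "d": 3.0}
E_given = {x: sum(kernel[x][w] * f[w] for w in Omega) for x in (0, 1)}
assert abs(E_given[0] - (0.4 * 2.0 + 1.1 * (-1.0)) / 1.5) < 1e-9
```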

The main assumption for properly defining \(E_{R}(f\mid \mathcal{A})\) with f ∈ L 1(R) at Definition 6 is that \(R_{\mathcal{A}}\) is σ-finite. In the special case where \(\mathcal{A} = \mathcal{A}(\phi )\), it is equivalent to the following.

Assumption 1

The measure \(R_{\phi } \in \mathrm{ M}_{+}(\mathcal{X})\) is σ-finite.

Remark 2 (About this Assumption)

It is necessary that R is σ-finite for R ϕ to be σ-finite too. Indeed, if \((\mathcal{X}_{n})_{n\geq 1}\) is a σ-finite partition of \(R_{\phi },\ (\phi ^{-1}(\mathcal{X}_{n}))_{n\geq 1}\) is a countable measurable partition of Ω which satisfies \(R(\phi ^{-1}(\mathcal{X}_{n})) = R_{\phi }(\mathcal{X}_{n}) < \infty \) for all n. This means that it finitely partitions R.

5.3 Radon-Nikodym Derivative and Conditioning

In addition to the measurable mapping \(\phi:\varOmega \rightarrow \mathcal{X}\) and the positive measure R ∈ M+(Ω), let us introduce another positive measure P ∈ M+(Ω) which admits a Radon-Nikodym derivative with respect to R:  P ≺ R.

Proposition 5

Under Assumption  1, let us suppose that P is bounded and P ≺ R. Then,

  1. 1.

    We have P ϕ ≺ R ϕ and

    $$\displaystyle{\frac{\mathit{dP}_{\phi }} {\mathit{dR}_{\phi }}(\phi ) = E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right ),\quad R\mathrm{-a.e.}}$$
  2. 2.

    For any bounded measurable function f,

    $$\displaystyle{E_{P}(f\vert \phi )E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right ) = E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}f\mid \phi \right ),\quad R\mathrm{-a.e.}}$$
  3. 3.

    Furthermore,

    $$\displaystyle{E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right ) > 0,\quad P\mathrm{-a.e.}}$$

Remark 3

One might not have \(E_{R}(\frac{\mathit{dP}} {\mathit{dR}}\vert \phi ) > 0,\ R\mathrm{-a.e.}\)

Proof

As P is bounded, we have

$$\displaystyle{ \frac{\mathit{dP}} {\mathit{dR}} \in L^{1}(R) }$$

and we are allowed to consider \(E_{R}(\frac{\mathit{dP}} {\mathit{dR}}f\mid \phi )\) for any bounded measurable function f.

  • Proof of (1). For any bounded measurable function u on \(\mathcal{X}\),

    $$\displaystyle\begin{array}{rcl} E_{P_{\phi }}(u)& =& E_{P}(u(\phi )) = E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}u(\phi )\right ) {}\\ & =& E_{R}\left (u(\phi )E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right )\right ) = E_{R_{\phi }}\left (uE_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi = \cdot \right )\right ) {}\\ \end{array}$$
  • Proof of (2). For any bounded measurable functions f, h with \(h \in \mathcal{A}(\phi )\), we have

    $$\displaystyle\begin{array}{rcl} E_{P}(hf)& =& E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}hf\right ) = E_{R}\left (\mathit{hE}_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}f\mid \phi \right )\right )\quad \mathrm{and} {}\\ E_{P}(hf)& =& E_{P}(\mathit{hE}_{P}(f\mid \phi )) = E_{R}\left (\mathit{hE}_{P}\left (f\mid \phi \right )\frac{\mathit{dP}} {\mathit{dR}}\right ) {}\\ & =& E_{R}\left [\mathit{hE}_{P}(f\mid \phi )E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right )\right ]. {}\\ \end{array}$$

    The desired result follows by identifying the right-hand sides of these two chains of equalities.

  • Proof of (3). Let \(A \in \mathcal{A}(\phi )\) be such that \(\mathbf{1}_{A}E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right ) = 0,\ R\mathrm{-a.e.}\) Then,

    \(0 = E_{R}\left (\mathbf{1}_{A}E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mid \phi \right )\right ) = E_{R}\left (\frac{\mathit{dP}} {\mathit{dR}}\mathbf{1}_{A}\right ) = P(A)\). This proves the desired result.
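Statements (1) and (2) of Proposition 5 can be verified on a finite space, where all the conditional expectations are elementary weighted averages. The measures and functions below are illustrative assumptions:

```python
# Finite instance of Proposition 5: R positive, dens = dP/dR (P bounded, P << R),
# phi a map into X = {"u", "v"}.
Omega = [0, 1, 2, 3]
R = {0: 1.0, 1: 0.5, 2: 2.0, 3: 0.5}
dens = {0: 0.2, 1: 0.0, 2: 0.3, 3: 0.8}        # dP/dR
P = {w: dens[w] * R[w] for w in Omega}
phi = {0: "u", 1: "u", 2: "v", 3: "v"}
X = ("u", "v")

Rphi = {x: sum(R[w] for w in Omega if phi[w] == x) for x in X}
Pphi = {x: sum(P[w] for w in Omega if phi[w] == x) for x in X}

def cond(Q, f, x):
    # E_Q(f | phi = x): Q-weighted average of f on the fibre {phi = x}
    mass = sum(Q[w] for w in Omega if phi[w] == x)
    return sum(Q[w] * f[w] for w in Omega if phi[w] == x) / mass

# (1): dP_phi/dR_phi(x) = E_R(dP/dR | phi = x)
for x in X:
    assert abs(Pphi[x] / Rphi[x] - cond(R, dens, x)) < 1e-9

# (2): E_P(f|phi) E_R(dP/dR|phi) = E_R(f dP/dR | phi)
f = {0: 1.0, 1: -2.0, 2: 0.5, 3: 4.0}
fd = {w: f[w] * dens[w] for w in Omega}
for x in X:
    assert abs(cond(P, f, x) * cond(R, dens, x) - cond(R, fd, x)) < 1e-9
```

Statement (3) is visible here as well: E R (dP∕dR∣ϕ) vanishes on no fibre charged by P, although it could vanish on a fibre charged only by R.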