1 Definition of the Lebesgue Integral

In this section we revisit the familiar notion of mathematical expectation, but now we define it for general (not necessarily discrete) random variables. The notion of expectation is identical to the notion of the Lebesgue integral.

Let \(({\it\Omega}, \mathcal{F},\mu )\) be a measurable space with a finite measure. A measurable function is said to be simple if it takes a finite or countable number of values. The sum, product and quotient (when the denominator does not take the value zero) of two simple functions is again a simple function.

Theorem 3.1

Any non-negative measurable function f is a monotone limit from below of non-negative simple functions, that is \(f(\omega ) {=\lim }_{n\rightarrow \infty }{f}_{n}(\omega )\) for every \(\omega \), where \({f}_{n}\) are non-negative simple functions and \({f}_{n}(\omega ) \leq {f}_{n+1}(\omega )\) for every \(\omega \) . Moreover, if a function \(f\) is a limit of measurable functions for all \(\omega \), then f is measurable.

Proof

Let \({f}_{n}\) be defined by the relations

$${f}_{n}(\omega ) = k{2}^{-n}\ \ \mathrm{if}\ \ k{2}^{-n} \leq f(\omega ) < (k + 1){2}^{-n},\ k = 0,1,\ldots. $$

The sequence \({f}_{n}\) satisfies the requirements of the theorem.

We now prove the second statement. Given a function f which is the limit of measurable functions \({f}_{n}\), consider the subsets \(A \subseteq \mathbb{R}\) for which \({f}^{-1}(A) \in \mathcal{F}\). It is easy to see that these subsets form a \(\sigma \)-algebra, which we shall denote by \({\mathcal{R}}_{f}\). Let us prove that the open intervals \({A}_{t} = (-\infty, t)\) belong to \({\mathcal{R}}_{f}\). Indeed, it is easy to check the following relation:

$${f}^{-1}({A}_{t}) ={\bigcup \limits_{k}}{\bigcup \limits_{m}}{\bigcap \limits_{n\geq m}}\{\omega : {f}_{n}(\omega) < t -\frac{1} {k}\}. $$

Since \({f}_{n}\) are measurable, the sets \(\{\omega : {f}_{n}(\omega ) < t -\frac{1} {k}\}\) belong to \(\mathcal{F}\), and therefore \({f}^{-1}({A}_{t}) \in \mathcal{F}\). Since the smallest \(\sigma \)-algebra which contains all \({A}_{t}\) is the Borel \(\sigma \)-algebra on \(\mathbb{R}\), \({f}^{-1}(A) \in \mathcal{F}\) for any Borel set A of the real line. This completes the proof of the theorem.
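The dyadic construction used in the first part of the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper `dyadic_approx` and the sample value \(f(\omega ) = \pi \) are assumptions introduced only for illustration.

```python
import math

def dyadic_approx(value: float, n: int) -> float:
    """Return k * 2**-n, where k is the integer with k * 2**-n <= value < (k + 1) * 2**-n."""
    if value < 0:
        raise ValueError("the construction assumes a non-negative function value")
    k = math.floor(value * 2**n)
    return k / 2**n

# Hypothetical sample value f(omega) = pi, used only for illustration.
f_omega = math.pi
approximations = [dyadic_approx(f_omega, n) for n in range(8)]

for n, a in enumerate(approximations):
    print(n, a, f_omega - a)
```

For a fixed \(\omega \) the printed values are non-decreasing in n and differ from \(f(\omega )\) by less than \({2}^{-n}\), which is the monotone convergence from below claimed in the theorem.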

We now introduce the Lebesgue integral of a measurable function. When f is measurable and the measure is a probability measure, we refer to the integral as the expectation of the random variable, and denote it by \(\mathrm{E}f\).

We start with the case of a simple function. Let f be a simple function taking non-negative values, which we denote by \({a}_{1},{a}_{2},\ldots \). Let us define the events \({C}_{i} =\{\omega : f(\omega ) = {a}_{i}\}\).

Definition 3.2

The sum of the series \({\sum }_{i=1}^{\infty }{a}_{i}\mu ({C}_{i})\), provided that the series converges, is called the Lebesgue integral of the function f. It is denoted by \({\int }_{{\it\Omega} }fd\mu \) . If the series diverges, then it is said that the integral is equal to plus infinity.

It is clear that the sum of the series does not depend on the order of summation. The following lemma is clear.

Lemma 3.3

The integral of a simple non-negative function has the following properties.

  1. \({\int }_{{\it\Omega} }fd\mu \geq 0\).

  2. \({\int }_{{\it\Omega} }{\chi }_{{\it\Omega} }d\mu = \mu ({\it\Omega} )\), where \({\chi }_{{\it\Omega} }\) is the function identically equal to 1 on \({\it\Omega} \).

  3. \({\int }_{{\it\Omega} }(a{f}_{1} + b{f}_{2})d\mu = a{\int }_{{\it\Omega} }{f}_{1}d\mu + b{\int }_{{\it\Omega} }{f}_{2}d\mu \) for any \(a,b > 0\).

  4. \({\int }_{{\it\Omega} }{f}_{1}d\mu \geq {\int }_{{\it\Omega} }{f}_{2}d\mu \) if \({f}_{1} \geq {f}_{2} \geq 0\).
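For a simple function with finitely many values, Definition 3.2 and the properties in Lemma 3.3 can be verified directly. The sketch below assumes a hypothetical four-point measure space with point masses; the names `mu`, `integrate_simple`, `f1` and `f2` are illustrative and do not come from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure space: Omega = {0, 1, 2, 3} with point masses mu({omega}).
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def integrate_simple(f, mu):
    """Return sum_i a_i * mu(C_i), where C_i = {omega : f(omega) = a_i}."""
    level_sets = defaultdict(Fraction)  # maps a value a_i to mu(C_i)
    for omega, mass in mu.items():
        level_sets[f(omega)] += mass
    return sum(a * m for a, m in level_sets.items())

f1 = lambda omega: Fraction(omega % 2)  # takes the values 0 and 1
f2 = lambda omega: Fraction(omega)      # takes the values 0, 1, 2, 3; f2 >= f1 >= 0

int_f1 = integrate_simple(f1, mu)
int_f2 = integrate_simple(f2, mu)
int_combo = integrate_simple(lambda w: 2 * f1(w) + 3 * f2(w), mu)

print(int_f1, int_f2, int_combo)
```

Exact rational arithmetic is used so that linearity (Property 3) and monotonicity (Property 4) can be compared without rounding error.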

Now let f be an arbitrary measurable function taking non-negative values. We consider a sequence \({f}_{n}\) of non-negative simple functions which converge monotonically to f from below. It follows from the fourth property of the Lebesgue integral that the sequence \({\int }_{{\it\Omega} }{f}_{n}d\mu \) is non-decreasing and there exists a limit \(\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \), which is possibly infinite.

Theorem 3.4

Let f and \({f}_{n}\) be as above. Then the value of \(\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \) does not depend on the choice of the approximating sequence.

We first establish the following lemma.

Lemma 3.5

Let \(g \geq 0\) be a simple function such that \(g \leq f\) . Assume that \(f {=\lim }_{n\rightarrow \infty }{f}_{n}\), where \({f}_{n}\) are non-negative simple functions such that \({f}_{n+1} \geq {f}_{n}\) . Then \({\int }_{{\it\Omega} }gd\mu {\leq \lim }_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \).

Proof

Take an arbitrary \(\epsilon > 0\) and set \({C}_{n} =\{\omega : {f}_{n}(\omega ) - g(\omega ) > -\epsilon \}.\) It follows from the monotonicity of \({f}_{n}\) that \({C}_{n} \subseteq {C}_{n+1}\). Since \({f}_{n} \uparrow f\) and \(f \geq g\), we have \({\bigcup }_{n}{C}_{n} = {\it\Omega} \). Therefore, \(\mu ({C}_{n}) \rightarrow \mu ({\it\Omega} )\) as \(n \rightarrow \infty \). Let \({\chi }_{{C}_{n}}\) be the indicator function of the set \({C}_{n}\). Then \({g}_{n} = g{\chi }_{{C}_{n}}\) is a simple function and \({g}_{n} \leq {f}_{n} + \epsilon \). Therefore, by the monotonicity of \({\int }_{{\it\Omega} }{f}_{n}d\mu \),

$${\int }_{{\it\Omega} }{g}_{n}d\mu \leq {\int }_{{\it\Omega} }{f}_{n}d\mu + \epsilon \mu ({\it\Omega} ), \ \ {\int }_{{\it\Omega} }{g}_{n}d\mu {\leq \lim }_{m\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{m}d\mu + \epsilon \mu ({\it\Omega} ). $$

Since \(\epsilon \) is arbitrary, we obtain \({\int }_{{\it\Omega} }{g}_{n}d\mu {\leq \lim }_{m\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{m}d\mu \). It remains to prove that \(\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{g}_{n}d\mu ={ \int }_{{\it\Omega} }gd\mu \).

We denote by \({b}_{1},{b}_{2},\ldots \) the values of the function g, and by \({B}_{i}\) the set where the value \({b}_{i}\) is taken, \(i = 1,2,\ldots \). Then

$${\int}_{{\it\Omega}}gd\mu ={ \sum \limits_{i}}{b}_{i}\mu ({B}_{i}),\ \ {\int }_{\it\Omega}{g}_{n}d\mu ={ \sum \limits_{i}}{b}_{i}\mu ({B}_{i} \bigcap {C}_{n}).$$

It is clear that for all i we have \(\lim_{n\rightarrow \infty }\mu ({B}_{i} \bigcap {C}_{n}) = \mu ({B}_{i})\). Since the series above consists of non-negative terms and the convergence is monotonic for each i, we have

$${\lim \limits_{n\rightarrow \infty }}{\int }_{{\it\Omega} }{g}_{n}d\mu = {\lim \limits_{n\rightarrow \infty}}{\sum }_{i}{b}_{i}\mu ({B}_{i} \bigcap {C}_{n})$$
$$={ \sum \limits_{i}}{b}_{i}{\lim \limits_{n\rightarrow \infty }}\mu ({B}_{i} \bigcap {C}_{n}) ={ \sum \limits_{i}}{b}_{i}\mu ({B}_{i}) ={ \int }_{{\it\Omega} }gd\mu.$$

This completes the proof of the lemma.

It is now easy to prove the independence of \(\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \) from the choice of the approximating sequence.

Proof of Theorem 3.4. Let there be two sequences \({f}_{n}^{(1)}\) and \({f}_{n}^{(2)}\) such that \({f}_{n+1}^{(1)} \geq {f}_{n}^{(1)}\) and \({f}_{n+1}^{(2)} \geq {f}_{n}^{(2)}\) for all n, and

$${\lim \limits_{n\rightarrow \infty}}{f}_{n}^{(1)}(\omega ) ={\lim \limits_{ n\rightarrow \infty} }{f}_{n}^{(2)}(\omega ) = f(\omega )\ \ \mathrm{for}\ \ \mathrm{every}\ \omega. $$

It follows from Lemma 3.5 that for any k,

$${\int}_{{\it\Omega} }{f}_{k}^{(1)}d\mu {\leq {\lim \limits_{ n\rightarrow \infty}}} {\int}_{{\it\Omega}}{f}_{n}^{(2)}d\mu, $$

and therefore,

$${\lim \limits_{n\rightarrow \infty}}{\int }_{{\it\Omega} }{f}_{n}^{(1)}d\mu {\leq {\lim \limits_{ n\rightarrow \infty}}}{\int }_{{\it\Omega} }{f}_{n}^{(2)}d\mu. $$

We obtain

$${\lim \limits_{n\rightarrow \infty }}{\int }_{{\it\Omega} }{f}_{n}^{(1)}d\mu {\geq {\lim \limits_{ n\rightarrow \infty}}}{\int }_{{\it\Omega} }{f}_{n}^{(2)}d\mu $$

by interchanging \({f}_{n}^{(1)}\) and \({f}_{n}^{(2)}\). Therefore,

$${\lim \limits_{n\rightarrow \infty }}{\int}_{{\it\Omega} }{f}_{n}^{(1)}d\mu ={\lim \limits_{ n\rightarrow \infty }}{\int}_{{\it\Omega} }{f}_{n}^{(2)}d\mu. $$

Definition 3.6

Let f be a non-negative measurable function and \({f}_{n}\) a sequence of non-negative simple functions which converge monotonically to f from below. The limit \(\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \) is called the Lebesgue integral of the function f. It is denoted by \({\int }_{{\it\Omega} }fd\mu \).

In the case of a simple function f, this definition agrees with the definition of the integral for a simple function, since we can take \({f}_{n} = f\) for all n.

Now let f be an arbitrary (not necessarily non-negative) measurable function. We introduce the indicator functions:

$${\chi }_{+}(\omega ) = \left\{\begin{array}{ll} 1 & \mathrm{if}\ f(\omega ) \geq 0,\\ 0 & \mathrm{if}\ f(\omega ) < 0, \end{array} \right. $$
$${\chi }_{-}(\omega ) = \left\{\begin{array}{ll} 1 & \mathrm{if}\ f(\omega ) < 0,\\ 0 & \mathrm{if}\ f(\omega ) \geq 0. \end{array} \right. $$

Then \({\chi }_{+}(\omega ) + {\chi }_{-}(\omega ) \equiv 1\), \(f = f{\chi }_{+} + f{\chi }_{-} = {f}_{+} - {f}_{-}\), where \({f}_{+} = f{\chi }_{+}\) and \({f}_{-} = -f{\chi }_{-}\). Moreover, \({f}_{+} \geq 0\), \({f}_{-}\geq 0\), so the integrals \({\int }_{{\it\Omega} }{f}_{+}d\mu \) and \({\int }_{{\it\Omega} }{f}_{-}d\mu \) have already been defined.

Definition 3.7

The function f is said to be integrable if \({\int }_{{\it\Omega} }{f}_{+}d\mu < \infty \) and \({\int }_{{\it\Omega} }{f}_{-}d\mu < \infty \) . In this case the integral is equal to \({\int }_{\Omega} fd\mu ={ \int }_{{\it\Omega} }{f}_{+}d\mu -{\int }_{{\it\Omega} }{f}_{-}d\mu \) . If \({\int }_{{\it\Omega} }{f}_{+}d\mu = \infty \) and \({\int }_{{\it\Omega} }{f}_{-}d\mu < \infty \) ( \({\int }_{{\it\Omega} }{f}_{+}d\mu < \infty \), \({\int }_{{\it\Omega} }{f}_{-}d\mu = \infty \) ), then \({\int }_{{\it\Omega} }fd\mu = \infty \) ( \({\int }_{{\it\Omega} }fd\mu = -\infty \) ). If \({\int }_{{\it\Omega} }{f}_{+}d\mu ={ \int }_{{\it\Omega} }{f}_{-}d\mu = \infty \), then \({\int }_{{\it\Omega} }fd\mu \) is not defined.

Since \(\vert f\vert = {f}_{+} + {f}_{-}\), we have \({\int }_{{\it\Omega} }\vert f\vert d\mu ={ \int }_{{\it\Omega} }{f}_{+}d\mu +{ \int }_{{\it\Omega} }{f}_{-}d\mu \), and so \({\int }_{{\it\Omega} }fd\mu \) is finite if and only if \({\int }_{{\it\Omega} }\vert f\vert d\mu \) is finite. The integral has Properties 2–4 listed in Lemma 3.3.
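The decomposition \(f = {f}_{+} - {f}_{-}\) and the identity for \(\vert f\vert \) can be illustrated on a finite space. This is a sketch under stated assumptions: the three-point space, the masses in `mu` and the values of `f` are hypothetical and chosen only to make the arithmetic transparent.

```python
from fractions import Fraction

# Hypothetical finite measure space and function, chosen only for illustration.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": Fraction(3), "b": Fraction(-6), "c": Fraction(0)}

def integral(g, mu):
    """Integral of a function on a finite space with point masses."""
    return sum(g[w] * mu[w] for w in mu)

f_plus = {w: max(v, Fraction(0)) for w, v in f.items()}    # f * chi_+
f_minus = {w: max(-v, Fraction(0)) for w, v in f.items()}  # -f * chi_-
abs_f = {w: abs(v) for w, v in f.items()}

int_f = integral(f_plus, mu) - integral(f_minus, mu)       # Definition 3.7
int_abs_f = integral(f_plus, mu) + integral(f_minus, mu)

print(int_f, int_abs_f)
```

Here both \({\int }_{{\it\Omega} }{f}_{+}d\mu \) and \({\int }_{{\it\Omega} }{f}_{-}d\mu \) are finite, so f is integrable in the sense of Definition 3.7.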

Let \(A \in \mathcal{F}\) be a measurable set and f a measurable function on \(({\it\Omega}, \mathcal{F},\mu )\). We can define the integral of f over the set A (which is a subset of \({\it\Omega} \)) in two equivalent ways. One way is to define

$${\int }_{A}fd\mu ={ \int }_{{\it\Omega} }f{\chi }_{A}d\mu, $$

where \({\chi }_{A}\) is the indicator function of the set A. Another way is to consider the restriction of μ from Ω to A. Namely, we consider the new σ-algebra \({\mathcal{F}}_{A}\), which contains all the measurable subsets of A, and the new measure \({\mu }_{A}\) on \({\mathcal{F}}_{A}\), which agrees with μ on all the sets from \({\mathcal{F}}_{A}\). Then \((A,{\mathcal{F}}_{A})\) is a measurable space with a measure \({\mu }_{A}\), and we can define

$${\int }_{A}fd\mu ={ \int }_{A}fd{\mu }_{A}.$$

It can easily be seen that the above two definitions lead to the same notion of the integral over a measurable set.

Let us note another important property of the Lebesgue integral: it is a \(\sigma \)-additive function on \(\mathcal{F}\). Namely, let \(A ={ \bigcup }_{i=1}^{\infty }{A}_{i}\), where \({A}_{1},{A}_{2},\ldots \) are measurable sets such that \({A}_{i} \cap {A}_{j} = \emptyset \) for \(i\neq j\). Let \(f\) be a measurable function such that \({\int }_{A}fd\mu \) is finite. Then

$${\int }_{A}fd\mu ={ \sum \limits_{i=1}^{\infty }}{\int}_{{A}_{i}}fd\mu. $$

To justify this statement we can first consider f to be a non-negative simple function. Then the \(\sigma \)-additivity follows from the fact that in an infinite series with non-negative terms the terms can be re-arranged. For an arbitrary non-negative measurable f we use the definition of the integral as a limit of integrals of simple functions. For f which is not necessarily non-negative, we use Definition 3.7.

If f is a non-negative function, the \(\sigma \)-additivity of the integral implies that the function \(\eta (A) ={ \int }_{A}fd\mu \) is itself a measure.

The mathematical expectation (which is the same as the Lebesgue integral over a probability space) has all the properties described in Chap. 1. In particular

  1. \(\mathrm{E}\xi \geq 0\) if \(\xi \geq 0\).

  2. \(\mathrm{E}{\chi }_{{\it\Omega} } = 1\), where \({\chi }_{{\it\Omega} }\) is the random variable identically equal to 1 on \({\it\Omega} \).

  3. \(\mathrm{E}(a{\xi }_{1} + b{\xi }_{2}) = a\mathrm{E}{\xi }_{1} + b\mathrm{E}{\xi }_{2}\) if \(\mathrm{E}{\xi }_{1}\) and \(\mathrm{E}{\xi }_{2}\) are finite.

The variance of the random variable \(\xi \) is defined as \(\mathrm{E}{(\xi -\mathrm{ E}\xi )}^{2}\), and the n-th order moment is defined as \(\mathrm{E}{\xi }^{n}\). Given two random variables \({\xi }_{1}\) and \({\xi }_{2}\), their covariance is defined as \(\mathrm{Cov}({\xi }_{1},{\xi }_{2}) =\mathrm{ E}({\xi }_{1} -\mathrm{ E}{\xi }_{1})({\xi }_{2} -\mathrm{ E}{\xi }_{2})\). The correlation coefficient of two random variables \({\xi }_{1},{\xi }_{2}\) is defined as \(\rho ({\xi }_{1},{\xi }_{2}) =\mathrm{ Cov}({\xi }_{1},{\xi }_{2})/\sqrt{\mathrm{Var }{\xi }_{1 } \mathrm{Var }{\xi }_{2}}\).
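The definitions of variance, covariance and correlation coefficient can be evaluated directly on a finite probability space. The sketch below assumes a hypothetical space of two fair coin tosses; the names `P`, `E`, `var`, `cov`, `corr`, `first`, `second` and `total` are illustrative only.

```python
import math
from fractions import Fraction

# Hypothetical probability space: two fair coin tosses, each outcome has probability 1/4.
P = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

def E(xi):
    """Expectation of a random variable xi on the finite space P."""
    return sum(xi(w) * p for w, p in P.items())

def var(xi):
    m = E(xi)
    return E(lambda w: (xi(w) - m) ** 2)

def cov(xi1, xi2):
    m1, m2 = E(xi1), E(xi2)
    return E(lambda w: (xi1(w) - m1) * (xi2(w) - m2))

def corr(xi1, xi2):
    # The square root is irrational in general, so floating point is used here.
    return float(cov(xi1, xi2)) / math.sqrt(float(var(xi1) * var(xi2)))

first = lambda w: w[0]          # result of the first toss
second = lambda w: w[1]         # result of the second toss
total = lambda w: w[0] + w[1]   # number of ones

print(var(first), var(total), cov(first, total), corr(first, total))
```

In this example the two tosses have zero covariance, while the first toss and the total are positively correlated.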

2 Induced Measures and Distribution Functions

Given a probability space \(({\it\Omega}, \mathcal{F},\mathrm{P})\), a measurable space \((\widetilde{\it \Omega},\widetilde{\mathcal{F}})\) and a measurable function \(f : {\it\Omega} \rightarrow \widetilde{ {\it\Omega} }\), we can define the induced probability measure \(\widetilde{\mathrm{P}}\) on the \(\sigma \)-algebra \(\widetilde{\mathcal{F}}\) via the formula

$$\widetilde{\mathrm{P}}(A) =\mathrm{ P}({f}^{-1}(A))\ \ \mathrm{for}\ A \in \widetilde{\mathcal{F}}.$$

Clearly \(\widetilde{\mathrm{P}}(A)\) satisfies the definition of a probability measure. The following theorem states that the change of variable is permitted in the Lebesgue integral.

Theorem 3.8

Let \(g :\widetilde{ {\it\Omega} } \rightarrow \mathbb{R}\) be a random variable. Then

$${\int}_{{\it\Omega}} g(f(\omega ))d\mathrm{P}(\omega ) ={ \int}_{\widetilde{\it\Omega}} g(\widetilde{\omega }) d\widetilde{\mathrm{P}}(\widetilde{\omega }).$$

The integral on the right-hand side is defined if and only if the integral on the left-hand side is defined.

Proof

Without loss of generality we can assume that g is non-negative. When g is a simple function, the theorem follows from the definition of the induced measure. For an arbitrary measurable function it suffices to note that any such function is a limit of a non-decreasing sequence of simple functions.
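In the discrete case the change of variable formula of Theorem 3.8 reduces to regrouping a finite sum. The following sketch assumes a hypothetical probability space (one roll of a fair die) and hypothetical maps `f` and `g`; none of these come from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical probability space: one roll of a fair die.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}

f = lambda omega: omega % 3  # measurable map from Omega into {0, 1, 2}
g = lambda w: w * w          # random variable on the image space

# Induced measure: P_tilde({w}) = P(f^{-1}({w})).
P_tilde = defaultdict(Fraction)
for omega, p in P.items():
    P_tilde[f(omega)] += p

lhs = sum(g(f(omega)) * p for omega, p in P.items())  # integral over Omega
rhs = sum(g(w) * p for w, p in P_tilde.items())       # integral over the image space

print(lhs, rhs)
```

Both sums agree, since the right-hand side only collects the terms of the left-hand side according to the value of f.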

Let us examine once again the relationship between the random variables and their distribution functions. Consider the collection of all intervals:

$$\mathcal{I} =\{ (a,b),[a,b),(a,b],[a,b],\ \mathrm{where}\ -\infty \leq a \leq b \leq \infty \}.$$

Let \(m : \mathcal{I}\rightarrow \mathbb{R}\) be a \(\sigma \)-additive nonnegative function, that is

  1. \(m(I) \geq 0\) for any \(I \in \mathcal{I}\).

  2. If \(I,{I}_{i} \in \mathcal{I},\ i = 1,2,\ldots \), \({I}_{i} \bigcap {I}_{j} = \emptyset \) for \(i\neq j\), and \(I ={ \bigcup }_{i=1}^{\infty }{I}_{i}\), then

    $$m(I) ={ \sum \limits_{i=1}^{\infty }}m({I}_{ i}).$$

Although m is \(\sigma \)-additive, as required of a measure, it is not truly a measure since it is defined on the collection of intervals, which is not a \(\sigma \)-algebra.

We shall need the following theorem (a particular case of the theorem on the extension of a measure discussed in Sect. 3.4).

Theorem 3.9

Let m be a σ-additive function satisfying conditions 1 and 2. Then there is a unique measure μ defined on the σ-algebra of Borel sets of the real line, which agrees with m on all the intervals, that is \(\mu (I) = m(I)\) for each \(I \in \mathcal{I}\).

Consider the following three examples, which illustrate how a measure can be constructed given its values on the intervals.

Example

Let \(F(x)\) be a distribution function. We define

$$m((a,b]) = F(b) - F(a),\ m([a,b]) = F(b) {-\lim \limits_{t\uparrow a}}\ F(t),$$
$$m((a,b)) ={\lim \limits_{t\uparrow b}}\ F(t) - F(a),\ m([a,b)) ={\lim \limits_{t\uparrow b}}\ F(t) {-\lim \limits_{t\uparrow a}}\ F(t).$$

Let us check that m is a σ-additive function. Let \(I,{I}_{i}\), \(i = 1,2,\ldots \) be intervals of the real line (open, half-open, or closed) such that \(I\,=\,{\bigcup }_{i=1}^{\infty }{I}_{i}\) and \({I}_{i} \bigcap {I}_{j}\,=\,\emptyset \) if \(i\neq j\). We need to check that

$$m(I) ={ \sum \limits_{i=1}^{\infty }}m({I}_{ i}). \tag{3.1}$$

It is clear that \(m(I) \geq {\sum }_{i=1}^{n}m({I}_{i})\) for each n, since the intervals \({I}_{i}\) do not intersect. Therefore, \(m(I) \geq {\sum }_{i=1}^{\infty }m({I}_{i})\).

In order to prove the opposite inequality, we assume that an arbitrary \(\epsilon > 0\) is given. Consider a collection of intervals \(J,{J}_{i}\), \(i = 1,2,\ldots \) which are constructed as follows. The interval J is a closed interval, which is contained in I and satisfies \(m(J) \geq m(I) - \epsilon /2\). (In particular, if I is closed we can take \(J = I\).) Let \({J}_{i}\) be an open interval, which contains \({I}_{i}\) and satisfies \(m({J}_{i}) \leq m({I}_{i}) + \epsilon /{2}^{i+1}\). The fact that it is possible to select such intervals J and \({J}_{i}\) follows from the definition of the function m and the continuity from the right of the function F. Note that \(J \subseteq {\bigcup }_{i=1}^{\infty }{J}_{i}\), J is compact, and all \({J}_{i}\) are open. Therefore, \(J \subseteq {\bigcup }_{i=1}^{n}{J}_{i}\) for some n. Clearly \(m(J) \leq {\sum }_{i=1}^{n}m({J}_{i})\). Since \(m(I) \leq m(J) + \epsilon /2\) and \({\sum }_{i=1}^{n}\epsilon /{2}^{i+1} \leq \epsilon /2\), it follows that \(m(I) \leq {\sum }_{i=1}^{n}m({I}_{i}) + \epsilon \). Since \(\epsilon \) is arbitrary, we obtain \(m(I) \leq {\sum }_{i=1}^{\infty }m({I}_{i})\). Therefore, (3.1) holds, and m is a σ-additive function.

Thus any distribution function gives rise to a probability measure on the Borel σ-algebra of the real line. This measure will be denoted by \({\mu }_{F}\). Sometimes, instead of writing \(d{\mu }_{F}\) in the integral with respect to such a measure, we shall write dF.

Conversely, any probability measure μ on the Borel sets of the real line defines a distribution function via the formula \(F(x) = \mu ((-\infty, x])\). Thus there is a one-to-one correspondence between probability measures on the real line and distribution functions.
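The four formulas defining m on intervals in terms of F and its left limits can be tried out on a distribution function with jumps. The sketch below assumes a hypothetical discrete distribution with three atoms; the names `atoms`, `F`, `F_left` and `m` are illustrative, and the left limit is computed from the atoms rather than as a numerical limit.

```python
from fractions import Fraction

# Hypothetical discrete distribution: atoms and their masses (summing to 1).
atoms = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = mu((-inf, x])."""
    return sum(p for a, p in atoms.items() if a <= x)

def F_left(x):
    """Left limit of F at x, which equals mu((-inf, x))."""
    return sum(p for a, p in atoms.items() if a < x)

def m(a, b, closed_left, closed_right):
    """Value of m on a non-empty interval with end points a <= b."""
    upper = F(b) if closed_right else F_left(b)
    lower = F_left(a) if closed_left else F(a)
    return upper - lower

print(m(0, 1, False, True))   # m((0, 1])
print(m(0, 1, True, True))    # m([0, 1])
print(m(0, 1, False, False))  # m((0, 1))
print(m(0, 1, True, False))   # m([0, 1))
```

The four values differ only through the atoms at the end points 0 and 1, which is why the left limits of F appear in the definition.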

Remark 3.10

Similarly, there is a one-to-one correspondence between the distribution functions on \({\mathbb{R}}^{n}\) and the probability measures on the Borel sets of \({\mathbb{R}}^{n}\). Namely, the distribution function F corresponding to a measure μ is defined by \(F({x}_{1},\ldots, {x}_{n}) = \mu ((-\infty, {x}_{1}] \times \ldots \times (-\infty, {x}_{n}])\).

Example

Let f be a function defined on an interval [a, b] of the real line. Let \(\sigma = \{ {t}_{0},{t}_{1},\ldots, {t}_{n}\}\), with \(a = {t}_{0} \leq {t}_{1} \leq \ldots \leq {t}_{n} = b\), be a partition of the interval \([a,b]\) into n subintervals. We denote the length of the largest interval by \(\delta (\sigma ) {=\max }_{1\leq i\leq n}({t}_{i} - {t}_{i-1})\). The p-th variation (with \(p > 0\)) of f over the partition \(\sigma \) is defined as

$${V }_{[a,b]}^{p}(f,\sigma ) ={\sum \limits_{i=1}^{n}}\vert f({t}_{ i}) - f({t}_{i-1}){\vert }^{p}.$$

Definition 3.11

The following limit

$${V }_{[a,b]}^{p}(f) = {\mathop {\rm lim\ sup} \limits_{\delta (\sigma )\rightarrow 0}}\ {V }_{[a,b]}^{p}(f,\sigma ),$$

is referred to as the p-th total variation of f over the interval [a,b].
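The p-th variation over a given partition is a finite sum and is easy to compute. The sketch below evaluates it on uniform partitions only, so it does not compute the lim sup over all partitions in Definition 3.11; the function \(f(t) = {t}^{2}\) on [0, 1] and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def p_variation(f, partition, p):
    """V^p(f, sigma): sum of |f(t_i) - f(t_{i-1})|**p over consecutive partition points."""
    return sum(abs(f(t1) - f(t0)) ** p for t0, t1 in zip(partition, partition[1:]))

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]

square = lambda t: t * t  # hypothetical example function, non-decreasing on [0, 1]

sigma = uniform_partition(0, 1, 10)
v1 = p_variation(square, sigma, 1)
v2 = p_variation(square, sigma, 2)
v2_fine = p_variation(square, uniform_partition(0, 1, 100), 2)

print(v1, v2, v2_fine)
```

For a non-decreasing function the first variation over any partition telescopes to \(f(b) - f(a)\), while for this smooth example the second variation decreases as the partition is refined.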

Now let f be a continuous function with finite first (p = 1) total variation defined on an interval [a, b] of the real line. Then it can be represented as a difference of two continuous non-decreasing functions, namely,

$$f(x) = {V }_{[a,x]}^{1}(f) - ({V }_{ [a,x]}^{1}(f) - f(x)) = {F}_{ 1}(x) - {F}_{2}(x).$$

Now we can repeat the construction used in the previous example to define the measures \({\mu }_{{F}_{1}}\) and \({\mu }_{{F}_{2}}\) on the Borel subsets of [a, b]. Namely, we can define

$${m}_{i}((x,y]) = {m}_{i}([x,y]) = {m}_{i}((x,y)) = {m}_{i}([x,y)) = {F}_{i}(y) - {F}_{i}(x),\ \ i = 1,2,$$

and then extend \({m}_{i}\) to the measure \({\mu }_{{F}_{i}}\) using Theorem 3.9. The difference \({\mu }_{f} = {\mu }_{{F}_{1}} - {\mu }_{{F}_{2}}\) is then a signed measure (see Sect. 3.6). If g is a Borel-measurable function on [a, b], its integral with respect to the signed measure \({\mu }_{f}\), denoted by \({\int }_{a}^{b}g(x)df(x)\) or \({\int }_{a}^{b}g(x)d{\mu }_{f}(x)\), is defined as the difference of the integrals with respect to the measures \({\mu }_{{F}_{1}}\) and \({\mu }_{{F}_{2}}\),

$${\int }_{a}^{b}g(x)df(x) ={ \int }_{a}^{b}g(x)d{\mu }_{{ F}_{1}}(x) -{\int }_{a}^{b}g(x)d{\mu }_{{ F}_{2}}(x).$$

It is called the Lebesgue-Stieltjes integral of g with respect to f.

Example

For an interval I, let \({I}_{n} = I \cap [-n,n]\). Define \({m}_{n}(I)\) as the length of \({I}_{n}\). As in the first example, \({m}_{n}\) is a \(\sigma \)-additive function. Thus \({m}_{n}\) gives rise to a measure on the Borel sets of the real line, which will be denoted by \({\lambda }_{n}\) and referred to as the Lebesgue measure on the segment \([-n,n]\). Now for any Borel set A of the real line we can define its Lebesgue measure \(\lambda (A)\) via \(\lambda (A) {=\lim }_{n\rightarrow \infty }{\lambda }_{n}(A)\). It is easily checked that \(\lambda \) is a \(\sigma \)-additive measure which, however, may take infinite values for unbounded sets A.
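For intervals, the truncation used in this example can be written out explicitly. The sketch below is an illustration only: the helper `m_n` is a hypothetical name, and it handles intervals given by their end points rather than general Borel sets.

```python
import math

def m_n(a, b, n):
    """Length of I intersected with [-n, n], where I is an interval with end points a <= b."""
    lo, hi = max(a, -n), min(b, n)
    return max(hi - lo, 0)

bounded = [m_n(1, 3, n) for n in range(1, 6)]           # I = [1, 3]
unbounded = [m_n(0, math.inf, n) for n in range(1, 6)]  # I = [0, infinity)

print(bounded)
print(unbounded)
```

For the bounded interval the values stabilize at its length, while for the unbounded interval they grow without bound, in agreement with \(\lambda \) taking infinite values on some unbounded sets.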

Remark 3.12

The Lebesgue measure on the real line is an example of a \(\sigma \)-finite measure. We now give the formal definition of a \(\sigma \)-finite measure, although most of the measures that we deal with in this book are finite (probability) measures. An integral with respect to a \(\sigma \)-finite measure can be defined in the same way as an integral with respect to a finite measure.

Definition 3.13

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space. A \(\sigma \) -finite measure is a function \(\mu \), defined on \(\mathcal{F}\) with values in \([0,\infty ]\), which satisfies the following conditions.

  1. There is a sequence of measurable sets \({{\it\Omega} }_{1} \subseteq {{\it\Omega} }_{2} \subseteq \ldots \subseteq {\it\Omega} \) such that \(\mu ({{\it\Omega} }_{i}) < \infty \) for all i, and \({\bigcup }_{i=1}^{\infty }{{\it\Omega} }_{i} = {\it\Omega} \).

  2. If \({C}_{i} \in \mathcal{F},\ i = 1,2,\ldots \) and \({C}_{i} \bigcap {C}_{j} = \emptyset \) for \(i\neq j\), then

    $$\mu ({\bigcup \limits_{i=1}^{\infty }}{C}_{ i}) ={\sum \limits_{i=1}^{\infty} }\mu ({C}_{ i}). $$

If \({F}_{\xi }\) is the distribution function of a random variable \(\xi \), then the measure \({\mu }_{{F}_{\xi }}\) (also denoted by \({\mu }_{\xi }\)) coincides with the measure induced by the random variable \(\xi \). Indeed, the values of the induced measure and of \({\mu }_{\xi }\) coincide on the intervals, and therefore on all the Borel sets due to the uniqueness part of Theorem 3.9.

Theorem 3.8, together with the fact that \({\mu }_{\xi }\) coincides with the induced measure, implies the following.

Theorem 3.14

Let \(\xi \) be a random variable and g be a Borel measurable function on \(\mathbb{R}\) . Then

$$\mathrm{E}g(\xi ) ={ \int }_{-\infty }^{\infty }g(x)d{F}_{ \xi }(x).$$

Applying this theorem to the functions \(g(x) = x\), \(g(x) = {x}^{p}\) and \(g(x) = {(x -\mathrm{ E}\xi )}^{2}\), we obtain the following.

Corollary 3.15

$$\mathrm{E}\xi ={ \int }_{-\infty }^{\infty }xd{F}_{ \xi }(x),\ \mathrm{E}{\xi }^{p} ={ \int }_{-\infty }^{\infty }{x}^{p}d{F}_{ \xi }(x),\ \mathrm{Var}\xi ={ \int }_{-\infty }^{\infty }{(x-\mathrm{E}\xi )}^{2}d{F}_{ \xi }(x).$$

3 Types of Measures and Distribution Functions

Let \(\mu \) be a finite measure on the Borel \(\sigma \)-algebra of the real line. We distinguish three special types of measures.

  (a) Discrete measure. Assume that there exists a finite or countable set \(A =\{ {a}_{1},{a}_{2},\ldots \}\) such that \(\mu ((-\infty, \infty )) = \mu (A)\), that is A is a set of full measure. In this case \(\mu \) is called a measure of discrete type.

  (b) Singular continuous measure. Assume that the measure of any single point is zero, \(\mu (a) = 0\) for any \(a \in \mathbb{R}\), and there is a Borel set B of Lebesgue measure zero which is of full measure for the measure \(\mu \), that is \(\lambda (B) = 0\) and \(\mu ((-\infty, \infty )) = \mu (B)\). In this case \(\mu \) is called a singular continuous measure.

  (c) Absolutely continuous measure. Assume that for every set of Lebesgue measure zero the \(\mu \) measure of that set is also zero, that is \(\lambda (A) = 0\) implies \(\mu (A) = 0\). In this case \(\mu \) is called an absolutely continuous measure.

While any given measure does not necessarily belong to one of the three classes above, the following theorem states that it can be decomposed into three components, one of which is discrete, the second singular continuous, and the third absolutely continuous.

Theorem 3.16

Given any finite measure \(\mu \) on \(\mathbb{R}\) there exist measures \({\mu }_{1}\), \({\mu }_{2}\) and \({\mu }_{3}\), the first of which is discrete, the second singular continuous and the third absolutely continuous, such that for any Borel set C of the real line we have

$$\mu (C) = {\mu }_{1}(C) + {\mu }_{2}(C) + {\mu }_{3}(C).$$

Such measures \({\mu }_{1}\), \({\mu }_{2}\) and \({\mu }_{3}\) are determined by the measure \(\mu \) uniquely.

Proof

Let \({A}_{1}\) be the collection of points a such that \(\mu (a) \geq 1\), let \({A}_{2}\) be the collection of points \(a \in \mathbb{R}\setminus {A}_{1}\) such that \(\mu (a) \geq \frac{1} {2}\), let \({A}_{3}\) be the collection of points \(a \in \mathbb{R}\setminus ({A}_{1} \bigcup {A}_{2})\) such that \(\mu (a) \geq \frac{1} {3}\), and so on. Since the measure is finite, each set \({A}_{n}\) contains only finitely many elements. Therefore, \(A ={ \bigcup }_{n}{A}_{n}\) is countable. At the same time \(\mu (b) = 0\) for any \(b\notin A\). Let \({\mu }_{1}(C) = \mu (C\bigcap A)\).

We shall now construct the measure \({\mu }_{2}\) and a set B of zero Lebesgue measure, but of full \({\mu }_{2}\) measure. (Note that it may turn out that \({\mu }_{2}(B) = 0\), that is \({\mu }_{2}\) is identically zero.) First we inductively construct sets \({B}_{n}\), \(n \geq 1\), as follows. Take \({B}_{1}\) to be an empty set. Assuming that \({B}_{n}\) has been constructed, we take \({B}_{n+1}\) to be any set of Lebesgue measure zero, which does not intersect \({\bigcup }_{i=1}^{n}{B}_{i}\) and satisfies

$$\mu ({B}_{n+1}) - {\mu }_{1}({B}_{n+1}) \geq \frac{1} {m} \tag{3.2}$$

with the smallest possible m, where \(m \geq 1\) is an integer. If no such m exists, then we take \({B}_{n+1}\) to be the empty set. For each m there is at most a finite number of non-intersecting sets which satisfy (3.2), and therefore the set \(\mathbb{R}\setminus {\bigcup }_{n=1}^{\infty }{B}_{n}\) contains no set C for which \(\mu (C) - {\mu }_{1}(C) > 0\). We put \(B ={ \bigcup }_{n=1}^{\infty }{B}_{n}\), which is a set of Lebesgue measure zero, and define \({\mu }_{2}(C) = \mu (C\bigcap B) - {\mu }_{1}(C\bigcap B)\). Note that \({\mu }_{2}(B) = {\mu }_{2}((-\infty, \infty ))\), and therefore \({\mu }_{2}\) is singular continuous.

By the construction of \({\mu }_{1}\) and \({\mu }_{2}\), we have that \({\mu }_{3}(C) = \mu (C) - {\mu }_{1}(C) - {\mu }_{2}(C)\) is a measure which is equal to zero on each set of Lebesgue measure zero. Thus we have the desired decomposition. The uniqueness part is left as an easy exercise for the reader.

Since there is a one-to-one correspondence between probability measures on the real line and distribution functions, we can single out the classes of distribution functions corresponding to the discrete, singular continuous and absolutely continuous measures. In the discrete case \(F(x) = \mu ((-\infty, x])\) is a step function. The jumps occur at the points \({a}_{i}\) of positive μ-measure.

If the distribution function F has a Lebesgue integrable density p, that is \(F(x) ={ \int }_{-\infty }^{x}p(t)dt\), then F corresponds to an absolutely continuous measure. Indeed, \({\mu }_{F}(A) ={ \int }_{A}p(t)dt\) for any Borel set A, since the equality is true for all intervals, and therefore it is true for all Borel sets due to the uniqueness of the extension of the measure. The value of the integral \({\int }_{A}p(t)dt\) over any set of Lebesgue measure zero is equal to zero.

The converse is also true, i.e., any absolutely continuous measure has a Lebesgue integrable density function. This follows from the Radon-Nikodym theorem, which we shall state below.

If a measure \(\mu \) does not contain a discrete component, then the distribution function is continuous. Yet if the singular continuous component is present, it cannot be represented as an integral of a density. The so-called Cantor Staircase is an example of such a distribution function. Set \(F(t) = 0\) for \(t \leq 0\), \(F(t) = 1\) for \(t \geq 1\). We construct F(t) for \(0 < t < 1\) inductively. At the n-th step (\(n \geq 0\)) we have disjoint intervals of length \({3}^{-n}\), where the function F(t) is not yet defined, although it is defined at the end points of such intervals. Let us divide every such interval into three equal parts, and set F(t) on the middle interval (including the end-points) to be a constant equal to the half-sum of its values at the above-mentioned end-points. It is easy to see that the function F(t) can be extended by continuity to the remaining t. The limit function is called the Cantor Staircase. It corresponds to a singular continuous probability measure. The theory of fractals is related to some classes of singular continuous measures.
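The inductive construction of the Cantor Staircase translates into a short procedure: at each step, either t lies in the closed middle third, where F is the half-sum of the end-point values, or the search continues in the left or right third. The sketch below follows this description; the function name `cantor_staircase` and the truncation parameter `depth` are assumptions, and for points never reached by a middle interval the result is only an approximation with error at most \({2}^{-\mathrm{depth}-1}\).

```python
from fractions import Fraction

def cantor_staircase(t, depth=40):
    """Evaluate the Cantor Staircase at t by following the inductive construction."""
    t = Fraction(t)
    if t <= 0:
        return Fraction(0)
    if t >= 1:
        return Fraction(1)
    left, right = Fraction(0), Fraction(1)  # end points of the current interval
    lo, hi = Fraction(0), Fraction(1)       # values of F at these end points
    for _ in range(depth):
        third = (right - left) / 3
        mid = (lo + hi) / 2                 # half-sum of the end-point values
        if t < left + third:
            right, hi = left + third, mid   # continue in the left third
        elif t <= right - third:
            return mid                      # t lies in the closed middle third
        else:
            left, lo = right - third, mid   # continue in the right third
    return (lo + hi) / 2                    # truncated approximation

for t in (Fraction(1, 9), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9), Fraction(1, 4)):
    print(t, float(cantor_staircase(t)))
```

The function is constant on each removed middle interval, so it increases only on the Cantor set, which has Lebesgue measure zero.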

4 Remarks on the Construction of the Lebesgue Measure

In this section we provide an abstract generalization of Theorem 3.9 on the extension of a σ-additive function. Theorem 3.9 applies to the construction of a measure on the real line which, in the case of the Lebesgue measure, can be viewed as an extension of the notion of length of an interval. In fact we can define the notion of measure starting from a σ-additive function defined on a certain collection of subsets of an abstract set.

Definition 3.17

A collection \(\mathcal{G}\) of subsets of Ω is called a semialgebra if it has the following three properties:

  1. \({\it\Omega} \in \mathcal{G}\).

  2. If \({C}_{1},{C}_{2} \in \mathcal{G}\), then \({C}_{1} \bigcap {C}_{2} \in \mathcal{G}\).

  3. If \({C}_{1},{C}_{2} \in \mathcal{G}\) and \({C}_{2} \subseteq {C}_{1}\), then there exists a finite collection of disjoint sets \({A}_{1},\ldots, {A}_{n} \in \mathcal{G}\) such that \({C}_{2} \bigcap {A}_{i} = \emptyset \) for \(i = 1,\ldots, n\) and \({C}_{2} \bigcup {A}_{1} \bigcup \ldots \bigcup {A}_{n} = {C}_{1}\).

Definition 3.18

A non-negative function m with values in \(\mathbb{R}\) defined on a semialgebra \(\mathcal{G}\) is said to be \(\sigma \)-additive if it satisfies the following condition:

If \(C ={ \bigcup }_{i=1}^{\infty }{C}_{i}\) with \(C \in \mathcal{G}\), \({C}_{i} \in \mathcal{G},\ i = 1,2,\ldots \), and \({C}_{i} \bigcap {C}_{j} = \emptyset \) for \(i\neq j\), then

$$m(C) ={ \sum \limits_{i=1}^{\infty }}m({C}_{ i}). $$

Theorem 3.19 (Caratheodory)

Let m be a σ-additive function defined on a semialgebra \(({\it\Omega}, \mathcal{G})\) . Then there exists a measure \(\mu \) defined on \(({\it\Omega}, \sigma (\mathcal{G}))\) such that \(\mu (C) = m(C)\) for every \(C \in \mathcal{G}\) . The measure \(\mu \) which has this property is unique.

We shall only indicate a sequence of steps used in the proof of the theorem, without giving all the details. A more detailed exposition can be found in the textbook of Fomin and Kolmogorov “Elements of Theory of Functions and Functional Analysis”.

Step 1. Extension of the \(\sigma \) -additive function from the semialgebra to the algebra. Let \(\mathcal{A}\) be the collection of sets which can be obtained as finite unions of disjoint elements of \(\mathcal{G}\), that is \(A \in \mathcal{A}\) if \(A ={ \bigcup }_{i=1}^{n}{C}_{i}\) for some \({C}_{i} \in \mathcal{G}\), where \({C}_{i} \bigcap {C}_{j} = \emptyset \) if \(i\neq j\). The collection of sets \(\mathcal{A}\) is an algebra since it contains the set \({\it\Omega} \) and is closed under finite unions, intersections, differences, and symmetric differences. For \(A ={ \bigcup }_{i=1}^{n}{C}_{i}\) with \({C}_{i} \bigcap {C}_{j} = \emptyset \), \(i\neq j\), we define \(m(A) ={ \sum }_{i=1}^{n}m({C}_{i})\). We can then show that m is still a \(\sigma \)-additive function on the algebra \(\mathcal{A}\).

Step 2. Definition of exterior measure and of measurable sets. For any set \(B \subseteq {\it\Omega} \) we can define its exterior measure as \({\mu }^{{_\ast}}(B) =\inf { \sum }_{i}m({A}_{i})\), where the infimum is taken over all countable coverings of B by elements of the algebra \(\mathcal{A}\). A set B is called measurable if for any \(\epsilon > 0\) there is \(A \in \mathcal{A}\) such that \({\mu }^{{_\ast}}(A \bigtriangleup B) \leq \epsilon \). Recall that \(A \bigtriangleup B\) is the notation for the symmetric difference of the sets A and B. If B is measurable we define its measure to be equal to the exterior measure: \(\mu (B) = {\mu }^{{_\ast}}(B)\). Denote the collection of all measurable sets by \(\mathcal{B}\).

Step 3. The \(\sigma \) -algebra of measurable sets and \(\sigma \) -additivity of the measure. The main part of the proof consists of demonstrating that \(\mathcal{B}\) is a \(\sigma \)-algebra, and that the function \(\mu \) defined on it has the properties of a measure. We can then restrict the measure to the smallest \(\sigma \)-algebra containing the original semialgebra. The uniqueness of the measure follows easily from the non-negativity of m and from the fact that the measure is uniquely defined on the algebra \(\mathcal{A}\). Alternatively, see Lemma 4.14 in Chap. 4, which also implies the uniqueness of the measure.

Remark 3.20

It is often convenient to consider the measure μ on the measurable space \(({\it\Omega}, \mathcal{B})\), rather than to restrict the measure to the \(\sigma \)-algebra \(\sigma (\mathcal{G})\), which is usually smaller than \(\mathcal{B}\). The difference is that \(({\it\Omega}, \mathcal{B})\) is always complete with respect to measure \(\mu \), while \(({\it\Omega}, \sigma (\mathcal{G}))\) does not need to be complete. We discuss the notion of completeness in the remainder of this section.

Definition 3.21

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space with a finite measure \(\mu \) on it. A set \(A \subseteq {\it\Omega} \) is said to be μ-negligible if there is an event \(B \in \mathcal{F}\) such that \(A \subseteq B\) and \(\mu (B) = 0\) . The space \(({\it\Omega}, \mathcal{F})\) is said to be complete with respect to μ if all μ-negligible sets belong to \(\mathcal{F}\).

Given an arbitrary measurable space \(({\it\Omega}, \mathcal{F})\) with a finite measure μ on it, we can consider an extended \(\sigma \)-algebra \(\widetilde{\mathcal{F}}\). It consists of all sets \(\widetilde{B} \subseteq {\it\Omega} \) which can be represented as \(\widetilde{B} = A \cup B\), where A is a μ-negligible set and \(B \in \mathcal{F}\). We define \(\widetilde{\mu }(\widetilde{B}) = \mu (B)\). It is easy to see that \(\widetilde{\mu }(\widetilde{B})\) does not depend on the particular representation of \(\widetilde{B}\), \(({\it\Omega}, \widetilde{\mathcal{F}})\) is a measurable space, \(\widetilde{\mu }\) is a finite measure, and \(({\it\Omega}, \widetilde{\mathcal{F}})\) is complete with respect to \(\widetilde{\mu }\). We shall refer to \(({\it\Omega}, \widetilde{\mathcal{F}})\) as the completion of \(({\it\Omega}, \mathcal{F})\) with respect to the measure \(\mu \).

It is not difficult to see that \(\widetilde{\mathcal{F}} = \sigma (\mathcal{F}\cup {\mathcal{N}}^{\mu })\), where \({\mathcal{N}}^{\mu }\) is the collection of \(\mu \)-negligible sets in \({\it\Omega} \).

5 Convergence of Functions, Their Integrals, and the Fubini Theorem

Let \(({\it\Omega}, \mathcal{F},\mu )\) be a measurable space with a finite measure. Let f and \({f}_{n}\), \(n = 1,2,\ldots \) be measurable functions.

Definition 3.22

A sequence of functions \({f}_{n}\) is said to converge to f uniformly if

$${\lim \limits_{n\rightarrow \infty}} {\sup \limits_{\omega \in {\it\Omega}}}\vert {f}_{n}(\omega ) - f(\omega )\vert = 0.$$

Definition 3.23

A sequence of functions \({f}_{n}\) is said to converge to f in measure (or in probability, if \(\mu \) is a probability measure) if for any \(\delta > 0\) we have

$${\lim \limits_{n\rightarrow \infty }}\mu (\omega : \vert {f}_{n}(\omega ) - f(\omega )\vert > \delta ) = 0.$$

Definition 3.24

A sequence of functions \({f}_{n}\) is said to converge to \(f\) almost everywhere (or almost surely) if there is a measurable set A with \(\mu ({\it\Omega} \setminus A) = 0\) such that

$${\lim \limits_{n\rightarrow \infty}}{f}_{n}(\omega ) = f(\omega )\ \ \mathrm{for}\ \ \omega \in A.$$

It is not difficult to demonstrate that convergence almost everywhere implies convergence in measure. The opposite implication is only true if we consider a certain subsequence of the original sequence \({f}_{n}\) (see Problem 8). The following theorem relates the notions of convergence almost everywhere and uniform convergence.
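To see why passing to a subsequence is needed, one can look at the standard "sliding indicator" sequence on \([0,1)\) with the Lebesgue measure. The sketch below is an illustration only; the names `sliding_indicator` and `measure_of_support` are hypothetical and do not come from the text.

```python
def sliding_indicator(n, x):
    """Value at x in [0, 1) of the n-th function (n >= 1) of the sequence.

    Writing n = 2**k + j with 0 <= j < 2**k, the n-th function is the
    indicator of the dyadic interval [j / 2**k, (j + 1) / 2**k).
    """
    k = n.bit_length() - 1
    j = n - (1 << k)
    return 1.0 if j / 2**k <= x < (j + 1) / 2**k else 0.0


def measure_of_support(n):
    """Lebesgue measure of {x : f_n(x) > delta} for any 0 < delta < 1."""
    k = n.bit_length() - 1
    return 2.0 ** (-k)


if __name__ == "__main__":
    x = 0.3
    for k in range(5):
        block = range(2**k, 2 ** (k + 1))
        hits = [n for n in block if sliding_indicator(n, x) == 1.0]
        # The measure of the support shrinks, yet x is hit once in every block.
        print(k, measure_of_support(2**k), hits)
```

The measure of the set where the n-th function exceeds \(\delta \) tends to zero, so the sequence converges to zero in measure. On the other hand, every point is covered exactly once in each block of indices \({2}^{k} \leq n < {2}^{k+1}\), so the sequence does not converge at any point. The subsequence with indices \({n}_{k} = {2}^{k}\) converges to zero at every point except \(x = 0\).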

Theorem 3.25 (Egorov Theorem)

If a sequence of measurable functions \({f}_{n}\) converges to a measurable function f almost everywhere, then for any \(\delta > 0\) there exists a measurable set \({{\it\Omega} }_{\delta } \subseteq {\it\Omega} \) such that \(\mu ({{\it\Omega} }_{\delta }) \geq \mu ({\it\Omega} ) - \delta \) and \({f}_{n}\) converges to f uniformly on \({{\it\Omega} }_{\delta }\).

Proof

Let \(\delta > 0\) be fixed. Let

$${\it\Omega}_{n}^{m} ={\bigcap \limits_{i\geq n}}\{\omega : \vert {f}_{i}(\omega ) - f(\omega )\vert < \frac{1} {m}\}$$

and

$${\it\Omega}^{m} ={\bigcup \limits_{n=1}^{\infty}}{{\it\Omega} }_{ n}^{m}.$$

Due to the continuity of the measure (Theorem 1.36), for every m there is \({n}_{0}(m)\) such that \(\mu ({{\it\Omega} }^{m}\setminus {{\it\Omega} }_{{n}_{0}(m)}^{m}) < \delta /{2}^{m}\). We define \({{\it\Omega} }_{\delta } ={ \bigcap }_{m=1}^{\infty }{{\it\Omega} }_{{n}_{0}(m)}^{m}\). We claim that \({{\it\Omega} }_{\delta }\) satisfies the requirements of the theorem.

The uniform convergence follows from the fact that \(\vert {f}_{i}(\omega ) - f(\omega )\vert < 1/m\) for all \(\omega \in {{\it\Omega} }_{\delta }\) if \(i > {n}_{0}(m)\). In order to estimate the measure of \({{\it\Omega} }_{\delta }\), we note that \({f}_{n}(\omega )\) does not converge to \(f(\omega )\) if \(\omega \) is outside of the set \({{\it\Omega} }^{m}\) for some m. Therefore, \(\mu ({\it\Omega} \setminus {{\it\Omega} }^{m}) = 0\). This implies

$$\mu ({\it\Omega} \setminus {\it\Omega}_{{n}_{0}(m)}^{m}) = \mu ({\it\Omega}^{m}\setminus {\it\Omega}_{{ n}_{0}(m)}^{m}) < \frac{\delta } {{2}^{m}}.$$

Therefore,

$$\mu ({\it\Omega} \setminus {\it\Omega}_{\delta }) = \mu ({\bigcup \limits_{m=1}^{\infty}}({\it\Omega} \setminus {\it\Omega}_{{ n}_{0}(m)}^{m})) \leq {\sum \limits_{m=1}^{\infty}}\mu ({\it\Omega} \setminus {\it\Omega}_{{ n}_{0}(m)}^{m}) <{ \sum \limits_{m=1}^{\infty}} \frac{\delta } {{2}^{m}} = \delta, $$

which completes the proof of the theorem.
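A simple illustration of the theorem (our own example, not part of the original text): let \({\it\Omega} = [0,1]\) with the Lebesgue measure and \({f}_{n}(\omega ) = {\omega }^{n}\). Then \({f}_{n}\) converges to \(f = 0\) for all \(\omega \in [0,1)\), that is almost everywhere, but the convergence is not uniform on \([0,1)\), since \(\sup_{\omega \in [0,1)}{\omega }^{n} = 1\) for every n. However, for \({{\it\Omega} }_{\delta } = [0,1 - \delta ]\) we have \(\mu ({{\it\Omega} }_{\delta }) = \mu ({\it\Omega} ) - \delta \) and

$${\sup \limits_{\omega \in {{\it\Omega} }_{\delta }}}\vert {f}_{n}(\omega ) - f(\omega )\vert = {(1 - \delta )}^{n} \rightarrow 0\ \ \mathrm{as}\ \ n \rightarrow \infty. $$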

The following theorem justifies passage to the limit under the sign of the integral.

Theorem 3.26 (Lebesgue Dominated Convergence Theorem)

If a sequence of measurable functions \({f}_{n}\) converges to a measurable function \(f\) almost everywhere and

$$\vert {f}_{n}\vert \leq \varphi, $$

where \(\varphi \) is integrable on \({\it\Omega} \), then the function \(f\) is integrable on \({\it\Omega} \) and

$${\lim \limits_{n\rightarrow \infty}}{\int}_{\it\Omega }{f}_{n}d\mu ={ \int }_{{\it\Omega} }fd\mu. $$

Proof

Let some \(\epsilon > 0\) be fixed. It is easily seen that \(\vert f(\omega )\vert \leq \varphi (\omega )\) for almost all \(\omega \). Therefore, as follows from the elementary properties of the integral, the function f is integrable. Let \({{\it\Omega} }_{k} =\{ \omega : k - 1 \leq \varphi (\omega ) < k\}\). Since the integral is a \(\sigma \)-additive function,

$${\int}_{{\it\Omega} }\varphi d\mu ={\sum \limits_{k=1}^{\infty}}{\int}_{{{\it\Omega} }_{k}}\varphi d\mu. $$

Let \(m > 0\) be such that \({\sum }_{k=m}^{\infty }{\int }_{{{\it\Omega} }_{k}}\varphi d\mu < \epsilon /5\). Let \(A ={ \bigcup }_{k=m}^{\infty }{{\it\Omega} }_{k}\). By the Egorov Theorem, we can select a set \(B \subseteq {\it\Omega} \setminus A\) such that \(\mu (B) \leq \epsilon /(5m)\) and \({f}_{n}\) converges to f uniformly on the set \(C = ({\it\Omega} \setminus A)\setminus B\). Finally,

$$\vert {\int }_{{\it\Omega} }{f}_{n}d\mu -{\int }_{{\it\Omega} }fd\mu \vert \leq \vert {\int }_{A}{f}_{n}d\mu -{\int }_{A}fd\mu \vert $$
$$+\vert {\int }_{B}{f}_{n}d\mu -{\int }_{B}fd\mu \vert + \vert {\int }_{C}{f}_{n}d\mu -{\int }_{C}fd\mu \vert. $$

The first term on the right-hand side can be estimated from above by \(2\epsilon /5\), since \({\int }_{A}\vert {f}_{n}\vert d\mu, {\int }_{A}\vert f\vert d\mu \leq {\int }_{A}\varphi d\mu < \epsilon /5\). The second term does not exceed \(\mu (B)\sup_{\omega \in B}(\vert {f}_{n}(\omega )\vert + \vert f(\omega )\vert ) \leq 2\epsilon /5\). The last term can be made smaller than \(\epsilon /5\) for sufficiently large n due to the uniform convergence of \({f}_{n}\) to f on the set C. Therefore, \(\vert {\int }_{{\it\Omega} }{f}_{n}d\mu -{\int }_{{\it\Omega} }fd\mu \vert \leq \epsilon \) for sufficiently large n, which completes the proof of the theorem.
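The domination hypothesis of Theorem 3.26 cannot simply be dropped. As an illustration (a standard example, in our own notation), let \({\it\Omega} = [0,1]\) with the Lebesgue measure \(\lambda \) and

$${f}_{n}(\omega ) = n{\chi }_{(0,1/n)}(\omega ),\ \ n = 1,2,\ldots. $$

Then \({f}_{n}(\omega ) \rightarrow 0\) for every \(\omega \), while \({\int }_{{\it\Omega} }{f}_{n}d\lambda = n \cdot (1/n) = 1\) for all n, so the integrals do not converge to the integral of the limit. There is no contradiction with the theorem: any \(\varphi \) with \(\vert {f}_{n}\vert \leq \varphi \) must satisfy \(\varphi (\omega ) \geq k\) for \(\omega \in [1/(k + 1),1/k)\), and therefore

$${\int }_{{\it\Omega} }\varphi d\lambda \geq {\sum \limits_{k=1}^{\infty }}k(\frac{1} {k} - \frac{1} {k + 1}) ={ \sum \limits_{k=1}^{\infty }} \frac{1} {k + 1} = \infty. $$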

From the Lebesgue Dominated Convergence Theorem it is easy to derive the following two statements, which we provide here without proof.

Theorem 3.27 (Levi Monotonic Convergence Theorem)

Let a sequence of measurable functions be non-decreasing almost surely, that is

$${f}_{1}(\omega ) \leq {f}_{2}(\omega ) \leq \ldots \leq {f}_{n}(\omega ) \leq \ldots $$

almost surely. Assume that the integrals are bounded:

$${\int }_{{\it\Omega} }{f}_{n}d\mu \leq K\ \ \mathrm{for}\ \mathrm{all}\ n.$$

Then, almost surely, there exists a finite limit

$$f(\omega ) ={\lim \limits_{n\rightarrow \infty}}{f}_{n}(\omega ),$$

the function f is integrable, and \({\int }_{{\it\Omega} }fd\mu =\lim_{n\rightarrow \infty }{\int }_{{\it\Omega} }{f}_{n}d\mu \).

Lemma 3.28 (Fatou Lemma)

If \({f}_{n}\) is a sequence of non-negative measurable functions, then

$${\int}_{\it\Omega}\liminf\limits_{n\rightarrow \infty}{f}_{n}d\mu \leq \liminf\limits_{n\rightarrow \infty}{\int}_{\it\Omega}{f}_{n}d\mu \leq \infty. $$

Let us discuss products of \(\sigma \)-algebras and measures. Let \(({{\it\Omega} }_{1},{\mathcal{F}}_{1},{\mu }_{1})\) and \(({{\it\Omega} }_{2},{\mathcal{F}}_{2},{\mu }_{2})\) be two measurable spaces with finite measures. We shall define the product space with the product measure \(({\it\Omega}, \mathcal{F},\mu )\) as follows. The set \({\it\Omega} \) is just a set of ordered pairs \({\it\Omega} = {{\it\Omega} }_{1} \times {{\it\Omega} }_{2} =\{ ({\omega }_{1},{\omega }_{2}),{\omega }_{1} \in {{\it\Omega} }_{1},{\omega }_{2} \in {{\it\Omega} }_{2}\}\).

In order to define the product \(\sigma \)-algebra, we first consider the collection of rectangles \(\mathcal{R} =\{ A \times B,A \in {\mathcal{F}}_{1},B \in {\mathcal{F}}_{2}\}\). Then \(\mathcal{F}\) is defined as the smallest \(\sigma \)-algebra containing all the elements of \(\mathcal{R}\).

Note that \(\mathcal{R}\) is a semialgebra. The product measure \(\mu \) on \(\mathcal{F}\) is defined to be the extension to the \(\sigma \)-algebra of the function \(m\) defined on \(\mathcal{R}\) via \(m(A \times B) = {\mu }_{1}(A){\mu }_{2}(B)\). In order to justify this extension, we need to prove that m is a \(\sigma \)-additive function on \(\mathcal{R}\).

Lemma 3.29

The function \(m(A \times B) = {\mu }_{1}(A){\mu }_{2}(B)\) is a \(\sigma \) -additive function on the semialgebra \(\mathcal{R}\).

Proof

Let \({A}_{1} \times {B}_{1},{A}_{2} \times {B}_{2},\ldots \) be a sequence of non-intersecting rectangles such that \(A \times B ={ \bigcup }_{n=1}^{\infty }{A}_{n} \times {B}_{n}\). Consider the sequence of functions \({f}_{n}({\omega }_{1}) ={ \sum }_{i=1}^{n}{\chi }_{{A}_{i}}({\omega }_{1}){\mu }_{2}({B}_{i})\), where \({\chi }_{{A}_{i}}\) is the indicator function of the set \({A}_{i}\). Similarly, let \(f({\omega }_{1}) = {\chi }_{A}({\omega }_{1}){\mu }_{2}(B)\). Note that \({f}_{n} \leq {\mu }_{2}(B)\) for all n and \(\lim_{n\rightarrow \infty }{f}_{n}({\omega }_{1}) = f({\omega }_{1})\). Therefore, the Lebesgue Dominated Convergence Theorem applies. We have

$${\lim \limits_{n\rightarrow \infty}}{\sum \limits_{i=1}^{n}}m({A}_{i}\times {B}_{i}) ={\lim \limits_{n\rightarrow \infty}}{\sum \limits_{i=1}^{n}}{\mu }_{1}({A}_{i}){\mu}_{2}({B}_{i}) ={\lim \limits_{n\rightarrow \infty}} {\int}_{{{\it\Omega} }_{1}}{f}_{n}({\omega }_{1})d{\mu }_{1}({\omega }_{1})$$
$$={ \int }_{{{\it\Omega} }_{1}}f({\omega }_{1})d{\mu }_{1}({\omega }_{1}) = {\mu }_{1}(A){\mu }_{2}(B) = m(A \times B).$$

We are now in a position to state the Fubini Theorem. If \(({\it\Omega}, \mathcal{F},\mu )\) is a measurable space with a finite measure, and f is defined on a set of full measure \(A \in \mathcal{F}\), then \({\int }_{{\it\Omega} }fd\mu \) will mean \({\int }_{A}fd\mu \).

Theorem 3.30 (Fubini Theorem)

Let \(({{\it\Omega} }_{1},{\mathcal{F}}_{1},{\mu }_{1})\) and \(({{\it\Omega} }_{2},{\mathcal{F}}_{2},{\mu }_{2})\) be two measurable spaces with finite measures, and let \(({\it\Omega}, \mathcal{F},\mu )\) be the product space with the product measure. If a function \(f({\omega }_{1},{\omega }_{2})\) is integrable with respect to the measure μ, then

$${\int }_{{\it\Omega} }f({\omega }_{1},{\omega }_{2})d\mu ({\omega }_{1},{\omega }_{2})$$
$$={ \int }_{{{\it\Omega} }_{1}}({\int }_{{{\it\Omega} }_{2}}f({\omega }_{1},{\omega }_{2})d{\mu }_{2}({\omega }_{2}))d{\mu }_{1}({\omega }_{1})$$
$$={ \int }_{{{\it\Omega} }_{2}}({\int }_{{{\it\Omega} }_{1}}f({\omega }_{1},{\omega }_{2})d{\mu }_{1}({\omega }_{1}))d{\mu }_{2}({\omega }_{2}). \tag{3.3}$$

In particular, the integrals inside the brackets are finite almost surely and are integrable functions of the exterior variable.

Sketch of the Proof

The fact that the theorem holds if f is an indicator function of a set \(A \times B\), where \(A \in {\mathcal{F}}_{1},B \in {\mathcal{F}}_{2}\), follows from the construction of the Lebesgue measure on the product space. The fact that the theorem also holds if f is an indicator function of a measurable set then easily follows from Lemma 4.13 proved in the next chapter.

Concerning f which is not necessarily an indicator function, without loss of generality we may assume that f is non-negative. If f is a simple integrable function with a finite number of values, we can represent it as a finite linear combination of indicator functions, and therefore the theorem holds for such functions. If f is any integrable function, we can approximate it by a monotonically non-decreasing sequence of simple integrable functions with a finite number of values. Then from the Levi Monotonic Convergence Theorem (Theorem 3.27) it follows that the repeated integrals are finite and are equal to the integral on the left-hand side of (3.3).
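On finite spaces all the integrals in the Fubini Theorem reduce to finite sums, and the equality (3.3) can be checked directly. The following sketch is an illustration under that assumption; the function names and the representation of a measure as a dictionary of point masses are our own hypothetical choices.

```python
def product_integral(f, mu1, mu2):
    """Integral of f with respect to the product measure on a finite product space.

    mu1 and mu2 map points to finite non-negative masses; the product measure
    of the point (w1, w2) is mu1[w1] * mu2[w2].
    """
    return sum(f(w1, w2) * m1 * m2
               for w1, m1 in mu1.items()
               for w2, m2 in mu2.items())


def iterated_inner_2(f, mu1, mu2):
    """Integrate over omega_2 first (inner integral), then over omega_1."""
    return sum(m1 * sum(f(w1, w2) * m2 for w2, m2 in mu2.items())
               for w1, m1 in mu1.items())


def iterated_inner_1(f, mu1, mu2):
    """Integrate over omega_1 first (inner integral), then over omega_2."""
    return sum(m2 * sum(f(w1, w2) * m1 for w1, m1 in mu1.items())
               for w2, m2 in mu2.items())


if __name__ == "__main__":
    mu1 = {0: 0.5, 1: 1.5, 2: 1.0}
    mu2 = {"a": 2.0, "b": 0.25}
    f = lambda w1, w2: (w1 + 1) ** 2 if w2 == "a" else -float(w1)
    print(product_integral(f, mu1, mu2),
          iterated_inner_2(f, mu1, mu2),
          iterated_inner_1(f, mu1, mu2))
```

The three printed numbers agree up to rounding errors. In the general case the sums are replaced by Lebesgue integrals, and the integrability of f is what guarantees that the inner integrals are finite almost surely.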

6 Signed Measures and the Radon-Nikodym Theorem

In this section we state, without proof, the Radon-Nikodym Theorem and the Hahn Decomposition Theorem. Both proofs can be found in the textbook of S. Fomin and A. Kolmogorov, “Elements of Theory of Functions and Functional Analysis”.

Definition 3.31

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space. A function \(\eta : \mathcal{F}\rightarrow \mathbb{R}\) is called a signed measure if

$$\eta ({\bigcup \limits_{i=1}^{\infty }}{C}_{ i}) = {\sum _{i=1}^{\infty}}\eta ({C}_{ i}),$$

whenever \({C}_{i} \in \mathcal{F}\), \(i \geq 1\), are such that \({C}_{i} \cap {C}_{j} = \emptyset \) for \(i\neq j\).

If \(\mu \) is a non-negative measure on \(({\it\Omega}, \mathcal{F})\), then an example of a signed measure is provided by the integral of a function with respect to \(\mu \),

$$\eta (A) ={ \int }_{A}fd\mu, $$

where \(f \in {L}^{1}({\it\Omega}, \mathcal{F},\mu )\). Later, when we talk about conditional expectations, it will be important to consider the converse problem—given a measure \(\mu \) and a signed measure \(\eta \), we would like to represent \(\eta \) as an integral of some function with respect to measure \(\mu \). In fact this is always possible, provided \(\mu (A) = 0\) for a set \(A \in \mathcal{F}\) implies that \(\eta (A) = 0\) (which is, of course, true if \(\eta (A)\) is an integral of some function over the set A).

To make our discussion more precise we introduce the following definition.

Definition 3.32

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space with a finite non-negative measure \(\mu \) . A signed measure \(\eta : \mathcal{F}\rightarrow \mathbb{R}\) is called absolutely continuous with respect to \(\mu \) if \(\mu (A) = 0\) implies that \(\eta (A) = 0\) for \(A \in \mathcal{F}\).

Remark 3.33

An equivalent definition of absolute continuity is as follows. A signed measure \(\eta : \mathcal{F}\rightarrow \mathbb{R}\) is called absolutely continuous with respect to \(\mu \) if for any \(\epsilon > 0\) there is a \(\delta > 0\) such that \(\mu (A) < \delta \) implies that \(\vert \eta (A)\vert < \epsilon \). (In Problem 10 the reader is asked to prove the equivalence of the definitions when \(\eta \) is a non-negative measure.)

Theorem 3.34 (Radon-Nikodym Theorem)

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space with a finite non-negative measure \(\mu \), and \(\eta \) a signed measure absolutely continuous with respect to \(\mu \) . Then there is an integrable function f such that

$$\eta (A) ={ \int }_{A}fd\mu $$

for all \(A \in \mathcal{F}\) . Any two functions which have this property can be different on at most a set of \(\mu \) -measure zero.

The function f is called the density or the Radon-Nikodym derivative of \(\eta \) with respect to the measure \(\mu \).
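On a finite set the Radon-Nikodym derivative can be written down explicitly as a ratio of point masses. The sketch below illustrates this special case only; the names `radon_nikodym_density` and `integrate_over`, and the dictionary representation of measures, are hypothetical.

```python
def radon_nikodym_density(eta, mu):
    """Density f with eta(A) = sum over w in A of f[w] * mu[w] on a finite space.

    eta and mu map points to masses (eta may take negative values).
    Absolute continuity is required: mu[w] == 0 must imply eta[w] == 0.
    """
    density = {}
    for w in mu:
        if mu[w] == 0:
            if eta[w] != 0:
                raise ValueError("eta is not absolutely continuous with respect to mu")
            # Arbitrary value: the density is determined only up to mu-null sets.
            density[w] = 0.0
        else:
            density[w] = eta[w] / mu[w]
    return density


def integrate_over(A, f, mu):
    """Integral of f over the set A with respect to mu."""
    return sum(f[w] * mu[w] for w in A)


if __name__ == "__main__":
    mu = {"a": 0.5, "b": 0.0, "c": 2.0}
    eta = {"a": -1.0, "b": 0.0, "c": 3.0}
    f = radon_nikodym_density(eta, mu)
    print(f, integrate_over({"a", "c"}, f, mu))
```

The value assigned to the density at the point of zero \(\mu \)-measure is arbitrary, which reflects the uniqueness statement of the theorem: two densities may differ on a set of \(\mu \)-measure zero.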

The following theorem implies that signed measures are simply differences of two non-negative measures.

Theorem 3.35 (Hahn Decomposition Theorem)

Let \(({\it\Omega}, \mathcal{F})\) be a measurable space with a signed measure \(\eta : \mathcal{F}\rightarrow \mathbb{R}\) . Then there exist two sets \({{\it\Omega} }^{+} \in \mathcal{F}\) and \({{\it\Omega} }^{-}\in \mathcal{F}\) such that

  1. \({{\it\Omega} }^{+} \cup {{\it\Omega} }^{-} = {\it\Omega} \) and \({{\it\Omega} }^{+} \cap {{\it\Omega} }^{-} = \emptyset \).

  2. \(\eta (A \cap {{\it\Omega} }^{+}) \geq 0\) for any \(A \in \mathcal{F}\).

  3. \(\eta (A \cap {{\it\Omega} }^{-}) \leq 0\) for any \(A \in \mathcal{F}\).

If \(\widetilde{{{\it\Omega} }}^{+}\), \(\widetilde{{{\it\Omega} }}^{-}\) is another pair of sets with the same properties, then \(\eta (A) = 0\) for any \(A \in \mathcal{F}\) such that \(A \subseteq {{\it\Omega} }^{+}\Delta \widetilde{{{\it\Omega} }}^{+}\) or \(A \subseteq {{\it\Omega} }^{-}\Delta \widetilde{{{\it\Omega} }}^{-}\).

Consider two non-negative measures \({\eta }^{+}\) and \({\eta }^{-}\) defined by

$${\eta }^{+}(A) = \eta (A \cap {{\it\Omega} }^{+})\ \ \ \mathrm{and}\ \ \ {\eta }^{-}(A) = -\eta (A \cap {{\it\Omega} }^{-}).$$

These are called the positive part and the negative part of \(\eta \), respectively. The measure \(\vert \eta \vert = {\eta }^{+} + {\eta }^{-}\) is called the total variation of \(\eta \). It easily follows from the Hahn Decomposition Theorem that \({\eta }^{+}\), \({\eta }^{-}\), and \(\vert \eta \vert \) do not depend on the particular choice of \({{\it\Omega} }^{+}\) and \({{\it\Omega} }^{-}\). Given a measurable function \(f\) which is integrable with respect to \(\vert \eta \vert \), we can define

$${\int }_{{\it\Omega} }fd\eta ={ \int }_{{\it\Omega} }fd{\eta }^{+} -{\int }_{{\it\Omega} }fd{\eta }^{-}.$$
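For a signed measure on a finite set the Hahn decomposition and the positive part, negative part and total variation can be computed point by point. The following sketch covers only this special case; the function names are hypothetical, and placing points of zero mass into \({{\it\Omega} }^{+}\) is an arbitrary choice.

```python
def hahn_decomposition(eta):
    """Split a finite space into sets carrying non-negative and negative mass.

    eta maps points to real masses. Points of zero mass are placed in the
    first set; placing them in the second set would be equally valid.
    """
    omega_plus = {w for w, v in eta.items() if v >= 0}
    omega_minus = set(eta) - omega_plus
    return omega_plus, omega_minus


def signed_measure(eta, A):
    """eta(A) for a subset A of the finite space."""
    return sum(eta[w] for w in A)


def parts(eta, A):
    """Return the positive part, negative part and total variation of eta on A."""
    omega_plus, omega_minus = hahn_decomposition(eta)
    eta_plus = signed_measure(eta, A & omega_plus)
    eta_minus = -signed_measure(eta, A & omega_minus)
    return eta_plus, eta_minus, eta_plus + eta_minus


if __name__ == "__main__":
    eta = {"a": -1.0, "b": 0.0, "c": 3.0, "d": -0.5}
    print(hahn_decomposition(eta))
    print(parts(eta, set(eta)))
```

Moving the point of zero mass from one set to the other changes the decomposition but not the values of \({\eta }^{+}\), \({\eta }^{-}\), and \(\vert \eta \vert \), in agreement with the remark above.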

7 \({L}^{p}\) Spaces

Let \(({\it\Omega}, \mathcal{F},\mu )\) be a space with a finite measure. We shall call two complex-valued measurable functions f and g equivalent (\(f \sim g\)) if \(\mu (f\neq g) = 0\). Note that \(\sim \) is indeed an equivalence relation, i.e.,

  1. \(f \sim f\).

  2. \(f \sim g\) implies that \(g \sim f\).

  3. \(f \sim g\) and \(g \sim h\) imply that \(f \sim h\).

It follows from general set theory that the set of measurable functions can be viewed as a union of non-intersecting subsets, the elements of the same subset being all equivalent, and the elements which belong to different subsets not being equivalent.

We next introduce the \({L}^{p}({\it\Omega}, \mathcal{F},\mu )\) spaces, whose elements are some of the equivalence classes of measurable functions. We shall not distinguish between a measurable function and the equivalence class it represents.

For \(1 \leq p < \infty \) we define

$$\vert \vert f\vert {\vert }_{p} = {({\int }_{{\it\Omega} }\vert f{\vert }^{p}d\mu )}^{\frac{1} {p} }.$$

The set of functions (or rather the set of equivalence classes) for which \(\vert \vert f\vert {\vert }_{p}\) is finite is denoted by \({L}^{p}({\it\Omega}, \mathcal{F},\mu )\) or simply \({L}^{p}\). It readily follows that \({L}^{p}\) is a normed linear space, with the norm \(\vert \vert \cdot \vert {\vert }_{p}\), that is

  1. \(\vert \vert f\vert {\vert }_{p} \geq 0,\ \ \ \vert \vert f\vert {\vert }_{p} = 0\) if and only if \(f = 0\).

  2. \(\vert \vert \alpha f\vert {\vert }_{p} = \vert \alpha \vert \vert \vert f\vert {\vert }_{p}\) for any complex number \(\alpha \).

  3. \(\vert \vert f + g\vert {\vert }_{p} \leq \vert \vert f\vert {\vert }_{p} + \vert \vert g\vert {\vert }_{p}\).

It is also not difficult to see, and we leave it for the reader as an exercise, that all the \({L}^{p}\) spaces are complete. We also formulate the H\(\ddot{\mathrm{o}}\)lder Inequality, which states that if \(f \in {L}^{p}\) and \(g \in {L}^{q}\) with \(p,q > 1\) such that \(1/p + 1/q = 1\), then \(fg \in {L}^{1}\) and

$$\vert \vert fg\vert {\vert }_{1} \leq \vert \vert f\vert {\vert }_{p}\vert \vert g\vert {\vert }_{q}.$$

When \(p = q = 2\) this is also referred to as the Cauchy-Bunyakovskii Inequality. Its proof is available in many textbooks, and thus we omit it, leaving it as an exercise for the reader.
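The Hölder Inequality can be checked numerically on a finite measure space, where the norms are weighted sums. The sketch below is a sanity check under that assumption, not a proof; the names `lp_norm` and `holder_gap` and the sample data are hypothetical.

```python
def lp_norm(f, mu, p):
    """The L^p norm of f on a finite space; mu maps points to non-negative masses."""
    return sum(abs(f[w]) ** p * mu[w] for w in mu) ** (1.0 / p)


def holder_gap(f, g, mu, p):
    """Return ||f||_p * ||g||_q - ||fg||_1, where 1/p + 1/q = 1 and p > 1.

    By the Hoelder Inequality this quantity is non-negative (up to rounding).
    """
    q = p / (p - 1.0)
    fg = {w: f[w] * g[w] for w in mu}
    return lp_norm(f, mu, p) * lp_norm(g, mu, q) - lp_norm(fg, mu, 1.0)


if __name__ == "__main__":
    mu = {0: 0.5, 1: 1.0, 2: 0.25}
    f = {0: 1.0, 1: -2.0, 2: 3.0}
    g = {0: 0.5, 1: 4.0, 2: -1.0}
    for p in (1.5, 2.0, 3.0):
        print(p, holder_gap(f, g, mu, p))
    # For p = q = 2 and g = f the inequality becomes an equality.
    print(holder_gap(f, f, mu, 2.0))
```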

The norm in the \({L}^{2}\) space comes from the inner product, \(\vert \vert f\vert {\vert }_{2} = {(f,f)}^{1/2}\), where

$$(f,g) ={ \int }_{{\it\Omega} }f\overline{g}d\mu. $$

The set \({L}^{2}\) equipped with this inner product is a Hilbert space.

8 Monte Carlo Method

Consider a bounded measurable set \(U \subset {\mathbb{R}}^{d}\) and a bounded measurable function \(f : U \rightarrow \mathbb{R}\). In this section we shall discuss a numerical method for evaluating the integral \(I(f) ={ \int }_{U}f(x)d{x}_{1}\ldots d{x}_{d}\).

One way to evaluate such an integral is based on approximating it by Riemann sums. Namely, the set U is split into measurable subsets \({U}_{1},\ldots, {U}_{n}\) with small diameters, and a point \({x}_{i}\) is selected in each of the subsets \({U}_{i}\). Then the sum \({\sum }_{i=1}^{n}f({x}_{i})\lambda ({U}_{i})\), where \(\lambda ({U}_{i})\) is the measure of \({U}_{i}\), serves as an approximation to the integral. This method is effective provided that f does not change much for a small change of the argument (for example, if its gradient is bounded), and if we can split the set U into a reasonably small number of subsets with small diameters (so that n is not too large for a computer to handle the summation).

On the other hand, consider the case when U is a unit cube in \({\mathbb{R}}^{d}\), and d is large (say, \(d = 20\)). If we try to divide U into cubes \({U}_{i}\), each with the side of length \(1/10\) (these may still be rather large, depending on the desired accuracy of the approximation), there will be \(n = {10}^{20}\) of such sub-cubes, which shows that approximating the integral by the Riemann sums cannot be effective in high dimensions.

Now we describe the Monte Carlo method of numerical integration. Consider a homogeneous sequence of independent trials \(\omega = ({\omega }_{1},{\omega }_{2},\ldots )\), where each \({\omega }_{i} \in U\) has uniform distribution in U, that is \(\mathrm{P}({\omega }_{i} \in V ) = \lambda (V )/\lambda (U)\) for any measurable set \(V \subseteq U\). If U is a unit cube, such a sequence can be implemented in practice with the help of a random number generator. Let

$${I}^{n}(\omega ) ={\sum \limits_{i=1}^{n}}f({\omega }_{ i}).$$

We claim that \({I}^{n}/n\) converges (in probability) to \(I(f)/\lambda (U)\).

Theorem 3.36

For every bounded measurable function f and every \(\epsilon > 0\)

$$\lim\limits_{n\rightarrow \infty}\mathrm{P}(\vert \frac{{I}^{n}} {n} - \frac{I(f)} {\lambda (U)}\vert < \epsilon ) = 1.$$

Proof

Let \(\epsilon > 0\) be fixed, and assume that \(\vert f(x)\vert \leq M\) for all \(x \in U\) and some constant M. We split the interval \([-M,M]\) into k disjoint sub-intervals \({\Delta }_{1},\ldots, {\Delta }_{k}\), each of length not greater than \(\epsilon /3\). The number of such intervals need not exceed \(1 + 6M/\epsilon \). We define the sets \({U}_{j}\) as the pre-images of \({\Delta }_{j}\), that is \({U}_{j} = {f}^{-1}({\Delta }_{j})\). Let us fix a point \({a}_{j}\) in each \({\Delta }_{j}\). Let \({\nu }_{j}^{n}(\omega )\) be the number of those \({\omega }_{i}\) with \(1 \leq i \leq n\), for which \({\omega }_{i} \in {U}_{j}\). Let \({J}^{n}(\omega ) ={ \sum }_{j=1}^{k}{a}_{j}{\nu }_{j}^{n}(\omega )\).

Since \(f(x)\) does not vary by more than \(\epsilon /3\) on each of the sets \({U}_{j}\),

$$\vert \frac{{I}^{n}(\omega )} {n} -\frac{{J}^{n}(\omega )} {n} \vert \leq \frac{\epsilon } {3}\ \ \mathrm{and}\ \ \frac{\vert I(f) -{\sum }_{j=1}^{k}{a}_{j}\lambda ({U}_{j})\vert } {\lambda (U)} \leq \frac{\epsilon } {3}.$$

Therefore, it is sufficient to demonstrate that

$${\lim \limits_{n\rightarrow \infty}}\mathrm{P}(\vert \frac{{J}^{n}} {n} -\frac{{\sum _{j=1}^{k}}{a}_{j}\lambda ({U}_{j})} {\lambda (U)} \vert < \frac{\epsilon } {3}) = 1,$$

or, equivalently,

$$\lim \limits_{n\rightarrow \infty}\mathrm{P}(\vert \mathop{\sum \limits_{j=1}^{k}}{a}_{ j}(\frac{{\nu }_{j}^{n}} {n} -\frac{\lambda ({U}_{j})} {\lambda (U)} )\vert < \frac{\epsilon } {3}) = 1.$$

This follows from the law of large numbers, which states that \({\nu }_{j}^{n}/n\) converges in probability to \(\lambda ({U}_{j})/\lambda (U)\) for each j.

Remark 3.37

Later we shall prove the so-called strong law of large numbers, which will imply the almost sure convergence of the approximations in the Monte Carlo method (see Chap. 7). It is important that the convergence rate (however it is defined) can be estimated in terms of \(\lambda (U)\) and \(\sup_{x\in U}\vert f(x)\vert \), independently of the dimension of the space and the smoothness of the function f.
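The Monte Carlo method described in this section can be sketched in a few lines for the unit cube \(U = {[0,1]}^{d}\), for which \(\lambda (U) = 1\). This is a minimal illustration: the function name `monte_carlo_mean`, the test function, the sample size and the seed are our own hypothetical choices, and a pseudo-random generator is assumed to be an adequate substitute for independent uniform trials.

```python
import random


def monte_carlo_mean(f, d, n, rng):
    """Return I^n / n, the average of f over n independent uniform points in [0, 1]^d.

    Since lambda(U) = 1 for the unit cube, this approximates I(f) itself.
    """
    total = 0.0
    for _ in range(n):
        omega_i = [rng.random() for _ in range(d)]  # one uniform trial in the cube
        total += f(omega_i)
    return total / n


if __name__ == "__main__":
    rng = random.Random(12345)   # fixed seed, chosen arbitrarily for reproducibility
    d = 20
    # f(x) = x_1 + ... + x_d, whose integral over the unit cube equals d / 2 = 10.
    estimate = monte_carlo_mean(sum, d, 20000, rng)
    print(estimate)
```

With \(d = 20\) a Riemann sum over sub-cubes of side \(1/10\) would require \({10}^{20}\) evaluations of f, whereas the sketch above uses 20000 evaluations. For this particular f the standard deviation of \({I}^{n}/n\) is \(\sqrt{d/(12n)}\), which is approximately 0.009 here.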

9 Problems

  1. Let \({f}_{n}\), \(n \geq 1\), and f be measurable functions on a measurable space \(({\it\Omega}, \mathcal{F})\). Prove that the set \(\{\omega : \lim_{n\rightarrow \infty }{f}_{n}(\omega ) = f(\omega )\}\) is \(\mathcal{F}\)-measurable. Prove that the set \(\{\omega : \lim_{n\rightarrow \infty }{f}_{n}(\omega )\ \ \mathrm{exists}\}\) is \(\mathcal{F}\)-measurable.

  2. Prove that if a random variable \(\xi \) taking non-negative values is such that

    $$\mathrm{P}(\xi \geq n) \geq 1/n\ \ \ \mathrm{for}\ \mathrm{all}\ n \in \mathbb{N},$$

    then \(\mathrm{E}\xi = \infty \).

  3. Construct a sequence of random variables \({\xi }_{n}\) such that \({\xi }_{n}(\omega ) \rightarrow 0\) for every \(\omega \), but \(\mathrm{E}{\xi }_{n} \rightarrow \infty \) as \(n \rightarrow \infty \).

  4. A random variable \(\xi \) takes values in the interval \([A,B]\) and \(\mathrm{Var}(\xi ) = {((B - A)/2)}^{2}\). Find the distribution of \(\xi \).

  5. Let \(\{{x}_{1},{x}_{2},\ldots \}\) be a collection of rational points from the interval [0, 1]. A random variable \(\xi \) takes the value \({x}_{n}\) with probability \(1/{2}^{n}\). Prove that the distribution function \({F}_{\xi }(x)\) of \(\xi \) is continuous at every irrational point \(x\).

  6. Let \(\xi \) be a random variable with a continuous density \({p}_{\xi }\) such that \({p}_{\xi }(0) > 0\). Find the density of \(\eta \), where

    $$\eta (\omega ) = \left\{\begin{array}{ll} 1/\xi (\omega ) & \text{if } \xi (\omega )\neq 0,\\ 0 & \text{if } \xi (\omega ) = 0. \end{array} \right.$$

    Prove that \(\eta \) does not have a finite expectation.

  7. Let \({\xi }_{1},{\xi }_{2},\ldots \) be a sequence of random variables on a probability space \(({\it\Omega}, \mathcal{F},\mathrm{P})\) such that \(\mathrm{E}\vert {\xi }_{n}\vert \leq {2}^{-n}\). Prove that \({\xi }_{n} \rightarrow 0\) almost surely as \(n \rightarrow \infty \).

  8. Prove that if a sequence of measurable functions \({f}_{n}\) converges to f almost surely as \(n \rightarrow \infty \), then it also converges to f in measure. If \({f}_{n}\) converges to f in measure, then there is a subsequence \({f}_{{n}_{k}}\) which converges to f almost surely as \(k \rightarrow \infty \).

  9. Let F(x) be a distribution function. Compute \({\int }_{-\infty }^{\infty }(F(x + 10) - F(x))dx.\)

  10. Prove that a measure \(\eta \) is absolutely continuous with respect to a measure \(\mu \) if and only if for any \(\epsilon > 0\) there is a \(\delta > 0\) such that \(\mu (A) < \delta \) implies that \(\eta (A) < \epsilon \).

  11. Prove that the \({L}^{p}([0,1],\mathcal{B},\lambda )\) spaces are complete for \(1 \leq p < \infty \). Here \(\mathcal{B}\) is the \(\sigma \)-algebra of Borel sets, and \(\lambda \) is the Lebesgue measure.

  12. Prove the H\(\ddot{\mathrm{o}}\)lder Inequality.

  13. Let \({\xi }_{1},{\xi }_{2},\ldots \) be a sequence of random variables on a probability space \(({\it\Omega}, \mathcal{F},\mathrm{P})\) such that \(\mathrm{E}{\xi }_{n}^{2} \leq c\) for some constant c. Assume that \({\xi }_{n} \rightarrow \xi \) almost surely as \(n \rightarrow \infty \). Prove that \(\mathrm{E}\xi \) is finite and \(\mathrm{E}{\xi }_{n} \rightarrow \mathrm{E}\xi \).