Abstract
The goals of this chapter are to: Explore the conditions under which there is equality between the Kantorovich and the Kantorovich–Rubinstein functionals; Provide inequalities between the Kantorovich and Kantorovich–Rubinstein functionals; Provide criteria for convergence, compactness, and completeness of probability measures in probability spaces involving the Kantorovich and Kantorovich–Rubinstein functionals; Analyze the problem of uniformity between the two functionals.
The goals of this chapter are to:
- Explore the conditions under which there is equality between the Kantorovich and the Kantorovich–Rubinstein functionals;
- Provide inequalities between the Kantorovich and Kantorovich–Rubinstein functionals;
- Provide criteria for convergence, compactness, and completeness of probability measures in probability spaces involving the Kantorovich and Kantorovich–Rubinstein functionals;
- Analyze the problem of uniformity between the two functionals.
Notation introduced in this chapter:
1 Introduction
In Chap. 5, we discussed the Kantorovich and Kantorovich–Rubinstein functionals. They generate minimal distances, \(\widehat{\mu }_{c}\), and minimal norms, \(\mu ^{ \circ }_{c}\), respectively, and we considered the problem of evaluating these functionals. The similarities between the two functionals indicate there can be quantitative relationships between them.
In this chapter, we begin by exploring the conditions under which \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\). It turns out that equality holds if and only if the cost function c(x, y) is a metric itself. Under more general conditions, certain inequalities hold involving \(\widehat{\mu }_{c}\), \(\mu ^{ \circ }_{c}\), and other probability metrics. These inequalities imply criteria for convergence, compactness, and uniformity in the spaces of probability measures \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U), \mu ^{ \circ }_{c})\). Finally, we conclude with a generalization of the Kantorovich and Kantorovich–Rubinstein functionals.
2 Equivalence Between Kantorovich Metric and Kantorovich–Rubinstein Norm
Levin [1975] proved that if U is compact, c(x, x) = 0, c(x, y) ≥ 0, and c(x, y) + c(y, x) > 0 for x≠y, then \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\) if and only if c(x, y) + c(y, x) is a metric in U. In the case of an s.m.s. U, we have the following version of Levin’s result.
Theorem 6.2.1 (Neveu and Dudley 1980).
Suppose U is an s.m.s. and \(c \in { \mathfrak{C}}^{{_\ast}}\) (Corollary 5.3.1). Then
for all P 1 and P 2 with
if and only if c is a metric.
Proof.
Suppose (6.2.1) holds and set \(P_{1} = \delta _{x}\) and \(P_{2} = \delta _{y}\) for x, y ∈ U. Then the set \({\mathcal{P}}^{(P_{1},P_{2})}\) of all laws on U ×U with marginals \(P_{1}\) and \(P_{2}\) contains only \(P_{1} \times P_{2} = \delta _{(x,y)}\), and by Theorem 5.4.2,
By assumption, \(c \in {\mathfrak{C}}^{{_\ast}}\), and therefore the triangle inequality implies that c is a metric in U.
Now define \(\mathcal{G}(U)\) as the set of all pairs (f, g) of continuous functions f : U → ℝ and g : U → ℝ such that \(f(x) + g(y) \leq c(x,y)\) ∀x, y ∈ U. Let \(\mathcal{G}_{B}(U)\) be the set of all pairs \((f,g) \in \mathcal{G}(U)\) with f and g bounded.
Now suppose that c(x, y) is a metric and that \((f,g) \in \mathcal{G}_{B}(U)\). Define \(h(x) =\inf \{ c(x,y) - g(y) : y \in U\}\). As the infimum of a family of continuous functions, h is upper semicontinuous. For each x ∈ U we have f(x) ≤ h(x) ≤ − g(x). Then
so that \(\|h\|_{c} \leq 1\). Then for P 1, P 2 satisfying (6.2.2) we have
so that (according to Corollary 5.3.1 and Theorem 5.4.2 of Chap. 5) we have
Thus \(\widehat{\mu }_{c}(P_{1},P_{2}) ={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})\).
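On a finite space both functionals are linear programs: the Kantorovich functional minimizes \(\sum c_{ij}b_{ij}\) over couplings b with marginals \(P_{1}\) and \(P_{2}\), while the Kantorovich–Rubinstein norm minimizes over nonnegative q whose difference of marginals equals \(P_{1} - P_{2}\). The following numerical sketch (illustrative only; the helper names and the three-point example are ours) checks the content of Theorem 6.2.1: the two optima coincide for the metric cost c = d, while for the non-metric cost \(c = {d}^{2}\) only the inequality \(\mu ^{ \circ }_{c} \leq \widehat{\mu }_{c}\) survives.

```python
# Illustrative finite-space check of Theorem 6.2.1 (helper names are ours).
import numpy as np
from scipy.optimize import linprog

def kantorovich(c, p1, p2):
    """Minimize sum c_ij b_ij over couplings b >= 0 with marginals p1 and p2."""
    n = len(p1)
    rows = []
    for i in range(n):          # row sums equal p1
        m = np.zeros((n, n)); m[i, :] = 1; rows.append(m.ravel())
    for j in range(n):          # column sums equal p2
        m = np.zeros((n, n)); m[:, j] = 1; rows.append(m.ravel())
    res = linprog(c.ravel(), A_eq=np.array(rows),
                  b_eq=np.concatenate([p1, p2]), bounds=(0, None), method="highs")
    return res.fun

def kr_norm(c, p1, p2):
    """Minimize sum c_ij q_ij over q >= 0 whose marginal difference is p1 - p2."""
    n = len(p1)
    rows = []
    for i in range(n):          # net outflow of node i equals p1_i - p2_i
        m = np.zeros((n, n)); m[i, :] += 1; m[:, i] -= 1; rows.append(m.ravel())
    res = linprog(c.ravel(), A_eq=np.array(rows), b_eq=p1 - p2,
                  bounds=(0, None), method="highs")
    return res.fun

pts = np.array([0.0, 1.0, 3.0])
d = np.abs(pts[:, None] - pts[None, :])            # d is a metric on {0, 1, 3}
p1 = np.array([0.7, 0.2, 0.1]); p2 = np.array([0.1, 0.3, 0.6])

assert abs(kantorovich(d, p1, p2) - kr_norm(d, p1, p2)) < 1e-9  # equal: c = d is a metric
assert kr_norm(d**2, p1, p2) < kantorovich(d**2, p1, p2)        # strict gap for c = d^2
```

With \(c = {d}^{2}\) the norm is strictly smaller (here 2.6 versus 3.8) because the norm may relay mass through the intermediate point 1 at lower total cost, which a coupling cannot do.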
Corollary 6.2.1.
Let (U,d) be an s.m.s. and a ∈ U. Then
whenever
The supremum is attained for some optimal \(f_{0}\) with \(\|f_{0}\|_{L} :=\mathop{\sup}\limits_{x\neq y}\{\vert f_{0}(x) - f_{0}(y)\vert /d(x,y)\} \leq 1\).
If P 1 and P 2 are tight, there are some \(b_{0} \in {\mathcal{P}}^{(P_{1},P_{2})}\) and f 0 : U → ℝ with \(\|f_{0}\|_{L} \leq 1\) such that
where \(f_{0}(x) - f_{0}(y) = d(x,y)\) for b 0 -a.e. (x,y) in U × U.
Proof.
Set c(x, y) = d(x, y). Application of the theorem proves the first statement. The second (existence of f 0) follows from Theorem 5.4.3.
For each n ≥ 1 choose \(b_{n} \in {\mathcal{P}}^{(P_{1},P_{2})}\) with
If P 1 and P 2 are tight, then by Corollary 5.3.1 there exists \(b_{0} \in {\mathcal{P}}^{(P_{1},P_{2})}\) such that
i.e., that b 0 is optimal. Integrating both sides of f 0(x) − f 0(y) ≤ d(x, y) with respect to b 0 yields ∫f 0d(P 1 − P 2) ≤ ∫d(x, y)b 0(dx, dy). However, we know that we have equality of these integrals. This implies that \(f_{0}(x) - f_{0}(y) = d(x,y)\ b_{0}\)-a.e.
3 Inequalities Between \(\widehat{\mu }_{c}\) and \({\mathop{\mu}\limits^{\circ}}_{c}\)
In the previous section we looked at conditions under which \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\). In general, \(\widehat{\mu }_{c}\neq \mu ^{ \circ }_{c}\). For example, if U = ℝ, \(d(x,y) = \vert x - y\vert \),
then for any laws \(P_{i}\) (i = 1, 2) on ℬ(ℝ) with distribution functions (DFs) \(F_{i}\) we have the following explicit expressions:
where \(F_{i}^{-1}\) is the inverse of the DF \(F_{i}\) (see Theorem 7.4.2 in Chap. 7). On the other hand,
(see Theorem 5.5.1 in Chap. 5). However, in the space ℳ p = ℳ p (U) [U = (U, d) is an s.m.s.] of all Borel probability measures P with finite \(\int {d}^{p}(x,a)P(\mathrm{d}x)\), the functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) [where c is given by (6.3.1)] metrize exactly the same topology; that is, the following \(\widehat{\mu }_{c}\)- and \(\mu ^{ \circ }_{c}\)-convergence criteria will be proved.
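For empirical measures on ℝ, the two classical one-dimensional representations of the minimal distance with c(x, y) = |x − y| can be compared directly: the quantile form \(\int _{0}^{1}\vert F_{1}^{-1}(u) - F_{2}^{-1}(u)\vert \,\mathrm{d}u\) and the CDF form \(\int \vert F_{1}(t) - F_{2}(t)\vert \,\mathrm{d}t\) are both finite sums and must agree. A small numerical sketch (the sample sizes, distributions, and seed are arbitrary choices of ours):

```python
# Two explicit forms of the minimal distance on R for c(x, y) = |x - y|,
# evaluated for empirical measures with equal sample sizes.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.normal(0.0, 1.0, 500))   # sample from P1
y = np.sort(rng.normal(0.5, 2.0, 500))   # sample from P2

# quantile form: with equal sample sizes, F1^{-1} and F2^{-1} pair the order statistics
quantile_form = np.mean(np.abs(x - y))

# CDF form: integrate |F1 - F2| over the merged grid of jump points
grid = np.sort(np.concatenate([x, y]))
f1 = np.searchsorted(x, grid, side="right") / len(x)   # empirical CDF of P1 on the grid
f2 = np.searchsorted(y, grid, side="right") / len(y)   # empirical CDF of P2 on the grid
cdf_form = np.sum(np.abs(f1 - f2)[:-1] * np.diff(grid))

assert abs(quantile_form - cdf_form) < 1e-9
```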
Theorem 6.3.1.
Let (U,d) be an s.m.s., let c be given by (6.3.1), and let P, P n ∈ ℳ p (\(n = 1,2,\ldots \)). Then the following relations are equivalent:
(I) $$\widehat{\mu }_{c}(P_{n},P) \rightarrow 0;$$

(II) $$\mu ^{ \circ }_{c}(P_{n},P) \rightarrow 0;$$

(III) $$P_{n}\ \mbox{ converges weakly to}\ P\ (P_{n}\stackrel{\mathrm{w}}{\rightarrow }P)\ \mbox{ and}\quad \lim _{N\rightarrow \infty }\mathop{\sup}\limits_{n} \int \nolimits {d}^{p}(x,a)I\{d(x,a) > N\}P_{n}(\mathrm{d}x) = 0;$$

(IV) $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ \mbox{ and}\ \int \nolimits {d}^{p}(x,a)P_{n}(\mathrm{d}x) \rightarrow \int \nolimits {d}^{p}(x,a)P(\mathrm{d}x).$$
(The assertion of the theorem is an immediate consequence of Theorems 6.3.2–6.3.5 below and the more general Theorem 6.4.1).
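The moment condition in (IV) is essential and is not implied by weak convergence alone. A standard escaping-mass example, checked numerically for p = 1 (the use of scipy here is our choice of tool, not part of the text): \(P_{n} = (1 - 1/n)\delta _{0} + (1/n)\delta _{n}\) converges weakly to \(\delta _{0}\), yet the first moments do not converge, and the minimal distance stays equal to 1 instead of vanishing.

```python
# Weak convergence without moment convergence: the minimal distance does not vanish.
from scipy.stats import wasserstein_distance

for n in (10, 100, 1000):
    w1 = wasserstein_distance([0.0, float(n)], [0.0],
                              u_weights=[1 - 1/n, 1/n], v_weights=[1.0])
    assert abs(w1 - 1.0) < 1e-9   # the escaping atom contributes (1/n) * n = 1
```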
Theorem 6.3.1 is a qualitative \(\widehat{\mu }_{c}\) (\(\mu ^{ \circ }_{c}\))-convergence criterion. One can rewrite (III) as
where \(\boldsymbol \pi \) is the Prokhorov metric
and ω(ε; P; λ) is the following modulus of λ-integrability:
where \(\lambda (x) :=\max (d(x,a),{d}^{p}(x,a))\). Analogously, (IV) is equivalent to
(IV∗) $$\boldsymbol \pi (P_{n},P) \rightarrow 0\ \mbox{ and }\ D(P_{n},P;\lambda ) \rightarrow 0,$$
where
In this section we investigate quantitative relationships between \(\widehat{\mu }_{c}\), \(\mu ^{ \circ }_{c}\), \(\boldsymbol \pi \), ω, and D in terms of inequalities between these functionals. These relationships yield convergence and compactness criteria in the space of measures w.r.t. the Kantorovich-type functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) (see Examples 3.3.2 and 3.3.6 in Chap. 3) as well as the \(\mu ^{ \circ }_{c}\)-completeness of the space of measures.
In what follows, we assume that the cost function c has the form considered in Example 5.2.1:
where \(k_{0}(t,s)\) is a symmetric continuous function, nondecreasing in both arguments t ≥ 0 and s ≥ 0, and satisfying the following conditions:
(C1) $$\alpha :=\mathop{\sup}\limits_{s\neq t}\frac{\vert K(t) - K(s)\vert } {\vert t - s\vert k_{0}(t,s)} < \infty,$$
where \(K(t) := tk_{0}(t,t)\), t≠0;

(C2) $$\beta := k(0) > 0,$$
where \(k(t) = k_{0}(t,t)\), t ≥ 0; and

(C3) $$\gamma :=\mathop{\sup}\limits_{t\geq 0,s\geq 0}\frac{k_{0}(2t,2s)} {k_{0}(t,s)} < \infty.$$
If c is given by (6.3.1), then c admits the form (6.3.7) with \(k_{0}(t,s) =\max (1,{t}^{p-1},{s}^{p-1})\), and in this case α = p, β = 1, \(\gamma = {2}^{p-1}\). Further, let \(\mathcal{P}_{\lambda } = \mathcal{P}_{\lambda }(U)\) be the space of all probability measures on the s.m.s. (U, d) with finite λ-moment
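The constants stated for this \(k_{0}\) can be spot-checked numerically for a concrete p (here p = 3; the grid and ranges are arbitrary choices of ours): α = p is approached near the diagonal t = s, β = k(0) = 1, and \(\gamma = {2}^{p-1}\) is attained for t, s ≥ 1.

```python
# Grid spot check of alpha = p, beta = 1, gamma = 2^{p-1} for
# k0(t, s) = max(1, t^{p-1}, s^{p-1}), with p = 3.
import numpy as np

p = 3
k0 = lambda t, s: np.maximum(1.0, np.maximum(t**(p - 1), s**(p - 1)))
K = lambda t: t * k0(t, t)          # K(t) = t k0(t, t)

ts = np.linspace(0.0, 5.0, 400)
t, s = np.meshgrid(ts, ts)
mask = np.abs(t - s) > 1e-9         # exclude the diagonal from the ratio

alpha_est = np.max(np.abs(K(t) - K(s))[mask] / (np.abs(t - s)[mask] * k0(t, s)[mask]))
gamma_est = np.max(k0(2 * t, 2 * s) / k0(t, s))
beta = k0(0.0, 0.0)

assert alpha_est <= p + 1e-6            # alpha = p, approached near t = s
assert beta == 1.0                      # beta = k(0) = 1
assert abs(gamma_est - 2**(p - 1)) < 1e-6   # gamma = 2^{p-1}, attained for t, s >= 1
```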
where λ(x) = K(d(x, a)) and a is a fixed point of U.
In Theorems 6.3.2–6.3.5 we assume that \(P_{1} \in \mathcal{P}_{\lambda }\), \(P_{2} \in \mathcal{P}_{\lambda }\), ε > 0, and denote \(\widehat{\mu }_{c} :=\widehat{ \mu }_{c}(P_{1},P_{2})\) [see (5.2.16)], \(\mu ^{ \circ }_{c} :={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})\) [see (5.2.17)], \(\boldsymbol \pi := \boldsymbol \pi (P_{1},P_{2})\),
and the function c satisfies conditions (C1)–(C3). We begin with an estimate of \(\widehat{\mu }_{c}\) from above in terms of \(\boldsymbol \pi \) and ω i (ε).
Theorem 6.3.2.
Proof.
Recall that \({\mathcal{P}}^{(P_{1},P_{2})}\) is the space of all laws P on U ×U with prescribed marginals P 1 and P 2. Let K = K 1 be the Ky Fan metric with parameter 1 (see Example 3.4.2 in Chap. 3)
Claim 1.
For any N > 0 and for any measure P on U 2 with marginals P 1 and P 2, i.e., \(P \in {\mathcal{P}}^{(P_{1},P_{2})}\), we have
Proof of Claim 1. Suppose \(K(P) < \delta \leq 1\), \(P \in {\mathcal{P}}^{(P_{1},P_{2})}\). Then by (6.3.7) and (C3),
where
and
Let us estimate I 1:
where
Obviously, by λ(x) : = K(d(x, a)), \(I_{11} \leq \delta \int \nolimits k(d(x,a))I\{d(x,a) \geq 1\}P_{1}(\mathrm{d}x) + \delta k(1) \leq \delta \omega _{1}(1) + \delta k(1)\). Further,
Now let us estimate the last term in estimate (6.3.12):
Summarizing the preceding estimates we obtain \(I_{1} \leq \delta \omega _{1}(1) + \delta k(1) + 3\omega _{1}(1/N) + 2\omega _{2}(1/N) + 2K(N)\delta \). By symmetry we have \(I_{2} \leq \delta \omega _{2}(1) + \delta k(1) + 3\omega _{2}(1/N) + 2\omega _{1}(1/N) + 2K(N)\delta \). Therefore, the last two estimates imply
Letting δ → K(P) we obtain (6.3.11), which proves the claim.
Claim 2 (Strassen–Dudley Theorem).
Proof of Claim 2. See Dudley [2002] (see also Corollary 7.5.2 in Chap. 7).
Claims 1 and 2 complete the proof of the theorem.
The next theorem shows that \(\widehat{\mu }_{c}\)-convergence and \(\mu ^{ \circ }_{c}\)-convergence imply the weak convergence of measures.
Theorem 6.3.3.
Proof.
Obviously, for any continuous nonnegative function c,
and
where \(\boldsymbol \zeta _{c}\) is the Zolotarev simple metric with a ζ-structure (Definition 4.4.1)
Now, using assumption (C2) we have that c(x, y) ≥ βd(x, y) and, hence, \(\boldsymbol \zeta _{c} \geq \beta \boldsymbol \zeta _{d}\). Thus, by (6.3.16),
Claim 3.
Proof of Claim 3. Using the dual representation of \(\widehat{\mu }_{d}\) [see (6.2.3)] we are led to
which in view of the inequality
establishes (6.3.19). The proof of the claim is now completed.
The desired inequalities (6.3.14) are the consequence of (6.3.15), (6.3.16), (6.3.18), and Claim 3.
The next theorem establishes the uniform λ-integrability
of the sequence of measures \(P_{n} \in \mathcal{P}_{\lambda }\) \(\mu ^{ \circ }_{c}\)-converging to a measure \(P \in \mathcal{P}_{\lambda }\).
Theorem 6.3.4.
Proof.
For any N > 0, by the triangle inequality, we have
where
and
Claim 4.
Proof of Claim 4. Denote \(f_{N}(x) := (1/\alpha )\max (\lambda (x),K(2N))\). Since \(\lambda (x) = K(d(x,a)) = d(x,a)k_{0}(d(x,a),d(x,a))\), by (C1),
for any x, y ∈ U. Thus the inequalities
follow from (6.3.16) and (6.3.17). Since αf N (x) = max(K(d(x, a)), K(2N)) and (6.3.25) holds, then
which proves the claim.
Claim 5.
Proof of Claim 5. As in the proof of Claim 4, we choose an appropriate Lipschitz function. That is, write
where O(a, N) : = { x : d(x, a) < N}. Using (C1) and (C3),
Hence
Using (6.3.27) and the implications
we obtain the following chain of inequalities:
which proves the claim.
For A(P 2) [see (6.3.26)] we have the following estimate:
Summarizing (6.3.23), (6.3.24), (6.3.26), and (6.3.29) we obtain
for any N > 0, as desired.
The next theorem shows that \(\mu ^{ \circ }_{c}\)-convergence implies convergence of the λ-moments.
Theorem 6.3.5.
Proof.
By (C1), for any finite nonnegative measure Q with marginals P 1 and P 2 we have
which completes the proof of (6.3.30).
Inequalities (6.3.9), (6.3.14), (6.3.22), and (6.3.30), established in Theorems 6.3.2–6.3.5, imply criteria for convergence, compactness, and uniformity in the spaces of probability measures \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U), \mu ^{ \circ }_{c})\) (see also the next section). Moreover, the estimates obtained for \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) may be viewed as quantitative counterparts of the conditions that are necessary and sufficient for \(\widehat{\mu }_{c}\)-convergence and \(\mu ^{ \circ }_{c}\)-convergence. Note that, in general, quantitative results require assumptions beyond the set of necessary and sufficient conditions that yield the qualitative results. The classic example is the central limit theorem, where the uniform convergence of the normalized sum of i.i.d. RVs can be arbitrarily slow if only the existence of the second moment is assumed.
4 Convergence, Compactness, and Completeness in \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U),{\mathop{\mu}\limits^{\circ}}_{c})\)
In this section, we assume that the cost function c satisfies conditions (C1)–(C3) in the previous section and λ(x) = K(d(x, a)). We begin with the criterion for \(\widehat{\mu }_{c}\)- and \(\mu ^{ \circ }_{c}\)-convergence.
Theorem 6.4.1.
If \(P_{n}\), \(P \in \mathcal{P}_{\lambda }(U)\) (\(n = 1,2,\ldots \)), then the following statements are equivalent:
(A) $$\widehat{\mu }_{c}(P_{n},P) \rightarrow 0;$$

(B) $$\mu ^{ \circ }_{c}(P_{n},P) \rightarrow 0;$$

(C) $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ (P_{n}\ \mbox{ converges weakly to}\ P)\ \mbox{ and}\ \int \nolimits \lambda \mathrm{d}(P_{n} - P) \rightarrow 0\ \mbox{ as}\ n \rightarrow \infty ;$$

(D) $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ \mbox{ and }\ \lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega _{n}(\epsilon ) = 0,$$
where \(\omega _{n}(\epsilon ) := \omega (\epsilon ;P_{n};\lambda ) = \int \nolimits \lambda (x)I\{d(x,a) > 1/\epsilon \}P_{n}(\mathrm{d}x)\).
Proof.
From inequality (6.3.14) it is apparent that A ⇒ B and that B implies \(P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\). Using (6.3.30) we obtain that B implies \(\int \lambda \,\mathrm{d}(P_{n} - P) \rightarrow 0\), and thus B ⇒ C. Now, let C hold.
Claim 6.
C ⇒ D.
Proof of Claim 6. Choose a sequence ε1 > ε2 > ⋯ → 0 such that \(P(d(x,a) = 1/\epsilon _{n}) = 0\) for any \(n = 1,2,\ldots \). Then for fixed n
by Billingsley [1999, Theorem 5.1]. Since \(P \in \mathcal{P}_{\lambda }\), \(\omega (\epsilon _{n}) := \omega (\epsilon _{n};P;\lambda ) \rightarrow 0\) as n → ∞, and hence
The last inequality and \(P_{k} \in \mathcal{P}_{\lambda }\) imply \(\lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega _{n}(\epsilon ) = 0\), and hence D holds.
The claim is proved.
Claim 7.
D ⇒ A.
Proof of Claim 7. By Theorem 6.3.2,
where ω n and ω are defined as in Claim 6 and, moreover, ε n > 0 is such that
Hence, using the last two inequalities we obtain
and hence D ⇒ A, as we claimed.
The Kantorovich–Rubinstein functional \(\mu ^{ \circ }_{c}\) is a metric in \(\mathcal{P}_{\lambda }(U)\), while \(\widehat{\mu }_{c}\) is not a metric except for the case c = d (see the discussion in the previous section). The next theorem establishes a criterion for \(\mu ^{ \circ }_{c}\)-relative compactness of sets of measures. Recall that a set \(\mathcal{A}\subset \mathcal{P}_{\lambda }\) is said to be \(\mu ^{ \circ }_{c}\)-relatively compact if any sequence of measures in \(\mathcal{A}\) has a \(\mu ^{ \circ }_{c}\)-convergent subsequence and the limit belongs to \(\mathcal{P}_{\lambda }\). Recall that the set \(\mathcal{A}\subset \mathcal{P}(U)\) is weakly compact if \(\mathcal{A}\) is \(\boldsymbol \pi \)-relatively compact, i.e., any sequence of measures in \(\mathcal{A}\) has a weakly (\(\boldsymbol \pi \)-) convergent subsequence.
Theorem 6.4.2.
The set \(\mathcal{A}\subset \mathcal{P}_{\lambda }\) is \(\mu ^{ \circ }_{c}\) -relatively compact if and only if \(\mathcal{A}\) is weakly compact and
Proof.
“If” part: If \(\mathcal{A}\) is weakly compact, (6.4.1) holds and \(\{P_{n}\}_{n\geq 1} \subset \mathcal{A}\), then we can choose a subsequence {P n′ } ⊂ { P n } that converges weakly to a probability measure P.
Claim 8.
\(P \in \mathcal{P}_{\lambda }\).
Proof of Claim 8. Let \(0 < \alpha _{1} < \alpha _{2} < \cdots \), \(\lim \alpha _{n} = \infty \), be a sequence such that \(P(d(x,a) = \alpha _{n}) = 0\) for any n ≥ 1. Then, by Billingsley [1999, Theorem 5.1] and (6.4.1),
which proves the claim.
Claim 9.
Proof of Claim 9. Using Theorem 6.3.2, Claim 8, and (6.4.1) we have, for any δ > 0,
if ε = ε(δ) is small enough. Hence, by \(\boldsymbol \pi (P_{n\prime},P) \rightarrow 0\), we can choose N = N(δ) such that \(\mu ^{ \circ }_{c}(P_{n\prime},P) < 2\delta \) for any n′ ≥ N, as desired.
Claims 8 and 9 establish the “if” part of the theorem.
“Only if” part: If \(\mathcal{A}\) is \(\mu ^{ \circ }_{c}\)-relatively compact and \(\{P_{n}\} \subset \mathcal{A}\), then there exists a subsequence {P n′ } ⊂ { P n } that is convergent w.r.t. \(\mu ^{ \circ }_{c}\); let P be the limit. Hence, by Theorem 6.3.3, \(\beta {\boldsymbol \pi }^{2}(P_{n\prime},P) \leq \mu ^{ \circ }_{c}(P_{n\prime},P) \rightarrow 0\), which demonstrates that the set \(\mathcal{A}\) is weakly compact.
Further, if (6.4.1) is not valid, then there exist δ > 0 and a sequence \(\{P_{n}\} \subset \mathcal{A}\) such that
Let {P n′ } be a \(\mu ^{ \circ }_{c}\)-convergent subsequence of {P n }, and let \(P \in \mathcal{P}_{\lambda }\) be the corresponding limit. By Theorem 6.3.4, \(\omega (1/n\prime;P_{n\prime};\lambda ) \leq (2\gamma + 2)(\alpha \mu ^{ \circ }_{c}(P_{n\prime},P) + \omega (1/n\prime;P;\lambda )) \rightarrow 0\) as n′ → ∞, which contradicts (6.4.2).
In the light of Theorem 6.4.1, we can now interpret Theorem 6.4.2 as a criterion for \(\widehat{\mu }_{c}\)-relative compactness of sets of measures in \(\mathcal{P}\) by simply replacing \(\mu ^{ \circ }_{c}\) with \(\widehat{\mu }_{c}\) in the formulation of the last theorem.
The well-known Prokhorov theorem says that if (U, d) is a complete s.m.s., then the set of all laws on U is complete w.r.t. the Prokhorov metric \(\boldsymbol \pi \). The next theorem is an analog of the Prokhorov theorem for the metric space \((\mathcal{P}_{\lambda }, \mu ^{ \circ }_{c})\).
Theorem 6.4.3.
If (U,d) is a complete s.m.s., then \((\mathcal{P}_{\lambda }(U), \mu ^{ \circ }_{c})\) is also complete.
Proof.
If {P n } is a \(\mu ^{ \circ }_{c}\)-fundamental sequence, then by Theorem 6.3.3, {P n } is also \(\boldsymbol \pi \)-fundamental, and hence there exists the weak limit \(P \in \mathcal{P}(U)\).
Claim 10.
\(P \in \mathcal{P}_{\lambda }\).
Proof of Claim 10. Let ε > 0 and \(\mu ^{ \circ }_{c}(P_{n},P_{m}) \leq \epsilon \) for any n, m ≥ n ε. Then, by Theorem 6.3.5, \(\left \vert \int \nolimits \lambda (x)(P_{n} - P_{n_{\epsilon }})(\mathrm{d}x)\right \vert < \alpha \epsilon \) for any n > n ε; hence,
Choose the sequence 0 < α1 < α2 < ⋯ , lim k → ∞ α k = ∞, such that \(P(d(x,a) = \alpha _{k}) = 0\) for any k > 1. Then
Letting k → ∞, the assertion follows.
Claim 11.
Proof of Claim 11. Since \(\mu ^{ \circ }_{c}(P_{n},P_{n_{\epsilon }}) \leq \epsilon \) for any n ≥ n ε, by Theorem 6.3.4,
for any δ > 0. The last inequality and Theorem 6.3.2 yield
for any n ≥ n ε and δ > 0. Next, choose δ n = δ n, ε > 0 such that δ n → 0 as n → ∞ and
Combining (6.4.3) and (6.4.4) we have that \(\mu ^{ \circ }_{c}(P_{n},P) \leq \mathrm{const}\cdot \epsilon \) for n large enough, which proves the claim.
5 \({\mathop{\mu}\limits^{\circ}}_{c}\)- and \(\widehat{\mu }_{c}\)-Uniformity
In the previous section, we saw that \({\mathop{\mu}\limits^{\circ}}_{c}\) and \(\widehat{\mu }_{c}\) induce exactly the same convergence in \(\mathcal{P}_{\lambda }\). Here we would like to analyze the uniformity of \({\mathop{\mu}\limits^{\circ}}_{c}\)- and \(\widehat{\mu }_{c}\)-convergence; namely, whether, for any \(P_{n},Q_{n} \in \mathcal{P}_{\lambda }\), the equivalence
holds. The implication ⇐ is obvious, since \({\mathop{\mu}\limits^{\circ}}_{c}(P_{n},Q_{n}) \leq \widehat{ \mu }_{c}(P_{n},Q_{n})\). So, if
for a continuous nondecreasing function ϕ with ϕ(0) = 0, then (6.5.1) holds.
Remark 6.5.1.
Given two metrics, say μ and ν, in the space of measures, the equivalence of μ- and ν-convergence does not imply the existence of a continuous nondecreasing function ϕ vanishing at 0 and such that μ ≤ ϕ(ν). For example, both the Lévy metric L [see (4.2.3)] and the Prokhorov metric \(\boldsymbol \pi \) [see (3.3.18)] metrize the weak convergence in the space \(\mathcal{P}(\mathbb{R})\). Suppose there exists ϕ such that
for any real-valued r.v.s X and Y. (Recall our notation \(\mu (X,Y ) := \mu (\mathrm{Pr}_{X},\mathrm{Pr}_{Y })\) for any metric μ in the space of measures.) Then, by (4.2.4) and (3.3.23),
and
where \(\boldsymbol \rho \) is the Kolmogorov metric [see (4.2.6)] and \(\boldsymbol \sigma \) is the total variation metric [see (3.3.13)]. Thus, (6.5.3)–(6.5.5) imply that \(\boldsymbol \sigma (X,Y ) \leq \phi (\boldsymbol \rho (X,Y ))\). The last inequality is, however, not true, because in general \(\boldsymbol \rho \)-convergence does not yield \(\boldsymbol \sigma \)-convergence. [For example, if \(X_{n}\) is a random variable taking values k ∕ n, \(k = 1,\ldots,n\), each with probability 1 ∕ n, then \(\boldsymbol \rho (X_{n},Y ) \rightarrow 0\), where Y is a (0, 1)-uniformly distributed random variable. On the other hand, \(\boldsymbol \sigma (X_{n},Y ) = 1\).]
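The counterexample is easy to check on a grid (the grid resolution is an arbitrary choice of ours): for \(X_{n}\) uniform on {k/n} the Kolmogorov distance to the U(0, 1) law equals 1/n and vanishes, while a discrete law and an absolutely continuous law are mutually singular, so the total variation distance stays identically 1.

```python
# Grid check of rho(X_n, Y) = sup_t |F_n(t) - t| = 1/n for X_n uniform on {k/n}.
import numpy as np

ts = np.linspace(0.0, 1.0, 100_001)
for n in (5, 50, 500):
    Fn = np.floor(n * ts) / n        # CDF of the lattice law, evaluated on the grid
    rho = np.max(np.abs(Fn - ts))    # grid approximation of the Kolmogorov distance
    assert abs(rho - 1 / n) < 1e-4   # rho(X_n, Y) = 1/n -> 0, yet sigma(X_n, Y) = 1
```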
We are going to prove (6.5.2) for the special but important case where \(\mu ^{ \circ }_{c}\) is the Fortet–Mourier metric on \(\mathcal{P}_{\lambda }(\mathbb{R})\), i.e., \(\mu ^{ \circ }_{c}(P,Q) = \boldsymbol \zeta (P,Q;{\mathcal{G}}^{p})\) [see (4.4.34)]; in other words, for any \(P,Q \in \mathcal{P}_{\lambda }\),
where
Since \(\lambda (x) := 2\max (\vert x\vert,\vert x{\vert }^{p})\), \(\mathcal{P}_{\lambda }(\mathbb{R})\) is the space of all laws on ℝ with finite pth absolute moment.
Theorem 6.5.1.
If c is given by (6.5.6), then
Proof.
Denote \(h(t) =\max (1,\vert t{\vert }^{p-1})\), t ∈ ℝ, and \(H(x) =\int _{0}^{x}h(t)\,\mathrm{d}t\), x ∈ ℝ. Let X and Y be real-valued RVs on a nonatomic probability space \((\Omega,\mathcal{A},\mathrm{Pr})\) with distributions P and Q, respectively. Theorem 5.5.1 gives us an explicit representation of \(\mu ^{ \circ }_{c}\), namely,
and thus
Claim 12.
Let X and Y be real-valued RVs with distributions P and Q, respectively. Then
Proof of Claim 12. Using the equality \(\widehat{\mu }_{d} ={\mathop{\mu}\limits^{\circ}}_{d}\) [see (6.2.3) and (5.5.5)] with H(t) = t we have that
for any DFs F and G. Hence, by (6.5.9)
which proves the claim.
Next we use Theorem 2.7.2, which claims that on a nonatomic probability space, the class of all joint distributions \(\mathrm{Pr}_{X,Y }\) coincides with the class of all probability Borel measures on ℝ 2. This implies
Claim 13.
For any x, y ∈ ℝ, c(x, y) ≤ p | H(x) − H(y) | .
Proof of Claim 13.
(a) Let y > x > 0. Then
$$\begin{array}{rcl} c(x,y)& =& (y - x)h(y) = yh(y) - xh(y) \leq yh(y) - xh(x) \\ & \leq & (H(y) - H(x))\mathop{\sup}\limits_{y>x>0}\frac{yh(y) - xh(x)} {H(y) - H(x)}. \end{array}$$
Since H(t) is a strictly increasing continuous function,
$$B :=\mathop{\sup}\limits_{y>x>0}\frac{yh(y) - xh(x)} {H(y) - H(x)} =\mathop{\sup}\limits_{t>s>0}\frac{f(t) - f(s)} {t - s},$$
where \(f(t) := {H}^{-1}(t)h({H}^{-1}(t))\) and \({H}^{-1}\) is the function inverse to H; hence, \(B =\mathrm{ess\,sup}_{t}\vert f^{\prime}(t)\vert \leq p\).
(b) Let y > 0 > x > − y. Then \(c(x,y) = \vert x-y\vert h(y) = (y+(-x))h(y) = yh(y)+(-x)h(\vert x\vert )+((-x)h(y)-(-x)h(\vert x\vert )) \leq yh(y)+(-x)h(\vert x\vert )\). Since
$$th(t) = \left \{\begin{array}{lll} t &\mbox{ if}&t \leq 1,\\ {t}^{p } &\mbox{ if} &t \geq 1,\end{array} \right.\qquad H(t) = \left \{\begin{array}{lll} t &\mbox{ if}&0 < t \leq 1, \\ \dfrac{p - 1} {p} + \dfrac{1} {p}{t}^{p}&\mbox{ if}&t \geq 1, \end{array} \right.$$
then \(yh(y) + (-x)h(\vert x\vert ) \leq p(H(y) + H(-x)) = p(H(y) - H(x))\). By symmetry, the other cases are reduced to (a) or (b), which proves the claim. Now, (6.5.7) is a consequence of Claims 12, 13, and (6.5.12).
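Case (a) lends itself to a direct numerical check (p = 3 here for concreteness; the sampling range is an arbitrary choice of ours): for y > x > 0 the chain \((y - x)h(y) \leq yh(y) - xh(x) \leq p(H(y) - H(x))\) used in the proof can be verified on random pairs.

```python
# Spot check of case (a) in Claim 13 for p = 3 on random pairs y > x > 0.
import numpy as np

p = 3
h = lambda t: np.maximum(1.0, t**(p - 1))
def H(t):
    # H(t) = integral_0^t h(s) ds for t >= 0, in the piecewise form given above
    return np.where(t <= 1.0, t, (p - 1) / p + t**p / p)

rng = np.random.default_rng(1)
a = rng.uniform(0.0, 5.0, (10_000, 2))
x, y = np.sort(a, axis=1).T          # pairs with y >= x >= 0
c = (y - x) * h(y)                   # the cost in case (a): c(x, y) = (y - x) h(y)

assert np.all(c <= y * h(y) - x * h(x) + 1e-9)
assert np.all(y * h(y) - x * h(x) <= p * (H(y) - H(x)) + 1e-9)
```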
6 Generalized Kantorovich and Kantorovich–Rubinstein Functionals
In this section, we consider a generalization of the Kantorovich-type functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) [see (5.2.16) and (5.2.17)].
Let U = (U, d) be an s.m.s. and ℳ(U ×U) the space of all nonnegative Borel measures on the Cartesian product U ×U. For any probability measures P 1 and P 2 define the sets \({\mathcal{P}}^{(P_{1},P_{2})}\) and \({\mathcal{Q}}^{(P_{1},P_{2})}\) as in Sect. 5.2 [see (5.2.2) and (5.2.13)].
Let Λ : ℳ(U ×U) → [0, ∞] satisfy the conditions
1. \(\Lambda (\alpha P) = \alpha \Lambda (P)\) ∀α ≥ 0,
2. \(\Lambda (P + Q) \leq \Lambda (P) + \Lambda (Q)\) ∀P and Q in ℳ(U ×U).
We introduce the generalized Kantorovich functional
and the generalized Kantorovich–Rubinstein functional
Example 6.6.1.
The Kantorovich metric
in the space of measures P with finite “first moment,” ∫d(x, a)P(dx) < ∞, has the dual representations \(\mathcal{l}_{1}(P_{1},P_{2}) =\mathop{ \Lambda }\limits^{ \circ } (P_{1},P_{2}) =\widehat{ \Lambda }(P_{1},P_{2})\), where
Example 6.6.2.
Let U = ℝ, \(d(x,y) = \vert x - y\vert \). Then
where F i is the DF of P i and
for RVs X and Y with \(\mathrm{Pr}_{X,Y } = P\). We generalize (6.6.3) as follows: for any 1 ≤ p ≤ ∞, define
where c t (t ∈ ℝ) is the following semimetric in ℝ
and λ( ⋅) is a nonnegative measure on ℝ. In the space \(\mathfrak{X} = \mathfrak{X}(\mathbb{R})\) of all real-valued RVs on a nonatomic probability space \((\Omega,\mathcal{A},\mathrm{Pr})\), the minimal metric w.r.t. Λ is given by
Similarly, the minimal norm with respect to Λ is
The next theorem gives the explicit form of \(\widehat{\Lambda }_{p}\) and \(\Lambda ^{ \circ }_{p}\).
Theorem 6.6.1.
Let F i be the DF of P i (i = 1,2). Then
where
Claim 14.
\(\boldsymbol \lambda _{p}(F_{1},F_{2}) \leq \mathop{ \Lambda }\limits^{ \circ }_{p}(P_{1},P_{2})\).
Proof of Claim 14. Let \(P \in {\mathcal{Q}}^{(P_{1},P_{2})}\). Then in view of Remark 2.7.2 in Chap. 2, there exist α > 0, \(X \in \mathfrak{X}\), \(Y \in \mathfrak{X}\), such that \(\alpha \mathrm{Pr}_{X,Y } = P\) and \(\alpha (F_{X} - F_{Y }) = F_{1} - F_{2}\); thus
By (6.6.7) and (6.6.11), it follows that \(\boldsymbol \lambda _{p}(F_{1},F_{2}) \leq \mathop{ \Lambda }\limits^{ \circ }_{p}(P_{1},P_{2})\), as desired.
Further
by the representations (6.6.6) and (6.6.7).
Claim 15.
Proof of Claim 15. Let \(\widetilde{X} := F_{1}^{-1}(V )\), \(\widetilde{Y } := F_{2}^{-1}(V )\), where F i − 1 is the generalized inverse to the DF F i [see (3.3.16) in Chap. 3] and V is a (0, 1)-uniformly distributed RV. Then \(F_{\widetilde{X},\widetilde{Y }}(t,s) =\min (F_{1}(t),F_{2}(s))\) for all t, s ∈ ℝ. Hence, \(\phi _{t}(\widetilde{X},\widetilde{Y }) = \vert F_{1}(t) - F_{2}(t)\vert \), which proves the claim by using (6.6.6) and (6.6.7).
Combining Claims 14, 15, and (6.6.12) we obtain (6.6.9).
Problem 6.6.1.
In general, dual and explicit solutions of \(\widehat{\Lambda }_{p}\) and \( \mathop\Lambda\limits^{\circ }_{p}\) in (6.6.1) and (6.6.2) are not known.
References
Billingsley P (1999) Convergence of probability measures, 2nd edn. Wiley, New York
Dudley RM (2002) Real analysis and probability, 2nd edn. Cambridge University Press, New York
Hennequin PL, Tortrat A (1965) Théorie des probabilités et quelques applications. Masson, Paris
Levin VL (1975) On the problem of mass transfer. Sov Math Dokl 16:1349–1353
Neveu J, Dudley RM (1980) On Kantorovich–Rubinstein theorems (transcript)
© 2013 Springer Science+Business Media, LLC
Rachev, S.T., Klebanov, L.B., Stoyanov, S.V., Fabozzi, F.J. (2013). Quantitative Relationships Between Minimal Distances and Minimal Norms. In: The Methods of Distances in the Theory of Probability and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4869-3_6
Print ISBN: 978-1-4614-4868-6
Online ISBN: 978-1-4614-4869-3