
The goals of this chapter are to:

  • Explore the conditions under which there is equality between the Kantorovich and the Kantorovich–Rubinstein functionals;

  • Provide inequalities between the Kantorovich and Kantorovich–Rubinstein functionals;

  • Provide criteria for convergence, compactness, and completeness of probability measures in probability spaces involving the Kantorovich and Kantorovich–Rubinstein functionals;

  • Analyze the problem of uniformity between the two functionals.

Notation introduced in this chapter:

1 Introduction

In Chap. 5, we discussed the Kantorovich and Kantorovich–Rubinstein functionals. They generate minimal distances, \(\widehat{\mu }_{c}\), and minimal norms, \(\mu ^{ \circ }_{c}\), respectively, and we considered the problem of evaluating these functionals. The similarities between the two functionals indicate there can be quantitative relationships between them.

Table 1

In this chapter, we begin by exploring the conditions under which \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\). It turns out that equality holds if and only if the cost function c(x, y) is a metric itself. Under more general conditions, certain inequalities hold involving \(\widehat{\mu }_{c}\), \(\mu ^{ \circ }_{c}\), and other probability metrics. These inequalities imply criteria for convergence, compactness, and uniformity in the spaces of probability measures \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U), \mu ^{ \circ }_{c})\). Finally, we conclude with a generalization of the Kantorovich and Kantorovich–Rubinstein functionals.

2 Equivalence Between Kantorovich Metric and Kantorovich–Rubinstein Norm

Levin [1975] proved that if U is compact, c(x, x) = 0, c(x, y) > 0, and c(x, y) + c(y, x) > 0 for x ≠ y, then \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\) if and only if c(x, y) + c(y, x) is a metric in U. In the case of an s.m.s. U, we have the following version of Levin’s result.

Theorem 6.2.1 (Neveu and Dudley 1980). 

Suppose U is an s.m.s. and \(c \in { \mathfrak{C}}^{{_\ast}}\) (Corollary 5.3.1). Then

$$\widehat{\mu }_{c}(P_{1},P_{2}) ={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})$$
(6.2.1)

for all P 1 and P 2 with

$$\int\nolimits_{U}c(x,a)(P_{1} + P_{2})(\mathrm{d}x) < \infty $$
(6.2.2)

if and only if c is a metric.

Proof.

Suppose (6.2.1) holds and set P 1 = δ x and P 2 = δ y for \(x,y \in U\). Then the set \({\mathcal{P}}^{(P_{1},P_{2})}\) of all laws in U ×U with marginals P 1 and P 2 contains only \(P_{1} \times P_{2} = \delta _{(x,y)}\), and by Theorem 5.4.2,

$$\begin{array}{rcl} \widehat{\mu }_{c}(P_{1},P_{2})& =& c(x,y) ={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2}) =\sup \left \{\int \nolimits f\mathrm{d}(P_{1} - P_{2}) :\| f\|_{c} \leq 1\right \} \\ & =& \sup \{\vert f(x) - f(y)\vert :\| f\|_{c} \leq 1\} \\ & \leq & \sup \{\vert f(x) - f(z)\vert + \vert f(z) - f(y)\vert :\| f\|_{c} \leq 1\} \\ & \leq & c(x,z) + c(z,y)\end{array}$$

By assumption, \(c \in {\mathfrak{C}}^{{_\ast}}\), and therefore the triangle inequality implies that c is a metric in U.

Now define \(\mathcal{G}(U)\) as the set of all pairs (f, g) of continuous functions \(f : U \rightarrow \mathbb{R}\) and \(g : U \rightarrow \mathbb{R}\) such that \(f(x) + g(y) \leq c(x,y)\) for all \(x,y \in U\). Let \(\mathcal{G}_{B}(U)\) be the set of all pairs \((f,g) \in \mathcal{G}(U)\) with f and g bounded.

Now suppose that c(x, y) is a metric and that \((f,g) \in \mathcal{G}_{B}(U)\). Define \(h(x) =\inf \{ c(x,y) - g(y) : y \in U\}\). As the infimum of a family of continuous functions, h is upper semicontinuous. For each \(x \in U\) we have f(x) ≤ h(x) ≤ − g(x). Then

$$\begin{array}{rcl} h(x) - h(x\prime)& =& \mathop{\inf}\limits_{u}(c(x,u) - g(u)) -\mathop{\inf}\limits_{v}(c(x\prime,v) - g(v)) \\ & \leq & \mathop{\sup}\limits_{v}(g(v) - c(x\prime,v) + c(x,v) - g(v)) \\ & =& \mathop{\sup}\limits_{v}(c(x,v) - c(x\prime,v)) \leq c(x,x\prime), \\ \end{array}$$

so that \(\|h\|_{c} \leq 1\). Then for P 1, P 2 satisfying (6.2.2) we have

$$\int \nolimits f\mathrm{d}P_{1} + \int \nolimits g\mathrm{d}P_{2} \leq \int \nolimits h\mathrm{d}(P_{1} - P_{2}),$$

so that (according to Corollary 5.3.1 and Theorem 5.4.2 of Chap. 5) we have

$$\begin{array}{rcl} \widehat{\mu }_{c}(P_{1},P_{2})& =& \sup \left \{\int \nolimits f\mathrm{d}P_{1} + \int \nolimits g\mathrm{d}P_{2} : (f,g) \in \mathcal{G}_{B}(U)\right \} \\ & \leq & \sup \left \{\int \nolimits h\mathrm{d}(P_{1} - P_{2}) :\| h\|_{c} \leq 1\right \} ={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})\end{array}$$

Thus \(\widehat{\mu }_{c}(P_{1},P_{2}) ={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})\).

Corollary 6.2.1.

Let (U,d) be an s.m.s. and a ∈ U. Then

$$\widehat{\mu }_{d}(P_{1},P_{2}) = {\mathop{\mu}\limits^{\circ}}_{ d}(P_{1},P_{2}) =\sup \left \{\int \nolimits f\mathrm{d}(P_{1} - P_{2}) :\| f\|_{L} \leq 1\right \}$$
(6.2.3)

whenever

$$\int \nolimits d(x,a)P_{i}(\mathrm{d}x) < \infty,\qquad i = 1,2.$$
(6.2.4)

The supremum is attained for some optimal f 0 with \(\|f_{0}\|_{L} \leq 1\), where \(\|f\|_{L} :=\mathop{\sup}\limits_{x\neq y}\{\vert f(x) - f(y)\vert /d(x,y)\}\).

If P 1 and P 2 are tight, there are some \(b_{0} \in {\mathcal{P}}^{(P_{1},P_{2})}\) and f 0 : U → ℝ with \(\|f_{0}\|_{L} \leq 1\) such that

$$\widehat{\mu }_{d}(P_{1},P_{2}) = \int \nolimits d(x,y)b_{0}(\mathrm{d}x,\mathrm{d}y) = \int \nolimits f_{0}\mathrm{d}(P_{1} - P_{2}),$$

where \(f_{0}(x) - f_{0}(y) = d(x,y)\) for b 0 -a.e. (x,y) in U × U.

Proof.

Set c(x, y) = d(x, y). Application of the theorem proves the first statement. The second (existence of f 0) follows from Theorem 5.4.3.

For each n ≥ 1 choose \(b_{n} \in {\mathcal{P}}^{(P_{1},P_{2})}\) with

$$\int \nolimits d(x,y)b_{n}(\mathrm{d}x,\mathrm{d}y) <\widehat{ \mu }_{d}(P_{1},P_{2}) + \frac{1} {n}.$$

If P 1 and P 2 are tight, then by Corollary 5.3.1 there exists \(b_{0} \in {\mathcal{P}}^{(P_{1},P_{2})}\) such that

$$\widehat{\mu }_{d}(P_{1},P_{2}) = \int \nolimits d(x,y)b_{0}(\mathrm{d}x,\mathrm{d}y),$$

i.e., that b 0 is optimal. Integrating both sides of f 0(x) − f 0(y) ≤ d(x, y) with respect to b 0 yields ∫f 0d(P 1 − P 2) ≤  ∫d(x, y)b 0(dx, dy). However, we know that we have equality of these integrals. This implies that \(f_{0}(x) - f_{0}(y) = d(x,y)\ b_{0}\)-a.e.
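For laws on the real line with c = d = |x − y|, Corollary 6.2.1 can be illustrated numerically: the Kantorovich functional computed from the monotone coupling (which is optimal on the line for this cost) agrees with the L¹ distance between the DFs, i.e., with the Kantorovich–Rubinstein norm [cf. (6.3.3) with p = 1]. A minimal sketch (the helper functions are illustrative, not from the text):

```python
def w1_monotone(p1, p2):
    """Kantorovich functional for c(x, y) = |x - y| via the monotone
    (comonotone) coupling, which is optimal on the real line.
    p1, p2: lists of (atom, mass) with masses summing to 1."""
    p1, p2 = sorted(p1), sorted(p2)
    i = j = 0
    m1, m2 = p1[0][1], p2[0][1]
    total = 0.0
    while i < len(p1) and j < len(p2):
        m = min(m1, m2)                        # transport m units of mass
        total += m * abs(p1[i][0] - p2[j][0])
        m1 -= m
        m2 -= m
        if m1 <= 1e-12:
            i += 1
            if i < len(p1):
                m1 = p1[i][1]
        if m2 <= 1e-12:
            j += 1
            if j < len(p2):
                m2 = p2[j][1]
    return total

def kr_norm_d(p1, p2):
    """Kantorovich-Rubinstein norm for c = d: integral of |F1 - F2| dx."""
    xs = sorted({x for x, _ in p1} | {x for x, _ in p2})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        F1 = sum(m for x, m in p1 if x <= left)
        F2 = sum(m for x, m in p2 if x <= left)
        total += abs(F1 - F2) * (right - left)
    return total

P1 = [(0.0, 0.5), (2.0, 0.5)]
P2 = [(1.0, 1.0)]
assert abs(w1_monotone(P1, P2) - kr_norm_d(P1, P2)) < 1e-9  # both equal 1.0
```

The agreement of the two computations is exactly the equality \(\widehat{\mu }_{d} ={\mathop{\mu}\limits^{\circ}}_{d}\) of the corollary, specialized to discrete laws on the line.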

3 Inequalities Between \(\widehat{\mu }_{c}\) and \({\mathop{\mu}\limits^{\circ}}_{c}\)

In the previous section we looked at conditions under which \(\widehat{\mu }_{c} ={\mathop{\mu}\limits^{\circ}}_{c}\). In general, \(\widehat{\mu }_{c}\neq \mu ^{ \circ }_{c}\). For example, if \(U = \mathbb{R}\), \(d(x,y) = \vert x - y\vert \),

$$c(x,y) = d(x,y)\max (1,{d}^{p-1}(x,a),{d}^{p-1}(y,a)),\quad p \geq 1,$$
(6.3.1)

then for any laws P i (i = 1, 2) on \(\mathbb{R}\) with distribution functions (DFs) F i we have the following explicit expressions:

$$\widehat{\mu }_{c}(P_{1},P_{2}) = \int\nolimits_{0}^{1}c(F_{1}^{-1}(t),F_{2}^{-1}(t))\mathrm{d}t,$$
(6.3.2)

where \(F_{i}^{-1}\) is the function inverse to the DF \(F_{i}\) (see Theorem 7.4.2 in Chap. 7). On the other hand,

$$\mu ^{ \circ }_{c}(P_{1},P_{2}) = \int\nolimits_{-\infty }^{\infty }\vert F_{ 1}(x) - F_{2}(x)\vert \max (1,\vert x - a{\vert }^{p-1})\mathrm{d}x$$
(6.3.3)

(see Theorem 5.5.1 in Chap. 5). However, in the space \(\mathcal{M}_{p} = \mathcal{M}_{p}(U)\) [U = (U, d) is an s.m.s.] of all Borel probability measures P with finite \(\int \nolimits {d}^{p}(x,a)P(\mathrm{d}x)\), the functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) [where c is given by (6.3.1)] metrize exactly the same topology; that is, the following \(\widehat{\mu }_{c}\)- and \(\mu ^{ \circ }_{c}\)-convergence criteria will be proved.
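For simple discrete laws both explicit expressions can be evaluated directly; the sketch below (with a = 0, p = 2, and illustrative helper names, not from the text) computes (6.3.2) exactly over the breakpoints of the piecewise-constant quantile functions and approximates (6.3.3) by a midpoint rule, exhibiting \( \mu ^{ \circ }_{c} < \widehat{\mu }_{c}\):

```python
def cost(x, y, p=2.0, a=0.0):
    # the cost c(x, y) of (6.3.1)
    return abs(x - y) * max(1.0, abs(x - a) ** (p - 1), abs(y - a) ** (p - 1))

def quantile_pieces(atoms):
    # F^{-1} is piecewise constant: value x on the t-interval [cum, cum + mass)
    cum, pieces = 0.0, []
    for x, m in sorted(atoms):
        pieces.append((cum, cum + m, x))
        cum += m
    return pieces

def quantile(pieces, t):
    for lo, hi, x in pieces:
        if lo <= t < hi:
            return x
    return pieces[-1][2]

def mu_hat(p1, p2, p=2.0, a=0.0):
    # (6.3.2): exact, since the integrand is constant between breakpoints
    q1, q2 = quantile_pieces(p1), quantile_pieces(p2)
    ts = sorted({b for q in (q1, q2) for lo, hi, _ in q for b in (lo, hi)})
    return sum((hi - lo) * cost(quantile(q1, (lo + hi) / 2),
                                quantile(q2, (lo + hi) / 2), p, a)
               for lo, hi in zip(ts, ts[1:]))

def mu_circ(p1, p2, p=2.0, a=0.0, lo=-5.0, hi=5.0, n=100000):
    # (6.3.3): midpoint-rule approximation of the weighted L1 distance of DFs
    F = lambda atoms, x: sum(m for y, m in atoms if y <= x)
    h = (hi - lo) / n
    return sum(abs(F(p1, x) - F(p2, x)) * max(1.0, abs(x - a) ** (p - 1)) * h
               for x in (lo + (k + 0.5) * h for k in range(n)))

P1 = [(0.0, 0.5), (2.0, 0.5)]    # atoms at 0 and 2, mass 1/2 each
P2 = [(1.0, 1.0)]                # unit mass at 1
mh, mc = mu_hat(P1, P2), mu_circ(P1, P2)
assert abs(mh - 1.5) < 1e-9      # (6.3.2) gives 3/2
assert abs(mc - 1.25) < 1e-2     # (6.3.3) gives 5/4, so mu_circ < mu_hat
```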

Theorem 6.3.1.

Let (U,d) be an s.m.s., let c be given by (6.3.1), and let \(P, P_{n} \in \mathcal{M}_{p}\) (\(n = 1,2,\ldots \)). Then the following relations are equivalent:

  1. (I)
    $$\widehat{\mu }_{c}(P_{n},P) \rightarrow 0;$$
  2. (II)
    $$\mu ^{ \circ }_{c}(P_{n},P) \rightarrow 0;$$
  3. (III)
    $$P_{n}\ \mbox{ converges weakly to}\ P\ (P_{n}\stackrel{\mathrm{w}}{\rightarrow }P)\ \mbox{ and}$$
    $$\lim _{N\rightarrow \infty }\mathop{\sup}\limits_{n} \int \nolimits {d}^{p}(x,a)I\{d(x,a) > N\}P_{ n}(\mathrm{d}x) = 0;$$
  4. (IV)
    $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ \mbox{ and}\ \int \nolimits {d}^{p}(x,a)P_{ n}(\mathrm{d}x) \rightarrow \int \nolimits {d}^{p}(x,a)P(\mathrm{d}x).$$

(The assertion of the theorem is an immediate consequence of Theorems 6.3.2–6.3.5 below and the more general Theorem 6.4.1).

Theorem 6.3.1 is a qualitative \(\widehat{\mu }_{c}\) (\(\mu ^{ \circ }_{c}\))-convergence criterion. One can rewrite (III) as

$$\boldsymbol \pi (P_{n},P) \rightarrow 0\ \mbox{ and }\ \lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega (\epsilon ;P_{n};\lambda ) = 0,$$

where \(\boldsymbol \pi \) is the Prokhorov metric

$$\begin{array}{rcl} \boldsymbol \pi (P,Q)& :=& \inf \{\epsilon > 0 : P(A) \leq Q({A}^{\epsilon }) + \epsilon \quad \forall A \in \mathcal{B}(U)\} \\ & & \quad ({A}^{\epsilon } :=\{ x : d(x,A) < \epsilon \}) \end{array}$$
(6.3.4)

and ω(ε; P; λ) is the following modulus of λ-integrability:

$$\omega (\epsilon ;P;\lambda ) := \int \nolimits \lambda (x)I\left \{d(x,a) > \frac{1} {\epsilon }\right \}P(\mathrm{d}x),$$
(6.3.5)

where λ(x) : = max(d(x, a), d p(x, a)). Analogously, (IV) is equivalent to

  1. (IV  ∗ )
    $$\boldsymbol \pi (P_{n},P) \rightarrow 0\ \mbox{ and }\ D(P_{n},P;\lambda ) \rightarrow 0,$$

where

$$D(P,Q;\lambda ) := \left \vert \int \nolimits \lambda (x)(P - Q)(\mathrm{d}x)\right \vert.$$
(6.3.6)

In this section we investigate quantitative relationships between \(\widehat{\mu }_{c}\), \(\mu ^{ \circ }_{c}\), \(\boldsymbol \pi \), ω, and D in terms of inequalities between these functionals. These relationships yield convergence and compactness criteria in the space of measures w.r.t. the Kantorovich-type functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) (see Examples 3.3.2 and 3.3.6 in Chap. 3) as well as the \(\mu ^{ \circ }_{c}\)-completeness of the space of measures.

In what follows, we assume that the cost function c has the form considered in Example 5.2.1:

$$c(x,y) = d(x,y)k_{0}(d(x,a),d(y,a))\quad x,y \in U,$$
(6.3.7)

where k 0(t, s) is a symmetric continuous function, nondecreasing in each of its arguments t ≥ 0 and s ≥ 0, and satisfying the following conditions:

  1. (C1)
    $$\alpha :=\mathop{\sup}\limits_{s\neq t}\frac{\vert K(t) - K(s)\vert } {\vert t - s\vert k_{0}(t,s)} < \infty,$$

where \(K(t) := tk_{0}(t,t)\), t ≥ 0;

  2. (C2)
    $$\beta := k(0) > 0,$$

where \(k(t) := k_{0}(t,t)\), t ≥ 0; and

  3. (C3)
    $$\gamma :=\mathop{\sup}\limits_{t\geq 0,s\geq 0}\frac{k_{0}(2t,2s)} {k_{0}(t,s)} < \infty.$$

If c is given by (6.3.1), then c admits the form (6.3.7) with \(k_{0}(t,s) =\max (1,{t}^{p-1},{s}^{p-1})\), and in this case α = p, β = 1, \(\gamma = {2}^{p-1}\). Further, let \(\mathcal{P}_{\lambda } = \mathcal{P}_{\lambda }(U)\) be the space of all probability measures on the s.m.s. (U, d) with finite λ-moment

$$\mathcal{P}_{\lambda }(U) = \left \{P \in \mathcal{P}(U) : \int\nolimits_{U}\lambda (x)P(\mathrm{d}x) < \infty \right \}\!,$$
(6.3.8)

where λ(x) = K(d(x, a)) and a is a fixed point of U.
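The constants stated for the cost (6.3.1), namely α = p, β = 1, and \(\gamma = {2}^{p-1}\), can be checked numerically on a finite grid; a grid search only samples the suprema in (C1) and (C3), so the sketch below merely confirms that the stated bounds are respected (helper names are illustrative, not from the text):

```python
def k0(t, s, p):
    # the kernel of (6.3.1): max(1, t^{p-1}, s^{p-1})
    return max(1.0, t ** (p - 1), s ** (p - 1))

def K(t, p):
    # K(t) = t * k0(t, t)
    return t * k0(t, t, p)

for p in (2.0, 3.0):
    grid = [0.1 * i for i in range(101)]          # t, s in [0, 10]
    alpha = max(abs(K(t, p) - K(s, p)) / (abs(t - s) * k0(t, s, p))
                for t in grid for s in grid if t != s)
    gamma = max(k0(2 * t, 2 * s, p) / k0(t, s, p)
                for t in grid for s in grid)
    assert k0(0.0, 0.0, p) == 1.0                 # (C2): beta = k(0) = 1
    assert alpha <= p + 1e-9                      # (C1): alpha = p not exceeded
    assert gamma <= 2 ** (p - 1) + 1e-9           # (C3): gamma = 2^{p-1}
```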

In Theorems 6.3.2–6.3.5 we assume that \(P_{1} \in \mathcal{P}_{\lambda }\), \(P_{2} \in \mathcal{P}_{\lambda }\), ε > 0, and denote \(\widehat{\mu }_{c} :=\widehat{ \mu }_{c}(P_{1},P_{2})\) [see (5.2.16)], \(\mu ^{ \circ }_{c} :={\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})\) [see (5.2.17)], \(\boldsymbol \pi := \boldsymbol \pi (P_{1},P_{2})\),

$$\begin{array}{rcl} \omega _{i}(\epsilon )& := \omega (\epsilon ;P_{i};\lambda ) := \int \nolimits \lambda (x)I\{d(x,a) > 1/\epsilon \}P_{i}(\mathrm{d}x),\quad P_{i} \in \mathcal{P}_{\lambda }& \\ D& := D(P_{1},P_{2};\lambda ) := \left \vert \int \nolimits \lambda \mathrm{d}(P_{1} - P_{2})\right \vert, & \\ \end{array}$$

and the function c satisfies conditions (C1)–(C3). We begin with an estimate of \(\widehat{\mu }_{c}\) from above in terms of \(\boldsymbol \pi \) and ω i (ε).

Theorem 6.3.2.

$$\widehat{\mu }_{c} \leq \boldsymbol \pi [4K(1/\epsilon ) + \omega _{1}(1) + \omega _{2}(1) + 2k(1)] + 5\omega _{1}(\epsilon ) + 5\omega _{2}(\epsilon ).$$
(6.3.9)

Proof.

Recall that \({\mathcal{P}}^{(P_{1},P_{2})}\) is the space of all laws P on U ×U with prescribed marginals P 1 and P 2. Let K = K 1 be the Ky Fan metric with parameter 1 (see Example 3.4.2 in Chap. 3)

$$\mathbf{K}(P) :=\inf \{ \delta > 0 : P(d(x,y) > \delta ) < \delta \}\quad P \in \mathcal{P}(U \times U).$$
(6.3.10)
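For an empirical coupling, the Ky Fan metric (6.3.10) can be computed directly by scanning for the smallest δ with P(d(x, y) > δ) < δ. The following is a sketch (the grid scan and function name are illustrative assumptions, and the result is accurate only up to the scan step):

```python
def ky_fan(distances, step=0.001, max_delta=2.0):
    """Approximate Ky Fan metric of an empirical coupling: the smallest
    grid value delta with (fraction of distances > delta) < delta."""
    n = len(distances)
    k = 1
    while k * step <= max_delta:
        delta = k * step
        if sum(1 for d in distances if d > delta) / n < delta:
            return delta
        k += 1
    return float("inf")

# X and Y always at distance 0.3: K = 0.3
assert abs(ky_fan([0.3] * 10) - 0.3) < 1e-6
# 10% of the mass at distance 0.9, the rest coincident: K = 0.1
assert abs(ky_fan([0.0] * 9 + [0.9]) - 0.1) < 0.01
```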

Claim 1.

For any N > 0 and for any measure P on U 2 with marginals P 1 and P 2, i.e., \(P \in {\mathcal{P}}^{(P_{1},P_{2})}\), we have

$$\begin{array}{rcl} \int\nolimits_{U\times U}c(x,y)P(\mathrm{d}x,\mathrm{d}y)& \leq & \mathbf{K}(P)\left [4K(N) + \int\nolimits_{U}k(d(x,a))(P_{1} + P_{2})(\mathrm{d}x)\right ] \\ & & +\,5\omega _{1}(1/N) + 5\omega _{2}(1/N). \end{array}$$
(6.3.11)

Proof of Claim 1. Suppose \(\mathbf{K}(P) < \delta \leq 1\), \(P \in {\mathcal{P}}^{(P_{1},P_{2})}\). Then by (6.3.7) and (C3),

$$\begin{array}{rcl} \int \nolimits c(x,y)P(\mathrm{d}x,\mathrm{d}y)& \leq & \int \nolimits d(x,y)k(\max \{d(x,a),d(y,a)\})P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & I_{1} + I_{2}, \\ \end{array}$$

where

$$I_{1} := \int\nolimits_{U\times U}d(x,y)k(d(x,a))P(\mathrm{d}x,\mathrm{d}y)$$

and

$$I_{2} := \int\nolimits_{U\times U}d(x,y)k(d(y,a))P(\mathrm{d}x,\mathrm{d}y).$$

Let us estimate I 1:

$$\begin{array}{rcl} I_{1}& :=& \int \nolimits d(x,y)k(d(x,a))[I\{d(x,y) < \delta \} + I\{d(x,y) \geq \delta \}]P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & \delta \int \nolimits k(d(x,a))P(\mathrm{d}x,\mathrm{d}y) \\ & & +\int \nolimits d(x,y)k(d(x,a))I\{d(x,y) \geq \delta \}P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & I_{11} + I_{12} + I_{13}, \end{array}$$
(6.3.12)

where

$$\begin{array}{rcl} I_{11}& :=& \delta \int\nolimits_{U}k(d(x,a))[I\{d(x,a) \geq 1\} + I\{d(x,a) \leq 1\}]P_{1}(\mathrm{d}x), \\ I_{12}& :=& \int\nolimits_{U\times U}d(x,a)k(d(x,a))I\{d(x,y) \geq \delta \}P(\mathrm{d}x,\mathrm{d}y),\ \ \mbox{ and} \\ I_{13}& :=& \int\nolimits_{U\times U}d(y,a)k(d(x,a))I\{d(x,y) \geq \delta \}P(\mathrm{d}x,\mathrm{d}y)\end{array}$$

Obviously, by λ(x) : = K(d(x, a)), \(I_{11} \leq \delta \int \nolimits k(d(x,a))I\{d(x,a) \geq 1\}P_{1}(\mathrm{d}x) + \delta k(1) \leq \delta \omega _{1}(1) + \delta k(1)\). Further,

$$\begin{array}{rcl} I_{12}& =& \int \nolimits K(d(x,a))I\{d(x,y)\geq \delta \}[I\{d(x,a) > N\}+I\{d(x,a)\leq N\}]P(\mathrm{d}x,\mathrm{d}y) \\ & \leq &\int\nolimits_{U}\lambda (x)I\{d(x,a) > N\}P_{1}(\mathrm{d}x) + K(N)\int\nolimits_{U\times U}I\{d(x,y) \geq \delta \}P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & \omega _{1}(1/N) + K(N)\delta \end{array}$$

Now let us estimate the last term in estimate (6.3.12):

$$\begin{array}{rcl} I_{13}& =& \int\nolimits_{U\times U}d(y,a)k(d(x,a))I\{d(x,y) \geq \delta \}[I\{d(x,a) \geq d(y,a) > N\} \\ & & +\,I\{d(y,a) > d(x,a) > N\} + I\{d(x,a) > N,d(y,a) \leq N\} \\ & & +\,I\{d(x,a)\leq N,d(y,a) > N\}+I\{d(x,a) \leq N,d(y,a) \leq N\}]P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & \int\nolimits_{U\times U}\lambda (x)I\{d(x,a) > d(y,a) > N\}P(\mathrm{d}x,\mathrm{d}y) \\ & & +\int\nolimits_{U\times U}\lambda (y)I\{d(y,a) \geq d(x,a) \geq N\}P(\mathrm{d}x,\mathrm{d}y) \\ & & +\int\nolimits_{U}\lambda (x)I\{d(x,a) > N\}P_{1}(\mathrm{d}x) + \int\nolimits_{U}\lambda (y)I\{d(y,a) > N\}P_{2}(\mathrm{d}y) \\ & & +\,K(N)\int\nolimits_{U\times U}I\{d(x,y) \geq \delta \}P(\mathrm{d}x,\mathrm{d}y) \\ & \leq & 2\omega _{1}(1/N) + 2\omega _{2}(1/N) + K(N)\delta \end{array}$$

Summarizing the preceding estimates we obtain \(I_{1} \leq \delta \omega _{1}(1) + \delta k(1) + 3\omega _{1}(1/N) + 2\omega _{2}(1/N) + 2K(N)\delta \). By symmetry we have \(I_{2} \leq \delta \omega _{2}(1) + \delta k(1) + 3\omega _{2}(1/N) + 2\omega _{1}(1/N) + 2K(N)\delta \). Therefore, the last two estimates imply

$$\begin{array}{rcl} \int \nolimits c(x,y)P(\mathrm{d}x,\mathrm{d}y)& \leq & I_{1} + I_{2} \\ & \leq & \delta (\omega _{1}(1) + \omega _{2}(1) + 2k(1) + 4K(N)) \\ & & +\,5\omega _{1}(1/N) + 5\omega _{2}(1/N)\end{array}$$

Letting δ → K(P) we obtain (6.3.11), which proves the claim.

Claim 2 (Strassen–Dudley Theorem). 

$$\inf \{\mathbf{K}(P) : P \in {\mathcal{P}}^{(P_{1},P_{2})}\} = \boldsymbol \pi (P_{ 1},P_{2}).$$
(6.3.13)

Proof of Claim 2. See Dudley [2002]; see also Corollary 7.5.2 in Chap. 7.

Claims 1 and 2 complete the proof of the theorem.

The next theorem shows that \(\widehat{\mu }_{c}\)-convergence and \(\mu ^{ \circ }_{c}\)-convergence imply the weak convergence of measures.

Theorem 6.3.3.

$$\beta {\boldsymbol \pi }^{2} \leq {\mathop{\mu}\limits^{\circ}}_{ c} \leq \widehat{ \mu }_{c}.$$
(6.3.14)

Proof.

Obviously, for any continuous nonnegative function c,

$$\mu ^{ \circ }_{c} \leq \widehat{ \mu }_{c}$$
(6.3.15)

and

$$\mu ^{ \circ }_{c} \geq \boldsymbol \zeta _{c},$$
(6.3.16)

where \(\boldsymbol \zeta _{c}\) is the Zolotarev simple metric with a ζ-structure (Definition 4.4.1)

$$\begin{array}{rcl} \boldsymbol \zeta _{c}& :=& \boldsymbol \zeta _{c}(P_{1},P_{2}) \\ & :=& \sup \left \{\left \vert \int\nolimits_{U}f\mathrm{d}(P_{1} - P_{2})\right \vert : f : U \rightarrow \mathbb{R},\vert f(x) - f(y)\vert \leq c(x,y)\forall x,y\in U\right \}.\end{array}$$
(6.3.17)

Now, using assumption (C2) we have that c(x, y) ≥ βd(x, y) and, hence, \(\boldsymbol \zeta _{c} \geq \beta \boldsymbol \zeta _{d}\). Thus, by (6.3.16),

$$\mu ^{ \circ }_{c} \geq \beta \boldsymbol \zeta _{d}.$$
(6.3.18)

Claim 3.

$$\boldsymbol \zeta _{d} \geq {\boldsymbol \pi }^{2}.$$
(6.3.19)

Proof of Claim 3. Using the dual representation of \(\widehat{\mu }_{d}\) [see (6.2.3)] we are led to

$$\widehat{\mu }_{d} = \boldsymbol \zeta _{d},$$
(6.3.20)

which in view of the inequality

$$\int \nolimits d(x,y)P(\mathrm{d}x,\mathrm{d}y) \geq {\mathbf{K}}^{2}(P)\ \mbox{ for any }\ P \in {\mathcal{P}}^{(P_{1},P_{2})}$$
(6.3.21)

establishes (6.3.19). To verify (6.3.21), note that if \(\delta < \mathbf{K}(P)\), then \(P(d(x,y) > \delta ) \geq \delta \), whence \(\int \nolimits d(x,y)P(\mathrm{d}x,\mathrm{d}y) \geq \delta P(d(x,y) > \delta ) \geq {\delta }^{2}\); letting \(\delta \uparrow \mathbf{K}(P)\) gives (6.3.21). The proof of the claim is now completed.

The desired inequalities (6.3.14) are a consequence of (6.3.15), (6.3.16), (6.3.18), and Claim 3.

The next theorem establishes the uniform λ-integrability

$$\lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega (\epsilon ;P_{n};\lambda ) = 0$$

of any sequence of measures \(P_{n} \in \mathcal{P}_{\lambda }\) that \(\mu ^{ \circ }_{c}\)-converges to a measure \(P \in \mathcal{P}_{\lambda }\).

Theorem 6.3.4.

$$\omega _{1}(\epsilon /2) \leq \alpha (2\gamma + 1) \mu ^{ \circ }_{c} + 2(\gamma + 1)\omega _{2}(\epsilon ).$$
(6.3.22)

Proof.

For any N > 0, by the triangle inequality, we have

$$\omega _{1}(1/2N) := \int \nolimits \lambda (x)I\{d(x,a) > 2N\}P_{1}(\mathrm{d}x) \leq \mathcal{T}_{1} + \mathcal{T}_{2},$$
(6.3.23)

where

$$\mathcal{T}_{1} := \left \vert \int \nolimits \lambda (x)I\{d(x,a) > 2N\}(P_{1} - P_{2})(\mathrm{d}x)\right \vert $$

and

$$\mathcal{T}_{2} := \int \nolimits \lambda (x)I\{d(x,a) > N\}P_{2}(\mathrm{d}x) = \omega _{2}(1/N).$$

Claim 4.

$$\mathcal{T}_{1} \leq \alpha \mu ^{ \circ }_{c} + K(2N)\int \nolimits I\{d(x,a) > 2N\}(P_{1} + P_{2})(\mathrm{d}x).$$
(6.3.24)

Proof of Claim 4. Denote \(f_{N}(x) := (1/\alpha )\max (\lambda (x),K(2N))\). Since \(\lambda (x) = K(d(x,a)) = d(x,a)k_{0}(d(x,a),d(x,a))\), then by (C1),

$$\begin{array}{rcl} \vert f_{N}(x) - f_{N}(y)\vert & \leq & (1/\alpha )\vert \lambda (x) - \lambda (y)\vert \\ &\leq & \vert d(x,a) - d(y,a)\vert k_{0}(d(x,a),d(y,a)) \leq c(x,y) \\ \end{array}$$

for any \(x,y \in U\). Thus the inequalities

$$\left \vert \int\nolimits_{U}f_{N}(x)(P_{1} - P_{2})(\mathrm{d}x)\right \vert \leq \boldsymbol \zeta _{c}(P_{1},P_{2}) \leq {\mathop{\mu}\limits^{\circ}}_{c}(P_{1},P_{2})$$
(6.3.25)

follow from (6.3.16) and (6.3.17). Since αf N (x) = max(K(d(x, a)), K(2N)) and (6.3.25) holds, then

$$\begin{array}{rcl} \mathcal{T}_{1}& \leq & \left \vert \int\nolimits_{U}K(d(x,a))I\{d(x,a) > 2N\}(P_{1} - P_{2})(\mathrm{d}x)\right. \\ & & \left.+\,\int\nolimits_{U}K(2N)I\{d(x,a) \leq 2N\}(P_{1} - P_{2})(\mathrm{d}x)\right \vert \\ & & +\,K(2N)\left \vert \int\nolimits_{U}I\{d(x,a) \leq 2N\}(P_{1} - P_{2})(\mathrm{d}x)\right \vert \\ & =& \left \vert \int\nolimits_{U}\alpha f_{N}(x)(P_{1} - P_{2})(\mathrm{d}x)\right \vert + K(2N)\left \vert \int\nolimits_{U}I\{d(x,a) > 2N\}(P_{1} - P_{2})(\mathrm{d}x)\right \vert \\ &\leq & \alpha \mu ^{ \circ }_{c} + K(2N)\int \nolimits I\{d(x,a) > 2N\}(P_{1} + P_{2})(\mathrm{d}x), \\ \end{array}$$

which proves the claim.

Claim 5.

$$A(P_{1}) := K(2N)\int\nolimits_{U}I\{d(x,a) > 2N\}P_{1}(\mathrm{d}x) \leq 2\alpha \gamma \mu ^{ \circ }_{c} + 2\gamma \omega _{2}(1/N).$$
(6.3.26)

Proof of Claim 5. As in the proof of Claim 4, we choose an appropriate Lipschitz function. That is, write

$$g_{N}(x) = (1/(2\alpha \gamma ))\min \{K(2N),K(2d(x,O(a,N)))\},$$

where O(a, N) : = { x : d(x, a) < N}. Using (C1) and (C3),

$$\begin{array}{rcl} \vert g_{N}(x) - g_{N}(y)\vert & \leq & (1/(2\alpha \gamma ))\vert K(2d(x,O(a,N))) - K(2d(y,O(a,N)))\vert \\ \mbox{ by (C1)}& & \\ & \leq & (1/\gamma )\vert d(x,O(a,N)) \\ & & -d(y,O(a,N))\vert k_{0}(2d(x,O(a,N)),2d(y,O(a,N))) \\ \mbox{ by (C3)}& & \\ & \leq & d(x,y)k_{0}(d(x,O(a,N)),d(y,O(a,N))) \leq c(x,y)\end{array}$$

Hence

$$\left \vert \int \nolimits g_{N}(P_{1} - P_{2})(\mathrm{d}x)\right \vert \leq \boldsymbol \zeta _{c} \leq {\mathop{\mu}\limits^{\circ}}_{c}.$$
(6.3.27)

Using (6.3.27) and the implications

$$d(x,a) > 2N \Rightarrow d(x,O(a,N)) > N \Rightarrow K(2d(x,O(a,N))) \geq K(2N)$$

we obtain the following chain of inequalities:

$$\begin{array}{rcl} A(P_{1})& \leq & 2\alpha \gamma \int \nolimits g_{N}(x)P_{1}(\mathrm{d}x) \\ & \leq & 2\alpha \gamma \left \vert \int \nolimits g_{N}(x)(P_{1} - P_{2})(\mathrm{d}x)\right \vert + 2\alpha \gamma \int\nolimits_{U}g_{N}(x)P_{2}(\mathrm{d}x) \\ & \leq & 2\alpha \gamma \mu ^{ \circ }_{c} + \int \nolimits K(2d(x,O(a,N)))I\{d(x,a) \geq N\}P_{2}(\mathrm{d}x), \\ & & \\ \Biggl (\mbox{ by C3},\ \frac{K(2t)} {K(t)} & =& \frac{2tk_{0}(2t,2t)} {tk_{0}(t,t)} \leq 2\gamma \Biggr ) \\ & \leq & 2\alpha \gamma \mu ^{ \circ }_{c}+2\gamma \int \nolimits K(d(x,O(a,N)))I\{d(x,a)\geq N\}P_{2}(\mathrm{d}x) \\ & \leq & 2\alpha \gamma \mu ^{ \circ }_{c} + 2\gamma \omega _{2}(1/N), \end{array}$$
(6.3.28)

which proves the claim.

For A(P 2) [see (6.3.26)] we have the following estimate:

$$A(P_{2}) \leq \int\nolimits_{U}K(d(x,a))I\{d(x,a) > 2N\}P_{2}(\mathrm{d}x) \leq \omega _{2}(1/N).$$
(6.3.29)

Summarizing (6.3.23), (6.3.24), (6.3.26), and (6.3.29) we obtain

$$\begin{array}{rcl} \omega _{1}(1/2N)& \leq & \alpha \mu ^{ \circ }_{c} + A(P_{1}) + A(P_{2}) + \omega _{2}(1/N) \\ & \leq & (\alpha + 2\alpha \gamma ) \mu ^{ \circ }_{c} + (2\gamma + 2)\omega _{2}(1/N) \\ \end{array}$$

for any N > 0, as desired.

The next theorem shows that \(\mu ^{ \circ }_{c}\)-convergence implies convergence of the λ-moments.

Theorem 6.3.5.

$$D \leq \alpha \mu ^{ \circ }_{c}.$$
(6.3.30)

Proof.

By (C1), for any finite nonnegative measure Q on U × U with marginal difference \(P_{1} - P_{2}\) we have

$$\begin{array}{rcl} D& :=& \left \vert \int\nolimits_{U}\lambda (x)(P_{1} - P_{2})(\mathrm{d}x)\right \vert = \left \vert \int\nolimits_{U\times U}(\lambda (x) - \lambda (y))Q(\mathrm{d}x,\mathrm{d}y)\right \vert \\ & \leq & \int\nolimits_{U\times U}\alpha \vert d(x,a) - d(y,a)\vert k_{0}(d(x,a),d(y,a))Q(\mathrm{d}x,\mathrm{d}y) \\ & \leq & \alpha \int\nolimits_{U\times U}c(x,y)Q(\mathrm{d}x,\mathrm{d}y), \\ \end{array}$$

and taking the infimum over all such Q completes the proof of (6.3.30).
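Inequality (6.3.30) can be illustrated in the simplest case p = 1, where α = 1, c = d, and λ(x) = |x − a| on \(U = \mathbb{R}\), so that \( \mu ^{ \circ }_{c}\) is the L¹ distance (6.3.3) between the DFs. A minimal sketch (helper names are illustrative, not from the text):

```python
def kr_norm_d(p1, p2):
    # (6.3.3) with p = 1: integral of |F1 - F2| over the line
    xs = sorted({x for x, _ in p1} | {x for x, _ in p2})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        F1 = sum(m for x, m in p1 if x <= left)
        F2 = sum(m for x, m in p2 if x <= left)
        total += abs(F1 - F2) * (right - left)
    return total

def lambda_moment(atoms, a=0.0):
    # integral of lambda(x) = |x - a| dP for a discrete law
    return sum(m * abs(x - a) for x, m in atoms)

P1 = [(2.0, 1.0)]
P2 = [(0.0, 0.5), (1.0, 0.5)]
D = abs(lambda_moment(P1) - lambda_moment(P2))    # difference of lambda-moments
assert D <= kr_norm_d(P1, P2) + 1e-12             # (6.3.30) with alpha = 1
```

Here the bound is attained with equality, since λ itself has \(\|\lambda \|_{L} = 1\) and the two laws are stochastically ordered.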

Inequalities (6.3.9), (6.3.14), (6.3.22), and (6.3.30), established in Theorems 6.3.2–6.3.5, imply criteria for convergence, compactness, and uniformity in the spaces of probability measures \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U), \mu ^{ \circ }_{c})\) (see also the next section). Moreover, the estimates obtained for \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) may be viewed as quantitative refinements of the conditions that are necessary and sufficient for \(\widehat{\mu }_{c}\)-convergence and \(\mu ^{ \circ }_{c}\)-convergence. Note that, in general, quantitative results require assumptions beyond the necessary and sufficient conditions that yield the qualitative results. The classic example is the central limit theorem, where, assuming only the existence of the second moment, the uniform convergence of the distribution of the normalized sum of i.i.d. RVs can be arbitrarily slow.

4 Convergence, Compactness, and Completeness in \((\mathcal{P}(U),\widehat{\mu }_{c})\) and \((\mathcal{P}(U),{\mathop{\mu}\limits^{\circ}}_{c})\)

In this section, we assume that the cost function c satisfies conditions (C1)–(C3) in the previous section and λ(x) = K(d(x, a)). We begin with the criterion for \(\widehat{\mu }_{c}\)- and \(\mu ^{ \circ }_{c}\)-convergence.

Theorem 6.4.1.

If P n and \(P \in \mathcal{P}_{\lambda }(U)\) (\(n = 1,2,\ldots \)), then the following statements are equivalent:

  1. (A)
    $$\widehat{\mu }_{c}(P_{n},P) \rightarrow 0;$$
  2. (B)
    $$\mu ^{ \circ }_{c}(P_{n},P) \rightarrow 0;$$
  3. (C)
    $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ (P_{n}\ \mbox{ converges weakly to}\ P)\ \mbox{ and}\ \int \nolimits \lambda \mathrm{d}(P_{n} - P) \rightarrow 0\ \mbox{ as}\ n \rightarrow \infty ;$$
  4. (D)
    $$P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\ \mbox{ and }\ \lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega _{n}(\epsilon ) = 0,$$

where \(\omega _{n}(\epsilon ) := \omega (\epsilon ;P_{n};\lambda ) = \int \nolimits \lambda (x)I\{d(x,a) > 1/\epsilon \}P_{n}(\mathrm{d}x)\).
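Before turning to the proof, a standard example shows that weak convergence alone does not suffice in (C)/(D): take \(P_{n} = (1 - 1/n)\delta _{0} + (1/n)\delta _{n}\) and \(P = \delta _{0}\) with p = 1, so that c = d and \( \mu ^{ \circ }_{c}\) is given by (6.3.3). Then \(P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\), yet the λ-moments stay at 1 and \( \mu ^{ \circ }_{c}(P_{n},P) = 1\) for every n. A quick numerical confirmation (the helper function is illustrative, not from the text):

```python
def kr_norm_d(p1, p2):
    # Kantorovich-Rubinstein norm for c = d on the line: integral of |F1 - F2|
    xs = sorted({x for x, _ in p1} | {x for x, _ in p2})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        F1 = sum(m for x, m in p1 if x <= left)
        F2 = sum(m for x, m in p2 if x <= left)
        total += abs(F1 - F2) * (right - left)
    return total

P = [(0.0, 1.0)]                                  # delta_0
for n in (2, 10, 1000):
    Pn = [(0.0, 1.0 - 1.0 / n), (float(n), 1.0 / n)]
    assert abs(kr_norm_d(Pn, P) - 1.0) < 1e-9     # distance stuck at 1
    assert sum(m * x for x, m in Pn) == 1.0       # lambda-moment stuck at 1
```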

Proof.

From inequality (6.3.14) it is apparent that (A) implies (B) and that (B) implies \(P_{n}\stackrel{\mathrm{w}}{\rightarrow }P\). Using (6.3.30) we obtain that (B) implies \(\int \nolimits \lambda \mathrm{d}(P_{n} - P) \rightarrow 0\), and thus (B) implies (C). Now, let (C) hold.

Claim 6.

(C) implies (D).

Proof of Claim 6. Choose a sequence ε1 > ε2 > ⋯ → 0 such that \(P(d(x,a) = 1/\epsilon _{n}) = 0\) for any \(n = 1,2,\ldots \). Then for fixed n

$$\int \nolimits \lambda (x)I\{d(x,a) \leq 1/\epsilon _{n}\}(P_{k} - P)(\mathrm{d}x) \rightarrow 0\ \mbox{ as }\ k \rightarrow \infty $$

by Billingsley [1999, Theorem 5.1]. Since \(P \in \mathcal{P}_{\lambda }\), \(\omega (\epsilon _{n}) := \omega (\epsilon _{n};P;\lambda ) \rightarrow 0\) as \(n \rightarrow \infty \), and hence

$$\begin{array}{rcl} \limsup _{k\rightarrow \infty }\omega _{k}(\epsilon _{n})& \leq & \limsup _{k\rightarrow \infty }\left \vert \int \nolimits \lambda (x)I\{d(x,a) > 1/\epsilon _{n}\}(P_{k} - P)(\mathrm{d}x)\right \vert + \omega (\epsilon _{n}) \\ & \leq & \limsup _{k\rightarrow \infty }\left \vert \int \nolimits \lambda (x)(P_{k} - P)(\mathrm{d}x)\right \vert \\ & & +\limsup _{k\rightarrow \infty }\left \vert \int \nolimits \lambda (x)I\{d(x,a) \leq 1/\epsilon _{n}\}(P_{k} - P)(\mathrm{d}x)\right \vert \\ & & +\omega (\epsilon _{n}) \rightarrow 0\ \mbox{ as}\ n \rightarrow \infty \end{array}$$

The last inequality and \(P_{k} \in \mathcal{P}_{\lambda }\) imply \(\lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{n}\omega _{n}(\epsilon ) = 0\), and hence (D) holds.

The claim is proved.

Claim 7.

(D) implies (A).

Proof of Claim 7. By Theorem 6.3.2,

$$\widehat{\mu }_{c}(P_{n},P) \leq \boldsymbol \pi (P_{n},P)[4K(1/\epsilon _{n})+\omega _{n}(1)+\omega (1)+2k(1)]+5\omega _{n}(\epsilon _{n})+5\omega (\epsilon _{n}),$$

where ω n and ω are defined as in Claim 6 and, moreover, ε n  > 0 is such that

$$4K(1/\epsilon _{n}) +\mathop{\sup}\limits_{n\geq 1}\omega _{n}(1) + \omega (1) + 2k(1) \leq {(\boldsymbol \pi (P_{n},P))}^{-1/2}.$$

Hence, using the last two inequalities we obtain

$$\widehat{\mu }_{c}(P_{n},P) \leq \sqrt{\boldsymbol \pi (P_{n }, P)} + 5\mathop{\sup}\limits_{n\geq 1}\omega _{n}(\epsilon _{n}) + 5\omega (\epsilon _{n}),$$

and hence (D) implies (A), as we claimed.

The Kantorovich–Rubinstein functional \(\mu ^{ \circ }_{c}\) is a metric in \(\mathcal{P}_{\lambda }(U)\), while \(\widehat{\mu }_{c}\) is not a metric unless c is itself a metric (see Sect. 2). The next theorem establishes a criterion for \(\mu ^{ \circ }_{c}\)-relative compactness of sets of measures. Recall that a set \(\mathcal{A}\subset \mathcal{P}_{\lambda }\) is said to be \(\mu ^{ \circ }_{c}\)-relatively compact if any sequence of measures in \(\mathcal{A}\) has a \(\mu ^{ \circ }_{c}\)-convergent subsequence whose limit belongs to \(\mathcal{P}_{\lambda }\). Recall also that a set \(\mathcal{A}\subset \mathcal{P}(U)\) is weakly compact if \(\mathcal{A}\) is \(\boldsymbol \pi \)-relatively compact, i.e., any sequence of measures in \(\mathcal{A}\) has a weakly (\(\boldsymbol \pi \)-) convergent subsequence.

Theorem 6.4.2.

The set \(\mathcal{A}\subset \mathcal{P}_{\lambda }\) is \(\mu ^{ \circ }_{c}\) -relatively compact if and only if \(\mathcal{A}\) is weakly compact and

$$\lim _{\epsilon \rightarrow 0}\mathop{\sup}\limits_{P\in \mathcal{A}}\omega (\epsilon ;P;\lambda ) = 0.$$
(6.4.1)

Proof.

“If” part: If \(\mathcal{A}\) is weakly compact, (6.4.1) holds and \(\{P_{n}\}_{n\geq 1} \subset \mathcal{A}\), then we can choose a subsequence {P n′ } ⊂ { P n } that converges weakly to a probability measure P.

Claim 8.

\(P \in \mathcal{P}_{\lambda }\).

Proof of Claim 8. Let \(0 < \alpha _{1} < \alpha _{2} < \cdots \), \(\lim \alpha _{n} = \infty \), be a sequence such that \(P(d(x,a) = \alpha _{n}) = 0\) for any n ≥ 1. Then, by Billingsley [1999, Theorem 5.1] and (6.4.1),

$$\begin{array}{rcl} \int \nolimits \lambda (x)I\{d(x,a) \leq \alpha _{n}\}P(\mathrm{d}x)& =& \lim _{n\prime\rightarrow \infty }\int \nolimits \lambda (x)I\{d(x,a) \leq \alpha _{n}\}P_{n\prime}(\mathrm{d}x) \\ & \leq & \liminf _{n\prime\rightarrow \infty }\int \nolimits \lambda (x)P_{n\prime}(\mathrm{d}x) < \infty, \\ \end{array}$$

which proves the claim.

Claim 9.

$${\mathop{\mu}\limits^{\circ}}_{ c}(P_{n\prime},P) \rightarrow 0.$$

Proof of Claim 9. Using Theorem 6.3.2, Claim 8, and (6.4.1) we have, for any δ > 0,

$$\begin{array}{rcl} \mu ^{ \circ }_{c}(P_{n\prime},P)& \leq & \widehat{\mu }_{c}(P_{n\prime},P) \leq \boldsymbol \pi (P_{n\prime},P)[4K(1/\epsilon ) + \omega _{1}(1) + \omega _{2}(1) + 2k(1)] \\ & & +5\mathop{\sup}\limits_{n\prime}\omega (\epsilon ;P_{n\prime};\lambda ) + 5\omega (\epsilon ;P;\lambda ) \\ & \leq & \boldsymbol \pi (P_{n\prime},P)[4K(1/\epsilon ) + \omega _{1}(1) + \omega _{2}(1) + 2k(1)] + \delta \\ \end{array}$$

if ε = ε(δ) is small enough. Hence, by \(\boldsymbol \pi (P_{n\prime},P) \rightarrow 0\), we can choose N = N(δ) such that \(\mu ^{ \circ }_{c}(P_{n\prime},P) < 2\delta \) for any \(n\prime \geq N\), as desired.

Claims 8 and 9 establish the “if” part of the theorem.

“Only if” part: If \(\mathcal{A}\) is \(\mu ^{ \circ }_{c}\)-relatively compact and \(\{P_{n}\} \subset \mathcal{A}\), then there exists a subsequence {P n′ } ⊂ { P n } that is convergent w.r.t. \(\mu ^{ \circ }_{c}\); let P be its limit. Hence, by Theorem 6.3.3, \(\beta {\boldsymbol \pi }^{2}(P_{n\prime},P) \leq \mu ^{ \circ }_{c}(P_{n\prime},P) \rightarrow 0\), which demonstrates that the set \(\mathcal{A}\) is weakly compact.

Further, if (6.4.1) is not valid, then there exists δ > 0 and a sequence {P n } such that

$$\omega (1/n;P_{n};\lambda ) > \delta \quad \forall n \geq 1.$$
(6.4.2)

Let {P n′ } be a \(\mu ^{ \circ }_{c}\)-convergent subsequence of {P n }, and let \(P \in \mathcal{P}_{\lambda }\) be the corresponding limit. By Theorem 6.3.4, \(\omega (1/n\prime;P_{n\prime};\lambda ) \leq \alpha (2\gamma + 1) \mu ^{ \circ }_{c}(P_{n\prime},P) + 2(\gamma + 1)\omega (2/n\prime;P;\lambda ) \rightarrow 0\) as \(n\prime \rightarrow \infty \), which contradicts (6.4.2).

In the light of Theorem 6.4.1, we can now read Theorem 6.4.2 also as a criterion for \(\widehat{\mu }_{c}\)-relative compactness of sets of measures in \(\mathcal{P}_{\lambda }\) by simply replacing \(\mu ^{ \circ }_{c}\) with \(\widehat{\mu }_{c}\) in the formulation of the last theorem.

The well-known Prokhorov theorem says that if (U, d) is a complete s.m.s., then the set of all laws on U is complete w.r.t. the Prokhorov metric \(\boldsymbol \pi \). The next theorem is an analog of the Prokhorov theorem for the metric space \((\mathcal{P}_{\lambda }, \mu ^{ \circ }_{c})\).

Theorem 6.4.3.

If (U,d) is a complete s.m.s., then \((\mathcal{P}_{\lambda }(U), \mu ^{ \circ }_{c})\) is also complete.

Proof.

If {P n } is a \(\mu ^{ \circ }_{c}\)-fundamental sequence, then by Theorem 6.3.3, {P n } is also \(\boldsymbol \pi \)-fundamental, and hence there exists the weak limit \(P \in \mathcal{P}(U)\).

Claim 10.

\(P \in \mathcal{P}_{\lambda }\).

Proof of Claim 10. Let ε > 0 and \(\mu ^{ \circ }_{c}(P_{n},P_{m}) \leq \epsilon \) for any n, m ≥ n ε. Then, by Theorem 6.3.5, \(\left \vert \int \nolimits \lambda (x)(P_{n} - P_{n_{\epsilon }})(\mathrm{d}x)\right \vert < \alpha \epsilon \) for any n > n ε; hence,

$$\mathop{\sup}\limits_{n\geq n_{\epsilon }} \int \nolimits \lambda (x)P_{n}(\mathrm{d}x) < \alpha \epsilon + \int \nolimits \lambda (x)P_{n_{\epsilon }}(\mathrm{d}x) < \infty.$$

Choose the sequence 0 < α1 < α2 < ⋯ , lim k→∞ α k  = ∞, such that \(P(d(x,a) = \alpha _{k}) = 0\) for any k ≥ 1. Then

$$\begin{array}{rcl} \int \nolimits \lambda (x)I\{d(x,a) \leq \alpha _{k}\}P(\mathrm{d}x)& =& \lim _{n\rightarrow \infty }\int \nolimits \lambda (x)I\{d(x,a) \leq \alpha _{k}\}P_{n}(\mathrm{d}x) \\ & \leq & \liminf _{n\rightarrow \infty }\int \nolimits \lambda (x)P_{n}(\mathrm{d}x) \\ & \leq & \mathop{\sup}\limits_{n\geq n_{\epsilon }} \int\nolimits_{U}\lambda (x)P_{n}(\mathrm{d}x) < \infty.\end{array}$$

Letting k → ∞, the assertion follows.

Claim 11.

$$\mu ^{ \circ }_{c}(P_{n},P) \rightarrow 0.$$

Proof of Claim 11. Since \(\mu ^{ \circ }_{c}(P_{n},P_{n_{\epsilon }}) \leq \epsilon \) for any n ≥ n ε, by Theorem 6.3.4,

$$\mathop{\sup}\limits_{n\geq n_{\epsilon }}\omega (\delta ;P_{n};\lambda ) \leq 2(\gamma + 1)(\alpha \epsilon + \omega (2\delta ;P_{n_{\epsilon }};\lambda ))$$

for any δ > 0. The last inequality and Theorem 6.3.2 yield

$$\begin{array}{rcl} \mu ^{ \circ }_{c}(P_{n},P)& \leq & \widehat{\mu }_{c}(P_{n},P) \leq \boldsymbol \pi (P_{n},P)[4K(1/\delta ) \\ & & +\mathop{\sup}\limits_{n\geq n_{\epsilon }}\omega (1;P_{n};\lambda ) + \omega (1;P;\lambda ) + 2K(1)] \\ & & +10(\gamma + 1)(\alpha \epsilon + \omega (2\delta ;P_{n_{\epsilon }};\lambda ) + 5\omega (\delta ;P_{n_{\epsilon }};\lambda ))\end{array}$$
(6.4.3)

for any n ≥ n ε and δ > 0. Next, choose δ n  = δ n, ε > 0 such that δ n  → 0 as n → ∞ and

$$4K(1/\delta _{n}) +\mathop{\sup}\limits_{n\geq n_{\epsilon }}\omega (1;P_{n};\lambda ) + \omega (1;P;\lambda ) + 2K(1) \leq \frac{1} {{(\boldsymbol \pi (P_{n},P))}^{1/2}}.$$
(6.4.4)

Combining (6.4.3) and (6.4.4), we have that \(\mu ^{ \circ }_{c}(P_{n},P) \leq \mathrm{ const} \cdot \epsilon \) for n large enough, which proves the claim.

5 \({\mathop{\mu}\limits^{\circ}}_{c}\)- and \(\widehat{\mu }_{c}\)-Uniformity

In the previous section, we saw that \({\mathop{\mu}\limits^{\circ}}_{c}\) and \(\widehat{\mu }_{c}\) induce exactly the same convergence in \(\mathcal{P}_{\lambda }\). Here we analyze the uniformity of \({\mathop{\mu}\limits^{\circ}}_{c}\)- and \(\widehat{\mu }_{c}\)-convergence; namely, we ask whether for any \(P_{n},Q_{n} \in \mathcal{P}_{\lambda }\) the equivalence

$${\mathop{\mu}\limits^{\circ}}_{ c}(P_{n},Q_{n}) \rightarrow 0\;\Longleftrightarrow\;\widehat{\mu }_{c}(P_{n},Q_{n}) \rightarrow 0\quad \mbox{ as}\ n \rightarrow \infty $$
(6.5.1)

holds. The implication ⇐ is obvious, since \({\mathop{\mu}\limits^{\circ}}_{c}(P_{n},Q_{n}) \leq \widehat{ \mu }_{c}(P_{n},Q_{n})\). So, if

$$\widehat{\mu }_{c}(P,Q) \leq \phi ({\mathop{\mu}\limits^{\circ}}_{ c}(P,Q))\qquad P,Q \in \mathcal{P}_{\lambda }$$
(6.5.2)

for a continuous nondecreasing function ϕ with ϕ(0) = 0, then (6.5.1) holds.

Remark 6.5.1.

Given two metrics, say μ and ν, in the space of measures, the equivalence of μ- and ν-convergence does not imply the existence of a continuous nondecreasing function ϕ vanishing at 0 and such that μ ≤ ϕ(ν). For example, both the Lévy metric L [see (4.2.3)] and the Prokhorov metric \(\boldsymbol \pi \) [see (3.3.18)] metrize the weak convergence in the space \(\mathcal{P}(\mathbb{R})\). Suppose there exists ϕ such that

$$\boldsymbol \pi (X,Y ) \leq \phi (\mathbf{L}(X,Y ))$$
(6.5.3)

for any real-valued r.v.s X and Y. (Recall our notation \(\mu (X,Y ) := \mu (\mathrm{Pr}_{X},\mathrm{Pr}_{Y })\) for any metric μ in the space of measures.) Then, by (4.2.4) and (3.3.23),

$$\mathbf{L}(X/\lambda,Y/\lambda ) = \mathbf{L}_{\lambda }(X,Y ) \rightarrow \boldsymbol \rho (X,Y )\quad \mbox{ as}\quad \lambda \rightarrow 0$$
(6.5.4)

and

$$\boldsymbol \pi (X/\lambda,Y/\lambda ) = \boldsymbol \pi _{\lambda }(X,Y ) \rightarrow \boldsymbol \sigma (X,Y )\quad \mbox{ as}\quad \lambda \rightarrow 0,$$
(6.5.5)

where \(\boldsymbol \rho \) is the Kolmogorov metric [see (4.2.6)] and \(\boldsymbol \sigma \) is the total variation metric [see (3.3.13)]. Thus, (6.5.3)–(6.5.5) imply that \(\boldsymbol \sigma (X,Y ) \leq \phi (\boldsymbol \rho (X,Y ))\). However, the last inequality is simply not true because in general \(\boldsymbol \rho \)-convergence does not yield \(\boldsymbol \sigma \)-convergence. [For example, if X n is a random variable taking values k ∕ n, \(k = 1,\ldots,n\), each with probability 1 ∕ n, then \(\boldsymbol \rho (X_{n},Y ) \rightarrow 0\), where Y is a (0, 1)-uniformly distributed random variable. On the other hand, \(\boldsymbol \sigma (X_{n},Y ) = 1\).]
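The counterexample can be made concrete numerically. Below is a sketch (the function name and grid resolution are our own choices) estimating \(\boldsymbol \rho (X_{n},Y ) =\sup _{t}\vert F_{X_{n}}(t) - t\vert \), which decays like 1 ∕ n, while \(\boldsymbol \sigma (X_{n},Y ) = 1\) for every n because an atomic law and a law with a density are mutually singular:

```python
import math

def rho_discrete_vs_uniform(n, grid=200_000):
    """Kolmogorov distance between X_n (uniform on {k/n : k = 1..n})
    and Y ~ U(0,1), estimated on a fine grid over [0, 1].

    F_{X_n}(t) = floor(n*t)/n on [0, 1]; F_Y(t) = t.
    """
    return max(abs(math.floor(n * t) / n - t)
               for t in (i / grid for i in range(grid + 1)))

# rho decays like 1/n, while sigma(X_n, Y) = 1 for every n since
# Pr_{X_n} is purely atomic and Pr_Y has a density (mutually singular laws).
```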

We are going to prove (6.5.2) for the special but important case where \(\mu ^{ \circ }_{c}\) is the Fortet–Mourier metric on \(\mathcal{P}_{\lambda }(\mathbb{R})\), i.e., \(\mu ^{ \circ }_{c}(P,Q) = \boldsymbol \zeta (P,Q;{\mathcal{G}}^{p})\) [see (4.4.34)]; in other words, for any \(P,Q \in \mathcal{P}_{\lambda }\),

$$\mu ^{ \circ }_{c}(P,Q)=\sup \left \{\int \nolimits f\mathrm{d}(P - Q) : f : \mathbb{R} \rightarrow \mathbb{R},\vert f(x)-f(y)\vert \leq c(x,y)\forall x,y\in \mathbb{R}\right \}\!,$$

where

$$c(x,y) = \vert x - y\vert \max (1,\vert x{\vert }^{p-1},\vert y{\vert }^{p-1})\quad p \geq 1.$$
(6.5.6)

Since λ(x) : = 2max( | x | , | x | p), the space \(\mathcal{P}_{\lambda }(\mathbb{R})\) consists of all laws on \(\mathbb{R}\) with finite pth absolute moment.
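The role of λ is that it dominates the cost: assuming without loss of generality | y | ≥ | x | , we have c(x, y) ≤ 2 | y | max(1, | y | p − 1) = λ(y), so c(x, y) ≤ λ(x) + λ(y), and finiteness of ∫λ dP is the natural moment condition. A small numerical spot check (helper names are ours, not from the text):

```python
import random

def c(x, y, p):
    # Fortet-Mourier cost (6.5.6)
    return abs(x - y) * max(1.0, abs(x) ** (p - 1), abs(y) ** (p - 1))

def lam(x, p):
    # lambda(x) := 2 max(|x|, |x|^p)
    return 2.0 * max(abs(x), abs(x) ** p)

# Spot-check the domination c(x, y) <= lambda(x) + lambda(y), which is why
# P_lambda(R) -- laws with a finite p-th absolute moment -- is the right domain.
rng = random.Random(0)
for p in (1.0, 2.0, 3.5):
    for _ in range(10_000):
        x, y = rng.uniform(-50, 50), rng.uniform(-50, 50)
        assert c(x, y, p) <= lam(x, p) + lam(y, p) + 1e-9
```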

Theorem 6.5.1.

If c is given by (6.5.6), then

$$\widehat{\mu }_{c}(P,Q) \leq p \mu ^{ \circ }_{c}(P,Q)\quad \forall P,Q \in \mathcal{P}_{\lambda }(\mathbb{R}).$$
(6.5.7)

Proof.

Denote \(h(t) =\max (1,\vert t{\vert }^{p-1})\), \(t \in \mathbb{R}\), and \(H(x) = \int\nolimits_{0}^{x}h(t)\mathrm{d}t\), \(x \in \mathbb{R}\). Let X and Y be real-valued RVs on a nonatomic probability space \((\Omega,\mathcal{A},\mathrm{Pr})\) with distributions P and Q, respectively. Theorem 5.5.1 gives us an explicit representation of \(\mu ^{ \circ }_{c}\), namely,

$$\mu ^{ \circ }_{c}(P,Q) = \int\nolimits_{-\infty }^{\infty }h(t)\vert F_{ X}(t) - F_{Y }(t)\vert \mathrm{d}t,$$
(6.5.8)

and thus

$$\mu ^{ \circ }_{c}(P,Q) = \int\nolimits_{-\infty }^{\infty }\vert F_{ H(X)}(x) - F_{H(Y )}(x)\vert \mathrm{d}x.$$
(6.5.9)
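The passage from (6.5.8) to (6.5.9) is the change of variables x = H(t), using \(F_{H(X)}(x) = F_{X}({H}^{-1}(x))\). The following sketch checks this numerically for p = 2 and two illustrative normal laws (the choice of distributions, integration range, and helper names are ours, not part of the text):

```python
import math

def Phi(t, mu=0.0):
    # DF of a normal law N(mu, 1), via the error function
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0)))

p = 2
h = lambda t: max(1.0, abs(t) ** (p - 1))
# For p = 2: H(t) = t on [-1, 1] and sign(t)(1 + t^2)/2 beyond
H = lambda t: t if abs(t) <= 1 else math.copysign((1 + t * t) / 2.0, t)
H_inv = lambda x: x if abs(x) <= 1 else math.copysign(math.sqrt(2.0 * abs(x) - 1.0), x)

def trapz(f, a, b, n=100_000):
    # composite trapezoidal rule on [a, b]
    step = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * step) for i in range(1, n))
    return s * step

# (6.5.8): integral of h(t)|F_X(t) - F_Y(t)| dt with X ~ N(0,1), Y ~ N(1,1)
i1 = trapz(lambda t: h(t) * abs(Phi(t) - Phi(t, 1.0)), -12.0, 13.0)
# (6.5.9): the same quantity after substituting x = H(t)
i2 = trapz(lambda x: abs(Phi(H_inv(x)) - Phi(H_inv(x), 1.0)), H(-12.0), H(13.0))
assert abs(i1 - i2) < 1e-3
```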

Claim 12.

Let X and Y be real-valued RVs with distributions P and Q, respectively. Then

$$\mu ^{ \circ }_{c}(P,Q) =\inf \{ E\vert H(\widetilde{X}) - H(\widetilde{Y })\vert : F_{\widetilde{X}} = F_{X},F_{\widetilde{Y }} = F_{Y }\}.$$
(6.5.10)

Proof of Claim 12. Using the equality \(\widehat{\mu }_{d} ={\mathop{\mu}\limits^{\circ}}_{d}\) [see (6.2.3) and (5.5.5)] with H(t) = t we have that

$$\begin{array}{rcl} \mu ^{ \circ }_{d}(F,G)& =& \widehat{\mu }_{d}(F,G) =\inf \{ E\vert X\prime - Y \prime\vert : F_{X\prime} = F,F_{Y \prime} = G\} \\ & =& \int\nolimits_{-\infty }^{\infty }\vert F(x) - G(x)\vert \mathrm{d}x \end{array}$$
(6.5.11)

for any DFs F and G. Hence, by (6.5.9)

$$\begin{array}{rcl} \mu ^{ \circ }_{c}(P,Q)& =& \inf \{E\vert X\prime - Y \prime\vert : F_{X\prime} = F_{H(X)},F_{Y \prime} = F_{H(Y )}\} \\ & =& \inf \{E\vert H(\widetilde{X}) - H(\widetilde{Y })\vert : F_{\widetilde{X}} = F_{X},F_{\widetilde{Y }} = F_{Y }\} \\ \end{array}$$

which proves the claim.
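As a concrete illustration of (6.5.11), the minimal value ∫ | F − G | and the quantile ("monotone") coupling value E | F  − 1(V ) − G  − 1(V ) | can both be computed numerically; for the illustrative pair F = Exp(1), G = Exp(2) (our choice of example) both equal 1 ∕ 2:

```python
import math

F_inv = lambda u: -math.log1p(-u)         # quantile function of Exp(1)
G_inv = lambda u: -math.log1p(-u) / 2.0   # quantile function of Exp(2)

n = 200_000
# E|F^{-1}(V) - G^{-1}(V)| for V ~ U(0,1): midpoint rule over (0, 1)
coupling = sum(abs(F_inv((i + 0.5) / n) - G_inv((i + 0.5) / n)) for i in range(n)) / n

# integral of |F(x) - G(x)| = e^{-x} - e^{-2x} over [0, 20] (tail is negligible)
m = 20_000
l1 = sum(math.exp(-(i + 0.5) / 1000) - math.exp(-2 * (i + 0.5) / 1000)
         for i in range(m)) / 1000

assert abs(coupling - 0.5) < 1e-3
assert abs(l1 - 0.5) < 1e-3
```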

Next we use Theorem 2.7.2, which claims that on a nonatomic probability space the class of all joint distributions \(\mathrm{Pr}_{X,Y }\) coincides with the class of all Borel probability measures on \({\mathbb{R}}^{2}\). This implies

$$\widehat{\mu }_{c}(P,Q) =\inf \{ Ec(\widetilde{X},\widetilde{Y }) : F_{\widetilde{X}} = F_{X},F_{\widetilde{Y }} = F_{Y }\}.$$
(6.5.12)

Claim 13.

For any x, y, c(x, y) ≤ p | H(x) − H(y) | .

Proof of Claim 13.

  1. (a)

    Let y > x > 0. Then

    $$\begin{array}{rcl} c(x,y)& =& (y - x)h(y) = yh(y) - xh(y) \leq yh(y) - xh(x) \\ & \leq & (H(y) - H(x))\mathop{\sup}\limits_{y>x>0}\frac{yh(y) - xh(x)} {H(y) - H(x)} \end{array}$$

    Since H(t) is a strictly increasing continuous function,

    $$B :=\mathop{\sup}\limits_{y>x>0}\frac{yh(y) - xh(x)} {H(y) - H(x)} =\mathop{\sup}\limits_{t>s>0}\frac{f(t) - f(s)} {t - s},$$

where \(f(t) := {H}^{-1}(t)h({H}^{-1}(t))\) and H  − 1 is the inverse function of H; hence, B = ess sup t  | f′(t) | ≤ p.

  2. (b)

    Let y > 0 > x >  − y. Then \(c(x,y) = \vert x-y\vert h(y) = (y+(-x))h(y) = yh(y)+(-x)h(\vert x\vert )+((-x)h(y)-(-x)h(\vert x\vert )) \leq yh(y)+(-x)h(\vert x\vert )\). Since

    $$th(t) = \left \{\begin{array}{lll} t &\mbox{ if}&t \leq 1,\\ {t}^{p } &\mbox{ if} &t \geq 1,\end{array} \right.\qquad H(t) = \left \{\begin{array}{lll} t &\mbox{ if}&0 < t \leq 1, \\ \dfrac{p - 1} {p} + \dfrac{1} {p}{t}^{p}&\mbox{ if}&t \geq 1, \end{array} \right.$$

    then \(yh(y) + (-x)h(\vert x\vert ) \leq p(H(y) + H(-x)) = p(H(y) - H(x))\). By symmetry, the other cases are reduced to (a) or (b). The claim is shown. Now, (6.5.7) is a consequence of Claims 12, 13, and (6.5.12).

6 Generalized Kantorovich and Kantorovich–Rubinstein Functionals

In this section, we consider a generalization of the Kantorovich-type functionals \(\widehat{\mu }_{c}\) and \(\mu ^{ \circ }_{c}\) [see (5.2.16) and (5.2.17)].

Let U = (U, d) be an s.m.s. and consider the space of all nonnegative Borel measures on the Cartesian product U × U. For any probability measures P 1 and P 2 define the sets \({\mathcal{P}}^{(P_{1},P_{2})}\) and \({\mathcal{Q}}^{(P_{1},P_{2})}\) as in Sect. 5.2 [see (5.2.2) and (5.2.13)].

Let Λ, defined on the nonnegative Borel measures on U × U and taking values in [0, ∞], satisfy the conditions

  1. 1.

    Λ(αP) = αΛ(P) ∀α ≥ 0,

  2. 2.

\(\Lambda (P + Q) \leq \Lambda (P) + \Lambda (Q)\) for all nonnegative Borel measures P and Q on U × U.

We introduce the generalized Kantorovich functional

$$\widehat{\Lambda }(P_{1},P_{2}) :=\inf \{ \Lambda (P) : P \in {\mathcal{P}}^{(P_{1},P_{2})}\}$$
(6.6.1)

and the generalized Kantorovich–Rubinstein functional

$$\Lambda ^{ \circ } (P_{1},P_{2}) :=\inf \{ \Lambda (P) : P \in {\mathcal{Q}}^{(P_{1},P_{2})}\}.$$
(6.6.2)

Example 6.6.1.

The Kantorovich metricFootnote 3

$$\begin{array}{rcl} \mathcal{l}_{1}(P_{1},P_{2})& :=& \sup \left \{\left \vert \int \nolimits f\mathrm{d}(P_{1} - P_{2})\right \vert : f : U\right. \\ & & \left.\rightarrow \mathbb{R},\vert f(x) - f(y)\vert \leq d(x,y),x,y \in U\right \} \\ \end{array}$$

in the space of measures P with finite “first moment,” ∫d(x, a)P(dx) < ∞, has the dual representations \(\mathcal{l}_{1}(P_{1},P_{2}) =\mathop{ \Lambda }\limits^{ \circ } (P_{1},P_{2}) =\widehat{ \Lambda }(P_{1},P_{2})\), where

$$\Lambda (P) = \Lambda _{1}(P) := \int\nolimits_{U\times U}d(x,y)P(\mathrm{d}x,\mathrm{d}y).$$
(6.6.3)

Example 6.6.2.

Let U = \(\mathbb{R}\) and \(d(x,y) = \vert x - y\vert \). Then

$$\mathcal{l}_{1}(P_{1},P_{2}) = \int\nolimits_{\mathbb{R}}\vert F_{1}(t) - F_{2}(t)\vert \mathrm{d}t,$$

where F i is the DF of P i and

$$\begin{array}{rcl} \Lambda _{1}(P)& =& \int\nolimits_{\mathbb{R}}(\mathrm{Pr}(X \leq t < Y ) +\mathrm{ Pr}(Y \leq t < X))\mathrm{d}t \\ & =& \int\nolimits_{\mathbb{R}}[\mathrm{Pr}(X \leq t) +\mathrm{ Pr}(Y \leq t) - 2\mathrm{Pr}(\max (X,Y ) \leq t)]\mathrm{d}t \\ & =& E(2\max (X,Y ) - X - Y ) = E\vert X - Y \vert \\ \end{array}$$

for RVs X and Y with \(\mathrm{Pr}_{X,Y } = P\). We generalize (6.6.3) as follows: for any 1 ≤ p ≤ ∞, define

$$\Lambda (P) := \Lambda _{p}(P):=\left \{\begin{array}{l} {\left \{\int\nolimits_{\mathbb{R}}{\left [\int\nolimits_{{\mathbb{R}}^{2}}c_{t}(x,y)P(\mathrm{d}x,\mathrm{d}y)\right ]}^{p}\lambda (\mathrm{d}t)\right \}}^{1/p}\qquad 1 \leq p < \infty \\ \\ \mathrm{ess\,sup}_{\lambda } \int\nolimits_{{\mathbb{R}}^{2}}c_{t}(x,y)P(\mathrm{d}x,\mathrm{d}y) \\ \\ :=\inf \left \{\epsilon > 0 : \lambda \left \{t : \int\nolimits_{{\mathbb{R}}^{2}}c_{t}\mathrm{d}P > \epsilon \right \} = 0\right \}\quad p = \infty, \end{array} \right.$$
(6.6.4)

where \(c_{t}(\cdot,\cdot)\) is the following semimetric in \(\mathbb{R}\):

$$c_{t}(x,y) := I\{x \leq t \leq y\} + I\{y \leq t \leq x\}\quad \forall x,y \in \mathbb{R},$$
(6.6.5)

and λ( ⋅) is a nonnegative measure on \(\mathbb{R}\). In the space \(\mathfrak{X} = \mathfrak{X}(\mathbb{R})\) of all real-valued RVs on a nonatomic probability space \((\Omega,\mathcal{A},\mathrm{Pr})\), the minimal metric w.r.t. Λ is given by

$$\widehat{\Lambda }_{p}(P_{1},P_{2})=\left \{\begin{array}{l} \inf \left \{{\left [\int\nolimits_{\mathbb{R}}\phi _{t}^{p}(X,Y )\lambda (\mathrm{d}t)\right ]}^{1/p}\!\! : X,Y \in \mathfrak{X},\mathrm{Pr}_{ X} = P_{1},\mathrm{Pr}_{Y } = P_{2}\right \} \\ \\ 1 \leq p < \infty \\ \\ \inf \left \{\sup \limits _{t\in \mathbb{R}}\phi _{t}(X,Y ) : X,Y \in \mathfrak{X},\mathrm{Pr}_{X} = P_{1},\mathrm{Pr}_{Y } = P_{2}\right \}\\ \\ p = \infty. \end{array} \right.$$
(6.6.6)

Similarly, the minimal norm with respect to Λ is

$$\Lambda ^{\circ }_{p}(P_{1},P_{2}) = \left \{\begin{array}{l} \inf \Biggl \{\alpha {\left [\int\nolimits_{\mathbb{R}}\phi _{t}^{p}(X,Y )\lambda (\mathrm{d}t)\right ]}^{1/p} : \alpha > 0,\quad X,Y \in \mathfrak{X},\quad \\ \alpha (\mathrm{Pr}_{X} -\mathrm{ Pr}_{Y }) = P_{1} - P_{2}\Biggr \}\quad \mbox{ if}\ p < \infty \\ \inf \Bigl \{\alpha \,\mathrm{ess\,sup}_{\lambda }\,\phi _{t}(X,Y ) : \alpha > 0,\ X,Y \in \mathfrak{X}, \\ \alpha (\mathrm{Pr}_{X} -\mathrm{ Pr}_{Y }) = P_{1} - P_{2}\Bigr \}\quad \mbox{ if}\ p = \infty, \end{array} \right.$$
(6.6.7)

where in (6.6.6) and (6.6.7)

$$\phi _{t}(X,Y ) :=\mathrm{ Pr}(X \leq t < Y ) +\mathrm{ Pr}(Y \leq t < X).$$
(6.6.8)

The next theorem gives the explicit form of \(\widehat{\Lambda }_{p}\) and \(\Lambda ^{ \circ }_{p}\).

Theorem 6.6.1.

Let F i be the DF of P i (i = 1,2). Then

$$\widehat{\Lambda }_{p}(P_{1},P_{2}) =\mathop{ \Lambda }\limits^{ \circ }_{p}(P_{1},P_{2}) = \boldsymbol \lambda _{p}(F_{1},F_{2}),$$
(6.6.9)

where

$$\boldsymbol \lambda _{p}(F_{1},F_{2})=\left \{\begin{array}{l} {\left (\int\nolimits_{\mathbb{R}}\vert F_{1}(t) - F_{2}(t){\vert }^{p}\lambda (\mathrm{d}t)\right )}^{1/p}\quad 1 \leq p < \infty \\ \mathrm{ess\,sup}_{\lambda }\vert F_{1} - F_{2}\vert =\inf \{ \epsilon > 0 : \lambda (t : \vert F_{1}(t) - F_{2}(t)\vert > \epsilon ) = 0\}\\ p = \infty. \end{array} \right.$$
(6.6.10)
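Before turning to the proof, a quick numerical sanity check of (6.6.10) in the case p = ∞ with λ equal to Lebesgue measure: then \(\boldsymbol \lambda _{\infty }(F_{1},F_{2})\) is the uniform (Kolmogorov) distance between the DFs. For the illustrative pair F 1 = Exp(1), F 2 = Exp(2) (our choice, not from the text), | F 1 − F 2 |  = e − x − e − 2x peaks at x = ln 2 with value 1 ∕ 4:

```python
import math

F1 = lambda x: 1.0 - math.exp(-x)        # DF of Exp(1)
F2 = lambda x: 1.0 - math.exp(-2.0 * x)  # DF of Exp(2)

# ess sup_lambda |F1 - F2| with lambda = Lebesgue measure: for continuous DFs
# this is sup_x |F1(x) - F2(x)|, estimated on a fine grid over [0, 20)
grid_max = max(abs(F1(i / 10_000) - F2(i / 10_000)) for i in range(200_000))
assert abs(grid_max - 0.25) < 1e-6
```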

Claim 14.

\(\boldsymbol \lambda _{p}(F_{1},F_{2}) \leq \mathop{ \Lambda }\limits^{ \circ }_{p}(P_{1},P_{2})\).

Proof of Claim 14. Let \(P \in {\mathcal{Q}}^{(P_{1},P_{2})}\). Then in view of Remark 2.7.2 in Chap. 2, there exist α > 0, \(X \in \mathfrak{X}\), \(Y \in \mathfrak{X}\), such that \(\alpha \mathrm{Pr}_{X,Y } = P\) and \(\alpha (F_{X} - F_{Y }) = F_{1} - F_{2}\); thus

$$\begin{array}{rcl} \vert F_{1}(t) - F_{2}(t)\vert & =& \alpha \vert F_{X}(t) - F_{Y }(t)\vert \\ & =& \alpha [\max (F_{X}(t) - F_{Y }(t),0) +\max (F_{Y }(t) - F_{X}(t),0)] \\ & \leq & \alpha \phi _{t}(X,Y ). \end{array}$$
(6.6.11)

By (6.6.7) and (6.6.11), it follows that \(\boldsymbol \lambda _{p}(F_{1},F_{2}) \leq \mathop{ \Lambda }\limits^{ \circ }_{p}(P_{1},P_{2})\), as desired.

Further

$$\Lambda ^{ \circ }_{p}(P_{1},P_{2}) \leq \widehat{ \Lambda }_{p}(P_{1},P_{2})$$
(6.6.12)

by the representations (6.6.6) and (6.6.7).

Claim 15.

$$\widehat{\Lambda }_{p}(P_{1},P_{2}) \leq \boldsymbol \lambda _{p}(F_{1},F_{2}).$$

Proof of Claim 15. Let \(\widetilde{X} := F_{1}^{-1}(V )\), \(\widetilde{Y } := F_{2}^{-1}(V )\), where F i  − 1 is the generalized inverse of the DF F i [see (3.3.16) in Chap. 3] and V is a (0, 1)-uniformly distributed RV. Then \(F_{\widetilde{X},\widetilde{Y }}(t,s) =\min (F_{1}(t),F_{2}(s))\) for all t, s. Hence, \(\phi _{t}(\widetilde{X},\widetilde{Y }) = \vert F_{1}(t) - F_{2}(t)\vert \), which proves the claim by using (6.6.6) and (6.6.7).

Combining Claims 14, 15, and (6.6.12) we obtain (6.6.9).
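The coupling fact behind Claim 15 can also be checked by simulation: under the monotone coupling \(\widetilde{X} = F_{1}^{-1}(V )\), \(\widetilde{Y } = F_{2}^{-1}(V )\), the quantity φ t of (6.6.8) reduces to | F 1(t) − F 2(t) | . A Monte Carlo sketch with two exponential laws (our illustrative choice; names and sample size are ours):

```python
import math, random

F1 = lambda t: 1 - math.exp(-t)          # DF of Exp(1)
F2 = lambda t: 1 - math.exp(-2 * t)      # DF of Exp(2)
F1_inv = lambda u: -math.log1p(-u)       # generalized inverse of F1
F2_inv = lambda u: -math.log1p(-u) / 2.0 # generalized inverse of F2

rng = random.Random(42)
pairs = [(F1_inv(v), F2_inv(v)) for v in (rng.random() for _ in range(200_000))]

def phi_hat(t):
    # Monte Carlo estimate of phi_t(X~, Y~) = Pr(X~ <= t < Y~) + Pr(Y~ <= t < X~)
    return sum((x <= t < y) or (y <= t < x) for x, y in pairs) / len(pairs)

# compare with |F1(t) - F2(t)| at a few test points
max_err = max(abs(phi_hat(t) - abs(F1(t) - F2(t))) for t in (0.2, math.log(2), 1.5))
assert max_err < 5e-3
```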

Problem 6.6.1.

In general, dual and explicit solutions of \(\widehat{\Lambda }_{p}\) and \( \mathop\Lambda\limits^{\circ }_{p}\) in (6.6.1) and (6.6.2) are not known.