The notion of association is a form of positive dependence among random variables that was independently introduced in reliability theory, percolation theory, and statistical physics, where it is expressed in a form known as the “FKG inequalities.” The main focus of this chapter is (i) a proof of Newman’s central limit theorem for associated random fields with summable decay of correlations, and (ii) Pitt’s characterization of association of multidimensional Gaussian distributions by non-negativity of covariances.

The notion of association as a form of positive dependence has proved to be of much interest in statistical physics,Footnote 1 but its potential importance goes beyond statistical physics applications. In 1980 C. M. NewmanFootnote 2 announced a central limit theorem for associated random fields that will be the focus of this chapter. For stationary random fields, the role of association in the asymptotic distribution of centered and scaled sums may be compared to that of martingales for stationary sequences, where only the finiteness of second moments comes into play.

We will restrict the exposition to random fields of real-valued random variables \(\{X_x:x\in {\mathbb Z}^k\}\) defined on a probability space \((\varOmega ,\mathcal {F}, P)\) and indexed by the k-dimensional integer lattice \({\mathbb Z}^k\). Here the natural extension of stationarity of sequences to that of random fields is as follows.

FormalPara Definition 23.1

The random field \(\mathbf {X} := \{X_x:x\in {\mathbb Z}^k\}\) is said to be translation invariant if for each fixed \(z\in {\mathbb Z}^k\) the random field \(\{X_{x+z}:x\in {\mathbb Z}^k\}\) is distributed as \(\mathbf {X}\).

FormalPara Definition 23.2

A finite set of random variables X 1, …, X m is said to be associated if

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle \mathrm{Cov}(f(X_1,\dots,X_m),g(X_1,\dots,X_m))\\ & &\displaystyle \quad \equiv {\mathbb E}f(X_1,\dots,X_m)g(X_1,\dots,X_m) - {\mathbb E}f(X_1,\dots,X_m){\mathbb E}g(X_1,\dots,X_m) \ge 0 \end{array} \end{aligned} $$

for any pair of bounded measurable coordinatewise non-decreasing functions f, g. An arbitrary collection {X λ : λ ∈ Λ} is said to be associated if every finite subcollection is associated.
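
As a quick numerical illustration of Definition 23.2 (an aside to the text, with all function choices ours and purely illustrative), the following Python sketch estimates the covariance of two bounded coordinatewise non-decreasing functions of an associated pair built as non-decreasing functions of a single normal variable; by association the estimate should be nonnegative up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)

# X1 and X2 are both non-decreasing functions of one standard normal Z,
# hence the pair (X1, X2) is associated (cf. Proposition 23.3 below).
Z = rng.standard_normal(200_000)
X1, X2 = Z, np.tanh(Z)

# Bounded, coordinatewise non-decreasing test functions f and g.
f = np.tanh(X1) + np.tanh(X2)
g = np.tanh(X1 + X2)

cov = np.mean(f * g) - np.mean(f) * np.mean(g)
print(f"estimated Cov(f, g) = {cov:.5f}  (association predicts >= 0)")
```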

The inequalities in Definition 23.2 are referred to as the Fortuin–Kasteleyn–Ginibre (FKG) Inequalities.Footnote 3 Let us begin with a useful formula for covariance in this context. The special case of this formula with f(x) = x, g(y) = y was derived in Lehmann (1966) with attribution to Hoeffding (1940). Newman (1980) noticed the simple but significant extension presented here. (Recall Definition 2.1 of the covariance of complex-valued random variables.)

FormalPara Lemma 1 (Hoeffding-Newman Covariance Formula)

Suppose that \(f(X),g(Y)\in L^2(\varOmega ,\mathcal {F},P)\) and assume f, g are continuously differentiable complex-valued functions on \(\mathbb {R}\) having bounded derivatives. Then,

$$\displaystyle \begin{aligned}\mathrm{Cov}(f(X),g(Y)) = \int_{\mathbb{R}}\int_{\mathbb{R}}f^\prime(x) \overline{g}^\prime(y)H_{X,Y}(x,y)dxdy,\end{aligned}$$

where

$$\displaystyle \begin{aligned} H_{X,Y}(x,y) &= \mathrm{Cov}({\mathbf{1}}_{[{X}>x]},{\mathbf{1}}_{[{Y}>y]}) = P(X> x, Y > y)\\ &\quad -P(X> x)P(Y > y),\ x,y\in\mathbb{R}. \end{aligned} $$
FormalPara Proof

Let \((X_1, Y_1)\) and \((X_2, Y_2)\) be independent random vectors distributed as (X, Y ). Note that \({\mathbf{1}}_{(u,\infty)}(X_1)-{\mathbf{1}}_{(u,\infty)}(X_2)\) is 1 if \(X_2 < u < X_1\), \(-1\) if \(X_1 < u < X_2\), and 0 otherwise. Thus, by the fundamental theorem of calculus,

$$\displaystyle \begin{aligned}f(X_1)-f(X_2) = \int_{-\infty}^\infty f^\prime(u)\{{\mathbf{1}}_{(u,\infty)}(X_1)-{\mathbf{1}}_{(u,\infty)}(X_2)\}du.\end{aligned}$$

Similarly,

$$\displaystyle \begin{aligned}\overline{g}(Y_1)-\overline{g}(Y_2) = \int_{-\infty}^\infty \overline{g}^\prime(u)\{{\mathbf{1}}_{(u,\infty)}(Y_1)-{\mathbf{1}}_{(u,\infty)}(Y_2)\}du.\end{aligned}$$

Thus,

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle 2\mathrm{Cov}(f(X),g(Y))\\ & &\displaystyle \quad = {\mathbb E}[f(X_1)-f(X_2)][\overline{g}(Y_1)-\overline{g}(Y_2)]\\ & &\displaystyle \quad = {\mathbb E}\int_{-\infty}^{\infty}\int_{-\infty}^\infty \big({\mathbf{1}}_{[X_1 > u]}-{\mathbf{1}}_{[X_2 > u]}\big) \big({\mathbf{1}}_{[Y_1> v]} - {\mathbf{1}}_{[Y_2 > v]}\big)f^\prime(u)\overline{g}^\prime(v)dudv. \end{array} \end{aligned} $$

The formula follows by canceling the factors of 2 and applying Fubini’s theorem to interchange the expected value with the integrals, since, expanding the product of indicators, one has by independence and the specified common joint distribution of (X i, Y i), i = 1, 2, that

$$\displaystyle \begin{aligned}{\mathbb E}\big({\mathbf{1}}_{[X_1 > u]}-{\mathbf{1}}_{[X_2 > u]}\big) \big({\mathbf{1}}_{[Y_1 > v]} - {\mathbf{1}}_{[Y_2 > v]}\big) = 2\{P(X_1 > u, Y_1 > v) - P(X_1 > u)P(Y_1 > v)\}.\end{aligned} $$

\(\hfill \blacksquare \)
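
The covariance formula is easy to probe numerically. The following Python sketch (illustrative only; the correlation value, grid, and truncation are arbitrary choices of ours) checks the classical Hoeffding case f(x) = x, g(y) = y for a standard bivariate normal pair with correlation 0.6, for which the double integral of \(H_{X,Y}\) should return Cov(X, Y ) = 0.6 up to discretization error.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import trapezoid

rho = 0.6
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# A modest grid keeps the (relatively slow) joint-CDF evaluations manageable.
grid = np.linspace(-5.0, 5.0, 81)
xx, yy = np.meshgrid(grid, grid, indexing="ij")
pts = np.stack([xx.ravel(), yy.ravel()], axis=-1)

# H(x, y) = P(X > x, Y > y) - P(X > x)P(Y > y), with the joint survival
# function obtained from the joint and marginal CDFs by inclusion-exclusion.
Fxy = mvn.cdf(pts).reshape(xx.shape)
Fx, Fy = norm.cdf(xx), norm.cdf(yy)
H = (1.0 - Fx - Fy + Fxy) - (1.0 - Fx) * (1.0 - Fy)

integral = trapezoid(trapezoid(H, grid, axis=1), grid)
print(f"double integral of H = {integral:.4f}   Cov(X, Y) = {rho}")
```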

FormalPara Remark 23.1

Under the same conditions, the covariance formula may be expressed equivalently as

$$\displaystyle \begin{aligned} \mathrm{Cov}(f(X),g(Y))= \int_{\mathbb{R}}\int_{\mathbb{R}}\mathrm{Cov}({\mathbf{1}}_{[{X}>x]}, {\mathbf{1}}_{[{Y}>y]})f^\prime(x) \overline{g}^\prime(y)dxdy. \end{aligned}$$
FormalPara Definition 23.3

A pair of real-valued random variables X, Y  for which

$$\displaystyle \begin{aligned}P(X> u, Y > v) -P(X> u)P(Y > v) \ge 0 \quad \text{for all}\ u,v\in{\mathbb{R}},\end{aligned}$$

is said to be positive quadrant dependent.Footnote 4

FormalPara Proposition 23.1

Associated random variables are (pairwise) positive quadrant dependent.

FormalPara Proof

Simply note that for any fixed number \(a\in {\mathbb {R}}\), a function of the form \(f(u) = {\mathbf{1}}_{(a,\infty)}(u)\) is bounded, measurable, and non-decreasing, so that the FKG inequalities applied to \({\mathbf{1}}_{(u,\infty)}(X)\) and \({\mathbf{1}}_{(v,\infty)}(Y)\) yield the asserted positive quadrant dependence.\(\hfill \blacksquare \)

Newman’s proof of the central limit theorem exploits the covariance formulae to compare characteristic functions of sums of random variables with the corresponding product of characteristic functions through the following key lemma. The non-negativity of the covariance is essential to this comparison.

FormalPara Lemma 2 (Newman)

Suppose that \(f(X),g(Y)\in L^2(\varOmega ,\mathcal {F},P)\) where X, Y  are positive quadrant dependent and f, g are continuously differentiable complex-valued functions on \(\mathbb {R}\) with bounded derivatives. Then

$$\displaystyle \begin{aligned}|\mathrm{Cov}\big(f(X),g(Y)\big)| \le ||f^\prime||{}_\infty||g^\prime||{}_\infty\mathrm{Cov}(X,Y),\end{aligned}$$

where \(\|\cdot\|_\infty\) denotes the essential supremum norm. In particular,

$$\displaystyle \begin{aligned}|{\mathbb E}e^{irX + isY} - {\mathbb E}e^{irX}{\mathbb E}e^{isY}|\le |r||s|\mathrm{Cov}(X,Y), \quad r,s\in\mathbb{R}.\end{aligned}$$
FormalPara Proof

Using Lemma 1, the assertion follows from the triangle inequality, bounding the derivatives, and the nonnegativity of \(H_{X,Y}(x,y)\) guaranteed by positive quadrant dependence. Specifically,

$$\displaystyle \begin{aligned}|\mathrm{Cov}\big(f(X),g(Y)\big)| \le ||f^\prime||{}_\infty||g^\prime||{}_\infty\int_{-\infty}^\infty \int_{-\infty}^\infty H(x,y)dxdy = ||f^\prime||{}_\infty||g^\prime||{}_\infty\mathrm{Cov}(X,Y).\end{aligned}$$

This completes the proof of the general bound. The second bound follows by taking \(f(x) = e^{irx}, g(y) = e^{isy}\), for which \(\|f^\prime\|_\infty = |r|\) and \(\|g^\prime\|_\infty = |s|\).\(\hfill \blacksquare \)
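
For a numerical sense of Lemma 2, a minimal Monte Carlo sketch (our own toy construction): with X = Z 0 + Z 1, Y = Z 0 + Z 2 built from independent standard normals, the pair X, Y  is associated with Cov(X, Y ) = 1, and the characteristic function gap can be compared against |r||s|Cov(X, Y ).

```python
import numpy as np

rng = np.random.default_rng(1)

# X and Y are non-decreasing functions of independent variables (Z0, Z1, Z2),
# hence associated, with Cov(X, Y) = Var(Z0) = 1.
Z0, Z1, Z2 = rng.standard_normal((3, 400_000))
X, Y = Z0 + Z1, Z0 + Z2

for r, s in [(0.5, 0.5), (1.0, 2.0), (0.2, 3.0)]:
    gap = abs(np.mean(np.exp(1j * (r * X + s * Y)))
              - np.mean(np.exp(1j * r * X)) * np.mean(np.exp(1j * s * Y)))
    print(f"r={r}, s={s}: |cf gap| = {gap:.4f} <= |r||s|Cov = {abs(r * s):.4f}")
```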

Let us say that a collection of functions \(\mathcal {C}\) is association determining if one may restrict the FKG inequalities to \(f,g\in \mathcal {C}\) to establish association. A proof of the following proposition is left to Exercise 8.

FormalPara Proposition 23.2

The collections of coordinatewise non-decreasing binary 0–1-valued functions, and of coordinatewise non-decreasing bounded continuous functions, are each association determining.

The following properties are useful in “tracking association” and/or building examples of associated families of random variables.

FormalPara Proposition 23.3
  1.

    Any subcollection of associated random variables is associated.

  2.

    The union of independent collections of associated random variables is associated.

  3.

    Measurable coordinatewise non-decreasing or coordinatewise nonincreasing functions of associated random variables are associated.

  4.

    If for each n, \(X_1^{(n)},\dots , X_m^{(n)}\) is associated and if \((X_1^{(n)},\dots ,X_m^{(n)})\) converges in distribution to (X 1, …, X m), then X 1, …, X m is associated.

  5.

    A singleton {X 1} is associated.

  6.

    Independent random variables are associated.

  7.

    If X, Y  are binary random variables, then X, Y  are associated if and only if Cov(X, Y ) ≥ 0.

FormalPara Proof

Part (1) follows directly from the definition by considering functions whose values do not depend on variables not included in the subset. For (2), let X = (X 1, …, X m) and Y = (Y 1, …, Y n) be two independent collections of associated random variables. Let Z = (X 1, …, X m, Y 1, …, Y n). For non-decreasing bounded measurable functions f, g of m + n variables, since the joint distribution of X and Y is a product measure by independence, one has

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle \mathop{\mathrm{Cov}}\nolimits(f(\mathbf{Z}),g(\mathbf{Z})) = {\mathbb E}f(\mathbf{Z})g(\mathbf{Z}) - {\mathbb E}f(\mathbf{ Z}){\mathbb E}g(\mathbf{Z})\\ & &\displaystyle \quad = \int_{{\mathbb{R}}^n}\int_{{\mathbb{R}}^m}\!\!\!\!\!\!f(z_1,\dots,z_{m+n})g(z_1,\dots,z_{m+n}) P_{X}(dz_1\times\cdots\times dz_m)P_{Y}(dz_{m+1}\times\cdots\times dz_{m+n})\\ & &\displaystyle \qquad - \int_{{\mathbb{R}}^n}\int_{{\mathbb{R}}^m}f(z_1,\dots,z_{m+n}) P_{X}(dz_1\times\cdots\times dz_m)P_{Y}(dz_{m+1}\times\cdots\times dz_{m+n})\\ & &\displaystyle \qquad \times\int_{\mathbb{R}^n}\int_{\mathbb{R}^m}g(z_1,\dots,z_{m+n}) P_{X}(dz_1\times\cdots\times dz_m)P_{Y}(dz_{m+1}\times\cdots\times dz_{m+n})\\ & &\displaystyle \quad = \int_{\mathbb{R}^n}\Big\{\int_{\mathbb{R}^m}f(z_1,\dots,z_{m+n})g(z_1,\dots,z_{m+n}) dP_{X}\\ & &\displaystyle \qquad - \int_{\mathbb{R}^m}f(z_1,\dots,z_{m+n}) dP_{X} \int_{\mathbb{R}^m}g(z_1,\dots,z_{m+n}) dP_{X}\Big\}dP_{Y}\\ & &\displaystyle \qquad +\int_{\mathbb{R}^n}\big\{\int_{\mathbb{R}^m}f(z_1,\dots,z_{m+n}) dP_{X} \int_{\mathbb{R}^m}g(z_1,\dots,z_{m+n}) dP_{X} \big\}dP_{Y}\\ & &\displaystyle \qquad - \int_{\mathbb{R}^n}\int_{\mathbb{R}^m}f(z_1,\dots,z_{m+n}) dP_{X}dP_{Y} \int_{\mathbb{R}^n}\int_{\mathbb{R}^m}g(z_1,\dots,z_{m+n}) dP_{X}dP_{Y}\\ & &\displaystyle \quad = \int_{\mathbb{R}^n}\mathop{\mathrm{Cov}}\nolimits\Big(f(X_1,\dots,X_m,z_{m+1},\dots,z_{n+m}), g(X_1,\dots,X_m,z_{m+1},\dots,z_{n+m})\Big)dP_{Y}\\ & &\displaystyle \qquad +\mathop{\mathrm{Cov}}\nolimits\left(\int_{\mathbb{R}^m}f(z_1,\dots,z_m,Y_1,\dots,Y_n)dP_{X}, \int_{\mathbb{R}^m}g(z_1,\dots,z_m,Y_1,\dots,Y_n)dP_{X}\right) \ge 0, \end{array} \end{aligned} $$

where \(dP_{X} = P_{X}(dz_1\times\cdots\times dz_m)\), \(dP_{Y} = P_{Y}(dz_{m+1}\times\cdots\times dz_{m+n})\). The proof of part (3) follows directly from the definition since if X 1, …, X m are associated and Y i = h i(X 1, …, X m) for measurable coordinatewise non-decreasing functions h 1, …, h m, then f(h 1, …, h m) and g(h 1, …, h m) are bounded measurable coordinatewise non-decreasing whenever the same is true of f, g. For the coordinatewise nonincreasing case the composites f(h 1, …, h m) and g(h 1, …, h m) are bounded measurable coordinatewise nonincreasing for coordinatewise non-decreasing f, g. Now, Cov(f(h 1, …, h m), g(h 1, …, h m)) = Cov( − f(h 1, …, h m), −g(h 1, …, h m)), and − f(h 1, …, h m) and − g(h 1, …, h m) are bounded measurable coordinatewise non-decreasing. For part (4), by definition of weak convergence, \( \mathop {\mathrm {Cov}} \nolimits (f(\mathbf {X}), g(\mathbf {X})) = \lim _{n\to \infty } \mathop {\mathrm {Cov}} \nolimits (f({\mathbf {X}}^{(n)}),g({\mathbf {X}}^{(n)}))\) for bounded continuous functions f, g. The result therefore follows since, by the previous proposition, bounded continuous coordinatewise non-decreasing functions are association determining. To prove (5), restrict to the association determining class of non-decreasing binary functions, and observe that for non-decreasing binary functions f, g of a single variable one has either f ≤ g or g ≤ f. Without loss of generality consider the case f ≤ g. Then fg = f, so that \( \mathop {\mathrm {Cov}} \nolimits (f(X_1),g(X_1)) = {\mathbb E}f(X_1)g(X_1) - {\mathbb E}f(X_1){\mathbb E}g(X_1) = {\mathbb E}f(X_1)(1-{\mathbb E}g(X_1)) \ge 0\). Property (6) follows by application of (2) and (5). For part (7), Cov(X, Y ) ≥ 0 is obviously necessary for X, Y  to be associated. For binary 0–1-valued X, Y , suppose Cov(X, Y ) ≥ 0. The only coordinatewise non-decreasing 0–1-valued functions of X, Y  are f 0 ≡ 0, f 1 ≡ 1, g 0(x, y) = x, g 1(x, y) = y, m(x, y) = x ∧ y, h(x, y) = x ∨ y, for x, y ∈{0, 1}. Moreover, f 0 ≤ m ≤ g i ≤ h ≤ f 1, i = 0, 1, so that every pair from this list other than {g 0, g 1} is ordered. For ordered 0–1-valued functions u ≤ v one has uv = u, so that, as in the proof of (5), Cov(u(X, Y ), v(X, Y )) ≥ 0. The remaining case Cov(g 0(X, Y ), g 1(X, Y )) ≥ 0 is the hypothesis. Thus, X, Y  is an associated pair, proving part (7).\(\hfill \blacksquare \)

FormalPara Remark 23.2

An alternative proof of the association of a single random variable, by coupling, is given as Exercise 2 in Chapter 24.

FormalPara Example 1 (A Tendency to Align Under Associated Dependence)

The purpose of this exampleFootnote 5 is to illustrate the tendency for alignment under associated dependence. Consider identically distributed Bernoulli 0 − 1-valued random variables Y 0, Y 1 with distribution specified by P(Y 0 = j) = 1∕2, P(Y 1 = j|Y 0 = j) = p, j = 0, 1 for p ∈ (0, 1). Association requires that Y 1 be most likely to align with the given value of Y 0. That is,

Proposition 23.4

Y 0, Y 1 is associated if and only if p ≥ 1∕2.

Proof

First observe that taking f(i, j) = i and g(i, j) = j, i, j = 0, 1, one has that \(\mathrm {Cov}(f(Y_0,Y_1),g(Y_0,Y_1))=\mathrm {Cov}(Y_0,Y_1) = {1\over 2}p - {1\over 4} \ge 0\) if and only if p ≥ 1∕2. Thus p ≥ 1∕2 is necessary for association. Since Y 0, Y 1 are binary, it likewise follows from Proposition 23.3(7) that p ≥ 1∕2 is sufficient as well.\(\hfill \blacksquare \)
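
Since the binary monotone functions are association determining (Proposition 23.2), Proposition 23.4 can also be verified by brute force. The following Python sketch (an illustration of ours, not part of the formal development) enumerates all coordinatewise non-decreasing 0–1-valued functions on {0, 1}2 and reports the minimal covariance under the joint law of (Y 0, Y 1); it is nonnegative exactly when p ≥ 1∕2.

```python
import itertools
import numpy as np

def is_monotone(t):
    # t[i, j] in {0, 1}; coordinatewise non-decreasing on {0, 1}^2
    return (t[0, 0] <= t[0, 1] and t[0, 0] <= t[1, 0]
            and t[0, 1] <= t[1, 1] and t[1, 0] <= t[1, 1])

def min_cov(p):
    # Joint pmf: P(Y0 = i) = 1/2, P(Y1 = j | Y0 = i) = p if i == j else 1 - p.
    pmf = {(i, j): 0.5 * (p if i == j else 1 - p) for i in (0, 1) for j in (0, 1)}
    monotone = [np.array(t).reshape(2, 2)
                for t in itertools.product((0, 1), repeat=4)]
    monotone = [t for t in monotone if is_monotone(t)]  # exactly 6 functions
    covs = []
    for f, g in itertools.product(monotone, repeat=2):
        Ef = sum(pmf[i, j] * f[i, j] for i, j in pmf)
        Eg = sum(pmf[i, j] * g[i, j] for i, j in pmf)
        Efg = sum(pmf[i, j] * f[i, j] * g[i, j] for i, j in pmf)
        covs.append(Efg - Ef * Eg)
    return min(covs)

for p in (0.3, 0.5, 0.8):
    print(f"p = {p}: minimal Cov over monotone 0-1 pairs = {min_cov(p):+.4f}")
```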

Remark 23.3

In the context of statistical physics association is often expressed as a property of the joint distribution μ of coordinate maps X x, x ∈ Λ, on the product space Ω = {−1, 1}Λ for some finite set Λ of integer lattice points connected to the origin; i.e., X x(ω) = ω x, ω ∈ Ω. The probability measure μ is said to satisfy the FKG inequalities if for any coordinatewise non-decreasing functions f, g on Ω one has

$$\displaystyle \begin{aligned} \int_\varOmega f(X_x)g(X_y)d\mu \ge \int_\varOmega f(X_x)d\mu \int_\varOmega g(X_y)d\mu, \quad x,y\in\varLambda. \end{aligned} $$
(23.1)

Equivalently, the FKG inequalities are the property that the collection of spin ± 1-valued random variables X x, x ∈ Λ have associated dependence. The ferromagnetic Ising model (see Chapter 13, Exercise 13) provides a well-known example in this context. The FKG inequalities for the ferromagnetic Ising model will be proved in Chapter 24, Proposition 24.12. The magnetic spin alignment reflected by association is a distinctive feature of ferromagnets, responsible for their ability to be magnetized by placement in an external magnetic field. Other, more general inequalities that imply association will be considered in Chapter 24.Footnote 6

The proof of the central limit theorem exploits the following basic inequality.

FormalPara Lemma 3 (Newman’s Inequality)

Suppose that X 1, …, X m are associated random variables having finite variance. Then for any \(r_1,\dots ,r_m\in \mathbb {R}\) one has

$$\displaystyle \begin{aligned}|{\mathbb E}\exp\{i\sum_{j=1}^mr_jX_j\} - \prod_{j=1}^m{\mathbb E}e^{ir_jX_j}| \le\sum_{1\le j<k\le m}|r_j||r_k|\mathop{\mathrm{Cov}}\nolimits(X_j,X_k).\end{aligned}$$
FormalPara Proof

The proof is by induction on m. The case m = 1 is obvious and the case m = 2 was proven in Lemma 2. Assume the inequality holds for all m ≤ M and rearrange the indices (if necessary) in such a way that sgn(r j) is constant, say 𝜖 (either + 1 or − 1), for 1 ≤ j ≤ m 0, and sgn(r j) is also constant, say δ, for m 0 + 1 ≤ j ≤ M + 1. Then 𝜖r j ≥ 0 for j ≤ m 0 and δr j ≥ 0 for j > m 0, so that each of \(X = \sum _{j=1}^{m_0}\epsilon r_jX_j\) and \(Y = \sum _{j=m_0+1}^{M+1}\delta r_jX_j\) is a non-decreasing function of the associated variables X 1, …, X M+1; in particular, the pair X, Y  is associated. Also \(\sum _{j=1}^{M+1}r_jX_j = \epsilon X + \delta Y\). Thus, applying Lemma 2 and the induction hypothesis, one has

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle \left|{\mathbb E}\exp\left\{i\sum_{j=1}^{M+1}r_jX_j\right\} - \prod_{j=1}^{M+1}{\mathbb E}e^{ir_jX_j}\right|\\ & &\displaystyle \quad \le \left|{\mathbb E}e^{i(\epsilon X + \delta Y)} - {\mathbb E}e^{i\epsilon X} {\mathbb E}e^{i\delta Y}\right| + \left|{\mathbb E}e^{i\epsilon X}{\mathbb E}e^{i\delta Y} - {\mathbb E}e^{i\epsilon X}\prod_{j=m_0+1}^{M+1}{\mathbb E}e^{ir_jX_j}\right| \\ & &\displaystyle \qquad + \left|{\mathbb E}e^{i\epsilon X}\prod_{j=m_0+1}^{M+1}{\mathbb E}e^{ir_jX_j} - \left(\prod_{j=1}^{m_0}{\mathbb E}e^{ir_jX_j}\right)\prod_{j=m_0+1}^{M+1} {\mathbb E}e^{ir_jX_j}\right|\\ & &\displaystyle \quad \le |\epsilon||\delta|\mathrm{Cov}(X,Y) + \left|{\mathbb E}e^{i\delta Y} - \prod_{j=m_0+1}^{M+1}{\mathbb E}e^{ir_jX_j}\right| + \left|{\mathbb E}e^{i\epsilon X} - \prod_{j=1}^{m_0}{\mathbb E}e^{ir_jX_j}\right|\\ & &\displaystyle \quad \le \mathop{\mathrm{Cov}}\nolimits\left(\sum_{j=1}^{m_0}\epsilon r_jX_j,\sum_{k=m_0+1}^{M+1}\delta r_kX_k\right) + \sum_{m_0+1\le j < k\le M+1}|r_j||r_k| \mathop{\mathrm{Cov}}\nolimits(X_j,X_k)\\ & &\displaystyle \qquad +\sum_{1\le j< k\le m_0}|r_j||r_k|\mathop{\mathrm{Cov}}\nolimits(X_j,X_k)\\ & &\displaystyle \quad = \sum_{1\le j <k\le M+1} |r_j||r_k|\mathop{\mathrm{Cov}}\nolimits(X_j,X_k). \end{array} \end{aligned} $$

\(\hfill \blacksquare \)

Of course, one may prefer the equivalent expression of Newman’s bound as

$$\displaystyle \begin{aligned} {1\over 2}\sum_{1\le j,k\le m, j\neq k}|r_j||r_k|\mathop{\mathrm{Cov}}\nolimits(X_j,X_k) = \sum_{1\le j <k\le m}|r_j||r_k|\mathop{\mathrm{Cov}}\nolimits(X_j,X_k). \end{aligned} $$
(23.2)
FormalPara Lemma 4

Let \(\mathbf {X} := \{X_x:x\in {\mathbb Z}^k\}\) be a translation invariant random field of associated random variables having finite second moments. Assume that

$$\displaystyle \begin{aligned}\gamma := \sum_{x\in{\mathbb Z}^k}\mathop{\mathrm{Cov}}\nolimits(X_0,X_x) < \infty.\end{aligned}$$

Let

$$\displaystyle \begin{aligned}B_x^{(N)} := \{{y}\in{\mathbb Z}^k: Nx_l\le y_l < N(x_l+1), l=1,\dots, k\}\end{aligned}$$

denote a “block of lattice sites of length N located near Nx”, x = (x 1, …, x k), and define a random field of centered and rescaled “block sum averages” by

$$\displaystyle \begin{aligned}A_x^{(N)} = N^{-{k\over 2}}\sum_{{y}\in B_x^{(N)}}(X_{y}-{\mathbb E}X_{y}),\qquad x\in{\mathbb Z}^k.\end{aligned}$$

Then

$$\displaystyle \begin{aligned}\lim_{N\to\infty}\mathop{\mathrm{Var}}\nolimits(A_x^{(N)}) = \gamma,\quad \text{and} \quad \lim_{N\to\infty}\mathop{\mathrm{Cov}}\nolimits(A_x^{(N)},A_y^{(N)}) = 0, \quad x\neq y.\end{aligned}$$
FormalPara Proof

By translation invariance it suffices to check the asserted limits for the case x = 0. Clearly

$$\displaystyle \begin{aligned}\mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) = N^{-k}\sum_{x\in B_0^{(N)}}\sum_{y\in B_0^{(N)}}\mathrm{Cov}(X_0,X_{y-x}) \le N^{-k}\sum_{x\in B_0^{(N)}}\sum_{y\in {\mathbb Z}^k}\mathop{\mathrm{Cov}}\nolimits(X_0,X_{y-x}).\end{aligned}$$

In particular, letting \(N\to\infty\),

$$\displaystyle \begin{aligned}\limsup_{N\to\infty}\mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) \le \gamma.\end{aligned}$$

For the reverse inequality let 0 < 𝜖 < 1∕2 and define

$$\displaystyle \begin{aligned}B_0^{(N)}(\epsilon) := \{z =(z_1,\dots,z_k): \epsilon N < z_i < (1-\epsilon)N, i = 1,\dots, k\}.\end{aligned}$$

Note that for \(x\in B_0^{(N)}(\epsilon), y\notin B_0^{(N)}\), |x − y|≥ 𝜖N, so that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) & \ge&\displaystyle N^{-k}\sum_{x\in B_0^{(N)}(\epsilon)}\sum_{y\in B_0^{(N)}}\mathrm{Cov}(X_0,X_{y-x})\\ & \ge&\displaystyle N^{-k}\sum_{x\in B_0^{(N)}(\epsilon)}\sum_{|y-x|\le \epsilon N}\mathop{\mathrm{Cov}}\nolimits(X_0,X_{y-x}) = {|B_0^{(N)}(\epsilon)|\over N^k}\sum_{|z|\le \epsilon N}\mathop{\mathrm{Cov}}\nolimits(X_0,X_z), \end{array} \end{aligned} $$

where |B| denotes the cardinality of the set B. Choosing a sequence \(\epsilon_N\downarrow 0\) such that \(\epsilon_N N\to\infty\), one obtains

$$\displaystyle \begin{aligned}\liminf_{N\to\infty}\mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) \ge \gamma.\end{aligned}$$

This proves the asserted asymptotic variance. For the covariance decay choose a sequence M N ≤ N such that \(M_N/N \to 1\) and \(N - M_N\to\infty\) as \(N\to\infty\). Then

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle \mathop{\mathrm{Var}}\nolimits(A_0^{(N)} - A_0^{(M_N)})\\ & &\displaystyle \quad = \mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) +\mathop{\mathrm{Var}}\nolimits(A_0^{(M_N)}) - 2(NM_N)^{-{k\over 2}} \mathop{\mathrm{Cov}}\nolimits(\sum_{y\in B_0^{(N)}}X_y,\sum_{y\in B_0^{(M_N)}}X_y)\\ & &\displaystyle \quad \le \mathop{\mathrm{Var}}\nolimits(A_0^{(N)}) +\mathop{\mathrm{Var}}\nolimits(A_0^{(M_N)}) - 2({M_N\over N})^{{k\over 2}}\mathop{\mathrm{Var}}\nolimits(A_0^{(M_N)}) \to 0. \end{array} \end{aligned} $$

One has for z ≠ 0,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathop{\mathrm{Cov}}\nolimits(A_0^{(N)},A_z^{(N)}) & =&\displaystyle \mathop{\mathrm{Cov}}\nolimits(A_0^{(N)}-A_0^{(M_N)},A_z^{(N)}) + \mathop{\mathrm{Cov}}\nolimits(A_0^{(M_N)},A_z^{(N)})\\ & \le&\displaystyle \sqrt{\mathop{\mathrm{Var}}\nolimits(A_0^{(N)} - A_0^{(M_N)})}\sqrt{\mathop{\mathrm{Var}}\nolimits(A_z^{(N)})} + \mathop{\mathrm{Cov}}\nolimits(A_0^{(M_N)},A_z^{(N)}). \end{array} \end{aligned} $$

The proof of covariance decay is therefore completed by the following calculation:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathop{\mathrm{Cov}}\nolimits(A_0^{(M_N)},A_z^{(N)}) & =&\displaystyle M_N^{-{k\over 2}}N^{-{k\over 2}} \sum_{x\in B_0^{(M_N)}}\sum_{y\in B_z^{(N)}}\mathop{\mathrm{Cov}}\nolimits(X_0,X_{x-y})\\ & \le&\displaystyle \left({M_N\over N}\right)^{{k\over 2}}M_N^{-k} \sum_{x\in B_0^{(M_N)}}\sum_{|y-x|\ge N-M_N}\mathop{\mathrm{Cov}}\nolimits(X_0,X_{x-y})\\ & =&\displaystyle \left({M_N\over N}\right)^{k\over 2}\sum_{|y|\ge N-M_N}\mathop{\mathrm{Cov}}\nolimits(X_0,X_{y}) \to 0, \end{array} \end{aligned} $$

as \(N\to\infty\).\(\hfill \blacksquare \)
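
The variance convergence in Lemma 4 can be watched numerically on a toy associated field (our own construction, with arbitrary coefficients): taking \(X_x = \epsilon _x + \epsilon _{x+(1,0)} + \epsilon _{x+(0,1)}\) for an i.i.d. standard normal field 𝜖 on \({\mathbb Z}^2\) gives a translation invariant associated field (a coordinatewise non-decreasing function of independent variables), with \(\gamma = \sum _x\mathop{\mathrm{Cov}}\nolimits(X_0,X_x) = 9\) since each of the 3 × 3 ordered pairs of shifts overlaps at exactly one offset.

```python
import numpy as np

rng = np.random.default_rng(2)

def block_average(N, reps):
    # X_x = e_x + e_{x+(1,0)} + e_{x+(0,1)} on an N x N block (mean zero, so
    # no centering is needed); returns reps copies of A_0^{(N)} with k = 2.
    eps = rng.standard_normal((reps, N + 1, N + 1))
    X = eps[:, :N, :N] + eps[:, 1:, :N] + eps[:, :N, 1:]
    return X.reshape(reps, -1).sum(axis=1) / N  # N^{-k/2} = 1/N

for N in (5, 20, 80):
    A = block_average(N, 2000)
    print(f"N = {N}: Var(A_0^(N)) ~ {A.var():.2f}   (gamma = 9)")
```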

Recall from Lemma 4 the notation \(B_x^{(N)}\) for the “block of lattice sites of length N located near Nx,” \(x\in{\mathbb Z}^k\), and \(A_x^{(N)}\) for the corresponding centered and rescaled “block sum averages” of a translation invariant random field with finite second moments.
FormalPara Theorem 23.5 (Newman’s Central Limit Theorem)

Let \(\mathbf {X} := \{X_x:x\in {\mathbb Z}^k\}\) be a translation invariant random field of associated random variables having finite second moments. Assume that

$$\displaystyle \begin{aligned}\gamma := \sum_{x\in{\mathbb Z}^k}\mathop{\mathrm{Cov}}\nolimits(X_0,X_x) < \infty.\end{aligned}$$

Then for any finite number n of distinct lattice sites z 1, …, z n, the (finite dimensional) distribution of \((A_{z_1}^{(N)},A_{z_2}^{(N)},\dots ,A_{z_n}^{(N)})\) converges weakly as \(N\to\infty\) to the Gaussian distribution with mean zero and covariance matrix \( \mathop {\mathrm {diag}} \nolimits (\gamma ,\dots ,\gamma )\).

FormalPara Proof

By Newman’s inequality, the association inherited by the \(A_z^{(N)}, z\in {\mathbb Z}^k\), and the previous lemma, it suffices to show convergence of \(A_z^{(N)}\), i.e., n = 1, to obtain convergence for finite dimensional distributions of arbitrary size n ≥ 1. More specifically, if one can show \({\mathbb E}e^{irA_z^{(N)}}\to e^{-{\gamma \over 2}r^2}\) as \(N\to\infty\), then

$$\displaystyle \begin{aligned} \lim_{N\to \infty}\left|{\mathbb E}e^{i\sum_{j=1}^nr_jA_{z_j}^{(N)}} - \prod_{j=1}^ne^{-{\gamma\over 2}r_j^2}\right| \le \lim_{N\to\infty}\sum_{1\le m < j\le n}|r_m||r_j|\mathop{\mathrm{Cov}}\nolimits(A_{z_m}^{(N)},A_{z_j}^{(N)}) = 0. \end{aligned} $$
(23.3)

As noted earlier, by translation invariance it is sufficient to consider the case z = 0. For fixed M = 1, 2, …, let \(M_N = M[{N\over M}] \le N\), where [⋅] denotes the integer part. In the proof of the previous lemma it was shown that \( \mathop {\mathrm {Var}} \nolimits (A_0^{(N)}-A_0^{(M_N)})\to 0\) as \(N\to\infty\). Thus, one has

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left|{\mathbb E}e^{irA_0^{(N)}} - {\mathbb E}e^{irA_0^{(M_N)}}\right| & \le&\displaystyle {\mathbb E}\left|e^{ir(A_0^{(N)} - A_0^{(M_N)})} - 1\right| \\ & \le&\displaystyle |r|\,{\mathbb E}\left|A_0^{(N)}- A_0^{(M_N)}\right| \le |r|\sqrt{\mathop{\mathrm{Var}}\nolimits(A_0^{(N)}-A_0^{(M_N)})}\to 0. \end{array} \end{aligned} $$

Next, using the simple property of the block averages that

$$\displaystyle \begin{aligned} A_0^{(N_1N_2)} = N_1^{-{k\over 2}}\sum_{y\in B_0^{(N_1)}}A_y^{(N_2)}, \end{aligned} $$
(23.4)

(for \(M_N = M[{N\over M}] = N_1N_2\) with \(N_1 = [{N\over M}]\) and \(N_2 = M\)), one has by Newman’s inequality (applied to the associated random variables \(A_x^{(M)}, x\in B_0^{([{N\over M}])}\))

$$\displaystyle \begin{aligned}\left|{\mathbb E}e^{irA_0^{(M[{N\over M}])}} - \left({\mathbb E}e^{ir[{N\over M}]^{-{k\over 2}}A_0^{(M)}}\right)^{([{N\over M}])^{k}}\right| \le {1\over 2}\sum_{x,y\in B_0^{([{N\over M}])},\, x\neq y}\!\!\!r^2\left(\left[{N\over M}\right]\right)^{-k}\!\!\mathrm{Cov}\left(A_x^{(M)},A_y^{(M)}\right). \end{aligned}$$

This upper bound may be equivalently expressed using the block average property (23.4) as

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle {r^2\over 2} \left\{\mathrm{Cov}\left(A_0^{(M[{N\over M}])}, A_0^{(M[{N\over M}])}\right) - \left[{N\over M}\right]^{-k} \sum_{y\in B_0^{([{N\over M}])}}\mathrm{Cov}\left(A_y^{(M)},A_y^{(M)}\right)\right\} \\ & &\displaystyle \quad = {r^2\over 2}\left\{\mathop{\mathrm{Var}}\nolimits\big(A_0^{(M[{N\over M}])}\big) - \mathop{\mathrm{Var}}\nolimits\big(A_0^{(M)}\big)\right\} \to {r^2\over 2}\left\{\gamma - \mathop{\mathrm{Var}}\nolimits\big(A_0^{(M)}\big)\right\}. \end{array} \end{aligned} $$

Letting \(N\to\infty\) with M fixed, it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \left({\mathbb E}e^{ir[{N\over M}]^{-{k\over 2}}A_0^{(M)}}\right)^{([{N\over M}])^{k}} & =&\displaystyle \left( 1 - {r^2\over 2}\big(\big[{N\over M}\big]\big)^{-k}\mathop{\mathrm{Var}}\nolimits(A_0^{(M)}) + o\big(\big[{N\over M}\big]^{-k}\big)\right)^{([{N\over M}])^{k}}\\ & \to&\displaystyle e^{-{{\mathop{\mathrm{Var}}\nolimits(A_0^{(M)})\over 2}r^2}}. \end{array} \end{aligned} $$
(23.5)

Thus, combining these estimates, one has

$$\displaystyle \begin{aligned}\limsup_{N\to\infty}\left|{\mathbb E}e^{irA_0^{(N)}} - e^{-{\gamma\over 2}r^2}\right| \le {r^2\over 2}\left\{\gamma -\mathop{\mathrm{Var}}\nolimits(A_0^{(M)})\right\} + \left\{e^{-{\mathop{\mathrm{Var}}\nolimits(A_0^{(M)})\over 2}r^2} - e^{-{\gamma\over 2}r^2}\right\}. \end{aligned}$$

Finally, letting \(M\to\infty\) completes the proof.\(\hfill \blacksquare \)
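
Continuing the toy field used after Lemma 4, the normal limit itself can be probed with skewed innovations; a minimal sketch, assuming centered exponential 𝜖 so that each X x is visibly non-Gaussian while γ = 9 is unchanged (the sample sizes are arbitrary, and the Kolmogorov–Smirnov comparison is a sanity check, not a proof).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def block_average(N, reps):
    # Same associated field as before, now with centered exponential innovations.
    eps = rng.exponential(size=(reps, N + 1, N + 1)) - 1.0
    X = eps[:, :N, :N] + eps[:, 1:, :N] + eps[:, :N, 1:]
    return X.reshape(reps, -1).sum(axis=1) / N

A = block_average(60, 4000)
print("sample variance:", round(float(A.var()), 2), " (gamma = 9)")
res = stats.kstest(A / 3.0, "norm")  # compare A / sqrt(gamma) with N(0, 1)
print("KS p-value vs N(0,1):", round(res.pvalue, 3))
```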

The following exampleFootnote 7 is a significant frameworkFootnote 8 in which association naturally occurs.

FormalPara Example 2 (Two-dimensional Bond Percolation Model)

The independent bond percolation model Footnote 9 on \({\mathbb Z}^2\) can be defined as follows: Each lattice site \(x\in {\mathbb Z}^2\) has four nearest neighbor sites of the form y = x ± e where e is either (1, 0) or (0, 1). A pair of such nearest neighbor sites x, y, in turn, defines an (unoriented) bond b = {x, y} of \({\mathbb Z}^2\). Let \({\mathbb L}^2\) denote the collection of all such bonds of \({\mathbb Z}^2\). Let \(\{Y_{b}:b\in {\mathbb L}^2 \}\) be the i.i.d. random field of Bernoulli 0–1-valued random variables with p = P p(Y b = 1) defined by coordinate projections on the product probability space \(\varOmega = \{0,1\}^{{\mathbb L}^2}\) equipped with the σ-field \({\mathcal F}\) generated by finite dimensional cylinder sets and product measure \(P_p = \prod _{{\mathbb L}^2}(q\delta _{\{0\}} + p\delta _{\{1\}})\), where q = 1 − p. Declare the bonds b as open or closed according to whether the value of Y b is 1 or 0, respectively. The usual interpretation of percolation is as a model for a disordered porous medium in which the open bonds permit fluid flow between nearest neighbor sites, while closed bonds block the passage of fluid. Two sites \(x,z\in {\mathbb Z}^2\) are said to be connected by an open path, denoted x ↔ z, if there is a succession of sites x 0 = x, x 1, …, x m = z, m ≥ 1, in \({\mathbb Z}^2\) such that the pairs x i, x i+1 are nearest neighbors with b i = {x i, x i+1} open (i = 0, …, m − 1). A cluster C(x) at site \(x\in {\mathbb Z}^2\) is defined by the (random) set

$$\displaystyle \begin{aligned}C(x) := \{z\in{\mathbb Z}^2: x\leftrightarrow z\},\qquad x\in{\mathbb Z}^2.\end{aligned}$$

The cluster size refers to the (possibly infinite) cardinality of C(x) and is denoted by |C(x)|. The set C(x) is referred to as a percolation cluster Footnote 10 at x if \(|C(x)| = \infty\).
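
The cluster C(0) is conveniently computed by a breadth-first search that samples each bond lazily; the following Python sketch (the function name and truncation parameter are our own, and for simplicity the origin itself is counted in the cluster, a harmless shift) will be reused below.

```python
import random
from collections import deque

def cluster_at_origin(p, max_size=10_000, seed=0):
    """Breadth-first search from the origin over open bonds of Z^2, with each
    bond sampled independently (P(open) = p) on first inspection; the search
    is truncated at max_size sites as a finite stand-in for |C(0)| = infinity."""
    rng = random.Random(seed)
    open_bond = {}

    def is_open(x, y):
        b = frozenset((x, y))
        if b not in open_bond:
            open_bond[b] = rng.random() < p
        return open_bond[b]

    cluster, queue = {(0, 0)}, deque([(0, 0)])
    while queue and len(cluster) < max_size:
        x = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y = (x[0] + dx, x[1] + dy)
            if y not in cluster and is_open(x, y):
                cluster.add(y)
                queue.append(y)
    return cluster

for seed in range(5):
    print("cluster size:", len(cluster_at_origin(0.3, seed=seed)))
```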

Definition 23.4

The existence of an infinite cluster, that is, the event \(E := \cup _{x\in {\mathbb Z}^2}[|C(x)| = \infty ]\), is referred to as the percolation event. Also, the percolation probability is defined by

$$\displaystyle \begin{aligned} \rho \equiv\rho(p) := P_p(E) = P_p(\cup_{x\in{\mathbb Z}^2}[|C(x)| = \infty]). \end{aligned} $$
(23.6)

Proposition 23.6

Define

$$\displaystyle \begin{aligned} \theta \equiv \theta(p) := P_p(|C(0)| = \infty). \end{aligned} $$
(23.7)

Then ρ(p) = 0 or ρ(p) = 1 if and only if θ(p) = 0 or θ(p) > 0, respectively.

Proof

Note that the percolation event \(E = \cup _{x\in {\mathbb Z}^2}[|C(x)| = \infty ]\) is a tail event for a countable collection of i.i.d. random variables \(\{Y_b: b\in {\mathbb L}^2\}\). The assertion follows immediately from subadditivity and Kolmogorov’s zero-one law.Footnote 11 Namely, ρ(p) = 0 or 1, θ(p) ≤ ρ(p), and \(\rho (p)\le \sum _{x\in {\mathbb Z}^2}\theta (p)\). So ρ(p) = 0 if and only if θ(p) = 0, and θ(p) > 0 if and only if ρ(p) = 1.\(\hfill \blacksquare \)

Remark 23.4

A proof of the monotonicity of the percolation probability p → θ(p) as a function of p, by monotone coupling techniques, is given in Proposition 24.3 of Chapter 24.

Definition 23.5

The critical probability for existence of an infinite cluster, i.e., percolation, is defined by

$$\displaystyle \begin{aligned}p_c = \sup\{p\in[0,1]: \theta(p) = 0\}.\end{aligned}$$

Remark 23.5

An important role for the FKG inequalities occurs in a simplified proof of the criticality of p = 1∕2 for bond percolation by Bollobás and Riordan (2006). The original proof that p c = 1∕2 for two-dimensional bond percolation is the result of Kesten (1980), which completed the picture begun two decades earlier by Harris (1960), whose bound p c ≥ 1∕2 already involved inequalities, now known as Harris inequalities, that may be viewed as a special case of the FKG inequalities for product measures.

For probability measures μ 1 and μ 2 on the compact space (for product topology) S = {0, 1}Λ, where Λ is a finite or countably infinite set, the Holley inequalities Footnote 12 are a generalization of associated dependence of the form

$$\displaystyle \begin{aligned} \int_Sfd\mu_1 \ge \int_Sfd\mu_2, \end{aligned} $$
(23.8)

for coordinatewise non-decreasing functions f on S, i.e., for functions f that are non-decreasing with respect to the partial order ≼ on S defined for x, y ∈ S by x ≼ y if and only if x j ≤ y j for all j ∈ Λ.

To see that (23.8) embodies association of a probability distribution μ on S, let f, g be nonnegative coordinatewise non-decreasing functions on S with \(\int _Sgd\mu > 0\). Take \(d\mu _1 = {gd\mu \over \int _Sgd\mu }, \mu _2 = \mu \). Holley’s inequalities for μ 1 and μ 2 are then equivalent to the FKG inequalities for μ. So-called log-convexity type conditionsFootnote 13 on μ, or on μ 1, μ 2, are available to ensure the FKG inequalities or the Holley inequalities, respectively.

In the so-called disordered phase Footnote 14 defined by 0 < p < p c, the lattice a.s. consists of infinitely many disjoint finite random clusters of lattice sites connected by open bonds. The following simple path counting argument demonstrates the existence of a disordered phase.

Proposition 23.7

p c > 0.

Proof

Consider the number N n of open (self-avoiding) paths of length n starting at the origin. Clearly, noting that such a path can connect to any of the 4 neighbors of (0, 0) and continue in at most 3 ways at each of the remaining n − 1 self-avoiding steps, \(N_n \le 4\cdot 3^{n-1}\). Thus for p < 1∕3, applying Markov’s inequality for nonnegative integer-valued random variables,

$$\displaystyle \begin{aligned}P_p(N_n\ge 1) \le {\mathbb E}_pN_n \le 4(3^{n-1})p^n\to 0\ \text{as}\ n\to\infty.\end{aligned}$$

In particular,Footnote 15 since θ(p) ≤ P p(N n ≥ 1) for all n ≥ 1, one has θ(p) = 0 for p < 1∕3 and hence p c ≥ 1∕3.\(\hfill \blacksquare \)
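
A crude Monte Carlo companion to this path-counting bound (illustrative only) reuses the cluster_at_origin sketch from Example 2: for p < 1∕3 the observed cluster sizes are uniformly small and their empirical mean stabilizes quickly, consistent with the exponential tail bound derived in Lemma 9 below.

```python
# Reuses cluster_at_origin from the sketch in Example 2 (sizes there include
# the origin itself, a harmless shift for this illustration).
sizes = [len(cluster_at_origin(0.30, seed=s)) for s in range(2000)]
print("estimated E|C(0)| at p = 0.30:", sum(sizes) / len(sizes))
print("largest cluster observed:", max(sizes))
```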

Lemma 5 (Harris’ LemmaFootnote 16)

Let \(X_x = {\mathbf {1}}_{[C(x)\neq \emptyset ]}, x\in {\mathbb Z}^2\). Then \(\{X_{x}:{x}\in {\mathbb Z}^2\}\) is a translation invariant random field of associated random variables.

Proof

Translation invariance follows directly from the definition and the fact that the distribution of the underlying random field \(\{Y_{b}:{b}\in {\mathbb L}^2\}\) is invariant under translation of the lattice \({\mathbb Z}^2\). Also each X x, \({x}\in {\mathbb Z}^2\), is a (coordinatewise) non-decreasing function of \(\mathbf {Y} \equiv \{Y_{b}:{b}\in {\mathbb L}^2\}\). Apply Proposition 23.3.\(\hfill \blacksquare \)

For an application of the central limit theorem in this context we will establish the asymptotic normality of the cumulative size \(\sum _{x\in B_0^{(N)}}|C(x)|\) of all clusters connected to points in the cube \(B_0^{(N)}\), suitably centered and scaled for 0 < p < 1∕3. Additional applicationsFootnote 17 along these lines are given in the exercises. The conditions for the theorem will be checked in a sequence of simple lemmas, the first of which is a special case of an inequality known as the BK Inequality after its originators van den Berg and Kesten (1985).

To prepare for the BK inequality let us refer to a random variable X defined on Ω as increasing if X(ω 1) ≤ X(ω 2) whenever ω 1, ω 2 ∈ Ω satisfy ω 1(b) ≤ ω 2(b) for all \(b\in {\mathbb L}^2\); the latter set of coordinatewise inequalities defines a partial order on Ω which we denote as ω 1 ≼ ω 2. Similarly we say that an event \(A\in \mathcal {F}\) is an increasing event if 1 A is an increasing random variable. Connectivity events of the form [x ↔ y] are prototypical increasing events. From here on we restrict to this case.

Definition 23.6

The disjoint occurrence of two increasing events A = [x ↔ z], B = [y ↔ w] is an event denoted by A ∘ B and defined by

$$\displaystyle \begin{aligned}{}[x\leftrightarrow z]\circ [y\leftrightarrow w] = [x\leftrightarrow z, x\notin C(y), y\leftrightarrow w].\end{aligned}$$

Lemma 6 (BK Inequality-Special Case)

For \(x,y,w,z\in {\mathbb Z}^2\)

$$\displaystyle \begin{aligned}P_p([x\leftrightarrow z]\circ[y\leftrightarrow w])\le P_p(x\leftrightarrow z)P_p(y\leftrightarrow w).\end{aligned}$$

Proof

Observe that

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle P_p([x\leftrightarrow z]\circ[y\leftrightarrow w]) = {\mathbb E}({\mathbf{1}}_{[x\leftrightarrow z]}{\mathbf{1}}_{[x\notin C(y)]}{\mathbf{1}}_{[y\leftrightarrow w]})\\ & &\displaystyle \quad ={\mathbb E}({\mathbf{1}}_{[x\leftrightarrow z]}{\mathbf{1}}_{[x\notin C(y)]}{\mathbf{1}}_{[x\notin C(w)]}{\mathbf{1}}_{[y\leftrightarrow w]})\\ & &\displaystyle \quad = P_p(x\leftrightarrow z, x\notin C(y),x\notin C(w),y\leftrightarrow w)\\ & &\displaystyle \quad = P_p(y\leftrightarrow w\vert x\leftrightarrow z, x\notin C(y), x\notin C(w)) P_p(x\leftrightarrow z, x\notin C(y),x\notin C(w))\\ & &\displaystyle \quad \le P_p(y\leftrightarrow w\vert x\leftrightarrow z, x\notin C(y), x\notin C(w)) P_p(x\leftrightarrow z). \end{array} \end{aligned} $$
(23.9)

So it suffices to show that

$$\displaystyle \begin{aligned} P_p(y\leftrightarrow w\vert x\leftrightarrow z, x\notin C(y), x\notin C(w)) \le P_p(y\leftrightarrow w). \end{aligned} $$
(23.10)

Let A be an arbitrary but fixed finite connected subgraph of \({\mathbb L}^2\) with vertices x and z connected in A, but connected to neither y nor w, i.e., having the properties of the conditioning. The graph A is referred to as a lattice animal. Denote the vertex and edge sets of A by A v and A e, respectively. Also define the edge boundary \(\partial _eA\) as the set of (closed) edges which do not belong to A e but have at least one endvertex in A v. First consider the case in which y is “interior” to the lattice animal A and w is “exterior” to A in the sense that any path of bonds connecting y to w must include a bond from \(\partial _eA\). Then, since on [C(x) = A] the edges in \(\partial _eA\) are all closed, one has for this case that

$$\displaystyle \begin{aligned}P_p(y\leftrightarrow w\,\vert\, C(x) = A,\ x\leftrightarrow z,\ x\notin C(y),\ x\notin C(w)) = 0.\end{aligned}$$

On the other hand, for the case when \(\partial _eA\) does not obstruct the existence of a path of open bonds connecting y to w, let us see that one may use association (FKG inequalities) to establish that

$$\displaystyle \begin{aligned} P_p(y\leftrightarrow w\,\vert\, C(x) = A,\ x\leftrightarrow z,\ x\notin C(y),\ x\notin C(w)) \le P_p(y\leftrightarrow w). \end{aligned} $$
(23.11)

To prove (23.11) consider a lattice animal A with the properties of the conditioning, i.e., such that x ↔ z in A and \(y, w\notin A_v\), and for which y, w are not separated by \(\partial _eA\) in the previous sense. Then 1 [y↔w] and 1 [C(x)=A] are, respectively, increasing and decreasing functions of independent random variables. Thus, by association (see Exercise 5),

$$\displaystyle \begin{aligned} \begin{array}{rcl} P_p(y\leftrightarrow w, C(x) & =&\displaystyle A, x\leftrightarrow z, x\notin C(y), x\notin C(w)) = P_p(y\leftrightarrow w, C(x) = A)\\ & \le&\displaystyle P_p(y\leftrightarrow w)P_p(C(x) = A)\\ & =&\displaystyle P_p(y\leftrightarrow w)P_p(C(x) = A, x\leftrightarrow z, x\notin C(y), x\notin C(w)). \end{array} \end{aligned} $$

Divide by the common (positive) probability \(P_p(C(x) = A, x\leftrightarrow z, x\notin C(y), x\notin C(w))\) to obtain the bound (23.11). Averaging (23.11) over such lattice animals A with respect to the conditional distribution of C(x), together with the preceding case of animals whose edge boundary separates y from w, completes the proof of (23.10), and thus the BK inequality follows.\(\hfill \blacksquare \)

Lemma 7

$$\displaystyle \begin{aligned}\mathrm{Cov}({\mathbf{1}}_{[x\leftrightarrow z]},{\mathbf{1}}_{[y\leftrightarrow w]}) \le {\mathbb E}({\mathbf{1}}_{[x\leftrightarrow z]}{\mathbf{1}}_{[x\leftrightarrow y]}{\mathbf{1}}_{[x\leftrightarrow w]}).\end{aligned}$$

Proof

Let \(\tau (x,z,y,w) := {\mathbb E}({\mathbf {1}}_{[x\leftrightarrow z]}{\mathbf {1}}_{[x\leftrightarrow y]}{\mathbf {1}}_{[x\leftrightarrow w]})\). Decomposing according to whether or not x ↔ y, note that

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} {\mathbb E}({\mathbf{1}}_{[x\leftrightarrow z]}{\mathbf{1}}_{[y\leftrightarrow w]}) & =&\displaystyle \tau(x,z,y,w) + {\mathbb E}({\mathbf{1}}_{[x\leftrightarrow z]}{\mathbf{1}}_{[x\notin C(y)]} {\mathbf{1}}_{[y\leftrightarrow w]})\\ & =&\displaystyle \tau(x,z,y,w) + P_p([x\leftrightarrow z]\circ[y\leftrightarrow w]). \end{array} \end{aligned} $$
(23.12)

Now apply the BK inequality to the second term. Subtracting \({\mathbb E}{\mathbf {1}}_{[x\leftrightarrow z]}{\mathbb E}{\mathbf {1}}_{[y\leftrightarrow w]}\) from both sides establishes the assertion of the lemma.\(\hfill \blacksquare \)

Lemma 8

Let \(U_x = |C(x)| = \sum_{z\in{\mathbb Z}^2}{\mathbf{1}}_{[x\leftrightarrow z]}\). Then

$$\displaystyle \begin{aligned}\gamma = \sum_{x\in{\mathbb Z}^2}\mathrm{Cov}(U_0,U_x) \le {\mathbb E}|C(0)|{}^3.\end{aligned}$$

Proof

Using bilinearity of covariance and the bound from Lemma 7,

$$\displaystyle \begin{aligned}\mathrm{Cov}({\mathbf{1}}_{[0\leftrightarrow w]},{\mathbf{1}}_{[y\leftrightarrow z]})\le \tau(0,w,y,z) = {\mathbb E}({\mathbf{1}}_{[0\leftrightarrow w]}{\mathbf{1}}_{[w\leftrightarrow y]}{\mathbf{1}}_{[y\leftrightarrow z]}),\end{aligned}$$

it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{y\in{\mathbb Z}^2}\mathrm{Cov}(U_0,U_y) & \le&\displaystyle \sum_{y,w,z}\tau(0,w,y,z) = \sum_{y,w,z}{\mathbb E}({\mathbf{1}}_{[0\leftrightarrow w]} {\mathbf{1}}_{[0\leftrightarrow y]}{\mathbf{1}}_{[0\leftrightarrow z]})\\ & =&\displaystyle {\mathbb E}\left(\sum_{y,w,z}{\mathbf{1}}_{[0\leftrightarrow w]} {\mathbf{1}}_{[0\leftrightarrow y]}{\mathbf{1}}_{[0\leftrightarrow z]}\right) = {\mathbb E}|C(0)|{}^3. \end{array} \end{aligned} $$

\(\hfill \blacksquare \)

Lemma 9

For 0 < p < 1∕3, \({\mathbb E}|C(0)|{ }^m < \infty \) for all m ≥ 0.

Proof

Note from the proof of Proposition 23.7 that for p < 1∕3, \(\tau (0,x) = P_p(0\leftrightarrow x) \le {4\over 3}e^{-c|x|}\), where \(c = -\ln (3p) > 0\). Thus, denoting by R k the complementary region to the (two-dimensional) square of side-length 2k + 1 centered at 0, one has the tail probability bound

$$\displaystyle \begin{aligned} \begin{array}{rcl} P_p(|C(0)| \ge (2k+1)^2) & \le&\displaystyle \sum_{x\in R_k}\tau(0,x) \le {4\over 3}\sum_{x\in R_k}e^{-c|x|}\\ & \le&\displaystyle c^\prime\sum_{j=k+1}^\infty je^{-cj} \le c^{\prime\prime}ke^{-c^{\prime\prime}k}, \end{array} \end{aligned} $$

for a suitable c ′′ > 0. The second-to-last inequality is a consequence of summing over x on the perimeters at respective distances j from the origin, noting that the number of sites on the perimeter is linear in j. It now follows that \(\sum _{k=1}^\infty k^{2m-1}P(\sqrt {|C(0)|} \ge k) <\infty \), and therefore \({\mathbb E}|C(0)|{ }^m = {\mathbb E}(\sqrt {|C(0)|})^{2m} < \infty \).\(\hfill \blacksquare \)

In view of Newman’s central limit theorem this series of lemmas establishes the following fluctuation law.

Theorem 23.8

Consider two-dimensional bond percolation with 0 < p < 1∕3. Then the centered and rescaled cumulative size \({1\over N}\sum _{x\in B_0^{(N)}}\{|C(x)|-{\mathbb E}|C(0)|\}\) of all clusters connected to points in the cube \(B_0^{(N)}\) is asymptotically normal with mean zero and variance \(0< \gamma = \sum _{x\in {\mathbb Z}^2}\mathrm {Cov}(|C(0)|,|C(x)|) \le {\mathbb E}|C(0)|{ }^3<\infty \).
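
A finite-size illustration of Theorem 23.8 (our own construction; the box size, padding, and number of replicates are arbitrary, and clusters are truncated at the boundary of the sampled box, which the theorem of course does not do) computes the cumulative cluster size over a central block using a union-find pass over the sampled bonds.

```python
import numpy as np

rng = np.random.default_rng(5)

def cum_cluster_size(N, p, pad=2):
    """Sum of |C(x)| over the central N x N block of a (2*pad+1)N-sided box,
    with bonds open independently with probability p; clusters are truncated
    at the box boundary (a finite-size approximation)."""
    L = (2 * pad + 1) * N
    parent = list(range(L * L))

    def find(a):  # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    idx = lambda i, j: i * L + j
    for i in range(L):
        for j in range(L):
            for di, dj in ((1, 0), (0, 1)):  # bond to the right and below
                if i + di < L and j + dj < L and rng.random() < p:
                    ra, rb = find(idx(i, j)), find(idx(i + di, j + dj))
                    if ra != rb:
                        parent[rb] = ra

    roots = np.array([find(a) for a in range(L * L)])
    size = np.bincount(roots)[roots].reshape(L, L)
    size[size == 1] = 0  # isolated sites: C(x) is empty, so |C(x)| = 0
    block = size[pad * N:(pad + 1) * N, pad * N:(pad + 1) * N]
    return block.sum()

S = np.array([cum_cluster_size(20, 0.3) for _ in range(200)])
A = (S - S.mean()) / 20  # centered and scaled by N^{-k/2} = 1/N, k = 2
print("sample variance of the rescaled cumulative size:", round(float(A.var()), 2))
```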

We close this chapter with a celebrated result of Loren Pitt on the association of positively correlated normal random variables. We provide the essence of hisFootnote 18 very clever proof, leaving the technical details to the exercises.

FormalPara Theorem 23.9 (Pitt)

Let X = (X 1, …, X k) be a positively correlated normal random vector. Then {X 1, …, X k} is an associated family.

FormalPara Proof

First consider the case in which X is mean-zero, association being unaffected by centering, and the covariance matrix Γ = ((γ i,j)) is a non-singular matrix with nonnegative entries. One may show that the collection of coordinatewise non-decreasing functions f, g on \({\mathbb R}^k\) that are continuously differentiable with bounded partials is association determining (Exercise 12). As a result, one may restrict to such functions f, g. Let Z = (Z 1, …, Z k) be an independent copy of X and define

$$\displaystyle \begin{aligned}Y(\lambda) = \lambda X + (1-\lambda^2)^{1\over 2}Z.\end{aligned}$$

Then, Y (λ) is mean-zero normal with covariance matrix

$$\displaystyle \begin{aligned}\mathop{\mathrm{Cov}}\nolimits(Y_i(\lambda),Y_j(\lambda)) = \lambda^2\gamma_{i,j} + (1-\lambda^2)\gamma_{i,j} = \gamma_{i,j}.\end{aligned}$$

Also, \( \mathop {\mathrm {Cov}} \nolimits (X,Y(\lambda ))_{i,j} = \lambda \gamma _{i,j}.\) Consider

$$\displaystyle \begin{aligned}F(\lambda) = {\mathbb E}f(X)g(Y(\lambda)).\end{aligned}$$

Then, \( \mathop {\mathrm {Cov}} \nolimits (f(X),g(X)) = F(1)-F(0)\). So it suffices to show that F is continuous on [0, 1] and that F ′(λ) exists and is nonnegative for 0 < λ < 1. This is where the analysis is required. Namely, writing Γ −1 = ((c i,j)), let

$$\displaystyle \begin{aligned}\varphi(x) = (2\pi)^{-{k\over 2}}(\det\varGamma)^{-{1\over 2}} \exp\left\{-{1\over 2}\sum_{i,j=1}^kc_{i,j}x_ix_j\right\}\end{aligned}$$

denote the Gaussian pdfFootnote 19 of X. Then the conditional pdf of Y (λ) given [X = x] is

$$\displaystyle \begin{aligned}p(\lambda;x,y) = (1-\lambda^2)^{-{k\over 2}} \varphi((1-\lambda^2)^{-{1\over 2}}(y-\lambda x)).\end{aligned}$$

That is, p(λ;x, y) is normal with covariance matrix (1 − λ 2)Γ and mean vector λx. Thus,

$$\displaystyle \begin{aligned}F(\lambda) = \int_{\mathbb{R}^k}f(x)\varphi(x)g(\lambda,x)dx,\end{aligned}$$

where

$$\displaystyle \begin{aligned}g(\lambda, x) = \int_{\mathbb{R}^k}g(y)p(\lambda;x,y)dy.\end{aligned}$$

Observing that

$$\displaystyle \begin{aligned}g(\lambda,x) = (\varphi(\lambda,\cdot)*g)(\lambda x),\end{aligned}$$

where \(\varphi (\lambda ,x) = (1-\lambda ^2)^{-{k\over 2}} \varphi ((1-\lambda ^2)^{-{1\over 2}}x)\), one sees that ∂g(λ, x)∕∂λ exists, while each ∂g(λ, x)∕∂x j exists, is nonnegative, and is bounded. To compute \({\partial p\over \partial \lambda }\), let h(t, y) denote the pdf of \(\varGamma ^{1\over 2}B_t\), where B is k-dimensional standard Brownian motion. Then one has

$$\displaystyle \begin{aligned}p(\lambda;x,y) = h(1-\lambda^2, y-\lambda x).\end{aligned}$$

Using the chain rule and an application of the heat equation for multivariate Brownian motionFootnote 20 one arrives at

$$\displaystyle \begin{aligned} {\partial p\over \partial\lambda} = -{1\over\lambda}\left\{\sum_{i,j}\gamma_{i,j} {\partial^2 p\over\partial x_i\,\partial x_j} - \sum_i x_i{\partial p\over\partial x_i}\right\}. \end{aligned}$$

Thus,

$$\displaystyle \begin{aligned}F^\prime(\lambda) = -{1\over\lambda}\int_{\mathbb{R}^k}f(x)\varphi(x) \left\{\sum_{i,j}\gamma_{i,j} {\partial^2 g(\lambda,x)\over\partial x_i\,\partial x_j} - \sum_i x_i{\partial g(\lambda,x)\over\partial x_i}\right\}dx.\end{aligned}$$

Finally, with an integration by parts, using \({\partial \varphi (x)\over \partial x_i} = -\varphi (x)\sum _{l=1}^kc_{i,l}x_l\) and \(\sum _i\gamma _{j,i}c_{i,l} = \delta _{j,l}\), one arrives at

$$\displaystyle \begin{aligned}F^\prime(\lambda) = {1\over\lambda}\int_{\mathbb{R}^k}\varphi(x) \left\{\sum_{i,j}\gamma_{i,j}{\partial f(x)\over\partial x_i} {\partial g(\lambda,x)\over\partial x_j}\right\}dx \ge 0.\end{aligned}$$

In the case Γ is singular one may replace Γ by the non-singular matrix \(\varGamma + \epsilon I_{k\times k}\), 𝜖 > 0, where \(I_{k\times k}\) denotes the identity matrix, and observe that for continuous f, g, \( \mathop {\mathrm {Cov}} \nolimits (f(X),g(X))\) depends continuously on Γ. Thus nonnegativity is preserved in the limit as 𝜖 → 0.\(\hfill \blacksquare \)
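
Pitt’s theorem, too, is easy to spot-check by simulation; in the following sketch the covariance matrix and the bounded coordinatewise non-decreasing test functions are arbitrary illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)

# A positively correlated (nonnegative entries) covariance matrix.
Gamma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.5],
                  [0.1, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), Gamma, size=500_000)

# Bounded coordinatewise non-decreasing test functions.
f = np.tanh(X[:, 0] + 2.0 * X[:, 1] + X[:, 2])
g = np.tanh(0.5 * X[:, 0]) + np.tanh(X[:, 2])

cov = np.mean(f * g) - np.mean(f) * np.mean(g)
print(f"estimated Cov(f(X), g(X)) = {cov:.5f}  (Pitt: >= 0)")
```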

Exercises

  1.

    Let X 1, …, X n be associated random variables, and Y j = f j(X 1, …, X n), j = 1, …, m, where f j is coordinatewise non-decreasing for j = 1, …, m. Show that (a) \(P(Y_1\le y_1,\dots ,Y_m\le y_m) \ge \prod _{j=1}^mP(Y_j\le y_j)\), and (b) \(P(Y_1> y_1,\dots ,Y_m> y_m) \ge \prod _{j=1}^mP(Y_j> y_j)\). [Hint: Define non-decreasing functions of the Z j’s by \(Z_j = {\mathbf {1}}_{[Y_j>y_j]}, j=1,\dots ,m\), and note that \(Z_1\cdots Z_i\) and \(Z_{i+1}\cdots Z_m\) are non-decreasing functions of the Z j’s. Apply the FKG inequalities iteratively for i = 1, …, m, noting \({\mathbb E}Z_j = P(Z_j=1)\).] (c) Suppose that X 1, …, X n are independent random variables and let \(S_j=\sum _{i=1}^jX_i, j=1,\dots ,n\). Show that \(P(S_1\le s_1,\dots , S_n\le s_n) \ge \prod _{i=1}^nP(S_i\le s_i)\) for all s 1, …, s n. [Hint: S j is a non-decreasing function of X 1, …, X n, and independent random variables are associated.]

  2.

    Suppose f is a continuous function on [0, 1] and consider the Bernstein polynomial defined by \(f_n(x) = \sum _{j=0}^nf({j\over n}){n\choose j}x^j(1-x)^{n-j} = {\mathbb E}f({S_n(x)\over n}), 0\le x\le 1\), where \(S_j(x) = X_1(x) + \cdots + X_j(x)\), for i.i.d. Bernoulli 0–1-valued random variables with P(X j(x) = 1) = x, j = 1, …, n. (a) Show that f n → f uniformly on [0, 1] as \(n\to\infty\). (b) (Seymour–Welsh) ShowFootnote 21 that for non-decreasing f, g on [0, 1], (fg)n(x) ≥ f n(x)g n(x), 0 ≤ x ≤ 1.

  3.

    Show that binary 0–1-valued random variables X, Y  are associated if and only if they are positive quadrant dependent.

  4.

    Complete the details for the extension of Hoeffding’s lemma used in the generalization given in Lemma 2.

  5.

    Suppose that X = (X 1, …, X m) is a vector of associated random variables. Let f, g be, respectively, coordinatewise increasing and decreasing functions. Show that \( \mathop {\mathrm {Cov}} \nolimits (f(X),g(X)) \le 0\). Extend this to countably many associated random variables.

  6.

    Prove the alternative formula

    $$\displaystyle \begin{aligned} H_{X,Y}(x,y) = P(X\le x,Y\le y) - P(X\le x)P(Y\le y). \end{aligned}$$
  7.

    Suppose \(f(X),g(Y)\in L^2(\varOmega ,{\mathcal F},P)\), where X, Y  are real-valued random variables bounded below by a constant \(b\in {\mathbb R}\), and f, g are continuously differentiable complex functions with bounded derivatives. Prove the Hoeffding-Newman formula in this case, starting from the familiar moment formulae

    $$\displaystyle \begin{aligned} {\mathbb E}(f(X)-f(b)) = {\mathbb E}\int_b^Xf^\prime(x)dx = \int_b^\infty P(X>x)f^\prime(x)dx, \end{aligned}$$

    and

    $$\displaystyle \begin{aligned} {\mathbb E}(f(X)-f(b))(\overline{g(Y)-g(b)}) = {\mathbb E}\int_b^X\int_b^Yf^\prime(x)\overline{g}^\prime(y)dxdy. \end{aligned}$$
  8.

    Show that each of the collections of non-decreasing binary 0 or 1-valued functions and those of non-decreasing bounded continuous functions are association determining.

  9.

    Consider the spatial intermittency of clusters as reflected in the density of isolated points and/or non-isolated points. A site \(x\in {\mathbb Z}^2\) is isolated whenever [C(x) = ∅]. The numbers of isolated points and non-isolated points in a square \(B_0^{(N)}\) are perfectly correlated since their total is (fixed) \(|B_0^{(N)}|\). It is convenient to consider the number of non-isolated sites in the square \(B_0^{(N)}\) (including surface sites for simplicity) as given by

    $$\displaystyle \begin{aligned}D_N = \sum_{x\in B_0^{(N)}}{\mathbf{1}}_{[C(x)\neq\emptyset]}.\end{aligned}$$

    Show that \({\mathbb E}D_N = (1-q^4)N^2\) and compute the asymptotic distribution of D N, suitably centered and scaled, as \(N\to\infty\).

  10.

    Fix a positive integer k and obtain the asymptotic fluctuation law for the numbers of sites x in \(B_0^{(N)}\) belonging to a cluster of size at most k, i.e., such that |C(x)|≤ k.

  11.

    Let \(N = \{N(A): A\in {\mathcal B}\}\) be a Poisson point process on \(\mathbb {R}^n\). Show that N is an associated family of random variables.

  12.

    Show that the collection of coordinatewise non-decreasing functions f, g on \({\mathbb R}^k\) that are continuously differentiable with bounded partial derivatives is association determining. [Hint: (a) Check that any measurable increasing function is a pointwise a.s. limit of continuous increasing functions. (b) Check that if ρ 𝜖, 𝜖 > 0, is a nonnegative C ∞-mollifier,Footnote 22 then f ∗ ρ 𝜖 is C ∞ increasing with bounded partials such that f ∗ ρ 𝜖 → f pointwise, uniformly boundedly.]