1 Introduction

The distance covariance, a measure of dependence between multivariate random variables X and Y, was introduced by Székely et al. (2007) and has since received extensive attention in the statistical literature. A crucial feature of the distance covariance is that it equals zero if and only if X and Y are mutually independent. Hence the distance covariance is sensitive to arbitrary dependencies; this is in contrast to the classical covariance, which is generally capable of detecting only linear dependencies. This property is illustrated in Fig. 1, which shows that tests based on the distance covariance are able to detect numerous types of non-linear associations that tests based on the classical covariance fail to detect.

Figure 1

The sub-figures A-C represent scatter-plots of bivariate samples (X,Y) with n = 600 data points, to which independence tests based on the distance covariance and the classical covariance were applied. In each case, a distance covariance permutation test using 100,000 permutations yields a p-value of \(10^{-5}\), demonstrating that the distance covariance detects these dependencies. The p-values of permutation tests based on the classical covariance with 100,000 permutations are 0.663, 0.129, and 0.889 for A, B, and C, respectively
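Such a permutation test is easy to sketch in code. The following Python snippet is a minimal illustration only: the function names dcov2 and perm_test are ours, far fewer permutations are used than in the figures, and the test statistic is the biased sample distance covariance computed from double-centered distance matrices (Székely et al. 2007), a formula that is not derived in this article.

```python
import numpy as np

def dcov2(x, y):
    # Biased sample distance covariance V_n^2(X, Y) for 1-D samples,
    # via double-centered pairwise distance matrices (Székely et al., 2007).
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def perm_test(x, y, n_perm=1000, seed=0):
    # One-sided permutation p-value for the hypothesis of independence.
    rng = np.random.default_rng(seed)
    stat = dcov2(x, y)
    exceed = sum(dcov2(x, rng.permutation(y)) >= stat for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

# Example: points on a circle are dependent, yet their classical
# covariance is zero; the distance covariance test detects this.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 600)
print(perm_test(np.cos(t), np.sin(t), n_perm=200))  # small p-value
```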

While the dependencies illustrated in Fig. 1 are purely illustrative examples, the sensitivity of the distance covariance to arbitrary dependencies can be very useful in applications. This is demonstrated in Fig. 2, where we show three dependencies between the expression values of genes in the breast cancer data set of Van De Vijver et al. (2002); all of these dependencies are detected by the distance covariance but not by the classical covariance.

Figure 2

Sub-figures A-C represent three scatter-plots of the expression values of genes in a breast cancer data set provided by Van De Vijver et al. (2002) (n = 295 samples), on which permutation tests based on the distance covariance and the classical covariance were applied. The p-values of the distance covariance permutation tests using 100,000 permutations are A: \(10^{-5}\); B: \(10^{-5}\); C: \(3.00 \times 10^{-4}\). The p-values of permutation tests based on the classical covariance with 100,000 permutations are A: 0.079; B: 0.503; C: 0.930

For comparisons of the distance covariance and classical covariance in applications to data, see the examples given by Székely and Rizzo (2009, Section 5.2) and Dueck et al. (2014, Section 5); for extensive numerical experiments and fast algorithms for computing the distance covariance, see Huo and Székely (2016, Section 5). We also refer to Sejdinovic et al. (2013), Dueck et al. (2014), Székely and Rizzo (2009), Székely and Rizzo (2014), Huo and Székely (2016), and Edelmann et al. (2020), who represent only a few of the many authors who have given further theoretical results on the distance covariance and distance correlation coefficients; and to Zhou (2012), Fiedler (2016), and Edelmann et al. (2019) for applications to time series analysis. Many applications of the distance correlation coefficient and the distance covariance to data analysis are now available, including: Kong et al. (2012) on data in sociology, Martínez-Gómez et al. (2014) and Richards et al. (2014) on astrophysical databases and galaxy clusters, Dueck et al. (2014) on time series analyses of wind vectors at electricity-generating facilities, Richards (2017) on the relationship between the strength of gun control laws and firearm-related homicides, Zhang et al. (2018) on remote sensing applications, and Ohana-Levi et al. (2020) on grapevine transpiration.

The original papers of Székely et al. (2007, 2009) are now widely recognized as seminal contributions to measuring dependence between sets of random variables; however, the exposition therein includes some ingenious arguments that may make the material challenging for readers who do not have an advanced background in mathematical statistics. With the benefit of hindsight, we are able to provide in this article a simpler, yet mathematically rigorous, introduction to the distance covariance that can be taught even in an undergraduate-level course covering the basic theory of U-statistics. Other than standard U-statistics theory and some well-known properties of characteristic functions, the requirements for our treatment are a knowledge of multidimensional integration and trigonometric inequalities, as covered in undergraduate-level advanced calculus. Consequently, we hope that this treatment will prove beneficial to non-mathematical statisticians.

Our presentation introduces the distance covariance as an important alternative to the classical covariance. Moreover, the distance covariance constitutes a particularly interesting example of a U-statistic, since it exhibits both the “non-degenerate” and the “first-order degenerate” cases of the asymptotic distribution theory of U-statistics: the non-degenerate case arises when X and Y are dependent, and the first-order degenerate case arises when X and Y are independent.

Throughout the exposition, ∥⋅∥ denotes the Euclidean norm and 〈⋅,⋅〉 the corresponding inner product. Also, we denote by |⋅| the modulus in \(\mathbb {C}\) or the absolute value in \(\mathbb {R}\), and the imaginary unit is \(i = \sqrt {-1}\).

2 The Fundamental Integral of Distance Covariance Theory

Following Székely et al. (2007), we first establish a closed-form expression for an integral that plays a central role in this article, leading to the equivalence of two crucial representations of the distance covariance. The first representation displays the distance covariance as an integrated distance between the joint characteristic function of (X,Y ) and the product of the marginal characteristic functions of X and Y; we will deduce from this representation that the distance covariance equals zero if and only if X and Y are independent. The second representation allows us to derive consistent distance covariance estimators that are expressible as polynomials in the distances between random samples.

Since the ability to characterize independence and the existence of easily computable estimators are arguably the most important properties of the distance covariance, we will refer to this integral as the fundamental integral of distance covariance.

Lemma 2.1.

For \(x \in \mathbb {R}^{p}\),

$$ {\int}_{\mathbb{R}^{p}} \frac{1-\cos \langle t,x \rangle}{\|t\|^{p+1}} \text{d} t = \frac{\pi^{(p+1)/2}}{\Gamma\big((p+1)/2\big)} \|x\|. $$
(1)

Proof 1.

Since (1) is valid for x = 0, we need only treat the case in which x≠ 0.

Denote by Ip the integral in Eq. 1. For p = 1, replacing t by t/x yields

$$ I_{1} = {\int}_{-\infty}^{\infty} \frac{1-\cos t x }{t^{2}} \text{d}t = |x| {\int}_{-\infty}^{\infty} \frac{1-\cos t}{t^{2}} {\text{d}t}. $$
(2)

Denoting the latter integral in Eq. 2 by c1, it follows by integration-by-parts that

$$ c_{1} = 2 {\int}_{0}^{\infty} \frac{1- \cos t}{t^{2}} {\text{d}t} = 2 {\int}_{0}^{\infty} \frac{\sin t}{t} {\text{d}t} = \pi, $$
(3)

the last equality being classical in calculus (Spivak 1994, Chapter 19, Problem 43).
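Spelling out the integration-by-parts step, with \(u = 1 - \cos t\) and \(\text{d}v = t^{-2}\, \text{d}t\), the boundary terms vanish because \(1-\cos t = O(t^{2})\) as \(t \to 0\) and \((1-\cos t)/t \to 0\) as \(t \to \infty\); hence

$$ {\int}_{0}^{\infty} \frac{1-\cos t}{t^{2}}\, \text{d}t = \Big[ -\frac{1-\cos t}{t} \Big]_{0}^{\infty} + {\int}_{0}^{\infty} \frac{\sin t}{t}\, \text{d}t = \frac{\pi}{2}, $$

so that \(c_{1} = 2 \cdot (\pi/2) = \pi\).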

For general p, note that Ip is invariant under orthogonal transformations H of x:

$$ \begin{array}{@{}rcl@{}} {\int}_{\mathbb{R}^{p}} \frac{1-\cos \langle t,Hx \rangle}{\|t\|^{p+1}} \text{d}t &=& {\int}_{\mathbb{R}^{p}} \frac{1-\cos \langle Ht, Hx \rangle}{\|H t\|^{p+1}} \text{d}t \\ &=& {\int}_{\mathbb{R}^{p}} \frac{1-\cos \langle t, x \rangle}{\|t\|^{p+1}} \text{d}t, \end{array} $$

where the first equality follows from the change of variables t ↦ Ht, which leaves the Lebesgue measure dt unchanged; and the second equality holds because the norm and the inner product are orthogonally invariant. Therefore, in evaluating Ip we may replace x by ∥x∥(1,0,…,0); letting t = (t1,…,tp), we obtain

$$ I_{p} = {\int}_{\mathbb{R}^{p}} \frac{1-\cos (t_{1} \|x\|)}{\|t\|^{p+1}} \text{d}t = \|x\| {\int}_{\mathbb{R}^{p}} \frac{1-\cos t_{1}}{\|t\|^{p+1}} \text{d}t, $$
(4)

the last equality obtained by replacing tj by tj/∥x∥, j = 1,…,p.

Denoting by cp the latter integral in Eq. 4, we substitute in that integral tj = vj, j = 1,…,p − 1, and \(t_{p} = p^{-1/2} ({v_{1}^{2}} + {\cdots } + v_{p-1}^{2})^{1/2} v_{p}\). As the Jacobian of this transformation is \(p^{-1/2} ({v_{1}^{2}} + {\cdots } + v_{p-1}^{2})^{1/2}\), we obtain

$$ \begin{array}{@{}rcl@{}} c_{p} \!\!\!&=&\!\! p^{-1/2} \displaystyle{\int}_{\mathbb{R}^{p-1}} \frac{1-\cos v_{1}}{({v_{1}^{2}} + {\cdots} + v_{p-1}^{2})^{p/2}} \text{d}v_{1} {\cdots} \text{d}v_{p-1} \cdot {\int}_{-\infty}^{\infty} \frac{\text{d}v_{p}}{(1 + p^{-1} {v_{p}^{2}})^{(p+1)/2}} \\ &= &p^{-1/2} c_{p-1} \displaystyle{\int}_{-\infty}^{\infty} \frac{\text{d}v_{p}}{(1 + p^{-1} {v_{p}^{2}})^{(p+1)/2}}. \end{array} $$
(5)

As the remaining integral in Eq. 5 is the familiar normalizing constant of the Student’s t-distribution on p degrees-of-freedom, we obtain

$$ c_{p} = \frac{\pi^{1/2} {\Gamma}(p/2)}{\Gamma\big((p+1)/2\big)} c_{p-1}. $$

Starting with c1 = π, we solve this recursive equation for cp, obtaining (1). □
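Explicitly, iterating the recursion for \(p \geq 2\) causes the Gamma factors to telescope:

$$ c_{p} = c_{1} \prod\limits_{k=2}^{p} \frac{\pi^{1/2}\, {\Gamma}(k/2)}{\Gamma\big((k+1)/2\big)} = \pi \cdot \pi^{(p-1)/2}\, \frac{\Gamma(1)}{\Gamma\big((p+1)/2\big)} = \frac{\pi^{(p+1)/2}}{\Gamma\big((p+1)/2\big)}, $$

which, together with \(I_{p} = c_{p} \|x\|\), yields Eq. 1.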

3 Two Representations for the Distance Covariance

We now introduce the representations of the distance covariance mentioned above. Following Székely et al. (2007), we define the distance covariance through its characteristic function representation. For jointly distributed random vectors \(X \in \mathbb{R}^{p}\) and \(Y \in \mathbb{R}^{q}\), let \(\phi _{X,Y} (s,t) = \mathbb {E} e^{i \langle s , X \rangle + i \langle t , Y \rangle }\) be the joint characteristic function of (X,Y ) and ϕX(s) = ϕX,Y(s,0) and ϕY(t) = ϕX,Y(0,t) be the corresponding marginal characteristic functions.

Definition 3.1.

The distance covariance \(\mathcal {V}(X,Y)\) between X and Y is defined as the nonnegative square-root of

$$ \mathcal{V}^{2}(X,Y) = \frac{1}{c_{p} c_{q}} {\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \frac{|\phi_{X,Y}(s,t) - \phi_{X}(s) \phi_{Y} (t)|^{2}}{\|s\|^{p+1} \|t\|^{q+1}} \text{d}s \text{d}t, $$
(6)

where \(c_{p}\) and \(c_{q}\) are the normalizing constants given in Eq. 1, for dimensions p and q, respectively.

As the integrand in Eq. 6 is nonnegative, it follows that \(\mathcal {V}^{2}(X,Y) \geq 0\). Further, we will show in Corollary 3.4 that \(\mathcal {V}^{2}(X,Y) < \infty \) whenever X and Y have finite first moments.

An advantage of the representation (6) is that it directly implies one of the most important properties of the distance covariance, viz., the characterization of independence.

Theorem 3.2.

For all X and Y, \(\mathcal {V}^{2}(X,Y) = 0\) if and only if X and Y are independent.

Proof 2.

If X and Y are independent then ϕX,Y(s,t) = ϕX(s)ϕY(t) for all s and t; hence \(\mathcal {V}^{2}(X,Y) = 0\).

Conversely, if X and Y are not independent, then the functions ϕX,Y(s,t) and ϕX(s)ϕY(t) are not identical (Van der Vaart 2000, Lemma 2.15). Since characteristic functions are continuous, there exists a non-empty open set \(\mathcal {A} \subseteq \mathbb{R}^{p} \times \mathbb{R}^{q}\) such that \(|\phi_{X,Y}(s,t) - \phi_{X}(s)\phi_{Y}(t)|^{2} > 0\) for all \((s,t) \in \mathcal {A}\). Hence, by Eq. 6, \(\mathcal {V}^{2}(X,Y) > 0\). □

For the purpose of deriving estimators for \(\mathcal {V}^{2}(X,Y)\), we now apply Lemma 2.1 to obtain a second representation of the distance covariance.

Theorem 3.3.

Suppose that (X1,Y1),…,(X4,Y4) are independent, identically distributed (i.i.d.) copies of (X,Y ). Then

$$ \mathcal{V}^{2}(X,Y ) = \mathbb{E} \!\Big[\|X_{1} - X_{2} \|\ \cdot \|Y_{1} - Y_{2} \|- 2\|X_{1} - X_{2} \| \cdot \|Y_{1} - Y_{3} \| + \|X_{1} - X_{2} \| \cdot \|Y_{3} - Y_{4} \|\! \Big]. $$
(7)

Proof 3.

First, we observe that the numerator in the integrand in Eq. 6 equals

$$ \begin{array}{@{}rcl@{}} |\phi_{X,Y}(s,t) - \phi_{X}(s) \phi_{Y} (t)|^{2} &=& (\phi_{X,Y} (s,t) - \phi_{X}(s) \phi_{Y} (t)) \overline{(\phi_{X,Y} (s,t) - \phi_{X}(s) \phi_{Y} (t))} \\ &=& \mathbb{E}\big[e^{i \langle s , X_{1}-X_{2} \rangle+ i \langle t , Y_{1}-Y_{2} \rangle} - 2 e^{i \langle s , X_{1}-X_{2} \rangle+ i \langle t , Y_{1}-Y_{3} \rangle} + e^{i \langle s , X_{1}-X_{2} \rangle+ i \langle t , Y_{3}-Y_{4} \rangle}\big]. \end{array} $$

Since the latter expression is real, any term of the form \(e^{iz}\), \(z \in \mathbb {R}\), can be replaced by \(\cos z\). Hence, by Eq. 6,

$$ \begin{array}{@{}rcl@{}} c_{p} c_{q} \mathcal{V}^{2}(X,Y) ={\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \frac{A_{12}(s,t)- 2 A_{13}(s,t) + A_{34}(s,t)} {\|s\|^{p+1} \| t \|^{q+1}} \text{d}s \text{d}t \end{array} $$
(8)

where, for each (j,k),

$$ A_{jk}(s,t) = \mathbb{E} \cos{\big(\langle s, X_{1} - X_{2} \rangle + \langle t ,Y_{j} - Y_{k} \rangle \big)}. $$
(9)

Replacing t by − t in Eq. 8, we also obtain

$$ \begin{array}{@{}rcl@{}} c_{p} c_{q} \mathcal{V}^{2}(X,Y) ={\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \frac{A_{12}(s,-t)- 2 A_{13}(s,-t) + A_{34}(s,-t)} {\|s\|^{p+1} \| t \|^{q+1}} \text{d}s \text{d}t, \end{array} $$
(10)

and by adding (8) and (10), we find that

$$ c_{p} c_{q} \mathcal{V}^{2}(X,Y) = {\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \frac{B_{12}(s,t)- 2 B_{13}(s,t) + B_{34}(s,t)} {\|s\|^{p+1} \| t \|^{q+1}} \text{d}s \text{d}t $$

where for each (j,k),

$$ B_{jk}(s,t) = \frac{1}{2} \big(A_{jk}(s,t) + A_{jk}(s,-t) \big). $$
(11)

On applying to Eqs. 9 and 11 the trigonometric identity,

$$ \cos(x+y) + \cos(x-y) = 2 \cos x \cos y, $$

we deduce that

$$ B_{jk}(s,t) = \mathbb{E} \big[ \cos{\langle s, X_{1} - X_{2} \rangle} \cos{\langle t, Y_{j} - Y_{k} \rangle} \big]. $$
(12)

For j,k ∈{1,2,3,4}, we apply to Eq. 12 the elementary identity,

$$ \begin{array}{@{}rcl@{}} \cos{\langle s, X_{1} - X_{2} \rangle} \cos{\langle t, Y_{j} - Y_{k} \rangle} &=& \big(1-\cos{\langle s, X_{1} - X_{2} \rangle}\big) \big(1-\cos{\langle t, Y_{j} - Y_{k} \rangle}\big) \\ && - 1 + \cos{\langle s, X_{1} - X_{2} \rangle} + \cos{\langle t, Y_{j} - Y_{k} \rangle}; \end{array} $$
(13)

then we obtain

$$ \begin{array}{@{}rcl@{}} c_{p} c_{q} \mathcal{V}^{2}(X,Y) &=& {\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \Big(\mathbb{E} \big[\big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{1} - Y_{2} \rangle} \big) \big] \\ && \quad - 2 \mathbb{E} \big[\big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{1} - Y_{3} \rangle} \big)\big] \\ && \quad + \mathbb{E} \big[\big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{3} - Y_{4} \rangle} \big)\big] \Big) \frac{\text{d}s\, \text{d}t}{\|s\|^{p+1} \|t\|^{q+1}}, \end{array} $$

which is obtained by decomposing all summands on the right-hand side using Eq. 13 and observing that all terms which are not of the form \(\mathbb{E}[\cos{\langle s, X_{i} - X_{j} \rangle} \cos{\langle t, Y_{l} - Y_{k} \rangle}]\) cancel each other. By applying the Fubini-Tonelli Theorem and the linearity of expectation and integration, we obtain

$$ \begin{array}{@{}rcl@{}} c_{p} c_{q} \mathcal{V}^{2}(X,Y) &=& \mathbb{E} {\int}_{\mathbb{R}^{p}}{\int}_{\mathbb{R}^{q}} \Big[ \big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{1} - Y_{2} \rangle} \big) \\ && \quad - 2 \big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{1} - Y_{3} \rangle} \big) \\ && \quad + \big(1-\cos{\langle s, X_{1} - X_{2} \rangle} \big) \big(1-\cos{\langle t, Y_{3} - Y_{4} \rangle} \big) \Big] \frac{\text{d}s\, \text{d}t}{\|s\|^{p+1} \| t \|^{q+1}}. \end{array} $$

The proof is completed by applying Lemma 2.1 to calculate these three integrals. □

Before establishing estimators for \(\mathcal {V}^{2}(X,Y)\), we remark briefly on the assumptions necessary for the existence of the distance covariance.

Corollary 3.4.

Suppose that \(\mathbb {E} \|X\| < \infty \) and \(\mathbb {E} \|Y\| < \infty \). Then \(\mathcal {V}^{2}(X,Y) < \infty \).

Proof 4.

From the representation (7), we directly obtain the alternative representation

$$ \begin{array}{@{}rcl@{}} \mathcal{V}^{2}(X,Y ) = \mathbb{E} \Big[\|X_{1} - X_{2} \| \|Y_{1} - Y_{2} \|- \|X_{1} - X_{2} \| \|Y_{1} - Y_{3} \| \notag\\ - \|X_{1} - X_{2} \| \|Y_{2} - Y_{3} \| + \|X_{1} - X_{2} \| \|Y_{3} - Y_{4} \| \Big]. \end{array} $$
(14)

Applying the triangle inequality yields

$$ \|X_{1} - X_{2} \| \|Y_{1} - Y_{2} \|- \|X_{1} - X_{2} \| \|Y_{1} - Y_{3} \| - \|X_{1} - X_{2} \| \|Y_{2} - Y_{3} \| \leq 0, $$

and hence

$$ \begin{array}{@{}rcl@{}} 0 \leq \mathcal{V}^{2}(X,Y) &\leq& \mathbb{E} \|X_{1} - X_{2} \| \|Y_{3} - Y_{4} \| \\ &=& \mathbb{E} \|X_{1} - X_{2} \|\, \mathbb{E} \|Y_{3} - Y_{4} \| \leq 4\, \mathbb{E} \|X\|\, \mathbb{E} \|Y\|, \end{array} $$

where the equality uses the independence of (X1,X2) and (Y3,Y4), and the last inequality follows again from the triangle inequality. □

4 Asymptotic Theory for Estimating the Distance Covariance

Using the representation of the distance covariance given in Eq. 7, it is straightforward to derive a U-statistic estimator for \(\mathcal {V}^{2}(X,Y)\). Specifically, we define the symmetric kernel function

$$ \begin{array}{@{}rcl@{}} h\big((X_{1},Y_{1}),\ldots,(X_{4},Y_{4})\big) \notag\\ = \frac{1}{24} \sum \big(\|X_{i} - X_{j} \| \|Y_{i} - Y_{j} \| - 2 \|X_{i} - X_{j} \| \|Y_{i} - Y_{k} \| + \|X_{i} - X_{j} \| \|Y_{k} - Y_{l} \| \big), \end{array} $$
(15)

where the sum is over all i,j,k,l ∈{1,2,3,4} such that i, j, k, and l are distinct.

It follows from the representation (7) that each of the 24 summands in Eq. 15 has expectation \(\mathcal {V}^{2}(X,Y)\). Therefore,

$$ \mathbb{E} h\big((X_{1},Y_{1}),\ldots,(X_{4},Y_{4})\big) = \mathcal{V}^{2}(X,Y). $$

Letting (X1,Y1),…,(Xn,Yn) be a random sample from (X,Y ), we find that an unbiased estimator of \(\mathcal {V}^{2}(X,Y)\) is

$$ \widehat{\Omega} = {\binom{n}{4}}^{-1} \sum\limits_{1 \le i < j < k < l \le n} h\big((X_{i},Y_{i}),(X_{j},Y_{j}),(X_{k},Y_{k}),(X_{l},Y_{l})\big). $$
(16)
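To make the construction concrete, the following Python sketch implements the kernel in Eq. 15 and the U-statistic in Eq. 16 directly. It is a naive \(O(n^{4})\) illustration rather than a practical implementation; the function names are ours, and the observations are assumed to be supplied as NumPy arrays whose rows are the sample vectors.

```python
import numpy as np
from itertools import combinations, permutations

def kernel_h(pts):
    # Symmetric kernel of Eq. 15; pts is a list of four (x, y) pairs.
    total = 0.0
    for i, j, k, l in permutations(range(4)):  # all 24 orderings of distinct indices
        dx = np.linalg.norm(pts[i][0] - pts[j][0])
        total += dx * (np.linalg.norm(pts[i][1] - pts[j][1])
                       - 2.0 * np.linalg.norm(pts[i][1] - pts[k][1])
                       + np.linalg.norm(pts[k][1] - pts[l][1]))
    return total / 24.0

def omega_hat(x, y):
    # Unbiased U-statistic estimator of V^2(X, Y), Eq. 16; O(n^4) operations.
    n = len(x)
    vals = [kernel_h([(x[i], y[i]) for i in quad])
            for quad in combinations(range(n), 4)]
    return float(np.mean(vals))
```

Because the number of 4-subsets grows like \(n^{4}\), this form is usable only for small samples; the \(O(n^{2})\) formula discussed in Section 5 is preferable in practice.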

We can now derive the consistency and asymptotic distribution of this estimator using standard U-statistic theory (Lee, 2019). For this purpose, let us define

$$ h_{1}(x,y) = \mathbb{E}\big[h\big((x,y),(X_{2},Y_{2}),(X_{3},Y_{3}),(X_{4}, Y_{4})\big)\big]. $$

and

$$ h_{2}((x_{1},y_{1}),(x_{2},y_{2})) = \mathbb{E}\big[h\big((x_{1},y_{1}),(x_{2},y_{2}),(X_{3},Y_{3}),(X_{4}, Y_{4})\big)\big]. $$

The preceding formulas and a classical result on U-statistics (Hoeffding 1948, Theorem 7.1) lead immediately to a proof of the following result.

Theorem 4.1.

Suppose that \(0 < \text {Var} (h_{1}(X,Y)) < \infty \). Then \(\sqrt {n} \big (\widehat {\Omega } - \mathcal {V}^{2}(X,Y)\big ) \stackrel {\mathcal{D}}{\longrightarrow } Z\) as \(n \to \infty \), where \(Z \sim \mathcal {N}\big (0,16 \text {Var} (h_{1}(X,Y))\big )\).

Except in pathological examples, Theorem 4.1 provides the asymptotic distribution of \(\widehat{\Omega}\) when X and Y are dependent. For the crucial case of independent X and Y, however, the normal limit of \(\sqrt {n} (\widehat {\Omega } - \mathcal {V}^{2}(X,Y))\) is degenerate; in this case, the asymptotic distribution can be derived using results on first-order degenerate U-statistics (Lee 2019, Section 3.2.2).

Lemma 4.2.

Let X and Y be independent, and (X1,Y1) and (X2,Y2) be i.i.d. copies of (X,Y ). Then h1(x,y) ≡ 0 and \(\text {Var} \big (h_{2}((X_{1},Y_{1}),(X_{2},Y_{2}))\big ) = \mathcal {V}^{2}(X,X) \mathcal {V}^{2}(Y,Y)/36\).

The proof follows by elementary, but lengthy, calculations and may be left as an exercise for students. A complete proof is provided by Huang and Huo (2017, Appendices B.6 and B.7).

Finally, the following result follows directly from Lemma 4.2 and classical results on the distributions of first-order degenerate U-statistics (Lee 2019, Section 3.2.2).

Theorem 4.3.

Let X and Y be independent, with \(\mathbb {E}(\|X\|) < \infty \) and \(\mathbb {E}(\|Y\|) < \infty \). Then,

$$ \begin{array}{@{}rcl@{}} n \big(\widehat{\Omega} - \mathcal{V}^{2}(X,Y)\big) \stackrel{\mathcal{D}}{\longrightarrow} 6 \sum\limits_{i=1}^{\infty} \lambda_{i} ({Z_{i}^{2}} - 1), \end{array} $$
(17)

as \(n \to \infty \), where Z1,Z2,… are i.i.d. standard normal random variables and λ1,λ2,… are the eigenvalues of the integral equation

$$ \mathbb{E} \big[h_{2}\big((x_{1},y_{1}),(X_{2},Y_{2})\big) f(X_{2},Y_{2})\big] = \lambda f(x_{1},y_{1}). $$

5 Concluding Remarks

In this article, we have derived under minimal technical requirements the most important statistical properties of the distance covariance. From this starting point, there are several additional interesting topics that can be explored, e.g., as instructional assignments:

  (i)

    The estimator in Eq. 16 requires \(O(n^{4})\) operations and is therefore computationally inefficient. A straightforward combinatorial computation shows that an \(O(n^{2})\) estimator of \(\mathcal {V}^{2}(X,Y)\) is given by

    $$ \begin{aligned} \widetilde{\Omega} = \frac{1}{n (n-3)}\Bigg[ & \sum\limits_{i,j=1}^{n} \|X_{i}-X_{j}\| \|Y_{i}-Y_{j}\| \\ & \ + \frac{1}{(n-1) (n-2)} \sum\limits_{i,j=1}^{n} \|X_{i}-X_{j}\| \cdot \sum\limits_{i,j=1}^{n} \|Y_{i}-Y_{j}\| \\ & \ - \frac{2}{(n-2)} \sum\limits_{i,j,k=1}^{n} \|X_{i}-X_{j}\| \|Y_{i}-Y_{k}\|\Bigg]; \end{aligned} $$
    (18)

    see Huo and Székely (2016) and the code sketch following this list.

  (ii)

    We remark that although no explicit moment conditions on X and Y were imposed in Theorem 4.1 to ensure that \(\text {Var}(h_{1}(X,Y)) < \infty \), it can be shown that this condition holds whenever X and Y have finite second moments; see Edelmann et al. (2020).

  (iii)

    Important contributions of Székely et al. (2007) and Székely and Rizzo (2009) are based on the distance correlation coefficient, which is defined as the nonnegative square-root of

    $$ \mathcal{R}^{2}(X,Y) = \frac{\mathcal{V}^{2}(X,Y)}{\sqrt{\mathcal{V}^{2}(X,X) \mathcal{V}^{2}(Y,Y)}}. $$

Numerous properties of \(\mathcal {R}(X,Y)\) (see, e.g., Székely et al. 2007, Theorem 3) may be derived using the methods that we have presented here.
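The formula (18) translates almost verbatim into code. The sketch below is ours (the helper scipy.spatial.distance.cdist is used to form the pairwise distance matrices), and it assumes that x and y are arrays of shape (n,p) and (n,q) whose rows are the observations; up to floating-point error it should reproduce the U-statistic \(\widehat{\Omega}\) of Eq. 16.

```python
import numpy as np
from scipy.spatial.distance import cdist

def omega_tilde(x, y):
    # O(n^2) estimator of V^2(X, Y) as in Eq. 18 (Huo and Székely, 2016).
    n = len(x)
    a = cdist(x, x)  # a[i, j] = ||X_i - X_j||
    b = cdist(y, y)  # b[i, j] = ||Y_i - Y_j||
    term1 = np.sum(a * b)
    term2 = a.sum() * b.sum() / ((n - 1) * (n - 2))
    term3 = 2.0 * np.dot(a.sum(axis=1), b.sum(axis=1)) / (n - 2)
    return (term1 + term2 - term3) / (n * (n - 3))
```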

We also remark on the fundamental integral, Eq. 1, which underpins the entire theory of the distance covariance and distance correlation. As noted by Dueck et al. (2015), the fundamental integral and variants of it have appeared in functional analysis (Gelfand and Shilov 1964, pp. 192–195), in Fourier analysis (Stein 1970, pp. 140 and 263), and in the theory of fractional Brownian motion on generalized random fields (Chilès and Delfiner 2012, p. 266; Reed et al. 1995).

The fundamental integral also extends further. For \(m \in \mathbb {N}\) and \(v \in \mathbb {R}\), define

$$ \cos_{m}(v) := \sum\limits_{j=0}^{m-1} (-1)^{j} \frac{v^{2j}}{(2j)!}, $$
(19)

the truncated Maclaurin expansion of the cosine function. Dueck et al. (2015) proved that for \(\alpha \in \mathbb {C}\),

$$ {\int}_{\mathbb{R}^{p}}\frac{\cos_{m}(\langle t,x\rangle) - \cos(\langle t,x\rangle)}{\|t\|^{p+\alpha}} \text{d}t = \frac{2\pi^{p/2} {\Gamma}(1-\alpha/2)}{\alpha 2^{\alpha} {\Gamma}\big((p+\alpha)/2\big)} \|x\|^{\alpha}, $$
(20)

with absolute convergence if and only if \(2(m-1) < \text{Re}(\alpha) < 2m\). For m = 1 and α = 1, Eq. 20 reduces to Eq. 1. Further, for m = 1 and 0 < α < 2, the integral (20) provides the Lévy-Khintchine representation of the negative definite function \(\|x\|^{\alpha}\), thereby linking the fundamental integral to the probability theory of the stable distributions.
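As a quick check of the stated reduction: for m = 1 we have \(\cos_{1}(v) = 1\), and setting α = 1 in the constant on the right-hand side of Eq. 20 gives, since \({\Gamma}(1/2) = \pi^{1/2}\),

$$ \frac{2\pi^{p/2}\, {\Gamma}(1/2)}{1 \cdot 2^{1}\, {\Gamma}\big((p+1)/2\big)} = \frac{\pi^{(p+1)/2}}{\Gamma\big((p+1)/2\big)}, $$

which is precisely the constant in Eq. 1.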

In conclusion, the statistical analysis of data through distance covariance and distance correlation theory, by means of the fundamental integral, is seen to be linked closely to many areas of the mathematical sciences.