Abstract
The covariance matrix of random variables \(X_{1},\dots,X_{n}\) is said to have an intraclass covariance structure if the variances of all the \(X_{i}\)’s are the same and all the pairwise covariances of the \(X_{i}\)’s are the same. We provide a possibly surprising characterization of such covariance matrices in the case when the \(X_{i}\)’s are symmetric Bernoulli random variables.
For natural \(n\geq 2\), let \(\Sigma=[\Sigma_{i,j}]_{i,j\in[n]}\) be the covariance matrix of random variables (r.v.’s) \(X_{1},\dots,X_{n}\) with finite second moments, so that \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) for all \(i\) and \(j\) in the set \([n]:=\{1,\dots,n\}\). We are assuming that the matrix \(\Sigma\) is nonzero.
The covariance matrix \(\Sigma\) is said to have an intraclass covariance structure if (i) \(\Sigma_{i,i}=\operatorname{\mathsf{Var}}X_{i}=\operatorname{\mathsf{Cov}}(X_{i},X_{i})\) is the same for all \(i\in[n]\) and (ii) \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) is the same for all distinct \(i\) and \(j\) in \([n]\). Let \(\text{ICCS}_{n}\) denote the set of all \(n\times n\) covariance matrices that have an intraclass covariance structure.
In particular, if the r.v.’s \(X_{1},\dots,X_{n}\) are exchangeable—that is, if the joint distribution of the \(X_{i}\)’s is invariant with respect to all permutations of the indices \(1,\dots,n\) (see e.g., [4] for much more on exchangeability of r.v.’s)—then the covariance matrix \(\Sigma\) will be in the set \(\text{ICCS}_{n}\). So, one may say that the covariance matrix \(\Sigma\) has an intraclass covariance structure if the r.v.’s \(X_{1},\dots,X_{n}\) pertain to items that belong to one class and thus are exchangeable in a certain weak sense; this explains the use of the term ‘‘intraclass’’. The notion of an intraclass covariance structure was introduced by Fisher [3] and has been studied in many subsequent papers, including e.g., [7, 9, 10].
Obviously, the covariance matrix \(\Sigma\) is in the set \(\text{ICCS}_{n}\) if and only if
\[\Sigma=(a-b)I_{n}+b\,{\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\qquad(1)\]
for some real numbers \(a\) and \(b\), where \(I_{n}\) is the \(n\times n\) identity matrix and \({\mathsf{1}}_{n}:=[1,\dots,1]^{\top}\), the \(n\times 1\) matrix of \(1\)’s.
Recall that a real \(n\times n\) matrix is a covariance matrix if and only if it is positive semidefinite; cf. e.g., [2, Sect. III.6, Theorem 4]. Note that (i) \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(n\) and (ii) any nonzero vector orthogonal to \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(0\). So, the only eigenvalues of the matrix \(\Sigma\) of the form (1) are \(a-b+bn\) and \(a-b\).
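The spectrum just described can be confirmed numerically; the sketch below forms a matrix of the form (1) for arbitrary illustrative values of \(n\), \(a\), \(b\) (these values are assumptions of the sketch, not taken from the text) and inspects its eigenvalues:

```python
import numpy as np

# A matrix of the form (1): diagonal entries a, off-diagonal entries b.
# The values of n, a, b below are arbitrary illustrative choices.
n, a, b = 5, 2.0, 0.5
Sigma = (a - b) * np.eye(n) + b * np.ones((n, n))

# Expected spectrum: a - b with multiplicity n - 1, and a - b + b*n (simple).
eigs = np.sort(np.linalg.eigvalsh(Sigma))
print(eigs)
```

With these values the output is \(1.5\) four times and \(4.0\) once, matching \(a-b\) and \(a-b+bn\).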
It follows that the matrix \(\Sigma\) of the form (1) is in \(\text{ICCS}_{n}\) if and only if \(-\frac{a}{n-1}\leq b\leq a\), that is, if and only if the pairwise correlation, \(\rho=b/a\), between r.v.’s whose covariance matrix has an intraclass covariance structure is no less than \(-1/(n-1)\):
\[\rho\geq\rho_{n,\textrm{min}}:=-\frac{1}{n-1}.\qquad(2)\]
This is in contrast with the general lower bound \(-1\) on the correlation between arbitrary r.v.’s. Let us refer to the values of \(\rho\) satisfying condition (2) as good.
In the rest of this note, we shall consider the special case when the r.v.’s \(X_{1},\dots,X_{n}\) are symmetric Bernoulli, so that
\[\operatorname{\mathsf{P}}(X_{i}=0)=\operatorname{\mathsf{P}}(X_{i}=1)=\tfrac{1}{2}\qquad(3)\]
for all \(i\in[n]\). This important case has been extensively studied in computer science in general and in machine learning in particular (see e.g., [1, 5, 8, 11]), as well as in other applications of probability theory—though mainly when the \(X_{i}\)’s are independent.
The question now is the following:
For what values of \(\rho\) do there exist symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) whose covariance matrix is in \(\text{ICCS}_{n}\), with all the pairwise correlations equal to \(\rho\)?
Let us refer to such values of \(\rho\) as symmetric-binary-good. Clearly, any symmetric-binary-good value of \(\rho\) must be good. One then may wonder whether every good value of \(\rho\) is symmetric-binary-good.
The answer to this question may seem surprising:
- if \(n\) is even, then yes, every good value of \(\rho\) is symmetric-binary-good;
- if \(n\) is odd, then ‘‘nearly every’’ good value of \(\rho\) is symmetric-binary-good.
For symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) whose covariance matrix \(\Sigma\) is in \(\text{ICCS}_{n}\), it is a bit more convenient to deal with the probability
\[p:=\operatorname{\mathsf{P}}(X_{1}=X_{2})\]
than with the correlation \(\rho\). It is easy to see that the values of \(\rho\) and \(p\) are in the simple bijective correspondence
\[p=\frac{1+\rho}{2},\quad\text{that is,}\quad\rho=2p-1,\qquad(4)\]
so that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\).
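The correspondence (4) can be checked exactly for a single exchangeable pair: symmetric Bernoulli marginals together with \(\operatorname{\mathsf{P}}(X_{1}=X_{2})=p\) force the joint pmf used below, and exact rational arithmetic recovers \(\rho=2p-1\) (the particular value of \(p\) is an arbitrary illustrative choice):

```python
from fractions import Fraction

p = Fraction(3, 10)  # arbitrary illustrative value of p in (0, 1)
# Joint pmf of a symmetric Bernoulli pair (X, Y) with P(X = Y) = p:
pmf = {(0, 0): p / 2, (1, 1): p / 2,
       (0, 1): (1 - p) / 2, (1, 0): (1 - p) / 2}

EX = sum(prob * x for (x, y), prob in pmf.items())        # = 1/2
EXY = sum(prob * x * y for (x, y), prob in pmf.items())   # = p/2
var = EX - EX**2                                          # = 1/4
rho = (EXY - EX * EX) / var
assert rho == 2 * p - 1   # the correspondence (4), exactly
```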
Let us refer to the values of \(p\) corresponding to the good values of \(\rho\) as good values of \(p\), and let us similarly define the symmetric-binary-good values of \(p\). So, in view of (2) and (4), a value \(p\in(0,1)\) is good if and only if
\[p\geq\frac{1}{2}\Bigl(1-\frac{1}{n-1}\Bigr)=\frac{n-2}{2(n-1)}.\]
Thus, we have to determine the symmetric-binary-good values of \(p\).
Suppose for a moment that \(p\in(0,1)\) is symmetric-binary-good. Then there exist symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) such that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\). Letting \(g\) stand for the joint probability mass function of the r.v.’s \(X_{1},\dots,X_{n}\), we note that \(g\) is a nonnegative function such that
(i) \(\sum_{x\in\{0,1\}^{n}}g(x)=1\),
(ii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=0)g(x)=\frac{1}{2}\) for all \(i\in[n]\),
(iii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=x_{j})g(x)=p\) for all distinct \(i\) and \(j\) in \([n]\);
of course, here \(x_{i}\) denotes the \(i\)th coordinate of the vector \(x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}\). By symmetry, conditions (i)–(iii) will hold with \(\tilde{g}(x):=\frac{1}{n!}\sum_{\pi\in\Pi_{n}}g(\pi(x))\) in place of \(g(x)\), where \(\Pi_{n}\) is the set of all permutations of the set \([n]\). Note that \(\tilde{g}(x)=f(\sum_{1}^{n}x_{i})\) for some nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) and all \(x\in\{0,1\}^{n}\). So, conditions (i)–(iii) can be rewritten as
(I) \(\sum_{k=0}^{n}\binom{n}{k}f(k)=1\),
(II) \(\sum_{k=0}^{n}\binom{n-1}{k}f(k)=\frac{1}{2}\),
(III) \(\sum_{k=0}^{n}a_{n,k}f(k)=p\),
where
\[a_{n,k}:=\binom{n-2}{k}+\binom{n-2}{k-2};\]
of course, \(\binom{n-1}{n}=0\), \(\binom{n-2}{k}=0\) if \(k\geq n-1\) and \(\binom{n-2}{k-2}=0\) if \(k\leq 1\).
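The counting behind \(a_{n,k}\) can be verified by brute force: \(a_{n,k}\) is the number of \(x\in\{0,1\}^{n}\) with coordinate sum \(k\) and \(x_{1}=x_{2}\), which the sketch below (for a small illustrative \(n\)) compares with \(\binom{n-2}{k}+\binom{n-2}{k-2}\):

```python
from itertools import product
from math import comb

def a_count(n, k):
    # number of x in {0,1}^n with coordinate sum k and x_1 = x_2
    return sum(1 for x in product((0, 1), repeat=n)
               if sum(x) == k and x[0] == x[1])

n = 6  # small illustrative value
for k in range(n + 1):
    # math.comb raises on a negative lower index, so guard k - 2 < 0
    closed_form = comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)
    assert a_count(n, k) == closed_form
```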
Thus, for any given \(n\geq 2\) and \(p\in(0,1)\), we want to see whether there is a nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) such that conditions (I)–(III) hold.
Towards this goal, consider the problem of finding the extrema of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), where \(F_{n}\) is the set of all nonnegative functions \(f\colon\{0,\dots,n\}\to\mathbb{R}\) satisfying condition (I). In view of the symmetries \(\binom{n}{k}=\binom{n}{n-k}\) and \(a_{n,k}=a_{n,n-k}\), without loss of generality the functions \(f\) are symmetric in the same sense: \(f(k)=f(n-k)\) for all \(k\in\{0,\dots,n\}\)—otherwise, replacing \(f(k)\) by \(\frac{1}{2}\,(f(k)+f(n-k))\), we will have the sums in (I) and (III) unchanged. Next, consider the ratios
\[r_{k}:=\frac{a_{n,k}}{\binom{n}{k}},\qquad k\in\{0,\dots,n\}.\]
Note that \(r_{k+1}\leq r_{k}\) if \(0\leq k\leq\frac{n-1}{2}\) and \(r_{k+1}\geq r_{k}\) if \(\frac{n-1}{2}\leq k\leq n-1\). Also, \(r_{k}=r_{n-k}\). So, the smallest of the \(r_{k}\)’s are those with index \(k\) closest to \(\frac{n}{2}\).
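The stated symmetry and the two monotonicity ranges for the \(r_{k}\)’s can be checked exactly with rational arithmetic; a sketch for one illustrative even and one odd \(n\):

```python
from fractions import Fraction
from math import comb

def r(n, k):
    # r_k = a_{n,k} / C(n,k), computed exactly as a Fraction
    a = comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)
    return Fraction(a, comb(n, k))

for n in (8, 9):  # illustrative even and odd cases
    # symmetry r_k = r_{n-k}
    assert all(r(n, k) == r(n, n - k) for k in range(n + 1))
    # nonincreasing for 0 <= k <= (n-1)/2, nondecreasing afterwards
    assert all(r(n, k + 1) <= r(n, k) for k in range((n - 1) // 2 + 1))
    assert all(r(n, k + 1) >= r(n, k) for k in range(n // 2, n))
```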
More specifically, if \(n=2m-1\) is odd, then \(r_{k}\geq r_{m}=r_{m-1}\) for all \(k\in\{1,\dots,n-1\}\). Letting then
\[f^{\textrm{odd}}_{\textrm{min}}(k):=\frac{1(k\in\{m-1,m\})}{2\binom{n}{m}},\qquad k\in\{0,\dots,n\},\]
we see that \(f^{\textrm{odd}}_{\textrm{min}}\) is a symmetric function in \(F_{n}\) and
\[a_{n,k}f(k)=r_{k}\binom{n}{k}f(k)\geq r_{m}\binom{n}{k}f(k)\]
for all \(k\in\{0,\dots,n\}\) and all symmetric functions \(f\in F_{n}\), which implies
\[\sum_{k=0}^{n}a_{n,k}f(k)\geq r_{m}\sum_{k=0}^{n}\binom{n}{k}f(k)=r_{m}=\sum_{k=0}^{n}a_{n,k}f^{\textrm{odd}}_{\textrm{min}}(k).\]
It follows that \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), that is, over all nonnegative \(f\) satisfying condition (I). Moreover, condition (II) is satisfied with \(f^{\textrm{odd}}_{\textrm{min}}\) in place of \(f\).
We conclude that, in the case when \(n=2m-1\) is odd, \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[r_{m}=\frac{m-1}{2m-1}=\frac{n-1}{2n}.\]
Similarly, in the case when \(n=2m\) is even, a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II) is given by
\[f^{\textrm{even}}_{\textrm{min}}(k):=\frac{1(k=m)}{\binom{n}{m}},\qquad k\in\{0,\dots,n\},\]
and the corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[r_{m}=\frac{2\binom{n-2}{m}}{\binom{n}{m}}=\frac{n-2}{2(n-1)}.\]
The above minimization can of course be recognized as something similar to, or even a special case of, the Neyman–Pearson lemma [6, part III].
The just considered cases of odd and even \(n\) can be summarized as follows. For
\[m_{n}:=\lfloor n/2\rfloor,\]
let \(f_{\textrm{min}}\) be the symmetric function in \(F_{n}\) such that \(\sum_{k\in\{m_{n},n-m_{n}\}}\binom{n}{k}f_{\textrm{min}}(k)=1,\) so that \(f_{\textrm{min}}(k)=0\) for \(k\in\{0,\dots,n\}\setminus\{m_{n},n-m_{n}\}\). Then \(f_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[p_{n,\textrm{min}}=p_{n}:=\frac{a_{n,m_{n}}}{\binom{n}{m_{n}}}=\begin{cases}\dfrac{n-2}{2(n-1)}&\text{if }n\text{ is even},\\[1ex]\dfrac{n-1}{2n}&\text{if }n\text{ is odd}.\end{cases}\qquad(5)\]
The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the minimizer \(f_{\textrm{min}}\) can be described as follows: the random set \(I:=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(S_{n}:=\binom{[n]}{m_{n}}\cup\binom{[n]}{n-m_{n}}\), where \(\binom{[n]}{k}\) denotes the set of all subsets of cardinality \(k\) of the set \([n]\); of course, \(S_{n}:=\binom{[n]}{n/2}\) if \(n\) is even.
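The extremal distribution just described can be checked by exact enumeration: drawing \(I\) uniformly from \(S_{n}\) and computing \(\operatorname{\mathsf{P}}(X_{i}=X_{j})\) recovers the minimum value, here compared against the closed forms \(\frac{n-2}{2(n-1)}\) for even \(n\) and \(\frac{n-1}{2n}\) for odd \(n\); a sketch:

```python
from fractions import Fraction
from itertools import combinations

def p_min(n):
    # I uniform on S_n: subsets of {0,...,n-1} of size m_n or n - m_n
    m = n // 2
    support = [frozenset(c) for s in {m, n - m}
               for c in combinations(range(n), s)]
    # X_i = 1(i in I); by exchangeability any fixed pair of indices works
    hits = sum(1 for I in support if (0 in I) == (1 in I))
    return Fraction(hits, len(support))

for n in range(2, 9):
    expected = (Fraction(n - 2, 2 * (n - 1)) if n % 2 == 0
                else Fraction(n - 1, 2 * n))
    assert p_min(n) == expected
```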
Next, letting
\[f_{\max}(k):=\tfrac{1}{2}\,1(k\in\{0,n\}),\qquad k\in\{0,\dots,n\},\]
we see that the nonnegative function \(f_{\max}\) satisfies conditions (I) and (II), and also \(\sum_{k=0}^{n}a_{n,k}f_{\max}(k)=1\). On the other hand, for any nonnegative function \(f\) satisfying conditions (I) and (II), the sum \(\sum_{k=0}^{n}a_{n,k}f(k)\) is a probability and hence does not exceed \(1\). We conclude that \(f_{\max}\) is a maximizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding maximum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[p_{n,\max}=1.\]
The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the maximizer \(f_{\max}\) can be described as follows: the random set \(I=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(\{\emptyset,[n]\}\); that is, \(\operatorname{\mathsf{P}}(I=\emptyset)=\frac{1}{2}=\operatorname{\mathsf{P}}(I=[n])\).
Now note that the set of all values of \(\sum_{k=0}^{n}a_{n,k}f(k)\), where \(f\colon\{0,\dots,n\}\to\mathbb{R}\) is a nonnegative function such that conditions (I) and (II) hold, is convex and therefore coincides with the interval \([p_{n,\textrm{min}},p_{n,\max}]=[p_{n,\textrm{min}},1]\).
Thus, a value \(p\in(0,1)\) is symmetric-binary-good if and only if
\[p\geq p_{n},\]
where \(p_{n}\) is as in (5).
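Putting the two thresholds side by side (the good threshold \(\frac{n-2}{2(n-1)}\), obtained from \(\rho\geq-1/(n-1)\) via (4), and \(p_{n}\) equal to \(\frac{n-2}{2(n-1)}\) for even \(n\) and \(\frac{n-1}{2n}\) for odd \(n\)) makes the even/odd dichotomy and the size of the odd-\(n\) gap explicit; a sketch:

```python
from fractions import Fraction

def p_good(n):
    # smallest good p, from rho >= -1/(n-1) and p = (1 + rho)/2
    return Fraction(n - 2, 2 * (n - 1))

def p_n(n):
    # smallest symmetric-binary-good p
    return Fraction(n - 2, 2 * (n - 1)) if n % 2 == 0 else Fraction(n - 1, 2 * n)

for n in range(2, 50):
    if n % 2 == 0:
        assert p_n(n) == p_good(n)       # even n: thresholds coincide
    else:
        # odd n: a gap of exactly 1/(2n(n-1)), vanishing as n grows
        assert p_n(n) - p_good(n) == Fraction(1, 2 * n * (n - 1))
```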
Because \(p_{n+1}\) is close to \(p_{n}\) for large \(n\) and in view of the correspondence (4) between \(\rho\) and \(p\), we have now confirmed that
- if \(n\) is even, then every good value of \(\rho\) is symmetric-binary-good;
- if \(n\) is odd then, for large \(n\), nearly every good value of \(\rho\) is symmetric-binary-good.
One may also note here that for large \(n\) the lower bound \(\rho_{n,\textrm{min}}\) (defined in (2)) is close to (but less than) \(0\), whereas the lower bound \(p_{n,\textrm{min}}\) is close to (but less than) \(\frac{1}{2}\).
REFERENCES
P. Baldi and R. Vershynin, ‘‘A theory of capacity and sparse neural encoding,’’ Neural Networks 143, 12–27 (2021).
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. (John Wiley and Sons, Inc., New York-London-Sydney, 1971).
R. A. Fisher, Statistical Methods for Research Workers (Oliver and Boyd, Edinburgh, 1932).
O. Kallenberg, Probabilistic Symmetries and Invariance Principles, Probability and its Applications (Springer, New York, 2005).
N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari, ‘‘Learning with noisy labels,’’ in Advances in Neural Information Processing Systems, Vol. 26, Ed. by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013).
J. Neyman and E. S. Pearson, ‘‘On the problem of the most efficient tests of statistical hypotheses,’’ Philosophical Transactions of the Royal Society of London, Series A 231, 289–337 (1933).
S. J. Press, ‘‘Structured multivariate Behrens–Fisher problems,’’ Sankhya: The Indian Journal of Statistics, Series A (1961–2002) 29 (1), 41–48 (1967).
K. Senel and E. G. Larsson, ‘‘Joint user activity and non-coherent data detection in mMTC-enabled massive MIMO using machine learning algorithms,’’ in WSA 2018; 22nd International ITG Workshop on Smart Antennas (2018), pp. 1–6.
M. S. Srivastava and M. Singull, ‘‘Testing sphericity and intraclass covariance structures under a growth curve model in high dimension,’’ Communications in Statistics—Simulation and Computation 46 (7), 5740–5751 (2017).
J. E. Walsh, ‘‘Concerning the Effect of Intraclass Correlation on Certain Significance Tests,’’ The Annals of Mathematical Statistics 18 (1), 88–96 (1947).
F. Zhang, W. Wang, J. Hou, J. Wang, and J. Huang, ‘‘Tensor restricted isometry property analysis for a large class of random measurement ensembles,’’ Sci. China Inf. Sci. 64 (1), 119101 (2021).
Pinelis, I. What Intraclass Covariance Structures Can Symmetric Bernoulli Random Variables Have?. Math. Meth. Stat. 31, 165–169 (2022). https://doi.org/10.3103/S1066530722040020