Abstract
The covariance matrix of random variables \(X_{1},\dots,X_{n}\) is said to have an intraclass covariance structure if the variances of all the \(X_{i}\)’s are the same and all the pairwise covariances of the \(X_{i}\)’s are the same. We provide a possibly surprising characterization of such covariance matrices in the case when the \(X_{i}\)’s are symmetric Bernoulli random variables.
For natural \(n\geq 2\), let \(\Sigma=[\Sigma_{i,j}]_{i,j\in[n]}\) be the covariance matrix of random variables (r.v.’s) \(X_{1},\dots,X_{n}\) with finite second moments, so that \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) for all \(i\) and \(j\) in the set \([n]:=\{1,\dots,n\}\). We are assuming that the matrix \(\Sigma\) is nonzero.
The covariance matrix \(\Sigma\) is said to have an intraclass covariance structure if (i) \(\Sigma_{i,i}=\operatorname{\mathsf{Var}}X_{i}=\operatorname{\mathsf{Cov}}(X_{i},X_{i})\) is the same for all \(i\in[n]\) and (ii) \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) is the same for all distinct \(i\) and \(j\) in \([n]\). Let \(\text{ICCS}_{n}\) denote the set of all \(n\times n\) covariance matrices that have an intraclass covariance structure.
In particular, if the r.v.’s \(X_{1},\dots,X_{n}\) are exchangeable—that is, if the joint distribution of the \(X_{i}\)’s is invariant with respect to all permutations of the indices \(1,\dots,n\) (see e.g., [4] for much more on exchangeability of r.v.’s)—then the covariance matrix \(\Sigma\) will be in the set \(\text{ICCS}_{n}\). So, one may say that the covariance matrix \(\Sigma\) has an intraclass covariance structure if the r.v.’s \(X_{1},\dots,X_{n}\) pertain to items that belong to one class and thus are exchangeable in a certain weak sense; this explains the use of the term ‘‘intraclass’’. The notion of an intraclass covariance structure was introduced by Fisher [3] and has been studied in many subsequent papers, including e.g., [7, 9, 10].
Obviously, the covariance matrix \(\Sigma\) is in the set \(\text{ICCS}_{n}\) if and only if
\[\Sigma=(a-b)I_{n}+b\,{\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\qquad(1)\]
for some real numbers \(a\) and \(b\), where \(I_{n}\) is the \(n\times n\) identity matrix and \({\mathsf{1}}_{n}:=[1,\dots,1]^{\top}\), the \(n\times 1\) matrix of \(1\)’s.
Recall that a real \(n\times n\) matrix is a covariance matrix if and only if it is positive semidefinite; cf. e.g., [2, Sect. III.6, Theorem 4]. Note that (i) \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(n\) and (ii) any nonzero vector orthogonal to \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(0\). So, the only eigenvalues of the matrix \(\Sigma\) of the form (1) are \(a-b+bn\) and \(a-b\).
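The spectrum just described can be confirmed numerically; the sketch below forms a matrix of the form (1) for arbitrary illustrative values of \(n\), \(a\), \(b\) (these values are assumptions of the sketch, not taken from the text) and inspects its eigenvalues:

```python
import numpy as np

# A matrix of the form (1): diagonal entries a, off-diagonal entries b.
# The values of n, a, b below are arbitrary illustrative choices.
n, a, b = 5, 2.0, 0.5
Sigma = (a - b) * np.eye(n) + b * np.ones((n, n))

# Expected spectrum: a - b with multiplicity n - 1, and a - b + b*n (simple).
eigs = np.sort(np.linalg.eigvalsh(Sigma))
print(eigs)
```

With these values the output is \(1.5\) four times and \(4.0\) once, matching \(a-b\) and \(a-b+bn\).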
It follows that the matrix \(\Sigma\) of the form (1) is in \(\text{ICCS}_{n}\) if and only if \(-\frac{a}{n-1}\leq b\leq a\), that is, if and only if the pairwise correlation, \(\rho=b/a\), between r.v.’s whose covariance matrix has an intraclass covariance structure is no less than \(-1/(n-1)\):
\[\rho\geq\rho_{n,\textrm{min}}:=-\frac{1}{n-1}.\qquad(2)\]
This is in contrast with the general lower bound \(-1\) on the correlation between arbitrary r.v.’s. Let us refer to the values of \(\rho\) satisfying condition (2) as good.
In the rest of this note, we shall consider the special case when the r.v.’s \(X_{1},\dots,X_{n}\) are symmetric Bernoulli, so that
\[\operatorname{\mathsf{P}}(X_{i}=0)=\operatorname{\mathsf{P}}(X_{i}=1)=\tfrac{1}{2}\qquad(3)\]
for all \(i\in[n]\). This important case has been extensively studied in computer science in general and in machine learning in particular (see e.g., [1, 5, 8, 11]), as well as in other applications of probability theory—though mainly when the \(X_{i}\)’s are independent.
The question now is the following:
For what values of \(\rho\) do there exist symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) whose covariance matrix is in \(\text{ICCS}_{n}\), with all the pairwise correlations equal to \(\rho\)?
Let us refer to such values of \(\rho\) as symmetric-binary-good. Clearly, any symmetric-binary-good value of \(\rho\) must be good. One then may wonder whether every good value of \(\rho\) is symmetric-binary-good.
The answer to this question may seem surprising:
- if \(n\) is even, then yes, every good value of \(\rho\) is symmetric-binary-good;
- if \(n\) is odd, then ‘‘nearly every’’ good value of \(\rho\) is symmetric-binary-good.
For symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) whose covariance matrix \(\Sigma\) is in \(\text{ICCS}_{n}\), it is a bit more convenient to deal with the probability
\[p:=\operatorname{\mathsf{P}}(X_{1}=X_{2})\]
than with the correlation \(\rho\). It is easy to see that the values of \(\rho\) and \(p\) are in the simple bijective correspondence
\[p=\frac{1+\rho}{2},\quad\text{that is,}\quad\rho=2p-1,\qquad(4)\]
so that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\).
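The correspondence (4) can be checked exactly for a single exchangeable pair: symmetric Bernoulli marginals together with \(\operatorname{\mathsf{P}}(X_{1}=X_{2})=p\) force the joint pmf used below, and exact rational arithmetic recovers \(\rho=2p-1\) (the particular value of \(p\) is an arbitrary illustrative choice):

```python
from fractions import Fraction

p = Fraction(3, 10)  # arbitrary illustrative value of p in (0, 1)
# Joint pmf of a symmetric Bernoulli pair (X, Y) with P(X = Y) = p:
pmf = {(0, 0): p / 2, (1, 1): p / 2,
       (0, 1): (1 - p) / 2, (1, 0): (1 - p) / 2}

EX = sum(prob * x for (x, y), prob in pmf.items())        # = 1/2
EXY = sum(prob * x * y for (x, y), prob in pmf.items())   # = p/2
var = EX - EX**2                                          # = 1/4
rho = (EXY - EX * EX) / var
assert rho == 2 * p - 1   # the correspondence (4), exactly
```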
Let us refer to the values of \(p\) corresponding to the good values of \(\rho\) as good values of \(p\), and let us similarly define the symmetric-binary-good values of \(p\). So, in view of (2) and (4), a value \(p\in(0,1)\) is good if and only if
\[p\geq\frac{1}{2}\Bigl(1-\frac{1}{n-1}\Bigr)=\frac{n-2}{2(n-1)}.\]
Thus, we have to determine the symmetric-binary-good values of \(p\).
Suppose for a moment that \(p\in(0,1)\) is symmetric-binary-good. Then there exist symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) such that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\). Letting \(g\) stand for the joint probability mass function of the r.v.’s \(X_{1},\dots,X_{n}\), we note that \(g\) is a nonnegative function such that
(i) \(\sum_{x\in\{0,1\}^{n}}g(x)=1\),
(ii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=0)g(x)=\frac{1}{2}\) for all \(i\in[n]\),
(iii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=x_{j})g(x)=p\) for all distinct \(i\) and \(j\) in \([n]\);
of course, here \(x_{i}\) denotes the \(i\)th coordinate of the vector \(x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}\). By symmetry, conditions (i)–(iii) will hold with \(\tilde{g}(x):=\frac{1}{n!}\sum_{\pi\in\Pi_{n}}g(\pi(x))\) in place of \(g(x)\), where \(\Pi_{n}\) is the set of all permutations of the set \([n]\). Note that \(\tilde{g}(x)=f(\sum_{1}^{n}x_{i})\) for some nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) and all \(x\in\{0,1\}^{n}\). So, conditions (i)–(iii) can be rewritten as
(I) \(\sum_{k=0}^{n}\binom{n}{k}f(k)=1\),
(II) \(\sum_{k=0}^{n}\binom{n-1}{k}f(k)=\frac{1}{2}\),
(III) \(\sum_{k=0}^{n}a_{n,k}f(k)=p\),
where
\[a_{n,k}:=\binom{n-2}{k}+\binom{n-2}{k-2};\]
of course, \(\binom{n-1}{n}=0\), \(\binom{n-2}{k}=0\) if \(k\geq n-1\) and \(\binom{n-2}{k-2}=0\) if \(k\leq 1\).
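The counting behind \(a_{n,k}\) can be verified by brute force: \(a_{n,k}\) is the number of \(x\in\{0,1\}^{n}\) with coordinate sum \(k\) and \(x_{1}=x_{2}\), which the sketch below (for a small illustrative \(n\)) compares with \(\binom{n-2}{k}+\binom{n-2}{k-2}\):

```python
from itertools import product
from math import comb

def a_count(n, k):
    # number of x in {0,1}^n with coordinate sum k and x_1 = x_2
    return sum(1 for x in product((0, 1), repeat=n)
               if sum(x) == k and x[0] == x[1])

n = 6  # small illustrative value
for k in range(n + 1):
    # math.comb raises on a negative lower index, so guard k - 2 < 0
    closed_form = comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)
    assert a_count(n, k) == closed_form
```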
Thus, for any given \(n\geq 2\) and \(p\in(0,1)\), we want to see whether there is a nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) such that conditions (I)–(III) hold.
Towards this goal, consider the problem of finding the extrema of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), where \(F_{n}\) is the set of all nonnegative functions \(f\colon\{0,\dots,n\}\to\mathbb{R}\) satisfying condition (I). In view of the symmetries \(\binom{n}{k}=\binom{n}{n-k}\) and \(a_{n,k}=a_{n,n-k}\), without loss of generality the functions \(f\) are symmetric in the same sense: \(f(k)=f(n-k)\) for all \(k\in\{0,\dots,n\}\)—otherwise, replacing \(f(k)\) by \(\frac{1}{2}\,(f(k)+f(n-k))\), we will have the sums in (I) and (III) unchanged. Next, consider the ratios
\[r_{k}:=\frac{a_{n,k}}{\binom{n}{k}},\qquad k\in\{0,\dots,n\}.\]
Note that \(r_{k+1}\leq r_{k}\) if \(0\leq k\leq\frac{n-1}{2}\) and \(r_{k+1}\geq r_{k}\) if \(\frac{n-1}{2}\leq k\leq n-1\). Also, \(r_{k}=r_{n-k}\). So, the smallest of the \(r_{k}\)’s are those with index \(k\) closest to \(\frac{n}{2}\).
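The stated symmetry and the two monotonicity ranges for the \(r_{k}\)’s can be checked exactly with rational arithmetic; a sketch for one illustrative even and one odd \(n\):

```python
from fractions import Fraction
from math import comb

def r(n, k):
    # r_k = a_{n,k} / C(n,k), computed exactly as a Fraction
    a = comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)
    return Fraction(a, comb(n, k))

for n in (8, 9):  # illustrative even and odd cases
    # symmetry r_k = r_{n-k}
    assert all(r(n, k) == r(n, n - k) for k in range(n + 1))
    # nonincreasing for 0 <= k <= (n-1)/2, nondecreasing afterwards
    assert all(r(n, k + 1) <= r(n, k) for k in range((n - 1) // 2 + 1))
    assert all(r(n, k + 1) >= r(n, k) for k in range(n // 2, n))
```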
More specifically, if \(n=2m-1\) is odd, then \(r_{k}\geq r_{m}=r_{m-1}\) for all \(k\in\{1,\dots,n-1\}\). Letting then
\[f^{\textrm{odd}}_{\textrm{min}}(k):=\frac{1(k\in\{m-1,m\})}{2\binom{n}{m}},\qquad k\in\{0,\dots,n\},\]
we see that \(f^{\textrm{odd}}_{\textrm{min}}\) is a symmetric function in \(F_{n}\) and
\[a_{n,k}f(k)=r_{k}\binom{n}{k}f(k)\geq r_{m}\binom{n}{k}f(k)\]
for all \(k\in\{0,\dots,n\}\) and all symmetric functions \(f\in F_{n}\), which implies
\[\sum_{k=0}^{n}a_{n,k}f(k)\geq r_{m}\sum_{k=0}^{n}\binom{n}{k}f(k)=r_{m}=\sum_{k=0}^{n}a_{n,k}f^{\textrm{odd}}_{\textrm{min}}(k).\]
It follows that \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), that is, over all nonnegative \(f\) satisfying condition (I). Moreover, condition (II) is satisfied with \(f^{\textrm{odd}}_{\textrm{min}}\) in place of \(f\).
We conclude that, in the case when \(n=2m-1\) is odd, \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[r_{m}=\frac{m-1}{2m-1}=\frac{n-1}{2n}.\]
Similarly, in the case when \(n=2m\) is even, a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II) is given by
\[f^{\textrm{even}}_{\textrm{min}}(k):=\frac{1(k=m)}{\binom{n}{m}},\qquad k\in\{0,\dots,n\},\]
and the corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[r_{m}=\frac{2\binom{n-2}{m}}{\binom{n}{m}}=\frac{n-2}{2(n-1)}.\]
The above minimization can of course be recognized as something similar to, or even a special case of, the Neyman–Pearson lemma [6, part III].
The just considered cases of odd and even \(n\) can be summarized as follows. For
\[m_{n}:=\lfloor n/2\rfloor,\]
let \(f_{\textrm{min}}\) be the symmetric function in \(F_{n}\) such that \(\sum_{k\in\{m_{n},n-m_{n}\}}\binom{n}{k}f_{\textrm{min}}(k)=1,\) so that \(f_{\textrm{min}}(k)=0\) for \(k\in\{0,\dots,n\}\setminus\{m_{n},n-m_{n}\}\). Then \(f_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[p_{n,\textrm{min}}=p_{n}:=\frac{a_{n,m_{n}}}{\binom{n}{m_{n}}}=\begin{cases}\dfrac{n-2}{2(n-1)}&\text{if }n\text{ is even},\\[1ex]\dfrac{n-1}{2n}&\text{if }n\text{ is odd}.\end{cases}\qquad(5)\]
The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the minimizer \(f_{\textrm{min}}\) can be described as follows: the random set \(I:=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(S_{n}:=\binom{[n]}{m_{n}}\cup\binom{[n]}{n-m_{n}}\), where \(\binom{[n]}{k}\) denotes the set of all subsets of cardinality \(k\) of the set \([n]\); of course, \(S_{n}:=\binom{[n]}{n/2}\) if \(n\) is even.
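The extremal distribution just described can be checked by exact enumeration: drawing \(I\) uniformly from \(S_{n}\) and computing \(\operatorname{\mathsf{P}}(X_{i}=X_{j})\) recovers the minimum value, here compared against the closed forms \(\frac{n-2}{2(n-1)}\) for even \(n\) and \(\frac{n-1}{2n}\) for odd \(n\); a sketch:

```python
from fractions import Fraction
from itertools import combinations

def p_min(n):
    # I uniform on S_n: subsets of {0,...,n-1} of size m_n or n - m_n
    m = n // 2
    support = [frozenset(c) for s in {m, n - m}
               for c in combinations(range(n), s)]
    # X_i = 1(i in I); by exchangeability any fixed pair of indices works
    hits = sum(1 for I in support if (0 in I) == (1 in I))
    return Fraction(hits, len(support))

for n in range(2, 9):
    expected = (Fraction(n - 2, 2 * (n - 1)) if n % 2 == 0
                else Fraction(n - 1, 2 * n))
    assert p_min(n) == expected
```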
Next, letting
\[f_{\max}(k):=\tfrac{1}{2}\,1(k\in\{0,n\}),\qquad k\in\{0,\dots,n\},\]
we see that the nonnegative function \(f_{\max}\) satisfies conditions (I) and (II), and also \(\sum_{k=0}^{n}a_{n,k}f_{\max}(k)=1\). On the other hand, for any nonnegative function \(f\) satisfying conditions (I) and (II), the sum \(\sum_{k=0}^{n}a_{n,k}f(k)\) is a probability and hence does not exceed \(1\). We conclude that \(f_{\max}\) is a maximizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding maximum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is
\[p_{n,\max}=1.\]
The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the maximizer \(f_{\max}\) can be described as follows: the random set \(I=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(\{\emptyset,[n]\}\); that is, \(\operatorname{\mathsf{P}}(I=\emptyset)=\frac{1}{2}=\operatorname{\mathsf{P}}(I=[n])\).
Now note that the set of all values of \(\sum_{k=0}^{n}a_{n,k}f(k)\), where \(f\colon\{0,\dots,n\}\to\mathbb{R}\) is a nonnegative function such that conditions (I) and (II) hold, is convex and therefore coincides with the interval \([p_{n,\textrm{min}},p_{n,\max}]=[p_{n,\textrm{min}},1]\).
Thus, a value \(p\in(0,1)\) is symmetric-binary-good if and only if
\[p\geq p_{n},\]
where \(p_{n}\) is as in (5).
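Putting the two thresholds side by side (the good threshold \(\frac{n-2}{2(n-1)}\), obtained from \(\rho\geq-1/(n-1)\) via (4), and \(p_{n}\) equal to \(\frac{n-2}{2(n-1)}\) for even \(n\) and \(\frac{n-1}{2n}\) for odd \(n\)) makes the even/odd dichotomy and the size of the odd-\(n\) gap explicit; a sketch:

```python
from fractions import Fraction

def p_good(n):
    # smallest good p, from rho >= -1/(n-1) and p = (1 + rho)/2
    return Fraction(n - 2, 2 * (n - 1))

def p_n(n):
    # smallest symmetric-binary-good p
    return Fraction(n - 2, 2 * (n - 1)) if n % 2 == 0 else Fraction(n - 1, 2 * n)

for n in range(2, 50):
    if n % 2 == 0:
        assert p_n(n) == p_good(n)       # even n: thresholds coincide
    else:
        # odd n: a gap of exactly 1/(2n(n-1)), vanishing as n grows
        assert p_n(n) - p_good(n) == Fraction(1, 2 * n * (n - 1))
```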
Because \(p_{n+1}\) is close to \(p_{n}\) for large \(n\) and in view of the correspondence (4) between \(\rho\) and \(p\), we have now confirmed that
- if \(n\) is even, then every good value of \(\rho\) is symmetric-binary-good;
- if \(n\) is odd then, for large \(n\), nearly every good value of \(\rho\) is symmetric-binary-good.
One may also note here that for large \(n\) the lower bound \(\rho_{n,\textrm{min}}\) (defined in (2)) is close to (but less than) \(0\), whereas the lower bound \(p_{n,\textrm{min}}\) is close to (but less than) \(\frac{1}{2}\).
REFERENCES
P. Baldi and R. Vershynin, ‘‘A theory of capacity and sparse neural encoding,’’ Neural Networks 143, 12–27 (2021).
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. (John Wiley and Sons, Inc., New York-London-Sydney, 1971).
R. A. Fisher, Statistical Methods for Research Workers (Oliver and Boyd, Edinburgh, 1932).
O. Kallenberg, Probabilistic Symmetries and Invariance Principles, Probability and its Applications (Springer, New York, 2005).
N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari, ‘‘Learning with noisy labels,’’ in Advances in Neural Information Processing Systems, Vol. 26, Ed. by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger (Curran Associates, Inc., 2013).
J. Neyman and E. S. Pearson, ‘‘On the problem of the most efficient tests of statistical hypotheses,’’ Philosophical Transactions of the Royal Society of London, Series A 231, 289–337 (1933).
S. J. Press, ‘‘Structured multivariate Behrens–Fisher problems,’’ Sankhya: The Indian Journal of Statistics, Series A (1961–2002) 29 (1), 41–48 (1967).
K. Senel and E. G. Larsson, ‘‘Joint user activity and non-coherent data detection in mMTC-enabled massive MIMO using machine learning algorithms,’’ in WSA 2018; 22nd International ITG Workshop on Smart Antennas (2018), pp. 1–6.
M. S. Srivastava and M. Singull, ‘‘Testing sphericity and intraclass covariance structures under a growth curve model in high dimension,’’ Communications in Statistics—Simulation and Computation 46 (7), 5740–5751 (2017).
J. E. Walsh, ‘‘Concerning the Effect of Intraclass Correlation on Certain Significance Tests,’’ The Annals of Mathematical Statistics 18 (1), 88–96 (1947).
F. Zhang, W. Wang, J. Hou, J. Wang, and J. Huang, ‘‘Tensor restricted isometry property analysis for a large class of random measurement ensembles,’’ Sci. China Inf. Sci. 64 (1), 119101 (2021).
Pinelis, I. What Intraclass Covariance Structures Can Symmetric Bernoulli Random Variables Have?. Math. Meth. Stat. 31, 165–169 (2022). https://doi.org/10.3103/S1066530722040020