For natural \(n\geq 2\), let \(\Sigma=[\Sigma_{i,j}]_{i,j\in[n]}\) be the covariance matrix of random variables (r.v.’s) \(X_{1},\dots,X_{n}\) with finite second moments, so that \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) for all \(i\) and \(j\) in the set \([n]:=\{1,\dots,n\}\). We are assuming that the matrix \(\Sigma\) is nonzero.

The covariance matrix \(\Sigma\) is said to have an intraclass covariance structure if (i) \(\Sigma_{i,i}=\operatorname{\mathsf{Var}}X_{i}=\operatorname{\mathsf{Cov}}(X_{i},X_{i})\) is the same for all \(i\in[n]\) and (ii) \(\Sigma_{i,j}=\operatorname{\mathsf{Cov}}(X_{i},X_{j})\) is the same for all distinct \(i\) and \(j\) in \([n]\). Let \(\text{ICCS}_{n}\) denote the set of all \(n\times n\) covariance matrices that have an intraclass covariance structure.

In particular, if the r.v.’s \(X_{1},\dots,X_{n}\) are exchangeable—that is, if the joint distribution of the \(X_{i}\)’s is invariant with respect to all permutations of the indices \(1,\dots,n\) (see e.g., [4] for much more on exchangeability of r.v.’s)—then the covariance matrix \(\Sigma\) will be in the set \(\text{ICCS}_{n}\). So, one may say that the covariance matrix \(\Sigma\) has an intraclass covariance structure if the r.v.’s \(X_{1},\dots,X_{n}\) pertain to items that belong to one class and thus are exchangeable in a certain weak sense; this explains the use of the term ‘‘intraclass’’. The notion of an intraclass covariance structure was introduced by Fisher [3] and has been studied in many subsequent papers, including e.g., [7, 9, 10].

Obviously, the covariance matrix \(\Sigma\) is in the set \(\text{ICCS}_{n}\) if and only if

$$\Sigma=(a-b)I_{n}+b\,{\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}$$
(1)

for some real numbers \(a\) and \(b\), where \(I_{n}\) is the \(n\times n\) identity matrix and \({\mathsf{1}}_{n}:=[1,\dots,1]^{\top}\), the \(n\times 1\) matrix of \(1\)’s.

Recall that a real \(n\times n\) matrix is a covariance matrix if and only if it is positive semidefinite; cf. e.g., [2, Sect. III.6, Theorem 4]. Note that (i) \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(n\) and (ii) any nonzero vector orthogonal to \({\mathsf{1}}_{n}\) is an eigenvector of the matrix \({\mathsf{1}}_{n}{\mathsf{1}}_{n}^{\top}\) belonging to the eigenvalue \(0\). So, the only eigenvalues of the matrix \(\Sigma\) of the form (1) are \(a-b+bn\) and \(a-b\).
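The two eigenvalue claims (i) and (ii) are easy to confirm numerically. Here is a minimal Python sketch with exact rational arithmetic (the helper names `sigma` and `matvec` are ours, introduced only for this check):

```python
from fractions import Fraction

def sigma(n, a, b):
    """Intraclass matrix (a - b) I_n + b 1_n 1_n^T, as a list of rows."""
    return [[a if i == j else b for j in range(n)] for i in range(n)]

def matvec(M, v):
    """Matrix-vector product over exact rationals."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

n, a, b = 5, Fraction(3), Fraction(-1, 2)
S = sigma(n, a, b)

# 1_n is an eigenvector with eigenvalue a - b + b n = a + (n - 1) b
ones = [Fraction(1)] * n
assert matvec(S, ones) == [(a + (n - 1) * b) * x for x in ones]

# e_1 - e_2 is orthogonal to 1_n and is an eigenvector with eigenvalue a - b
v = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (n - 2)
assert matvec(S, v) == [(a - b) * x for x in v]
```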

It follows that the matrix \(\Sigma\) of the form (1) is in \(\text{ICCS}_{n}\) if and only if \(-\frac{a}{n-1}\leq b\leq a\) (note that then \(a>0\), since \(\Sigma\) is nonzero), that is, if and only if the pairwise correlation \(\rho=b/a\) between r.v.’s whose covariance matrix has an intraclass covariance structure is no less than \(-1/(n-1)\):

$$\rho\geq\rho_{n,\textrm{min}}:=-\frac{1}{n-1}.$$
(2)

This is in contrast with the general lower bound \(-1\) on the correlation between arbitrary r.v.’s. Let us refer to the values of \(\rho\) satisfying condition (2) as good.

In the rest of this note, we shall consider the special case when the r.v.’s \(X_{1},\dots,X_{n}\) are symmetric Bernoulli, so that

$$\operatorname{\mathsf{P}}(X_{i}=1)=\tfrac{1}{2}=\operatorname{\mathsf{P}}(X_{i}=0)$$
(3)

for all \(i\in[n]\). This important case has been extensively studied in computer science in general and in machine learning in particular (see e.g., [1, 5, 8, 11]), as well as in other applications of probability theory—though mainly when the \(X_{i}\)’s are independent.

The question now is the following:

$$\begin{gathered}\text{For what values of the pairwise correlation }\rho\text{ do there exist symmetric Bernoulli r.v.'s}\\ X_{1},\dots,X_{n}\text{ whose covariance matrix }\Sigma\text{ is in }\text{ICCS}_{n}\text{?}\end{gathered}$$

Let us refer to such values of \(\rho\) as symmetric-binary-good. Clearly, any symmetric-binary-good value of \(\rho\) must be good. One then may wonder whether every good value of \(\rho\) is symmetric-binary-good.

The answer to this question may seem surprising:

  • if \(n\) is even, then yes, every good value of \(\rho\) is symmetric-binary-good;

  • if \(n\) is odd, then ‘‘nearly every’’ good value of \(\rho\) is symmetric-binary-good.

For symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) whose covariance matrix \(\Sigma\) is in \(\text{ICCS}_{n}\), it is a bit more convenient to deal with the probability

$$p:=\operatorname{\mathsf{P}}(X_{1}=X_{2})$$

than with the correlation \(\rho\). It is easy to see that the values of \(\rho\) and \(p\) are in the simple bijective correspondence

$$(-1,1)\ni 2p-1=\rho\longleftrightarrow p=\frac{1+\rho}{2}\in(0,1),$$
(4)

so that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\).
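The correspondence (4) can be verified directly from the joint distribution of a symmetric Bernoulli pair: if \(\operatorname{\mathsf{P}}(X_{1}=X_{2})=p\), the exchangeable joint pmf assigns mass \(p/2\) to each of \((0,0),(1,1)\) and \((1-p)/2\) to each of \((0,1),(1,0)\). A Python sketch with exact rationals (the function name `corr_from_agreement` is a hypothetical helper of ours):

```python
from fractions import Fraction

def corr_from_agreement(p):
    """For a symmetric Bernoulli pair with P(X=Y)=p, build the joint pmf
    P(1,1)=P(0,0)=p/2, P(0,1)=P(1,0)=(1-p)/2 and return the correlation."""
    pmf = {(1, 1): p / 2, (0, 0): p / 2,
           (0, 1): (1 - p) / 2, (1, 0): (1 - p) / 2}
    ex = sum(x * m for (x, y), m in pmf.items())       # E X = 1/2
    exy = sum(x * y * m for (x, y), m in pmf.items())  # E XY = p/2
    var = ex - ex ** 2                                 # Var X = 1/4
    return (exy - ex * ex) / var

p = Fraction(3, 5)
assert corr_from_agreement(p) == 2 * p - 1             # rho = 2p - 1, as in (4)
```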

Let us refer to the values of \(p\) corresponding to the good values of \(\rho\) as good values of \(p\), and let us similarly define the symmetric-binary-good values of \(p\). So, in view of (2) and (4), a value \(p\in(0,1)\) is good if and only if

$$p\geq p_{n}:=\frac{n-2}{2(n-1)}.$$
(5)

Thus, we have to determine the symmetric-binary-good values of \(p\).

Suppose for a moment that \(p\in(0,1)\) is symmetric-binary-good. Then there exist symmetric Bernoulli r.v.’s \(X_{1},\dots,X_{n}\) such that \(\operatorname{\mathsf{P}}(X_{i}=X_{j})=p\) for all distinct \(i\) and \(j\) in \([n]\). Letting \(g\) stand for the joint probability mass function of the r.v.’s \(X_{1},\dots,X_{n}\), we note that \(g\) is a nonnegative function such that

(i) \(\sum_{x\in\{0,1\}^{n}}g(x)=1\),

(ii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=0)g(x)=\frac{1}{2}\) for all \(i\in[n]\),

(iii) \(\sum_{x\in\{0,1\}^{n}}1(x_{i}=x_{j})g(x)=p\) for all distinct \(i\) and \(j\) in \([n]\);

of course, here \(x_{i}\) denotes the \(i\)th coordinate of the vector \(x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}\). By symmetry, conditions (i)–(iii) will hold with \(\tilde{g}(x):=\frac{1}{n!}\sum_{\pi\in\Pi_{n}}g(\pi(x))\) in place of \(g(x)\), where \(\Pi_{n}\) is the set of all permutations of the set \([n]\). Note that \(\tilde{g}(x)=f(\sum_{1}^{n}x_{i})\) for some nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) and all \(x\in\{0,1\}^{n}\). So, conditions (i)–(iii) can be rewritten as

(I) \(\sum_{k=0}^{n}\binom{n}{k}f(k)=1\),

(II) \(\sum_{k=0}^{n}\binom{n-1}{k}f(k)=\frac{1}{2}\),

(III) \(\sum_{k=0}^{n}a_{n,k}f(k)=p\),

where

$$a_{n,k}=\binom{n-2}{k}+\binom{n-2}{k-2};$$

of course, \(\binom{n-1}{n}=0\), \(\binom{n-2}{k}=0\) if \(k\geq n-1\) and \(\binom{n-2}{k-2}=0\) if \(k\leq 1\).
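The coefficient \(a_{n,k}\) counts the vectors \(x\in\{0,1\}^{n}\) with \(\sum_{1}^{n}x_{i}=k\) and \(x_{i}=x_{j}\) for a fixed pair of distinct indices: the two binomial terms correspond to the cases \(x_{i}=x_{j}=0\) and \(x_{i}=x_{j}=1\). This count is easy to confirm by brute force; a Python sketch (the helper name `a` is ours):

```python
from itertools import product
from math import comb

def a(n, k):
    """a_{n,k} = C(n-2,k) + C(n-2,k-2), with out-of-range binomials taken as 0.
    (math.comb already returns 0 for k > n; negative k must be guarded.)"""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

n = 6
for k in range(n + 1):
    # count x in {0,1}^n with sum(x) = k and x_1 = x_2
    count = sum(1 for x in product((0, 1), repeat=n)
                if sum(x) == k and x[0] == x[1])
    assert count == a(n, k)
```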

Thus, for any given \(n\geq 2\) and \(p\in(0,1)\), we want to see whether there is a nonnegative function \(f\colon\{0,\dots,n\}\to\mathbb{R}\) such that conditions (I)–(III) hold.

Towards this goal, consider the problem of finding the extrema of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), where \(F_{n}\) is the set of all nonnegative functions \(f\colon\{0,\dots,n\}\to\mathbb{R}\) satisfying condition (I). In view of the symmetries \(\binom{n}{k}=\binom{n}{n-k}\) and \(a_{n,k}=a_{n,n-k}\), without loss of generality the functions \(f\) are symmetric in the same sense: \(f(k)=f(n-k)\) for all \(k\in\{0,\dots,n\}\)—otherwise, replacing \(f(k)\) by \(\frac{1}{2}\,(f(k)+f(n-k))\), we will have the sums in (I) and (III) unchanged. Next, consider the ratios

$$r_{k}:=r_{n,k}:=\frac{a_{n,k}}{\binom{n}{k}}=\frac{(n-k)(n-k-1)+k(k-1)}{n(n-1)}.$$

Note that \(r_{k+1}\leq r_{k}\) if \(0\leq k\leq\frac{n-1}{2}\) and \(r_{k+1}\geq r_{k}\) if \(\frac{n-1}{2}\leq k\leq n-1\). Also, \(r_{k}=r_{n-k}\). So, the smallest among the \(r_{k}\)’s is/are the one/ones with index/indices \(k\) closest to \(\frac{n}{2}\).
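The closed form for \(r_{k}\), its symmetry, and its monotonicity pattern can likewise be confirmed numerically (a Python sketch with exact rationals; the helpers `r` and `a` are ours):

```python
from fractions import Fraction
from math import comb

def r(n, k):
    """r_{n,k} = ((n-k)(n-k-1) + k(k-1)) / (n(n-1))."""
    return Fraction((n - k) * (n - k - 1) + k * (k - 1), n * (n - 1))

def a(n, k):
    """a_{n,k} with out-of-range binomials taken as 0."""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

for n in (5, 6, 9):
    for k in range(n + 1):
        assert r(n, k) == Fraction(a(n, k), comb(n, k))   # r_k = a_{n,k} / C(n,k)
        assert r(n, k) == r(n, n - k)                     # symmetry
    # r_k is nonincreasing up to (n-1)/2 and nondecreasing after it
    assert all(r(n, k + 1) <= r(n, k) for k in range(n) if 2 * k <= n - 1)
    assert all(r(n, k + 1) >= r(n, k) for k in range(n) if 2 * k >= n - 1)
```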

More specifically, if \(n=2m-1\) is odd, then \(r_{k}\geq r_{m}=r_{m-1}\) for all \(k\in\{0,\dots,n\}\). Letting then

$$f^{\textrm{odd}}_{\textrm{min}}(m-1):=\frac{1/2}{\binom{n}{m-1}}=\frac{1/2}{\binom{n}{m}},\quad f^{\textrm{odd}}_{\textrm{min}}(m):=\frac{1/2}{\binom{n}{m}}=\frac{1/2}{\binom{n}{m-1}},$$
$$f^{\textrm{odd}}_{\textrm{min}}(k):=0\quad\text{for all}\quad k\in\{0,\dots,n\}\setminus\{m-1,m\},$$

we see that \(f^{\textrm{odd}}_{\textrm{min}}\) is a symmetric function in \(F_{n}\) and

$$(r_{k}-r_{m})(f^{\textrm{odd}}_{\textrm{min}}(k)-f(k))\leq 0$$

for all \(k\in\{0,\dots,n\}\) and all symmetric functions \(f\in F_{n}\), which implies

$$\sum_{k=0}^{n}a_{n,k}f^{\textrm{odd}}_{\textrm{min}}(k)-\sum_{k=0}^{n}a_{n,k}f(k)=\sum_{k=0}^{n}a_{n,k}(f^{\textrm{odd}}_{\textrm{min}}(k)-f(k))$$
$${}=\sum_{k=0}^{n}\binom{n}{k}r_{k}(f^{\textrm{odd}}_{\textrm{min}}(k)-f(k))$$
$${}=\sum_{k=0}^{n}\binom{n}{k}(r_{k}-r_{m})(f^{\textrm{odd}}_{\textrm{min}}(k)-f(k))\leq 0.$$

It follows that \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all \(f\in F_{n}\), that is, over all nonnegative \(f\) satisfying condition (I). Moreover, condition (II) is satisfied with \(f^{\textrm{odd}}_{\textrm{min}}\) in place of \(f\).

We conclude that, in the case when \(n=2m-1\) is odd, \(f^{\textrm{odd}}_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is

$$p^{\textrm{odd}}_{n,\textrm{min}}:=\sum_{k=0}^{n}a_{n,k}f^{\textrm{odd}}_{\textrm{min}}(k)=\frac{m-1}{2m-1}=\frac{n-1}{2n}.$$
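For a concrete instance, one can check conditions (I) and (II) and the value \(p^{\textrm{odd}}_{n,\textrm{min}}\) for the minimizer \(f^{\textrm{odd}}_{\textrm{min}}\) with, say, \(m=4\), \(n=7\) (a Python sketch with exact rationals; the helper names are ours):

```python
from fractions import Fraction
from math import comb

def a(n, k):
    """a_{n,k} with out-of-range binomials taken as 0."""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

m = 4
n = 2 * m - 1                                   # n = 7, odd
f = {k: Fraction(0) for k in range(n + 1)}
f[m - 1] = f[m] = Fraction(1, 2) / comb(n, m)   # the claimed minimizer f^odd_min

assert sum(comb(n, k) * f[k] for k in f) == 1                        # condition (I)
assert sum(comb(n - 1, k) * f[k] for k in f) == Fraction(1, 2)       # condition (II)
assert sum(a(n, k) * f[k] for k in f) == Fraction(m - 1, 2 * m - 1)  # = (n-1)/(2n)
```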

Similarly, in the case when \(n=2m\) is even, a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying both conditions (I) and (II) is given by

$$f^{\textrm{even}}_{\textrm{min}}(m):=\frac{1}{\binom{n}{m}}\quad\text{and}\quad f^{\textrm{even}}_{\textrm{min}}(k):=0\ \ \,\text{for all}\ \,k\in\{0,\dots,n\}\setminus\{m\},$$

and the corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is

$$p^{\textrm{even}}_{n,\textrm{min}}:=\sum_{k=0}^{n}a_{n,k}f^{\textrm{even}}_{\textrm{min}}(k)=\frac{m-1}{2m-1}=\frac{n-2}{2(n-1)}.$$

The above minimization can of course be recognized as something similar to, or even a special case of, the Neyman–Pearson lemma [6, part III].

The just considered cases of odd and even \(n\) can be summarized as follows. For

$$m_{n}:=\lceil n/2\rceil,$$

let \(f_{\textrm{min}}\) be the symmetric function in \(F_{n}\) such that \(\sum_{k\in\{m_{n},n-m_{n}\}}\binom{n}{k}f_{\textrm{min}}(k)=1,\) so that \(f_{\textrm{min}}(k)=0\) for \(k\in\{0,\dots,n\}\setminus\{m_{n},n-m_{n}\}\). Then \(f_{\textrm{min}}\) is a minimizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding minimum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is

$$p_{n,\textrm{min}}:=\sum_{k=0}^{n}a_{n,k}f_{\textrm{min}}(k)=\dfrac{m_{n}-1}{2m_{n}-1}.$$

The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the minimizer \(f_{\textrm{min}}\) can be described as follows: the random set \(I:=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(S_{n}:=\binom{[n]}{m_{n}}\cup\binom{[n]}{n-m_{n}}\), where \(\binom{[n]}{k}\) denotes the set of all subsets of cardinality \(k\) of the set \([n]\); of course, \(S_{n}=\binom{[n]}{n/2}\) if \(n\) is even.
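The unified minimizer and the value \(p_{n,\textrm{min}}=(m_{n}-1)/(2m_{n}-1)\) can be checked for both parities at once (a Python sketch with exact rationals; the helpers `a` and `f_min` are ours):

```python
from fractions import Fraction
from math import comb

def a(n, k):
    """a_{n,k} with out-of-range binomials taken as 0."""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

def f_min(n):
    """The symmetric minimizer supported on {m_n, n - m_n}, m_n = ceil(n/2)."""
    m = (n + 1) // 2                       # m_n = ceil(n/2)
    f = {k: Fraction(0) for k in range(n + 1)}
    support = {m, n - m}                   # one point if n is even, two if odd
    for k in support:
        f[k] = Fraction(1, len(support)) / comb(n, k)
    return f

for n in range(2, 12):
    m = (n + 1) // 2
    f = f_min(n)
    assert sum(comb(n, k) * f[k] for k in f) == 1                        # (I)
    assert sum(comb(n - 1, k) * f[k] for k in f) == Fraction(1, 2)       # (II)
    assert sum(a(n, k) * f[k] for k in f) == Fraction(m - 1, 2 * m - 1)  # p_{n,min}
```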

Next, letting

$$f_{\max}(0):=\tfrac{1}{2},\quad f_{\max}(n):=\tfrac{1}{2},\quad f_{\max}(k):=0\ \,\text{for all}\ \,k\in\{1,\dots,n-1\},$$

we see that the nonnegative function \(f_{\max}\) satisfies conditions (I) and (II), and also \(\sum_{k=0}^{n}a_{n,k}f_{\max}(k)=1\). On the other hand, for any nonnegative function \(f\) satisfying conditions (I) and (II), the sum \(\sum_{k=0}^{n}a_{n,k}f(k)\) is a probability and hence does not exceed \(1\). We conclude that \(f_{\max}\) is a maximizer of \(\sum_{k=0}^{n}a_{n,k}f(k)\) over all nonnegative \(f\) satisfying conditions (I) and (II). The corresponding maximum value of \(\sum_{k=0}^{n}a_{n,k}f(k)\) is

$$p_{n,\max}:=\sum_{k=0}^{n}a_{n,k}f_{\max}(k)=1.$$

The extremal joint distribution of the binary r.v.’s \(X_{1},\dots,X_{n}\) corresponding to the maximizer \(f_{\max}\) can be described as follows: the random set \(I=\{i\in[n]\colon X_{i}=1\}\) is uniformly distributed on the set \(\{\emptyset,[n]\}\); that is, \(\operatorname{\mathsf{P}}(I=\emptyset)=\frac{1}{2}=\operatorname{\mathsf{P}}(I=[n])\).
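Here too a quick numerical confirmation is available: \(f_{\max}\) satisfies (I) and (II) and attains the value \(1\) (a Python sketch; the helper `a` is ours):

```python
from fractions import Fraction
from math import comb

def a(n, k):
    """a_{n,k} with out-of-range binomials taken as 0."""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

for n in range(2, 10):
    f = {k: Fraction(0) for k in range(n + 1)}
    f[0] = f[n] = Fraction(1, 2)                                     # the maximizer f_max
    assert sum(comb(n, k) * f[k] for k in f) == 1                    # (I)
    assert sum(comb(n - 1, k) * f[k] for k in f) == Fraction(1, 2)   # (II)
    assert sum(a(n, k) * f[k] for k in f) == 1                       # p_{n,max} = 1
```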

Now note that the set of all values of \(\sum_{k=0}^{n}a_{n,k}f(k)\), where \(f\colon\{0,\dots,n\}\to\mathbb{R}\) is a nonnegative function such that conditions (I) and (II) hold, is convex and therefore coincides with the interval \([p_{n,\textrm{min}},p_{n,\max}]=[p_{n,\textrm{min}},1]\).

Thus, a value \(p\in(0,1)\) is symmetric-binary-good if and only if

$$p\geq p_{n,\textrm{min}}=\dfrac{m_{n}-1}{2m_{n}-1}=\begin{cases}\dfrac{n-2}{2(n-1)}=p_{n}\quad\text{if $n$ is even}\\ \dfrac{n-1}{2n}=p_{n+1}>p_{n}\quad\text{if $n$ is odd},\end{cases}$$

where \(p_{n}\) is as in (5).
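The characterization just obtained is constructive: for any symmetric-binary-good \(p\), the convex combination \(t\,f_{\textrm{min}}+(1-t)f_{\max}\) with \(t=(1-p)/(1-p_{n,\textrm{min}})\) satisfies (I)–(III). A Python sketch (the helper name `pmf_for` is a hypothetical name of ours):

```python
from fractions import Fraction
from math import comb

def a(n, k):
    """a_{n,k} with out-of-range binomials taken as 0."""
    return comb(n - 2, k) + (comb(n - 2, k - 2) if k >= 2 else 0)

def pmf_for(n, p):
    """A symmetric f satisfying (I)-(III) for p in [p_{n,min}, 1], built as the
    convex mix t*f_min + (1-t)*f_max with t = (1-p)/(1-p_{n,min})."""
    m = (n + 1) // 2                                 # m_n = ceil(n/2)
    p_min = Fraction(m - 1, 2 * m - 1)
    t = (1 - p) / (1 - p_min)
    f = {k: Fraction(0) for k in range(n + 1)}
    f[0] += (1 - t) * Fraction(1, 2)                 # f_max part
    f[n] += (1 - t) * Fraction(1, 2)
    for k in {m, n - m}:                             # f_min part
        f[k] += t * Fraction(1, len({m, n - m})) / comb(n, k)
    return f

n, p = 7, Fraction(1, 2)            # p = 1/2 >= p_{7,min} = 3/7
f = pmf_for(n, p)
assert all(v >= 0 for v in f.values())
assert sum(comb(n, k) * f[k] for k in f) == 1                    # (I)
assert sum(comb(n - 1, k) * f[k] for k in f) == Fraction(1, 2)   # (II)
assert sum(a(n, k) * f[k] for k in f) == p                       # (III)
```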

Because \(p_{n+1}\) is close to \(p_{n}\) for large \(n\) and in view of the correspondence (4) between \(\rho\) and \(p\), we have now confirmed that

  • if \(n\) is even then every good value of \(\rho\) is symmetric-binary-good;

  • if \(n\) is odd then, for large \(n\), nearly every good value of \(\rho\) is symmetric-binary-good.

One may also note here that for large \(n\) the lower bound \(\rho_{n,\textrm{min}}\) (defined in (2)) is close to (but less than) \(0\), whereas the lower bound \(p_{n,\textrm{min}}\) is close to (but less than) \(\frac{1}{2}\).