1 Introduction

A basic assumption in covariance structure analysis (CSA) is

$$\sqrt{n}(s-\sigma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varGamma), $$

where \(s=\operatorname{vech}(S)\), \(\sigma=\operatorname{vech}(\varSigma )\), and S and Σ are sample and population covariance matrices, respectively.

We are interested in the almost universal assumption that Γ is nonsingular. See, for example, Bentler (1983, p. 503, line 4), Browne (1984, R1), Satorra (1989, Theorem 5.2), and Yuan and Bentler (2001, C5). While this assumption has been used for over 30 years, there has been no discussion of its importance. Is this an assumption that almost always holds, or perhaps one that almost never holds? We will show that Γ can be singular when Σ is nonsingular, thus demonstrating that this is a real assumption.

Statistical software programs for CSA are based on theorems that assume Γ is nonsingular. The validity of standard errors and tests produced by these programs is predicated on the validity of this assumption. We show that when this assumption does not hold these programs can fail dramatically and may give no indication that a failure has occurred.

It is the user of these programs who must make this assumption. Thus, if I perform a confirmatory factor analysis using SAS, I’m the one who must assume Γ is nonsingular. Because SAS’s confirmatory factor analysis program has many users, one might automatically come to the conclusion that this assumption must be OK. This is the fifty million Frenchmen argument. Fortunately, Γ is always, or at least almost always, nonsingular. Showing this is the purpose of this paper. We wish to move something that is a matter of faith to a matter of fact.

The assumption that Γ is nonsingular is much more difficult to make than the common assumption that Σ is nonsingular. In regression analysis, for example, one tentatively assumes Σ is nonsingular. If it is not, the sample covariance matrix S must be singular and one’s program reports this. The user generally responds by using a subset of predictors for which S is nonsingular.

In CSA one’s computer cannot in general provide a red flag to warn the user that Γ may be singular. As we will show, when Γ is singular its estimate \(\hat{\varGamma}\) is generally nonsingular. As a consequence, one’s computer cannot detect the singularity of Γ from that of its estimate \(\hat{\varGamma}\).

We begin by showing Γ is always nonsingular when sampling from a nonsingular distribution, including distributions defined by density functions. In this case, Γ is nonsingular and no additional assumptions are required.

From the point of view of applications, the primary singular distributions are the discrete distributions. In this case, we give necessary and sufficient conditions for the nonsingularity of Γ and show how to use them to verify that Γ is nonsingular in specific applications, at least with high probability. Hopefully, these results will make users of CSA software much more comfortable about the nonsingularity assumption on which their software is based.

A reviewer suggested that we show what may happen when Γ is singular. In the Appendix, we show that an extensively used goodness of fit test statistic breaks down completely.

The asymptotic covariance matrix Γ also arises in structural equation modeling, where it is also assumed to be nonsingular. Our results apply to this form of analysis as well.

2 Covariance Structure Analysis

If S is a symmetric matrix, \(\operatorname{vech}(S)\) is a column vector containing the diagonal and upper diagonal elements \(s_{ij}\) of S listed in lexicographical order. All but the last line of the following theorem is a well-known result in covariance structure analysis.
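A minimal sketch of this vech convention (in Python with NumPy, purely for illustration; the function name is ours):

```python
import numpy as np

def vech(S):
    """Stack the diagonal and upper diagonal elements s_ij of the
    symmetric matrix S in lexicographical order of (i, j), i <= j."""
    m = S.shape[0]
    return np.array([S[i, j] for i in range(m) for j in range(i, m)])

# For a 2 x 2 symmetric matrix this gives (s11, s12, s22):
S = np.array([[1.0, 2.0],
              [2.0, 3.0]])
print(vech(S))  # [1. 2. 3.]
```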

Theorem 1

If

  • \(x_1,\ldots,x_n\) is a sample from a distribution \(\mathcal{D}\) with finite fourth moments, mean μ, and covariance matrix Σ,

  • \(S=\frac{1}{n}\sum(x_{i}-\bar{x})(x_{i}-\bar{x})'\),

  • \(s=\operatorname{vech}(S)\) and \(\sigma=\operatorname{vech}(\varSigma)\),

then

$$\sqrt{n}(s-\sigma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varGamma), $$

where \(\varGamma=\operatorname{cov}(\operatorname{vech}((x-\mu)(x-\mu )'))\) and x is a sample of size one from \(\mathcal{D}\).

To discuss the nonsingularity of Γ one must know how Γ, which is a population parameter, is related to the distribution sampled. This is provided by the last line of Theorem 1. We have been unable to find a reference for Theorem 1 that includes this last line. Because of this, a proof is given in the Appendix.
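The last line can be checked numerically. The following univariate Monte Carlo sketch (the standard normal distribution and the sample sizes are illustrative choices of ours, not part of the theorem) compares Γ computed from the formula with the variance of \(\sqrt{n}(s-\sigma)\) across replications:

```python
import numpy as np

rng = np.random.default_rng(0)

# With p = 1, vech((x - mu)(x - mu)') is the scalar (x - mu)^2, so the
# last line of Theorem 1 says Gamma = var((x - mu)^2).  For x ~ N(0, 1)
# this is E[x^4] - sigma^4 = 3 - 1 = 2.
gamma = np.var(rng.standard_normal(200_000) ** 2)

# Monte Carlo check: the variance of sqrt(n)*(s - sigma) over many
# replications should be close to Gamma when n is large.
n, reps = 500, 4000
x = rng.standard_normal((reps, n))
s = x.var(axis=1)                 # the 1/n (biased) sample variances
stat_var = (np.sqrt(n) * (s - 1.0)).var()
print(gamma, stat_var)            # both near 2
```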

As noted, it is generally assumed that Γ is nonsingular. We will show first that this is always the case when the distribution sampled is nonsingular and, in particular, when the distribution sampled is defined by a density function.

3 When the Distribution Sampled Is Nonsingular

We begin with a result about the zeros of a polynomial in m variables.

Lemma 1

If p(x) is a polynomial of degree d≥1 in m variables, then the zeros of p(x) have Lebesgue measure zero.

Proof

The proof is by induction. If d=1, then \(p(x)=\ell'x+c\) with \(\ell\neq0\). Because \(\ell\neq0\), the zeros of p(x) form a hyperplane, and hence have Lebesgue measure zero.

Assume the lemma holds for d=k and let p(x) be a polynomial of degree k+1 in m variables, with \(\dot{p}(x)\) denoting its gradient.

Note that the zeros of \(\dot{p}(x)\) are a closed set. Hence, the set of x such that \(\dot{p}(x)\neq0\) is open. It follows from Theorem 5-1 of Spivak (1965) that the zeros of p(x) for which \(\dot{p}(x)\neq0\) are a smooth manifold of dimension m−1.

It follows from Spivak (1965, problem 5.8) that a manifold of dimension less than m has Lebesgue measure zero. Thus, the zeros of p(x) for which \(\dot{p}(x)\neq0\) have Lebesgue measure zero.

Now consider the zeros of p(x) for which \(\dot{p}(x)=0\). Since some component of \(\dot{p}(x)\) is a polynomial of degree k in m variables, it follows from the induction hypothesis that the zeros of \(\dot{p}(x)\) have Lebesgue measure zero. Thus, the zeros of p(x) for which \(\dot{p}(x)=0\) have Lebesgue measure zero. Since the zeros of p(x) are the union of two sets of Lebesgue measure zero, the zeros of p(x) have Lebesgue measure zero. □

A distribution \(\mathcal{D}\) is said to be singular with respect to Lebesgue measure if there is a set with Lebesgue measure zero and probability measure one.

Theorem 2

If the matrix Γ in Theorem 1 is singular, then the distribution \(\mathcal{D}\) in Theorem 1 is singular.

Proof

Since s and σ do not depend on μ we may assume without loss of generality that μ=0. Then \(\varGamma =\operatorname{cov}(\operatorname{vech}(xx'))\).

Because Γ is singular there is a vector \(\ell\neq0\) such that Γℓ=0. It follows that \(\operatorname{var}(\ell'\operatorname{vech}(xx'))=\ell'\varGamma\ell=0\), and hence there is a c such that \(\ell'\operatorname{vech}(xx')=c\) with probability one. Let

$$p(x)=\ell'\operatorname{vech}\bigl(xx'\bigr)-c. $$

Then p(x)=0 with probability one. Since the components of \(\ell\) are not all zero, p(x) is a polynomial of degree one or more. It follows from Lemma 1 that the zeros of p(x) have Lebesgue measure zero. Since they also have probability one, the distribution \(\mathcal{D}\) is singular. □

The following theorem is an immediate consequence of Theorem 2.

Theorem 3

If the distribution \(\mathcal{D}\) in Theorem 1 is nonsingular, then the matrix Γ in Theorem 1 is nonsingular.

Thus, as promised, we have shown that Γ is nonsingular whenever the sampling is from a nonsingular distribution. In particular, Γ is nonsingular whenever the sampling is from a distribution defined by a density function.

4 When the Distribution Sampled Is Discrete

A natural question is what happens when \(\mathcal{D}\) is singular. From the point of view of applications, the primary singular distributions are the discrete distributions. These are considered here.

Theorem 4

If the distribution \(\mathcal{D}\) in Theorem 1 is discrete with mass points \(d_1,\ldots,d_q\), then Γ is nonsingular if and only if

$$A=\left ( \begin{array}{c@{\quad}c@{\quad}c}\operatorname{vech}((d_1-\mu)(d_1-\mu)')& \ldots &\operatorname{vech}((d_q-\mu)(d_q-\mu)')\\ 1& \ldots &1 \end{array} \right ) $$

has full row rank.

Proof

Let \(e_i=d_i-\mu\). Then

$$A=\left( \begin{array}{c@{\quad}c@{\quad}c} \operatorname{vech}(e_1e_1')& \ldots &\operatorname{vech}(e_qe_q')\\ 1& \ldots &1 \end{array} \right). $$

Let \(e=x-\mu\); then

$$\varGamma=\operatorname{cov}\bigl(\operatorname{vech}\bigl(ee' \bigr)\bigr). $$

Assume Γ is singular. Then there is an \(\ell\neq0\) such that Γℓ=0 and

$$\operatorname{var}\bigl(\ell'\operatorname{vech} \bigl(ee'\bigr)\bigr)=\ell'\varGamma\ell=0. $$

This implies

$$\ell'\operatorname{vech}\bigl(ee'\bigr)=c $$

for some c with probability one. Because \(\mathcal{D}\) is discrete, for all \(i=1,\ldots,q\)

$$\ell'\operatorname{vech}\bigl(e_ie_i' \bigr)=c. $$

This can be written in the form

$$\bigl(\ell',-c\bigr)\left ( \begin{array}{c@{\quad}c@{\quad}c}\operatorname{vech}(e_1e_1')& \ldots &\operatorname {vech}(e_qe_q')\\ 1& \ldots &1 \end{array} \right )=0, $$

or in terms of A as

$$\bigl(\ell',-c\bigr)A=0. $$

Since \(\ell\neq0\), this implies A does not have full row rank. Thus, Γ singular implies A does not have full row rank.

Assume now that A does not have full row rank. Then there is a vector a and scalar b such that a and b are not both zero and

$$\bigl(a',b\bigr)\left ( \begin{array}{c@{\quad}c@{\quad}c}\operatorname{vech}(e_1e_1')& \ldots &\operatorname {vech}(e_qe_q')\\ 1& \ldots &1 \end{array} \right )=0. $$

Note that a cannot be zero because this would imply b=0.

Taking the ith column of both sides gives

$$a'\operatorname{vech}\bigl(e_ie_i' \bigr)+b=0 $$

for all e i . It follows that

$$a'\operatorname{vech}\bigl(ee'\bigr)+b=0 $$

with probability one. Hence,

$$\operatorname{var}\bigl(a'\operatorname{vech}\bigl(ee' \bigr)\bigr)=0. $$

Thus, \(a'\varGamma a=a'\operatorname{cov}(\operatorname{vech}(ee'))a=\operatorname{var}(a'\operatorname{vech}(ee'))=0\). Since a≠0, Γ is singular.

Thus, Γ is singular if and only if A does not have full row rank; or Γ is nonsingular if and only if A has full row rank. □

We will use Theorem 4 to show, as promised, that Γ can be singular even when Σ is nonsingular.

Assume \(\mathcal{D}\) is discrete and has the mass points displayed in Figure 1.

Figure 1. Mass points.

Assume these carry equal probability mass. Clearly, Σ is nonsingular.

Let \(d_1,\ldots,d_6\) be the points displayed; then

$$\mu=(d_1+\cdots+d_6)/6=\left ( \begin{array}{c} 2\\ 1.5 \end{array} \right ). $$

This together with the formula for the matrix A in Theorem 4 gives

$$A=\left ( \begin{array}{c@{\quad}c@{\quad}c@{\quad}c@{\quad}c@{\quad}c} 1.00 & 1.00 & 0.00 & 0.00 & 1.00 & 1.00 \\ 0.50 & -0.50 & 0.00 & 0.00 & -0.50 & 0.50 \\ 0.25 & 0.25 & 0.25 & 0.25 & 0.25 & 0.25 \\ 1.00 & 1.00 & 1.00 & 1.00 & 1.00 & 1.00 \\ \end{array} \right ). $$

The singular values of A are

$$\lambda=\left ( \begin{array}{c} 3.08 \\ 1.00 \\ 0.95 \\ 0.00 \\ \end{array} \right ). $$

Hence, A does not have full row rank. It follows from Theorem 4 that Γ is singular.
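These computations can be reproduced directly from the displayed matrix A (a NumPy sketch; the rounding to two decimals matches the display above):

```python
import numpy as np

# The displayed matrix A, entered verbatim.
A = np.array([
    [1.00,  1.00, 0.00, 0.00,  1.00, 1.00],
    [0.50, -0.50, 0.00, 0.00, -0.50, 0.50],
    [0.25,  0.25, 0.25, 0.25,  0.25, 0.25],
    [1.00,  1.00, 1.00, 1.00,  1.00, 1.00],
])

svals = np.linalg.svd(A, compute_uv=False)
print(np.round(svals, 2))            # [3.08 1.   0.95 0.  ]
print(np.linalg.matrix_rank(A))      # 3, so A lacks full row rank
```

Note that the third row of A is exactly 0.25 times the fourth, which makes the rank deficiency visible by inspection as well.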

We have assumed in this example that in the population sampled the probabilities of the mass points are exactly equal. In practice, this seems very unlikely. If one assigns random probabilities to the mass points, the resulting A matrix always seems to have full row rank. More precisely, when we did this using 1,000 randomly generated probability assignments, all 1,000 A matrices had full row rank and hence, by Theorem 4, nonsingular Γ. This does not show that Γ is nonsingular with probability one, but the probability must at least be very high when probability masses are assigned at random.
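An experiment of this kind can be sketched as follows, taking the mass points to be the 2 by 3 grid \(\{1,2,3\}\times\{1,2\}\) (an assumption consistent with μ = (2, 1.5) and the matrix A displayed above; `build_A` is our helper name):

```python
import numpy as np

# Mass points assumed to be the 2 by 3 grid {1, 2, 3} x {1, 2}; this is
# consistent with mu = (2, 1.5) and the matrix A displayed above.
d = np.array([[x, y] for x in (1.0, 2.0, 3.0) for y in (1.0, 2.0)])

def build_A(mu):
    """The matrix A of Theorem 4 for mass points d and mean mu."""
    e = d - mu                                    # e_i = d_i - mu
    cols = [(a * a, a * b, b * b) for a, b in e]  # vech(e_i e_i')
    return np.vstack([np.column_stack(cols), np.ones(len(d))])

# Equal masses (mu = (2, 1.5)) reproduce the singular case of the text ...
assert np.linalg.matrix_rank(build_A(np.array([2.0, 1.5]))) == 3

# ... while randomly assigned masses appear always to give full row rank.
rng = np.random.default_rng(0)
ranks = [np.linalg.matrix_rank(build_A(rng.dirichlet(np.ones(6)) @ d))
         for _ in range(1000)]
print(sum(r == 4 for r in ranks), "of 1000 trials gave full row rank")
```

Since A depends on the probabilities only through μ, the Dirichlet draw is converted to a mean before A is built.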

In general, it is difficult to use Theorem 4 to show Γ is nonsingular. One would have to know μ, which generally is unknown because the population probability masses are unknown.

As in the previous example, however, one can assign random probability masses to one’s mass points to investigate the probability that the distribution sampled in one’s specific application has a nonsingular Γ. If a large number of probability assignments all produce a nonsingular Γ, this strongly suggests the population Γ for the distribution under investigation is nonsingular with high probability. This should make one much more comfortable about making this assumption.

A somewhat anecdotal argument for the nonsingularity of Γ is that it took the authors a long time to find any set of mass points and a probability assignment that would produce a singular Γ. If the mass points are a 3 by 3 array rather than a 2 by 3 array, for example, uniform probabilities give a nonsingular Γ.
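The 3 by 3 claim can be checked numerically as well, under the assumption that the array is the grid \(\{1,2,3\}\times\{1,2,3\}\):

```python
import numpy as np

# Hypothetical reading of the remark: the 3 by 3 grid {1, 2, 3} x {1, 2, 3}
# with uniform probabilities, so mu = (2, 2) exactly.
d = np.array([[x, y] for x in (1.0, 2.0, 3.0) for y in (1.0, 2.0, 3.0)])
e = d - np.array([2.0, 2.0])
A = np.vstack([
    np.column_stack([(a * a, a * b, b * b) for a, b in e]),
    np.ones(len(d)),
])
print(np.linalg.matrix_rank(A))  # 4: full row rank, so Gamma is nonsingular
```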

5 Discussion

We have shown that the asymptotic covariance matrix Γ used in covariance structure analysis and structural equation modeling of nonnormal data is generally nonsingular. This is important because this assumption is used to obtain standard errors and goodness of fit tests in standard statistical software.

To date there have been no conditions identified that guarantee or even motivate this nonsingularity assumption in the nonparametric context. Theorem 3 shows that, when sampling from a nonsingular distribution, Γ must be nonsingular. The nonsingularity of Γ in the discrete case depends on the probabilities of the mass points in the population sampled. When these are known, Theorem 4 can be used to determine the nonsingularity of Γ. In general, however, they are not known. When this is the case, Theorem 4 can be used to investigate the likelihood that Γ is nonsingular. Our results should make one much more comfortable about assuming Γ is nonsingular.

We conjecture that in the discrete case Γ is nonsingular with probability one whenever probability masses are assigned randomly. Our numerical example strongly suggests this is true, but at present we have no proof.