Abstract
Covariance structure analysis of nonnormal data is important because in practice all data are nonnormal. When applying covariance structure analysis to nonnormal data, it is generally assumed that the asymptotic covariance matrix Γ for the nonredundant terms in the sample covariance matrix S is nonsingular. It is shown that this need not be the case, which raises the question of how restrictive this assumption may be and how difficult it may be to verify. It is shown that Γ is nonsingular whenever sampling is from a nonsingular distribution, including any distribution defined by a density function. In the discrete case, necessary and sufficient conditions are given for the nonsingularity of Γ, and it is shown how to demonstrate that Γ is nonsingular with high probability. Thus, the assumption that Γ is nonsingular is mild, and one should feel comfortable about making it. These observations also apply to the asymptotic covariance matrix Γ that arises in structural equation modeling.
1 Introduction
A basic assumption in covariance structure analysis (CSA) is
\[\sqrt{n}(s-\sigma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varGamma),\]
where \(s=\operatorname{vech}(S)\), \(\sigma=\operatorname{vech}(\varSigma )\), and S and Σ are sample and population covariance matrices, respectively.
We are interested in the almost universal assumption that Γ is nonsingular. See, for example, Bentler (1983, p. 503, line 4), Browne (1984, R1), Satorra (1989, Theorem 5.2), and Yuan and Bentler (2001, C5). While this assumption has been used for over 30 years, there has been no discussion of its importance. Is this an assumption that almost always holds, or perhaps one that almost never holds? We will show that Γ can be singular when Σ is nonsingular, thus demonstrating that this is a real assumption.
Statistical software programs for CSA are based on theorems that assume Γ is nonsingular. The validity of standard errors and tests produced by these programs is predicated on the validity of this assumption. We show that when this assumption does not hold these programs can fail dramatically and may give no indication that a failure has occurred.
It is the user of these programs who must make this assumption. Thus, if I perform a confirmatory factor analysis using SAS, I’m the one who must assume Γ is nonsingular. Because SAS’s confirmatory factor analysis program has many users, one might automatically come to the conclusion that this assumption must be OK. This is the fifty million Frenchmen argument. Fortunately, Γ is always, or at least almost always, nonsingular. Showing this is the purpose of this paper. We wish to move something that is a matter of faith to a matter of fact.
The assumption that Γ is nonsingular is much more difficult to make than the common assumption that Σ is nonsingular. In regression analysis, for example, one tentatively assumes Σ is nonsingular. If it is not, the sample covariance matrix S must be singular and one’s program reports this. The user generally responds by using a subset of predictors for which S is nonsingular.
In CSA one’s computer cannot in general provide a red flag to warn the user that Γ may be singular. As we will show, when Γ is singular its estimate \(\hat{\varGamma}\) is generally nonsingular. As a consequence, one’s computer cannot detect the singularity of Γ from that of its estimate \(\hat{\varGamma}\).
We begin by showing Γ is always nonsingular when sampling from a nonsingular distribution, including distributions defined by density functions. In this case, Γ is nonsingular and no additional assumptions are required.
From the point of view of applications, the primary singular distributions are the discrete distributions. In this case, we give necessary and sufficient conditions for the nonsingularity of Γ and show how to use these to show Γ is nonsingular in specific applications, at least with high probability. Hopefully, these results will make users of CSA software much more comfortable about the non-singularity assumption on which their software is based.
A reviewer suggested that we show what may happen when Γ is singular. In the Appendix, we show that an extensively used goodness of fit test statistic breaks down completely.
The asymptotic covariance matrix Γ also arises in structural equation modeling, where it is also assumed to be nonsingular. Our results apply to this form of analysis as well.
2 Covariance Structure Analysis
If S is a symmetric matrix, \(\operatorname{vech}(S)\) is a column vector containing the diagonal and upper diagonal elements \(s_{ij}\) of S listed in lexicographical order. All but the last line of the following theorem is a well-known result in covariance structure analysis.
Theorem 1
If
- \(x_{1},\ldots,x_{n}\) is a sample from a distribution \(\mathcal{D}\) with finite fourth moments, mean μ, and covariance matrix Σ,
- \(S=\frac{1}{n}\sum(x_{i}-\bar{x})(x_{i}-\bar{x})'\),
- \(s=\operatorname{vech}(S)\) and \(\sigma=\operatorname{vech}(\varSigma)\),
then
\[\sqrt{n}(s-\sigma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varGamma),\]
where \(\varGamma=\operatorname{cov}(\operatorname{vech}((x-\mu)(x-\mu)'))\) and x is a sample of size one from \(\mathcal{D}\).
To discuss the nonsingularity of Γ one must know how Γ, which is a population parameter, is related to the distribution sampled. This is provided by the last line of Theorem 1. We have been unable to find a reference for Theorem 1 that includes this last line. Because of this, a proof is given in the Appendix.
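The quantities in Theorem 1 can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function names `vech_outer` and `sample_s_and_gamma` are hypothetical, the data are simulated standard normal draws chosen only for the demonstration, and the estimate of Γ is a simple plug-in (the covariance, with divisor n, of the vectors \(\operatorname{vech}((x_{i}-\bar{x})(x_{i}-\bar{x})')\)), which is an assumption rather than a formula taken from the paper.

```python
import numpy as np

def vech_outer(E):
    """Row i is vech(e_i e_i') for row e_i of E.

    Elements follow the diagonal and upper-diagonal positions in
    lexicographical (row-major) order: (1,1), (1,2), ..., (2,2), ...
    """
    i, j = np.triu_indices(E.shape[1])
    return E[:, i] * E[:, j]

def sample_s_and_gamma(X):
    """Return s = vech(S) and a plug-in estimate of Gamma for (n, p) data X."""
    E = X - X.mean(axis=0)          # centered observations x_i - xbar
    V = vech_outer(E)               # one vech((x_i - xbar)(x_i - xbar)') per row
    s = V.mean(axis=0)              # equals vech(S) with divisor n
    D = V - s
    gamma_hat = D.T @ D / X.shape[0]
    return s, gamma_hat

# Simulated data, used only to exercise the functions (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
s, gamma_hat = sample_s_and_gamma(X)
print(s.shape, gamma_hat.shape)
print(np.linalg.eigvalsh(gamma_hat)[0])  # smallest eigenvalue of the estimate
```

For p variables, s has p(p+1)/2 elements, so the estimate of Γ is a square matrix of that order.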
As noted, it is generally assumed that Γ is nonsingular. We will show first that this is always the case when the distribution sampled is nonsingular and, in particular, when the distribution sampled is defined by a density function.
3 When the Distribution Sampled Is Nonsingular
We begin with a result about the zeros of a polynomial in m variables.
Lemma 1
If p(x) is a polynomial of degree d≥1 in m variables, then the zeros of p(x) have Lebesgue measure zero.
Proof
The proof is by induction. If d=1, p(x)=ℓ′x+c and ℓ≠0. Because ℓ≠0 the zeros of p(x) have Lebesgue measure zero.
Assume the lemma holds for d=k and let p(x) be a k+1 degree polynomial in m variables.
Note that the zeros of \(\dot{p}(x)\) are a closed set. Hence, the x such that \(\dot{p}(x)\neq0\) is an open set. It follows from Theorem 5-1 of Spivak (1965) that the zeros of p(x) for which \(\dot{p}(x)\neq0\) are a smooth manifold of dimension m−1.
It follows from Spivak (1965, problem 5.8) that a manifold of dimension less than m has Lebesgue measure zero. Thus, the zeros of p(x) for which \(\dot{p}(x)\neq0\) have Lebesgue measure zero.
Now consider the zeros of p(x) for which \(\dot{p}(x)=0\). Since \(\dot{p}(x)\) is a k degree polynomial in m variables it follows from the induction hypothesis that the zeros of \(\dot{p}(x)\) have Lebesgue measure zero. Thus, the zeros of p(x) for which \(\dot{p}(x)=0\) have Lebesgue measure zero. Since the zeros of p(x) are a union of two sets with Lebesgue measure zero, the zeros of p(x) have Lebesgue measure zero. □
A distribution \(\mathcal{D}\) is said to be singular with respect to Lebesgue measure if there is a set with Lebesgue measure zero and probability measure one.
Theorem 2
If the matrix Γ in Theorem 1 is singular, then the distribution \(\mathcal{D}\) in Theorem 1 is singular.
Proof
Since s and σ do not depend on μ we may assume without loss of generality that μ=0. Then \(\varGamma =\operatorname{cov}(\operatorname{vech}(xx'))\).
Because Γ is singular there is a vector ℓ≠0 such that ℓ′Γℓ=0. It follows that \(\operatorname{var}(\ell'\operatorname{vech}(xx'))=\ell'\varGamma\ell=0\), and hence there is a c such that \(\ell'\operatorname{vech}(xx')=c\) with probability one. Let
\[p(x)=\ell'\operatorname{vech}(xx')-c.\]
Then p(x)=0 with probability one. Since the components of ℓ are not all zero, p(x) is a polynomial of degree one or more. It follows from Lemma 1 that the zeros of p(x) have Lebesgue measure zero. Since they also have probability one, the distribution \(\mathcal{D}\) is singular. □
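As a concrete illustration of this argument (not part of the original proof), consider the bivariate case m=2 with μ=0. The symbols \(\ell_{1},\ell_{2},\ell_{3}\) below are simply the components of ℓ.

```latex
\operatorname{vech}(xx') = (x_{1}^{2},\; x_{1}x_{2},\; x_{2}^{2})',
\qquad
p(x) = \ell_{1}x_{1}^{2} + \ell_{2}x_{1}x_{2} + \ell_{3}x_{2}^{2} - c .
```

When ℓ≠0, p(x) is a nonzero polynomial of degree two, so its zero set is a conic section (possibly degenerate or empty), which has Lebesgue measure zero in the plane. A distribution that places probability one on such a set cannot be defined by a density function.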
The following theorem is an immediate consequence of Theorem 2.
Theorem 3
If the distribution \(\mathcal{D}\) in Theorem 1 is nonsingular, then the matrix Γ in Theorem 1 is nonsingular.
Thus, as promised, we have shown that Γ is nonsingular whenever the sampling is from a nonsingular distribution. In particular, Γ is nonsingular whenever the sampling is from a distribution defined by a density function.
4 When the Distribution Sampled Is Discrete
A natural question is what happens when \(\mathcal{D}\) is singular. From the point of view of applications, the primary singular distributions are the discrete distributions. These are considered here.
Theorem 4
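If the distribution \(\mathcal{D}\) in Theorem 1 is discrete with mass points \(d_{1},\ldots,d_{q}\), then Γ is nonsingular if and only if
\[A=\begin{pmatrix}\operatorname{vech}((d_{1}-\mu)(d_{1}-\mu)') & \cdots & \operatorname{vech}((d_{q}-\mu)(d_{q}-\mu)')\\ 1 & \cdots & 1\end{pmatrix}\]
has full row rank.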
Proof
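Let \(e_{i}=d_{i}-\mu\). Then
\[A=\begin{pmatrix}\operatorname{vech}(e_{1}e_{1}') & \cdots & \operatorname{vech}(e_{q}e_{q}')\\ 1 & \cdots & 1\end{pmatrix}.\]
Let e=x−μ; then
\[\varGamma=\operatorname{cov}(\operatorname{vech}(ee')).\]
Assume Γ is singular. Then there is an ℓ≠0 such that ℓ′Γℓ=0 and
\[\operatorname{var}(\ell'\operatorname{vech}(ee'))=\ell'\varGamma\ell=0.\]
This implies
\[\ell'\operatorname{vech}(ee')=c\]
for some c with probability one. Because \(\mathcal{D}\) is discrete, for all i=1,…,q
\[\ell'\operatorname{vech}(e_{i}e_{i}')=c.\]
This can be written in the form
\[(\ell',-c)\begin{pmatrix}\operatorname{vech}(e_{i}e_{i}')\\ 1\end{pmatrix}=0\]
or in terms of A as
\[(\ell',-c)A=0.\]
Since ℓ≠0, this implies A does not have full row rank. Thus, Γ singular implies A does not have full row rank.
Assume now that A does not have full row rank. Then there is a vector a and scalar b such that a and b are not both zero and
\[(a',b)A=0.\]
Note that a cannot be zero because this would imply b=0.
Taking the ith column of both sides gives
\[a'\operatorname{vech}(e_{i}e_{i}')+b=0\]
for all \(e_{i}\). It follows that
\[a'\operatorname{vech}(ee')+b=0\]
with probability one. Hence,
\[\operatorname{var}(a'\operatorname{vech}(ee'))=0.\]
Thus, \(a'\varGamma a=a'\operatorname{cov}(\operatorname{vech}(ee'))a=\operatorname{var}(a'\operatorname{vech}(ee'))=0\). Since a≠0, Γ is singular.
Thus, Γ is singular if and only if A does not have full row rank; or Γ is nonsingular if and only if A has full row rank. □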
We will use Theorem 4 to show, as promised, that Γ can be singular even when Σ is nonsingular.
Assume \(\mathcal{D}\) is discrete and has the mass points displayed in Figure 1.
Assume these carry equal probability mass. Clearly, Σ is nonsingular.
Let \(d_{1},\ldots,d_{6}\) be the points displayed, then
This together with the formula for the matrix A in Theorem 4 gives
The singular values of A are
Hence, A does not have full row rank. It follows from Theorem 4 that Γ is singular.
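The rank check in Theorem 4 can be sketched in code. Figure 1 is not reproduced here, so the mass points used below are an assumption: a 2 by 3 grid with x values 1, 2, 3 and y values 1, 2, each point carrying probability 1/6. The 3 by 3 grid with x and y values 1, 2, 3 is included for comparison. The helper names `build_A` and `has_full_row_rank` and the tolerance `1e-10` are hypothetical choices for this sketch.

```python
import itertools
import numpy as np

def build_A(points, probs):
    """Build the matrix A of Theorem 4 from mass points and their probabilities."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = probs @ points                     # population mean
    E = points - mu                         # rows are e_i = d_i - mu
    i, j = np.triu_indices(points.shape[1])
    top = (E[:, i] * E[:, j]).T             # column i is vech(e_i e_i')
    return np.vstack([top, np.ones(len(points))])

def has_full_row_rank(A, tol=1e-10):
    """True when no singular value of A beyond the row count falls below tol."""
    return bool(np.linalg.matrix_rank(A, tol=tol) == A.shape[0])

# Assumed grids (Figure 1 is unavailable; these points are illustrative).
grid_2x3 = list(itertools.product([1, 2, 3], [1, 2]))
grid_3x3 = list(itertools.product([1, 2, 3], [1, 2, 3]))

A_2x3 = build_A(grid_2x3, np.full(6, 1 / 6))
A_3x3 = build_A(grid_3x3, np.full(9, 1 / 9))

print(has_full_row_rank(A_2x3))  # False
print(has_full_row_rank(A_3x3))  # True
print(np.linalg.svd(A_2x3, compute_uv=False))  # singular values of A
```

On the assumed 2 by 3 grid the rank deficiency has a simple source: y takes two values with equal probability, so its squared deviation from the mean is the same constant at every mass point, and the corresponding row of A is a multiple of the row of ones.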
We have assumed in this example that in the population sampled the probabilities of the mass points are exactly equal. In practice, this seems very unlikely. If one assigns random probabilities to the mass points, the resulting A matrix always seems to have full row rank. More precisely, when we did this using 1,000 randomly generated probability assignments, all 1,000 A matrices had full row rank, and hence by Theorem 4 each corresponding Γ was nonsingular. This does not show that Γ is nonsingular with probability one, but the probability must at least be very high when probability masses are assigned at random.
In general, it is difficult to use Theorem 4 to show Γ is nonsingular. One would have to know μ, which generally is unknown because the population probability masses are unknown.
As in the previous example, however, one can assign random probability masses to one’s mass points to investigate the probability that the distribution sampled in one’s specific application has a nonsingular Γ. If a large number of probability assignments all produce a nonsingular Γ, this strongly suggests the population Γ for the distribution under investigation is nonsingular with high probability. This should make one much more comfortable about making this assumption.
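A minimal sketch of this kind of investigation follows. Several details are assumptions rather than the authors' procedure: the mass points are a 2 by 3 grid with x values 1, 2, 3 and y values 1, 2 (Figure 1 is not reproduced here), the random probabilities are drawn uniformly on the simplex with a Dirichlet(1,…,1) distribution, and the names `build_A` and `fraction_full_rank` are hypothetical.

```python
import itertools
import numpy as np

def build_A(points, probs):
    """Matrix A of Theorem 4: columns vech((d_i - mu)(d_i - mu)') over a row of ones."""
    points = np.asarray(points, dtype=float)
    mu = np.asarray(probs, dtype=float) @ points
    E = points - mu
    i, j = np.triu_indices(points.shape[1])
    return np.vstack([(E[:, i] * E[:, j]).T, np.ones(len(points))])

def fraction_full_rank(points, n_draws=1000, seed=0, tol=1e-10):
    """Share of random probability assignments for which A has full row rank."""
    rng = np.random.default_rng(seed)
    q = len(points)
    hits = 0
    for _ in range(n_draws):
        probs = rng.dirichlet(np.ones(q))   # assumed way of drawing random masses
        A = build_A(points, probs)
        hits += int(np.linalg.matrix_rank(A, tol=tol) == A.shape[0])
    return hits / n_draws

# Assumed mass points; replace with the mass points of one's own application.
grid_2x3 = list(itertools.product([1, 2, 3], [1, 2]))
frac = fraction_full_rank(grid_2x3)
print(frac)
```

A fraction equal to one means that every sampled probability assignment gave an A with full row rank, and hence by Theorem 4 a nonsingular Γ. The result depends on the tolerance used to decide numerical rank, so a value much larger than `1e-10` could flag assignments that are merely close to singular.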
A somewhat anecdotal argument for non-singularity of Γ is that it took the authors a long time to find any set of mass points and a probability assignment that would produce a singular Γ. If the mass points are a 3 by 3 array rather than a 2 by 3 array, for example, uniform probabilities give a nonsingular Γ.
5 Discussion
We have shown that the asymptotic covariance matrix Γ used in covariance structure analysis and structural equation modeling of nonnormal data is generally nonsingular. This is important because the nonsingularity of Γ is assumed when obtaining standard errors and goodness of fit tests in standard statistical software.
To date there have been no conditions identified that guarantee or even motivate this nonsingularity assumption in the nonparametric context. Theorem 3 shows that, when sampling from a nonsingular distribution, Γ must be nonsingular. The nonsingularity of Γ in the discrete case depends on the probabilities of the mass points in the population sampled. When these are known, Theorem 4 can be used to determine the nonsingularity of Γ. In general, however, they are not known. When this is the case, Theorem 4 can be used to investigate the likelihood that Γ is nonsingular. Our results make one much more comfortable about assuming Γ is nonsingular.
We conjecture that in the discrete case Γ is nonsingular with probability one whenever probability masses are assigned randomly. Our numerical example strongly suggests this is true, but at present we have no proof.
References
Bentler, P.M. (1983). Some contributions to efficient statistics in structural models: specification and estimation of structural models. Psychometrika, 48, 493–517.
Browne, M.W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical & Statistical Psychology, 37, 62–83.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: a unified approach. Psychometrika, 54, 131–151.
Satorra, A., & Bentler, P.M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis, 10, 235–249.
Spivak, M. (1965). Calculus on manifolds. New York: Perseus Book Publishing, LLC.
Yuan, K., & Bentler, P.M. (2001). Effect of outliers on estimators and tests in covariance structure analysis. British Journal of Mathematical & Statistical Psychology, 54, 161–175.
Additional information
The research of the second author is supported by grant EC02011-28875 from the Spanish Ministry of Science and Innovation.
Appendix: Proof of Theorem 1 and Singular Γ Example
A.1 Proof of Theorem 1
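Let \(e_{i}=x_{i}-\mu\). Then \(e_{i}-\bar{e}=x_{i}-\bar{x}\) and
\[S=\frac{1}{n}\sum(e_{i}-\bar{e})(e_{i}-\bar{e})'=\frac{1}{n}\sum e_{i}e_{i}'-\bar{e}\bar{e}'\]
and
\[\sqrt{n}(S-\varSigma)=\sqrt{n}\Bigl(\frac{1}{n}\sum e_{i}e_{i}'-\varSigma\Bigr)-\sqrt{n}\bar{e}\bar{e}'.\]
Since \(\bar{e}=\bar{x}-\mu\stackrel{p}{\rightarrow}0\) and \(\sqrt{n}\bar{e}=\sqrt{n}(\bar{x}-\mu)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varSigma)\), \(\sqrt{n}\bar{e}\bar{e}'\stackrel{p}{\rightarrow}0\). Thus,
\[\sqrt{n}(S-\varSigma)=\sqrt{n}\Bigl(\frac{1}{n}\sum e_{i}e_{i}'-\varSigma\Bigr)+o_{p}(1)\]
and
\[\sqrt{n}(s-\sigma)=\sqrt{n}\Bigl(\frac{1}{n}\sum\operatorname{vech}(e_{i}e_{i}')-\sigma\Bigr)+o_{p}(1).\qquad\text{(A.1)}\]
Note that the \(\operatorname{vech}(e_{i}e_{i}')\) are independent and identically distributed, σ is the common expected value of the \(\operatorname{vech}(e_{i}e_{i}')=\operatorname{vech}((x_{i}-\mu)(x_{i}-\mu)')\), and Γ is the common covariance matrix of the \(\operatorname{vech}(e_{i}e_{i}')\). It follows from the central limit theorem that the right-hand side of (A.1) converges in distribution to N(0,Γ), and hence that
\[\sqrt{n}(s-\sigma)\stackrel{\mathcal{D}}{\rightarrow}N(0,\varGamma).\]
Moreover,
\[\varGamma=\operatorname{cov}(\operatorname{vech}(e_{i}e_{i}'))=\operatorname{cov}(\operatorname{vech}((x-\mu)(x-\mu)')).\]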
A.2 What Happens When Γ Is Singular?
Using a very simple example, we will show that Browne’s (1984) extensively used goodness of fit test for CSA can fail completely when Γ is singular. Consider random sampling from the array in Figure 1. We have shown that the asymptotic covariance matrix Γ for the sample covariances s generated in this way is singular. Consider a covariance structure
When θ=0 this is equal to σ. Using a least squares estimate \(\hat{\theta}\) of θ, Browne’s statistic for testing the goodness of fit of γ(θ) is
where U is an orthogonal complement of
the Jacobian of γ(θ), and \(\hat{\varGamma}\) is a consistent estimator for Γ. We have used the estimator \(\hat{\varGamma}\) given by Satorra and Bentler (1990, Formula 2.4).
If Γ were nonsingular, T would have an asymptotic \(\chi^{2}\) distribution with two degrees of freedom. But in this example Γ is singular. To investigate the performance of Browne’s test in this situation we generated N=1,000 samples of size n=100, and for each a value of T was computed. Figure 2 is a Q–Q plot of the values of T against the corresponding quantiles of the \(\chi^{2}_{2}\) distribution. Clearly, the distribution of T differs greatly from \(\chi^{2}_{2}\). Moreover, in 78 of the 1,000 trials T could not even be computed because \(\hat{\varGamma}\) was too nearly singular. On these trials, the computer output correctly suggests that Γ is singular. On 92% of the trials, however, no such warning is produced.
One wonders what would happen in this example if Γ were nonsingular. Assume the array in Figure 1 were replaced by a 3 by 3 array with x and y values equal to 1, 2, 3. If these carry equal probability mass, Γ is nonsingular and the Q–Q plot becomes that displayed in Figure 3.
Clearly, T now is very nearly \(\chi^{2}_{2}\) distributed. Moreover, there were no problems in computing the values of T.
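The behavior of \(\hat{\varGamma}\) in the two designs can be explored with a short simulation. This sketch does not compute Browne’s statistic T; it only counts how often an estimate of Γ is numerically singular. It rests on several assumptions: the mass points for Figure 1 are taken to be a 2 by 3 grid with x values 1, 2, 3 and y values 1, 2, the estimate is a simple plug-in covariance (divisor n) rather than the Satorra and Bentler (1990) formula, and the function names and the tolerance `1e-8` are hypothetical.

```python
import itertools
import numpy as np

def gamma_hat(X):
    """Plug-in estimate of Gamma: covariance (divisor n) of vech((x_i - xbar)(x_i - xbar)')."""
    E = X - X.mean(axis=0)
    i, j = np.triu_indices(X.shape[1])
    V = E[:, i] * E[:, j]
    D = V - V.mean(axis=0)
    return D.T @ D / X.shape[0]

def fraction_near_singular(points, n=100, n_samples=1000, seed=0, tol=1e-8):
    """Share of samples whose estimate has a smallest eigenvalue below tol."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    flagged = 0
    for _ in range(n_samples):
        # Draw n mass points with equal probability.
        X = points[rng.integers(len(points), size=n)]
        smallest = np.linalg.eigvalsh(gamma_hat(X))[0]
        flagged += int(smallest < tol)
    return flagged / n_samples

# Assumed 2 by 3 grid (Figure 1 is unavailable) and the 3 by 3 grid from the text.
grid_2x3 = list(itertools.product([1, 2, 3], [1, 2]))
grid_3x3 = list(itertools.product([1, 2, 3], [1, 2, 3]))

frac_2x3 = fraction_near_singular(grid_2x3)
frac_3x3 = fraction_near_singular(grid_3x3)
print(frac_2x3, frac_3x3)
```

Under the assumed 2 by 3 grid, the plug-in estimate is exactly singular only when the sample splits evenly between the two y values, which for n=100 has probability of about 0.08. That is roughly in line with the 78 of 1,000 trials reported above, although the estimator and the singularity criterion used by the authors may differ from this sketch.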
Jennrich, R., Satorra, A. The Nonsingularity of Γ in Covariance Structure Analysis of Nonnormal Data. Psychometrika 79, 51–59 (2014). https://doi.org/10.1007/s11336-013-9353-1
DOI: https://doi.org/10.1007/s11336-013-9353-1