Synonyms

Gaussian distribution; Normal distribution

Definition

The normal distribution is a probability distribution that is widely used to describe samples, populations, and sampling distributions of statistics.

Description

In most inferential statistical analyses, an important assumption is that of multivariate normality. Multivariate normality is the assumption that each variable and all linear combinations of the variables are normally distributed (Tabachnick & Fidell, 2007, p. 78). When this assumption is violated, results derived from statistical analysis may not be reliable and valid. For example, multivariate normality of ordinal variables is a condition for testing measurement invariance or construct equivalence using the maximum likelihood estimation method of multiple-group confirmatory factor analysis (Byrne, 1998; Koh & Zumbo, 2008). Tests for checking multivariate normality are overly sensitive, and hence, researchers are encouraged to check for univariate normality, that is, the distribution of each individual variable, rather than the distribution of an infinite number of linear combinations of variables. The normal or Gaussian distribution is a bell-shaped model describing a symmetric, continuous distribution, characterized by two parameters, namely, the mean and the standard deviation (see Fig. 1).

Univariate Normal Distribution, Fig. 1

The normal distribution

The normal distribution is an approximation to the distribution of values or scores of a characteristic, for example, IQ scores or mathematics achievement scores. The exact shape of the normal distribution depends on the mean and the standard deviation of the distribution. The mean is a measure of central tendency, which describes the most typical value in a sample. The standard deviation is a measure of dispersion, which indicates the amount of departure of the values from the mean.
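The dependence of the normal curve on its two parameters can be sketched directly from its density formula. The following minimal Python example (not part of the original entry; the function name normal_pdf is an arbitrary choice) computes the density and illustrates the symmetry around the mean:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The density peaks at the mean and is symmetric around it.
print(normal_pdf(0.0))                      # peak of the standard normal, about 0.3989
print(normal_pdf(-1.0) == normal_pdf(1.0))  # True: symmetric about the mean
```

Increasing the standard deviation flattens and widens the curve without shifting its center, which is why the exact shape depends on both parameters.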

The normality of individual variables is assessed by either graphical or statistical methods. Graphical methods allow for visualization of the distributions of random variables, whereas statistical methods provide descriptive statistics or statistical tests of normality of random variables. The two most commonly used descriptive statistics of univariate normal distribution are skewness and kurtosis. Skewness indicates the degree of asymmetry of a distribution; a skewed variable is one whose mean is not in the center of the distribution.

When a variable is normally distributed, the values of both skewness and kurtosis are zero. The ratio of each statistic to its standard error can be used as a test of normality. Values of skewness and kurtosis that fall within the range of −2 and +2 indicate univariate normality. There are two types of skewness: (1) positive skewness and (2) negative skewness. A positively skewed variable has a pileup of cases to the left and a right tail that is too long. In contrast, a negatively skewed variable has a pileup of cases to the right and a left tail that is too long. Kurtosis is related to the peakedness of a distribution. When a variable has a positive kurtosis, its distribution is too peaked; a negative kurtosis refers to a distribution that is too flat. A variable can have significant skewness, kurtosis, or both (Tabachnick & Fidell, 2007, p. 79). Skewness and kurtosis statistics are sensitive to anomalies or outliers in the distribution. As such, they must be examined in conjunction with graphical methods such as a histogram, boxplot, or stem-and-leaf diagram. Most statistical tests of univariate normality are also sensitive to large sample sizes. For small to moderate samples, conventional but conservative alpha levels (.01 or .001) are used to evaluate the significance of skewness and kurtosis. For large samples, however, tests of univariate normality using the skewness or kurtosis ratio can be significant even when there are only minor deviations from normality (Tabachnick & Fidell, 2007).
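The moment-based statistics described above can be computed without any statistical package. The sketch below (illustrative only; the function names and the example data are assumptions, and the standard errors use the common large-sample approximations sqrt(6/n) and sqrt(24/n) rather than the exact small-sample formulas) computes skewness, excess kurtosis, and their ratios to the standard errors:

```python
import math

def skewness(xs):
    """Sample skewness: third standardized moment (zero for a symmetric distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth standardized moment minus 3 (zero for a normal)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def normality_ratios(xs):
    """Ratio of each statistic to its approximate standard error (sqrt(6/n), sqrt(24/n))."""
    n = len(xs)
    return (skewness(xs) / math.sqrt(6.0 / n),
            excess_kurtosis(xs) / math.sqrt(24.0 / n))

symmetric = [1, 2, 3, 4, 5]        # symmetric around its mean
right_skewed = [1, 1, 1, 2, 10]    # pileup of cases to the left, long right tail
print(skewness(symmetric))         # 0.0
print(skewness(right_skewed) > 0)  # True: positive skewness
```

In practice these ratios would be compared against critical z-values at the conservative alpha levels mentioned above; the toy samples here are far too small for that comparison to be meaningful.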

Typically, skewness, kurtosis, histograms, boxplots, or stem-and-leaf diagrams are used in the stage of exploratory data analysis prior to the application of inferential statistics. There are also theory-driven graphical and numerical methods for evaluating univariate normality. Two graphical methods are P-P and Q-Q plots. Some of the numerical methods include the Shapiro-Wilk and Kolmogorov-Smirnov tests. These various methods are available in the SPSS Frequencies, Descriptives, and Explore procedures. Both P-P and Q-Q plots can be generated in two ways: expected normality plots (Chambers, Cleveland, Kleiner, & Tukey, 1983) and detrended normal probability plots. If a variable has a normal distribution, the expected normality plot of the variable will show that all cases fall along the diagonal line running from lower left to upper right. An alternative method for checking univariate normality is the detrended normal probability plot. If a variable has a normal distribution, the detrended normal probability plot will show that all cases distribute themselves evenly above and below the horizontal line that intersects the Y axis at 0.0, the line of zero deviation from the expected normal value (Tabachnick & Fidell, 2007).
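The Kolmogorov-Smirnov idea of comparing the empirical distribution to the expected normal one can be sketched with only the standard library. The example below (an illustrative hand-rolled sketch, not SPSS output or a production routine; in practice one would use a statistical package, and the name ks_statistic is an assumption) computes the D statistic, the maximum gap between the empirical CDF and the normal CDF fitted from the sample mean and standard deviation:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of the normal distribution, via erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(xs):
    """One-sample Kolmogorov-Smirnov D against a normal fitted by sample mean and SD."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        f = normal_cdf(x, mean, sd)
        # Compare the fitted CDF to the empirical CDF just before and after each point.
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d

print(ks_statistic([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # D between 0 and 1; small when near-normal
```

Large values of D indicate departure from normality; converting D into a p-value, or applying the Shapiro-Wilk test, requires tables or a statistical package and is omitted here. Note also that fitting the mean and SD from the same sample makes the standard KS p-values too lenient (the Lilliefors correction addresses this).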

Cross-References

Confirmatory Factor Analysis (CFA)

Interval Scale

Measurement Invariance

Representative Sample

Study Population

Univariate Tests