Chapter 7 of The Measurement of Association applies exact and Monte Carlo permutation statistical methods to measures of association designed for two or more interval-level variables. While permutation statistical methods are commonly associated with non-parametric statistics and, therefore, thought by many to be limited to nominal- and ordinal-level measurements, such is certainly not the case, as noted by Feinstein in 1973 [12]. In fact, a great strength of exact and Monte Carlo permutation statistical methods is in the analysis of interval-level measurements [6]. Chapter 7 begins with a discussion and comparison of simple and multiple ordinary least squares (OLS) regression and simple and multiple least absolute deviation (LAD) regression using permutation statistical methods. Multiple regression with multiple independent variables and multivariate dependent variables is described and illustrated. Point-biserial and biserial correlation coefficients are described and analyzed with exact and Monte Carlo permutation methods. Fisher’s z transform is examined and evaluated as to its utility in transforming skewed distributions for both hypothesis testing and confidence intervals. Chapter 7 concludes with a discussion of permutation statistical methods applied to Pearson’s intraclass correlation coefficient.

7.1 Ordinary Least Squares (OLS) Linear Regression

Ordinary least squares (OLS) regression with a single predictor is a popular statistical measure of the degree of association (correlation) between two interval-level variables, usually denoted as x and y. The assumption of normality comes into play when the null hypothesis is tested by conventional means. Permutation statistical methods do not assume normality and, therefore, are often more useful than conventional statistical methods, especially when the sample size is small. Let r xy denote the Pearson product-moment correlation coefficient for variables x and y given by

$$\displaystyle \begin{aligned} r_{xy} = \frac{\displaystyle\sum_{i=1}^{N}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\left[ \displaystyle\sum_{i=1}^{N}(x_{i}-\bar{x})^{2} \right] \left[ \displaystyle\sum_{i=1}^{N}(y_{i}-\bar{y})^{2} \right]}}\;, \end{aligned}$$

where \(\bar {x}\) and \(\bar {y}\) denote the arithmetic means of variables x and y, respectively, and N is the number of bivariate measurements. The conventional test of significance is given by

$$\displaystyle \begin{aligned} t = \frac{r_{xy} \sqrt{N-2}}{\sqrt{1-r_{xy}^{2}}}\;, \end{aligned}$$

which is distributed as Student’s t with N − 2 degrees of freedom, under the assumption of normality.

More useful than simple OLS regression and correlation is multiple OLS regression with p predictors, x 1, x 2, …, x p. Let \(R_{y.x_{1},\,x_{2},\,\ldots ,\,x_{p}}\) indicate the multiple correlation coefficient for variables y and x 1, x 2, …, x p given by

$$\displaystyle \begin{aligned} R_{y.x_{1},\,x_{2},\,\ldots,\,x_{p}}^{2} = \boldsymbol{\beta}^{\prime}{\mathbf{r}}_{y}\;, \end{aligned}$$

where β′ is the transpose of the vector of standardized regression weights and r y is the vector of zero-order correlation coefficients of y with x 1, x 2, …, x p. The conventional test of significance is given by

$$\displaystyle \begin{aligned} F = \frac{(N-p-1)R_{y.x_{1},\,x_{2},\,\ldots,\,x_{p}}^{2}}{p(1-R_{y.x_{1},\,x_{2},\,\ldots,\,x_{p}}^{2})}\;, \end{aligned}$$

which is distributed as Snedecor’s F with p and N − p − 1 degrees of freedom, under the assumption of normality.

7.1.1 Univariate Example of OLS Regression

Consider the example set of bivariate data listed in Table 7.1 for N = 11 subjects. For the bivariate data listed in Table 7.1, the Pearson product-moment correlation coefficient is r xy = +0.8509. An exact permutation analysis requires random shuffles of either the x or the y values with the other set of values held constant. For this small example there are

$$\displaystyle \begin{aligned} M = N! = 11! = 39{,}916{,}800 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed bivariate data, making an exact permutation analysis impractical. Monte Carlo resampling methods are generally preferred for permutation correlation analyses since N! is usually a very large number, e.g., with N = 13 there are 13! = 6, 227, 020, 800 possible arrangements. Let r o indicate the observed value of r xy. Then, based on L = 1, 000, 000 random arrangements of the observed data under the null hypothesis, there are 861 |r xy| values equal to or greater than |r o| = 0.8509, yielding a Monte Carlo resampling two-sided probability value of P = 861∕1, 000, 000 = 0.8610×10^−3.

Table 7.1 Example bivariate OLS correlation data on N = 11 subjects

While M = 39, 916, 800 possible arrangements of the observed data makes an exact permutation analysis impractical, it is not impossible. Based on the M = 39, 916, 800 arrangements of the observed data under the null hypothesis, there are 35,216 |r xy| values equal to or greater than |r o| = 0.8509, yielding an exact two-sided probability value of P = 35, 216∕39, 916, 800 = 0.8822×10^−3. For comparison, for the data listed in Table 7.1, t = 4.8591 and the two-sided probability value of |r o| = 0.8509 based on Student’s t distribution with N − 2 = 11 − 2 = 9 degrees of freedom is P = 0.8969×10^−3.
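
The resampling procedure just described is straightforward to program. The following Python sketch is a minimal illustration, not the software used for the analyses in this chapter: it computes the Pearson correlation, the conventional Student’s t statistic, and a two-sided Monte Carlo resampling probability value by repeatedly shuffling the y values while the x values are held constant. The data vectors are arbitrary placeholders, since the Table 7.1 values are not reproduced here.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two vectors."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd @ yd) / np.sqrt((xd @ xd) * (yd @ yd))

def monte_carlo_r_test(x, y, L=100_000, seed=1):
    """Two-sided Monte Carlo permutation test: shuffle y, hold x constant."""
    rng = np.random.default_rng(seed)
    r_obs = pearson_r(x, y)
    y_perm = y.copy()
    count = 0
    for _ in range(L):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= abs(r_obs):
            count += 1
    return r_obs, count / L

# Placeholder bivariate data for N = 11 subjects (not the Table 7.1 values).
x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
y = np.array([2.0, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12])

r_obs, p_mc = monte_carlo_r_test(x, y)
t = r_obs * np.sqrt(len(x) - 2) / np.sqrt(1 - r_obs**2)  # conventional t statistic
print(f"r = {r_obs:.4f}, t = {t:.4f}, two-sided Monte Carlo P = {p_mc:.5f}")
```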

7.1.2 Multivariate Example of OLS Regression

For a multivariate example of OLS linear regression, consider the small example data set with p = 2 predictors listed in Table 7.2 where variable y is Weight in pounds, variable x 1 is Height in inches, and variable x 2 is Age in years for N = 12 school children. For the multivariate data listed in Table 7.2, the unstandardized OLS regression coefficients are

$$\displaystyle \begin{aligned} \hat{\beta}_{1} = +1.1973 \quad \mbox{and} \quad \hat{\beta}_{2} = +1.1709\;, \end{aligned}$$

and the squared OLS multiple correlation coefficient is \(R_{y.x_{1},\,x_{2}}^{2} = 0.7301\) (henceforth, simply R 2). An exact permutation analysis of multiple correlation requires random shuffles of either the x or the y values. It is important to note that the predictor variables must be shuffled as a unit, i.e., x 1, …, x p. Otherwise, a researcher may end up with a combination of predictor variables that make no sense, e.g., 4-year-old child, married, with two children. Thus, it is advisable to simply shuffle the y values. Even with this very small example there are

$$\displaystyle \begin{aligned} M = N! = 12! = 479{,}001{,}600 \end{aligned}$$

possible, equally-likely arrangements of the observed data, making an exact permutation analysis impractical. Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability of R 2 = 0.7301 is

$$\displaystyle \begin{aligned} P \big( R^{2} \geq R_{\text{o}}^{2}|H_{0} \big) = \frac{\text{number of }R^{2}\text{ values } \geq R_{\text{o}}^{2}}{L}\;, \end{aligned}$$

where \(R_{\text{o}}^{2}\) denotes the observed value of R 2.

Table 7.2 Example multivariate OLS correlation data on N = 12 children

While M = 479, 001, 600 possible arrangements makes an exact permutation analysis impractical, it is not impossible. If the reference set of all possible permutations of the observed scores in Table 7.2 occur with equal chance, the exact probability of R 2 = 0.7301 under the null hypothesis is

$$\displaystyle \begin{aligned} P \big( R^{2} \geq R_{\text{o}}^{2}|H_{0} \big) = \frac{\text{number of }R^{2}\text{ values } \geq R_{\text{o}}^{2}}{M}\;, \end{aligned}$$

where \(R^{2}_{\text{o}}\) denotes the observed value of R 2. For comparison, for the data listed in Table 7.2, F = 12.1728 and the probability value of R 2 = 0.7301 based on Snedecor’s F distribution with p, N − p − 1 = 2, 12 − 2 − 1 = 2, 9 degrees of freedom is approximately P = 0.2757×10^−2, under the null hypothesis.
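
The same resampling logic applies to multiple correlation, with the y values shuffled so that the rows of predictor values remain intact, as recommended above. The Python sketch below illustrates the idea under those assumptions; it is not the authors’ code, the squared multiple correlation is obtained from an ordinary least squares fit with an intercept, and the data are randomly generated placeholders rather than the Table 7.2 values.

```python
import numpy as np

def r_squared(X, y):
    """Squared multiple correlation coefficient from an OLS fit with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def monte_carlo_R2_test(X, y, L=100_000, seed=1):
    """Shuffle y (predictor rows kept intact) and count R^2 values >= observed."""
    rng = np.random.default_rng(seed)
    R2_obs = r_squared(X, y)
    y_perm = y.copy()
    count = 0
    for _ in range(L):
        rng.shuffle(y_perm)
        if r_squared(X, y_perm) >= R2_obs:
            count += 1
    return R2_obs, count / L

# Randomly generated placeholder data: N = 12 cases, p = 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
y = X @ np.array([1.2, 1.1]) + rng.normal(size=12)

R2_obs, p_mc = monte_carlo_R2_test(X, y, L=20_000)
print(f"R^2 = {R2_obs:.4f}, Monte Carlo P = {p_mc:.5f}")
```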

7.2 Least Absolute Deviation (LAD) Regression

Ordinary least squares (OLS) linear regression has long been recognized as a useful tool in many areas of research. The optimal properties of OLS linear regression are well known when the errors are normally distributed. In practice, however, the assumption of normality is rarely justified. Least absolute deviation (LAD) linear regression is often superior to OLS linear regression when the errors are not normally distributed [8, 9, 29, 44, 55]. Estimators of OLS regression parameters can be severely affected by unusual values in either the criterion variable or in one or more of the predictor variables, largely because of the weight given to each data point when minimizing the sum of squared errors. In contrast, LAD regression is less sensitive to the effects of unusual values because the errors are not squared. The comparison between OLS and LAD linear regression is analogous to the effect of extreme values on the mean and median as measures of location [8]. In this section, the robust nature of LAD linear regression is illustrated with a simple example and the effects of distance, leverage, and influence are examined. For clarity and efficiency, the illustration and ensuing discussion are limited to simple linear regression with one predictor variable (x) and one criterion variable (y), with no loss of generality.

Consider N paired x i and y i observed values for i = 1, …, N. For the OLS regression equation given by

$$\displaystyle \begin{aligned} \hat{y}_{i} = \hat{\alpha}_{yx}+\hat{\beta}_{yx} x_{i}\;, \end{aligned}$$

where \(\hat {y}_{i}\) is the ith of N predicted criterion values and x i is the ith of N predictor values, \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) are the OLS parameter estimates of the intercept (α yx) and slope (β yx), respectively, and are given by

$$\displaystyle \begin{aligned} \hat{\beta}_{yx} = \frac{\displaystyle\sum_{i=1}^{N}\big(y_{i}-\bar{y}\big)\big(x_{i}-\bar{x}\big)}{\displaystyle\sum_{i=1}^{N}\big(x_{i}-\bar{x}\big)^{2}} \end{aligned} $$
(7.1)

and

$$\displaystyle \begin{aligned} \hat{\alpha}_{yx} = \bar{y}-\hat{\beta}_{yx}\bar{x}\;, \end{aligned} $$
(7.2)

where \(\bar {x}\) and \(\bar {y}\) are the sample means of variables x and y, respectively. Estimates of OLS regression parameters minimize the sum of the squared differences between the observed and predicted criterion values, i.e.,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}\big( y_{i}-\hat{y}_{i} \big)^{2}\;. \end{aligned}$$

For the LAD regression equation given by

$$\displaystyle \begin{aligned} \tilde{y}_{i} = \tilde{\alpha}_{yx}+\tilde{\beta}_{yx} x_{i}\;, \end{aligned}$$

where \(\tilde {y}_{i}\) is the ith of N predicted criterion values and x i is the ith of N predictor values, \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) are the LAD parameter estimates of the intercept (α yx) and slope (β yx), respectively. Unlike OLS regression, no simple closed-form expressions analogous to Eqs. (7.1) and (7.2) can be given for \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\). However, values for \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) may be found with an efficient linear programming algorithm, such as that of Barrodale and Roberts [1, 2]. In contrast to estimates of OLS regression parameters, estimates of LAD regression parameters minimize the sum of the absolute differences between the observed and predicted criterion values, i.e.,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N} \big| y_{i}-\tilde{y}_{i} \big|\;. \end{aligned}$$
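
Because no closed-form expressions exist, the LAD estimates are obtained numerically. The sketch below illustrates one standard route: posing the LAD problem as a linear program in which auxiliary variables bound the absolute residuals, solved here with scipy.optimize.linprog. It is a minimal stand-in for, not an implementation of, the Barrodale–Roberts algorithm cited above, and the function name lad_fit is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(x, y):
    """Simple LAD regression as a linear program.

    Minimize sum_i u_i subject to u_i >= |y_i - (a + b*x_i)|, written as two
    linear inequalities per observation.  Decision vector: [a, b, u_1, ..., u_N].
    """
    N = len(y)
    c = np.concatenate([[0.0, 0.0], np.ones(N)])
    A_ub = np.zeros((2 * N, N + 2))
    b_ub = np.zeros(2 * N)
    for i in range(N):
        # y_i - a - b*x_i <= u_i   ->  -a - b*x_i - u_i <= -y_i
        A_ub[2 * i, [0, 1, 2 + i]] = [-1.0, -x[i], -1.0]
        b_ub[2 * i] = -y[i]
        # a + b*x_i - y_i <= u_i   ->   a + b*x_i - u_i <=  y_i
        A_ub[2 * i + 1, [0, 1, 2 + i]] = [1.0, x[i], -1.0]
        b_ub[2 * i + 1] = y[i]
    bounds = [(None, None), (None, None)] + [(0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1]          # intercept and slope estimates

# Nine points with a perfect negative relationship: y_i = 10 - x_i.
x = np.arange(1.0, 10.0)
y = 10.0 - x
print(lad_fit(x, y))                   # approximately (10.0, -1.0)
```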

It is convenient to have a measure of agreement, not correlation, between the observed and predicted y values. Let

$$\displaystyle \begin{aligned} \delta = \frac{1}{N} \sum_{i=1}^{N}\big| y_{i}-\tilde{y}_{i} \big|\;. \end{aligned}$$

Then, the expected value of δ is given by

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{N^{2}} \sum_{i=1}^{N}\,\sum_{j=1}^{N}\big| y_{i}-\tilde{y}_{j} \big|\;, \end{aligned}$$

and a measure of agreement between the observed y values and the predicted \(\tilde {y}\) values is given by

$$\displaystyle \begin{aligned} \mathfrak{R} = 1-\frac{\delta}{\mu_{\delta}}\;. \end{aligned}$$

\(\mathfrak {R}\) is a chance-corrected measure of agreement and/or effect size, reflecting the amount of agreement in excess of what would be expected by chance. \(\mathfrak {R}\) attains a maximum value of unity when the agreement between the observed y values and the predicted \(\tilde {y}\) values is perfect, i.e., y i and \(\tilde {y}_{i}\) values are identical for i = 1, …, N. \(\mathfrak {R}\) is zero when the agreement between the observed y values and predicted \(\tilde {y}\) values is equal to what is expected by chance, i.e., \(\mathrm {E}[\mathfrak {R}|H_{0}] = 0\). Like all chance-corrected measures, \(\mathfrak {R}\) will occasionally be slightly negative when agreement is less than what is expected by chance.
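
Given a vector of observed y values and the corresponding LAD-predicted values, the quantities δ, μ δ, and the chance-corrected measure of agreement require only a few lines of code. The sketch below is illustrative only; the function name is not from the text.

```python
import numpy as np

def chance_corrected_agreement(y, y_pred):
    """Return delta, mu_delta, and the chance-corrected agreement measure."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    delta = np.mean(np.abs(y - y_pred))                       # observed mean |error|
    mu_delta = np.mean(np.abs(y[:, None] - y_pred[None, :]))  # average over all N^2 pairs
    return delta, mu_delta, 1.0 - delta / mu_delta

# Perfect agreement yields a value of 1; chance-level agreement hovers near 0.
print(chance_corrected_agreement([1, 2, 3, 4], [1, 2, 3, 4]))
```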

7.2.1 Illustration of Effects of Extreme Values

Three useful diagnostics for assessing the potential effects of extreme values on regression estimators are distance, leverage, and influence. In general terms, distance refers to the possible presence of unusual values in the criterion variable and is typically measured as the deviation of a value from the measured center of the criterion variable (y). Leverage refers to the possible presence of unusual values in a predictor variable. In the case of a single predictor, leverage is typically measured as the deviation of a value from the measured center of the predictor variable (x). Influence incorporates both distance and leverage and refers to the possible presence of unusual values in some combination of the criterion and predictor variables.

For OLS regression, the measure of distance for any data point is simply an error term or residual, i.e., \(e_{i} = y_{i}-\hat {y}_{i}\) and is sometimes standardized and sometimes Studentized. Leverage is a measure of the importance of the ith observation in determining the model fit and is usually designated as h i. More specifically, h i is the ith diagonal element of the N×N matrix

$$\displaystyle \begin{aligned} \mathbf{H} = \mathbf{X}\left( {\mathbf{X}}^{\prime}\mathbf{X} \right)^{-1} {\mathbf{X}}^{\prime} \end{aligned}$$

called the “hat matrix,” since \(\hat {\mathbf {y}} = \mathbf {Hy}\), where \(\hat {\mathbf {y}}\) and \(\mathbf {y}\) are the column vectors

$$\displaystyle \begin{aligned} \hat{\mathbf{y}} = \left( \hat{y}_{1},\hat{y}_{2},\ldots,\hat{y}_{N} \right)^{\prime} \quad \mbox{and} \quad \mathbf{y} = \left( y_{1},y_{2},\ldots,y_{N} \right)^{\prime}\;. \end{aligned}$$

In the case of only one predictor, leverage is simply a function of the deviation of an x score from the mean of the predictor and is given by

$$\displaystyle \begin{aligned} h_{i} = \frac{1}{N}+\frac{(x_{i}-\bar{x})^{2}}{(N-1)s_{x}^{2}} \qquad \mbox{for }i = 1,\,\ldots,\,N\;, \end{aligned}$$

where \(s_{x}^{2}\) is the estimated population variance for variable x given by

$$\displaystyle \begin{aligned} s_{x}^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\big( x_{i}-\bar{x} \big)^{2}\;. \end{aligned}$$

Influence combines both leverage and distance, measured as a Studentized residual, to identify unusually influential observations. Residuals are sometimes standardized and sometimes Studentized. Standardized residuals are given by

$$\displaystyle \begin{aligned} z_{i} = \frac{e_{i}}{s_{y.x}} \qquad \mbox{for }i = 1,\,\ldots,\,N\;, \end{aligned}$$

where \(e_{i} = y_{i}-\hat {y}_{i}\) for i = 1, …, N is the unstandardized residual and

$$\displaystyle \begin{aligned} s_{y.x} = \left(\frac{1}{N-p-1}\sum_{i=1}^{N}e_{i}^{2}\right)^{1/2} \end{aligned}$$

is the standard error of estimate. Standardized residuals have a mean of zero and a variance of one. Studentized residuals are given by

$$\displaystyle \begin{aligned} r_{i} = \frac{e_{i}}{s_{y.x}\sqrt{1-h_{i}}} = \frac{z_{i}}{\sqrt{1-h_{i}}} \qquad \mbox{for }i = 1,\,\ldots,\,N\;. \end{aligned}$$

Studentized residuals follow Student’s t distribution with mean near zero and variance slightly greater than one.

The most common measure of influence is Cook’s distance given by

$$\displaystyle \begin{aligned} d_{i} = \left( \frac{1}{p+1} \right) r_{i}^{2} \left( \frac{h_{i}}{1-h_{i}} \right)\;, \end{aligned}$$

where \(r_{i}^{2}\) denotes the squared Studentized residual and p is the number of predictor variables.
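
For an OLS fit with an intercept, all of these diagnostics can be assembled directly from the hat matrix. The Python sketch below is a textbook-style illustration rather than a substitute for a statistical package’s diagnostics; the example data consist of nine collinear points plus one added high-leverage point.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Leverage, standardized and Studentized residuals, and Cook's distance."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    N, cols = X.shape
    p = cols - 1                                    # number of predictors
    H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix
    h = np.diag(H)                                  # leverages h_i
    e = y - H @ y                                   # residuals e_i = y_i - yhat_i
    s = np.sqrt(e @ e / (N - p - 1))                # standard error of estimate
    z = e / s                                       # standardized residuals
    r = z / np.sqrt(1.0 - h)                        # Studentized residuals
    d = r**2 * (h / (1.0 - h)) / (p + 1)            # Cook's distance
    return h, z, r, d

# Nine collinear points plus one high-leverage point at (25, 5).
x = np.append(np.arange(1.0, 10.0), 25.0)
y = np.append(10.0 - np.arange(1.0, 10.0), 5.0)
h, z, r, d = ols_diagnostics(x, y)
print(np.round(d, 3))    # the added tenth point dominates Cook's distance
```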

To illustrate the effects of extreme values on the estimates of OLS and LAD regression parameters, consider an example of linear regression with one predictor and a single extreme data point. This simplified example permits the isolation and assessment of distance, leverage, and influence and allows comparison of the effects of an atypical value on estimates of OLS and LAD regression parameters. The data for a linear regression with one predictor variable are listed in Table 7.3. The bivariate data listed in Table 7.3 consist of nine data points with x i = i and y i = 10 − i for i = 1, …, 9 and describe a perfect negative linear relationship. Figure 7.1 displays the example bivariate data listed in Table 7.3 and indicates the directions of unusual values implicit in distance (D), leverage (L), and influence (I).

Fig. 7.1

Scatterplot of the data given in Table 7.3 with the directions of extreme values indicated by D, I, and L for distance, influence, and leverage, respectively

Fig. 7.2

Scatterplot of the data given in Table 7.3 with the locations of an added tenth value indicated by four white circles

Table 7.3 Example bivariate data on N = 9 objects for a perfect negative linear regression with one predictor variable

7.2.1.1 Distance

If a tenth bivariate value is added to the nine bivariate values given in Table 7.3 where (x 10, y 10) = (5, 5), the new data point is located at the common mean and median of both variable x and variable y and, therefore, does not affect the perfect linear relationship between the variables. If x 10 is held constant at x 10 = 5, but y 10 takes on the added values of 6, 7, …, 30, 40, 60, 80, and 100, then the effects of distance on the two regression models can be observed. The vertical movement of y 10 with variable x held constant at x 10 = 5 is depicted by the directional arrow labeled “D” in Fig. 7.1 and by the four white circles in Fig. 7.2, illustrating an additional data point moving vertically away from location (x 5, y 5) = (5, 5) by increments of one y unit, i.e., (5, 6), (5, 7), (5, 8), and so on.

Table 7.4 lists the values for x 10 and y 10 in the first two columns, the \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) estimates of the OLS regression parameters in the next two columns, and the \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) estimates of the LAD regression parameters in the last two columns. The \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) parameter estimates in the last two columns of Table 7.4 were obtained using the linear program of Barrodale and Roberts [2]. The estimates of the OLS regression parameters listed in Table 7.4 demonstrate that \(\hat {\alpha }_{yx}\) systematically changes with increases in distance, but \(\hat {\beta }_{yx}\) remains constant at − 1.00. In contrast, estimates of the LAD regression parameters are unaffected by changes in distance, remaining constant at \(\tilde {\alpha }_{yx} = 10.00\) and \(\tilde {\beta }_{yx} = -1.00\) for x 10 = 5 and any value of y 10. Given the nine bivariate data points listed in Table 7.3 and an additional bivariate data point with x 10 = 5, it follows that

$$\displaystyle \begin{aligned} \sum_{i=1}^{10} \big| y_{i}-\tilde{y}_{i} \big| = \big| y_{10}-5 \big|\;.\end{aligned} $$
Table 7.4 Effects of distance on intercepts and slopes of OLS and LAD linear regression models

7.2.1.2 Leverage

If a tenth bivariate value is added to the nine bivariate values given in Table 7.3 where y 10 = 5 and x 10 takes on the added values of 6, 7, …, 30, 40, 60, 80, and 100, then the effects of leverage on the two regression models can be observed. The horizontal movement of x 10 with y 10 held constant at y 10 = 5 is depicted by the directional arrow labeled “L” in Fig. 7.1 and by the four white circles in Fig. 7.3, illustrating an additional data point moving horizontally away from (x 5, y 5) = (5, 5) by increments of one x unit, i.e., (6, 5), (7, 5), (8, 5), and so on.

Fig. 7.3

Scatterplot of the data given in Table 7.3 with the locations of an added tenth value indicated by four white circles

Table 7.5 lists the values of x 10 and y 10 in the first two columns, the \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) estimates of the OLS regression parameters in the next two columns, and the \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) estimates of the LAD regression parameters in the last two columns. The \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) estimates were again obtained using the linear program of Barrodale and Roberts [2]. The estimates of the OLS regression parameters listed in Table 7.5 demonstrate that both \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) exhibit complex changes with increases in leverage. Note the dramatic changes in the intercept from \(\hat {\alpha }_{yx} = +10.00\) to \(\hat {\alpha }_{yx} = +5.1063\), approaching the mean of y (+ 5.00), and the slope from \(\hat {\beta }_{yx} = -1.00\) to \(\hat {\beta }_{yx} = -0.0073\), approaching a slope of zero. In contrast, \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) are unaffected for y 10 = 5 and 5 ≤ x 10 ≤ 24. For y 10 = 5 and x 10 ≥ 26, the LAD estimated regression parameters change from \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) to \(\tilde {\alpha }_{yx} = +5.00\) and \(\tilde {\beta }_{yx} = 0.00\).

Table 7.5 Effects of leverage on intercepts and slopes of OLS and LAD linear regression models
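
The drift of the OLS estimates recorded in Table 7.5 is easy to reproduce numerically. The short sketch below is illustrative only (np.polyfit is simply a convenient OLS routine, not the method used to construct the table): it refits the OLS line as the tenth point moves horizontally, showing the slope drifting from −1.00 toward 0.00 and the intercept toward the mean of y.

```python
import numpy as np

x9 = np.arange(1.0, 10.0)        # the nine x values of the perfect relationship
y9 = 10.0 - x9                   # y_i = 10 - x_i

for x10 in (5, 10, 20, 40, 100):           # tenth point held at y10 = 5
    x = np.append(x9, float(x10))
    y = np.append(y9, 5.0)
    beta, alpha = np.polyfit(x, y, 1)      # OLS slope and intercept
    print(f"x10 = {x10:5d}: alpha = {alpha:7.4f}, beta = {beta:7.4f}")
```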

Given the bivariate data listed in Table 7.3 on p. 9 and an additional bivariate data point with variable y held constant at y 10 = 5, it follows that

$$\displaystyle \begin{aligned} \sum_{i=1}^{10} \big| y_{i}-\tilde{y}_{i} \big| \leq 20.00 \end{aligned}$$

for x 10 ≤ 25 and

$$\displaystyle \begin{aligned} \sum_{i=1}^{10} \big| y_{i}-\tilde{y}_{i} \big| = 20.00 \end{aligned}$$

for x 10 ≥ 25. When x 10 ≤ 25, the LAD regression line defined by \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) yields the minimum sum of absolute differences. However, when x 10 ≥ 25 the LAD regression line defined by \(\tilde {\alpha }_{yx} = +5.00\) and \(\tilde {\beta }_{yx} = 0.00\) that passes through the data point located at (x 10, y 10) yields the minimum sum of absolute differences. For x 10 = 25, the LAD regression line is not unique. While this is an interesting property of LAD regression and can easily be demonstrated with one predictor and a small number of data points, in practice an extreme value would have to be so far removed from the measured center of the distribution of variable x as to be considered a “grossly aberrant” value [47, p. 871].

The fact that the solution is not unique when y 10 = 5 and x 10 = 25, so that either of the two LAD regression lines is appropriate, deserves some additional explanation. Consider the data points in Fig. 7.4 where the additional tenth point is indicated at locations

$$\displaystyle \begin{aligned} (x_{6},y_{5}), (x_{7},y_{5}),\,\ldots\,,(x_{9},y_{5}) \end{aligned}$$

and the LAD regression line for the original nine data points with \(\tilde {\alpha } = +10.00\) and \(\tilde {\beta } = -1.00\) is depicted. If only the original nine data points are considered, the sum of absolute deviations is zero, i.e.,

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{i=1}^{9}\big| y_{i}-\tilde{y}_{i} \big| = \big| 9-9 \big|+\big| 8-8 \big|&\displaystyle +&\displaystyle \big| 7-7 \big|+\big| 6-6 \big|+\big| 5-5 \big|+\big| 4-4 \big|\\ &\displaystyle &\displaystyle +\big| 3-3 \big|+\big| 2-2 \big|+\big| 1-1 \big| = 0.00\;. \end{array} \end{aligned} $$
Fig. 7.4

Scatterplot of the data given in Table 7.3 with the regression line \(\tilde {\beta }_{\mathit {yx}}\) depicted and the locations of an added tenth value indicated by four white circles

The addition of a tenth data point at location (x 6, y 5), the first white circle to the right of the regression line in Fig. 7.4, increases the sum of absolute deviations by one, i.e., \(|y_{i}-\tilde {y}_{i}| = |6-5| = 1\). Moving the new data point horizontally to location (x 7, y 5), the second white circle to the right of the regression line in Fig. 7.4, increases the sum of absolute deviations to two, i.e., \(|y_{i}-\tilde {y}_{i}| = |7-5| = 2\). Continuing to move the new data point horizontally increases the sum of absolute deviations further. Consider locations (x 24, y 5), (x 25, y 5), and (x 26, y 5), where

$$\displaystyle \begin{aligned} \big| y_{i}-\tilde{y}_{i} \big| = \big| 24-5 \big| = 19\;, \quad \big| y_{i}-\tilde{y}_{i} \big| = \big| 25-5 \big| = 20\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} \big| y_{i}-\tilde{y}_{i} \big| = \big| 26-5 \big| = 21\;, \end{aligned}$$

respectively.

Thus, for an additional value up to location (x 25, y 5) the sum of absolute deviations will be equal to or less than 20, and for an additional value beyond location (x 25, y 5) the sum of absolute deviations will be equal to or greater than 20. However, when a data point is added at location (x 25, y 5) something interesting happens, which is readily apparent in Table 7.5. At this point a dramatic shift in the LAD regression line occurs, from \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) to \(\tilde {\alpha }_{yx} = +5.00\) and \(\tilde {\beta }_{yx} = 0.00\). The regression line is leveraged and forced through the new data point location at (x 25, y 5). The new regression line is depicted in Fig. 7.5 with the absolute errors indicated by dashed lines. The sum of the absolute errors around the new regression line is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{i=1}^{10} \big| y_{i}-\tilde{y}_{i} \big| = \big| 9-5 \big|&\displaystyle +&\displaystyle \big| 8-5 \big|+\big| 7-5 \big|+\big| 6-5 \big|+\big| 5-5 \big|+\big| 4-5 \big|\\ &\displaystyle &\displaystyle +\big| 3-5 \big|+\big| 2-5 \big|+\big| 1-5 \big|+\big| 5-5 \big| = 20.00\;. \end{array} \end{aligned} $$

Thus both regression lines given by \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) and \(\tilde {\alpha }_{yx} = +5.00\) and \(\tilde {\beta }_{yx} = 0.00\) minimize the sum of absolute deviations when an additional data point is located at (x 25, y 5). Note, however, that the additional data point is far to the right and is a very extreme value, unlikely to be encountered in everyday research. Specifically, for this minimalist example, a tenth value at location (x 25, y 5) is almost three times the range and over seven standard deviations above the mean—too extreme to be of concern in practice. Thus, LAD regression is highly stable under all but the most extreme cases.

Fig. 7.5

Scatterplot of the data given in Table 7.3 with absolute errors indicated by dashed lines

7.2.1.3 Influence

If a tenth bivariate value is added to the nine bivariate values given in Table 7.3 on p. 9 where x 10 = y 10 takes on the added values of 6, 7, …, 30, 40, 60, 80, and 100, then the effects of influence on the two regression models can be observed. The diagonal movement of (x 10, y 10) is depicted by the directional arrow labeled “I” in Fig. 7.1 and by the four white circles in Fig. 7.6, illustrating an additional data point moving diagonally away from (x 5, y 5) = (5, 5) by increments of one x and one y unit, i.e., (6, 6), (7, 7), (8, 8), and so on.

Fig. 7.6

Scatterplot of the data given in Table 7.3 with the locations of an added tenth value indicated by four white circles

Table 7.6 lists the values of x 10 and y 10 in the first two columns, the \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) estimates of the OLS regression parameters in the next two columns, and the \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) estimates of the LAD regression parameters in the last two columns. The estimates of the OLS regression parameters listed in Table 7.6 demonstrate that both \(\hat {\alpha }_{yx}\) and \(\hat {\beta }_{yx}\) exhibit complex changes with increases in influence, quickly becoming unstable with changes in the intercept from \(\hat {\alpha }_{yx} = +10.00\) to \(\hat {\alpha }_{yx} = +0.2126\) and changes in the slope from \(\hat {\beta }_{yx} = -1.00\) to \(\hat {\beta }_{yx} = +0.9853\). Note that \(\hat {\beta }_{yx}\) is negative from x 10 = 5 up to x 10 = 13, then changes to positive for x 10 = 14 up to x 10 = 100. Note also that the range of changes in \(\hat {\beta }_{yx}\) is from \(\hat {\beta }_{yx} = -1.00\) for x 10 = 5 approaching \(\hat {\beta }_{yx} = +1.00\) for x 10 = 100; actually, \(\hat {\beta }_{yx} = +0.9853\) for x 10 = 100. In contrast, \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\) do not change for 5 ≤ x 10 = y 10 ≤ 24. For x 10 = y 10 ≥ 26, the estimates of the LAD regression parameters change from \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) to \(\tilde {\alpha }_{yx} = 0.00\) and \(\tilde {\beta }_{yx} = +1.00\). When x 10 = y 10 = 25, either of the two LAD regression lines holds since the solution is not unique. Thus, two LAD regression lines minimize the sum of absolute errors: one with \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) and the other with \(\tilde {\alpha }_{yx} = 0.00\) and \(\tilde {\beta }_{yx} = +1.00\).

Table 7.6 Effects of influence on intercepts and slopes of OLS and LAD linear regression models

Figure 7.7 depicts the two LAD regression lines, labeled with the values for \(\tilde {\alpha }_{yx}\) and \(\tilde {\beta }_{yx}\), and dashed lines indicating the errors around the regression line with \(\tilde {\alpha }_{yx} = 0.00\) and \(\tilde {\beta }_{yx} = +1.00\). As shown in Fig. 7.7, the sum of absolute errors is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{i=1}^{10} \big| y_{i}-\tilde{y}_{i} \big| = \big| 9-1 \big|&\displaystyle +&\displaystyle \big| 8-2 \big|+\big| 7-3 \big|+\big| 6-4 \big|+\big| 5-5 \big|+\big| 4-6 \big|\\ &\displaystyle +&\displaystyle \big| 3-7 \big|+\big| 2-8 \big|+\big| 1-9 \big|+\big| 25-25 \big| = 40.00\;. \end{array} \end{aligned} $$
Fig. 7.7

Scatterplot of the data given in Table 7.3 with the regression lines minimizing the sum of absolute errors

Given the bivariate data listed in Table 7.3 on p. 9 and an additional bivariate data point x 10 = y 10, it follows that

$$\displaystyle \begin{aligned} \sum_{i=1}^{10}\big| y_{i}-\tilde{y}_{i} \big| \leq 40.00 \end{aligned}$$

for 5 ≤ x 10 = y 10 ≤ 25 and

$$\displaystyle \begin{aligned} \sum_{i=1}^{10}\big| y_{i}-\tilde{y}_{i} \big| = 40.00 \end{aligned}$$

for x 10 = y 10 ≥ 25. When x 10 = y 10 ≤ 25, the LAD regression line defined by \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) yields the minimum sum of absolute differences between y i and \(\tilde {y}_{i}\) for i = 1, …, N. However, when x 10 = y 10 ≥ 25, the LAD regression line defined by \(\tilde {\alpha }_{yx} = 0.00\) and \(\tilde {\beta }_{yx} = +1.00\) that passes through the data point located at (x 10, y 10) yields the minimum sum of absolute differences between y i and \(\tilde {y}_{i}\) for i = 1, …, N. For x 10 = y 10 = 25, the LAD regression line is not unique. It should be noted that the shift in the LAD regression line is a consequence of only the leverage component of influence. For these data, the LAD regression line is defined by \(\tilde {\alpha }_{yx} = +10.00\) and \(\tilde {\beta }_{yx} = -1.00\) if |x 10 − 5|≤ 20.00 and the regression line is unique if |x 10 − 5| < 20.0 or y 10 = 10 − x 10.

LAD linear regression is a robust alternative to OLS linear regression, especially when the errors are generated by fat-tailed distributions [10, 52]. Fat-tailed distributions produce an abundance of extreme values, and OLS linear regression gives disproportionate weight to extreme values. In practice, LAD linear regression is virtually unaffected by the presence of a few extreme values. While the effects of distance, leverage, and influence are illustrated with only a simplified example of perfect linear regression with one predictor, the results extend to more general regression models. If a less-than-perfect regression model with p predictors is considered, the estimators of the LAD regression parameters remain unaffected by unusual y i values when the leverage effect is absent. In addition, only exceedingly extreme values of the predictors x 1, …, x p have any effect on the estimation of the LAD regression parameters.

7.2.2 Univariate Example of LAD Regression

Consider the small example set of bivariate data listed in Table 7.7 for N = 10 subjects. For the bivariate data listed in Table 7.7, the LAD regression coefficient is \(\tilde {\beta } = +2.1111\), δ = 5.9889, μ δ = 9.2267, and the LAD chance-corrected measure of agreement between the observed y values and the predicted \(\tilde {y}\) values is

$$\displaystyle \begin{aligned} \mathfrak{R} = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{5.9889}{9.2267} = +0.3509\;. \end{aligned}$$

Since there are M = N! = 10! = 3, 628, 800 possible arrangements of the observed data, an exact permutation analysis may not be practical. Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(\mathfrak {R} = +0.3509\) is

$$\displaystyle \begin{aligned} P \big( \mathfrak{R} \geq \mathfrak{R}_{\text{o}}|H_{0} \big) = \frac{\text{number of }\mathfrak{R}\text{ values } \geq \mathfrak{R}_{\text{o}}}{L}\;, \end{aligned}$$

where \(\mathfrak {R}_{\text{o}}\) denotes the observed value of \(\mathfrak {R}\).

Table 7.7 Example bivariate LAD correlation data on N = 10 subjects

While M = 3, 628, 800 possible arrangements makes an exact permutation analysis impractical, it is not impossible. If the reference set of all possible permutations of the observed scores in Table 7.7 occur with equal chance, the exact probability of \(\mathfrak {R} = +0.3509\) under the null hypothesis is

$$\displaystyle \begin{aligned} P \big( \mathfrak{R} \geq \mathfrak{R}_{\text{o}}|H_{0} \big) = \frac{\text{number of }\mathfrak{R}\text{ values } \geq \mathfrak{R}_{\text{o}}}{M}\;, \end{aligned}$$

where \(\mathfrak {R}_{\text{o}}\) denotes the observed value of \(\mathfrak {R}\).
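
A Monte Carlo analysis of this kind only requires refitting the LAD line for each random shuffle of the y values and recomputing the chance-corrected agreement measure. The self-contained sketch below is illustrative: the brute-force lad_fit helper relies on the fact that an optimal simple LAD line can be taken to pass through two of the data points, which is adequate for small N but is not the linear programming algorithm used in the text, and the data are arbitrary placeholders rather than the Table 7.7 values.

```python
import numpy as np

def lad_fit(x, y):
    """Brute-force simple LAD fit: search all lines through pairs of points."""
    best = (np.inf, 0.0, 0.0)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            s = np.sum(np.abs(y - (a + b * x)))
            if s < best[0]:
                best = (s, a, b)
    return best[1], best[2]

def agreement(y, y_pred):
    """Chance-corrected agreement: 1 - delta / mu_delta."""
    delta = np.mean(np.abs(y - y_pred))
    mu_delta = np.mean(np.abs(y[:, None] - y_pred[None, :]))
    return 1.0 - delta / mu_delta

def monte_carlo_lad_test(x, y, L=5_000, seed=1):
    """Shuffle y, refit the LAD line, and count agreement values >= observed."""
    rng = np.random.default_rng(seed)
    a, b = lad_fit(x, y)
    R_obs = agreement(y, a + b * x)
    count = 0
    for _ in range(L):
        y_perm = rng.permutation(y)
        a_p, b_p = lad_fit(x, y_perm)
        if agreement(y_perm, a_p + b_p * x) >= R_obs:
            count += 1
    return R_obs, count / L

# Arbitrary placeholder data for N = 10 subjects (not the Table 7.7 values).
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
y = 2.0 * x + rng.normal(scale=4.0, size=10)
print(monte_carlo_lad_test(x, y, L=2_000))
```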

7.2.3 Multivariate Example of LAD Regression

To illustrate a multivariate LAD linear regression analysis, an application of the LAD regression model to forecasting African rainfall in the western Sahel is utilized [38]. For the multivariate data listed in Table 7.8, the first column lists N = 15 calendar years from 1950 to 1964 and the second through fourth columns (U 50, U 30, and |U 50 − U 30|) contain values based on the quasi-biennial oscillation of equatorial east/west winds. U 50 is the zonal wind measured in meters per second at 50 millibars (approximately 20 km in altitude) and U 30 is the zonal wind measured in meters per second at 30 millibars (approximately 23 km in altitude). The R s values in the fifth column are standard deviations from the mean rainfall for the western Sahel region. The values for R g in the sixth column are standard deviations from the mean rainfall for the Gulf of Guinea. The dependent variable in the seventh column is the April to October rainfall in the western Sahel region based on recordings from 20 stations in the region.

Table 7.8 Regional rainfall precipitation by years with predictors U 50, U 30, |U 50 − U 30|, R s, and R g

For the multivariate data listed in Table 7.8, the LAD regression coefficients are

$$\displaystyle \begin{gathered} \tilde{\beta}_{1} = -0.0021\;, \quad \tilde{\beta}_{2} = -0.0364\;, \quad \tilde{\beta}_{3} = -0.0325\;,\\ \tilde{\beta}_{4} = +0.5328\;, \quad \mbox{and} \quad \tilde{\beta}_{5} = +0.5215\;, \end{gathered} $$

δ = 0.3439, μ δ = 0.4756, and the LAD chance-corrected measure of agreement between the observed y values and the predicted \(\tilde {y}\) values is

$$\displaystyle \begin{aligned} \mathfrak{R} = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{0.3439}{0.4756} = +0.2768. \end{aligned}$$

Even with a small sample of observations such as this, there are

$$\displaystyle \begin{aligned} M = N! = 15! = 1{,}307{,}674{,}368{,}000 \end{aligned}$$

possible, equally-likely arrangements of the observed data to be considered, far too many for an exact permutation analysis. Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(\mathfrak {R} = +0.2768\) is

$$\displaystyle \begin{aligned} P \big( \mathfrak{R} \geq \mathfrak{R}_{\text{o}}|H_{0} \big) = \frac{\text{number of }\mathfrak{R}\text{ values } \geq \mathfrak{R}_{\text{o}}}{L} = \frac{42{,}279}{1{,}000{,}000} = 0.0423\;, \end{aligned}$$

where \(\mathfrak {R}_{\text{o}}\) denotes the observed value of \(\mathfrak {R}\).

7.3 LAD Multivariate Multiple Regression

An extension of LAD multiple linear regression to include multiple response variables, coupled with multiple predictor variables, is developed in this section [36, 37]. The extension was prompted by a multivariate Least Sum of Euclidean Distances (LSED) algorithm developed by Kaufman, Taylor, Mielke, and Berry in 2002 [24].

Consider the multivariate multiple linear regression model given by

$$\displaystyle \begin{aligned} y_{ik} = \sum_{j=1}^{m} x_{ij} \beta_{jk}+e_{ik} \end{aligned}$$

for i = 1, …, N and k = 1, …, r, where y ik represents the ith of N measurements for the kth of r response variables, possibly affected by a treatment; x ij is the jth of m covariates associated with the ith response, where x i1 = 1 if the model includes an intercept; β jk denotes the jth of m regression parameters for the kth of r response variables; and e ik designates the error associated with the ith of N measurements for the kth of r response variables.

If estimates of β jk that minimize

$$\displaystyle \begin{aligned} \sum_{i=1}^{N} \left( \:\sum_{k=1}^{r} e_{ik}^{2} \right)^{1/2} \end{aligned}$$

are denoted by \(\tilde {\beta }_{jk}\) for j = 1, …, m and k = 1, …, r, then the N r-dimensional residuals of the LSED multivariate multiple linear regression model are given by

$$\displaystyle \begin{aligned} e_{ik} = y_{ik}-\sum_{j=1}^{m} x_{ij} \tilde{\beta}_{jk} \end{aligned}$$

for i = 1, …, N and k = 1, …, r.
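
For readers who wish to experiment, the LSED criterion can be minimized with a general-purpose optimizer started at the OLS solution. The sketch below is a rough numerical illustration under that assumption; it is not the Kaufman–Taylor–Mielke–Berry algorithm cited above, and the example data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def lsed_fit(X, Y):
    """Minimize the sum over cases of the Euclidean norm of the residual vector."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m, r = X.shape[1], Y.shape[1]

    def objective(b_flat):
        E = Y - X @ b_flat.reshape(m, r)
        return np.sum(np.sqrt(np.sum(E**2, axis=1)))

    b0, *_ = np.linalg.lstsq(X, Y, rcond=None)        # OLS starting values
    res = minimize(objective, b0.ravel(), method="Powell")
    B = res.x.reshape(m, r)
    return B, Y - X @ B                               # coefficient estimates, residuals

# Synthetic illustration: N = 8 cases, m = 2 columns (intercept and one x),
# r = 2 response variables.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
X = np.column_stack([np.ones(8), x])
Y = np.column_stack([1.0 + 2.0 * x, 3.0 - x]) + rng.normal(scale=0.5, size=(8, 2))
B, E = lsed_fit(X, Y)
print(np.round(B, 3))
```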

Let the N r-dimensional residuals, e i1, …, e ir for i = 1, …, N, obtained from a LSED multivariate multiple linear regression model, be partitioned into g treatment groups of sizes n 1, …, n g, where n i ≥ 2 for i = 1, …, g and

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{g} n_{i}\;. \end{aligned}$$

The analysis of the multivariate multiple regression residuals depends on test statistic

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g} C_{i} \xi_{i}\;, \end{aligned} $$
(7.3)

where C i = n i∕N is a positive weight for the ith of g treatment groups and ξ i is the average pairwise Euclidean distance among the n i r-dimensional residuals in the ith of g treatment groups defined by

$$\displaystyle \begin{aligned} \xi_{i} = \binom{n_{i}}{2}^{-1} \sum_{k=1}^{N-1}\,\sum_{l=k+1}^{N} \left[ \sum_{j=1}^{r} \big( e_{kj}-e_{lj} \big)^{2} \right]^{1/2} \Psi_{ki}\Psi_{li}\;, \end{aligned} $$
(7.4)

where

$$\displaystyle \begin{aligned} \Psi_{ki} = \begin{cases} \,1 & \text{if }(e_{k1},\,\ldots,\,e_{kr})\text{ is in the }i\text{th treatment group ,} \\ {} \,0 & \text{otherwise .} \end{cases}\end{aligned} $$

The null hypothesis specifies that each of the

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!}\end{aligned} $$

possible allocations of the N r-dimensional residuals to the g treatment groups is equally-likely. Under the null hypothesis, an exact probability value associated with the observed value of δ, δ o, is given by

$$\displaystyle \begin{aligned} P(\delta \leq \delta_{\text{o}}|H_{0}) = \frac{\mbox{number of }\delta\text{ values }\leq \delta_{\text{o}}}{M}\;.\end{aligned} $$

As with LAD univariate multiple regression models, the criterion for fitting LSED multivariate multiple regression models based on δ is the chance-corrected measure of effect size between the observed and predicted response measurement values given by

$$\displaystyle \begin{aligned} \mathfrak{R} = 1-\frac{\delta}{\mu_{\delta}}\;, \end{aligned} $$
(7.5)

where μ δ is the expected value of δ over the M possible allocations of the residuals to the g treatment groups under the null hypothesis, given by

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M} \sum_{i=1}^{M} \delta_{i}\;. \end{aligned} $$
(7.6)

Note that \(\mathfrak {R} = 1\) implies perfect agreement between the observed and model-predicted response vectors and the expected value of \(\mathfrak {R}\) is 0 under the null hypothesis, i.e., chance-corrected.
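
Once the residual vectors and group labels are in hand, ξ i, δ, μ δ, and the chance-corrected measure of effect size are simple to compute. The sketch below substitutes a Monte Carlo shuffle of the group labels for the exact enumeration over all M allocations described above, and estimates μ δ from the same resamples rather than computing it exactly; the residuals at the bottom are synthetic placeholders, not values from the example that follows.

```python
import numpy as np
from itertools import combinations

def delta_stat(E, groups):
    """delta = sum over groups of (n_i / N) times the average pairwise
    Euclidean distance among the residual vectors in that group."""
    N = len(groups)
    delta = 0.0
    for g in np.unique(groups):
        rows = E[groups == g]
        dists = [np.linalg.norm(a - b) for a, b in combinations(rows, 2)]
        delta += (len(rows) / N) * np.mean(dists)
    return delta

def permutation_test(E, groups, L=10_000, seed=1):
    """Monte Carlo analogue of the exact test: permute group labels over the
    residual vectors; estimate mu_delta and the effect size from the resamples."""
    rng = np.random.default_rng(seed)
    d_obs = delta_stat(E, groups)
    deltas = np.array([delta_stat(E, rng.permutation(groups)) for _ in range(L)])
    p_value = np.mean(deltas <= d_obs)
    mu_delta = deltas.mean()
    return d_obs, p_value, 1.0 - d_obs / mu_delta

# Synthetic bivariate residuals: N = 16 cases in g = 3 groups of sizes 5, 7, 4.
rng = np.random.default_rng(0)
E = rng.normal(size=(16, 2))
groups = np.repeat([1, 2, 3], [5, 7, 4])
print(permutation_test(E, groups, L=2_000))
```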

7.3.1 Example of Multivariate Multiple Regression

To illustrate a multivariate LSED multiple regression analysis, consider an unbalanced two-way randomized-block experimental design in which N = 16 subjects are tested over a = 3 levels of Factor A, the experiment is repeated b = 2 times for Factor B, and there are r = 2 response measurement scores for each subject. The data are listed in Table 7.9. The design is intentionally kept small to illustrate the multivariate multiple regression procedure.

Table 7.9 Example data for a two-way randomized-block design with a = 3 blocks, b = 2 treatments, and N = 16 subjects

7.3.1.1 Analysis of Factor A

A design matrix of dummy codes (0, 1) for a regression analysis of Factor A is given in Table 7.10, where the first column of 1 values provides for an intercept, the next column contains the dummy codes for Factor B, and the third and fourth columns contain the bivariate response measurement scores listed according to the original random assignment of the N = 16 subjects to the a = 3 levels of Factor A, with the first \(n_{A_{1}} = 5\) scores, the next \(n_{A_{2}} = 7\) scores, and the last \(n_{A_{3}} = 4\) scores associated with the a = 3 levels of Factor A, respectively. The analysis of the data listed in Table 7.10 examines the N = 16 regression residuals for possible differences among the a = 3 treatment levels of Factor A; consequently, no dummy codes are provided for Factor A as this information is implicit in the ordering of the a = 3 levels of Factor A in the last two columns of Table 7.10.

Table 7.10 Example design matrix and bivariate response measurement scores for a multivariate LSED multiple regression analysis of Factor A with N = 16 subjects

Because there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{a}n_{A_{i}}!} = \frac{16!}{5!\;7!\;4!} = 1{,}441{,}440 \end{aligned}$$

possible, equally-likely arrangements of the N = 16 bivariate response measurement scores listed in Table 7.10, an exact permutation analysis is feasible. The analysis of the N = 16 LAD regression residuals calculated on the bivariate response measurement scores for Factor A in Table 7.10 yields estimated LAD regression coefficients of

$$\displaystyle \begin{aligned} \tilde{\beta}_{1,1} = +58.00\;, \quad \tilde{\beta}_{2,1} = -9.00\;, \quad \tilde{\beta}_{1,2} = +94.00\;, \;\;\mbox{and} \quad \tilde{\beta}_{2,2} = +8.00 \end{aligned}$$

for Factor A. Table 7.11 lists the observed y ik values, LAD-predicted \(\tilde {y}_{ik}\) values, and residual e ik values for i = 1, …, 16 subjects and k = 1, 2 response variables.

Table 7.11 Observed, predicted, and residual values for a multivariate LSED multiple regression analysis of Factor A with N = 16 subjects

Following Eq. (7.4) on p. 23 and employing ordinary Euclidean distance between residuals, the N = 16 LAD regression residuals listed in Table 7.11 yield a = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{A_{1}} = 7.2294\;, \quad \xi_{A_{2}} = 20.0289\;, \;\; \mbox{and} \quad \xi_{A_{3}} = 7.3475\;. \end{aligned}$$

Following Eq. (7.3) on p. 23, the observed value of test statistic δ calculated on the N = 16 LAD regression residuals listed in Table 7.11 with treatment group weights

$$\displaystyle \begin{aligned} C_{j} = \frac{n_{A_{j}}}{N} \qquad \mbox{for }j = 1,2,3 \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta_{A} = \sum_{j=1}^{a} C_{j}\xi_{j} = \frac{1}{16} \big[ (5)(7.2294)+(7)(20.0289)+(4)(7.3475) \big] = 12.8587\;. \end{aligned}$$

If all M arrangements of the N = 16 observed LAD regression residuals listed in Table 7.11 occur with equal chance, the exact probability value of δ A = 12.8587 computed on the M = 1, 441, 440 possible arrangements of the observed LAD regression residuals with \(n_{A_{1}} = 5\), \(n_{A_{2}} = 7\), and \(n_{A_{3}} = 4\) preserved for each arrangement is

$$\displaystyle \begin{aligned} P(\delta \leq \delta_{A}|H_{0}) = \frac{\mbox{number of }\delta\text{ values }\leq \delta_{A}}{M} = \frac{6{,}676}{1{,}441{,}440} = 0.0046\;. \end{aligned}$$

Following Eq. (7.6) on p. 24, the exact expected value of the M = 1, 441, 440 δ values is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M} \sum_{i=1}^{M} \delta_{i} = \frac{26{,}092{,}946.8800}{1{,}441{,}440} = 18.1020 \end{aligned}$$

and, following Eq. (7.5) on p. 23, the observed chance-corrected measure of effect size for the y i and \(\tilde {y}_{i}\) values, i = 1, …, N, is

$$\displaystyle \begin{aligned} \mathfrak{R}_{A} = 1-\frac{\delta_{A}}{\mu_{\delta}} = 1-\frac{12.8587}{18.1020} = +0.2897\;, \end{aligned}$$

indicating approximately 29% agreement between the observed and predicted values above that expected by chance.

7.3.1.2 Analysis of Factor B

A design matrix of dummy codes (0, 1) for a regression analysis of Factor B is given in Table 7.12, where the first column of 1 values provides for an intercept, the next two columns contain the dummy codes for Factor A, and the fourth and fifth columns contain the bivariate response measurement scores listed according to the original random assignment of the N = 16 subjects to the b = 2 levels of Factor B, with the first \(n_{B_{1}} = 7\) scores and the last \(n_{B_{2}} = 9\) scores associated with the b = 2 levels of Factor B, respectively. The analysis of the data listed in Table 7.12 examines the N = 16 regression residuals for possible differences between the b = 2 treatment levels of Factor B; consequently, no dummy codes are provided for Factor B as this information is implicit in the ordering of the b = 2 levels of Factor B in the last two columns of Table 7.12.

Table 7.12 Example design matrix and bivariate response measurement scores for a multivariate LSED multiple regression analysis of Factor B with N = 16 subjects

Because there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{b}n_{B_{i}}!} = \frac{16!}{7!\;9!} = 11{,}440 \end{aligned}$$

possible, equally-likely arrangements of the N = 16 response measurement scores listed in Table 7.12, an exact permutation analysis is feasible. The analysis of the N = 16 LAD regression residuals calculated on the bivariate response measurement scores for Factor B in Table 7.12 yields estimated LAD regression coefficients of

$$\displaystyle \begin{gathered} \tilde{\beta}_{1,1} = +46.00\;, \quad \tilde{\beta}_{2,1} = +5.00\;, \quad \tilde{\beta}_{3,1} = +20.00\;, \quad \tilde{\beta}_{1,2} = +104.00\;,\\ \tilde{\beta}_{2,2} = -4.00\;, \;\;\; \mbox{and} \quad \tilde{\beta}_{3,2} = -20.00 \end{gathered} $$

for Factor B. Table 7.13 lists the observed y ik values, LAD-predicted \(\tilde {y}_{ik}\) values, and residual e ik values for i = 1, …, 16 subjects and k = 1, 2 response variables.

Table 7.13 Observed, predicted, and residual values for a multivariate LSED multiple regression analysis of Factor B with N = 16 subjects

Following Eq. (7.4) on p. 23 and employing ordinary Euclidean distance between residuals, the N = 16 LAD regression residuals listed in Table 7.13 yield b = 2 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{B_{1}} = 6.0229 \quad \mbox{and} \quad \xi_{B_{2}} = 16.7440\;. \end{aligned}$$

Following Eq. (7.3) on p. 23, the observed value of test statistic δ calculated on the N = 16 LAD regression residuals listed in Table 7.13 with treatment group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{B_{i}}}{N} \qquad \mbox{for }i = 1,2\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta_{B} = \sum_{i=1}^{b} C_{i} \xi_{i} = \frac{1}{16} \big[ (7)(6.0229)+(9)(16.7440) \big] = 12.0535\;. \end{aligned}$$

If all M arrangements of the N = 16 observed LAD regression residuals listed in Table 7.13 occur with equal chance, the exact probability value of δ B = 12.0535 computed on the M = 11, 440 possible arrangements of the observed LAD regression residuals with \(n_{B_{1}} = 7\) and \(n_{B_{2}} = 9\) preserved for each arrangement is

$$\displaystyle \begin{aligned} P(\delta \leq \delta_{B}|H_{0}) = \frac{\mbox{number of }\delta\text{ values }\leq \delta_{B}}{M} = \frac{2{,}090}{11{,}440} = 0.1827\;. \end{aligned}$$

Following Eq. (7.6) on p. 24, the exact expected value of the M = 11, 440 δ values is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M} \sum_{i=1}^{M} \delta_{i} = \frac{140{,}623.9120}{11{,}440} = 12.2923 \end{aligned}$$

and, following Eq. (7.5) on p. 23, the observed chance-corrected measure of effect size for the y i and \(\tilde {y}_{i}\) values, i = 1, …, N, is

$$\displaystyle \begin{aligned} \mathfrak{R}_{B} = 1-\frac{\delta_{B}}{\mu_{\delta}} = 1-\frac{12.0535}{12.2923} = +0.0194\;, \end{aligned}$$

indicating approximately 2% agreement between the observed and predicted values above that expected by chance.

For another example of LAD multivariate multiple regression, see the informative and widely cited article by Endler and Mielke, “Comparing entire colour patterns as birds see them,” in the Biological Journal of the Linnean Society [11].

7.4 Comparison of OLS and LAD Linear Regression

In this section, OLS and LAD linear regression analyses are illustrated and compared on two example data sets—one with p = 2 predictors and no extreme values and one with p = 2 predictors and a single extreme value. Consider first the small example data set with p = 2 predictors listed in Table 7.14 where variable y is Hours of Housework done by husbands per week, variable x 1 is Number of Children, and variable x 2 is husband’s Years of Education for N = 12 families.

Table 7.14 Example multivariate correlation data on N = 12 families with p = 2 predictors

7.4.1 Ordinary Least Squares (OLS) Analysis

For the multivariate data listed in Table 7.14, the unstandardized OLS regression coefficients are

$$\displaystyle \begin{aligned} \hat{\beta}_{1} = +0.6356 \quad \mbox{and} \quad \hat{\beta}_{2} = -0.0649\;, \end{aligned}$$

and the observed squared OLS multiple correlation coefficient is \(R_{\text{o}}^{2} = 0.2539\). Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(R_{\text{o}}^{2} = 0.2539\) is

$$\displaystyle \begin{aligned} P \big( R^{2} \geq R_{\text{o}}^{2}|H_{0} \big) = \frac{\text{number of }R^{2}\text{ values } \geq R_{\text{o}}^{2}}{L} = \frac{268{,}026}{1{,}000{,}000} = 0.2680\;, \end{aligned}$$

where \(R_{\text{o}}^{2}\) denotes the observed value of R 2. For comparison, the exact probability value of \(R_{\text{o}}^{2} = 0.2539\) based on M = N! = 12! = 479, 001, 600 possible arrangements of the data listed in Table 7.14 is P = 0.2681.

7.4.2 Least Absolute Deviation (LAD) Analysis

For the multivariate data listed in Table 7.14, the LAD regression coefficients are

$$\displaystyle \begin{aligned} \tilde{\beta}_{1} = +0.4138 \quad \mbox{and} \quad \tilde{\beta}_{2} = +0.1207\;, \end{aligned}$$

δ = 1.5000, μ δ = 1.8084, and the LAD chance-corrected measure of agreement between the observed y values and the predicted \(\tilde {y}\) values is

$$\displaystyle \begin{aligned} \mathfrak{R}_{\text{o}} = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{1.5000}{1.8084} = +0.1706\;. \end{aligned}$$

Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(\mathfrak {R} = +0.1706\) is

$$\displaystyle \begin{aligned} P \big( \mathfrak{R} \geq \mathfrak{R}_{\text{o}}|H_{0} \big) = \frac{\text{number of }\mathfrak{R}\text{ values } \geq \mathfrak{R}_{\text{o}}}{L} = \frac{19{,}176}{1{,}000{,}000} = 0.0192\;, \end{aligned}$$

where \(\mathfrak {R}_{\text{o}}\) denotes the observed value of \(\mathfrak {R}\). For comparison, the exact probability value of \(\mathfrak {R}_{\text{o}} = +0.1706\) based on M = N! = 12! = 479, 001, 600 possible arrangements of the data listed in Table 7.14 is P = 0.0221.

Now, suppose that the husband in family “L” was a stay-at-home house-husband and instead of contributing just four hours of housework per week, he actually contributed 40 hours, as in Table 7.15.

Table 7.15 Example multivariate correlation data on N = 12 families with p = 2 predictors, where the husband in Family L contributed 40 hours of housework per week

7.4.3 Ordinary Least Squares (OLS) Analysis

For the multivariate data listed in Table 7.15, the unstandardized OLS regression coefficients are

$$\displaystyle \begin{aligned} \hat{\beta}_{1} = +5.7492 \quad \mbox{and} \quad \hat{\beta}_{2} = +2.3896\;, \end{aligned}$$

and the observed squared OLS multiple correlation coefficient is \(R_{\text{o}}^{2} = 0.5786\). Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(R_{\text{o}}^{2} = 0.5786\) is

$$\displaystyle \begin{aligned} P \big( R^{2} \geq R_{\text{o}}^{2}|H_{0} \big) = \frac{\text{number of }R^{2}\text{ values } \geq R_{\text{o}}^{2}}{L} = \frac{15{,}215}{1{,}000{,}000} = 0.0152\;, \end{aligned}$$

where \(R_{\text{o}}^{2}\) denotes the observed value of R 2. For comparison, the exact probability value of \(R_{\text{o}}^{2} = 0.5786\) based on M = N! = 12! = 479, 001, 600 possible arrangements of the data listed in Table 7.15 is P = 0.0153.

7.4.4 Least Absolute Deviation (LAD) Analysis

For the multivariate data listed in Table 7.15, the LAD regression coefficients are

$$\displaystyle \begin{aligned} \tilde{\beta}_{1} = +1.3000 \quad \mbox{and} \quad \tilde{\beta}_{2} = +0.0500\;, \end{aligned}$$

δ o = 4.0333, μ δ = 5.2194, and the LAD chance-corrected measure of agreement between the observed y values and the predicted \(\tilde {y}\) values is

$$\displaystyle \begin{aligned} \mathfrak{R}_{\text{o}} = 1-\frac{\delta_{\text{o}}}{\mu_{\delta}} = 1-\frac{4.0333}{5.2194} = +0.2272\;. \end{aligned}$$

Based on L = 1, 000, 000 random arrangements of the observed data, the Monte Carlo resampling probability value of \(\mathfrak {R}_{\text{o}} = +0.2272\) is

$$\displaystyle \begin{aligned} P \big( \mathfrak{R} \geq \mathfrak{R}_{\text{o}}|H_{0} \big) = \frac{\text{number of }\mathfrak{R}\text{ values } \geq \mathfrak{R}_{\text{o}}}{L} = 0.0046\;, \end{aligned}$$

where \(\mathfrak {R}_{\text{o}}\) denotes the observed value of \(\mathfrak {R}\). For comparison, the exact probability value of \(\mathfrak {R}_{\text{o}} = +0.2272\) based on M = N! = 12! = 479, 001, 600 possible arrangements of the data listed in Table 7.15 is P = 0.5630×10^−2.

The results of the comparison of OLS and LAD analyses with 4 and 40 hours of housework by the husband in family “L” are summarized in Table 7.16. The value of 40 hours of housework by the husband in family “L” is, by any definition, an extreme value. It is six times the mean of \(\bar {y} = 6.3333\) and three standard deviations above the mean. It is readily apparent that the extreme value of 40 hours had a profound impact on the results of the OLS analysis. The OLS multiple correlation coefficient more than doubled from \(R_{\text{o}}^{2} = 0.2539\) to \(R_{\text{o}}^{2} = 0.5786\), a difference of R 2 = 0.3247, and the corresponding probability value decreased from P = 0.2680 to P = 0.0152, a difference of P = 0.2528. The impact of 40 hours of housework on the LAD analysis is more modest with the LAD chance-corrected measure of agreement increasing only slightly from \(\mathfrak {R}_{\text{o}} = 0.1706\) to \(\mathfrak {R}_{\text{o}} = 0.2272\), a difference of \(\mathfrak {R} = 0.0566\), and the probability value decreasing from P = 0.0192 to P = 0.0046, a difference of only P = 0.0146.

Table 7.16 Comparison of OLS and LAD analyses for the data given in Table 7.14 with 4 hours of housework for the husband in family L and the data given in Table 7.15 with 40 hours of housework for the husband in family L

7.5 Fisher’s r xy to z Transformation

In order to attach a probability statement to inferences about the Pearson product-moment correlation coefficient, it is necessary to know the sampling distribution of a statistic that relates the sample correlation coefficient, r xy, to the population parameter, ρ xy. Because − 1.0 ≤ r xy ≤ +1.0, the sampling distribution of statistic r xy is asymmetric whenever ρ xy ≠ 0.0. Given two random variables that follow the bivariate normal distribution with population parameter ρ xy, the sampling distribution of statistic r xy approaches normality as the sample size increases; however, it converges very slowly for |ρ xy|≥ 0.6, even with samples as large as N = 400 [7, p. xxxiii]. Fisher [13, 14] obtained the basic distribution of r xy and showed that, when bivariate normality is assumed, a logarithmic transformation of r xy (henceforth referred to as the Fisher z transform),

$$\displaystyle \begin{aligned} z = \frac{1}{2} \ln \left( \frac{1+r_{xy}}{1-r_{xy}} \right) = \tanh^{-1}(r_{xy})\;, \end{aligned}$$

becomes normally distributed with a mean of approximately

$$\displaystyle \begin{aligned} \frac{1}{2} \ln \left( \frac{1+\rho_{xy}}{1-\rho_{xy}} \right) = \tanh^{-1}(\rho_{xy}) \end{aligned}$$

and the standard error approaches

$$\displaystyle \begin{aligned} \frac{1}{\sqrt{N-3}} \end{aligned}$$

as N → ∞.

The Fisher r xy to z transform is presented in most textbooks and is available in a wide array of statistical software packages. In this section, the precision and accuracy of the Fisher z transform are examined for a variety of bivariate distributions, sample sizes, and values of ρ xy [5]. If ρ xy ≠ 0.0 and the distribution is not bivariate normal, then the desired properties of the Fisher z transform generally fail.

There are two general applications of the Fisher z transform. The first application is the computation of confidence limits for ρ xy and the second is the testing of hypotheses about specified values of ρ xy ≠ 0.0. The second application is more tractable than the first because a hypothesized value of ρ xy is available. The next part of this section describes the bivariate distributions to be examined, followed by an exploration of confidence intervals and an examination of hypothesis testing. The last part of the section provides some general conclusions about the propriety of uncritically using the Fisher z transform in actual research.

7.5.1 Distributions

Seven bivariate distributions are utilized to test the Fisher z transform. In addition, two related methods by Gayen [17] and Jeyaratnam [22] are also examined. The Gayen and Jeyaratnam techniques are characterized by simplicity, accuracy, and ease of use. For other interesting approaches, see David [7]; Hotelling [21]; Kraemer [25]; Liu, Woodward, and Bonett [28]; Mudholkar and Chaubey [41]; Pillai [45]; Ruben [48]; and Samiuddin [49].

7.5.1.1 Normal Distribution

The density function of the standardized normal, N(0, 1), distribution is given by

$$\displaystyle \begin{aligned} f(x) = (2\pi)^{-1/2} \exp(-x^{2}/2)\;. \end{aligned}$$

7.5.1.2 Generalized Logistic Distribution

The density function of the generalized logistic (GL) distribution is given by

$$\displaystyle \begin{aligned} f(x) = \big[ \exp(\theta x)/\theta \big]^{1/\theta} \big[ 1+ \exp(\theta x)/\theta \big]^{-(\theta+1)/\theta} \end{aligned}$$

for θ > 0 [34]. The generalized logistic distribution is positively skewed for θ < 1 and negatively skewed for θ > 1. When θ = 1.0, GL(θ) is a logistic distribution that closely resembles the normal distribution, with somewhat lighter tails. When θ = 0.10, GL(θ) is a generalized logistic distribution with positive skewness. When θ = 0.01, GL(θ) is a generalized logistic distribution with even greater positive skewness.

7.5.1.3 Symmetric Kappa Distribution

The density function of the symmetric kappa (SK) distribution is given by

$$\displaystyle \begin{aligned} f(x) = 0.5 \lambda^{-1/\lambda}\left( 1+|x|{}^{\lambda}/\lambda \right)^{-(\lambda+1)/\lambda} \end{aligned}$$

for λ > 0 [34, 35]. The shape of the symmetric kappa distribution ranges from an exceedingly heavy-tailed distribution as λ approaches zero to a uniform distribution as λ goes to infinity. When λ = 2, SK(λ) is a peaked, heavy-tailed distribution, identical to Student’s t distribution with 2 degrees of freedom. Thus, the variance of SK(2) does not exist. When λ = 3, SK(λ) is also a heavy-tailed distribution, but the variance does exist. When λ = 25, SK(λ) is a loaf-shaped distribution resembling a uniform distribution with the addition of very light tails. These distributions provide a variety of populations from which to sample and evaluate the Fisher z transformation and the Gayen [17] and Jeyaratnam [22] modifications.

Seven bivariate correlated distributions were constructed in the following manner. Let x and y be independent identically distributed univariate random variables from each of seven univariate distributions, i.e., N(0, 1), GL(1.0), GL(0.1), GL(0.01), SK(2), SK(3), and SK(25), and define the correlated random variables U 1 and U 2 of each bivariate distribution by

$$\displaystyle \begin{aligned} U_{1} = x(1-\rho_{xy}^{2})^{1/2}+\rho_{xy}y \end{aligned}$$

and U 2 = y, where ρ xy is the desired Pearson product-moment correlation coefficient of random variables U 1 and U 2. Then a Monte Carlo procedure obtains random samples, corresponding to x and y, from the normal, generalized logistic, and symmetric kappa distributions.
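For readers who wish to reproduce the simulations, the construction of U 1 and U 2 is straightforward to code. The following sketch (Python with NumPy; the chapter itself presents no code, so the function names are illustrative) uses a standard normal marginal, but a sampler for any of the seven marginals could be substituted.

```python
import numpy as np

def correlated_pair(rho, n, marginal, rng):
    """Return U1, U2 built from iid draws x, y of `marginal`,
    with U1 = x*sqrt(1 - rho**2) + rho*y and U2 = y."""
    x = marginal(n, rng)
    y = marginal(n, rng)
    u1 = x * np.sqrt(1.0 - rho**2) + rho * y
    u2 = y
    return u1, u2

def standard_normal(n, rng):
    # N(0, 1) marginal; samplers for the GL and SK marginals of
    # Sect. 7.5.1 could be plugged in here instead (not shown).
    return rng.standard_normal(n)

rng = np.random.default_rng(1)
u1, u2 = correlated_pair(rho=0.60, n=100_000, marginal=standard_normal, rng=rng)
print(np.corrcoef(u1, u2)[0, 1])   # close to 0.60 for the normal marginal
```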

7.5.2 Confidence Intervals

In this section, Monte Carlo confidence intervals are based on the seven distributions: N(0, 1), GL(1.0), GL(0.1), GL(0.01), SK(2), SK(3), and SK(25). Each simulation is based on L = 1, 000, 000 bivariate random samples, U 1 and U 2, of size N = 10, 20, 40, and 80 for ρ xy = 0.00, + 0.40, + 0.60, and + 0.80 with 1 − α = 0.90, 0.95, and 0.99. Confidence intervals obtained from two methods are considered. The first confidence interval is based on the Fisher z transform and is defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \tanh \left[ \tanh^{-1}(r_{xy})-\frac{z_{\alpha/2}}{\sqrt{N-3}} \right] \leq \rho_{xy} \leq \tanh \left[ \tanh^{-1}(r_{xy})+\frac{z_{\alpha/2}}{\sqrt{N-3}} \right]\;, \end{array} \end{aligned} $$

where z α∕2 is the upper 0.50α probability point of the N(0, 1) distribution. The second confidence interval is based on a method proposed by Jeyaratnam [22] and is defined by

$$\displaystyle \begin{aligned} \frac{r_{xy}-w}{1-r_{xy}w} \leq \rho_{xy} \leq \frac{r_{xy}+w}{1+r_{xy}w}\;,\end{aligned} $$

where

$$\displaystyle \begin{aligned} w = \frac{\big( t_{\alpha/2,N-2} \big)\big/\sqrt{N-2}}{\Big[ 1+\big( t_{\alpha/2,N-2} \big)^{2}\big/(N-2) \,\Big]^{1/2}} \end{aligned}$$

and t α∕2,N−2 is the upper 0.50α probability point of Student’s t distribution with N − 2 degrees of freedom.
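Both intervals are simple enough to compute directly. A minimal sketch, assuming NumPy and SciPy for the normal and Student's t quantiles (function names are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import norm, t as t_dist

def fisher_ci(r, n, alpha):
    """1 - alpha confidence interval for rho based on the Fisher z transform."""
    z = np.arctanh(r)
    half = norm.ppf(1.0 - alpha / 2.0) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

def jeyaratnam_ci(r, n, alpha):
    """1 - alpha confidence interval for rho based on Jeyaratnam's w."""
    tq = t_dist.ppf(1.0 - alpha / 2.0, df=n - 2)
    w = (tq / np.sqrt(n - 2)) / np.sqrt(1.0 + tq**2 / (n - 2))
    return (r - w) / (1.0 - r * w), (r + w) / (1.0 + r * w)

print(fisher_ci(0.60, 20, 0.05))
print(jeyaratnam_ci(0.60, 20, 0.05))
```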

The results of the Monte Carlo analyses are summarized in Tables 7.17, 7.18, 7.19, 7.20, 7.21, 7.22, 7.23, which contain simulated containment probability values for the seven bivariate distributions with specified nominal values of 1 − α (0.90, 0.95, 0.99), ρ xy (0.00, +0.40, +0.60, +0.80), and N (10, 20, 40, 80) for the Fisher (F) and Jeyaratnam (J) confidence intervals. Table 7.17 analyzes data obtained from the N(0, 1) distribution; Tables 7.18, 7.19, and 7.20 analyze data obtained from the generalized logistic distribution with θ = 1.0, 0.1, and 0.01, respectively; and Tables 7.21, 7.22, and 7.23 analyze data obtained from the symmetric kappa distribution with λ = 2, 3, and 25, respectively.

Table 7.17 Containment probability values for a bivariate N(0, 1) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.18 Containment probability values for a bivariate GL(1.0) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.19 Containment probability values for a bivariate GL(0.1) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.20 Containment probability values for a bivariate GL(0.01) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.21 Containment probability values for a bivariate SK(2) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.22 Containment probability values for a bivariate SK(3) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals
Table 7.23 Containment probability values for a bivariate SK(25) distribution with Fisher (F) and Jeyaratnam (J) 1 − α correlation confidence intervals

In each of the seven tables, the Monte Carlo containment probability values for a 1 − α confidence interval based on the Fisher z transform and a 1 − α confidence interval based on the Jeyaratnam technique were obtained from the same L = 1, 000, 000 bivariate random samples of size N drawn with replacement from the designated bivariate distribution characterized by the specified population correlation ρ xy. If the Fisher and Jeyaratnam transforms are appropriate for the simulated data, the containment probability values should agree with the nominal 1 − α values.
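A scaled-down version of one cell of such a simulation might look as follows; it uses the bivariate normal construction of Sect. 7.5.1, far fewer samples than L = 1,000,000, and only the Fisher interval, so the printed proportion is an illustration rather than a reproduction of the tabled values.

```python
import numpy as np
from scipy.stats import norm

def fisher_ci(r, n, alpha):
    z, half = np.arctanh(r), norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)
    return np.tanh(z - half), np.tanh(z + half)

def containment_probability(rho, n, alpha, trials, rng):
    """Proportion of simulated samples whose Fisher interval contains rho."""
    hits = 0
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        u1 = x * np.sqrt(1 - rho**2) + rho * y    # construction of Sect. 7.5.1
        lo, hi = fisher_ci(np.corrcoef(u1, y)[0, 1], n, alpha)
        hits += lo <= rho <= hi
    return hits / trials

rng = np.random.default_rng(2)
print(containment_probability(rho=0.60, n=20, alpha=0.05, trials=10_000, rng=rng))
```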

Some general observations can be made about the Monte Carlo results contained in Tables 7.17 through 7.23. First, in each of the tables there is little difference between the Fisher and Jeyaratnam Monte Carlo containment probability values and both techniques provide values close to the nominal 1 − α values for the N(0, 1) distribution analyzed in Table 7.17 with any value of ρ xy and for any of the other distributions analyzed in Tables 7.18 through 7.23 when ρ xy = 0.00. Second, for the skewed and heavy-tailed distributions, i.e., GL(0.1), GL(0.01), SK(2), and SK(3), with N held constant, the differences between the Monte Carlo containment probability values and the nominal 1 − α values become greater as |ρ xy| increases. Third, the differences between the Monte Carlo containment probability values and the nominal 1 − α values increase with increasing N and |ρ xy| > 0.00 for all the distributions except N(0, 1) and SK(25). This is especially evident with the skewed and heavy-tailed distributions GL(0.1), GL(0.01), SK(2), and SK(3).

7.5.3 Hypothesis Testing

In this section, Monte Carlo tests of hypotheses are based on the same seven distributions: N(0, 1), GL(1.0), GL(0.1), GL(0.01), SK(2), SK(3), and SK(25). Each simulation is based on L = 1, 000, 000 bivariate random samples of size N = 20 and N = 80 for ρ xy = 0.00 and ρ xy = +0.60 and compared to seven nominal upper-tail probability values of P = 0.99, 0.90, 0.75, 0.50, 0.25, 0.10, and 0.01. Two tests of ρ xy ≠ 0.00 are considered. The first test is based on the Fisher z transform and uses the standardized test statistic given by

$$\displaystyle \begin{aligned} T = \frac{z-\mu_{z}}{\sigma_{z}}\;, \end{aligned}$$

where

$$\displaystyle \begin{aligned} z = \tanh^{-1}(r_{xy})\;, \quad \mu_{z} = \tanh^{-1}(\rho_{xy})\;, \;\; \mbox{and} \quad \sigma_{z} =\frac{1}{\sqrt{N-3}}\;. \end{aligned}$$

The second test is based on corrected values proposed by Gayen [17], where

$$\displaystyle \begin{aligned} z = \tanh^{-1}(r_{xy})\;, \end{aligned}$$
$$\displaystyle \begin{aligned} \mu_{z} = \tanh^{-1}(\rho_{xy})+\frac{\rho_{xy}}{2(N-1)}\left[ 1+\frac{5-\rho_{xy}^{2}}{4(N-1)} \right]\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} \sigma_{z} = \left\{ \frac{1}{N-1} \left[ 1+\frac{4-\rho_{xy}^{2}}{2(N-1)}+\frac{22-6\rho_{xy}^{2}-3\rho_{xy}^{4}}{6(N-1)^{2}} \right] \right\}^{1/2}\;. \end{aligned}$$
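Both standardized statistics follow directly from these expressions. A short sketch (illustrative names, not the authors' code) returns the Fisher and Gayen values of T for a sample correlation r xy, a hypothesized ρ xy, and a sample size N:

```python
import numpy as np

def fisher_statistic(r, rho, n):
    """T = (z - mu_z) / sigma_z with Fisher's large-sample moments."""
    z = np.arctanh(r)
    mu = np.arctanh(rho)
    sigma = 1.0 / np.sqrt(n - 3)
    return (z - mu) / sigma

def gayen_statistic(r, rho, n):
    """T = (z - mu_z) / sigma_z with Gayen's corrected moments."""
    z = np.arctanh(r)
    mu = np.arctanh(rho) + rho / (2 * (n - 1)) * (1 + (5 - rho**2) / (4 * (n - 1)))
    var = (1 / (n - 1)) * (1 + (4 - rho**2) / (2 * (n - 1))
                           + (22 - 6 * rho**2 - 3 * rho**4) / (6 * (n - 1)**2))
    return (z - mu) / np.sqrt(var)

print(fisher_statistic(0.70, 0.60, 80), gayen_statistic(0.70, 0.60, 80))
```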

The results of the Monte Carlo analyses are summarized in Tables 7.24, 7.25, 7.26, 7.27, 7.28, 7.29, 7.30, which contain simulated upper-tail probability values for the seven distributions with specified nominal probability values of P (0.99, 0.95, 0.75, 0.50, 0.25, 0.10, 0.01), ρ xy (0.00, +0.60), and N (20, 80) for the Fisher (F) and Gayen (G) test statistics. Table 7.24 analyzes data obtained from the N(0, 1) distribution; Tables 7.25, 7.26, and 7.27 analyze data obtained from the generalized logistic distribution with θ = 1.0, 0.1, and 0.01, respectively; and Tables 7.28, 7.29, and 7.30 analyze data obtained from the symmetric kappa distribution with λ = 2, 3, and 25, respectively.

Table 7.24 Upper-tail probability values compared with nominal values (P) for a bivariate N(0, 1) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = 0.60
Table 7.25 Upper-tail probability values compared with nominal values (P) for a bivariate GL(1.0) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60
Table 7.26 Upper-tail probability values compared with nominal values (P) for a bivariate GL(0.1) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60
Table 7.27 Upper-tail probability values compared with nominal values (P) for a bivariate GL(0.01) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60
Table 7.28 Upper-tail probability values compared with nominal values (P) for a bivariate SK(2) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60
Table 7.29 Upper-tail probability values compared with nominal values (P) for a bivariate SK(3) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60
Table 7.30 Upper-tail probability values compared with nominal values (P) for a bivariate SK(25) distribution with Fisher (F) and Gayen (G) tests of hypotheses on ρ xy = 0.00 and ρ xy = +0.60

In each table, the Monte Carlo upper-tail probability values for tests of hypotheses based on the Fisher and Gayen approaches were obtained from the same L = 1, 000, 000 bivariate random samples of size N drawn with replacement from the designated bivariate distribution characterized by the specified population correlation ρ xy. If the Fisher [14] and Gayen [17] techniques are appropriate for the simulated data, the upper-tail probability values should agree with the nominal upper-tail values, P.

Considered as a set, some general statements can be made about the Monte Carlo results contained in Tables 7.24 through 7.30. First, both the Fisher z transform and the Gayen correction provide very satisfactory results for the N(0, 1) distribution analyzed in Table 7.24 with any value of ρ xy and for any of the other distributions analyzed in Tables 7.25 through 7.30 when ρ xy = 0.00. Second, in general the Monte Carlo upper-tail probability values obtained with the Gayen correction are better than those obtained with the uncorrected Fisher z transform, especially near P = 0.50. Where differences exist, the Fisher z transform is somewhat better than the Gayen correction with P > 0.75 and the Gayen correction performs better when P < 0.75. Third, discrepancies between the Monte Carlo upper-tail probability values and the nominal probability values are noticeably larger for N = 80 than for N = 20 and for ρ xy = 0.60 than for ρ xy = 0.00, especially for the skewed and heavy-tailed distributions, i.e., GL(0.1), GL(0.01), SK(2), and SK(3). Fourth, the Monte Carlo upper-tail probability values in Tables 7.24 through 7.30 are consistently closer to the nominal values for ρ xy = 0.00 than for ρ xy = +0.60.

To illustrate the difference in results among the seven distributions, consider the first and last values in the last column in each table, i.e., the two Gayen values corresponding to P = 0.99 and P = 0.01 for N = 80 and ρ xy = +0.60 in Tables 7.24 to 7.30, inclusive. If an investigator were to test the null hypothesis H 0: ρ xy = +0.60 with a two-tailed test at α = 0.02, then given the N(0, 1) distribution analyzed in Table 7.24, the investigator would reject the null hypothesis at a rate of 0.0202 or about 2.02% of the time, i.e., 1.0000 − 0.9899 + 0.0101 = 0.0202, which is very close to α = 0.02. For the light-tailed GL(1.0) or generalized logistic distribution analyzed in Table 7.25, the investigator would reject H 0: ρ xy = +0.60 at a rate of 0.0339 or about 3.39% of the time, i.e., 1.0000 − 0.9838 + 0.0177 = 0.0339, compared with the specified α = 0.02. For the skewed GL(0.1) distribution analyzed in Table 7.26, the investigator would reject H 0: ρ xy = +0.60 at a rate of 0.0446 or about 4.46% of the time, and for the GL(0.01) distribution analyzed in Table 7.27, which has a more pronounced skewness than GL(0.1), the rejection rate is 0.0476 or about 4.76%, compared to α = 0.02. The heavy-tailed distributions, SK(2) and SK(3), analyzed in Tables 7.28 and 7.29, respectively, yield rejection rates of 0.3629 and 0.1346, respectively, which are not the least bit close to α = 0.02. Finally, the very light-tailed distribution, SK(25), analyzed in Table 7.30 yields a reversal with a very conservative rejection rate of 0.0096, compared to α = 0.02.

7.5.4 Discussion

The Fisher z transform of the sample correlation coefficient, r xy, is widely used in a variety of disciplines for both estimating population ρ xy values and for testing hypothesized values of ρ xy ≠ 0.00. The transform is presented in most textbooks and is a standard feature of many statistical software packages. The assumptions underlying the use of the Fisher z transform are (1) a simple random sample drawn with replacement from (2) a bivariate normal distribution. It is commonly believed that the Fisher z transform is robust to non-normality. For example, in 1929 Karl Pearson observed:

[T]he normal bivariate surface can be mutilated and distorted to a remarkable degree without affecting the frequency distribution of r in samples as small as 20 [43, p. 357].

Given correlated non-normal bivariate distributions, these Monte Carlo analyses demonstrate that the Fisher z transform is not at all robust.

In general, while the Fisher z transform and the alternative techniques proposed by Gayen [17] and Jeyaratnam [22] provide accurate results for a bivariate normal distribution with any value of ρ xy and for non-normal bivariate distributions when ρ xy = 0.0, serious problems surface with non-normal bivariate distributions when |ρ xy| > 0.0. The results for the light-tailed SK(25) distribution are, in general, slightly conservative when |ρ xy| > 0.0; cf. Liu, Woodward, and Bonett [28, p. 508]. This is usually not seen as a serious problem in practice, as conservative results imply possible failure to reject the null hypothesis and a potential increase in type II error. In comparison, the results for the heavy-tailed distributions, SK(2) and SK(3), and the skewed distributions, GL(0.1) and GL(0.01), are quite liberal when |ρ xy| > 0.0. Also, GL(1.0) is a light-tailed distribution that yields slightly liberal results. Liberal results are much more serious than conservative results, as they imply possible rejection of the null hypothesis and a potential increase in type I error.

Most surprisingly, from a statistical perspective, for the heavy-tailed and skewed distributions, small samples provide better estimates than large samples. Table 7.31 extends the analyses of Tables 7.19, 7.20, 7.21, and 7.22 to larger sample sizes. In Table 7.31 the investigation is limited to Monte Carlo containment probability values obtained from the Fisher z transform for the skewed bivariate distributions based on GL(0.1) and GL(0.01) and for the heavy-tailed bivariate distributions based on SK(2) and SK(3), with ρ xy = 0.00 and ρ xy = +0.60, and for N = 10, 20, 40, 80, 160, 320, and 640. Inspection of Table 7.31 confirms that the trend observed in Tables 7.19 through 7.22 continues with larger sample sizes, producing increasingly smaller containment probability values with increasing N for |ρ xy| > 0.00, where ρ xy = +0.60 is considered representative of larger ρ xy values.

Table 7.31 Containment probability values for the bivariate GL(0.1), GL(0.01), SK(2), and SK(3) distributions with Fisher (F) 1 − α correlation confidence intervals

The impact of large sample sizes is most pronounced in the heavy-tailed bivariate distribution based on SK(2) and the skewed bivariate distribution based on GL(0.01) where, with ρ xy = +0.60, the divergence between the containment probability values and the nominal 1 − α values for N = 10 and N = 640 is quite extreme. For example, SK(2) with 1 − α = 0.90, ρ xy = +0.60, and N = 10 yields a containment probability value of P = 0.7487, whereas N = 640 for this case yields a containment probability value of P = 0.2677, compared with 1 − α = 0.90. Obviously, large samples have a greater chance of selecting rare extreme values than small samples. Consequently, the Monte Carlo containment probability values become worse with increasing sample size when heavy-tailed distributions are encountered.

It is clear that the Fisher z transform provides very good results for the bivariate normal distribution and any of the other distributions when ρ xy = 0.00. However, if a distribution is not bivariate normal and ρ xy > 0.00, then the Fisher z random variable does not follow a normal distribution. Geary [18, p. 241] admonished: “Normality is a myth; there never was, and never will be, a normal distribution.” In the absence of bivariate normality and in the presence of correlated heavy-tailed bivariate distributions, such as those contaminated by extreme values, or correlated skewed bivariate distributions, the Fisher z transform and related techniques can yield highly inaccurate results.

Given that normally distributed populations are rarely encountered in actual research situations [18, 33] and that both heavy-tailed symmetrical distributions and heavy-tailed skewed distributions are prevalent in much research, considerable caution should be exercised when using the Fisher z transform or related techniques such as those proposed by Gayen [17] and Jeyaratnam [22], as these methods clearly are not robust to deviations from normality when |ρ xy|≠ 0.0. In general, there is no easy answer to this problem. However, a researcher cannot simply ignore a problem just because it is annoying. Unfortunately, given a non-normal population with ρ xy ≠ 0.0, there appear to be no published alternative tests of significance or viable options for the construction of confidence intervals.

Finally, to paraphrase a line from Thompson regarding the use of tiltmeters in volcanology [53, p. 258],

1. Do not use the Fisher z transformation.

2. If you do use it, don't believe it.

3. If you do believe it, don't publish it.

4. If you do publish it, don't be the first author.

7.6 Point-Biserial Linear Correlation

The point-biserial correlation coefficient measures the association between a dichotomous variable and an interval-level variable. Applications of the point-biserial correlation abound in fields such as education and educational psychology. The point-biserial correlation may be thought of simply as the Pearson product-moment correlation between an interval-level variable and a variable with two disjoint, unordered categories.

7.6.1 Example

To illustrate the point-biserial correlation coefficient, consider the dichotomous data listed in Table 7.32 for N = 13 subjects where variable x is a dichotomous variable coded (0, 1) and variable y is an interval-level variable. The point-biserial correlation is usually computed as

$$\displaystyle \begin{aligned} r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{s_{y}}\sqrt{\frac{n_{0}n_{1}}{N(N-1)}}\;, \end{aligned}$$

where n 0 and n 1 denote the number of y values coded 0 and 1, respectively, N = n 0 + n 1, \(\bar {y}_{0}\) and \(\bar {y}_{1}\) denote the means of the y values coded 0 and 1, respectively, and s y is the sample standard deviation of the y values given by

$$\displaystyle \begin{aligned} s_{y} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\big( y_{i}-\bar{y} \big)^{2}}\;. \end{aligned}$$
Table 7.32 Example bivariate data for point-biserial correlation on N = 13 subjects

For the data listed in Table 7.32, n 0 = 6, n 1 = 7, \(\bar {y}_{0} = 20.3333\), \(\bar {y}_{1} = 24.1429\), s y = 4.2728, and the point-biserial correlation is

$$\displaystyle \begin{aligned} r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{s_{y}}\sqrt{\frac{n_{0}n_{1}}{N(N-1)}} = \frac{24.1429-20.3333}{4.2728}\sqrt{\frac{(6)(7)}{13(13-1)}} = +0.4626\;. \end{aligned}$$

However, r pb can also be calculated simply as the Pearson product-moment correlation (r xy) between dichotomous variable x and interval variable y. For the data listed in Table 7.32, N = 13,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = \sum_{i=1}^{N}x_{i}^{2} = 7\;, \quad \sum_{i=1}^{N}y_{i} = 291\;, \quad \sum_{i=1}^{N}y_{i}^{2} = 6{,}733\;, \quad \sum_{i=1}^{N}x_{i}y_{i} = 169\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} r_{pb} = r_{xy} = \frac{N\displaystyle\sum_{i=1}^{N}x_{i}y_{i}-\displaystyle\sum_{i=1}^{N}x_{i}\displaystyle\sum_{i=1}^{N}y_{i}}{\sqrt{\left[ N\displaystyle\sum_{i=1}^{N}x_{i}^{2}-\left( \displaystyle\sum_{i=1}^{N}x_{i} \right)^{2} \right]\left[ N\displaystyle\sum_{i=1}^{N}y_{i}^{2}-\left( \displaystyle\sum_{i=1}^{N}y_{i} \right)^{2} \right]}} = \frac{13(169)-(7)(291)}{\sqrt{\big[ 13(7)-(7)^{2} \big]\big[ 13(6{,}733)-(291)^{2} \big]}} = +0.4626\;. \end{aligned}$$

Approaching the calculation of the probability value from a product-moment perspective, there are

$$\displaystyle \begin{aligned} M = N! = 13! = 6{,}227{,}020{,}800 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed bivariate data, making an exact permutation analysis impractical. Let r o denote the observed value of r pb. Then, based on L = 1, 000, 000 random arrangements of the observed data under the null hypothesis, there are 121,667 |r pb| values equal to or greater than |r o| = 0.4626, yielding a Monte Carlo resampling two-sided probability value of P = 121, 667∕1, 000, 000 = 0.121667.
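The resampling procedure itself is little more than repeated shuffling of one variable against the other. A minimal sketch is given below; the data vector y is a placeholder rather than the values of Table 7.32, so the printed probability value will not match the one reported above.

```python
import numpy as np

def resampling_pvalue(x, y, n_resamples, rng):
    """Two-sided Monte Carlo permutation p-value for the point-biserial
    (Pearson) correlation between a 0/1 variable x and an interval variable y."""
    r_obs = abs(np.corrcoef(x, y)[0, 1])
    count = 0
    for _ in range(n_resamples):
        r = np.corrcoef(rng.permutation(x), y)[0, 1]
        count += abs(r) >= r_obs - 1e-12
    return count / n_resamples

# Illustrative data only -- not the values of Table 7.32.
rng = np.random.default_rng(3)
x = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
y = rng.normal(loc=22, scale=4, size=13).round(0)
print(resampling_pvalue(x, y, n_resamples=100_000, rng=rng))
```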

In general, L = 1, 000, 000 ensures three decimal places of accuracy. However, it requires an increase of two orders of magnitude, i.e., L = 100, 000, 000, to ensure four decimal places of accuracy [23]. Based on L = 100, 000, 000 random arrangements of the observed bivariate data, the two-sided Monte Carlo resampling probability value of r pb = +0.4626 to six decimal places is P = 12, 121, 600∕100, 000, 000 = 0.121216.

However, because variable x is composed of only two categories, an alternative procedure exists for establishing the probability value of r pb. The relationships between r pb and Student’s two-sample t test are

$$\displaystyle \begin{aligned} r_{pb} = \sqrt{\frac{t^{2}}{t^{2}+N-2}} \quad \mbox{and} \quad t = \frac{r_{pb}\sqrt{N-2}}{\sqrt{1-r_{pb}^{2}}}\;. \end{aligned}$$

Thus, the probability value for a specified point-biserial correlation coefficient can be calculated much more efficiently as the probability value of a two-sample t test with N − 2 degrees of freedom. Consider the data in Table 7.32 rearranged into two groups coded 0 and 1 as in Table 7.33.

Table 7.33 Example data on N = 13 subjects for Student’s t test

For the observed data listed in Table 7.33, Student’s t test statistic is

$$\displaystyle \begin{aligned} t = \frac{r_{pb}\sqrt{N-2}}{\sqrt{1-r_{pb}^{2}}} = \frac{+0.4626 \sqrt{13-2}}{\sqrt{1-(+0.4626)^{2}}} = +1.7307\;. \end{aligned}$$

For the data listed in Table 7.33, there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{n_{0}!\;n_{1}!} = \frac{13!}{6!\;7!} = 1{,}716 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed scores, compared with

$$\displaystyle \begin{aligned} M = N! = 13! = 6{,}227{,}020{,}800 \end{aligned}$$

in the initial set, making an exact permutation analysis possible. If all arrangements of the N = 13 observed scores occur with equal chance, the exact two-sided probability value of t = +1.7307 to six places computed on the M = 1, 716 possible arrangements of the observed data with n 0 = 6 and n 1 = 7 preserved for each arrangement is 208∕1, 716 = 0.121212.

The Monte Carlo resampling probability value of P = 0.121667 based on L = 1, 000, 000 and the Monte Carlo resampling probability value of P = 0.121216 based on L = 100, 000, 000 both compare favorably with the exact probability value of P = 0.121212. For comparison, the two-sided probability value of t = +1.7307 based on Student’s t distribution with N − 2 = 13 − 2 = 11 degrees of freedom is P = 0.111421.

7.6.2 Problems with the Point-Biserial Coefficient

Whenever a dichotomous variable is correlated with an interval-level variable, as in point-biserial correlation, there are potential problems with proper norming between ± 1. In brief, it is not possible to obtain a perfect correlation, positive or negative, between a dichotomous variable and a continuous variable [42, p. 145]. The reason is simply that it is not possible for a dichotomous variable and a continuous variable to have the same shape, as illustrated in Fig. 7.8 where a dichotomous variable (x) is correlated with a continuous variable (y) that follows a uniform distribution, i.e., y = 1, 2, …, 10. In order to achieve a perfect correlation of r pb = +1.00, it would be necessary for all the scores at the two points of variable x (x = 0 and x = 1) to fall exactly on two points on variable y, as depicted in Fig. 7.9 where the larger black circles represent a cluster of points at x = 0 and x = 1. Since variable y is assumed to be continuous, this is not possible. Consequently, values of variable y at either of the two points on variable x (the dichotomous variable) must correspond to a range of points on variable y (the continuous variable).

Fig. 7.8 Scatterplot of a uniform distribution of y values with the regression line overlaid

Fig. 7.9 Scatterplot of clusters of y values located at x = 0 and x = 1 with the regression line overlaid

As Jum Nunnally showed in 1978, the maximum value of r pb between a dichotomous variable and a normally distributed variable is approximately r pb = ±0.80, which occurs only when p = n 0∕N = 0.50 [42]. As p deviates from 0.50 in either direction, the maximum value of r pb is further reduced. Consequently, when p = 0.25 or p = 0.75, the maximum value of r pb is approximately r pb = ±0.75, and when p = 0.90 or p = 0.10, the maximum value of r pb is only approximately r pb = ±0.58.Footnote 5

The problem can be illustrated with a small empirical example. Table 7.34 contains 10 scores (1, 2, …, 10) with frequencies corresponding to an expanded binomial distribution, which approximates a normal distribution with N = 512. For the binomial data listed in Table 7.34 with p = 0.50,

$$\displaystyle \begin{aligned} \bar{y}_{0} = \left( \sum_{i=1}^{n_{0}}f_{i} \right)^{-1} \sum_{i=1}^{n_{0}}f_{i}y_{i} = \frac{1+18+108+336+630}{1+9+36+84+126} = 4.2695\;, \end{aligned}$$
$$\displaystyle \begin{aligned} \bar{y}_{1} = \left( \sum_{i=1}^{n_{1}}f_{i} \right)^{-1} \sum_{i=1}^{n_{1}}f_{i}y_{i} = \frac{756+588+288+81+10}{126+84+36+9+1} = 6.7305\;, \end{aligned}$$
$$\displaystyle \begin{aligned} s_{y} = \sqrt{\frac{\displaystyle\sum_{i=1}^{N}fy^{2}-\displaystyle\frac{\left( \displaystyle\sum_{i=1}^{N}fy \right)^{2}}{N}}{N-1}} = \sqrt{\frac{16{,}640-\displaystyle\frac{(2{,}816)^{2}}{512}}{512-1}} = 1.5015\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{s_{y}}\sqrt{\frac{n_{0}n_{1}}{N(N-1)}} = \frac{6.7305-4.2695}{1.5015}\sqrt{\frac{(256)(256)}{512(512-1)}} = +0.8203\;, \end{aligned}$$

which approximates Nunnally’s estimate of r pb = +0.80.

Table 7.34 Example binomial distribution on N = 512 subjects with p = 0.50

Table 7.35 illustrates a binomial distribution with N = 512 and p ≃ 0.25, i.e.,

$$\displaystyle \begin{aligned} p = \frac{1}{N}\sum_{i=1}^{n_{0}}f_{i} = \frac{1+9+36+84}{512} = 0.2539\;. \end{aligned}$$
Table 7.35 Example binomial distribution on N = 512 subjects with p ≃ 0.25

For the binomial data in Table 7.35 with p ≃ 0.25,

$$\displaystyle \begin{aligned} \bar{y}_{0} = \left( \sum_{i=1}^{n_{0}}f_{i} \right)^{-1} \sum_{i=1}^{n_{0}}f_{i}y_{i} = \frac{1+18+108+336}{1+9+36+84} = 3.5615\;,\end{aligned} $$
$$\displaystyle \begin{aligned} \bar{y}_{1} = \left( \sum_{i=1}^{n_{1}}f_{i} \right)^{-1} \sum_{i=1}^{n_{1}}f_{i}y_{i} = \frac{630+756+588+288+81+10}{126+126+84+36+9+1} = 6.1597\;,\end{aligned} $$

the standard deviation of the y values is unchanged at s y = 1.5015 and

$$\displaystyle \begin{aligned} r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{s_{y}}\sqrt{\frac{n_{0}n_{1}}{N(N-1)}} = \frac{6.1597-3.5615}{1.5015}\sqrt{\frac{(130)(382)}{512(512-1)}} = +0.7539\;, \end{aligned}$$

which approximates Nunnally’s estimate of r pb = +0.75.

While it is not convenient to take exactly 10% of N = 512 cases, as arranged in Table 7.34, it is possible to take 9% of N = 512 cases. Thus,

$$\displaystyle \begin{aligned} p = \frac{1}{N} \sum_{i=1}^{n_{0}} f_{i} = \frac{1+9+36}{512} = \frac{46}{512} = 0.0898. \end{aligned}$$

Table 7.36 illustrates a binomial distribution with N = 512 and p = 0.09. For the binomial data listed in Table 7.36 with p ≃ 0.10,

$$\displaystyle \begin{aligned} \bar{y}_{0} = \left( \sum_{i=1}^{n_{0}}f_{i} \right)^{-1} \sum_{i=1}^{n_{0}}f_{i}y_{i} = \frac{1+18+108}{1+9+36} = 2.7609\;,\end{aligned} $$

the standard deviation of the y values is unchanged at s y = 1.5015 and

$$\displaystyle \begin{aligned} r_{pb} = \frac{\bar{y}_{1}-\bar{y}_{0}}{s_{y}}\sqrt{\frac{n_{0}n_{1}}{N(N-1)}} = \frac{5.7704-2.7609}{1.5015}\sqrt{\frac{(46)(466)}{512(512-1)}} = +0.5737\;, \end{aligned}$$

which approximates Nunnally’s estimate of r pb = +0.58.

Table 7.36 Example binomial distribution on N = 512 subjects with p = 0.09
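The three illustrations are easy to verify numerically: expand the binomial frequencies into N = 512 individual scores, dichotomize at the three cut points, and compute r pb as the Pearson correlation between the 0/1 indicator and the scores. A brief sketch (illustrative names):

```python
import numpy as np

freqs = [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]     # frequencies of Table 7.34
y = np.repeat(np.arange(1, 11), freqs)              # N = 512 scores

def point_biserial(y, cut):
    """r_pb when scores <= cut are coded 0 and scores > cut are coded 1."""
    x = (y > cut).astype(float)
    return np.corrcoef(x, y)[0, 1]

for cut in (5, 4, 3):            # p = 0.50, p ~ 0.25, p ~ 0.09
    p = np.mean(y <= cut)
    # should agree with Nunnally's approximate maxima of 0.80, 0.75, and 0.58
    print(f"p = {p:.4f},  r_pb = {point_biserial(y, cut):+.4f}")
```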

7.7 Biserial Linear Correlation

Point-biserial correlation measures the degree of association between an interval-level variable and a dichotomous variable that is a true dichotomy, such as right and wrong, true and false, or left and right. On the other hand, biserial correlation measures the degree of association between an interval-level variable and a dichotomous variable that has been created from a variable that is assumed to be continuous and normally distributed, such as grades that have been dichotomized into “pass” and “fail” or weight that has been classified into “normal” and “obese.”Footnote 6 Biserial correlation has long been difficult to compute, requiring the ordinate of a unit-normal distribution. Some approximating methods have been suggested to simplify computation [16], but these are unnecessary with permutation methods.

Let x represent the dichotomous variable and y represent the continuous interval-level variable; then the biserial correlation coefficient is given by

$$\displaystyle \begin{aligned} r_{b} = \frac{(\bar{y}_{1}-\bar{y}_{0})pq}{uS_{y}}\;, \end{aligned}$$

where p and q = 1 − p denote the proportions of all y values coded 0 and 1, respectively, \(\bar {y}_{0}\) and \(\bar {y}_{1}\) denote the arithmetic means of the y values coded 0 and 1, respectively, S y is the standard deviation of the y values given byFootnote 7

$$\displaystyle \begin{aligned} S_{y} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N}\big( y_{i}-\bar{y} \big)^{2}}\;, \end{aligned}$$

and u is the ordinate of the unit normal curve at the point of division between the p and q proportions under the curve given by

$$\displaystyle \begin{aligned} u = \frac{\exp(-z^{2}/2)}{\sqrt{2\pi}}\;. \end{aligned}$$

Written in raw terms without the p and q proportions,

$$\displaystyle \begin{aligned} r_{b} = \frac{(\bar{y}_{1}-\bar{y}_{0}) n_{0} n_{1}}{N^{2} u S_{y}}\;, \end{aligned}$$

where n 0 and n 1 denote the number of y values coded 0 and 1, respectively, and N = n 0 + n 1. The biserial correlation may also be written in terms of the point-biserial correlation coefficient,

$$\displaystyle \begin{aligned} r_{b} = \frac{r_{pb}\sqrt{pq}}{u} = \frac{r_{pb}\sqrt{n_{0}n_{1}}}{Nu}\;, \end{aligned}$$

where the point-biserial correlation coefficient is given by

$$\displaystyle \begin{aligned} r_{pb} = \frac{(\bar{y}_{1}-\bar{y}_{0})\sqrt{pq}}{S_{y}}\;.\end{aligned} $$

7.7.1 Example

To illustrate the calculation of the biserial correlation coefficient, consider the set of data given in Table 7.37 where N = 15 subjects are scored on interval-level variable y and are classified into types on dichotomous variable x. For the data listed in Table 7.37, n 0 = 6, n 1 = 9, p = 6∕15 = 0.40, q = 9∕15 = 0.60,

$$\displaystyle \begin{aligned} \bar{y}_{0} = \frac{1}{n_{0}}\sum_{i=1}^{n_{0}}y_{i} = \frac{12+15+11+18+13+11}{6} = 13.3333\;, \end{aligned}$$
$$\displaystyle \begin{aligned} \bar{y}_{1} = \frac{1}{n_{1}}\sum_{i=1}^{n_{1}}y_{i} = \frac{10+33+19+21+29+12+19+23+16}{9} = 20.2222\;, \end{aligned}$$
$$\displaystyle \begin{aligned} S_{y} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N}\big( y_{i}-\bar{y} \big)^{2}} = \sqrt{\frac{649.7333}{15}} = 6.5815\;,\end{aligned} $$

the standard score that defines the lower p = 0.40 of the unit-normal distribution is z = −0.2533,

$$\displaystyle \begin{aligned} u = \frac{\exp(-z^{2}/2)}{\sqrt{2\pi}} = \frac{\exp[-(-0.2533)^{2}/2]}{\sqrt{(2)(3.1416)}} = 0.3863\;,\end{aligned} $$

and

$$\displaystyle \begin{aligned} r_{b} = \frac{(\bar{y}_{1}-\bar{y}_{0})pq}{u S_{y}} = \frac{(20.2222-13.3333)(0.40)(0.60)}{(0.3863)(6.5815)} = +0.6503\;. \end{aligned}$$
Table 7.37 Example biserial correlation data on N = 15 subjects

For the data listed in Table 7.37, the point-biserial correlation coefficient is

$$\displaystyle \begin{aligned} r_{pb} = \frac{(\bar{y}_{1}-\bar{y}_{0})\sqrt{pq}}{S_{y}} = \frac{(20.2222-13.3333)\sqrt{(0.40)(0.60)}}{6.5815 } = +0.5128\;, \end{aligned}$$

and in terms of the point-biserial correlation coefficient, the biserial correlation coefficient is

$$\displaystyle \begin{aligned} r_{b} = \frac{r_{pb}\sqrt{pq}}{u} = \frac{+0.5128 \sqrt{(0.40)(0.60)}}{0.3863} = +0.6503\;. \end{aligned}$$

For the N = 15 scores listed in Table 7.37, there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{n_{0}!\;n_{1}!} = \frac{15!}{6!\;9!} = 5{,}005 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed scores, making an exact permutation analysis easily accomplished. Note that in the formula for the biserial correlation coefficient,

$$\displaystyle \begin{aligned} r_{b} = \frac{(\bar{y}_{1}-\bar{y}_{0})pq}{uS_{y}} \end{aligned}$$

p, q, u, and S y are invariant under permutation. Therefore, the permutation distribution can efficiently be based entirely on \(\bar {y}_{1}-\bar {y}_{0}\). If all M = 5, 005 arrangements of the N = 15 observed values occur with equal chance, the exact two-sided probability value of r b = +0.6503 computed on the M = 5, 005 possible arrangements of the observed data with n 0 = 6 and n 1 = 9 preserved for each arrangement is P = 263∕5, 005 = 0.0525.
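A sketch of that enumeration is given below, using the fifteen scores exactly as they appear in the calculations of \(\bar {y}_{0}\) and \(\bar {y}_{1}\) above; because p, q, u, and S y are invariant, it is sufficient to count the splits whose absolute mean difference equals or exceeds the observed 6.8889, which should reproduce the exact probability value P = 263∕5,005 = 0.0525.

```python
import numpy as np
from itertools import combinations

# Scores from Table 7.37: six subjects coded 0 followed by nine coded 1.
y0 = [12, 15, 11, 18, 13, 11]
y1 = [10, 33, 19, 21, 29, 12, 19, 23, 16]
y = np.array(y0 + y1, dtype=float)
n0, n = len(y0), len(y)

obs = abs(np.mean(y1) - np.mean(y0))
total = y.sum()
count = m = 0
for idx in combinations(range(n), n0):       # C(15, 6) = 5,005 splits
    s0 = y[list(idx)].sum()
    diff = abs((total - s0) / (n - n0) - s0 / n0)
    count += diff >= obs - 1e-12
    m += 1
print(count, "/", m, "=", count / m)         # exact two-sided probability value
```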

7.8 Intraclass Correlation

There exists an extensive, and controversial, literature on the intraclass correlation coefficient and its uses. The standard reference is by E.A. Haggard, Intraclass Correlation and the Analysis of Variance [20], although it has been heavily criticized for both its exposition and its statistical accuracy [51]. See also discussions by Bartko [3, 2, 4], Kraemer [27], Kraemer and Thiemann [26, pp. 32–34, 54–56], Shrout and Fleiss [50], von Eye and Mun [54, pp. 116-122], and Winer [56, pp. 289–296].

The intraclass correlation coefficient is most often used for measuring the level of agreement among judges. The coefficient represents concordance, where + 1 indicates perfect agreement and 0 indicates no agreement. While the maximum value of the intraclass correlation coefficient is + 1, the minimum is given by − 1∕(k − 1), where k is the number of judges. Thus, for k = 2 judges the lower limit is − 1, but for k = 3 judges the lower limit is − 1∕2, for k = 4 judges the lower limit is − 1∕3, for k = 5 judges the lower limit is − 1∕4, and so on, approaching zero as the number of judges increases. A number of authors recommend that when the intraclass correlation coefficient is negative, it should be interpreted as zero [4, 20, p. 71], but this seems intuitively wrong.

In many ways the intraclass correlation coefficient is a special form of the Pearson product-moment (interclass) correlation coefficient. Consider the small set of data given in Table 7.38 with N = 5 subjects and measurements on Height (x) and Weight (y). For the bivariate data given in Table 7.38 with N = 5 subjects,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = 15\;, \;\; \sum_{i=1}^{N} x_{i}^{2} = 55\;, \;\; \sum_{i=1}^{N} y_{i} = 25\;, \;\; \sum_{i=1}^{N} y_{i}^{2} = 135\;, \;\; \sum_{i=1}^{N}x_{i}y_{i} = 83\;, \end{aligned}$$

and the Pearson product-moment correlation coefficient is r xy = +0.80.

Table 7.38 Example bivariate correlation data on N = 5 subjects

Now consider N = 5 sets of twins and let the variable under consideration be Weight, as in Table 7.39. The question is, which of the two variables labeled Weight is to be considered variable x and which is to be considered variable y? The problem can be solved by the intraclass correlation coefficient using double entries. The intraclass correlation between N pairs of observations on two variables, x and y, is by definition the ordinary Pearson product-moment (interclass) correlation between 2N pairs of observations, the first N of which are the original observations, and the second N the original observations with variable x replacing variable y and vice versa [15, Sect. 38]. Table 7.40 illustrates the arrangement. For the bivariate data given in Table 7.40 with 2N = 10 subjects,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = \sum_{i=1}^{N} y_{i} = 40\;, \quad \sum_{i=1}^{N} x_{i}^{2} = \sum_{i=1}^{N} y_{i}^{2} = 190\;, \quad \sum_{i=1}^{N}x_{i}y_{i} = 166\;, \end{aligned}$$

and the intraclass correlation coefficient is r I = +0.20. Note that certain computational simplifications follow from the reversal of the variables, mainly because the reversals make the marginal distributions for the new variables the same and, therefore, the means and variances of the new variables are also the same [46, p. 20].

Table 7.39 Example bivariate correlation data on N = 5 twins
Table 7.40 Example bivariate correlation data on 2N = 10 twins

For cases with k > 2, the construction of a table suitable for calculating the intraclass correlation coefficient is more laborious. For example, given k = 3 judges, designate the three values for each subject as x 1, x 2, and x 3. The three values are entered into the table as six observations, each being one of the six permutations of two values that can be made from the original three values. That is, the three values x 1, x 2, and x 3 for each subject are entered into a bivariate correlation table with coordinates (x 1, x 2), (x 1, x 3), (x 2, x 3), (x 2, x 1), (x 3, x 1), and (x 3, x 2), and the Pearson product-moment correlation coefficient is computed for the resulting table, yielding the intraclass correlation coefficient.

To illustrate, consider the small data set given in Table 7.41 with N = 3 subjects and k = 3 judges. The permutations of the observations in Table 7.41 are listed in the correlation matrix given in Table 7.42. For the bivariate data listed in Table 7.42 with N = 18 paired entries,

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = \sum_{i=1}^{N}y_{i} = 90\;, \quad \sum_{i=1}^{N} x_{i}^{2} = \sum_{i=1}^{N}y_{i}^{2} = 570\;, \quad \sum_{i=1}^{N}x_{i}y_{i} = 552\;,\end{aligned} $$

and the intraclass correlation coefficient obtained via the Pearson product-moment correlation coefficient is r I = r xy = +0.85.

Table 7.41 Example correlation data with k = 3 judges and N = 3 subjects
Table 7.42 Bivariate permutation matrix for k = 3 judges and N = 3 subjects
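The double-entry construction generalizes mechanically to any k: for each subject, every ordered pair of distinct ratings becomes one (x, y) row, and the ordinary Pearson correlation of the expanded table is the intraclass correlation. A sketch follows; the ratings matrix is hypothetical, since the individual values of Table 7.41 are not reproduced here.

```python
import numpy as np
from itertools import permutations

def intraclass_double_entry(ratings):
    """Pearson intraclass correlation via the double-entry construction:
    every ordered pair of a subject's k ratings becomes one (x, y) row."""
    xs, ys = [], []
    for row in ratings:
        for a, b in permutations(row, 2):
            xs.append(a)
            ys.append(b)
    return np.corrcoef(xs, ys)[0, 1]

# Hypothetical ratings for N = 3 subjects by k = 3 judges (not Table 7.41).
ratings = np.array([[2, 3, 4],
                    [5, 5, 6],
                    [7, 8, 8]])
print(intraclass_double_entry(ratings))
```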

Because of the complexity of double entries with k > 2, the intraclass correlation coefficient is usually formulated as an analysis of variance with Factor A treated as a random factor. There are actually three different intraclass correlation coefficients, and two forms of each [32, 50, 57]. The three types and two forms are designated as:

$$\displaystyle \begin{gathered} \text{ICC(1, 1)} \text{ and } \text{ICC(1,}\,k\text{)},\\ \text{ICC(2, 1)} \text{ and } \text{ICC(2,}\,k\text{)},\\ \text{ICC(3, 1)} \text{ and } \text{ICC(3,}\,k\text{)}. \end{gathered} $$

Case 1, Form 1: ICC(1, 1)

For Case 1, Form 1, there exists a pool of judges. For each subject, a researcher randomly samples k judges from the pool to evaluate that subject. The k judges who rate Subject 1 are not necessarily the same judges who rate Subject 2. To illustrate Case 1, Form 1, Table 7.43 lists example data for k = 4 judges (A) and N = 6 subjects (S).

Table 7.43 Example data for Case 1, Form 1, with N = 6 subjects (S) and k = 4 judges (A)

Now consider the data given in Table 7.43 as a one-way randomized-block analysis of variance, given in Table 7.44. For the summary data given in Table 7.44, let a indicate the number of levels of Factor A, then the sum-of-squares Total is

$$\displaystyle \begin{aligned} \mathit{SS}_{\text{Total}} = \sum_{i=1}^{N}x_{i}^{2}-\frac{\left( \:\displaystyle\sum_{i=1}^{N}x_{i} \right)^{2}}{Na} = 841-\frac{(127)^{2}}{(6)(4)} = 168.9583\;, \end{aligned}$$

the sum-of-squares Between Subjects (BS) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathit{SS}_{\text{BS}} = \frac{\displaystyle\sum_{i=1}^{N}T_{S_{i}}^{2}}{a}&\displaystyle -&\displaystyle \frac{\left( \:\displaystyle\sum_{i=1}^{N}x_{i}\right)^{2}}{Na}\\ &\displaystyle &\displaystyle = \frac{(24)^{2}+(12)^{2}+ \cdots +(19)^{2}}{4}-\frac{(127)^{2}}{(6)(4)} = 56.2083\;, \end{array} \end{aligned} $$

the sum-of-squares for Factor A is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathit{SS}_{\text{A}} = \frac{\displaystyle\sum_{j=1}^{a}T_{A_{j}}^{2}}{N}&\displaystyle -&\displaystyle \frac{\left( \:\displaystyle\sum_{i=1}^{N}x_{i}\right)^{2}}{Na}\\ &\displaystyle &\displaystyle = \frac{(46)^{2}+(15)^{2}+(26)^{2}+(40)^{2}}{6}-\frac{(127)^{2}}{(6)(4)} = 97.4583\;, \end{array} \end{aligned} $$

the sum-of-squares Within Subjects (WS) is

$$\displaystyle \begin{aligned} \mathit{SS}_{\text{WS}} = \mathit{SS}_{\text{Total}}-\mathit{SS}_{\text{BS}} = 168.9583-56.2083 = 112.7500\;, \end{aligned}$$

and the sum-of-squares Error is

$$\displaystyle \begin{aligned} \mathit{SS}_{\text{Error}} = \mathit{SS}_{\text{A}{\times}\text{S}} = \mathit{SS}_{\text{WS}}-\mathit{SS}_{\text{A}} = 112.7500-97.4583 = 15.2917\;. \end{aligned}$$
Table 7.44 Example data for Case 1, Form 1, prepared for an analysis of variance with N = 6 subjects (S) and k = 4 judges (A)

The analysis of variance source table is given in Table 7.45. For Case 1, Form 1, the intraclass correlation coefficient is given by

$$\displaystyle \begin{aligned} \text{ICC(1, 1)} = \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{WS}}}{\mathit{MS}_{\text{BS}}+(a-1)\mathit{MS}_{\text{WS}}} = \frac{11.2417-6.2639}{11.2417+(4-1)(6.2639)} = +0.1657\;. \end{aligned}$$

Table 7.45 Analysis of variance source table for the data given in Table 7.44 with k = 4 judges and N = 6 subjects

Case 1, Form k: ICC(1,k)

If each judge is replaced with a group of k judges, such as a team of clinicians, and the score is the average score of the k judges, then for Case 1, Form k, the intraclass correlation coefficient is

$$\displaystyle \begin{aligned} \text{ICC(1, }k\text{)} = \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{WS}}}{\mathit{MS}_{\text{BS}}} = \frac{11.2417-6.2639}{11.2417} = +0.4428\;.\end{aligned} $$

Case 2, Form 1: ICC(2, 1)

If the same set of k judges rate each subject and the k judges are considered a random sample from a population of potential judges, then the intraclass correlation coefficient is designated ICC(2, 1). Because this is the most common case/form, it is usually designated simply as r I in the literature.

$$\displaystyle \begin{aligned} \begin{array}{rcl} \text{ICC(2, 1)} &\displaystyle =&\displaystyle \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{A} {\times} \text{S}}}{\mathit{MS}_{\text{BS}}+(a-1)\mathit{MS}_{\text{A} {\times} \text{S}}+\displaystyle\frac{a(\mathit{MS}_{\text{A}}-\mathit{MS}_{\text{A} {\times} \text{S}})}{N}}\\ &\displaystyle =&\displaystyle \frac{11.2417-1.0194}{11.2417+(4-1)(1.0194)+\displaystyle\frac{(4)(32.4861-1.0194)}{6}} = +0.2898\;. \end{array} \end{aligned} $$

Case 2, Form k: ICC(2, k)

If each judge is replaced with a team of k judges, and the score is the average score of the k judges, then for Case 2, Form k, the intraclass correlation coefficient is

$$\displaystyle \begin{aligned} \text{ICC(2, }k\text{)} = \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{A} {\times} \text{S}}}{\mathit{MS}_{\text{BS}}+\displaystyle\frac{\mathit{MS}_{\text{A}}-\mathit{MS}_{\text{A} {\times} \text{S}}}{N}} = \frac{11.2417-1.0194}{11.2417+\displaystyle\frac{32.4861-1.0194}{6}} = +0.6200\;. \end{aligned}$$

Case 3, Form 1: ICC(3, 1)

Case 3, Form 1 is the same as Case 2, Form 1, except that the raters are considered as fixed, not random. For Case 3, Form 1, the intraclass correlation coefficient is

$$\displaystyle \begin{aligned} \text{ICC(3, 1)} = \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{A} {\times} \text{S}}}{\mathit{MS}_{\text{BS}}+(a-1)\mathit{MS}_{\text{A} {\times} \text{S}}} = \frac{11.2417-1.0194}{11.2417+(4-1)(1.0194)} = +0.7149\;. \end{aligned}$$

Case 3, Form k: ICC(3, k)

If each judge is replaced with a team of k judges and the teams are considered as fixed, not random, the intraclass correlation coefficient is

$$\displaystyle \begin{aligned} \text{ICC(3, }k\text{)} = \frac{\mathit{MS}_{\text{BS}}-\mathit{MS}_{\text{A} {\times} \text{S}}}{\mathit{MS}_{\text{BS}}} = \frac{11.2417-1.0194}{11.2417} = +0.9093\;. \end{aligned}$$
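All six coefficients can be computed from the same mean squares. The sketch below assumes a complete N × k matrix of ratings and the standard Shrout and Fleiss [50] forms summarized above; the example matrix is hypothetical, not the data of Table 7.43.

```python
import numpy as np

def icc_all_forms(x):
    """Return the six intraclass correlation coefficients for an
    N-subjects by k-judges matrix x, via the usual mean squares."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_bs = k * ((x.mean(axis=1) - grand) ** 2).sum()     # between subjects
    ss_a = n * ((x.mean(axis=0) - grand) ** 2).sum()      # between judges
    ss_ws = ((x - grand) ** 2).sum() - ss_bs              # within subjects
    ss_err = ss_ws - ss_a                                 # judge-by-subject
    ms_bs, ms_ws = ss_bs / (n - 1), ss_ws / (n * (k - 1))
    ms_a, ms_err = ss_a / (k - 1), ss_err / ((n - 1) * (k - 1))
    return {
        "ICC(1,1)": (ms_bs - ms_ws) / (ms_bs + (k - 1) * ms_ws),
        "ICC(1,k)": (ms_bs - ms_ws) / ms_bs,
        "ICC(2,1)": (ms_bs - ms_err) /
                    (ms_bs + (k - 1) * ms_err + k * (ms_a - ms_err) / n),
        "ICC(2,k)": (ms_bs - ms_err) / (ms_bs + (ms_a - ms_err) / n),
        "ICC(3,1)": (ms_bs - ms_err) / (ms_bs + (k - 1) * ms_err),
        "ICC(3,k)": (ms_bs - ms_err) / ms_bs,
    }

# Hypothetical 6-by-4 ratings matrix (not the data of Table 7.43).
ratings = np.array([[7, 5, 6, 8],
                    [3, 2, 4, 4],
                    [5, 4, 5, 7],
                    [9, 7, 8, 9],
                    [4, 3, 5, 6],
                    [6, 4, 6, 7]])
print(icc_all_forms(ratings))
```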

7.8.1 Example

For another example of the intraclass correlation coefficient, consider Case 2, Form 1, the most common in the literature, with k judges randomly selected from a pool of potential judges. Table 7.46 contains data for k = 3 judges and N = 5 subjects. Table 7.47 contains the analysis of variance source table for the data given in Table 7.46. Given the analysis of variance source table in Table 7.47, the intraclass correlation coefficient is

Table 7.46 Example data for Case 2, Form 1, with N = 5 subjects (S) and k = 3 judges (A)
Table 7.47 Analysis of variance source table for the data given in Table 7.46 with k = 3 judges and N = 5 subjects

7.8.2 A Permutation Analysis

Permutation analyses are completely data-dependent and do not depend on random sampling or on fixed- or random-effects models. For the data given in Table 7.46 for k = 3 judges and N = 5 subjects, there are only

$$\displaystyle \begin{aligned} M = \big( k! \big)^{N} = \big( 3! \big)^{5} = 7{,}776 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed data, making an exact permutation analysis possible. If r o denotes the observed value of r I, the exact upper-tail probability value of the observed value of r I is

$$\displaystyle \begin{aligned} P \big( r_{\text{I}} \geq r_{\text{o}}|H_{0} \big) = \frac{\text{number of }r_{\text{I}}\text{ values } \geq r_{\text{o}}}{M} = \frac{24}{7{,}776} = 0.0031\;. \end{aligned}$$
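The reference set is generated by permuting each subject's k ratings independently, giving (k!)^N equally-likely arrangements. The sketch below enumerates that set and uses ICC(2, 1) as the observed r I, an assumption consistent with the Case 2, Form 1 example above; the ratings matrix is hypothetical, since the values of Table 7.46 are not reproduced here.

```python
import numpy as np
from itertools import permutations, product

def icc21(x):
    """ICC(2, 1) computed from the usual mean squares of an N-by-k matrix."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_bs = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_a = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_bs - ss_a
    ms_bs = ss_bs / (n - 1)
    ms_a = ss_a / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_bs - ms_err) / (ms_bs + (k - 1) * ms_err + k * (ms_a - ms_err) / n)

def exact_upper_tail_pvalue(x):
    """P(r_I >= r_o | H0) over all (k!)**N within-subject orderings."""
    x = np.asarray(x, dtype=float)
    r_obs = icc21(x)
    rows = [list(permutations(row)) for row in x]     # k! orderings per subject
    count = m = 0
    for arrangement in product(*rows):                # (k!)**N arrangements
        count += icc21(np.array(arrangement)) >= r_obs - 1e-12
        m += 1
    return count / m

# Hypothetical ratings for N = 5 subjects by k = 3 judges (not Table 7.46).
x = np.array([[2, 3, 4], [4, 5, 5], [6, 6, 7], [3, 4, 3], [8, 7, 9]])
print(exact_upper_tail_pvalue(x))    # enumerates all 7,776 arrangements
```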

7.8.3 Interclass and Intraclass Linear Correlation

In the special case of k = 2 the relationship between the Pearson product-moment (interclass) correlation coefficient and the Pearson intraclass correlation coefficient can easily be demonstrated. Given k = 2 judges, the value of the intraclass correlation depends in part upon the corresponding Pearson product-moment correlation, but it also depends upon the differences between the means and standard deviations of the two variables. Thus,

$$\displaystyle \begin{aligned} r_{\text{I}} = \frac{\left[ \left( \sigma_{x}^{2}+\sigma_{y}^{2} \right)-\left( \sigma_{x}-\sigma_{y} \right)^{2}\right]\,r_{xy}-\left( \bar{x}-\bar{y} \right)^{2}/2}{(\sigma_{x}^{2}+\sigma_{y}^{2})+\left( \bar{x}-\bar{y} \right)^{2}/2}\;, \end{aligned}$$

where \(\bar {x}\) and \(\bar {y}\) denote the means, \(\sigma _{x}^{2}\) and \(\sigma _{y}^{2}\) the variances, and r xy the Pearson product-moment correlation of variables x and y. Thus, for the bivariate data given in Table 7.38 on p. 59, replicated in Table 7.48 for convenience,

$$\displaystyle \begin{aligned} \bar{x} = 3.00\;, \quad \bar{y} = 5.00\;, \quad \sigma_{x} = \sigma_{y} = 1.4142\;, \quad \sigma_{x}^{2} = \sigma_{y}^{2} = 2.00\;, \end{aligned}$$

r xy = +0.80, and

$$\displaystyle \begin{aligned} r_{\text{I}} = \frac{\left[ \left( 2.00+2.00 \right)-\left( 1.4142-1.4142 \right)^{2}\right](+0.80)-\left( 3.00-5.00 \right)^{2}/2}{\left( 2.00+2.00 \right)+\left( 3.00-5.00 \right)^{2}/2} = \frac{3.20-2.00}{4.00+2.00} = +0.20\;, \end{aligned}$$

the same value found with 2N pairs of observations.

Table 7.48 Example bivariate correlation data on N = 5 subjects

7.9 Coda

Chapter 7 applied permutation statistical methods to measures of association for two variables at the interval level of measurement. Included in Chap. 7 were discussions of ordinary least squares (OLS) regression, least absolute deviation (LAD) regression, multivariate multiple regression, point-biserial correlation, biserial correlation, intraclass correlation, and Fisher’s z transform for skewed distributions.

Chapter 8 applies exact and Monte Carlo resampling permutation statistical methods to measures of association for two variables at different levels of measurement, e.g., a nominal-level variable and an ordinal-level variable, a nominal-level variable and an interval-level variable, and an ordinal-level variable and an interval-level variable. Included in Chap. 8 are permutation statistical methods applied to Freeman’s θ, Agresti’s \(\hat {\delta }\), Piccarreta’s \(\hat {\tau }\), Whitfield’s S, Cureton’s r rb, Pearson’s η 2, Kelley’s 𝜖 2, Hays’ \(\hat {\omega }^{2}\), and Jaspen’s multiserial correlation coefficient.