This chapter presents exact and Monte Carlo permutation statistical methods for multi-sample tests. Multi-sample tests are of two types: tests for experimental differences among three or more independent samples (completely-randomized designs) and tests for experimental differences among three or more dependent samples (randomized-blocks designs).Footnote 1 Permutation statistical methods for multiple independent samples are presented in this chapter; permutation statistical methods for multiple dependent samples are presented in Chap. 9. In addition, there are mixed models with one or more independent samples and one or more dependent samples, but these models are beyond the scope of this introductory book on permutation statistical methods. Interested readers can consult the authors' 2016 book Permutation Statistical Methods: An Integrated Approach [2].

Multi-sample tests for independent samples constitute a large family of tests in conventional statistical methods. Included in this family are one-way analysis of variance with univariate responses (ANOVA), one-way analysis of variance with multivariate responses (MANOVA), one-way analysis of variance with one or more covariates and univariate responses (ANCOVA), one-way analysis of variance with one or more covariates and multivariate responses (MANCOVA), and a variety of factorial designs that may be two-way, three-way, four-way, nested, balanced, unbalanced, fixed, random, or mixed.

In this chapter, permutation statistical methods for multiple independent samples are illustrated with six example analyses. The first example utilizes a small set of data to illustrate the computation of exact permutation methods for multiple independent samples, wherein the permutation test statistic, δ, is developed and compared with Fisher’s conventional F-ratio test statistic. The second example develops a permutation-based measure of effect size as a chance-corrected alternative to the five conventional measures of effect size for multi-sample tests: Cohen’s \(\hat {d}\), Pearson’s η 2, Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\) for fixed models, and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) for random models. The third example compares permutation statistical methods based on ordinary and squared Euclidean scaling functions, with an emphasis on the analysis of data sets containing extreme values. The fourth example utilizes a larger data set to provide a comparison of exact permutation methods and Monte Carlo permutation methods, demonstrating the efficiency and accuracy of Monte Carlo statistical methods for multi-sample tests. The fifth example illustrates the application of permutation statistical methods to univariate rank-score data, comparing permutation statistical methods to the conventional Kruskal–Wallis one-way analysis of variance for ranks test. The sixth example illustrates the application of permutation statistical methods to multivariate data, comparing permutation statistical methods with the conventional Bartlett–Nanda–Pillai trace test for multivariate data.

8.1 Introduction

The most popular univariate test for g ≥ 3 independent samples under the Neyman–Pearson population model of statistical inference is Fisher’s one-way analysis of variance wherein the null hypothesis (H 0) posits no mean differences among the g populations from which the samples are presumed to have been randomly drawn; that is, H 0: μ 1 = μ 2 = ⋯ = μ g. It should be noted that Fisher, writing in the first edition of Statistical Methods for Research Workers in 1925, named this statistic the variance-ratio test, symbolized it as z, and defined it as

$$\displaystyle \begin{aligned} z = \frac{1}{2} \log_{e} \left( \frac{\nu_{1}}{\nu_{0}} \right)\;, \end{aligned}$$

where ν 1 = MS Between and ν 0 = MS Within in modern notation. In 1934, in an effort to eliminate the calculation of the natural logarithm required by Fisher’s z test, George Snedecor at Iowa State University published tabled values for Fisher’s variance-ratio z statistic in a small monograph and renamed the test statistic F, presumably in honor of Fisher [22]. It has often been reported that Fisher was displeased when the variance-ratio z test statistic was renamed F by Snedecor [4, 8].

Fisher’s F-ratio test for a completely-randomized design does not determine whether or not the null hypothesis is true. It provides only the probability that, if the null hypothesis is true and the samples have been drawn from populations with identical mean values, a test statistic value as extreme as or more extreme than the observed value would be obtained, assuming normality and homogeneity of variance.

Consider a conventional multi-sample F test with samples of independent and identically distributed univariate random variables of sizes n 1, …, n g, viz.,

$$\displaystyle \begin{aligned} \{x_{11},\,\ldots,\,x_{n_{1}1}\},\,\ldots,\,\{x_{1g},\,\ldots,\,x_{n_{g}g}\}\;, \end{aligned}$$

drawn from g specified populations with cumulative distribution functions F 1(x), …, F g(x), respectively. For simplicity, suppose that population i is normal with mean μ i and variance σ 2 for i = 1, …, g. This is the standard one-way classification model with g treatment groups. Under the Neyman–Pearson population model of statistical inference, the null hypothesis of no differences among the population means tests

$$\displaystyle \begin{aligned} H_{0}{:}\;\mu_{1} = \mu_{2} = \cdots = \mu_{g} \quad \mbox{versus} \quad H_{1}{:}\;\mu_{i} \neq \mu_{j} \quad \mbox{for some }i \neq j \end{aligned}$$

for g treatment groups. The permissible probability of a type I error is denoted by α and if the observed value of Fisher’s F-ratio test statistic is equal to or greater than the critical value of F that defines α, the null hypothesis is rejected with a probability of type I error equal to or less than α, under the assumptions of normality and homogeneity.

For multi-sample tests with g treatment groups and N observations, Fisher’s F-ratio test statistic is given by

$$\displaystyle \begin{aligned} F = \frac{\mathit{MS}_{\mathrm{Between}}}{\mathit{MS}_{\mathrm{Within}}}\;, \end{aligned}$$

where the mean-square between treatments is given byFootnote 2

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Between}} = \frac{\mathit{SS}_{\mathrm{Between}}}{g-1}\;, \end{aligned}$$

the sum-of-squares between treatments is given by

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Between}} = \sum_{i=1}^{g} n_{i} \big( \bar{x}_{i}-\bar{\bar{x}} \big)^{2}\;, \end{aligned}$$

the mean-square within treatments is given by

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Within}} = \frac{\mathit{SS}_{\mathrm{Within}}}{N-g}\;, \end{aligned}$$

the sum-of-squares within treatments is given by

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{x}_{i} \big)^{2}\;, \end{aligned}$$

the sum-of-squares total is given by

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \mathit{SS}_{\mathrm{Between}}+\mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{\bar{x}} \big)^{2}\;, \end{aligned}$$

the mean value for the ith of g treatment groups is given by

$$\displaystyle \begin{aligned} \bar{x}_{i} = \frac{1}{n_{i}} \sum_{j=1}^{n_{i}} x_{ij}\;, \end{aligned}$$

the grand mean for all g treatment groups combined is given by

$$\displaystyle \begin{aligned} \bar{\bar{x}} = \frac{1}{N} \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} x_{ij}\;, \end{aligned}$$

and the total number of observations is

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{g}n_{i}\;. \end{aligned}$$

Under the Neyman–Pearson null hypothesis, H 0: μ 1 = μ 2 = ⋯ = μ g, test statistic F is asymptotically distributed as Snedecor’s F distribution with ν 1 = g − 1 degrees of freedom in the numerator and ν 2 = N − g degrees of freedom in the denominator. However, if any of the g populations is not normally distributed, then the distribution of test statistic F no longer follows Snedecor’s F distribution with ν 1 = g − 1 and ν 2 = N − g degrees of freedom.

The assumptions underlying Fisher’s F-ratio test for multiple independent samples are (1) the observations are independent, (2) the data are random samples from well-defined, normally-distributed populations, and (3) homogeneity of variance; that is, \(\sigma _{1}^{2} = \sigma _{2}^{2} = \cdots = \sigma _{g}^{2}\).
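The defining sums of squares above translate directly into a few lines of code. The following is a minimal sketch in Python (not from the text); the three treatment groups are hypothetical illustrative data.

```python
# Minimal sketch of Fisher's F-ratio for a one-way (completely-randomized)
# design, computed directly from the defining sums of squares.
# The three groups below are hypothetical illustrative data.

def f_ratio(groups):
    g = len(groups)                          # number of treatment groups
    N = sum(len(s) for s in groups)          # total number of observations
    grand = sum(x for s in groups for x in s) / N
    ss_between = sum(len(s) * (sum(s)/len(s) - grand) ** 2 for s in groups)
    ss_within = sum((x - sum(s)/len(s)) ** 2 for s in groups for x in s)
    ms_between = ss_between / (g - 1)        # numerator, nu_1 = g - 1 df
    ms_within = ss_within / (N - g)          # denominator, nu_2 = N - g df
    return ms_between / ms_within

groups = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
print(f_ratio(groups))                       # 27.0
```

The function mirrors the identity SS_Total = SS_Between + SS_Within: every term is computed from the group means and the grand mean exactly as in the displayed equations.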

8.2 A Permutation Approach

Now consider a test for multiple independent samples under the Fisher–Pitman permutation model of statistical inference. Under the Fisher–Pitman permutation model there is no null hypothesis specifying population parameters. Instead the null hypothesis simply states that all possible arrangements of the observations occur with equal chance [10]. Also, there is no alternative hypothesis under the permutation model and no specified α level. Moreover, there is no requirement of random sampling, no degrees of freedom, no assumption of normality, and no assumption of homogeneity of variance.

A permutation alternative to the conventional F test for multiple independent samples is easily defined. The permutation test statistic for g ≥ 3 independent samples is given by

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i}\;, \end{aligned} $$
(8.1)

where C i > 0 is a positive treatment-group weight for i = 1, …, g,

$$\displaystyle \begin{aligned} \xi_{i} = \binom{n_{i}}{2}^{-1} \sum_{j=1}^{N-1}\,\sum_{k=j+1}^{N} \Delta(j,k) \Psi_{i}(\omega_{j})\Psi_{i}(\omega_{k}) \end{aligned} $$
(8.2)

is the average distance-function value for all distinct pairs of objects in sample S i for i = 1, …, g,

$$\displaystyle \begin{aligned} \Delta(j,k) = \big| x_{j}-x_{k} \big|{}^{v} \end{aligned} $$

denotes a symmetric distance-function value for a single pair of objects,

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{g}n_{i}\;, \end{aligned} $$

and Ψ i(⋅) is an indicator function given by

$$\displaystyle \begin{aligned} \Psi_{i}(\omega_{j}) = \begin{cases} \,1 & \text{if }\omega_{j} \in S_{i}\;, \\ {} \,0 & \text{otherwise .} \end{cases} \end{aligned} $$
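In code, the indicator-function form of Eq. (8.2) reduces to averaging the distance-function values Δ(j, k) over the distinct pairs within each sample. The sketch below (not from the text) does exactly that, with v = 2 and the weights C i = (n i − 1)∕(N − g) as one admissible choice; the three two-observation samples are hypothetical.

```python
# Sketch of the permutation test statistic delta of Eq. (8.1):
# xi_i is the average distance-function value over all distinct pairs
# within sample S_i, and delta is the weighted sum of the xi_i.
# Here v = 2 and C_i = (n_i - 1)/(N - g); the data are hypothetical.
from itertools import combinations

def delta_stat(samples, v=2):
    g = len(samples)
    N = sum(len(s) for s in samples)
    total = 0.0
    for s in samples:
        pairs = list(combinations(s, 2))     # binom(n_i, 2) distinct pairs
        xi = sum(abs(x - y) ** v for x, y in pairs) / len(pairs)
        total += (len(s) - 1) / (N - g) * xi
    return total

samples = [[1, 2], [3, 4], [5, 6]]
print(round(delta_stat(samples), 4))         # 1.0
```

Summing over within-sample pairs is equivalent to the double sum over all N objects with the indicator Ψ i(⋅): the indicator simply selects the pairs for which both objects fall in S i.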

Under the Fisher–Pitman permutation model, the null hypothesis simply states that equal probabilities are assigned to each of the

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} \end{aligned} $$
(8.3)

possible, equally-likely allocations of the N objects to the g samples [10]. The probability value associated with an observed value of δ, say δ o, is the probability under the null hypothesis of observing a value of δ as extreme or more extreme than δ o. Thus, an exact probability value for δ o may be expressed as

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values }\leq \delta_{\text{o}}}{M}\;. \end{aligned} $$
(8.4)

When M is large, an approximate probability value for δ may be obtained from a Monte Carlo permutation procedure, where

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values }\leq \delta_{\text{o}}}{L} \end{aligned}$$

and L denotes the number of randomly-sampled test statistic values. Typically, L is set to a large number to ensure accuracy; for example, L = 1,000,000 [11].
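The Monte Carlo procedure can be sketched as follows (not from the text): shuffle the pooled observations, re-split them into samples of the original sizes, and count the resampled δ values that do not exceed the observed δ. The data, the resampling size L, and the seed are hypothetical; in practice L is far larger.

```python
# Monte Carlo sketch of the permutation probability value:
# shuffle the pooled observations, re-split into samples of the
# original sizes, and count delta values <= the observed delta.
# L is kept small here; in practice L on the order of 1,000,000 is used.
import random
from itertools import combinations

def delta_stat(samples, v=2):
    N = sum(len(s) for s in samples)
    g = len(samples)
    total = 0.0
    for s in samples:
        pairs = list(combinations(s, 2))
        xi = sum(abs(x - y) ** v for x, y in pairs) / len(pairs)
        total += (len(s) - 1) / (N - g) * xi
    return total

def monte_carlo_p(samples, L=10000, seed=1):
    rng = random.Random(seed)
    sizes = [len(s) for s in samples]
    pooled = [x for s in samples for x in s]
    d_obs = delta_stat(samples)
    hits = 0
    for _ in range(L):
        rng.shuffle(pooled)
        split, i = [], 0
        for n in sizes:
            split.append(pooled[i:i + n])
            i += n
        if delta_stat(split) <= d_obs + 1e-12:   # tolerance for float ties
            hits += 1
    return hits / L

samples = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]      # hypothetical data
p = monte_carlo_p(samples)
print(p)   # a small estimated probability value
```

With these well-separated hypothetical groups the estimate is small; increasing L tightens the Monte Carlo approximation to the exact probability value.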

8.3 The Relationship Between Statistics F and δ

When the null hypothesis under the Neyman–Pearson population model states H 0: μ 1 = μ 2 = ⋯ = μ g, v = 2, and the treatment-group weights are given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

the functional relationships between test statistic δ and Fisher’s F-ratio test statistic are given by

$$\displaystyle \begin{aligned} \delta = \frac{2 \mathit{SS}_{\mathrm{Total}}}{N-g+(g-1)F} \quad \mbox{and} \quad F = \frac{2 \mathit{SS}_{\mathrm{Total}}}{(g-1)\delta}-\frac{N-g}{g-1}\;, \end{aligned} $$
(8.5)

where

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \sum_{i=1}^{N}x_{i}^{2}-\left( \sum_{i=1}^{N} x_{i} \right)^{2} \left/ \rule{0pt}{14pt} N \right.\;, \end{aligned}$$

and x i is a univariate measurement score for the ith of N objects. The permutation analogue of the F test is generally known as the Fisher–Pitman permutation test [3].
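The two expressions in Eq. (8.5) can be checked numerically. The sketch below (hypothetical data, v = 2, C i = (n i − 1)∕(N − g)) computes δ from its pairwise-distance definition and F from the usual sums of squares, then recovers each statistic from the other.

```python
# Numerical check of the functional relationships of Eq. (8.5):
# delta computed from its definition equals 2*SS_Total/(N-g+(g-1)F),
# and F is recovered from delta in the same way.
# Hypothetical data; delta uses v = 2 and C_i = (n_i-1)/(N-g).
from itertools import combinations

groups = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
g = len(groups)
N = sum(len(s) for s in groups)
grand = sum(x for s in groups for x in s) / N
ss_total = sum((x - grand) ** 2 for s in groups for x in s)

# Fisher's F from the usual sums of squares
ss_b = sum(len(s) * (sum(s)/len(s) - grand) ** 2 for s in groups)
ss_w = ss_total - ss_b
F = (ss_b / (g - 1)) / (ss_w / (N - g))

# delta from its pairwise-distance definition
delta = sum((len(s) - 1) / (N - g) *
            sum(abs(x - y) ** 2 for x, y in combinations(s, 2)) /
            (len(s) * (len(s) - 1) / 2)
            for s in groups)

print(round(delta, 4), round(2 * ss_total / (N - g + (g - 1) * F), 4))  # 2.0 2.0
print(round(F, 4), round(2 * ss_total / ((g - 1) * delta) - (N - g) / (g - 1), 4))  # 27.0 27.0
```

Because the two statistics are one-to-one functions of each other for fixed SS_Total, N, and g, the permutation distributions of δ and F order the M arrangements identically.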

Because of the relationship between test statistics δ and F, the exact probability values given by

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values }\leq \delta_{\text{o}}}{M} \end{aligned}$$

and

$$\displaystyle \begin{aligned} P \big( F \geq F_{\text{o}}|H_{0} \big) = \frac{\text{number of }F\text{ values }\geq F_{\text{o}}}{M} \end{aligned}$$

are equivalent under the Fisher–Pitman null hypothesis, where δ o and F o denote the observed values of δ and F, respectively, and M is the number of possible, equally-likely arrangements of the observed data.

A chance-corrected measure of agreement among the N measurement scores is given by

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}}\;, \end{aligned} $$
(8.6)

where μ δ is the arithmetic average of the M δ test statistic values calculated on all possible arrangements of the observed measurements; that is,

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M} \sum_{i=1}^{M} \delta_{i}\;. \end{aligned} $$
(8.7)

Alternatively, in terms of a one-way analysis of variance model, the exact expected value of test statistic δ is a simple function of the total sum-of-squares; that is,

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{2\mathit{SS}_{\mathrm{Total}}}{N-1}\;. \end{aligned}$$
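The identity μ δ = 2SS_Total∕(N − 1) can be verified by brute force on a small hypothetical data set: enumerate all M arrangements, average the δ values, and compare. The sketch below uses six hypothetical observations split into three samples of two, giving M = 90 arrangements.

```python
# Sketch verifying Eq. (8.7): the exact expected value mu_delta, averaged
# over all M equally-likely arrangements, equals 2*SS_Total/(N-1).
# Hypothetical data with M = 6!/(2! 2! 2!) = 90 arrangements.
from itertools import combinations

def delta_stat(samples, v=2):
    N = sum(len(s) for s in samples)
    g = len(samples)
    total = 0.0
    for s in samples:
        pairs = list(combinations(s, 2))
        total += (len(s) - 1) / (N - g) * (
            sum(abs(x - y) ** v for x, y in pairs) / len(pairs))
    return total

data = [1, 2, 3, 4, 5, 6]
deltas = []
for s1 in combinations(range(6), 2):
    rest = [i for i in range(6) if i not in s1]
    for s2 in combinations(rest, 2):
        s3 = [i for i in rest if i not in s2]
        deltas.append(delta_stat([[data[i] for i in s1],
                                  [data[i] for i in s2],
                                  [data[i] for i in s3]]))
mu_delta = sum(deltas) / len(deltas)

grand = sum(data) / len(data)
ss_total = sum((x - grand) ** 2 for x in data)
print(len(deltas), round(mu_delta, 4), round(2 * ss_total / (len(data) - 1), 4))
# 90 7.0 7.0
```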

8.4 Example 1: Test Statistics F and δ

A small example will serve to illustrate the relationship between test statistics F and δ. Consider the example data listed in Table 8.1 with g = 3 treatment groups, sample sizes of n 1 = n 2 = 3, n 3 = 4, and N = n 1 + n 2 + n 3 = 3 + 3 + 4 = 10 total observations. Under the Neyman–Pearson population model with sample sizes n 1 = n 2 = 3, and n 3 = 4, treatment-group means \(\bar {x}_{1} = 3\), \(\bar {x}_{2} = 4\), and \(\bar {x}_{3} = 8\), grand mean \(\bar {\bar {x}} = 5.30\), estimated population variances \(s_{1}^{2} = s_{2}^{2} = 1.00\) and \(s_{3}^{2} = 0.6667\), the sum-of-squares between treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Between}} = \sum_{i=1}^{g} n_{i} \big( \bar{x}_{i}-\bar{\bar{x}} \big)^{2} = 50.10\;, \end{aligned}$$

the sum-of-squares within treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{x}_{i} \big)^{2} = 6.00\;, \end{aligned}$$

the sum-of-squares total is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \mathit{SS}_{\mathrm{Between}}+\mathit{SS}_{\mathrm{Within}} = 50.10+6.00 = 56.10\;, \end{aligned}$$

the mean-square between treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Between}} = \frac{\mathit{SS}_{\mathrm{Between}}}{g-1} = \frac{50.10}{3-1} = 25.05\;, \end{aligned}$$

the mean-square within treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Within}} = \frac{\mathit{SS}_{\mathrm{Within}}}{N-g} = \frac{6.00}{10-3} = 0.8571\;, \end{aligned}$$

and the observed value of Fisher’s F-ratio test statistic is

$$\displaystyle \begin{aligned} F = \frac{\mathit{MS}_{\mathrm{Between}}}{\mathit{MS}_{\mathrm{Within}}} = \frac{25.05}{0.8571} = 29.2250\;. \end{aligned}$$

The essential factors, sums of squares (SS), degrees of freedom (df), mean squares (MS), and variance-ratio test statistic (F) are summarized in Table 8.2.

Table 8.1 Example data for a test of g = 3 independent samples with N = 10 observations
Table 8.2 Source table for the example data listed in Table 8.1

Under the Neyman–Pearson null hypothesis, H 0: μ 1 = μ 2 = μ 3, Fisher’s F-ratio test statistic is asymptotically distributed as Snedecor’s F with ν 1 = g − 1 and ν 2 = N − g degrees of freedom. With ν 1 = g − 1 = 3 − 1 = 2 and ν 2 = N − g = 10 − 3 = 7 degrees of freedom, the asymptotic probability value of F = 29.2250 is P = 0.4001×10−3, under the assumptions of normality and homogeneity.
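The computations above for the Table 8.1 data can be reproduced with a short script (a sketch in pure Python; no statistical package is assumed):

```python
# One-way analysis of variance for the Table 8.1 data:
# groups {2,3,4}, {3,4,5}, and {7,8,8,9}.
groups = [[2, 3, 4], [3, 4, 5], [7, 8, 8, 9]]
g = len(groups)
N = sum(len(s) for s in groups)
grand = sum(x for s in groups for x in s) / N          # 5.30
ss_between = sum(len(s) * (sum(s)/len(s) - grand) ** 2 for s in groups)
ss_within = sum((x - sum(s)/len(s)) ** 2 for s in groups for x in s)
ms_between = ss_between / (g - 1)                      # 25.05
ms_within = ss_within / (N - g)                        # 0.8571
F = ms_between / ms_within
print(round(ss_between, 2), round(ss_within, 2), round(F, 4))
# 50.1 6.0 29.225
```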

8.4.1 An Exact Analysis with v = 2

For the first permutation analysis of the example data listed in Table 8.1 let v = 2, employing squared Euclidean scaling, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with Fisher’s F-ratio test statistic.

Because there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{10!}{3!\;3!\;4!} = 4200 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 10 observations listed in Table 8.1, an exact permutation analysis is feasible. While M = 4200 arrangements are too many to list, Table 8.3 illustrates the calculation of the ξ, δ, and F values for a small sample of the M possible arrangements of the N = 10 observations listed in Table 8.1.

Table 8.3 Sample arrangements of the example data listed in Table 8.1 with associated ξ 1, ξ 2, ξ 3, δ, and F values

Following Eq. (8.2) on p. 261, the N = 10 observations yield g = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = \xi_{2} = 2.00 \quad \mbox{and} \quad \xi_{3} = 1.3333\;. \end{aligned}$$

Alternatively, in terms of a one-way analysis of variance model the average distance-function values are \(\xi _{1} = 2s_{1}^{2} = 2(1.00) = 2.00\), \(\xi _{2} = 2s_{2}^{2} = 2(1.00) = 2.00\), and \(\xi _{3} = 2s_{3}^{2} = 2(0.6667) = 1.3333\).

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,2,3\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta = \sum_{i=1}^{g} C_{i} \xi_{i} = \frac{1}{10-3} \big[(3-1)(2.00)&\displaystyle +&\displaystyle (3-1)(2.00)\\ &\displaystyle &\displaystyle \qquad {}+(4-1)(1.3333)\big] = 1.7143\;. \end{array} \end{aligned} $$

Alternatively, in terms of a one-way analysis of variance model the permutation test statistic is

$$\displaystyle \begin{aligned} \delta = 2\mathit{MS}_{\mathrm{Within}} = 2(0.8571) = 1.7143\;. \end{aligned}$$

For the example data listed in Table 8.1, the sum of the N = 10 observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = 2+3+4+3+4+5+7+8+8+9 = 53\;, \end{aligned}$$

the sum of the N = 10 squared observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i}^{2} = 2^{2}+3^{2}+4^{2}+3^{2}+4^{2}+5^{2}+7^{2}+8^{2}+8^{2}+9^{2} = 337\;, \end{aligned}$$

and the total sum-of-squares is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathit{SS}_{\mathrm{Total}} = \sum_{i=1}^{N}\big( x_{i}-\bar{\bar{x}} \big)^{2} = \sum_{i=1}^{N}x_{i}^{2}&\displaystyle -&\displaystyle \left( \sum_{i=1}^{N} x_{i} \right)^{2} \left/ \rule{0pt}{14pt} N \right.\\ &\displaystyle &\displaystyle \qquad \qquad {}= 337-(53)^{2}/10 = 56.10\;, \end{array} \end{aligned} $$

where \(\bar {\bar {x}}\) denotes the grand mean of all N = 10 observations. Then following the expressions given in Eq. (8.5) on p. 262 for test statistics δ and F, the observed value of test statistic δ with respect to test statistic F is

$$\displaystyle \begin{aligned} \delta = \frac{2 \mathit{SS}_{\mathrm{Total}}}{N-g+(g-1)F} {}= \frac{2(56.10)}{10-3+(3-1)(29.2250)} = 1.7143 \end{aligned}$$

and the observed value of test statistic F with respect to test statistic δ is

$$\displaystyle \begin{aligned} F = \frac{2 \mathit{SS}_{\mathrm{Total}}}{(g-1)\delta}-\frac{N-g}{g-1} = \frac{2(56.10)}{(3-1)(1.7143)}-\frac{10-3}{3-1} = 29.2250\;. \end{aligned}$$

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 10 observations listed in Table 8.1 that are equal to or less than the observed value of δ = 1.7143. There are exactly 10 δ test statistic values that are equal to or less than the observed value of δ = 1.7143. If all M arrangements of the N = 10 observations listed in Table 8.1 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 1.7143 computed on all M = 4200 arrangements of the observed data with n 1 = n 2 = 3 and n 3 = 4 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{10}{4200} = 0.2381 {\times} 10^{-2}\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 10 observations listed in Table 8.1.

Alternatively, there are only 10 F values that are equal to or greater than the observed value of F = 29.2250. Thus, if all arrangements of the observed data occur with equal chance, the exact probability value of F = 29.2250 under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} P \big( F \geq F_{\text{o}}|H_{0} \big) = \frac{\text{number of }F\text{ values } \geq F_{\text{o}}}{M} = \frac{10}{4200} = 0.2381 {\times} 10^{-2}\;, \end{aligned}$$

where F o denotes the observed value of test statistic F.
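The exact analysis just described can be sketched directly: enumerate all M = 4200 equally-likely arrangements of the N = 10 observations (preserving n 1 = n 2 = 3 and n 3 = 4) and count the δ values that do not exceed δ o.

```python
# Exact permutation analysis of the Table 8.1 data: enumerate all
# M = 4200 arrangements and count delta values <= the observed delta.
from itertools import combinations

def delta_stat(samples, v=2):
    N = sum(len(s) for s in samples)
    g = len(samples)
    total = 0.0
    for s in samples:
        pairs = list(combinations(s, 2))
        total += (len(s) - 1) / (N - g) * (
            sum(abs(x - y) ** v for x, y in pairs) / len(pairs))
    return total

data = [2, 3, 4, 3, 4, 5, 7, 8, 8, 9]        # Table 8.1, pooled
delta_o = delta_stat([data[0:3], data[3:6], data[6:10]])   # 1.7143

count = M = 0
idx = set(range(10))
for s1 in combinations(idx, 3):
    for s2 in combinations(idx - set(s1), 3):
        s3 = idx - set(s1) - set(s2)
        d = delta_stat([[data[i] for i in s1],
                        [data[i] for i in s2],
                        [data[i] for i in s3]])
        M += 1
        if d <= delta_o + 1e-9:              # tolerance for float ties
            count += 1
print(count, M, round(count / M, 6))          # 10 4200 0.002381
```

The enumeration confirms the exact probability value 10∕4200 obtained above.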

Following Eq. (8.7) on p. 263, the exact expected value of the M = 4200 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{52{,}360}{4200} = 12.4667\;. \end{aligned}$$

Alternatively, in terms of a one-way analysis of variance model the exact expected value of test statistic δ is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{2\mathit{SS}_{\mathrm{Total}}}{N-1} = \frac{2(56.10)}{10-1} = 12.4667\;. \end{aligned}$$

Following Eq. (8.6) on p. 263, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{1.7143}{12.4667} = +0.8625\;, \end{aligned}$$

indicating approximately 86% within-group agreement above what is expected by chance. Alternatively, in terms of a one-way analysis of variance model the chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{2\mathit{MS}_{\mathrm{Within}}}{\displaystyle\frac{2\mathit{SS}_{\mathrm{Total}}}{N-1}} &\displaystyle =&\displaystyle 1-\frac{(N-1)(\mathit{MS}_{\mathrm{Within}})}{\mathit{SS}_{\mathrm{Total}}}\\ &\displaystyle &\displaystyle \quad {}= 1-\frac{(10-1)(0.8571)}{56.10} = +0.8625\;. \end{array} \end{aligned} $$
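The chance-corrected measure of effect size for the Table 8.1 data can be computed in a few lines, using δ = 2MS_Within and μ δ = 2SS_Total∕(N − 1) as derived above (a sketch, not from the text):

```python
# The chance-corrected effect size of Eq. (8.6) for the Table 8.1 data,
# using delta = 2*MS_Within and mu_delta = 2*SS_Total/(N-1).
groups = [[2, 3, 4], [3, 4, 5], [7, 8, 8, 9]]
N = sum(len(s) for s in groups)
grand = sum(x for s in groups for x in s) / N
ss_total = sum((x - grand) ** 2 for s in groups for x in s)
ss_within = sum((x - sum(s)/len(s)) ** 2 for s in groups for x in s)
delta = 2 * ss_within / (N - len(groups))          # 1.7143
mu_delta = 2 * ss_total / (N - 1)                  # 12.4667
R = 1 - delta / mu_delta
print(round(R, 4))                                 # 0.8625
```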

8.5 Example 2: Measures of Effect Size

Measures of effect size express the practical or clinical significance of differences among multiple independent sample means, as contrasted with the statistical significance of differences. Five measures of effect size are commonly used for determining the magnitude of treatment effects for multiple independent samples: Cohen’s \(\hat {d}\), Pearson’s η 2, Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\) for fixed models, and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) for random models. Cohen’s \(\hat {d}\) measure of effect size is given by

$$\displaystyle \begin{aligned} \hat{d} =\left[ \frac{1}{g-1} \left( \frac{\mathit{SS}_{\mathrm{Between}}}{n\mathit{MS}_{\mathrm{Within}}} \right) \right]^{1/2} = \left[ \frac{F}{n} \,\right]^{1/2}\;, \end{aligned}$$

where n denotes the common size of each treatment group. Pearson’s η 2 measure of effect size is given by

$$\displaystyle \begin{aligned} \eta^{2} = \frac{\mathit{SS}_{\mathrm{Between}}}{\mathit{SS}_{\mathrm{Total}}} = 1-\frac{N-g}{F(g-1)+N-g}\;, \end{aligned}$$

which is equivalent to Pearson’s r 2 for a one-way analysis of variance design. Kelley’s “unbiased” correlation ratio is given byFootnote 3

$$\displaystyle \begin{aligned} \hat{\eta}^{2} = \frac{\mathit{SS}_{\mathrm{Total}}-(N-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}} = 1-\frac{N-1}{F(g-1)+N-g}\;, \end{aligned}$$

which is equivalent to an adjusted or “shrunken” squared multiple correlation coefficient reported by most computer statistical packages and given by

$$\displaystyle \begin{aligned} \hat{\eta}^{2} = R_{\text{adj}}^{2} = 1-\frac{(1-R^{2})(N-1)}{N-p-1}\;, \end{aligned}$$

where R 2 is the squared product-moment multiple correlation coefficient and p is the number of predictors. Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects analysis of variance model is given by

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{\mathit{SS}_{\mathrm{Between}}-(g-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}+\mathit{MS}_{\mathrm{Within}}} = 1-\frac{N}{(F-1)(g-1)+N}\;. \end{aligned}$$

Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects analysis of variance model is given by

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{R}}^{2} = \frac{\mathit{MS}_{\mathrm{Between}}-\mathit{MS}_{\mathrm{Within}}}{\mathit{MS}_{\mathrm{Between}}+(n-1)\mathit{MS}_{\mathrm{Within}}} = 1-\frac{n}{F+n-1}\;, \end{aligned}$$

where n denotes the common size of each treatment group. Mielke and Berry’s \(\Re \) chance-corrected measure of effect size is given by

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}}\;, \end{aligned}$$

where δ is defined in Eq. (8.1) on p. 261 and μ δ is the exact expected value of δ under the Fisher–Pitman null hypothesis given by

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i}\;, \end{aligned}$$

where, for a test of g ≥ 3 independent samples, the number of possible, equally-likely arrangements of the observed data is given by

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!}\;. \end{aligned}$$

For the example data listed in Table 8.1 on p. 263 for N = 10 observations, Cohen’s \(\hat {d}\) measure of effect size isFootnote 4

$$\displaystyle \begin{aligned} \hat{d} =\left[ \frac{1}{g-1} \left( \frac{\mathit{SS}_{\mathrm{Between}}}{\bar{n}\mathit{MS}_{\mathrm{Within}}} \right) \right]^{1/2} = \left[ \frac{F}{\bar{n}} \,\right]^{1/2} = \left[ \frac{29.2250}{3.3333} \right]^{1/2} = \pm 2.9610\;. \end{aligned}$$

Pearson’s r 2 measure of effect size is usually labeled as η 2 when reported with an analysis of variance. For the example data listed in Table 8.1, η 2 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \eta^{2} = \frac{\mathit{SS}_{\mathrm{Between}}}{\mathit{SS}_{\mathrm{Total}}} &\displaystyle =&\displaystyle 1-\frac{N-g}{F(g-1)+N-g}\\ &\displaystyle &\displaystyle \qquad \qquad {}= 1-\frac{10-3}{(29.2250)(3-1)+10-3} = 0.8930\;, \end{array} \end{aligned} $$

Kelley’s \(\hat {\eta }^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{\eta}^{2} = \frac{\mathit{SS}_{\mathrm{Total}}-(N-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}} = 1-\frac{N-1}{F(g-1)+N-g}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \quad \qquad {}= 1-\frac{10-1}{(29.2250)(3-1)+10-3} = 0.8625\;, \end{array} \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects analysis of variance model is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{\omega}_{\text{F}}^{2} = \frac{\mathit{SS}_{\mathrm{Between}}-(g-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}+\mathit{MS}_{\mathrm{Within}}} = 1-\frac{N}{(F-1)(g-1)+N}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \quad \qquad \qquad {}= 1-\frac{10}{(29.2250-1)(3-1)+10} = 0.8495\;, \end{array} \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects analysis of variance model isFootnote 5

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle \hat{\omega}_{\text{R}}^{2} = \frac{\mathit{MS}_{\mathrm{Between}}-\mathit{MS}_{\mathrm{Within}}}{\mathit{MS}_{\mathrm{Between}}+(\bar{n}-1)\mathit{MS}_{\mathrm{Within}}} = 1-\frac{\bar{n}}{F+\bar{n}-1}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \quad \ \ \qquad \qquad \qquad {}= 1-\frac{3.3333}{29.2250+3.3333-1} = 0.8944\;, \end{array} \end{aligned} $$

and Mielke and Berry’s \(\Re \) chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{1.7143}{12.4667} = +0.8625\;, \end{aligned}$$

where the exact expected value of test statistic δ under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{52{,}360}{4200} = 12.4667\;. \end{aligned}$$

It can easily be shown that Mielke and Berry’s \(\Re \) chance-corrected measure of effect size is identical to Kelley’s \(\hat {\eta }^{2}\) measure of effect size for a one-way, completely-randomized analysis of variance design, under the Neyman–Pearson population model.
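All of the effect size values reported above follow from F alone via the F-based forms of each measure. The sketch below (not from the text) reproduces them for F = 29.2250, g = 3, and N = 10, with the simple average n̄ replacing n for the unequal group sizes, as in the text.

```python
# The five conventional effect sizes and the chance-corrected measure,
# computed from F for the Table 8.1 data using the F-based forms above.
# nbar = N/g replaces n because the group sizes are unequal.
F, g, N = 29.2250, 3, 10
nbar = N / g                                            # 3.3333
d = (F / nbar) ** 0.5                                   # Cohen's d-hat
eta2 = 1 - (N - g) / (F * (g - 1) + N - g)              # Pearson's eta^2
R = 1 - (N - 1) / (F * (g - 1) + N - g)                 # Kelley / Mielke-Berry
omega_f = 1 - N / ((F - 1) * (g - 1) + N)               # Hays, fixed effects
omega_r = 1 - nbar / (F + nbar - 1)                     # Hays, random effects
print([round(v, 4) for v in (d, eta2, R, omega_f, omega_r)])
# [2.961, 0.893, 0.8625, 0.8495, 0.8944]
```

Note that R equals Kelley’s η̂ 2 term for term, while η 2, ω̂ F 2, and ω̂ R 2 each apply a different denominator correction to the same F.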

8.5.1 Comparisons of Effect Size Measures

In this section the various measures of effect size are compared and contrasted. Because Pearson’s r 2 and η 2 are equivalent and Kelley’s \(\hat {\eta }^{2}\) and Mielke and Berry’s \(\Re \) are equivalent for multi-sample designs, only η 2 and \(\Re \) are utilized for the comparisons. The functional relationships between Cohen’s \(\hat {d}\) measure of effect size and Pearson’s η 2 (r 2) measure of effect size for g ≥ 3 independent samples are given by

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{\eta^{2}(N-g)}{n(g-1)(1-\eta^{2})} \right]^{1/2} \quad \mbox{and} \quad \eta^{2} = 1-\frac{N-g}{n\hat{d}^{2}(g-1)+N-g}\;, \end{aligned} $$
(8.8)

where n denotes the common treatment-group size. The relationships between Cohen’s \(\hat {d}\) measure of effect size and Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) chance-corrected measure of effect size are given by

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{\Re(N-g)+g-1}{n(g-1)(1-\Re)} \right]^{1/2} \quad \mbox{and} \quad \Re = 1-\frac{N-1}{n\hat{d}^{2}(g-1)+N-g}\;. \end{aligned} $$
(8.9)

The relationships between Cohen’s \(\hat {d}\) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model are given by

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{(N-g+1)\hat{\omega}_{\text{F}}^{2}+g-1}{n(g-1)(1-\hat{\omega}_{\text{F}}^{2})} \right]^{1/2} \end{aligned} $$
(8.10)

and

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = 1-\frac{N}{(n\hat{d}^{2}-1)(g-1)+N}\;. \end{aligned} $$
(8.11)

The relationships between Cohen’s \(\hat {d}\) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model are given by

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{\hat{\omega}_{\text{R}}^{2}(n-1)+1}{n(1-\hat{\omega}_{\text{R}}^{2})} \right]^{1/2} \quad \mbox{and} \quad \hat{\omega}_{\text{R}}^{2} = 1-\frac{n}{n(\hat{d}^{2}+1)-1}\;. \end{aligned} $$
(8.12)

The relationships between Pearson’s η 2 (r 2) measure of effect size and Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size are given by

$$\displaystyle \begin{aligned} \eta^{2} = 1-\frac{(N-g)(1-\Re)}{N-1} \quad \mbox{and} \quad \Re = 1-\frac{(N-1)(1-\eta^{2})}{N-g}\;. \end{aligned} $$
(8.13)

The relationships between Pearson’s η 2 (r 2) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model are given by

$$\displaystyle \begin{aligned} \eta^{2} = \frac{(N-g+1)\hat{\omega}_{\text{F}}^{2}+g-1}{N+\hat{\omega}_{\text{F}}^{2}-1} \end{aligned} $$
(8.14)

and

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{\eta^{2}(N-1)-g+1}{N-\eta^{2}-g+1}\;. \end{aligned} $$
(8.15)

The relationships between Pearson’s η 2 (r 2) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model are given by

$$\displaystyle \begin{aligned} \eta^{2} = 1-\frac{(N-g)(1-\hat{\omega}_{\text{R}}^{2})}{(g-1)[\hat{\omega}_{\text{R}}^{2}(n-1)+1]+(N-g)(1-\hat{\omega}_{\text{R}}^{2})} \end{aligned} $$
(8.16)

and

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{R}}^{2} = \frac{\eta^{2}(N-1)-g+1}{(N-g)\eta^{2}+(g-1)(1-\eta^{2})(n-1)}\;. \end{aligned} $$
(8.17)

The relationships between Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model are given by

$$\displaystyle \begin{aligned} \Re = \frac{N\hat{\omega}_{\text{F}}^{2}}{N+\hat{\omega}_{\text{F}}^{2}-1} \quad \mbox{and} \quad \hat{\omega}_{\text{F}}^{2} = \frac{\Re(N-1)}{N-\Re}\;. \end{aligned} $$
(8.18)

The relationships between Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model are given by

$$\displaystyle \begin{aligned} \Re = 1-\frac{(N-1)(1-\hat{\omega}_{\text{R}}^{2})}{n\hat{\omega}_{\text{R}}^{2}(g-1)+(N-1)(1-\hat{\omega}_{\text{R}}^{2})} \end{aligned} $$
(8.19)

and

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{R}}^{2} = \frac{\Re(N-1)}{N\Re-1+(1-\Re)[n(g-1)+1]}\;. \end{aligned} $$
(8.20)

And the relationships between Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model are given by

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{n\hat{\omega}_{\text{R}}^{2}(g-1)}{n\hat{\omega}_{\text{R}}^{2}(g-1)+N(1-\hat{\omega}_{\text{R}}^{2})} \quad \mbox{and} \quad \hat{\omega}_{\text{R}}^{2} = \frac{N\hat{\omega}_{\text{F}}^{2}}{N\hat{\omega}_{\text{F}}^{2}+n(g-1)(1-\hat{\omega}_{\text{F}}^{2})}\;. \end{aligned} $$
(8.21)
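Because the conversions in Eqs. (8.9)–(8.21) are purely algebraic, each pair can be checked numerically: converting a value one way and back should recover the starting value. The following Python sketch is an illustrative check, not part of the text's formal apparatus; the function names are ad hoc, and it implements only the Cohen–Mielke/Berry pair of Eq. (8.9) and the Mielke/Berry–Hays fixed-effects pair of Eq. (8.18).

```python
# Round-trip checks for two of the effect-size conversion pairs.
# Symbols follow the text: N observations, g groups, n the (average)
# group size; R denotes Mielke and Berry's chance-corrected measure.

def d_from_R(R, N, g, n):
    # Eq. (8.9), first form: Cohen's d-hat from R
    return ((R * (N - g) + g - 1) / (n * (g - 1) * (1 - R))) ** 0.5

def R_from_d(d, N, g, n):
    # Eq. (8.9), second form: R from Cohen's d-hat
    return 1 - (N - 1) / (n * d**2 * (g - 1) + N - g)

def R_from_wF(wF, N):
    # Eq. (8.18), first form: R from Hays' fixed-effects omega-squared
    return N * wF / (N + wF - 1)

def wF_from_R(R, N):
    # Eq. (8.18), second form: Hays' fixed-effects omega-squared from R
    return R * (N - 1) / (N - R)

# With the Table 8.1 quantities (N = 10, g = 3, average n = 10/3),
# each round trip recovers the starting value up to float rounding.
N, g, n = 10, 3, 10 / 3
R = 0.8625
assert abs(R_from_d(d_from_R(R, N, g, n), N, g, n) - R) < 1e-12
assert abs(R_from_wF(wF_from_R(R, N), N) - R) < 1e-12
```

The round trips are exact algebraically, so any discrepancy beyond float rounding would signal a transcription error in the formulas.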

8.5.2 Example Comparisons of Effect Size Measures

In this section comparisons of Cohen’s \(\hat {d}\), Pearson’s η 2, Mielke and Berry’s \(\Re \), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measures of effect size are illustrated with the example data listed in Table 8.1 on p. 263 with n 1 = n 2 = 3, n 3 = 4, and N = n 1 + n 2 + n 3 = 3 + 3 + 4 = 10 observations. Because the treatment-group sizes are unequal, the ns in the equations for Cohen’s \(\hat {d}\) and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) are replaced with a simple average; that is, \(\bar {n} = (3+3+4)/3 = 3.3333\).

Given the example data listed in Table 8.1 and following the expressions given in Eq. (8.8) for Cohen’s \(\hat {d}\) measure of effect size and Pearson’s η 2 (r 2) measure of effect size, the observed value for Cohen’s \(\hat {d}\) measure of effect size with respect to the observed value of Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{\eta^{2}(N-g)}{\bar{n}(g-1)(1-\eta^{2})} \right]^{1/2} = \left[ \frac{(0.8930)(10-3)}{(3.3333)(3-1)(1-0.8930)} \right]^{1/2} = \pm 2.9610 \end{aligned}$$

and the observed value for Pearson’s η 2 (r 2) measure of effect size with respect to the observed value of Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \eta^{2} = 1-\frac{N-g}{\bar{n}\hat{d}^{\,2}(g-1)+N-g}\\ &\displaystyle &\displaystyle \qquad \qquad \quad \qquad \qquad {}= 1-\frac{10-3}{(3.3333)(2.9610)^{2}(3-1)+10-3} = 0.8930\;. \end{array} \end{aligned} $$

Following the expressions given in Eq. (8.9) for Cohen’s \(\hat {d}\) measure of effect size and Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size, the observed value for Cohen’s \(\hat {d}\) measure of effect size with respect to the observed value of Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{d} = \left[ \frac{\Re(N-g)+g-1}{\bar{n}(g-1)(1-\Re)} \right]^{1/2}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad {}= \left[ \frac{0.8625(10-3)+3-1}{(3.3333)(3-1)(1-0.8625)} \right]^{1/2} = \pm 2.9610 \end{array} \end{aligned} $$

and the observed value for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size with respect to the observed value of Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \Re = 1-\frac{N-1}{\bar{n}\hat{d}^{\,2}(g-1)+N-g}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad {}= 1-\frac{10-1}{(3.3333)(2.9610)^{2}(3-1)+10-3} = +0.8625\;. \end{array} \end{aligned} $$

Following the expressions given in Eqs. (8.10) and (8.11) for Cohen’s \(\hat {d}\) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model, the observed value for Cohen’s \(\hat {d}\) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{d} = \left[ \frac{(N-g+1)\hat{\omega}_{\text{F}}^{2}+g-1}{\bar{n}(g-1)(1-\hat{\omega}_{\text{F}}^{2})} \right]^{1/2}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad {}= \left[ \frac{(10-3+1)(0.8495)+3-1}{(3.3333)(3-1)(1-0.8495)} \right]^{1/2} = \pm 2.9610 \end{array} \end{aligned} $$

and the observed value for Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size with respect to the observed value of Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle \hat{\omega}_{\text{F}}^{2} = 1-\frac{N}{(\bar{n}\hat{d}^{\,2}-1)(g-1)+N}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad {}= 1-\frac{10}{[(3.3333)(2.9610)^{2}-1](3-1)+10} = 0.8495 \;. \end{array} \end{aligned} $$

Following the expressions given in Eq. (8.12) for Cohen’s \(\hat {d}\) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model, the observed value for Cohen’s \(\hat {d}\) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{\hat{\omega}_{\text{R}}^{2}(\bar{n}-1)+1}{\bar{n}(1-\hat{\omega}_{\text{R}}^{2})} \right]^{1/2} = \left[ \frac{(0.8944)(3.3333-1)+1}{(3.3333)(1-0.8944)} \right]^{1/2} = \pm 2.9610 \end{aligned}$$

and the observed value of Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size with respect to the observed value of Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{R}}^{2} = 1-\frac{\bar{n}}{\bar{n}(\hat{d}^{\,2}+1)-1} = 1-\frac{3.3333}{(3.3333)[(2.9610)^{2}+1]-1} = 0.8944\;. \end{aligned}$$

Following the expressions given in Eq. (8.13) for Pearson’s η 2 (r 2) measure of effect size and Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size, the observed value for Pearson’s η 2 (r 2) measure of effect size with respect to the observed value of Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size is

$$\displaystyle \begin{aligned} \eta^{2} = 1-\frac{(N-g)(1-\Re)}{N-1} = 1-\frac{(10-3)(1-0.8625)}{10-1} = 0.8930 \end{aligned}$$

and the observed value for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size with respect to the observed value of Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{(N-1)(1-\eta^{2})}{N-g} = 1-\frac{(10-1)(1-0.8930)}{10-3} = +0.8625 \;. \end{aligned}$$

Following the expressions given in Eqs. (8.14) and (8.15) for Pearson’s η 2 (r 2) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model, the observed value for Pearson’s η 2 (r 2) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \eta^{2} = \frac{(N-g+1)\hat{\omega}_{\text{F}}^{2}+g-1}{N+\hat{\omega}_{\text{F}}^{2}-1} = \frac{(10-3+1)(0.8495)+3-1}{10+0.8495-1} = 0.8930 \end{aligned}$$

and the observed value for Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size with respect to the observed value of Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{\eta^{2}(N-1)-g+1}{N-\eta^{2}-g+1} = \frac{(0.8930)(10-1)-3+1}{10-0.8930-3+1} = 0.8495\;. \end{aligned}$$

Following the expressions given in Eqs. (8.16) and (8.17) for Pearson’s η 2 (r 2) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model, the observed value for Pearson’s η 2 (r 2) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \eta^{2} &\displaystyle =&\displaystyle 1-\frac{(N-g)(1-\hat{\omega}_{\text{R}}^{2})}{(g-1)[\hat{\omega}_{\text{R}}^{2}(\bar{n}-1)+1]+(N-g)(1-\hat{\omega}_{\text{R}}^{2})}\\ &\displaystyle &\displaystyle {}= 1-\frac{(10-3)(1-0.8944)}{(3-1)[(0.8944)(3.3333-1)+1]+(10-3)(1-0.8944)}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad {}= 0.8930 \end{array} \end{aligned} $$

and the observed value for Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size with respect to the observed value of Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \hat{\omega}_{\text{R}}^{2} &\displaystyle =&\displaystyle \frac{\eta^{2}(N-1)-g+1}{(N-g)\eta^{2}+(g-1)(1-\eta^{2})(\bar{n}-1)}\\ &\displaystyle &\displaystyle \quad {}= \frac{0.8930(10-1)-3+1}{(10-3)(0.8930)+(3-1)(1-0.8930)(3.3333-1)} = 0.8944\;. \end{array} \end{aligned} $$

Following the expressions given in Eq. (8.18) for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size and Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model, the observed value for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \Re = \frac{N\hat{\omega}_{\text{F}}^{2}}{N+\hat{\omega}_{\text{F}}^{2}-1} = \frac{(10)(0.8495)}{10+0.8495 -1} = +0.8625 \end{aligned}$$

and the observed value for Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size with respect to the observed value of Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size is

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{\Re(N-1)}{N-\Re} = \frac{(0.8625)(10-1)}{10-0.8625} = 0.8495\;. \end{aligned}$$

Following the expressions given in Eqs. (8.19) and (8.20) for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model, the observed value for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Re &\displaystyle =&\displaystyle 1-\frac{(N-1)(1-\hat{\omega}_{\text{R}}^{2})}{\bar{n}\hat{\omega}_{\text{R}}^{2}(g-1)+(N-1)(1-\hat{\omega}_{\text{R}}^{2})}\\ &\displaystyle &\displaystyle \quad {}= 1-\frac{(10-1)(1-0.8944)}{(3.3333)(0.8944)(3-1)+(10-1)(1-0.8944)} = +0.8625 \end{array} \end{aligned} $$

and the observed value for Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size with respect to the observed value for Mielke and Berry’s \(\Re \) (\(\hat {\eta }^{2}\)) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \hat{\omega}_{\text{R}}^{2} &\displaystyle =&\displaystyle \frac{\hat{\eta}^{2}(N-1)}{N\Re-1+(1-\Re)[\bar{n}(g-1)+1]}\\ &\displaystyle &\displaystyle {}= \frac{(0.8625)(10-1)}{(10)(0.8625)-1+(1-0.8625)[(3.3333)(3-1)+1]} = 0.8944\;. \end{array} \end{aligned} $$

Following the expressions given in Eq. (8.21) for Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model and Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model, the observed value for Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{\omega}_{\text{F}}^{2} = \frac{\bar{n}\hat{\omega}_{\text{R}}^{2}(g-1)}{\bar{n}\hat{\omega}_{\text{R}}^{2}(g-1)+N(1-\hat{\omega}_{\text{R}}^{2})}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \quad {}= \frac{(3.3333)(0.8944)(3-1)}{(3.3333)(0.8944)(3-1)+(10)(1-0.8944)} = 0.8495 \end{array} \end{aligned} $$

and the observed value for Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size with respect to the observed value of Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle \hat{\omega}_{\text{R}}^{2} = \frac{N\hat{\omega}_{\text{F}}^{2}}{N\hat{\omega}_{\text{F}}^{2}+\bar{n}(g-1)(1-\hat{\omega}_{\text{F}}^{2})}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \quad {}= \frac{(10)(0.8495)}{(10)(0.8495)+(3.3333)(3-1)(1-0.8495)} = 0.8944\;. \end{array} \end{aligned} $$
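The whole chain of worked values above can be reproduced from a single starting point. The short Python sketch below (illustrative only; variable names are ad hoc) starts from Pearson's η² = 0.8930 for the Table 8.1 data and recovers the other four measures via Eqs. (8.8), (8.13), (8.15), and (8.17).

```python
# Reproduce the worked example values from eta-squared alone.
# N = 10 observations, g = 3 groups, average group size n = 10/3.
N, g, n = 10, 3, 10 / 3
eta2 = 0.8930

R = 1 - (N - 1) * (1 - eta2) / (N - g)                # Eq. (8.13)
wF = (eta2 * (N - 1) - g + 1) / (N - eta2 - g + 1)    # Eq. (8.15)
wR = (eta2 * (N - 1) - g + 1) / (
    (N - g) * eta2 + (g - 1) * (1 - eta2) * (n - 1))  # Eq. (8.17)
d = (eta2 * (N - g) / (n * (g - 1) * (1 - eta2))) ** 0.5  # Eq. (8.8)

# The values quoted in the text, to four decimal places:
assert abs(R - 0.8625) < 0.0005
assert abs(wF - 0.8495) < 0.0005
assert abs(wR - 0.8944) < 0.0005
assert abs(d - 2.9610) < 0.005
```

The tolerances are loose because the text carries only four decimal places through the intermediate steps.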

8.6 Example 3: Analyses with v = 2 and v = 1

For a third example of tests of differences among g ≥ 3 independent samples, consider the example data set given in Table 8.4 with g = 4 treatment groups, sample sizes of n 1 = n 2 = n 3 = n 4 = 7, and N = 28 total observations. Under the Neyman–Pearson population model with sample sizes n 1 = n 2 = n 3 = n 4 = 7, treatment-group means \(\bar {x}_{1} = 20.4286\), \(\bar {x}_{2} = 20.8571\), \(\bar {x}_{3} = 9.1429\), and \(\bar {x}_{4} = 14.1429\), grand mean \(\bar {\bar {x}} = 16.1429\), estimated population variances \(s_{1}^{2} = 27.9524\), \(s_{2}^{2} = 35.4762\), and \(s_{3}^{2} = s_{4}^{2} = 8.8095\), the sum-of-squares between treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Between}} = \sum_{i=1}^{g} n_{i} \big( \bar{x}_{i}-\bar{\bar{x}} \big)^{2} = 655.1429\;, \end{aligned}$$

the sum-of-squares within treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{x}_{i} \big)^{2} = 486.2857\;, \end{aligned}$$

the sum-of-squares total is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \mathit{SS}_{\mathrm{Between}}+\mathit{SS}_{\mathrm{Within}} = 655.1429+486.2857 = 1141.4286\;, \end{aligned}$$

the mean-square between treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Between}} = \frac{\mathit{SS}_{\mathrm{Between}}}{g-1} = \frac{655.1429}{4-1} = 218.3810\;, \end{aligned}$$

the mean-square within treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Within}} = \frac{\mathit{SS}_{\mathrm{Within}}}{N-g} = \frac{486.2857}{28-4} = 20.2619\;, \end{aligned}$$

and the observed value of Fisher’s F-ratio test statistic is

$$\displaystyle \begin{aligned} F = \frac{\mathit{MS}_{\mathrm{Between}}}{\mathit{MS}_{\mathrm{Within}}} = \frac{218.3810}{20.2619} = 10.7779\;. \end{aligned}$$

The essential factors, sums of squares (SS), degrees of freedom (df), mean squares (MS), and variance-ratio test statistic (F) are summarized in Table 8.5.

Table 8.4 Example data for a test of g = 4 independent samples with N = 28 observations
Table 8.5 Source table for the data listed in Table 8.4

Under the Neyman–Pearson null hypothesis, H 0: μ 1 = μ 2 = μ 3 = μ 4, Fisher’s F-ratio test statistic is asymptotically distributed as Snedecor’s F with ν 1 = g − 1 and ν 2 = N − g degrees of freedom. With ν 1 = g − 1 = 4 − 1 = 3 and ν 2 = N − g = 28 − 4 = 24 degrees of freedom, the asymptotic probability value of F = 10.7779 is P = 0.1122×10−3, under the assumptions of normality and homogeneity.
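Although the raw Table 8.4 data are not reproduced here, the entire source table follows from the group summaries quoted above. The Python sketch below (illustrative only) rebuilds the analysis of variance from the group sizes, means, and unbiased variances.

```python
# Reconstruct the one-way ANOVA for Table 8.4 from group summaries.
ns = [7, 7, 7, 7]
means = [20.4286, 20.8571, 9.1429, 14.1429]
variances = [27.9524, 35.4762, 8.8095, 8.8095]  # unbiased s_i^2

N, g = sum(ns), len(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N
ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
ss_within = sum((n - 1) * v for n, v in zip(ns, variances))
ms_between = ss_between / (g - 1)
ms_within = ss_within / (N - g)
F = ms_between / ms_within

assert abs(ss_between - 655.1429) < 0.01
assert abs(ss_within - 486.2857) < 0.01
assert abs(F - 10.7779) < 0.001
```

The small tolerances absorb the four-decimal rounding of the quoted means and variances.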

8.6.1 A Monte Carlo Analysis with v = 2

For the first analysis of the example data listed in Table 8.4 on p. 278 under the Fisher–Pitman permutation model let v = 2, employing squared Euclidean scaling, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with Fisher’s F-ratio test statistic.

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{28!}{7!\;7!\;7!\;7!} = 472{,}518{,}347{,}558{,}400 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 28 observations listed in Table 8.4, an exact permutation analysis is not possible and a Monte Carlo analysis is required.
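The size of the reference set is a simple multinomial coefficient, which can be confirmed exactly with integer arithmetic; the following short Python check is illustrative.

```python
from math import factorial

# M = N! / (n_1! n_2! n_3! n_4!) for N = 28 split into four groups of 7.
N, ns = 28, [7, 7, 7, 7]
M = factorial(N)
for n in ns:
    M //= factorial(n)  # exact integer division; no overflow in Python

assert M == 472_518_347_558_400
```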

Following Eq. (8.2) on p. 261, the N = 28 observations yield g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 55.9048\;, \quad \xi_{2} = 70.9524\;, \quad \mbox{and} \;\quad \xi_{3} = \xi_{4} = 17.6190\;. \end{aligned}$$

Alternatively, in terms of a one-way analysis of variance model the average distance-function values are \(\xi _{1} = 2s_{1}^{2} = 2(27.9524) = 55.9048\), \(\xi _{2} = 2s_{2}^{2} = 2(35.4762) = 70.9524\), \(\xi _{3} = 2s_{3}^{2} = 2(8.8095) = 17.6190\), and \(\xi _{4} = 2s_{4}^{2} = 2(8.8095) = 17.6190\).

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta = \sum_{i=1}^{g} C_{i} \xi_{i} = \frac{7-1}{28-4} \big(55.9048&\displaystyle +&\displaystyle 70.9524\\ &\displaystyle &\displaystyle \qquad {}+17.6190+17.6190\big) = 40.5238\;. \end{array} \end{aligned} $$

Alternatively, in terms of a one-way analysis of variance model the permutation test statistic is

$$\displaystyle \begin{aligned} \delta = 2\mathit{MS}_{\mathrm{Within}} = 2(20.2619) = 40.5238\;. \end{aligned}$$

For the example data listed in Table 8.4, the sum of the N = 28 observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = 15+23+18+ \cdots +11+15 = 452\;, \end{aligned}$$

the sum of the N = 28 squared observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i}^{2} = 15^{2}+23^{2}+18^{2}+ \cdots +11^{2}+15^{2} = 8438\;, \end{aligned}$$

and the total sum-of-squares is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathit{SS}_{\mathrm{Total}} = \sum_{i=1}^{N}\big( x_{i}-\bar{\bar{x}} \big)^{2} = \sum_{i=1}^{N}x_{i}^{2}&\displaystyle -&\displaystyle \left( \sum_{i=1}^{N} x_{i} \right)^{2} \left/ \rule{0pt}{14pt} N \right.\\ &\displaystyle &\displaystyle \quad {}= 8438-(452)^{2}/28 = 1141.4286\;, \end{array} \end{aligned} $$

where \(\bar {\bar {x}}\) denotes the grand mean of all N = 28 observations.

Then following the expressions given in Eq. (8.5) on p. 262 for test statistics δ and F, the observed value for test statistic δ with respect to the observed value of test statistic F is

$$\displaystyle \begin{aligned} \delta = \frac{2 \mathit{SS}_{\mathrm{Total}}}{N-g+(g-1)F} = \frac{2 (1141.4286)}{28-4+(4-1)(10.7779)} = 40.5238 \end{aligned}$$

and the observed value of test statistic F with respect to the observed value of test statistic δ is

$$\displaystyle \begin{aligned} F = \frac{2 \mathit{SS}_{\mathrm{Total}}}{(g-1)\delta}-\frac{N-g}{g-1} = \frac{2(1141.4286)}{(4-1)(40.5238)}-\frac{28-4}{4-1} = 10.7779\;. \end{aligned}$$
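Since SS_Total is fixed under permutation, δ and F are one-to-one transformations of each other. The Python sketch below (illustrative only) computes SS_Total from the quoted sums and runs the conversion of Eq. (8.5) in both directions.

```python
# delta <-> F conversion for the Table 8.4 analysis (Eq. 8.5).
N, g = 28, 4
ss_total = 8438 - 452**2 / 28     # sum of squares minus correction term
F = 10.7779                       # observed F-ratio

delta = 2 * ss_total / (N - g + (g - 1) * F)
F_back = 2 * ss_total / ((g - 1) * delta) - (N - g) / (g - 1)

assert abs(ss_total - 1141.4286) < 0.001
assert abs(delta - 40.5238) < 0.001
assert abs(F_back - F) < 1e-9     # exact inverse, up to float rounding
```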

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.4 that are equal to or less than the observed value of δ = 40.5238. There are exactly 138 δ test statistic values that are equal to or less than the observed value of δ = 40.5238. If all M arrangements of the N = 28 observations listed in Table 8.4 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 40.5238 computed on L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = n 3 = n 4 = 7 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{138}{1{,}000{,}000} = 0.1380 {\times} 10^{-3}\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.4.

In terms of a one-way analysis of variance model, there are exactly 138 F test statistic values that are equal to or greater than the observed value of F = 10.7779. Thus, if all arrangements of the observed data occur with equal chance, the Monte Carlo probability value of F = 10.7779 under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} P \big( F \geq F_{\text{o}} \big) = \frac{\text{number of }F\text{ values } \geq F_{\text{o}}}{L} = \frac{138}{1{,}000{,}000} = 0.1380 {\times} 10^{-3}\;, \end{aligned}$$

where F o denotes the observed value of test statistic F and L is the number of random, equally-likely arrangements of the example data listed in Table 8.4.

Following Eq. (8.7) on p. 263, the exact expected value of the M = 472,518,347,558,400 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{39{,}951{,}568{,}041{,}566{,}987}{472{,}518{,}347{,}558{,}400} = 84.5503\;. \end{aligned}$$

Alternatively, in terms of a one-way analysis of variance model the exact expected value of test statistic δ under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{2\mathit{SS}_{\mathrm{Total}}}{N-1} = \frac{2(1141.4286)}{28-1} = 84.5503\;. \end{aligned}$$

Following Eq. (8.6) on p. 263, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{40.5238}{84.5503} = +0.5207\;, \end{aligned}$$

indicating approximately 52% within-group agreement above what is expected by chance. Alternatively, in terms of a one-way analysis of variance model, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{(N-1)(\mathit{MS}_{\mathrm{Within}})}{\mathit{SS}_{\mathrm{Total}}} = 1-\frac{(28-1)(20.2619)}{1141.4286} = +0.5207\;. \end{aligned}$$

Alternatively, in terms of Fisher’s F-ratio test statistic the chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{N-1}{F(g-1)+N-g} = 1-\frac{28-1}{10.7779(4-1)+28-4} = +0.5207\;. \end{aligned}$$
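The three routes to ℜ just given must agree, since each is an algebraic rearrangement of the others. The following Python sketch (illustrative only) checks that all three yield the same value for the Table 8.4 analysis.

```python
# Three equivalent routes to the chance-corrected effect size R
# for the Table 8.4 analysis with v = 2.
N, g = 28, 4
ss_total, ms_within, F = 1141.4286, 20.2619, 10.7779

mu_delta = 2 * ss_total / (N - 1)   # exact expected value of delta
delta = 2 * ms_within               # observed delta

R1 = 1 - delta / mu_delta                        # from delta and mu_delta
R2 = 1 - (N - 1) * ms_within / ss_total          # from the ANOVA model
R3 = 1 - (N - 1) / (F * (g - 1) + N - g)         # from Fisher's F

assert abs(mu_delta - 84.5503) < 0.001
assert all(abs(r - 0.5207) < 0.0005 for r in (R1, R2, R3))
```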

8.6.2 Measures of Effect Size

For the example data listed in Table 8.4, Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \hat{d} =\left[ \frac{1}{g-1} \left( \frac{\mathit{SS}_{\mathrm{Between}}}{n\mathit{MS}_{\mathrm{Within}}} \right) \right]^{1/2} = \left[ \frac{1}{4-1} \left( \frac{655.1429}{(7)(20.2619)} \right) \right]^{1/2} = \pm 1.2408\;, \end{aligned}$$

Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \eta^{2} = \frac{\mathit{SS}_{\mathrm{Between}}}{\mathit{SS}_{\mathrm{Total}}} = \frac{655.1429}{1141.4286} = 0.5740\;, \end{aligned}$$

Kelley’s \(\hat {\eta }^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{\eta}^{2} = \frac{\mathit{SS}_{\mathrm{Total}}-(N-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad {}= \frac{1141.4286-(28-1)(20.2619)}{1141.4286} = 0.5207\;, \end{array} \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \hat{\omega}_{\text{F}}^{2} = \frac{\mathit{SS}_{\mathrm{Between}}-(g-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}+\mathit{MS}_{\mathrm{Within}}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad {}= \frac{655.1429-(4-1)(20.2619)}{1141.4286+20.2619} = 0.5116\;, \end{array} \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle \hat{\omega}_{\text{R}}^{2} = \frac{\mathit{MS}_{\mathrm{Between}}-\mathit{MS}_{\mathrm{Within}}}{\mathit{MS}_{\mathrm{Between}}+(n-1)\mathit{MS}_{\mathrm{Within}}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad {}= \frac{218.3810-20.2619}{218.3810+(7-1)(20.2619)} = 0.5828\;, \end{array} \end{aligned} $$

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{40.5238}{84.5503} = +0.5207\;, \end{aligned}$$

indicating approximately 52% within-group agreement above what is expected by chance.
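All five conventional measures derive directly from the source-table quantities. The Python sketch below (illustrative only; n is the common group size) computes them from the definitional formulas used above.

```python
# Conventional effect sizes for the Table 8.4 analysis,
# from the source-table quantities.
N, g, n = 28, 4, 7
ss_between, ss_total = 655.1429, 1141.4286
ms_between, ms_within = 218.3810, 20.2619

d = (ss_between / ((g - 1) * n * ms_within)) ** 0.5        # Cohen's d-hat
eta2 = ss_between / ss_total                               # Pearson's eta-squared
eta2_hat = (ss_total - (N - 1) * ms_within) / ss_total     # Kelley's eta-hat-squared
wF = (ss_between - (g - 1) * ms_within) / (ss_total + ms_within)      # Hays, fixed
wR = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)    # Hays, random

assert abs(d - 1.2408) < 0.001
assert abs(eta2 - 0.5740) < 0.0005
assert abs(eta2_hat - 0.5207) < 0.0005
assert abs(wF - 0.5116) < 0.0005
assert abs(wR - 0.5828) < 0.0005
```

Note that Kelley's \(\hat {\eta }^{2}\) reproduces the chance-corrected \(\Re \) exactly, as the two are algebraically identical for this design.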

8.6.3 A Monte Carlo Analysis with v = 1

Consider a second analysis of the example data listed in Table 8.4 on p. 278 under the Fisher–Pitman permutation model with v = 1 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;. \end{aligned}$$

For v = 1, the average distance-function values for the g = 4 treatment groups are

$$\displaystyle \begin{aligned} \xi_{1} = 6.2857\;, \quad \xi_{2} = 7.2381\;, \quad \mbox{and} \quad \xi_{3} = \xi_{4} = 3.6190\;, \end{aligned}$$

respectively, and the observed permutation test statistic is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle \delta = \sum_{i=1}^{g}C_{i}\xi_{i}\\ &\displaystyle &\displaystyle \qquad \ \ \qquad {}= \left( \frac{7-1}{28-4} \right)(6.2857+7.2381+3.6190+3.6190) = 5.1905\;. \end{array} \end{aligned} $$

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{28!}{7!\;7!\;7!\;7!} = 472{,}518{,}347{,}558{,}400 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 28 observations listed in Table 8.4, an exact permutation analysis is impossible and a Monte Carlo permutation analysis is required. Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.4 that are equal to or less than the observed value of δ = 5.1905. There are exactly 204 δ test statistic values that are equal to or less than the observed value of δ = 5.1905. If all M arrangements of the N = 28 observations listed in Table 8.4 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 5.1905 computed on L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = n 3 = n 4 = 7 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{204}{1{,}000{,}000} = 0.2040 {\times} 10^{-3}\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.4. No comparison is made with Fisher’s F-ratio test statistic as F is undefined for ordinary Euclidean scaling.
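The Monte Carlo procedure just described can be sketched in a few lines of Python. The data below are made up (the Table 8.4 values are not reproduced here), and the code assumes, consistent with ξ i = 2s i 2 at v = 2 as noted earlier, that ξ i is the average of |x j − x k|v over all within-group pairs.

```python
import random
from itertools import combinations

def xi(group, v):
    # Average distance-function value: mean of |x_j - x_k|**v
    # over all unordered pairs within the group.
    pairs = list(combinations(group, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

def delta(groups, v):
    # delta = sum of C_i * xi_i with C_i = (n_i - 1)/(N - g).
    N = sum(len(grp) for grp in groups)
    g = len(groups)
    return sum((len(grp) - 1) / (N - g) * xi(grp, v) for grp in groups)

# Sanity check: with v = 2, xi equals twice the unbiased variance.
sample = [15, 23, 18, 11]
mean = sum(sample) / len(sample)
s2 = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
assert abs(xi(sample, 2) - 2 * s2) < 1e-9

# Monte Carlo resampling with v = 1: shuffle the pooled data,
# re-split into groups of the original sizes, and count the
# proportion of delta values <= the observed delta.
groups = [[15, 23, 18], [9, 7, 12], [30, 28, 33]]   # hypothetical data
pooled = [x for grp in groups for x in grp]
sizes = [len(grp) for grp in groups]
obs = delta(groups, 1)

rng = random.Random(42)                             # seeded for repeatability
L, count = 10_000, 0
for _ in range(L):
    rng.shuffle(pooled)
    split, start = [], 0
    for n in sizes:
        split.append(pooled[start:start + n])
        start += n
    if delta(split, 1) <= obs:
        count += 1
p_hat = count / L
assert 0.0 <= p_hat <= 1.0
```

In practice L = 1,000,000 random arrangements, as in the text, gives probability estimates accurate to roughly three decimal places.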

For the example data listed in Table 8.4, the exact expected value of test statistic δ under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{3{,}497{,}628{,}060{,}462{,}033}{472{,}518{,}347{,}558{,}400} = 7.4021 \end{aligned} $$
(8.22)

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{5.1905}{7.4021} = +0.2988\;, \end{aligned}$$

indicating approximately 30% within-group agreement above what is expected by chance. No comparisons are made with Cohen’s \(\hat {d}\), Pearson’s η 2 (r 2), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }^{2}_{\text{F}}\), or Hays’ \(\hat {\omega }^{2}_{\text{R}}\) conventional measures of effect size as \(\hat {d}\), η 2, \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for ordinary Euclidean scaling.

8.6.4 The Effects of Extreme Values

To illustrate the robustness of ordinary Euclidean scaling with v = 1 to the inclusion of extreme values, consider the example data listed in Table 8.4 on p. 278 with one alteration. The seventh (last) observation in Group 4 in Table 8.4 has been increased from x 7,4 = 15 to x 7,4 = 75, as shown in Table 8.6. Under the Neyman–Pearson population model with sample sizes n 1 = n 2 = n 3 = n 4 = 7, treatment-group means \(\bar {x}_{1} = 20.4286\), \(\bar {x}_{2} = 20.8571\), \(\bar {x}_{3} = 9.1429\), and \(\bar {x}_{4} = 22.7143\), grand mean \(\bar {\bar {x}} = 18.2857\), estimated population variances \(s_{1}^{2} = 27.9524\), \(s_{2}^{2} = 35.4762\), \(s_{3}^{2} = 8.8095\), and \(s_{4}^{2} = 540.2381\), the sum-of-squares between treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Between}} = \sum_{i=1}^{g} n_{i} \big( \bar{x}_{i}-\bar{\bar{x}} \big)^{2} = 800.8571\;, \end{aligned}$$

the sum-of-squares within treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{x}_{i} \big)^{2} = 3674.8571\;, \end{aligned}$$

the sum-of-squares total is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \mathit{SS}_{\mathrm{Between}}+\mathit{SS}_{\mathrm{Within}} = 800.8571+3674.8571 = 4475.7142\;, \end{aligned}$$

the mean-square between treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Between}} = \frac{\mathit{SS}_{\mathrm{Between}}}{g-1} = \frac{800.8571}{4-1} = 266.9524\;, \end{aligned}$$

the mean-square within treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Within}} = \frac{\mathit{SS}_{\mathrm{Within}}}{N-g} = \frac{3674.8571}{28-4} = 153.1190\;, \end{aligned}$$

and the observed value of Fisher’s F-ratio test statistic is

$$\displaystyle \begin{aligned} F = \frac{\mathit{MS}_{\mathrm{Between}}}{\mathit{MS}_{\mathrm{Within}}} = \frac{266.9524}{153.1190} = 1.7434\;. \end{aligned}$$

The essential factors, sums of squares (SS), degrees of freedom (df), mean squares (MS), and variance-ratio test statistic (F) are summarized in Table 8.7.

Table 8.6 Example data for a test of g = 4 independent samples with N = 28 observations and one extreme value, x 7,4 = 75
Table 8.7 Source table for the data listed in Table 8.6

Under the Neyman–Pearson null hypothesis, H 0: μ 1 = μ 2 = μ 3 = μ 4, Fisher’s F-ratio test statistic is asymptotically distributed as Snedecor’s F with ν 1 = g − 1 and ν 2 = N − g degrees of freedom. With ν 1 = g − 1 = 4 − 1 = 3 and ν 2 = N − g = 28 − 4 = 24 degrees of freedom, the asymptotic probability value of F = 1.7434 is P = 0.1849, under the assumptions of normality and homogeneity. The original F-ratio test statistic value with observation x 7,4 = 15 was F = 10.7779 with an asymptotic probability value of P = 0.1122×10−3, yielding a difference between the two probability values of

$$\displaystyle \begin{aligned} \Delta_{P} = 0.1849-0.1122 {\times} 10^{-3} = 0.1848\;. \end{aligned}$$

8.6.5 A Monte Carlo Analysis with v = 2

For the first analysis of the example data listed in Table 8.6 on p. 285 under the Fisher–Pitman permutation model let v = 2, employing squared Euclidean scaling, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with Fisher’s F-ratio test statistic.

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{28!}{7!\;7!\;7!\;7!} = 472{,}518{,}347{,}558{,}400 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 28 observations listed in Table 8.6, an exact permutation analysis is not possible and a Monte Carlo analysis is required.

Following Eq. (8.2) on p. 261, the N = 28 observations yield g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 55.9048\;, \quad \xi_{2} = 70.9524\;, \quad \xi_{3} = 17.6190\;, \quad \mbox{and} \quad \xi_{4} = 1080.4762\;. \end{aligned}$$

Alternatively, under an analysis of variance model, \(\xi _{1} = 2s_{1}^{2} = 2(27.9524) = 55.9048\), \(\xi _{2} = 2s_{2}^{2} = 2(35.4762) = 70.9524\), \(\xi _{3} = 2s_{3}^{2} = 2(8.8095) = 17.6190\), and \(\xi _{4} = 2s_{4}^{2} = 2(540.2381) = 1080.4762\).

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g} C_{i} \xi_{i} = \frac{7-1}{28-4} \big( 55.9048+70.9524+17.6190+1080.4762 \big) = 306.2381\;. \end{aligned} $$

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.6 that are equal to or less than the observed value of δ = 306.2381. There are exactly 128,239 δ test statistic values that are equal to or less than the observed value of δ = 306.2381. If all M arrangements of the N = 28 observations listed in Table 8.6 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 306.2381 computed on L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = n 3 = n 4 = 7 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{128{,}239}{1{,}000{,}000} = 0.1282\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.6. For comparison, the original value of test statistic δ based on v = 2 with observation x 7,4 = 15 was δ = 40.5238 with a Monte Carlo probability value of P = 0.1380×10−3, yielding a difference between the two probability values of

$$\displaystyle \begin{aligned} \Delta_{P} = 0.1282-0.1380 {\times} 10^{-3} = 0.1281\;. \end{aligned}$$
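The Monte Carlo procedure just described can be sketched in a few lines of Python. The sketch below assumes only the definitions used throughout this chapter: ξ<sub>i</sub> is the average of the vth power of the absolute pairwise differences within group i, and δ is the C<sub>i</sub>-weighted sum of the ξ<sub>i</sub> values. The data are hypothetical, not the Table 8.6 values.

```python
import random
from itertools import combinations

def delta(groups, v=2):
    # delta = sum_i C_i * xi_i with C_i = (n_i - 1)/(N - g); xi_i is the
    # average of |x_j - x_k|**v over all within-group pairs
    N = sum(len(grp) for grp in groups)
    g = len(groups)
    total = 0.0
    for grp in groups:
        n = len(grp)
        pairs = list(combinations(grp, 2))
        xi = sum(abs(a - b) ** v for a, b in pairs) / len(pairs)
        total += (n - 1) / (N - g) * xi
    return total

def monte_carlo_p(groups, v=2, L=10_000, seed=1):
    # Estimate P(delta <= delta_obs | H0) from L random arrangements
    # of the pooled data into groups of the observed sizes
    rng = random.Random(seed)
    pooled = [x for grp in groups for x in grp]
    sizes = [len(grp) for grp in groups]
    d_obs = delta(groups, v)
    hits = 0
    for _ in range(L):
        rng.shuffle(pooled)
        arrangement, start = [], 0
        for n in sizes:
            arrangement.append(pooled[start:start + n])
            start += n
        hits += delta(arrangement, v) <= d_obs + 1e-12
    return hits / L

# Hypothetical data: three groups of three observations
groups = [[1, 2, 3], [2, 4, 6], [5, 7, 9]]
d_obs = delta(groups, v=2)   # equals 2 * MS_Within = 6.0 for these data
```

For v = 2 the statistic reproduces the identity δ = 2 MS<sub>Within</sub> noted in the text; setting v = 1 gives the ordinary Euclidean scaling used in the next section.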

8.6.6 A Monte Carlo Analysis with v = 1

For the second analysis of the example data listed in Table 8.6 on p. 285 under the Fisher–Pitman permutation model let v = 1, employing ordinary Euclidean scaling, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;. \end{aligned}$$

Setting v = 1 can be expected to reduce the outsized effect of extreme value x 7,4 = 75.

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{28!}{7!\;7!\;7!\;7!} = 472{,}518{,}347{,}558{,}400 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 28 observations listed in Table 8.6, an exact permutation analysis is not possible and a Monte Carlo analysis is required.

Following Eq. (8.2) on p. 261, the N = 28 observations yield g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 6.2857\;, \quad \xi_{2} = 7.2381\;, \quad \xi_{3} = 3.6190\;, \quad \mbox{and} \quad \xi_{4} = 20.2857\;. \end{aligned}$$

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 1 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g} C_{i} \xi_{i} = \frac{7-1}{28-4} \big(6.2857+7.2381+3.6190+20.2857 \big) = 9.3571\;. \end{aligned}$$

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.6 that are equal to or less than the observed value of δ = 9.3571. There are exactly 1960 δ test statistic values that are equal to or less than the observed value of δ = 9.3571. If all M arrangements of the N = 28 observations listed in Table 8.6 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 9.3571 computed on L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = n 3 = n 4 = 7 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{1960}{1{,}000{,}000} = 0.1960 {\times} 10^{-2}\;, \end{aligned}$$

where δ o denotes the observed value of δ and L is the number of randomly-selected, equally-likely arrangements of the N = 28 observations listed in Table 8.6.

The original value of test statistic δ based on v = 1 with observation x 7,4 = 15 was δ = 5.1905 with a Monte Carlo probability value of P = 0.2040×10−3, yielding a difference between the two probability values of only

$$\displaystyle \begin{aligned} \Delta_{P} = 0.1960 {\times} 10^{-2}-0.2040 {\times} 10^{-3} = 0.1756 {\times} 10^{-2}\;. \end{aligned}$$

Multi-sample permutation tests based on ordinary Euclidean scaling with v = 1 tend to be relatively robust with respect to extreme values when compared with permutation tests based on squared Euclidean scaling with v = 2.

8.7 Example 4: Exact and Monte Carlo Analyses

For a fourth, larger example of tests for differences among g ≥ 3 independent samples, consider the example data given in Table 8.8 with g = 4 treatment groups, sample sizes of n 1 = n 2 = 3, n 3 = 4, n 4 = 5, and N = n 1 + n 2 + n 3 + n 4 = 3 + 3 + 4 + 5 = 15 total observations. Under the Neyman–Pearson population model with sample sizes n 1 = n 2 = 3, n 3 = 4, and n 4 = 5, treatment-group means \(\bar {x}_{1} = 11.00\), \(\bar {x}_{2} = 12.00\), \(\bar {x}_{3} = 13.50\), and \(\bar {x}_{4} = 19.00\), grand mean \(\bar {\bar {x}} = 14.5333\), estimated population variances \(s_{1}^{2} = s_{2}^{2} = 1.00\), \(s_{3}^{2} = 1.6667\), and \(s_{4}^{2} = 62.50\), the sum-of-squares between treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Between}} = \sum_{i=1}^{g} n_{i} \big( \bar{x}_{i}-\bar{\bar{x}} \big)^{2} = 160.7333\;, \end{aligned}$$

the sum-of-squares within treatments is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Within}} = \sum_{i=1}^{g}\,\sum_{j=1}^{n_{i}} \big( x_{ij}-\bar{x}_{i} \big)^{2} = 259.00\;, \end{aligned}$$

the sum-of-squares total is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \mathit{SS}_{\mathrm{Between}}+\mathit{SS}_{\mathrm{Within}} = 160.7333+259.00 = 419.7333\;, \end{aligned}$$

the mean-square between treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Between}} = \frac{\mathit{SS}_{\mathrm{Between}}}{g-1} = \frac{160.7333}{4-1} = 53.5778\;, \end{aligned}$$

the mean-square within treatments is

$$\displaystyle \begin{aligned} \mathit{MS}_{\mathrm{Within}} = \frac{\mathit{SS}_{\mathrm{Within}}}{N-g} = \frac{259.00}{15-4} = 23.5455\;, \end{aligned}$$

and the observed value of Fisher’s F-ratio test statistic is

$$\displaystyle \begin{aligned} F = \frac{\mathit{MS}_{\mathrm{Between}}}{\mathit{MS}_{\mathrm{Within}}} = \frac{53.5778}{23.5455} = 2.2755\;. \end{aligned}$$

The essential factors, sums of squares (SS), degrees of freedom (df), mean squares (MS), and variance-ratio test statistic (F) are summarized in Table 8.9.

Table 8.8 Example data for a test of g = 4 independent samples with N = 15 observations
Table 8.9 Source table for the data listed in Table 8.8

Under the Neyman–Pearson null hypothesis, H 0: μ 1 = μ 2 = μ 3 = μ 4, Fisher’s F-ratio test statistic is asymptotically distributed as Snedecor’s F with ν 1 = g − 1 and ν 2 = N − g degrees of freedom. With ν 1 = g − 1 = 4 − 1 = 3 and ν 2 = N − g = 15 − 4 = 11 degrees of freedom, the asymptotic probability value of F = 2.2755 is P = 0.1366, under the assumptions of normality and homogeneity.

8.7.1 A Permutation Analysis with v = 2

For the first analysis of the example data listed in Table 8.8 under the Fisher–Pitman permutation model let v = 2, employing squared Euclidean scaling, and let the treatment-group weighting functions be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with Fisher’s F-ratio test statistic.

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{15!}{3!\;3!\;4!\;5!} = 12{,}612{,}600 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 15 observations listed in Table 8.8, an exact permutation analysis is not practical and a Monte Carlo analysis is utilized.

Following Eq. (8.2) on p. 261, the N = 15 observations yield g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = \xi_{2} = 2.00\;, \quad \xi_{3} = 3.3333\;, \quad \mbox{and} \quad \xi_{4} = 125.00\;. \end{aligned}$$

Alternatively, in terms of a one-way analysis of variance model the average distance-function values are \(\xi _{1} = 2s_{1}^{2} = 2(1.00) = 2.00\), \(\xi _{2} = 2s_{2}^{2} = 2(1.00) = 2.00\), \(\xi _{3} = 2s_{3}^{2} = 2(1.6667) = 3.3333\), and \(\xi _{4} = 2s_{4}^{2} = 2(62.50) = 125.00\).

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{15-4}\big[ (3-1)(2.00)+(3-1)(2.00)+(4-1)(3.3333)+(5-1)(125.00) \big] = 47.0909\;. \end{aligned} $$

Alternatively, in terms of a one-way analysis of variance model the permutation test statistic is

$$\displaystyle \begin{aligned} \delta = 2\mathit{MS}_{\mathrm{Within}} = 2(23.5455) = 47.0909\;. \end{aligned}$$

For the example data listed in Table 8.8, the sum of the N = 15 observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i} = 10+11+12+ \cdots +17+33 = 218\;, \end{aligned}$$

the sum of the N = 15 squared observations is

$$\displaystyle \begin{aligned} \sum_{i=1}^{N}x_{i}^{2} = 10^{2}+11^{2}+12^{2}+ \cdots +17^{2}+33^{2} = 3588\;, \end{aligned}$$

and the total sum-of-squares is

$$\displaystyle \begin{aligned} \mathit{SS}_{\mathrm{Total}} = \sum_{i=1}^{N}\big( x_{i}-\bar{\bar{x}} \big)^{2} = \sum_{i=1}^{N}x_{i}^{2}-\left( \sum_{i=1}^{N} x_{i} \right)^{2} \left/ \rule{0pt}{14pt} N \right. = 3588-(218)^{2}/15 = 419.7333\;, \end{aligned} $$

where \(\bar {\bar {x}}\) denotes the grand mean of all N = 15 observations. Then following the expressions given in Eq. (8.5) on p. 262 for test statistics δ and F, the observed value for test statistic δ with respect to the observed value of test statistic F is

$$\displaystyle \begin{aligned} \delta = \frac{2 \mathit{SS}_{\mathrm{Total}}}{N-g+(g-1)F} {}= \frac{2 (419.7333)}{15-4+(4-1)(2.2755)} = 47.0909 \end{aligned}$$

and the observed value for test statistic F with respect to the observed value of test statistic δ is

$$\displaystyle \begin{aligned} F = \frac{2 \mathit{SS}_{\mathrm{Total}}}{(g-1)\delta}-\frac{N-g}{g-1} = \frac{2 (419.7333)}{(4-1)(47.0909)}-\frac{15-4}{4-1} = 2.2755\;. \end{aligned}$$
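The two conversions above can be verified numerically from the summary quantities alone; a brief sketch (the F value is the rounded figure quoted in the text):

```python
# Summary quantities quoted in the text for the Table 8.8 data
sum_x, sum_sq = 218, 3588    # sum of the observations and of their squares
N, g = 15, 4
F = 2.2755                   # observed F-ratio

ss_total = sum_sq - sum_x ** 2 / N                   # total sum-of-squares
delta_from_F = 2 * ss_total / (N - g + (g - 1) * F)  # Eq. (8.5), approx. 47.0909
F_from_delta = (2 * ss_total / ((g - 1) * delta_from_F)
                - (N - g) / (g - 1))                 # inverts back to F exactly
```

Because the second expression is the algebraic inverse of the first, the round trip recovers F up to floating-point error.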

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 15 observations listed in Table 8.8 that are equal to or less than the observed value of δ = 47.0909. There are exactly 53,242 δ test statistic values that are equal to or less than the observed value of δ = 47.0909. If all M arrangements of the N = 15 observations listed in Table 8.8 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 47.0909 computed on L = 1, 000, 000 randomly-selected arrangements of the observed data with n 1 = n 2 = 3, n 3 = 4, and n 4 = 5 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{53{,}242}{1{,}000{,}000} = 0.0532\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 15 observations listed in Table 8.8.

Alternatively, in terms of a one-way analysis of variance model, there are 53,242 F values that are equal to or greater than the observed value of F = 2.2755. Thus, if all arrangements of the observed data occur with equal chance, the Monte Carlo probability value of F = 2.2755 under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} P \big( F \geq F_{\text{o}}|H_{0} \big) = \frac{\text{number of }F\text{ values } \geq F_{\text{o}}}{L} = \frac{53{,}242}{1{,}000{,}000} = 0.0532\;, \end{aligned}$$

where F o denotes the observed value of test statistic F.

Following Eq. (8.7) on p. 263, the exact expected value of the M = 12, 612, 600 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{756{,}275{,}456}{12{,}612{,}600} = 59.9619\;. \end{aligned}$$

In terms of a one-way analysis of variance model the exact expected value of test statistic δ is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{2\mathit{SS}_{\mathrm{Total}}}{N-1} = \frac{2(419.7333)}{15-1} = 59.9619\;. \end{aligned}$$

Following Eq. (8.6) on p. 263, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{47.0909}{59.9619} = +0.2147\;, \end{aligned}$$

indicating approximately 21% within-group agreement above what is expected by chance. Alternatively, in terms of a one-way analysis of variance model, the observed measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{(N-1)(\mathit{MS}_{\mathrm{Within}})}{\mathit{SS}_{\mathrm{Total}}} = 1-\frac{(15-1)(23.5455)}{419.7333} = +0.2147\;. \end{aligned}$$

8.7.2 Measures of Effect Size

For the example data listed in Table 8.8 on p. 289, the average treatment-group size is

$$\displaystyle \begin{aligned} \bar{n} = \frac{1}{g}\sum_{i=1}^{g}n_{i} = \frac{3+3+4+5}{4} = 3.75\;, \end{aligned}$$

Cohen’s \(\hat {d}\) measure of effect size is

$$\displaystyle \begin{aligned} \hat{d} = \left[ \frac{1}{g-1} \left( \frac{\mathit{SS}_{\mathrm{Between}}}{\bar{n}\,\mathit{MS}_{\mathrm{Within}}} \right) \right]^{1/2} = \left[ \frac{1}{4-1} \left( \frac{160.7333}{(3.75)(23.5455)} \right) \right]^{1/2} = \pm 0.7790\;, \end{aligned} $$

Pearson’s η 2 (r 2) measure of effect size is

$$\displaystyle \begin{aligned} \eta^{2} = \frac{\mathit{SS}_{\mathrm{Between}}}{\mathit{SS}_{\mathrm{Total}}} = \frac{160.7333}{419.7333} = 0.3829\;, \end{aligned}$$

Kelley’s \(\hat {\eta }^{2}\) measure of effect size is

$$\displaystyle \begin{aligned} \hat{\eta}^{2} = \frac{\mathit{SS}_{\mathrm{Total}}-(N-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}} = \frac{419.7333-(15-1)(23.5455)}{419.7333} = 0.2147\;, \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{F}}^{2}\) measure of effect size for a fixed-effects model is

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{F}}^{2} = \frac{\mathit{SS}_{\mathrm{Between}}-(g-1)\mathit{MS}_{\mathrm{Within}}}{\mathit{SS}_{\mathrm{Total}}+\mathit{MS}_{\mathrm{Within}}} = \frac{160.7333-(4-1)(23.5455)}{419.7333+23.5455} = 0.2033\;, \end{aligned} $$

Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measure of effect size for a random-effects model is

$$\displaystyle \begin{aligned} \hat{\omega}_{\text{R}}^{2} = \frac{\mathit{MS}_{\mathrm{Between}}-\mathit{MS}_{\mathrm{Within}}}{\mathit{MS}_{\mathrm{Between}}+(\bar{n}-1)\mathit{MS}_{\mathrm{Within}}} = \frac{53.5778-23.5455}{53.5778+(3.75-1)(23.5455)} = 0.2538\;, \end{aligned} $$

and Mielke and Berry’s \(\Re \) chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{47.0909}{59.9619} = +0.2147\;, \end{aligned}$$

indicating approximately 21% within-group agreement above what is expected by chance.
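Several of the effect sizes above follow directly from the source-table quantities; a brief sketch (Cohen's d-hat and Hays' random-effects measure are omitted here, and ℜ is computed through the identities δ = 2 MS<sub>Within</sub> and μ<sub>δ</sub> = 2 SS<sub>Total</sub>/(N − 1) given in the text):

```python
# Source-table quantities for the Table 8.8 data
ss_between, ss_within = 160.7333, 259.00
N, g = 15, 4
ss_total = ss_between + ss_within
ms_within = ss_within / (N - g)

eta_sq = ss_between / ss_total                          # Pearson's eta-squared
kelley = (ss_total - (N - 1) * ms_within) / ss_total    # Kelley's estimator
omega_sq_f = (ss_between - (g - 1) * ms_within) / (ss_total + ms_within)

# Mielke and Berry's chance-corrected measure:
R = 1 - (2 * ms_within) / (2 * ss_total / (N - 1))
```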

8.7.3 An Exact Analysis with v = 2

While an exact permutation analysis with M = 12, 612, 600 possible arrangements of the observed data may be impractical, it is not impossible. An exact analysis of the N = 15 observations listed in Table 8.8 on p. 289 under the Fisher–Pitman permutation model yields g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = \xi_{2} = 2.00\;, \quad \xi_{3} = 3.3333\;, \quad \mbox{and} \quad \xi_{4} = 125.00\;. \end{aligned}$$

The observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{15-4}\big[ (3-1)(2.00)+(3-1)(2.00)+(4-1)(3.3333)+(5-1)(125.00) \big] = 47.0909\;. \end{aligned} $$

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 15 observations listed in Table 8.8 that are equal to or less than the observed value of δ = 47.0909. There are exactly 673,490 δ test statistic values that are equal to or less than the observed value of δ = 47.0909. If all M arrangements of the N = 15 observations listed in Table 8.8 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 47.0909 computed on the M = 12, 612, 600 possible arrangements of the observed data with n 1 = n 2 = 3, n 3 = 4, and n 4 = 5 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{673{,}490}{12{,}612{,}600} = 0.0534\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 15 observations listed in Table 8.8.

Carrying the Monte Carlo probability value based on L = 1, 000, 000 random arrangements and the exact probability value based on M = 12, 612, 600 possible arrangements to a few extra decimal places allows for a more direct comparison of the Monte Carlo and exact permutation approaches. The Monte Carlo approximate probability value and the corresponding exact probability value to six decimal places are

$$\displaystyle \begin{aligned} P = 0.053242 \quad \mbox{and} \quad P = 0.053398\;, \end{aligned}$$

respectively. The difference between the two probability values is only

$$\displaystyle \begin{aligned} \Delta_{P} = 0.053398-0.053242 = 0.000156\;, \end{aligned}$$

demonstrating the efficiency and accuracy of a Monte Carlo approach for permutation methods when L is large and the exact probability value is not too small. In general, L = 1, 000, 000 random arrangements of the observed data is sufficient to ensure three decimal places of accuracy [11].

8.7.4 A Monte Carlo Analysis with v = 1

Consider a second analysis of the example data listed in Table 8.8 on p. 289 under the Fisher–Pitman permutation model with v = 1 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;. \end{aligned}$$

For v = 1, employing ordinary Euclidean scaling between the observations, thereby reducing the effects of any extreme values, the average distance-function values for the g = 4 treatment groups are

$$\displaystyle \begin{aligned} \xi_{1} = \xi_{2} = 1.3333\;, \quad \xi_{3} = 1.6667\;, \quad \mbox{and} \quad \xi_{4} = 8.00\;, \end{aligned}$$

respectively, and the observed permutation test statistic is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{15-4} \big[ (3-1)(1.3333)+(3-1)(1.3333)+(4-1)(1.6667)+(5-1)(8.00) \big] = 3.8485\;. \end{aligned} $$

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{15!}{3!\;3!\;4!\;5!} = 12{,}612{,}600 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 15 observations listed in Table 8.8, a Monte Carlo permutation analysis is recommended.

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 15 observations listed in Table 8.8 that are equal to or less than the observed value of δ = 3.8485. There are exactly 18,000 δ test statistic values that are equal to or less than the observed value of δ = 3.8485. If all M arrangements of the N = 15 observations listed in Table 8.8 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 3.8485 computed on L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = 3, n 3 = 4, and n 4 = 5 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{18{,}000}{1{,}000{,}000} = 0.0180\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 15 observations listed in Table 8.8.

For comparison, the approximate Monte Carlo probability value based on v = 2, L = 1, 000, 000, and

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,g\;, \end{aligned}$$

is P = 0.0532. The difference between the two probability values, P = 0.0180 and P = 0.0532, is due to the single extreme value of x 5,4 = 33 in the fourth treatment group. No comparison is made with Fisher’s F-ratio test statistic as F is undefined for ordinary Euclidean scaling.

For the example data listed in Table 8.8 on p. 289, the exact expected value of the M = 12, 612, 600 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{59{,}579{,}400}{12{,}612{,}600} = 4.7238 \end{aligned} $$
(8.23)

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{3.8485}{4.7238} = +0.1853\;, \end{aligned}$$

indicating approximately 19% within-group agreement above what is expected by chance. No comparisons are made with Cohen’s \(\hat {d}\), Pearson’s η 2 (r 2), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), or Hays’ \(\hat {\omega }_{\text{R}}^{2}\) conventional measures of effect size as \(\hat {d}\), η 2, \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for ordinary Euclidean scaling.

8.7.5 An Exact Analysis with v = 1

An exact permutation analysis of the observations listed in Table 8.8 with v = 1 yields g = 4 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = \xi_{2} = 1.3333\;, \quad \xi_{3} = 1.6667\;, \quad \mbox{and} \quad \xi_{4} = 8.00\;. \end{aligned}$$

The observed value of the permutation test statistic based on v = 1 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,4\;, \end{aligned} $$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{15-4} \big[ (3-1)(1.3333)+(3-1)(1.3333)+(4-1)(1.6667)+(5-1)(8.00) \big] = 3.8485\;. \end{aligned} $$

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 15 observations listed in Table 8.8 that are equal to or less than the observed value of δ = 3.8485. There are exactly 225,720 δ test statistic values that are equal to or less than the observed value of δ = 3.8485. If all M arrangements of the N = 15 observations listed in Table 8.8 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 3.8485 computed on the M = 12, 612, 600 possible arrangements of the observed data with n 1 = n 2 = 3, n 3 = 4, and n 4 = 5 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{225{,}720}{12{,}612{,}600} = 0.0179\;, \end{aligned} $$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 15 observations listed in Table 8.8.

The exact expected value of the M = 12, 612, 600 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{59{,}579{,}400}{12{,}612{,}600} = 4.7238 \end{aligned} $$

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{3.8485}{4.7238} = +0.1853\;, \end{aligned}$$

indicating approximately 19% within-group agreement above what is expected by chance. No comparisons are made with Cohen’s \(\hat {d}\), Pearson’s η 2 (r 2), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), or Hays’ \(\hat {\omega }_{\text{R}}^{2}\) conventional measures of effect size as \(\hat {d}\), η 2, \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for ordinary Euclidean scaling.

Finally, note the effect of a single extreme value (x 5,4 = 33) in Treatment 4 in the analysis based on ordinary Euclidean scaling with v = 1, compared with the analysis based on squared Euclidean scaling with v = 2. In the analysis based on v = 2, the fourth average distance-function value was ξ 4 = 125.00, but in the analysis based on v = 1, ξ 4 was reduced to only ξ 4 = 8.00. Also, in the analysis based on v = 2 the exact probability value was P = 0.0534, but in the analysis based on v = 1 the exact probability value was only P = 0.0179, a reduction of approximately 66%. For comparison, the asymptotic probability value of F = 2.2755 with ν 1 = g − 1 = 4 − 1 = 3 and ν 2 = N − g = 15 − 4 = 11 degrees of freedom was P = 0.1366.

8.8 Example 5: Rank-Score Permutation Analyses

In many research applications it becomes necessary to analyze rank-score data, typically because the required parametric assumptions of normality and homogeneity cannot be met. Consequently, the raw scores are often converted to rank scores and analyzed under a less-restrictive model. While it is never necessary to convert raw scores to rank scores under the Fisher–Pitman permutation model, sometimes the observed data are simply collected as rank scores. Thus, this fifth example serves merely to demonstrate the relationship between a g-sample test of rank-score observations under the population model and the same test under the permutation model. The conventional approach to univariate rank-score data for multiple independent samples under the Neyman–Pearson population model is the Kruskal–Wallis g-sample rank-sum test. As Kruskal and Wallis explained, the rank-sum test stemmed from two statistical methods: rank transformations of the original raw scores and permutations of the rank-order statistics [12].

8.8.1 The Kruskal–Wallis Rank-Sum Test

Consider g random samples of possibly different sizes and denote the size of the ith sample by n i for i = 1, …, g. Let

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{g}n_{i} \end{aligned}$$

denote the total number of observations, assign rank 1 to the smallest of the N observations, rank 2 to the next smallest observation, continuing to the largest observation that is assigned rank N, and let R i denote the sum of the rank scores in the ith sample, i = 1, …, g. If there are no tied rank scores, the Kruskal–Wallis g-sample rank-sum test statistic is given by

$$\displaystyle \begin{aligned} H = \frac{12}{N(N+1)} \sum_{i=1}^{g}\frac{R_{i}^{2}}{n_{i}}-3(N+1)\;. \end{aligned} $$
(8.24)

When g = 2, H is equivalent to the Wilcoxon [25], Festinger [5], Mann–Whitney [15], Haldane–Smith [7], and van der Reyden [24] two-sample rank-sum tests.

For an example analysis of g-sample rank-score data, consider the rank scores listed in Table 8.10 with g = 3 samples, n 1 = n 2 = n 3 = 6, N = 18, and no tied rank scores.

Table 8.10 Ranking of g = 3 with n 1 = n 2 = n 3 = 6 and N = 18

The conventional Kruskal–Wallis g-sample rank-sum test on the N = 18 rank scores listed in Table 8.10 yields an observed test statistic of

$$\displaystyle \begin{aligned} H = \frac{12}{N(N+1)} \sum_{i=1}^{g}\frac{R_{i}^{2}}{n_{i}}-3(N+1) = \frac{12}{18(18+1)}\left[ \frac{(63)^{2}}{6}+\frac{(30)^{2}}{6}+\frac{(78)^{2}}{6} \right] -3(18+1) = 7.0526\;, \end{aligned} $$

where test statistic H is asymptotically distributed as Pearson’s chi-squared under the Neyman–Pearson null hypothesis with g − 1 degrees of freedom as N → ∞. Under the Neyman–Pearson null hypothesis with g − 1 = 3 − 1 = 2 degrees of freedom, the observed value of H = 7.0526 yields an asymptotic probability value of P = 0.0294.
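The computation of H from the rank sums can be sketched directly:

```python
# Kruskal-Wallis H for the Table 8.10 rank sums quoted above
R = [63, 30, 78]   # rank sums for the g = 3 samples
n = [6, 6, 6]      # sample sizes
N = sum(n)         # 18 total rank scores

H = 12 / (N * (N + 1)) * sum(r * r / m for r, m in zip(R, n)) - 3 * (N + 1)
```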

8.8.2 A Monte Carlo Analysis with v = 2

For the first analysis of the rank-score data listed in Table 8.10 under the Fisher–Pitman permutation model let v = 2, employing squared Euclidean scaling between the pairs of rank scores, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with the Kruskal–Wallis g-sample rank-sum test. The average distance-function values for the g = 3 samples are

$$\displaystyle \begin{aligned} \xi_{1} = 53.40\;, \quad \xi_{2} = 29.60\;, \quad \mbox{and} \quad \xi_{3} = 30.40\;, \end{aligned}$$

and the observed value of the permutation test statistic based on v = 2 is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{6-1}{18-3}\big( 53.40+29.60+30.40 \big) = 37.80\;. \end{aligned}$$

Because there are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{18!}{6!\;6!\;6!} = 17{,}153{,}136 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 18 rank scores listed in Table 8.10, an exact permutation analysis is not practical and a Monte Carlo permutation analysis is utilized.
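The number of equally-likely arrangements is a multinomial coefficient and can be computed exactly with integer arithmetic; the helper name below is illustrative:

```python
from math import factorial

def n_arrangements(sizes):
    # Multinomial coefficient M = N!/(n_1! n_2! ... n_g!), where
    # sizes lists the g sample sizes and N is their sum
    M = factorial(sum(sizes))
    for n in sizes:
        M //= factorial(n)
    return M
```

For example, `n_arrangements([6, 6, 6])` returns 17,153,136, and `n_arrangements([5, 4, 3])` returns the 27,720 arrangements used in Sect. 8.9.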

Under the Fisher–Pitman permutation model, the Monte Carlo probability of an observed δ is the proportion of δ test statistic values computed on the randomly-selected, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10 that are equal to or less than the observed value of δ = 37.80. Of the L = 1, 000, 000 random arrangements of the observed data with n 1 = n 2 = n 3 = 6 preserved for each arrangement, exactly 21,810 yield δ test statistic values equal to or less than the observed value of δ = 37.80. If all M arrangements of the N = 18 observations listed in Table 8.10 occur with equal chance under the Fisher–Pitman null hypothesis, the Monte Carlo probability value of δ = 37.80 is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{L} = \frac{21{,}810}{1{,}000{,}000} = 0.0218\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and L is the number of randomly-selected, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10. It should be noted that whereas the Kruskal–Wallis test statistic, H, as defined in Eq. (8.24) does not allow for tied rank scores, test statistic δ automatically accommodates tied rank scores.
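The Monte Carlo procedure described above can be sketched in Python. Because the individual rank scores of Table 8.10 are not reproduced in this excerpt, the sketch is written generically and illustrated on a small hypothetical two-group data set; the function names are illustrative only:

```python
import random
from itertools import combinations

def xi(group, v):
    # Average distance-function value: mean of |x_j - x_k|**v over
    # all pairs of scores within one treatment group
    pairs = list(combinations(group, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

def delta(groups, v):
    # delta = sum of C_i * xi_i with weights C_i = (n_i - 1)/(N - g)
    N = sum(len(grp) for grp in groups)
    g = len(groups)
    return sum((len(grp) - 1) / (N - g) * xi(grp, v) for grp in groups)

def monte_carlo_p(groups, v, L, seed=0):
    # Proportion of L random arrangements yielding delta <= observed delta
    rng = random.Random(seed)
    pooled = [x for grp in groups for x in grp]
    sizes = [len(grp) for grp in groups]
    d_obs = delta(groups, v)
    count = 0
    for _ in range(L):
        rng.shuffle(pooled)
        rearranged, start = [], 0
        for n in sizes:
            rearranged.append(pooled[start:start + n])
            start += n
        if delta(rearranged, v) <= d_obs:
            count += 1
    return count / L
```

For the hypothetical ranks {1, 2, 3} and {4, 5, 6}, δ = 2.0 under squared Euclidean scaling (v = 2), and the Monte Carlo probability approaches the exact value of 2∕20 = 0.10 as L grows.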

The functional relationships between test statistics δ and H are given by

$$\displaystyle \begin{aligned} \delta = \frac{2\left( T-\left\{ \displaystyle\frac{S}{6} \Big[ H+3(N+1) \Big] \right\} \right)}{N-g} \end{aligned} $$
(8.25)

and

$$\displaystyle \begin{aligned} H = \frac{6}{S} \left[ T-\frac{\delta}{2}(N-g) \right] -3(N+1)\;, \end{aligned} $$
(8.26)

where, if no rank scores are tied, S and T may simply be expressed as

$$\displaystyle \begin{aligned} S = \sum_{i=1}^{N}i = \frac{N(N+1)}{2} \quad \mbox{and} \quad T = \sum_{i=1}^{N}i^{2} = \frac{N(N+1)(2N+1)}{6}\;. \end{aligned}$$

Note that in Eqs. (8.25) and (8.26), S, T, N, and g are invariant under permutation, along with the constants 2, 3, and 6.

The relationships between test statistics δ and H can be confirmed with the rank-score data listed in Table 8.10. For the rank scores listed in Table 8.10 with no tied values, the observed value of S is

$$\displaystyle \begin{aligned} S = \sum_{i=1}^{N}i = \frac{N(N+1)}{2} = \frac{18(18+1)}{2} = 171\;, \end{aligned}$$

and the observed value of T is

$$\displaystyle \begin{aligned} T = \sum_{i=1}^{N}i^{2} = \frac{N(N+1)(2N+1)}{6} = \frac{18(18+1)[(2)(18)+1]}{6} = 2109\;. \end{aligned}$$

Then following Eq. (8.25), the observed value of the permutation test statistic for the N = 18 rank scores listed in Table 8.10 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \delta = \frac{2\left( T-\left\{ \displaystyle\frac{S}{6} \Big[ H+3(N+1) \Big] \right\} \right)}{N-g} = \frac{N(N+1)(N-1-H)}{6(N-g)}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad {}= \frac{18(18+1)(18-1-7.0526)}{6(18-3)} = 37.80 \end{array} \end{aligned} $$

and, following Eq. (8.26), the observed value of the Kruskal–Wallis test statistic is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} H = \frac{6}{S} \left[ T-\frac{\delta}{2}(N-g) \right] &\displaystyle -&\displaystyle 3(N+1) = N-1-\frac{6\delta(N-g)}{N(N+1)}\\ &\displaystyle &\displaystyle \qquad \ \ \ {}= 18-1-\frac{6(37.80)(18-3)}{18(18+1)} = 7.0526\;. \end{array} \end{aligned} $$
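The round trip through Eqs. (8.25) and (8.26) can be confirmed numerically with a short sketch using the invariant quantities S and T for untied rank scores:

```python
N, g, H = 18, 3, 7.0526

# Invariant quantities for untied rank scores 1..N
S = N * (N + 1) // 2                  # sum of the ranks, 171
T = N * (N + 1) * (2 * N + 1) // 6    # sum of the squared ranks, 2109

# Eq. (8.25): delta from H, then Eq. (8.26): H recovered from delta
delta_o = 2 * (T - (S / 6.0) * (H + 3 * (N + 1))) / (N - g)
H_back = (6.0 / S) * (T - (delta_o / 2.0) * (N - g)) - 3 * (N + 1)
```

The sketch reproduces δ = 37.80 and recovers H = 7.0526 to within floating-point error.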

Because of the relationship between test statistics δ and H, the Monte Carlo probability value of the realized value of H = 7.0526 is identical to the Monte Carlo probability value of δ = 37.80 under the Fisher–Pitman null hypothesis. Thus,

$$\displaystyle \begin{aligned} P \big( H \geq H_{\text{o}}|H_{0} \big) = \frac{\text{number of }H\text{ values } \geq H_{\text{o}}}{L} = \frac{21{,}810}{1{,}000{,}000} = 0.0218\;, \end{aligned}$$

where H o denotes the observed value of test statistic H.

The exact expected value of the M = 17, 153, 136 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M} \sum_{i=1}^{M}\delta_{i} = \frac{977{,}728{,}752}{17{,}153{,}136} = 57.00 \end{aligned}$$

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{37.80}{57.00} = +0.3368\;, \end{aligned}$$

indicating approximately 34% within-group agreement above what is expected by chance. No comparisons are made with Cohen’s \(\hat {d}\), Pearson’s η 2 (r 2), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), or Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measures of effect size as \(\hat {d}\), η 2, \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for rank-score data.
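The chance-corrected effect size is a one-line computation from the observed and exact expected values of δ given above:

```python
delta_o, mu_delta = 37.80, 57.00

# Chance-corrected within-group agreement: R = 1 - delta/mu_delta;
# R = 0 indicates chance-level agreement, R = 1 perfect agreement
R_effect = 1.0 - delta_o / mu_delta
```

The result, ℜ = +0.3368, matches the value reported above.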

8.8.3 An Exact Analysis with v = 2

Although an exact permutation analysis with M = 17, 153, 136 possible arrangements of the observed data may be impractical, it is not impossible. An exact permutation analysis of the N = 18 observations listed in Table 8.10 yields g = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 53.40\;, \quad \xi_{2} = 29.60\;, \quad \mbox{and} \quad \xi_{3} = 30.40\;, \end{aligned}$$

and the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,2,3\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{6-1}{18-3}\big( 53.40+29.60+30.40 \big) = 37.80\;. \end{aligned}$$

There are

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{18!}{6!\;6!\;6!} = 17{,}153{,}136 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 18 rank scores listed in Table 8.10, making an exact permutation analysis feasible. Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10 that are equal to or less than the observed value of δ = 37.80. There are exactly 376,704 δ test statistic values that are equal to or less than the observed value of δ = 37.80. If all M arrangements of the N = 18 rank scores listed in Table 8.10 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 37.80 computed on the M = 17, 153, 136 possible arrangements of the observed data with n 1 = n 2 = n 3 = 6 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{376{,}704}{17{,}153{,}136} = 0.0220\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10. For comparison, the Monte Carlo probability value based on v = 2, L = 1, 000, 000 random arrangements of the observed data, and treatment-group weights given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,2,3\;, \end{aligned}$$

is P = 0.0218 for a difference between the two probability values of only

$$\displaystyle \begin{aligned} \Delta_{P} = 0.0220-0.0218 = 0.0002\;. \end{aligned}$$

8.8.4 An Exact Analysis with v = 1

For a second analysis of the rank-score data listed in Table 8.10, let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

as in the previous example but set v = 1, employing ordinary Euclidean scaling between the pairs of rank scores. The N = 18 rank scores listed in Table 8.10 yield g = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 6.3333\;, \quad \xi_{2} = 4.6667\;, \quad \mbox{and} \quad \xi_{3} = 4.5333\;, \end{aligned}$$

and the observed value of the permutation test statistic based on v = 1 is

$$\displaystyle \begin{aligned} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{6-1}{18-3}\big( 6.3333+4.6667+4.5333 \big) = 5.1778\;. \end{aligned}$$

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10 that are equal to or less than the observed value of δ = 5.1778. There are exactly 547,662 δ test statistic values that are equal to or less than the observed value of δ = 5.1778. If all M arrangements of the N = 18 rank scores listed in Table 8.10 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 5.1778 computed on the M = 17, 153, 136 possible arrangements of the observed data with n 1 = n 2 = n 3 = 6 preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{547{,}662}{17{,}153{,}136} = 0.0319\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 18 rank scores listed in Table 8.10. For comparison, the exact probability value based on v = 2, M = 17, 153, 136, and

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \quad i = 1,2,3\;, \end{aligned}$$

is P = 0.0220. No comparison is made with the conventional Kruskal–Wallis g-sample rank-sum test as H is undefined for ordinary Euclidean scaling.

The exact expected value of the M = 17, 153, 136 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{108{,}636{,}528}{17{,}153{,}136} = 6.3333\;, \end{aligned}$$

and the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{5.1778}{6.3333} = +0.1825\;, \end{aligned}$$

indicating approximately 18% within-group agreement above what is expected by chance. No comparisons are made with Cohen’s \(\hat {d}\), Pearson’s r 2 (η 2), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), or Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measures of effect size as \(\hat {d}\), r 2, \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for rank-score data.

8.9 Example 6: Multivariate Permutation Analyses

It is sometimes desirable to test for differences among g ≥ 3 independent treatment groups where r ≥ 2 measurement scores have been obtained from each object. The conventional approach is a one-way multivariate analysis of variance (MANOVA) for which a number of statistical tests have been proposed, including the Bartlett–Nanda–Pillai (BNP) trace test [1, 16, 19], Wilks’ likelihood-ratio test [26], Roy’s maximum-root test [20, 21], and the Lawley–Hotelling trace test [9, 13, 14]. The Bartlett–Nanda–Pillai trace test is considered to be the most powerful and robust of the four tests [17, 18, 23, p. 269].

8.9.1 The Bartlett–Nanda–Pillai Trace Test

To illustrate a conventional multivariate analysis of variance, consider the BNP trace test given by

$$\displaystyle \begin{aligned} V^{(s)} = \operatorname{tr} \Big[ \mathbf{B}(\mathbf{W}+\mathbf{B})^{-1} \Big]\;, \end{aligned}$$

where W denotes the Within matrix summarizing within-group variability, B denotes the hypothesized Between matrix summarizing between-group variability, and \(s = \min (r,\;g-1)\). For a conventional test of significance, the BNP trace statistic, V (s), can be transformed into a conventional F test statistic by

$$\displaystyle \begin{aligned} F = \frac{2u+s+1}{2t+s+1}\left( \frac{V^{(s)}}{s-V^{(s)}} \right)\;, \end{aligned} $$
(8.27)

where \(s = \min (r, \,g-1)\), u = 0.50(N − g − r − 1), t = 0.50(|r − q|− 1), and q = g − 1. Assuming independence, normality, and homogeneity of variance and covariance, test statistic F is asymptotically distributed as Snedecor’s F under the Neyman–Pearson null hypothesis with ν 1 = s(2t + s + 1) and ν 2 = s(2u + s + 1) degrees of freedom.
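Equation (8.27) and its degrees of freedom can be packaged as a small Python helper; this is an illustrative sketch and the function name is arbitrary:

```python
def bnp_to_f(V, r, g, N):
    # Transform the BNP trace statistic V^(s) into an F statistic, Eq. (8.27)
    q = g - 1
    s = min(r, q)
    u = 0.50 * (N - g - r - 1)
    t = 0.50 * (abs(r - q) - 1)
    F = (2 * u + s + 1) / (2 * t + s + 1) * (V / (s - V))
    df1 = s * (2 * t + s + 1)   # numerator degrees of freedom
    df2 = s * (2 * u + s + 1)   # denominator degrees of freedom
    return F, df1, df2
```

With the values of the example that follows, `bnp_to_f(0.7818, 2, 3, 12)` yields F ≈ 2.888 with ν 1 = 4 and ν 2 = 18 degrees of freedom.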

To illustrate the BNP trace test, consider the multivariate observations listed in Table 8.11, with r = 2 measurements, g = 3 treatment groups, sample sizes n 1 = 5, n 2 = 4, and n 3 = 3, and N = 12 multivariate observations.

Table 8.11 Example multivariate response measurement scores with r = 2 measurement scores, g = 3 treatment groups, n 1 = 5, n 2 = 4, n 3 = 3, and N = 12 observations

A conventional BNP analysis of the multivariate observations listed in Table 8.11 yields

$$\displaystyle \begin{aligned} \mathbf{W} = \left[ \begin{array}{rcr} 11.71000 && 1.17000 \\ {} 1.17000 && 10.42667 \end{array} \right]\;, \quad \mathbf{B} = \left[ \begin{array}{rcr} 2.75250 && 3.19755 \\ {} 3.19755 && 17.30242 \end{array} \right]\;, \end{aligned}$$
$$\displaystyle \begin{aligned} \mathbf{W+B} = \left[ \begin{array}{rcr} 14.46250 && 4.36755 \\ {} 4.36755 && 27.72909 \end{array} \right]\;, \end{aligned}$$
$$\displaystyle \begin{aligned} (\mathbf{W+B})^{-1} = \left[ \begin{array}{rcr} 0.07260 && -0.01143 \\ {} -0.01143 && 0.03786 \end{array} \right]\;, \end{aligned}$$
$$\displaystyle \begin{aligned} \mathbf{B}(\mathbf{W+B})^{-1} = \left[ \begin{array}{rcr} 0.16328 && 0.08960 \\ {} 0.03476 && 0.61852 \end{array} \right]\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} V^{(2)} = \operatorname{tr} \Big[ \mathbf{B}(\mathbf{W+B})^{-1} \Big] = 0.16328+0.61852 = 0.7818\;. \end{aligned}$$

Then, q = g − 1 = 3 − 1 = 2, \(s = \min (r,\;q) = \min (2,\;3-1) = 2\), u = 0.50(N − g − r − 1) = 0.50(12 − 3 − 2 − 1) = 3, t = 0.50(|r − q|− 1) = 0.50(|2 − 2|− 1) = −0.50, and following Eq. (8.27) on p. 306, the observed value of Fisher’s F-ratio test statistic is

$$\displaystyle \begin{aligned} F = \frac{2(3)+2+1}{2(-0.50)+2+1} \left( \frac{0.7818}{2-0.7818} \right) = \frac{9}{2}(0.6418) = 2.8879\;. \end{aligned}$$

Assuming independence, normality, homogeneity of variance, and homogeneity of covariance, test statistic F is asymptotically distributed as Snedecor’s F with ν 1 = s(2t + s + 1) = 2[(2)(−0.50) + 2 + 1] = 4 and ν 2 = s(2u + s + 1) = 2[(2)(3) + 2 + 1] = 18 degrees of freedom. Under the Neyman–Pearson null hypothesis, the observed value of F = 2.8879 with ν 1 = 4 and ν 2 = 18 degrees of freedom yields an asymptotic probability value of P = 0.0521.
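The trace statistic can be verified from the W and B matrices given above with a few lines of Python; only 2 × 2 matrix arithmetic is involved, so the inverse is written out in closed form rather than taken from a linear-algebra library:

```python
# Within and Between matrices from the worked example
W = [[11.71000, 1.17000], [1.17000, 10.42667]]
B = [[2.75250, 3.19755], [3.19755, 17.30242]]

# W + B, elementwise
WB = [[W[i][j] + B[i][j] for j in range(2)] for i in range(2)]

# Closed-form inverse of the 2 x 2 matrix W + B
det = WB[0][0] * WB[1][1] - WB[0][1] * WB[1][0]
inv = [[ WB[1][1] / det, -WB[0][1] / det],
       [-WB[1][0] / det,  WB[0][0] / det]]

# Matrix product B(W + B)^{-1} and its trace, V^(2)
P = [[sum(B[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
V = P[0][0] + P[1][1]
```

The sketch reproduces V (2) = 0.7818 to four decimal places.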

8.9.2 An Exact Analysis with v = 2

For the first analysis of the observed data listed in Table 8.11 under the Fisher–Pitman permutation model let v = 2, employing squared Euclidean scaling between the pairs of multivariate observations, and let the treatment-group weights be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,\,\ldots,\,g\;, \end{aligned}$$

for correspondence with the BNP trace test. An exact permutation analysis is feasible for the multivariate observations listed in Table 8.11 as there are only

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{12!}{5!\;4!\;3!} = 27{,}720 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 12 multivariate scores listed in Table 8.11.

Following Eq. (8.2) on p. 261, the multivariate observations listed in Table 8.11 yield g = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 0.3242\;, \quad \xi_{2} = 0.2994\;, \quad \mbox{and} \quad \xi_{3} = 0.1207\;. \end{aligned}$$

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 2 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,2,3\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{12-3} \big[ (5-1)(0.3242)&\displaystyle +&\displaystyle (4-1)(0.2994)\\ &\displaystyle &\displaystyle \quad {}+(3-1)(0.1207) \big] = 0.2707\;. \end{array} \end{aligned} $$
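The weighted combination above can be checked with a short computation; the average distance-function values and group sizes are those of Table 8.11:

```python
# Average distance-function values for g = 3 groups with v = 2
xi_vals = [0.3242, 0.2994, 0.1207]
n = [5, 4, 3]
N, g = sum(n), len(n)

# delta = sum of C_i * xi_i with weights C_i = (n_i - 1)/(N - g)
delta_o = sum((n_i - 1) / (N - g) * x for n_i, x in zip(n, xi_vals))
```

Unlike the equal-sample-size example of Sect. 8.8, the weights here differ across groups, but the same formula applies and yields δ = 0.2707.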

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 12 multivariate observations listed in Table 8.11 that are equal to or less than the observed value of δ = 0.2707. There are exactly 967 δ test statistic values that are equal to or less than the observed value of δ = 0.2707. If all M arrangements of the N = 12 multivariate scores listed in Table 8.11 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 0.2707 computed on the M = 27, 720 possible arrangements of the observed data with n 1 = 5, n 2 = 4, and n 3 = 3 multivariate observations preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{967}{27{,}720} = 0.0349\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 12 multivariate observations listed in Table 8.11.

Following Eq. (8.7) on p. 263, the exact expected value of the M = 27, 720 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{10{,}080}{27{,}720} = 0.3636 \end{aligned}$$

and, following Eq. (8.6) on p. 263, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{0.2707}{0.3636} = +0.2556\;, \end{aligned}$$

indicating approximately 26% within-group agreement above what is expected by chance.

A convenient, although positively biased, measure of effect size for the BNP trace test is given by

$$\displaystyle \begin{aligned} \eta^{2} = \frac{V^{(2)}}{s} = \frac{0.7818}{2} = 0.3909\;, \end{aligned}$$

which can be compared with the unbiased chance-corrected measure of effect size, \(\Re = +0.2556\). No comparisons are made with Cohen’s \(\hat {d}\), Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), or Hays’ \(\hat {\omega }_{\text{R}}^{2}\) measures of effect size as \(\hat {d}\), \(\hat {\eta }^{2}\), \(\hat {\omega }_{\text{F}}^{2}\), and \(\hat {\omega }_{\text{R}}^{2}\) are undefined for multivariate data.

The functional relationships between statistic δ and the V (2) BNP trace statistic are given by

$$\displaystyle \begin{aligned} \delta = \frac{2\big( r-V^{(2)} \big)}{N-g} \quad \mbox{and} \quad V^{(2)} = r-\frac{\delta(N-g)}{2}\;. \end{aligned} $$
(8.28)

Following the expressions given in Eq. (8.28) for test statistics δ and V (2), the observed value for test statistic δ with respect to the observed value of test statistic V (2) is

$$\displaystyle \begin{aligned} \delta = \frac{2\big( r-V^{(2)} \big)}{N-g} = \frac{2(2-0.7818)}{12-3} = 0.2707 \end{aligned}$$

and the observed value for test statistic V (2) with respect to the observed value of test statistic δ is

$$\displaystyle \begin{aligned} V^{(2)} = r-\frac{\delta(N-g)}{2} = 2-\frac{(0.2707)(12-3)}{2} = 0.7818\;. \end{aligned}$$
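The round trip through Eq. (8.28) is easily confirmed numerically:

```python
r, N, g = 2, 12, 3
V = 0.7818

# Eq. (8.28): delta from V^(2), then V^(2) recovered from delta
delta_o = 2 * (r - V) / (N - g)
V_back = r - delta_o * (N - g) / 2
```

The forward transformation reproduces δ = 0.2707 and the inverse recovers V (2) = 0.7818 exactly.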

8.9.3 An Exact Analysis with v = 1

For a second analysis of the multivariate measurement scores listed in Table 8.11 on p. 307 under the Fisher–Pitman permutation model, let the treatment-group weights again be given by

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \quad i = 1,\,\ldots,\,g\;, \end{aligned}$$

but set v = 1 instead of v = 2, employing ordinary Euclidean scaling between the N = 12 multivariate scores. Following Eq. (8.2) on p. 261, the multivariate scores listed in Table 8.11 yield g = 3 average distance-function values of

$$\displaystyle \begin{aligned} \xi_{1} = 2.3933\;, \quad \xi_{2} = 1.9326\;, \quad \mbox{and} \quad \xi_{3} = 1.4284\;. \end{aligned}$$

Following Eq. (8.1) on p. 261, the observed value of the permutation test statistic based on v = 1 and treatment-group weights

$$\displaystyle \begin{aligned} C_{i} = \frac{n_{i}-1}{N-g}\;, \qquad i = 1,2,3\;, \end{aligned}$$

is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \delta = \sum_{i=1}^{g}C_{i}\xi_{i} = \frac{1}{12-3} \big[ (5-1)(2.3933)&\displaystyle +&\displaystyle (4-1)(1.9326)\\ &\displaystyle &\displaystyle \quad {}+(3-1)(1.4284) \big] = 2.0253\;. \end{array} \end{aligned} $$

There are only

$$\displaystyle \begin{aligned} M = \frac{N!}{\displaystyle\prod_{i=1}^{g}n_{i}!} = \frac{12!}{5!\;4!\;3!} = 27{,}720 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the N = 12 multivariate observations listed in Table 8.11, making an exact permutation analysis feasible.

Under the Fisher–Pitman permutation model, the exact probability of an observed δ is the proportion of δ test statistic values computed on all possible, equally-likely arrangements of the N = 12 multivariate observations listed in Table 8.11 that are equal to or less than the observed value of δ = 2.0253. There are exactly 618 δ test statistic values that are equal to or less than the observed value of δ = 2.0253. If all M arrangements of the N = 12 multivariate observations listed in Table 8.11 occur with equal chance under the Fisher–Pitman null hypothesis, the exact probability value of δ = 2.0253 computed on the M = 27, 720 possible arrangements of the observed data with n 1 = 5, n 2 = 4, and n 3 = 3 multivariate observations preserved for each arrangement is

$$\displaystyle \begin{aligned} P \big( \delta \leq \delta_{\text{o}}|H_{0} \big) = \frac{\text{number of }\delta\text{ values } \leq \delta_{\text{o}}}{M} = \frac{618}{27{,}720} = 0.0223\;, \end{aligned}$$

where δ o denotes the observed value of test statistic δ and M is the number of possible, equally-likely arrangements of the N = 12 multivariate observations listed in Table 8.11. No comparison is made with the Bartlett–Nanda–Pillai trace test as the BNP test is undefined for ordinary Euclidean scaling.

Following Eq. (8.7) on p. 263, the exact expected value of the M = 27, 720 δ test statistic values under the Fisher–Pitman null hypothesis is

$$\displaystyle \begin{aligned} \mu_{\delta} = \frac{1}{M}\sum_{i=1}^{M}\delta_{i} = \frac{69{,}854}{27{,}720} = 2.5200 \end{aligned}$$

and, following Eq. (8.6) on p. 263, the observed chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \Re = 1-\frac{\delta}{\mu_{\delta}} = 1-\frac{2.0253}{2.5200} = +0.1963\;, \end{aligned}$$

indicating approximately 20% within-group agreement above that expected by chance. No comparison is made with the conventional measure of effect size as η 2 is undefined for ordinary Euclidean scaling.

8.10 Summary

This chapter examined statistical methods for multiple independent samples where the null hypothesis posits no differences among the g ≥ 3 populations that the g random samples are presumed to represent. Under the Neyman–Pearson population model of statistical inference, the conventional one-way analysis of variance test statistic, Fisher’s F-ratio, was described and illustrated, along with five conventional measures of effect size: Cohen’s \(\hat {d}\), Pearson’s η 2, Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), and Hays’ \(\hat {\omega }_{\text{R}}^{2}\).

Under the Fisher–Pitman permutation model of statistical inference, test statistic δ and associated measure of effect size, \(\Re \), were described and illustrated for multi-sample tests. For tests of g ≥ 3 independent samples, test statistic δ was demonstrated to be flexible enough to incorporate both ordinary and squared Euclidean scaling functions with v = 1 and v = 2, respectively. Effect size measure \(\Re \) was shown to be applicable to either v = 1 or v = 2 without modification and to have a clear and meaningful chance-corrected interpretation.

Six examples illustrated permutation-based statistics δ and \(\Re \). In the first example, a small sample of N = 10 observations in g = 3 treatment groups was utilized to describe and illustrate the calculation of test statistics δ and \(\Re \) for multiple independent samples. The second example with N = 10 observations in g = 3 treatment groups demonstrated the chance-corrected measure of effect size, \(\Re \), and related \(\Re \) to the five conventional measures of effect size for g ≥ 3 independent samples: Cohen’s \(\hat {d}\), Pearson’s η 2, Kelley’s \(\hat {\eta }^{2}\), Hays’ \(\hat {\omega }_{\text{F}}^{2}\), and Hays’ \(\hat {\omega }_{\text{R}}^{2}\). The third example with N = 28 observations in g = 4 treatment groups illustrated the effects of extreme values on analyses using v = 1 for ordinary Euclidean scaling and v = 2 for squared Euclidean scaling. The fourth example with N = 15 observations in g = 4 treatment groups compared exact and Monte Carlo permutation statistical methods, illustrating the accuracy and efficiency of Monte Carlo analyses. The fifth example with N = 18 rank scores in g = 3 treatment groups illustrated an application of permutation statistical methods to univariate rank-score data, comparing a permutation analysis of the rank-score data with the conventional Kruskal–Wallis g-sample one-way analysis of variance for ranks. In the sixth example, test statistic δ and effect size measure \(\Re \) were extended to multivariate data with N = 12 multivariate observations in g = 3 treatment groups, and the permutation analysis of the multivariate data was compared to the conventional Bartlett–Nanda–Pillai trace test for multivariate independent samples.

Chapter 9 continues the presentation of permutation statistical methods for g ≥ 3 samples, but examines research designs in which the subjects in the g ≥ 3 samples are matched on specific characteristics; that is, not independent. Research designs that posit no differences among matched treatment groups have a long history, are ubiquitous in the contemporary statistical literature, and are generally known as randomized-blocks designs, of which there exists a large variety.