
This second chapter of Permutation Statistical Methods introduces a generalized distance function that provides the foundation for a set of multi-response permutation procedures specifically designed for univariate and multivariate completely randomized data. Multi-Response Permutation Procedures (MRPP) were introduced by Mielke, Berry, and Johnson in 1976 and constitute a class of permutation methods for one or more response measurements on each object that were initially developed to distinguish possible differences among two or more groups of objects [300]. The multi-response permutation procedures presented here are based on a generalized Minkowski distance function and provide a synthesizing foundation for a variety of statistical tests and measures for completely randomized data that are further developed in Chaps. 3–7.

2.1 Minkowski Distance Function

Hermann Minkowski (1864–1909), German mathematician and creator of the geometry of numbers, utilized geometrical methods to solve problems in number theory, mathematical physics, and the theory of relativity. Minkowski was a close friend of David Hilbert while teaching at Königsberg University and taught Albert Einstein while employed at the Eidgenössisches Polytechnikum in Zürich (now ETH Zürich). In 1891 Minkowski introduced a measure of metric distance between two points in Crelle’s Journal [310]. The Minkowski metric distance of order p between two points in an r-dimensional Euclidean space, \(x^{{\prime}} = (x_{1},x_{2},\,\ldots,\,x_{r})\) and \(y^{{\prime}} = (y_{1},y_{2},\,\ldots,\,y_{r}) \in \mathbb{R}^{r}\), is given by

$$\displaystyle{ d(x,y) = \left (\sum _{i=1}^{r}\big\vert x_{ i} - y_{i}\big\vert ^{p}\right )^{\!1/p}\;, }$$

where p ≥ 1.

The Minkowski distance function is typically used with p = 1, 2, or \(\infty \). When p = 1, the distance is a first-order Minkowski metric, often called a city-block, Manhattan [231], rectilinear [54], or taxicab [222] metric, the latter named for the distance between two points that a car or taxicab would drive in a city laid out in square blocks. When p = 2, the distance is a second-order Minkowski metric and is the ordinary Euclidean distance between points, a generalization of the Pythagorean theorem to more than two coordinates. When \(p = \infty \), the Minkowski metric is known as the Tchebycheff (Chebyshev), von Neumann, or, in the two-dimensional case, the chess-board Minkowski distance [167].
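The three special orders can be illustrated with a minimal sketch; the function name `minkowski` is illustrative rather than from the text:

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two r-dimensional points."""
    if p == math.inf:
        # p = infinity: the Tchebycheff (Chebyshev) distance
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1.0, 2.0), (4.0, 6.0)
print(minkowski(x, y, 1))         # city-block: |1 - 4| + |2 - 6| = 7.0
print(minkowski(x, y, 2))         # Euclidean: sqrt(9 + 16) = 5.0
print(minkowski(x, y, math.inf))  # Chebyshev: max(3, 4) = 4.0
```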

Conventional statistical tests and measures, such as t tests, F tests, and ordinary least-squares (OLS) regression and correlation, are based on squared Euclidean distances between response measurement scores, which are not metric. The Minkowski distance function, however, is limited to metric distances and, under its standard definition, cannot accommodate most conventional statistical tests. Therefore, consider a generalized Minkowski distance function given by

$$\displaystyle{ \Delta (x,y) = \left (\sum _{i=1}^{r}\big\vert x_{ i} - y_{i}\big\vert ^{p}\right )^{\!v/p}\;, }$$
(2.1)

where p ≥ 1 and v > 0 [297, p. 5]. When r ≥ 2, p = 2, and v = 1, \(\Delta (x,y)\) is rotationally invariant in an r ≥ 2 dimensional space. When \(v = p = 1\), \(\Delta (x,y)\) is a city-block metric, which is not rotationally invariant. When v = 1 and p = 2, \(\Delta (x,y)\) is an ordinary Euclidean distance metric. And when \(v = p = 2\), \(\Delta (x,y)\) is a squared Euclidean distance, which is not a metric distance function since the triangle inequality is not satisfied.
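A short sketch of Eq. (2.1) also demonstrates the triangle-inequality failure for \(v = p = 2\); the function name `delta` is illustrative:

```python
def delta(x, y, p=2, v=1):
    """Generalized Minkowski distance of Eq. (2.1), with p >= 1 and v > 0."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)

# With v = p = 2 (squared Euclidean distance) the triangle inequality fails
# for three collinear points a, b, c:
a, b, c = (0.0,), (1.0,), (2.0,)
print(delta(a, c, p=2, v=2))                          # 4.0
print(delta(a, b, p=2, v=2) + delta(b, c, p=2, v=2))  # 1.0 + 1.0 = 2.0 < 4.0
```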

2.2 Multi-response Permutation Procedures

Multi-Response Permutation Procedures (MRPP) were originally designed to statistically determine possible differences in one or more response measurements among two or more groups of objects or subjects [300]. Let \(\Omega =\{\omega _{1},\,\ldots,\,\omega _{N}\}\) denote a finite sample of N objects that represents a target population, let \(x_{i}^{\,{\prime}} = (x_{1i},\,\ldots,\,x_{ri})\) be a transposed vector of r commensurate response measurement scores for object \(\omega _{i}\), i = 1, …, N, and let \(S_{1},\,\ldots,\,S_{g}\) designate an exhaustive partitioning of the N objects into g disjoint treatment groups. The MRPP test statistic is a weighted mean given by

$$\displaystyle{ \delta =\sum _{ i=1}^{g}C_{ i}\xi _{i}\;, }$$
(2.2)

where \(C_{i} > 0\) is a positive weight for treatment group \(S_{i}\), i = 1, …, g, \(\sum _{i=1}^{g}C_{i} = 1\),

$$\displaystyle{ \xi _{i} = \binom{n_{i}}{2}^{\!-1}\sum _{ j<k}\Delta (j,k)\,\Psi _{i}(\omega _{j})\,\Psi _{i}(\omega _{k}) }$$
(2.3)

is the average distance-function value for all distinct pairs of objects in treatment group \(S_{i}\), i = 1, …, g, \(n_{i} \geq 2\) is the number of objects classified a priori into treatment group \(S_{i}\), i = 1, …, g,

$$\displaystyle{ N =\sum _{ i=1}^{g}n_{ i}, }$$

\(\sum _{j<k}\) is the sum over all j and k such that 1 ≤ j < k ≤ N, and \(\Psi _{i}(\cdot )\) is an indicator function given by

$$\displaystyle{ \Psi _{i}(\omega _{j}) = \left \{\begin{array}{@{}l@{\quad }l@{}} \,1 \quad &\mbox{ if $\omega _{j} \in S_{i}$}\;, \\ [6pt]\,0\quad &\text{otherwise}\;. \end{array} \right. }$$
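The statistic defined by Eqs. (2.2) and (2.3) can be sketched in a few lines. This is a minimal implementation under one particular choice of weights, \(C_{i} = n_{i}/N\); the function name `mrpp_delta` and the small illustrative data are not from the text:

```python
from itertools import combinations

def mrpp_delta(groups, p=2, v=1):
    """MRPP test statistic delta of Eq. (2.2), using the generalized Minkowski
    distance of Eq. (2.1) and the proportional weights C_i = n_i / N.

    `groups` is a list of g treatment groups, each a list of r-tuples."""
    N = sum(len(group) for group in groups)
    total = 0.0
    for group in groups:
        # xi_i of Eq. (2.3): average distance over all distinct within-group pairs
        pairs = list(combinations(group, 2))
        xi = sum(sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)
                 for x, y in pairs) / len(pairs)
        total += (len(group) / N) * xi  # C_i = n_i / N
    return total

# univariate sketch: two groups, squared Euclidean distance (p = v = 2)
print(round(mrpp_delta([[(5,), (4,)], [(2,), (3,), (7,), (9,)]], v=2), 4))  # 14.8889
```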

The choice of the treatment-group weights, \(C_{1},\,\ldots,\,C_{g}\), and the generalized Minkowski distance function given in Eq. (2.1) on p. 30 specify the structure of MRPP. The original choice of \(C_{i}\) given by Mielke, Berry, and Johnson in 1976 was

$$\displaystyle{ C_{i} = \frac{n_{i}(n_{i} - 1)} {\sum _{j=1}^{g}n_{ j}(n_{j} - 1)} }$$

for i = 1, …, g [300]. However, a variety of other treatment-group weights can be considered; for example,

$$\displaystyle{ C_{i} = \frac{n_{i}} {N}\;,\quad C_{i} = \frac{n_{i} - 1} {N - g}\;,\quad \mbox{ or}\;\quad C_{i} = \frac{1} {g} }$$

for i = 1, …, g. The efficient choice of \(C_{i} = n_{i}/N\), i = 1, …, g, forces the population variance, \(\sigma _{x}^{2}\), to be proportional to \(N^{-2}\) and eliminates all terms of order \(1/N\) in the variance of δ [297, pp. 26, 30].

The null hypothesis (H 0) states that equal probabilities are assigned to each of the

$$\displaystyle{ M = \frac{N!} {\prod _{i=1}^{g}n_{ i}!} }$$

possible, equally-likely allocations of the N objects to the g treatment groups, \(S_{1},\,\ldots,\,S_{g}\). Under H 0 the N multi-response measurements are exchangeable multivariate random variables. The probability value associated with an observed value of δ, \(\delta _{\text{o}}\), is the probability under the null hypothesis (H 0) of observing a value of δ as extreme or more extreme than \(\delta _{\text{o}}\). Thus, an exact probability value for \(\delta _{\text{o}}\) may be expressed as

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} \;. }$$

When M is very large, an approximate probability value for δ may be obtained from a resampling procedure, where

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {L} }$$

and L denotes the number of randomly sampled test statistic values. In practice, L must be large enough to achieve the desired accuracy, a question taken up in the next section.
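The resampling approximation can be sketched as follows. The function and parameter names are illustrative, and any test statistic computed on the two groups may be supplied:

```python
import random

def resampling_pvalue(scores, n1, delta_o, L, stat, seed=1):
    """Approximate P(delta <= delta_o | H0) from L random reallocations of
    the pooled scores into groups of fixed sizes n1 and N - n1."""
    rng = random.Random(seed)
    pool = list(scores)
    count = 0
    for _ in range(L):
        rng.shuffle(pool)  # one random, equally-likely allocation under H0
        if stat(pool[:n1], pool[n1:]) <= delta_o:
            count += 1
    return count / L
```

The estimate `count / L` converges on the exact probability value as L grows, which motivates the discussion of the number of resamplings below.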

Number of Resamplings Necessary

Exact permutation tests are restricted to relatively small samples, given the large number of possible permutations. On the other hand, resampling permutation tests are not limited by the size of the samples. Resampling permutation tests also have been shown to provide good approximations to exact probability values as a function of the number of resamplings considered. An early concern regarding the systematic use of resampling permutation tests was the speed of the computers used for calculating the probability values. Given modern high-speed computers, the question of computational speed is moot when probability values are not too small. The remaining question is: how many resamplings are required for a specified accuracy?

The number of resamplings suggested in books and articles on permutation methods is varied and likely dated due to previous limitations of computer speed and memory. Some authors have proposed from as few as 100 resamplings to as many as 5,000; for example, see discussions by Dwass in 1957 [100]; Hope in 1968 [180]; Edwards in 1985 [110]; Jockel in 1986 [193]; Keller-McNulty and Higgins in 1987 [199]; Bailer in 1989 [16]; Kim, Nelson, and Startz in 1991 [216]; Manly in 1991 [258, pp. 32–35]; McQueen in 1992 [274]; Rickerts and Berry in 1994 [347]; Kennedy in 1995 [212]; Maxim in 1999 [265, p. 356]; Lunneborg in 2000 [256, pp. 210–213]; Good in 2001 [149, p. 47]; Higgins in 2004 [176]; and Edgington and Onghena in 2007 [109, pp. 40–41]. On the other hand, examples provided by Howell as recently as 2007 utilized as many as 10,000 resamplings [184, pp. 642–646]. Resampling computing packages such as Resampling Stats [14] and StatXact [15] typically use 10,000 resamplings as the default value.

The accuracy of a resampling probability value depends on both the probability value (P) and the number of resamplings (L). Confidence limits on the probability value can be obtained from the binomial distribution when L is large. The 1 −α confidence limits of the binomial distribution are given by

$$\displaystyle{ \hat{P }\pm Z_{\alpha /2}\sqrt{\frac{P(1 - P)} {L}} \;, }$$
(2.4)

where P is the probability value in question and \(\hat{P }\) denotes the estimated value of P. Define

$$\displaystyle{ x_{i} = \left \{\begin{array}{@{}l@{\quad }l@{}} \,1 \quad &\mbox{ if $\hat{P }\leq \hat{P } _{\text{o}}$}\;,\\ [6pt]\,0\quad &\text{otherwise} \;, \end{array} \right. }$$

for i = 1, …, L, where \(\hat{P }_{\text{o}}\) denotes the observed value of \(\hat{P }\). Then \(\hat{P }\), the expected value of \(\hat{P }\), the variance of \(\hat{P }\), and the skewness of \(\hat{P }\) are given by

$$\displaystyle\begin{array}{rcl} & \hat{P }= \frac{1} {L}\sum _{i=1}^{L}x_{ i}\;,& {}\\ & \mathrm{E}[\hat{P } ] = P\;, & {}\\ & \sigma _{\hat{P } }^{2} = \frac{P(1-P)} {L} \;,& {}\\ \end{array}$$

and

$$\displaystyle{ \gamma _{\hat{P } } = \frac{1 - 2P} {\sqrt{LP(1 - P)}}\;, }$$

respectively [195, p. 916]. If L is small and P is close to either 0 or 1, the skewness term \(\gamma _{\hat{P } }\) becomes large and Eq. (2.4) may not be appropriate. For example, if L = 100 and P = 0.01,

$$\displaystyle{ \gamma _{\hat{P } } = \frac{1 - 2P} {\sqrt{LP(1 - P)}} = \frac{1 - 2(0.01)} {\sqrt{100(0.01)(1 - 0.01)}} = 0.9849\;. }$$
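The skewness term and the half-width of the confidence limits in Eq. (2.4) are easily computed; the function names here are illustrative:

```python
import math

def skewness(P, L):
    """Skewness of the resampled probability estimate, (1 - 2P) / sqrt(LP(1 - P))."""
    return (1 - 2 * P) / math.sqrt(L * P * (1 - P))

def halfwidth(P, L, z=1.96):
    """Half-width of the asymptotic 95% confidence limits of Eq. (2.4)."""
    return z * math.sqrt(P * (1 - P) / L)

print(round(skewness(0.01, 100), 4))        # 0.9849, matching the text
print(round(skewness(0.01, 1_000_000), 4))  # far smaller with L = 1,000,000
```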

Table 2.1 lists a selected number of probability values (P = 0.50, 0.25, 0.10, 0.05, and 0.01), a variety of resamplings (L = 100; 1,000; 10,000; 1,000,000; and 100,000,000), computed skewness values, errors on the 95 % confidence limits determined from Eq. (2.4), and the simulated lower and upper errors on the 95 % confidence limits based on L resamplings and determined from the smallest value for which the cumulative binomial distribution is equal to or less than 0.025 and equal to or greater than 0.975, respectively. In general, as can be seen from Table 2.1, two additional orders of magnitude are required to increase accuracy by just one decimal place.

Table 2.1 Five probability (P) values, four levels of resampling (L), skewness (\(\gamma _{\hat{P } }\)), and asymptotic and simulated errors on 95 % confidence limits; table adapted from Johnston, Berry, and Mielke [195, p. 917]

To illustrate the number of resamplings required to yield a predetermined number of decimal places of accuracy, given a known probability value, consider the interval-level data listed in Fig. 2.1.

Fig. 2.1

Ordered soil Pb data in mg/kg from two school attendance districts in metropolitan New Orleans

The data listed in Fig. 2.1 are adapted from Berry, Mielke, and Mielke [38] and represent soil lead (Pb) quantities from two school districts in metropolitan New Orleans. Elevated Pb levels have been linked to a number of physiological, neurological, and endocrine effects in children, including difficulties in learning, perception, social behavior, and fine motor skills. The n 1 = 20 soil lead samples collected in District 1 yielded a mean value of \(\bar{x}_{1} = 203.9350\) mg/kg and the n 2 = 20 soil lead samples collected in District 2 yielded a mean value of \(\bar{x}_{2} = 1,661.7800\) mg/kg. There are

$$\displaystyle{ M = \frac{(n_{1} + n_{2})!} {n_{1}!\;n_{2}!} = \frac{(20 + 20)!} {20!\;20!} = 137,846,528,820 }$$
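The count M can be verified directly; Python's standard-library `math.comb` computes the binomial coefficient without overflow:

```python
import math

# M = (n1 + n2)! / (n1! n2!) = C(40, 20) for n1 = n2 = 20
M = math.comb(40, 20)
print(M)  # 137846528820
```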

possible permutations of the soil lead data listed in Fig. 2.1 to be considered. Under the null hypothesis of no difference between the two group means in the population, a Fisher–Pitman permutation F test [38] yields an exact two-sided probability value of

$$\displaystyle{ P\big(F \geq F_{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $F$ values $ \geq F_{\text{o}}$}} {M} = \frac{2,056,423,782} {137,846,528,820} = 0.0149182123 }$$

for the soil lead data listed in Fig. 2.1. Figure 2.2 summarizes the results for eight different resamplings of the data listed in Fig. 2.1 and the associated two-sided resampling probability values with α = 0.05. Each of the probability values was generated using a common seed and the same pseudorandom number generator [197]. The last row of Fig. 2.2 contains the exact probability value based on all M = 137,846,528,820 possible permutations of the soil lead data listed in Fig. 2.1.

Fig. 2.2

Comparison of eight resampled probability values with the exact probability value given in the last row, based on the soil lead data listed in Fig. 2.1

Given the results of the resampling probability analyses listed in Fig. 2.2, L = 1,000,000 is recommended whenever three decimal places of accuracy are required. There are four reasons for promoting L = 1,000,000 resamplings: accuracy, practicality, error, and consistency. First, inspection of Fig. 2.2 indicates that with an exact probability value of P = 0.0149182123 and α = 0.05, L = 1,000,000 resamplings is the minimum number of resamplings necessary to ensure three decimal places of accuracy. Second, given the speed of modern computers and the efficiency of pseudorandom number generators such as the Mersenne Twister, L = 1,000,000 resamplings can be used on a routine basis. Third, there is the potential for additional type I error, the magnitude of which is of concern when the number of resamplings (L) is very small. Fourth, some researchers object to the use of resampling statistics because different pseudorandom number generators and different seeds can produce widely varying results. This is certainly true when L is very small. For example, in Fig. 2.2, L = 100 yields a probability value of P = 0.06. Varying the seed with L = 100 and the same pseudorandom number generator produced observed probability values ranging from P = 0.01 to P = 0.11. However, with L = 1,000,000, varying the seed produced no differences in the third decimal place.

When the number of possible arrangements (M) is very large and the exact probability value (P) is exceedingly small, a resampling permutation procedure may produce no δ values equal to or less than δ o, even with L = 1,000,000, yielding an approximate resampling probability value of P = 0.00. In such cases, moment-approximation permutation procedures based on fitting the first three exact moments of the discrete permutation distribution to a Pearson type III distribution provide approximate probability values, as detailed in Chap. 1, Sect. 1.2.2; see also references [284] and [300].

An Index of Agreement

It is oftentimes desirable to have an index of the amount of agreement among response measurement scores within g treatment groups. A useful measure for this purpose is a chance-corrected within-group coefficient of agreement given by

$$\displaystyle{ \mathfrak{R} = 1 -\frac{\delta } {\mu _{\delta }}\;, }$$
(2.5)

where μ δ is the arithmetic average of the δ values calculated on all possible arrangements of the observed response measurement scores, given by

$$\displaystyle{ \mu _{\delta } = \frac{1} {M}\sum _{i=1}^{M}\delta _{ i}\;. }$$
(2.6)

\(\mathfrak{R}\) is a chance-corrected measure of agreement since \(\mathrm{E}[\mathfrak{R}\vert H_{0}] = 0\). Because μ δ is a constant under H 0, the permutation distributions of δ and \(\mathfrak{R}\) are equivalent, viz.,

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = P\left (\mathfrak{R}\geq \mathfrak{R}_{\text{o}}\vert H_{0}\right )\;, }$$

where

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} }$$

and δ o and \(\mathfrak{R}_{\text{o}}\) denote the observed values of δ and \(\mathfrak{R}\), respectively. Possible values of \(\mathfrak{R}\) range from slightly negative values to a maximum of \(\mathfrak{R} = +1\) for the extreme case when all response measurements on objects within each of the g classified treatment groups are identical, i.e., δ = 0.

The generalized Minkowski distance function, \(\Delta (x,y)\), as defined in Eq. (2.1) on p. 30, determines the analysis space of the MRPP test statistic, δ. The data space in question for almost all statistical analyses is an ordinary Euclidean distance space. If the distance function of the MRPP test statistic is based on p = 2 and v = 1, then the data and analysis spaces are congruent, so that the resulting statistical analyses represent the data in question. Unfortunately, commonly used statistical analyses based on the arithmetic mean, such as Student’s two-sample t test and Fisher’s one-way analysis of variance, are based on \(p = v = 2\), yielding a non-metric squared-distance analysis space that is not congruent with the data space. The difference between the data and analysis spaces associated with the most popular statistical analyses is a reason that problems occur with what should be routine analyses. Examples illustrating this problem are given elsewhere; see, for example, references [41, pp. 404–410] and [297, pp. 50–53]. Any statistical analysis is questionable when the data and analysis spaces are not congruent.

2.2.1 Chance-Corrected Agreement Measures

Chance-corrected measures yield values that are interpreted as a proportion above that expected by chance alone. Chance-corrected agreement measures provide clear and meaningful interpretations of the amount of, or lack of, agreement present in the data. In general, chance-corrected measures of agreement, such as \(\mathfrak{R}\), are equal to + 1 when perfect agreement among the response measurement scores occurs, 0 when agreement is equal to that expected under independence, and negative when agreement among the response measurement scores is less than that expected by chance. For example, define a chance-corrected measure such that

$$\displaystyle{ A_{i} = 100\left (\frac{O_{i} - E_{i}} {N - E_{i}} \right )\;, }$$

where O i and E i denote the observed (earned) score and the expected (chance) score from purely guessing, respectively, on a multiple-choice examination with N questions for the ith student in a class of m students [175, p. 912].

Thus, on a 50-question multiple-choice examination with five choices per question, chance would indicate that a student could answer 50 × 0.20 = 10 questions correctly simply by guessing. If a student answered only eight questions correctly, then a chance-corrected measure of agreement would yield a grade of

$$\displaystyle{ A = 100\left ( \frac{8 - 10} {50 - 10}\right ) = 100\left (\frac{-2} {40} \right ) = -5\;, }$$

since the score was less than expected by chance, i.e., only eight of 50 questions were answered correctly. The lowest grade would occur when a student answered all 50 questions incorrectly, yielding a score of

$$\displaystyle{ A = 100\left ( \frac{0 - 10} {50 - 10}\right ) = 100\left (\frac{-10} {40} \right ) = -25\;. }$$

Note that while a student with the highest possible score of 50 correct answers would score

$$\displaystyle{ A = 100\left (\frac{50 - 10} {50 - 10}\right ) = 100\left (\frac{40} {40}\right ) = 100\;, }$$

the lowest possible score is − 25, not − 100. Thus, the distributions of chance-corrected measures are usually asymmetric.
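The grading scheme above can be written as a one-line function (the name is illustrative) and checked against the three worked scores:

```python
def chance_corrected_grade(observed, N, choices):
    """A = 100 (O - E) / (N - E), with E = N / choices correct expected by guessing."""
    expected = N / choices
    return 100 * (observed - expected) / (N - expected)

# 50 questions, 5 choices each, so E = 10 correct expected by chance
print(chance_corrected_grade(8, 50, 5))   # -5.0
print(chance_corrected_grade(0, 50, 5))   # -25.0
print(chance_corrected_grade(50, 50, 5))  # 100.0
```

The asymmetry of the scale is visible directly: the minimum is -25 while the maximum is +100.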

Since the mean value of \(\mathfrak{R}\) under H 0 is 0, homogeneity of within-classified-group response measurements is associated with \(\mathfrak{R} > 0\), and heterogeneity of within-classified-group response measurements is associated with \(\mathfrak{R}\leq 0\) [28]. The distribution of \(\mathfrak{R}\) is usually asymmetric and the upper and lower bounds depend on both the nature of the data and the structure of δ. The degree of homogeneity or heterogeneity depends on the discrete permutation distribution of \(\mathfrak{R}\). If large values of \(n_{1},\,\ldots,\,n_{g}\) and N are involved, a very small value of \(P(\delta \leq \delta _{\text{o}}\vert H_{0})\) may be associated with a small positive observed value of \(\mathfrak{R}\), say \(\mathfrak{R}_{\text{o}}\). Conversely, with small values of \(n_{1},\,\ldots,\,n_{g}\) and N, a large value of \(\mathfrak{R}_{\text{o}}\) may be associated with a relatively large value of \(P(\delta \leq \delta _{\text{o}}\vert H_{0})\).

2.2.2 Example Univariate MRPP Analysis with v = 2

Although multi-response permutation procedures were originally designed for analyzing multivariate response measurement scores, they can also be used for analyzing univariate data. Consider a comparison between two mutually exclusive groups of objects, S 1 and S 2, where a single response measurement, x, has been obtained from each object. For this example, there is r = 1 response measurement score for each object, g = 2 disjoint groups, and a total of N = 6 objects with n 1 = 2 and n 2 = 4 in treatment groups S 1 and S 2, respectively. Suppose that the n 1 = 2 observed response measurement scores for treatment group S 1 are {5, 4} and the n 2 = 4 response measurement scores for treatment group S 2 are {2, 3, 7, 9}. The treatment-group sizes and the response measurement scores are deliberately kept small to simplify the example analysis. The treatment-group sizes and the univariate response measurement scores are listed in Fig. 2.3.

Fig. 2.3

Example data with g = 2, r = 1, n 1 = 2, n 2 = 4, and \(N = n_{1} + n_{2} = 6\)

For this example analysis, let v = 2, p = 2, r = 1,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{2} {6}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{4} {6}\;, }$$

so that the S 1 and S 2 treatment groups are weighted proportional to their group sizes of n 1 = 2 and n 2 = 4, respectively. For univariate response measurement scores with r = 1, Eq. (2.1) on p. 30 reduces to

$$\displaystyle{ \Delta (j,k) =\Big (\big\vert x_{j} - x_{k}\big\vert ^{p}\Big)^{\!v/p}\;. }$$
(2.7)

Thus, for treatment group S 1 with n 1 = 2 objects, p = 2, and v = 2, the generalized Minkowski distance function yields

$$\displaystyle{ \Delta (1,2) =\Big (\big\vert 5 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;, }$$

and for treatment group S 2 with n = 4 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 2 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 1}1.00\;, {}\\ \qquad \qquad \qquad \Delta (3,5)& =& \Big(\big\vert 2 - 7\big\vert ^{2}\,\Big)^{\!2/2} = 25.00\;, {}\\ \qquad \qquad \qquad \Delta (3,6)& =& \Big(\big\vert 2 - 9\big\vert ^{2}\,\Big)^{\!2/2} = 49.00\;, {}\\ \qquad \qquad \qquad \Delta (4,5)& =& \Big(\big\vert 3 - 7\big\vert ^{2}\,\Big)^{\!2/2} = 16.00\;, {}\\ \qquad \qquad \qquad \Delta (4,6)& =& \Big(\big\vert 3 - 9\big\vert ^{2}\,\Big)^{\!2/2} = 36.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 7 - 9\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 1}4.00\;. {}\\ \end{array}$$

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment groups S i , i = 1, 2, are

$$\displaystyle{ \xi _{1} = \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2)\Big] = \binom{\,2\,}{2}^{\!-1}\left (1.00\right ) = 1.00 }$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (3,4) + \Delta (3,5) + \Delta (3,6) + \Delta (4,5) + \Delta (4,6) + \Delta (5,6)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (1.00 + 25.00 + 49.00 + 16.00 + 36.00 + 4.00\right ) = 21.8333\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{2} {6}\right )(1.00) + \left (\frac{4} {6}\right )(21.8333) = 14.8889\;. }$$

Smaller values of \(\delta _{\text{o}}\) indicate a concentration of response measurement scores within the g treatment groups, whereas larger values of \(\delta _{\text{o}}\) indicate a lack of concentration of response measurement scores within the g treatment groups [301]. The N = 6 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, respectively, with n 1 = 2 and n 2 = 4 response measurement scores preserved in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{6!} {2!\;4!} = 15 }$$

possible, equally-likely ways. The M = 15 possible arrangements of the observed data in Fig. 2.3, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.2 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 14.8889, obtained from the realized arrangement,

$$\displaystyle{ \{5,4\}\quad \{2,3,7,9\}\;, }$$

(Order 9 in Table 2.2) is not unusual since five of the remaining δ values (\(\delta _{11}\mbox{ to }\delta _{15}\)) exceed the observed value of δ o = 14.8889 and 10 values of δ (δ 1 to δ 10) are equal to or less than the observed value. If all arrangements of the N = 6 observed response measurement scores listed in Fig. 2.3 occur with equal chance, the exact probability value of δ o = 14.8889 computed on the M = 15 possible arrangements of the observed data with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{10} {15} = 0.6667\;. }$$
Table 2.2 Permutations of the observed data in Fig. 2.3 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 2, ordered by values of δ from lowest to highest

For comparison, a conventional Student two-sample pooled t test calculated on the N = 6 response measurement scores listed in Fig. 2.3 yields an observed value of \(t_{\text{o}} = -0.3004\). Assuming independence, normality, and homogeneity of variance, t is approximately distributed as Student’s t under the null hypothesis with \(N - 2 = 6 - 2 = 4\) degrees of freedom. Under the null hypothesis, the observed value of \(t_{\text{o}} = -0.3004\) yields an approximate two-sided probability value of P = 0.7789.

Following Eq. (2.6) on p. 37, the exact average value of the M = 15 δ values listed in Table 2.2 is μ δ  = 13.60. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{14.8889} {13.60} = -0.0948\;, }$$

indicating that within-group agreement is slightly less than that expected by chance.
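Because M = 15 is tiny, the full permutation distribution of this example can be enumerated in a few lines and the quantities above reproduced; the variable names are illustrative:

```python
from itertools import combinations

scores = [5, 4, 2, 3, 7, 9]  # pooled data of Fig. 2.3
n1, v = 2, 2                 # group sizes n1 = 2, n2 = 4; squared distance, v = 2

def delta(g1, g2):
    """delta of Eq. (2.2) with C_i = n_i / N and Delta(j, k) = |x_j - x_k|^v."""
    N = len(g1) + len(g2)
    total = 0.0
    for g in (g1, g2):
        pairs = list(combinations(g, 2))
        xi = sum(abs(a - b) ** v for a, b in pairs) / len(pairs)  # Eq. (2.3)
        total += (len(g) / N) * xi
    return total

delta_o = delta([5, 4], [2, 3, 7, 9])  # observed arrangement
# enumerate all M = C(6, 2) = 15 allocations preserving n1 = 2 and n2 = 4
deltas = [delta([scores[i] for i in idx],
                [scores[i] for i in range(6) if i not in idx])
          for idx in combinations(range(6), n1)]
p_value = sum(d <= delta_o + 1e-9 for d in deltas) / len(deltas)
mu = sum(deltas) / len(deltas)
print(round(delta_o, 4), round(p_value, 4), round(mu, 2),
      round(1 - delta_o / mu, 4))  # 14.8889 0.6667 13.6 -0.0948
```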

2.2.3 Example Univariate MRPP Analysis with v = 1

Permutation statistical tests and measures are data-dependent, distribution-free, and non-parametric; consequently, they require no distributional assumptions and make no estimates of population parameters. Thus, it is not necessary to set v = 2 and to square the response-measurement differences between objects. While conventional tests and measures that assume normality must estimate the mean and variance, μ x and \(\sigma _{x}^{2}\), of the normal distribution, both of which are based on squared deviations from the mean, permutation tests and measures do not assume normality and are not restricted to v = 2, which is not a metric distance function. A distance function based on v = 1 is an attractive alternative to v = 2 as it is a metric distance function, satisfies the triangle inequality, is robust to extreme values, provides an easy-to-understand ordinary Euclidean distance between objects, and ensures that the data and analysis spaces are congruent [284–287, 289, 295]. In addition, choosing v = 1 over v = 2 can make a substantial difference in the results of an MRPP analysis; see, for example, a discussion by Mielke and Berry in 2007 [297, pp. 45–50].

To illustrate the computation of δ with v = 1, consider the same finite sample of N = 6 objects listed in Fig. 2.3 on p. 40 and let S 1 and S 2 denote an exhaustive partitioning of the N = 6 objects into g = 2 disjoint treatment groups. As previously, let S 1 consist of n 1 = 2 objects, each with a single response measurement, and let S 2 consist of n 2 = 4 objects, each with a single response measurement.

Given the univariate data listed in Fig. 2.3, let r = 1, p = 2,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{2} {6}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{4} {6}\;, }$$

but in this case set v = 1 instead of v = 2, employing ordinary Euclidean distance instead of squared Euclidean distance between objects. Following Eq. (2.7) on p. 40 for treatment group S 1 with n 1 = 2 objects, p = 2, and v = 1, the generalized Minkowski distance function yields

$$\displaystyle{ \Delta (1,2) =\Big (\big\vert 5 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.00\;, }$$

and for treatment group S 2 with n = 4 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 2 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 1.00\;, {}\\ \qquad \qquad \qquad \Delta (3,5)& =& \Big(\big\vert 2 - 7\big\vert ^{2}\,\Big)^{\!1/2} = 5.00\;, {}\\ \qquad \qquad \qquad \Delta (3,6)& =& \Big(\big\vert 2 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 7.00\;, {}\\ \qquad \qquad \qquad \Delta (4,5)& =& \Big(\big\vert 3 - 7\big\vert ^{2}\,\Big)^{\!1/2} = 4.00\;, {}\\ \qquad \qquad \qquad \Delta (4,6)& =& \Big(\big\vert 3 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 6.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 7 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 2.00\;. {}\\ \end{array}$$

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle{ \xi _{1} = \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2)\Big] = \binom{\,2\,}{2}^{\!-1}\left (1.00\right ) = 1.00 }$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (3,4) + \Delta (3,5) + \Delta (3,6) + \Delta (4,5) + \Delta (4,6) + \Delta (5,6)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (1.00 + 5.00 + 7.00 + 4.00 + 6.00 + 2.00\right ) = 4.1667\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{2} {6}\right )(1.00) + \left (\frac{4} {6}\right )(4.1667) = 3.1111\;. }$$

As in the previous MRPP example with v = 2, the N = 6 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{6!} {2!\;4!} = 15 }$$

possible, equally-likely ways. The M = 15 possible arrangements of the observed data in Fig. 2.3, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.3 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 3.1111, obtained from the realized arrangement,

$$\displaystyle{ \{5,4\}\quad \{2,3,7,9\}\;, }$$

(Order 5 in Table 2.3) is not unusual since eight of the remaining δ values (δ 8 to δ 15) exceed the observed value of δ o = 3.1111 and seven values of δ (δ 1 to δ 7) are equal to or less than the observed value. If all arrangements of the N = 6 observed response measurement scores listed in Fig. 2.3 occur with equal chance, the exact probability value of δ o = 3.1111 computed on the M = 15 possible arrangements of the observed data with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{7} {15} = 0.4667\;. }$$

For comparison, for the univariate data listed in Fig. 2.3 the exact probability value based on v = 2, M = 15, and \(C_{i} = n_{i}/N\) for i = 1, 2 in the previous example is P = 0.6667. No comparison is made with the conventional Student two-sample t test as Student’s t test is undefined for v = 1.

Table 2.3 Permutations of the observed data in Fig. 2.3 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 1, ordered by values of δ from lowest to highest

Following Eq. (2.6) on p. 37, the exact average value of the M = 15 δ values listed in Table 2.3 is μ δ  = 3.20. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{3.1111} {3.20} = +0.0278\;, }$$

indicating very little within-group agreement above that expected by chance.
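The full univariate calculation above (the pairwise distances, δ o = 3.1111, the exact probability value, μ δ , and the chance-corrected coefficient) can be verified with a short enumeration. The sketch below, in Python with only the standard library and with illustrative function names rather than any published MRPP software, enumerates all M = 15 arrangements with v = 1 and p = 2:

```python
from itertools import combinations

def xi(group, v=1):
    """Average distance-function value over all distinct pairs (r = 1, p = 2)."""
    pairs = list(combinations(group, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

def delta(g1, g2, v=1):
    """Weighted mean of the xi values with C_i = n_i / N."""
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1, v) + n2 * xi(g2, v)) / (n1 + n2)

scores = [5, 4, 2, 3, 7, 9]                # observed scores, Fig. 2.3
delta_o = delta([5, 4], [2, 3, 7, 9])      # realized arrangement, 3.1111

# Enumerate all M = 6! / (2! 4!) = 15 equally-likely arrangements.
deltas = [delta([scores[i] for i in idx],
                [scores[i] for i in range(6) if i not in idx])
          for idx in combinations(range(6), 2)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 7/15
mu_delta = sum(deltas) / len(deltas)                                # 3.20
R_o = 1 - delta_o / mu_delta                                        # +0.0278
```

Run on the realized arrangement {5, 4} and {2, 3, 7, 9}, the enumeration reproduces δ o = 3.1111, P = 7∕15 = 0.4667, μ δ  = 3.20, and the chance-corrected coefficient +0.0278.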

2.2.4 Example Bivariate MRPP Analysis with v = 2

In this second example, bivariate response measurement scores are used for simplicity to demonstrate a multivariate MRPP analysis. To illustrate the computation of MRPP with bivariate response measurement scores for each object, consider a finite sample of N = 7 objects and let S 1 and S 2 denote an exhaustive partitioning of the N objects into g = 2 disjoint treatment groups. Further, let S 1 consist of n 1 = 4 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, …, 4, with \(x_{1}^{\,{\prime}} = (5,\,1)\), \(x_{2}^{\,{\prime}} = (4,\,6)\), \(x_{3}^{\,{\prime}} = (5,\,2)\), and \(x_{4}^{\,{\prime}} = (6,\,3)\), and let S 2 consist of n 2 = 3 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, 2, 3 with \(x_{5}^{\,{\prime}} = (2,\,3)\), \(x_{6}^{\,{\prime}} = (3,\,4)\), and \(x_{7}^{\,{\prime}} = (2,\,4)\). The treatment group sizes and the response measurement scores are deliberately kept small to simplify the example analysis. The bivariate response measurement scores for the N = 7 objects are listed in Fig. 2.4.

Fig. 2.4
figure 4

Example data with g = 2, r = 2, n 1 = 4, n 2 = 3, and \(N = n_{1} + n_{2} = 7\)

For this example analysis, let v = 2, p = 2, r = 2,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{4} {7}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{3} {7}\;, }$$

so that the S 1 and S 2 treatment groups are weighted proportional to their group sizes of n 1 = 4 and n 2 = 3, respectively. Following Eq. (2.1) on p. 30 for treatment group S 1 with n 1 = 4 objects, p = 2, and v = 2, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (1,2)& =& \Big(\big\vert 5 - 4\big\vert ^{2} +\big \vert 1 - 6\big\vert ^{2}\,\Big)^{\!2/2} = 26.00\;, {}\\ \qquad \qquad \qquad \Delta (1,3)& =& \Big(\big\vert 5 - 5\big\vert ^{2} +\big \vert 1 - 2\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}1.00\;, {}\\ \qquad \qquad \qquad \Delta (1,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 1 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}5.00\;, {}\\ \qquad \qquad \qquad \Delta (2,3)& =& \Big(\big\vert 4 - 5\big\vert ^{2} +\big \vert 6 - 2\big\vert ^{2}\,\Big)^{\!2/2} = 17.00\;, {}\\ \qquad \qquad \qquad \Delta (2,4)& =& \Big(\big\vert 4 - 6\big\vert ^{2} +\big \vert 6 - 3\big\vert ^{2}\,\Big)^{\!2/2} = 13.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 2 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}2.00\;, {}\\ \end{array}$$

and for treatment group S 2 with n 2 = 3 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 2 - 3\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 2.00\;, {}\\ \qquad \qquad \qquad \Delta (5,7)& =& \Big(\big\vert 2 - 2\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (6,7)& =& \Big(\big\vert 3 - 2\big\vert ^{2} +\big \vert 4 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;. {}\\ \end{array}$$
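Each Δ value above is a direct evaluation of the generalized Minkowski distance function of Eq. (2.1) with p = 2, v = 2, and r = 2. A minimal sketch in Python (the function name is illustrative):

```python
def delta_xy(x, y, p=2, v=2):
    """Generalized Minkowski distance: (sum_i |x_i - y_i|**p) ** (v / p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)

delta_xy((5, 1), (4, 6))         # Delta(1,2) = 26.0, squared Euclidean (v = 2)
delta_xy((2, 3), (3, 4))         # Delta(5,6) =  2.0
delta_xy((5, 1), (4, 6), v=1)    # ordinary Euclidean distance, 5.0990
```

Setting v = 1 with p = 2 recovers the ordinary Euclidean distance used in the example of Sect. 2.2.5.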

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle\begin{array}{rcl} \xi _{1}& =& \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2) + \Delta (1,3) + \Delta (1,4) + \Delta (2,3) + \Delta (2,4) + \Delta (3,4)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (26.00 + 1.00 + 5.00 + 17.00 + 13.00 + 2.00\right ) = 10.6667 {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (5,6) + \Delta (5,7) + \Delta (6,7)\Big] {}\\ & =& \binom{\,3\,}{2}^{\!-1}\left (2.00 + 1.00 + 1.00\right ) = 1.3333\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{4} {7}\right )(10.6667) + \left (\frac{3} {7}\right )(1.3333) = 6.6667\;. }$$

The N = 7 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{7!} {4!\;3!} = 35 }$$

possible, equally-likely ways. The M = 35 possible arrangements of the observed bivariate data in Fig. 2.4, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.4 and ordered by the δ values from lowest to highest.

Table 2.4 Permutations of the observed data set in Fig. 2.4 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 2, ordered by values of δ from lowest to highest

The observed MRPP test statistic, δ o = 6.6667, obtained from the realized arrangement,

$$\displaystyle{ \{(5,1)(4,6)(5,2)(6,3)\}\quad \{(2,3)(3,4)(2,4)\}\;, }$$

(Order 3 in Table 2.4) is unusual since 32 of the remaining δ values (δ 4 to δ 35) exceed the observed value of δ o = 6.6667 and only two values of δ are less than the observed value: δ 1 = 4.0000 and δ 2 = 6.4762. If all arrangements of the N = 7 observed bivariate response measurement scores listed in Fig. 2.4 occur with equal chance, the exact probability value of δ o = 6.6667 computed on the M = 35 possible arrangements of the observed data with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{3} {35} = 0.0857\;. }$$
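This exact bivariate probability value can be checked by brute-force enumeration. The sketch below, in Python with only the standard library and illustrative function names, evaluates δ for all M = 35 arrangements using squared Euclidean distances (p = 2, v = 2):

```python
from itertools import combinations

# Bivariate scores, Fig. 2.4: S1 = first four points, S2 = last three.
points = [(5, 1), (4, 6), (5, 2), (6, 3), (2, 3), (3, 4), (2, 4)]

def dist(x, y, v=2):
    """Generalized Minkowski distance with p = 2 and exponent v / p."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** (v / 2)

def delta(g1, g2, v=2):
    """Weighted mean of average within-group distances, C_i = n_i / N."""
    xi = lambda g: (sum(dist(a, b, v) for a, b in combinations(g, 2))
                    / (len(g) * (len(g) - 1) / 2))
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1) + n2 * xi(g2)) / (n1 + n2)

delta_o = delta(points[:4], points[4:])      # realized arrangement, 6.6667

# All M = 7! / (4! 3!) = 35 equally-likely arrangements.
deltas = [delta([points[i] for i in idx],
                [points[i] for i in range(7) if i not in idx])
          for idx in combinations(range(7), 4)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 3/35
mu_delta = sum(deltas) / len(deltas)                                # 10.0952
```

The average of the 35 δ values, μ δ  = 10.0952, also equals the average squared Euclidean distance over all 21 distinct pairs of the N = 7 objects, a convenient check on the enumeration.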

A conventional Hotelling two-sample T 2 test is given by

$$\displaystyle{ T^{2} = \frac{n_{1}n_{2}} {N} \left (\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\right )^{{\prime}}\mathbf{S}^{-1}\left (\bar{\mathbf{y}}_{ 1} -\bar{\mathbf{y}}_{2}\right )\;, }$$
(2.8)

where \(\bar{\mathbf{y}}_{1}\) and \(\bar{\mathbf{y}}_{2}\) denote the vectors of mean response measurement scores for treatment groups S 1 and S 2, n 1 and n 2 are the number of interval-level multivariate response measurement scores in treatment groups S 1 and S 2, and S is a pooled variance–covariance matrix.

For the example data listed in Fig. 2.4, \(\bar{y}_{11} = 5.00\), \(s_{11}^{2} = 0.6667\), \(\bar{y}_{12} = 3.00\), \(s_{12}^{2} = 4.6667\), \(\mathrm{cov}(1,2)_{1} = -1.00\), \(\bar{y}_{21} = 2.3333\), \(s_{21}^{2} = 0.3333\), \(\bar{y}_{22} = 3.6667\), \(s_{22}^{2} = 0.3333\), and \(\mathrm{cov}(1,2)_{2} = +0.1667\). Then the components of the mean-difference vector \(\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\) are \(\bar{y}_{11} -\bar{y}_{21} = 5.00 - 2.3333 = +2.6667\) and \(\bar{y}_{12} -\bar{y}_{22} = 3.00 - 3.6667 = -0.6667\).

The variance–covariance matrices for treatment groups S 1 and S 2 in Fig. 2.4 are

$$\displaystyle{ \hat{\boldsymbol{\Sigma }}_{1} = \left [\begin{array}{ccc} \phantom{ +} 0.6667&& - 1.0000\\ - 1.0000 & &\phantom{ +} 4.6667 \end{array} \right ]\quad \mbox{ and}\quad \hat{\boldsymbol{\Sigma }}_{2} = \left [\begin{array}{ccc} \phantom{ +} 0.3333&& + 0.1667\\ + 0.1667 & &\phantom{ +} 0.3333 \end{array} \right ]\;, }$$

respectively, and the pooled variance–covariance matrix and its inverse are

$$\displaystyle{ \mathbf{S} = \left [\begin{array}{ccc} \phantom{ -} 0.5333&& - 0.5333\\ - 0.5333 & &\phantom{ -} 2.9333 \end{array} \right ]\quad \mbox{ and}\quad \mathbf{S}^{-1} = \left [\begin{array}{ccc} + 2.2917&& + 0.4167 \\ + 0.4167&& + 0.4167 \end{array} \right ]\;, }$$

respectively.Footnote 7

Following Eq. (2.8), the observed value of Hotelling’s T 2 is

$$\displaystyle\begin{array}{rcl} T_{\text{o}}^{2}& =& \frac{n_{1}n_{2}} {N} \left (\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\right )^{{\prime}}\mathbf{S}^{-1}\left (\bar{\mathbf{y}}_{ 1} -\bar{\mathbf{y}}_{2}\right ) {}\\ & =& \frac{(4)(3)} {7} \left [\begin{array}{ccc} + 2.6667\ - 0.6667 \end{array} \right ]\left [\begin{array}{ccc} + 2.2917&& + 0.4167\\ + 0.4167 & & + 0.4167 \end{array} \right ]\left [\begin{array}{c} + 2.6667\\ - 0.6667 \end{array} \right ] {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad = (1.7143)(15.00) = 25.7143 {}\\ \end{array}$$

and the observed F-ratio for Hotelling’s T 2 is

$$\displaystyle{ F_{\text{o}} = \frac{N - r - 1} {r(N - r)} T_{\text{o}}^{2} = \frac{7 - 2 - 1} {2(7 - 2)} (25.7143) = 10.2857\;. }$$

Assuming independence, normality, and homogeneity of variance, F is approximately distributed as Snedecor’s F under the null hypothesis with \(\nu _{1} = r = 2\) and \(\nu _{2} = N - r - 1 = 7 - 2 - 1 = 4\) degrees of freedom. Under the null hypothesis, the observed value of F o = 10.2857 yields an approximate probability value of P = 0.0265. While there is a considerable difference between the exact probability value of P = 0.0857 and the approximate probability value of P = 0.0265, this is not surprising, as Hotelling’s T 2 test was not designed for samples as small as n 1 = 4 and n 2 = 3.
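As a numerical check of Eq. (2.8) and the F transformation, the sketch below uses NumPy and SciPy (assumed available; np.cov employs the unbiased n − 1 divisor, which matches the pooled variance–covariance matrix above):

```python
import numpy as np
from scipy.stats import f as f_dist

# Bivariate scores, Fig. 2.4
g1 = np.array([[5, 1], [4, 6], [5, 2], [6, 3]], dtype=float)
g2 = np.array([[2, 3], [3, 4], [2, 4]], dtype=float)
n1, n2 = len(g1), len(g2)
N, r = n1 + n2, g1.shape[1]

d = g1.mean(axis=0) - g2.mean(axis=0)                  # (+2.6667, -0.6667)
S = ((n1 - 1) * np.cov(g1, rowvar=False)
     + (n2 - 1) * np.cov(g2, rowvar=False)) / (N - 2)  # pooled covariance
T2 = (n1 * n2 / N) * d @ np.linalg.solve(S, d)         # Hotelling's T^2
F = (N - r - 1) / (r * (N - r)) * T2                   # F transformation
p = f_dist.sf(F, r, N - r - 1)                         # upper-tail probability
```

The computation reproduces T 2 = 25.7143, F = 10.2857, and P ≈ 0.0265.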

Following Eq. (2.6) on p. 37, the exact average value of the M = 35 δ values listed in Table 2.4 is μ δ  = 10.0952. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 - \frac{6.6667} {10.0952} = +0.3396\;, }$$

indicating approximately 34 % within-group agreement above that expected by chance.

2.2.5 Example Bivariate MRPP Analysis with v = 1

As mentioned in the univariate example on p. 43, the choice of v can make a substantial difference in the results of an MRPP analysis. To illustrate the computation of MRPP with bivariate data and v = 1, consider the same finite sample of N = 7 objects listed in Fig. 2.4 on p. 46 and let S 1 and S 2 denote an exhaustive partitioning of the N objects into g = 2 disjoint treatment groups. As previously, let S 1 consist of n 1 = 4 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, …, 4, with \(x_{1}^{\,{\prime}} = (5,\,1)\), \(x_{2}^{\,{\prime}} = (4,\,6)\), \(x_{3}^{\,{\prime}} = (5,\,2)\), and \(x_{4}^{\,{\prime}} = (6,\,3)\), and let S 2 consist of n 2 = 3 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, 2, 3 with \(x_{5}^{\,{\prime}} = (2,\,3)\), \(x_{6}^{\,{\prime}} = (3,\,4)\), and \(x_{7}^{\,{\prime}} = (2,\,4)\).

The bivariate response measurement scores for the N = 7 objects are listed in Fig. 2.4 on p. 46 and are replicated in Fig. 2.5 for convenience.

Fig. 2.5
figure 5

Example data with g = 2, r = 2, n 1 = 4, n 2 = 3, and \(N = n_{1} + n_{2} = 7\)

For this example analysis, let r = 2, \(C_{1} = n_{1}/N = 4/7\), \(C_{2} = n_{2}/N = 3/7\), and p = 2, but in this case set v = 1 instead of v = 2, employing ordinary Euclidean distance between objects. Following Eq. (2.1) on p. 30 for treatment group S 1 with n 1 = 4 objects, p = 2, and v = 1, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (1,2)& =& \Big(\big\vert 5 - 4\big\vert ^{2} +\big \vert 1 - 6\big\vert ^{2}\,\Big)^{\!1/2} = 5.0990\;, {}\\ \qquad \qquad \qquad \Delta (1,3)& =& \Big(\big\vert 5 - 5\big\vert ^{2} +\big \vert 1 - 2\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;, {}\\ \qquad \qquad \qquad \Delta (1,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 1 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 2.2361\;, {}\\ \qquad \qquad \qquad \Delta (2,3)& =& \Big(\big\vert 4 - 5\big\vert ^{2} +\big \vert 6 - 2\big\vert ^{2}\,\Big)^{\!1/2} = 4.1231\;, {}\\ \qquad \qquad \qquad \Delta (2,4)& =& \Big(\big\vert 4 - 6\big\vert ^{2} +\big \vert 6 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 3.6056\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 2 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 1.4142\;, {}\\ \end{array}$$

and for treatment group S 2 with n 2 = 3 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 2 - 3\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.4142\;, {}\\ \qquad \qquad \qquad \Delta (5,7)& =& \Big(\big\vert 2 - 2\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (6,7)& =& \Big(\big\vert 3 - 2\big\vert ^{2} +\big \vert 4 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;. {}\\ \end{array}$$

Then, following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle\begin{array}{rcl} \xi _{1}& =& \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2) + \Delta (1,3) + \Delta (1,4) + \Delta (2,3) + \Delta (2,4) + \Delta (3,4)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (5.0990 + 1.0000 + 2.2361 + 4.1231 + 3.6056 + 1.4142\right ) {}\\ & =& 2.9130 {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (5,6) + \Delta (5,7) + \Delta (6,7)\Big] {}\\ & =& \binom{\,3\,}{2}^{\!-1}\left (1.4142 + 1.0000 + 1.0000\right ) = 1.1381\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 1 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{4} {7}\right )(2.9130) + \left (\frac{3} {7}\right )(1.1381) = 2.1523\;. }$$

The N = 7 objects listed in Fig. 2.5 can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{7!} {4!\;3!} = 35 }$$

possible, equally-likely ways. The M = 35 possible arrangements of the observed data in Fig. 2.5, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.5 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 2.1523, obtained from the realized arrangement,

$$\displaystyle{ \{(5,1)(4,6)(5,2)(6,3)\}\quad \{(2,3)(3,4)(2,4)\}\;, }$$

(Order 2 in Table 2.5) is unusual since 33 of the remaining δ values (δ 3 to δ 35) exceed the observed value of δ o = 2.1523 and only one value is less than the observed value: δ 1 = 1.8152. If all arrangements of the N = 7 observed bivariate response measurement scores listed in Fig. 2.5 occur with equal chance, the exact probability value of δ o = 2.1523 computed on the M = 35 possible arrangements of the observed data with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{2} {35} = 0.0571\;. }$$

For comparison, for the bivariate response measurement scores listed in Fig. 2.5 the exact probability value based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 in the first example is P = 0.0857. No comparison is made with the conventional Hotelling T 2 test as Hotelling’s T 2 is undefined for v = 1.

Table 2.5 Permutations of the observed data set in Fig. 2.5 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 1, ordered by values of δ from lowest to highest

Following Eq. (2.6) on p. 37, the exact average value of the M = 35 δ values listed in Table 2.5 is μ δ  = 2.9475. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{2.1523} {2.9475} = +0.2698\;, }$$

indicating approximately 27 % within-group agreement above that expected by chance.
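The v = 1 bivariate analysis can be verified in the same manner as the v = 2 analysis, replacing the squared Euclidean distances with ordinary Euclidean distances. A sketch in Python with only the standard library (function names illustrative):

```python
from itertools import combinations
from math import dist   # ordinary Euclidean distance (Python 3.8+)

# Bivariate scores, Fig. 2.5: S1 = first four points, S2 = last three.
points = [(5, 1), (4, 6), (5, 2), (6, 3), (2, 3), (3, 4), (2, 4)]

def delta(g1, g2):
    """Weighted mean of average Euclidean distances (p = 2, v = 1)."""
    xi = lambda g: (sum(dist(a, b) for a, b in combinations(g, 2))
                    / (len(g) * (len(g) - 1) / 2))
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1) + n2 * xi(g2)) / (n1 + n2)

delta_o = delta(points[:4], points[4:])      # realized arrangement, 2.1523

# All M = 7! / (4! 3!) = 35 equally-likely arrangements.
deltas = [delta([points[i] for i in idx],
                [points[i] for i in range(7) if i not in idx])
          for idx in combinations(range(7), 4)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 2/35
R_o = 1 - delta_o / (sum(deltas) / len(deltas))                     # +0.2698
```

The enumeration reproduces δ o = 2.1523, P = 2∕35 = 0.0571, μ δ  = 2.9475, and the chance-corrected coefficient +0.2698.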

2.3 Coda

Chapter 2 provided the foundation for Multi-Response Permutation Procedures (MRPP), with special emphasis on the generalized Minkowski distance function, \(\Delta (x,y)\), as defined in Eq. (2.1) on p. 30; δ, the weighted mean of the specified distance function values for all distinct pairs of objects in treatment group S i for i = 1, …, g, as defined in Eq. (2.2) on p. 31; and \(\mathfrak{R}\), the chance-corrected within-group coefficient of agreement, as defined in Eq. (2.4) on p. 33. Chapters 3 and 4 provide applications of MRPP for completely randomized data at the interval level of measurement, Chaps. 5 and 6 provide applications of MRPP for completely randomized data at the ordinal (ranked) level of measurement, and Chap. 7 provides applications of MRPP for completely randomized data at the nominal (categorical) level of measurement.

Chapter 3

Chapter 3 establishes the relationship between the MRPP test statistics, δ and \(\mathfrak{R}\), and selected conventional tests and measures designed for the analysis of completely randomized data at the interval level of measurement. Considered in Chap. 3 are Student’s two-sample t test with interval-level univariate response measurement scores, Hotelling’s two-sample T 2 test with interval-level multivariate response measurement scores, one-way fixed-effects analysis of variance (ANOVA) with interval-level univariate response measurement scores, and one-way multivariate analysis of variance (MANOVA) with interval-level multivariate response measurement scores.