
This second chapter of Permutation Statistical Methods introduces a generalized distance function that provides the foundation for a set of multi-response permutation procedures specifically designed for univariate and multivariate completely randomized data. Multi-Response Permutation Procedures (MRPP) were introduced by Mielke, Berry, and Johnson in 1976 and constitute a class of permutation methods for one or more response measurements on each object that were initially developed to distinguish possible differences among two or more groups of objects [300]. The multi-response permutation procedures presented here are based on a generalized Minkowski distance function and provide a synthesizing foundation for a variety of statistical tests and measures for completely randomized data that are further developed in Chaps. 3–7.

2.1 Minkowski Distance Function

Hermann Minkowski (1864–1909), German mathematician and creator of the geometry of numbers, utilized geometrical methods to solve problems in number theory, mathematical physics, and the theory of relativity. Minkowski was a close friend of David Hilbert while teaching at Königsberg University and taught Albert Einstein while employed at the Eidgenössisches Polytechnikum in Zürich (now ETH Zürich). In 1891 Minkowski introduced a measure of metric distance between two points in Crelle’s Journal [310]. The Minkowski metric distance of order p between two points in an r-dimensional Euclidean space, \(x^{{\prime}} = (x_{1},x_{2},\,\ldots,\,x_{r})\) and \(y^{{\prime}} = (y_{1},y_{2},\,\ldots,\,y_{r}) \in \mathbb{R}^{r}\), is given by

$$\displaystyle{ d(x,y) = \left (\sum _{i=1}^{r}\big\vert x_{ i} - y_{i}\big\vert ^{p}\right )^{\!1/p}\;, }$$

where p ≥ 1.

The Minkowski distance function is typically used with p = 1, 2, or \(\infty \). When p = 1, the distance is a first-order Minkowski metric, often called a city-block, Manhattan [231], rectilinear [54], or taxicab [222] metric, the latter named for the distance between two points that a car or taxicab would drive in a city laid out in square blocks. When p = 2, the distance is a second-order Minkowski metric and is the ordinary Euclidean distance between points, a generalization of the Pythagorean theorem to more than two coordinates. When \(p = \infty \), the Minkowski metric is known as the Tchebycheff (Chebyshev), von Neumann, or, in the two-dimensional case, the chess-board Minkowski distance [167].
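The three special orders can be illustrated with a minimal sketch; the function name `minkowski` is illustrative rather than from the text:

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two r-dimensional points."""
    if p == math.inf:
        # p = infinity: the Tchebycheff (Chebyshev) distance
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (1.0, 2.0), (4.0, 6.0)
print(minkowski(x, y, 1))         # city-block: |1 - 4| + |2 - 6| = 7.0
print(minkowski(x, y, 2))         # Euclidean: sqrt(9 + 16) = 5.0
print(minkowski(x, y, math.inf))  # Chebyshev: max(3, 4) = 4.0
```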

Conventional statistical tests and measures, such as t tests, F tests, and ordinary least-squares (OLS) regression and correlation, are based on squared Euclidean distances between response measurement scores, which are not metric. The Minkowski distance function, however, is limited to metric distances and, under its standard definition, cannot accommodate most conventional statistical tests. Therefore, consider a generalized Minkowski distance function given by

$$\displaystyle{ \Delta (x,y) = \left (\sum _{i=1}^{r}\big\vert x_{ i} - y_{i}\big\vert ^{p}\right )^{\!v/p}\;, }$$
(2.1)

where p ≥ 1 and v > 0 [297, p. 5]. When r ≥ 2, p = 2, and v = 1, \(\Delta (x,y)\) is rotationally invariant in an r ≥ 2 dimensional space. When \(v = p = 1\), \(\Delta (x,y)\) is a city-block metric, which is not rotationally invariant. When v = 1 and p = 2, \(\Delta (x,y)\) is an ordinary Euclidean distance metric. And when \(v = p = 2\), \(\Delta (x,y)\) is a squared Euclidean distance, which is not a metric distance function since the triangle inequality is not satisfied.
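A short sketch of Eq. (2.1) also demonstrates the triangle-inequality failure for \(v = p = 2\); the function name `delta` is illustrative:

```python
def delta(x, y, p=2, v=1):
    """Generalized Minkowski distance of Eq. (2.1), with p >= 1 and v > 0."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)

# With v = p = 2 (squared Euclidean distance) the triangle inequality fails
# for three collinear points a, b, c:
a, b, c = (0.0,), (1.0,), (2.0,)
print(delta(a, c, p=2, v=2))                          # 4.0
print(delta(a, b, p=2, v=2) + delta(b, c, p=2, v=2))  # 1.0 + 1.0 = 2.0 < 4.0
```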

2.2 Multi-response Permutation Procedures

Multi-Response Permutation Procedures (MRPP) were originally designed to statistically determine possible differences in one or more response measurements among two or more groups of objects or subjects [300]. Let \(\Omega =\{\omega _{1},\,\ldots,\,\omega _{N}\}\) denote a finite sample of N objects that represents a target population, let \(x_{i}^{\,{\prime}} = (x_{1i},\,\ldots,\,x_{ri})\) be a transposed vector of r commensurate response measurement scores for object \(\omega _{i}\), i = 1, …, N, and let \(S_{1},\,\ldots,\,S_{g}\) designate an exhaustive partitioning of the N objects into g disjoint treatment groups. The MRPP test statistic is a weighted mean given by

$$\displaystyle{ \delta =\sum _{ i=1}^{g}C_{ i}\xi _{i}\;, }$$
(2.2)

where \(C_{i} > 0\) is a positive weight for treatment group \(S_{i}\), i = 1, …, g, \(\sum _{i=1}^{g}C_{i} = 1\),

$$\displaystyle{ \xi _{i} = \binom{n_{i}}{2}^{\!-1}\sum _{ j<k}\Delta (j,k)\,\Psi _{i}(\omega _{j})\,\Psi _{i}(\omega _{k}) }$$
(2.3)

is the average distance-function value for all distinct pairs of objects in treatment group \(S_{i}\), i = 1, …, g, \(n_{i} \geq 2\) is the number of objects classified a priori into treatment group \(S_{i}\), i = 1, …, g,

$$\displaystyle{ N =\sum _{ i=1}^{g}n_{ i}, }$$

\(\sum _{j<k}\) is the sum over all j and k such that 1 ≤ j < k ≤ N, and \(\Psi _{i}(\cdot )\) is an indicator function given by

$$\displaystyle{ \Psi _{i}(\omega _{j}) = \left \{\begin{array}{@{}l@{\quad }l@{}} \,1 \quad &\mbox{ if $\omega _{j} \in S_{i}$}\;, \\ [6pt]\,0\quad &\text{otherwise}\;. \end{array} \right. }$$
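The statistic defined by Eqs. (2.2) and (2.3) can be sketched in a few lines. This is a minimal implementation under one particular choice of weights, \(C_{i} = n_{i}/N\); the function name `mrpp_delta` and the small illustrative data are not from the text:

```python
from itertools import combinations

def mrpp_delta(groups, p=2, v=1):
    """MRPP test statistic delta of Eq. (2.2), using the generalized Minkowski
    distance of Eq. (2.1) and the proportional weights C_i = n_i / N.

    `groups` is a list of g treatment groups, each a list of r-tuples."""
    N = sum(len(group) for group in groups)
    total = 0.0
    for group in groups:
        # xi_i of Eq. (2.3): average distance over all distinct within-group pairs
        pairs = list(combinations(group, 2))
        xi = sum(sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)
                 for x, y in pairs) / len(pairs)
        total += (len(group) / N) * xi  # C_i = n_i / N
    return total

# univariate sketch: two groups, squared Euclidean distance (p = v = 2)
print(round(mrpp_delta([[(5,), (4,)], [(2,), (3,), (7,), (9,)]], v=2), 4))  # 14.8889
```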

The choice of the treatment-group weights, \(C_{1},\,\ldots,\,C_{g}\), and the generalized Minkowski distance function given in Eq. (2.1) on p. 30 specify the structure of MRPP. The original choice of \(C_{i}\) given by Mielke, Berry, and Johnson in 1976 was

$$\displaystyle{ C_{i} = \frac{n_{i}(n_{i} - 1)} {\sum _{j=1}^{g}n_{ j}(n_{j} - 1)} }$$

for i = 1, …, g [300]. However, a variety of other treatment-group weights can be considered; for example,

$$\displaystyle{ C_{i} = \frac{n_{i}} {N}\;,\quad C_{i} = \frac{n_{i} - 1} {N - g}\;,\quad \mbox{ or}\;\quad C_{i} = \frac{1} {g} }$$

for i = 1, …, g. The efficient choice of \(C_{i} = n_{i}/N\), i = 1, …, g, forces the population variance, \(\sigma _{x}^{2}\), to be proportional to \(N^{-2}\) and eliminates all terms of order \(1/N\) in the variance of δ [297, pp. 26, 30].

The null hypothesis (H 0) states that equal probabilities are assigned to each of the

$$\displaystyle{ M = \frac{N!} {\prod _{i=1}^{g}n_{ i}!} }$$

possible, equally-likely allocations of the N objects to the g treatment groups, \(S_{1},\,\ldots,\,S_{g}\). Under H 0 the N multi-response measurements are exchangeable multivariate random variables. The probability value associated with an observed value of δ, \(\delta _{\text{o}}\), is the probability under the null hypothesis (H 0) of observing a value of δ as extreme or more extreme than \(\delta _{\text{o}}\). Thus, an exact probability value for \(\delta _{\text{o}}\) may be expressed as

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} \;. }$$

When M is very large, an approximate probability value for δ may be obtained from a resampling procedure, where

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {L} }$$

and L denotes the number of randomly sampled test statistic values. In practice, L must be large enough to achieve the desired accuracy, a question taken up in the next section.
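The resampling approximation can be sketched as follows. The function and parameter names are illustrative, and any test statistic computed on the two groups may be supplied:

```python
import random

def resampling_pvalue(scores, n1, delta_o, L, stat, seed=1):
    """Approximate P(delta <= delta_o | H0) from L random reallocations of
    the pooled scores into groups of fixed sizes n1 and N - n1."""
    rng = random.Random(seed)
    pool = list(scores)
    count = 0
    for _ in range(L):
        rng.shuffle(pool)  # one random, equally-likely allocation under H0
        if stat(pool[:n1], pool[n1:]) <= delta_o:
            count += 1
    return count / L
```

The estimate `count / L` converges on the exact probability value as L grows, which motivates the discussion of the number of resamplings below.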

Number of Resamplings Necessary

Exact permutation tests are restricted to relatively small samples, given the large number of possible permutations. On the other hand, resampling permutation tests are not limited by the size of the samples. Resampling permutation tests also have been shown to provide good approximations to exact probability values as a function of the number of resamplings considered. An early concern regarding the systematic use of resampling permutation tests was the speed of the computers used for calculating the probability values. Given modern high-speed computers, the question of computational speed is moot when probability values are not too small. The remaining question is: how many resamplings are required for a specified accuracy?

The number of resamplings suggested in books and articles on permutation methods is varied and likely dated due to previous limitations of computer speed and memory. Some authors have proposed from as few as 100 resamplings to as many as 5,000; for example, see discussions by Dwass in 1957 [100]; Hope in 1968 [180]; Edwards in 1985 [110]; Jockel in 1986 [193]; Keller-McNulty and Higgins in 1987 [199]; Bailer in 1989 [16]; Kim, Nelson, and Startz in 1991 [216]; Manly in 1991 [258, pp. 32–35]; McQueen in 1992 [274]; Rickerts and Berry in 1994 [347]; Kennedy in 1995 [212]; Maxim in 1999 [265, p. 356]; Lunneborg in 2000 [256, pp. 210–213]; Good in 2001 [149, p. 47]; Higgins in 2004 [176]; and Edgington and Onghena in 2007 [109, pp. 40–41]. On the other hand, examples provided by Howell as recently as 2007 utilized as many as 10,000 resamplings [184, pp. 642–646]. Resampling computing packages such as Resampling Stats [14] and StatXact [15] typically use 10,000 resamplings as the default value.

The accuracy of a resampling probability value depends on both the probability value (P) and the number of resamplings (L). Confidence limits on the probability value can be obtained from the binomial distribution when L is large. The 1 −α confidence limits of the binomial distribution are given by

$$\displaystyle{ \hat{P }\pm Z_{\alpha /2}\sqrt{\frac{P(1 - P)} {L}} \;, }$$
(2.4)

where P is the probability value in question and \(\hat{P }\) denotes the estimated value of P. Define

$$\displaystyle{ x_{i} = \left \{\begin{array}{@{}l@{\quad }l@{}} \,1 \quad &\mbox{ if $\hat{P }\leq \hat{P } _{\text{o}}$}\;,\\ [6pt]\,0\quad &\text{otherwise} \;, \end{array} \right. }$$

for i = 1, …, L, where \(\hat{P }_{\text{o}}\) denotes the observed value of \(\hat{P }\). Then \(\hat{P }\), the expected value of \(\hat{P }\), the variance of \(\hat{P }\), and the skewness of \(\hat{P }\) are given by

$$\displaystyle\begin{array}{rcl} & \hat{P }= \frac{1} {L}\sum _{i=1}^{L}x_{ i}\;,& {}\\ & \mathrm{E}[\hat{P } ] = P\;, & {}\\ & \sigma _{\hat{P } }^{2} = \frac{P(1-P)} {L} \;,& {}\\ \end{array}$$

and

$$\displaystyle{ \gamma _{\hat{P } } = \frac{1 - 2P} {\sqrt{LP(1 - P)}}\;, }$$

respectively [195, p. 916]. If L is small and P is close to either 0 or 1, the skewness term \(\gamma _{\hat{P } }\) becomes large and Eq. (2.4) may not be appropriate. For example, if L = 100 and P = 0.01,

$$\displaystyle{ \gamma _{\hat{P } } = \frac{1 - 2P} {\sqrt{LP(1 - P)}} = \frac{1 - 2(0.01)} {\sqrt{100(0.01)(1 - 0.01)}} = 0.9849\;. }$$
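The skewness term and the half-width of the confidence limits in Eq. (2.4) are easily computed; the function names here are illustrative:

```python
import math

def skewness(P, L):
    """Skewness of the resampled probability estimate, (1 - 2P) / sqrt(LP(1 - P))."""
    return (1 - 2 * P) / math.sqrt(L * P * (1 - P))

def halfwidth(P, L, z=1.96):
    """Half-width of the asymptotic 95% confidence limits of Eq. (2.4)."""
    return z * math.sqrt(P * (1 - P) / L)

print(round(skewness(0.01, 100), 4))        # 0.9849, matching the text
print(round(skewness(0.01, 1_000_000), 4))  # far smaller with L = 1,000,000
```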

Table 2.1 lists a selected number of probability values (P = 0.50, 0.25, 0.10, 0.05, and 0.01), a variety of resamplings (L = 100; 1,000; 10,000; 1,000,000; and 100,000,000), computed skewness values, errors on the 95 % confidence limits determined from Eq. (2.4), and the simulated lower and upper errors on the 95 % confidence limits based on L resamplings and determined from the smallest value for which the cumulative binomial distribution is equal to or less than 0.025 and equal to or greater than 0.975, respectively. In general, as can be seen from Table 2.1, two additional orders of magnitude are required to increase accuracy by just one decimal place.

Table 2.1 Five probability (P) values, four levels of resampling (L), skewness (\(\gamma _{\hat{P } }\)), and asymptotic and simulated errors on 95 % confidence limits; table adapted from Johnston, Berry, and Mielke [195, p. 917]

To illustrate the number of resamplings required to yield a predetermined number of decimal places of accuracy, given a known probability value, consider the interval-level data listed in Fig. 2.1.

Fig. 2.1

Ordered soil Pb data in mg/kg from two school attendance districts in metropolitan New Orleans

The data listed in Fig. 2.1 are adapted from Berry, Mielke, and Mielke [38] and represent soil lead (Pb) quantities from two school districts in metropolitan New Orleans. Elevated Pb levels have been linked to a number of physiological, neurological, and endocrine effects in children, including difficulties in learning, perception, social behavior, and fine motor skills. The n 1 = 20 soil lead samples collected in District 1 yielded a mean value of \(\bar{x}_{1} = 203.9350\) mg/kg and the n 2 = 20 soil lead samples collected in District 2 yielded a mean value of \(\bar{x}_{2} = 1,661.7800\) mg/kg. There are

$$\displaystyle{ M = \frac{(n_{1} + n_{2})!} {n_{1}!\;n_{2}!} = \frac{(20 + 20)!} {20!\;20!} = 137,846,528,820 }$$
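The count M can be verified directly; Python's standard-library `math.comb` computes the binomial coefficient without overflow:

```python
import math

# M = (n1 + n2)! / (n1! n2!) = C(40, 20) for n1 = n2 = 20
M = math.comb(40, 20)
print(M)  # 137846528820
```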

possible permutations of the soil lead data listed in Fig. 2.1 to be considered. Under the null hypothesis of no difference between the two group means in the population, a Fisher–Pitman permutation F test [38] yields an exact two-sided probability value of

$$\displaystyle{ P\big(F \geq F_{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $F$ values $ \geq F_{\text{o}}$}} {M} = \frac{2,056,423,782} {137,846,528,820} = 0.0149182123 }$$

for the soil lead data listed in Fig. 2.1. Figure 2.2 summarizes the results for eight different resamplings of the data listed in Fig. 2.1 and the associated two-sided resampling probability values with α = 0.05. Each of the probability values was generated using a common seed and the same pseudorandom number generator [197]. The last row of Fig. 2.2 contains the exact probability value based on all M = 137,846,528,820 possible permutations of the soil lead data listed in Fig. 2.1.

Fig. 2.2

Comparison of eight resampled probability values with the exact probability value given in the last row, based on the soil lead data listed in Fig. 2.1

Given the results of the resampling probability analyses listed in Fig. 2.2, L = 1,000,000 is recommended whenever three decimal places of accuracy are required. There are four reasons for promoting L = 1,000,000 resamplings: accuracy, practicality, error, and consistency. First, inspection of Fig. 2.2 indicates that with an exact probability value of P = 0.0149182123 and α = 0.05, L = 1,000,000 resamplings is the minimum number of resamplings necessary to ensure three decimal places of accuracy. Second, given the speed of modern computers and the efficiency of pseudorandom number generators such as the Mersenne Twister, L = 1,000,000 resamplings can be used on a routine basis. Third, there is the potential for additional type I error, the magnitude of which is of concern when the number of resamplings (L) is very small. Fourth, some researchers object to the use of resampling statistics because different pseudorandom number generators and different seeds can produce widely varying results. This is certainly true when L is very small. For example, in Fig. 2.2, L = 100 yields a probability value of P = 0.06. Varying the seed with L = 100 and the same pseudorandom number generator produced observed probability values ranging from P = 0.01 to P = 0.11. However, with L = 1,000,000, varying the seed produced no differences in the third decimal place.

When the number of possible arrangements (M) is very large and the exact probability value (P) is exceedingly small, a resampling permutation procedure may produce no δ values equal to or less than δ o, even with L = 1,000,000, yielding an approximate resampling probability value of P = 0.00. In such cases, moment-approximation permutation procedures based on fitting the first three exact moments of the discrete permutation distribution to a Pearson type III distribution provide approximate probability values, as detailed in Chap. 1, Sect. 1.2.2; see also references [284] and [300].

An Index of Agreement

It is oftentimes desirable to have an index of the amount of agreement among response measurement scores within g treatment groups. A useful measure for this purpose is a chance-corrected within-group coefficient of agreement given by

$$\displaystyle{ \mathfrak{R} = 1 -\frac{\delta } {\mu _{\delta }}\;, }$$
(2.5)

where μ δ is the arithmetic average of the δ values calculated on all possible arrangements of the observed response measurement scores, given by

$$\displaystyle{ \mu _{\delta } = \frac{1} {M}\sum _{i=1}^{M}\delta _{ i}\;. }$$
(2.6)

\(\mathfrak{R}\) is a chance-corrected measure of agreement since \(\mathrm{E}[\mathfrak{R}\vert H_{0}] = 0\). Because μ δ is a constant under H 0, the permutation distributions of δ and \(\mathfrak{R}\) are equivalent, viz.,

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = P\left (\mathfrak{R}\geq \mathfrak{R}_{\text{o}}\vert H_{0}\right )\;, }$$

where

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} }$$

and δ o and \(\mathfrak{R}_{\text{o}}\) denote the observed values of δ and \(\mathfrak{R}\), respectively. Possible values of \(\mathfrak{R}\) range from slightly negative values to a maximum of \(\mathfrak{R} = +1\) for the extreme case when all response measurements on objects within each of the g classified treatment groups are identical, i.e., δ = 0.

The generalized Minkowski distance function, \(\Delta (x,y)\), as defined in Eq. (2.1) on p. 30, determines the analysis space of the MRPP test statistic, δ. The data space in question for almost all statistical analyses is an ordinary Euclidean distance space. If the distance function of the MRPP test statistic is based on p = 2 and v = 1, then the data and analysis spaces are congruent, so that the resulting statistical analyses represent the data in question. Unfortunately, commonly used statistical analyses based on the arithmetic mean, such as Student’s two-sample t test and Fisher’s one-way analysis of variance, are based on \(p = v = 2\), yielding a non-metric squared-distance analysis space that is not congruent with the data space. The difference between the data and analysis spaces associated with the most popular statistical analyses is a reason that problems occur with what should be routine analyses. Examples illustrating this problem are given elsewhere; see, for example, references [41, pp. 404–410] and [297, pp. 50–53]. Any statistical analysis is questionable when the data and analysis spaces are not congruent.

2.2.1 Chance-Corrected Agreement Measures

Chance-corrected measures yield values that are interpreted as a proportion above that expected by chance alone. Chance-corrected agreement measures provide clear and meaningful interpretations of the amount of, or lack of, agreement present in the data. In general, chance-corrected measures of agreement, such as \(\mathfrak{R}\), are equal to + 1 when perfect agreement among the response measurement scores occurs, 0 when agreement is equal to that expected under independence, and negative when agreement among the response measurement scores is less than that expected by chance. For example, define a chance-corrected measure such that

$$\displaystyle{ A_{i} = 100\left (\frac{O_{i} - E_{i}} {N - E_{i}} \right )\;, }$$

where O i and E i denote the observed (earned) score and the expected (chance) score from purely guessing, respectively, on a multiple-choice examination with N questions for the ith student in a class of m students [175, p. 912].

Thus, on a 50-question multiple-choice examination with five choices per question, chance would indicate that a student could answer 50 × 0.20 = 10 questions correctly simply by guessing. If a student answered only eight questions correctly, then a chance-corrected measure of agreement would yield a grade of

$$\displaystyle{ A = 100\left ( \frac{8 - 10} {50 - 10}\right ) = 100\left (\frac{-2} {40} \right ) = -5\;, }$$

since the score was less than expected by chance, i.e., only eight of 50 questions were answered correctly. The lowest grade would occur when a student answered all 50 questions incorrectly, yielding a score of

$$\displaystyle{ A = 100\left ( \frac{0 - 10} {50 - 10}\right ) = 100\left (\frac{-10} {40} \right ) = -25\;. }$$

Note that while a student with the highest possible score of 50 correct answers would score

$$\displaystyle{ A = 100\left (\frac{50 - 10} {50 - 10}\right ) = 100\left (\frac{40} {40}\right ) = 100\;, }$$

the lowest possible score is − 25, not − 100. Thus, the distributions of chance-corrected measures are usually asymmetric.
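The grading scheme above can be written as a one-line function (the name is illustrative) and checked against the three worked scores:

```python
def chance_corrected_grade(observed, N, choices):
    """A = 100 (O - E) / (N - E), with E = N / choices correct expected by guessing."""
    expected = N / choices
    return 100 * (observed - expected) / (N - expected)

# 50 questions, 5 choices each, so E = 10 correct expected by chance
print(chance_corrected_grade(8, 50, 5))   # -5.0
print(chance_corrected_grade(0, 50, 5))   # -25.0
print(chance_corrected_grade(50, 50, 5))  # 100.0
```

The asymmetry of the scale is visible directly: the minimum is -25 while the maximum is +100.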

Since the mean value of \(\mathfrak{R}\) under H 0 is 0, homogeneity of within-classified-group response measurements is associated with \(\mathfrak{R} > 0\), and heterogeneity of within-classified-group response measurements is associated with \(\mathfrak{R}\leq 0\) [28]. The distribution of \(\mathfrak{R}\) is usually asymmetric and the upper and lower bounds depend on both the nature of the data and the structure of δ. The degree of homogeneity or heterogeneity depends on the discrete permutation distribution of \(\mathfrak{R}\). If large values of \(n_{1},\,\ldots,\,n_{g}\) and N are involved, a very small value of \(P(\delta \leq \delta _{\text{o}}\vert H_{0})\) may be associated with a small positive observed value of \(\mathfrak{R}\), say \(\mathfrak{R}_{\text{o}}\). Conversely, with small values of \(n_{1},\,\ldots,\,n_{g}\) and N, a large value of \(\mathfrak{R}_{\text{o}}\) may be associated with a relatively large value of \(P(\delta \leq \delta _{\text{o}}\vert H_{0})\).

2.2.2 Example Univariate MRPP Analysis with v = 2

Although multi-response permutation procedures were originally designed for analyzing multivariate response measurement scores, they can also be used for analyzing univariate data. Consider a comparison between two mutually exclusive groups of objects, S 1 and S 2, where a single response measurement, x, has been obtained from each object. For this example, there is r = 1 response measurement score for each object, g = 2 disjoint groups, and a total of N = 6 objects with n 1 = 2 and n 2 = 4 in treatment groups S 1 and S 2, respectively. Suppose that the n 1 = 2 observed response measurement scores for treatment group S 1 are {5, 4} and the n 2 = 4 response measurement scores for treatment group S 2 are {2, 3, 7, 9}. The treatment-group sizes and the response measurement scores are deliberately kept small to simplify the example analysis. The treatment-group sizes and the univariate response measurement scores are listed in Fig. 2.3.

Fig. 2.3

Example data with g = 2, r = 1, n 1 = 2, n 2 = 4, and \(N = n_{1} + n_{2} = 6\)

For this example analysis, let v = 2, p = 2, r = 1,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{2} {6}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{4} {6}\;, }$$

so that the S 1 and S 2 treatment groups are weighted proportional to their group sizes of n 1 = 2 and n 2 = 4, respectively. For univariate response measurement scores with r = 1, Eq. (2.1) on p. 30 reduces to

$$\displaystyle{ \Delta (j,k) =\Big (\big\vert x_{j} - x_{k}\big\vert ^{p}\Big)^{\!v/p}\;. }$$
(2.7)

Thus, for treatment group S 1 with n 1 = 2 objects, p = 2, and v = 2, the generalized Minkowski distance function yields

$$\displaystyle{ \Delta (1,2) =\Big (\big\vert 5 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;, }$$

and for treatment group S 2 with n = 4 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 2 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 1}1.00\;, {}\\ \qquad \qquad \qquad \Delta (3,5)& =& \Big(\big\vert 2 - 7\big\vert ^{2}\,\Big)^{\!2/2} = 25.00\;, {}\\ \qquad \qquad \qquad \Delta (3,6)& =& \Big(\big\vert 2 - 9\big\vert ^{2}\,\Big)^{\!2/2} = 49.00\;, {}\\ \qquad \qquad \qquad \Delta (4,5)& =& \Big(\big\vert 3 - 7\big\vert ^{2}\,\Big)^{\!2/2} = 16.00\;, {}\\ \qquad \qquad \qquad \Delta (4,6)& =& \Big(\big\vert 3 - 9\big\vert ^{2}\,\Big)^{\!2/2} = 36.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 7 - 9\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 1}4.00\;. {}\\ \end{array}$$

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment groups S i , i = 1, 2, are

$$\displaystyle{ \xi _{1} = \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2)\Big] = \binom{\,2\,}{2}^{\!-1}\left (1.00\right ) = 1.00 }$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (3,4) + \Delta (3,5) + \Delta (3,6) + \Delta (4,5) + \Delta (4,6) + \Delta (5,6)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (1.00 + 25.00 + 49.00 + 16.00 + 36.00 + 4.00\right ) = 21.8333\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{2} {6}\right )(1.00) + \left (\frac{4} {6}\right )(21.8333) = 14.8889\;. }$$

Smaller values of \(\delta _{\text{o}}\) indicate a concentration of response measurement scores within the g treatment groups, whereas larger values of \(\delta _{\text{o}}\) indicate a lack of concentration of response measurement scores within the g treatment groups [301]. The N = 6 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, respectively, with n 1 = 2 and n 2 = 4 response measurement scores preserved in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{6!} {2!\;4!} = 15 }$$

possible, equally-likely ways. The M = 15 possible arrangements of the observed data in Fig. 2.3, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.2 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 14.8889, obtained from the realized arrangement,

$$\displaystyle{ \{5,4\}\quad \{2,3,7,9\}\;, }$$

(Order 9 in Table 2.2) is not unusual since five of the remaining δ values (\(\delta _{11}\mbox{ to }\delta _{15}\)) exceed the observed value of δ o = 14.8889 and 10 values of δ (δ 1 to δ 10) are equal to or less than the observed value. If all arrangements of the N = 6 observed response measurement scores listed in Fig. 2.3 occur with equal chance, the exact probability value of δ o = 14.8889 computed on the M = 15 possible arrangements of the observed data with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{10} {15} = 0.6667\;. }$$
Table 2.2 Permutations of the observed data in Fig. 2.3 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 2, ordered by values of δ from lowest to highest

For comparison, a conventional Student two-sample pooled t test calculated on the N = 6 response measurement scores listed in Fig. 2.3 yields an observed value of \(t_{\text{o}} = -0.3004\). Assuming independence, normality, and homogeneity of variance, t is approximately distributed as Student’s t under the null hypothesis with \(N - 2 = 6 - 2 = 4\) degrees of freedom. Under the null hypothesis, the observed value of \(t_{\text{o}} = -0.3004\) yields an approximate two-sided probability value of P = 0.7789.

Following Eq. (2.6) on p. 37, the exact average value of the M = 15 δ values listed in Table 2.2 is μ δ  = 13.60. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{14.8889} {13.60} = -0.0948\;, }$$

indicating that within-group agreement is slightly less than that expected by chance.
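Because M = 15 is tiny, the full permutation distribution of this example can be enumerated in a few lines and the quantities above reproduced; the variable names are illustrative:

```python
from itertools import combinations

scores = [5, 4, 2, 3, 7, 9]  # pooled data of Fig. 2.3
n1, v = 2, 2                 # group sizes n1 = 2, n2 = 4; squared distance, v = 2

def delta(g1, g2):
    """delta of Eq. (2.2) with C_i = n_i / N and Delta(j, k) = |x_j - x_k|^v."""
    N = len(g1) + len(g2)
    total = 0.0
    for g in (g1, g2):
        pairs = list(combinations(g, 2))
        xi = sum(abs(a - b) ** v for a, b in pairs) / len(pairs)  # Eq. (2.3)
        total += (len(g) / N) * xi
    return total

delta_o = delta([5, 4], [2, 3, 7, 9])  # observed arrangement
# enumerate all M = C(6, 2) = 15 allocations preserving n1 = 2 and n2 = 4
deltas = [delta([scores[i] for i in idx],
                [scores[i] for i in range(6) if i not in idx])
          for idx in combinations(range(6), n1)]
p_value = sum(d <= delta_o + 1e-9 for d in deltas) / len(deltas)
mu = sum(deltas) / len(deltas)
print(round(delta_o, 4), round(p_value, 4), round(mu, 2),
      round(1 - delta_o / mu, 4))  # 14.8889 0.6667 13.6 -0.0948
```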

2.2.3 Example Univariate MRPP Analysis with v = 1

Permutation statistical tests and measures are data-dependent, distribution-free, and non-parametric; consequently, they require no distributional assumptions and make no estimates of population parameters. Thus, it is not necessary to set v = 2 and to square the response-measurement differences between objects. While conventional tests and measures that assume normality must estimate the mean and variance, μ x and \(\sigma _{x}^{2}\), of the normal distribution, both of which are based on squared deviations from the mean, permutation tests and measures do not assume normality and are not restricted to v = 2, which is not a metric distance function. A distance function based on v = 1 is an attractive alternative to v = 2 as it is a metric distance function, satisfies the triangle inequality, is robust to extreme values, provides an easy-to-understand ordinary Euclidean distance between objects, and ensures that the data and analysis spaces are congruent [284–287, 289, 295]. In addition, choosing v = 1 over v = 2 can make a substantial difference in the results of an MRPP analysis; see, for example, a discussion by Mielke and Berry in 2007 [297, pp. 45–50].

To illustrate the computation of δ with v = 1, consider the same finite sample of N = 6 objects listed in Fig. 2.3 on p. 40 and let S 1 and S 2 denote an exhaustive partitioning of the N = 6 objects into g = 2 disjoint treatment groups. As previously, let S 1 consist of n 1 = 2 objects, each with a single response measurement, and let S 2 consist of n 2 = 4 objects, each with a single response measurement.

Given the univariate data listed in Fig. 2.3, let r = 1, p = 2,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{2} {6}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{4} {6}\;, }$$

but in this case set v = 1 instead of v = 2, employing ordinary Euclidean distance instead of squared Euclidean distance between objects. Following Eq. (2.7) on p. 40 for treatment group S 1 with n 1 = 2 objects, p = 2, and v = 1, the generalized Minkowski distance function yields

$$\displaystyle{ \Delta (1,2) =\Big (\big\vert 5 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.00\;, }$$

and for treatment group S 2 with n = 4 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 2 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 1.00\;, {}\\ \qquad \qquad \qquad \Delta (3,5)& =& \Big(\big\vert 2 - 7\big\vert ^{2}\,\Big)^{\!1/2} = 5.00\;, {}\\ \qquad \qquad \qquad \Delta (3,6)& =& \Big(\big\vert 2 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 7.00\;, {}\\ \qquad \qquad \qquad \Delta (4,5)& =& \Big(\big\vert 3 - 7\big\vert ^{2}\,\Big)^{\!1/2} = 4.00\;, {}\\ \qquad \qquad \qquad \Delta (4,6)& =& \Big(\big\vert 3 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 6.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 7 - 9\big\vert ^{2}\,\Big)^{\!1/2} = 2.00\;. {}\\ \end{array}$$

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle{ \xi _{1} = \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2)\Big] = \binom{\,2\,}{2}^{\!-1}\left (1.00\right ) = 1.00 }$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (3,4) + \Delta (3,5) + \Delta (3,6) + \Delta (4,5) + \Delta (4,6) + \Delta (5,6)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (1.00 + 5.00 + 7.00 + 4.00 + 6.00 + 2.00\right ) = 4.1667\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{2} {6}\right )(1.00) + \left (\frac{4} {6}\right )(4.1667) = 3.1111\;. }$$

As in the previous MRPP example with v = 2, the N = 6 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{6!} {2!\;4!} = 15 }$$

possible, equally-likely ways. The M = 15 possible arrangements of the observed data in Fig. 2.3, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.3 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 3.1111, obtained from the realized arrangement,

$$\displaystyle{ \{5,4\}\quad \{2,3,7,9\}\;, }$$

(Order 5 in Table 2.3) is not unusual since eight of the remaining δ values (δ 8 to δ 15) exceed the observed value of δ o = 3.1111 and seven values of δ (δ 1 to δ 7) are equal to or less than the observed value. If all arrangements of the N = 6 observed response measurement scores listed in Fig. 2.3 occur with equal chance, the exact probability value of δ o = 3.1111 computed on the M = 15 possible arrangements of the observed data with n 1 = 2 and n 2 = 4 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{7} {15} = 0.4667\;. }$$

For comparison, for the univariate data listed in Fig. 2.3 the exact probability value based on v = 2, M = 15, and \(C_{i} = n_{i}/N\) for i = 1, 2 in the previous example is P = 0.6667. No comparison is made with the conventional Student two-sample t test as Student’s t test is undefined for v = 1.

Table 2.3 Permutations of the observed data in Fig. 2.3 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 1, ordered by values of δ from lowest to highest

Following Eq. (2.6) on p. 37, the exact average value of the M = 15 δ values listed in Table 2.3 is μ δ  = 3.20. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{3.1111} {3.20} = +0.0278\;, }$$

indicating very little within-group agreement above that expected by chance.
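The full univariate calculation above (the pairwise distances, δ o = 3.1111, the exact probability value, μ δ , and the chance-corrected coefficient) can be verified with a short enumeration. The sketch below, in Python with only the standard library and with illustrative function names rather than any published MRPP software, enumerates all M = 15 arrangements with v = 1 and p = 2:

```python
from itertools import combinations

def xi(group, v=1):
    """Average distance-function value over all distinct pairs (r = 1, p = 2)."""
    pairs = list(combinations(group, 2))
    return sum(abs(a - b) ** v for a, b in pairs) / len(pairs)

def delta(g1, g2, v=1):
    """Weighted mean of the xi values with C_i = n_i / N."""
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1, v) + n2 * xi(g2, v)) / (n1 + n2)

scores = [5, 4, 2, 3, 7, 9]                # observed scores, Fig. 2.3
delta_o = delta([5, 4], [2, 3, 7, 9])      # realized arrangement, 3.1111

# Enumerate all M = 6! / (2! 4!) = 15 equally-likely arrangements.
deltas = [delta([scores[i] for i in idx],
                [scores[i] for i in range(6) if i not in idx])
          for idx in combinations(range(6), 2)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 7/15
mu_delta = sum(deltas) / len(deltas)                                # 3.20
R_o = 1 - delta_o / mu_delta                                        # +0.0278
```

Run on the realized arrangement {5, 4} and {2, 3, 7, 9}, the enumeration reproduces δ o = 3.1111, P = 7∕15 = 0.4667, μ δ  = 3.20, and the chance-corrected coefficient +0.0278.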

2.2.4 Example Bivariate MRPP Analysis with v = 2

In this second example, bivariate response measurement scores are used for simplicity to demonstrate a multivariate MRPP analysis. To illustrate the computation of MRPP with bivariate response measurement scores for each object, consider a finite sample of N = 7 objects and let S 1 and S 2 denote an exhaustive partitioning of the N objects into g = 2 disjoint treatment groups. Further, let S 1 consist of n 1 = 4 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, …, 4, with \(x_{1}^{\,{\prime}} = (5,\,1)\), \(x_{2}^{\,{\prime}} = (4,\,6)\), \(x_{3}^{\,{\prime}} = (5,\,2)\), and \(x_{4}^{\,{\prime}} = (6,\,3)\), and let S 2 consist of n 2 = 3 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, 2, 3 with \(x_{5}^{\,{\prime}} = (2,\,3)\), \(x_{6}^{\,{\prime}} = (3,\,4)\), and \(x_{7}^{\,{\prime}} = (2,\,4)\). The treatment group sizes and the response measurement scores are deliberately kept small to simplify the example analysis. The bivariate response measurement scores for the N = 7 objects are listed in Fig. 2.4.

Fig. 2.4
figure 4

Example data with g = 2, r = 2, n 1 = 4, n 2 = 3, and \(N = n_{1} + n_{2} = 7\)

For this example analysis, let v = 2, p = 2, r = 2,

$$\displaystyle{ C_{1} = \frac{n_{1}} {N} = \frac{4} {7}\;,\quad \mbox{ and}\quad C_{2} = \frac{n_{2}} {N} = \frac{3} {7}\;, }$$

so that the S 1 and S 2 treatment groups are weighted proportional to their group sizes of n 1 = 4 and n 2 = 3, respectively. Following Eq. (2.1) on p. 30 for treatment group S 1 with n 1 = 4 objects, p = 2, and v = 2, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (1,2)& =& \Big(\big\vert 5 - 4\big\vert ^{2} +\big \vert 1 - 6\big\vert ^{2}\,\Big)^{\!2/2} = 26.00\;, {}\\ \qquad \qquad \qquad \Delta (1,3)& =& \Big(\big\vert 5 - 5\big\vert ^{2} +\big \vert 1 - 2\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}1.00\;, {}\\ \qquad \qquad \qquad \Delta (1,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 1 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}5.00\;, {}\\ \qquad \qquad \qquad \Delta (2,3)& =& \Big(\big\vert 4 - 5\big\vert ^{2} +\big \vert 6 - 2\big\vert ^{2}\,\Big)^{\!2/2} = 17.00\;, {}\\ \qquad \qquad \qquad \Delta (2,4)& =& \Big(\big\vert 4 - 6\big\vert ^{2} +\big \vert 6 - 3\big\vert ^{2}\,\Big)^{\!2/2} = 13.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 2 - 3\big\vert ^{2}\,\Big)^{\!2/2} =\phantom{ 2}2.00\;, {}\\ \end{array}$$

and for treatment group S 2 with n 2 = 3 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 2 - 3\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 2.00\;, {}\\ \qquad \qquad \qquad \Delta (5,7)& =& \Big(\big\vert 2 - 2\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (6,7)& =& \Big(\big\vert 3 - 2\big\vert ^{2} +\big \vert 4 - 4\big\vert ^{2}\,\Big)^{\!2/2} = 1.00\;. {}\\ \end{array}$$
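Each Δ value above is a direct evaluation of the generalized Minkowski distance function of Eq. (2.1) with p = 2, v = 2, and r = 2. A minimal sketch in Python (the function name is illustrative):

```python
def delta_xy(x, y, p=2, v=2):
    """Generalized Minkowski distance: (sum_i |x_i - y_i|**p) ** (v / p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (v / p)

delta_xy((5, 1), (4, 6))         # Delta(1,2) = 26.0, squared Euclidean (v = 2)
delta_xy((2, 3), (3, 4))         # Delta(5,6) =  2.0
delta_xy((5, 1), (4, 6), v=1)    # ordinary Euclidean distance, 5.0990
```

Setting v = 1 with p = 2 recovers the ordinary Euclidean distance used in the example of Sect. 2.2.5.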

Then following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle\begin{array}{rcl} \xi _{1}& =& \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2) + \Delta (1,3) + \Delta (1,4) + \Delta (2,3) + \Delta (2,4) + \Delta (3,4)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (26.00 + 1.00 + 5.00 + 17.00 + 13.00 + 2.00\right ) = 10.6667 {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (5,6) + \Delta (5,7) + \Delta (6,7)\Big] {}\\ & =& \binom{\,3\,}{2}^{\!-1}\left (2.00 + 1.00 + 1.00\right ) = 1.3333\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{4} {7}\right )(10.6667) + \left (\frac{3} {7}\right )(1.3333) = 6.6667\;. }$$

The N = 7 objects can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{7!} {4!\;3!} = 35 }$$

possible, equally-likely ways. The M = 35 possible arrangements of the observed bivariate data in Fig. 2.4, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.4 and ordered by the δ values from lowest to highest.

Table 2.4 Permutations of the observed data set in Fig. 2.4 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 2, ordered by values of δ from lowest to highest

The observed MRPP test statistic, δ o = 6.6667, obtained from the realized arrangement,

$$\displaystyle{ \{(5,1)(4,6)(5,2)(6,3)\}\quad \{(2,3)(3,4)(2,4)\}\;, }$$

(Order 3 in Table 2.4) is unusual since 32 of the remaining δ values (δ 4 to δ 35) exceed the observed value of δ o = 6.6667 and only two values of δ are less than the observed value: δ 1 = 4.0000 and δ 2 = 6.4762. If all arrangements of the N = 7 observed bivariate response measurement scores listed in Fig. 2.4 occur with equal chance, the exact probability value of δ o = 6.6667 computed on the M = 35 possible arrangements of the observed data with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{3} {35} = 0.0857\;. }$$
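This exact bivariate probability value can be checked by brute-force enumeration. The sketch below, in Python with only the standard library and illustrative function names, evaluates δ for all M = 35 arrangements using squared Euclidean distances (p = 2, v = 2):

```python
from itertools import combinations

# Bivariate scores, Fig. 2.4: S1 = first four points, S2 = last three.
points = [(5, 1), (4, 6), (5, 2), (6, 3), (2, 3), (3, 4), (2, 4)]

def dist(x, y, v=2):
    """Generalized Minkowski distance with p = 2 and exponent v / p."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** (v / 2)

def delta(g1, g2, v=2):
    """Weighted mean of average within-group distances, C_i = n_i / N."""
    xi = lambda g: (sum(dist(a, b, v) for a, b in combinations(g, 2))
                    / (len(g) * (len(g) - 1) / 2))
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1) + n2 * xi(g2)) / (n1 + n2)

delta_o = delta(points[:4], points[4:])      # realized arrangement, 6.6667

# All M = 7! / (4! 3!) = 35 equally-likely arrangements.
deltas = [delta([points[i] for i in idx],
                [points[i] for i in range(7) if i not in idx])
          for idx in combinations(range(7), 4)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 3/35
mu_delta = sum(deltas) / len(deltas)                                # 10.0952
```

The average of the 35 δ values, μ δ  = 10.0952, also equals the average squared Euclidean distance over all 21 distinct pairs of the N = 7 objects, a convenient check on the enumeration.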

A conventional Hotelling two-sample T 2 test is given by

$$\displaystyle{ T^{2} = \frac{n_{1}n_{2}} {N} \left (\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\right )^{{\prime}}\mathbf{S}^{-1}\left (\bar{\mathbf{y}}_{ 1} -\bar{\mathbf{y}}_{2}\right )\;, }$$
(2.8)

where \(\bar{\mathbf{y}}_{1}\) and \(\bar{\mathbf{y}}_{2}\) denote the vectors of mean response measurement scores for treatment groups S 1 and S 2, n 1 and n 2 are the number of interval-level multivariate response measurement scores in treatment groups S 1 and S 2, and S is a pooled variance–covariance matrix.

For the example data listed in Fig. 2.4, \(\bar{y}_{11} = 5.00\), \(s_{11}^{2} = 0.6667\), \(\bar{y}_{12} = 3.00\), \(s_{12}^{2} = 4.6667\), \(\mathrm{cov}(1,2)_{1} = -1.00\), \(\bar{y}_{21} = 2.3333\), \(s_{21}^{2} = 0.3333\), \(\bar{y}_{22} = 3.6667\), \(s_{22}^{2} = 0.3333\), and \(\mathrm{cov}(1,2)_{2} = +0.1667\). Then the components of the mean-difference vector \(\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\) are \(\bar{y}_{11} -\bar{y}_{21} = 5.00 - 2.3333 = +2.6667\) and \(\bar{y}_{12} -\bar{y}_{22} = 3.00 - 3.6667 = -0.6667\).

The variance–covariance matrices for treatment groups S 1 and S 2 in Fig. 2.4 are

$$\displaystyle{ \hat{\boldsymbol{\Sigma }}_{1} = \left [\begin{array}{ccc} \phantom{ +} 0.6667&& - 1.0000\\ - 1.0000 & &\phantom{ +} 4.6667 \end{array} \right ]\quad \mbox{ and}\quad \hat{\boldsymbol{\Sigma }}_{2} = \left [\begin{array}{ccc} \phantom{ +} 0.3333&& + 0.1667\\ + 0.1667 & &\phantom{ +} 0.3333 \end{array} \right ]\;, }$$

respectively, and the pooled variance–covariance matrix and its inverse are

$$\displaystyle{ \mathbf{S} = \left [\begin{array}{ccc} \phantom{ -} 0.5333&& - 0.5333\\ - 0.5333 & &\phantom{ -} 2.9333 \end{array} \right ]\quad \mbox{ and}\quad \mathbf{S}^{-1} = \left [\begin{array}{ccc} + 2.2917&& + 0.4167 \\ + 0.4167&& + 0.4167 \end{array} \right ]\;, }$$

respectively.Footnote 7

Following Eq. (2.8), the observed value of Hotelling’s T 2 is

$$\displaystyle\begin{array}{rcl} T_{\text{o}}^{2}& =& \frac{n_{1}n_{2}} {N} \left (\bar{\mathbf{y}}_{1} -\bar{\mathbf{y}}_{2}\right )^{{\prime}}\mathbf{S}^{-1}\left (\bar{\mathbf{y}}_{ 1} -\bar{\mathbf{y}}_{2}\right ) {}\\ & =& \frac{(4)(3)} {7} \left [\begin{array}{ccc} + 2.6667\ - 0.6667 \end{array} \right ]\left [\begin{array}{ccc} + 2.2917&& + 0.4167\\ + 0.4167 & & + 0.4167 \end{array} \right ]\left [\begin{array}{c} + 2.6667\\ - 0.6667 \end{array} \right ] {}\\ & & \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad = (1.7143)(15.00) = 25.7143 {}\\ \end{array}$$

and the observed F-ratio for Hotelling’s T 2 is

$$\displaystyle{ F_{\text{o}} = \frac{N - r - 1} {r(N - r)} T_{\text{o}}^{2} = \frac{7 - 2 - 1} {2(7 - 2)} (25.7143) = 10.2857\;. }$$

Assuming independence, normality, and homogeneity of variance, F is approximately distributed as Snedecor’s F under the null hypothesis with \(\nu _{1} = r = 2\) and \(\nu _{2} = N - r - 1 = 7 - 2 - 1 = 4\) degrees of freedom. Under the null hypothesis, the observed value of F o = 10.2857 yields an approximate probability value of P = 0.0265. While there is a considerable difference between the exact probability value of P = 0.0857 and the approximate probability value of P = 0.0265, this is not surprising, as Hotelling’s T 2 test was not designed for samples as small as n 1 = 4 and n 2 = 3.
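As a numerical check of Eq. (2.8) and the F transformation, the sketch below uses NumPy and SciPy (assumed available; np.cov employs the unbiased n − 1 divisor, which matches the pooled variance–covariance matrix above):

```python
import numpy as np
from scipy.stats import f as f_dist

# Bivariate scores, Fig. 2.4
g1 = np.array([[5, 1], [4, 6], [5, 2], [6, 3]], dtype=float)
g2 = np.array([[2, 3], [3, 4], [2, 4]], dtype=float)
n1, n2 = len(g1), len(g2)
N, r = n1 + n2, g1.shape[1]

d = g1.mean(axis=0) - g2.mean(axis=0)                  # (+2.6667, -0.6667)
S = ((n1 - 1) * np.cov(g1, rowvar=False)
     + (n2 - 1) * np.cov(g2, rowvar=False)) / (N - 2)  # pooled covariance
T2 = (n1 * n2 / N) * d @ np.linalg.solve(S, d)         # Hotelling's T^2
F = (N - r - 1) / (r * (N - r)) * T2                   # F transformation
p = f_dist.sf(F, r, N - r - 1)                         # upper-tail probability
```

The computation reproduces T 2 = 25.7143, F = 10.2857, and P ≈ 0.0265.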

Following Eq. (2.6) on p. 37, the exact average value of the M = 35 δ values listed in Table 2.4 is μ δ  = 10.0952. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 - \frac{6.6667} {10.0952} = +0.3396\;, }$$

indicating approximately 34 % within-group agreement above that expected by chance.

2.2.5 Example Bivariate MRPP Analysis with v = 1

As mentioned in the univariate example on p. 43, the choice of v can make a substantial difference in the results of an MRPP analysis. To illustrate the computation of MRPP with bivariate data and v = 1, consider the same finite sample of N = 7 objects listed in Fig. 2.4 on p. 46 and let S 1 and S 2 denote an exhaustive partitioning of the N objects into g = 2 disjoint treatment groups. As previously, let S 1 consist of n 1 = 4 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, …, 4, with \(x_{1}^{\,{\prime}} = (5,\,1)\), \(x_{2}^{\,{\prime}} = (4,\,6)\), \(x_{3}^{\,{\prime}} = (5,\,2)\), and \(x_{4}^{\,{\prime}} = (6,\,3)\), and let S 2 consist of n 2 = 3 objects with r = 2 commensurate response measurement scores (x 1i and x 2i ) on each object for i = 1, 2, 3 with \(x_{5}^{\,{\prime}} = (2,\,3)\), \(x_{6}^{\,{\prime}} = (3,\,4)\), and \(x_{7}^{\,{\prime}} = (2,\,4)\).

The bivariate response measurement scores for the N = 7 objects are listed in Fig. 2.4 on p. 46 and are replicated in Fig. 2.5 for convenience.

Fig. 2.5
figure 5

Example data with g = 2, r = 2, n 1 = 4, n 2 = 3, and \(N = n_{1} + n_{2} = 7\)

For this example analysis, let r = 2, \(C_{1} = n_{1}/N = 4/7\), \(C_{2} = n_{2}/N = 3/7\), and p = 2, but in this case set v = 1 instead of v = 2, employing ordinary Euclidean distance between objects. Following Eq. (2.1) on p. 30 for treatment group S 1 with n 1 = 4 objects, p = 2, and v = 1, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (1,2)& =& \Big(\big\vert 5 - 4\big\vert ^{2} +\big \vert 1 - 6\big\vert ^{2}\,\Big)^{\!1/2} = 5.0990\;, {}\\ \qquad \qquad \qquad \Delta (1,3)& =& \Big(\big\vert 5 - 5\big\vert ^{2} +\big \vert 1 - 2\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;, {}\\ \qquad \qquad \qquad \Delta (1,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 1 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 2.2361\;, {}\\ \qquad \qquad \qquad \Delta (2,3)& =& \Big(\big\vert 4 - 5\big\vert ^{2} +\big \vert 6 - 2\big\vert ^{2}\,\Big)^{\!1/2} = 4.1231\;, {}\\ \qquad \qquad \qquad \Delta (2,4)& =& \Big(\big\vert 4 - 6\big\vert ^{2} +\big \vert 6 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 3.6056\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (3,4)& =& \Big(\big\vert 5 - 6\big\vert ^{2} +\big \vert 2 - 3\big\vert ^{2}\,\Big)^{\!1/2} = 1.4142\;, {}\\ \end{array}$$

and for treatment group S 2 with n 2 = 3 objects, the generalized Minkowski distance function yields

$$\displaystyle\begin{array}{rcl} \qquad \qquad \qquad \Delta (5,6)& =& \Big(\big\vert 2 - 3\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.4142\;, {}\\ \qquad \qquad \qquad \Delta (5,7)& =& \Big(\big\vert 2 - 2\big\vert ^{2} +\big \vert 3 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;, {}\\ \text{and}\qquad \qquad \qquad \qquad \qquad & & {}\\ \qquad \qquad \qquad \Delta (6,7)& =& \Big(\big\vert 3 - 2\big\vert ^{2} +\big \vert 4 - 4\big\vert ^{2}\,\Big)^{\!1/2} = 1.0000\;. {}\\ \end{array}$$

Then, following Eq. (2.3) on p. 31, the average distance-function values for all distinct pairs of objects in treatment group S i , i = 1, 2, are

$$\displaystyle\begin{array}{rcl} \xi _{1}& =& \binom{n_{1}}{2}^{\!-1}\Big[\Delta (1,2) + \Delta (1,3) + \Delta (1,4) + \Delta (2,3) + \Delta (2,4) + \Delta (3,4)\Big] {}\\ & =& \binom{\,4\,}{2}^{\!-1}\left (5.0990 + 1.0000 + 2.2361 + 4.1231 + 3.6056 + 1.4142\right ) {}\\ & =& 2.9130 {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} \xi _{2}& =& \binom{n_{2}}{2}^{\!-1}\Big[\Delta (5,6) + \Delta (5,7) + \Delta (6,7)\Big] {}\\ & =& \binom{\,3\,}{2}^{\!-1}\left (1.4142 + 1.0000 + 1.0000\right ) = 1.1381\;. {}\\ \end{array}$$

Following Eq. (2.2) on p. 31, the observed weighted mean of the \(\xi _{1}\) and \(\xi _{2}\) values, based on v = 1 and \(C_{i} = n_{i}/N\) for i = 1, 2 is

$$\displaystyle{ \delta _{\text{o}} = C_{1}\xi _{1} + C_{2}\xi _{2} = \left (\frac{4} {7}\right )(2.9130) + \left (\frac{3} {7}\right )(1.1381) = 2.1523\;. }$$

The N = 7 objects listed in Fig. 2.5 can be partitioned into g = 2 treatment groups, S 1 and S 2, with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement of the observed data in

$$\displaystyle{ M = \frac{N!} {n_{1}!\;n_{2}!} = \frac{7!} {4!\;3!} = 35 }$$

possible, equally-likely ways. The M = 35 possible arrangements of the observed data in Fig. 2.5, along with the corresponding \(\xi _{1}\), \(\xi _{2}\), and δ values, are listed in Table 2.5 and ordered by the δ values from lowest to highest. The observed MRPP test statistic, δ o = 2.1523, obtained from the realized arrangement,

$$\displaystyle{ \{(5,1)(4,6)(5,2)(6,3)\}\quad \{(2,3)(3,4)(2,4)\}\;, }$$

(Order 2 in Table 2.5) is unusual since 33 of the remaining δ values (δ 3 to δ 35) exceed the observed value of δ o = 2.1523 and only one value is less than the observed value: δ 1 = 1.8152. If all arrangements of the N = 7 observed bivariate response measurement scores listed in Fig. 2.5 occur with equal chance, the exact probability value of δ o = 2.1523 computed on the M = 35 possible arrangements of the observed data with n 1 = 4 and n 2 = 3 response measurement scores preserved for each arrangement is

$$\displaystyle{ P\big(\delta \leq \delta _{\text{o}}\vert H_{0}\big) = \frac{\mbox{ number of $\delta $ values $ \leq \delta _{\text{o}}$}} {M} = \frac{2} {35} = 0.0571\;. }$$

For comparison, for the bivariate response measurement scores listed in Fig. 2.5 the exact probability value based on v = 2 and \(C_{i} = n_{i}/N\) for i = 1, 2 in the first example is P = 0.0857. No comparison is made with the conventional Hotelling T 2 test as Hotelling’s T 2 is undefined for v = 1.

Table 2.5 Permutations of the observed data set in Fig. 2.5 for treatment groups S 1 and S 2 with values for \(\xi _{1}\), \(\xi _{2}\), and δ based on v = 1, ordered by values of δ from lowest to highest

Following Eq. (2.6) on p. 37, the exact average value of the M = 35 δ values listed in Table 2.5 is μ δ  = 2.9475. Thus, the observed chance-corrected coefficient of agreement, following Eq. (2.5) on p. 37, is

$$\displaystyle{ \mathfrak{R}_{\text{o}} = 1 -\frac{\delta _{\text{o}}} {\mu _{\delta }} = 1 -\frac{2.1523} {2.9475} = +0.2698\;, }$$

indicating approximately 27 % within-group agreement above that expected by chance.
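The v = 1 bivariate analysis can be verified in the same manner as the v = 2 analysis, replacing the squared Euclidean distances with ordinary Euclidean distances. A sketch in Python with only the standard library (function names illustrative):

```python
from itertools import combinations
from math import dist   # ordinary Euclidean distance (Python 3.8+)

# Bivariate scores, Fig. 2.5: S1 = first four points, S2 = last three.
points = [(5, 1), (4, 6), (5, 2), (6, 3), (2, 3), (3, 4), (2, 4)]

def delta(g1, g2):
    """Weighted mean of average Euclidean distances (p = 2, v = 1)."""
    xi = lambda g: (sum(dist(a, b) for a, b in combinations(g, 2))
                    / (len(g) * (len(g) - 1) / 2))
    n1, n2 = len(g1), len(g2)
    return (n1 * xi(g1) + n2 * xi(g2)) / (n1 + n2)

delta_o = delta(points[:4], points[4:])      # realized arrangement, 2.1523

# All M = 7! / (4! 3!) = 35 equally-likely arrangements.
deltas = [delta([points[i] for i in idx],
                [points[i] for i in range(7) if i not in idx])
          for idx in combinations(range(7), 4)]

p_exact = sum(d <= delta_o + 1e-10 for d in deltas) / len(deltas)   # 2/35
R_o = 1 - delta_o / (sum(deltas) / len(deltas))                     # +0.2698
```

The enumeration reproduces δ o = 2.1523, P = 2∕35 = 0.0571, μ δ  = 2.9475, and the chance-corrected coefficient +0.2698.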

2.3 Coda

Chapter 2 provided the foundation for Multi-Response Permutation Procedures (MRPP), with special emphasis on the generalized Minkowski distance function, \(\Delta (x,y)\), as defined in Eq. (2.1) on p. 30; δ, the weighted mean of the specified distance function values for all distinct pairs of objects in treatment group S i for i = 1, …, g, as defined in Eq. (2.2) on p. 31; and \(\mathfrak{R}\), the chance-corrected within-group coefficient of agreement, as defined in Eq. (2.4) on p. 33. Chapters 3 and 4 provide applications of MRPP for completely randomized data at the interval level of measurement, Chaps. 5 and 6 provide applications of MRPP for completely randomized data at the ordinal (ranked) level of measurement, and Chap. 7 provides applications of MRPP for completely randomized data at the nominal (categorical) level of measurement.

Chapter 3

Chapter 3 establishes the relationship between the MRPP test statistics, δ and \(\mathfrak{R}\), and selected conventional tests and measures designed for the analysis of completely randomized data at the interval level of measurement. Considered in Chap. 3 are Student’s two-sample t test with interval-level univariate response measurement scores, Hotelling’s two-sample T 2 test with interval-level multivariate response measurement scores, one-way fixed-effects analysis of variance (ANOVA) with interval-level univariate response measurement scores, and one-way multivariate analysis of variance (MANOVA) with interval-level multivariate response measurement scores.