Chapter 10 of The Measurement of Association continues the discussion of fourfold (2×2) contingency tables initiated in Chap. 9, but concentrates on symmetrical 2×2 contingency tables, where each marginal frequency total is equal to N∕2. In the same way that 2×2 contingency tables are special cases of r×c contingency tables, symmetrical 2×2 contingency tables are special cases of fourfold tables. Symmetrical 2×2 tables provide additional insight into the relationships among various measures of association.

Included in Chap. 10 are exact and Monte Carlo permutation statistical methods applied to Pearson’s ϕ 2, Tschuprov’s T 2, Cramér’s V 2, Pearson’s r xy product-moment correlation coefficient, Leik and Gove’s \(d_{N}^{\,c}\) measure of nominal association, Goodman and Kruskal’s t a and t b asymmetric measures, Kendall’s τ b and Stuart’s τ c measures, Somers’ d yx and d xy asymmetric measures, simple percentage differences, D x and D y, Yule’s Y measure of nominal association, and Cohen’s unweighted and weighted κ measures of chance-corrected inter-rater agreement.

Also included in Chap. 10 are some extensions to multiple 2×2 contingency tables and 2×2×2 contingency tables, including the Mantel–Haenszel test for combined 2×2 contingency tables, Cohen’s kappa measure of chance-corrected inter-rater agreement, McNemar’s and Cochran’s Q tests, Fisher’s exact test for 2×2×2 and 2×2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.

10.1 Symmetrical Fourfold Tables

A symmetrical fourfold contingency table is a 2×2 contingency table in which N is even and each marginal frequency total is equal to N∕2. To illustrate the analysis of symmetrical fourfold contingency tables, consider the general layout of a 2×2 table, such as given in Table 10.1, and an example 2×2 frequency table, such as given in Table 10.2, where each marginal frequency total is equal to N∕2 = 12∕2 = 6.

Table 10.1 Notation for variables x and y with categories dummy-coded 0 and 1
Table 10.2 Example 2×2 contingency data for variables x and y with categories dummy-coded 0 and 1

10.1.1 Statistics ϕ 2, T 2, and V 2

For the frequency data given in Table 10.2, Pearson’s chi-squared test statistic is given by

$$\displaystyle \begin{aligned} \chi^{2} = N \left( \sum_{i=1}^{r}\,\sum_{j=1}^{c} \frac{O_{ij}^{2}}{R_{i}C_{j}}-1 \right) \;, \end{aligned}$$

where O ij is the observed cell frequency for i, j = 1, 2, R i denotes a row total for i = 1, 2, and C j denotes a column total for j = 1, 2. Thus, for the frequency data given in Table 10.2,

$$\displaystyle \begin{aligned} \chi^{2} = 12 \left[ \frac{4^{2}+2^{2}+2^{2}+4^{2}}{(6)(6)} -1 \right] = 1.3333\;. \end{aligned}$$

Then, Pearson’s ϕ measure of association is given by

$$\displaystyle \begin{aligned} \phi = \sqrt{\frac{\chi^{2}}{N}} = \sqrt{\frac{1.3333}{12}} = \pm0.3333 \end{aligned}$$

and \(\phi^{2} = (0.3333)^{2} = 0.1111\). Alternatively, using the notation given in Table 10.1,

$$\displaystyle \begin{aligned} \phi = \frac{ad-bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} = \frac{(4)(4)-(2)(2)}{\sqrt{(6)(6)(6)(6)}} = +0.3333\;. \end{aligned}$$

Tschuprov’s measure of nominal association is

$$\displaystyle \begin{aligned} T^{2} = \frac{\chi^{2}}{N \sqrt{(r-1)(c-1)}} = \frac{1.3333}{12 \sqrt{(2-1)(2-1)}} = 0.1111 \end{aligned}$$

and \(T = \sqrt {T^{2}} = \sqrt {0.1111} = 0.3333\). Also, Cramér’s measure of nominal association is

$$\displaystyle \begin{aligned} V^{2} = \frac{\chi^{2}}{N \big[ \min(r-1,c-1) \big]} = \frac{1.3333}{12 \big[ \min(2-1,2-1) \big]} = 0.1111\end{aligned} $$

and \(V = \sqrt {V^{2}} = \sqrt {0.1111} = 0.3333\). Thus, Pearson’s ϕ, Tschuprov’s T, and Cramér’s V are equivalent for a symmetrical 2×2 contingency table.
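
These calculations are easily reproduced numerically. The short Python sketch below (the names table and row_totals are illustrative, not from the text) computes χ 2, ϕ 2, T 2, and V 2 for the frequency data of Table 10.2.

```python
import math

# Observed 2x2 frequencies from Table 10.2: rows are x = 0, 1; columns are y = 0, 1
table = [[4, 2],
         [2, 4]]

N = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Pearson's chi-squared: N * (sum of O_ij^2 / (R_i C_j) - 1)
chi2 = N * (sum(table[i][j] ** 2 / (row_totals[i] * col_totals[j])
                for i in range(2) for j in range(2)) - 1)

phi2 = chi2 / N                                  # Pearson's phi-squared
T2 = chi2 / (N * math.sqrt((2 - 1) * (2 - 1)))   # Tschuprov's T-squared
V2 = chi2 / (N * min(2 - 1, 2 - 1))              # Cramer's V-squared

print(chi2, phi2, T2, V2)   # 1.3333..., 0.1111..., 0.1111..., 0.1111...
```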

10.1.2 Pearson’s r xy Correlation Coefficient

Next, consider Pearson’s product-moment correlation coefficient given by

$$\displaystyle \begin{aligned} r_{xy} = \frac{N \displaystyle \sum_{i=1}^{N}x_{i}y_{i}-\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{\sqrt{ \left[ N \displaystyle \sum_{i=1}^{N}x_{i}^{2}-\left(\sum_{i=1}^{N}x_{i}\right)^{2}\,\right]\left[ N \displaystyle \sum_{i=1}^{N}y_{i}^{2}-\left(\sum_{i=1}^{N}y_{i}\right)^{2}\, \right]}}\;. \end{aligned}$$

The binary-coded (0, 1) data listed in Table 10.3 were obtained from the frequency data given in Table 10.2, where Objects 1 through 4, coded (0, 0), represent the four objects in row 1 and column 1 of Table 10.2; Objects 5 and 6, coded (0, 1), represent the two objects in row 1 and column 2; Objects 7 and 8, coded (1, 0), represent the two objects in row 2 and column 1; and Objects 9 through 12, coded (1, 1), represent the four objects in row 2 and column 2 of Table 10.2.

Table 10.3 Example dummy-coded (0, 1) values from the 2×2 contingency table in Table 10.2

For the binary-coded data listed in Table 10.3,

$$\displaystyle \begin{aligned} N = 12\;, \quad \sum_{i=1}^{N} x_{i} = \sum_{i=1}^{N} x_{i}^{2} = \sum_{i=1}^{N} y_{i} = \sum_{i=1}^{N} y_{i}^{2} = 6\;, \quad \sum_{i=1}^{N} x_{i}y_{i} = +4\;, \end{aligned}$$

Pearson’s product-moment correlation coefficient is

$$\displaystyle \begin{aligned} \begin{array}{rcl} r_{xy} &\displaystyle &\displaystyle = \frac{N \displaystyle \sum_{i=1}^{N}x_{i}y_{i}-\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{\sqrt{ \left[ N \displaystyle \sum_{i=1}^{N}x_{i}^{2}-\left(\sum_{i=1}^{N}x_{i}\right)^{2}\,\right]\left[ N \displaystyle \sum_{i=1}^{N}y_{i}^{2}-\left(\sum_{i=1}^{N}y_{i}\right)^{2}\, \right]}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = \frac{(12)(+4)-(6)(6)}{\sqrt{[(12)(6)-6^{2}][(12)(6)-6^{2}]}} = +0.3333\;, \end{array} \end{aligned} $$

and \(r_{xy}^{2} = (+0.3333)^{2} = 0.1111\).
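
The same value can be obtained directly from the dummy-coded observations; a minimal sketch, with the (0, 1) vectors reconstructed from Table 10.3 as described above and numpy used for the correlation:

```python
import numpy as np

# Dummy-coded (x, y) observations reconstructed from Table 10.3:
# four objects coded (0, 0), two (0, 1), two (1, 0), and four (1, 1)
x = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y = np.array([0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1])

r_xy = np.corrcoef(x, y)[0, 1]
print(r_xy, r_xy ** 2)   # 0.3333..., 0.1111...
```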

10.1.3 Regression Coefficients

For the binary-coded data listed in Table 10.3, the slope (unstandardized regression coefficient) of the regression line with variable y the dependent variable is

$$\displaystyle \begin{aligned} b_{yx} = \frac{N \displaystyle\sum_{i=1}^{N} x_{i}y_{i}-\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{N \displaystyle\sum_{i=1}^{N}x_{i}^{2}-\left( \sum_{i=1}^{N}x_{i} \right)^{2}} = \frac{(12)(+4)-(6)(6)}{(12)(6)-6^{2}} = +0.3333 \end{aligned}$$

and the standardized regression coefficient with variable y the dependent variable is

$$\displaystyle \begin{aligned} \hat{\beta}_{yx} = b_{yx}\left( \frac{s_{x}}{s_{y}} \right) = +0.3333\left( \frac{0.5222}{0.5222} \right) = +0.3333\;. \end{aligned}$$

Also the unstandardized regression coefficient with variable x the dependent variable is

$$\displaystyle \begin{aligned} b_{xy} = \frac{N \displaystyle\sum_{i=1}^{N} x_{i}y_{i}-\sum_{i=1}^{N}x_{i}\sum_{i=1}^{N}y_{i}}{N \displaystyle\sum_{i=1}^{N}y_{i}^{2}-\left( \sum_{i=1}^{N}y_{i} \right)^{2}} = \frac{(12)(+4)-(6)(6)}{(12)(6)-6^{2}} = +0.3333 \end{aligned}$$

and the standardized regression coefficient with variable x the dependent variable is

$$\displaystyle \begin{aligned} \hat{\beta}_{xy} = b_{xy}\left( \frac{s_{y}}{s_{x}} \right) = +0.3333 \left( \frac{0.5222}{0.5222} \right) = +0.3333\;. \end{aligned}$$

Thus it is demonstrated that \(\phi = T = V = r_{xy} = b_{yx} = b_{xy} = \hat {\beta }_{yx}\) = \(\hat {\beta }_{xy}\) for a symmetrical 2×2 contingency table.

10.1.4 Leik and Gove’s \(d_{N}^{\,c}\) Statistic

Leik and Gove’s \(d_{N}^{\,c}\) test statistic for two nominal-level variables is described in detail in Chap. 4, Sect. 4.9. As noted by Leik and Gove , for symmetrical 2×2 contingency tables, \(d_{N}^{\,c}\) is equivalent to the traditional chi-squared-based measures such as Pearson’s ϕ 2, Tschuprov’s T 2, and Cramér’s V 2 [15, p. 291]. Test statistic \(d_{N}^{\,c}\) is based on three r×c contingency tables: one r×c contingency table containing the observed cell frequency values, a second r×c contingency table containing the expected cell frequency values, and a third r×c contingency table containing the maximized cell frequency values. Here, the observed values of concordant pairs, C; discordant pairs, D; pairs tied on variable x, T x; pairs tied on variable y, T y; and pairs tied on both variables x and y, T xy, are indicated without primes, the expected values of concordant pairs, C; discordant pairs, D; pairs tied on variable x, T x; pairs tied on variable y, T y; and pairs tied on both variables x and y, T xy, are indicated with a single prime (), and the maximized values of concordant pairs, C; discordant pairs, D; pairs tied on variable x, T x; pairs tied on variable y, T y; and pairs tied on both variables x and y, T xy, are indicated with double primes (′′).

Consider \(d_{N}^{\,c}\) for a symmetrical 2×2 contingency table, where

$$\displaystyle \begin{aligned} d_{N}^{\,c} = \frac{T_{y}^{\,\prime}-T_{y}}{T_{y}^{\,\prime}-T_{y}^{\,\prime\prime}} = \frac{T_{x}^{\,\prime}-T_{x}}{T_{x}^{\,\prime}-T_{x}^{\,\prime\prime}} = \frac{T_{xy}^{\,\prime}-T_{xy}}{T_{xy}^{\,\prime}-T_{xy}^{\,\prime\prime}} = \frac{(C^{\prime}+D^{\prime})-(C+D)}{(C^{\prime}+D^{\prime})-(C^{\prime\prime}+D^{\prime\prime})}\;. \end{aligned}$$

For the observed data given in Table 10.2 on p. 578, replicated in Table 10.4 for convenience, the observed values of C, D, T x, T y, and T xy are

$$\displaystyle \begin{aligned} C &= ad = (4)(4) = 16\;,\\ D &= bc = (2)(2) = 4\;,\\ T_{x} &= ab+cd = (4)(2)+(2)(4) = 16\;,\\ T_{y} &= ac+bd = (4)(2)+(2)(4) = 16\;,\\ T_{xy} &= \frac{1}{2}\big[ (a)(a-1)+(b)(b-1)+(c)(c-1)+(d)(d-1) \big]\\ &= \frac{1}{2}\big[ (4)(3)+(2)(1)+(2)(1)+(4)(3) \big] = 14\;, \end{aligned} $$

and

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle C+D+T_{x}+T_{y}+T_{xy} = 16+4+16+16+14\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad = \frac{N(N-1)}{2} = \frac{12(12-1)}{2} = 66\;. \end{array} \end{aligned} $$
Table 10.4 Observed values for a 2×2 contingency table with categories dummy-coded 0 and 1

Next, consider the expected values for the observed data in Table 10.4, given in Table 10.5, where

Table 10.5 Expected values for the 2×2 contingency table data in Table 10.4
$$\displaystyle \begin{aligned} E_{11} = E_{12} = E_{21} = E_{22} = \frac{(6)(6)}{12} = 3\;. \end{aligned}$$

For the expected cell values given in Table 10.5,

$$\displaystyle \begin{aligned} C^{\,\prime} &= ad = (3)(3) = 9\;,\\ D^{\,\prime} &= bc = (3)(3) = 9\;,\\ T_{x}^{\,\prime} &= ab+cd = (3)(3)+(3)(3) = 18\;,\\ T_{y}^{\,\prime} &= ac+bd = (3)(3)+(3)(3) = 18\;,\\ T_{xy}^{\,\prime} &= \frac{1}{2} \big[ (a)(a-1)+(b)(b-1)+(c)(c-1)+(d)(d-1) \big]\\ &= \frac{1}{2}\big[ (3)(2)+(3)(2)+(3)(2)+(3)(2) \big] = 12\;, \end{aligned} $$

and

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle C^{\,\prime}+D^{\,\prime}+T_{x}^{\,\prime}+T_{y}^{\,\prime}+T_{xy}^{\,\prime} = 9+9+18+18+12\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad = \frac{N(N-1)}{2} = \frac{12(12-1)}{2} = 66\;. \end{array} \end{aligned} $$

Finally, consider the maximized cell frequencies for the data in Table 10.4, given in Table 10.6. For the maximized values given in Table 10.6,

$$\displaystyle \begin{aligned} C^{\,\prime\prime} &= ad = (6)(6) = 36\;,\\ D^{\,\prime\prime} &= bc = (0)(0) = 0\;,\\ T_{x}^{\,\prime\prime} &= ab+cd = (6)(0)+(0)(6) = 0\;,\\ T_{y}^{\,\prime\prime} &= ac+bd = (6)(0)+(0)(6) = 0\;,\\ T_{xy}^{\,\prime\prime} &= \frac{1}{2} \big[ (a)(a-1)+(b)(b-1)+(c)(c-1)+(d)(d-1) \big]\\ &= \frac{1}{2}\big[ (6)(5)+(6)(5) \big] = 30\;,\end{aligned} $$
Table 10.6 Maximized values for the 2×2 contingency table data in Table 10.4

and

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle C^{\,\prime\prime}+D^{\,\prime\prime}+T_{x}^{\,\prime\prime}+T_{y}^{\,\prime\prime}+T_{xy}^{\,\prime\prime} = 36+0+0+0+30\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad = \frac{N(N-1)}{2} = \frac{12(12-1)}{2} = 66\;. \end{array} \end{aligned} $$

Then, Leik and Gove’s \(d_{N}^{\,c}\) measure is

$$\displaystyle \begin{aligned} d_{N}^{\,c} = \frac{T_{y}^{\,\prime}-T_{y}}{T_{y}^{\,\prime}-T_{y}^{\,\prime\prime}} = \frac{18-16}{18-0} = 0.1111\;, \end{aligned}$$

or

$$\displaystyle \begin{aligned} d_{N}^{\,c} = \frac{T_{x}^{\,\prime}-T_{x}}{T_{x}^{\,\prime}-T_{x}^{\,\prime\prime}} = \frac{18-16}{18-0} = 0.1111\;, \end{aligned}$$

or

$$\displaystyle \begin{aligned} d_{N}^{\,c} = \frac{T_{xy}^{\,\prime}-T_{xy}}{T_{xy}^{\,\prime}-T_{xy}^{\,\prime\prime}} = \frac{12-14}{12-30} = 0.1111\;, \end{aligned}$$

or

$$\displaystyle \begin{aligned} d_{N}^{\,c} = \frac{(C^{\,\prime}+D^{\,\prime})-(C+D)}{(C^{\,\prime}+D^{\,\prime})-(C^{\,\prime\prime}+D^{\,\prime\prime})} = \frac{(9+9)-(16+4)}{(9+9)-(36+0)} = 0.1111\;. \end{aligned}$$

Thus it is demonstrated that \(\phi ^{2} = T^{2} = V^{2} = r_{xy}^{2} = d_{N}^{\,c}\) for a symmetrical 2×2 contingency table.
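
A brief sketch of the computation, in which a hypothetical helper pair_counts returns C, D, T x, T y, and T xy for the observed, expected, and maximized tables:

```python
def pair_counts(a, b, c, d):
    """Return C, D, Tx, Ty, and Txy for a 2x2 table with cells a, b, c, d."""
    C, D = a * d, b * c
    Tx, Ty = a * b + c * d, a * c + b * d
    Txy = (a * (a - 1) + b * (b - 1) + c * (c - 1) + d * (d - 1)) / 2
    return C, D, Tx, Ty, Txy

obs = pair_counts(4, 2, 2, 4)   # observed table (Table 10.4)
exp = pair_counts(3, 3, 3, 3)   # expected table (Table 10.5)
mxm = pair_counts(6, 0, 0, 6)   # maximized table (Table 10.6)

# d_N^c from the tied-on-y counts (index 3); the other three forms agree
d_Nc = (exp[3] - obs[3]) / (exp[3] - mxm[3])
print(d_Nc)   # 0.1111...
```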

10.1.5 Goodman and Kruskal’s t a and t b Statistics

Goodman and Kruskal’s t a and t b measures of nominal association are discussed in Chap. 4, Sect. 4.3. Consider the notation for a 2×2 contingency table given in Table 10.7. For the frequency data given in Table 10.4 on p. 582, Goodman and Kruskal’s asymmetric measure of association with variable a the dependent variable is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle t_{a} = \frac{N \displaystyle\sum_{j=1}^{c}\,\sum_{i=1}^{r} \frac{n_{ij}^{2}}{n_{.j}}-\sum_{i=1}^{r}n_{i.}^{2}}{N^{2}-\displaystyle\sum_{i=1}^{r}n_{i.}^{2}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad = \frac{12 \left( \displaystyle\frac{4^{2}+2^{2}+2^{2}+4^{2}}{6} \right) -6^{2}-6^{2}}{12^{2}-6^{2}-6^{2}} = 0.1111 \end{array} \end{aligned} $$

and Goodman and Kruskal’s asymmetric measure with variable b the dependent variable is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle t_{b} = \frac{N \displaystyle\sum_{i=1}^{r}\,\sum_{j=1}^{c} \frac{n_{ij}^{2}}{n_{i.}}-\sum_{j=1}^{c}n_{.j}^{2}}{N^{2}-\displaystyle\sum_{j=1}^{c}n_{.j}^{2}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad = \frac{12 \left( \displaystyle\frac{4^{2}+2^{2}+2^{2}+4^{2}}{6} \right)-6^{2}-6^{2}}{12^{2}-6^{2}-6^{2}} = 0.1111\;. \end{array} \end{aligned} $$
Table 10.7 Notation for 2×2 contingency data for variables a and b with dummy (0, 1) coding
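
For reference, a small sketch computing t a and t b from the cell frequencies of Table 10.4, following the two formulas above (the array names are illustrative):

```python
import numpy as np

n = np.array([[4, 2],
              [2, 4]], dtype=float)   # cell frequencies of Table 10.4
N = n.sum()
row = n.sum(axis=1)                   # row totals n_i.
col = n.sum(axis=0)                   # column totals n_.j

# t_a: numerator sums n_ij^2 / n_.j, as in the first display above
t_a = (N * (n ** 2 / col).sum() - (row ** 2).sum()) / (N ** 2 - (row ** 2).sum())

# t_b: numerator sums n_ij^2 / n_i., as in the second display above
t_b = (N * (n ** 2 / row[:, None]).sum() - (col ** 2).sum()) / (N ** 2 - (col ** 2).sum())

print(t_a, t_b)   # 0.1111..., 0.1111...
```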

10.1.6 Kendall’s τ b Statistic

Kendall’s τ b measure of ordinal association is detailed in Chap. 5, Sect. 5.4. For the frequency data given in Table 10.4 on p. 582, the number of concordant pairs is

$$\displaystyle \begin{aligned} C = ad = (4)(4) = 16\;,\end{aligned} $$

the number of discordant pairs is

$$\displaystyle \begin{aligned} D = bc = (2)(2) = 4\;, \end{aligned}$$

the number of pairs tied on variable x but not tied on variable y is

$$\displaystyle \begin{aligned} T_{x} = ab+cd = (4)(2)+(2)(4) = 16\;, \end{aligned}$$

and the number of pairs tied on variable y but not tied on variable x is

$$\displaystyle \begin{aligned} T_{y} = ac+bd = (4)(2)+(2)(4) = 16\;. \end{aligned}$$

Then, Kendall’s τ b measure is

$$\displaystyle \begin{aligned} \tau_{b} = \frac{C-D}{\sqrt{(C+D+T_{x})(C+D+T_{y})}} = \frac{16-4}{\sqrt{(16+4+16)(16+4+16)}} = +0.3333\;. \end{aligned}$$

Alternatively, following the notation given in Table 10.1,

$$\displaystyle \begin{aligned} \tau_{b} = \frac{ad-bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} = \frac{(4)(4)-(2)(2)}{\sqrt{(6)(6)(6)(6)}} = +0.3333\;. \end{aligned}$$
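
A short sketch of the same computation from the cell frequencies (names illustrative):

```python
import math

a, b, c, d = 4, 2, 2, 4            # cell frequencies of Table 10.4

C, D = a * d, b * c                # concordant and discordant pairs
Tx = a * b + c * d                 # pairs tied on x only
Ty = a * c + b * d                 # pairs tied on y only

tau_b = (C - D) / math.sqrt((C + D + Tx) * (C + D + Ty))
print(tau_b)   # 0.3333...
```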

10.1.7 Stuart’s τ c Statistic

Stuart’s τ c measure of ordinal association is discussed in Chap. 5, Sect. 5.5 and is given by

$$\displaystyle \begin{aligned} \tau_{c} = \frac{2mS}{N^{2}(m-1)}\;, \end{aligned}$$

where m is the minimum number of rows or columns. For the frequency data given in Table 10.4 on p. 582, \(m = \min (r,c) = \min (2,2) = 2\), the number of concordant pairs is

$$\displaystyle \begin{aligned} C = ad = (4)(4) = 16\;, \end{aligned}$$

the number of discordant pairs is

$$\displaystyle \begin{aligned} D = bc = (2)(2) = 4\;, \end{aligned}$$

Kendall’s S is

$$\displaystyle \begin{aligned} S = C-D = 16-4 = +12\;, \end{aligned}$$

and Stuart’s τ c measure is

$$\displaystyle \begin{aligned} \tau_{c} = \frac{2mS}{N^{2}(m-1)} = \frac{2(2)(+12)}{12^{2}(2-1)} = +0.3333\;. \end{aligned}$$

10.1.8 Somers’ d yx and d xy Statistics

Somers’ d yx and d xy asymmetric measures of ordinal association are discussed in Chap. 5, Sect. 5.7. For the frequency data given in Table 10.4 on p. 582, Somers’ asymmetric measure of association with variable y the dependent variable is

$$\displaystyle \begin{aligned} d_{yx} = \frac{C-D}{C+D+T_{y}} = \frac{16-4}{16+4+16} = +0.3333 \end{aligned}$$

and Somers’ asymmetric measure with variable x the dependent variable is

$$\displaystyle \begin{aligned} d_{xy} = \frac{C-D}{C+D+T_{x}} = \frac{16-4}{16+4+16} = +0.3333\;. \end{aligned}$$

Alternatively,

$$\displaystyle \begin{aligned} d_{yx} = \frac{ad-bc}{(a+b)(c+d)} = \frac{(4)(4)-(2)(2)}{(6)(6)} = +0.3333 \end{aligned}$$

and

$$\displaystyle \begin{aligned} d_{xy} = \frac{ad-bc}{(a+c)(b+d)} = \frac{(4)(4)-(2)(2)}{(6)(6)} = +0.3333\;. \end{aligned}$$

10.1.9 Percentage Differences

Percentage differences are discussed in Chap. 9, Sect. 9.10. For the frequency data given in Table 10.4 on p. 582, the percentage difference for variable x is

$$\displaystyle \begin{aligned} D_{x} = \left| \frac{a}{a+b}-\frac{c}{c+d} \right| = \left| \frac{4}{6}-\frac{2}{6} \right| = |0.6667-0.3333| = 0.3333 \end{aligned}$$

and the percentage difference for variable y is

$$\displaystyle \begin{aligned} D_{y} = \left| \frac{a}{a+c}-\frac{b}{b+d} \right| = \left| \frac{4}{6}-\frac{2}{6} \right| = |0.6667-0.3333| = 0.3333\;. \end{aligned}$$

10.1.10 Yule’s Y Statistic

Yule’s Y measure of nominal association is discussed in Chap. 9, Sect. 9.6. For the frequency data given in Table 10.4 on p. 582, Yule’s coefficient of colligation is

$$\displaystyle \begin{aligned} Y = \frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}} = \frac{\sqrt{(4)(4)}-\sqrt{(2)(2)}}{\sqrt{(4)(4)}+\sqrt{(2)(2)}} = +0.3333\;. \end{aligned}$$

10.1.11 Cohen’s κ Statistic

Cohen’s unweighted kappa measure of inter-rater agreement is discussed in Chap. 4, Sect. 4.5, and Cohen’s linear and quadratic weighted kappa measures of inter-rater agreement are discussed in Chap. 6, Sect. 6.5. For the frequency data given in Table 10.4 on p. 582, let O ii for i = 1, 2 denote the observed cell frequencies on the principal diagonal and E ii for i = 1, 2 denote the expected cell frequencies on the principal diagonal. Then, Cohen’s unweighted chance-corrected coefficient of inter-rater agreement is

$$\displaystyle \begin{aligned} \kappa = \frac{\displaystyle\sum_{i=1}^{r} O_{ii}-\sum_{i=1}^{r} E_{ii}}{N - \displaystyle\sum_{i=1}^{r} E_{ii}} = \frac{(4+4)-(3+3)}{12-(3+3)} = +0.3333\;. \end{aligned}$$

Cohen’s weighted kappa measure of inter-rater agreement for b = 2 judges and c categories is given by

$$\displaystyle \begin{aligned} \kappa_{w} = 1-\frac{N \displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij}n_{ij}}{\displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij} R_{i} C_{j}}\;, \end{aligned} $$
(10.1)

where n ij denotes the observed cell frequencies, w ij denotes the cell weights, R i and C j denote the observed row and column marginal frequency totals for i, j = 1, …, c, and

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{c}\,\sum_{j=1}^{c} n_{ij} \end{aligned}$$

denotes the table frequency total. For Cohen’s unweighted kappa measure of inter-rater agreement, the cell disagreement “weights” are given by

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,1 & \text{otherwise ,} \end{cases} \end{aligned}$$

and for Cohen’s weighted kappa measure of inter-rater agreement the cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,|i-j| & \text{otherwise ,} \end{cases} \end{aligned}$$

for linear weighting, and

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,(i-j)^{2} & \text{otherwise ,} \end{cases} \end{aligned}$$

for quadratic weighting. For the frequency data given in Table 10.4 on p. 582, Cohen’s linear-weighted kappa measure of inter-rater agreement is κ w = +0.3333 and Cohen’s quadratic-weighted kappa measure of inter-rater agreement is κ w = +0.3333.

10.2 Inter-relationships Among the Measures

The inter-relationships among the various measures for a symmetrical 2×2 contingency table can be summarized as follows. The Pearson product-moment correlation coefficient, r xy; the unstandardized slopes of the two regression lines, b yx and b xy; Yule’s coefficient of colligation, Y ; Pearson’s mean-square contingency coefficient, ϕ; Tschuprov’s T measure; Cramér’s V measure; Kendall’s τ b measure; Stuart’s τ c measure; Somers’ d yx and d xy asymmetric measures; the two percentage differences, D x and D y; and Cohen’s κ unweighted and weighted measures of chance-corrected inter-rater agreement are all equivalent measures, i.e.,

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle r_{xy} = b_{yx} = b_{xy} = Y = \phi = T = V = \tau_{b} = \tau_{c} = d_{yx} = d_{xy}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = D_{x} = D_{y} = \kappa = \kappa_{w}\;. \end{array} \end{aligned} $$

Also, Pearson’s squared product-moment correlation coefficient, \(r_{xy}^{2}\); Pearson’s mean-squared contingency coefficient, ϕ 2; Tschuprov’s T 2 measure; Cramér’s V 2 measure; Leik and Gove’s \(d_{N}^{\,c}\) measure; and Goodman and Kruskal’s t b and t a measures of association are all equivalent measures, i.e.,

$$\displaystyle \begin{aligned} r_{xy}^{2} = \phi^{2} = T^{2} = V^{2} = d_{N}^{\,c} = t_{b} = t_{a}\;. \end{aligned}$$
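
These equivalences are easy to confirm numerically for any symmetrical 2×2 table. The sketch below, which assumes the a, b, c, d layout of Table 10.1 and uses an illustrative function name, computes several of the listed measures directly from the cell frequencies; for the example data every entry equals 0.3333.

```python
import math

def symmetric_2x2_measures(a, b, c, d):
    """Several of the measures listed above, computed from the cells of a
    symmetrical 2x2 table (all marginal totals equal to N/2)."""
    N = a + b + c + d
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    C, D = a * d, b * c
    Tx, Ty = a * b + c * d, a * c + b * d
    tau_b = (C - D) / math.sqrt((C + D + Tx) * (C + D + Ty))
    tau_c = 2 * 2 * (C - D) / (N ** 2 * (2 - 1))
    d_yx = (C - D) / (C + D + Ty)
    d_xy = (C - D) / (C + D + Tx)
    D_x = abs(a / (a + b) - c / (c + d))
    D_y = abs(a / (a + c) - b / (b + d))
    Y = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
    kappa = ((a + d) - N / 2) / (N - N / 2)   # expected diagonal total is N/2 here
    return phi, tau_b, tau_c, d_yx, d_xy, D_x, D_y, Y, kappa

print(symmetric_2x2_measures(4, 2, 2, 4))   # every entry equals 0.3333...
```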

10.2.1 Notational Inconsistencies

The notation for measures of association for 2×2 contingency tables in particular, and r×c contingency tables in general, can be very confusing. First, some measures are denoted by uppercase Latin letters, e.g., Yule’s Q and Y, Tschuprov’s T 2, and Cramér’s V 2; some measures are denoted by lowercase Latin letters, e.g., Somers’ d yx and d xy, Leik and Gove’s \(d_{N}^{\,c}\), and Goodman and Kruskal’s t a and t b; and some measures are denoted by lowercase Greek letters, e.g., Pearson’s ϕ, Kendall’s τ b, and Cohen’s κ. While it would be preferable to reserve Greek letters for population parameters that are being estimated by sample statistics and Latin letters for sample statistics, once symbols are in common use it is difficult to standardize usage. Second, certain measures of association appear as squared, whereas others do not. In particular, for the 2×2 case, the non-squared symbols t b and t a for Goodman and Kruskal’s asymmetric measures of nominal association are equivalent to Pearson’s symmetric measures ϕ 2 and \(r_{xy}^{2}\). Third, some measures norm between 0 and 1 for 2×2 contingency tables, e.g., Goodman and Kruskal’s t a and t b; others norm between − 1 and + 1, e.g., Kendall’s τ b and Cramér’s V; and still others norm between 0 and ∞, e.g., the odds ratio. Finally, some measures identify the two variables as x and y, e.g., Somers’ d yx and d xy, while others identify the two variables as a and b, e.g., Kendall’s τ a and τ b.

10.3 Extended Fourfold Contingency Tables

In some cases, measures of association originally introduced for fourfold tables have either been extended to analyze a series of 2×2 contingency tables or redesigned for multidimensional contingency tables with two categories in each dimension. In this section a small number of such measures are considered, including the Mantel–Haenszel test, McNemar’s Q test, Cochran’s Q test, Cohen’s chance-corrected measure of inter-rater agreement, Fisher’s exact probability test for 2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.

10.4 The Mantel–Haenszel Test

The Mantel–Haenszel test, developed by Nathan Mantel and William Haenszel in 1959, is a test of significance for S combined 2×2 contingency tables. Suppose that a treatment is compared with a control in each of S strata, where the outcome is binary: success or failure. Of interest is whether or not the treatment increases the probability of success.

Let n ijk denote the cell frequency for i, j = 1, 2 discrete categories and k = 1, …, S discrete strata for a 2×2×S contingency table. Table 10.8 illustrates a three-way contingency table with r = 2 rows, c = 2 columns, and S strata. Denote by a dot (⋅) the partial sum of all rows, all columns, or all strata, depending on the position of the (⋅) in the subscript list. If the (⋅) is in the first subscript position, the sum is over all rows; if the (⋅) is in the second subscript position, the sum is over all columns; and if the (⋅) is in the third subscript position, the sum is over all strata. Thus, n i.. denotes the marginal frequency total of the ith row, i = 1, 2, summed over all columns and strata; n .j. denotes the marginal frequency total of the jth column, j = 1, 2, summed over all rows and strata; n ..k denotes the marginal frequency total of the kth stratum, k = 1, …, S, summed over all rows and columns; and n denotes the table frequency total. The Mantel–Haenszel statistical model, under the null hypothesis, states that the S 2×2 contingency tables are independent and the marginal frequency totals for each of the 2×2 contingency tables are fixed [17]. Then, the probability for the n 11k frequency of each of the 2×2 contingency tables under the null hypothesis is the hypergeometric point probability value given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} &\displaystyle &\displaystyle p \left( n_{11k}|n_{1.k},n_{.1k},n_{..k}\right) = \binom{n_{.1k}}{n_{11k}}\binom{n_{.2k}}{n_{12k}}\binom{n_{..k}}{n_{1.k}}^{-1} \\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad = \frac{n_{1.k}!\;n_{2.k}!\;n_{.1k}!\;n_{.2k}!}{n_{..k}!\;n_{11k}!\;n_{12k}!\;n_{21k}!\;n_{22k}!}\;, \end{array} \end{aligned} $$
(10.2)

where n ..k = n 11k + n 12k + n 21k + n 22k, n 2.k = n ..k − n 1.k, n .2k = n ..k − n .1k, and k = 1, …, S.

Table 10.8 General layout of a 3-way contingency table with r = 2 rows, c = 2 columns, and S strata

The test statistic of interest is given by

$$\displaystyle \begin{aligned} T = \sum_{k=1}^{S} n_{11k}\;, \end{aligned}$$

where the summation is over only one cell since for any 2×2 contingency table with fixed marginal frequency totals the entry in any one cell determines the entries in the remaining three cells.

Under the null hypothesis (H 0) of the model in Eq. (10.2), the mean and variance of test statistic T are given by

$$\displaystyle \begin{aligned} \mathrm{E}\left[ T|H_{0} \right] = \sum_{k=1}^{S} \frac{n_{1.k}\, n_{.1k}}{n_{..k}} \end{aligned}$$

and

$$\displaystyle \begin{aligned} \mathrm{VAR} \left( T|H_{0} \right) = \sum_{k=1}^{S} \frac{n_{1.k}\,n_{2.k}\,n_{.1k}\,n_{.2k}}{(n_{..k})^{2}(n_{..k}-1)}\;, \end{aligned}$$

respectively. The Mantel–Haenszel test statistic, corrected for continuity, is given by

$$\displaystyle \begin{aligned} M = \frac{\Big( \big| T-\mathrm{E}[T|H_{0}] \big|-\frac{1}{2} \Big)^{2}}{\mathrm{VAR}(T|H_{0})}\;. \end{aligned}$$

The Mantel–Haenszel test statistic, M, is approximately distributed as Pearson’s chi-squared with one degree of freedom as N → ∞.
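
A minimal sketch of the computation just described, with each of the S strata supplied as a (n11, n12, n21, n22) tuple; the function name is illustrative, and scipy is used only for the convenience of the chi-squared p-value.

```python
from scipy.stats import chi2

def mantel_haenszel(strata):
    """Continuity-corrected Mantel-Haenszel statistic M for a collection of
    2x2 tables, each given as a (n11, n12, n21, n22) tuple."""
    T = ET = VT = 0.0
    for n11, n12, n21, n22 in strata:
        r1, r2 = n11 + n12, n21 + n22            # row totals n_1.k, n_2.k
        c1, c2 = n11 + n21, n12 + n22            # column totals n_.1k, n_.2k
        n = r1 + r2                              # stratum total n_..k
        T += n11
        ET += r1 * c1 / n
        VT += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    M = (abs(T - ET) - 0.5) ** 2 / VT
    return M, chi2.sf(M, df=1)                   # M and its approximate p-value
```

Passing the S stratum tables as a list of tuples yields the test statistic and its chi-squared approximation, as used in the example analysis that follows.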

10.4.1 Example Analysis

Consider the example data set given in Table 10.9 with r = 2 rows, c = 2 columns, S = 3 strata, and a total of 74 observations. For the data listed in Table 10.9, the observed value of test statistic T is

$$\displaystyle \begin{aligned} T_{\text{o}} = \sum_{k=1}^{S} n_{11k} = 2+2+4 = 8.00\;, \end{aligned}$$

the expected value of T under the null hypothesis is

$$\displaystyle \begin{aligned} \mathrm{E}[T|H_{0}] = \sum_{k=1}^{S} \frac{n_{1.k}\,n_{.1k}}{n_{..k}} = \frac{(3)(7)}{32}+\frac{(4)(4)}{24}+\frac{(5)(5)}{18} = 2.7118\;, \end{aligned}$$

the variance of T is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \mathrm{VAR} \left( T|H_{0} \right) = \sum_{k=1}^{S} \frac{n_{1.k}\,n_{2.k}\,n_{.1k}\,n_{.2k}}{(n_{..k})^{2}(n_{..k}-1)}\\ &\displaystyle &\displaystyle \qquad \qquad = \frac{(3)(29)(7)(25)}{(32)^{2}(32-1)}+\frac{(4)(20)(4)(20)}{(24)^{2}(24-1)}+\frac{(5)(13)(5)(13)}{(18)^{2}(18-1)} = 1.7272 \;, \end{array} \end{aligned} $$
Table 10.9 General layout of a 3-way contingency table with r = 2 rows, c = 2 columns, and S = 3 strata

and the observed Mantel–Haenszel test statistic is

$$\displaystyle \begin{aligned} M_{\text{o}} = \frac{\Big( \big| T_{\text{o}}-\mathrm{E}[T|H_{0}] \big|-\frac{1}{2} \Big)^{2}}{\mathrm{VAR}(T|H_{0})} = \frac{\left( \big| 8.00-2.7118 \big|-\frac{1}{2} \right)^{2}}{1.7272} = 13.2742\;. \end{aligned} $$
(10.3)

Mantel and Haenszel’s M test statistic is approximately distributed as Pearson’s chi-squared with one degree of freedom. For the observed value of M o = 13.2742 the approximate chi-squared probability value is P = 0.2691×10−3.

In Eq. (10.3), E[T|H 0], VAR(T|H 0), and the correction factor are all invariant under permutation, leaving only test statistic T to vary. Thus, for the data listed in Table 10.9 the approximate Monte Carlo resampling probability value based on L = 1, 000, 000 random arrangements of the observed data under the null hypothesis is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle P(M \geq M_{\text{o}}|H_{0}) = \frac{\text{number of }M\text{ values } \geq M_{\text{o}}}{L}\\ &\displaystyle &\displaystyle \qquad \qquad = P(T \geq T_{\text{o}}|H_{0}) = \frac{\text{number of }T\text{ values } \geq T_{\text{o}}}{L} = \frac{2{,}555}{1{,}000{,}000}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad = 0.2555 {\times} 10^{-2}\;. \end{array} \end{aligned} $$
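
Because only T varies under permutation, the resampling procedure amounts to drawing n 11k independently from each stratum’s hypergeometric distribution and comparing the sum with T o. A sketch under that assumption, using the stratum marginals of the worked example and a smaller number of random arrangements for speed:

```python
import numpy as np

rng = np.random.default_rng()

# Stratum marginals (n_1.k, n_.1k, n_..k) taken from the worked example above
margins = [(3, 7, 32), (4, 4, 24), (5, 5, 18)]
T_obs = 8
L = 100_000                        # number of random arrangements (reduced here)

count = 0
for _ in range(L):
    # Under H0 with fixed margins, n_11k is hypergeometric within each stratum
    T = sum(rng.hypergeometric(c1, n - c1, r1) for r1, c1, n in margins)
    count += T >= T_obs

print(count / L)                   # approximate P(T >= T_o | H0)
```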

10.4.2 Measures of Effect Size

Two types of measures of effect size have been proposed to represent the strength of a treatment effect [32]. One type, designated the d-family, is based on one or more measures of the differences between groups or levels of an independent variable. Representative of the d-family is Cohen’s d, which expresses the effect size as the number of standard deviations separating the means of the groups or levels [8]. The second type of measure of effect size, designated the r-family, represents some form of correlation or association between the independent and dependent variables. Measures in the r-family are typically measures of correlation or association, the most prominent being Pearson’s squared product-moment correlation coefficient. Since the Mantel–Haenszel test is based on a 2×2×S contingency table, the d-family is not applicable.

The r-family of measures of effect size contains two types: maximum-corrected and chance-corrected. Maximum-corrected measures of effect size standardize the observed test statistic value by the maximum possible value of the test statistic. Maximum-corrected measures of effect size are bounded between 0 and 1 and are interpretable as the proportion of the maximum possible value of the test statistic. On the other hand, chance-corrected measures of effect size standardize the observed test statistic value by the expected value of the test statistic. Chance-corrected measures of effect size can attain a maximum value of + 1, but may be less than 0 when the test statistic value is less than expected by chance, and are interpretable as the proportion above, or below, what is expected by chance. In 2010 Berry, Johnston, and Mielke developed two measures of effect size for the Mantel–Haenszel test statistic: a maximum-corrected and a chance-corrected measure of effect size [2].

10.4.2.1 Maximum-Corrected Measure of Effect Size

Let M o and T o denote the observed values of M and T, respectively. Then, the maximum-corrected measure of effect size is given by M o divided by the maximum possible value of M. The maximum value of T for an observed 2×2×S contingency table is given by

$$\displaystyle \begin{aligned} T_{\text{max}} = \sum_{k=1}^{S} \min(n_{1.k}, n_{.1k})\;, \end{aligned}$$

where \(\min (n_{1.k}, n_{.1k})\) is the maximum value of n 11k in the kth of S 2×2 contingency tables. Thus, the maximum value of M is given by

$$\displaystyle \begin{aligned} M_{\text{max}} = \frac{\Big( \big| T_{\text{max}}-\mathrm{E}[T|H_{0}] \big|-\frac{1}{2} \Big)^{2}}{\mathrm{VAR}(T|H_{0})} \end{aligned}$$

and the maximum-corrected measure of effect size for M is given by the observed value of M divided by the maximum value of M, i.e.,

$$\displaystyle \begin{aligned} \mathrm{ES}_{\text{M}} = \frac{M_{\text{o}}}{M_{\text{max}}}\;. \end{aligned}$$

For the frequency data given in Table 10.9, the maximum value of T is

$$\displaystyle \begin{aligned} T_{\text{max}} = \sum_{k=1}^{S} \min(n_{1.k}, n_{.1k}) = 3+4+5 = 12.00\;, \end{aligned}$$

the maximum value of M is

$$\displaystyle \begin{aligned} M_{\text{max}} = \frac{\Big( \big| T_{\text{max}}-\mathrm{E}[T|H_{0}] \big|-\frac{1}{2} \Big)^{2}}{\mathrm{VAR}(T|H_{0})} = \frac{\left( \big| 12.00-2.7118 \big| -\frac{1}{2} \right)^{2}}{1.7272} = 44.7162\;, \end{aligned}$$

and the maximum-corrected measure of effect size is

$$\displaystyle \begin{aligned} \mathrm{ES}_{\text{M}} = \frac{M_{\text{o}}}{M_{\text{max}}} = \frac{13.2742}{44.7162} = 0.2969\;, \end{aligned}$$

indicating that M o = 13.2742 accounts for approximately 30% of the maximum value of M, given the observed row, column, and stratum marginal frequency distributions, {12, 62}, {16, 58}, and {32, 24, 18}, respectively.

10.4.2.2 Chance-Corrected Measure of Effect Size

A chance-corrected measure of effect size for the Mantel–Haenszel test may be given by statistic M, standardized by the expected value of M. Thus, the chance-corrected measure is given by

$$\displaystyle \begin{aligned} \mathrm{ES}_{\text{C}} = \frac{M-\mathrm{E}[M|H_{0}]}{M_{\text{max}}-\mathrm{E}[M|H_{0}]} = 1-\frac{M_{\text{max}}-M}{M_{\text{max}}-1}\;, \end{aligned}$$

where E[M] = 1 since the mean of a chi-squared distribution is equal to the degrees of freedom and M is approximately distributed as chi-squared with one degree of freedom. For the frequency data given in Table 10.9, the chance-corrected measure of effect size is

$$\displaystyle \begin{aligned} \mathrm{ES}_{\text{C}} = 1-\frac{44.7162-13.2742}{44.7162-1} = +0.2808\;, \end{aligned}$$

indicating that M o = 13.2742 accounts for approximately 28% above what is expected by chance. In general, chance-corrected measures of effect size, such as ES C, tend to be slightly smaller than maximum-corrected measures, such as ES M, for the same set of data [2, pp. 398–399].
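
A sketch combining the two effect sizes, again with each stratum supplied as a (n11, n12, n21, n22) tuple and with illustrative function names:

```python
def mh_effect_sizes(strata, T_obs):
    """Maximum-corrected (ES_M) and chance-corrected (ES_C) effect sizes for
    the Mantel-Haenszel statistic; strata are (n11, n12, n21, n22) tuples."""
    ET = VT = T_max = 0.0
    for n11, n12, n21, n22 in strata:
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        n = r1 + r2
        ET += r1 * c1 / n
        VT += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
        T_max += min(r1, c1)                     # largest attainable n_11k
    M_obs = (abs(T_obs - ET) - 0.5) ** 2 / VT
    M_max = (abs(T_max - ET) - 0.5) ** 2 / VT
    ES_M = M_obs / M_max
    ES_C = 1 - (M_max - M_obs) / (M_max - 1)     # E[M | H0] = 1 (chi-squared, 1 df)
    return ES_M, ES_C
```

For the worked example the two values are approximately 0.297 and 0.281, in line with the figures reported above.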

10.5 Cohen’s Kappa Measure

In 1960 Jacob Cohen introduced statistic kappa, an unweighted, chance-corrected measure of inter-rater agreement between two judges for a set of c disjoint, unordered categories [6]. In 1968 Cohen expanded kappa to include weighting for measuring the agreement between two judges for a set of c disjoint, ordered categories [7]. Unweighted kappa is discussed more completely in Chap. 4, Sect. 4.5, and weighted kappa is discussed in detail in Chap. 6, Sect. 6.5. Whereas unweighted kappa for categorical data did not distinguish among magnitudes of disagreement, weighted kappa for ordinal-level data incorporated the magnitude of each disagreement and provided partial credit for disagreements when agreement was not complete [16]. Weighted kappa is easily extended to interval-level data [3]. The usual approach is to assign weights to each disagreement pair, with larger weights indicating greater disagreement. For both unweighted and weighted kappa, kappa is equal to +1 when perfect agreement between the two judges occurs, 0 when agreement is equal to that expected under independence, and negative when agreement is less than expected by chance. Unweighted kappa and weighted kappa are conventionally designated as κ and κ w, respectively. Two forms of weighting are popular for weighted kappa: linear weighting, in which category disagreement weights progress outward linearly from the agreement diagonal, and quadratic weighting, in which category disagreement weights progress outward geometrically from the agreement diagonal. In keeping with the theme of this chapter—fourfold contingency tables—κ and κ w are extended to multiple judges with c = 2 categories.

Consider first b = 2 judges and c = 2 categories. A generalized calculation formula that applies to both unweighted and weighted kappa for b = 2 judges and c categories is given by

$$\displaystyle \begin{aligned} \kappa = 1-\frac{N \displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij}n_{ij}}{\displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij} R_{i} C_{j}}\;, \end{aligned} $$
(10.4)

where n ij denotes the observed cell frequencies, w ij denotes the cell weights, R i and C j denote the observed row and column marginal frequency totals for i, j = 1, …, c, and

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{c}\,\sum_{j=1}^{c} n_{ij} \end{aligned}$$

denotes the table frequency total.

Given a c×c agreement table with N objects cross-classified by the ratings of two independent judges into c disjoint categories, an exact permutation test generates all M possible, equally-likely arrangements of the N objects in the c 2 cells, while preserving the total number of objects in each category, i.e., the marginal frequency distributions. For each arrangement of cell frequencies with fixed marginal frequency distributions, the kappa statistic, κ, and the exact point probability, p(n ij|R i, C j, N), are calculated, where

$$\displaystyle \begin{aligned} p(n_{ij}|R_{i},C_{j},N) = \frac{\left( \displaystyle \,\prod_{i=1}^{c} R_{i}! \right) \left( \displaystyle \,\prod_{j=1}^{c} C_{j}! \right)}{N! \displaystyle\prod_{i=1}^{c} \,\displaystyle\prod_{j=1}^{c} n_{ij}!} \end{aligned}$$

is the conventional hypergeometric probability of a c×c contingency table.

Let κ o denote the value of the observed weighted kappa statistic and M denote the total number of distinct cell frequency arrangements of the N objects in the c×c agreement table, given fixed marginal frequency totals. Then the exact probability value of κ o under the null hypothesis is given by

$$\displaystyle \begin{aligned} P(\kappa_{\text{o}}|H_{0}) = \sum_{k=1}^{M} \Psi(\kappa_{k})\,p(n_{ij}|R_{i},C_{j},N)\;, \end{aligned}$$

where

$$\displaystyle \begin{aligned} \Psi(\kappa_{k}) = \begin{cases} \,1 & \text{if }\kappa_{k} \geq \kappa_{\text{o}}\;, \\ {} \,0 & \text{otherwise .} \end{cases} \end{aligned}$$

When M is very large, exact permutation analyses quickly become impractical and Monte Carlo resampling procedures become necessary. Let L denote the size of a random sample drawn from the M possible values of κ. Then, under the null hypothesis, the approximate resampling probability value for the observed value of κ, κ o, is given by

$$\displaystyle \begin{aligned} P\left( \kappa_{\text{o}} \right) = \frac{1}{L} \sum_{l=1}^{L} \Psi_{l} \left( \kappa \right)\;, \end{aligned}$$

where

$$\displaystyle \begin{aligned} \Psi_{l} \left( \kappa \right) = \begin{cases} \,1 & \text{if }\kappa \geq \kappa_{\text{o}}\;, \\ {} \,0 & \text{otherwise}\;. \end{cases} \end{aligned}$$

To calculate Cohen’s unweighted kappa with Eq. (10.4) on p. 597, the cell disagreement “weights” are given by

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,1 & \text{otherwise .} \end{cases} \end{aligned}$$

To calculate Cohen’s weighted kappa with linear weighting, the cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,|i-j| & \text{otherwise .} \end{cases} \end{aligned}$$

To calculate Cohen’s weighted kappa with quadratic weighting, the cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ij} = \begin{cases} \,0 & \text{if }i = j\;, \\ {} \,(i-j)^{2} & \text{otherwise .} \end{cases}\end{aligned} $$

Thus, as demonstrated, for b = 2 judges and c = 2 categories, the cell disagreement weights are the same for unweighted kappa (κ) and weighted kappa (κ w) with either linear or quadratic weighting.
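
Equation (10.4) and the three weighting schemes can be sketched as a single function taking the agreement table and a weight function; the names below are illustrative:

```python
import numpy as np

def kappa(n, weight):
    """Generalized kappa of Eq. (10.4): n is a c x c agreement table and
    weight(i, j) returns the disagreement weight for cell (i, j)."""
    n = np.asarray(n, dtype=float)
    c = n.shape[0]
    N = n.sum()
    R, C = n.sum(axis=1), n.sum(axis=0)
    w = np.array([[weight(i, j) for j in range(c)] for i in range(c)])
    return 1 - N * (w * n).sum() / (w * np.outer(R, C)).sum()

def unweighted(i, j):
    return 0 if i == j else 1

def linear(i, j):
    return abs(i - j)

def quadratic(i, j):
    return (i - j) ** 2
```

For instance, kappa([[4, 2], [2, 4]], unweighted) returns the value +0.3333 obtained in Sect. 10.1.11, and for c = 2 the three weight functions give identical results, as noted above.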

10.5.1 Example 1

To illustrate the application of Cohen’s unweighted kappa with b = 2 judges and c = 2 categories, consider the frequency data given in Table 10.10, where b = 2 independent judges have each assigned N = 123 observations to c = 2 disjoint, unordered categories labeled Pro and Con. Assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then following Eq. (10.4) on p. 597,

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \kappa = 1-\frac{N \displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij}n_{ij}}{\displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c} w_{ij} R_{i} C_{j}}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = 1-\frac{123 \big[ (0)(42)+(1)(23)+(1)(18)+(0)(40) \big]}{(0)(65)(60)+(1)(65)(63)+(1)(58)(60)+(0)(58)(63)} = +0.3343\;, \end{array} \end{aligned} $$

indicating approximately 33% agreement between the two judges above that expected by chance.

Table 10.10 Example 2×2 contingency table for b = 2 independent judges and c = 2 disjoint categories

For the frequency data given in Table 10.10, there are only

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle M = \min(a+b,a+c)-\max(0,a-d)+1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = \min(65,60)-\max(0,42-40)+1 = 60-2+1 = 59 \end{array} \end{aligned} $$

possible, equally-likely arrangements in the reference set of all permutations of the cell frequencies in Table 10.10 given the observed row and column marginal frequency distributions, {65, 58} and {60, 63}, respectively, making an exact permutation analysis possible. If the M = 59 possible arrangements of the frequency data given in Table 10.10 occur with equal chance, the exact probability value of κ under the null hypothesis is the sum of the hypergeometric point probability values associated with κ = +0.3343 or greater.

Table 10.11 lists the n 11 cell frequency values, unweighted kappa values, and associated hypergeometric probability values for the frequency data given in Table 10.10, where the n 11 cell values associated with κ values equal to or greater than the observed value of κ = +0.3343 are indicated with asterisks. Because there is only one degree of freedom, it is sufficient to list the cell frequency values for only one cell, n 11. For the frequency data given in Table 10.10, the exact upper-tail hypergeometric probability value of the observed κ value is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle P = 0.1388 {\times} 10^{-3}+0.3259 {\times} 10^{-4}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad + \cdots +0.6507 {\times} 10^{-26}+0.1122 {\times} 10^{-28} = 0.1793 {\times} 10^{-3}. \end{array} \end{aligned} $$
Table 10.11 Listing of the M = 59 possible arrangements of cell frequencies, unweighted kappa values, and associated hypergeometric probability values for the data given in Table 10.10
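
Because a 2×2 table with fixed marginal frequency totals has a single degree of freedom, the exact analysis reduces to enumerating the admissible values of the n 11 cell. A self-contained sketch, with illustrative function names and hypergeometric point probabilities computed from binomial coefficients:

```python
from math import comb

def kappa_2x2(a, b, c, d):
    """Unweighted kappa for a 2x2 agreement table, i.e., Eq. (10.4) with
    unit weights on the off-diagonal cells."""
    N, r1, r2, c1, c2 = a + b + c + d, a + b, c + d, a + c, b + d
    return 1 - N * (b + c) / (r1 * c2 + r2 * c1)

def exact_upper_tail_p(a, b, c, d):
    """Exact upper-tail probability of the observed kappa, enumerating every
    2x2 table with the same marginal frequency totals."""
    N, r1, c1 = a + b + c + d, a + b, a + c
    k_obs = kappa_2x2(a, b, c, d)
    p = 0.0
    for a_ in range(max(0, r1 + c1 - N), min(r1, c1) + 1):
        b_, c_ = r1 - a_, c1 - a_
        d_ = N - a_ - b_ - c_
        if kappa_2x2(a_, b_, c_, d_) >= k_obs:
            # Hypergeometric point probability of this arrangement
            p += comb(r1, a_) * comb(N - r1, c1 - a_) / comb(N, c1)
    return p

print(exact_upper_tail_p(42, 23, 18, 40))   # about 0.000179 for Table 10.10
```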

10.5.2 Example 2

Although weighted and unweighted kappa were originally formulated to compare only two judges, both κ and κ w can be generalized to accommodate multiple judges [25]. However, with multiple judges an exact permutation analysis becomes impractical except for very small sample sizes; therefore, a Monte Carlo resampling permutation analysis is preferred when analyzing agreement data from multiple judges. The analysis for b multiple judges may be conceptualized as a b-way contingency table with c = 2 categories on each axis. Figure 10.1 illustrates a 2×2×2 contingency table with b = 3 judges and c = 2 disjoint, unordered categories labeled Pro and Con.

Fig. 10.1 Graphic depiction of a 2×2×2 contingency table with b = 3 independent judges and c = 2 disjoint categories

To illustrate the application of Cohen’s kappa with multiple judges and c = 2 disjoint categories, consider the frequency data given in Table 10.12, where b = 3 judges have independently assigned N = 254 observations to c = 2 categories labeled Pro and Con. A generalized calculation formula that applies to both unweighted and weighted kappa for b = 3 judges and c categories is given by

$$\displaystyle \begin{aligned} \kappa = 1-\frac{N^{2} \displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum_{k=1}^{c} w_{ijk} n_{ijk}}{\displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum _{k=1}^{c} w_{ijk} R_{i} C_{j} S_{k}}\;, \end{aligned} $$
(10.5)

where n ijk denotes the observed cell frequencies, w ijk denotes the cell weights, R i, C j, and S k denote the observed row, column, and slice marginal frequency totals for i, j, k = 1, …, c, and

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum_{k=1}^{c} n_{ijk} \end{aligned}$$

denotes the table frequency total.

Table 10.12 Example 2×2×2 contingency table for b = 3 independent judges and c = 2 disjoint categories

Given a c×c×c agreement table with N objects cross-classified by b = 3 independent judges, an exact permutation test involves generating all possible, equally-likely arrangements of the N objects to the c 3 cells, while preserving the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively. For each arrangement of cell frequencies, the kappa statistic, κ, and the exact hypergeometric point probability value under the null hypothesis, p(n ijk|R i, C j, S k, N), are calculated, where

$$\displaystyle \begin{aligned} p(n_{ijk}|R_{i},C_{j},S_{k},N) = \frac{\left( \,\displaystyle\prod_{i=1}^{c}R_{i}! \right) \left( \,\displaystyle\prod_{j=1}^{c}C_{j}! \right) \left( \,\displaystyle\prod_{k=1}^{c}S_{k}! \right)}{(N!)^{2}\displaystyle\prod_{i=1}^{c}\,\displaystyle\prod_{j=1}^{c}\,\displaystyle\prod_{k=1}^{c}n_{ijk}!} \end{aligned}$$

[20].

If κ o denotes the value of the observed kappa test statistic, the exact probability value of κ o under the null hypothesis is given by

$$\displaystyle \begin{aligned} P(\kappa_{\text{o}}|H_{0}) = \sum_{l=1}^{M}\Psi_{l}\left( n_{ijk}|R_{i},C_{j},S_{k},N \right)\;, \end{aligned}$$

where

$$\displaystyle \begin{aligned} \Psi_{l}\left( n_{ijk}|R_{i},C_{j},S_{k},N \right) = \begin{cases} \,p(n_{ijk}|R_{i},C_{j},S_{k},N) & \text{if }\kappa \geq \kappa_{\text{o}}\;, \\ {} \,0 & \text{otherwise}\;, \end{cases} \end{aligned}$$

and M denotes the total number of possible, equally-likely arrangements in the reference set of all permutations of cell frequencies in Table 10.12 given the observed marginal frequency distributions. When M is very large, as is typical with multi-way contingency tables, exact tests are impractical and Monte Carlo resampling becomes necessary, where a random sample, L, of the M possible arrangements of cell frequencies provides for a comparison of κ test statistics calculated on the L random tables with the κ test statistic calculated on the observed table.

10.5.2.1 Unweighted Kappa

Unweighted kappa and weighted kappa, with either linear or quadratic weighting, yield the same result when analyzing agreement data for b = 2 judges and c = 2 categories. For b > 2 judges and c = 2 categories, unweighted kappa and weighted kappa usually yield different results, but weighted kappa with linear weighting and weighted kappa with quadratic weighting yield the same result. For the frequency data given in Table 10.12, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the cell disagreement “weights” for unweighted kappa are given by

$$\displaystyle \begin{aligned} w_{ijk} = \begin{cases} \,0 & \text{if }i = j = k\;, \\ {} \,1 & \text{otherwise .} \end{cases} \end{aligned}$$

Following Eq. (10.5) on p. 602, Cohen’s unweighted kappa coefficient is κ = +0.1862, indicating approximately 19% agreement among the b = 3 judges above that expected by chance. If κ o denotes the observed value of κ, the approximate Monte Carlo resampling probability value based on L = 1, 000, 000 random arrangements of the cell frequencies, given the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively, is

$$\displaystyle \begin{aligned} P(\kappa \geq \kappa_{\text{o}}|H_{0}) = \frac{\text{number of }\kappa\text{ values } \geq \kappa_{\text{o}}}{L} = \frac{2{,}250}{1{,}000{,}000} = 0.0023\;. \end{aligned}$$

10.5.2.2 Weighted Kappa

For the frequency data given in Table 10.12, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the linear cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ijk} = |i-j|+|i-k|+|j-k| \end{aligned}$$

and the quadratic cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ijk} = (i-j)^{2}+(i-k)^{2}+(j-k)^{2} \end{aligned}$$

for i, j, k = 1, …, c. Table 10.13 lists the eight cell indices and the associated linear and quadratic weights for a 2×2×2 agreement table, demonstrating that with c = 2 categories, the linear and quadratic weights are identical.

Table 10.13 Cells, linear weights, and quadratic weights for b = 3 independent judges and c = 2 disjoint categories

Following Eq. (10.5) on p. 602, Cohen’s weighted kappa with linear weighting is κ w = +0.0342, indicating approximately 3% agreement among the b = 3 judges above that expected by chance. If κ o denotes the observed value of κ w, the approximate Monte Carlo resampling probability value based on L = 1, 000, 000 random arrangements of the cell frequencies, given the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively, is

$$\displaystyle \begin{aligned} P(\kappa_{w} \geq \kappa_{\text{o}}|H_{0}) = \frac{\text{number of }\kappa_{w}\text{ values } \geq \kappa_{\text{o}}}{L} = \frac{190{,}610}{1{,}000{,}000} = 0.1906\;. \end{aligned}$$

Because with c = 2 categories the linear and quadratic weights are the same, the results are identical with quadratic weighting, i.e., κ w = +0.0342 and P = 0.1906.
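
The resampling analysis for b = 3 judges can be sketched by shuffling each judge’s ratings independently, which preserves the three one-way marginal frequency distributions, and recomputing κ of Eq. (10.5) for each random arrangement. The cell frequencies of Table 10.12 are not reproduced here, so the sketch is written as a function of an arbitrary 2×2×2 table of counts; the names are illustrative.

```python
import numpy as np

def kappa_b3(n, w):
    """Kappa of Eq. (10.5) for a 2x2x2 table n[i, j, k] and weight array w."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    R, C, S = n.sum(axis=(1, 2)), n.sum(axis=(0, 2)), n.sum(axis=(0, 1))
    expected = np.einsum('i,j,k->ijk', R, C, S)
    return 1 - N ** 2 * (w * n).sum() / (w * expected).sum()

# Unweighted disagreement weights for b = 3 judges and c = 2 categories
w_unweighted = np.array([[[0 if i == j == k else 1 for k in range(2)]
                          for j in range(2)] for i in range(2)])

def resampling_p(n, w, L=100_000, seed=None):
    """Approximate P(kappa >= kappa_obs | H0) by shuffling each judge's
    ratings independently, which preserves the one-way marginal totals."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n, dtype=int)
    k_obs = kappa_b3(n, w)
    R, C, S = n.sum(axis=(1, 2)), n.sum(axis=(0, 2)), n.sum(axis=(0, 1))
    ratings = [np.repeat([0, 1], m) for m in (R, C, S)]
    count = 0
    for _ in range(L):
        i, j, k = (rng.permutation(r) for r in ratings)
        table = np.zeros((2, 2, 2))
        np.add.at(table, (i, j, k), 1)            # cross-tabulate the shuffle
        count += kappa_b3(table, w) >= k_obs
    return count / L
```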

10.5.3 Example 3

For this third example of Cohen’s chance-corrected measure of inter-rater agreement, consider b = 4 judges who independently assign N = 76 observations to c = 2 disjoint, unordered categories labeled Pro and Con. The frequency data are given in Table 10.14.

Table 10.14 Example 2×2×2×2 contingency table for b = 4 independent judges and c = 2 disjoint categories

A generalized calculation formula that applies to both unweighted and weighted kappa for b = 4 judges and c categories is given by

$$\displaystyle \begin{aligned} \kappa = 1-\frac{N^{3} \displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum_{k=1}^{c}\,\sum_{l=1}^{c} w_{ijkl} n_{ijkl}}{\displaystyle\sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum_{k=1}^{c}\,\sum_{l=1}^{c} w_{ijkl} R_{i} C_{j} S_{k} L_{l}}\;, \end{aligned} $$
(10.6)

where n ijkl denotes the observed cell frequencies, w ijkl denotes the cell weights, R i, C j, S k, and L l denote the observed row, column, slice, and level marginal frequency totals for i, j, k, l = 1, …, c, and

$$\displaystyle \begin{aligned} N = \sum_{i=1}^{c}\,\sum_{j=1}^{c}\,\sum_{k=1}^{c}\,\sum_{l=1}^{c} n_{ijkl} \end{aligned}$$

denotes the table frequency total.

Given a c×c×c×c agreement table with N objects cross-classified by b = 4 independent judges, an exact permutation test involves generating all possible, equally-likely arrangements of the N objects to the c 4 cells, while preserving the observed row, column, slice, and level marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively. For each arrangement of cell frequencies, the kappa statistic, κ, and the exact hypergeometric point probability value under the null hypothesis, p(n ijkl|R i, C j, S k, L l, N), are calculated, where

$$\displaystyle \begin{aligned} p(n_{ijkl}|R_{i},C_{j},S_{k},L_{l},N) = \frac{\left( \,\displaystyle\prod_{i=1}^{c}R_{i}! \right) \left( \,\displaystyle\prod_{j=1}^{c}C_{j}! \right) \left( \,\displaystyle\prod_{k=1}^{c}S_{k}! \right) \left( \,\displaystyle\prod_{l=1}^{c}L_{l}! \right)}{(N!)^{3}\displaystyle\prod_{i=1}^{c}\,\displaystyle\prod_{j=1}^{c}\,\displaystyle\prod_{k=1}^{c}\,\displaystyle\prod_{l=1}^{c}n_{ijkl}!} \end{aligned}$$

[20].

If κ o denotes the value of the observed kappa test statistic, the exact probability value of κ o under the null hypothesis is given by

$$\displaystyle \begin{aligned} P(\kappa_{\text{o}}|H_{0}) = \sum_{l=1}^{M}\Psi_{l}\left( n_{ijkl}|R_{i},C_{j},S_{k},L_{l},N \right)\;,\end{aligned} $$

where

$$\displaystyle \begin{aligned} \Psi_{l}\left( n_{ijkl}|R_{i},C_{j},S_{k},L_{l},N \right) = \begin{cases} \,p(n_{ijkl}|R_{i},C_{j},S_{k},L_{l},N) & \text{if }\kappa \geq \kappa_{\text{o}}\;, \\ {} \,0 & \text{otherwise}\;, \end{cases}\end{aligned} $$

and M denotes the total number of possible, equally-likely arrangements in the reference set of all permutations of cell frequencies in Table 10.14 given the row, column, slice, and level observed marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively. When M is very large, as is typical with multi-way contingency tables, exact tests are impractical and Monte Carlo resampling becomes necessary, where a random sample, L, of the M possible arrangements of cell frequencies provides for a comparison of κ test statistics calculated on the L random tables with the κ test statistic calculated on the observed table.

10.5.3.1 Unweighted Kappa

For the frequency data given in Table 10.14, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the cell disagreement “weights” for unweighted kappa are given by

$$\displaystyle \begin{aligned} w_{ijkl} = \begin{cases} \,0 & \text{if }i = j = k = l\;, \\ {} \,1 & \text{otherwise .} \end{cases} \end{aligned}$$

Following Eq. (10.6) on p. 605, Cohen’s unweighted kappa coefficient is κ = +0.0561, indicating approximately 6% agreement among the b = 4 judges above that expected by chance. If κ o denotes the observed value of κ, the approximate Monte Carlo resampling probability value based on L = 1, 000, 000 random arrangements of the cell frequencies, given the observed marginal frequency distributions, is

$$\displaystyle \begin{aligned} P(\kappa \geq \kappa_{\text{o}}|H_{0}) = \frac{\text{number of }\kappa\text{ values } \geq \kappa_{\text{o}}}{L} = \frac{9{,}475}{1{,}000{,}000} = 0.0095\;.\end{aligned} $$

10.5.3.2 Weighted Kappa

For the frequency data given in Table 10.14, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the linear cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ijkl} = |i-j|+|i-k|+|i-l|+|j-k|+|j-l|+|k-l|\end{aligned} $$

and the quadratic cell disagreement weights are given by

$$\displaystyle \begin{aligned} w_{ijkl} = (i-j)^{2}+(i-k)^{2}+(i-l)^{2}+(j-k)^{2}+(j-l)^{2}+(k-l)^{2}\end{aligned} $$

for i, j, k, l = 1, …, c.

Table 10.15 lists the 16 cell indices and the associated linear and quadratic weights for a 2×2×2×2 agreement table. Note that for c = 2 categories, the linear and quadratic weights are identical.

Table 10.15 Cells, linear weights, and quadratic weights for b = 4 independent judges and c = 2 disjoint categories

Following Eq. (10.6) on p. 605, Cohen’s weighted kappa with linear weighting is κ w = +0.0654, indicating approximately 7% agreement among the b = 4 judges above that expected by chance. If κ o denotes the observed value of κ w, the approximate Monte Carlo resampling probability value based on L = 1, 000, 000 random arrangements of the cell frequencies, given the observed row, column, slice, and level marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively, is

$$\displaystyle \begin{aligned} P(\kappa_{w} \geq \kappa_{\text{o}}|H_{0}) = \frac{\text{number of }\kappa_{w}\text{ values } \geq \kappa_{\text{o}}}{L} = \frac{3{,}967}{1{,}000{,}000} = 0.0040\;. \end{aligned}$$

Because, with c = 2 categories, the linear and quadratic weights are the same, the results are identical to those obtained with quadratic weighting, i.e., κ w = +0.0654 and P = 0.0040.

10.6 McNemar’s and Cochran’s Q Tests for Change

In 1947 Quinn McNemar proposed a test for change over k = 2 time periods [18]. In 1950 William Cochran developed a test for change for k ≥ 2 time periods [4]. For k = 2, Cochran’s Q test for related proportions is identical to McNemar’s Q test for related proportions. The McNemar and Cochran Q tests are described in detail in Chap. 4, Sects. 4.6 and 4.7, respectively.

10.6.1 McNemar’s Q Test for Change

Represent a 2×2 contingency table as in Table 10.16. Then, McNemar’s test for change is given by

$$\displaystyle \begin{aligned} Q = \frac{(B-C)^{2}}{B+C}\;, \end{aligned}$$

where B and C represent the two cells of change, i.e., Pro to Con and Con to Pro.

Table 10.16 Notation for a 2×2 cross-classification for McNemar’s test for change

10.6.1.1 Illustration

To illustrate the calculation of probability values for McNemar’s Q test for change, consider the frequency data given in Table 10.17, where N = 9 subjects have been recorded as either Pro or Con on a specified issue at Time 1 and again on the same issue at Time 2. For the frequency data given in Table 10.17, the observed value of McNemar’s Q test statistic is

$$\displaystyle \begin{aligned} Q = \frac{(B-C)^{2}}{B+C} = \frac{(5-1)^{2}}{5+1} = 2.6667\;. \end{aligned}$$
Table 10.17 Example frequency data for McNemar’s test for change with N = 9 subjects

The exact probability value of an observed value of Q, under the null hypothesis, is given by the sum of the hypergeometric point probability values associated with the Q values equal to or greater than the observed value of Q. For the frequency data given in Table 10.17, there are only

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle M = \min(A+B,A+C)-\max(0,A-D)+1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = \min(7,3)-\max(0,2-1)+1 = 3-1+1 = 3 \end{array} \end{aligned} $$

possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the two cell frequencies of change, 5 and 1, and only two Q values are equal to or greater than the observed value of Q = 2.6667. The exact upper-tail probability of the observed Q value is P = 0.9167, i.e., the sum of the hypergeometric point probability values associated with values of Q = 2.6667 or greater.

More specifically, Table 10.18 displays the complete reference set of three possible 2×2 contingency tables given the row and column marginal frequency distributions, {7, 2} and {3, 6}, respectively. For Table A in Table 10.18, Q = 2.0000 and the associated hypergeometric point probability value is p = 0.0833. For Table B in Table 10.18, the observed table, Q = 2.6667 and the associated hypergeometric point probability value is p = 0.5000. And for Table C in Table 10.18, Q = 4.0000 and the associated hypergeometric point probability value is p = 0.4167. Thus, the cumulative hypergeometric probability value for Q = 2.6667 is the sum of the hypergeometric point probability values associated with values of Q = 2.6667 or greater; in this case, the probability values associated with Q = 2.6667 and Q = 4.0000, i.e., P = 0.5000 + 0.4167 = 0.9167.

Table 10.18 Three possible cell arrangements given the marginal frequency distributions {7, 2} and {3, 6}, Q values, and hypergeometric point probability values
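The three-table reference set in Table 10.18 can be reproduced with a short enumeration. The sketch below, with cell labels following Table 10.16, cycles the upper-left cell over its admissible range and prints each arrangement, its Q value, and its hypergeometric point probability, reproducing the values quoted above.

```python
from math import comb

# Marginals from Table 10.17: rows {7, 2}, columns {3, 6}, N = 9;
# cell labels (a, b, c, d) follow the notation of Table 10.16.
r1, r2, c1, N = 7, 2, 3, 9
for a in range(max(0, r1 + c1 - N), min(r1, c1) + 1):
    b, c, d = r1 - a, c1 - a, r2 - (c1 - a)
    q = (b - c) ** 2 / (b + c)                     # McNemar's Q
    p = comb(c1, a) * comb(N - c1, r1 - a) / comb(N, r1)
    print((a, b, c, d), round(q, 4), round(p, 4))
```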

McNemar’s Q test statistic is approximately distributed as chi-squared with 1 degree of freedom. While no responsible researcher would knowingly fit a chi-squared distribution function to only three possible outcomes, small samples, such as in Table 10.17, sometimes occur inadvertently. Suppose a researcher is employed by a national food service provider and begins with a reasonable, but small sample of subjects. As the research analysis proceeds, an interest develops in a subset of subjects composed of only women, breast-feeding their first child, and residing on a Native American reservation. Such unplanned small samples are relatively common and are not suitable for a conventional analysis. The chi-squared value for the observed data in Table 10.17 is χ 2 = 0.3214 and the probability value is P = 0.5708, which, as expected, is far removed from the exact probability value of P = 0.9167.

10.6.1.2 Example

A more realistic example illustrating McNemar’s Q test for change is given in Table 10.19, where N = 70 subjects were recorded as either Pro or Con on a specified issue at Time 1 and again on the same issue at Time 2. At Time 1, 40 of the 70 subjects were in favor of the issue and 30 subjects were opposed. At Time 2, 50 subjects were in favor and 20 were opposed. Of those subjects that changed, seven changed from Pro to Con and 17 changed from Con to Pro. For the frequency data given in Table 10.19, McNemar’s test statistic is

$$\displaystyle \begin{aligned} Q = \frac{(B-C)^{2}}{B+C} = \frac{(7-17)^{2}}{7+17} = \frac{100}{24} = 4.1667\;. \end{aligned}$$
Table 10.19 Example frequency data for McNemar’s test for change with N = 70 subjects

For the frequency data given in Table 10.19, there are only

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle M = \min(A+B,A+C)-\max(0,A-D)+1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = \min(40,50)-\max(0,33-13)+1 = 40-20+1 = 21 \end{array} \end{aligned} $$

possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the observed row and column marginal frequency distributions, {40, 30} and {50, 20}, respectively, making an exact permutation analysis possible. Since M = 21 is a reasonably small number of arrangements, it will be illustrative to list the 21 sets of cell frequencies, McNemar’s Q values, and the associated hypergeometric point probability values in Table 10.20, where the rows with hypergeometric probability values associated with Q values equal to or greater than the observed value of Q = 4.1667 are indicated with asterisks.

Table 10.20 Cell frequencies, McNemar’s Q values, and exact hypergeometric point probability values for M = 21 possible arrangements of the observed data in Table 10.19

If the M = 21 possible arrangements of the frequency data given in Table 10.19 occur with equal chance, the exact probability of Q under the null hypothesis is the sum of the hypergeometric point probability values associated with Q = 4.1667 or greater. For the frequency data given in Table 10.19, the exact upper-tail probability of the observed value of Q is

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle P = 0.1379 {\times} 10^{-1}+0.3448 {\times} 10^{-2}+0.6305 {\times} 10^{-3}+0.8210 {\times} 10^{-4}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad +0.7309 {\times} 10^{-5}+0.4167 {\times} 10^{-6}+0.1350 {\times} 10^{-7}+0.1856 {\times} 10^{-9}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad = 0.0180\;. \end{array} \end{aligned} $$

For comparison, the value of chi-squared for the frequency data given in Table 10.19 is χ 2 = 5.6058 and with 1 degree of freedom, the probability value is P = 0.0179, which compares favorably with the exact probability value of P = 0.0180.
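The exact calculation can also be wrapped in a small function. The sketch below (illustrative, not the authors' code) sums the hypergeometric point probabilities over the arrangements whose Q value equals or exceeds the observed value; applied to the cell frequencies of Table 10.19, it should recover P = 0.0180.

```python
from math import comb

def mcnemar_exact_p(A, B, C, D):
    """Exact upper-tail probability for McNemar's Q: sum the hypergeometric
    point probabilities of every table with the observed marginals whose Q
    value equals or exceeds the observed Q value."""
    N, r1, c1 = A + B + C + D, A + B, A + C
    q_obs = (B - C) ** 2 / (B + C)
    P = 0.0
    for a in range(max(0, r1 + c1 - N), min(r1, c1) + 1):
        b, c = r1 - a, c1 - a
        q = (b - c) ** 2 / (b + c) if b + c > 0 else 0.0
        if q >= q_obs - 1e-12:
            P += comb(c1, a) * comb(N - c1, r1 - a) / comb(N, r1)
    return P

print(round(mcnemar_exact_p(33, 7, 17, 13), 4))   # cells of Table 10.19
```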

10.6.2 Cochran’s Q Test for Change

Cochran’s Q test for k ≥ 2 treatments can be considered an extension of McNemar’s Q test for k = 2 treatments or time periods. Cochran’s Q test is described more completely in Chap. 4, Sect. 4.7.

Cochran’s Q test for the analysis of k treatment conditions (columns) and N subjects (rows) is given by

$$\displaystyle \begin{aligned} Q = \frac{(k-1)\left( k \displaystyle\sum_{j=1}^{k} C_{j}^{2}-A^{2} \right)}{kA - B}\;, \end{aligned} $$
(10.7)

where

$$\displaystyle \begin{aligned} C_{j} = \sum_{i=1}^{N} x_{ij} \end{aligned}$$

is the number of 1s in the jth of k columns,

$$\displaystyle \begin{aligned} R_{i} = \sum_{j=1}^{k}x_{ij} \end{aligned}$$

is the number of 1s in the ith of N rows,

$$\displaystyle \begin{aligned} A = \sum_{i=1}^{N} R_{i}\;, \quad B = \sum_{i=1}^{N} R_{i}^{2}\;, \end{aligned}$$

and x ij denotes the cell entry of either 0 or 1 associated with the ith of N rows and the jth of k columns. The null hypothesis stipulates that each of the

$$\displaystyle \begin{aligned} M = \prod_{i=1}^{N} \binom{k}{R_{i}} \end{aligned}$$

distinguishable arrangements of 1s and 0s within each of the N rows occurs with equal probability, given that the values of R 1, …, R N are fixed [21].

10.6.2.1 Example

To illustrate Cochran’s Q test for change, consider the binary data listed in Table 10.21 consisting of the responses (1 or 0) for N = 9 subjects evaluated over k = 3 time periods, where a 1 indicates success on a prescribed task and a 0 indicates failure. For the binary data listed in Table 10.21,

$$\displaystyle \begin{aligned} \sum_{j=1}^{k} C_{j}^{2} = 1^{2}+8^{2}+5^{2} = 90\;, \end{aligned}$$
$$\displaystyle \begin{aligned} A = \sum_{i=1}^{N} R_{i} = 2+2+2+2+1+1+1+1+2 = 14\;, \end{aligned}$$
$$\displaystyle \begin{aligned} B = \sum_{i=1}^{N} R_{i}^{2} = 2^{2}+2^{2}+2^{2}+2^{2}+1^{2}+1^{2}+1^{2}+1^{2}+2^{2} = 24\;, \end{aligned}$$
Table 10.21 Successes (1) and failures (0) of N = 9 subjects on a series of k = 3 time periods

and, following Eq. (10.7) on p. 612, the observed value of Cochran’s Q is

$$\displaystyle \begin{aligned} Q = \frac{(k-1)\left( k \displaystyle\sum_{j=1}^{k} C_{j}^{2}-A^{2} \right)}{kA - B} = \frac{(3-1)[(3)(90)-14^{2}]}{(3)(14)-24} = 8.2222\;. \end{aligned}$$

For the binary data listed in Table 10.21, there are only

$$\displaystyle \begin{aligned} M = \prod_{i=1}^{N} \binom{k}{R_{i}} = \binom{3}{1}^{4} \binom{3}{2}^{5} = (3^{4})(3^{5}) = 19{,}683 \end{aligned}$$

possible, equally-likely arrangements in the reference set of all permutations of the observed binary data, making an exact permutation analysis feasible. Based on M = 19, 683 arrangements of the observed data, there are 312 Q values equal to or greater than the observed value of Q = 8.2222. If Q o denotes the observed value of Q, the exact upper-tail probability value of the observed data is

$$\displaystyle \begin{aligned} P \big( Q \geq Q_{\text{o}}|H_{0} \big) = \frac{\text{number of }Q\text{ values } \geq Q_{\text{o}}}{M} = \frac{312}{19{,}683} = 0.0159\;. \end{aligned}$$

For comparison, under the null hypothesis Cochran’s Q is approximately distributed as chi-squared with k − 1 degrees of freedom. The approximate probability of Q = 8.2222 with k − 1 = 3 − 1 = 2 degrees of freedom is P = 0.0164.
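Because Cochran's Q depends on an arrangement only through its column sums, the exact analysis requires nothing from Table 10.21 beyond the row sums R_i and the observed column sums C_j. The enumeration sketch below (illustrative only) generates all M = 19,683 placements of each row's 1s among the k = 3 columns and should recover the exact upper-tail probability value reported above.

```python
from itertools import combinations, product

# Row sums from Table 10.21; the observed column sums are (1, 8, 5).
R = [2, 2, 2, 2, 1, 1, 1, 1, 2]
k = 3
A, B = sum(R), sum(r * r for r in R)          # A = 14, B = 24

def cochran_q(col_sums):
    return (k - 1) * (k * sum(c * c for c in col_sums) - A * A) / (k * A - B)

q_obs = cochran_q((1, 8, 5))                  # 8.2222

count = M = 0
for placement in product(*[list(combinations(range(k), r)) for r in R]):
    cols = [0] * k
    for chosen in placement:                  # columns holding this row's 1s
        for j in chosen:
            cols[j] += 1
    M += 1
    if cochran_q(cols) >= q_obs - 1e-12:
        count += 1
print(M, count, round(count / M, 4))          # M = 19,683 arrangements
```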

10.7 Fisher’s Exact Probability Test

Fisher’s exact probability test was independently developed by R.A. Fisher, Joseph Irwin, and Frank Yates in the early 1930s [11, 14, 34]. Characteristically, Fisher’s exact test is applied to 2×2 contingency tables, but can be generalized and extended to more complex contingency tables. The eponymous exact test for 2×2 tables and several extensions are detailed in Chap. 4, Sects. 4.11 and 4.12. In this chapter on fourfold contingency tables, only 2×2 and 2×2×2 contingency tables are considered.

10.7.1 Analysis of 2×2 Contingency Tables

Consider a 2×2 contingency table containing N cases, where x o denotes the observed frequency of any cell and r and c represent the row and column marginal frequency totals, respectively, corresponding to x o. Table 10.22 illustrates the notation for a 2×2 contingency table.

Table 10.22 Example notation for a 2×2 contingency table

Given the notation in Table 10.22, Fisher’s exact test for 2×2 contingency tables is given by

$$\displaystyle \begin{aligned} P = \sum_{x=a}^{b} p(x|r,c,N)\;, \end{aligned}$$

where \(a = \max (0,r+c-N)\), \(b = \min (r,c)\), and the hypergeometric point probability value is given by

$$\displaystyle \begin{aligned} p(x|r,c,N) = \binom{c}{x}\binom{N-c}{r-x}\binom{N}{r}^{-1} {}= \frac{r!\;(N-r)!\;c!\;(N-c)!}{N!\;x!\;(r-x)!\;(c-x)!\;(N-r-c+x)!}\;. \end{aligned}$$

To illustrate Fisher’s exact probability test, consider the 2×2 contingency table given in Table 10.23, where x o = 13, r = 15, c = 20, and N = 30.

Table 10.23 Example 2×2 contingency table for Fisher’s exact test

For the frequency data given in Table 10.23, there are only

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle M = \min(r,c)-\max(0,r+c-N)+1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = \min(15,20)-\max(0,15+20-30)+1 = 15-5+1 = 11 \end{array} \end{aligned} $$

possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the observed row and column marginal frequency distributions, {15, 15} and {20, 10}, respectively, making an exact permutation analysis possible. Table 10.24 lists the M = 11 possible values of x and associated hypergeometric point probability values to nine decimal places.

Table 10.24 Probability values for M = 11 possible arrangements of cell frequencies in Table 10.23, given the marginal frequency distributions {15, 15} and {20, 10}

The exact probability value is obtained by summing all the hypergeometric point probability values equal to or less than the hypergeometric point probability value of the observed table, indicated with asterisks in Table 10.24. Thus,

$$\displaystyle \begin{aligned} P = 0.022488756+0.002498751+0.000099950 = 0.025087457 \end{aligned}$$

for the upper tail of the distribution, i.e., the sum of the hypergeometric point probability values associated with x = 13, 14, and 15. Since the probability distribution is symmetric in this case, the exact hypergeometric probability value is twice the probability of the upper tail, i.e., P = 2(0.0251) = 0.0502.
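A compact Python sketch of this calculation, using only the point-probability formula above, enumerates the M = 11 admissible values of x and accumulates both the upper-tail probability and the two-sided probability as the sum of point probabilities no larger than that of the observed table.

```python
from math import comb

# Totals from Table 10.23: x_o = 13, r = 15, c = 20, N = 30.
N, r, c, x_o = 30, 15, 20, 13
lo, hi = max(0, r + c - N), min(r, c)

def point_p(x):
    return comb(c, x) * comb(N - c, r - x) / comb(N, r)

p_obs = point_p(x_o)
upper = sum(point_p(x) for x in range(x_o, hi + 1))
two_sided = sum(point_p(x) for x in range(lo, hi + 1)
                if point_p(x) <= p_obs * (1 + 1e-12))
print(round(upper, 4), round(two_sided, 4))   # 0.0251 and 0.0502
```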

10.7.2 Analysis of 2×2×2 Contingency Tables

Analyses of multi-way contingency tables are more complex than simple two-way tables; see Chap. 4, Sect. 4.12. For a two-way contingency table the degrees of freedom are given by df = (r − 1)(c − 1), where r denotes the number of rows and c denotes the number of columns. Thus, in the case of a 2×2 contingency table the degrees of freedom are (2 − 1)(2 − 1) = 1 and only one cell frequency need be permuted over its range. In the 2×2 example above, the chosen cell (A 1B 1) was designated as x in Table 10.22.

For multi-way contingency tables the degrees of freedom are given by

$$\displaystyle \begin{aligned} \mathit{df} = \prod_{i=1}^{r} c_{i}-\sum_{i=1}^{r} (c_{i}-1)-1\;, \end{aligned}$$

where r denotes the number of dimensions and c i denotes the number of categories in each dimension, i = 1, …, r [24, p. 309]. Thus, for a 2×2×2 contingency table with c = 2 disjoint categories in each of r = 3 dimensions,

$$\displaystyle \begin{aligned} \mathit{df} = 2^{3}-3(2-1)-1 = 4\;. \end{aligned}$$

Consider a 2×2×2 contingency table where n ijk denotes the cell frequency of the ith row, jth column, and kth slice for i, j, k = 1, 2. Let A = n 1.., B = n .1., C = n ..1, and N = n ... denote the observed marginal frequency totals of the first row, first column, first slice, and entire table, respectively, such that 1 ≤ A ≤ B ≤ C ≤ N∕2. Also, let w = n 111, x = n 112, y = n 121, and z = n 211 denote four cell frequencies of the 2×2×2 contingency table. Then, the probability for any specified w, x, y, and z is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle p(w,x,y,z|A,B,C,N) =\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \big[A!\,(N-A)!\;B!\;(N-B)!\;C!\,(N-C)!\big]\\ &\displaystyle &\displaystyle \qquad \quad \times \big[(N!)^{2}\,w!\;x!\;y!\;z!\;(A-w-x-y)!\;(B-w-x-z)!\\ &\displaystyle &\displaystyle \qquad \qquad \qquad (C-w-y-z)!\;(N-A-B-C+2w+x+y+z)! \big]^{-1} \end{array} \end{aligned} $$

[26].

The bounds for w, x, y, and z are

$$\displaystyle \begin{aligned} 0 \leq &w \leq M_{w}\;,\\ 0 \leq &x \leq M_{x}\;,\\ 0 \leq &y \leq M_{y}\;, \end{aligned} $$

and

$$\displaystyle \begin{aligned}\\ {} L_{z} \leq &z \leq M_{z}\;, \end{aligned} $$

respectively, where M w = A, M x = A − w, M y = A − w − x, \(M_{z} = \min (B-w-x,C-w-y)\), and \(L_{z} = \max (0,A+B+C-N-2w-x-y)\). If w o, x o, y o, and z o denote the values of w, x, y, and z in the observed contingency table, then Fisher’s exact probability value for a 2×2×2 contingency table is given by

$$\displaystyle \begin{aligned} P = \sum_{w=0}^{M_{w}}\;\sum_{x=0}^{M_{x}}\;\sum_{y=0}^{M_{y}}\;\sum_{z=L_{z}}^{M_{z}} \psi(w,x,y,z)\,p(w,x,y,z|A,B,C,N)\;, \end{aligned}$$

where

$$\displaystyle \begin{aligned} \psi(w,x,y,z) = \begin{cases} \,1 & \text{if }p(w,x,y,z) \leq p(w_{\text{o}},x_{\text{o}},y_{\text{o}},z_{\text{o}})\;, \\ {} \,0 & \text{otherwise .} \end{cases} \end{aligned}$$

To illustrate Fisher’s exact probability test, consider the 2×2×2 contingency table given in Table 10.25 where N = 75 and the observed values of w, x, y, and z are w o = 13, x o = 8, y o = 4, and z o = 18. For the frequency data given in Table 10.25 there are M = 77, 910 possible arrangements in the reference set of all permutations of cell frequencies given the observed row, column, and slice marginal distributions, {44, 31}, {44, 31}, and {44, 31}, respectively, making an exact permutation analysis feasible. Fisher’s exact probability is the sum of the hypergeometric point probability values equal to or less than the probability value associated with the observed contingency table; in this case, there are 2,991 tables with probability values equal to or less than the probability value of the observed table, i.e., p = 0.1743×10−4, yielding P = 0.0384.

Table 10.25 Example 2×2×2 contingency table for Fisher’s exact test
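The enumeration just described can be sketched as a small Python function that follows the bounds and point-probability formula given above. The orientation of the marginal totals in the example call is an assumption: the "first" category of each dimension is chosen so that the stated cell values w o = 13, x o = 8, y o = 4, and z o = 18 are feasible (A = 31, B = C = 44), and the printed results should correspond to the reference-set size and exact probability value reported above.

```python
from math import lgamma, exp

def lfact(n):
    return lgamma(n + 1)

def fisher_2x2x2(w_o, x_o, y_o, z_o, A, B, C, N):
    """Enumerate the reference set of 2x2x2 tables with fixed one-way marginal
    totals A = n_1.., B = n_.1., C = n_..1 and total N, and sum the point
    probabilities that do not exceed that of the observed table."""
    def log_p(w, x, y, z):
        return ((lfact(A) + lfact(N - A) + lfact(B) + lfact(N - B)
                 + lfact(C) + lfact(N - C))
                - (2 * lfact(N) + lfact(w) + lfact(x) + lfact(y) + lfact(z)
                   + lfact(A - w - x - y) + lfact(B - w - x - z)
                   + lfact(C - w - y - z)
                   + lfact(N - A - B - C + 2 * w + x + y + z)))

    p_obs = exp(log_p(w_o, x_o, y_o, z_o))
    M, P = 0, 0.0
    for w in range(A + 1):                       # 0 <= w <= M_w = A
        for x in range(A - w + 1):               # 0 <= x <= M_x
            for y in range(A - w - x + 1):       # 0 <= y <= M_y
                z_lo = max(0, A + B + C - N - 2 * w - x - y)
                z_hi = min(B - w - x, C - w - y)
                for z in range(z_lo, z_hi + 1):  # L_z <= z <= M_z
                    M += 1
                    p = exp(log_p(w, x, y, z))
                    if p <= p_obs * (1 + 1e-10):
                        P += p
    return M, P

# Marginal orientation inferred so that the stated cells are feasible
# (an assumption; Table 10.25 itself is not reproduced here).
M, P = fisher_2x2x2(13, 8, 4, 18, A=31, B=44, C=44, N=75)
print(M, round(P, 4))
```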

10.8 Contingency Table Interactions

It is occasionally necessary to test the independence among multiple classification variables, each of which consists of two mutually exclusive classes, e.g., a 2×2×2 or 23 contingency table. In this section exact permutation procedures are described for analyzing interactions in 2×2×2 and 2×2×2×2 contingency tables.

10.8.1 Analysis of 2×2×2 Contingency Tables

Mielke, Berry, and Zelterman provided a procedure for determining the exact global probability value obtained from an examination of all possible arrangements of the eight cell frequencies of a 2×2×2 contingency table, conditioned on the observed marginal frequency totals [26]. An alternative approach that is not as computationally intensive and, quite possibly, more fruitful is to examine the first- and second-order interactions of a 2×2×2 table when the observed marginal frequency totals are considered to be fixed [22]. This approach was first proposed by Bartlett [1] and has been discussed by Darroch [9, 10], Haber [12, 13], Odoroff [27], Plackett [29], Pomar [30], Simpson [33], and Zacks and Solomon [35]. In this section an algorithm is described that computes the exact probability values of the three first-order (two-variable) interactions and the single second-order (three-variable) interaction.

The logic on which the algorithm is based was apparently first developed by Lambert Adolphe Jacques Quetelet to calculate binomial probability values in 1846 [31]. Beginning with a small arbitrary initial value, a simple recursion procedure generates relative frequency values for all possible 2×2×2 contingency tables, given the observed marginal frequency totals. The desired exact probability value is obtained by summing the relative frequency values equal to or less than the observed relative frequency value and dividing the resultant sum by the unrestricted relative frequency total.

Consider a sample of N independent observations arranged in a 2×2×2 contingency table. Let n ijk denote the observed cell frequency of the ith row, jth column, and kth slice, and let p ijk denote the corresponding cell probability for i, j, k = 1, 2. Also let n .jk, n i.k, n ij., n i.., n .j., n ..k, and n ... indicate the observed marginal frequency totals of the 2×2×2 contingency table, and let the corresponding marginals over p ijk be indicated by p .jk, p i.k, p ij., p i.., p .j., p ..k, and p ..., respectively, for i, j, k = 1, 2. Because the categories are mutually exclusive and exhaustive, n ... = N and p ... = 1.

Let r denote the number of dimensions and c i denote the number of categories in each dimension, i = 1, …, r. Then for a 2×2×2 contingency table there are

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \prod_{i=1}^{r}c_{i}-\sum_{i=1}^{r}(c_{i}-1)-1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = (2)(2)(2)-[(2-1)+(2-1)+(2-1)]-1 = 8-3-1 = 4 \end{array} \end{aligned} $$

degrees of freedom and, consequently, four interaction terms to be considered: three first-order and one second-order. Following Bartlett, the null hypotheses for the three first-order interactions are

$$\displaystyle \begin{aligned} H_{0}{:}\;p_{.11}p_{.22} = p_{.12}p_{.21}\;, \end{aligned}$$
$$\displaystyle \begin{aligned} H_{0}{:}\;p_{1.1}p_{2.2} = p_{1.2}p_{2.1}\;, \end{aligned}$$

and

$$\displaystyle \begin{aligned} H_{0}{:}\;p_{11.}p_{22.} = p_{12.}p_{21.} \end{aligned}$$

[1]. The null hypothesis for the second-order interaction is

$$\displaystyle \begin{aligned} H_{0}{:}\;p_{111}p_{122}p_{212}p_{221} = p_{112}p_{121}p_{211}p_{222} \end{aligned}$$

[1, 13, 28].

For simplicity, set x = n 111, a = n .11, b = n 1.1, c = n 11., A = n 1.., B = n .1., C = n ..1, and N = n .... The point probability of x is given by

$$\displaystyle \begin{gathered} P(x|a,b,c,A,B,C,N) = \big[ A!\;(N-A)!\;B!\;(N-B)!\;C!\;(N-C)!\big]\\ {}\times \big[(N!)^{2}\,x!\;(a-x)!\;(b-x)!\;(c-x)!\;(A-b-c+x)!\\ (B-a-c+x)!\;(C-a-b+x)!\;(N-A-B-C+a+b+c-x)!\big]^{-1}\;. \end{gathered} $$

If H(k), given a, b, c, A, B, C, and N, is a recursively defined positive function, then solving the recursive relation H(k + 1) = H(k) × g(k) yields

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle g(k) =\\ &\displaystyle &\displaystyle \qquad \quad \frac{(a-k)(b-k)(c-k)(N-A-B-C+a+b+c-k)}{(k+1)(A-b-c+k+1)(B-a-c+k+1)(C-a-b+k+1)}\;, \end{array} \end{aligned} $$

which may be used to enumerate the distribution of P(k|a, b, c, A, B, C, N), v ≤ k ≤ w, where

$$\displaystyle \begin{aligned} v = \max(0,b+c-A,a+c-B,a+b-C)\;, \end{aligned}$$
$$\displaystyle \begin{aligned} w = \min(a,b,c,N-A-B-C+a+b+c)\;, \end{aligned}$$

and where H(v) is initially set to some small value, such as 10−20. The total over the completely enumerated distribution may be found by

$$\displaystyle \begin{aligned} T = \sum_{k=v}^{w} H(k)\;.\end{aligned} $$

The exact second-order interaction probability value is found by

$$\displaystyle \begin{aligned} P = \sum_{k=v}^{w}\frac{H(k)I_{k}}{T}\;,\end{aligned} $$

where

$$\displaystyle \begin{aligned} I_{k} = \begin{cases} \,1 & \text{if }H(k) \leq H(x), \\ {} \,0 & \text{otherwise .} \end{cases} \end{aligned}$$
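The recursion lends itself to a very short implementation. The Python sketch below (an illustration of the procedure described above, not the authors' algorithm verbatim) builds H(v), …, H(w) from the ratio g(k), sums the relative frequencies no larger than H(x), and divides by the total T; the arguments x = n 111, a = n .11, b = n 1.1, c = n 11., A, B, C, and N would be read from the table under analysis.

```python
def second_order_interaction_p(x, a, b, c, A, B, C, N):
    """Exact second-order interaction probability via the recursion above:
    generate H(v), ..., H(w) from the ratio g(k), then sum the relative
    frequencies no larger than H(x) and divide by the total T.
    Assumes v <= x <= w, which holds for any observed table."""
    v = max(0, b + c - A, a + c - B, a + b - C)
    w = min(a, b, c, N - A - B - C + a + b + c)
    H = {v: 1e-20}                              # small arbitrary start H(v)
    for k in range(v, w):
        g = ((a - k) * (b - k) * (c - k)
             * (N - A - B - C + a + b + c - k)) / (
            (k + 1) * (A - b - c + k + 1)
            * (B - a - c + k + 1) * (C - a - b + k + 1))
        H[k + 1] = H[k] * g
    T = sum(H.values())
    return sum(Hk for Hk in H.values() if Hk <= H[x] * (1 + 1e-10)) / T
```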

10.8.1.1 A 2×2×2 Contingency Table Example

Table 10.26 depicts a 2×2×2 contingency table based on N = 76 responses to a question (Yes, No) classified by gender (Female, Male), in two elementary school grades (First, Fourth).

Table 10.26 Cross-classification of yes/no responses, categorized by gender and elementary school grade

Table 10.27 provides the cell frequencies for Grade by Gender, conditioned on Response. The first-order interaction probability value associated with the cell frequencies in Table 10.27 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(a|a+b,a+c,N) &\displaystyle &\displaystyle = \binom{a+c}{a}\binom{b+d}{b}\binom{N}{a+b}^{-1}\\ &\displaystyle &\displaystyle = \binom{31}{14}\binom{45}{18}\binom{76}{32}^{-1} = \frac{32!\;44!\;31!\;45!}{76!\;14!\;18!\;17!\;27!} = 0.8134\;. \end{array} \end{aligned} $$
Table 10.27 Grade by Gender, conditioned on Response

Table 10.28 provides the cell frequencies for Gender by Response, conditioned on Grade. The first-order interaction probability value associated with the cell frequencies in Table 10.28 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(a|a+b,a+c,N) &\displaystyle &\displaystyle = \binom{a+c}{a}\binom{b+d}{b}\binom{N}{a+b}^{-1}\\ &\displaystyle &\displaystyle = \binom{33}{16}\binom{43}{15}\binom{76}{31}^{-1} = \frac{31!\;45!\;33!\;43!}{76!\;16!\;15!\;17!\;28!} = 0.2496\;. \end{array} \end{aligned} $$
Table 10.28 Gender by Response, conditioned on Grade

Table 10.29 provides the cell frequencies for Grade by Response, conditioned on Gender. The first-order interaction probability value associated with the cell frequencies in Table 10.29 is

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(a|a+b,a+c,N) &\displaystyle &\displaystyle = \binom{a+c}{a}\binom{b+d}{b}\binom{N}{a+b}^{-1}\\ &\displaystyle &\displaystyle = \binom{33}{12}\binom{43}{20}\binom{76}{32}^{-1} = \frac{32!\;44!\;33!\;43!}{76!\;12!\;20!\;21!\;23!} = 0.4830\;. \end{array} \end{aligned} $$
Table 10.29 Grade by Response, conditioned on Gender

The second-order interaction probability value for the frequency data given in Table 10.26 is P = 0.9036×10−3 and the global probability of a table this extreme or more extreme than the observed table in Table 10.26 is P = 0.4453×10−2 [26].

10.8.2 Analysis of 2×2×2×2 Contingency Tables

Utilizing the recursion procedure presented in the previous section, it is possible to analyze a 2×2×2×2 or 24 contingency table [23]. The conditional probability value of a 2×2×2×2 contingency table is a special case of the conditional probability of an r-way contingency table as defined in Mielke and Berry [20]. Zelterman, Chan, and Mielke [36] provided an algorithm for the exact global probability value obtained from an examination of all possible arrangements of the 16 cell frequencies of a 2×2×2×2 contingency table, conditioned on the observed marginal frequency totals. An alternative approach is to examine the first-, second-, and third-order interactions in a 2×2×2×2 table when the observed marginal frequency totals are considered to be fixed.

Let r denote the number of dimensions and c i denote the number of categories in each dimension, i = 1, …, r. Then for a 2×2×2×2 contingency table there are

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle \prod_{i=1}^{r}c_{i}-\sum_{i=1}^{r}(c_{i}-1)-1\\ &\displaystyle &\displaystyle \qquad \qquad = (2)(2)(2)(2)-[(2-1)+(2-1)+(2-1)+(2-1)]-1\\ &\displaystyle &\displaystyle \qquad \qquad \qquad \qquad \qquad \qquad \qquad = 16-4-1 = 11 \end{array} \end{aligned} $$

degrees of freedom and, consequently, 11 interaction terms to be considered: six first-order, four second-order, and one third-order. In this section, a procedure is described for computing the exact probability values of the six first-order (two-variable) interactions, the four second-order (three-variable) interactions, and the single third-order (four-variable) interaction for a 2×2×2×2 contingency table.

Following Mielke [19], let \(p_{i_{1}i_{2}i_{3}i_{4}}\) denote the probability of cell i 1i 2i 3i 4 in a 2×2×2×2 contingency table, where the index i j = 1 or 2 for j = 1, 2, 3, 4. The six null hypotheses of no first-order interactions for a 2×2×2×2 contingency table are

$$\displaystyle \begin{aligned} H_{0}{:}\;p_{1100}p_{2200} &= p_{1200}p_{2100}\;,\\ H_{0}{:}\;p_{1010}p_{2020} &= p_{1020}p_{2010}\;,\\ H_{0}{:}\;p_{1001}p_{2002} &= p_{1002}p_{2001}\;,\\ H_{0}{:}\;p_{0110}p_{0220} &= p_{0120}p_{0210}\;,\\ H_{0}{:}\;p_{0101}p_{0202} &= p_{0102}p_{0201}\;, \end{aligned} $$

and

$$\displaystyle \begin{aligned}\\ {} H_{0}{:}\;p_{0011}p_{0022} &= p_{0012}p_{0021}\;, \end{aligned} $$

where the usual summation convention is employed. Thus, p 0101 is the sum over indices i 1 and i 3. The four null hypotheses of no second-order interaction for a 2×2×2×2 contingency table are

$$\displaystyle \begin{aligned} H_{0}{:}\;p_{1110}p_{2210}p_{1220}p_{2120} &= p_{1120}p_{2220}p_{1210}p_{2110}\;,\\ H_{0}{:}\;p_{1101}p_{2201}p_{1202}p_{2102} &= p_{1102}p_{2202}p_{1201}p_{2101}\;,\\ H_{0}{:}\;p_{1011}p_{2021}p_{1022}p_{2012} &= p_{1012}p_{2022}p_{1021}p_{2011}\;, \end{aligned} $$

and

$$\displaystyle \begin{aligned}\\ {} H_{0}{:}\;p_{0111}p_{0221}p_{0122}p_{0212} &= p_{0112}p_{0222}p_{0121}p_{0211}\;. \end{aligned} $$

The null hypothesis of no third-order interaction for a 2×2×2×2 contingency table is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle H_{0}{:}\;p_{1111}p_{2211}p_{1221}p_{2121}p_{1122}p_{2222}p_{1212}p_{2112}\\ &\displaystyle &\displaystyle \qquad \qquad \qquad = p_{1112}p_{2212}p_{1222}p_{2122}p_{1121}p_{2221}p_{1211}p_{2111}\;. \end{array} \end{aligned} $$

Table 10.30 contains data from a 2×2×2×2 contingency table based on N = 1, 356 responses classified on four dichotomous variables: A, B, C, and D. The first-, second-, and third-order interaction exact probability values associated with the data listed in Table 10.30 are given in Table 10.31.

Table 10.30 Example data for a 2×2×2×2 contingency table
Table 10.31 Interactions and associated exact hypergeometric probability values for the data listed in Table 10.30

10.9 Coda

Chapter 10 applied exact and Monte Carlo permutation statistical methods to measures of association for symmetrical 2×2 contingency tables. Included in Chap. 10 were discussions of Pearson’s ϕ, Tschuprov’s T 2, and Cramér’s V 2 coefficients of contingency, Pearson’s product-moment correlation coefficient, Leik and Gove’s \(d_{N}^{\,c}\) measure, Goodman and Kruskal’s t a and t b asymmetric measures of nominal association, Kendall’s τ b and Stuart’s τ c measures of ordinal association, Somers’ d yx and d xy asymmetric measures of ordinal association, Yule’s Y measure of nominal association, simple percentage differences, and Cohen’s unweighted and weighted κ measures of inter-rater agreement.

Chapter 10 concluded with an examination of extensions to multiple 2×2 contingency tables and 2×2×2 contingency tables, including the Mantel–Haenszel test for combined 2×2 contingency tables, Cohen’s chance-corrected measure of inter-rater agreement, McNemar’s and Cochran’s Q tests for change, Fisher’s exact test for 2×2×2 and 2×2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.