Abstract
This chapter continues the discussion of measures of association for 2×2 contingency tables initiated in the previous chapter, but concentrates on symmetrical 2×2 contingency tables. Included in this chapter are permutation statistical methods applied to Pearson’s \(\phi^{2}\), Tschuprov’s \(T^{2}\), and Cramér’s \(V^{2}\) coefficients of contingency, Pearson’s product-moment correlation coefficient, Leik and Gove’s \(d_{N}^{\,c}\) measure, Goodman and Kruskal’s \(t_{a}\) and \(t_{b}\) measures, Kendall’s \(\tau_{b}\) and Stuart’s \(\tau_{c}\) measures, Yule’s Y measure, and Cohen’s κ measure of inter-rater agreement.
Chapter 10 of The Measurement of Association continues the discussion of fourfold (2×2) contingency tables initiated in Chap. 9, but concentrates on symmetrical 2×2 contingency tables, where each marginal frequency total is equal to N∕2. In the same way that 2×2 contingency tables are special cases of r×c contingency tables, symmetrical 2×2 contingency tables are special cases of fourfold tables. Symmetrical 2×2 tables provide additional insight into the relationships among various measures of association.
Included in Chap. 10 are exact and Monte Carlo permutation statistical methods applied to Pearson’s \(\phi^{2}\), Tschuprov’s \(T^{2}\), Cramér’s \(V^{2}\), Pearson’s \(r_{xy}\) product-moment correlation coefficient, Leik and Gove’s \(d_{N}^{\,c}\) measure of nominal association, Goodman and Kruskal’s \(t_{a}\) and \(t_{b}\) asymmetric measures, Kendall’s \(\tau_{b}\) and Stuart’s \(\tau_{c}\) measures, Somers’ \(d_{yx}\) and \(d_{xy}\) asymmetric measures, simple percentage differences, \(D_{x}\) and \(D_{y}\), Yule’s Y measure of nominal association, and Cohen’s unweighted and weighted κ measures of chance-corrected inter-rater agreement.
Also included in Chap. 10 are some extensions to multiple 2×2 contingency tables and 2×2×2 contingency tables, including the Mantel–Haenszel test for combined 2×2 contingency tables, Cohen’s kappa measure of chance-corrected inter-rater agreement, McNemar’s and Cochran’s Q tests, Fisher’s exact test for 2×2×2 and 2×2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.
10.1 Symmetrical Fourfold Tables
A symmetrical fourfold contingency table is a 2×2 contingency table in which N is even and each marginal frequency total is equal to N∕2. To illustrate the analysis of symmetrical fourfold contingency tables, consider the general layout of a 2×2 table, such as given in Table 10.1, and an example 2×2 frequency table, such as given in Table 10.2, where each marginal frequency total is equal to N∕2 = 12∕2 = 6.
10.1.1 Statistics \(\phi^{2}\), \(T^{2}\), and \(V^{2}\)
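For the frequency data given in Table 10.2, Pearson’s chi-squared test statistic is given by
\[ \chi^{2} = N \left( \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{O_{ij}^{2}}{R_{i} C_{j}} - 1 \right) , \]
where \(O_{ij}\) is the observed cell frequency for i, j = 1, 2, \(R_{i}\) denotes a row total for i = 1, 2, and \(C_{j}\) denotes a column total for j = 1, 2. Thus, for the frequency data given in Table 10.2,
\[ \chi^{2} = 12 \left( \frac{4^{2}}{(6)(6)} + \frac{2^{2}}{(6)(6)} + \frac{2^{2}}{(6)(6)} + \frac{4^{2}}{(6)(6)} - 1 \right) = 1.3333 . \]
Then, Pearson’s ϕ measure of association is given by
\[ \phi = \sqrt{\frac{\chi^{2}}{N}} = \sqrt{\frac{1.3333}{12}} = 0.3333 \]
and \(\phi^{2} = (0.3333)^{2} = 0.1111\). Alternatively, using the notation given in Table 10.1,
Tschuprov’s measure of nominal association is
\[ T^{2} = \frac{\chi^{2}}{N \sqrt{(r-1)(c-1)}} = \frac{1.3333}{12 \sqrt{(2-1)(2-1)}} = 0.1111 \]
and \(T = \sqrt {T^{2}} = \sqrt {0.1111} = 0.3333\). Also, Cramér’s measure of nominal association is
\[ V^{2} = \frac{\chi^{2}}{N \min (r-1, c-1)} = \frac{1.3333}{12 \min (2-1, 2-1)} = 0.1111 \]
and \(V = \sqrt {V^{2}} = \sqrt {0.1111} = 0.3333\). Thus, Pearson’s ϕ, Tschuprov’s T, and Cramér’s V are equivalent for a symmetrical 2×2 contingency table.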
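The equivalence of the three chi-squared-based coefficients can be checked numerically. The following is a minimal Python sketch, assuming the standard definitions of \(\phi^{2}\), \(T^{2}\), and \(V^{2}\) and taking the cell frequencies 4, 2, 2, 4 from the description of Table 10.3; the function names are illustrative only.

```python
from math import sqrt

# Cell frequencies of Table 10.2, as implied by the description of Table 10.3:
# four objects in cells (1,1) and (2,2), two objects in cells (1,2) and (2,1).
table = [[4, 2],
         [2, 4]]

def chi_squared(obs):
    """Pearson's chi-squared computed as N * (sum O_ij^2 / (R_i * C_j) - 1)."""
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    n = sum(rows)
    total = sum(obs[i][j] ** 2 / (rows[i] * cols[j])
                for i in range(len(rows)) for j in range(len(cols)))
    return n * (total - 1.0)

def contingency_coefficients(obs):
    """Return (phi^2, T^2, V^2) under the standard definitions (an assumption)."""
    r, c = len(obs), len(obs[0])
    n = sum(map(sum, obs))
    chi2 = chi_squared(obs)
    phi2 = chi2 / n
    t2 = chi2 / (n * sqrt((r - 1) * (c - 1)))
    v2 = chi2 / (n * min(r - 1, c - 1))
    return phi2, t2, v2

chi2 = chi_squared(table)
phi2, t2, v2 = contingency_coefficients(table)
print(round(chi2, 4), round(phi2, 4), round(t2, 4), round(v2, 4))
# prints: 1.3333 0.1111 0.1111 0.1111
```

For any 2×2 table the divisors \(\sqrt{(r-1)(c-1)}\) and \(\min(r-1, c-1)\) both equal 1, which is why the three squared coefficients coincide.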
10.1.2 Pearson’s \(r_{xy}\) Correlation Coefficient
Next, consider Pearson’s product-moment correlation coefficient given by
The binary-coded (0, 1) data listed in Table 10.3 were obtained from the frequency data given in Table 10.2, where Objects 1 through 4, coded (0, 0), represent the four objects in row 1 and column 1 of Table 10.2; Objects 5 and 6, coded (0, 1), represent the two objects in row 1 and column 2; Objects 7 and 8, coded (1, 0), represent the two objects in row 2 and column 1; and Objects 9 through 12, coded (1, 1), represent the four objects in row 2 and column 2 of Table 10.2.
For the binary-coded data listed in Table 10.3,
Pearson’s product-moment correlation coefficient is
and \(r_{xy}^{2} = (+0.3333)^{2} = 0.1111\).
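The correlation can be reproduced directly from the binary coding described for Table 10.3. This is a minimal sketch using the raw-score form of the product-moment formula; the variable and function names are illustrative.

```python
from math import sqrt

# Binary-coded (x, y) pairs as described for Table 10.3:
# Objects 1-4 (0,0), Objects 5-6 (0,1), Objects 7-8 (1,0), Objects 9-12 (1,1).
data = [(0, 0)] * 4 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 4

def pearson_r(pairs):
    """Pearson's product-moment correlation from raw sums."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

r_xy = pearson_r(data)
print(round(r_xy, 4), round(r_xy ** 2, 4))  # prints: 0.3333 0.1111
```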
10.1.3 Regression Coefficients
For the binary-coded data listed in Table 10.3, the slope (unstandardized regression coefficient) of the regression line with variable y the dependent variable is
and the standardized regression coefficient with variable x the dependent variable is
Also the unstandardized regression coefficient with variable x the dependent variable is
and the standardized regression coefficient with variable y the dependent variable is
Thus it is demonstrated that \(\phi = T = V = r_{xy} = b_{yx} = b_{xy} = \hat {\beta }_{yx}\) = \(\hat {\beta }_{xy}\) for a symmetrical 2×2 contingency table.
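The equality of the four regression coefficients can be verified with a short sketch. It assumes the usual least-squares definitions, with each standardized coefficient obtained by rescaling the slope by the ratio of the two standard deviations; the names are illustrative.

```python
from math import sqrt

# Binary-coded (x, y) pairs as described for Table 10.3.
data = [(0, 0)] * 4 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(1, 1)] * 4

def regression_coefficients(pairs):
    """Return (b_yx, b_xy, beta_yx, beta_xy) for paired observations."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b_yx = sp_xy / ss_x                   # slope with y dependent
    b_xy = sp_xy / ss_y                   # slope with x dependent
    beta_yx = b_yx * sqrt(ss_x / ss_y)    # standardized, y dependent
    beta_xy = b_xy * sqrt(ss_y / ss_x)    # standardized, x dependent
    return b_yx, b_xy, beta_yx, beta_xy

coefficients = regression_coefficients(data)
print([round(value, 4) for value in coefficients])
# prints: [0.3333, 0.3333, 0.3333, 0.3333]
```

Because both binary variables have the same sum of squares in a symmetrical table, the slopes need no rescaling and all four values equal \(r_{xy}\).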
10.1.4 Leik and Gove’s \(d_{N}^{\,c}\) Statistic
Leik and Gove’s \(d_{N}^{\,c}\) test statistic for two nominal-level variables is described in detail in Chap. 4, Sect. 4.9. As noted by Leik and Gove, for symmetrical 2×2 contingency tables, \(d_{N}^{\,c}\) is equivalent to the traditional chi-squared-based measures such as Pearson’s \(\phi^{2}\), Tschuprov’s \(T^{2}\), and Cramér’s \(V^{2}\) [15, p. 291]. Test statistic \(d_{N}^{\,c}\) is based on three r×c contingency tables: one r×c contingency table containing the observed cell frequency values, a second r×c contingency table containing the expected cell frequency values, and a third r×c contingency table containing the maximized cell frequency values. Here, the observed values of concordant pairs, \(C\); discordant pairs, \(D\); pairs tied on variable x, \(T_{x}\); pairs tied on variable y, \(T_{y}\); and pairs tied on both variables x and y, \(T_{xy}\), are indicated without primes; the corresponding expected values, \(C'\), \(D'\), \(T_{x}'\), \(T_{y}'\), and \(T_{xy}'\), are indicated with a single prime (′); and the corresponding maximized values, \(C''\), \(D''\), \(T_{x}''\), \(T_{y}''\), and \(T_{xy}''\), are indicated with double primes (′′).
Consider \(d_{N}^{\,c}\) for a symmetrical 2×2 contingency table, where
For the observed data given in Table 10.2 on p. 578, replicated in Table 10.4 for convenience, the observed values of C, D, T x, T y, and T xy are
and
Next, consider the expected values for the observed data in Table 10.4, given in Table 10.5, where
For the expected cell values given in Table 10.5,
and
Finally, consider the maximized cell frequencies for the data in Table 10.4, given in Table 10.6. For the maximized values given in Table 10.6,
and
Then, Leik and Gove’s \(d_{N}^{\,c}\) measure is
or
or
or
Thus it is demonstrated that \(\phi ^{2} = T^{2} = V^{2} = r_{xy}^{2} = d_{N}^{\,c}\) for a symmetrical 2×2 contingency table.
10.1.5 Goodman and Kruskal’s \(t_{a}\) and \(t_{b}\) Statistics
Goodman and Kruskal’s t a and t b measures of nominal association are discussed in Chap. 4, Sect. 4.3. Consider the notation for a 2×2 contingency table given in Table 10.7. For the frequency data given in Table 10.4 on p. 582, Goodman and Kruskal’s asymmetric measure of association with variable a the dependent variable is
and Goodman and Kruskal’s asymmetric measure with variable b the dependent variable is
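Both asymmetric measures can be computed with one function applied to the table and to its transpose. The sketch below assumes the standard proportional-reduction-in-error form of Goodman and Kruskal's measure; which orientation corresponds to \(t_{a}\) and which to \(t_{b}\) depends on the layout of Table 10.7, which is not reproduced here, so the results are labeled only by which variable is treated as dependent.

```python
# Cell frequencies of the symmetrical table (4, 2, 2, 4) used throughout Sect. 10.1.
table = [[4, 2],
         [2, 4]]

def goodman_kruskal_t(obs):
    """Goodman and Kruskal's t with the column variable taken as dependent."""
    rows = [sum(r) for r in obs]
    cols = [sum(c) for c in zip(*obs)]
    n = sum(rows)
    # Sum of squared cell frequencies, each divided by its row total.
    within = sum(cell ** 2 / rows[i] for i, row in enumerate(obs) for cell in row)
    col_sq = sum(c ** 2 for c in cols)
    return (n * within - col_sq) / (n ** 2 - col_sq)

def transpose(obs):
    """Swap the roles of the row and column variables."""
    return [list(col) for col in zip(*obs)]

t_cols_dependent = goodman_kruskal_t(table)
t_rows_dependent = goodman_kruskal_t(transpose(table))
print(round(t_cols_dependent, 4), round(t_rows_dependent, 4))  # prints: 0.1111 0.1111
```

Both values equal \(\phi^{2} = 0.1111\), in line with the equivalence summarized in Sect. 10.2.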
10.1.6 Kendall’s \(\tau_{b}\) Statistic
Kendall’s τ b measure of ordinal association is detailed in Chap. 5, Sect. 5.4. For the frequency data given in Table 10.4 on p. 582, the number of concordant pairs is
the number of discordant pairs is
the number of pairs tied on variable x but not tied on variable y is
and the number of pairs tied on variable y but not tied on variable x is
Then, Kendall’s τ b measure is
Alternatively, following the notation given in Table 10.1,
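The pair counts and the resulting value of Kendall's \(\tau_{b}\) can be sketched as follows. The code assumes that variable x indexes the rows and variable y the columns (an assumption, since the table layout is not reproduced here; for a symmetrical table the choice does not change the result) and uses the standard definition of \(\tau_{b}\).

```python
from math import sqrt

# Cell frequencies of the symmetrical table (4, 2, 2, 4).
table = [[4, 2],
         [2, 4]]

def pair_counts(obs):
    """Return (C, D, T_x, T_y) for a 2x2 table with x as rows and y as columns."""
    (n11, n12), (n21, n22) = obs
    concordant = n11 * n22
    discordant = n12 * n21
    tied_x = n11 * n12 + n21 * n22   # same row, different columns
    tied_y = n11 * n21 + n12 * n22   # same column, different rows
    return concordant, discordant, tied_x, tied_y

def kendall_tau_b(obs):
    """Kendall's tau-b from the pair counts."""
    c, d, tx, ty = pair_counts(obs)
    return (c - d) / sqrt((c + d + tx) * (c + d + ty))

counts = pair_counts(table)
tau_b = kendall_tau_b(table)
print(counts, round(tau_b, 4))  # prints: (16, 4, 16, 16) 0.3333
```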
10.1.7 Stuart’s \(\tau_{c}\) Statistic
Stuart’s τ c measure of ordinal association is discussed in Chap. 5, Sect. 5.5 and is given by
where m is the minimum number of rows or columns. For the frequency data given in Table 10.4 on p. 582, \(m = \min (r,c) = \min (2,2) = 2\), the number of concordant pairs is
the number of discordant pairs is
Kendall’s S is
and Stuart’s τ c measure is
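As a worked check, assuming the usual definition of Stuart's measure, \(\tau_{c} = 2mS / [N^{2}(m-1)]\), and the cell frequencies 4, 2, 2, 4 implied by the description of Table 10.3, the calculation can be sketched as:

```latex
\begin{align*}
C &= (4)(4) = 16, \qquad D = (2)(2) = 4, \qquad S = C - D = 16 - 4 = 12, \\
\tau_{c} &= \frac{2mS}{N^{2}(m-1)} = \frac{(2)(2)(12)}{12^{2}(2-1)} = \frac{48}{144} = +0.3333 .
\end{align*}
```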
10.1.8 Somers’ \(d_{yx}\) and \(d_{xy}\) Statistics
Somers’ d yx and d xy asymmetric measures of ordinal association are discussed in Chap. 5, Sect. 5.7. For the frequency data given in Table 10.4 on p. 582, Somers’ asymmetric measure of association with variable y the dependent variable is
and Somers’ asymmetric measure with variable x the dependent variable is
Alternatively,
and
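Somers' two asymmetric measures differ only in which set of tied pairs enters the denominator. A minimal sketch, assuming the standard definitions and taking x as the row variable and y as the column variable (an assumption about the table layout):

```python
# Cell frequencies of the symmetrical table (4, 2, 2, 4).
table = [[4, 2],
         [2, 4]]

def somers_d(obs):
    """Return (d_yx, d_xy) for a 2x2 table with x as rows and y as columns."""
    (n11, n12), (n21, n22) = obs
    concordant = n11 * n22
    discordant = n12 * n21
    tied_x = n11 * n12 + n21 * n22   # tied on x only
    tied_y = n11 * n21 + n12 * n22   # tied on y only
    s = concordant - discordant
    d_yx = s / (concordant + discordant + tied_y)   # y dependent
    d_xy = s / (concordant + discordant + tied_x)   # x dependent
    return d_yx, d_xy

d_yx, d_xy = somers_d(table)
print(round(d_yx, 4), round(d_xy, 4))  # prints: 0.3333 0.3333
```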
10.1.9 Percentage Differences
Percentage differences are discussed in Chap. 9, Sect. 9.10. For the frequency data given in Table 10.4 on p. 582, the percentage difference for variable x is
and the percentage difference for variable y is
10.1.10 Yule’s Y Statistic
Yule’s Y measure of nominal association is discussed in Chap. 9, Sect. 9.6. For the frequency data given in Table 10.4 on p. 582, Yule’s coefficient of colligation is
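As a worked check, assuming the standard definition of Yule's coefficient of colligation and writing \(n_{ij}\) for the cell frequencies (a notation chosen here for illustration, with the values 4, 2, 2, 4 implied by the description of Table 10.3), the calculation can be sketched as:

```latex
\[
Y = \frac{\sqrt{n_{11} n_{22}} - \sqrt{n_{12} n_{21}}}{\sqrt{n_{11} n_{22}} + \sqrt{n_{12} n_{21}}}
  = \frac{\sqrt{(4)(4)} - \sqrt{(2)(2)}}{\sqrt{(4)(4)} + \sqrt{(2)(2)}}
  = \frac{4 - 2}{4 + 2} = +0.3333 .
\]
```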
10.1.11 Cohen’s κ Statistic
Cohen’s unweighted kappa measure of inter-rater agreement is discussed in Chap. 4, Sect. 4.5, and Cohen’s linear and quadratic weighted kappa measures of inter-rater agreement are discussed in Chap. 6, Sect. 6.5. For the frequency data given in Table 10.4 on p. 582, let O ii for i = 1, 2 denote the observed cell frequencies on the principal diagonal and E ii for i = 1, 2 denote the expected cell frequencies on the principal diagonal. Then, Cohen’s unweighted chance-corrected coefficient of inter-rater agreement is
Cohen’s weighted kappa measure of inter-rater agreement for b = 2 judges and c categories is given by
where n ij denotes the observed cell frequencies, w ij denotes the cell weights, R i and C j denote the observed row and column marginal frequency totals for i, j = 1, …, c, and
denotes the table frequency total. For Cohen’s unweighted kappa measure of inter-rater agreement, the cell disagreement “weights” are given by
and for Cohen’s weighted kappa measure of inter-rater agreement the cell disagreement weights are given by
for linear weighting, and
for quadratic weighting. For the frequency data given in Table 10.4 on p. 582, Cohen’s linear-weighted kappa measure of inter-rater agreement is κ w = +0.3333 and Cohen’s quadratic-weighted kappa measure of inter-rater agreement is κ w = +0.3333.
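The agreement of the three kappa values for a 2×2 table can be illustrated with a disagreement-weight form of kappa. The sketch below assumes \(\kappa = 1 - N \sum w_{ij} n_{ij} / \sum w_{ij} R_{i} C_{j}\) with the usual unweighted (0 or 1), linear \(|i-j|\), and quadratic \((i-j)^{2}\) disagreement weights; this form is algebraically equivalent to the familiar observed-versus-expected agreement form, but the function names and layout are illustrative only.

```python
# Cell frequencies of the symmetrical table (4, 2, 2, 4).
table = [[4, 2],
         [2, 4]]

def kappa(obs, weight):
    """Chance-corrected agreement from disagreement weights w(i, j)."""
    c = len(obs)
    rows = [sum(r) for r in obs]
    cols = [sum(col) for col in zip(*obs)]
    n = sum(rows)
    observed = sum(weight(i, j) * obs[i][j] for i in range(c) for j in range(c))
    expected = sum(weight(i, j) * rows[i] * cols[j] for i in range(c) for j in range(c))
    return 1.0 - n * observed / expected

def unweighted(i, j):
    return 0 if i == j else 1

def linear(i, j):
    return abs(i - j)

def quadratic(i, j):
    return (i - j) ** 2

kappas = [kappa(table, w) for w in (unweighted, linear, quadratic)]
print([round(k, 4) for k in kappas])  # prints: [0.3333, 0.3333, 0.3333]
```

With c = 2 categories the only nonzero category difference is 1, so the three weighting schemes assign identical weights and therefore identical kappa values.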
10.2 Inter-relationships Among the Measures
The inter-relationships among the various measures for a symmetrical 2×2 contingency table can be summarized as follows. The Pearson product-moment correlation coefficient, r xy; the unstandardized slopes of the two regression lines, b yx and b xy; Yule’s coefficient of colligation, Y ; Pearson’s mean-square contingency coefficient, ϕ; Tschuprov’s T measure; Cramér’s V measure; Kendall’s τ b measure; Stuart’s τ c measure; Somers’ d yx and d xy asymmetric measures; the two percentage differences, D x and D y; and Cohen’s κ unweighted and weighted measures of chance-corrected inter-rater agreement are all equivalent measures, i.e.,
Also, Pearson’s squared product-moment correlation coefficient, \(r_{xy}^{2}\); Pearson’s mean-squared contingency coefficient, ϕ 2; Tschuprov’s T 2 measure; Cramér’s V 2 measure; Leik and Gove’s \(d_{N}^{\,c}\) measure; and Goodman and Kruskal’s t b and t a measures of association are all equivalent measures, i.e.,
10.2.1 Notational Inconsistencies
Measures of association for 2×2 contingency tables in particular, and r×c contingency tables in general, can be very confusing. First, some measures are denoted by uppercase Latin letters, e.g., Yule’s Q and Y, Tschuprov’s \(T^{2}\), and Cramér’s \(V^{2}\); some measures are denoted by lowercase Latin letters, e.g., Somers’ \(d_{yx}\) and \(d_{xy}\), Leik and Gove’s \(d_{N}^{\,c}\), and Goodman and Kruskal’s \(t_{a}\) and \(t_{b}\); and some measures are denoted by lowercase Greek letters, e.g., Pearson’s ϕ, Kendall’s \(\tau_{b}\), and Cohen’s κ. While it would be preferable to reserve Greek letters for population parameters that are being estimated by sample statistics and Latin letters for sample statistics, once symbols are in common use it is difficult to standardize usage. Second, certain measures of association appear as squared, whereas others do not. In particular, for the 2×2 case, the non-squared symbols \(t_{b}\) and \(t_{a}\) for Goodman and Kruskal’s asymmetric measures of nominal association are equivalent to Pearson’s symmetric measures \(\phi^{2}\) and \(r_{xy}^{2}\). Third, some measures norm between 0 and 1 for 2×2 contingency tables, e.g., Goodman and Kruskal’s \(t_{a}\) and \(t_{b}\); others norm between −1 and +1, e.g., Kendall’s \(\tau_{b}\) and Cramér’s V; and still others norm between 0 and ∞, e.g., the odds ratio. Finally, some measures identify the two variables as x and y, e.g., Somers’ \(d_{yx}\) and \(d_{xy}\), while others identify the two variables as a and b, e.g., Kendall’s \(\tau_{a}\) and \(\tau_{b}\).
10.3 Extended Fourfold Contingency Tables
In some cases, measures of association introduced for fourfold tables have either been extended to analyze a series of 2×2 contingency tables or redesigned for multidimensional contingency tables with two categories in each dimension. In this section a small number of such measures are considered, including the Mantel–Haenszel test, McNemar’s Q test, Cochran’s Q test, Cohen’s chance-corrected measure of inter-rater agreement, Fisher’s exact probability test for 2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.
10.4 The Mantel–Haenszel Test
The Mantel–Haenszel test, developed by Nathan Mantel and William Haenszel in 1959, is a test of significance for S combined 2×2 contingency tables. Suppose that a treatment is compared with a control in each of S strata, where the outcome is binary: success or failure. Of interest is whether or not the treatment increases the probability of success.
Let \(n_{ijk}\) denote the cell frequency for i, j = 1, 2 discrete categories and k = 1, …, S discrete strata for a 2×2×S contingency table. Table 10.8 illustrates a three-way contingency table with r = 2 rows, c = 2 columns, and S strata. Denote by a dot (⋅) the partial sum of all rows, all columns, or all strata, depending on the position of the (⋅) in the subscript list. If the (⋅) is in the first subscript position, the sum is over all rows; if the (⋅) is in the second subscript position, the sum is over all columns; and if the (⋅) is in the third subscript position, the sum is over all strata. Thus, \(n_{i..}\) denotes the marginal frequency total of the ith row, i = 1, 2, summed over all columns and strata; \(n_{.j.}\) denotes the marginal frequency total of the jth column, j = 1, 2, summed over all rows and strata; \(n_{..k}\) denotes the marginal frequency total of the kth stratum, k = 1, …, S, summed over all rows and columns; and \(n_{...}\) denotes the table frequency total. The Mantel–Haenszel statistical model, under the null hypothesis, states that the S 2×2 contingency tables are independent and the marginal frequency totals for each of the 2×2 contingency tables are fixed [17]. Then, the probability for the \(n_{11k}\) frequency of each of the 2×2 contingency tables under the null hypothesis is the hypergeometric point probability value given by
where \(n_{..k} = n_{11k} + n_{12k} + n_{21k} + n_{22k}\), \(n_{2.k} = n_{..k} - n_{1.k}\), \(n_{.2k} = n_{..k} - n_{.1k}\), and k = 1, …, S.
The test statistic of interest is given by
where the summation is over only one cell since for any 2×2 contingency table with fixed marginal frequency totals the entry in any one cell determines the entries in the remaining three cells.
Under the null hypothesis (H 0) of the model in Eq. (10.2), the mean and variance of test statistic T are given by
and
respectively. The Mantel–Haenszel test statistic, corrected for continuity, is given by
The Mantel–Haenszel test statistic, M, is approximately distributed as Pearson’s chi-squared with one degree of freedom as N → ∞.
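The computation of T, its mean and variance, and the continuity-corrected statistic M can be sketched as follows. The sketch assumes the standard Mantel–Haenszel forms, with the mean and variance of \(n_{11k}\) taken from the hypergeometric distribution in each stratum. The strata shown are hypothetical and are not the data of Table 10.9; the function names are illustrative.

```python
from math import erfc, sqrt

def mantel_haenszel(strata):
    """Return (T, E[T], VAR(T), M) for a list of 2x2 tables [[n11, n12], [n21, n22]]."""
    t = 0
    expected = 0.0
    variance = 0.0
    for (n11, n12), (n21, n22) in strata:
        r1, r2 = n11 + n12, n21 + n22      # row totals of stratum k
        c1, c2 = n11 + n21, n12 + n22      # column totals of stratum k
        n = r1 + r2                        # stratum total
        t += n11
        expected += r1 * c1 / n
        variance += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    m = (abs(t - expected) - 0.5) ** 2 / variance   # corrected for continuity
    return t, expected, variance, m

def chi2_upper_tail_1df(value):
    """Upper-tail probability of chi-squared with one degree of freedom."""
    return erfc(sqrt(value / 2.0))

# Hypothetical strata for illustration only (NOT the data of Table 10.9).
strata = [[[5, 3], [2, 6]],
          [[4, 2], [3, 5]]]
t_obs, e_t, var_t, m_obs = mantel_haenszel(strata)
print(t_obs, e_t, round(var_t, 4), round(m_obs, 4),
      round(chi2_upper_tail_1df(m_obs), 4))
```

For one degree of freedom the chi-squared upper tail reduces to the complementary error function, which avoids any dependency outside the standard library.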
10.4.1 Example Analysis
Consider the example data set given in Table 10.9 with r = 2 rows, c = 2 columns, S = 3 strata, and \(n_{...} = 74\) total observations. For the data listed in Table 10.9, the observed value of test statistic T is
the expected value of T under the null hypothesis is
the variance of T is
and the observed Mantel–Haenszel test statistic is
Mantel and Haenszel’s M test statistic is approximately distributed as Pearson’s chi-squared with one degree of freedom. For the observed value of \(M_{\mathrm{o}} = 13.2742\) the approximate chi-squared probability value is \(P = 0.2691 \times 10^{-3}\).
In Eq. (10.3), \(\mathrm{E}[T|H_{0}]\), \(\mathrm{VAR}(T|H_{0})\), and the correction factor are all invariant under permutation, leaving only variable T. Thus, for the data listed in Table 10.9 the approximate Monte Carlo resampling probability value based on L = 1,000,000 random arrangements of the observed data under the null hypothesis is
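Because only T varies under permutation, a Monte Carlo resampling analysis needs to regenerate only \(n_{11k}\) in each stratum. The following is a minimal sketch, assuming that each stratum is resampled independently with its marginal totals fixed and that the probability value is the proportion of random arrangements whose deviation |T − E[T]| is at least as large as the observed deviation. The strata are hypothetical (not Table 10.9), and a small number of resamples is used to keep the example fast.

```python
import random

def monte_carlo_mh(strata, n_resamples=10_000, seed=1):
    """Monte Carlo probability value for the Mantel-Haenszel statistic."""
    rng = random.Random(seed)
    populations = []
    t_obs = 0
    expected = 0.0
    for (n11, n12), (n21, n22) in strata:
        r1, r2 = n11 + n12, n21 + n22
        c1 = n11 + n21
        # Row-1 objects are coded 1 and row-2 objects 0; column 1 draws c1 of them.
        populations.append(([1] * r1 + [0] * r2, c1))
        t_obs += n11
        expected += r1 * c1 / (r1 + r2)
    dev_obs = abs(t_obs - expected)
    hits = 0
    for _ in range(n_resamples):
        # Sampling without replacement yields a hypergeometric n11 per stratum.
        t = sum(sum(rng.sample(pop, c1)) for pop, c1 in populations)
        if abs(t - expected) >= dev_obs - 1e-12:
            hits += 1
    return hits / n_resamples

# Hypothetical strata for illustration only (NOT the data of Table 10.9).
strata = [[[5, 3], [2, 6]],
          [[4, 2], [3, 5]]]
p_value = monte_carlo_mh(strata)
print(p_value)
```

Comparing deviations |T − E[T]| is equivalent to comparing values of M, since the mean, the variance, and the continuity correction are constant across arrangements.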
10.4.2 Measures of Effect Size
Two types of measures of effect size have been proposed to represent the strength of a treatment effect [32]. One type, designated the d-family, is based on one or more measures of the differences between groups or levels of an independent variable. Representative of the d-family is Cohen’s d, which calculates the effect size by the number of standard deviations separating the means of the groups or levels [8]. The second type of measure of effect size, designated the r-family, represents some sort of correlation between the independent variables. Measures in the r-family are typically measures of correlation or association, the most prominent being Pearson’s squared product-moment correlation coefficient. Since the Mantel–Haenszel test is based on a 2×2×S contingency table, the d-family is not applicable.
The r-family measures of effect size contain two types of measures: putative maximum-corrected and chance-corrected. Maximum-corrected measures of effect size standardize the observed test statistic value by the maximum possible value of the test statistic. Maximum-corrected measures of effect size are bounded between 0 and 1 and are interpretable as the proportion of the maximum possible value of the test statistic. On the other hand, chance-corrected measures of effect size standardize the observed test statistic value by the expected value of the test statistic. Chance-corrected measures of effect size can attain a maximum value of +1, but may be less than 0 when the test statistic value is less than expected by chance, and are interpretable as the proportion above, or below, what is expected by chance. In 2010 Berry, Johnston, and Mielke developed two measures of effect size for the Mantel–Haenszel test statistic: a maximum-corrected and a chance-corrected measure of effect size [2].
10.4.2.1 Maximum-Corrected Measure of Effect Size
Let M o and T o denote the observed values of M and T, respectively. Then, the maximum-corrected measure of effect size is given by M o divided by the maximum possible value of M. The maximum value of T for an observed 2×2×S contingency table is given by
where \(\min (n_{1.k}, n_{.1k})\) is the maximum value of n 11k in the kth of S 2×2 contingency tables. Thus, the maximum value of M is given by
and the maximum-corrected measure of effect size for M is given by the observed value of M divided by the maximum value of M, i.e.,
For the frequency data given in Table 10.9, the maximum value of T is
the maximum value of M is
and the maximum-corrected measure of effect size is
indicating that M o = 13.2742 accounts for approximately 30% of the maximum value of M, given the observed row, column, and stratum marginal frequency distributions, {12, 62}, {16, 58}, and {32, 24, 18}, respectively.
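The maximum-corrected measure can be sketched in a few lines. The code assumes, as described above, that the maximum of T is the sum over strata of \(\min(n_{1.k}, n_{.1k})\) and that the maximum of M is obtained by substituting that value for T in the continuity-corrected statistic. The strata are hypothetical (not Table 10.9) and the names are illustrative.

```python
def max_corrected_effect_size(strata):
    """Return (M_o, M_max, M_o / M_max) for a list of 2x2 tables."""
    t_obs = 0
    t_max = 0
    expected = 0.0
    variance = 0.0
    for (n11, n12), (n21, n22) in strata:
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        n = r1 + r2
        t_obs += n11
        t_max += min(r1, c1)               # largest n11 the margins allow
        expected += r1 * c1 / n
        variance += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    m_obs = (abs(t_obs - expected) - 0.5) ** 2 / variance
    m_max = (abs(t_max - expected) - 0.5) ** 2 / variance
    return m_obs, m_max, m_obs / m_max

# Hypothetical strata for illustration only (NOT the data of Table 10.9).
strata = [[[5, 3], [2, 6]],
          [[4, 2], [3, 5]]]
m_obs, m_max, es_m = max_corrected_effect_size(strata)
print(round(m_obs, 4), round(m_max, 4), round(es_m, 4))
```

Since the variance is common to numerator and denominator, the ratio depends only on the two continuity-corrected deviations, here \((2/6)^{2}\).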
10.4.2.2 Chance-Corrected Measure of Effect Size
A chance-corrected measure of effect size for the Mantel–Haenszel test may be given by statistic M, standardized by the expected value of M. Thus, the chance-corrected measure is given by
where E[M] = 1 since the mean of a chi-squared distribution is equal to the degrees of freedom and M is approximately distributed as chi-squared with one degree of freedom. For the frequency data given in Table 10.9, the chance-corrected measure of effect size is
indicating that \(M_{\mathrm{o}} = 13.2742\) is approximately 28% above what is expected by chance. In general, chance-corrected measures of effect size, such as ESC, tend to yield slightly smaller values than maximum-corrected measures, such as ESM, for the same set of data [2, pp. 398–399].
10.5 Cohen’s Kappa Measure
In 1960 Jacob Cohen introduced statistic kappa, an unweighted, chance-corrected measure of inter-rater agreement between two judges for a set of c disjoint, unordered categories [6]. In 1968 Cohen expanded kappa to include weighting for measuring the agreement between two judges for a set of c disjoint, ordered categories [7]. Unweighted kappa is discussed more completely in Chap. 4, Sect. 4.5, and weighted kappa is discussed in detail in Chap. 6, Sect. 6.5. Whereas unweighted kappa for categorical data did not distinguish among magnitudes of disagreement, weighted kappa for ordinal-level data incorporated the magnitude of each disagreement and provided partial credit for disagreements when agreement was not complete [16]. Weighted kappa is easily extended to interval-level data [3]. The usual approach is to assign weights to each disagreement pair with larger weights indicating greater disagreement. For both unweighted and weighted kappa, the coefficient is equal to +1 when perfect agreement between the two judges occurs, 0 when agreement is equal to that expected under independence, and negative when agreement is less than expected by chance. Unweighted kappa and weighted kappa are conventionally designated as κ and \(\kappa_{w}\), respectively. Two forms of weighting are popular for weighted kappa: linear weighting, in which category disagreement weights progress outward linearly from the agreement diagonal, and quadratic weighting, in which category disagreement weights progress outward geometrically from the agreement diagonal. In keeping with the theme of this chapter, fourfold contingency tables, κ and \(\kappa_{w}\) are extended to multiple judges with c = 2 categories.
Consider first b = 2 judges and c = 2 categories. A generalized calculation formula that applies to both unweighted and weighted kappa for b = 2 judges and c categories is given by
where n ij denotes the observed cell frequencies, w ij denotes the cell weights, R i and C j denote the observed row and column marginal frequency totals for i, j = 1, …, c, and
denotes the table frequency total.
Given a c×c agreement table with N objects cross-classified by the ratings of two independent judges into c disjoint categories, an exact permutation test generates all M possible, equally-likely arrangements of the N objects in the c 2 cells, while preserving the total number of objects in each category, i.e., the marginal frequency distributions. For each arrangement of cell frequencies with fixed marginal frequency distributions, the kappa statistic, κ, and the exact point probability, p(n ij|n i., n .j, N), are calculated, where
is the conventional hypergeometric probability of a c×c contingency table.
Let κ o denote the value of the observed weighted kappa statistic and M denote the total number of distinct cell frequency arrangements of the N objects in the c×c agreement table, given fixed marginal frequency totals. Then the exact probability value of κ o under the null hypothesis is given by
where
When M is very large, exact permutation analyses quickly become impractical and Monte Carlo resampling procedures become necessary. Let L denote a random sample of all M possible values of κ. Then, under the null hypothesis, the resampling approximate probability value for the observed value of κ, \(\kappa_{\mathrm{o}}\), is given by
where
To calculate Cohen’s unweighted kappa with Eq. (10.4) on p. 597, the cell disagreement “weights” are given by
To calculate Cohen’s weighted kappa with linear weighting, the cell disagreement weights are given by
To calculate Cohen’s weighted kappa with quadratic weighting, the cell disagreement weights are given by
Thus, as demonstrated, for b = 2 judges and c = 2 categories, the cell disagreement weights are the same for unweighted kappa (κ) and weighted kappa (κ w) with either linear or quadratic weighting.
10.5.1 Example 1
To illustrate the application of Cohen’s unweighted kappa with b = 2 judges and c = 2 categories, consider the frequency data given in Table 10.10, where b = 2 independent judges have each assigned N = 123 observations to c = 2 disjoint, unordered categories labeled Pro and Con. Assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then following Eq. (10.4) on p. 597,
indicating approximately 33% agreement between the two judges above that expected by chance.
For the frequency data given in Table 10.10, there are only
possible, equally-likely arrangements in the reference set of all permutations of the cell frequencies in Table 10.10 given the observed row and column marginal frequency distributions, {65, 58} and {60, 63}, respectively, making an exact permutation analysis possible. If the M = 59 possible arrangements of the frequency data given in Table 10.10 occur with equal chance, the exact probability value of κ under the null hypothesis is the sum of the hypergeometric point probability values associated with κ = +0.3343 or greater.
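The exact permutation analysis for a 2×2 agreement table can be sketched by enumerating every admissible value of \(n_{11}\). The observed value \(n_{11} = 42\) used below is not stated in the surviving text; it is inferred here from the reported marginal totals and κ = +0.3343 and should be treated as an assumption. The function names are illustrative.

```python
from math import comb

def kappa_2x2(n11, r1, c1, n):
    """Unweighted kappa for a 2x2 table determined by n11 and fixed margins."""
    n12 = r1 - n11
    n21 = c1 - n11
    r2, c2 = n - r1, n - c1
    observed = n12 + n21                 # observed disagreements
    expected = r1 * c2 + r2 * c1         # N^2 times the chance disagreement
    return 1.0 - n * observed / expected

def exact_kappa_test(n11_obs, r1, c1, n):
    """Return (observed kappa, number of tables, exact upper-tail probability)."""
    low = max(0, r1 + c1 - n)
    high = min(r1, c1)
    k_obs = kappa_2x2(n11_obs, r1, c1, n)
    denominator = comb(n, c1)
    n_tables = 0
    p_value = 0.0
    for n11 in range(low, high + 1):
        n_tables += 1
        # Hypergeometric point probability of this arrangement.
        prob = comb(r1, n11) * comb(n - r1, c1 - n11) / denominator
        if kappa_2x2(n11, r1, c1, n) >= k_obs - 1e-12:
            p_value += prob
    return k_obs, n_tables, p_value

# Margins {65, 58} and {60, 63} with N = 123; n11 = 42 is inferred (assumption).
k_obs, n_tables, p_exact = exact_kappa_test(42, 65, 60, 123)
print(round(k_obs, 4), n_tables, p_exact)
```

The enumeration reproduces the M = 59 arrangements and the observed κ = +0.3343 stated in the text; the probability value it prints depends on the inferred cell frequency.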
Table 10.11 lists the n 11 cell frequency values, unweighted kappa values, and associated hypergeometric probability values for the frequency data given in Table 10.10, where the n 11 cell values associated with κ values equal to or greater than the observed value of κ = +0.3343 are indicated with asterisks. Because there is only one degree of freedom, it is sufficient to list the cell frequency values for only one cell, n 11. For the frequency data given in Table 10.10, the exact upper-tail hypergeometric probability value of the observed κ value is
10.5.2 Example 2
Although weighted and unweighted kappa were originally formulated to compare only two judges, both κ and κ w can be generalized to accommodate multiple judges [25]. However, with multiple judges an exact permutation analysis becomes impractical except for very small sample sizes; therefore, a Monte Carlo resampling permutation analysis is preferred when analyzing agreement data from multiple judges. The analysis for b multiple judges may be conceptualized as a b-way contingency table with c = 2 categories on each axis. Figure 10.1 illustrates a 2×2×2 contingency table with b = 3 judges and c = 2 disjoint, unordered categories labeled Pro and Con.
To illustrate the application of Cohen’s kappa with multiple judges and c = 2 disjoint categories, consider the frequency data given in Table 10.12, where b = 3 judges have independently assigned N = 254 observations to c = 2 categories labeled Pro and Con. A generalized calculation formula that applies to both unweighted and weighted kappa for b = 3 judges and c categories is given by
where n ijk denotes the observed cell frequencies, w ijk denotes the cell weights, R i, C j, and S k denote the observed row, column, and slice marginal frequency totals for i, j, k = 1, …, c, and
denotes the table frequency total.
Given a c×c×c agreement table with N objects cross-classified by b = 3 independent judges, an exact permutation test involves generating all possible, equally-likely arrangements of the N objects to the c 3 cells, while preserving the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively. For each arrangement of cell frequencies, the kappa statistic, κ, and the exact hypergeometric point probability value under the null hypothesis, p(n ijk|R i, C j, S k, N), are calculated, where
[20].
If κ o denotes the value of the observed kappa test statistic, the exact probability value of κ o under the null hypothesis is given by
where
and M denotes the total number of possible, equally-likely arrangements in the reference set of all permutations of cell frequencies in Table 10.12 given the observed marginal frequency distributions. When M is very large, as is typical with multi-way contingency tables, exact tests are impractical and Monte Carlo resampling becomes necessary, where a random sample, L, of the M possible arrangements of cell frequencies provides for a comparison of κ test statistics calculated on the L random tables with the κ test statistic calculated on the observed table.
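A Monte Carlo resampling analysis of unweighted kappa for several judges can be sketched by shuffling each judge's ratings independently, which preserves every marginal frequency distribution. The sketch assumes that kappa is one minus the ratio of observed to chance-expected mean disagreement, with disagreement weight 0 when all judges agree and 1 otherwise. The ratings are hypothetical (not the data of Table 10.12), and a small number of resamples is used for speed.

```python
import random
from itertools import product

def unweighted(cell):
    """Disagreement weight: 0 when all judges agree, 1 otherwise."""
    return 0 if len(set(cell)) == 1 else 1

def kappa(columns, categories, weight=unweighted):
    """columns[j] holds judge j's category labels for the N objects."""
    n = len(columns[0])
    observed = sum(weight(obj) for obj in zip(*columns)) / n
    expected = 0.0
    for cell in product(categories, repeat=len(columns)):
        prob = 1.0
        for col, cat in zip(columns, cell):
            prob *= col.count(cat) / n   # marginal proportion for this judge
        expected += weight(cell) * prob
    return 1.0 - observed / expected

def monte_carlo_kappa(columns, categories, n_resamples=2000, seed=1):
    """Return (observed kappa, proportion of resampled kappas >= observed)."""
    rng = random.Random(seed)
    k_obs = kappa(columns, categories)
    work = [list(col) for col in columns]
    hits = 0
    for _ in range(n_resamples):
        for col in work[1:]:             # the first judge can stay fixed
            rng.shuffle(col)
        if kappa(work, categories) >= k_obs - 1e-12:
            hits += 1
    return k_obs, hits / n_resamples

# Hypothetical ratings (1 = Pro, 2 = Con) by three judges; NOT Table 10.12.
judge_a = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
judge_b = [1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2]
judge_c = [1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 2, 1]
k_obs, p_value = monte_carlo_kappa([judge_a, judge_b, judge_c], (1, 2))
print(round(k_obs, 4), p_value)
```

The chance-expected disagreement depends only on the marginal proportions, so it is constant across resamples; only the observed disagreement changes.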
10.5.2.1 Unweighted Kappa
Unweighted kappa and weighted kappa, with either linear or quadratic weighting, yield the same result when analyzing agreement data for b = 2 judges and c = 2 categories. For b > 2 judges and c = 2 categories, unweighted kappa and weighted kappa usually yield different results, but weighted kappa with linear weighting and weighted kappa with quadratic weighting yield the same result. For the frequency data given in Table 10.12, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the cell disagreement “weights” for unweighted kappa are given by
Following Eq. (10.5) on p. 602, Cohen’s unweighted kappa coefficient is κ = +0.1862, indicating approximately 19% agreement among the b = 3 judges above that expected by chance. If \(\kappa_{\mathrm{o}}\) denotes the observed value of κ, the approximate Monte Carlo resampling probability value based on L = 1,000,000 random arrangements of the cell frequencies, given the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively, is
10.5.2.2 Weighted Kappa
For the frequency data given in Table 10.12, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the linear cell disagreement weights are given by
and the quadratic cell disagreement weights are given by
for i, j, k = 1, …, c. Table 10.13 lists the eight cell indices and the associated linear and quadratic weights for a 2×2×2 agreement table, demonstrating that with c = 2 categories, the linear and quadratic weights are identical.
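The identity of linear and quadratic weights for c = 2 categories follows because every category difference is either 0 or 1. The sketch below illustrates this, assuming that the multi-judge weights are sums of absolute (linear) or squared (quadratic) differences over all pairs of judges. The chapter's own weight definitions did not survive extraction, so this form is an assumption, and the function names are illustrative.

```python
from itertools import combinations, product

def linear_weight(cell):
    """Sum of absolute category differences over all pairs of judges (assumed form)."""
    return sum(abs(a - b) for a, b in combinations(cell, 2))

def quadratic_weight(cell):
    """Sum of squared category differences over all pairs of judges (assumed form)."""
    return sum((a - b) ** 2 for a, b in combinations(cell, 2))

def weight_table(b, c=2):
    """List (cell indices, linear weight, quadratic weight) for b judges, c categories."""
    return [(cell, linear_weight(cell), quadratic_weight(cell))
            for cell in product(range(1, c + 1), repeat=b)]

for cell, lin, quad in weight_table(3):
    print(cell, lin, quad)
```

Calling `weight_table(4)` lists the 16 cells of a 2×2×2×2 table in the same way, while `weight_table(3, c=3)` shows that the two weightings diverge once a category difference of 2 is possible.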
Following Eq. (10.5) on p. 602, Cohen’s weighted kappa with linear weighting is \(\kappa_{w} = +0.0342\), indicating approximately 3% agreement among the b = 3 judges above that expected by chance. If \(\kappa_{\mathrm{o}}\) denotes the observed value of \(\kappa_{w}\), the approximate Monte Carlo resampling probability value based on L = 1,000,000 random arrangements of the cell frequencies, given the observed row, column, and slice marginal frequency distributions, {123, 131}, {135, 119}, and {134, 120}, respectively, is
Because with c = 2 categories the linear and quadratic weights are the same, the results are identical with quadratic weighting, i.e., κ w = +0.0342 and P = 0.1906.
10.5.3 Example 3
For this third example of Cohen’s chance-corrected measure of inter-rater agreement, consider b = 4 judges who independently assign N = 76 observations to c = 2 disjoint, unordered categories labeled Pro and Con. The frequency data are given in Table 10.14.
A generalized calculation formula that applies to both unweighted and weighted kappa for b = 4 judges and c categories is given by
where n ijkl denotes the observed cell frequencies, w ijkl denotes the cell weights, R i, C j, S k, and L l denote the observed row, column, slice, and level marginal frequency totals for i, j, k, l = 1, …, c, and
denotes the table frequency total.
Given a c×c×c×c agreement table with N objects cross-classified by b = 4 independent judges, an exact permutation test involves generating all possible, equally-likely arrangements of the N objects to the c 4 cells, while preserving the observed row, column, slice, and level marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively. For each arrangement of cell frequencies, the kappa statistic, κ, and the exact hypergeometric point probability value under the null hypothesis, p(n ijkl|R i, C j, S k, L l, N), are calculated, where
[20].
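If κ o denotes the value of the observed kappa test statistic, the exact probability value of κ o under the null hypothesis is given by
\[
P(\kappa_{\text{o}}) = \sum_{m=1}^{M} \Psi_{m}\big(p(n_{ijkl} \mid R_{i}, C_{j}, S_{k}, L_{l}, N)\big) ,
\]
where
\[
\Psi_{m}\big(p(n_{ijkl} \mid R_{i}, C_{j}, S_{k}, L_{l}, N)\big) =
\begin{cases}
p(n_{ijkl} \mid R_{i}, C_{j}, S_{k}, L_{l}, N) & \text{if } \kappa \geq \kappa_{\text{o}} , \\
0 & \text{otherwise} ,
\end{cases}
\]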
and M denotes the total number of possible, equally-likely arrangements in the reference set of all permutations of cell frequencies in Table 10.14 given the row, column, slice, and level observed marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively. When M is very large, as is typical with multi-way contingency tables, exact tests are impractical and Monte Carlo resampling becomes necessary, where a random sample, L, of the M possible arrangements of cell frequencies provides for a comparison of κ test statistics calculated on the L random tables with the κ test statistic calculated on the observed table.
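The resampling idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors’ program: the function names (`kappa`, `unweighted`, `resampling_p_value`) and the input layout (one tuple of category labels per object, one entry per judge) are assumptions made for the example. Kappa is written as one minus the ratio of observed to chance-expected disagreement, and random tables with the observed marginals are produced by shuffling each judge’s labels independently.

```python
import itertools
import random
from collections import Counter


def kappa(ratings, c, weight):
    """Chance-corrected agreement: 1 - observed / expected disagreement."""
    n = len(ratings)
    b = len(ratings[0])
    # Observed disagreement: mean weight over the N rated objects.
    observed = sum(weight(cell) for cell in ratings) / n
    # Marginal category counts for each judge (row, column, slice, level, ...).
    margins = [Counter(r[j] for r in ratings) for j in range(b)]
    # Expected disagreement: weights averaged over the product of the marginals.
    expected = 0.0
    for cell in itertools.product(range(1, c + 1), repeat=b):
        prob = 1.0
        for j, category in enumerate(cell):
            prob *= margins[j][category] / n
        expected += weight(cell) * prob
    return 1.0 - observed / expected


def unweighted(cell):
    """Weight 0 when every judge agrees, 1 otherwise."""
    return 0 if len(set(cell)) == 1 else 1


def resampling_p_value(ratings, c, weight, n_resamples=10_000, seed=1):
    """Proportion of margin-preserving random tables with kappa >= observed."""
    rng = random.Random(seed)
    observed = kappa(ratings, c, weight)
    columns = [list(col) for col in zip(*ratings)]
    hits = 0
    for _ in range(n_resamples):
        # Shuffling a judge's labels leaves that judge's marginal totals fixed.
        for col in columns[1:]:
            rng.shuffle(col)
        if kappa(list(zip(*columns)), c, weight) >= observed - 1e-12:
            hits += 1
    return hits / n_resamples
```

For two judges the sketch reduces to the familiar two-rater kappa; for example, `kappa([(1, 1), (1, 2), (2, 2), (2, 2)], 2, unweighted)` gives 0.5. The function assumes that the expected disagreement is nonzero, i.e., that the judges do not all use a single category.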
10.5.3.1 Unweighted Kappa
For the frequency data given in Table 10.14, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the cell disagreement “weights” for unweighted kappa are given by
\[
w_{ijkl} =
\begin{cases}
0 & \text{if } i = j = k = l , \\
1 & \text{otherwise} .
\end{cases}
\]
Following Eq. (10.6) on p. 605, Cohen’s unweighted kappa coefficient is κ = +0.0561, indicating approximately 6% agreement among the b = 4 judges above that expected by chance. If κ o denotes the observed value of κ, the approximate Monte Carlo resampling probability value based on L = 1,000,000 random arrangements of the cell frequencies, given the observed marginal frequency distributions, is
10.5.3.2 Weighted Kappa
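For the frequency data given in Table 10.14, assign the number 1 to the categories labeled “Pro” and the number 2 to the categories labeled “Con.” Then the linear cell disagreement weights are given by
\[
w_{ijkl} = |i-j| + |i-k| + |i-l| + |j-k| + |j-l| + |k-l|
\]
and the quadratic cell disagreement weights are given by
\[
w_{ijkl} = (i-j)^{2} + (i-k)^{2} + (i-l)^{2} + (j-k)^{2} + (j-l)^{2} + (k-l)^{2}
\]
for i, j, k, l = 1, …, c.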
Table 10.15 lists the 16 cell indices and the associated linear and quadratic weights for a 2×2×2×2 agreement table. Note that for c = 2 categories, the linear and quadratic weights are identical.
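The equality of the two weighting schemes for c = 2 is easy to check by enumeration. The sketch below assumes that the weight of a cell is the sum, over all pairs of judges, of the absolute (linear) or squared (quadratic) difference between the assigned category numbers; that pairwise-sum form and the function names are assumptions made for illustration.

```python
from itertools import combinations, product


def linear_weight(cell):
    """Sum of absolute category differences over all pairs of judges."""
    return sum(abs(u - v) for u, v in combinations(cell, 2))


def quadratic_weight(cell):
    """Sum of squared category differences over all pairs of judges."""
    return sum((u - v) ** 2 for u, v in combinations(cell, 2))


# All 16 cells (i, j, k, l) of a 2x2x2x2 agreement table.
cells = list(product((1, 2), repeat=4))
table = [(cell, linear_weight(cell), quadratic_weight(cell)) for cell in cells]

for cell, lin, quad in table:
    print(cell, lin, quad)
```

Because every category difference is either 0 or 1 when c = 2, each absolute difference equals its square and the two columns coincide. With c = 3 or more they diverge; for instance, a pair of judges choosing categories 1 and 3 contributes 2 to the linear weight and 4 to the quadratic weight.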
Following Eq. (10.6) on p. 605, Cohen’s weighted kappa with linear weighting is κ w = +0.0654, indicating approximately 7% agreement among the b = 4 judges above that expected by chance. If κ o denotes the observed value of κ w, the approximate Monte Carlo resampling probability value based on L = 1,000,000 random arrangements of the cell frequencies, given the observed row, column, slice, and level marginal frequency distributions, {33, 43}, {34, 42}, {39, 37}, and {44, 32}, respectively, is P = 0.0040.
Because, with c = 2 categories, the linear and quadratic weights are the same, the results are identical to those obtained with quadratic weighting, i.e., κ w = +0.0654 and P = 0.0040.
10.6 McNemar’s and Cochran’s Q Tests for Change
In 1947 Quinn McNemar proposed a test for change over k = 2 time periods [18]. In 1950 William Cochran developed a test for change for k ≥ 2 time periods [4]. For k = 2, Cochran’s Q test for related proportions is identical to McNemar’s Q test for related proportions. The McNemar and Cochran Q tests are described in detail in Chap. 4, Sects. 4.6 and 4.7, respectively.
10.6.1 McNemar’s Q Test for Change
Represent a 2×2 contingency table as in Table 10.16. Then, McNemar’s test for change is given by
\[
Q = \frac{(B-C)^{2}}{B+C} ,
\]
where B and C represent the two cells of change, i.e., Pro to Con and Con to Pro.
10.6.1.1 Illustration
To illustrate the calculation of probability values for McNemar’s Q test for change, consider the frequency data given in Table 10.17, where N = 9 subjects have been recorded as either Pro or Con on a specified issue at Time 1 and again on the same issue at Time 2. For the frequency data given in Table 10.17, the observed value of McNemar’s Q test statistic is
\[
Q = \frac{(B-C)^{2}}{B+C} = \frac{(5-1)^{2}}{5+1} = 2.6667 .
\]
The exact probability value of an observed value of Q, under the null hypothesis, is given by the sum of the hypergeometric point probability values associated with the Q values equal to or greater than the observed value of Q. For the frequency data given in Table 10.17, there are only
\[
M = \min(7, 3) - \max(0,\, 7 + 3 - 9) + 1 = 3
\]
possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the two cell frequencies of change, 5 and 1, and only two Q values are equal to or greater than the observed value of Q = 2.6667. The exact upper-tail probability of the observed Q value is P = 0.9167, i.e., the sum of the hypergeometric point probability values associated with values of Q = 2.6667 or greater.
More specifically, Table 10.18 displays the complete reference set of three possible 2×2 contingency tables given the row and column marginal frequency distributions, {7, 2} and {3, 6}, respectively. For Table A in Table 10.18, Q = 2.0000 and the associated hypergeometric point probability value is p = 0.0833. For Table B in Table 10.18, the observed table, Q = 2.6667 and the associated hypergeometric point probability value is p = 0.5000. And for Table C in Table 10.18, Q = 4.0000 and the associated hypergeometric point probability value is p = 0.4167. Thus, the cumulative hypergeometric probability value for Q = 2.6667 is the sum of the hypergeometric point probability values associated with values of Q = 2.6667 or greater; in this case, the probability values associated with Q = 2.6667 and Q = 4.0000, i.e., P = 0.5000 + 0.4167 = 0.9167.
McNemar’s Q test statistic is approximately distributed as chi-squared with 1 degree of freedom. While no responsible researcher would knowingly fit a chi-squared distribution function to only three possible outcomes, small samples, such as in Table 10.17, sometimes occur inadvertently. Suppose a researcher is employed by a national food service provider and begins with a reasonable, but small sample of subjects. As the research analysis proceeds, an interest develops in a subset of subjects composed of only women, breast-feeding their first child, and residing on a Native American reservation. Such unplanned small samples are relatively common and are not suitable for a conventional analysis. The chi-squared value for the observed data in Table 10.17 is χ 2 = 0.3214 and the probability value is P = 0.5708, which, as expected, is far removed from the exact probability value of P = 0.9167.
10.6.1.2 Example
A more realistic example illustrating McNemar’s Q test for change is given in Table 10.19, where N = 70 subjects were recorded as either Pro or Con on a specified issue at Time 1 and again on the same issue at Time 2. At Time 1, 40 of the 70 subjects were in favor of the issue and 30 subjects were opposed. At Time 2, 50 subjects were in favor and 20 were opposed. Of those subjects that changed, seven changed from Pro to Con and 17 changed from Con to Pro. For the frequency data given in Table 10.19, McNemar’s test statistic is
\[
Q = \frac{(B-C)^{2}}{B+C} = \frac{(7-17)^{2}}{7+17} = 4.1667 .
\]
For the frequency data given in Table 10.19, there are only
\[
M = \min(40, 50) - \max(0,\, 40 + 50 - 70) + 1 = 21
\]
possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the observed row and column marginal frequency distributions, {40, 30} and {50, 20}, respectively, making an exact permutation analysis possible. Since M = 21 is a reasonably small number of arrangements, it will be illustrative to list the 21 sets of cell frequencies, McNemar’s Q values, and the associated hypergeometric point probability values in Table 10.20, where the rows with hypergeometric probability values associated with Q values equal to or greater than the observed value of Q = 4.1667 are indicated with asterisks.
If the M = 21 possible arrangements of the frequency data given in Table 10.19 occur with equal chance, the exact probability of Q under the null hypothesis is the sum of the hypergeometric point probability values associated with Q = 4.1667 or greater. For the frequency data given in Table 10.19, the exact upper-tail probability of the observed value of Q is P = 0.0180.
For comparison, the value of chi-squared for the frequency data given in Table 10.19 is χ 2 = 5.6058 and with 1 degree of freedom, the probability value is P = 0.0179, which compares favorably with the exact probability value of P = 0.0180.
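The enumeration behind both McNemar examples can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ program: the function name `mcnemar_exact` and the cell layout (a, b in the first row, c, d in the second, with b and c the cells of change) are chosen for the example, and the cell values 2, 5, 1, 1 and 33, 7, 17, 13 are those implied by the marginal totals and change counts reported in the text.

```python
from math import comb


def mcnemar_exact(a, b, c, d):
    """Exact upper-tail probability of McNemar's Q with both margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    q_obs = (b - c) ** 2 / (b + c)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    tables = []
    for x in range(lo, hi + 1):        # x is the upper-left cell frequency
        bx, cx = row1 - x, col1 - x    # the two cells of change for this table
        q = (bx - cx) ** 2 / (bx + cx) if bx + cx > 0 else 0.0
        # Hypergeometric point probability of the table with upper-left cell x.
        p = comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)
        tables.append((x, q, p))
        if q >= q_obs - 1e-12:
            p_value += p
    return q_obs, len(tables), p_value


print(mcnemar_exact(2, 5, 1, 1))      # cell values implied by Table 10.17
print(mcnemar_exact(33, 7, 17, 13))   # cell values implied by Table 10.19
```

With both marginal distributions fixed, the difference between the two cells of change is constant across the reference set, so Q varies only through the sum of those two cells. The sketch assumes that the observed table has at least one change, since Q is undefined otherwise.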
10.6.2 Cochran’s Q Test for Change
Cochran’s Q test for k ≥ 2 treatments can be considered an extension of McNemar’s Q test for k = 2 treatments or time periods. Cochran’s Q test is described more completely in Chap. 4, Sect. 4.7.
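Cochran’s Q test for the analysis of k treatment conditions (columns) and N subjects (rows) is given by
\[
Q = \frac{(k-1)\left[\, k \displaystyle\sum_{j=1}^{k} C_{j}^{2} - \left(\displaystyle\sum_{j=1}^{k} C_{j}\right)^{2} \right]}{k \displaystyle\sum_{i=1}^{N} R_{i} - \displaystyle\sum_{i=1}^{N} R_{i}^{2}} , \tag{10.7}
\]
where
\[
C_{j} = \sum_{i=1}^{N} x_{ij}
\]
is the number of 1s in the jth of k columns,
\[
R_{i} = \sum_{j=1}^{k} x_{ij}
\]
is the number of 1s in the ith of N rows,
and x ij denotes the cell entry of either 0 or 1 associated with the ith of N rows and the jth of k columns. The null hypothesis stipulates that each of the
\[
M = \prod_{i=1}^{N} \binom{k}{R_{i}}
\]
distinguishable arrangements of 1s and 0s within each of the N rows occurs with equal probability, given that the values of R 1, …, R N are fixed [21].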
10.6.2.1 Example
To illustrate Cochran’s Q test for change, consider the binary data listed in Table 10.21 consisting of the responses (1 or 0) for N = 9 subjects evaluated over k = 3 time periods, where a 1 indicates success on a prescribed task and a 0 indicates failure. For the binary data listed in Table 10.21,
and, following Eq. (10.7) on p. 612, the observed value of Cochran’s Q is Q = 8.2222.
For the binary data listed in Table 10.21, there are only
\[
M = \prod_{i=1}^{N} \binom{k}{R_{i}} = 3^{9} = 19{,}683
\]
possible, equally-likely arrangements in the reference set of all permutations of the observed binary data, making an exact permutation analysis feasible. Based on M = 19,683 arrangements of the observed data, there are 312 Q values equal to or greater than the observed value of Q = 8.2222. If Q o denotes the observed value of Q, the exact upper-tail probability value of the observed data is
\[
P(Q \geq Q_{\text{o}} \mid H_{0}) = \frac{\text{number of } Q \text{ values} \geq Q_{\text{o}}}{M} = \frac{312}{19{,}683} = 0.0159 .
\]
For comparison, under the null hypothesis Cochran’s Q is approximately distributed as chi-squared with k − 1 degrees of freedom. The approximate probability of Q = 8.2222 with k − 1 = 3 − 1 = 2 degrees of freedom is P = 0.0164.
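A complete enumeration of this kind is small enough to sketch directly. The code below is illustrative only: it assumes the usual computing form of Cochran’s Q in terms of the column totals C j and row totals R i, the function names are made up for the example, and the 4×3 data set is hypothetical rather than the data of Table 10.21.

```python
from itertools import combinations, product


def cochran_q(data):
    """Cochran's Q for an N x k array of 0/1 responses."""
    k = len(data[0])
    col = [sum(row[j] for row in data) for j in range(k)]   # C_j
    row = [sum(r) for r in data]                            # R_i
    total = sum(row)
    num = (k - 1) * (k * sum(cj * cj for cj in col) - total ** 2)
    den = k * total - sum(ri * ri for ri in row)
    return num / den


def exact_cochran_p(data):
    """Enumerate every placement of the 1s within rows, each R_i held fixed."""
    k = len(data[0])
    q_obs = cochran_q(data)
    # For each row, all distinguishable placements of its R_i ones in k columns.
    choices = []
    for r in data:
        rows = []
        for ones in combinations(range(k), sum(r)):
            rows.append([1 if j in ones else 0 for j in range(k)])
        choices.append(rows)
    m = hits = 0
    for arrangement in product(*choices):
        m += 1
        if cochran_q(arrangement) >= q_obs - 1e-12:
            hits += 1
    return q_obs, m, hits / m


# Hypothetical responses for N = 4 subjects over k = 3 time periods.
example = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
print(exact_cochran_p(example))
```

For the hypothetical data the reference set contains 3 × 3 × 3 × 3 = 81 arrangements, of which 6 yield a Q value at least as large as the observed Q = 6.5. The sketch assumes that at least one row contains both a 0 and a 1; otherwise the denominator of Q is zero.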
10.7 Fisher’s Exact Probability Test
Fisher’s exact probability test was independently developed by R.A. Fisher, Joseph Irwin, and Frank Yates in the early 1930s [11, 14, 34]. Characteristically, Fisher’s exact test is applied to 2×2 contingency tables, but can be generalized and extended to more complex contingency tables. The eponymous exact test for 2×2 tables and several extensions are detailed in Chap. 4, Sects. 4.11 and 4.12. In this chapter on fourfold contingency tables, only 2×2 and 2×2×2 contingency tables are considered.
10.7.1 Analysis of 2×2 Contingency Tables
Consider a 2×2 contingency table containing N cases, where x o denotes the observed frequency of any cell and r and c represent the row and column marginal frequency totals, respectively, corresponding to x o. Table 10.22 illustrates the notation for a 2×2 contingency table.
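Given the notation in Table 10.22, Fisher’s exact test for 2×2 contingency tables is given by
\[
P = \sum_{k=a}^{b} J_{k}\, p(k \mid r, c, N) ,
\]
where J k = 1 if p(k|r, c, N) ≤ p(x o|r, c, N) and J k = 0 otherwise, \(a = \max (0,r+c-N)\), \(b = \min (r,c)\), and the hypergeometric point probability value is given by
\[
p(x \mid r, c, N) = \binom{r}{x}\binom{N-r}{c-x}\binom{N}{c}^{-1} = \frac{r!\; c!\; (N-r)!\; (N-c)!}{N!\; x!\; (r-x)!\; (c-x)!\; (N-r-c+x)!} .
\]
To illustrate Fisher’s exact probability test for a 2×2 contingency table, consider the frequency data given in Table 10.23 where x o = 13, r = 15, c = 20, and N = 30.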
For the frequency data given in Table 10.23, there are only
\[
M = \min(15, 20) - \max(0,\, 15 + 20 - 30) + 1 = 15 - 5 + 1 = 11
\]
possible, equally-likely arrangements in the reference set of all permutations of cell frequencies given the observed row and column marginal frequency distributions, {15, 15} and {20, 10}, respectively, making an exact permutation analysis possible. Table 10.24 lists the M = 11 possible values of x and associated hypergeometric point probability values to nine decimal places.
The exact probability value is obtained by summing all the hypergeometric point probability values equal to or less than the hypergeometric point probability value of the observed table, indicated with asterisks in Table 10.24. Thus,
\[
P = p(13 \mid 15, 20, 30) + p(14 \mid 15, 20, 30) + p(15 \mid 15, 20, 30) = 0.0225 + 0.0025 + 0.0001 = 0.0251
\]
for the upper tail of the distribution, i.e., the sum of the hypergeometric point probability values associated with x = 13, 14, and 15. Since the probability distribution is symmetric in this case, the exact hypergeometric probability value is twice the probability of the upper tail, i.e., P = 2(0.0251) = 0.0502.
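The calculation for Table 10.23 can be reproduced with a short Python sketch. The function name and return layout are assumptions made for illustration; the two-sided value follows the rule stated above of summing every point probability that does not exceed that of the observed table.

```python
from math import comb


def fisher_exact_2x2(x_obs, r, c, n):
    """Return (M, upper-tail probability, two-sided probability)."""
    lo, hi = max(0, r + c - n), min(r, c)
    # Integer numerators share the denominator comb(n, c), so ties compare exactly.
    weight = {x: comb(r, x) * comb(n - r, c - x) for x in range(lo, hi + 1)}
    denom = comb(n, c)
    upper = sum(w for x, w in weight.items() if x >= x_obs) / denom
    two_sided = sum(w for w in weight.values() if w <= weight[x_obs]) / denom
    return len(weight), upper, two_sided


m, upper, two_sided = fisher_exact_2x2(13, 15, 20, 30)
print(m, round(upper, 4), round(two_sided, 4))   # 11 0.0251 0.0502
```

Comparing the integer numerators rather than the floating-point probabilities avoids rounding problems when two tables, such as x = 7 and x = 13 here, have exactly the same point probability.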
10.7.2 Analysis of 2×2×2 Contingency Tables
Analyses of multi-way contingency tables are more complex than simple two-way tables; see Chap. 4, Sect. 4.12. For a two-way contingency table the degrees of freedom are given by df = (r − 1)(c − 1), where r denotes the number of rows and c denotes the number of columns. Thus, in the case of a 2×2 contingency table the degrees of freedom are (2 − 1)(2 − 1) = 1 and only one cell frequency need be permuted over its range. In the 2×2 example above, the chosen cell (A 1B 1) was designated as x in Table 10.22.
For multi-way contingency tables the degrees of freedom are given by
\[
df = \prod_{i=1}^{r} c_{i} - \sum_{i=1}^{r} (c_{i} - 1) - 1 ,
\]
where r denotes the number of dimensions and c i denotes the number of categories in each dimension, i = 1, …, r [24, p. 309]. Thus, for a 2×2×2 contingency table with c = 2 disjoint categories in each of r = 3 dimensions,
\[
df = (2)(2)(2) - \big[(2-1) + (2-1) + (2-1)\big] - 1 = 8 - 3 - 1 = 4 .
\]
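As a quick check on the degrees of freedom for multi-way tables, the count can be written as the number of cells, minus the number of free marginal totals, minus one. The helper below is a hypothetical illustration of that count.

```python
from math import prod


def degrees_of_freedom(categories):
    """df for an r-way table: cells, minus free marginal totals, minus one."""
    return prod(categories) - sum(c - 1 for c in categories) - 1


print(degrees_of_freedom([2, 2]))         # 2x2 table
print(degrees_of_freedom([2, 2, 2]))      # 2x2x2 table
print(degrees_of_freedom([2, 2, 2, 2]))   # 2x2x2x2 table
```

For a two-way table the expression reduces to (r − 1)(c − 1); a 3×4 table, for example, gives 12 − 5 − 1 = 6.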
Consider a 2×2×2 contingency table where n ijk denotes the cell frequency of the ith row, jth column, and kth slice for i, j, k = 1, 2. Let A = n 1.., B = n .1., C = n ..1, and N = n … denote the observed marginal frequency totals of the first row, first column, first slice, and entire table, respectively, such that 1 ≤ A ≤ B ≤ C ≤ N∕2. Also, let w = n 111, x = n 112, y = n 121, and z = n 211 denote four cell frequencies of the 2×2×2 contingency table. Then, the probability for any specified w, x, y, and z is given by
\[
p(w, x, y, z \mid A, B, C, N) = \frac{A!\,(N-A)!\;B!\,(N-B)!\;C!\,(N-C)!}{(N!)^{2}\; w!\; x!\; y!\; z!\; (A-w-x-y)!\; (B-w-x-z)!\; (C-w-y-z)!\; (N-A-B-C+2w+x+y+z)!}
\]
[26].
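The bounds for w, x, y, and z are
\[
0 \leq w \leq M_{w} , \qquad 0 \leq x \leq M_{x} , \qquad 0 \leq y \leq M_{y} ,
\]
and
\[
L_{z} \leq z \leq M_{z} ,
\]
respectively, where M w = A, M x = A − w, M y = A − w − x, \(M_{z} = \min (B-w-x,C-w-y)\), and \(L_{z} = \max (0,A+B+C-N-2w-x-y)\). If w o, x o, y o, and z o denote the values of w, x, y, and z in the observed contingency table, then Fisher’s exact probability value for a 2×2×2 contingency table is given by
\[
P = \sum_{w=0}^{M_{w}} \sum_{x=0}^{M_{x}} \sum_{y=0}^{M_{y}} \sum_{z=L_{z}}^{M_{z}} J_{w,x,y,z}\; p(w, x, y, z \mid A, B, C, N) ,
\]
where
\[
J_{w,x,y,z} =
\begin{cases}
1 & \text{if } p(w, x, y, z \mid A, B, C, N) \leq p(w_{\text{o}}, x_{\text{o}}, y_{\text{o}}, z_{\text{o}} \mid A, B, C, N) , \\
0 & \text{otherwise} .
\end{cases}
\]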
To illustrate Fisher’s exact probability test, consider the 2×2×2 contingency table given in Table 10.25 where N = 75 and the observed values of w, x, y, and z are w o = 13, x o = 8, y o = 4, and z o = 18. For the frequency data given in Table 10.25 there are M = 77,910 possible arrangements in the reference set of all permutations of cell frequencies given the observed row, column, and slice marginal distributions, {44, 31}, {44, 31}, and {44, 31}, respectively, making an exact permutation analysis feasible. Fisher’s exact probability is the sum of the hypergeometric point probability values equal to or less than the probability value associated with the observed contingency table; in this case, there are 2,991 tables with probability values equal to or less than the probability value of the observed table, i.e., \(p = 0.1743\times 10^{-4}\), yielding P = 0.0384.
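The four nested sums over w, x, y, and z translate directly into nested loops. The sketch below is an illustration under stated assumptions rather than the published algorithm: the point probability is taken to be the product of the six marginal factorials divided by (N!)² and the eight cell factorials, with the four remaining cells expressed through A, B, C, N, w, x, y, and z, and the function names are hypothetical. Log-factorials are used to avoid overflow.

```python
from math import exp, lgamma


def log_fact(m):
    """Natural log of m! via the gamma function."""
    return lgamma(m + 1)


def fisher_exact_2x2x2(w_o, x_o, y_o, z_o, A, B, C, N):
    """Return (number of tables, total probability, exact probability value)."""
    const = (log_fact(A) + log_fact(N - A) + log_fact(B) + log_fact(N - B)
             + log_fact(C) + log_fact(N - C) - 2 * log_fact(N))

    def log_p(w, x, y, z):
        # The eight cell frequencies implied by w, x, y, z and the margins.
        cells = (w, x, y, z, A - w - x - y, B - w - x - z, C - w - y - z,
                 N - A - B - C + 2 * w + x + y + z)
        return const - sum(log_fact(n) for n in cells)

    log_p_obs = log_p(w_o, x_o, y_o, z_o)
    count, total, p_value = 0, 0.0, 0.0
    for w in range(0, A + 1):
        for x in range(0, A - w + 1):
            for y in range(0, A - w - x + 1):
                z_lo = max(0, A + B + C - N - 2 * w - x - y)
                z_hi = min(B - w - x, C - w - y)
                for z in range(z_lo, z_hi + 1):
                    lp = log_p(w, x, y, z)
                    count += 1
                    total += exp(lp)
                    # Keep tables no more probable than the observed table.
                    if lp <= log_p_obs + 1e-9:
                        p_value += exp(lp)
    return count, total, p_value


# Hypothetical small table: N = 4 with A = B = C = 2 and observed (1, 0, 0, 0).
print(fisher_exact_2x2x2(1, 0, 0, 0, 2, 2, 2, 4))
```

The second returned value should be 1 up to rounding error, which provides a convenient check that the bounds enumerate the complete reference set.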
10.8 Contingency Table Interactions
It is occasionally necessary to test the independence among multiple classification variables, each of which consists of two mutually exclusive classes, e.g., a 2×2×2 or \(2^{3}\) contingency table. In this section exact permutation procedures are described for analyzing interactions in 2×2×2 and 2×2×2×2 contingency tables.
10.8.1 Analysis of 2×2×2 Contingency Tables
Mielke, Berry, and Zelterman provided a procedure for determining the exact global probability value obtained from an examination of all possible arrangements of the eight cell frequencies of a 2×2×2 contingency table, conditioned on the observed marginal frequency totals [26]. An alternative approach that is not as computationally intensive and, quite possibly, more fruitful is to examine the first- and second-order interactions of a 2×2×2 table when the observed marginal frequency totals are considered to be fixed [22]. This approach was first proposed by Bartlett [1] and has been discussed by Darroch [9, 10], Haber [12, 13], Odoroff [27], Plackett [29], Pomar [30], Simpson [33], and Zachs and Solomon [35]. In this section an algorithm is described that computes the exact probability values of the three first-order (two-variable) interactions and the single second-order (three-variable) interaction.
The logic on which the algorithm is based was apparently first developed by Lambert Adolphe Jacques Quetelet to calculate binomial probability values in 1846 [31]. Beginning with a small arbitrary initial value, a simple recursion procedure generates relative frequency values for all possible 2×2×2 contingency tables, given the observed marginal frequency totals. The desired exact probability value is obtained by summing the relative frequency values equal to or less than the observed relative frequency value and dividing the resultant sum by the unrestricted relative frequency total.
Consider a sample of N independent observations arranged in a 2×2×2 contingency table. Let n ijk denote the observed cell frequency of the ith row, jth column, and kth slice, and let p ijk denote the corresponding cell probability for i, j, k = 1, 2. Also let n .jk, n i.k, n ij., n i.., n .j., n ..k, and n … indicate the observed marginal frequency totals of the 2×2×2 contingency table, and let the corresponding marginals over p ijk be indicated by p .jk, p i.k, p ij., p i.., p .j., p ..k, and p …, respectively, for i, j, k = 1, 2. Because the categories are mutually exclusive and exhaustive, n … = N and p … = 1.
Let r denote the number of dimensions and c i denote the number of categories in each dimension, i = 1, …, r. Then for a 2×2×2 contingency table there are
\[
\prod_{i=1}^{r} c_{i} - \sum_{i=1}^{r} (c_{i} - 1) - 1 = 8 - 3 - 1 = 4
\]
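degrees of freedom and, consequently, four interaction terms to be considered: three first-order and one second-order. Following Bartlett, the null hypotheses for the three first-order interactions are
\[
H_{0}\colon\; p_{.11}\, p_{.22} = p_{.12}\, p_{.21} ,
\]
\[
H_{0}\colon\; p_{1.1}\, p_{2.2} = p_{1.2}\, p_{2.1} ,
\]
and
\[
H_{0}\colon\; p_{11.}\, p_{22.} = p_{12.}\, p_{21.}
\]
[1]. The null hypothesis for the second-order interaction is
\[
H_{0}\colon\; p_{111}\, p_{122}\, p_{212}\, p_{221} = p_{112}\, p_{121}\, p_{211}\, p_{222} .
\]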
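For simplicity, set x = n 111, a = n .11, b = n 1.1, c = n 11., A = n 1.., B = n .1., C = n ..1, and N = n …. The point probability of x is given by
\[
P(x \mid a, b, c, A, B, C, N) = \frac{\big[\, x!\,(a-x)!\,(b-x)!\,(c-x)!\,(A-b-c+x)!\,(B-a-c+x)!\,(C-a-b+x)!\,(N-A-B-C+a+b+c-x)! \,\big]^{-1}}{\displaystyle\sum_{k=v}^{w} \big[\, k!\,(a-k)!\,(b-k)!\,(c-k)!\,(A-b-c+k)!\,(B-a-c+k)!\,(C-a-b+k)!\,(N-A-B-C+a+b+c-k)! \,\big]^{-1}} .
\]
If H(k), given a, b, c, A, B, C, and N, is a recursively defined positive function, then solving the recursive relation H(k + 1) = H(k) × g(k) yields
\[
g(k) = \frac{(a-k)(b-k)(c-k)(N-A-B-C+a+b+c-k)}{(k+1)(A-b-c+k+1)(B-a-c+k+1)(C-a-b+k+1)} ,
\]
which may be used to enumerate the distribution of P(k|a, b, c, A, B, C, N), v ≤ k ≤ w, where
\[
v = \max(0,\; b+c-A,\; a+c-B,\; a+b-C) \quad \text{and} \quad w = \min(a,\; b,\; c,\; N-A-B-C+a+b+c) ,
\]
and where H(v) is initially set to some small value, such as \(10^{-20}\). The total over the completely enumerated distribution may be found by
\[
T = \sum_{k=v}^{w} H(k) .
\]
The exact second-order interaction probability value is found by
\[
P = \sum_{k=v}^{w} \frac{J_{k}\, H(k)}{T} ,
\]
where
\[
J_{k} =
\begin{cases}
1 & \text{if } H(k) \leq H(x) , \\
0 & \text{otherwise} .
\end{cases}
\]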
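The recursion described in this section can be sketched in Python as follows. This is a hedged illustration, not the published algorithm: it assumes that, with all two-way marginal totals fixed, the relative frequency of a table is proportional to the reciprocal of the product of its eight cell factorials, each cell being written in terms of x = n 111 and the totals a, b, c, A, B, C, and N. The ratio g(k), the bounds v and w, and the function name are derived from that assumption, and the example table is hypothetical.

```python
def second_order_interaction_p(x_obs, a, b, c, A, B, C, N, start=1e-20):
    """Exact probability for the three-variable interaction, by recursion."""
    # Range of x = n111 that keeps all eight cell frequencies nonnegative.
    v = max(0, b + c - A, a + c - B, a + b - C)
    w = min(a, b, c, N - A - B - C + a + b + c)

    def g(k):
        # Ratio H(k + 1) / H(k) obtained from the eight cell factorials.
        num = (a - k) * (b - k) * (c - k) * (N - A - B - C + a + b + c - k)
        den = ((k + 1) * (A - b - c + k + 1) * (B - a - c + k + 1)
               * (C - a - b + k + 1))
        return num / den

    H = {v: start}                      # arbitrary small initial value
    for k in range(v, w):
        H[k + 1] = H[k] * g(k)
    total = sum(H.values())
    h_obs = H[x_obs]
    # Sum the relative frequencies no larger than the observed one.
    return sum(h for h in H.values() if h <= h_obs * (1 + 1e-12)) / total


# Hypothetical table with N = 12, a = b = c = 3, and A = B = C = 6.
print(second_order_interaction_p(3, 3, 3, 3, 6, 6, 6, 12))
```

For the hypothetical totals, x ranges from 0 to 3 with relative frequencies in the ratio 1 : 81 : 81 : 1, so an observed x = 3 gives a probability value of 2/164. Because only ratios of H values enter the result, the arbitrary starting value cancels.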
10.8.1.1 A 2×2×2 Contingency Table Example
Table 10.26 depicts a 2×2×2 contingency table based on N = 76 responses to a question (Yes, No) classified by gender (Female, Male), in two elementary school grades (First, Fourth).
Table 10.27 provides the cell frequencies for Grade by Gender, conditioned on Response. The first-order interaction probability value associated with the cell frequencies in Table 10.27 is
Table 10.28 provides the cell frequencies for Gender by Response, conditioned on Grade. The first-order interaction probability value associated with the cell frequencies in Table 10.28 is
Table 10.29 provides the cell frequencies for Grade by Response, conditioned on Gender. The first-order interaction probability value associated with the cell frequencies in Table 10.29 is
The second-order interaction probability value for the frequency data given in Table 10.29 is \(P = 0.9036\times 10^{-3}\) and the global probability of a table this extreme or more extreme than the observed table in Table 10.29 is \(P = 0.4453\times 10^{-2}\) [26].
10.8.2 Analysis of 2×2×2×2 Contingency Tables
Utilizing the recursion procedure presented in the previous example, it is possible to analyze a 2×2×2×2 or \(2^{4}\) contingency table [23]. The conditional probability value of a 2×2×2×2 contingency table is a special case of the conditional probability of an r-way contingency table as defined in Mielke and Berry [20]. Zelterman, Chan, and Mielke [36] provided an algorithm for the exact global probability value obtained from an examination of all possible arrangements of the 16 cell frequencies of a 2×2×2×2 contingency table, conditioned on the observed marginal frequency totals. An alternative approach is to examine the first-, second-, and third-order interactions in a 2×2×2×2 table when the observed marginal frequency totals are considered to be fixed.
Let r denote the number of dimensions and c i denote the number of categories in each dimension, i = 1, …, r. Then for a 2×2×2×2 contingency table there are
\[
\prod_{i=1}^{r} c_{i} - \sum_{i=1}^{r} (c_{i} - 1) - 1 = 16 - 4 - 1 = 11
\]
degrees of freedom and, consequently, 11 interaction terms to be considered: six first-order, four second-order, and one third-order. In this section, a procedure is described for computing the exact probability values of the six first-order (two-variable) interactions, the four second-order (three-variable) interactions, and the single third-order (four-variable) interaction for a 2×2×2×2 contingency table.
Following Mielke [19], let \(p_{i_{1}i_{2}i_{3}i_{4}}\) denote the probability of cell i 1i 2i 3i 4 in a 2×2×2×2 contingency table, where the index i j = 1 or 2 for j = 1, 2, 3, 4. The six null hypotheses of no first-order interactions for a 2×2×2×2 contingency table are
and
where the usual summation convention is employed. Thus, p 0101 is the sum over indices i 1 and i 3. The four null hypotheses of no second-order interaction for a 2×2×2×2 contingency table are
and
The null hypothesis of no third-order interaction for a 2×2×2×2 contingency table is given by
Table 10.30 contains data from a 2×2×2×2 contingency table based on N = 1,356 responses classified on four dichotomous variables: A, B, C, and D. The first-, second-, and third-order interaction exact probability values associated with the data listed in Table 10.30 are given in Table 10.31.
10.9 Coda
Chapter 10 applied exact and Monte Carlo permutation statistical methods to measures of association for symmetrical 2×2 contingency tables. Included in Chap. 10 were discussions of Pearson’s ϕ 2, Tschuprov’s T 2, and Cramér’s V 2 coefficients of contingency, Pearson’s product-moment correlation coefficient, Leik and Gove’s \(d_{N}^{\,c}\) measure, Goodman and Kruskal’s t a and t b asymmetric measures of nominal association, Kendall’s τ b and Stuart’s τ c measures of ordinal association, Somers’ d yx and d xy asymmetric measures of ordinal association, Yule’s Y measure of nominal association, simple percentage differences, and Cohen’s unweighted and weighted κ measures of inter-rater agreement.
Chapter 10 concluded with an examination of extensions to multiple 2×2 contingency tables and 2×2×2 contingency tables, including the Mantel–Haenszel test for combined 2×2 contingency tables, Cohen’s chance-corrected measure of inter-rater agreement, McNemar’s and Cochran’s Q tests for change, Fisher’s exact test for 2×2×2 and 2×2×2×2 contingency tables, and tests for interactions in 2×2×2 and 2×2×2×2 contingency tables.
Notes
1. It was not too many years ago that while μ x and \(\sigma _{x}^{2}\) denoted the population mean and variance, respectively, \(\hat {\mu }_{x}\) and \(\hat {\sigma }_{x}^{2}\) denoted the unbiased sample-estimated population mean and variance. The American Psychological Association presently recommends using M for the sample mean instead of the conventional \(\bar {x}\).
2. The test is often called the Cochran–Mantel–Haenszel test as William Cochran presented essentially the same test in an earlier paper [5].
3. The symbol M for the Mantel–Haenszel test should not be confused with the symbol M for the number of possible, equally-likely arrangements of the observed data under the Fisher–Pitman permutation model.
References
1. Bartlett, M.S.: Contingency table interactions. Suppl. J. R. Stat. Soc. 2, 248–252 (1935)
2. Berry, K.J., Johnston, J.E., Mielke, P.W.: Maximum-corrected and chance-corrected measures of effect size for the Mantel–Haenszel test. Psychol. Rep. 107, 393–401 (2010)
3. Berry, K.J., Mielke, P.W.: A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educ. Psychol. Meas. 48, 921–933 (1988)
4. Cochran, W.G.: The comparison of percentages in matched samples. Biometrika 37, 256–266 (1950)
5. Cochran, W.G.: Some methods for strengthening the common χ 2 test. Biometrics 10, 417–452 (1954)
6. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37–46 (1960)
7. Cohen, J.: Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70, 213–220 (1968)
8. Cohen, J.: Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Erlbaum, Hillsdale, NJ (1988)
9. Darroch, J.N.: Interactions in multi-factor contingency tables. J. R. Stat. Soc. B Meth. 24, 251–263 (1962)
10. Darroch, J.N.: Multiplicative and additive interaction in contingency tables. Biometrika 61, 207–214 (1974)
11. Fisher, R.A.: The logic of inductive inference (with discussion). J. R. Stat. Soc. 98, 39–82 (1935)
12. Haber, M.: Sample sizes for the exact test of “no interaction” in 2×2×2 tables. Biometrics 39, 493–498 (1983)
13. Haber, M.: A comparison of tests for the hypothesis of no three-factor interaction in 2×2×2 contingency tables. J. Stat. Comp. Sim. 20, 205–215 (1984)
14. Irwin, J.O.: Tests of significance for differences between percentages based on small numbers. Metron 12, 83–94 (1935)
15. Leik, R.K., Gove, W.R.: Integrated approach to measuring association. In: Costner, H.L. (ed.) Sociological Methodology, pp. 279–301. Jossey Bass, San Francisco, CA (1971)
16. Maclure, M., Willett, W.C.: Misinterpretation and misuse of the kappa statistic. Am. J. Epidemiol. 126, 161–169 (1987)
17. Mantel, N., Haenszel, W.: Statistical aspects of the analysis of data from retrospective studies of disease. J. Natl. Cancer I 22, 719–748 (1959)
18. McNemar, Q.: Note on the sampling error of the differences between correlated proportions and percentages. Psychometrika 12, 153–157 (1947)
19. Mielke, P.W.: Some exact and nonasymptotic analyses of discrete goodness-of-fit and r-way contingency tables. In: Johnson, N.L., Balakrishnan, N. (eds.) Advances in the Theory and Practice of Statistics: A Volume in Honor of Samuel Kotz, pp. 179–192. Wiley, New York (1997)
20. Mielke, P.W., Berry, K.J.: Cumulant methods for analyzing independence of r-way contingency tables and goodness-of-fit frequency data. Biometrika 75, 790–793 (1988)
21. Mielke, P.W., Berry, K.J.: Nonasymptotic inferences based on Cochran’s Q test. Percept. Motor Skill 81, 319–322 (1995)
22. Mielke, P.W., Berry, K.J.: Exact probabilities for first-order and second-order interactions in 2 × 2 × 2 contingency tables. Educ. Psychol. Meas. 56, 843–847 (1996)
23. Mielke, P.W., Berry, K.J.: Exact probabilities for first-order, second-order, and third-order interactions in 2×2×2×2 contingency tables. Percept. Motor Skill 86, 760–762 (1998)
24. Mielke, P.W., Berry, K.J.: Permutation Methods: A Distance Function Approach, 2nd edn. Springer–Verlag, New York (2007)
25. Mielke, P.W., Berry, K.J., Johnston, J.E.: Resampling probability values for weighted kappa with multiple raters. Psychol. Rep. 102, 606–613 (2008)
26. Mielke, P.W., Berry, K.J., Zelterman, D.: Fisher’s exact test of mutual independence for 2 × 2 × 2 cross-classification tables. Educ. Psychol. Meas. 54, 110–114 (1994)
27. Odoroff, C.L.: A comparison of minimum logit chi-square estimation and maximum likelihood estimation in 2×2×2 and 3×2×2 contingency tables: Tests for interaction. J. Am. Stat. Assoc. 65, 1617–1631 (1970)
28. O’Neill, M.E.: A comparison of the additive and multiplicative definitions of second-order interaction in 2×2×2 contingency tables. J. Stat. Comp. Sim. 15, 33–50 (1982)
29. Plackett, R.L.: A note on interactions in contingency tables. J. R. Stat. Soc. B Meth. 24, 162–166 (1962)
30. Pomar, M.I.: Demystifying loglinear analysis: Four ways to assess interaction in a 2×2×2 table. Sociol. Persp. 27, 111–135 (1984)
31. Quetelet, L.A.J.: Lettres à S. A. R. le Duc Régnant de Saxe–Cobourg et Gotha, sur la Théorie des Probabilités Appliquée aux Sciences Morales et Politiques. Hayez, Bruxelles (1846). [English translation, Letters Addressed to H.R.H. the Grand Duke of Saxe Coburg and Gotha on the Theory of Probabilities as Applied to the Moral and Political Sciences, by O.G. Downes and published by Charles & Edwin Layton, London, 1849]
32. Rosenthal, R.: Parametric measures of effect size. In: Cooper, H., Hedges, L.V. (eds.) The Handbook of Research Synthesis, pp. 231–234. Russell Sage, New York (1994)
33. Simpson, E.H.: The interpretation of interaction in contingency tables. J. R. Stat. Soc. B Meth. 13, 238–241 (1951)
34. Yates, F.: Contingency tables involving small numbers and the χ 2 test. Suppl. J. R. Stat. Soc. 1, 217–235 (1934)
35. Zachs, S., Solomon, H.: On testing and estimating the interaction between treatments and environmental conditions in binomial experiments: The case of two stations. Commun. Stat. Theor. M 5, 197–223 (1976)
36. Zelterman, D., Chan, I.S., Mielke, P.W.: Exact tests of significance in higher dimensional tables. Am. Stat. 49, 357–361 (1995)
© 2018 Springer Nature Switzerland AG
Berry, K.J., Johnston, J.E., Mielke, P.W. (2018). Fourfold Contingency Tables, II. In: The Measurement of Association. Springer, Cham. https://doi.org/10.1007/978-3-319-98926-6_10
DOI: https://doi.org/10.1007/978-3-319-98926-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98925-9
Online ISBN: 978-3-319-98926-6
eBook Packages: Mathematics and Statistics (R0)