
6.1 Basic Association Models for Two-way Tables

We have seen so far that, in the context of classical log-linear models, there are just two options for modeling two-way contingency tables: the parsimonious but restrictive model of independence (4.1) and the saturated model. Association models fill the gap between these two extreme cases by imposing a special structure on the association and reducing the number of interaction parameters, thus providing intermediate models of dependence. For ease of understanding but also for interpretation purposes, it is convenient to think in terms of local associations in the table and first define the models on local odds ratios rather than cell frequencies. Recall that for models applied to an I × J contingency table there always exists an equivalent expression defining them on the (I − 1) × (J − 1) table of the corresponding set of local odds ratios.

In most cases, association models apply to ordinal classification variables and are thus usually introduced as models for ordinal data. However, some of them do not require ordinality, as we shall see later in this chapter.

6.1.1 Linear-by-Linear Association Model

We have seen in Sect. 2.2.5 that for an I × J contingency table and under the model of independence, all the local odds ratios are equal to 1, i.e., \(\theta _{ij}^{L} = 1\), i = 1, …, I − 1; j = 1, …, J − 1. Whenever the model of independence fits poorly, the only alternative in the framework of classical log-linear models is the saturated model (4.5), which assumes all \(\theta _{ij}^{L}\)’s to be free parameters and is ineffective in summarizing the underlying significant association. A natural way to proceed is to assume a pattern for this underlying association. This way, the number of parameters to be estimated is reduced and, most importantly, a meaningful interpretation can be provided. The simplest pattern to think of, which at the same time has clear and strong interpretational power, is that of constant \(\theta _{ij}^{L}\)’s, as under independence, but different from 1. That is, to introduce the model

$$\displaystyle{ \theta _{ij}^{L} = c,\quad i = 1,\ldots,I - 1,\ \ j = 1,\ldots,J - 1, }$$
(6.1)

for some c > 0, to be estimated. This model allows for interaction while remaining parsimonious, since it has just one parameter more than the independence model, the parameter c. Under the independence model, all possible odds ratios \(\theta _{ij}^{k\ell}\) of the table are equal to 1. Under (6.1), the local association is uniform, since all the local odds ratios are equal to c. This property characterizes model (6.1), which is therefore called the uniform association model, denoted by U. When it comes to the odds ratio \(\theta _{ij}^{k\ell}\) of any 2 × 2 subtable of our initial table, through (2.46), (6.1) takes the form

$$\displaystyle{ \theta _{ij}^{k\ell} = {c}^{(k-i)(\ell-j)},\quad i = 1,\ldots,I - 1,\ \ j = 1,\ldots,J - 1,\ \ i < k \leq I,\ \ j < \ell \leq J, }$$

and in log-scale

$$\displaystyle{ \log \theta _{ij}^{k\ell} = (k - i)(\ell-j)\log c,\quad i = 1,\ldots,I - 1,\ \ j = 1,\ldots,J - 1, }$$
(6.2)

for i < k ≤ I, j < ℓ ≤ J. Under the U model, the general odds ratio \(\theta _{ij}^{k\ell}\) is influenced by the categories of each classification variable, but only through their distances. Hence, odds ratios formed by cells further apart will exhibit stronger association. Measuring how far apart two categories of a classification variable are is thus crucial. Distances between categories are meaningful only when the corresponding classification variable is ordinal. Hence the U model makes sense only for tables with both classification variables ordinal or with one ordinal and the other binary.
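As a minimal numeric sketch of this uniformity property, the following pure-Python fragment builds a table of expected frequencies with the U structure (all parameter values are illustrative assumptions, not estimates from any data) and checks that every local odds ratio equals c while a general odds ratio equals \(c^{(k-i)(\ell-j)}\), cf. (6.2):

```python
import math

# Hypothetical parameter values, chosen only for illustration
I, J = 4, 3
c = 2.0                            # common local odds ratio under the U model
lam_row = [0.2, 0.4, 0.1, 0.0]     # arbitrary row main effects
lam_col = [0.3, 0.0, 0.5]          # arbitrary column main effects

# Expected frequencies: log m_ij = lambda + lam_row[i] + lam_col[j] + log(c)*i*j
m = [[math.exp(1.0 + lam_row[i] + lam_col[j] + math.log(c) * i * j)
      for j in range(J)] for i in range(I)]

def odds_ratio(m, i, j, k, l):
    """Odds ratio of the 2x2 subtable formed by rows i, k and columns j, l."""
    return (m[i][j] * m[k][l]) / (m[i][l] * m[k][j])

# All (I-1)(J-1) local odds ratios equal c ...
local = [odds_ratio(m, i, j, i + 1, j + 1)
         for i in range(I - 1) for j in range(J - 1)]
# ... and the corner-cell odds ratio equals c^((k-i)(l-j)) with exponent 3*2 = 6
general = odds_ratio(m, 0, 0, 3, 2)
```

Note that the main effects cancel in every odds ratio, so only the interaction term \(\log c \cdot ij\) matters.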

The U model assumes that all successive categories of a classification variable are equidistant. However, ordinal variables with non-equidistant successive categories can arise. A typical example of this type is a categorized income variable, which is actually interval scaled with categories corresponding to intervals of unequal length. A flexible way to handle such situations is to assign scores to the categories of the classification variables and express their distances by the corresponding scores’ differences. Thus, let \(\{\mu _{1},\ \mu _{2},\ldots,\ \mu _{I}\}\) and \(\{\nu _{1},\ \nu _{2},\ldots,\ \nu _{J}\}\) be the scores assigned to the row and column categories, respectively. The simplest and most natural choice for the scores is \(\mu _{i} = i\) (i = 1, …, I) and \(\nu _{j} = j\) (j = 1, …, J), which corresponds to model (6.2). Allowing the scores to take other values as well and setting \(\varphi =\log c\), we are led to the model

$$\displaystyle{ \log \theta _{ij}^{k\ell} =\varphi (\mu _{ k} -\mu _{i})(\nu _{\ell} -\nu _{j}),\ \ \ i = 1,\ldots,I - 1\,\ j = 1,\ldots,J - 1\, }$$
(6.3)

with i < k ≤ I, j < ℓ ≤ J, for which scores of successive row or column categories are not necessarily equidistant. Their distance is to be understood in terms of their similarity as they interact with the other classification variable. Thus, different scores may be assigned to the same levels of a classification variable X when it interacts with different variables Y or Z. This will be illustrated in a three-way contingency table example in Sect. 6.7.1. Regarding the scores’ assignment, refer also to the related discussion in Sect. 2.3.1.

For non-equidistant scores of successive categories, the local odds ratios under (6.3) are no longer all equal but proportional (in log-scale) to the distance between the involved categories of each classification variable. Due to this linear dependence on each of the classification variables, model (6.3) is called the linear-by-linear association model (LL).
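The proportionality of the local log odds ratios to the score spacings can be checked directly. The sketch below uses hypothetical non-equidistant scores (imagine, e.g., midpoints of unequal income intervals) and an assumed value of \(\varphi\); the main effects are set to zero since they cancel in every odds ratio:

```python
import math

# Illustrative, non-equidistant scores (assumptions, not data-based)
mu = [1.0, 2.0, 4.0, 8.0]     # row scores
nu = [0.0, 1.0, 3.0]          # column scores
phi = 0.35                    # assumed intrinsic association parameter
I, J = len(mu), len(nu)

# Expected frequencies under the LL model (6.4), main effects zero for simplicity
m = [[math.exp(2.0 + phi * mu[i] * nu[j]) for j in range(J)] for i in range(I)]

# Local log odds ratios equal phi * (mu[i+1]-mu[i]) * (nu[j+1]-nu[j])
log_local = [[math.log((m[i][j] * m[i + 1][j + 1]) / (m[i][j + 1] * m[i + 1][j]))
              for j in range(J - 1)] for i in range(I - 1)]
```

For instance, the (1, 1) local log odds ratio is \(0.35 \cdot 1 \cdot 1 = 0.35\), while the (3, 2) one is \(0.35 \cdot (8-4)(3-1) = 2.8\): unequal, but fully determined by the score spacings.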

Though the interpretation of these models is clear and natural when they are formulated in terms of the local odds ratios, the development of inferential aspects and model fitting is more straightforward for their equivalent formulation in terms of expected cell frequencies. Recalling that the saturated log-linear model in terms of \(\theta _{ij}\) is provided by (4.7) and equating (4.7) to (6.3) for k = i + 1 and ℓ = j + 1, we conclude that the (i, j)th interaction term under the LL model has the form \(\lambda _{ij}^{XY } =\varphi \mu _{i}\nu _{j}\). Hence, the equivalent expression of the LL model (6.3) in terms of expected cell frequencies is

$$\displaystyle{ \log m_{ij} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\varphi \mu _{ i}\nu _{j},\ \ i = 1,\ldots,I,\ \ j = 1,\ldots,J, }$$
(6.4)

where the overall mean and the main effects parameters are those of the classical log-linear model.

Model (6.4) reduces to the U model, and is thus equivalent to (6.2), not just for \(\mu _{i} = i\) (i = 1, …, I) and \(\nu _{j} = j\) (j = 1, …, J) but for any choice of row and column scores \(\{\mu _{1},\ \mu _{2},\ldots,\ \mu _{I}\}\) and \(\{\nu _{1},\ \nu _{2},\ldots,\ \nu _{J}\}\), as long as both are equidistant for successive categories. This is due to the LL model’s property of being invariant under linear transformations of the scores, as we shall see in Sect. 6.4. Thus, for identifiability purposes, the scores are usually set to satisfy the sum-to-zero and sum-of-squares-to-one constraints

$$\displaystyle\begin{array}{rcl} \sum _{i=1}^{I}\mu _{ i} = 0\qquad \text{and}\qquad \sum _{i=1}^{I}\mu _{ i}^{2} = 1,& &{}\end{array}$$
(6.5)
$$\displaystyle\begin{array}{rcl} \sum _{j=1}^{J}\nu _{ j} = 0\qquad \text{and}\qquad \sum _{j=1}^{J}\nu _{ j}^{2} = 1.& &{}\end{array}$$
(6.6)

For scores satisfying constraints (6.5) and (6.6), multiplying (6.4) by \(\mu _{i}\nu _{j}\) and summing over i and j leads to

$$\displaystyle{ \varphi =\sum _{i,j}\mu _{i}\nu _{j}\log m_{ij}\, }$$
(6.7)

i.e., \(\varphi\) measures the correlation between the row and column scores, a fact that justifies its characterization as the intrinsic association parameter.
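Identity (6.7) is easy to verify numerically: with scores normalized to satisfy (6.5)-(6.6), the main effect terms vanish from the weighted sum and \(\varphi\) is recovered exactly. The values below are illustrative assumptions:

```python
import math

def normalize(scores):
    """Rescale scores to satisfy sum-to-zero / sum-of-squares-one, cf. (6.5)-(6.6)."""
    n = len(scores)
    mean = sum(scores) / n
    centered = [s - mean for s in scores]
    ss = math.sqrt(sum(s * s for s in centered))
    return [s / ss for s in centered]

mu = normalize([1, 2, 3, 4])       # equidistant row scores, normalized
nu = normalize([1, 2, 5])          # non-equidistant column scores (illustrative)
phi = 0.8                          # assumed intrinsic association parameter
lam_r = [0.1, 0.3, -0.2, 0.0]      # arbitrary main effects
lam_c = [0.2, -0.1, 0.4]

log_m = [[1.5 + lam_r[i] + lam_c[j] + phi * mu[i] * nu[j]
          for j in range(3)] for i in range(4)]

# Recover phi via (6.7): phi = sum_{i,j} mu_i nu_j log m_ij
phi_recovered = sum(mu[i] * nu[j] * log_m[i][j]
                    for i in range(4) for j in range(3))
```

The main effects drop out because \(\sum_i \mu_i = \sum_j \nu_j = 0\), and the interaction term contributes \(\varphi \sum_i \mu_i^2 \sum_j \nu_j^2 = \varphi\).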

6.1.2 Example 6.1

We shall demonstrate the utility and interpretive power of this parsimonious association model, with just one degree of freedom less than complete independence, by an example. We first focus on explaining the nature and use of such a model; inferential details and software applications are provided at a later stage. The data used are from a survey on the use of cannabis among students, conducted at the University of Ioannina (Greece) in 1995 and published in Marselos et al. (1997). The students’ frequency of alcohol consumption is measured on a four-level scale, ranging from at most once per month up to more frequently than twice per week, while their trial of cannabis is measured by a three-level variable (never tried–tried once or twice–more often). These two ordinal variables are cross-classified, leading to the 4 × 3 table provided in Table 6.1.

These data provide strong evidence against the independence model (4.1), since the corresponding LR test statistic is G²(I) = 152.793, which is highly significant with an asymptotic p-value < 0.00005 (df = 6). In the context of classical log-linear models, the only alternative is to add the interaction term \(\lambda _{ij}^{XY }\) to the model, thus ending up with the saturated model.

Taking advantage of the ordinal nature of the classification variables, we apply the U model to the data of Table 6.1, by fitting model (6.4) with \(\mu _{i} = i\) (i = 1, …, 4) and \(\nu _{j} = j\) (j = 1, 2, 3). Thus, we introduce just one additional parameter beyond the independence model, the parameter \(\varphi\). The LR test statistic for model (6.4) equals G²(U) = 1.469, leading to a reduction of 151.324 from G²(I) by sacrificing just 1 df. This model is of impressive fit, with p-value = 0.92. The cell estimates under U are provided in parentheses in Table 6.1.
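The reported p-values can be reproduced from the G² statistics and their df. The sketch below implements the chi-square survival function for integer df from the upper incomplete gamma recurrence, using only the standard library (the statistic values are those reported above; df(U) = (I − 1)(J − 1) − 1 = 5 for the 4 × 3 table):

```python
import math

def chisq_sf(x, df):
    """Survival function P(X > x) of a chi-square variable with integer df,
    via the recurrence Q(a+1, s) = Q(a, s) + s^a e^{-s} / Gamma(a+1), s = x/2."""
    s = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-s)          # Q(1, s): chi-square with 2 df
    else:
        a, q = 0.5, math.erfc(math.sqrt(s))   # Q(1/2, s): chi-square with 1 df
    while a < df / 2.0:
        q += s ** a * math.exp(-s) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Reported statistics for Table 6.1
p_indep = chisq_sf(152.793, 6)    # far below 0.00005: independence rejected
p_U = chisq_sf(1.469, 5)          # about 0.92: the impressive fit of the U model
```

The same helper serves any of the goodness-of-fit tests in this chapter with integer df.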

Table 6.1 Students’ survey about cannabis use at the University of Ioannina, Greece (1995)

As already mentioned, under the U model the local odds ratios \(\theta _{ij}^{L}\) are constant all over the table. The corresponding sample values of the local odds ratios are provided in Table 6.2. In this case, the association parameter \(\varphi\) is estimated as \(\hat{\varphi }= 0.803\) and furthermore \(\hat{\theta }_{ij}^{L} =\hat{\theta }=\exp (\hat{\varphi }) =\exp (0.803) = 2.23\), for all i = 1, 2, 3 and j = 1, 2. This means that the odds of having tried cannabis once or twice vs. never tried are 2.23 times higher for students who drink twice a month than for those who drink at most once a month. The same interpretation holds for any odds ratio comparing successive row and successive column categories.

Table 6.2 Sample local odds ratios for the students’ survey about cannabis use at the University of Ioannina, Greece (1995)

If we would like to compare any non-successive categories, the results can be adjusted accordingly. For example, for the odds ratio formed by the corner (“extreme”) cells of the table, it holds that

$$\displaystyle{\frac{\hat{\pi }_{11}\hat{\pi }_{43}} {\hat{\pi }_{13}\hat{\pi }_{41}} =\exp \left (\hat{\varphi }(\mu _{4} -\mu _{1})(\nu _{3} -\nu _{1})\right ) =\exp \left (\hat{\varphi }\cdot 3 \cdot 2\right ) = 123.387\,}$$

meaning that the odds of having used cannabis more often vs. never tried are about 123 times higher for students who drink more frequently than twice a week than for students who drink at most once a month.
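The corner-cell computation is a one-liner; note that with the rounded estimate \(\hat\varphi = 0.803\) the result is about 123.7, while the value 123.387 reported above comes from the unrounded ML estimate:

```python
import math

phi_hat = 0.803          # rounded estimated intrinsic association parameter
mu = [1, 2, 3, 4]        # row scores (alcohol consumption levels)
nu = [1, 2, 3]           # column scores (cannabis use levels)

# Corner-cell odds ratio: exp(phi_hat * (mu_4 - mu_1) * (nu_3 - nu_1))
theta_corner = math.exp(phi_hat * (mu[3] - mu[0]) * (nu[2] - nu[0]))
# about 123.7 with the rounded phi_hat; 123.387 with the unrounded estimate
```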

6.1.3 Row and Column Effect Models

The LL model presented above is a very parsimonious and useful model of strong interpretational power when it is applicable. Often, however, it proves insufficient. It can be the case that the structure of model (6.4) is appropriate but there is no obvious way of deciding on the scores of one of the classification variables. It is then natural to broaden model (6.4) to a class of more flexible association models by relaxing the assumption of known scores. Model (6.4) with unknown row scores \(\{\mu _{1},\ldots,\mu _{I}\}\), which are thus parameters to be estimated, is the row effect association model, denoted by R. Under this model, the odds defined over the column classification variable vary from row to row, i.e., the effect of the row classification variable on the column odds is significant but unknown. This effect is reflected in the row scores and, more precisely, in the unknown (and unequal) distances between successive row categories. Model R has I − 2 more parameters than model LL, corresponding to the row scores; the number of parameters is reduced by two due to the identifiability constraints (6.5). Thus, the associated df of model R equal (I − 1)(J − 2). Analogously, the column effect association model C is defined by expression (6.4) for known row scores and unknown column scores \(\{\nu _{1},\ldots,\nu _{J}\}\). It models the effect of the column classification variable on the row odds. The associated df are df(C) = (I − 2)(J − 1).

We have seen in the context of the U association model that its definition in terms of odds ratios (6.1) is more natural with respect to interpretation. The R model is equivalently defined in terms of local odds ratios as

$$\displaystyle{ \theta _{ij}^{L} = c_{ 1i}\,\ \ i = 1,\ldots,I - 1\,\ j = 1,\ldots,J - 1, }$$
(6.8)

and the C model as

$$\displaystyle{ \theta _{ij}^{L} = c_{ 2j}\,\ \ i = 1,\ldots,I - 1\,\ j = 1,\ldots,J - 1. }$$
(6.9)

Expression (6.8) reveals the dependence of the column odds on the row category, while the analogous statement holds for the C model (6.9).

In terms of local odds ratios and categories’ scores, model R is expressed as

$$\displaystyle{ \log \left (\theta _{ij}^{L}\right ) =\varphi (\mu _{ i+1} -\mu _{i})(\nu _{j+1} -\nu _{j}),\ \ i = 1,\ldots,I - 1\,\ j = 1,\ldots,J - 1, }$$
(6.10)

with parametric row scores \(\{\mu _{i},\ i = 1,\ldots,I\}\) and known (equidistant) column scores \(\{\nu _{j},\ j = 1,\ldots,J\}\). Analogously, model C is (6.10) with parametric column scores and known (equidistant) row scores.

6.1.4 Row by Column Effect Model

The LL, U, R, and C models considered so far are special types of log-linear models. The LL model is applicable to two-way tables when both classification variables are ordinal. The R and C models are less restrictive about the nature of the underlying classification variables and thus also less parsimonious. They allow the row or column classification variable, respectively, to be ordinal but with unknown distances between the scores assigned to its successive categories, or even nominal. This is achieved by considering the row or column scores as unknown parameters to be estimated. Furthermore, a more flexible model can be defined by (6.4) by considering the row and column score vectors to be both unknown parameters. Thus, we model a multiplicative row by column association. This model, denoted by RC, is no longer linear in its parameters and their estimation is not straightforward. The estimation problem will be addressed in Sect. 6.2.

In terms of local odds ratios, the RC model is defined by

$$\displaystyle{ \theta _{ij}^{L} = c_{ 1i}c_{2j}\,\ \ \ i = 1,\ldots,I - 1\,\ \ j = 1,\ldots,J - 1\, }$$

allowing the effect of each classification variable on the odds defined by the other one to vary from category to category. In log-scale, the RC model is given by (6.10) with parametric (unknown) row and column scores. The RC model does not require ordinality for either of the classification variables. Thus, it can be applied to tables of nominal variables as well. Of course, score assignment is more natural for ordinal variables.

The association models considered so far are all defined in terms of expected cell frequencies by

$$\displaystyle{ \log m_{ij} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\varphi \mu _{ i}\nu _{j},\ \ i = 1,\ldots,I,\ \ j = 1,\ldots,J, }$$
(6.11)

i.e., by expression (6.4). Thus, all association models considered so far are defined by the same expression (6.4) and are differentiated by whether the scores are treated as known constants or unknown parameters. They are summarized in Table 6.3. The U model is a special LL model and is not listed in the table.

Table 6.3 Association models and related df. The U model is a special LL model

6.1.5 Example 6.1 (Revisited)

Revisiting the cannabis example, we fit to Table 6.1 the R, C, and RC models. The test statistic values, along with their corresponding significance, are provided in Table 6.4. The estimates of the parametric scores as well as the values of the fixed scores for these models are provided in Table 6.5. The estimated score parameters for the rows and the columns are close to equidistant for successive categories. Thus, it seems not worthwhile to adopt a more complex model than U. This is verified also by the corresponding goodness-of-fit tests, where we see that, moving from the simple U model to less parsimonious association models, the improvement in fit is very minor. A more detailed discussion of association model selection will be carried out in Sect. 6.3.

Table 6.4 LR goodness-of-fit tests for the independence and the association models applied in Table 6.1
Table 6.5 ML estimates for parameters and fixed scores values for the U, R, C, and RC models applied in Table 6.1. Values in italics correspond to fixed scores

Focusing on the C and RC models, the estimated local odds ratios under these models are provided in Table 6.6. Recall that under the U model the common local odds ratio estimate is 2.23. We can verify that the expected local odds ratios under the C model are column dependent, i.e., the value is common within each column but differs from column to column, while under the more general RC model they are row and column dependent, thus all different from each other. However, the estimated local odds ratios are not sufficiently different to justify the use of models more complicated than the simple U model, which was of impressive fit. Under the C model, the odds of having tried cannabis once or twice vs. never tried are 2.38 times higher for those who are one level higher on the alcohol consumption scale, no matter what this level is. The odds in the second column of Table 6.6 can be interpreted similarly. In this example we did not consider model R at all, since it is less parsimonious than C and of worse fit.

Table 6.6 Estimated local odds ratios under the RC model and under the C model (in parentheses) for the students’ survey about cannabis use at the University of Ioannina, Greece (1995)

6.2 Maximum Likelihood Estimation for Association Models

For any association model, the maximum likelihood estimation approach is that described in Sect. 4.2 for log-linear models. Thus, independently of the underlying sampling scheme, ML estimates of an association model’s parameters, and eventually of its expected cell frequencies \(m_{ij}\), are obtained by maximizing the Poisson log-likelihood kernel (4.13) with respect to the parameters of the model. Substituting \(m_{ij}\) by the association model expression and equating the partial derivative of (4.13) with respect to a parameter of the model to zero, one is led to the likelihood equation corresponding to this parameter.

The likelihood equations with respect to the main effect parameters of model (6.4) are the same as the corresponding equations of the standard two-way log-linear model, given in (4.14). For the more general RC model, where both sets of scores are parameters, the likelihood equations for the row scores (\(\mu _{1},\ldots,\mu _{I}\)) are derived as

$$\displaystyle{ \sum _{j}\nu _{j}(\hat{m}_{ij} - n_{ij}) = 0,i = 1,\ldots,I }$$
(6.12)

while for the column scores (\(\nu _{1},\ldots,\nu _{J}\)) as

$$\displaystyle{ \sum _{i}\mu _{i}(\hat{m}_{ij} - n_{ij}) = 0,j = 1,\ldots,J. }$$
(6.13)

Finally, the likelihood equation corresponding to the intrinsic association parameter \(\varphi\) is

$$\displaystyle{ \sum _{i,j}\mu _{i}\nu _{j}(\hat{m}_{ij} - n_{ij}) = 0. }$$
(6.14)

For the remaining association models defined by (6.11), obtained by considering the row or column scores or both of them as fixed, the likelihood equations are derived from the above set by eliminating the equations corresponding to the known scores. Thus, the likelihood equations for the U model are (4.14) and (6.14), while for the R model they are (4.14), (6.12), and (6.14). Analogously, the likelihood equations for model C are (4.14), (6.13), and (6.14). Note that the likelihood equation (6.14) is redundant given (6.12) or (6.13). This means that the parameter \(\varphi\) is redundant whenever at least one set of scores is parametric and can thus be eliminated. At this point it is worth mentioning that the RC model was introduced by Goodman (1979b, model (4.1b)) in terms of the non-redundant parameters, as

$$\displaystyle{ m_{ij} =\alpha _{i}\beta _{j}\exp (\mu _{i}\nu _{j}),\ \ i = 1,\ldots,I;\ \ j = 1,\ldots,J\, }$$
(6.15)

with unconstrained row and column scores. Introducing the intrinsic association parameter \(\varphi\), at the cost of imposing constraints (6.5) and (6.6) on the scores, Goodman (1979b, model (4.5b)) proposed the equivalent expression

$$\displaystyle{ m_{ij} =\alpha _{i}\beta _{j}\exp (\varphi \mu _{i}\nu _{j}),\ \ i = 1,\ldots,I;\ \ j = 1,\ldots,J. }$$
(6.16)

This way the RC model and its scores are comparable to other standard models (Chap. 7). The multiplicative form (6.16) is also equivalent to the log-form (6.4).

The ML estimates of the parameters of any association model cannot be derived in closed form, and the corresponding likelihood equations have to be solved iteratively. The simplest iterative procedure for ML estimation of association models is based on Newton’s unidimensional method. The updating equations (at the tth iteration) for the RC model parameters, based on expression (6.16), are

$$\displaystyle\begin{array}{rcl} \alpha _{i}^{(t)}& =& \alpha _{i}^{(t-1)}\, \frac{n_{i+}} {\tilde{m}_{i+}},\qquad i = 1,\ldots,I, {}\\ \beta _{j}^{(t)}& =& \beta _{j}^{(t-1)}\, \frac{n_{+j}} {\tilde{m}_{+j}},\qquad j = 1,\ldots,J, {}\\ \mu _{i}^{(t)}& =& \mu _{i}^{(t-1)} + \frac{\sum _{j}\nu _{j}^{(t-1)}(n_{ij} -\tilde{m}_{ij})} {{\tilde{\varphi }}^{(t-1)}\sum _{j}{\left (\nu _{j}^{(t-1)}\right )}^{2}\tilde{m}_{ij}},\qquad i = 1,\ldots,I, {}\\ \nu _{j}^{(t)}& =& \nu _{j}^{(t-1)} + \frac{\sum _{i}\mu _{i}^{(t-1)}(n_{ij} -\tilde{m}_{ij})} {{\tilde{\varphi }}^{(t-1)}\sum _{i}{\left (\mu _{i}^{(t-1)}\right )}^{2}\tilde{m}_{ij}},\qquad j = 1,\ldots,J, {}\\ {\varphi }^{(t)}& =& {\varphi }^{(t-1)} + \frac{\sum _{i,j}\mu _{i}^{(t-1)}\nu _{j}^{(t-1)}(n_{ij} -\tilde{m}_{ij})} {\sum _{i,j}{\left (\mu _{i}^{(t-1)}\nu _{j}^{(t-1)}\right )}^{2}\tilde{m}_{ij}}, {}\\ \end{array}$$

where \(\tilde{m}_{ij}\) denotes the current estimate of \(m_{ij}\), recalculated after each parameter update (Goodman 1979b).
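The updating scheme above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the 3 × 3 table is synthetic, generated exactly from an RC structure (so the fitted values should converge to the observed table itself); the scores are not renormalized during the iterations, since the fitted frequencies are invariant to the scores' scale; and no step-halving or other safeguards are included:

```python
import math

# Synthetic 3x3 table generated exactly from an RC structure (illustrative)
MU, NU, PHI = [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], 0.5
I, J = 3, 3
n = [[math.exp(2.0 + 0.1 * (i + 1) + 0.1 * (j + 1) + PHI * MU[i] * NU[j])
      for j in range(J)] for i in range(I)]

def fitted(alpha, beta, mu, nu, phi):
    """Expected frequencies m_ij = alpha_i beta_j exp(phi mu_i nu_j), cf. (6.16)."""
    return [[alpha[i] * beta[j] * math.exp(phi * mu[i] * nu[j])
             for j in range(J)] for i in range(I)]

# Initial values: alpha, beta from log margins; equidistant normalized scores
l = [[math.log(n[i][j]) for j in range(J)] for i in range(I)]
lbar = sum(sum(row) for row in l) / (I * J)
alpha = [math.exp(sum(l[i]) / J - lbar / 2) for i in range(I)]
beta = [math.exp(sum(l[i][j] for i in range(I)) / I - lbar / 2) for j in range(J)]
mu = [math.sqrt(3.0 / (I * (I ** 2 - 1))) * (2 * i - I - 1) for i in range(1, I + 1)]
nu = [math.sqrt(3.0 / (J * (J ** 2 - 1))) * (2 * j - J - 1) for j in range(1, J + 1)]
phi = sum(mu[i] * nu[j] * l[i][j] for i in range(I) for j in range(J))

for _ in range(500):
    m = fitted(alpha, beta, mu, nu, phi)
    alpha = [alpha[i] * sum(n[i]) / sum(m[i]) for i in range(I)]
    m = fitted(alpha, beta, mu, nu, phi)
    beta = [beta[j] * sum(n[i][j] for i in range(I)) /
            sum(m[i][j] for i in range(I)) for j in range(J)]
    m = fitted(alpha, beta, mu, nu, phi)
    mu = [mu[i] + sum(nu[j] * (n[i][j] - m[i][j]) for j in range(J)) /
          (phi * sum(nu[j] ** 2 * m[i][j] for j in range(J))) for i in range(I)]
    m = fitted(alpha, beta, mu, nu, phi)
    nu = [nu[j] + sum(mu[i] * (n[i][j] - m[i][j]) for i in range(I)) /
          (phi * sum(mu[i] ** 2 * m[i][j] for i in range(I))) for j in range(J)]
    m = fitted(alpha, beta, mu, nu, phi)
    phi += (sum(mu[i] * nu[j] * (n[i][j] - m[i][j])
                for i in range(I) for j in range(J)) /
            sum((mu[i] * nu[j]) ** 2 * m[i][j]
                for i in range(I) for j in range(J)))

m = fitted(alpha, beta, mu, nu, phi)
# n satisfies the RC model exactly, so the fitted values converge to n itself
```

In practice, one would also monitor the log-likelihood (4.13) for convergence and impose constraints (6.5)-(6.6) on the scores after the final iteration.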

As in every iterative procedure, the assignment of initial values to the parameter estimates is crucial. In this setup, a reasonable choice for the main effects is

\(\alpha _{i}^{(0)} =\exp (\frac{\ell_{i+}} {J} - \frac{\bar{\ell}} {2})\) and \(\beta _{j}^{(0)} =\exp (\frac{\ell_{+j}} {I} - \frac{\bar{\ell}} {2})\),

where \(\ell_{ij} =\log (n_{ij})\) and \(\bar{\ell}= \frac{\ell_{++}} {IJ}\). A natural choice for the initial estimates of the parametric scores is to consider them equidistant for successive categories, i.e., as if the U model were applied. In this case, starting by setting the scores equal to the corresponding category indices and rescaling them linearly so that constraints (6.5) and (6.6) are satisfied, we arrive at

\(\mu _{i}^{(0)} = \sqrt{ \frac{3} {I({I}^{2}-1)}}(2i - I - 1)\) and \(\nu _{j}^{(0)} = \sqrt{ \frac{3} {J({J}^{2}-1)}}(2j - J - 1)\).

A compatible choice for \({\varphi }^{(0)}\) would then be \({\varphi }^{(0)} =\sum _{i,j}\mu _{i}\nu _{j}\log n_{ij}\); see (6.7). The convergence of the algorithm is checked through the change in the log-likelihood value (4.13), calculated after each updating cycle of the parameter estimates.
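The closed-form initial scores above can be checked to satisfy constraints (6.5) and (6.6) exactly, for any number of categories:

```python
import math

def init_scores(K):
    """Equidistant initial scores rescaled to the constraints (6.5)-(6.6)."""
    return [math.sqrt(3.0 / (K * (K ** 2 - 1))) * (2 * k - K - 1)
            for k in range(1, K + 1)]

mu0 = init_scores(4)   # e.g., rows of a 4 x 3 table
nu0 = init_scores(3)
# Both sets sum to zero and have unit sum of squares
```

The scaling constant follows from \(\sum_{k=1}^{K}(2k - K - 1)^2 = K(K^2 - 1)/3\).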

The standard algorithms normally applied are the Newton–Raphson or the Fisher scoring algorithm (see Sect. 5.3.1). In this context, the parameters of the association model under consideration have to be written in vector form. For example, for the U model the parameter vector is \(\boldsymbol{\beta }= (\lambda,\lambda _{1}^{X},\ldots,\lambda _{I-1}^{X},\lambda _{1}^{Y },\ldots,\lambda _{J-1}^{Y },\varphi )\). Newton’s unidimensional method is simpler, since it does not require matrix inversion, but with the drawback that it does not provide the standard errors of the parameter estimates.

Information on available software and special programs for estimation of association models based on each of these algorithms will be provided in Sect. 6.6.

6.3 Association Model Selection

We have already faced, in the context of the cannabis example, the problem of selecting the appropriate association model when more than one of them is of adequate fit. The problem of model selection in the framework of association models is connected to the analysis of association (ANOAS) in a contingency table and is based on the interconnection between the models. In particular, it holds that

I ⊂ U (or LL) ⊂ R (or C) ⊂ RC.

Indeed, the I model is the U (or LL) model with \(\varphi = 0\), while the C model, for example, is the RC model for a specific choice of the row scores. This means that

$$\displaystyle{{G}^{2}(\text{I}) > {G}^{2}(\text{U}) > {G}^{2}(\text{C}) > {G}^{2}(\text{RC})\,}$$

for example, with analogous results for the LL or the R model. The crucial question at this point is whether the reduction in the G² value as we move to less parsimonious models is worthwhile, justifying the loss in df and simplicity. The answer is provided through the conditional testing procedure (see Sects. 4.6 and 5.3.4). As soon as we detect the simplest association model \(\mathcal{M}_{1}\) of adequate fit, we abandon it in favor of a more complicated \(\mathcal{M}_{2}\) (\(\mathcal{M}_{1} \subset \mathcal{M}_{2}\)) only if the reduction in G² is statistically significant. Hence, we proceed by testing the fit of \(\mathcal{M}_{1}\) conditional on \(\mathcal{M}_{2}\) holding, by (4.34).

Thus, for example, given that the U, R, or C model holds, one could propose the conditional tests of independence G²(I | U), G²(I | R), or G²(I | C), being asymptotically \({\mathcal{X}}^{2}\) distributed with df equal to 1, I − 1, or J − 1, respectively. These conditional tests of independence, given that model U, R, or C holds, are of greater asymptotic power compared to the traditional unconditional test of independence (Gross 1981; Agresti 1983a). The tests I | U and I | LL deserve special mention, since, as 1 df tests, they are the most powerful. In this context it is important to note that the conditional test I | RC is not that straightforward, since \({G}^{2}(\text{I}\vert \text{RC}) = {G}^{2}(\text{I}) - {G}^{2}(\text{RC})\) is not asymptotically \({\mathcal{X}}^{2}\) distributed with df = df(I) − df(RC), as one might expect. The asymptotic null distribution of G²(I | RC) for testing independence is that of the largest eigenvalue of a Wishart distributed matrix (Haberman 1981). Gradual conditional testing from RC down to I, through steps such as I | U, U | R, and R | RC, is possible and provides an analysis of association (ANOAS) table, throwing light on the underlying association structure of the table and analyzing the deviance from independence in terms of source (overall, row, interaction) in a manner analogous to the ANOVA table (Goodman 1981a).

6.3.1 Model Selection for Example 6.1

We have already seen that for the cannabis data set all association models provide an acceptable fit. It seems natural to favor the C model over the R, due to parsimony and better fit. Thus, the choice lies between the U, C, and RC models. By the conditional testing procedure, one has \({G}^{2}(\text{C}\vert \text{RC}) = {G}^{2}(\text{C}) - {G}^{2}(\text{RC}) = 0.496\), which is non-significant based on the \(\mathcal{X}_{2}^{2}\) distribution (p-value = 0.7804). Thus, there is no point in adopting the RC model, since it does not provide a significant improvement in fit over the C model. Further, since \({G}^{2}(\text{U}\vert \text{C}) = {G}^{2}(\text{U}) - {G}^{2}(\text{C}) = 0.3683\) is again non-significant (p-value = 0.5439, df = 1), the model that seems appropriate for this data set is the simple U model, with just 1 df less than the independence model and a straightforward interpretation of constant local association all over the table. This sequence of conditional tests is summarized in the ANOAS table, provided for this example at the end of Sect. 6.6.1.
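The two conditional p-values reported above can be verified directly, since for df = 2 and df = 1 the chi-square survival function has closed forms (\(e^{-x/2}\) and \(\mathrm{erfc}(\sqrt{x/2})\), respectively):

```python
import math

# Conditional LR statistics reported for Example 6.1
G2_C_given_RC = 0.496   # df = df(C) - df(RC) = (I-2)(J-1) - (I-2)(J-2) = I - 2 = 2
G2_U_given_C = 0.3683   # df = 1

p_C_given_RC = math.exp(-G2_C_given_RC / 2)            # chi-square sf, df = 2
p_U_given_C = math.erfc(math.sqrt(G2_U_given_C / 2))   # chi-square sf, df = 1
```

Both reproduce the reported values 0.7804 and 0.5439, confirming that neither step away from the U model is significant.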

6.4 Features of Association Models

We have mentioned that the LL model (and U as well) is invariant under linear transformations of its scores. Actually, this property holds for all association models considered so far. Let \(\mu _{i}^{{\ast}} = a_{1}\mu _{i} + b_{1}\) and \(\nu _{j}^{{\ast}} = a_{2}\nu _{j} + b_{2}\) be any linear rescaling of the row and column scores, respectively. Then, in terms of the local odds ratios and the new scores, the association model would be defined as

$$\displaystyle{ \log \theta _{ij}^{k\ell} {=\varphi }^{{\ast}}(\mu _{ k}^{{\ast}}-\mu _{ i}^{{\ast}})(\nu _{\ell}^{{\ast}}-\nu _{ j}^{{\ast}})\,\ \ i = 1,\ldots,I - 1\,\ \ j = 1,\ldots,J - 1\, }$$

for i < k ≤ I and j < ℓ ≤ J. This is further transformed to

$$\displaystyle{ \log \theta _{ij}^{k\ell} = a_{ 1}a{_{2}\varphi }^{{\ast}}(\mu _{ k} -\mu _{i})(\nu _{\ell} -\nu _{j})\, }$$

which for \({\varphi }^{{\ast}} = \frac{\varphi } {a_{ 1}a_{2}}\) is equivalent to (6.3). Thus, without affecting the expected cell frequencies, their estimates, and consequently the fit of the model, we can replace the normalizing constraints on the scores by the weighted normalizing constraints:

$$\displaystyle{ \sum _{i}w_{1i}\mu _{i} =\sum _{j}w_{2j}\nu _{j} = 0\qquad \text{and}\qquad \sum _{i}w_{1i}\mu _{i}^{2} =\sum _{ j}w_{2j}\nu _{j}^{2} = 1. }$$
(6.17)

Although the choice of weights does not affect the model fit, it has an impact on the scores’ values and thus on issues related to or depending on them. The most common choices of weights are uniform (\(w_{1i} = w_{2j} = 1,\ i = 1,\ldots,I,\ \ j = 1,\ldots,J\)) or marginal (\(w_{1i} =\pi _{i+},\ i = 1,\ldots,I\) and \(w_{2j} =\pi _{+j},\ j = 1,\ldots,J\)). Uniform weights are preferred when the marginal distributions are not fixed and interest lies in comparing tables with unequal marginal distributions. The marginal weights are the choice when the scores of association models have to be compared to correspondence analysis results (see Sect. 7.2) or when merging rows and/or columns of a table is the issue (see Sect. 7.5). For a more detailed discussion on the choice of the weighting system, see Goodman (1985, 1991) or Becker and Clogg (1989).
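The invariance argument above is easy to verify numerically: rescaling the scores linearly and compensating with \(\varphi^{\ast} = \varphi/(a_1 a_2)\) leaves every local log odds ratio, and hence the fit, unchanged. All values below are illustrative assumptions:

```python
# Original scores and association parameter (illustrative)
mu = [1.0, 2.0, 4.0]          # row scores
nu = [0.0, 1.0, 3.0]          # column scores
phi = 0.7

# Arbitrary linear rescaling of both sets of scores
a1, b1, a2, b2 = 2.0, -5.0, 0.5, 3.0
mu_s = [a1 * m + b1 for m in mu]
nu_s = [a2 * v + b2 for v in nu]
phi_s = phi / (a1 * a2)       # compensating association parameter

def log_local(phi, mu, nu, i, j):
    """Local log odds ratio under (6.10)."""
    return phi * (mu[i + 1] - mu[i]) * (nu[j + 1] - nu[j])

orig = [log_local(phi, mu, nu, i, j) for i in range(2) for j in range(2)]
resc = [log_local(phi_s, mu_s, nu_s, i, j) for i in range(2) for j in range(2)]
# identical local associations, hence identical expected frequencies and fit
```

The shifts \(b_1, b_2\) cancel in the score differences, and the slopes \(a_1, a_2\) cancel against \(\varphi^{\ast}\).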

Replacing the standard constraints (6.5) and (6.6) by the more general (6.17) and working analogously as for deriving (6.7), the intrinsic association parameter \(\varphi\) satisfies

$$\displaystyle{\varphi =\sum _{i,j}w_{1i}w_{2j}\mu _{i}\nu _{j}\log \pi _{ij}\,}$$

i.e., it is a weighted measure of correlation between the rows and columns of the table. However, as already stated, the parameter \(\varphi\) is redundant in models R, C, and RC.

Models LL, U, R, and C are log-linear, while RC is log-multiplicative (not linear in its parameters). As already mentioned, models LL and U require that both classification variables of the contingency table are ordinal and are thus sensitive to re-ordering of rows or columns. On the other hand, model R (C) is invariant under re-ordering of the rows (columns) of the table, and the corresponding classification variable need not necessarily be ordinal; ordinality is required only for the columns (rows). Finally, the RC model is invariant under re-ordering of rows or columns. Hence, it can also be applied to tables with nominal classification variables. Overall, parametric scores in an association model can correspond either to a nominal underlying classification variable or to an ordinal one with unknown distances between successive categories. Thus, the parametric scores of models R, C, and RC need not necessarily be monotone. Lack of monotonicity implies non-monotone association, in the sense that the local association will be positive in some areas of the table and negative in others.

Conversely, monotonicity of the row and column scores is naturally connected to positive dependence and stochastic ordering of the conditional distributions in the rows or columns of the table. In particular, Goodman (1981a) showed that under the RC model the association is isotropic, and tables possessing this property are TP2, i.e., \(\theta _{ij}\geqslant 1\) for all i = 1, …, I − 1, j = 1, …, J − 1 with at least one strict inequality (see also Sect. 2.5.5). As indicated by (6.10), in case the row and column scores are both ordered and of the same ordering (i.e., both increasing or both decreasing), \(\varphi > 0\) is equivalent to positive dependence and consequently the conditional row or column probabilities are stochastically ordered. This means that if \(\mathbf{X}_{i}\) and \(\mathbf{X}_{i^{\prime}}\) are the conditional row distributions of rows i and i′ with i < i′, then positive dependence implies \(\mathbf{X}_{i}\leqslant _{st}\mathbf{X}_{i^{\prime}}\), i.e., \(\mathbf{X}_{i}\) is stochastically smaller than \(\mathbf{X}_{i^{\prime}}\). The distribution of \(\mathbf{X}_{i}\) is said to be stochastically smaller than that of \(\mathbf{X}_{i^{\prime}}\) if \(F_{\mathbf{X}_{i}}(t)\geqslant F_{\mathbf{X}_{i^{\prime}}}(t)\) for all t = 1, …, J, where \(F_{\mathbf{X}}\) is the cumulative distribution function of X.
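The stochastic ordering can be checked numerically by comparing the cumulative sums of two conditional row distributions. The probability vectors below are hypothetical, chosen only to illustrate the criterion:

```r
# Hypothetical conditional row distributions for rows i and i' (i < i'):
p.i  <- c(0.50, 0.30, 0.15, 0.05)
p.ip <- c(0.20, 0.30, 0.30, 0.20)
# X_i <=_st X_i' iff F_i(t) >= F_i'(t) for all t:
all(cumsum(p.i) >= cumsum(p.ip))
```

Here every cumulated probability of row i is at least that of row i′, so \(\mathbf{X}_{i}\leqslant _{st}\mathbf{X}_{i^{\prime}}\).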

In general, it is not ensured that the ML estimates of the parametric scores will be monotone. If the ML estimates are non-monotone, one can proceed with order-restricted estimation of the corresponding association model (see Sect. 6.8.2).

Another nice property of association models is their connection to the bivariate normal distribution. In fact, association models lead to very good approximations of the discretized bivariate normal distribution (Goodman 1981b, 1985; Wang 1987; Becker 1989a; Rom and Sarkar 1990). To see this, consider the bivariate normal density

$$\displaystyle\begin{array}{rcl} & & f(x,y;\mu _{x},\mu _{y},\sigma _{x},\sigma _{y},\rho ) = \frac{1} {2\pi \sigma _{x}\sigma _{y}\sqrt{1 -\rho ^{2}}} \times {}\\ & &\qquad \qquad \exp \left (- \frac{1} {2(1 -\rho ^{2})}\left [{\left (\frac{x -\mu _{x}} {\sigma _{x}} \right )}^{2} - 2\rho \left (\frac{x -\mu _{x}} {\sigma _{x}} \right )\left (\frac{y -\mu _{y}} {\sigma _{y}} \right ) + {\left (\frac{y -\mu _{y}} {\sigma _{y}} \right )}^{2}\right ]\right ) {}\\ \end{array}$$

and partition the \(\mathbb{R}^{2}\) plane into small rectangular regions \((a_{i-1},a_{i}) \times (b_{j-1},b_{j})\), where i = 1, …, I,  j = 1, …, J, \(a_{0} = b_{0} = -\infty \), and \(a_{I} = b_{J} = +\infty \). Then the U model, or more precisely the symmetric U model (with I = J and \(\mu _{i} =\nu _{i}\)), applied to the table formed by this partition approximates well the discretization of the above density. For standardized scores, the parameter \(\varphi\) is analogous to \(\frac{\rho }{1 -\rho ^{2}}\) of the normal density.
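This can be verified numerically. In the sketch below (our illustration, not part of the original text), the cell masses of a standard bivariate normal are approximated by the density values on a regular grid with spacing h; the local log odds ratios of this discretization are then exactly constant and equal to \(\frac{\rho }{1-\rho ^{2}}h^{2}\), since only the bilinear cross term of the log density survives the double differencing.

```r
# Density-based discretization of a standard bivariate normal (rho = 0.5):
rho <- 0.5; h <- 0.5
x <- seq(-2, 2, by = h)
lf <- outer(x, x, function(x, y)
  -(x^2 - 2 * rho * x * y + y^2) / (2 * (1 - rho^2)))  # log density + const
n <- length(x)
# second differences of log f = local log odds ratios:
ltheta <- lf[-n, -n] - lf[-n, -1] - lf[-1, -n] + lf[-1, -1]
max(abs(ltheta - rho / (1 - rho^2) * h^2))   # ~ 0: constant association
```

The quadratic terms of the log density cancel in the double difference, which is why the approximation is exact here.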

Finally, we would like to emphasize that, beyond the insight into the structure of the underlying association (when significant), one of the major strengths of association models is that they allow conditional testing of independence, as already discussed in Sect. 6.3.

6.5 Association Models of Higher Order: The RC(M) Model

The RC model, though the least parsimonious association model considered so far, and in spite of its flexibility and often impressive fit, is not always adequate. RC itself imposes a restrictive structure which can sometimes be insufficient for modeling the underlying association. It leaves (I − 2)(J − 2) df, enough space for further in-between models that build up the interaction until the saturated model is reached.

Indeed, one could consider adding more multiplicative terms of the RC type. For example, the next model to consider would be

$$\displaystyle{ \log m_{ij} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\varphi _{ 1}\mu _{i1}\nu _{j1} +\varphi _{2}\mu _{i2}\nu _{j2}\,\ \ i = 1,\ldots,I\,\ j = 1,\ldots,J. }$$

In fact, this idea can be extended further, as long as I and J are large enough, since in the saturated model, there are (I − 1)(J − 1) association parameters. Thus, considering M association terms for the general case, we are led to the model

$$\displaystyle{ \log m_{ij} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\sum _{ m=1}^{M}\varphi _{ m}\mu _{im}\nu _{jm}\,\ i = 1,\ldots,I\,\ j = 1,\ldots,J\, }$$
(6.18)

denoted by RC(M).

How large can M be? To answer this question one must consider what M represents. The concept behind the RC(M) general association model is that of the dimensionality of the underlying association and its decomposition into axes. The idea is the same as in other well-known methods of dimension reduction, such as factor analysis and principal component analysis. As in these methods, for identifiability purposes as well as for convenience of interpretation, the axes onto which the association is decomposed are taken to be orthogonal. In our framework, the key to this decomposition is the singular value decomposition (SVD) of the interaction parameter matrix \(\boldsymbol{\varLambda }= \left (\lambda _{ij}^{XY }\right )_{I\times J}\) of the saturated log-linear expression. Thus, M is the rank of the matrix \(\boldsymbol{\varLambda }\), the parameters \(\varphi _{m}\) (m = 1, …, M) are the associated singular values, while the row and column scores for a certain m are the components of the corresponding mth singular vectors. In particular, the SVD of the interaction matrix \(\boldsymbol{\varLambda }\) gives

$$\displaystyle{\boldsymbol{\varLambda }= \mathbf{M}\boldsymbol{\varphi }\mathbf{N}^{\prime}}$$

where \(\boldsymbol{\varphi }= \mathrm{diag}(\varphi _{1},\ldots,\varphi _{M})\) with \(\varphi _{1}\geqslant \ldots \geqslant \varphi _{M} > 0\) contains the singular values, while the singular vectors \(\boldsymbol{\mu }_{m} = (\mu _{1m},\ldots,\mu _{Im})\) and \(\boldsymbol{\nu }_{m} = (\nu _{1m},\ldots,\nu _{Jm})\), associated with the mth singular value, form the matrices \(\mathbf{M}_{I\times M} = (\mu _{im})\) and \(\mathbf{N}_{J\times M} = (\nu _{jm})\), respectively. M and N are orthonormal, i.e., they satisfy

$$\displaystyle{\mathbf{M}^{\prime}\mathbf{M} = \mathbf{N}^{\prime}\mathbf{N} = \mathbf{I}_{M}}$$

where \(\mathbf{I}_{M}\) is the identity matrix of order M. The maximum possible value for the dimension of the decomposition is \(M^{{\ast}} =\min (I,J) - 1\). Thus, model (6.18) can be considered for \(0\leqslant M\leqslant M^{{\ast}}\). The associated degrees of freedom equal df[RC(M)] = (I − M − 1)(J − M − 1). Model RC(0) is the independence model, RC(1) is the RC model, while RC(\(M^{{\ast}}\)) is the saturated model. The orthonormality of the singular vectors is equivalently expressed as

$$\displaystyle\begin{array}{rcl} \sum _{i}\mu _{im}& =& \sum _{j}\nu _{jm} = 0,\ \ \ m = 1,\ldots,M, {}\\ \sum _{i}\mu _{im}^{2}& =& \sum _{ j}\nu _{jm}^{2} = 1,\ \ \ m = 1,\ldots,M, {}\\ \sum _{i}\mu _{im}\mu _{i\ell}& =& \sum _{j}\nu _{jm}\nu _{j\ell} = 0,\ \ \ m\neq \ell. {}\\ \end{array}$$

Note that the first two restrictions are the identifiability constraints we have already imposed on the row and column scores of the RC model for uniform weights, while the last one corresponds to the orthogonality of the dimensions. In order to generalize the above constraints and allow the use of weights, the generalized singular value decomposition (GSVD) of the interaction matrix \(\boldsymbol{\varLambda }\) has to be applied instead of the SVD. By the GSVD, M and N are orthonormalized with respect to the weight matrices

\(\mathbf{W}_{1} = \mathrm{diag}(w_{11},\ldots,w_{1I})\) and \(\mathbf{W}_{2} = \mathrm{diag}(w_{21},\ldots,w_{2J})\),

i.e., they satisfy

\(\mathbf{M}^{\prime}\mathbf{W}_{1}\mathbf{M} = \mathbf{N}^{\prime}\mathbf{W}_{2}\mathbf{N} = \mathbf{I}_{M}\), or equivalently, the row and column scores satisfy the constraints:

$$\displaystyle\begin{array}{rcl} \sum _{i}w_{1i}\mu _{im}& =& \sum _{j}w_{2j}\nu _{jm} = 0,\ \ \ m = 1,\ldots,M\, \\ \sum _{i}w_{1i}\mu _{im}\mu _{i\ell}& =& \sum _{j}w_{2j}\nu _{jm}\nu _{j\ell} =\delta _{m\ell},\ \ \ m,\ell= 1,\ldots,M,{}\end{array}$$
(6.19)

where \(\delta _{m\ell }\) is Kronecker's delta.
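The decomposition can be illustrated in R with svd(). The matrix below is a randomly generated, double-centered stand-in for \(\boldsymbol{\varLambda }\) (a sketch only), used to show that the SVD yields orthonormal score vectors that sum to zero and at most min(I, J) − 1 non-trivial dimensions:

```r
set.seed(1)
I <- 4; J <- 3
L <- matrix(rnorm(I * J), I, J)
L <- sweep(L, 1, rowMeans(L))          # center rows
L <- sweep(L, 2, colMeans(L))          # then columns: double-centered
s <- svd(L)                            # L = M diag(phi) N'
Mstar <- min(I, J) - 1                 # maximum dimension: here 2
M <- s$u[, 1:Mstar]; N <- s$v[, 1:Mstar]; phi <- s$d[1:Mstar]
max(abs(crossprod(M) - diag(Mstar)))   # ~ 0: M'M = I (orthonormal)
max(abs(L - M %*% diag(phi) %*% t(N))) # ~ 0: M* dimensions suffice
colSums(M)                             # ~ 0: scores sum to zero
```

The zero column sums of M (and of N) arise because the double-centered matrix has zero row and column sums, mirroring the constraints above with uniform weights.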

Analogously to the RC model, the RC(M) model can alternatively be expressed in the multiplicative form used by Goodman:

$$\displaystyle{ m_{ij} =\alpha _{i}\beta _{j}\exp \left (\sum _{m=1}^{M}\varphi _{ m}\mu _{im}\nu _{jm}\right ),\ \ i = 1,\ldots,I\,\ j = 1,\ldots,J. }$$

However, the most convenient expression for physical interpretation is in terms of the local odds ratios

$$\displaystyle{ \log \theta _{ij}^{L} =\sum _{ m=1}^{M}\varphi _{ m}(\mu _{im} -\mu _{i+1,m})(\nu _{jm} -\nu _{j+1,m})\, }$$
(6.20)

for i = 1, …, I − 1,  j = 1, …, J − 1.

6.5.1 Maximum Likelihood Estimation of the RC(M) Model

The estimation procedure for the RC(M) model follows the lines of the procedure described in Sect. 6.2 for the simple RC model; the extension is straightforward. Thus it can be proved that the likelihood equations for the main effects of the RC(M) model are (4.14), while the likelihood equations corresponding to the row and column scores and the association parameters \(\varphi _{m}\) are

$$\displaystyle\begin{array}{rcl} \sum _{j}\nu _{jm}(\hat{m}_{ij} - n_{ij})& =& 0,i = 1,\ldots,I,\ \ m = 1,\ldots,M\,{}\end{array}$$
(6.21)
$$\displaystyle\begin{array}{rcl} \sum _{i}\mu _{im}(\hat{m}_{ij} - n_{ij})& =& 0,j = 1,\ldots,J,\ \ m = 1,\ldots,M\,{}\end{array}$$
(6.22)
$$\displaystyle\begin{array}{rcl} \sum _{i,j}\mu _{im}\nu _{jm}(\hat{m}_{ij} - n_{ij})& =& 0,m = 1,\ldots,M\,{}\end{array}$$
(6.23)

i.e., straightforward extensions of (6.12), (6.13), and (6.14), respectively.

In practice, the updating equations of the simple unidimensional Newton's method for the interaction parameters of RC(M) are direct extensions of the corresponding updating equations for the RC model, presented in Sect. 6.2, while the updating equations for the main effects remain the same. The orthonormality constraints that must be satisfied by the scores of RC(M) need not be enforced within the iterative procedure. Since they are only a matter of parameter identifiability and rescaling that does not affect the cell estimates, it is sufficient that they be fulfilled by the initial values and that the final estimated interaction parameters be rescaled by SVD at the final stage, after convergence of the algorithm is achieved. The initial values \(\varphi _{m}^{(0)}\), \(\mu _{im}^{(0)}\), and \(\nu _{jm}^{(0)}\) (m = 1, …, M) can easily be obtained as the corresponding values of the first M terms of the SVD of the observed interaction matrix, i.e., the matrix \(\boldsymbol{\varGamma }\) with entries \(\gamma _{ij} = \frac{n_{ij}} {\alpha _{i}^{(0)}\beta _{j}^{(0)}}\). The extension of the Newton–Raphson algorithm, presented in Sect. 6.2, is also straightforward.
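For the cannabis data of Example 6.1, such starting values can be sketched as follows. We take here the SVD of \(\log \gamma _{ij}\), so that the multiplicative association terms become additive, with \(\alpha _{i}^{(0)}\) and \(\beta _{j}^{(0)}\) obtained from the independence fit; this is a minimal sketch under these assumptions, not the algorithm's actual code:

```r
n <- matrix(c(204, 6, 1, 211, 13, 5, 357, 44, 38, 92, 34, 49),
            nrow = 4, byrow = TRUE)              # cannabis data (4 x 3)
m0 <- outer(rowSums(n), colSums(n)) / sum(n)     # independence fit -> alpha0, beta0
s <- svd(log(n / m0))                            # SVD of the log interaction
phi0 <- s$d[1]                                   # starting intrinsic association
mu0  <- s$u[, 1]                                 # starting row scores
nu0  <- s$v[, 1]                                 # starting column scores
```

For RC(M) with M > 1 one would keep the first M singular values and vectors instead of only the first.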

6.5.2 Example 6.2

The data in Table 6.7, from Wermuth and Cox (1998), cross-classify people in West Germany (Central archive, 1993) according to the type of schooling completed and age, in a 5 × 5 table. As can be observed in Table 6.8, there exists a highly significant association between age and type of schooling which is not captured by the RC model. Hence, an association model RC(M) with M > 1 has to be considered. The RC(2) model is of very good fit and is the model we propose for this data set and base inference on.

Table 6.7 Cross-classification of 3,673 subjects according to their age and type of school attended, West Germany 1991/92
Table 6.8 G 2 statistics for the fit of independence and association models applied in Table 6.7

In the context of the RC model, we have seen that the important information lies not in the values of the row and column scores themselves but in their distances for successive categories. These distances are interpreted in terms of the closeness of the effects of the underlying categories on the odds formed by categories of the other classification variable. For the RC(M) model with M > 1, the logic of interpretation is the same, as can easily be verified from definition (6.20). However, distances between rows (or columns) are now Euclidean distances in the M-dimensional space. For M = 2, this is easily visualized in the plane, with the ith row (i = 1, …, I) and the jth column (j = 1, …, J) represented by the points \((\hat{\mu }_{i1},\hat{\mu }_{i2})\) and \((\hat{\nu }_{j1},\hat{\nu }_{j2})\), respectively. For our example, Fig. 6.1 presents such graphs for scores satisfying constraints (6.19) with uniform (left) or marginal (right) weights. The MLEs of the scores (under uniform weights) are provided in Table 6.9.
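For illustration, with hypothetical two-dimensional row scores (not the estimates of Table 6.9), the pairwise Euclidean distances between the row categories are obtained directly by dist():

```r
# Hypothetical (mu_i1, mu_i2) points, one row per category:
mu <- rbind(c(-0.55, 0.45), c(-0.30, -0.35), c(0.10, -0.20),
            c(0.25, 0.60), c(0.50, -0.50))
round(dist(mu), 2)   # lower triangle of between-row distances
```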

Fig. 6.1
figure 1

Plots of the estimated row (bullets) scores \((\hat{\mu }_{i1},\hat{\mu }_{i2})\), i = 1, …, 5, and column (triangles) scores \((\hat{\nu }_{j1},\hat{\nu }_{j2})\), j = 1, …, 5, under the RC(2) model applied to Table 6.7, with respect to uniform (left) and marginal (right) weights

These two graphs, though they obviously refer to different score values, correspond to equivalent expressions of the RC(2) estimates, just differently scaled through the choice of weights. It is evident from the plots that the second dimension captures the differentiation of row 1 from 2 (incomplete vs. complete basic education) and of 4 from 3 and 5 (upper medium vs. medium and intensive education). The closeness of columns 4 and 5 (ages 60–74 and > 74) is remarkable, especially in the marginal-weights plot, where they are almost indistinguishable. This observation motivates Sect. 7.5 on merging categories, where this example is revisited (Sect. 7.5.1). Though the marginally weighted scores are preferred for comparisons within rows (or within columns), the uniformly weighted ones are more appropriate for investigating the row–column combinations of strong association. We can observe, for example, that upper medium education (row 4) is more strongly associated with people aged 30–44 (column 2). Also, as expected, incomplete basic education (row 1) is more frequent among older people (columns 4 and 5). Alternatively, one could apply correspondence analysis (CA) and reach very similar results. The CA of this data set is provided in Sect. 7.2.2.

Table 6.9 Association parameters’ ML estimates for the RC(2) model applied in Table 6.7

6.6 Software Applications for Association Models

Association models, though powerful tools for modeling the association in contingency tables, have not received the attention one would expect. The major reason is that their fit is not provided as a standard option in statistical software. They can be fitted in statistical packages, but some extra programming is required. A Fortran algorithm for ML estimation of the RC(M) model by the unidimensional Newton's method has been provided by Becker (1990a), the Newton–Raphson algorithm has been implemented in Fortran by Haberman (1995), and a Fisher's scoring type algorithm using weighted least squares estimates as initial values has been given by Ait-Sidi-Allal et al. (2004). As already mentioned in Sect. 6.2, such algorithms are appropriate for the estimation and fit of models linear in their parameters, i.e., models U, R, and C. In the case of RC(M), M ≥ 1, we can still apply these methods by considering, at each estimation step for the row (column) scores, the column (row) scores as fixed at their estimates from the previous step. This procedure is continued until convergence is achieved.

Association models which are linear in their parameters, i.e., models U (and LL), R, and C, are log-linear and can be fitted as GLMs by any available software, adopting the procedure described next for R. Syntax code for automated fitting of all the association models in SPSS, including the RC(M), is also available in the web appendix (see Sect. A.4).

6.6.1 Association Models in R: Example 6.1

The simple association models U (or LL), R, and C can be fitted in R straightforwardly, in the generalized linear models framework, by the glm() function, as described below.

First of all, the data have to be in the standard format for fitting classical log-linear models. Thus, let freq, row, and col be the usual variables of a data frame corresponding to the vector of observed frequencies and the row and column classification variables, respectively. Then we construct the variables of row and column scores, mu = row and nu = col, respectively. This way, the row and column scores are set equal to \(\mu _{i} = i\), i = 1, …, I and \(\nu _{j} = j\), j = 1, …, J. In the sequel, row and col have to be defined as factors; then the U, R, and C models are the log-linear models with model terms row + col + mu:nu, row + col + row:nu, and row + col + mu:col, respectively. The LL model can be fitted as the U model with the only difference that the score variables mu and nu will now contain the values of the prefixed, not necessarily equidistant, scores for the corresponding row and column categories.

To illustrate, let us consider the cannabis example. The data are saved under the data frame cannabis.fr. Models U, R, and C are fitted by glm() as follows:

> freq <- c(204,6,1,211,13,5,357,44,38,92,34,49)

> row <- rep(1:4, each=3); col <- rep(1:3,4)

> mu <- row; nu <- col

> row <- factor(row); col <- factor(col)

> cannabis.fr <- data.frame(freq, row, col, mu, nu)

> model.U <- glm(freq~row+col+mu:nu, poisson, data=cannabis.fr)

> model.R <- glm(freq~row+col+row:nu, poisson, data=cannabis.fr)

> model.C <- glm(freq~row+col+mu:col, poisson, data=cannabis.fr)

for models U, R, and C, respectively. The output is then derived, for the U model, for example, through the command

> summary(model.U)

and is provided in Table 6.10.

Table 6.10 Output of the U model fit in R for the cannabis data (Table 6.1)

Note that by the above procedure the scores involved in the interaction term are not standardized; they are equal to the corresponding category indices and thus do not satisfy constraints (6.5) and (6.6). For these scores we have \(\hat{\varphi }= 0.80265\), as given in Sect. 6.1.2. This, however, does not affect the ML estimates of the common value of all local odds ratios expected under U, or of the expected cell frequencies, which can be obtained by

> MLE.U <- xtabs(model.U$fitted.values ~ row+col)

verifying the corresponding entries of Table 6.1.
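The effect of standardizing the scores on \(\hat{\varphi }\) can be made explicit: under a linear rescaling of the scores, \(\varphi\) is rescaled inversely, leaving the common log local odds ratio unchanged. The sketch below (our illustration) standardizes the integer scores of the 4 × 3 cannabis table so that (6.5) and (6.6) hold with uniform weights:

```r
mu <- 1:4; nu <- 1:3                    # integer scores used in model.U
c1 <- sqrt(sum((mu - mean(mu))^2)); c2 <- sqrt(sum((nu - mean(nu))^2))
mu.s <- (mu - mean(mu)) / c1            # sum(mu.s) = 0, sum(mu.s^2) = 1
nu.s <- (nu - mean(nu)) / c2
phi   <- 0.80265                        # estimate under integer scores
phi.s <- phi * c1 * c2                  # estimate under standardized scores
# the common log local odds ratio is invariant:
phi.s * (mu.s[2] - mu.s[1]) * (nu.s[2] - nu.s[1])   # = phi
```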

The R and C models fitted above are defined by (6.15), the parameterization without the intrinsic association parameter \(\varphi\). If we want the models to be in the form (6.4) and the scores to be standardized, then mu and nu have to be rescaled appropriately before fitting the model, while the estimates of the parametric scores have to be rescaled at a final stage as well. To simplify this procedure, we constructed for each association model a corresponding R function, namely fit.U(), fit.R(), and fit.C(), to be found in the web appendix (see Sect. A.3.5). They fit the corresponding model subject to the general constraints (6.17), controlling the weights used through the parameter iflag, with the option of uniform (=0) or marginal (=1) weights. Hence, the U model with marginal weights could be fitted to the cannabis example by this function as

> U <- fit.U(freq, NI=4, NJ=3, iflag=1)

where NI=I and NJ=J. Under U, in addition to the standard glm output, the standardized scores are saved under U$mu and U$nu, respectively, as well as \(\hat{\varphi }\) (U$phi), the G 2 value (U$G2), the degrees of freedom (U$df), the p-value (U$p.value), and the ML estimates of the expected cell frequencies (U$fit.freq). Functions fit.R() and fit.C() are called analogously and give output of the same format.

The RC and, more generally, the RC(M) models, M ≥ 1, cannot be fitted by glm(), since they are not linear in their parameters and thus not in the GLM family; these models need special treatment. They can be fitted through functions available in special packages developed for nonlinear models, such as gnm, developed by Turner and Firth; an overview of version 1.0-6 is provided by Turner and Firth (2012a). An alternative choice is the VGAM package, which deals with vector generalized additive models (Yee and Wild 1996). For a short presentation of the package, see Yee (2008).

We will illustrate association models with the gnm package, based on Turner and Firth (2007). It is designed for models multiplicative in their parameters and defines the product of parameters corresponding to factors f1 and f2 through Mult(f1,f2). Thus, the RC model is fitted to our cannabis example by

> library(gnm)

> RC.model<-gnm(freq~row+col+Mult(row,col),family=poisson)

Recall that row and col have to be defined as factors before calling the model. Output is printed on the screen by typing

> RC.model

The output is provided in Table 6.11.

Table 6.11 Output for the RC model applied on the cannabis example (data in Table 6.1) by gnm

The ML estimates of the expected cell frequencies under the RC model are provided, in analogy to the U model above, by

> MLE.RC <- xtabs(RC.model$fitted.values ~ row+col)

The ML estimates of the parameters of the model are printed on screen by typing:

> coefficients(RC.model)

Furthermore, the command coef() gives the ability to save the ML estimates of a parameter in a separate vector in order to be handy for further use. For example, the row main effects estimates can be saved under the vector a:

Note that the model is fitted through (6.15), so the score estimates do not satisfy the constraints (6.17). They can, however, be rescaled linearly in order to satisfy them; the getContrasts() command of the gnm package provides this facility. Thus, for uniform weights, the rescaling is achieved as

> mu<-getContrasts(RC.model, pickCoef(RC.model,"[.]row"),

+ ref="mean", scaleWeights="unit")

and

> nu<-getContrasts(RC.model, pickCoef(RC.model,"[.]col"),

+ ref="mean", scaleWeights="unit")

for the row and column scores, respectively (output not shown here).

For marginal weights, the vectors of row and column marginal probabilities have to be computed first:

> rowProbs<-with(cannabis.fr, tapply(freq,row,sum)/sum(freq))

> colProbs<-with(cannabis.fr, tapply(freq,col,sum)/sum(freq))

The rescaling follows then analogously:

> mu<-getContrasts(RC.model, pickCoef(RC.model,"[.]row"),

+ ref=rowProbs, scaleWeights=rowProbs)

> nu<-getContrasts(RC.model, pickCoef(RC.model,"[.]col"),

+ ref=colProbs, scaleWeights=colProbs)

For our example, this leads to the rescaled row and column scores under marginal weights (output not shown here).

Alternatively, the RC model can be fitted by the function fit.RC(), provided in the web appendix (see Sect. A.3.5), with the option of selecting marginal or uniform weights for the constraints (6.17) on the scores. The function is called exactly like fit.U() and provides the same type of output. However, it does not provide the standard errors of the parametric scores; for these, the getContrasts() function described above is needed.

The conditional testing between nested association models, when allowed, can be performed by function anova(). Thus, for our cannabis example, the ANOAS table based on the conditional tests G 2(I | U), G 2(U | C), and G 2(C | RC) (see Sect. 6.3.1) is produced by

> I<-glm(freq~row+col, family=poisson)

> m1<- fit.U(freq,4,3,1)

> m2<- fit.C(freq,4,3,1)

> m3<- fit.RC(freq,4,3,1)

> anova(I,m1$model,m2$model,m3$model,test="Chisq")

6.6.2 The RC(M) Model in R: Example 6.2

Association models of order M > 1 can be fitted in the gnm package by applying the instances argument to the multiplicative term of the model. Thus, for our Example 6.2 (Table 6.7), the RC(2) model can be fitted as

> RC2.model <- gnm(freq ~ row+col+instances(Mult(row,col),2),

+ family=poisson)

where freq is the vector of cell frequencies while row and col are the factors corresponding to the rows (type of schooling) and columns (age group) of the table, respectively. The fit.RCm() function in the web appendix (see Sect. A.3.5) fits the RC(M) model (M ≥ 1) on a contingency table, read in vector form, and rescales the row and column score vectors through singular value decomposition of the appropriate table, so that constraints (6.19) hold for uniform or marginal weights.

Thus, for Example 6.2, the 5 × 5 data table is provided in vector form (by rows) as

> WCox <- c(12,13,12,20,7,215, 507,493,460,137,

+ 277,300,192,126,38,52,91,47,15,6,233,225,102,74,19)

and the RC(2) model is fitted by

> m <- 2

> RC.m <- fit.RCm(freq=WCox, NI=5, NJ=5, m=2, iflag=1)

where the parameter m specifies the order of the association model. The derived score vectors are subject to constraints (6.19) with marginal weights. Changing the last argument of fit.RCm() from 1 to 0 applies the uniform weights instead.

One can save the scores’ estimates in order to proceed with the presentation of the results, for example, through appropriate graphs. For Example 6.2, the vectors of row and column scores subject to marginal weights can be saved in vectors mu1 and nu1 as

> mu1 <- RC.m$mu; nu1 <- RC.m$nu

while subject to uniform weights in mu0 and nu0 as

> RC.m0 <- fit.RCm(freq=WCox, NI=5, NJ=5, m=2, iflag=0)

> mu0 <- RC.m0$mu; nu0 <- RC.m0$nu

respectively.

The plot of the row and column scores' coordinates under uniform weights (Fig. 6.1 (left)) can easily be obtained through the standard plot() command, applied on mu0 and nu0. Here it is produced by the function plot_2dim(), provided in the web appendix (see Sect. A.3.5). The plot in Fig. 6.1 (left) is obtained by calling this function as

> plot_2dim(mu0, nu0, -0.6, 0.6, -0.8, 0.8, -0.7, 1.1, 1.2)

The parameters of this function following nu0 control the plot's appearance. Thus, (−0.6, 0.6) and (−0.8, 0.8) define the ranges of values of the first and second axes, respectively. The value −0.7 leaves a gap of 70% of the text width between a category label and the corresponding plotted symbol; it can be adjusted each time for a better appearance of the graph. The size of the text characters in axes and labels is set to 1.1 times the default text size, while the sizes of the symbols and their category labels are set to 1.2 times the default. Analogously, the plot in Fig. 6.1 (right) is obtained through

> plot_2dim(mu1, nu1, -2, 2, -5, 5, -0.7, 1.1, 1.2)

6.6.3 Example 2.4 (Revisited)

Recall the data set on varicella disease in Table 2.5, where 170 children are cross-classified by complication occurrence and age (in a 2 × 4 table). Independence was rejected (p-value=0.040) and the linear trend test suggested that the linear association is non-significant (p-value=0.104). Fitting association models to this example, we confirm the inappropriateness of linear association, since the U model is rejected with G 2 = 7.093 (p-value=0.029, df = 2). Note that because I = 2, the R model is equivalent to U, while the C model is saturated (G 2 = 0, df = 0). However, derivation of the column scores of the C model is very informative for comparing the different age groups in terms of their association with the complication response. The C model is fitted by the function fit.C() of the web appendix (see Sect. A.3.5). From the corresponding output, the coefficients along with their standard errors and significances are provided below.

In the estimation procedure above, the parametric score ν 4 is redundant and serves as the reference category (the coefficient for Y4:mu is shown as not defined). The fixed row scores used are μ 1 = −1 and μ 2 = 1. The estimated column scores are rescaled to satisfy the marginally weighted constraints (6.17). The rescaled scores and the interaction parameter are also part of the output. In particular, \(\hat{\varphi }= 0.231\), \(\hat{\nu }_{1} = -1.127\), \(\hat{\nu }_{2} = 2.136\), \(\hat{\nu }_{3} = 0.560\), and \(\hat{\nu }_{4} = -0.468\).

Observing that only the column score \(\hat{\nu }_{2}\) differs significantly from the others, we conclude that only the category “1–2 years old” relates differently to complications than all other age categories. Thus, we proceed by applying the LL model with the constraint \(\nu _{1} =\nu _{3} =\nu _{4}\). In R this is easily achieved, as shown next. Before fitting the LL model, we rescale the simple raw scores, assigned initially to the row and column categories, through the function rescale() of the web appendix (see Sect. A.3.5), so that the \(\hat{\varphi }\) derived by glm() corresponds to the marginally weighted scores:

> NI <- 2

> NJ <- 4

> freq <- c(10,7,9,59,6,19,12,48)

> row<-gl(NI,NJ,length=NI*NJ)

> col<-gl(NJ,1,length=NI*NJ)

> dtable <- data.frame(freq,row,col)

> mu0<-c(1,2)

> nu0<-c(1,2,1,1)

> mu<-rep(rescale(mu0, dtable, 1, 1)$score,each=NJ)

> nu<-rep(rescale(nu0, dtable, 1, 0)$score, NI)

> LL.model <- glm(freq~row+col+mu:nu,poisson)

From the summary output, obtained by summary(LL.model), we see that the model is acceptable, since G 2 = 1.572 (p-value=0.456, df = 2). The commands

> MLEs <- xtabs(LL.model$fitted.values ~ row + col)

> stdres <- xtabs(rstandard(LL.model) ~ row + col)

express in table form the ML estimates of the expected frequencies and the corresponding standardized residuals. None of the standardized residuals exceeds 1.96; thus all cells are fitted satisfactorily by the model.

The interaction parameter is estimated as \(\hat{\varphi }= 0.210\) (the coefficient for mu:nu in the output), while the row and column scores are μ 1 = −1, μ 2 = 1, \(\nu _{1} =\nu _{3} =\nu _{4} = -0.4249\), and \(\nu _{2} = 2.3534\) (saved in the vectors mu and nu).

This model has a clear and strong interpretation. The equality restrictions among the column scores impose on the expected local odds ratios the restrictions \(\theta _{13}^{L} = 1\) and \(\theta _{11}^{L} = 1/\theta _{12}^{L}\). The odds ratios \(\theta _{11}^{23}\) and \(\theta _{11}^{24}\), opposing age categories 1 to 3 and 1 to 4, respectively, are also equal to 1, and \(\theta _{12}^{24} =\theta _{ 12}^{L}\). Thus we conclude that the odds of complication occurrence for children 1–2 years old are \(\hat{\theta }_{11}^{L} = {e}^{\hat{\varphi }(\mu _{2}-\mu _{1})(\nu _{2}-\nu _{1})} = {e}^{1.1669} = 3.2\) times higher than for children of any other age. The \(\hat{\theta }_{1j}^{L}\), j = 1, 2, 3, could also have been computed by the local.odds.DM() function (see Sect. A.3.2), implemented as follows:

> NI <- 2; NJ <- 4; C <- local.odds.DM(NI, NJ)

> LO <- as.vector(C%*%log(LL.model$fitted.values))

> exp(t(matrix(LO, NJ-1)))

\(\fbox{ $\begin{array}{lrrr} & \mathtt{[,1]}& \mathtt{[,2]}&\mathtt{[,3]}\\ \mathtt{[1, ] } &\mathtt{3.207792 } &\mathtt{0.3117409 } & \mathtt{1} \\ \end{array} $}\)

6.6.4 Association Models Fitted on the Local Odds Ratios

Association models can also be fitted directly on the local odds ratios through the generalized log-linear model (GLLM) (5.28), implemented in R via Lang's mph package. The GLLM turns into a model on the local odds ratios by eliminating the matrix M and appropriately defining the matrix C, so that Clog(m) becomes the vector of the expected log local odds ratios under the assumed model, where m is the vector of expected cell frequencies. For an I × J table, C and m are of sizes (I − 1)(J − 1) × IJ and IJ × 1, respectively. This matrix C for an I × J table is produced by the function local.odds.DM() of the web appendix (see Sect. A.3.2). The design matrix X specifies the restrictions imposed on the local odds ratios by the model under consideration and is of size (I − 1)(J − 1) × s, where s is the number of parameters of the model.
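To fix ideas, such a matrix C can be written as a Kronecker product of first-difference matrices when the table is vectorized by rows. The sketch below shows one possible construction and is only indicative of what local.odds.DM() may compute; the appendix function may differ in ordering or sign conventions:

```r
# (n-1) x n first-difference matrix: rows e_i - e_(i+1):
diff.mat <- function(n) diag(n)[-n, , drop = FALSE] - diag(n)[-1, , drop = FALSE]
# contrast matrix for local log odds ratios, data vectorized by rows:
C.sketch <- function(NI, NJ) kronecker(diff.mat(NI), diff.mat(NJ))
# check on a table with log m_ij = 0.5*i*j, so that all log theta_ij = 0.5:
logm <- outer(1:4, 1:3, function(i, j) 0.5 * i * j)
C <- C.sketch(4, 3)
range(C %*% as.vector(t(logm)))   # all entries equal 0.5
```

Each row of C picks out one local log odds ratio, log m_ij − log m_{i,j+1} − log m_{i+1,j} + log m_{i+1,j+1}.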

We illustrate this option by fitting the U model to our cannabis example. Under the U model a common value is assumed for all local odds ratios; thus the parameter β is scalar and the design matrix X is the (I − 1)(J − 1) × 1 vector of 1's. Recall (see Sect. 5.6) that mph needs to be made available in R and that the data are read as a vector saved in matrix form. The way we define the C matrix requires the data to be read by rows. For the cannabis example,

> y <- c(204,6,1,211,13,5,357,44,38,92,34,49)

> y <- matrix(y); NI <- 4; NJ <- 3; dim1<-(NI-1)*(NJ-1)

> X<-matrix(rep(1,dim1))

In our context, matrix C is

> C <- local.odds.DM(NI,NJ)

and the link of the GLLM model is defined by the function

> L.fct <- function(m){C%*%log(m)}

Finally, the U model is fitted by

> mph.out <- mph.fit(y=y,L.fct=L.fct,X=X)

> mph.summary(mph.out,cell.stats=T,model.info=T)

The derived output provides goodness-of-fit statistics for the model, the estimate of β (the log of the common local odds ratio under U), and estimates of the expected cell frequencies with the associated residuals. Further information, for example on the algorithm's convergence, is also provided.

In the presence of one or more sampling zeros, to ensure the existence of the odds ratios we add a small constant to the observed frequencies:

> z <- y+0.000001

> mph.out <- mph.fit(y=z,L.fct=L.fct,X=X)

6.7 Association Models for Multi-way Tables

Association models can also be applied to contingency tables of higher dimension. Consider an I × J × K contingency table with classification variables X, Y, and Z, respectively. Association models can be derived by replacing one or more of the interaction terms of any hierarchical log-linear model by multiplicative terms based on scores, thus leading to more parsimonious models of special structure, in analogy to two-way association models.

For example, consider the model

$$\displaystyle\begin{array}{rcl} & & \log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XZ}\mu _{ i}\tau _{k} {+\varphi }^{Y Z}\nu _{ j}\tau _{k}\, \\ & & \qquad \qquad \qquad \qquad \qquad \qquad \qquad i = 1,\ldots,I,\ j = 1,\ldots,J,\ k = 1,\ldots,K, {}\end{array}$$
(6.24)

with \((\mu _{1},\ldots,\mu _{I})\), \((\nu _{1},\ldots,\nu _{J})\), and \((\tau _{1},\ldots,\tau _{K})\) sets of known scores assigned to the categories of the classification variables X, Y, and Z, respectively, all equidistant for successive categories. This model is a special type of conditional XY independence model, derived from the (XZ,  YZ) log-linear model by replacing the \(\lambda _{ik}^{XZ}\) and \(\lambda _{jk}^{Y Z}\) interaction terms by the uniform (U)-type terms \({\varphi }^{XZ}\mu _{i}\tau _{k}\) and \({\varphi }^{Y Z}\nu _{j}\tau _{k}\), respectively. For this reason, it will be denoted as \((XZ_{U},\ Y Z_{U})\). It is very parsimonious, having df = IJK − I − J − K, just 2 less than the complete independence model (X,  Y,  Z).
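The parameter count can be checked directly: the model has 1 + (I − 1) + (J − 1) + (K − 1) main-effect parameters plus the two association parameters \({\varphi }^{XZ}\) and \({\varphi }^{Y Z}\). A small sketch (the function names are ours):

```r
# Residual df of (XZ_U, YZ_U): IJK cells minus the free parameters
# 1 + (I-1) + (J-1) + (K-1) + 2, i.e. IJK - I - J - K.
df.XZU.YZU <- function(I, J, K) I * J * K - (I + J + K)

# Residual df of complete independence (X, Y, Z), for comparison.
df.indep <- function(I, J, K) I * J * K - I - J - K + 2
```

For instance, for a 6 × 7 × 2 table, df.XZU.YZU(6, 7, 2) gives 69, two less than df.indep(6, 7, 2) = 71.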

More options are available by considering some of the scores to be parametric. Assuming thus an R-type interaction only for the term \(\lambda _{ik}^{XZ}\), the \((\mu _{1},\ldots,\mu _{I})\) scores would be considered as parameters in (6.24) and the model would then be \((XZ_{X},\ Y Z_{U})\). In terms of notation, an interaction term without an index is of log-linear model type and with index U of uniform association type; when it is of row or column effect type, the variable with parametric scores appears as the index, while an RC-type term carries both variables as indices. Thus \((XZ_{XZ},\ Y Z_{U})\) is the model defined by (6.24) with parametric μ- and τ-scores for the XZ interaction and fixed equidistant ν- and τ-scores for the YZ interaction term. Were the layer scores parametric in both interaction terms, they could be homogeneous or not. The model with parametric nonhomogeneous τ-scores, \((XZ_{Z},\ Y Z_{Z})\), is

$$\displaystyle{\log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XZ}\mu _{ i}\tau _{k}^{XZ} {+\varphi }^{Y Z}\nu _{ j}\tau _{k}^{Y Z}\,}$$

while with the additional restrictions \(\tau _{k}^{XZ} =\tau _{ k}^{Y Z}\), k = 1, …, K, the homogeneous \((XZ_{Z},\ Y Z_{Z})\) is derived.

A flexible form of association model, including three-factor interaction, is

$$\displaystyle\begin{array}{rcl} & & \log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XY }\mu _{ i}^{XY }\nu _{ j}^{XY } {+\varphi }^{XZ}\mu _{ i}^{XZ}\tau _{ k}^{XZ} + \\ & & \qquad \qquad \qquad \qquad \qquad {\qquad \varphi }^{Y Z}\nu _{ j}^{Y Z}\tau _{ k}^{Y Z} {+\varphi }^{XY Z}\mu _{ i}^{XY Z}\nu _{ j}^{XY Z}\tau _{ k}^{XY Z}\, {}\end{array}$$
(6.25)

which offers a variety of model options, depending on the combinations of assumptions about the scores.

The most general expressions for imposing association structures on the two-factor interaction terms of a three-way log-linear model are

$$\displaystyle\begin{array}{rcl} \lambda _{ij}^{XY }& =& \sum _{ m=1}^{M_{1} }\varphi _{m}^{XY }\mu _{ im}^{XY }\nu _{ jm}^{XY }\,\qquad \lambda _{ jk}^{Y Z} =\sum _{ m=1}^{M_{2} }\varphi _{m}^{Y Z}\nu _{ jm}^{Y Z}\tau _{ km}^{Y Z}\, \\ \lambda _{ik}^{XZ}& =& \sum _{ m=1}^{M_{3} }\varphi _{m}^{XZ}\mu _{ im}^{XZ}\tau _{ km}^{XZ}\, {}\end{array}$$
(6.26)

with \(1 \leq M_{1} \leq \min (I,J) - 1\), \(1 \leq M_{2} \leq \min (J,K) - 1\), and \(1 \leq M_{3} \leq \min (I,K) - 1\). The three-factor interaction can be decomposed in an analogous manner

$$\displaystyle{ \lambda _{ijk}^{XY Z} =\sum _{ m=1}^{M_{4} }\varphi _{m}\mu _{im}\nu _{jm}\tau _{km}. }$$
(6.27)

The consideration of (6.27) for \(M_{4} > 1\), as well as other options for decomposing three-way arrays, known as trilinear decomposition, is beyond the scope of this book (see Sect. 6.8.1).

The scores are subject to constraints analogous to (6.19) of the two-way case. When \(M_{i} = 1\) (i = 1, …, 4), we arrive at model (6.25). Furthermore, the scores can be considered known, leading to terms of U-, R-, C-, or L- (for the layer scores) type.
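To make the decomposition concrete, a small sketch (the helper interaction.term() is hypothetical) assembling a rank-M interaction matrix such as \(\lambda ^{XY }\) in (6.26) from its \(\varphi\)-parameters and score matrices:

```r
# Sketch: assemble a rank-M interaction matrix lambda^{XY} from the
# intrinsic association parameters phi (length M) and the score
# matrices mu (I x M) and nu (J x M), as in (6.26).
interaction.term <- function(phi, mu, nu) {
  Reduce(`+`, lapply(seq_along(phi),
                     function(m) phi[m] * outer(mu[, m], nu[, m])))
}
```

For M = 1 this reduces to the single multiplicative RC-type term \(\varphi \mu _{i}\nu _{j}\).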

The idea extends analogously to contingency tables of higher dimension. However, the number of possible association models increases with the dimension of the table. It is difficult to examine all possible combinations of assumptions regarding the interaction terms of a multi-way table, so an automated stepwise association model selection procedure is not feasible. In practice we start by selecting the appropriate hierarchical log-linear model through a stepwise procedure and then try to reach a more parsimonious model by imposing special structures on some of the interaction terms. In this procedure, conditional tests between nested models are helpful. Finally, we can test whether parametric scores of the same classification variable but on different interaction terms are homogeneous.

Multi-way association models and their physical interpretations will be illustrated with two examples that follow.

6.7.1 Example 6.3

In a study, 16,236 teenagers in Holland are cross-classified in a 6 × 7 × 2 table by their educational level after 4 years of second-level education, their test for intellectual capacity (TIC) score, and their gender (Siciliano and Mooijaart 1997). The data are provided in Table 6.12.

Table 6.12 Cross-classification of 16,236 teenagers in Holland by their educational level after 4 years of second-level education, their test for intellectual capacity (TIC) score, and their gender (Siciliano and Mooijaart 1997)

In the framework of hierarchical log-linear models, there is no option for this data set other than the saturated model, since the three-factor interaction is significant. We can verify that the model of homogeneous association (EI, EG, IG) is rejected, with \({G}^{2}(EI,\ EG,\ IG) = 61.517\) (p-value=0.001, df=30). It is notable, however, that the highly significant \({G}^{2}\) value is also affected by the large sample size of the table. The corresponding dissimilarity index is \(\hat{\varDelta }= 0.02\), at the limit for a satisfactory representation of the data by this model (see Sects. 4.2 and 4.2.1 for calculation in R). The significance of each term in the log-linear model is summarized in the analysis of deviance table of the saturated model, derived as shown below.

Provided that the data are given in vector freq, expanded by rows, followed by columns and layers, we program in R

> G<-factor(rep(1:2,each=42)); I<-factor(rep(1:7,12))

> E<-factor(rep(1:6,2,each=7)); educ.fr<-data.frame(freq,E,I,G)

> sat.glm <- glm(freq ~ E*I*G, family=poisson, data=educ.fr)

> anova(sat.glm, test="Chisq")

and get the output of Table 6.13.

Table 6.13 Decomposition of the deviance for Table 6.12

Since all interaction terms are significant, the basis for selecting the appropriate association model will be the saturated model. The simplest expression of this type is

$$\displaystyle\begin{array}{rcl} log(m_{ijk}) =\lambda +\lambda _{i}^{E} +\lambda _{ j}^{I} +\lambda _{ k}^{G} {+\varphi }^{EI}\mu _{ i}\nu _{j} {+\varphi }^{IG}\nu _{ j}\tau _{k} {+\varphi }^{EG}\mu _{ i}\tau _{k} {+\varphi }^{EIG}\mu _{ i}\nu _{j}\tau _{k},& &{}\end{array}$$
(6.28)

with all the involved set of scores known. Considering the scores in each set equidistant for successive categories, model (6.28), denoted by \(\left (EIG_{U}\right )\), is the most parsimonious three-way association model in the class of models with up to three-factor interaction, having just 4 parameters more than the model of complete independence (E,  I,  G).

In order to fit association models in R, we create the known score vectors for the classification variables. For simplicity, we set each score equal to the index of the category it corresponds to. We compute

> mu<-rep(1:6,2,each=7); nu<-rep(1:7,12); tau<-rep(1:2,each=42)

and extend the data frame

> educ.fr<-data.frame(freq,E,I,G,mu,nu,tau)

Model \(\left (EIG_{U}\right )\) is then fitted as

> EIG.U <- glm(freq ~ E+I+G+mu:nu+mu:tau+nu:tau+mu:nu:tau,

+ poisson, data=educ.fr)

It is of very bad fit, with \({G}^{2}\left (EIG_{U}\right ) = 450.179\) (p-value < 0.0005, df=67), but it reduces the \({G}^{2}\) statistic drastically compared to complete independence (\({G}^{2}(E,\ I,\ G) = 2634.719\), df=71).
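This drastic reduction can be quantified by the conditional LR test of (E, I, G) given \(EIG_{U}\), using the \({G}^{2}\) values and df just reported (a sketch; the variable names are ours):

```r
# Conditional LR test: the difference of the two G^2 statistics is
# referred to a chi-squared with the difference of the residual df.
G2.diff <- 2634.719 - 450.179   # = 2184.54
df.diff <- 71 - 67              # = 4
p.value <- pchisq(G2.diff, df.diff, lower.tail = FALSE)
```

The negligible p-value confirms that the four uniform association parameters capture a highly significant part of the dependence.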

This means that some of the row and/or column scores in (6.28) have to be considered parametric. Since G is binary, the corresponding scores \((\tau _{1},\tau _{2})\) cannot be parametric and their choice does not affect the model fit.

Considering the TIC scores parametric in all the interaction terms, model (6.28) extends to

$$\displaystyle{log(m_{ijk}) =\lambda +\lambda _{i}^{E} +\lambda _{ j}^{I} +\lambda _{ k}^{G} +\mu _{ i}\nu _{j}^{EI} +\nu _{ j}^{IG}\tau _{ k} {+\varphi }^{EG}\mu _{ i}\tau _{k} +\mu _{i}\nu _{j}^{EIG}\tau _{ k}\,}$$

denoted by \((EI_{I},\ EG_{U},\ IG_{I},\ EIG_{I})\). This last model expression employs non-standardized parametric scores and therefore the redundant intrinsic association \(\varphi\)-parameters are absorbed. It is fitted in R by

> EIG.I <- glm(freq~E+I+G+mu:I+mu:tau+I:tau+mu:I:tau,

+ poisson, data=educ.fr)

Its bad fit (\({G}^{2}(EI_{I},\ EG_{U},\ IG_{I},\ EIG_{I}) = 426.6\), p-value < 0.0005, df=52) provides evidence that the education scores in some or all interaction terms should be considered as unknown parameters.

Thus, we try next the model

$$\displaystyle{ log(m_{ijk}) =\lambda +\lambda _{i}^{E} +\lambda _{ j}^{I} +\lambda _{ k}^{G} +\mu _{ i}^{EI}\nu _{ j} {+\varphi }^{IG}\nu _{ j}\tau _{k} +\mu _{ i}^{EG}\tau _{ k} +\mu _{ i}^{EIG}\nu _{ j}\tau _{k}\, }$$
(6.29)

where only the education level (E) effect is parametric for all interaction terms. This is denoted as \((EI_{E},\ EG_{E},\ IG_{U},\ EIG_{E})\) and fitted in R by

> EIG.E <- glm(freq~E+I+G+E:nu+E:tau+nu:tau+E:nu:tau,

+ poisson, data=educ.fr)

exhibiting an adequate fit with \({G}^{2}(EI_{E},\ EG_{E},\ IG_{U},\ EIG_{E}) = 61.074\) (p-value=0.267, df=55).

The fitted cell frequencies under \((EI_{E},\ EG_{E},\ IG_{U},\ EIG_{E})\) are provided in Table 6.12 in parentheses. For equidistant scores \(\nu _{j} = j\) (j = 1, …, 7) and \(\tau _{k} = k\) (k = 1, 2), the ML estimate of the intrinsic association parameter \({\varphi }^{IG}\) is \(\hat{{\varphi }}^{IG} = 0.0745\) while the ML estimates of the parametric scores are given in Table 6.14.

Table 6.14 ML estimates of the parametric scores of model (6.29) fitted on the data in Table 6.12 for equidistant scores \(\nu _{j} = j\) (j = 1, …, 7) and \(\tau _{k} = k\) (k = 1, 2)

The interpretation of parameters needs caution and has to be done locally due to the non-monotonicity of the parametric scores. For interpretation, the fitted odds ratios under the model have to be considered. Model (6.29) in terms of the conditional EI log local odds ratios and for the choice of known scores given above is expressed as

$$\displaystyle\begin{array}{rcl} \log \left (\theta _{ij(k)}^{EI}\right )& =& \log \left ( \frac{m_{ijk} \cdot m_{i+1,j+1,k}} {m_{i+1,j,k} \cdot m_{i,j+1,k}}\right ) = (\mu _{i+1}^{EI} -\mu _{ i}^{EI}) + (\mu _{ i+1}^{EIG} -\mu _{ i}^{EIG})k {}\\ & =& \log \left (\theta _{i(k)}^{EI}\right )\,\ \ \ \ i = 1,\ldots,I - 1,\ j = 1,\ldots,J - 1,\ k = 1,2, {}\\ \end{array}$$

i.e., independent of j, as expected since successive ν j scores are equidistant. This means that under (6.29) the fitted local odds ratios are constant within rows. In our case, the \(\hat{\theta }_{i(k)}^{EI}\) row values are given in Table 6.15. Thus, we see that for boys, the strongest association between educational level and the TIC score is between HAVO and VWO. The odds of a boy achieving a category of TIC score vs. the immediately previous one are 1.34 times higher for a boy having general education preparing for university (VWO) than for one with a high level of general education (HAVO). The corresponding odds ratio for girls is 1.37. The conditional (within-gender) association between TIC score and educational level is positive for boys, although not equally strong for all educational levels, while for girls it is negative (though weak) when comparing DO to LBO and MAVO to MBO.

Table 6.15 ML estimates of the \(\theta _{i(k)}^{EI}\), i = 1, …, 5, k = 1, 2, under model (6.29) for the example in Table 6.12
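Row values such as those of Table 6.15 can be recovered from the parametric score estimates through the log-odds-ratio expression above. A sketch with purely illustrative score vectors (NOT the Table 6.14 estimates; the function name is ours):

```r
# Sketch: fitted EI local odds ratios theta_{i(k)} for layer k, from
# parametric score vectors mu.EI and mu.EIG of length I (here I = 6).
theta.EI <- function(mu.EI, mu.EIG, k) exp(diff(mu.EI) + diff(mu.EIG) * k)

mu.EI  <- c(0, 0.4, 0.9, 1.1, 1.6, 2.3)   # illustrative values only
mu.EIG <- c(0, 0.1, 0.1, 0.3, 0.2, 0.4)
theta.EI(mu.EI, mu.EIG, k = 1)            # row values for layer k = 1
```

Each returned vector has I − 1 = 5 entries, one per pair of successive educational levels, constant over the TIC categories j.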

Based on model (6.29), one could further test whether more parsimonious models are preferable. Such models are obtained by imposing homogeneity constraints among the vectors of unknown education scores or by considering one of them equidistant. It could also be tested whether less parsimonious, non-log-linear models involving interaction terms multiplicative in their parameters (i.e., of RC-type) lead to a significant improvement in fit.

6.7.2 Homogeneous Uniform Association

Consider an I × J × K contingency table consisting of K independent strata. The simplest association structure then assumes that within each XY partial table all local odds ratios are equal, i.e., that the U model holds for each stratum. This model is defined by

$$\displaystyle{ \theta _{ij(k)}^{XY } =\theta _{ k}^{XY },\ \ i = 1,\ldots,I - 1,\ j = 1,\ldots,J - 1,\ k = 1,\ldots,K. }$$
(6.30)

Local odds ratios from different strata, however, may vary, so (6.30) is the nonhomogeneous U model. An even simpler model is the homogeneous U model, which assumes a common local odds ratio across all strata

$$\displaystyle{ \theta _{ij(k)}^{XY } {=\theta }^{XY },\ \ i = 1,\ldots,I - 1,\ j = 1,\ldots,J - 1,\ k = 1,\ldots,K. }$$
(6.31)

To illustrate these models, consider the data in Table 6.16. The first stratum of this 4 × 3 × 2 data table is the cannabis example of Table 6.1, while the second stratum corresponds to an analogous survey among students of another university (artificial data).

Table 6.16 Students’ survey about cannabis use at two universities

The U model, fitted on the 4 × 3 partial table of the second stratum (data in Table 6.16, stratum (2)), with one of the procedures described in Sect. 6.6.1 for Example 6.1, is of good fit with \(G_{2}^{2} = 8.284\) (p-value= 0.141, df = 5). Under this model, the MLE of the common local odds ratio in log-scale is \(\log \hat{\theta }_{2}^{L} = 0.749\), close to the corresponding estimate for the data in the first stratum (\(\log \hat{\theta }_{1}^{L} = 0.803\)), for which we had \(G_{1}^{2} = 1.469\) (p-value= 0.917, df = 5).
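For instance, the glm-based fit of Sect. 6.6.1 applied to the second stratum reads as follows (a sketch; the frequencies are those quoted for Table 6.16, read by rows, and the object names are ours):

```r
# U model for stratum 2 of Table 6.16, fitted as a Poisson log-linear
# model with the uniform association term mu:nu (scores = category indices).
freq2 <- c(311,5,4, 339,19,12, 429,66,57, 134,51,74)
R <- factor(rep(1:4, each = 3)); C <- factor(rep(1:3, 4))
mu <- as.numeric(R); nu <- as.numeric(C)
U2 <- glm(freq2 ~ R + C + mu:nu, family = poisson)
# deviance(U2) gives G_2^2 = 8.284 on 5 df;
# coef(U2)["mu:nu"] gives log(theta_2) = 0.749
```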

The nonhomogeneous U model (6.30) for data in Table 6.16 can be derived in mph by

> source("c://Program Files//R//mph.Rcode.txt")

> freq <-c(204,6,1,211,13,5,357,44,38,92,34,49)

> freq2<-c(311,5,4,339,19,12,429,66,57,134,51,74)

> y<- matrix(append(freq,freq2))

> NI<-4; NJ<-3; dim1<-(NI-1)*(NJ-1); dim2<-2*dim1

> zer<-matrix(rep(0,NI*NJ*(NI-1)*(NJ-1)),(NI-1)*(NJ-1))

> C0<-local.odds.DM(NI,NJ); C1<-cbind(C0,zer)

> C2<-cbind(zer,C0); C<-rbind(C1,C2)

> L.fct <- function(m){C%*%log(m)}

> X<-matrix(rep(1,dim1)); Z<-matrix(rep(0,dim1))

> X2<-rbind(cbind(X,Z),cbind(Z,X))

> mph.out <- mph.fit(y=y,L.fct=L.fct,X=X2)

> mph.summary(mph.out,cell.stats=T,model.info=T)

leading to \({G}^{2} = 9.752\) (p-value=0.462) with corresponding residual df=10. In this case, since the two strata are independent, model (6.30) is equivalent to fitting the U model independently to each of the partial two-way tables. Indeed, we can verify that \(G_{1}^{2} + G_{2}^{2} = 9.752 = {G}^{2}\) and that the ML estimates of \(\log \theta _{k}^{XY }\), k = 1, 2 (\(\log \hat{\theta }_{1}^{XY } =\hat{\beta } _{1} = 0.803\), \(\log \hat{\theta }_{2}^{XY } =\hat{\beta } _{2} = 0.749\)) coincide with the corresponding \(\log \hat{\theta }_{k}^{L}\), k = 1, 2.

The homogeneous U model (6.31) can be fitted as follows. The L.fct function is defined as above, but the design matrix X2 is replaced by X1, thus defining a scalar parameter β instead of the bivariate \((\beta _{1},\beta _{2})\) above:

> X1<-rbind(X,X) # homogeneous U model for both layers

> mph.out <- mph.fit(y=y,strata=2,L.fct=L.fct,X=X1)

> mph.summary(mph.out,cell.stats=T,model.info=T)

Selected parts of the output are provided in Tables 6.17 and 6.18.

Table 6.17 Output of the mph function for the homogeneous U model, applied on the 4 × 3 × 2 data of Table 6.16: the observed local log odds ratios (OBS LINK) are listed by rows, along with the ML estimate of the common local log odds ratio under the assumed model, its s.e., and the standardized link residuals
Table 6.18 Output of the mph function: observed and ML fitted cell frequencies under the homogeneous U model applied on the 4 × 3 × 2 data of Table 6.16, along with ML estimates of the cell probabilities, standard errors, and standardized residuals

The equivalent expression of model (6.30) in terms of expected cell frequencies is

$$\displaystyle\begin{array}{rcl} \log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XY }\mu _{ i}\nu _{j} +\lambda _{ ik}^{XZ} +\lambda _{ jk}^{Y Z} {+\varphi }^{XY Z}\mu _{ i}\nu _{j}\tau _{k},\ \ \ \ & & \\ i = 1,\ldots,I,\ j = 1,\ldots,J,\ k = 1,\ldots,K,& &{}\end{array}$$
(6.32)

where the sets of scores \((\mu _{1},\ldots,\mu _{I})\), \((\nu _{1},\ldots,\nu _{J})\), and \((\tau _{1},\ldots,\tau _{K})\) are all known and equidistant for successive categories. They can be considered subject to standardization constraints or set equal to the corresponding category index.

The conditional local odds ratios under this model are constant within each partial table, equal to

$$\displaystyle\begin{array}{rcl} & & \theta _{(k)}^{XY } =\exp \left ({(\varphi }^{XY } {+\varphi }^{XY Z}\tau _{ k})\varDelta _{1}\varDelta _{2}\right ), \\ & & \qquad \qquad \qquad \qquad \quad i = 1,\ldots,I - 1,\ j = 1,\ldots,J - 1,\ k = 1,\ldots,K,{}\end{array}$$
(6.33)

where \(\varDelta _{1} =\mu _{i+1} -\mu _{i}\) and \(\varDelta _{2} =\nu _{j+1} -\nu _{j}\). The distances \(\varDelta _{1}\) and \(\varDelta _{2}\) are constant over i and j, respectively, since the corresponding scores are equidistant. If the scores equal their category indices, (6.33) simplifies to

$$\displaystyle{\theta _{(k)}^{XY } =\exp \left ({\varphi }^{XY } + {k\varphi }^{XY Z}\right ),\ \ i = 1,\ldots,I - 1,\ j = 1,\ldots,J - 1,\ k = 1,\ldots,K.}$$

Eliminating the three-factor interaction term in (6.32), the model of homogeneous uniform association (6.31) is derived in its equivalent expression

$$\displaystyle{ \log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XY }\mu _{ i}\nu _{j} +\lambda _{ ik}^{XZ} +\lambda _{ jk}^{Y Z}, }$$
(6.34)

and \({\theta }^{XY } =\exp \left ({\varphi }^{XY }\varDelta _{1}\varDelta _{2}\right )\).

Replacing \(\lambda _{ik}^{XZ}\) and/or \(\lambda _{jk}^{Y Z}\) in (6.32) or (6.34) by \({\varphi }^{XZ}\mu _{i}\tau _{k}\) and/or \({\varphi }^{Y Z}\nu _{j}\tau _{k}\), respectively, more parsimonious models of uniform or homogeneous uniform association are derived, considering a U-type structure also for at least one of the other two-factor interactions. The options for models of this type are not restricted to XZ and/or YZ interactions of U-type; they could be of any other type (R, C, RC, or RC(M)). Such special models of uniform and homogeneous uniform association cannot be captured via the odds ratio formulations (6.30) or (6.31).

Finally, the simplest homogeneous uniform XY association model is obtained when Z is jointly independent of X and Y, i.e., the model

$$\displaystyle{\log m_{ijk} =\lambda +\lambda _{i}^{X} +\lambda _{ j}^{Y } +\lambda _{ k}^{Z} {+\varphi }^{XY }\mu _{ i}\nu _{j},\ \ i = 1,\ldots,I,\ j = 1,\ldots,J\,}$$

with k = 1, …, K. For the example above, this model is the best option, with \({G}^{2} = 18.165\) (p-value=0.314, df = 16), giving \(\log \hat{{\theta }}^{XY } = 0.7688\).
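This final model can equivalently be fitted as a Poisson log-linear model by glm (a sketch; both strata are pooled, with the frequencies read by rows as before, and the object names are ours):

```r
# Z jointly independent of (X, Y), with uniform XY association:
# log m = lambda + lambda^X + lambda^Y + lambda^Z + phi * mu * nu.
freq  <- c(204,6,1, 211,13,5, 357,44,38, 92,34,49)    # stratum 1
freq2 <- c(311,5,4, 339,19,12, 429,66,57, 134,51,74)  # stratum 2
y <- c(freq, freq2)
X <- factor(rep(rep(1:4, each = 3), 2)); Y <- factor(rep(1:3, 8))
Z <- factor(rep(1:2, each = 12))
mu <- as.numeric(X); nu <- as.numeric(Y)
fit <- glm(y ~ X + Y + Z + mu:nu, family = poisson)
# deviance(fit) gives G^2 = 18.165 on 16 df;
# coef(fit)["mu:nu"] gives log(theta^{XY}) = 0.7688
```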

6.8 Overview and Further Reading

Association models, in their dominant form, have been mainly developed through the fundamental and inspiring work of Goodman (1979b, 1981a, 1985, 1986, 1991, 1996), and thus it is common to refer to them also as Goodman's models. For an overview, we refer to the 1986 and 1991 discussion papers and the review by Goodman (1982). Significant in the development of association models was the subsequent work of Haberman (1979, 1995), Clogg (1982a), Becker and Clogg (1988, 1989), and Becker (1989a, 1990a, 1992), as well as the contributions of Anderson and Philips (1981) and Anderson (1984). Association models are presented in the book by Clogg and Shihadeh (1994). An overview of association models with formulation and interpretation based on odds ratios is provided by Breen (2008), along with social sciences-oriented illustrations and references. For their connection to latent class models, see Sect. 10.3.1.

To be fair, we must say that the basis for the development of association models goes back to Tukey's 1 d.f. test (Tukey 1949), and in a different form they had been considered earlier. In particular, Nelder and Wedderburn (1972) consider the U model applied on the popular Boys' Dream Disturbance data set of Maxwell as an illustration of their GLM on contingency tables. Simon (1974) introduced the R model (his formulation A) as well as the R model for cumulative odds (his formulation B), the forerunner of the association models for global odds ratios (see Sect. 7.1 below). A similar model for the cumulative odds was considered earlier by Williams and Grizzle (1972). Simon's analysis of information is also remarkable, throwing insight into the nature of departure from independence, in the direction of the ANOAS developed later by Goodman. Other multiplicative models, modeling triangular or diagonal departures from independence for square tables, have been proposed by Goodman (1972). We should mention that methods that analyze contingency tables with ordinal classification variables by assigning scores to their categories had been proposed much earlier, as early as Yates (1948) and Armitage (1955), not to forget the linear trend test of Mantel (1963), described in Sect. 2.3. However, these early references, after assigning scores to the categories, treated the corresponding categorical variables by applying methods appropriate for continuous variable analysis.

Haberman (1974b) adopted a different approach, generating a class of models through the decomposition of the vector of log expected cell frequencies log(m ij ) on an orthonormal basis formed by orthogonal polynomials. Special members of this class are the linear-by-linear association model and the row effect model. Further, he proved standard asymptotic inference results for these models and expressed them in terms of the log odds ratio, noting the importance of the differences between scores. Finally, he was the first to mention that his approach could be extended straightforwardly to define such models for multi-way tables. Association models for two-way and three-way tables are presented in Wong (2010).

Diagnostics for the RC association model have been discussed by Andersen (1992). De Rooij and Heiser (2005) criticized the classical graphical representation of the RC(M) model and proposed the distance-association model representation, in which the distances between row and column points can be interpreted directly. Marginal association models have been considered by Lapp et al. (1998), Bartolucci et al. (2001), and Bartolucci and Forcina (2002). For ordinal tables with a response variable, Agresti (1986) proposed a regression \({R}^{2}\)-type measure of association, based on scores assigned to the classification variables' categories, and used the R association model to estimate these scores. Baccini and Khoudraji (1992) and Baccini et al. (2000) considered least squares estimation of association models. Beh and Farver (2009) discuss closed-form estimation of the association parameter \(\varphi\) of the U model.

The RC(M) models are not the only ones with additive multiplicative interaction terms. Goodman (1985) introduced other possible models with interaction of rank M but simpler than the RC(M). For example, for M = 2, the R+C model is defined by the same formula as RC(2) but assumes that the column scores of the first term and the row scores of the second are known; thus it has fewer parameters than RC(2). Similarly, model U+R+C has just one parameter more than the R+C model, since the third term added is of uniform type, with fixed row and column scores. More options for parsimonious models of higher interaction rank are obtained through the use of orthogonal polynomials for assigning scores (Kateri et al. 1998). For example, the model U(1)+U(2) corresponds to M = 2, but all the involved scores are fixed, assigned through orthogonal polynomials of first and second order for the (1) and (2) term, respectively; it thus has just 2 parameters more than the independence model.

In the special case of a square table with commensurable classification variables, it makes sense to assume that the row and column scores are homogeneous. Thus, the RC model with the homogeneity restriction \(\mu _{i} =\nu _{i},\ \forall \ i = 1,\ldots,I\), on its scores can be applied; it is more parsimonious than the standard RC model and simultaneously of special interpretational value for such tables. We shall return to this in Chap. 9, which specializes in square tables.

We have already mentioned in Chap. 2 that log-linear models are the discrete analogue of the analysis of variance. It is interesting to note briefly at this point the analogues of association models in the two-way ANOVA framework. Special analysis of variance models that impose a structure on the interaction, like that of the association models, have been considered as well. Indicatively we mention the early work of Williams (1952), who used a multiplicative term for the interaction, and that of Gollob (1968), who introduced the more general term of RC(M) type. Goodman and Haberman (1990) proved asymptotic normality for the scores of the RC ANOVA model and provided asymptotic confidence intervals for the estimated scores. Furthermore, they developed asymptotic conditional tests of the appropriateness of a simpler association model of type U, R, or C, given that the RC model holds. Finally, they extended their results to the more general RC(M) ANOVA model. Viele and Srinivasan (2000) proceeded to the Bayesian analysis of the RC(M) ANOVA model. Speaking of analogies to the continuous case, Jones (1998) noted that constant local dependence for continuous bivariate random variables is the continuous analogue of the U model.

We have seen in Sect. 6.3 that for ordinal contingency tables, conditional tests of independence given that the U, R, or C model holds, i.e., testing independence against a directed alternative, are asymptotically more powerful. Alternative approaches for strengthening the power of the classical Pearson's \({X}^{2}\) test of independence are based on the decomposition of Pearson's \({X}^{2}\) into orthogonal components in terms of scores assigned to the categories of the ordinal classification variables. For example, Best and Rayner (1996) and Rayner and Best (2000) considered scores based on orthogonal polynomials, while Nair (1986, 1987) used scores proportional to the midranks. Beh (1998) studied the use of different types of scores in the correspondence analysis framework. Nair's procedure partitions the \({X}^{2}\) statistic for testing independence into location, dispersion, and residual effects. It is related to the location-dispersion model of McCullagh (1980), as pointed out in McCullagh's and Agresti's comments in the discussion of Nair (1986). Agresti's comment related Nair's statistics also to the statistics of Koch et al. (1982) with fixed or rank-based scores, to the measure of Agresti (1986), and to association models and the models in Semenya et al. (1983). Koshimizu and Tsujitani (1998) consider association models with location and dispersion scores for singly ordered contingency tables. Their model is actually analogous to the R(1)+R(2) model of Kateri et al. (1998), with the column scores of the first dimension being Nair's scores instead of equidistant for successive categories.

6.8.1 Multi-way Association Models

Conditional and partial associations in multi-way tables are discussed in Clogg (1982b). Becker (1989b) introduced the no three-factor interaction model with all two-way interaction terms replaced by the general terms (6.26). Becker and Clogg (1989) considered three-way association models for the analysis of stratified two-way tables, with and without homogeneity constraints on the scores across the strata. Association models for stratified tables focusing on detecting layer differences were developed by Goodman and Hout (1998).

The decomposition of the three-factor interaction term (6.27) was the focus of Goodman (1983, 1986), Agresti and Kezouh (1983), Choulakian (1988), Anderson (1996), and Siciliano and Mooijaart (1997). A review is provided by Wong (2001). Methods of decomposing three-way arrays are reviewed in Ten Berge (2011).

6.8.2 Order-Restricted Inference

In case of association models with parametric scores, the monotonicity of the scores is not ensured by the standard estimation procedures. Since their monotonicity is related to stochastic ordering of the corresponding classification variable (Goodman 1981a), it is usually natural to expect the scores for ordinal classification variables to be monotonic. Estimation procedures subject to order constraints on the parametric scores have been developed for the R (or C) model by Agresti et al. (1987), based on isotonic regression. The RC model with order-restricted row and column scores has been considered by Ritov and Gilula (1991). A test of independence, conditional on the order-restricted RC model, is discussed in Kuriki (2005). Alternative algorithms for fitting the order-restricted RC model have been proposed and compared by Galindo-Garre and Vermunt (2004). Order restrictions have also been considered for an extended RC model, introduced by Bartolucci and Forcina (2002).

Ordinary or order-restricted inference for these models relies on large-sample asymptotic methods. As stated in Galindo-Garre and Vermunt (2004), these methods do not work well for sparse tables or small sample sizes, common in social and biomedical applications, where the usual asymptotic chi-squared p-values are known to be inaccurate. A promising alternative is the Bayesian approach (see Sect. 10.5).

6.8.3 Comparison of Two Ordinal Responses

The problem of comparing two ordinal responses is very old and of special interest in many fields, especially in biomedical applications. The need to compare the responses to a treatment of two independent groups of patients, defined, for example, by the presence of a prognostic factor, is obvious. Another common situation is to compare two different treatments applied on two independent samples, with the corresponding responses measured on a common scale. The ordinality of the response scale has to be taken into consideration in handling the problem and answering the question "Which group of patients benefits more from the treatment?" or "Which treatment is superior?" The underlying sampling scheme can be multinomial or product multinomial. The first is the case whenever a sample of n subjects is cross-classified with respect to an ordinal response Y and a binary variable X indicating the two groups, while the second arises when two independent multinomial samples of the same ordinal response and of sizes \(n_{1}\) and \(n_{2}\) are available. If the ordinal response has J categories, then the data described above form a 2 × J contingency table. For the multinomial sampling scheme the corresponding joint distribution is \(\mbox{ $\boldsymbol{\pi }$} = (\pi _{ij}) = P(X = i,\ Y = j)\), i = 1, 2,  j = 1, …, J. In case of two independent multinomials, the row marginals are also fixed, \(n_{i+} =\sum _{ j=1}^{J}n_{ij} = n_{i}\) (i = 1, 2).

The problem of comparing two response profiles is equivalent to the stochastic comparison of the two row distributions of the abovementioned 2 × J contingency table and, as such, has been addressed by a variety of methods. The related bibliography is very rich, and an extensive critical review of the available methods can be found in Agresti and Coull (2002).

The hypothesis that two multinomial distributions are identical against an ordered alternative is mainly tested through LR, Wald, and score tests or through linear rank tests. It is well known that restricting the alternative hypothesis leads to tests that are more powerful than the standard chi-squared test of independence. These approaches are all asymptotic, and the linear rank tests depend on the choice of the scores assigned to the ordered categories. Characteristic references for LR tests are Grove (1980, 1984) and Robertson and Wright (1981), while the approaches of Emerson and Moses (1985), Graubard and Korn (1987), and Gautam (1997) are based on linear rank tests. To deal with the sensitivity of the linear rank tests to the choice of scores, Kimeldorf et al. (1992) proposed the min–max scoring and Gautam et al. (2001) the iso-chi-square approach. Nonlinear rank tests have also been proposed. For example, Hilton et al. (1994) and Nikiforov (1994) applied the Smirnov test, while Berger (1998) proposed the convex hull methodology, which leads to admissible tests. Properties and power of the convex hull test applied to 2 × J tables are further studied in Berger et al. (1998) (see also Cohen and Sackrowitz 1998; Cohen et al. 2000). An interesting approach is provided by Permutt and Berger (2000), who reviewed various rank tests, classified them as Smirnov-like or Wilcoxon-like, and compared them. However, the nonlinear tests are not easy to compute for J > 3. It is important to note that “with few exceptions there is no optimal test for this problem,” as stated by Berger and Ivanova (2002). Tests based on log-linear models were developed by Agresti and Coull (1998).
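A minimal sketch of a Wilcoxon-like linear rank test on a 2 × J table, using midrank scores for the tied observations in each ordered category (a standard choice, not the specific proposal of any of the authors cited above; the function name is illustrative):

```python
import math

def wilcoxon_midrank_test(row1, row2):
    """Wilcoxon-type linear rank test on a 2 x J table of counts,
    with midrank scores; returns the z statistic and a two-sided
    p-value from the normal approximation."""
    J = len(row1)
    col = [row1[j] + row2[j] for j in range(J)]   # column totals n_+j
    n1, n2 = sum(row1), sum(row2)
    N = n1 + n2
    # Midrank of category j: average rank of its n_+j tied observations.
    midranks, below = [], 0
    for nj in col:
        midranks.append(below + (nj + 1) / 2)
        below += nj
    # Rank sum of the first row, with its null mean and tie-corrected variance.
    W = sum(row1[j] * midranks[j] for j in range(J))
    mean = n1 * (N + 1) / 2
    var = (n1 * n2 / (N * (N - 1))) * sum(
        col[j] * (midranks[j] - (N + 1) / 2) ** 2 for j in range(J)
    )
    z = (W - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p
```

For two identical rows the statistic is zero, while rows concentrated at opposite ends of the ordinal scale give a large |z|; this illustrates why such tests are sensitive to location shifts of the two row distributions.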

The connection of association models to the stochastic ordering of the conditional row (or column) distributions of the contingency table has been discussed in Sect. 6.4. In the case of 2 × J tables, the RC model coincides with the C model, which is saturated. For 2 × J contingency tables with \(\mu _{1} <\mu _{2}\) and \(\varphi > 0\), positive dependence is equivalent to \(\nu _{j}\leqslant \nu _{j+1}\) (j = 1, …, J − 1) with \(\nu _{1} <\nu _{J}\). Thus, monotonicity of the column scores \(\{\nu _{j}:\ j = 1,\ldots,J\}\) implies stochastic ordering of the probabilities

$$\displaystyle{\boldsymbol{\pi }_{i} = \left \{ \frac{\pi _{ij}} {\pi _{i+}},\ j = 1,\ldots,J\right \},\quad i = 1,2.}$$

That is, the distribution of the response Y in the second group (row) is stochastically larger than that in the first group (row).
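This stochastic-ordering condition can be checked directly from the data. A small sketch (assuming a 2 × J table given as two rows of counts; the helper names are illustrative): one row distribution is stochastically larger than the other when its cumulative distribution lies on or below that of the other at every category.

```python
def conditional_rows(table):
    """Row-conditional distributions pi_{j|i} = n_ij / n_i+ of a 2 x J table."""
    return [[nij / sum(row) for nij in row] for row in table]

def stochastically_larger(p_small, p_large, tol=1e-12):
    """True if p_large is stochastically larger than p_small, i.e., its
    cumulative distribution never exceeds that of p_small."""
    c_small = c_large = 0.0
    for a, b in zip(p_small[:-1], p_large[:-1]):   # both cumulate to 1 at J
        c_small += a
        c_large += b
        if c_large > c_small + tol:
            return False
    return True
```

For instance, for rows proportional to (50, 30, 20) and (20, 30, 50), the second row distribution is stochastically larger, as its mass is shifted toward the high end of the ordinal scale.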

The comparison of the two row distributions can be further enriched with the option of umbrella ordering as an alternative when stochastic ordering is rejected (Kateri 2011). Umbrella ordering means that the distribution in the first row is stochastically smaller than that in the second up to a level of the ordinal scale that defines the column categories and stochastically larger after this level (or the opposite). In terms of physical interpretation, when comparing two alternative treatments, umbrella ordering of their response distributions corresponds to cases where one treatment is better than the other up to a certain level of the response scale, while the situation reverses after this point. In a retrospective study context, cross-classifying the “cured”–“not cured” groups with the J levels of a prognostic factor, this could mean higher risk for the very low and very high levels of the prognostic factor. Umbrella ordering essentially reveals a dispersion effect in the group comparison. Dispersion effects for ordinal responses have been handled by the generalized cumulative link model, introduced by McCullagh (1980). Umbrella ordering can be captured by the C model with suitably constrained column scores.
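As a rough empirical illustration (not the constrained estimation of the C model discussed above), an umbrella pattern of two row distributions shows up as exactly one sign change in the differences of their cumulative distributions; the function below is a hypothetical helper sketching this check:

```python
def umbrella_pattern(p1, p2, tol=1e-12):
    """True if the cumulative differences F1(j) - F2(j) of the two row
    distributions change sign exactly once, i.e., row 1 is stochastically
    smaller up to some category and stochastically larger after it
    (or the reverse)."""
    signs = []
    c1 = c2 = 0.0
    for a, b in zip(p1[:-1], p2[:-1]):   # last cumulative equals 1 for both
        c1 += a
        c2 += b
        d = c1 - c2
        if abs(d) > tol:                 # ignore ties of the cumulatives
            signs.append(1 if d > 0 else -1)
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    return changes == 1
```

For example, rows (0.4, 0.1, 0.1, 0.4) and (0.1, 0.4, 0.4, 0.1) exhibit an umbrella pattern, the first row putting more mass in both tails, i.e., a dispersion rather than a location effect; two stochastically ordered rows do not.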

6.8.4 Cell Frequencies vs. Local Odds Ratios Modeling

We have seen so far that models applied to contingency tables can be expressed in terms of expected cell frequencies or, equivalently, in terms of local odds ratios. The choice depends on issues of interpretation and on convenience of model formulation. For example, the association models are more easily interpreted through the local odds ratios. On the other hand, the quasi-independence model can be expressed in terms of local odds ratios, but this expression is too complicated to compete with (5.24).

A clarifying and inspiring insight into the possible different views of log-linear models is provided by Goodman (1981d), who considers three alternative views, depending on the purpose of the analysis. The model imposed on the cell frequencies is preferred whenever the purpose is the examination of the joint distribution of the contingency table. The local odds ratio formulation of the model is employed when interest lies in the association between the two cross-classified variables. In both cases, the classification variables of the table are treated symmetrically. If there exists a response variable, then modeling the possible dependence of the response variable on the explanatory one is more adequate than the symmetric approaches and leads to more direct interpretations. This constitutes the third view and corresponds to modeling the odds for the response variable, given that the explanatory variable is at a fixed, prespecified level. Such models are presented in Chap. 8. Goodman (1981d) discussed the connections between these different approaches to log-linear modeling and illustrated them with characteristic examples. These comments also apply to the special models for square tables in Chap. 9.