
1 Introduction

There are situations in practice where a univariate multinomial response, for example, the economic profit status of a pharmaceutical industry such as poor, medium, or high, may be recorded over the years along with known covariates such as type of industry, yearly advertising cost, and other research and development expenditures. It is likely that the profit status of an industry in a given year is correlated with the profit status of past years. It is of interest to know both (i) the effects of the time dependent covariates, and (ii) the dynamic relationship among the responses over the years. This type of multinomial time series data has been analyzed by some authors such as Fahrmeir and Kaufmann [4], Kaufmann [8], Fokianos and Kedem [5, 7], and Loredo-Osti and Sutradhar [10]. As far as the dynamic relationship is concerned, Loredo-Osti and Sutradhar [10] have considered a multinomial dynamic logit (MDL) model as a generalization of the binary time series model used in Tagore and Sutradhar [16] (see also Tong [17]).

Suppose that \(y_{t} = (y_{t1},\ldots , y_{tj},\ldots , y_{t,J-1})^{\prime }\) denotes the \((J-1)\)-dimensional multinomial response variable and for \(j=1,\ldots ,J-1,\)

$$\begin{aligned} y^{(j)}_{t}=\big (y^{(j)}_{t1}, \ldots , y^{(j)}_{tj}, \ldots ,y^{(j)}_{t,J-1}\big )^{\prime }= \big (01^{\prime }_{j-1},1,01^{\prime }_{J-1-j}\big )^{\prime } \equiv \delta _{tj} \end{aligned}$$
(1)

indicates that the multinomial response recorded at time t belongs to the jth category. For \(j=J,\) one writes \(y^{(J)}_{t}=\delta _{tJ}=01_{J-1}.\) Here and also in (1), for a scalar constant c, we have used \(c1_j\), for simplicity, to represent \(c \otimes 1_j,\) \(\otimes \) being the well-known Kronecker or direct product. This notation will also be used throughout the rest of the paper when needed. Note that in the non-stationary case, that is, when the covariates are time dependent, one uses the time dependent marginal probabilities. Specifically, suppose that at time point t (\(t=1,\ldots ,T\)), \(x_t=(x_{t1},\ldots ,x_{t\ell },\ldots ,x_{t,p+1})'\) denotes the \((p+1)\)-dimensional covariate vector and \(\beta _j=(\beta _{j0},\beta _{j1},\ldots , \beta _{jp})'\) denotes the effect of \(x_t\) on \(y^{(j)}_{t}\) for \(j=1,\ldots ,J-1,\) and all \(t=1,\ldots ,T,\) T being the length of the time series. In such cases, the multinomial probability at time t has the form

$$\begin{aligned} P\big [y_{t}=y^{(j)}_{t}\big ]=\pi _{(t)j} =\left\{ \begin{array}{ll}\dfrac{\exp \big (x'_t\beta _{j}\big )}{1+\sum ^{J-1}_{g=1}\exp \big (x'_t\beta _{g}\big )} &{}\quad \text{ for }\;\; j = 1,\ldots ,J-1;\quad t=1,\ldots ,T \\ \dfrac{1}{1+\sum ^{J-1}_{g=1}\exp \big (x'_t\beta _{g}\big )} &{}\quad \text{ for }\;\; j = J;\quad t=1,\ldots ,T, \end{array} \right. \end{aligned}$$
(2)

and the elements of \(y_{t}=(y_{t1},\ldots , y_{tj},\ldots ,y_{t,J-1})'\) at time t follow the multinomial probability distribution given by

$$\begin{aligned} P[y_{t1},\ldots , y_{tj},\ldots ,y_{t,J-1}]= \varPi ^J_{j=1}\pi ^{y_{tj}}_{(t)j}, \end{aligned}$$
(3)

for all \(t=1,\ldots ,T.\) In (3), \(y_{tJ}=1-\sum ^{J-1}_{j=1}y_{tj}, \;\text{ and }\;\pi _{(t)J}=1-\sum ^{J-1}_{j=1}\pi _{(t)j}.\)
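For illustration only, the marginal model (2)–(3) may be evaluated numerically as in the following minimal sketch (Python, with illustrative names; it assumes \(\beta \) is stored as a \((J-1)\times (p+1)\) array whose jth row is \(\beta _j\)).

```python
import numpy as np

def marginal_probs(x_t, beta):
    """Marginal multinomial-logit probabilities pi_{(t)j} of (2).

    x_t  : (p+1,) covariate vector at time t
    beta : (J-1, p+1) array whose jth row is beta_j
    Returns the full J-vector (pi_{(t)1}, ..., pi_{(t)J}).
    """
    lin = beta @ x_t                            # x_t' beta_j, j = 1, ..., J-1
    denom = 1.0 + np.exp(lin).sum()
    return np.append(np.exp(lin), 1.0) / denom  # last entry is category J
```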

Next we define the transitional probability from the gth \((g=1,\ldots ,J)\) category at time \(t-1\) to the jth category at time t,  given by

$$\begin{aligned} \eta ^{(j)}_{t|t-1}(g)= & {} P\Big ({Y}_{t}={y}^{(j)}_{t}\,\Big |\,{Y}_{t-1}={y}^{(g)}_{t-1}\Big ) \nonumber \\= & {} \left\{ \begin{array}{ll} \dfrac{{\exp \,\left[ x^{'}_{t}\beta _j+\gamma '_jy^{(g)}_{t-1}\right] }}{{1\,+\,\sum ^{J-1}_{v=1}\exp \,\left[ x^{'}_{t}\beta _v+\gamma '_vy^{(g)}_{t-1}\right] }}, &{}\quad \text{ for }\quad j=1,\ldots ,J-1 \\ \dfrac{1}{{1\,+\,\sum ^{J-1}_{v=1}\exp \,\left[ x^{'}_{t}\beta _v +\gamma '_vy^{(g)}_{t-1}\right] }}, &{}\quad \text{ for }\quad j=J,\\ \end{array}\right. \end{aligned}$$
(4)

where \(\gamma _j=(\gamma _{j1},\ldots ,\gamma _{jv},\ldots ,\gamma _{j,J-1})'\) denotes the dynamic dependence parameters. Note that the model in (4) is referred to as the multinomial dynamic logit (MDL) model. For the binary case (\(J=2\)), this type of non-linear dynamic logit model has been studied by some econometricians. See, for example, Amemiya [3, p. 422] in the time series setup, and the recent book by Sutradhar [13, Sect. 7.7] in the longitudinal setup. Now for further notational convenience, we re-express the conditional probabilities in (4) as

$$\begin{aligned} \eta ^{(j)}_{t|t-1}(g)=\left\{ \begin{array}{ll}\frac{{\exp \,\left[ x^{'}_{t}\beta _j +\gamma '_j\delta _{(t-1)g}\right] }}{{1\,+\,\sum ^{J-1}_{v=1}\exp \,\left[ x^{'}_{t}\beta _v+\gamma '_v \delta _{(t-1)g}\right] }}, &{}\quad \text{ for }\quad j=1,\ldots ,J-1 \\ \frac{1}{{1\,+\,\sum ^{J-1}_{v=1}\exp \,\left[ x^{'}_{t}\beta _v +\gamma '_v\delta _{(t-1)g}\right] }}, &{}\quad \text{ for }\quad j=J,\\ \end{array}\right. \end{aligned}$$
(5)

where for \(t=2,\ldots ,T, \delta _{(t-1)g},\) by (1), has the formula

$$\delta _{(t-1)g} = \left\{ \begin{array}{ll} [01'_{g-1},1,01'_{J-1-g}]' &{}\quad \text{ for }\quad g=1,\ldots ,J-1 \\ 01_{J-1} &{}\quad \text{ for }\quad g=J. \end{array} \right. $$

Note that in (5), g is the category occupied at time \(t-1.\) Thus the category g depends on time \(t-1,\) and \(\delta _{(t-1)g}\equiv \delta _{g_{t-1}};\) however, for simplicity, we have used g for \(g_{t-1}.\)
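The conditional probabilities (4)–(5) can be computed in the same manner. The sketch below (again illustrative, assuming \(\gamma \) is stored as a \((J-1)\times (J-1)\) array whose jth row is \(\gamma _j\)) returns the full vector of transition probabilities out of a given category g occupied at time \(t-1\).

```python
import numpy as np

def transition_probs(x_t, beta, gamma, g):
    """Conditional MDL probabilities eta^{(j)}_{t|t-1}(g) of (4)-(5).

    gamma : (J-1, J-1) array whose jth row is gamma_j
    g     : category occupied at time t-1 (1, ..., J)
    Returns the J-vector (eta^{(1)}_{t|t-1}(g), ..., eta^{(J)}_{t|t-1}(g)).
    """
    Jm1 = beta.shape[0]
    delta = np.zeros(Jm1)                       # delta_{(t-1)g} of (1); zero vector for g = J
    if g <= Jm1:
        delta[g - 1] = 1.0
    lin = beta @ x_t + gamma @ delta            # x_t' beta_j + gamma_j' delta_{(t-1)g}
    denom = 1.0 + np.exp(lin).sum()
    return np.append(np.exp(lin), 1.0) / denom
```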

Let \(\beta \,{=}\,(\beta '_1,\ldots ,\beta '_j,\ldots ,\beta '_{J-1})': (p+1)(J-1) \times 1,\) and \(\gamma \,{=}\,(\gamma '_1,\ldots ,\gamma '_j,\ldots ,\gamma '_{J-1})': (J-1)^2 \times 1.\) These parameters are involved in the unconditional mean, variance and covariances of the responses. More specifically one may show [10] that

$$\begin{aligned} E[Y_{t}]= & {} \tilde{\pi }_{(t)}(\beta ,\gamma )=(\tilde{\pi }_{(t)1},\ldots ,\tilde{\pi }_{(t)j},\ldots , \tilde{\pi }_{(t)(J-1)})': (J-1) \times 1 \nonumber \\= & {} \left\{ \begin{array}{ll} [\pi _{(1)1},\ldots ,\pi _{(1)j},\ldots ,\pi _{(1)(J-1)}]' &{} \text{ for }\; t=1 \\ \eta _{(t|t-1)}(J)+\left[ \eta _{(t|t-1),M}-\eta _{(t|t-1)}(J) 1'_{J-1}\right] \tilde{\pi }_{(t-1)} &{} \text{ for }\;t=2,\ldots ,T \end{array} \right. \end{aligned}$$
(6)
$$\begin{aligned} \text{ var }[Y_{t}]= & {} \text{ diag }[\tilde{\pi }_{(t)1},\ldots ,\tilde{\pi }_{(t)j},\ldots , \tilde{\pi }_{(t)(J-1)}]-\tilde{\pi }_{(t)}\tilde{\pi }'_{(t)} \nonumber \\= & {} (\text{ cov }(Y_{tj},Y_{tk}))=(\tilde{\sigma }_{(tt)jk}),\;j,k=1,\ldots ,J-1 \nonumber \\= & {} \tilde{\varSigma }_{(tt)}(\beta ,\gamma ), \; \text{ for }\; t=1,\ldots ,T \end{aligned}$$
(7)
$$\begin{aligned} \text{ cov }[Y_{u},Y_{t}]= & {} \varPi ^t_{s=u+1}\left[ \eta _{(s|s-1),M}-\eta _{(s|s-1)}(J) 1'_{J-1}\right] \text{ var }[Y_{u}],\;\text{ for }\; u<t, t=2,\ldots ,T \nonumber \\= & {} (\text{ cov }(Y_{uj},Y_{tk}))=(\tilde{\sigma }_{(ut)jk}),\;j,k=1,\ldots ,J-1 \nonumber \\= & {} \tilde{\varSigma }_{(ut)}(\beta ,\gamma ), \end{aligned}$$
(8)

where

$$\begin{aligned} \eta _{(s|s-1)}(J)= & {} [\eta ^{(1)}_{s|s-1}(J),\ldots ,\eta ^{(j)}_{s|s-1}(J), \ldots ,\eta ^{(J-1)}_{s|s-1}(J)]'=\pi _{(s)}:(J-1) \times 1 \\ \eta _{(s|s-1),M}= & {} \begin{pmatrix} \eta ^{(1)}_{s|s-1}(1) &{} \cdots &{} \eta ^{(1)}_{s|s-1}(g) &{} \cdots &{} \eta ^{(1)}_{s|s-1}(J-1) \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ \eta ^{(j)}_{s|s-1}(1) &{} \cdots &{} \eta ^{(j)}_{s|s-1}(g) &{} \cdots &{} \eta ^{(j)}_{s|s-1}(J-1) \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ \eta ^{(J-1)}_{s|s-1}(1) &{} \cdots &{} \eta ^{(J-1)}_{s|s-1}(g) &{} \cdots &{} \eta ^{(J-1)}_{s|s-1}(J-1) \end{pmatrix}: (J-1) \times (J-1). \end{aligned}$$

Notice that there is a relation between the vector \(\eta _{(s|s-1)}(J)\) and the matrix \(\eta _{(s|s-1),M}.\) This is because the transition matrix \(\eta _{(s|s-1),M}\) contains the transitional probabilities from any of the first \(J-1\) states at time \(s-1\) to any of the first \(J-1\) states at time s, whereas the transition vector \(\eta _{(s|s-1)}(J)\) contains the transitional probabilities from the Jth state at time \(s-1\) to any of the first \(J-1\) states at time s. Consequently, once the transition matrix \(\eta _{(s|s-1),M}\) is computed, the transition vector \(\eta _{(s|s-1)}(J)\) becomes known.
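For completeness, the moment formulas (6)–(8) may be evaluated recursively once these probabilities are available; a minimal sketch, reusing the functions marginal_probs and transition_probs introduced above (names are illustrative), is as follows.

```python
import numpy as np

def unconditional_moments(X, beta, gamma):
    """Recursive means (6), variances (7), and the matrices D_t such that
    cov(Y_u, Y_t) = D_t ... D_{u+1} var(Y_u) for u < t, as in (8).

    X : (T, p+1) array whose rows are x_1, ..., x_T.
    """
    T, Jm1 = X.shape[0], beta.shape[0]
    means, D = [marginal_probs(X[0], beta)[:Jm1]], [None]
    for t in range(1, T):
        eta_M = np.column_stack([transition_probs(X[t], beta, gamma, g)[:Jm1]
                                 for g in range(1, Jm1 + 1)])       # eta_{(t|t-1),M}
        eta_J = transition_probs(X[t], beta, gamma, Jm1 + 1)[:Jm1]  # eta_{(t|t-1)}(J)
        D_t = eta_M - np.outer(eta_J, np.ones(Jm1))
        means.append(eta_J + D_t @ means[-1])                       # recursion (6)
        D.append(D_t)
    variances = [np.diag(m) - np.outer(m, m) for m in means]        # (7)
    return means, variances, D
```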

It is of importance to estimate the \(\beta \) and \(\gamma \) parameters, mainly to understand the aforementioned basic properties, including the pair-wise correlations of the responses.

Note, however, that the multinomial time series model (2)–(5) and its basic moment properties shown in (6)–(8) are derived without any order restriction on the categories of the responses. The purpose of this paper is to estimate the parameters \(\beta \) and \(\gamma \) under an ordinal categorical response model, which we describe in Sect. 2. In Sect. 3, we demonstrate the application of a pseudo likelihood approach for the estimation of these parameters. Some concluding remarks are made in Sect. 4.

2 Cumulative MDL Model for Ordinal Categorical Data

When the categories for a response at a given time t are ordinal, one may collapse the \(J >2\) categories in a cumulative fashion into two \((J'=2)\) categories and use a simpler binary model to fit such collapsed data. Note, however, that there will be various binary groupings depending on which intermediate category is used as the cut point. For the transitional categorical response from time \(t-1\) (say) to time t, the cumulation of the categories at time t has to be computed conditional on the cumulative categories at time \(t-1.\) This also generates a binary model for cumulative transitional responses. These concepts of cumulative probabilities for a cumulative response are used in the following subsections to construct the desired cumulative MDL model.

2.1 Marginal Cumulative Model at Time \(t=1\)

Suppose that for a selected cut point \(j (j=1,\ldots ,J-1), F_{(1)j}=\sum ^j_{c=1}\pi _{(1)c}\) represents the probability for a multinomial response to be in category c between 1 and j,  where \(\pi _{(1)c}\) by (2) defines the probability for the response to be in category c (\(c=1,\ldots ,J)\) at time \(t=1.\) Thus, \(1-F_{(1)j}=\sum ^J_{c=j+1}\pi _{(1)c}\) would represent the probability for the multinomial response to be in category c beyond j. To reflect this binary nature of the observed response in category c with regard to cut point j,  we define a binary variable \(b^{(j)}_c(1)\) such that

$$\begin{aligned} P\big [b^{(j)}_c(1)=1\big ]=1-F_{(1)j}=\sum ^J_{c=j+1}\pi _{(1)c}. \end{aligned}$$
(9)

Notice that because there are \(J-1\) possible cut points, if the categories are ordered and the response falls in the cth category, then by (11) below we obtain the cut points based observed vector at time \(t=1\) as

$$ \big [b^{(1)}_c(1)=1,\ldots ,b^{(c-1)}_c(1)=1,b^{(c)}_c(1)=0, \ldots ,b^{(J-1)}_c(1)=0\big ].$$

For other values of t,  the observed responses are constructed similarly depending on the response category.
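As a small illustration of this mapping, with \(J=4\) categories and an observed category \(c=2\), the cut points based observed vector is \([1,0,0].\) A minimal helper (illustrative name) implementing the rule in (9)–(11) is given below.

```python
import numpy as np

def cutpoint_binaries(c, J):
    """Cut-point binary vector (b^{(1)}_c, ..., b^{(J-1)}_c) of (9)-(11):
    b^{(j)}_c = 1 if the observed category c exceeds cut point j, else 0.
    Example: cutpoint_binaries(2, 4) returns array([1, 0, 0])."""
    return np.array([1 if c > j else 0 for j in range(1, J)])
```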

2.2 Lag 1 Transitional Cumulative Model at Time \(t=2,\ldots ,T\)

In order to develop a transitional model, suppose that the multinomial response at time \(t-1\) \((t=2,\ldots ,T)\) was observed in the \(c_1\)th category \((c_1=1,\ldots ,J),\) whereas at time t it is observed in the \(c_2\)th category \((c_2=1,\ldots ,J).\) Let (g, j) denote a bivariate cut point which facilitates the binary variables [similar to (9)] given by

$$\begin{aligned} b^{(g)}_{c_1}(t-1)=\left\{ \begin{array}{ll} 1 &{} \text{ for } \text{ the } \text{ response } \text{ in } \text{ category }\; c_1 >g\; \text{ at } \text{ time }\; t-1\\ 0 &{} \text{ for } \text{ the } \text{ response } \text{ in } \text{ category }\; c_1 \le g \; \text{ at } \text{ time }\; t-1, \end{array} \right. \end{aligned}$$
(10)

and

$$\begin{aligned} b^{(j)}_{c_2}(t)=\left\{ \begin{array}{ll} 1 &{} \text{ for } \text{ the } \text{ response } \text{ in } \text{ category }\; c_2 >j \; \text{ at } \text{ time }\; t\\ 0 &{} \text{ for } \text{ the } \text{ response } \text{ in } \text{ category }\; c_2 \le j \; \text{ at } \text{ time }\; t. \end{array} \right. \end{aligned}$$
(11)

Consequently, a transitional probability model based on conditional probabilities (5) may be written as

$$\begin{aligned}&P\big [b^{(j)}_{c_2}(t)=1|b^{(g)}_{c_1}(t-1)\big ] =\tilde{\lambda }^{(2)}_{gj}\big (b^{(g)}_{c_1}(t-1)\big ) \nonumber \\= & {} \left\{ \begin{array}{ll} \tilde{\lambda }^{(2)}_{gj}(1) &{} \text{ for }\;b^{(g)}_{c_1}(t-1)=0 \\ \tilde{\lambda }^{(2)}_{gj}(2) &{} \text{ for }\;b^{(g)}_{c_1}(t-1)=1, \end{array} \right. \end{aligned}$$
(12)
$$\begin{aligned}= & {} \left\{ \begin{array}{l} \frac{1}{g}\sum ^g_{c_1=1}\sum ^J_{c_2=j+1}\lambda ^{(c_2)}_{t|t-1}(c_1) \\ \frac{1}{J-g}\sum ^J_{c_1=g+1}\sum ^J_{c_2=j+1}\lambda ^{(c_2)}_{t|t-1}(c_1), \end{array} \right. \end{aligned}$$
(13)

where the conditional probability \(\lambda ^{(c_2)}_{t|t-1}(c_1)\equiv \eta ^{(c_2)}_{t|t-1}(c_1)\) has the known multinomial dynamic logit (MDL) form given by (5). For convenience, following (12)–(13), we also write

$$\begin{aligned}&P[b^{(j)}_{c_2}(t)=0|b^{(g)}_{c_1}(t-1)] =1-\tilde{\lambda }^{(2)}_{gj}(b^{(g)}_{c_1}(t-1)) \nonumber \\= & {} \left\{ \begin{array}{ll} \tilde{\lambda }^{(1)}_{gj}(1)=1-\tilde{\lambda }^{(2)}_{gj}(1) &{} \text{ for }\;b^{(g)}_{c_1}(t-1)=0 \\ \tilde{\lambda }^{(1)}_{gj}(2)=1-\tilde{\lambda }^{(2)}_{gj}(2) &{} \text{ for }\;b^{(g)}_{c_1}(t-1)=1, \end{array} \right. \end{aligned}$$
(14)
$$\begin{aligned}= & {} \left\{ \begin{array}{l} \frac{1}{g}\sum ^g_{c_1=1}\left[ 1-\sum ^J_{c_2=j+1}\lambda ^{(c_2)}_{t|t-1}(c_1)\right] \\ \frac{1}{J-g}\sum ^J_{c_1=g+1}\left[ 1-\sum ^J_{c_2=j+1}\lambda ^{(c_2)}_{t|t-1}(c_1)\right] \end{array} \right. \end{aligned}$$
(15)
$$\begin{aligned}= & {} \left\{ \begin{array}{l} \frac{1}{g}\sum ^g_{c_1=1}\sum ^j_{c_2=1}\lambda ^{(c_2)}_{t|t-1}(c_1) \\ \frac{1}{J-g}\sum ^J_{c_1=g+1}\sum ^j_{c_2=1}\lambda ^{(c_2)}_{t|t-1}(c_1). \end{array} \right. \end{aligned}$$
(16)
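Numerically, the collapsed conditional probabilities in (13) and (16) are simple averages of cumulative MDL probabilities. A minimal sketch, reusing the function transition_probs introduced earlier (names are illustrative), is given below; the complementary probability \(\tilde{\lambda }^{(1)}_{gj}(g^*)\) of (14)–(16) is then one minus this quantity.

```python
def collapsed_cond_prob(x_t, beta, gamma, g, j, g_star):
    """Collapsed conditional probability tilde-lambda^{(2)}_{gj}(g*) of (13): the
    probability that the response at time t exceeds cut point j, given that the
    response at time t-1 was at or below (g* = 1) or above (g* = 2) cut point g."""
    J = beta.shape[0] + 1
    c1_range = range(1, g + 1) if g_star == 1 else range(g + 1, J + 1)
    total = 0.0
    for c1 in c1_range:
        eta = transition_probs(x_t, beta, gamma, c1)   # MDL form (5)
        total += eta[j:].sum()                         # sum over c2 = j+1, ..., J
    return total / len(c1_range)                       # averaging factor 1/g or 1/(J-g)
```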

3 Pseudo Binary Likelihood Estimation for the Ordinal Model

In this section, we construct a binary data based likelihood function, where the binary data are obtained by collapsing the available ordinal multinomial observations. Consequently, we refer to this likelihood approach as a pseudo likelihood approach. However, for convenience, we use the terminology ‘likelihood’ for the ‘pseudo likelihood’ throughout the section.

At \(t=1,\) the marginal likelihood for \(\beta \) by (9) has the form

$$\begin{aligned} L_1(\beta )= & {} \varPi ^{J-1}_{j=1}\left[ \{F_{(1)j}\}^{1-b^{(j)}_{c}(1)}\right] \left[ \{1- F_{(1)j}\}^{b^{(j)}_{c}(1)}\right] \nonumber \\= & {} \varPi ^{J-1}_{j=1}\left[ \left\{ \sum ^j_{c=1}\pi _{(1)c}\right\} ^{1-b^{(j)}_{c}(1)}\right] \left[ \left\{ \sum ^J_{c=j+1}\pi _{(1)c}\right\} ^{b^{(j)}_{c}(1)}\right] , \end{aligned}$$
(17)

where

$$\begin{aligned} b^{(j)}_{c}(1)=\left\{ \begin{array}{ll} 1 &{}\quad \text{ for }\; c >j\\ 0 &{}\quad \text{ for }\; c \le j. \end{array} \right. \end{aligned}$$
(18)

Next, for the construction of the conditional likelihood at time t given the information from the previous time point \(t-1,\) we first re-express the binary conditional probabilities in (12) and (14) as

$$\begin{aligned} \tilde{\lambda }^{(2)}_{gj}(g^*)= & {} \left\{ \begin{array}{ll} \tilde{\lambda }^{(2)}_{gj}(1) &{} \quad \text{ for }\;b^{(g)}_{c_1}(t-1)=0 \\ \tilde{\lambda }^{(2)}_{gj}(2) &{} \quad \text{ for }\;b^{(g)}_{c_1}(t-1)=1, \end{array} \right. \end{aligned}$$
(19)
$$\begin{aligned} \tilde{\lambda }^{(1)}_{gj}(g^*)= & {} \left\{ \begin{array}{ll} \tilde{\lambda }^{(1)}_{gj}(1) &{}\quad \text{ for }\;b^{(g)}_{c_1}(t-1)=0 \\ \tilde{\lambda }^{(1)}_{gj}(2) &{}\quad \text{ for }\;b^{(g)}_{c_1}(t-1)=1. \end{array} \right. \end{aligned}$$
(20)

One may then write the conditional likelihood for \(\beta \) and \(\gamma ,\) as

$$\begin{aligned} L_{t|t-1}(\beta ,\gamma ) =\varPi ^{J-1}_{g=1}\varPi ^{J-1}_{j=1}\varPi ^{2}_{g^*=1}\left[ \left\{ \tilde{\lambda }^{(2)}_{gj}(g^*)\right\} ^{b^{(j)}_{c_2}(t)} \left\{ \tilde{\lambda }^{(1)}_{gj}(g^*)\right\} ^{1-b^{(j)}_{c_2}(t)}\right] , \end{aligned}$$
(21)

where the binary data \(b^{(j)}_{c_2}(t)\) for observed \(c_2\) are obtained by (11), and similarly the binary data \(b^{(g)}_{c_1}(t-1),\) which define \(g^*\) for given \(c_1,\) are obtained from (10).

Next by combining (17) and (21), one obtains the likelihood function for \(\beta \) and \(\gamma \) as

$$\begin{aligned} L(\beta ,\gamma )= & {} L_1(\beta )\varPi ^T_{t=2}L_{t|t-1}(\beta ,\gamma ) \nonumber \\= & {} \varPi ^{J-1}_{j=1}\left[ \{F_{(1)j}\}^{1-b^{(j)}_{c}(1)}\right] \left[ \{1- F_{(1)j}\}^{b^{(j)}_{c}(1)}\right] \nonumber \\&\times \,\varPi ^T_{t=2}\varPi ^{J-1}_{g=1}\varPi ^{J-1}_{j=1}\varPi ^{2}_{g^*=1} \left[ \left\{ \tilde{\lambda }^{(2)}_{gj}(g^*)\right\} ^{b^{(j)}_{c_2}(t)} \left\{ \tilde{\lambda }^{(1)}_{gj}(g^*)\right\} ^{1-b^{(j)}_{c_2}(t)}\right] . \end{aligned}$$
(22)

For the benefit of practitioners, we now develop the likelihood estimating equations for the parameters \(\beta \) and \(\gamma \) in the following sections. Note that for the construction of similar likelihood estimating equations in the stationary longitudinal setup, one may refer to Sutradhar [14, Sect. 3.6.2.2].

Note that the likelihood function in (22) is constructed by collapsing the ordinal multinomial responses to binary responses at all suitable cut points. This likelihood function, therefore, cannot be used for nominal multinomial time series data. When the categories are nominal, it is appropriate to construct the likelihood function by exploiting the marginal probability function \(\pi _{(t)j}\) from (2) for \(t=1,\) and the conditional multinomial logit probability function \(\eta ^{(j)}_{t|t-1}(g)\) from (4) for \(t=2,\ldots ,T\) (see Loredo-Osti and Sutradhar [10]). Notice that in practice the time dependent covariates \(x_t\) in (2) and (4) are generally treated as fixed. However, by treating \(x_t\) as a random covariate vector, Fokianos and Kedem [6] obtained parameter estimates by maximizing a partial likelihood function without requiring any extra characterization of the joint process \(\{y_t,x_t\}.\) Loredo-Osti and Sutradhar [10] have, however, argued that in Fokianos and Kedem’s [6] approach, the conditional Fisher information matrix is not the same as the one obtained by conditioning on \(\{x_t\},\) the observed covariates. In fact, when the estimation is carried out in a generalized linear models framework that uses the canonical link function, this conditional information matrix obtained by Fokianos and Kedem is just the Hessian matrix multiplied by \(-1,\) i.e., the observed information matrix.

As far as ordinal multinomial time series data are concerned, the construction of the binary mapping based likelihood function in (22) is a new concept. The core idea comes from the cumulative binary property of the MDL (multinomial dynamic logit) model (4) under the present ordinal nature of the data. In the cross-sectional setup, that is, for the case with \(t=1\) only, the likelihood function for ordinal multinomial data has been used by many authors such as Agresti [1]. Note that the marginal multinomial probability in (2) has the multinomial logit form. In the clustered data setup, many existing studies use this multinomial logit model (2) as the marginal model at a given time t. As far as the correlations between repeated responses are concerned, some authors such as Agresti [1], Lipsitz et al. [9], and Agresti and Natarajan [2] do not model them; rather, they use ‘working’ correlations to construct the so-called generalized estimating equations and solve them to obtain the estimates for the regression parameters involved in the marginal multinomial logit model (2). These estimates, however, may not be reliable, as they can be inefficient compared to the estimates based on the ‘working’ independence assumption (see Sutradhar and Das [15] and Sutradhar [13, Chap. 7] in the context of binary longitudinal data analysis). Thus, their extension to the time series setup may be of little use. Moreover, it is not clear how to model the ordinal data using this type of ‘working’ correlations approach.
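Before turning to the estimating equations, we note that the pseudo log-likelihood (22)–(23) can be evaluated directly from the collapsed data. The following minimal sketch reuses the helpers introduced earlier and assumes, in line with the remark below (21), that for each bivariate cut point (g, j) the corresponding factor is evaluated at the realized value of \(g^*\) determined by \(b^{(g)}_{c_1}(t-1)\) through (10); names are illustrative.

```python
import numpy as np

def pseudo_loglik(Y, X, beta, gamma):
    """Pseudo log-likelihood (23) for an observed series Y (Y[t] in {1, ..., J})
    and covariates X (rows x_1, ..., x_T)."""
    T, J = len(Y), beta.shape[0] + 1
    pi1 = marginal_probs(X[0], beta)
    ll = 0.0
    for j in range(1, J):                               # marginal part, from (17)-(18)
        F1j, b1j = pi1[:j].sum(), 1.0 * (Y[0] > j)
        ll += (1 - b1j) * np.log(F1j) + b1j * np.log(1 - F1j)
    for t in range(1, T):                               # conditional part, from (21)
        for g in range(1, J):
            g_star = 1 if Y[t - 1] <= g else 2          # realized g*, by (10)
            for j in range(1, J):
                lam2 = collapsed_cond_prob(X[t], beta, gamma, g, j, g_star)
                btj = 1.0 * (Y[t] > j)
                ll += btj * np.log(lam2) + (1 - btj) * np.log(1 - lam2)
    return ll
```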

3.1 Likelihood Estimating Equations for the Regression Effects \(\beta \)

Recall that \(\beta =(\beta '_1,\ldots ,\beta '_j,\ldots ,\beta '_{J-1})': (J-1)(p+1) \times 1,\) with \(\beta _j=(\beta _{j0},\beta _{j1},\ldots ,\beta _{jp})'.\) For known \(\gamma ,\) in this section we exploit the likelihood function (22) and develop the likelihood estimating equation for \(\beta .\) For convenience, we work with the log likelihood function, which, following the likelihood function in (22), is written as

$$\begin{aligned}&\text{ Log }\;L(\beta ,\gamma ) =\sum ^{J-1}_{j=1}\left[ \{1-b^{(j)}_{c}(1)\}\text{ log } F_{(1)j}+\{b^{(j)}_{c}(1)\}\text{ log }\{1- F_{(1)j}\} \right] \nonumber \\+ & {} \sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\left[ b^{(j)}_{c_2}(t) \text{ log } \left\{ \tilde{\lambda }^{(2)}_{gj}(g^*)\right\} + \{1-b^{(j)}_{c_2}(t)\} \text{ log } \left\{ \tilde{\lambda }^{(1)}_{gj}(g^*)\right\} \right] , \end{aligned}$$
(23)

yielding the likelihood estimating equation for \(\beta \) as

$$\begin{aligned}&\frac{\partial \text{ Log }\;L(\beta ,\gamma )}{\partial \beta } =\sum ^{J-1}_{j=1}\left[ \frac{\{1-b^{(j)}_{c}(1)\}}{F_{(1)j}}-\frac{\{b^{(j)}_{c}(1)\}}{\{1- F_{(1)j}\}} \right] \frac{\partial F_{(1)j}}{\partial \beta }\nonumber \\+ & {} \sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\left[ \frac{b^{(j)}_{c_2}(t)}{\tilde{\lambda }^{(2)}_{gj}(g^*)}-\frac{ \{1-b^{(j)}_{c_2}(t)\}}{\{1-\tilde{\lambda }^{(2)}_{gj}(g^*)\}} \right] \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta } \nonumber \\= & {} 0, \end{aligned}$$
(24)

where

$$\begin{aligned} \frac{\partial F_{(1)j}}{\partial \beta } = \sum ^j_{c=1}\left[ \pi _{(1)c}(\delta _{(1)c}-\pi _{(1)})\right] \otimes x_{1}; \end{aligned}$$
(25)

and

$$\begin{aligned} \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta } =\left\{ \begin{array}{ll} \frac{1}{g}\sum ^g_{c_1=1}\sum ^J_{c_2=j+1}\left[ \eta ^{(c_2)}_{t|t-1}(c_1)(\delta _{(t-1)c_2}-\eta _{t|t-1}(c_1))\right] \otimes x_{t} &{} \text{ for }\; g^*=1 \\ \frac{1}{J-g}\sum ^J_{c_1=g+1}\sum ^J_{c_2=j+1}\left[ \eta ^{(c_2)}_{t|t-1}(c_1)(\delta _{(t-1)c_2}-\eta _{t|t-1}(c_1))\right] \otimes x_{t} &{} \text{ for }\; g^*=2, \end{array} \right. \end{aligned}$$
(26)

with

$$\begin{aligned} \pi _{(1)}= & {} \left[ \pi _{(1)1},\ldots ,\pi _{(1)c},\ldots ,\pi _{(1)(J-1)}\right] ' \nonumber \\ \delta _{(t-1)c}= & {} \left\{ \begin{array}{ll} [01'_{c-1},1,01'_{J-1-c}]' &{}\quad \text{ for }\;c=1,\ldots ,J-1\\ 01_{J-1} &{}\quad \text{ for }\; c=J, \end{array} \right. \nonumber \\ \eta _{t|t-1}(c_1)= & {} \left[ \eta ^{(1)}_{t|t-1}(c_1),\ldots ,\eta ^{(c_2)}_{t|t-1}(c_1), \ldots ,\eta ^{(J-1)}_{t|t-1}(c_1)\right] '. \end{aligned}$$
(27)

The details for the derivatives in (25) and (26) are given in “Appendix”.
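In implementations, the gradients (25) and (26) reduce to short sums of Kronecker products. A minimal sketch, reusing marginal_probs and transition_probs and stacking the derivatives in the same order as \(\beta =(\beta '_1,\ldots ,\beta '_{J-1})',\) is given below (names are illustrative).

```python
import numpy as np

def dF1j_dbeta(x_1, beta, j):
    """Gradient (25) of F_{(1)j} with respect to beta, as a (J-1)(p+1) vector."""
    Jm1 = beta.shape[0]
    pi1 = marginal_probs(x_1, beta)[:Jm1]
    grad = np.zeros(Jm1 * x_1.size)
    for c in range(1, j + 1):
        delta_c = np.zeros(Jm1)
        delta_c[c - 1] = 1.0                                     # delta_{(1)c}
        grad += np.kron(pi1[c - 1] * (delta_c - pi1), x_1)
    return grad

def dlam2_dbeta(x_t, beta, gamma, g, j, g_star):
    """Gradient (26) of tilde-lambda^{(2)}_{gj}(g*) with respect to beta."""
    Jm1 = beta.shape[0]
    J = Jm1 + 1
    c1_range = range(1, g + 1) if g_star == 1 else range(g + 1, J + 1)
    grad = np.zeros(Jm1 * x_t.size)
    for c1 in c1_range:
        eta = transition_probs(x_t, beta, gamma, c1)
        for c2 in range(j + 1, J + 1):
            delta_c2 = np.zeros(Jm1)
            if c2 <= Jm1:                                        # zero vector for c2 = J
                delta_c2[c2 - 1] = 1.0
            grad += np.kron(eta[c2 - 1] * (delta_c2 - eta[:Jm1]), x_t)
    return grad / len(c1_range)                                  # factor 1/g or 1/(J-g)
```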

For given \(\gamma \), the likelihood equations in (24) may be solved iteratively by using the iterative equation for \(\beta \) given by

$$\begin{aligned} \hat{\beta }(r+1)=\hat{\beta }(r)-\left[ \left\{ \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\beta }'}\right\} ^{-1} \frac{\partial \text{ Log }\; L (\beta ,\gamma )}{\partial \beta }\right] _{| \beta =\hat{\beta }(r)};\;(J-1)(p+1) \times 1, \end{aligned}$$
(28)

where the formula for the second order derivative matrix \(\frac{\partial ^2 \text{ Log }\;L(\beta ,\gamma )}{\partial {\beta }\partial \beta '}\) may be derived by differentiating the \((J-1)(p+1) \times 1\) gradient vector in (24) with respect to \(\beta '.\) The exact second order derivative matrix has a complicated formula. We provide an approximation as follows.

An approximation to \(\frac{\partial ^2 \text{ Log }\;L(\beta ,\gamma )}{\partial {\beta }\partial \beta '}:\)

Re-express the likelihood estimating equation from (24) as

$$\begin{aligned}&\frac{\partial \text{ Log }\;L(\beta ,\gamma )}{\partial \beta } =\sum ^{J-1}_{j=1}\frac{\partial F_{(1)j}}{\partial \beta }\{(1- F_{(1)j})F_{(1)j}\}^{-1}\left[ \{1-b^{(j)}_{c}(1)\}-F_{(1)j}\right] \nonumber \\+ & {} \sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }\left\{ \tilde{\lambda }^{(2)}_{gj}(g^*) \left( 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right) \right\} ^{-1} \left[ b^{(j)}_{c_2}(t)-\tilde{\lambda }^{(2)}_{gj}(g^*)\right] \nonumber \\= & {} 0. \end{aligned}$$
(29)

Notice that in the first term on the left hand side of (29), \(\{1-b^{(j)}_{c}(1)\}\) is, by (9), a binary variable with

$$\begin{aligned} E\left\{ 1-b^{(j)}_{c}(1)\right\}= & {} F_{(1)j} \nonumber \\ \text{ var }\{1-b^{(j)}_{c}(1)\}= & {} F_{(1)j} \{1-F_{(1)j} \}, \end{aligned}$$
(30)

and similarly in the second term, by (12), \(b^{(j)}_{c_2}(t)\) conditional on \(b^{(g)}_{c_1}(t-1)\) is a binary variable with

$$\begin{aligned} E\left[ b^{(j)}_{c_2}(t)|b^{(g)}_{c_1}(t-1)\right]= & {} \tilde{\lambda }^{(2)}_{gj}(g^*) \nonumber \\ \text{ var }\left[ b^{(j)}_{c_2}(t)|b^{(g)}_{c_1}(t-1)\right]= & {} \tilde{\lambda }^{(2)}_{gj}(g^*)\left[ 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right] , \end{aligned}$$
(31)

for \(g^* \equiv b^{(g)}_{c_1}(t-1).\) Thus, the likelihood estimating function in (29) is equivalent to a conditional quasi-likelihood (CQL) function in \(\beta \) for the cut points based binary data (see, e.g., Tagore and Sutradhar [16, Eq. (27), p. 888]). Now because the variance of binary data is a function of the mean, the variance and gradient functions in (29) may be treated as known when the mean is known. Thus, when a QL estimating equation is solved iteratively, the gradient and variance functions use \(\beta \) from a previous iteration [11, 18]. Consequently, by (29), the second derivative matrix required to compute (28) has the simpler approximate formula

$$\begin{aligned}&\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\beta }'} =-\sum ^{J-1}_{j=1}\frac{\partial F_{(1)j}}{\partial \beta }\{(1- F_{(1)j})F_{(1)j}\}^{-1}\frac{\partial F_{(1)j}}{\partial \beta '} \nonumber \\- & {} \sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }\left\{ \tilde{\lambda }^{(2)}_{gj}(g^*) \left( 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right) \right\} ^{-1} \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta '}. \end{aligned}$$
(32)

Furthermore for known \(\gamma \), by (28) and (29), under some mild conditions it follows that the solution of (29), say \(\hat{\beta },\) satisfies

$$\begin{aligned} \hat{\beta } \sim N(\beta ,V(\beta ,\gamma )), \end{aligned}$$
(33)

(see Kaufmann [8, Sect. 5]) where the covariance matrix is estimated by

$$\begin{aligned} \hat{V}(\cdot )=\left[ -\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\beta }'}\right] ^{-1}_{\beta =\hat{\beta }}. \end{aligned}$$
(34)
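For illustration, the iteration (28) with the approximate Hessian (32) and the covariance estimate (34) may be sketched as follows; the sketch keeps \(\gamma \) fixed as in this section, reuses the earlier helpers, evaluates each factor at the realized \(g^*\) as before, and uses a fixed number of iterations purely for simplicity (names are illustrative).

```python
import numpy as np

def estimate_beta(beta0, gamma, Y, X, n_iter=10):
    """Likelihood estimation of beta for known gamma: iteration (28) with the
    approximate Hessian (32); returns the estimate and the covariance estimate (34)."""
    Jm1, q = beta0.shape
    T, J = len(Y), Jm1 + 1
    b = beta0.reshape(-1).astype(float)
    for _ in range(n_iter):
        B = b.reshape(Jm1, q)
        score, H = np.zeros(b.size), np.zeros((b.size, b.size))
        pi1 = marginal_probs(X[0], B)
        for j in range(1, J):                                  # marginal part of (29)/(32)
            F1j, b1j = pi1[:j].sum(), 1.0 * (Y[0] > j)
            d, w = dF1j_dbeta(X[0], B, j), 1.0 / (F1j * (1 - F1j))
            score += d * w * ((1 - b1j) - F1j)
            H -= np.outer(d, d) * w
        for t in range(1, T):                                  # conditional part
            for g in range(1, J):
                g_star = 1 if Y[t - 1] <= g else 2
                for j in range(1, J):
                    lam2 = collapsed_cond_prob(X[t], B, gamma, g, j, g_star)
                    d = dlam2_dbeta(X[t], B, gamma, g, j, g_star)
                    w = 1.0 / (lam2 * (1 - lam2))
                    score += d * w * (1.0 * (Y[t] > j) - lam2)
                    H -= np.outer(d, d) * w
        b = b - np.linalg.solve(H, score)                      # update (28)
    return b.reshape(Jm1, q), np.linalg.inv(-H)                # estimate and (34)
```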

3.2 Likelihood Estimating Equations for the Dynamic Dependence Parameters \(\gamma \)

In Sect. 3.1, we have estimated \(\beta \) for known \(\gamma ,\) for example, initially by using \(\gamma =0,\) where by (4)–(5),

$$\gamma =\left( \gamma '_1,\ldots ,\gamma '_j,\ldots ,\gamma '_{J-1}\right) ',\;\text{ with }\; \gamma _j=\left( \gamma _{j1},\ldots ,\gamma _{jv},\ldots ,\gamma _{j,J-1}\right) '.$$

Note that \(F_{(1)j}\) for all \(j=1,\ldots ,J-1,\) are free from \(\gamma .\) Hence, by exploiting the log likelihood function (23), similar to (24), we write the likelihood equation for \(\gamma \) as

$$\begin{aligned} \frac{\partial \text{ Log }\;L(\beta ,\gamma )}{\partial \gamma } = \sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\left[ \frac{b^{(j)}_{c_2}(t)}{\tilde{\lambda }^{(2)}_{gj}(g^*)}-\frac{ \left\{ 1-b^{(j)}_{c_2}(t)\right\} }{\left\{ 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right\} } \right] \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma }=0, \end{aligned}$$
(35)

where

$$\begin{aligned} \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma } =\left\{ \begin{array}{ll} \frac{1}{g}\sum ^g_{c_1=1}\sum ^J_{c_2=j+1}\left[ \eta ^{(c_2)}_{t|t-1}(c_1) (\delta _{(t-1)c_2}-\eta _{t|t-1}(c_1))\right] \otimes \delta _{(t-1)c_1} &{} \text{ for }\; g^*=1 \\ \frac{1}{J-g}\sum ^J_{c_1=g+1}\sum ^J_{c_2=j+1}\left[ \eta ^{(c_2)}_{t|t-1}(c_1)(\delta _{(t-1)c_2}-\eta _{t|t-1}(c_1))\right] \otimes \delta _{(t-1)c_1} &{} \text{ for }\; g^*=2. \end{array} \right. \end{aligned}$$
(36)

An outline for this derivative is given in the “Appendix”.
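The derivative (36) has the same structure as (26), with \(x_t\) replaced by \(\delta _{(t-1)c_1}.\) A minimal sketch (reusing transition_probs; names are illustrative) follows; the iteration (37) is then exactly the conditional part of the earlier \(\beta \) sketch with this gradient in place of \(\partial \tilde{\lambda }^{(2)}_{gj}(g^*)/\partial \beta .\)

```python
import numpy as np

def dlam2_dgamma(x_t, beta, gamma, g, j, g_star):
    """Gradient (36) of tilde-lambda^{(2)}_{gj}(g*) with respect to gamma, stacked
    in the same order as gamma = (gamma_1', ..., gamma_{J-1}')'."""
    Jm1 = beta.shape[0]
    J = Jm1 + 1
    c1_range = range(1, g + 1) if g_star == 1 else range(g + 1, J + 1)
    grad = np.zeros(Jm1 * Jm1)
    for c1 in c1_range:
        eta = transition_probs(x_t, beta, gamma, c1)
        delta_c1 = np.zeros(Jm1)
        if c1 <= Jm1:                                  # zero vector when c1 = J
            delta_c1[c1 - 1] = 1.0
        for c2 in range(j + 1, J + 1):
            delta_c2 = np.zeros(Jm1)
            if c2 <= Jm1:
                delta_c2[c2 - 1] = 1.0
            grad += np.kron(eta[c2 - 1] * (delta_c2 - eta[:Jm1]), delta_c1)
    return grad / len(c1_range)
```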

By calculations similar to those in (28), one may solve the likelihood estimating equation in (35) for \(\gamma \) using the iterative equation

$$\begin{aligned} \hat{\gamma }(r+1)=\hat{\gamma }(r)-\left[ \left\{ \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\gamma }'}\right\} ^{-1} \frac{\partial \text{ Log }\; L (\beta ,\gamma )}{\partial \gamma }\right] _{| \gamma =\hat{\gamma }(r)};\;(J-1)^2 \times 1, \end{aligned}$$
(37)

where the second order derivative matrix, following (32), may be computed as

$$\begin{aligned} \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\gamma }'} =-\sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma }\left\{ \tilde{\lambda }^{(2)}_{gj}(g^*) \left( 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right) \right\} ^{-1} \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma '}. \end{aligned}$$
(38)

Furthermore for known \(\beta \), by (37) and (38), it follows under some mild conditions that the solution of (35), say \(\hat{\gamma },\) satisfies

$$\begin{aligned} \hat{\gamma } \sim N(\gamma ,V^*(\beta ,\gamma )), \end{aligned}$$
(39)

(see Kaufmann [8, Sect. 5]) where the covariance matrix is estimated by

$$\begin{aligned} \hat{V^*}(\cdot )=\left[ -\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\gamma }'}\right] ^{-1}_{\gamma =\hat{\gamma }}. \end{aligned}$$
(40)

3.3 Joint Likelihood Estimating Equations for \(\beta \) and \(\gamma \)

Let \(\theta =(\beta ',\gamma ')'.\) One may then combine (28) and (37) and solve the iterative equation

$$\begin{aligned} \hat{\theta }(r+1)=\hat{\theta }(r)-\left[ \begin{pmatrix} \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\beta }'} &{} \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\gamma }'} \\ \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\beta }'} &{} \frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\gamma }'}\end{pmatrix}^{-1} \begin{pmatrix} \frac{\partial \text{ Log }\; L (\beta ,\gamma )}{\partial \beta } \\ \frac{\partial \text{ Log }\; L (\beta ,\gamma )}{\partial \gamma } \end{pmatrix}\right] _{| \theta =\hat{\theta }(r)} \end{aligned}$$
(41)

to obtain the joint likelihood estimates for \(\beta \) and \(\gamma .\) In order to construct the iterative equation (41), we require the formula for the second order derivative matrix \(\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\gamma }'} \) which, using (29), may be approximately computed as

$$\begin{aligned} \frac{\partial ^2 \text{ Log }\;L(\beta ,\gamma )}{\partial \beta \partial \gamma '} =-\sum ^T_{t=2}\sum ^{J-1}_{g=1}\sum ^{J-1}_{j=1}\sum ^2_{g^*=1}\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }\left\{ \tilde{\lambda }^{(2)}_{gj}(g^*) \left( 1-\tilde{\lambda }^{(2)}_{gj}(g^*)\right) \right\} ^{-1} \frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma '}, \end{aligned}$$
(42)

where the formulas for \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }\) and \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma }\) are given by (26) and (36), respectively.

Furthermore, by similar arguments to (33) and (39), under some mild conditions it follows that the solution of (41), say \(\begin{pmatrix}\tilde{\beta } \\ \tilde{\gamma }\end{pmatrix}\) has the multivariate Gaussian distribution

$$\begin{aligned} \begin{pmatrix}\tilde{\beta } \\ \tilde{\gamma }\end{pmatrix} \sim N \left[ \begin{pmatrix}\beta \\ \gamma \end{pmatrix}, \begin{pmatrix}\tilde{V}_{11}(\beta ,\gamma ) &{} \tilde{V}_{12}(\beta ,\gamma ) \\ \tilde{V}'_{12}(\beta ,\gamma ) &{} \tilde{V}_{22}(\beta ,\gamma )\end{pmatrix}\right] , \end{aligned}$$
(43)

where \(\text{ cov }(\tilde{\beta })=\tilde{V}_{11}(\beta ,\gamma )\) and \(\text{ cov }(\tilde{\gamma })=\tilde{V}_{22}(\beta ,\gamma )\) are estimated as

$$\begin{aligned} \hat{\text{ cov }}(\tilde{\beta })= & {} A^{-1}+FE^{-1}F' \nonumber \\ \hat{\text{ cov }}(\tilde{\gamma })= & {} E^{-1}, \end{aligned}$$
(44)

(see Rao [12, p. 33]), with \(E=D-B'A^{-1}B\;\text{ and }\;F=A^{-1}B,\) where, in agreement with (34) and (40), by (41)

$$A=-\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\beta }'};\quad B=-\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\gamma }'};\quad \text{ and }\quad D=-\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\gamma } \partial {\gamma }'}.$$
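A sketch of the joint iteration (41) is given below, reusing the earlier helpers (names and the fixed iteration count are illustrative). Rather than forming A, B, D, E and F explicitly, it inverts the full negative Hessian, whose diagonal blocks coincide with the covariance expressions in (44).

```python
import numpy as np

def estimate_jointly(beta0, gamma0, Y, X, n_iter=10):
    """Joint Newton-type iteration (41) for theta = (beta', gamma')', built from the
    approximate second derivative blocks (32), (38) and (42); returns the estimates
    and the covariance blocks of (43)-(44)."""
    Jm1, q = beta0.shape
    T, J = len(Y), Jm1 + 1
    nb, ng = Jm1 * q, Jm1 * Jm1
    th = np.concatenate([beta0.reshape(-1), gamma0.reshape(-1)]).astype(float)
    for _ in range(n_iter):
        B, G = th[:nb].reshape(Jm1, q), th[nb:].reshape(Jm1, Jm1)
        score, H = np.zeros(nb + ng), np.zeros((nb + ng, nb + ng))
        pi1 = marginal_probs(X[0], B)
        for j in range(1, J):                                   # t = 1 part: beta only
            F1j, b1j = pi1[:j].sum(), 1.0 * (Y[0] > j)
            d = np.concatenate([dF1j_dbeta(X[0], B, j), np.zeros(ng)])
            w = 1.0 / (F1j * (1 - F1j))
            score += d * w * ((1 - b1j) - F1j)
            H -= np.outer(d, d) * w
        for t in range(1, T):                                   # t = 2, ..., T part
            for g in range(1, J):
                g_star = 1 if Y[t - 1] <= g else 2
                for j in range(1, J):
                    lam2 = collapsed_cond_prob(X[t], B, G, g, j, g_star)
                    d = np.concatenate([dlam2_dbeta(X[t], B, G, g, j, g_star),
                                        dlam2_dgamma(X[t], B, G, g, j, g_star)])
                    w = 1.0 / (lam2 * (1 - lam2))
                    score += d * w * (1.0 * (Y[t] > j) - lam2)
                    H -= np.outer(d, d) * w
        th = th - np.linalg.solve(H, score)                     # iteration (41)
    cov = np.linalg.inv(-H)                                     # partitioned as in (43)-(44)
    return (th[:nb].reshape(Jm1, q), th[nb:].reshape(Jm1, Jm1),
            cov[:nb, :nb], cov[nb:, nb:])
```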

4 Concluding Remarks

Recently, some authors such as Loredo-Osti and Sutradhar [10] (see also Fokianos and Kedem [6]) have developed a likelihood approach for the estimation of the regression and dynamic dependence parameters involved in a multinomial dynamic logit (MDL) model for categorical time series data. This inference issue becomes more complex when the categorical responses collected at a given time point also exhibit an order. In this paper we have demonstrated that this type of ordinal categorical responses collected over time may be analyzed by collapsing the multinomial response to a binary response at every possible cut point and fitting a binary dynamic model to all such cut points based binary responses over all times. For simplicity, we have fitted a low order, namely lag 1, dynamic model among all possible cut points based binary responses. A pseudo likelihood method using the binary responses (instead of the multinomial observations) is then constructed for the estimation of the regression and dynamic dependence parameters. The authors plan to undertake an empirical study involving simulations and real life data analysis in order to investigate the performance of the proposed estimation approach for both moderate and large time series lengths. The empirical results will be published elsewhere.