Abstract
Regression analysis for multinomial/categorical time series is not adequately discussed in the literature. Furthermore, when categories of a multinomial response at a given time are ordinal, the regression analysis for such ordinal categorical time series becomes more complex. In this paper, we first develop a lag 1 transitional logit probabilities based correlation model for the multinomial responses recorded over time. This model is referred to as a multinomial dynamic logits (MDL) model. To accommodate the ordinal nature of the responses we then compute the binary distributions for the cumulative transitional responses with cumulative logits as the binary probabilities. These binary distributions are next used to construct a pseudo likelihood function for inferences for the repeated ordinal multinomial data. More specifically, for the purpose of model fitting, the likelihood estimation is developed for the regression and dynamic dependence parameters involved in the MDL model.
Keywords
- Category transition over time
- Cumulative logits
- Marginal multinomial logits
- Multinomial dynamic logits
- Pseudo binary likelihood
1 Introduction
There are situations in practice where a univariate multinomial response, for example, the economic profit status of a pharmaceutical industry such as poor, medium, or high, may be recorded over the years along with known covariates such as type of industry, yearly advertising cost, and other research and development expenditures. It is likely that the profit status of an industry in a given year is correlated with status of profits from the past years. It is of interest to know both (i) the effects of the time dependent covariates, and (ii) the dynamic relationship among the responses over the years. This type of multinomial time series data has been analyzed by some authors such as Fahrmeir and Kaufmann [4], Kaufmann [8], Fokianos and Kedem [5–7], and Loredo-Osti and Sutradhar [10]. As far as the dynamic relationship is concerned, Loredo-Osti and Sutradhar [10] have considered a multinomial dynamic logit (MDL) model as a generalization of the binary time series model used in Tagore and Sutradhar [16] (see also Tong [17]).
Suppose that \(y_{t} = (y_{t1},\ldots , y_{tj},\ldots , y_{t,J-1})^{\prime }\) denotes the \((J-1)\)-dimensional multinomial response variable and for \(j=1,\ldots ,J-1,\)
\[ y^{(j)}_{t}=(y^{(j)}_{t1},\ldots ,y^{(j)}_{tj},\ldots ,y^{(j)}_{t,J-1})'=(01'_{j-1},1,01'_{J-1-j})'\equiv \delta _{tj} \qquad (1) \]
indicates that the multinomial response recorded at time t belongs to the jth category. For \(j=J,\) one writes \(y^{(J)}_{t}=\delta _{tJ}=01_{J-1}.\) Here and also in (1), for a scalar constant c, we have used \(c1_j\) for simplicity, to represent \(c \otimes 1_j, \otimes \) being the well known Kronecker or direct product. This notation will also be used throughout the rest of the paper when needed. Note that in the non-stationary case, that is, when covariates are time dependent, one uses the time dependent marginal probabilities. Specifically, suppose that at time point t (\(t=1,\ldots ,T\)), \(x_t=(x_{t1},\ldots ,x_{t\ell },\ldots ,x_{t,p+1})'\) denotes the \((p+1)\)-dimensional covariate vector and \(\beta _j=(\beta _{j0},\beta _{j1},\ldots , \beta _{jp})'\) denotes the effect of \(x_t\) on \(y^{(j)}_{t}\) for \(j=1,\ldots ,J-1,\) and all \(t=1,\ldots ,T, T\) being the length of the time series. In such cases, the multinomial probability at time t has the form
\[ P[y_t=y^{(j)}_t]=\pi _{(t)j}=\begin{cases}\dfrac{\exp (x'_t\beta _j)}{1+\sum ^{J-1}_{g=1}\exp (x'_t\beta _g)} & \text{for } j=1,\ldots ,J-1,\\[2ex] \dfrac{1}{1+\sum ^{J-1}_{g=1}\exp (x'_t\beta _g)} & \text{for } j=J,\end{cases} \qquad (2) \]
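and the elements of \(y_{t}=(y_{t1},\ldots , y_{tj},\ldots ,y_{t,J-1})'\) at time t follow the multinomial probability distribution, with \(\pi _{tj}\) written for the marginal probability \(\pi _{(t)j}\) of (2), given by
\[ P[y_{t1},\ldots ,y_{tj},\ldots ,y_{t,J-1}]=\prod ^{J}_{j=1}\pi _{tj}^{\,y_{tj}} \qquad (3) \]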
for all \(t=1,\ldots ,T.\) In (3), \(y_{tJ}=1-\sum ^{J-1}_{j=1}y_{tj}, \;\text{ and }\;\pi _{tJ}=1-\sum ^{J-1}_{j=1}\pi _{tj}.\)
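As an illustration of the marginal multinomial logit probabilities referred to as (2), the following is a minimal sketch. It assumes the standard multinomial logit form with category J as the reference category (linear predictor zero) and a covariate vector whose first element is 1 for the intercept. The function name `marginal_probs` and the numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

def marginal_probs(x_t, beta):
    """Marginal probabilities for categories 1..J under an assumed
    multinomial logit form with category J as reference.

    x_t  : (p+1,) covariate vector (first element 1 for the intercept)
    beta : (J-1, p+1) array, row j-1 holding beta_j'
    """
    x_t = np.asarray(x_t, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Linear predictors for categories 1..J-1; the reference category J gets 0.
    lin = np.append(beta @ x_t, 0.0)
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    w = np.exp(lin - lin.max())
    return w / w.sum()

# Hypothetical example with J = 3 categories and one covariate plus intercept.
x_t = np.array([1.0, 0.5])
beta = np.array([[0.2, -1.0],
                 [-0.3, 0.4]])
pi_t = marginal_probs(x_t, beta)
```

The last element of `pi_t` is the probability of category J, which equals one minus the sum of the others, in line with the remark following (3).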
Next we define the transitional probability from the gth \((g=1,\ldots ,J)\) category at time \(t-1\) to the jth category at time t, given by
where \(\gamma _j=(\gamma _{j1},\ldots ,\gamma _{jv},\ldots ,\gamma _{j,J-1})'\) denotes the dynamic dependence parameters. Note that this model in (4) is referred to as the multinomial dynamic logits (MDL) model. For the binary case (\(J=2\)), this type of non-linear dynamic logit model has been studied by some econometricians. See, for example, Amemiya [3, p. 422] in time series setup, and the recent book by Sutradhar [13, Sect. 7.7] in the longitudinal setup. Now for further notational convenience, we re-express the conditional probabilities in (4) as
where for \(t=2,\ldots ,T, \delta _{(t-1)g},\) by (1), has the formula
Remark that in (5), the category g occurred at time \(t-1.\) Thus the category g depends on time \(t-1,\) and \(\delta _{(t-1)g}\equiv \delta _{g_{t-1}}.\) However for simplicity we have used g for \(g_{t-1}.\)
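The lag 1 transitional probabilities of the MDL model can be sketched numerically as follows. This is only an illustration under stated assumptions: the linear predictor for category j is taken to be \(x'_t\beta _j+\gamma '_j\delta _{(t-1)g}\) with category J as reference, and \(\gamma \) is stored as a \((J-1)\times (J-1)\) array whose jth row is \(\gamma '_j.\) The names `delta_vec` and `transition_probs` and all numerical values are hypothetical.

```python
import numpy as np

def delta_vec(g, J):
    """Indicator vector of length J-1 for category g; all zeros when g == J."""
    d = np.zeros(J - 1)
    if g < J:
        d[g - 1] = 1.0
    return d

def transition_probs(x_t, beta, gamma, g):
    """Probabilities of categories 1..J at time t given category g at t-1.

    Assumed form: linear predictor x_t'beta_j + gamma_j'delta_{(t-1)g}
    for j = 1..J-1, and 0 for the reference category J.
    """
    beta = np.asarray(beta, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    J = beta.shape[0] + 1
    lin = np.append(beta @ np.asarray(x_t, dtype=float) + gamma @ delta_vec(g, J), 0.0)
    w = np.exp(lin - lin.max())  # shift by the maximum for numerical stability
    return w / w.sum()

# Hypothetical example with J = 3.
x_t = np.array([1.0, 0.5])
beta = np.array([[0.2, -1.0], [-0.3, 0.4]])
gamma = np.array([[0.8, -0.2], [0.1, 0.6]])
# Row g-1 of P holds the transition probabilities out of category g.
P = np.vstack([transition_probs(x_t, beta, gamma, g) for g in (1, 2, 3)])
```

Under this assumed form, a previous response in category J contributes no dynamic term, so the corresponding row of `P` coincides with the row obtained when \(\gamma =0.\)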
Let \(\beta \,{=}\,(\beta '_1,\ldots ,\beta '_j,\ldots ,\beta '_{J-1})': (p+1)(J-1) \times 1,\) and \(\gamma \,{=}\,(\gamma '_1,\ldots ,\gamma '_j,\ldots ,\gamma '_{J-1})': (J-1)^2 \times 1.\) These parameters are involved in the unconditional mean, variance and covariances of the responses. More specifically one may show [10] that
where
Notice that there is a relation between the vector \(\eta _{(s|s-1)}(J)\) and the matrix \(\eta _{(s|s-1),M}.\) This is because the transition matrix \(\eta _{(s|s-1),M}\) contains the transitional probabilities from any of the first \(J-1\) states at time \(s-1\) to any of the \(J-1\) states at time s, whereas the transition vector \(\eta _{(s|s-1)}(J)\) contains transitional probabilities from the Jth state at time \(s-1\) to any of the first \(J-1\) states at time s. Consequently, once the transition matrix \(\eta _{(s|s-1),M}\) is computed, the transition vector \(\eta _{(s|s-1)}(J)\) becomes known.
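The roles of the transition matrix and the transition vector can be illustrated by propagating category probabilities one step forward. The sketch below uses the law of total probability and is not a reproduction of the moment formulas in [10]. It assumes a full \(J\times J\) transition array `P` with `P[g-1, j-1]` the probability of moving from category g to category j; the orientation of the reduced matrix, the function names, and the numbers are all assumptions made for illustration.

```python
import numpy as np

def propagate_full(pi_prev, P):
    """One-step update of all J category probabilities (law of total probability)."""
    return np.asarray(pi_prev, dtype=float) @ np.asarray(P, dtype=float)

def propagate_reduced(m_prev, P):
    """Same update written for the first J-1 probabilities only.

    eta_J : transitions from state J at s-1 to states 1..J-1 at s
    eta_M : (j, g) entry is the transition from state g to state j,
            for g, j = 1..J-1 (this orientation is an assumption)
    """
    P = np.asarray(P, dtype=float)
    eta_J = P[-1, :-1]
    eta_M = P[:-1, :-1].T
    ones = np.ones(len(eta_J))
    # mean_s = eta_J + (eta_M - eta_J 1') mean_{s-1}
    return eta_J + (eta_M - np.outer(eta_J, ones)) @ np.asarray(m_prev, dtype=float)

# Hypothetical transition array and previous-time probabilities for J = 3.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
pi_prev = np.array([0.5, 0.3, 0.2])
full = propagate_full(pi_prev, P)
reduced = propagate_reduced(pi_prev[:-1], P)
```

The two routes agree on the first \(J-1\) probabilities, which shows how the \((J-1)\)-dimensional mean at time s depends on both \(\eta _{(s|s-1),M}\) and \(\eta _{(s|s-1)}(J).\)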
It is of importance to estimate \(\beta \) and \(\gamma \) parameters mainly to understand the aforementioned basic properties including the pair-wise correlations of the responses.
Note however that the multinomial time series model (2)–(5) and its basic moment properties shown in (6)–(8) are derived without any order restrictions of the categories of the responses. The purpose of this paper is to estimate the parameters \(\beta \) and \(\gamma \) under an ordinal categorical response model which we describe in Sect. 2. In Sect. 3, we demonstrate the application of a pseudo likelihood approach for the estimation for these parameters. Some concluding remarks are made in Sect. 4.
2 Cumulative MDL Model for Ordinal Categorical Data
When categories for a response at a given time t are ordinal, one may then collapse the \(J >2\) categories in a cumulative fashion into two \((J'=2)\) categories and use a simpler binary model to fit such collapsed data. Note however that there will be various binary groups depending on which category in the middle is used as a cut point. For the transitional categorical response from time \(t-1\) (say) to time t, cumulation of the categories at time t has to be computed conditional on the cumulative categories at time \(t-1.\) This will also generate a binary model for cumulative transitional responses. These concepts of cumulative probabilities for a cumulative response are used in the following subsections to construct the desired cumulative MDL model.
2.1 Marginal Cumulative Model at Time \(t=1\)
Suppose that for a selected cut point \(j (j=1,\ldots ,J-1), F_{(1)j}=\sum ^j_{c=1}\pi _{(1)c}\) represents the probability for a multinomial response to be in category c between 1 and j, where \(\pi _{(1)c}\) by (2) defines the probability for the response to be in category c (\(c=1,\ldots ,J)\) at time \(t=1.\) Thus, \(1-F_{(1)j}=\sum ^J_{c=j+1}\pi _{(1)c}\) would represent the probability for the multinomial response to be in category c beyond j. To reflect this binary nature of the observed response in category c with regard to cut point j, we define a binary variable \(b^{(j)}_c(1)\) such that
Notice that because there are \(J-1\) possible cut points, if the categories are ordered and the response falls in cth category, by (11) below, we then obtain the cut points based observed vector at time \(t=1\) as
For other values of t, the observed responses are constructed similarly depending on the response category.
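The collapsing of an observed ordinal category into cut point based binary responses can be sketched as follows. The coding direction is an assumption: the binary variable is taken to be 1 when the observed category c lies beyond the cut point j and 0 otherwise, so that its complement has probability \(F_{(1)j}.\) The function name `cutpoint_vector` is hypothetical.

```python
def cutpoint_vector(c, J):
    """Binary responses for cut points j = 1..J-1 given observed category c.

    Assumed coding: entry j-1 is 1 if c > j (response beyond the cut point)
    and 0 if c <= j.
    """
    if not 1 <= c <= J:
        raise ValueError("category c must lie between 1 and J")
    return [1 if c > j else 0 for j in range(1, J)]

# Hypothetical example: J = 4 ordered categories, observed category c = 3.
b = cutpoint_vector(3, 4)
```

With this coding, a response in the lowest category yields all zeros and a response in the highest category yields all ones.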
2.2 Lag 1 Transitional Cumulative Model at Time \(t=2,\ldots ,T\)
In order to develop a transitional model, suppose we observe that the multinomial response at time \(t-1 (t=2,\ldots ,T)\) was in \(c_1\)th category \((c_1=1,\ldots ,J),\) whereas at time t it is observed in \(c_2 (c_2=1,\ldots ,J)\) category. Let (g, j) denote a bivariate cut point which facilitates the binary variables [similar to (9)] given by
and
Consequently, a transitional probability model based on conditional probabilities (5) may be written as
where the conditional probability \(\lambda ^{(c_2)}_{t|t-1}(c_1),\) has the known multinomial dynamic logit (MDL) form given by (5). For convenience, following (12)–(13), we also write
3 Pseudo Binary Likelihood Estimation for the Ordinal Model
In this section, we construct a binary data based likelihood function, where the binary data are obtained by collapsing the available ordinal multinomial observations. Consequently, we refer to this likelihood approach as the so-called pseudo likelihood approach. However, for convenience, we use the terminology ‘likelihood’ for the ‘pseudo likelihood’ throughout the section.
At \(t=1,\) the marginal likelihood for \(\beta \) by (9) has the form
where
Next for the construction of the conditional likelihood at t given the information from previous time point \(t-1,\) we first re-express the binary conditional probabilities in (12) and (14), as
One may then write the conditional likelihood for \(\beta \) and \(\gamma ,\) as
where the binary data \(b^{(j)}_{c_2}(t)\) for observed \(c_2\) are obtained by (11), and similarly \(b^{(g)}_{c_1}(t-1)\) to define \(g^*\) for given \(c_1\) are obtained from (10).
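For the marginal part of the likelihood at \(t=1\) described at the start of this section, a minimal numerical sketch is given below. It assumes that each cut point j contributes a Bernoulli term in which the event "category at most j" has probability \(F_{(1)j},\) and that the marginal probabilities have the multinomial logit form with category J as reference. The function name `marginal_pseudo_loglik` and the inputs are hypothetical, and the sketch does not cover the conditional part for \(t=2,\ldots ,T.\)

```python
import numpy as np

def marginal_pseudo_loglik(c, x_1, beta):
    """Pseudo log likelihood contribution at t = 1 for observed category c.

    Assumed form: sum over cut points j = 1..J-1 of
    (1 - b_j) * log F_j + b_j * log(1 - F_j), with b_j = 1 if c > j.
    """
    beta = np.asarray(beta, dtype=float)
    J = beta.shape[0] + 1
    lin = np.append(beta @ np.asarray(x_1, dtype=float), 0.0)
    w = np.exp(lin - lin.max())
    pi = w / w.sum()
    F = np.cumsum(pi)[: J - 1]  # cumulative probabilities F_(1)j, j = 1..J-1
    b = np.array([1.0 if c > j else 0.0 for j in range(1, J)])
    return float(np.sum((1.0 - b) * np.log(F) + b * np.log1p(-F)))

# Hypothetical example: J = 3, observed category 2, all regression effects zero.
value = marginal_pseudo_loglik(2, np.array([1.0, 0.5]), np.zeros((2, 2)))
```

With all regression effects equal to zero and \(J=3,\) both cut points contribute \(\log (2/3)\) when the observed category is 2, which gives a quick sanity check of the coding.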
Next by combining (17) and (21), one obtains the likelihood function for \(\beta \) and \(\gamma \) as
For the benefit of the practitioners, we now develop the likelihood estimating equations for these parameters \(\beta \) and \(\gamma \) in the following sections. For the construction of similar likelihood estimating equations in the stationary longitudinal setup, the reader is referred to Sutradhar [14, Sect. 3.6.2.2].
Note that the likelihood function in (22) is constructed by collapsing the ordinal multinomial responses to the binary responses at all suitable cut points. This likelihood function, therefore, cannot be used for nominal multinomial time series data. When the categories are nominal, it is appropriate to construct the likelihood function by exploiting the marginal probability function \(\pi _{(t)j}\) from (2) for \(t=1,\) and the conditional multinomial logit probability function \(\eta ^{(j)}_{t|t-1}(g)\) from (4) for \(t=2,\ldots ,T\) (see Loredo-Osti and Sutradhar [10]). Notice that in practice the time dependent covariates \(x_t\) in (2) and (4) are fixed in general. However, by treating \(x_t\) as a random covariate vector, Fokianos and Kedem [6] obtained parameter estimates by maximizing a partial likelihood function without requiring any extra characterization of the joint process \(\{y_t,x_t\}.\) Loredo-Osti and Sutradhar [10] have, however, argued that in Fokianos and Kedem’s [6] approach, the conditional Fisher information matrix is not the same as the one obtained by conditioning on \(\{x_t\},\) the observed covariates. In fact, when the estimation is carried out in a generalized linear models framework that uses the canonical link function, this conditional information matrix obtained by Fokianos and Kedem is just the Hessian matrix multiplied by \(-1,\) i.e., the observed information matrix.
As far as the ordinal multinomial time series data are concerned, the construction of binary mapping based likelihood function in (22) is a new concept. The core idea comes from the cumulative binary property for the MDL (multinomial dynamic logit) model (4) because of the present ordinal nature of the data. In the cross sectional setup, that is, for the case with \(t=1\) only, the likelihood function for ordinal multinomial data has been used by many authors such as Agresti [1]. Note that the marginal multinomial probability in (2) has the multinomial logit form. In the cluster data setup, many existing studies use this multinomial logit model (2) as the marginal model at a given time t. As far as the correlations between repeated responses are concerned, some authors such as Agresti [1], Lipsitz et al. [9], Agresti and Natarajan [2] do not model them, rather they use ‘working’ correlations to construct the so-called generalized estimating equations and solve them to obtain the estimates for regression parameters involved in the marginal multinomial logits model (2). These estimates however may not be reliable as they can be inefficient as compared to the ‘working’ independence assumption based estimates (see Sutradhar and Das [15], Sutradhar [13, Chap. 7] in the context of binary longitudinal data analysis). Thus, their extension to the time series setup may be useless. Moreover, it is not clear how to model the ordinal data using this type of ‘working’ correlations approach.
3.1 Likelihood Estimating Equations for the Regression Effects \(\beta \)
Recall that \(\beta =(\beta '_1,\ldots ,\beta '_j,\ldots ,\beta '_{J-1})': (J-1)(p+1) \times 1,\) with \(\beta _j=(\beta _{j0},\beta _{j1},\ldots ,\beta _{jp})'.\) For known \(\gamma ,\) in this section, we exploit the likelihood function (22) and develop the likelihood estimating equation for \(\beta .\) For convenience, we use log likelihood function, which, following the likelihood function in (22), is written as
yielding the likelihood estimating equation for \(\beta \) as
where
and
with
The details for the derivatives in (25) and (26) are given in “Appendix”.
For given \(\gamma \), the likelihood equations in (24) may be solved iteratively by using the iterative equation for \(\beta \) given by
where the formula for the second order derivative matrix \(\frac{\partial ^2 Log\;L(\beta ,\gamma _M)}{\partial {\beta }'\partial \beta }\) may be derived by taking the derivative of the \((J-1)(p+1) \times 1\) vector with respect to \(\beta '.\) The exact second order derivative matrix has a complicated formula. We provide an approximation as follows.
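The iterative scheme referred to above is of Newton-Raphson type: the current value is updated by subtracting the inverse second derivative matrix times the score vector. The following is a generic sketch of such an iteration and not the paper's specific equation; the function `newton_raphson`, its arguments, and the toy intercept-only logistic example (3 successes in 10 trials) are assumptions chosen only to show the mechanics.

```python
import numpy as np

def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=50):
    """Solve score(theta) = 0 by theta <- theta - H(theta)^{-1} score(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta), score(theta))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Toy example: Bernoulli log likelihood with a single logit parameter.
n, successes = 10, 3

def score(theta):
    p = 1.0 / (1.0 + np.exp(-theta[0]))
    return np.array([successes - n * p])

def hessian(theta):
    p = 1.0 / (1.0 + np.exp(-theta[0]))
    return np.array([[-n * p * (1.0 - p)]])

theta_hat = newton_raphson(score, hessian, np.zeros(1))
```

In the toy example the solution is the logit of the sample proportion. For the ordinal model, `score` and `hessian` would be replaced by the first derivative vector and the (approximate) second derivative matrix of the log likelihood.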
An approximation to \(\frac{\partial ^2 Log\;L(\beta ,\gamma _M)}{\partial {\beta }'\partial \beta }:\)
Re-express the likelihood estimating equation from (24) as
Notice that in the first term in the left hand side of (29), \(\{1-b^{(j)}_{c}(1)\}\) is, by (9), a binary variable with
and similarly in the second term, by (12), \(b^{(j)}_{c_2}(t)\) conditional on \(b^{(g)}_{c_1}(t-1)\) is a binary variable with
for \(g^* \equiv b^{(g)}_{c_1}(t-1).\) Thus, the likelihood estimating function in (29) is equivalent to a conditional quasi-likelihood (CQL) function in \(\beta \) for the cut points based binary data [e.g. see Tagore and Sutradhar [16, Eq. (27), p. 888]]. Now because the variance of the binary data is a function of the mean, the variance and gradient functions in (29) may be treated as known when the mean is known. Thus, when a QL estimating equation is solved iteratively, the gradient and variance functions use \(\beta \) from a previous iteration [11, 18]. Consequently, by (29), the second derivative matrix required to compute (28) has a simpler approximate formula
Furthermore for known \(\gamma \), by (28) and (29), under some mild conditions it follows that the solution of (29), say \(\hat{\beta },\) satisfies
(see Kaufmann [8, Sect. 5]) where the covariance matrix is estimated by
3.2 Likelihood Estimating Equations for the Dynamic Dependence Parameters \(\gamma \)
In Sect. 3.1, we have estimated \(\beta \) for known \(\gamma ,\) for example, initially by using \(\gamma =0,\) where by (4)–(5),
Note that \(F_{(1)j}\) for all \(j=1,\ldots ,J-1,\) are free from \(\gamma .\) Hence, by exploiting the log likelihood function (23), similar to (24), we write the likelihood equation for \(\gamma \) as
where
An outline for this derivative is given in the “Appendix”.
By similar calculations as in (28), one may solve the likelihood estimating equation in (35) for \(\gamma \) using the iterative equation
where the second order derivative matrix, following (32), may be computed as
Furthermore for known \(\beta \), by (37) and (38), it follows under some mild conditions that the solution of (35), say \(\hat{\gamma },\) satisfies
(see Kaufmann [8, Sect. 5]) where the covariance matrix is estimated by
3.3 Joint Likelihood Estimating Equations for \(\beta \) and \(\gamma \)
Let \(\theta =(\beta ',\gamma ')'.\) One may then combine (28) and (37) and solve the iterative equation
to obtain the joint likelihood estimates for \(\beta \) and \(\gamma .\) In order to construct the iterative equation (41), we require the formula for the second order derivative matrix \(\frac{\partial ^2 \text{ Log }\; L (\beta ,\gamma )}{\partial {\beta } \partial {\gamma }'} \) which, using (29), may be approximately computed as
where the formulas for \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }\) and \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma }\) are given by (26) and (36), respectively.
Furthermore, by similar arguments to (33) and (39), under some mild conditions it follows that the solution of (41), say \(\begin{pmatrix}\tilde{\beta } \\ \tilde{\gamma }\end{pmatrix}\) has the multivariate Gaussian distribution
where \(\text{ cov }(\tilde{\beta })=\tilde{V}_{11}(\beta ,\gamma )\) and \(\text{ cov }(\tilde{\gamma })=\tilde{V}_{22}(\beta ,\gamma )\) are estimated as
Rao [12, p. 33] with \(E=D-B'A^{-1}B,\;\text{ and }\;F=A^{-1}B,\) where by (41)
4 Concluding Remarks
Recently some authors such as Loredo-Osti and Sutradhar [10] (see also Fokianos and Kedem [6]) have developed a likelihood approach for the estimation of regression and dynamic dependence parameters involved in a multinomial dynamic logit (MDL) model used for categorical time series data. This inference issue becomes more complex when the categorical response collected at a given time point also exhibits an order. In this paper we have demonstrated that this type of ordinal categorical responses collected over time may be analyzed by collapsing a multinomial response to a binary response at a given possible cut point and fitting a binary dynamic model to all such binary responses collected based on all possible cut points over all times. For simplicity, we have fitted a low order, namely lag 1, dynamic model among all possible cut points based binary responses. A pseudo likelihood method using binary responses (instead of the multinomial observations) is then constructed for the estimation of the regression and dynamic dependence parameters. The authors plan to undertake an empirical study involving simulations and real life data analysis in order to investigate the performance of the proposed estimation approach both for moderate and large size time series. The empirical results will be published elsewhere.
References
[1] Agresti, A. (1989). A survey of models for repeated ordered categorical response data. Statistics in Medicine, 8, 1209–1224.
[2] Agresti, A., & Natarajan, R. (2001). Modeling clustered ordered categorical data: A survey. International Statistical Review, 69, 345–371.
[3] Amemiya, T. (1985). Advanced econometrics. Cambridge, MA: Harvard University Press.
[4] Fahrmeir, L., & Kaufmann, H. (1987). Regression models for non-stationary categorical time series. Journal of Time Series Analysis, 8, 147–160.
[5] Fokianos, K., & Kedem, B. (1998). Prediction and classification of non-stationary categorical time series. Journal of Multivariate Analysis, 67, 277–296.
[6] Fokianos, K., & Kedem, B. (2003). Regression theory for categorical time series. Statistical Science, 18, 357–376.
[7] Fokianos, K., & Kedem, B. (2004). Partial likelihood inference for time series following generalized linear models. Journal of Time Series Analysis, 25, 173–197.
[8] Kaufmann, H. (1987). Regression models for nonstationary categorical time series: Asymptotic estimation theory. Annals of Statistics, 15, 79–98.
[9] Lipsitz, S. R., Kim, K., & Zhao, L. (1994). Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine, 13, 1149–1163.
[10] Loredo-Osti, J. C., & Sutradhar, B. C. (2012). Estimation of regression and dynamic dependence parameters for non-stationary multinomial time series. Journal of Time Series Analysis, 33, 458–467.
[11] McCullagh, P. (1983). Quasilikelihood functions. Annals of Statistics, 11, 59–67.
[12] Rao, C. R. (1973). Linear statistical inference and its applications. New York, NY: Wiley.
[13] Sutradhar, B. C. (2011). Dynamic mixed models for familial longitudinal data. New York, NY: Springer.
[14] Sutradhar, B. C. (2014). Longitudinal categorical data analysis. New York, NY: Springer.
[15] Sutradhar, B. C., & Das, K. (1999). On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika, 86, 459–465.
[16] Tagore, V., & Sutradhar, B. C. (2009). Conditional inference in linear versus nonlinear models for binary time series. Journal of Statistical Computation and Simulation, 79, 881–897.
[17] Tong, H. (1990). Nonlinear time series: A dynamical system approach. Oxford statistical science series (Vol. 6). New York, NY: Oxford University Press.
[18] Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika, 61, 439–447.
Acknowledgments
The authors are grateful to Bhagawan Sri Sathya Sai Baba for His love and blessings to carry out this research in Sri Sathya Institute of Higher Learning. The authors thank the editorial committee for the invitation to participate in preparing this Festschrift honoring Professor Ian McLeod. It has brought back many pleasant memories of Western in early 80’s experienced by the first author during his PhD study. We have prepared this small contribution as a token of our love and respect to Professor Ian McLeod for his long and sustained contributions to the statistics community through teaching and research in time series analysis, among other areas. The authors thank two referees for their comments and suggestions on the earlier version of the paper.
Appendix
Derivation for \(\frac{\partial F_{(1)j}}{\partial \beta }:\)
Recall from Sect. 2.1 that \(F_{(1)j}=\sum ^j_{c=1}\pi _{(1)c},\) where \(\pi _{(1)c}\) is given by (2). It then follows that
Now because
it follows that
The formula for \(\frac{\partial F_{(1)j}}{\partial \beta }\) in (25) follows by using (47) and (45).
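The derivative of the cumulative probability with respect to the regression parameters can be checked numerically. The sketch below assumes the multinomial logit form for \(\pi _{(1)c}\) with category J as reference, under which \(\partial \pi _{(1)c}/\partial \beta _k=\pi _{(1)c}\{1(c=k)-\pi _{(1)k}\}x_1\) and hence \(\partial F_{(1)j}/\partial \beta _k=\pi _{(1)k}\{1(k\le j)-F_{(1)j}\}x_1.\) This closed form is derived here for illustration and may be arranged differently from (25); the function names are hypothetical.

```python
import numpy as np

def probs(x, beta):
    """Multinomial logit probabilities, category J as reference (assumed form)."""
    lin = np.append(np.asarray(beta, dtype=float) @ np.asarray(x, dtype=float), 0.0)
    w = np.exp(lin - lin.max())
    return w / w.sum()

def cumulative_F(x, beta, j):
    """F_j = sum of the first j category probabilities."""
    return probs(x, beta)[:j].sum()

def grad_F(x, beta, j):
    """Analytic gradient of F_j; row k-1 holds dF_j / d beta_k'."""
    pi = probs(x, beta)
    F = pi[:j].sum()
    J = len(pi)
    coef = np.array([pi[k - 1] * ((1.0 if k <= j else 0.0) - F)
                     for k in range(1, J)])
    return np.outer(coef, np.asarray(x, dtype=float))

# Hypothetical inputs: J = 3, cut point j = 1.
x = np.array([1.0, 0.5])
beta = np.array([[0.2, -1.0], [-0.3, 0.4]])
analytic = grad_F(x, beta, 1)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = np.zeros_like(beta)
for k in range(beta.shape[0]):
    for l in range(beta.shape[1]):
        up, down = beta.copy(), beta.copy()
        up[k, l] += eps
        down[k, l] -= eps
        numeric[k, l] = (cumulative_F(x, up, 1) - cumulative_F(x, down, 1)) / (2 * eps)
```

Agreement between `analytic` and `numeric` gives a simple safeguard against sign or index slips when coding the score function.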
Derivation for \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \beta }:\)
By using the formula for \(\tilde{\lambda }^{(2)}_{gj}(g^*)\) from (13) we write
where \(\lambda ^{(c_2)}_{t|t-1}(c_1)\) is given in (5), that is,
Now, for \(t=2,\ldots ,T,\) it follows from (49) that
yielding
The formula for the derivative in (26) follows now by applying (50) into (48).
Derivation for \(\frac{\partial \tilde{\lambda }^{(2)}_{gj}(g^*)}{\partial \gamma }:\)
By using the formula for \(\tilde{\lambda }^{(2)}_{gj}(g^*)\) from (13) we write
where \(\lambda ^{(c_2)}_{t|t-1}(c_1)\) is given in (5) [see also (49)].
Next, for \(t=2,\ldots ,T,\) it follows from (49) that
where
These derivatives in (53) may further be re-expressed as
The formula for the derivative in (36) now follows by applying (53) into (52).
Copyright information
© 2016 Springer Science+Business Media New York
About this chapter
Cite this chapter
Sutradhar, B.C., Prabhakar Rao, R. (2016). Regression Models for Ordinal Categorical Time Series Data. In: Li, W., Stanford, D., Yu, H. (eds) Advances in Time Series Methods and Applications . Fields Institute Communications, vol 78. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-6568-7_8
DOI: https://doi.org/10.1007/978-1-4939-6568-7_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-6567-0
Online ISBN: 978-1-4939-6568-7
eBook Packages: Mathematics and Statistics (R0)