Latent Class Models of Time Series Data: An Entropic-Based Uncertainty Measure

Dias, José G.

doi:10.1007/978-3-319-00035-0_20

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Latent class modeling has proven to be a powerful tool for identifying regimes in time series. Here, we focus on the classification uncertainty in latent class modeling of time series data with emphasis on entropy-based measures of uncertainty. Results are illustrated with an example.

1 Introduction

Latent class models are a powerful tool for capturing unobserved heterogeneity in a wide range of social and behavioral science data (see, for example, McLachlan and Peel 2000 or Ramos et al. 2011).

Let $\mathbf{y}_{i}$ denote a T-dimensional observation and $D =\{ \mathbf{y}_{1},\ldots,\mathbf{y}_{n}\}$ a sample of size n. Each data point is assumed to be a realization of the random variable Y coming from an S-component mixture probability density function (p.d.f.)

$$\displaystyle{ f(\mathbf{y}_{i};\boldsymbol{\varphi }) =\sum _{ w=1}^{S}\pi _{ w}f_{w}(\mathbf{y}_{i};\boldsymbol{\theta }_{w}), }$$

(1)

where $\pi _{w}$ are positive mixing proportions that sum to one, $\boldsymbol{\theta }_{w}$ are the parameters defining the conditional distribution $f_{w}(\mathbf{y}_{i};\boldsymbol{\theta }_{w})$ for latent class w, and $\boldsymbol{\varphi }= \left \{\pi _{1},\ldots,\pi _{S-1},\boldsymbol{\theta }_{1},\ldots,\boldsymbol{\theta }_{S}\right \}$. Note that $\pi _{S} = 1 -\sum _{w=1}^{S-1}\pi _{w}$. Assuming i.i.d. observations, the log-likelihood function of a latent class (LC) model has the form $\ell(\boldsymbol{\varphi };\mathbf{y}) =\sum _{ i=1}^{n}\log f(\mathbf{y}_{i};\boldsymbol{\varphi })$, which is straightforward to maximize with the EM algorithm (Dempster et al. 1977), yielding the maximum likelihood estimate (MLE).

From the ML parameter estimates of the LC model, Bayes’ theorem gives the posterior probability that observation i was generated by latent class w, conditional on its response pattern:

$$\displaystyle{ \hat{\alpha }_{iw} = \frac{\hat{\pi }_{w}f_{w}(\mathbf{y}_{i};\hat{\boldsymbol{\theta }}_{w})} {\sum _{v=1}^{S}\hat{\pi }_{v}f_{v}(\mathbf{y}_{i};\hat{\boldsymbol{\theta }}_{v})}, }$$

(2)

where $\hat{\pi }_{w}$ and $\hat{\boldsymbol{\theta }}_{w}$ are the ML estimates of $\pi _{w}$ and $\boldsymbol{\theta }_{w}$, respectively.

The $\hat{\alpha }_{iw}$ values define a soft partitioning/clustering of the data set, since $\sum _{w=1}^{S}\hat{\alpha }_{iw} = 1$ and $\hat{\alpha }_{iw} \in [0,1]$. Let $c_{i}$ represent the true cluster membership (the missing data) of observation i. Then, the optimal Bayes rule, which assigns observation i to the class with maximum posterior probability, can be defined as follows:

$$\displaystyle{ \hat{c}_{i} =\arg \max _{w}\hat{\alpha }_{iw},i = 1,\ldots,n, }$$

(3)

where $\hat{c}_{i}$ is the estimate of $c_{i}$.
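Equations (2) and (3) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, scalar observations (T = 1) and Gaussian component densities; the mixing proportions, means, and variances below are hypothetical values, not estimates from this paper.

```python
import math

def normal_pdf(y, mean, var):
    """Univariate Gaussian density (assumed component density for this sketch)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posterior_probs(y, weights, means, variances):
    """Eq. (2): posterior probability of each latent class for one observation."""
    joint = [p * normal_pdf(y, m, v) for p, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

def bayes_rule(posteriors):
    """Eq. (3): 1-based index of the class with maximum posterior probability."""
    return max(range(len(posteriors)), key=lambda w: posteriors[w]) + 1

# Hypothetical two-class estimates, for illustration only.
weights, means, variances = [0.7, 0.3], [0.0, 3.0], [1.0, 1.0]
alpha = posterior_probs(1.0, weights, means, variances)  # soft partition
c_hat = bayes_rule(alpha)                                # hard partition
```

The list `alpha` is the soft assignment in [0, 1], while `c_hat` is the hard assignment; the gap between the two is what the uncertainty measure introduced later quantifies.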

In this paper, we address the following question: How can we measure the level of uncertainty in the mapping from the [0, 1] soft partition to the {0,1} hard partition obtained by applying the optimal Bayes rule in time series analysis?

The paper is organized as follows: Sect. 2 presents the full mixture hidden Markov model used to obtain the posterior probabilities of the regimes; Sect. 3 introduces a measure of classification uncertainty; Sect. 4 studies its behavior using a synthetic example; and Sect. 5 illustrates the application of the procedure to a panel data set of twenty European stock markets. Section 6 gives concluding remarks.

2 Latent Class Modeling of Time Series Data

The mixture of hidden Markov models (MHMM-S) (Dias et al. 2008; Ramos et al. 2011) is defined by the density:

$$\displaystyle{ f(\mathbf{y}_{i};\boldsymbol{\varphi }) =\sum _{ w_{i}=1}^{S}\pi _{ w_{i}}\sum _{z_{i1}=1}^{K}\ldots \sum _{ z_{iT}=1}^{K}f(z_{ i1}\vert w_{i})\prod \limits _{t=2}^{T}f(z_{\mathit{ it}}\vert z_{i,t-1},w_{i})\prod \limits _{t=1}^{T}f(y_{\mathit{ it}}\vert z_{\mathit{it}}), }$$

(4)

where the conditional distribution within each latent class is given by a hidden Markov model with K regimes and $\boldsymbol{\varphi }$ is the vector containing all parameters in the model. Thus, we assume that within latent class w the sequence $\{z_{i1},\ldots,z_{iT}\}$ follows a first-order Markov chain. Moreover, we assume that the observed value $y_{\mathit{it}}$ at a particular time point depends only on the regime at this time point; i.e., conditionally on the regime $z_{\mathit{it}}$, the response $y_{\mathit{it}}$ is independent of other time points, which is often referred to as the local independence assumption. The first-order Markov assumption for the latent regime switching, conditional on latent class membership w, is not as restrictive as one may initially think: it clearly does not imply a first-order Markov structure for the responses $y_{\mathit{it}}$. The standard hidden Markov model (HMM) (Baum et al. 1970) is a special case of the MHMM-S that is obtained by eliminating the time-constant latent variable w from the model, that is, by assuming that there is no unobserved heterogeneity across time series.

The MHMM-S is characterized by the following components:

$\pi _{w}$ is the prior probability of belonging to the latent class w;
$f(z_{i1}\vert w_{i})$ is the initial-regime probability; that is, the probability of having a particular initial regime conditional on belonging to latent class w with multinomial parameter $\lambda _{\mathit{kw}} = P(Z_{i1} = k\vert W_{i} = w)$;
$f(z_{\mathit{it}}\vert z_{i,t-1},w_{i})$ is a latent transition probability; that is, the probability of being in a particular regime at time point t conditional on the regime at time point t − 1 and latent class membership; assuming a time-homogeneous transition process, we have $p_{\mathit{jkw}} = P(Z_{\mathit{it}} = k\vert Z_{i,t-1} = j,W_{i} = w)$ as the relevant multinomial parameter. Note that the MHMM-S allows each latent class to have its own transition or regime-switching dynamics, whereas in a standard HMM it is assumed that all cases have the same transition probabilities;
$f(y_{\mathit{it}}\vert z_{\mathit{it}})$, the probability density of having a particular value $y_{\mathit{it}}$ conditional on the regime occupied at time point t, is assumed to have the form of a univariate normal (or Gaussian) density function. This distribution is characterized by the parameter vector $(\mu _{k},\sigma _{k}^{2})$ containing the mean ($\mu _{k}$) and variance ($\sigma _{k}^{2}$) for regime k. Note that these parameters are assumed invariant across latent classes, an assumption that may, however, be relaxed.
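The generative structure described by these components can be sketched as a small simulator: draw the time-constant class, then the regime chain within that class, then the Gaussian responses given the regimes. This is an illustrative sketch only; the function name `simulate_mhmm` and all parameter values are hypothetical, and regimes and classes are indexed from 0.

```python
import math
import random

def simulate_mhmm(n, T, pi, lam, P, mu, sigma2, seed=0):
    """Draw n series of length T from a mixture of hidden Markov models.

    pi[w]            : prior probability of latent class w
    lam[w][k]        : initial-regime probability of regime k in class w
    P[w][j][k]       : transition probability from regime j to k in class w
    mu[k], sigma2[k] : Gaussian mean and variance of regime k (shared by classes)
    """
    rng = random.Random(seed)
    S, K = len(pi), len(mu)
    classes, regimes, series = [], [], []
    for _ in range(n):
        w = rng.choices(range(S), weights=pi)[0]         # time-constant latent class
        z = [rng.choices(range(K), weights=lam[w])[0]]   # initial regime
        for t in range(1, T):                            # first-order Markov chain
            z.append(rng.choices(range(K), weights=P[w][z[-1]])[0])
        # Local independence: y_t depends only on the regime at time t.
        y = [rng.gauss(mu[k], math.sqrt(sigma2[k])) for k in z]
        classes.append(w)
        regimes.append(z)
        series.append(y)
    return classes, regimes, series

# Hypothetical parameters: two classes (S = 2), two regimes (K = 2).
pi = [0.6, 0.4]
lam = [[0.3, 0.7], [0.5, 0.5]]
P = [[[0.90, 0.10], [0.05, 0.95]],
     [[0.80, 0.20], [0.20, 0.80]]]
mu, sigma2 = [-0.5, 0.2], [4.0, 1.0]
classes, regimes, series = simulate_mhmm(5, 50, pi, lam, P, mu, sigma2, seed=1)
```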

Since $f(\mathbf{y}_{i};\boldsymbol{\varphi })$, defined by Eq. (4), is a mixture of densities across S latent classes and K regimes, it defines a flexible Gaussian mixture model that can accommodate deviations from normality in terms of skewness and kurtosis. For example, for two regimes (K = 2), the MHMM-S has 4S + 3 free parameters to be estimated, including S − 1 class sizes, S initial-regime probabilities, 2S transition probabilities, 2 conditional means, and 2 conditional variances.
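The parameter count can be checked with a few lines of Python. The formula for general K is an assumption that extends the K = 2 count given in the text by treating every multinomial distribution over K regimes as having K − 1 free parameters; the function name is hypothetical.

```python
def n_free_parameters(S, K=2):
    """Number of free parameters of an MHMM-S with S classes and K regimes."""
    class_sizes = S - 1               # mixing proportions sum to one
    initial = S * (K - 1)             # initial-regime probabilities per class
    transitions = S * K * (K - 1)     # each row of each transition matrix sums to one
    means, variances = K, K           # Gaussian parameters shared across classes
    return class_sizes + initial + transitions + means + variances

counts = {S: n_free_parameters(S) for S in range(1, 6)}
```

For K = 2 the function reduces to 4S + 3, so S = 2 gives 11 free parameters.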

Maximum likelihood (ML) estimation of the parameters of the MHMM-S involves maximizing the log-likelihood function $\ell(\boldsymbol{\varphi };\mathbf{y}) =\sum _{ i=1}^{n}\log f(\mathbf{y}_{i};\boldsymbol{\varphi })$, a problem that can be tackled by the Expectation-Maximization (EM) algorithm (Dempster et al. 1977). The E step computes the joint conditional distribution of the T + 1 latent variables given the data and the current provisional estimates of the model parameters. In the M step, standard complete data ML methods are used to update the unknown model parameters using an expanded data matrix with the estimated densities of the latent variables as weights. A direct implementation of the EM algorithm requires the computation of $S \cdot K^{T}$ entries in the E step ($S \cdot 2^{T}$ for K = 2), which makes it impractical or even impossible to apply with more than a few time points. However, for hidden Markov models, a special variant of the EM algorithm was proposed that is usually referred to as the forward-backward or Baum-Welch algorithm (Baum et al. 1970). The Baum-Welch algorithm circumvents the computation of this joint posterior distribution by making use of the conditional independencies implied by the model.
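The forward-backward idea can be sketched in plain Python. The code below covers only the E-step quantities for one series under fixed parameters (no M step), uses the common per-time-point rescaling to avoid numerical underflow, and combines the class-specific results into class posteriors and marginal regime posteriors. It is a sketch under these assumptions rather than the implementation used in the paper; all function names and parameter values are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_backward(y, lam, P, mu, sigma2):
    """Scaled forward-backward recursions for one series within one latent class.

    Returns (log-likelihood, gamma) with gamma[t][k] = P(Z_t = k | y, class).
    """
    T, K = len(y), len(mu)
    b = [[normal_pdf(y[t], mu[k], sigma2[k]) for k in range(K)] for t in range(T)]
    fwd, scale = [], []
    for t in range(T):  # forward pass, normalized at each time point
        if t == 0:
            a = [lam[k] * b[0][k] for k in range(K)]
        else:
            a = [sum(fwd[t - 1][j] * P[j][k] for j in range(K)) * b[t][k]
                 for k in range(K)]
        c = sum(a)
        scale.append(c)
        fwd.append([v / c for v in a])
    bwd = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):  # backward pass with the same scaling constants
        bwd[t] = [sum(P[j][k] * b[t + 1][k] * bwd[t + 1][k] for k in range(K))
                  / scale[t + 1] for j in range(K)]
    gamma = [[fwd[t][k] * bwd[t][k] for k in range(K)] for t in range(T)]
    return sum(math.log(c) for c in scale), gamma

def regime_posteriors(y, pi, lam, P, mu, sigma2):
    """Class posteriors P(W = w | y) and regime posteriors P(Z_t = k | y)."""
    S, T, K = len(pi), len(y), len(mu)
    results = [forward_backward(y, lam[w], P[w], mu, sigma2) for w in range(S)]
    log_joint = [math.log(pi[w]) + results[w][0] for w in range(S)]
    m = max(log_joint)  # subtract the maximum before exponentiating
    weights = [math.exp(v - m) for v in log_joint]
    class_post = [v / sum(weights) for v in weights]
    alpha = [[sum(class_post[w] * results[w][1][t][k] for w in range(S))
              for k in range(K)] for t in range(T)]
    return class_post, alpha

# Hypothetical parameters and data: S = 2 classes, K = 2 regimes.
pi = [0.6, 0.4]
lam = [[0.3, 0.7], [0.5, 0.5]]
P = [[[0.90, 0.10], [0.05, 0.95]],
     [[0.80, 0.20], [0.20, 0.80]]]
mu, sigma2 = [-2.0, 0.2], [4.0, 1.0]
y = [-2.5, -1.8, 0.3, 0.1, 0.4]
class_post, alpha = regime_posteriors(y, pi, lam, P, mu, sigma2)
```

Each class requires work proportional to T·K² instead of K^T, which is the saving that the conditional independencies make possible.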

An important modeling issue is the setting of S and K, the number of latent classes and regimes needed to capture the unobserved heterogeneity across time series. The selection of S and K is typically based on information statistics such as the Bayesian Information Criterion (BIC) (Schwarz 1978) defined as:

$$\displaystyle{ \mathit{BIC}_{S,K} = -2\ell_{S,K}(\hat{\boldsymbol{\varphi }};\mathbf{y}) + N_{S,K}\log n, }$$

(5)

where $N_{S,K}$ is the number of free parameters of the model and n is the sample size.

3 An Entropic-Based Uncertainty Measure

Classification uncertainty can be measured by the posterior probabilities $\hat{\alpha }_{\mathit{is}}$. An aggregate measure of classification uncertainty is the entropy. For LC models, the entropy is

$$\displaystyle{ \mathit{EN}(\boldsymbol{\alpha }) = -\sum _{i=1}^{n}\sum _{ s=1}^{S}\alpha _{ \mathit{is}}\log \alpha _{\mathit{is}}. }$$

(6)

Its normalized version has been used as a model selection criterion, indicating the level of separation of latent classes (Dias and Vermunt 2006, 2008). The relative entropy that scales the entropy to the interval [0,1] is given by

$$\displaystyle{ E = 1 -\mathit{EN}(\boldsymbol{\alpha })/(n\log S). }$$

(7)

For well-separated latent classes, E ≈ 1; for ill-separated latent classes, E ≈ 0. This provides a method for assessing the “fuzziness” of the partition of the data under the hypothesized model. The ML estimate of E, denoted $\hat{E}$, can be obtained by plugging the MLE $\hat{\alpha }_{\mathit{is}}$ of $\alpha _{\mathit{is}}$ into Eq. (7).

We propose an extension of the relative entropy to panel data, which we call the Entropy Regime Classification Measure (ERCM). For the time series $\mathbf{y}_{i}$, the ERCM is given by

$$\displaystyle{ \mathit{ERCM}_{i} = 1 + \frac{1} {T\log K}\sum _{t=1}^{T}\sum _{ k=1}^{K}\alpha _{ \mathit{itk}}\log \alpha _{\mathit{itk}}, }$$

(8)

where $\alpha _{\mathit{itk}} = P(Z_{\mathit{it}} = k\vert \mathbf{y}_{i})$ is the probability that time series i is in regime k at time t conditional on the observed data.
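As a minimal sketch, Eq. (8) can be computed directly from a T × K table of regime posteriors for one series. The function name `ercm` and the three example tables are hypothetical; terms with zero probability are skipped, which corresponds to the convention 0 ⋅ log 0 = 0.

```python
import math

def ercm(alpha_i):
    """Eq. (8) for one series, with alpha_i[t][k] = P(Z_it = k | y_i)."""
    T, K = len(alpha_i), len(alpha_i[0])
    total = sum(a * math.log(a) for row in alpha_i for a in row if a > 0.0)
    return 1.0 + total / (T * math.log(K))

# Hypothetical posterior tables with T = 3 time points and K = 2 regimes.
certain = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]    # regime known at every t
uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]    # no information at any t
mixed = [[0.9, 0.1], [0.6, 0.4], [0.99, 0.01]]

values = {"certain": ercm(certain), "uniform": ercm(uniform), "mixed": ercm(mixed)}
```

The fully certain table attains the upper bound of 1 and the uniform table the lower bound of 0; intermediate tables such as `mixed` fall strictly between the two.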

4 A Synthetic Example

To understand the behavior of ERCM, let us assume that the posterior probabilities for each time series i and time t – $\boldsymbol{\alpha }_{\mathit{it}} = \left (\alpha _{\mathit{it}1},\ldots,\alpha _{\mathit{itK}}\right )$ – are:

$$\displaystyle{ \boldsymbol{\alpha }_{\mathit{it}} = \left (\delta, \frac{1-\delta } {K - 1},\ldots, \frac{1-\delta } {K - 1}\right ), }$$

(9)

i.e., one regime has probability δ and the remaining K − 1 regimes share the rest equally, each with probability $(1-\delta )/(K - 1)$. Substituting the vector $\boldsymbol{\alpha }_{\mathit{it}}$ into Eq. (8), we obtain the expression below for ERCM:

$$\displaystyle{ \mathit{ERCM}_{K,\delta } = 1 + \frac{1} {\log K}\left [\delta \log \delta +(1-\delta )\log \left ( \frac{1-\delta } {K - 1}\right )\right ]. }$$
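The substitution behind this expression can be spelled out as a short worked step. Under Eq. (9) the inner sum over regimes is the same for every t, so the sum over t contributes a factor T that cancels with the 1/T in Eq. (8):

$$\displaystyle{ \sum _{k=1}^{K}\alpha _{\mathit{itk}}\log \alpha _{\mathit{itk}} =\delta \log \delta +(K - 1)\, \frac{1-\delta } {K - 1}\log \left ( \frac{1-\delta } {K - 1}\right ) =\delta \log \delta +(1-\delta )\log \left ( \frac{1-\delta } {K - 1}\right ). }$$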

Figure 1 depicts the ERCM as a function of K and δ, with values in {2, 3, 4, 5} and [0.5, 1.0], respectively. For example, for K = 2 and δ = 0.5, the classification uncertainty is at its maximum and ERCM = 0.

As expected, the relation between δ and ERCM is nonlinear. For the same value of δ, increasing K leads to an increase of ERCM: the value of $(1-\delta )/(K - 1)$ decreases as K increases, so the ‘right’ regime becomes clearer. At the other extreme, δ = 1 gives ERCM = 1, using the convention 0 ⋅ log 0 = 0.
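The behavior described in this section can be reproduced numerically with a short sketch of the closed-form expression. The function name and the grid of δ values are illustrative choices; the helper `xlogx` implements the convention 0 ⋅ log 0 = 0.

```python
import math

def xlogx(x):
    """x * log(x), with the convention 0 * log 0 = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def ercm_closed_form(K, delta):
    """ERCM when one regime has probability delta and the other K - 1 share 1 - delta."""
    rest = (1.0 - delta) / (K - 1)
    return 1.0 + (xlogx(delta) + (K - 1) * xlogx(rest)) / math.log(K)

# Evaluate on the ranges used in the text: K in {2, ..., 5}, delta in [0.5, 1.0].
deltas = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
grid = {K: [ercm_closed_form(K, d) for d in deltas] for K in (2, 3, 4, 5)}
```

Each entry `grid[K]` lists the ERCM along the δ grid, which makes both the nonlinearity in δ and the ordering across K easy to inspect.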

5 Application

Modeling the dynamics of stock market returns has been an important challenge in modern financial econometrics. The statistics and dynamics of correctly specified distributions provide more accurate and detailed input for financial asset pricing and risk management. For example, investors buy or sell securities according to their expectation of the market regime. In addition, portfolio risk reduction might be achieved by procedures that take into account the synchronization of market regimes. Therefore, regime switching uncertainty is key in financial modeling.

The data set used in this article consists of daily closing prices from 4 July 2007 (the start of the subprime crisis) to 11 July 2011 for twenty European stock market indexes drawn from the Datastream database and listed in Table 1. The series are expressed in US dollars. In total, we have 1,038 end-of-the-day observations per country. Let $P_{\mathit{it}}$ be the observed daily closing price of market i on day t, i = 1, …, 20 and t = 0, …, 1,037. The daily rates of return are defined as the log-returns multiplied by 100: $y_{\mathit{it}} = 100 \times \log (P_{\mathit{it}}/P_{i,t-1})$, t = 1, …, T, with T = 1,037.
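The return definition can be sketched in a few lines; the price values below are hypothetical and serve only to show the transformation.

```python
import math

def log_returns(prices):
    """Daily rates of return: 100 * log(P_t / P_{t-1}) for t = 1, ..., T."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 101.0, 99.0, 99.0]   # hypothetical closing prices
returns = log_returns(prices)
```

A series of 1,038 prices therefore yields T = 1,037 returns, since the first price only serves as the base of the first return.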

Table 1 Summary statistics

This period was a very harsh one for the European stock markets. Table 1 provides descriptive statistics of the time series, while Fig. 2 depicts the log-returns time series. The mean is not positive for any market in this period; however, the median is negative for only three markets: Finland, Greece, and Ireland. Stock markets show, instead, very different patterns of dispersion (Fig. 2); the largest standard deviations are found in Russia, Hungary, and Norway, while the smallest is in Switzerland (1.47). Return rate distributions are diverse in terms of skewness, and the kurtosis (which equals 0 for normal distributions) shows high positive values, indicating heavier tails and more peakedness than the normal distribution. The Jarque-Bera test rejects the null hypothesis of normality for all twenty stock markets.

Overall, these stock market features seem well suited to be modeled using MHMMs, as we want to model simultaneously the 20 time series with typical volatility clustering. Given the traditional dichotomization of financial markets into “bull” and “bear” markets, we assume K = 2. We estimated models characterized by different numbers of latent classes (S = 1, …, 5). To minimize the impact of local maxima, 300 different starting values for the parameters are used for each model. The model with two latent classes (S = 2) yielded the lowest BIC value ($\ell_{2}(\hat{\boldsymbol{\varphi }};\mathbf{y}) = -42{,}479.61$, $N_{2} = 11$, and $\mathit{BIC}_{2} = 84{,}992.17$).
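The reported BIC can be checked against Eq. (5) with a short sketch. The sample size is an inference rather than something stated explicitly here: taking n = 20, the number of time series, reproduces the reported value to two decimals.

```python
import math

def bic(loglik, n_params, n):
    """Eq. (5): BIC = -2 * log-likelihood + (number of free parameters) * log(n)."""
    return -2.0 * loglik + n_params * math.log(n)

# Reported values for S = 2, K = 2; n = 20 series is an assumption of this check.
bic_two_classes = bic(-42479.61, 11, 20)
```

Under this assumption the penalty term is 11 × log 20 ≈ 32.95, so the log-likelihood dominates the comparison across values of S.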

Table 2 provides information on the two regimes that were identified (K = 2); that is, the average proportion of markets in regime k over time and the mean and variance of the returns in regime k. The result is in line with the common dichotomization of financial markets into “bull” and “bear” markets. Consistently, the reported means show that one of the regimes is associated with positive returns (bull market) and the other with negative returns (bear market). The probability of being in the bear and bull regimes is 0.27 and 0.73, respectively. We would also like to emphasize that these results are coherent with the common acknowledgment of volatility asymmetry of financial markets. Volatility is likely to be higher when markets fall than when markets rise.

Table 2 Estimated marginal probabilities of the regimes and within Gaussian parameters

Table 3 reports the estimated probabilities of being in one of the regimes within each latent class. There is a clear distinction between these latent classes. Latent class 2 has the larger probability of being in the bear regime (0.41). For latent class 1, this probability is 0.24. Moreover, Table 3 provides another key insight from our analysis. It gives the transition probabilities between the two regimes for both latent classes. First, notice that both latent classes show regime persistence. Once a stock market jumps into a regime, it is likely to remain within the same regime for a while, which is coherent with stylized facts in financial markets. Second, latent class 1 shows a lower propensity to move from a bull regime to a bear regime (0.007) than latent class 2. Third, latent class 1 shows a higher probability of jumping from a bear to a bull regime than latent class 2.

Table 3 Characterization of the switching regimes

The sojourn time is the expected number of days that a stock market stays in a given regime. For regime k and latent class w it can be obtained as $1/(1 - p_{\mathit{kkw}})$. As reported in Table 3, stock markets in latent class 2 stay the shortest number of days in both bear and bull markets, and this class is consequently the less stable group.
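The sojourn-time formula can be illustrated with the one transition probability quoted in the text. With K = 2, a bull-to-bear probability of 0.007 in latent class 1 implies a probability of 0.993 of staying in the bull regime; the resulting figure is computed from this rounded probability, so it may differ slightly from the entry in Table 3.

```python
def sojourn_time(p_stay):
    """Expected number of periods spent in a regime: 1 / (1 - p_kk)."""
    return 1.0 / (1.0 - p_stay)

# Latent class 1, bull regime: p(bull -> bear) = 0.007, so p(bull -> bull) = 0.993.
bull_days_class_1 = sojourn_time(1.0 - 0.007)
```

This gives roughly 143 trading days, which is the mean of the geometric distribution of the time until the first exit from the regime.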

Table 4 summarizes the results related to the distribution of stock markets across latent classes. From the posterior class membership probabilities, that is, the probability of belonging to each of the latent classes conditional on the observed data, we found that only two stock markets are more likely to belong to latent class 2: Greece (1.00) and Hungary (0.99). For most of the stock markets the posterior probability is precise (the probability of the most likely latent class is one or very close to one), the exception being Ireland with probability 0.16 of belonging to latent class 2. Combining the classification information with the descriptive statistics in Table 1 shows that latent class 1 tends to contain countries with lower volatility than latent class 2.

Table 4 Estimated posterior probabilities and ERCM

Based on the posterior probabilities from the estimated model, Table 4 also reports the estimate of the ERCM for each stock market. Above 0.9 we have Switzerland (0.936), Germany (0.923), Belgium (0.922), Netherlands (0.914), Denmark (0.911), Czech Republic (0.910), and France (0.905) as the least uncertain stock markets. On the other hand, Greece (0.605) and Hungary (0.672) are the most uncertain stock markets, with ERCM below 0.7. The third most uncertain market is Ireland (0.777). Thus, the ERCM complements the values of posterior probabilities, providing a more detailed indicator of stock market uncertainty. These results are consistent with financial market stylized facts.

6 Conclusions

This paper provides an extension to the MHMM (Dias et al. 2008; Ramos et al. 2011) as a tool for measuring classification uncertainty in financial time series analysis. The proposed measure of uncertainty, the Entropy Regime Classification Measure (ERCM), reveals the amount of uncertainty in a given time series. In the analysis of a sample of twenty stock markets for a period of 1,038 days, the best model was the one with two latent classes, with distinct types of regime switching. We conclude that the European market uncertainty in this period of analysis ranged from a minimum for Switzerland (ERCM = 0.936) to a maximum for Greece (ERCM = 0.605).

This approach should be further explored, in particular in applications where more than two regimes are needed. For instance, in modeling electricity prices it is standard to use at least three regimes as a result of abnormality in the markets and spike prices.

References

Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41, 164–171.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39, 1–38.
Dias, J. G., & Vermunt, J. K. (2006). Bootstrap methods for measuring classification uncertainty in latent class analysis. In A. Rizzi & M. Vichi (Eds.), COMPSTAT 2006: proceedings in computational statistics, Rome (pp. 31–41). Heidelberg: Physica-Verlag.
Dias, J. G., & Vermunt, J. K. (2008). A bootstrap-based aggregate classifier for model-based clustering. Computational Statistics, 23(4), 643–659.
Dias, J. G., Vermunt, J. K., & Ramos, S. B. (2008). Heterogeneous hidden Markov models. In P. Brito (Ed.), COMPSTAT 2008: proceedings in computational statistics, Porto (pp. 373–380). Heidelberg: Physica-Verlag.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Ramos, S. B., Vermunt, J. K., & Dias, J. G. (2011). When markets fall down: Are emerging markets all the same? International Journal of Finance and Economics, 16(4), 324–338.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

Acknowledgements

The author would like to thank the Fundação para a Ciência e a Tecnologia (Portugal) for its financial support (PTDC/EGE-GES/103223/2008 and PEst-OE/EGE/UI0315/ 2011) and the three referees for their very valuable comments.

Author information

Authors and Affiliations

UNIDE, ISCTE – Instituto Universitário de Lisboa, BRU – Business Research Centre, Lisbon, Portugal
José G. Dias
Edifício ISCTE, Av. das Forças Armadas, 1649–026, Lisboa, Portugal
José G. Dias

Corresponding author

Correspondence to José G. Dias.

Editor information

Editors and Affiliations

University of Essex Department of Mathematical Sciences, Colchester, United Kingdom
Berthold Lausen
Ghent University Department of Marketing, Ghent, Belgium
Dirk Van den Poel
University of Marburg Databionics, FB 12, Marburg, Germany
Alfred Ultsch

About this paper

Cite this paper

Dias, J.G. (2013). Latent Class Models of Time Series Data: An Entropic-Based Uncertainty Measure. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_20

DOI: https://doi.org/10.1007/978-3-319-00035-0_20
Published: 16 July 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
