1 Introduction

Latent class models are a powerful tool for capturing unobserved heterogeneity in a wide range of social and behavioral science data (see, for example, McLachlan and Peel 2000 or Ramos et al. 2011).

Let \(\mathbf{y}_{i}\) denote a T-dimensional observation and \(D =\{ \mathbf{y}_{1},\ldots,\mathbf{y}_{n}\}\) a sample of size n. Each data point is assumed to be a realization of the random variable Y coming from an S-component mixture probability density function (p.d.f.)

$$\displaystyle{ f(\mathbf{y}_{i};\boldsymbol{\varphi }) =\sum _{ w=1}^{S}\pi _{ w}f_{w}(\mathbf{y}_{i};\boldsymbol{\theta }_{w}), }$$
(1)

where \(\pi _{w}\) are positive mixing proportions that sum to one, \(\boldsymbol{\theta }_{w}\) are the parameters defining the conditional distribution \(f_{w}(\mathbf{y}_{i};\boldsymbol{\theta }_{w})\) for latent class w, and \(\boldsymbol{\varphi }= \left \{\pi _{1},\ldots,\pi _{S-1},\boldsymbol{\theta }_{1},\ldots,\boldsymbol{\theta }_{S}\right \}\). Note that \(\pi _{S} = 1 -\sum _{w=1}^{S-1}\pi _{w}\). The log-likelihood function for an LC model, assuming i.i.d. observations, has the form \(\ell(\boldsymbol{\varphi };\mathbf{y}) =\sum _{ i=1}^{n}\log f(\mathbf{y}_{i};\boldsymbol{\varphi })\), which is straightforward to maximize with the EM algorithm (Dempster et al. 1977), yielding the maximum likelihood estimator (MLE).

From the ML parameter estimates of the LC model, Bayes’ theorem gives the posterior probability that observation i was generated by latent class w, conditional on its response pattern:

$$\displaystyle{ \hat{\alpha }_{iw} = \frac{\hat{\pi }_{w}f_{w}(\mathbf{y}_{i};\hat{\boldsymbol{\theta }}_{w})} {\sum _{v=1}^{S}\hat{\pi }_{v}f_{v}(\mathbf{y}_{i};\hat{\boldsymbol{\theta }}_{v})}, }$$
(2)

where \(\hat{\pi }_{w}\) and \(\hat{\boldsymbol{\theta }}_{w}\) are the ML estimates of \(\pi _{w}\) and \(\boldsymbol{\theta }_{w}\), respectively.

The \(\hat{\alpha }_{iw}\) values define a soft partitioning/clustering of the data set, since \(\sum _{w=1}^{S}\hat{\alpha }_{iw} = 1\) and \(\hat{\alpha }_{iw} \in [0,1]\). Let \(c_{i}\) represent the true cluster membership (the missing data) of observation i. Then, the optimal Bayes rule assigning observation i to the class with maximum posterior probability can be defined as follows:

$$\displaystyle{ \hat{c}_{i} =\arg \max _{w}\hat{\alpha }_{iw},i = 1,\ldots,n, }$$
(3)

where \(\hat{c}_{i}\) is the estimate of \(c_{i}\).
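Equations (2) and (3) can be illustrated with a minimal sketch. It assumes univariate Gaussian component densities (T = 1) and NumPy; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(y, mean, var):
    """Univariate normal density, evaluated elementwise."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def posterior_memberships(y, weights, means, variances):
    """Eq. (2): posterior probability of each latent class for each observation."""
    y = np.asarray(y, dtype=float)[:, None]               # shape (n, 1)
    joint = weights * gaussian_pdf(y, means, variances)   # shape (n, S)
    return joint / joint.sum(axis=1, keepdims=True)

def bayes_rule(alpha):
    """Eq. (3): assign each observation to the class with maximum posterior."""
    return alpha.argmax(axis=1)

# Hypothetical two-class example (illustrative values, not estimates).
weights = np.array([0.7, 0.3])
means = np.array([0.0, 4.0])
variances = np.array([1.0, 1.0])
y = np.array([-0.5, 2.0, 3.9])

alpha = posterior_memberships(y, weights, means, variances)  # soft partition
labels = bayes_rule(alpha)                                   # hard partition
```

The observation y = 2.0 lies midway between the two component means, so its posterior probabilities equal the mixing proportions (0.7, 0.3). The hard assignment discards exactly this kind of uncertainty.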

In this paper, we address the following question: How can we measure the level of uncertainty in the mapping from the [0, 1] soft partition to the {0,1} hard partition obtained by applying the optimal Bayes rule in time series analysis?

The paper is organized as follows: Sect. 2 presents the full mixture hidden Markov model used to obtain the posterior probabilities of the regimes; Sect. 3 introduces a measure of classification uncertainty; Sect. 4 studies its behavior using a synthetic example; and Sect. 5 illustrates the application of the procedure to a panel data set of twenty European stock markets. Section 6 gives concluding remarks.

2 Latent Class Modeling of Time Series Data

The mixture of hidden Markov models (MHMM-S) (Dias et al. 2008; Ramos et al. 2011) is defined by the density:

$$\displaystyle{ f(\mathbf{y}_{i};\boldsymbol{\varphi }) =\sum _{ w_{i}=1}^{S}\pi _{ w_{i}}\sum _{z_{i1}=1}^{K}\ldots \sum _{ z_{iT}=1}^{K}f(z_{ i1}\vert w_{i})\prod \limits _{t=2}^{T}f(z_{\mathit{ it}}\vert z_{i,t-1},w_{i})\prod \limits _{t=1}^{T}f(y_{\mathit{ it}}\vert z_{\mathit{it}}), }$$
(4)

where the conditional distribution within each latent class is given by a hidden Markov model with K regimes, and \(\boldsymbol{\varphi }\) is the vector containing all parameters in the model. Thus, we assume that within latent class w the sequence \(\{z_{i1},\ldots,z_{iT}\}\) follows a first-order Markov chain. Moreover, we assume that the observed value \(y_{\mathit{it}}\) at a particular time point depends only on the regime at this time point; i.e., conditionally on the regime \(z_{\mathit{it}}\), the response \(y_{\mathit{it}}\) is independent of the responses at other time points, which is often referred to as the local independence assumption. The first-order Markov assumption for the latent regime switching conditional on latent class membership w is not as restrictive as one may initially think: it clearly does not imply a first-order Markov structure for the responses \(y_{\mathit{it}}\). The standard hidden Markov model (HMM) (Baum et al. 1970) is a special case of the MHMM-S that is obtained by eliminating the time-constant latent variable w from the model, that is, by assuming that there is no unobserved heterogeneity across time series.

The MHMM-S is characterized by the following components:

  • \(\pi _{w}\) is the prior probability of belonging to the latent class w;

  • \(f(z_{i1}\vert w_{i})\) is the initial-regime probability; that is, the probability of having a particular initial regime conditional on belonging to latent class w with multinomial parameter \(\lambda _{\mathit{kw}} = P(Z_{i1} = k\vert W_{i} = w)\);

  • \(f(z_{\mathit{it}}\vert z_{i,t-1},w_{i})\) is a latent transition probability; that is, the probability of being in a particular regime at time point t conditional on the regime at time point t − 1 and latent class membership; assuming a time-homogeneous transition process, we have \(p_{\mathit{jkw}} = P(Z_{\mathit{it}} = k\vert Z_{i,t-1} = j,W_{i} = w)\) as the relevant multinomial parameter. Note that the MHMM-S allows that each latent class has its specific transition or regime-switching dynamics, whereas in a standard HMM it is assumed that all cases have the same transition probabilities;

  • \(f(y_{\mathit{it}}\vert z_{\mathit{it}})\), the probability density of having a particular value \(y_{\mathit{it}}\), conditional on the regime occupied at time point t, is assumed to have the form of a univariate normal (or Gaussian) density function. This distribution is characterized by the parameter vector \((\mu _{k},\sigma _{k}^{2})\) containing the mean (\(\mu _{k}\)) and variance (\(\sigma _{k}^{2}\)) for regime k. Note that these parameters are assumed invariant across latent classes, an assumption that may, however, be relaxed.
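The generative structure described by these components can be sketched as follows. This is a minimal illustration assuming NumPy; the function name, array layout, and all parameter values are hypothetical and do not correspond to any estimates reported in this paper.

```python
import numpy as np

def simulate_mhmm_series(T, class_probs, init_probs, trans_probs, means, variances, rng):
    """Draw one series: latent class w, a Markov regime path z, Gaussian responses y."""
    K = len(means)
    w = rng.choice(len(class_probs), p=class_probs)        # pi_w
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(K, p=init_probs[w])                   # lambda_kw
    for t in range(1, T):
        z[t] = rng.choice(K, p=trans_probs[w, z[t - 1]])    # p_jkw
    y = rng.normal(means[z], np.sqrt(variances[z]))         # f(y_it | z_it)
    return w, z, y

# Hypothetical parameters: S = 2 latent classes, K = 2 regimes.
rng = np.random.default_rng(0)
class_probs = np.array([0.8, 0.2])
init_probs = np.array([[0.7, 0.3], [0.5, 0.5]])             # rows index the class
trans_probs = np.array([[[0.99, 0.01], [0.05, 0.95]],       # class 0
                        [[0.95, 0.05], [0.10, 0.90]]])      # class 1
means = np.array([0.1, -0.2])
variances = np.array([1.0, 9.0])

w, z, y = simulate_mhmm_series(250, class_probs, init_probs, trans_probs,
                               means, variances, rng)
```

The class w is drawn once per series and stays fixed, whereas the regime z changes over time according to the class-specific transition matrix. The Gaussian parameters are shared across classes, as assumed above.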

Since \(f(\mathbf{y}_{i};\boldsymbol{\varphi })\), defined by Eq. (4), is a mixture of densities across S latent classes and K regimes, it defines a flexible Gaussian mixture model that can accommodate deviations from normality in terms of skewness and kurtosis. For example, for two regimes (K = 2), the MHMM-S has 4S + 3 free parameters to be estimated, including S − 1 class sizes, S initial-regime probabilities, 2S transition probabilities, 2 conditional means, and 2 conditional variances.
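The parameter count can be checked with a short sketch. The general-K formula below is our own extrapolation of the K = 2 breakdown given above (it is an assumption, not stated in the text), and the function name is hypothetical.

```python
def n_free_parameters(S, K):
    """Free parameters of an MHMM-S with class-invariant Gaussian regimes.

    Assumed breakdown: (S - 1) class sizes, S * (K - 1) initial-regime
    probabilities, S * K * (K - 1) transition probabilities, K means and
    K variances. For K = 2 this reduces to 4S + 3.
    """
    class_sizes = S - 1
    initial = S * (K - 1)
    transitions = S * K * (K - 1)
    gaussian = 2 * K
    return class_sizes + initial + transitions + gaussian

# Reproduce the K = 2 count for S = 1, ..., 5.
counts = {S: n_free_parameters(S, 2) for S in range(1, 6)}
```

For S = 2 and K = 2 the count is 11.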

Maximum likelihood (ML) estimation of the parameters of the MHMM-S involves maximizing the log-likelihood function \(\ell(\boldsymbol{\varphi };\mathbf{y}) =\sum _{ i=1}^{n}\log f(\mathbf{y}_{i};\boldsymbol{\varphi })\), a problem that can be tackled by the Expectation-Maximization (EM) algorithm (Dempster et al. 1977). The E step computes the joint conditional distribution of the T + 1 latent variables given the data and the current provisional estimates of the model parameters. In the M step, standard complete data ML methods are used to update the unknown model parameters using an expanded data matrix with the estimated densities of the latent variables as weights. In its standard form, the EM algorithm requires the computation of \(S \cdot 2^{T}\) entries in the E step (with two regimes), which makes it impractical or even impossible to apply with more than a few time points. For hidden Markov models, however, a special variant of the EM algorithm was proposed that is usually referred to as the forward-backward or Baum-Welch algorithm (Baum et al. 1970). The Baum-Welch algorithm circumvents the computation of this joint posterior distribution by making use of the conditional independencies implied by the model.
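The forward-backward idea can be illustrated with a minimal sketch for a single Gaussian HMM (that is, one latent class). It assumes NumPy and uses the common scaled form of the recursions. The function name and parameter values are hypothetical, and this is not the implementation used for the results reported here. For the MHMM-S, the same recursions would additionally have to be run within each latent class.

```python
import numpy as np

def forward_backward(y, init, trans, means, variances):
    """Scaled forward-backward recursions for one Gaussian HMM.

    Returns the T x K matrix of P(Z_t = k | y) and the log-likelihood.
    """
    y = np.asarray(y, dtype=float)
    T, K = len(y), len(init)
    # Emission densities f(y_t | z_t = k), shape (T, K).
    dens = (np.exp(-0.5 * (y[:, None] - means) ** 2 / variances)
            / np.sqrt(2.0 * np.pi * variances))

    # Forward pass: fwd[t] = P(Z_t | y_1, ..., y_t), rescaled at every step.
    fwd = np.empty((T, K))
    scale = np.empty(T)
    fwd[0] = init * dens[0]
    scale[0] = fwd[0].sum()
    fwd[0] /= scale[0]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ trans) * dens[t]
        scale[t] = fwd[t].sum()
        fwd[t] /= scale[t]

    # Backward pass, using the same scaling constants.
    bwd = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        bwd[t] = trans @ (dens[t + 1] * bwd[t + 1]) / scale[t + 1]

    return fwd * bwd, np.log(scale).sum()

# Hypothetical two-regime example.
init = np.array([0.6, 0.4])
trans = np.array([[0.95, 0.05], [0.10, 0.90]])
means = np.array([0.1, -0.3])
variances = np.array([1.0, 6.0])
y = np.array([0.2, -0.1, -4.0, 3.5])

post, loglik = forward_backward(y, init, trans, means, variances)
```

The cost of the two passes grows linearly in T, whereas enumerating all regime paths grows exponentially in T.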

An important modeling issue is the setting of S and K, the number of latent classes and regimes needed to capture the unobserved heterogeneity across time series. The selection of S and K is typically based on information statistics such as the Bayesian Information Criterion (BIC) (Schwarz 1978) defined as:

$$\displaystyle{ \mathit{BIC}_{S,K} = -2\ell_{S,K}(\hat{\boldsymbol{\varphi }};\mathbf{y}) + N_{S,K}\log n, }$$
(5)

where \(N_{S,K}\) is the number of free parameters of the model and n is the sample size.

3 An Entropic-Based Uncertainty Measure

Classification uncertainty can be measured by the posterior probabilities \(\hat{\alpha }_{\mathit{is}}\). An aggregate measure of classification uncertainty is the entropy. For LC models, the entropy is

$$\displaystyle{ \mathit{EN}(\boldsymbol{\alpha }) = -\sum _{i=1}^{n}\sum _{ s=1}^{S}\alpha _{ \mathit{is}}\log \alpha _{\mathit{is}}. }$$
(6)

Its normalized version has been used as a model selection criterion, indicating the level of separation of latent classes (Dias and Vermunt 2006, 2008). The relative entropy that scales the entropy to the interval [0,1] is given by

$$\displaystyle{ E = 1 -\mathit{EN}(\boldsymbol{\alpha })/(n\log S). }$$
(7)

For well-separated latent classes, E ≈ 1; for ill-separated latent classes, E ≈ 0. This provides a method for assessing the “fuzziness” of the partition of the data under the hypothesized model. The ML estimate of E, \(\hat{E}\), can be obtained by using the MLE \(\hat{\alpha }_{\mathit{is}}\) of \(\alpha _{\mathit{is}}\) in Eq. (7).
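As a small worked illustration (the values are hypothetical and not taken from the data), suppose that S = 2 and that every observation has posterior probabilities (0.9, 0.1). Using natural logarithms,

$$\displaystyle{ \mathit{EN}(\boldsymbol{\alpha }) = -n\left (0.9\log 0.9 + 0.1\log 0.1\right ) \approx 0.325\,n,\qquad E \approx 1 - \frac{0.325} {\log 2} \approx 0.531, }$$

so even a partition in which every observation is classified with probability 0.9 leaves E well below one.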

We propose an extension of the relative entropy to panel data, which we call the Entropy Regime Classification Measure (ERCM). For the time series \(\mathbf{y}_{i}\), the ERCM is given by

$$\displaystyle\begin{array}{rcl} \mathit{ERCM}_{i} = 1 + \frac{1} {T\log K}\sum _{t=1}^{T}\sum _{ k=1}^{K}\alpha _{ \mathit{itk}}\log (\alpha _{\mathit{itk}}),& &{}\end{array}$$
(8)

where \(\alpha _{\mathit{itk}} = P(Z_{\mathit{it}} = k\vert \mathbf{y}_{i})\) is the probability that time series i is in regime k at time t conditional on the observed data.
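Equation (8) can be computed directly from the T × K matrix of posterior regime probabilities of one series. The sketch below assumes NumPy and adopts the convention 0 ⋅ log 0 = 0. The function name and the example matrix are hypothetical.

```python
import numpy as np

def ercm(alpha_i):
    """Eq. (8): ERCM of one series from its T x K posterior regime probabilities."""
    alpha_i = np.asarray(alpha_i, dtype=float)
    T, K = alpha_i.shape
    # Replace zeros by one inside the log so that 0 * log 0 evaluates to 0.
    safe = np.where(alpha_i > 0, alpha_i, 1.0)
    return 1.0 + (alpha_i * np.log(safe)).sum() / (T * np.log(K))

# Hypothetical posteriors for T = 4 time points and K = 2 regimes.
alpha_i = np.array([[1.0, 0.0],
                    [0.9, 0.1],
                    [0.5, 0.5],
                    [0.2, 0.8]])
value = ercm(alpha_i)
```

A series whose regime is known with certainty at every time point has ERCM equal to one, and a series with uniform posterior probabilities at every time point has ERCM equal to zero.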

4 A Synthetic Example

To understand the behavior of ERCM, let us assume that the posterior probabilities for each time series i and time t, \(\boldsymbol{\alpha }_{\mathit{it}} = \left (\alpha _{\mathit{it}1},\ldots,\alpha _{\mathit{itK}}\right )\), are:

$$\displaystyle\begin{array}{rcl} \boldsymbol{\alpha }_{\mathit{it}} = \left (\delta, \frac{1-\delta } {K - 1},\ldots, \frac{1-\delta } {K - 1}\right ),& &{}\end{array}$$
(9)

i.e., one regime has probability δ and the remaining K − 1 regimes share the identical probability \((1-\delta )/(K - 1)\). Substituting the vector \(\boldsymbol{\alpha }_{\mathit{it}}\) into Eq. (8), we obtain the expression below for ERCM:

$$\displaystyle\begin{array}{rcl} \mathit{ERCM}_{K,\delta } = 1 + \frac{1} {\log K}\left [\delta \log \delta +(1-\delta )\log \left ( \frac{1-\delta } {K - 1}\right )\right ].& & {}\\ \end{array}$$

Figure 1 depicts the ERCM as a function of K and δ, with K ∈ {2, 3, 4, 5} and δ ∈ [0.5, 1.0]. For example, for K = 2 and δ = 0.5, the classification uncertainty is maximal and ERCM = 0.

Fig. 1 ERCM as a function of K and δ

As expected, the relation between δ and ERCM is nonlinear. For the same value of δ, increasing K leads to an increase in ERCM: the value of \((1-\delta )/(K - 1)\) decreases as K increases, so the ‘right’ regime becomes clearer. At the opposite extreme, δ = 1 gives ERCM = 1, using the convention 0 ⋅ log 0 = 0.
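The closed-form expression for the synthetic example is easy to evaluate numerically. The following sketch uses only the Python standard library, and the function name is hypothetical.

```python
import math

def ercm_equal_split(K, delta):
    """ERCM when one regime has probability delta and the other K - 1 share 1 - delta."""
    def xlogx(x):
        # Convention: 0 * log 0 = 0.
        return x * math.log(x) if x > 0 else 0.0

    rest = (1.0 - delta) / (K - 1)
    # (K - 1) * rest * log(rest) equals (1 - delta) * log((1 - delta) / (K - 1)).
    return 1.0 + (xlogx(delta) + (K - 1) * xlogx(rest)) / math.log(K)

# Evaluate on the grid of K used in Fig. 1 for a few values of delta.
for K in (2, 3, 4, 5):
    print(K, [round(ercm_equal_split(K, d), 3) for d in (0.5, 0.75, 0.9, 1.0)])
```

For a fixed δ below one, the printed values increase with K, in line with the discussion above.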

5 Application

Modeling the dynamics of stock market returns has been an important challenge in modern financial econometrics. The statistics and dynamics of correctly specified distributions provide more accurate and detailed input for financial asset pricing and risk management. For example, investors buy or sell securities according to their expectation of the market regime. In addition, portfolio risk reduction might be achieved by procedures that take into account the synchronization of market regimes. Therefore, regime switching uncertainty is key in financial modeling.

The data used in this article are daily closing prices from 4 July 2007 (the start of the subprime crisis) to 11 July 2011 for twenty European stock market indexes drawn from the Datastream database and listed in Table 1. The series are expressed in US dollars. In total, we have 1,038 end-of-the-day observations per country. Let \(P_{\mathit{it}}\) be the observed daily closing price of market i on day t, i = 1, …, 20 and t = 0, …, 1,037. The daily rates of return are defined as the log-returns multiplied by 100: \(y_{\mathit{it}} = 100 \times \log (P_{\mathit{it}}/P_{i,t-1})\), t = 1, …, T, with T = 1,037.
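The return definition can be sketched as follows, assuming NumPy. The function name and the price values are hypothetical and are not the Datastream series.

```python
import numpy as np

def log_returns_percent(prices):
    """y_t = 100 * log(P_t / P_{t-1}) for t = 1, ..., T, along the last axis."""
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices), axis=-1)

# Hypothetical closing prices for one index (T + 1 = 4 prices give T = 3 returns).
prices = np.array([100.0, 101.0, 99.0, 102.0])
returns = log_returns_percent(prices)
```

A price series with 1,038 observations therefore yields T = 1,037 returns.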

Table 1 Summary statistics

This period was a very harsh one for the European stock markets. Table 1 provides descriptive statistics of the time series, while Fig. 2 depicts the log-return time series. It can be seen that the mean is not positive for all markets in this period; however, the median is negative for only three markets (Finland, Greece, and Ireland). Stock markets show, instead, very different patterns of dispersion (Fig. 2); the largest standard deviations are found in Russia, Hungary, and Norway, while the smallest is in Switzerland (1.47). Return rate distributions are diverse in terms of skewness, and the kurtosis (which equals 0 for normal distributions) shows high positive values, indicating heavier tails and more peakedness than the normal distribution. The Jarque-Bera test rejects the null hypothesis of normality for all twenty stock markets.

Fig. 2 Log-return time series for 20 European stock markets

Overall, these stock market features seem well suited to be modeled using MHMMs, as we want to model simultaneously the 20 time series with their typical volatility clustering. Given the traditional dichotomization of financial markets into “bull” and “bear” markets, we assume K = 2. We estimated models with different numbers of latent classes (S = 1, …, 5). To minimize the impact of local maxima, 300 different starting values for the parameters are used for each model. The model with two latent classes (S = 2) yielded the lowest BIC value (\(\ell_{2}(\hat{\boldsymbol{\varphi }};\mathbf{y}) = -42{,}479.61\), \(N_{2} = 11\), and \(\mathit{BIC}_{2} = 84{,}992.17\)).
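The reported BIC can be checked against Eq. (5) with a short sketch using only the Python standard library. The function name is hypothetical, and taking n = 20 (the number of time series) as the sample size is an assumption on our part that happens to reproduce the reported value.

```python
import math

def bic(loglik, n_params, n):
    """Eq. (5): BIC = -2 * log-likelihood + N * log(n)."""
    return -2.0 * loglik + n_params * math.log(n)

# Values reported for the model with S = 2 and K = 2.
# n = 20 is an assumption (number of series used as the sample size).
bic_2 = bic(loglik=-42479.61, n_params=11, n=20)
print(round(bic_2, 2))
```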

Table 2 provides information on the two regimes that were identified (K = 2); that is, the average proportion of markets in regime k over time and the mean and variance of the returns in regime k. The result is in line with the common dichotomization of financial markets into “bull” and “bear” markets. Consistently, the reported means show that one of the regimes is associated with positive returns (bull market) and the other with negative returns (bear market). The probability of being in the bear and bull regimes is 0.27 and 0.73, respectively. We would also like to emphasize that these results are coherent with the common acknowledgment of volatility asymmetry of financial markets. Volatility is likely to be higher when markets fall than when markets rise.

Table 2 Estimated marginal probabilities of the regimes and within Gaussian parameters

Table 3 reports the estimated probabilities of being in one of the regimes within each latent class. There is a clear distinction between these latent classes. Latent class 2 has the larger probability of being in the bear regime (0.41); for latent class 1, this probability is 0.24. Moreover, Table 3 provides another key insight from our analysis: it gives the transition probabilities between the two regimes for both latent classes. First, notice that both latent classes show regime persistence. Once a stock market jumps into a regime, it is likely to remain within the same regime for a while, which is coherent with stylized facts in financial markets. Second, latent class 1 shows a lower propensity to move from a bull regime to a bear regime (0.007) than latent class 2. Third, latent class 1 shows a higher probability of jumping from a bear to a bull regime than latent class 2.

Table 3 Characterization of the switching regimes

The sojourn time is the expected number of days that a stock market stays in a given regime. For regime k and latent class w, it is given by \(1/(1 - p_{\mathit{kkw}})\). As reported in Table 3, stock markets in latent class 2 stay the shortest number of days in both bear and bull regimes and are consequently the less stable group.
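The sojourn-time formula can be illustrated with the one transition probability quoted in the text. With K = 2, a bull-to-bear probability of 0.007 in latent class 1 implies a probability of 0.993 of staying in the bull regime. The function name below is hypothetical.

```python
def sojourn_time(p_stay):
    """Expected number of days in a regime, given the probability of staying in it."""
    return 1.0 / (1.0 - p_stay)

# Latent class 1, bull regime: the probability of leaving is 0.007 (K = 2).
bull_days_class_1 = sojourn_time(1.0 - 0.007)
print(round(bull_days_class_1, 1))
```

This corresponds to roughly 143 trading days in the bull regime for latent class 1.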

Table 4 summarizes the results related to the distribution of stock markets across latent classes. From the posterior class membership probabilities (the probability of belonging to each of the latent classes conditional on the observed data), we found that only two stock markets are more likely to belong to latent class 2: Greece (1.00) and Hungary (0.99). For most of the stock markets the classification is precise (the probability of the most likely latent class is one or very close to one), the exception being Ireland, with probability 0.16 of belonging to latent class 2. Combining the classification information with the descriptive statistics in Table 1 shows that latent class 1 tends to contain countries with lower volatility than latent class 2.

Table 4 Estimated posterior probabilities and ERCM

Based on the posterior probabilities from the estimated model, Table 4 also reports the estimate of the ERCM for each stock market. With ERCM above 0.9, Switzerland (0.936), Germany (0.923), Belgium (0.922), Netherlands (0.914), Denmark (0.9111), Czech Republic (0.910), and France (0.905) are the least uncertain stock markets. On the other hand, Greece (0.605) and Hungary (0.672), with ERCM below 0.7, are the most uncertain stock markets. The third most uncertain market is Ireland (0.777). Thus, the ERCM complements the posterior probabilities, providing a more detailed indicator of stock market uncertainty. These results are consistent with financial market stylized facts.

6 Conclusions

This paper provides an extension to the MHMM (Dias et al. 2008; Ramos et al. 2011) as a tool for measuring classification uncertainty in financial time series analysis. The proposed measure of uncertainty, the Entropy Regime Classification Measure (ERCM), reveals the amount of uncertainty in a given time series. In the analysis of a sample of twenty stock markets for a period of 1,038 days, the best model was the one with two latent classes, with distinct types of regime switching. We conclude that the European market uncertainty in this period of analysis ranged from a minimum for Switzerland (ERCM = 0.936) to a maximum for Greece (ERCM = 0.605).

This approach should be explored further, namely in applications where more than two regimes are needed. For instance, in modeling electricity prices it is standard to use at least three regimes as a result of abnormality in the markets and spike prices.