Definition

Hidden Markov models (HMMs) form a class of statistical models in which the system being modeled is assumed to be a Markov process with hidden states. From observed output sequences generated by the Markov process, both the output emission probabilities from the hidden states and the transition probabilities between the hidden states can be estimated with dynamic programming methods. The estimated model parameters can then be used for various sequence analysis purposes.

Motivation and Background

The states of a regular Markov model, named after Russian mathematician Andrey Markov (1865–1922), are directly observable; hence its only parameters are the state transition probabilities. In many real-world cases, however, the states of the system that one wants to model are not directly observable. For instance, in speech recognition, the audio is the observable stream, while the goal is to discover the phonemes (the categorical elements of speech) that emitted the audio. Hidden Markov models offer one type of architecture to estimate hidden states through indirect means. Dynamic programming methods have been developed that can estimate both the output emission probabilities and the transition probabilities between the hidden states, either from observations of output sequences only (an unsupervised learning setting) or from pairs of aligned output sequences and gold-standard hidden sequences (a supervised learning setting).

Structure of the Learning System

Figure 1 displays the general architecture of a hidden Markov model. Each circle represents a variable x_i or y_i occurring at time i; x_t is the discrete value of the hidden variable at time t. The variable y_t is the output variable observed at the same time t, said to be emitted by x_t. Arrows denote conditional dependencies. Any hidden variable is only dependent on its immediate predecessor; thus, the value of x_t is only dependent on that of x_{t-1} occurring at time t − 1. This deliberate simplicity is referred to as the Markov assumption. Analogously, observed variables such as y_t are conditionally dependent only on the hidden variables occurring at the same time t, i.e., x_t in this case.
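
Written out, these two assumptions mean that the joint probability of a hidden state sequence x_1, ..., x_n and the output sequence y_1, ..., y_n it emits factorizes into initial, transition, and emission terms (in LaTeX notation, ignoring the dedicated start and end states introduced below):

P(x_1, \dots, x_n, y_1, \dots, y_n) = P(x_1)\, P(y_1 \mid x_1) \prod_{t=2}^{n} P(x_t \mid x_{t-1})\, P(y_t \mid x_t)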

Hidden Markov Models, Fig. 1: Architecture of a hidden Markov model

Typically, a start state x_0 is used as the first hidden state (not conditioned by any previous state), as well as an end state x_{n+1} that closes the hidden state sequence of length n. Start and end states usually emit meta-symbols signifying the “start” and “end” of the sequence.

An important constraint on the data that can in principle be modeled in a hidden Markov model is that the hidden and output sequences need to be discrete, aligned (i.e., one y_t for each x_t), and of equal length. Sequence pairs that do not conform to these constraints need to be discretized (e.g., in equal-length time slices) or aligned where necessary.
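
As a concrete illustration of what such a discretized, aligned representation and the associated parameters look like, here is a minimal sketch in Python with NumPy; the two states, three output symbols, and all probability values are invented for illustration only:

```python
import numpy as np

# Hypothetical two-state model emitting three discrete output symbols.
states = ["Rainy", "Sunny"]            # hidden states x_t
symbols = ["walk", "shop", "clean"]    # observable outputs y_t

pi = np.array([0.6, 0.4])              # initial distribution P(x_1)

A = np.array([[0.7, 0.3],              # transitions A[i, j] = P(x_t = j | x_{t-1} = i)
              [0.4, 0.6]])

B = np.array([[0.1, 0.4, 0.5],         # emissions B[i, k] = P(y_t = symbols[k] | x_t = states[i])
              [0.6, 0.3, 0.1]])

obs = [0, 2, 1]                        # an observed sequence as symbol indices: walk, clean, shop
```

The same toy parameterization (pi, A, B) is reused in the sketches that follow.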

Training and Using Hidden Markov Models

Hidden Markov models can be trained in both an unsupervised and a supervised fashion. First, when only observed output sequences are available for training, the model’s conditional probabilities can be estimated from this indirect evidence with the Baum-Welch algorithm (Baum et al. 1970), a form of unsupervised learning and an instantiation of the expectation-maximization (EM) algorithm (Dempster et al. 1977).
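
A minimal sketch of this re-estimation loop, assuming the discrete parameterization (pi, A, B) from the earlier example and a single observation sequence of symbol indices; it omits the numerical scaling and multi-sequence handling that practical implementations need, so it is illustrative rather than a reference implementation:

```python
import numpy as np

def baum_welch(obs, A, B, pi, n_iter=10):
    """Unsupervised re-estimation of (A, B, pi) from one observation sequence."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) probabilities.
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()

        # Expected state occupancies (gamma) and expected transitions (xi).
        gamma = alpha * beta / likelihood
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

        # M-step: re-estimate parameters from the expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B_new = np.zeros_like(B)
        for k in range(B.shape[1]):
            B_new[:, k] = gamma[obs == k].sum(axis=0)
        B = B_new / gamma.sum(axis=0)[:, None]
    return A, B, pi
```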

When instead aligned sequences of gold-standard hidden variables and output variables are given as supervised training data, both the output emission probabilities and the state transition probabilities can be straightforwardly estimated from frequencies of co-occurrence in the training data.
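
A sketch of this relative-frequency estimation, assuming aligned lists of gold-standard state sequences and output sequences (plain maximum-likelihood counts; practical systems usually add smoothing for unseen events):

```python
from collections import Counter

def estimate_supervised(state_seqs, output_seqs):
    """Estimate transition and emission probabilities from aligned
    gold-standard (state sequence, output sequence) pairs by counting."""
    trans, emit = Counter(), Counter()
    from_count, state_count = Counter(), Counter()
    for states, outputs in zip(state_seqs, output_seqs):
        for prev, nxt in zip(states, states[1:]):   # transition counts
            trans[(prev, nxt)] += 1
            from_count[prev] += 1
        for state, out in zip(states, outputs):     # emission counts
            emit[(state, out)] += 1
            state_count[state] += 1
    p_trans = {pair: n / from_count[pair[0]] for pair, n in trans.items()}
    p_emit = {pair: n / state_count[pair[0]] for pair, n in emit.items()}
    return p_trans, p_emit
```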

Once a model is trained, the Viterbi algorithm (Viterbi 1967) can be used to find the most likely sequence of hidden states that could have generated a particular (test) output sequence.
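
A compact sketch of Viterbi decoding under the same discrete parameterization as above (probabilities are multiplied directly for brevity; practical implementations work in log space to avoid underflow):

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Return the most likely hidden state index sequence for `obs`."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))             # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)    # back-pointers to the best predecessor
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j]: come from state i, move to state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Trace back from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

With the toy parameters defined earlier, viterbi(obs, A, B, pi) returns a list of indices into states, giving the most probable hidden state sequence for the observed symbol sequence.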

Applications of Hidden Markov Models

Hidden Markov models are known for their successful application in pattern recognition tasks such as speech recognition (Rabiner 1989) and DNA sequencing (Kulp et al. 1996), but also in sequential pattern analysis tasks such as part-of-speech tagging (Church 1988).

Their introduction in speech recognition in the 1970s (Jelinek 1998) led the way toward the introduction of stochastic methods in general in the field of natural language processing in the 1980s and 1990s (Charniak 1993; Manning and Schütze 1999) and into text mining and information extraction in the late 1990s and onward (Freitag and McCallum 1999). In a similar way, hidden Markov models started to be used in DNA pattern recognition in the mid-1980s and have gained widespread usage throughout bioinformatics since (Durbin et al. 1998; Burge and Karlin 1997).

Programs

Many implementations of hidden Markov models exist. Three noteworthy packages are the following:

  • UMDHMM by Tapas Kanungo. Implements the forward-backward, Viterbi, and Baum-Welch algorithms (Kanungo 1999)

  • JAHMM by Jean-Marc François. A versatile Java implementation of algorithms related to hidden Markov models (François 2006)

  • HMMER by Sean Eddy. An implementation of profile HMM software for protein sequence analysis (Eddy 2007)

Cross-References