Abstract
Starting from the concept of the regular Markov model, we introduce the hidden Markov model and the issue of estimating the output emission probabilities and the transition probabilities between hidden states, for which the Baum-Welch algorithm is the standard choice. We mention typical applications in which hidden Markov models play a central role and list a number of popular implementations.
Definition
Hidden Markov models (HMMs) form a class of statistical models in which the system being modeled is assumed to be a Markov process with hidden states. From observed output sequences generated by the Markov process, both the output emission probabilities from the hidden states and the transition probabilities between the hidden states can be estimated with dynamic programming methods. The estimated model parameters can then be used for various sequence analysis purposes.
Motivation and Background
The states of a regular Markov model, named after Russian mathematician Andrey Markov (1865–1922), are directly observable; hence its only parameters are the state transition probabilities. In many real-world cases, however, the states of the system that one wants to model are not directly observable. For instance, in speech recognition, the audio is the observable stream, while the goal is to discover the phonemes (the categorical elements of speech) that emitted the audio. Hidden Markov models offer one type of architecture to estimate hidden states through indirect means. Dynamic programming methods have been developed that can estimate both the output emission probabilities and the transition probabilities between the hidden states, either from observations of output sequences only (an unsupervised learning setting) or from pairs of aligned output sequences and gold-standard hidden sequences (a supervised learning setting).
Structure of the Learning System
Figure 1 displays the general architecture of a hidden Markov model. Each circle represents a variable x_i or y_i occurring at time i; x_t is the discrete value of the hidden variable at time t. The variable y_t is the output variable observed at the same time t, said to be emitted by x_t. Arrows denote conditional dependencies. Any hidden variable depends only on its immediate predecessor; thus, the value of x_t depends only on that of x_{t−1} occurring at time t − 1. This deliberate simplicity is referred to as the Markov assumption. Analogously, an observed variable such as y_t is conditionally dependent only on the hidden variable occurring at the same time t, i.e., x_t in this case.
Typically, a start state x_0 is used as the first hidden state (not conditioned by any previous state), as well as an end state x_{n+1} that closes the hidden state sequence of length n. Start and end states usually emit meta-symbols signifying the “start” and “end” of the sequence.
An important constraint on the data that can in principle be modeled in a hidden Markov model is that the hidden and output sequences need to be discrete, aligned (i.e., one y_t for each x_t), and of equal length. Sequence pairs that do not conform to these constraints need to be discretized (e.g., in equal-length time slices) or aligned where necessary.
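As a concrete illustration, a discrete HMM over such aligned sequences is fully specified by three probability tables. The following sketch (with made-up numbers; the names pi, A, and B are our own conventions, not from any particular library) shows the parameterization and the joint probability of an aligned hidden/output sequence pair under the Markov assumption; the start/end meta-states are omitted for brevity.

```python
import numpy as np

# pi[i]: probability of starting in hidden state i
pi = np.array([0.6, 0.4])
# A[i, j]: transition probability from hidden state i to hidden state j
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# B[i, k]: probability that hidden state i emits output symbol k
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Each row of A and B is a probability distribution and must sum to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)

def joint_prob(states, outputs):
    """P(x, y): joint probability of an aligned hidden/output pair,
    factored according to the conditional dependencies in Fig. 1."""
    p = pi[states[0]] * B[states[0], outputs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], outputs[t]]
    return p
```

For example, `joint_prob([0, 1], [0, 2])` multiplies exactly four factors: the start probability of state 0, its emission of symbol 0, the transition 0 → 1, and state 1's emission of symbol 2.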
Training and Using Hidden Markov Models
Hidden Markov models can be trained both in an unsupervised and a supervised fashion. First, when only observed output sequences are available for training, the model’s conditional probabilities from this indirect evidence can be estimated through the Baum-Welch algorithm (Baum et al. 1970), a form of unsupervised learning, and an instantiation of the expectation-maximization (EM) algorithm (Dempster et al. 1977).
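Baum-Welch alternates an expectation step, which computes expected state-occupancy and transition counts via the forward-backward procedure, with a maximization step that re-normalizes those counts into new parameter estimates. As a minimal sketch of the core computation (not the full algorithm), the forward pass below computes alpha[t, i] = P(y_1..y_t, x_t = i); summing the final row gives the likelihood of the output sequence, the quantity that Baum-Welch iteratively increases. The toy parameters are hypothetical.

```python
import numpy as np

def forward(pi, A, B, outputs):
    """Forward pass over an output sequence.
    alpha[t, i] = P(y_1..y_t, x_t = i); alpha[-1].sum() = P(y)."""
    T, N = len(outputs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, outputs[0]]          # start probabilities times emissions
    for t in range(1, T):
        # Sum over all predecessor states, then weight by the emission.
        alpha[t] = (alpha[t - 1] @ A) * B[:, outputs[t]]
    return alpha

# Hypothetical two-state, two-symbol model for illustration:
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
likelihood = forward(pi, A, B, [0, 1, 0])[-1].sum()
```

In practice the forward pass is paired with a symmetric backward pass, and long sequences require scaling or log-space arithmetic to avoid numerical underflow.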
When instead aligned sequences of gold-standard hidden variables and output variables are given as supervised training data, both the output emission probabilities and the state transition probabilities can be straightforwardly estimated from frequencies of co-occurrence in the training data.
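The supervised case reduces to relative-frequency estimation. A minimal sketch (function and variable names are ours) that counts co-occurrences in aligned training pairs and normalizes them into emission and transition probabilities:

```python
from collections import Counter

def supervised_estimates(pairs):
    """Estimate emission and transition probabilities by relative
    frequency from aligned (hidden states, outputs) training sequences."""
    trans, emit = Counter(), Counter()
    for states, outputs in pairs:
        for s, o in zip(states, outputs):
            emit[(s, o)] += 1                 # count state-output co-occurrences
        for s1, s2 in zip(states, states[1:]):
            trans[(s1, s2)] += 1              # count state-to-state transitions
    # Normalize counts per source state into conditional probabilities.
    emit_tot, trans_tot = Counter(), Counter()
    for (s, _), c in emit.items():
        emit_tot[s] += c
    for (s1, _), c in trans.items():
        trans_tot[s1] += c
    P_emit = {(s, o): c / emit_tot[s] for (s, o), c in emit.items()}
    P_trans = {(s1, s2): c / trans_tot[s1] for (s1, s2), c in trans.items()}
    return P_trans, P_emit

P_trans, P_emit = supervised_estimates([(("A", "B", "A"), ("x", "y", "x"))])
```

Real implementations additionally smooth these estimates, since unseen state-output pairs would otherwise receive zero probability.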
Once trained, it is possible to find the most likely sequence of hidden states that could have generated a particular (test) output sequence by the Viterbi algorithm (Viterbi 1967).
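The Viterbi algorithm replaces the sum in the forward pass with a maximization, keeping backpointers so the single best hidden path can be recovered. A compact sketch over the same kind of toy parameters used above (all numbers hypothetical):

```python
import numpy as np

def viterbi(pi, A, B, outputs):
    """Return the most likely hidden state sequence for the outputs."""
    T, N = len(outputs), len(pi)
    delta = np.zeros((T, N))        # best path probability ending in state i at time t
    psi = np.zeros((T, N), int)     # backpointer to the best previous state
    delta[0] = pi * B[:, outputs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A     # candidate scores for each predecessor
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, outputs[t]]
    # Trace the backpointers from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
best_path = viterbi(pi, A, B, [0, 0, 1])
```

Here state 0 strongly prefers emitting symbol 0 and state 1 symbol 1, so the decoded path tracks the observed symbols.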
Applications of Hidden Markov Models
Hidden Markov models are known for their successful application in pattern recognition tasks such as speech recognition (Rabiner 1989) and gene finding in DNA (Kulp et al. 1996), but also in sequential pattern analysis tasks such as part-of-speech tagging (Church 1988).
Their introduction in speech recognition in the 1970s (Jelinek 1998) led the way toward the introduction of stochastic methods in general in the field of natural language processing in the 1980s and 1990s (Charniak 1993; Manning and Schütze 1999) and into text mining and information extraction in the late 1990s and onward (Freitag and McCallum 1999). In a similar way, hidden Markov models started to be used in DNA pattern recognition in the mid-1980s and have gained widespread usage throughout bioinformatics since (Durbin et al. 1998; Burge and Karlin 1997).
Programs
Many implementations of hidden Markov models exist. Three noteworthy packages are the following:
- UMDHMM by Tapas Kanungo: implements the forward-backward, Viterbi, and Baum-Welch algorithms (Kanungo 1999)
- JAHMM by Jean-Marc François: a versatile Java implementation of algorithms related to hidden Markov models (François 2006)
- HMMER by Sean Eddy: an implementation of profile HMM software for protein sequence analysis (Eddy 2007)
Recommended Reading
Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1):164–171
Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94
Charniak E (1993) Statistical language learning. The MIT Press, Cambridge, MA
Church KW (1988) A stochastic parts program and noun phrase parser for unrestricted text. In: Proceedings of the second conference on applied natural language processing, Austin, pp 136–143
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
Eddy S (2007) HMMER. http://hmmer.org/
François J-M (2006) JAHMM. https://code.google.com/p/jahmm/
Freitag D, McCallum A (1999) Information extraction with HMM structures learned by stochastic optimization. In: Proceedings of the national conference on artificial intelligence. The MIT Press, Cambridge, MA, pp 584–589
Jelinek F (1998) Statistical methods for speech recognition. The MIT Press, Cambridge, MA
Kanungo T (1999) UMDHMM: hidden Markov model toolkit. In: Kornai A (ed) Extended finite state models of language. Cambridge University Press, Cambridge. http://www.kanungo.us/software/software.html
Kulp D, Haussler D, Reese MG, Eeckman FH (1996) A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol 4:134–142
Manning C, Schütze H (1999) Foundations of statistical natural language processing. The MIT Press, Cambridge, MA
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2):260–269
© 2017 Springer Science+Business Media New York
van den Bosch, A. (2017). Hidden Markov Models. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_124