1 Introduction

A series of papers by Leonard E. Baum and other researchers in the late 1960s and early 1970s introduced the statistical methods of Markov sources and hidden Markov modeling [1]. Hidden Markov models (HMMs) have become popular over the last two decades because of their flexibility. The mathematical structure of the HMM forms the theoretical basis for many real-world applications such as speech recognition, facial expression recognition, gene prediction, gesture recognition, musical composition and bioinformatics.

An HMM is a statistical model built on a Markov process with hidden states. Andrey Markov introduced the Markov model in the early 20th century. Later, Baum and other researchers developed the statistical methods of hidden Markov modeling [1]. State transitions are the random changes of state of the Markov process in discrete time. A Markov model is memoryless, i.e. the transition from one state to another depends only on the present state [2]. In an HMM, the emitted symbols are observable, whereas the random transitions from one state to another remain unobserved. Ease of implementation and the ability to handle sequential data and variable-length inputs make HMM suitable for many real-life applications.

1.1 Motivation

Over the last five decades, researchers have explored HMM and its variants in many application domains. In the 1970s, HMM was applied to speech recognition. Since 1980, HMM has been used extensively in bioinformatics [3]. HMMs are further classified into First-order HMM, Higher-Order HMM (HO-HMM), Hidden Semi-Markov Model (HSMM), Factorial HMM (FHMM), Second-Order HMM, Layered HMM (LHMM), Autoregressive HMM (AR-HMM), Non-Stationary HMM (NS-HMM) and Hierarchical HMM (HHMM), as depicted in Fig. 1. There is a need to bring together the work done by various researchers in the area of HMMs.

Fig. 1 Variants of Hidden Markov Model

1.2 Outline

This survey is structured as follows: Sect. 2 outlines the review process. Section 3 gives the preliminaries required for HMM. We review the work on first-order HMM, HO-HMM, HSMM and FHMM and their applications in Sects. 4.1, 4.2, 4.3 and 4.4, respectively. Applications of second-order HMM, LHMM, AR-HMM and NS-HMM in various domains are explored in Sects. 4.5, 4.6, 4.7 and 4.8, respectively. Section 4.9 lays out the applications of HHMM and, finally, Sect. 5 summarizes the conclusions of the paper.

2 Review Process

2.1 Classification of Papers

In this review, we explored the applications of various types of HMM and categorized the papers according to several criteria. Table 1 presents the properties and categorization of the papers. The research questions in Table 2 helped to extract the essential information from the papers.

Table 1 Classification of papers

2.1.1 Distribution of Papers for HMM Variants (RQ1)

Figure 2 shows the number of papers reviewed for each of the nine HMM variants. HSMM (29%) and first-order HMM (23%) are the most commonly used variants. The remaining variants are used almost equally, differing by only 1–2%. Only 3% of the reviewed papers used NS-HMM.

Fig. 2 Number of papers considered using different variants of Hidden Markov Model

Table 2 Research questions

2.1.2 Application Fields of HSMM (RQ2)

HSMM is mainly used for analyzing tool wear and in musicology. We considered eight and seven published papers in the areas of tool wear and musicology, respectively. HSMM has also been explored in the stock market, data analysis, speech recognition and network analysis, for which we considered two, three, three and three papers, respectively (Fig. 3).

Fig. 3 Application areas of HSMM

2.1.3 HMMs for Speech Recognition (RQ3)

Fig. 4 HMMs for speech recognition

At present, HMM is the most successful and simple approach to speech recognition. Figure 4 shows that first-order HMM is the variant most explored by researchers for speech recognition. As is evident from Fig. 4, three of the reviewed papers used each of HO-HMM, FHMM, second-order HMM and AR-HMM for speech recognition. Furthermore, none of the reviewed papers used NS-HMM or HHMM for speech recognition.

Fig. 5 Application areas of HMMs

2.1.4 Application Areas with HMMs (RQ4)

Figure 5 shows that HMMs are widely used in speech recognition (25% of papers) and human activity recognition (25% of papers). HMMs are also used in musicology (9% of papers), data processing (7% of papers) and network analysis (6% of papers).

3 Preliminaries

Fig. 6 Basic HMM Architecture [4]

HMM is a doubly stochastic finite model that defines a probability distribution over an infinite number of possible sequences [2]. It is used to study the observed items of a discrete-time series. States are assigned transition probabilities, and every state emits a symbol according to its emission probability [5]. Figure 6 shows the underlying architecture of HMM.

Definition 1: An HMM [4] is defined by the quintuple \((S, O, A, B, \pi )\), where

  • \(S=\{S_1,S_2,S_3,\ldots ,S_n\}\) is the set of n hidden states.

  • \(O=\{o_1,o_2,\ldots ,o_m\}\) is the set of m observable symbols; \(O(t)\) denotes the symbol observed at time t.

  • A is the state transition probability matrix, \(A=\{a_{ij}=P(X_{t+1}=S_j \mid X_t=S_i) \mid 1 \le i, j \le n\}\), where \(a_{ij}\) is the probability of moving from state i at time t to state j at time \(t+1\).

  • B is the symbol emission probability distribution, \(B=\{b_j(t)=P(O(t) \mid X_t=S_j) \mid 1 \le j \le n \}\), where \(b_j(t)\) is the probability of emitting symbol \(O(t)\) from state j.

  • \(\pi =\{\pi _i=P(X_1=S_i) \mid 1 \le i \le n \}\) is the initial state probability distribution.
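As an illustration of Definition 1, the following minimal sketch stores \(\pi\), A and B as plain Python lists and evaluates the probability of an observation sequence with the standard forward recursion. The two-state, three-symbol model and all of its numbers are assumptions made up for this example; the function name `forward_likelihood` is likewise hypothetical and not taken from the cited works.

```python
# Illustrative two-state, three-symbol HMM; every number here is made up.
pi = [0.6, 0.4]                 # pi[i]   = P(X_1 = S_i)
A = [[0.7, 0.3],                # A[i][j] = P(X_{t+1} = S_j | X_t = S_i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],           # B[j][k] = P(symbol o_k emitted | X_t = S_j)
     [0.1, 0.3, 0.6]]


def forward_likelihood(obs, pi, A, B):
    """Return P(obs | model) using the forward recursion over hidden states."""
    n = len(pi)
    # alpha[j] = P(o_1..o_t, X_t = S_j); initialise with the first symbol.
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        # Propagate one step through A, then weight by the emission of o.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Observations are given as symbol indices (0 stands for o_1, and so on).
likelihood = forward_likelihood([0, 1, 2], pi, A, B)
print(f"P(o_1, o_2, o_3) = {likelihood:.4f}")
```

Because the hidden path is summed out, the likelihoods of all observation sequences of a fixed length add up to one, which is a convenient sanity check for any choice of parameters.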

4 Literature Survey

4.1 First-Order HMM

The basic HMM (discussed in Sect. 3) is referred to as the first-order HMM [6]. We summarize the work on first-order HMM in the areas of speech recognition, human action recognition and genome structure analysis.

Rabiner et al. [7] combined vector quantization with HMM to build a speaker-independent, isolated-word recognition system. Their system achieved a high accuracy rate for word recognition on a vocabulary of isolated digits. Levinson [8] recognized speaker-independent isolated digits using HMM and Linear Predictive Coding (LPC). Schwartz et al. [9] improved HMM for modeling phonemes in speech recognition by considering the trade-off between robustness and specificity. Rabiner [1] reviewed various aspects of HMM and applied it to speech recognition. Juang and Rabiner [10] applied HMM to speech recognition and observed an accuracy rate higher than 95% in speaker-independent tasks. Figure 7 shows various applications of first-order HMM.

Fig. 7 Applications of first-order HMM

Bahl et al. [11] described a method for maximum mutual information estimation of HMM parameters in speech recognition. Poritz [12] proposed a linear predictive HMM for analyzing speech signals; the method was further applied to talker verification. Rose and Paul [13] described a baseline keyword recognition system using HMM that deals with the effects of linear channels and non-keyword speech. Lee and Hon [14] applied HMM to speaker-independent phone recognition and improved the accuracy using multiple codebooks of LPC parameters and Viterbi decoding. Juang [15] used HMM and dynamic time warping techniques for speech recognition. Varga and Moore [16] improved speech recognition through signal decomposition using HMM. They recognized concurrent events simultaneously under stationary and non-stationary noise.

Sonnhammer et al. [17] predicted the location and orientation of transmembrane helices in protein sequences. Churchill [18] studied the structure of a human genome segment and explored the correlation between discrete compositional domains and genome function. Soruri et al. [19] introduced a novel gene clustering approach using HMM and optimized it with the particle swarm optimization algorithm. They defined a specific HMM for each gene sequence and evaluated the probabilities for every individual sequence. Yamato et al. [5] proposed a method using HMM and a feature-based bottom-up approach for human action recognition from a set of time-sequential images. Krogh [20] introduced HMM for labeled observations and developed a maximum likelihood method for estimating the parameters of the model.

Manogaran et al. [21] used Bayesian HMM with Gaussian mixture clustering for cancer diagnosis. They proposed a machine learning approach to model DNA copy number changes in genome structure. Xin et al. [22] introduced a semi-automated diagnosis method that handles fault detection, identification and extraction at the same time. Yao et al. [23] proposed an HMM-based routing method for Vehicular Ad hoc Networks (VANETs). Their hybrid scheme predicted a vehicle's future path from the history of its mobility patterns. Petersen et al. [24] modeled sepsis progression with HMM to study patient heterogeneity. The model extracts a patient's physiological trajectory to identify patients at higher risk. Tang and Dong [25] detected malicious domain names using an improved HMM in a Spark environment. Zhuo et al. [26] used profile HMM for website fingerprinting attacks on anonymous networks. The proposed approach identified websites and webpages in the closed-world setting. Putland et al. [27] detected underwater biophonic sounds using HMM. Their approach effectively detected Bryde's whale vocalizations irrespective of their duration and of conflicting vessel passage sounds. Habayeb et al. [28] proposed HMM for predicting the time to fix bug reports. The approach gave software quality teams an early indication of forecasted bug reports.

Ullah et al. [29] designed an HMM-based algorithm for predicting energy consumption in smart buildings. They validated their model using real data collected from a few selected buildings in South Korea. Yip et al. [30] built an HMM for predicting earthquakes and introduced a latent Markov process to explain the underground dynamics. Their model also predicts the magnitude and arrival time of subsequent earthquakes. Pastell and Frondelius [31] developed an HMM for calculating the time spent by dairy cows at the feed bunk using an ultra-wideband indoor positioning system. They further showed that the performance of their model could be improved using the Viterbi algorithm with logistic regression. Alshamaa et al. [32] designed an HMM-based mobility model for tracking older people, which helps determine their trajectories in an indoor environment. Liu et al. [33] predicted driver intention for autonomous vehicles using HMM. They trained and tested their model on real data from a flyover.

Jiang et al. [34] introduced a dynamic fault prediction model based on HMM that analyses dissolved gas. With Jiang et al.'s model, preventive action can be taken to maintain power transformers. Lu et al. [35] proposed a data mining approach based on HMM. Xu et al. [36] applied HMM with Eskin's probabilistic detection algorithm for detecting low-carbon anomalies and abuse of resources, which supports the green technology innovation ecosystem. Joo et al. [37] developed an adaptive approach for estimating the batch size with HMM. The adaptive model could capture changes in the process deduced from analyzing product quality data. Coast et al. [38] detected cardiac arrhythmia using HMM with statistical knowledge of ECG signals. They calculated the parameters using the maximum likelihood re-estimation algorithm. Yang et al. [39] applied HMM with vector quantization to recognize speaker-independent lexical tones in Mandarin speech. They showed that speaker-independent tone recognition requires pitch-base adjustment. Table 3 presents the classification of papers related to first-order HMM.

Table 3 Classification of first-order HMM papers

4.2 Higher-Order HMM

Fig. 8 Applications of higher-order HMM

HO-HMM generalizes the first-order HMM by extending the dependency from the single previous state to n previous states (Fig. 8). Both the transition and the observation probability distributions depend on several previous states [40]. A kth-order HO-HMM is an HMM that takes hidden-state values up to lag k into account [41].
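One way to make the extended dependency concrete is the standard state-expansion construction, sketched here for order two only: a chain whose next state depends on the two previous states can be rewritten as a first-order chain over pairs of states. The tensor `A2`, its numbers and the function name `expand_second_order` are illustrative assumptions and do not reproduce the formulation of any cited paper.

```python
from itertools import product


def expand_second_order(A2):
    """Rewrite a second-order chain as a first-order chain over state pairs.

    A2[i][j][k] = P(X_{t+1} = k | X_{t-1} = i, X_t = j).
    Returns the list of composite states and their transition matrix T.
    """
    n = len(A2)
    pairs = list(product(range(n), repeat=2))   # composite states (X_{t-1}, X_t)
    index = {pair: row for row, pair in enumerate(pairs)}
    T = [[0.0] * len(pairs) for _ in pairs]
    for (i, j) in pairs:
        for k in range(n):
            # (i, j) can only move to a pair whose first element is j.
            T[index[(i, j)]][index[(j, k)]] = A2[i][j][k]
    return pairs, T


# Made-up second-order transition probabilities for two hidden states.
A2 = [[[0.9, 0.1], [0.5, 0.5]],
      [[0.2, 0.8], [0.3, 0.7]]]
pairs, T = expand_second_order(A2)
for pair, row in zip(pairs, T):
    print(pair, row)
```

The expanded matrix has \(n^2\) rows, and more generally \(n^k\) for order k, which is why the number of parameters grows quickly with the order of the model.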

Xiong and Mamon [41] introduced a self-updating model for the evolution of daily average temperature using HO-HMM and used it to analyse weather derivatives. Zhu et al. [42] discussed the asset allocation problem using HO-HMM. They studied optimal portfolio selection using the long-term memory of varying hidden economic conditions and optimal asset collection. Lee and Jean [40] modeled piece-wise linear processes with HO-HMM. Their model better approximated the behaviour of real processes and reduced the error rate in speech recognition of noisy Mandarin digits. Quan and Ren [43] recognized the most likely sequence of emotions in text using weighted HO-HMM. Seifert et al. [44] applied parsimonious HO-HMM to the analysis of array-based comparative genomic hybridization data. The model enabled interpolation between a mixture model and HO-HMM for detecting DNA polymorphisms in closely related genomes.

Fig. 9 Higher-order hidden Markov Model [144]

Lee and Lee [45] applied HO-HMM to capture the dynamics and duration of speech signals. Their approach is robust against noise and performs speech recognition with reduced error rates. Xiong et al. [46] applied HO-HMM to car ownership behavioural analysis. Zhang et al. [47] presented a high-accuracy, low-risk approach for predicting trends in stock market prices using HO-HMM. Chen and Qiu [48] proposed an approach for predicting the channel state of cognitive radio using HO-HMM. The approach was based on spectrum sensing slots to reduce the effect of latency between spectrum sensing. Figure 9 shows the application areas of higher-order HMM.

4.3 Hidden Semi-Markov Model

HSMM provides a way to deal explicitly with state durations. In HSMM, the underlying process of the hidden states is a semi-Markov chain (Fig. 10). The process remains in a hidden state for a duration d, during which that state emits d observations [49]. The probability of moving from one hidden state to another depends on the time elapsed since entering the current state [50]. HSMM is also known as explicit duration HMM (DHMM) or variable-length HMM (VLHMM).
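The generative process described above can be sketched as a small sampler. This is only an illustration under stated assumptions: durations are drawn from a per-state table, self-transitions are excluded from A (the usual explicit-duration convention), and all parameter values and the name `sample_hsmm` are hypothetical.

```python
import random


def sample_hsmm(pi, A, durations, B, n_segments, rng):
    """Sample hidden states and observations from an explicit-duration HSMM."""
    states, observations = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(n_segments):
        # Draw how long the process stays in the current state.
        lengths, probs = zip(*durations[state].items())
        d = rng.choices(lengths, weights=probs)[0]
        # The state emits one observation per time step, d in total.
        for _ in range(d):
            states.append(state)
            symbol = rng.choices(range(len(B[state])), weights=B[state])[0]
            observations.append(symbol)
        # Move on to a different state; A has a zero diagonal.
        state = rng.choices(range(len(A)), weights=A[state])[0]
    return states, observations


# Made-up parameters: two hidden states, two observable symbols.
pi = [0.5, 0.5]
A = [[0.0, 1.0],
     [1.0, 0.0]]
durations = [{1: 0.2, 2: 0.5, 3: 0.3},   # P(d | state 0)
             {2: 0.6, 4: 0.4}]           # P(d | state 1)
B = [[0.9, 0.1],
     [0.2, 0.8]]

states, observations = sample_hsmm(pi, A, durations, B,
                                   n_segments=5, rng=random.Random(0))
print(states)
print(observations)
```

In an ordinary HMM the time spent in a state is implicitly geometric; the explicit duration table is what allows other duration shapes to be modeled.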

Fig. 10 Hidden-semi Markov model [51]

Fig. 11 Applications of Hidden Semi-Markov model

Narimatsu and Kasai [52] proposed two extended models (Interval state HSMM and Interval length probability HSMM) for analysing sequential data. These models support the representation of state intervals and state durations. Zhu and Liu [53] monitored tool wear online using duration-dependent HSMM. Liu et al. [54] applied duration-dependent HSMM to diagnose the equipment degradation process. Li et al. [55] applied an optimal Bayesian control scheme based on a three-state continuous-time hidden semi-Markov process for early fault detection in a gear shaft. Liu and Wang [56] decoded the time-varying distribution of Chinese stock market returns using a three-state HSMM.

Xiao et al. [57] proposed a duration-dependent HSMM for analyzing machine health online. The analysis is useful for predicting the remaining useful life of the machine. Kong et al. [58] estimated tool wear in the mining process with HSMM. Their straightforward model achieved a high accuracy rate. Wu et al. [59] presented a lightweight, real-time method for monitoring machine condition in fused deposition modeling. The method used HSMM with acoustic emission to improve product quality and printing process reliability. Pertsinidou et al. [60] studied the application of HSMM to the assessment of seismic hazard in Greece. They used a simplified novel Viterbi algorithm for detecting precursory phases and provided warnings of anticipated earthquake occurrences.

Bang et al. [61] designed an HSMM-based scheme for anomaly detection of network-initiated LTE signaling attacks in wireless sensor networks. The proposed scheme captured both the temporal and the spatial characteristics of normal nodes. Tanwani and Calinon [62] investigated semi-tied HSMM for learning robot manipulation tasks. Cai et al. [63] applied HSMM to the analysis of application-layer network protocols. They modeled the protocol message format to maximize the likelihood of keyword selection and message segmentation. Galvez et al. [64] applied an HSMM to the generation and analysis of processes. Liu et al. [65] proposed a novel method for multi-sensor monitoring of equipment health.

Xiao and Dong [66] designed an HSMM-based reputation management system for the online-to-offline e-commerce market and demonstrated the usefulness of the model in a real-life application. Yue et al. [67] proposed a logical hierarchical HSMM for recognizing the intention of each team member, the team intention and the working mode. Altuve et al. [68] introduced an online system for detecting apnea-bradycardia along with its temporal evolution using HSMM. Votsi et al. [69] used HSMM to estimate the occurrence rate of earthquakes. The application of HSMM in seismology was studied to identify features of the earthquake generation process. Figure 11 shows various applications of the Hidden Semi-Markov Model.

Du et al. [70] performed genomic segmentation using HSMM. The model was designed as a general segmentation engine with better sensitivity and specificity in genomic segmentation. Xu et al. [71] proposed a method for identifying user click patterns using HSMM. They also proposed a state selection algorithm and evaluated their results on a real data set from a state telecom. Liu et al. [72] trained HSMM in a max-margin learning framework for segmenting mitosis events in time-lapse phase-contrast microscopy image sequences of stem cell populations. Boussemart and Cummings [49] presented a methodology for learning HSMM in a human supervisory control setting. Dong and Peng [73] applied non-stationary segmental HSMM to equipment health prediction and maintenance. Liang et al. [74] introduced a noise-robust voice activity detector using HSMM. They considered feature distributions, temporal dependence and speech features related to noise robustness. Xie et al. [75] proposed a forward-backward algorithm for nested HSMM and applied it to a network traffic model.

Kerk et al. [76] applied HSMM to Global Positioning System (GPS) location data to reveal the multiphasic movement of endangered Florida panthers. Duan et al. [77] used HSMM for detecting faults and predicting the remaining useful life of computer numerically controlled equipment. Chen et al. [78] built an audio chord recognition system using DHMM that explicitly considers chord duration. Karg et al. [79] performed clinical gait analysis with DHMMs. They modeled the time series data of a group and applied a reference-based measure to compare observations. Benouareth et al. [80] designed a recognition system for off-line handwritten Arabic words using explicit state duration semi-continuous HMM and a segmentation-free approach.

Benetos and Weyde [81] used pitch-wise DHMM for the transcription of multiple-instrument polyphonic music. It could be useful for modeling the tone durations and temporal evolution present in musical patterns. Yue et al. [82] presented a DHMM-based prognostics and diagnostics method for evaluating the residual life distribution in face milling. Calinon et al. [83] applied DHMM to encode time and position constraints in robot learning. Chordia et al. [84] modeled north Indian tabla sequences with a variable-length Markov model (VLMM) and VLHMM. The model could determine the next stroke from an audio file of tabla sequences. Senturk [85] performed computational modeling of improvised Turkish folk music with VLMM and predicted melodies in the music. Senturk and Chordia [86] designed a VLHMM for predicting melodies in musical structures. They generated melodic improvisations for Turkish folk music. Pikrakis et al. [87] classified musical patterns from raw data using variable duration HMM. Dumont [88] statistically analyzed VLHMM and proposed an algorithm for finding a consistent estimator for context tree estimation.

Chen et al. [89] proposed a system for recognizing off-line handwritten words based on continuous density VLHMM and morphological segmentation. Liang et al. [90] applied VLHMM to the analysis of human behaviour. The model consists of a posture labeling module and an atomic human action learning and recognition module. Cao et al. [91] proposed an approach for context-aware search using VLHMM, in which the contexts of queries are captured from search sessions in log data. Bernard et al. [92] recognized isolated handwritten Arabic words using context-dependent VLHMM.

4.4 Factorial HMM

FHMM extends HMM to allow the modeling of several loosely coupled random processes. FHMM has a multi-layer state structure with improved representational capacity [93]. Each layer of an FHMM can be regarded as an HMM, and each layer works independently of the others. The output of the FHMM at a given time depends only on the current states of all the layers at that time [94] (Fig. 12).
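A short sketch may help to show the factorial structure: each layer keeps its own transition matrix, the joint transition probability is the product over layers, and the emission looks at all current layer states together. The additive emission used here is an assumption borrowed from additive FHMMs; the two layers, their numbers and the function names are hypothetical.

```python
import random
from itertools import product


def joint_transition_prob(prev, nxt, layers):
    """P(next joint state | previous joint state) as a product over layers."""
    p = 1.0
    for s, s_next, A in zip(prev, nxt, layers):
        p *= A[s][s_next]
    return p


def step(states, layers, rng):
    """Advance every layer by one step, each with its own transition matrix."""
    return tuple(rng.choices(range(len(A)), weights=A[s])[0]
                 for s, A in zip(states, layers))


def emit_mean(states, contributions):
    """Additive emission (an assumption): sum the per-layer contributions."""
    return sum(c[s] for s, c in zip(states, contributions))


# Made-up model: layer 0 has two states, layer 1 has three.
layers = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
]
contributions = [[0.0, 5.0], [0.0, 1.0, 2.0]]

# The joint chain has 2 * 3 = 6 states; its rows still sum to one.
joint_states = list(product(range(2), range(3)))
row_sum = sum(joint_transition_prob((1, 2), nxt, layers) for nxt in joint_states)

rng = random.Random(1)
states = (0, 0)
trajectory = []
for _ in range(5):
    states = step(states, layers, rng)
    trajectory.append((states, emit_mean(states, contributions)))
print(trajectory)
```

The factorial form stores 2×2 + 3×3 transition entries instead of the 6×6 entries of the equivalent flat HMM, which is where the improved representational capacity per parameter comes from.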

Fig. 12 Factorial hidden Markov model [95]

Fig. 13 Applications of Factorial HMM

Ozerov et al. [96] designed the Factorial Scaled HMM (FSHMM) for representing polyphonic audio music files. FSHMM is a generalization of the Gaussian scaled mixture model and the Itakura-Saito Non-negative Matrix Factorization model. Bonfigli et al. [97] proposed a non-intrusive appliance monitoring algorithm that uses active and reactive power with additive Factorial HMM. Their algorithm helps users modify their habits to save electrical energy. Khorasani et al. [98] recognized amyotrophic lateral sclerosis (ALS) patients using FHMM. The FHMM distinguishes ALS patients from healthy subjects by removing unwanted data from the stride interval time series and extracting useful data. Chen et al. [93] recognized gait features with FHMM and Parallel HMM (PHMM). FHMM and PHMM were introduced as a feature-level fusion scheme and a decision-level fusion scheme, respectively, for combining gait features. The applications of Factorial HMM are shown in Fig. 13.

Betkowska et al. [94] achieved robust speech recognition for the home environment by applying an FHMM architecture. They recognized speech in the presence of sudden non-stationary noises. Li et al. [99] recognized faults using independent component analysis (ICA) and FHMM. ICA reduced redundancy and extracted features from multi-channel detection, and FHMM recognized faults in the speed-up and speed-down processes of rotating machinery. Husmeier [100] detected mosaic structures in DNA sequences using a phylogenetic tree and FHMM. The model discriminated between rate heterogeneity and inter-specific recombination in DNA sequence alignments. Durrieu and Thiran [101] proposed FHMM with a source/filter model to obtain robust pitch and formant tracks in speech processing. Kolter and Jaakkola [102] worked on the approximate inference problem in additive FHMM. Table 4 presents the major research findings on FHMM and its applications.

Table 4 Major Research findings of FHMM and its applications

4.5 Second-Order HMM

In a second-order HMM, the transition probability of a state at any time depends on the two previous states. The state sequence follows a second-order Markov chain. The state duration in these models is governed by the probability of entering a state only once and the probability of visiting a state at least twice [103]. Figure 14 shows different applications of second-order HMM.

Fig. 14 Applications of Second-order HMM

Hyun et al. [104] designed a log-Viterbi algorithm for recognizing human activities in smart homes with increased accuracy and decreased time complexity using second-order HMM. Kabir et al. [105] also recognized human activity in the home environment, using a two-layer HMM in which one layer contains the location information and the other contains the object information. Their model also mapped low-level sensor data to high-level activities based on binary sensor data. Zhou et al. [106] used a two-stage HMM for detecting biomarkers. They combined HMM with the local false discovery rate (FDR) to detect significant associations in microbiome research for practical analysis. Liang et al. [107] presented a system that filters and classifies ECG signals in a free-living environment using a two-layer HMM. Othman and Aboulnasr [108] applied second-order HMM to face recognition. The model used a non-overlap strategy to reduce the computational load. Wu et al. [109] proposed a two-layered HMM for human action recognition by decomposing the problem into two layers. The first layer modeled the actions of the two arms, whereas the second layer modeled the relation between the two arms. Zhang et al. [110] modeled the actions of individuals and groups in a meeting using a two-layered HMM. The first layer mapped low-level features to individual actions, and the second layer took its input from the first layer to recognize group actions.

Mari et al. [111] showed that second-order HMM could yield high performance in word- and phoneme-based speech recognition tasks. Thede and Harper [112] used second-order HMM for part-of-speech tagging using lexical and contextual probabilities. Wei et al. [113] proposed a model for monitoring daily activities using a body sensor network with a two-layered HMM. The lower-layer HMM processed sensory data locally to decrease data transmission, and the top layer extracted the sequence of activities from the locally processed data.

4.6 Layered HMM

In LHMM, several HMMs at each layer run in parallel. Each layer provides its output to the next higher layer. To enable fast re-training of the model, these models are trained layer by layer [114]. Each layer is connected to the next layer through its inferential results [115].
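The flow of inferential results between layers can be sketched as follows, for inference only and with parameters assumed to be already trained. Here the lower layer scores each window of raw symbols against a bank of HMMs, and the index of the best-scoring model becomes the observation symbol of the upper layer. Passing only the winning label is one simple choice made for this example; the models, numbers and function names are all hypothetical.

```python
def forward_likelihood(obs, pi, A, B):
    """Return P(obs | model) using the forward recursion."""
    n = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def classify_window(window, models):
    """Lower layer: index of the HMM that best explains the window."""
    scores = [forward_likelihood(window, *model) for model in models]
    return max(range(len(models)), key=lambda m: scores[m])


def layered_inference(windows, lower_models, upper_model):
    """Feed the lower-layer decisions to the upper layer as its observations."""
    labels = [classify_window(w, lower_models) for w in windows]
    return labels, forward_likelihood(labels, *upper_model)


# Each model is a (pi, A, B) triple; all numbers are made up.
lower_models = [
    ([1.0], [[1.0]], [[0.9, 0.1]]),   # hypothetical "mostly symbol 0" pattern
    ([1.0], [[1.0]], [[0.1, 0.9]]),   # hypothetical "mostly symbol 1" pattern
]
upper_model = ([0.5, 0.5],
               [[0.8, 0.2], [0.3, 0.7]],
               [[0.9, 0.1], [0.2, 0.8]])

windows = [[0, 0, 1], [1, 1, 1], [0, 0, 0]]
labels, score = layered_inference(windows, lower_models, upper_model)
print(labels, score)
```

Because the upper layer only sees the lower layer's outputs, the lower-layer models can be retrained for a new sensor setup without touching the upper layer, which is the motivation for layer-by-layer training.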

Fig. 15 Applications of Layered HMM

Lee and Cho [116] applied LHMM to recognize long- and short-term activities with built-in mobile sensors on the Android platform. The LHMM could model temporal patterns in multi-dimensional data. Razin et al. [117] learned characteristics of human operators' performance from surface electromyography to predict their intentions in task operations using LHMM. Glodek et al. [114] applied LHMM to recognize human activities based on a multitude of modalities. The model detected complex activities from a stream of class assignments provided by the classifiers on the previous layer. Glodek et al. [118] improved human activity recognition by incorporating the uncertainty of the class decision. Aarno and Kragic [119] modeled human skills using LHMM with greater discriminating power. They modeled the complex task of motion intention recognition even with misclassifications present in the layers. Oliver et al. [120] represented human activity from real-time streams of video, computer interaction and acoustic signals with LHMM. The applications of Layered HMM are shown in Fig. 15.

Oliver et al. [115] recognized users' activity in an office environment with a multimodal, real-time approach using LHMM. The layered representation enabled the learning of human office activity from multiple sensory channels. Barnard and Odobez [121] used LHMM with unsupervised low-level clustering to recognize events in sports videos. Zhang et al. [122] proposed cross-layered HMM (CLHMM) for recognizing surveillance events. The cross-layer design reduced computational complexity and increased the accuracy rate. Runsewe and Samaan [123] proposed layered multi-dimensional HMM for cloud resource scaling in big data streaming applications. Solaimanpour and Doshi [124] used LHMM with a Monte Carlo algorithm to predict the motion of a robot online. The predicted motions enabled updating of the nested track that could track other robots in a known environment. Ingels [125] recognized connected text with LHMM and token passing. A robust tokenizer was implemented to recover from segmentation and lexical errors in the text input. Perdikis et al. [126] also recognized the inherent characteristics of human actions with LHMM. The first layer of the model detected short, primitive motions, and the upper layer recognized human actions.

4.7 Autoregressive HMM

AR-HMM can capture temporal structures in time series data. In an AR model, the current observation \(x^t\) is a linear combination of the p previous observations \(x^{t-p},\ldots ,x^{t-2},x^{t-1}\) [127]. AR-HMM can explicitly model longer-range correlations in sequential data by adding direct stochastic dependence among observations [128] (Fig. 16).
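The dependence of \(x^t\) on the p previous observations can be illustrated with a small simulator in which the hidden state selects the AR coefficients. Gaussian noise, a zero-valued initial history and every numerical value below are assumptions of this sketch; `simulate_ar_hmm` is a hypothetical name.

```python
import random


def simulate_ar_hmm(pi, A, coeffs, noise_std, n_steps, rng):
    """Simulate an AR(p)-HMM: the hidden state picks the AR coefficients."""
    p = len(coeffs[0])
    history = [0.0] * p      # x^{t-p}, ..., x^{t-1}; starts at zero (assumption)
    state = rng.choices(range(len(pi)), weights=pi)[0]
    states, xs = [], []
    for _ in range(n_steps):
        # Linear combination of the p previous observations, oldest first.
        mean = sum(c * x for c, x in zip(coeffs[state], history))
        x = mean + rng.gauss(0.0, noise_std[state])
        states.append(state)
        xs.append(x)
        history = history[1:] + [x]
        state = rng.choices(range(len(A)), weights=A[state])[0]
    return states, xs


# Made-up two-regime model with p = 2.
pi = [0.5, 0.5]
A = [[0.95, 0.05],
     [0.10, 0.90]]
coeffs = [[0.2, 0.5],      # regime 0: weights for x^{t-2}, x^{t-1}
          [-0.3, 0.9]]     # regime 1
noise_std = [0.1, 1.0]

states, xs = simulate_ar_hmm(pi, A, coeffs, noise_std, 50, random.Random(0))
print(states[:10])
print([round(x, 3) for x in xs[:10]])
```

Unlike a standard HMM, the observations here are not conditionally independent given the states, since each one also depends directly on its predecessors.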

Fig. 16 Autoregressive hidden Markov model [129]

Stanculescu et al. [128] designed an AR-HMM for early detection of neonatal sepsis. They modeled the distribution of patients' observed physiological events with AR-HMM. Dang et al. [130] proposed an AR-HMM for learning effective connectivity (EC) in brain regions from fMRI signals. They modeled unobserved fMRI data and neuronal activity lost over time. Zhao et al. [131] proposed an order self-learning AR-HMM for online outlier detection in the grade analysis process of geological minerals. The model did not set any detection threshold and applied detection-before-update and detection-based update strategies to avoid the influence of outliers. Malesevic et al. [132] presented a computational technique for controlling a multifunctional artificial hand with multichannel surface electromyography (EMG). A vector AR-HMM was used to decode the movement of each individual finger from surface EMG signals. Figure 17 shows applications of Autoregressive HMM.

Fig. 17 Applications of Autoregressive HMM

Seifert et al. [133] exploited local dependencies in chromosomes to identify tumour genes using higher-order AR-HMM. Nakamura et al. [134] modeled symbolic music performances with AR-HSMM. The model had better computational time and accuracy than HMMs. Barber et al. [135] used AR-HMM in the wind power industry to model short-horizon wind forecasting. AR-HMM with some approximate inference methods could be used in missing-data situations. Sasou et al. [136] applied AR-HMM to extract features from singing voices. The model estimated the characteristics of the articulatory system and the signals from high-pitched voices. Quillen [137] used AR-HMM for speech synthesis. The model enhanced the stability of the estimated predictor coefficients.

Ai et al. [138] investigated the use of AR-HMM for occupancy estimation in smart buildings. They estimated the total number of occupants in a research laboratory of a building using a deployed wireless sensor network. Guan et al. [127] recognized activities from time-series data with AR-HMM. They proposed a graphical model that could predict instance and bag labels using a tractable inference algorithm. Dong [139] diagnosed equipment health with AR-HSMM, which combines temporal knowledge and shape information. Bryan and Levinson [140] proposed an AR-HMM-based approach for inferring linguistic structure from the speech signal.

4.8 Non-stationary HMM

NS-HMM was introduced to capture state duration behaviour by defining a set of dynamic transition probability parameters. It can model state duration probabilities explicitly as a function of time. In the transition process, the time already spent in a state is used to estimate the probability of the next transition. NS-HMM generalizes both the state duration model and the Baum–Welch algorithm [141]. The applications of Non-stationary HMM are shown in Fig. 18.
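A minimal sketch of such dynamic transition probabilities is given below: the probability of staying in a state is a function of the time d already spent there, and the remaining mass is spread over the other states. The linearly decaying `stay_prob`, the three-state layout and all numbers are assumptions chosen for illustration rather than the parameterization of [141].

```python
import random


def transition_row(state, d, stay_prob, leave_dist):
    """Transition probabilities out of `state` after d steps spent in it."""
    stay = stay_prob(state, d)
    row = [(1.0 - stay) * p for p in leave_dist[state]]
    row[state] = stay
    return row


def sample_path(pi, stay_prob, leave_dist, n_steps, rng):
    """Sample a state path whose transitions depend on the elapsed duration."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    d = 1                          # time spent so far in the current state
    path = [state]
    for _ in range(n_steps - 1):
        row = transition_row(state, d, stay_prob, leave_dist)
        nxt = rng.choices(range(len(row)), weights=row)[0]
        d = d + 1 if nxt == state else 1
        state = nxt
        path.append(state)
    return path


def stay_prob(state, d):
    """Hypothetical rule: staying becomes less likely the longer we stay."""
    return max(0.0, 0.9 - 0.2 * (d - 1))


# Where to go when leaving a state (zero on the diagonal); made-up numbers.
leave_dist = [[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]]
pi = [1 / 3, 1 / 3, 1 / 3]

path = sample_path(pi, stay_prob, leave_dist, 200, random.Random(0))
print(path[:30])
```

With a constant `stay_prob` this reduces to an ordinary stationary HMM state process; the duration argument is what makes the transition matrix time-varying.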

Fig. 18 Applications of Non-stationary HMM

Chen et al. [142] used NS-HMM for predicting spectrum occupancies. The model captured the time-varying stochastic behaviour of a primary user and estimated its parameters with a variant of the Baum–Welch algorithm. Chatzis and Demiris [143] worked on the modeling of sequential data with NS-HMM. Lin and Tseng [144] modeled the fading properties of mobile satellite link channels using NS-HMM and predicted the characteristics of the satellite-to-earth channel. Hui et al. [145] studied the principles of NS-HMM and applied it to part-of-speech (POS) tagging and pinyin-to-character conversion.

4.9 Hierarchical HMM

HHMM is a stochastic process with multi-level states that describe an input sequence at various levels of granularity. It is an HMM whose internal states are generated from sub-HMMs. HHMM has a tree-like structure in which the nodes of the tree are the states of the model and the tree's edges define their transitions. A state of an HHMM emits a sequence through the repeated activation of any of its sub-states [146] (Fig. 19).
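The tree-like generative process described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full model of [146]: the `State` class, the `END` marker, the helper names and the toy two-level model are all hypothetical. Internal states activate a sub-state (a vertical transition), sub-states then follow each other (horizontal transitions) until control returns to the parent, and only leaf states emit symbols.

```python
import random
from dataclasses import dataclass, field

END = "end"  # hypothetical marker: control returns to the parent state

@dataclass
class State:
    name: str
    emissions: dict = field(default_factory=dict)    # leaf states: symbol -> prob
    children: dict = field(default_factory=dict)     # internal states: name -> State
    start: dict = field(default_factory=dict)        # vertical activation probs
    transitions: dict = field(default_factory=dict)  # child -> {child or END: prob}

def pick(dist, rng):
    """Draw one key of `dist` with probability proportional to its value."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def generate(state, rng):
    """Return the symbol sequence emitted by one activation of `state`."""
    if not state.children:                 # leaf (production) state: one symbol
        return [pick(state.emissions, rng)]
    output = []
    current = pick(state.start, rng)       # vertical transition into a sub-state
    while current != END:
        output += generate(state.children[current], rng)
        current = pick(state.transitions[current], rng)  # horizontal transition
    return output

def leaf(name, symbol):
    """Leaf state that always emits `symbol` (toy assumption)."""
    return State(name, emissions={symbol: 1.0})

# Toy two-level model: a "syllable" sub-HMM emits "k" then "a"; the root
# repeats the syllable a random number of times and then emits "!".
syllable = State(
    "syllable",
    children={"c": leaf("c", "k"), "v": leaf("v", "a")},
    start={"c": 1.0},
    transitions={"c": {"v": 1.0}, "v": {END: 1.0}},
)
root = State(
    "root",
    children={"s1": syllable, "s2": leaf("s2", "!")},
    start={"s1": 1.0},
    transitions={"s1": {"s1": 0.5, "s2": 0.5}, "s2": {END: 1.0}},
)
```

Calling `generate(root, random.Random(0))` returns a list consisting of one or more repetitions of "k", "a" followed by "!". The same internal state thus produces sequences of different lengths, which is what allows an HHMM to describe data at several levels of granularity.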

Fig. 19 Hierarchical hidden Markov model [147]

Fine et al. [146] introduced HHMM in 1998 and modeled natural English text with HHMMs. They also applied HHMM to identify repeated strokes that represent letters in cursive handwriting. Kerr [148] designed an HHMM for analyzing melodic structures. The analyzed structures could be used in music composition. Weiland et al. [149] used HHMM to extract musical pitch structures that represent musical patterns. Hoffman et al. [150] explored the application of the Hierarchical Dirichlet Process HMM (HDP-HMM) for generating data-driven music. The models were trained with multiple songs and produced output from different hybrid inputs.

Patel et al. [151] used a multi-level HHMM to deduce the user's manipulative activities. A probabilistic algorithm was used to learn and grasp the complex manipulation activities of humans in everyday life. Martindale et al. [152] performed smart annotation of cyclic data with HHMM and reduced the cost of labeling sensor-based data. Marco et al. [153] presented an HHMM for systematic annotation of chromatin states at different length scales. The model investigated the role of higher-order chromatin structure in gene regulation. Chen et al. [154] performed a single-molecule protein transportation experiment with HHMM. Raman and Maybank [155] used a non-parametric HHMM for human activity recognition. The model enabled automatic inference of all states and supported semi-supervised learning. Figure 20 shows applications of the hierarchical hidden Markov model.

Fig. 20 Applications of Hierarchical HMM

Karaman et al. [156] proposed an HHMM to detect activities of daily living in videos collected from a wearable camera. Patients wore the camera as part of a study of dementia. Table 5 summarizes the major findings of HHMM in various applications.

Table 5 Major findings of HHMM in various applications

5 Conclusion

HMMs were introduced in the late 1960s, although the basic theory of Markov chains had been known to mathematicians for around 80 years by then. HMMs were first applied to problems of speech recognition in the mid-1970s. In the 1980s, many researchers began to use HMMs in fields such as bioinformatics, musicology, gesture recognition, trend analysis and data analysis. This paper reviews the work done by various researchers with HMM variants in different application fields and provides an overview of the HMM variants and their application areas. Much work has been done with various HMMs in many application fields, but the use of HMMs in many new application fields is yet to be explored. To the best of the authors’ knowledge, this is the first attempt to compile the research performed with different types of HMMs.