Suggested Prerequisites

Previous programming experience; a basic understanding of probability; knowledge of Markov models, such as in Module 13.4, “Probable Cause: Modeling with Markov Chains,” from [19]; understanding of parallel programming for optional Sect. 9, “Parallel Forward Algorithm,” and optional Sect. 10.3, “Parallel Viterbi Algorithm,” and the corresponding projects.

1 Introduction

We can employ Hidden Markov Models (HMMs) to solve a variety of problems from facial recognition and language translation to animal movement characterization and gene discovery (see “Further Reading”). In such a problem, we know a sequence of observations, but we are not certain that our observations are accurate; the actual sequence of states is hidden, or unknown. However, we often know for each state the probability that it is the initial one, the probabilities of transitioning from any one state to every other state, and the probabilities of a state resulting in each type of observation. Hidden Markov Models can attack three types of problems:

  • Likelihood problem: Calculate the probability of a particular sequence of observations

  • Decoding problem: For a particular sequence of observations, determine the most likely underlying sequence of states

  • Training problem: For a sequence of observations and a set of hidden states, discover the most likely HMM parameters

In this article, we cover the basics of Hidden Markov Models, algorithms, and applications involving the first two types of problems. We begin with an example of how HMMs are used to advance our understanding of genetics and the workings of the human body.

1.1 Case in Point

Nina was 15 years old and 50 inches tall. Her mother was concerned that her daughter was so short. The Centers for Disease Control and Prevention growth charts indicate a range for normal height for 15-year-old females to be between 60 and 68 inches [2]. Testing done from her physical examination indicated that Nina has growth hormone deficiency. Children diagnosed with this deficiency have inadequate secretion of growth hormone (GH), a hormone produced and stored by somatotropic cells of the pituitary gland at the base of the brain. Growth hormone molecules are stored in the pituitary until growth hormone-releasing hormone (GHRH) is secreted by the brain. The secreted GHRH binds specifically to growth hormone-releasing hormone receptors, found within the cell membranes that form the surface of the somatotropic cells. The receptor is associated with a protein, called a G-protein, which transduces the external signal into internal chemical signals (second messengers). The second messengers induce the cell to release GH.

Because Nina’s brother was given the same diagnosis, both patients were referred for genetic evaluation. The results of these tests indicated that both children had a genetic mutation in the gene that codes for the GHRH receptor. This mutation may alter the receptor enough that it no longer binds to the signal as well, and therefore there is less internal response to the signal. Hence, less GH is released into the bloodstream. Children with growth hormone deficiency are generally treated with periodic injections of GH.

The mechanism for the release of growth hormone from the pituitary is very complex. Much is still unknown about G-proteins and their receptors, and scientists are working to unravel these mysteries. So, how do G-proteins and receptors relate to hidden Markov models? One example is found in experiments conducted at the University of Birmingham, UK [22]. Using sophisticated imaging studies of individual G-proteins and receptors, researchers could track and map the diffusion of particles (proteins) along the membrane. Using an HMM, they assumed that particles shift among discrete diffusive states that follow a randomly determined course. They found that G-proteins and receptors move through four discernible diffusive states, varying from immobile to fast diffusing, which are associated with four different diffusion coefficients, the possible observations. Their results were consistent with results of the imaging studies. From these experiments and analyses, the scientists concluded that G-proteins and their receptors are sequestered into small membrane compartments. Such restriction of diffusion allows the two particles to bind more easily, although the binding is very short-lived.

From these results we understand more about how these important components in cell signaling work. As we figure out the intricacies of cell signaling, we may discover ways to modify or correct defects in the components.

2 Example Model

Although the studies of human growth factor are quite interesting, there are many other more approachable problems. Moreover, we will consider other genetic applications of HMMs later in this chapter. Thus, we begin our study of the mathematics of HMMs with a simpler hypothetical example involving animal behavior.

With a primary diet of leaves, which are not very nutritional and are hard to digest, red howler monkeys spend most of their time eating and resting. Suppose scientists in a simplified appraisal considered the monkey to be in two primary states, eating ( E ) or resting/sleeping ( R ), so that the set of possible states is S = {E, R}. Moreover, suppose the biologists observe that, on average, the monkeys spend 30% of their time eating and 70% sleeping or resting. In this case, where u 0 is the state at time 0 and P indicates probability, the initial state probability, π, for E is π(E)  =  P(u 0  =  E)  =  0.30, and the initial state probability for R is π(R)  =  P (u 0  =  R)  =  0.70. (As we will see later, the choice of variable, u, represents the “underlying,” or hidden, state.)

Suppose also the biologists determined that if a monkey is eating at hour k, so that its state is u k  =  E, then there is a 60% chance that the animal will be eating the next hour (k + 1), or u k+1  =  E. Thus, in terms of conditional probability, P(u k+1 =  E | u k =  E) = 0.6 and P(u k+1 =  R | u k =  E) = 1 − 0.6 = 0.4; that is, given u k =  E, the probability of u k+1 =  E is 0.6, and the probability of u k+1 =  R is 0.4. Suppose the scientists also discovered that P(u k+1 =  E | u k =  R) = 0.2, so that P(u k+1 =  R | u k =  R) = 1 − 0.2 = 0.8. If resting at hour k, the animal has a 20% chance of eating and an 80% chance of resting the next hour. Figure 1 presents a state diagram of the findings, with probabilities of transitioning from one state to another on the arrows. Thus, the following transition matrix summarizes their findings:

$$\displaystyle \begin{aligned}\begin{array}{c} \\ \\ T= \end{array} \begin{array}{ccc} u_k/{u_{k+1}}&\begin{array}{cc} \text{E} \,\, & \text{ R}\end{array}\\[0.07in] \begin{array}{c} \text{E}\\ \text{R} \end{array}\,\,&\left[ \begin{array}{cc} 0.6\,\,&0.4\\ 0.2\,\,&0.8\end{array}\right]\end{array} \end{aligned}$$

We can also denote these transition probabilities, t, as t(previous state, next state), so t(E, E) = 0.6, t(E, R) = 0.4, t(R, E) = 0.2, and t(R, R) = 0.8. The transition probabilities (or, comparably, matrix T) form a Markov model with each state, u k+1, only depending on its previous state, u k, and no other states.

Fig. 1 State diagram for hypothetical study on red howler monkeys

Suppose, however, scientists want to study the behavior of a red howler monkey in a more remote area. Knowing they will have limited opportunities for visual observation, they attach a small microphone to one of the monkeys, whom they name Holly. The biologists discern that when they hear munching (M), the monkey is probably eating; but when they hear breathing (B) noises, the animal is likely to be at rest. Thus, the set of possible observations is O = {M, B}. These hypothetical researchers have developed a computer program to analyze the sounds and record B or M once an hour. However, the microphone/computational results are not completely accurate. Besides background noises, such as from rain or other monkeys, a sleeping Holly might be moving her mouth, perhaps dreaming of luscious leaves. For a while, the scientists are able to observe Holly both personally and with their computer-enhanced microphone. In doing so, they discover that there is a 90% chance that if Holly is resting (u k = R), then their monitoring system indicates breathing noises (v k = B). (We select the symbol v to represent the “visible,” or observed, symbol obtained from the monitoring system.) Thus, the emission probability, e, of B at state R is e(B | R) = P(v k = B | u k = R) = 0.9, so that e(M | R) = P(v k = M | u k = R) = 1 − 0.9 = 0.1. However, the scientists discover that the system is only 80% accurate in detecting eating; given that Holly is eating (u k = E), their computer program interprets the audio signal as munching noises (v k = M) 80% of the time. Thus, the emission probability of M given state E is e(M | E) = P(v k = M | u k = E) = 0.8 and e(B | E) = P(v k = B | u k = E) = 1 − 0.8 = 0.2. The HMM property of output independence states that the probability of an observation, v k, depends only on the corresponding underlying state, u k, that leads to the observation and on no other states or observations. For example, the probability of the system recording munching (M) at hour k depends exclusively on Holly’s underlying state at that hour, such as eating (E), and not on what she did or what the system recorded at other hours.
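The way the model generates hourly readings can be sketched as a short simulation. This is only an illustration under stated assumptions: the helper names `draw` and `simulate`, the dictionary layout, and the fixed random seed are choices made here and are not part of the module; the probabilities are the ones given above.

```python
import random

# Probabilities from the text; the dictionary layout is an illustrative assumption.
initial = {"E": 0.30, "R": 0.70}
transition = {"E": {"E": 0.6, "R": 0.4}, "R": {"E": 0.2, "R": 0.8}}
emission = {"E": {"M": 0.8, "B": 0.2}, "R": {"M": 0.1, "B": 0.9}}

def draw(distribution, rng):
    """Draw one key from an {outcome: probability} dictionary."""
    outcomes = list(distribution)
    weights = [distribution[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def simulate(hours, rng):
    """Return (hidden states, observed symbols) as two strings of equal length."""
    hidden, observed = [], []
    state = draw(initial, rng)
    for _ in range(hours):
        hidden.append(state)
        # Output independence: the reading depends only on the current state.
        observed.append(draw(emission[state], rng))
        # Markov property: the next state depends only on the current state.
        state = draw(transition[state], rng)
    return "".join(hidden), "".join(observed)

rng = random.Random(0)  # arbitrary fixed seed so repeated runs agree
u, v = simulate(12, rng)
print(u)  # hidden states, which the scientists cannot see
print(v)  # monitoring-system readings, which they can see
```

In practice only the second string would be available to the scientists; the first is what the likelihood and decoding problems try to reason about.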

After Holly scampers into the jungle, where the scientists cannot make visual observations, their monitoring system records an observation of B or M each hour. Using this sequence of observations, or observed symbol sequence (v = v 1, v 2, v 3, …, v n, abbreviated v 1v 2v 3⋯v n), and the measures for initial, transition, and emission probabilities, (π, t, e), the scientists can answer a number of questions. Such problems generally fall into one of three categories: likelihood, decoding, and learning problems. For example, the scientists might want to determine the probability, or likelihood, of obtaining an observed symbol sequence, P(v), to discern whether v is an unusual sequence of observations. As another likelihood-type problem, the scientists might want to determine the probability that a particular sequence of states, or underlying state sequence (u = u 1u 2u 3⋯u n), would generate a particular v, so that they need to evaluate P(v | u). Additionally, given a particular observation, v k, the scientists might want to know the probability of an underlying state, u k, written P(u k | v k). Determining P(u k ≠ u k+1 | v) represents a change-detection problem. In this case, given a sequence of monitoring system readings, we are determining the probability of Holly eating one hour but resting the next, or vice versa. Perhaps in earlier studies we observed that the monkeys usually eat for exactly two time periods before sleeping deeply for at least 3 h, and we might want to use our system to estimate Holly’s sleeping habits. In a decoding-type problem, the scientists might be interested in finding the most likely u to generate a particular observed sequence, v. In a learning-type, or training-type, problem, for an observation sequence, v, and a set of states, we would be interested in determining the parameters for the system. For example, suppose we do not know the values of the initial state probabilities or of the entries in the transition and output matrices of the HMM associated with Holly. Given a long sequence of observations, such as v = MMMBBBBBBMM…BB, a learning-type problem is to derive those numbers for the HMM that maximize the likelihood of observing the sequence, v. Determination of these parameters is called training the HMM.

In all cases, we are using observations and probabilities, which include a Markov model, to estimate something that is hidden. Hence, the name of this system is Hidden Markov Model (HMM). The HMM for Holly consists of the following parameters, which Fig. 2, an expansion of Fig. 1, diagrams:

Fig. 2 Diagram of HMM for hypothetical study on red howler monkeys

Holly’s HMM

State space, or set of possible states, S = {E, R}, with elements representing eating and resting/sleeping, respectively.

Observation space, or set of possible observations, O = {M, B}, with elements representing munching and breathing noises, respectively.

Initial state probabilities, π(E) = 0.30 and π(R) = 0.70

Transition probabilities, t(E, E) = 0.6, t(E, R) = 0.4, t(R, E) = 0.2, and t(R, R) = 0.8, summarized by the following transition matrix:

$$\displaystyle \begin{aligned}\begin{array}{c} \\ \\ T= \end{array} \begin{array}{ccc} u_k/{u_{k+1}}&\begin{array}{cc} \text{E} \,\, & \text{ R}\end{array}\\[0.07in] \begin{array}{c} \text{E}\\ \text{R} \end{array}\,\,&\left[ \begin{array}{cc} 0.6\,\,&0.4\\ 0.2\,\,&0.8\end{array}\right]\end{array} \end{aligned}$$

Emission probabilities, e(M | E) = 0.8, e(B | E) = 0.2, e(M | R) = 0.1, and e(B | R) = 0.9, summarized by the following output, or emission, matrix:

$$\displaystyle \begin{aligned}\begin{array}{ccc} hidden/observable&\text{M B}\\[0.07in] \begin{array}{c} \text{E}\\ \text{R} \end{array}\,\,&\left[ \begin{array}{cc} 0.8\,\,&0.2\\ 0.1\,\,&0.9\end{array}\right]\end{array} \end{aligned}$$
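As a minimal sketch, the parameters of Holly's HMM can be stored in plain Python dictionaries. The variable names (`initial`, `transition`, `emission`) and the helper `is_distribution` are illustrative assumptions, not notation from the module; the numbers are those listed above.

```python
# Holly's HMM as plain dictionaries (names are illustrative assumptions).
states = ["E", "R"]          # hidden states: eating, resting/sleeping
observations = ["M", "B"]    # observable symbols: munching, breathing

# Initial state probabilities, initial[state] = pi(state)
initial = {"E": 0.30, "R": 0.70}

# Transition probabilities, transition[previous][next] = t(previous, next)
transition = {
    "E": {"E": 0.6, "R": 0.4},
    "R": {"E": 0.2, "R": 0.8},
}

# Emission probabilities, emission[state][symbol] = e(symbol | state)
emission = {
    "E": {"M": 0.8, "B": 0.2},
    "R": {"M": 0.1, "B": 0.9},
}

def is_distribution(probabilities, tolerance=1e-9):
    """Return True if the values are nonnegative and sum to 1."""
    values = list(probabilities.values())
    return all(p >= 0 for p in values) and abs(sum(values) - 1.0) < tolerance

# The initial probabilities and every matrix row must each sum to 1.
hmm_is_valid = (
    is_distribution(initial)
    and all(is_distribution(transition[s]) for s in states)
    and all(is_distribution(emission[s]) for s in states)
)
print(hmm_is_valid)
```

The check mirrors the way the text derives each complementary entry, such as t(E, R) = 1 − 0.6 = 0.4.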

Answers to Quick Review Questions appear at the end of the module, after “Projects.”

Quick Review Question 1

Consider the HMM diagram in Fig. 3 where the initial state probability for A is 0.2 and for B is 0.1. Determine each of the following:

  a. The set of possible states, S

  b. The set of possible observations, O

  c. t(B, C)

  d. t(B, A)

  e. The transition matrix with column headings being u k+1

  f. π(A)

  g. π(C)

  h. e(G | B)

  i. e(H | C)

  j. The output (emission) matrix with column headings being observable values

Fig. 3 HMM diagram for Quick Review Question 1

3 Probability Equalities

Before we can start solving some of the problems with the HMM, we need to consider several probability equalities.

3.1 Joint Probability

Joint probability is the probability of the simultaneous occurrence of two events, A and B, which are not necessarily independent. We can evaluate it as follows:

$$\displaystyle \begin{aligned} \boldsymbol{P(A \mbox{ and } B)} = \boldsymbol{P(A,\, B)} = \boldsymbol{P(A\,|\,B)\cdot P(B)} \end{aligned} $$
(1)

and

$$\displaystyle \begin{aligned} \boldsymbol{P(A \mbox{ and } B)} = \boldsymbol{P(A,\, B)} = \boldsymbol{P(B\,|\,A)\cdot P(A)}, \end{aligned} $$
(2)

where the comma means “and,” or “intersection.”

For example, consider Holly’s HMM. Specifically, P(E) = π(E) = 0.30, P(R) = π(R) = 0.70, and the output, or emission, matrix is

$$\displaystyle \begin{aligned}\begin{array}{ccc} hidden/observable&\text{M B}\\[0.07in] \begin{array}{c} \text{E}\\ \text{R} \end{array}\,\,&\left[ \begin{array}{cc} 0.8\,\,&0.2\\ 0.1\,\,&0.9\end{array}\right]\end{array} \end{aligned}$$

Knowing their probabilities, we can condition on the hidden states, E or R. For example, P(hidden = E and observable = M) = P(E, M) = P(M | E) ⋅ P(E) = 0.8 ⋅ 0.3 = 0.24.

Quick Review Question 2

Calculate each of the following:

  a. P(E, B)

  b. P(R, M)

  c. P(R, B)

  d. P(E, M) + P(E, B) + P(R, M) + P(R, B)

  e. ∑ x ∈{M, B}P(E, x) = P(E, M) + P(E, B)

  f. ∑ x ∈{M, B}P(R, x)

  g. The sum of the answers to Parts e and f

  h. ∑ x ∈{E, R}P(x, M)

  i. ∑ x ∈{E, R}P(x, B)

  j. The sum of the answers to Parts h and i

To obtain an intuition for why the joint probability relationship Eq. (1) is true, consider the Venn diagram in Fig. 4, where the area for A consists of the areas a and c, while B contains areas b and c. Let us evaluate each term and verify that the left-hand side equals the right-hand side in Eq. (1). P(A, B), the probability that an item is simultaneously in A and B, is the area of the intersection, c, divided by the diagram’s entire area, (a + b + c + d), or

$$\displaystyle \begin{aligned}P(A,\, B)= \frac{c}{a + b + c + d}.\end{aligned}$$

Asking for P(A | B) means, “Given that an item is in B, what is the probability that the item is also in A?” B’s area is b + c, while the part of A that is simultaneously in B is c, so

$$\displaystyle \begin{aligned}\displaystyle P(A\,|\, B) = \frac{c}{b + c}.\end{aligned}$$

The probability that an item is in B considers the area of B, (b + c), in relationship to the whole diagram, whose area is (a + b + c + d), so that

$$\displaystyle \begin{aligned}\displaystyle P(B) = \frac{b + c}{a + b + c + d}.\end{aligned}$$

Substituting in Eq. (1) and simplifying, as follows, we verify that the left-hand side does indeed equal the right-hand side:

$$\displaystyle \begin{aligned}P(A\,|\,B)\cdot P(B)=\frac{c}{b+c}\cdot\frac{b+c}{a+b+c+d}=\frac{c}{a+b+c+d}=P(A,\,B).\end{aligned}$$

This argument is not a proof of Eq. (1) but does provide intuition into the equality.

Fig. 4 Venn diagram

Quick Review Question 3

Using a similar argument as for Eq. (1), provide an intuitive argument for Eq. (2).

If A and B are independent, so that A does not depend on B and vice versa, P(A | B) = P(A) and P(B | A) = P(B). Thus, for independent events, the equality reduces to the following:

$$\displaystyle \begin{aligned} \boldsymbol{P(A \mbox{ and } B)= P(A,\,B)= P(A)\cdot P(B) \mbox{ for independent}\; A \mbox{ and } B}. \end{aligned} $$
(3)

3.2 Marginal Probability

We can summarize the calculations from the last section, particularly Quick Review Question 2, for the joint probabilities in a matrix as follows:

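$$\displaystyle \begin{aligned}\begin{array}{c|cc|c} \text{hidden/observable}&\text{M}&\text{B}&\text{row sum}\\ \hline \text{E}&0.24&0.06&0.30\\ \text{R}&0.07&0.63&0.70\\ \hline \text{column sum}&0.31&0.69&1.00 \end{array} \end{aligned}$$
(4)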

Each matrix element gives a joint probability. For example, the probability that the monkey is eating (E) while the monitoring system simultaneously indicates munching (M) is P(E, M) = 0.24.

As developed in the quick review question, the sum of the probabilities in each row is to the right, and the sums of the column probabilities are below. When the monkey is eating (E), the equipment will either record munching (M) or breathing (B) with probabilities P(E, M) = 0.24 and P(E, B) = 0.06, respectively. Thus, the probability that the monkey is eating is 0.24 + 0.06 = 0.30, a value indicated in Holly’s HMM by π(E) = 0.30 and calculated in Quick Review Question 2 e for ∑ x ∈{M, B}P(E, x) = P(E, M) + P(E, B). That is, the probability of E is the sum of the probabilities of hidden value E with each of the possible observable values, M and B; or P(E) =∑ x ∈{M, B}P(E, x).

Additionally, the values below Matrix (4) are the sums of the probabilities in the columns. Here, we have new information. The probability of the equipment recording munching is 0.31; Holly is either eating (with probability 0.24) or resting/sleeping (with probability 0.07). Quick Review Question 2 h determined this value for ∑ x ∈{E, R}P(x, M) = P(E, M) + P(R, M), which is P(M). We see that the probability of observable value M is equal to the sum of the joint probabilities of M and every possible hidden state, E and R; or P(M) =∑ x ∈{E, R}P(x, M).

These row and column sums are called marginal probabilities because they are written in the “margins” of the matrix. For example, the marginal probability of M is P(M) =∑ x ∈{E, R}P(x, M) = P(E, M) + P(R, M) = 0.31, and we are marginalizing over {E, R}. Notice also that the sum of the marginal probabilities on the right (or below) is 1.00, which is the sum of all the probabilities in the matrix. In general, the marginal probability of y is as follows:

$$\displaystyle \begin{aligned} \boldsymbol{P(y)= \sum_{{x\in X}} P(x,y)= \sum_{{x\in X}} P(y,x)}. \end{aligned} $$
(5)
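The joint and marginal probabilities discussed in this section can be recomputed with a short sketch. The dictionary layout and the function names `marginal_of_symbol` and `marginal_of_state` are assumptions made for illustration; the sketch also follows the text in using the initial state probabilities π as P(E) and P(R).

```python
# Probabilities from Holly's HMM; names and layout are illustrative assumptions.
initial = {"E": 0.30, "R": 0.70}
emission = {"E": {"M": 0.8, "B": 0.2}, "R": {"M": 0.1, "B": 0.9}}
symbols = ("M", "B")

# Joint probabilities by Eq. (2): P(state, symbol) = P(symbol | state) * P(state)
joint = {
    (state, symbol): emission[state][symbol] * initial[state]
    for state in initial
    for symbol in symbols
}

def marginal_of_symbol(symbol):
    """P(symbol): marginalize over the hidden states (a column sum)."""
    return sum(joint[(state, symbol)] for state in initial)

def marginal_of_state(state):
    """P(state): marginalize over the observable symbols (a row sum)."""
    return sum(joint[(state, symbol)] for symbol in symbols)

print(round(marginal_of_symbol("M"), 2))  # 0.31
print(round(marginal_of_state("E"), 2))   # 0.3
```

Summing every entry of `joint` should give 1, which matches the observation that the marginal probabilities on either edge of the matrix total 1.00.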

Quick Review Question 4

Consider the following matrix of joint probabilities:

$$\displaystyle \begin{aligned}\begin{array}{cccc} &&\text{F G H }\\[0.07in] \begin{array}{c} \text{J}\\ \text{K} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.05\,\,&0.18\,\,&0.20\\ 0.36\,\,&0.11\,\,&0.10\end{array}\right]\end{array} \end{aligned}$$

  a. Determine P(J).

  b. Determine the marginal probability of K.

  c. For Part b, give the set over which we are marginalizing.

  d. Using the matrix of joint probabilities, the answer for P(J) in Part a, and joint probability formula (1) or (2), determine the conditional probability P(G | J).

  e. Using the answer for Part b, determine P(G | K).

  f. Write P(K) using sigma notation.

  g. Determine P(F).

  h. Using the answer for Part g, determine P(J | F).

  i. Using the answer for Part g, determine P(K | F).

  j. Determine the marginal probability of G.

  k. For Part j, give the set over which we are marginalizing.

  l. Write P(G) using sigma notation.

  m. Determine P(H).

3.3 Bayes’ Theorem

Bayes’ Theorem, or Bayes’ Law, which follows, is useful in manipulating conditional probabilities:

$$\displaystyle \begin{aligned} \boldsymbol{P(A\,|\,B)=}\frac{\boldsymbol{P(B\,|\,A)\cdot P(A)}}{\boldsymbol{P(B)}}. \end{aligned} $$
(6)

Sometimes, we do not know P(A | B) directly, but we do have values for each of the probability terms on the right. In this case, using Bayes’ Theorem we can easily evaluate P(A | B). For example, suppose we need to evaluate the probability of Holly eating (E) given that the equipment registers munching (M), P(E | M). From Holly’s HMM, we know P(M | E) = e(M | E) = 0.8 and P(E) = π(E) = 0.30. Moreover, in the “Marginal Probability” section, we calculated P(M) = 0.31. Thus, we can calculate P(E | M) using Bayes’ Law as follows:

$$\displaystyle \begin{aligned}P(\text{E}\,|\,\text{M})=\frac{P(\text{M}\,|\,\text{E})\cdot P(\text{E})}{P(\text{M})}=\frac{0.8\cdot 0.3}{0.31}\approx0.77.\end{aligned}$$

Exercise 1 provides an intuitive justification for Bayes’ Law.

Another version of Bayes’ Theorem in which the comma means “and,” or “intersection,” is as follows:

$$\displaystyle \begin{aligned} \boldsymbol{P(A\,|\,B,\,C)=}\frac{\boldsymbol{P(B\,|\,A,C)\cdot P(A\,|\,C)}}{\boldsymbol{P(B\,|\,C)}}. \end{aligned} $$
(7)

Notice that this form is the same as the original version, except C is always in the condition. Using a Venn diagram, Exercise 2 provides intuition into Eq. (7).

Quick Review Question 5

Evaluate each of the following using P(J) = 0.43, P(K) = 0.57, P(F) = 0.41, P(G) = 0.29, P(H) = 0.30, P(G | J) = 0.42, P(G | K) = 0.19, P(J | F) = 0.12, P(K | F) = 0.88, and the matrix of joint probabilities from Quick Review Question 4:

$$\displaystyle \begin{aligned}\begin{array}{cccc} &&\text{F G H }\\[0.07in] \begin{array}{c} \text{J}\\ \text{K} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.05\,\,&0.18\,\,&0.20\\ 0.36\,\,&0.11\,\,&0.10\end{array}\right]\end{array} \end{aligned}$$

  a. P(J | G)

  b. P(K | G)

  c. P(F | J)

  d. P(F | K)

4 Probability of a State Given an Observation

Using Bayes’ Theorem, other probability equalities, and the HMM, we can determine the probabilities of a variety of situations. For example, suppose after Holly scampers off into the jungle, at hour k the monitoring equipment registers v k =  M, munching. The scientists might wonder if she really is eating, or, symbolically, if it is true that u k =  E. They are asking, “For this single munching reading and without reference to earlier readings, what is the likelihood, or probability, that Holly is eating?” This question falls in the category of an HMM likelihood problem. In notation, they want to discover P(u k =  E | v k =  M), abbreviated P(E | M); that is, given a reading of M, what is the probability that Holly is in state E?

Examining the HMM, we do not find the answer directly. However, using Bayes’ Theorem, we can rewrite the question as follows:

$$\displaystyle \begin{aligned} P(u_k=\text{E}\,|\,v_k=\text{M})=\frac{P(v_k=\text{M}\,|\,u_k=\text{E})\cdot P(u_k=\text{E})}{P(v_k=\text{M})} \end{aligned}$$

or, alternatively,

$$\displaystyle \begin{aligned} P(\text{E}\,|\,\text{M})=\frac{P(\text{M}\,|\,\text{E})\cdot P(\text{E})}{P(\text{M})}. \end{aligned}$$

The advantage of this equality is that we can evaluate each of the terms on the right. Consulting Holly’s HMM, we find P(M | E) = e(M | E) = 0.8 and P(E) = π(E) = 0.30. The denominator, P(M), takes a bit more thought. If the monitoring system records v k =  M, Holly could be either eating (u k = E, v k = M) or resting (u k = R, v k = M). Thus, marginalizing over {E, R}, we have

$$\displaystyle \begin{aligned}P(v_k = \text{M}) = P(u_k =\text{E},\; v_k = \text{M}) + P(u_k =\text{R},\; v_k = \text{M})\end{aligned}$$

or

$$\displaystyle \begin{aligned}P(\text{M}) = P(\text{E},\; \text{M}) + P(\text{R},\; \text{M}).\end{aligned}$$

Using a joint probability to determine the first summand, we have

$$\displaystyle \begin{aligned}P(\text{E},\; \text{M}) = P(\text{M},\; \text{E}) = P(\text{M}\,|\,\text{E}) \cdot P(\text{E}).\end{aligned}$$

That is, the probability of Holly eating leaves and the system recording munching is the same as the probability that the equipment correctly records munching when Holly is eating and Holly really is enjoying her dinner. Similarly,

$$\displaystyle \begin{aligned}P(\text{R},\; \text{M}) = P(\text{M}\,|\,\text{R}) \cdot P(\text{R}).\end{aligned}$$

The probability of Holly resting and the system indicating munching is identical to the probability of an incorrect recording of munching when she is resting and Holly is actually inactive. Thus, putting the pieces together determined from joint and marginal probabilities, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(\text{M}) &\displaystyle =&\displaystyle P(\text{E},\; \text{M}) + P(\text{R},\; \text{M})\\ &\displaystyle =&\displaystyle P(\text{M}\,|\,\text{E}) \cdot P(\text{E}) + P(\text{M}\,|\,\text{R}) \cdot P(\text{R}). \end{array} \end{aligned} $$

Fortunately, from Holly’s HMM, we know each of the terms on the right: P(M | E) = e(M | E) = 0.8, P(E) = π(E) = 0.30, P(M | R) = e(M | R) = 0.1, and P(R) = π(R) = 0.70. Incorporating all calculations, we have the following:

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(\text{E}\,|\,\text{M})&\displaystyle =&\displaystyle \frac{P(\text{M}\,|\,\text{E})\cdot P(\text{E})}{P(\text{M})}=\frac{P(\text{M}\,|\,\text{E})\cdot P(\text{E})}{P(\text{M}\,|\,\text{E})\cdot P(\text{E})+P(\text{M}\,|\,\text{R})\cdot P(\text{R})}\\ P(\text{E}\,|\,\text{M})&\displaystyle =&\displaystyle \frac{0.8 \cdot 0.3}{0.8\cdot 0.3+0.1\cdot 0.7}\approx 0.77. \end{array} \end{aligned} $$

There is a 77% chance of Holly eating when the system records munching.

In general, for Holly’s situation, the following equation determines the probability of an underlying state, u k, given an observation, v k:

$$\displaystyle \begin{aligned}P(u_k\,|\,v_k)=\frac{P(v_k\,|\,u_k)\cdot P(u_k)}{P(v_k)}=\frac{P(v_k\,|\,u_k)\cdot P(u_k)}{P(v_k\,|\,\text{E})\cdot P(\text{E})+P(v_k\,|\,\text{R})\cdot P(\text{R})} \end{aligned}$$

or, using sigma notation,

$$\displaystyle \begin{aligned}P(u_k\,|\,v_k)=\frac{P(v_k\,|\,u_k)\cdot P(u_k)}{\sum_{x\in \{\text{E},\,\text{R}\}}P(v_k\,|\,x)\cdot P(x)}. \end{aligned}$$

However, most systems have more than two states. Thus, for other HMMs with S being the set of all (hidden) states, we have the following:

$$\displaystyle \begin{aligned} \boldsymbol{P(u_k\,|\,v_k)=}\frac{\boldsymbol{P(v_k\,|\,u_k)\cdot P(u_k)}}{\boldsymbol{\sum_{{x\in S}}P(v_k\,|\,x)\cdot P(x)}}.\end{aligned} $$
(8)
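Equation (8) translates almost directly into code. The following is a minimal sketch; the function name `posterior` and the dictionary layout are assumptions made here for illustration, and, as in the text, the initial state probabilities stand in for P(u k).

```python
# Probabilities from Holly's HMM; names and layout are illustrative assumptions.
initial = {"E": 0.30, "R": 0.70}
emission = {"E": {"M": 0.8, "B": 0.2}, "R": {"M": 0.1, "B": 0.9}}

def posterior(state, symbol):
    """P(u_k = state | v_k = symbol) by Eq. (8)."""
    # Denominator: marginalize P(symbol | x) * P(x) over every hidden state x.
    evidence = sum(emission[x][symbol] * initial[x] for x in initial)
    # Numerator: the joint probability of this state and this symbol.
    return emission[state][symbol] * initial[state] / evidence

p_eating_given_munching = posterior("E", "M")
print(round(p_eating_given_munching, 2))  # 0.77
```

Because the denominator sums over all hidden states, the posteriors for a fixed symbol add to 1, which gives a quick sanity check on any hand calculation.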

Quick Review Question 6

For Holly’s HMM, determine the following:

  a. P(R | M)

  b. P(R | B)

  c. P(E | B)

Quick Review Question 7

Suppose P(F) = 0.41, P(G) = 0.29, P(H) = 0.30, P(J | F) = 0.12, P(J | G) = 0.62, P(J | H) = 0.26. Determine P(F | J).

5 Probability of a Sequence of States Generating a Sequence of Observations

In this section, we consider a problem that would usually not occur but whose answer will help us in the solution of other more realistic problems: Knowing a sequence of states, u, what is the probability of a particular sequence of observations, v, or P(v | u)? For example, suppose we could spy on Holly in the jungle and discover that initially she rested (u 1 =  R), but in the next 2 h she ate (u 2 =  E and u 3 =  E), so that u =  REE. Also, suppose we want to discover the probability that the monitoring equipment registers breathing (v 1 =  B) followed by munching and then breathing noises in the next two readings (v 2 =  M and v 3 =  B), or v =  BMB. Thus, we are interested in P(v =  BMB | u =  REE) = P(BMB | REE). Figure 5 presents a trellis diagram of the situation, with state circles unshaded, observation circles shaded, and arrows denoting conditional dependencies.

Fig. 5 Trellis diagram with state circles unshaded, observation circles shaded, and arrows denoting conditional dependencies

Because of the HMM property of output independence, the probability of each observation, v i, only depends on the corresponding state, u i. For example, the probability that the equipment’s third reading is B, given that Holly is eating the third hour, P(v 3 =  B | u 3 =  E), is the emission probability, e(B | E) = 0.2. Because of independence, to evaluate P(BMB | REE), we take the product of the three corresponding emission probabilities as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(\text{BMB}\,|\,\text{REE}) &\displaystyle =&\displaystyle P(\text{B}\,|\,\text{R}) \cdot P(\text{M}\,|\,\text{E}) \cdot P(\text{B}\,|\,\text{E})\\ &\displaystyle =&\displaystyle e(\text{B}\,|\,\text{R}) \cdot e(\text{M}\,|\,\text{E}) \cdot e(\text{B}\,|\,\text{E})\\ &\displaystyle =&\displaystyle 0.9 \cdot 0.8 \cdot 0.2\\ &\displaystyle =&\displaystyle 0.144. \end{array} \end{aligned} $$

In general, for sequences of length n, u = u 1u 2⋯u n and v = v 1v 2⋯v n, we have

$$\displaystyle \begin{aligned}P(v|u)=e(v_1|u_1)\cdot e(v_2|u_2)\ldots e(v_n|u_n).\end{aligned}$$

We can abbreviate the product with pi notation to yield the following equality for the probability of an observation sequence given a hidden sequence:

$$\displaystyle \begin{aligned} \boldsymbol{P(v\,|\,u)=\prod_{{i=1}}^{{n}}e(v_i\,|\,u_i)}. \end{aligned} $$
(9)
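Equation (9) can be sketched as a product over paired states and observations. The function name `observation_given_states` and the use of strings such as "BMB" for sequences are illustrative assumptions.

```python
import math

# Emission probabilities from Holly's HMM; layout is an illustrative assumption.
emission = {"E": {"M": 0.8, "B": 0.2}, "R": {"M": 0.1, "B": 0.9}}

def observation_given_states(v, u):
    """P(v | u) by Eq. (9): the product of e(v_i | u_i) over all positions i."""
    if len(v) != len(u):
        raise ValueError("v and u must have the same length")
    return math.prod(emission[state][symbol] for state, symbol in zip(u, v))

p = observation_given_states("BMB", "REE")
print(round(p, 3))  # 0.144
```

Note the argument order: the observed sequence comes first and the hidden sequence second, matching the notation P(v | u).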

Quick Review Question 8

  a. Evaluate P(MMMB | ERER)

  b. Suppose a team catches up with Holly in the jungle and finds her sleeping for 6 h. Calculate the probability that the equipment in the base camp correctly records breathing over that period.

6 Probability of a State Sequence and an Observation Sequence

Another likelihood problem considers the probability of a particular observation sequence and a particular state sequence. There is an important difference between this and the problem in the previous section. For the previous problem, given a hidden sequence, u, we wanted to determine the conditional probability of a particular visible sequence, v, namely P(v | u). In the current problem, we desire the joint probability that both a hidden sequence, u, and a visible sequence, v, occur, or P(v, u).

For example, in the case of Holly, we might want to know the chance of the equipment registering a sequence of breathing (B), munching (M), and breathing (B), while she is actually resting (R), eating (E), and eating (E). By the joint probability equality (1), we can calculate this probability, P(BMB, REE), as follows:

$$\displaystyle \begin{aligned} P(\text{BMB}, \text{REE}) = P(\text{BMB}\,|\,\text{REE}) \cdot P(\text{REE}). \end{aligned} $$
(10)

That is, the joint probability of both BMB and REE is the probability of BMB given REE multiplied by the probability that REE really does occur.

We determined the conditional probability P(BMB | REE) in the previous section as 0.144, while Holly’s Markov model enables us to calculate the chance of the hidden state sequence REE. As the trellis diagram in Fig. 5 illustrates, REE involves three successive events: Holly is initially resting; given that she is resting the first hour, she is eating the next hour; and given that she is eating the second hour, she continues eating the third hour. By the Markov property, each step depends only on the state immediately before it, so the probability of the state sequence REE is the following product:

$$\displaystyle \begin{aligned} P(\text{REE}) = P(u_1 = \text{R}) \cdot P(u_2 = \text{E}\,|\,u_1 = \text{R}) \cdot P(u_3 = \text{E}\,|\, u_2 = \text{E}).\end{aligned}$$

From Holly’s HMM, we know P(u 1 =  R) = π(R) = 0.70 and the transition probabilities P(u 2 =  E | u 1 =  R) = t(R, E) = 0.2 and P(u 3 =  E | u 2 =  E) = t(E, E) = 0.6. Thus, the probability of the state sequence REE is as follows:

$$\displaystyle \begin{aligned}P(\text{REE}) = \pi(\text{R}) \cdot t(\text{R}, \text{E}) \cdot t(\text{E}, \text{E}) = 0.70 \cdot 0.2 \cdot 0.6 = 0.084.\end{aligned}$$

Substituting values into Eq. (10), we have the following:

$$\displaystyle \begin{aligned}P(\text{BMB}, \text{REE}) = P(\text{BMB}\,|\,\text{REE}) \cdot P(\text{REE}) = 0.144 \cdot 0.084 \approx 0.012.\end{aligned}$$

Although there is approximately a 14% chance of the observable sequence BMB given the hidden sequence REE, there is only about a 1% chance of BMB and REE occurring together.

Quick Review Question 9

Use the answers from Quick Review Question 8 as needed.

  1. a.

    Evaluate P(MMMB, ERER)

  2. b.

    Find the probability that Holly sleeps for 6 h and the monitoring equipment registers breathing over that same time period.

Generalizing, based on the Markov property, the probability of a sequence of hidden states, u, is as follows:

$$\displaystyle \begin{aligned} \boldsymbol{P(u)=\pi(u_1)\cdot \prod_{{i=2}}^{{n}}t(u_{{i-1}},u_i)}. \end{aligned} $$
(11)

Moreover, considering individual sequence elements, we have the following:

$$\displaystyle \begin{aligned} \boldsymbol{P(v\,|\,u)=\prod_{{i=1}}^{{n}}P(v_i\,|\,u_i)=\prod_{{i=1}}^{{n}}e(v_i\,|\,u_i)}. \end{aligned} $$
(12)

Thus, using Eqs. (11) and (12), the generalized formula for the joint probability of particular observable and hidden sequences is as follows:

$$\displaystyle \begin{aligned} \boldsymbol{P(v,\,u)=P(v\,|\,u)\cdot P(u)=\left(\prod_{{i=1}}^{{n}}e(v_i\,|\,u_i)\right)\left(\pi(u_1)\cdot \prod_{{i=2}}^{{n}}t(u_{{i-1}},u_i)\right)}. \end{aligned} $$
(13)
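Equations (11) through (13) can be checked numerically. The following is a minimal Python sketch, not part of the original module, that evaluates them for Holly's HMM. The function and dictionary names are illustrative, and the values t(R, R) = 0.8 and e(M | E) = 0.8 are assumptions inferred from each row of probabilities summing to 1.

```python
import math

# Holly's HMM as used in this section. t(R, R) = 0.8 and e(M | E) = 0.8 are
# assumptions inferred from each probability row summing to 1.
PI = {"E": 0.30, "R": 0.70}
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
# Emission probabilities keyed as (observation, state), i.e. e(v_i | u_i).
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}


def state_sequence_prob(u):
    """Eq. (11): P(u) = pi(u_1) * product of t(u_{i-1}, u_i)."""
    p = PI[u[0]]
    for prev, cur in zip(u, u[1:]):
        p *= T[(prev, cur)]
    return p


def emission_prob(v, u):
    """Eq. (12): P(v | u) = product of e(v_i | u_i)."""
    return math.prod(EMIT[(obs, state)] for obs, state in zip(v, u))


def joint_prob(v, u):
    """Eq. (13): P(v, u) = P(v | u) * P(u)."""
    return emission_prob(v, u) * state_sequence_prob(u)


print(f"P(REE)       = {state_sequence_prob('REE'):.6f}")
print(f"P(BMB | REE) = {emission_prob('BMB', 'REE'):.6f}")
print(f"P(BMB, REE)  = {joint_prob('BMB', 'REE'):.6f}")
```

Keeping the two factors in separate functions mirrors the structure of Eq. (13) and makes it easy to see which part of the joint probability comes from the Markov chain and which from the emissions.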

A useful reorganization of Eq. (13), which follows, is evocative of the trellis diagram in Fig. 5:

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(v,\,u) &\displaystyle =&\displaystyle \pi(u_1)\cdot e(v_1\,|\,u_1)\cdot\left(\prod_{i=2}^{n}\left[t(u_{i-1},\,u_i)\cdot e(v_i\,|\,u_i)\right]\right)\\ &\displaystyle =&\displaystyle \pi(u_1)\cdot e(v_1\,|\,u_1)\cdot t(u_1,\,u_2)\cdot e(v_2\,|\,u_2)\cdot t(u_2,\,u_3)\cdot\\ &\displaystyle &\displaystyle e(v_3\,|\,u_3)\cdots t(u_{n-1},\,u_n)\cdot e(v_n\,|\,u_n). \end{array} \end{aligned} $$

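Regrouping this expression, as follows, is revealing for the joint probability P(v, u) = P(u, v):

$$\displaystyle \begin{aligned} P(v,\,u) = \Big[\cdots\Big[\Big[\big[\pi(u_1)\cdot e(v_1\,|\,u_1)\big]\cdot t(u_1,\,u_2)\cdot e(v_2\,|\,u_2)\Big]\cdot t(u_2,\,u_3)\cdot e(v_3\,|\,u_3)\Big]\cdots t(u_{n-2},\,u_{n-1})\cdot e(v_{n-1}\,|\,u_{n-1})\Big]\cdot t(u_{n-1},\,u_n)\cdot e(v_n\,|\,u_n). \end{aligned} $$

Here \(u_{1,k}\) and \(v_{1,k}\) denote the first k elements of u and v, respectively. Notice that the product of the first two terms, which are in the inner brackets, \(\left [\pi (u_1)\cdot e(v_1\,|\,u_1)\right ]\), is \(P(u_1, v_1)\), or P(u, v), when u and v are one-element sequences. Moreover, the product of the terms in the next set of brackets is \(P(u_{1,2}, v_{1,2})\), which is P(u, v) when u and v are two-element sequences. The expression in the next set of brackets calculates \(P(u_{1,3}, v_{1,3})\); and the expression in the outermost set of brackets is \(P(u_{1,n-1}, v_{1,n-1})\), which is P(u, v) for sequences of n − 1 elements. Thus, we can use recursion to define P(u, v). A recursive definition is one that refers to itself on a smaller instance of the problem. In this example, we employ the joint probability \(P(u_{1,n-1}, v_{1,n-1})\) in the definition of the joint probability P(u, v). The following recursive formula for P(u, v) will be useful in further algorithms:

$$\displaystyle \begin{aligned} P(u,v)=P(u_{1,n},v_{1,n})=\left\{ \begin{array}{cc} \pi(u_1)\cdot e(v_1|u_1),&\,\,\, \mbox{ if } n=1\\ P(u_{1,n-1}, v_{1,n-1})\cdot t(u_{n-1},u_{n})\cdot e(v_n|u_n),&\,\,\, \mbox{if } n>1 \end{array} \right. \end{aligned} $$
(14)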
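The recursion behind formula (14) has a base case, \(\pi(u_1)\cdot e(v_1\,|\,u_1)\) for one-element sequences, and a recursive step that multiplies the joint probability of the first n − 1 elements by one transition and one emission factor. The short Python sketch below illustrates that idea; the function name is hypothetical, and t(R, R) = 0.8 and e(M | E) = 0.8 are assumed from rows summing to 1.

```python
# Holly's HMM; t(R, R) and e(M | E) are assumed so that rows sum to 1.
PI = {"E": 0.30, "R": 0.70}
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}


def joint_prob_recursive(u, v):
    """Joint probability P(u, v) of hidden sequence u and visible sequence v."""
    if len(u) != len(v) or not u:
        raise ValueError("u and v must be non-empty and of equal length")
    if len(u) == 1:
        # Base case: pi(u_1) * e(v_1 | u_1)
        return PI[u[0]] * EMIT[(v[0], u[0])]
    # Recursive case: P(u_{1,n-1}, v_{1,n-1}) * t(u_{n-1}, u_n) * e(v_n | u_n)
    shorter = joint_prob_recursive(u[:-1], v[:-1])
    return shorter * T[(u[-2], u[-1])] * EMIT[(v[-1], u[-1])]


print(f"P(REE, BMB) = {joint_prob_recursive('REE', 'BMB'):.6f}")
print(f"P(ER, BM)   = {joint_prob_recursive('ER', 'BM'):.6f}")
```

The recursive call reuses the probability of the shorter prefix, which is the same reuse that dynamic programming algorithms exploit by storing prefix results in a matrix.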

7 Probability of a Sequence of Observations: The Forward Algorithm

A more realistic likelihood problem is to determine the probability of an observed sequence, P(v). For example, we might want to calculate the probability that the monitoring equipment registers breathing sounds and then munching noises the next 2 h, P(v =  BMM).

7.1 Obvious Solution

In the last section, we determined the joint probability of a visible and a hidden sequence. Thus, the obvious solution for this short sequence is to marginalize over all the possible three-element hidden state sequences. For S = {E, R}, there are two choices, E and R, for each of the three positions, yielding \(2^3 = 8\) possible hidden sequences. Thus, marginalizing over this set of eight underlying sequences, U = {RRR, RRE, RER, REE, ERR, ERE, EER, EEE}, we can calculate P(BMM) as follows:

$$\displaystyle \begin{aligned} P(\text{BMM}) = \sum_{u \in U} P(\text{BMM},\,u) = \sum_{u \in U} P(\text{BMM}\,|\,u)\cdot P(u). \end{aligned} $$

Expanding the first sum, we have the following:

$$\displaystyle \begin{aligned} \begin{array}{rcl} P(\text{BMM}) &\displaystyle =&\displaystyle P(\text{BMM}, \text{RRR}) + P(\text{BMM}, \text{RRE}) + P(\text{BMM}, \text{RER})\\ &\displaystyle &\displaystyle +\, P(\text{BMM}, \text{REE}) + P(\text{BMM}, \text{ERR}) + P(\text{BMM}, \text{ERE})\\ &\displaystyle &\displaystyle +\, P(\text{BMM}, \text{EER}) + P(\text{BMM}, \text{EEE}). \end{array} \end{aligned} $$

We can calculate each of these joint probabilities using Eq. (13) or (14) from the last section. Such a solution seems feasible, although tedious. However, consider the situation where we have sequences of 10 elements. The number of 10-element sequences formed from {E, R} is \(2^{10} = 1024\), so that we would have 1024 summands. Doubling to 20 observations, the number of possible sequences, and, hence, the number of summands, is over a million, \(2^{20} = 1{,}048{,}576\). Astoundingly, the number of sequences of length 100 is over \(10^{30}\). As these calculations illustrate, for h hidden states and n observations, there are \(h^n\) possible hidden sequences; as n gets larger, \(h^n\) grows exponentially. Thus, this solution has an exponential growth rate on the order of \(h^n\), written \(O(h^n)\).
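To make the brute-force approach concrete, the following Python sketch enumerates every hidden sequence with `itertools.product` and sums the joint probabilities. It is only an illustration for small n; the function names are hypothetical, and t(R, R) = 0.8 and e(M | E) = 0.8 are assumed from rows summing to 1.

```python
from itertools import product

# Holly's HMM; t(R, R) and e(M | E) are assumed so that rows sum to 1.
PI = {"E": 0.30, "R": 0.70}
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}
STATES = ("E", "R")


def joint_prob(v, u):
    """P(v, u) as a product of initial, transition, and emission factors."""
    p = PI[u[0]] * EMIT[(v[0], u[0])]
    for i in range(1, len(v)):
        p *= T[(u[i - 1], u[i])] * EMIT[(v[i], u[i])]
    return p


def brute_force_likelihood(v):
    """P(v) by marginalizing over all len(STATES)**len(v) hidden sequences."""
    return sum(joint_prob(v, u) for u in product(STATES, repeat=len(v)))


print(f"P(BMM) = {brute_force_likelihood('BMM'):.6f}")
# The number of summands grows exponentially with the sequence length.
for n in (3, 10, 20):
    print(f"n = {n:2d}: {len(STATES) ** n} hidden sequences")
```

For three observations the loop visits only 8 sequences, but the same call with 100 observations would need more than \(10^{30}\) terms, which is why this approach does not scale.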

Quick Review Question 10

For each string length, find the number of strings formed from the bases A, C, T, and G.

  1. a.

    3

  2. b.

    10

  3. c.

    20

  4. d.

    21

7.2 Forward Algorithm

For realistic problems, often the number of states (h) and certainly the number of observations (n) are large, so that \(h^n\) is enormous, too great to compute by hand, and, in fact, intractable even for a computer. Clearly, we must find a better way to calculate P(v), the probability of an observed sequence. Fortunately, the forward algorithm is much faster. The algorithm employs dynamic programming, which divides a problem into a collection of smaller problems and uses the solutions to these smaller problems to solve the larger problem. For the forward algorithm, we store answers to the smallest problems in the first column of a matrix. Repeatedly, we solve progressively larger problems, employing the answers in the previous column and storing the answers in a new column. Finally, the answer to the overall problem, the probability of an observed sequence, is the sum of the elements in the last column.

For P(BMM), we employ a matrix with the same number of rows as the number of states, h, and the same number of columns as the length of the observation sequence, n. For Holly’s HMM, with S = {E, R} and observed sequence v = BMM, h = 2 and n = 3, so we store values in a 2 × 3 matrix, F, whose rows are labeled by the states in S (E and R) and whose columns are labeled by the observations in v (B, M, and M).

7.3 Forward Algorithm Initialization

Initially, we solve the smaller problem of P(B), which by marginalizing Eq. (5) is as follows:

$$\displaystyle \begin{aligned}P(\text{B})=\sum_{x\in S}P(\text{B},\,x)=P(\text{B},\,\text{E})+P(\text{B},\,\text{R}).\end{aligned}$$

We will place the calculation of P(B, E) in \(f_{{ }_{\text{E}1}}\), the element in the E row and first column of F, while \(f_{{ }_{\text{R}1}}\) will be P(B, R). The probability of \(v_1 = \text{B}\) is the sum of these two values in the first column. For the calculation of the summands, we employ the step for n = 1 of the recursive formula (14) as follows:

$$\displaystyle \begin{aligned}P(\text{B},\,\text{E})=\pi(\text{E})\cdot e(\text{B}\,|\,\text{E})\end{aligned}$$

and

$$\displaystyle \begin{aligned}P(\text{B},\text{R})=\pi(\text{R})\cdot e(\text{B}\,|\,\text{R}).\end{aligned}$$

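The initial probabilities are π(E) = 0.30 and π(R) = 0.70, and the emission probabilities are e(B | E) = 0.2 and e(B | R) = 0.9. Thus, as Fig. 6 illustrates, P(B, E) = 0.30 ⋅ 0.2 = 0.06 and P(B, R) = 0.70 ⋅ 0.9 = 0.63. Placing these values from the initialization step in the first column, the matrix F is as follows:

$$\displaystyle \begin{aligned} \begin{array}{ccccc} &&&\text{B M M }\\[0.07in] F=&\begin{array}{c} \text{E}\\ \text{R} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.06\,\,&\phantom{0.0000}\,\,&\phantom{0.000000}\\ 0.63\,\,&\phantom{0.0000}\,\,&\phantom{0.000000}\end{array}\right]\end{array} \end{aligned} $$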

Fig. 6
figure 6

Initialization step of the forward algorithm

Quick Review Question 11

The HMM in Quick Review Question 1 and Fig. 3 contains the following information:

S = {A, B, C} and O = {G, H}

π(A) = 0.2, π(B) = 0.1, π(C) = 0.7

$$\displaystyle \begin{aligned}\begin{array}{cccc} u_k/{u_{k+1}}&\text{A B C}\\[0.07in] T=\begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{ccc} 0.1\,\,\,\,&0.4\,\,\,\,&0.5\\ 0.2\,\,\,\,&0.2\,\,\,\,&0.6\\ 0.3\,\,\,\,&0.2\,\,\,\,&0.5\end{array}\right]\end{array} \end{aligned}$$
$$\displaystyle \begin{aligned}\begin{array}{ccc} hidden/observable&\text{G H}\\[0.07in] \begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{cc} 0.9\,\,&0.1\\ 0.6\,\,&0.4\\ 0.1\,\,&0.9\end{array}\right].\end{array} \end{aligned}$$

Suppose we wish to use the forward algorithm to calculate P(HHGH).

  1. a.

    Give the size of F, the forward algorithm matrix.

  2. b.

    Calculate \(f_{{ }_{\text{A}1}}\).

  3. c.

    Write the joint probability for \(f_{{ }_{\text{A}1}}\) symbolically; that is, write \(f_{{ }_{\text{A}1}}\) as P(x, y) with the symbols for x and y for this example.

  4. d.

    Calculate \(f_{{ }_{\text{B}1}}\).

  5. e.

    Write the joint probability for \(f_{{ }_{\text{B}1}}\) symbolically.

  6. f.

    Calculate \(f_{{ }_{\text{C}1}}\).

  7. g.

    Write the joint probability for \(f_{{ }_{\text{C}1}}\) symbolically.

  8. h.

    Calculate P(H).

7.4 Forward Algorithm Step 2

In the calculation of P(BMM), after initialization, which calculates the summands of P(B), we approach a slightly larger problem, P(BM). Fortunately, as we will see, we can use the values in the first column in the solution of this probability. As we did in the initialization step, we employ marginalization to split the problem into a sum of joint probabilities. Marginalizing, we have

$$\displaystyle \begin{aligned} P(\text{BM})=\sum_{x \in S^2} P(\text{BM}, \,x), \end{aligned} $$
(15)

where \(S^2\) represents the set of all two-element sequences with elements from S = {E, R}, \(S^2\) = {EE, ER, RE, RR}; and, in general, \(S^n\) denotes the set of all n-element sequences over S. Thus, expanding Eq. (15) symbolically, we have the following sum:

$$\displaystyle \begin{aligned} P(\text{BM}) = P(\text{BM}, \text{E}\mathbf{E}) + P(\text{BM}, \text{E}\underbar{\text{R}}) + P(\text{BM}, \text{R}\mathbf{E}) + P(\text{BM}, \text{R}\underbar{\text{R}}). \end{aligned} $$
(16)

Let us calculate one of these summands, P(BM, ER). Applying the second line of the recursive definition of P(u, v), Eq. (14), we have the following:

$$\displaystyle \begin{aligned}P(\text{BM}, \text{ER}) = P(\text{B}, \text{E}) \cdot t(\text{E}, \text{R}) \cdot e(\text{M}\,|\,\text{R}).\end{aligned}$$

Fortunately, we calculated P(B, E) in the initialization step, and its value is matrix element \(f_{{ }_{E1}} = 0.06\). Moreover, Holly’s HMM gives t(E, R) and e(M | R) as 0.4 and 0.1, respectively. Thus, P(BM, ER) = 0.06 ⋅ 0.4 ⋅ 0.1 = 0.0024.

Quick Review Question 12

Calculate P(BM, RR).

For \(f_{{ }_{\text{E}2}}\) we add the elements of Eq. (16) with sequences that end in E. Similarly, \(f_{{ }_{\text{R}2}}\) is the sum of elements with sequences that end in R. Rewriting Eq. (16) illustrates the process:

$$\displaystyle \begin{aligned} \begin{array}{ccccc} P(\text{BM})&=& [P(\text{BM}, \text{E}\mathbf{E}) + P(\text{BM}, \text{R}\mathbf{E})] &+&[P(\text{BM}, \text{E}\underbar{\text{R}}) + P(\text{BM}, \text{R}\underbar{\text{R}})]\\ &=& f_{{}_{\text{E}2}} &+&f_{{}_{\text{R}2}}. \end{array} \end{aligned} $$
(17)

With sigma notation, we have the following equations for the second-column elements:

$$\displaystyle \begin{aligned}f_{{}_{\text{E}2}}=\sum_{x\in S} P(\text{BM},x\text{E})\end{aligned}$$

and

$$\displaystyle \begin{aligned}f_{{}_{\text{R}2}}=\sum_{x\in S} P(\text{BM},x\text{R}).\end{aligned}$$

From the calculations above and in Quick Review Question 12, we know that P(BM, ER) = 0.0024 and P(BM, RR) = 0.0504. Thus, \(f_{{ }_{\text{R}2}}\) = P(BM, ER) + P(BM, RR) = 0.0024 + 0.0504 = 0.0528.

Quick Review Question 13

Using P(BM, EE) = 0.0288 and P(BM, RE) = 0.1008, calculate \(f_{{ }_{\text{E}2}}\).

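Figure 7 illustrates the calculation of the second column of matrix F with P(BM, EE) + P(BM, RE) = \(f_{{ }_{\text{E}2}}\) and P(BM, ER) + P(BM, RR) = \(f_{{ }_{\text{R}2}}\). Thus, the developing F now is as follows:

$$\displaystyle \begin{aligned} \begin{array}{ccccc} &&&\text{B M M }\\[0.07in] F=&\begin{array}{c} \text{E}\\ \text{R} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.06\,\,&0.1296\,\,&\phantom{0.000000}\\ 0.63\,\,&0.0528\,\,&\phantom{0.000000}\end{array}\right]\end{array} \end{aligned} $$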

Moreover, the sum of the elements in the second column, 0.1824, is P(BM); that is, for sequences of length two, there is an 18.24% chance of BM being an output sequence.

Fig. 7
figure 7

Calculation of the second column of forward matrix, F

Quick Review Question 14

The HMM in Quick Review Questions 1 and 11 and Fig. 3 contains the following information:

S = {A, B, C} and O = {G, H}

π(A) = 0.2, π(B) = 0.1, π(C) = 0.7

$$\displaystyle \begin{aligned}\begin{array}{cccc} u_k/{u_{k+1}}&\text{A B C}\\[0.07in] T=\begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{ccc} 0.1\,\,\,\,&0.4\,\,\,\,&0.5\\ 0.2\,\,\,\,&0.2\,\,\,\,&0.6\\ 0.3\,\,\,\,&0.2\,\,\,\,&0.5\end{array}\right]\end{array} \end{aligned}$$
$$\displaystyle \begin{aligned}\begin{array}{ccc} hidden/observable&\text{G H}\\[0.07in] \begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{cc} 0.9\,\,&0.1\\ 0.6\,\,&0.4\\ 0.1\,\,&0.9\end{array}\right].\end{array} \end{aligned}$$

Suppose we wish to use the forward algorithm to calculate P(HHGH). As calculated in Quick Review Question 11, the first column of the forward algorithm matrix F contains \(f_{{ }_{\text{A}1}} = 0.02\), \(f_{{ }_{\text{B}1}} = 0.04\), and \(f_{{ }_{\text{C}1}} = 0.63\).

  1. a.

    Give the number of summands in the calculation of \(f_{{ }_{\text{B}2}}\).

  2. b.

    For \(f_{{ }_{\text{B}2}}\), calculate the first summand, which employs \(f_{{ }_{\text{A}1}}\) as a factor.

  3. c.

    For \(f_{{ }_{\text{B}2}}\), calculate the second summand, which employs \(f_{{ }_{\text{B}1}}\) as a factor.

  4. d.

    For \(f_{{ }_{\text{B}2}}\), calculate the third summand.

  5. e.

    Calculate \(f_{{ }_{\text{B}2}}\).

7.5 Forward Algorithm Completion

The calculation of the third column of F for P(BMM) proceeds in a similar manner to that of the second column. We use the small-problem answers in the previous column to calculate the values for the new column. Quick Review Question 15 steps through the process.

Quick Review Question 15

This question completes the forward algorithm for P(BMM).

  1. a.

    Write P(BMM) using sigma notation similar to that in Eq. (15).

  2. b.

    Write out S 3.

  3. c.

    Enumerate P(BMM) symbolically as in Eq. (17), grouping together the elements of \(f_{{ }_{\text{E}3}}\) and \(f_{{ }_{\text{R}3}}\).

  4. d.

    Notice in the calculation of \(f_{{ }_{\text{E}3}}\) that two probabilities have sequences that end in EE, P(BMM, EEE) and P(BMM, REE). Moreover, by Eq. (14), we have the following:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} P(\text{BMM}, \,\text{EEE}) &\displaystyle =&\displaystyle P(\text{BM},\, \text{EE}) \cdot \boldsymbol{t(\mathbf{E}, \,\mathbf{E}) \cdot e(\mathbf{M}\,|\,\mathbf{E})}\\ P(\text{BMM},\, \text{REE}) &\displaystyle =&\displaystyle P(\text{BM},\, \text{RE}) \cdot \boldsymbol{t(\mathbf{E}, \,\mathbf{E}) \cdot e(\mathbf{M}\,|\,\mathbf{E})}. \end{array} \end{aligned} $$

    With the last two factors being identical, we can group the sum of the probabilities as follows:

    $$\displaystyle \begin{aligned} P(\text{BMM}, \text{EEE}) + P(\text{BMM}, \text{REE}) {=} [P(\text{BM}, \text{EE}) + P(\text{BM}, \text{RE})] \cdot \boldsymbol{t(\mathbf{E}, \mathbf{E}) \cdot e(\mathbf{M}|\mathbf{E})}. \end{aligned}$$

    In a similar manner, show the development to write the sum of the remaining two terms of \(f_{{ }_{\text{E}3}}\) using \(f_{{ }_{\text{R}2}}\).

  5. e.

    As in Part d, write the sum of two terms of \(f_{{ }_{\text{R}3}}\) using \(f_{{ }_{\text{E}2}}\).

  6. f.

    As in Part d, write the sum of the remaining two terms of \(f_{{ }_{\text{R}3}}\) using \(f_{{ }_{\text{R}2}}\).

  7. g.

    Evaluate P(BMM, EEE) + P(BMM, REE), developed symbolically in Part d.

  8. h.

    Referring to the answer to Part d, evaluate P(BMM, ERE) + P(BMM, RRE).

  9. i.

    Referring to the answer to Part e, evaluate P(BMM, EER) + P(BMM, RER).

  10. j.

    Referring to the answer to Part f, evaluate P(BMM, ERR) + P(BMM, RRR).

  11. k.

    Using the answers to Parts g and h, calculate \(f_{{ }_{\text{E}3}}\).

  12. l.

    Using the answers to Parts i and j, calculate \(f_{{ }_{\text{R}3}}\).

  13. m.

    Calculate P(BMM)

Figure 8 illustrates the development of the last column of the forward matrix, which Quick Review Question 15 solved. Moreover, with this and previous calculations, we can complete the forward matrix as follows:

$$\displaystyle \begin{aligned} \begin{array}{ccccc} &&&\text{B M M }\\[0.07in] F=&\begin{array}{c} \text{E}\\ \text{R} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.06\,\,&0.1296\,\,&0.070656\\ 0.63\,\,&0.0528\,\,&0.009408\end{array}\right]\end{array} \end{aligned} $$
(18)

Then, we can calculate P(BMM) as the sum of the elements in the last column, or P(BMM) = 0.080064. There is approximately an 8% chance of the output sequence BMM.

Fig. 8
figure 8

Step 3 of forward algorithm in calculation of P(BMM)

As the work for Quick Review Question 15 indicates, we can iterate through the matrix, using one column in the evaluation of the next. For S = {E, R} and the ith output symbol, \(v_i\), we have:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{{}_{\text{E}i}} &\displaystyle =&\displaystyle f_{{}_{\text{E}(i-1)}}\cdot t(\text{E},\,\text{E})\cdot e(v_i\,|\,\text{E})+f_{{}_{\text{R}(i-1)}}\cdot t(\text{R},\,\text{E})\cdot e(v_i\,|\,\text{E})\\ &\displaystyle =&\displaystyle \sum_{y\in S}\left[f_{{}_{y(i-1)}}\cdot t(y,\,\text{E})\cdot e(v_i\,|\,\text{E})\right] \end{array} \end{aligned} $$

and

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{{}_{\text{R}i}} &\displaystyle =&\displaystyle f_{{}_{\text{E}(i-1)}}\cdot t(\text{E},\,\text{R})\cdot e(v_i\,|\,\text{R})+f_{{}_{\text{R}(i-1)}}\cdot t(\text{R},\,\text{R})\cdot e(v_i\,|\,\text{R})\\ &\displaystyle =&\displaystyle \sum_{y\in S}\left[f_{{}_{y(i-1)}}\cdot t(y,\,\text{R})\cdot e(v_i\,|\,\text{R})\right]. \end{array} \end{aligned} $$

In general, for state x,

$$\displaystyle \begin{aligned}f_{{}_{xi}}= \sum_{y\in S}\left[f_{{}_{y(i-1)}}\cdot t(y,\,x)\cdot e(v_i\,|\,x)\right]. \end{aligned}$$

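Moreover, the sum of the elements in column i is the probability of the sequence of the first i observations, \(P(v_{1,i})\). For example, in (18), we completed the forward matrix for P(BMM) to be the following:

$$\displaystyle \begin{aligned} \begin{array}{ccccc} &&&\text{B M M }\\[0.07in] F=&\begin{array}{c} \text{E}\\ \text{R} \end{array}&\,\,&\left[ \begin{array}{ccc} 0.06\,\,&0.1296\,\,&0.070656\\ 0.63\,\,&0.0528\,\,&0.009408\end{array}\right]\end{array} \end{aligned} $$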

The sum of the elements in the first column is P(B) = 0.69, while the sum of the second-column elements is P(BM) = 0.1824. The answer we desire, P(BMM), is the sum of the elements in the final column, 0.080064; the probability of BMM is approximately 8%. To summarize,

$$\displaystyle \begin{aligned} \boldsymbol{P(v)=\sum_{{x \in S}}f_{{xn}}}, \end{aligned} $$
(19)

where v is a sequence of n observations and \(f_{xn}\) is the element in row x and column n of the forward matrix F. The following quick review question considers the total number of calculations for such a probability using the forward algorithm.
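The initialization step, the column-by-column update for \(f_{xi}\), and the final sum of Eq. (19) can be put together in a few lines. The following Python function is a minimal sketch under stated assumptions, not the module's own implementation: the name `forward` and the dictionary layout are hypothetical, and t(R, R) = 0.8 and e(M | E) = 0.8 are inferred from rows summing to 1.

```python
# Holly's HMM; t(R, R) and e(M | E) are assumed so that rows sum to 1.
PI = {"E": 0.30, "R": 0.70}
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}
STATES = ("E", "R")


def forward(v, states=STATES, pi=PI, t=T, e=EMIT):
    """Return (F, P(v)), where F[x][i] is the forward value for state x
    after the first i + 1 observations (columns are 0-based in the code)."""
    n = len(v)
    F = {x: [0.0] * n for x in states}
    # Initialization: f_x1 = pi(x) * e(v_1 | x)
    for x in states:
        F[x][0] = pi[x] * e[(v[0], x)]
    # Iteration: f_xi = sum over y of f_y(i-1) * t(y, x) * e(v_i | x)
    for i in range(1, n):
        for x in states:
            F[x][i] = sum(F[y][i - 1] * t[(y, x)] for y in states) * e[(v[i], x)]
    # Termination, Eq. (19): P(v) is the sum of the last column
    return F, sum(F[x][n - 1] for x in states)


F, likelihood = forward("BMM")
for x in STATES:
    print(x, [round(value, 6) for value in F[x]])
print(f"P(BMM) = {likelihood:.6f}")
```

Factoring the emission term out of the inner sum is only a small arithmetic saving; the essential point is that each column is computed from the previous column alone, so the work grows with \(nh^2\) rather than \(h^n\).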

Quick Review Question 16

This question analyzes the complexity of the forward algorithm.

  1. a.

    Suppose an HMM has 5 hidden states. Give the number of rows in the forward matrix, F.

  2. b.

    In Holly’s HMM with 2 hidden states, the calculation of each of the two elements in the first column involved one product, for a total of 2 products. For 5 hidden states, give the total number of products in the evaluation of the first column.

  3. c.

    For h states, give the total number of products in the evaluation of the first column.

  4. d.

    Suppose x is one of these 5 hidden states. Not including the initialization step, give the number of terms having \(f_{x(i-1)}\) as a factor, where \(f_{x(i-1)}\) is the forward matrix value in row x and column (i − 1). This value equals the number of lines emanating from an f-value. For example, in Fig. 8, two terms had \(f_{{ }_{\text{E}2}}\) (or \(f_{{ }_{\text{R}2}}\)) as a factor, and two lines emanate from \(f_{{ }_{\text{E}2}}\) (or \(f_{{ }_{\text{R}2}}\)).

  5. e.

    For 2 hidden states, each of these terms involves two products. Give the number of products when we have 5 hidden states.

  6. f.

    Give the number of these products when we have h hidden states. Note that the number of products does not depend on the number of hidden states.

  7. g.

    Give the number of elements in the forward matrix for 5 hidden states and 20 observations, not including those in the first column.

  8. h.

    Give the number of elements in the forward matrix for h hidden states and n observations, not including those in the first column.

  9. i.

    Using your answers from Parts b, e, and g, for 5 hidden states and 20 observations, give the total number of products.

  10. j.

    Using your answers from Parts c, f, and h, for h hidden states and n observations, give the total number of products. Simplify the result.

  11. k.

    Give the total number of sums used in the calculations of the elements in the first column of a forward matrix, F.

  12. l.

    For 5 hidden states and not including the initialization step, give the number of sums used in the calculation of \(f_{{ }_{xi}}\), where x is a hidden state and i is a column. For example, in Fig. 8, \(f_{{ }_{\text{E}3}}\) (or \(f_{{ }_{\text{R}3}}\)) has two summands and, thus, one sum.

  13. m.

    For h hidden states, give the number of sums used in the calculation of \(f_{{ }_{xi}}\), where x is a hidden state and i is a column greater than 1.

  14. n.

    For h hidden states and n observations, give the total number of sums used in the calculations of the elements of the forward matrix. Use answers from Parts h, k, and m. Simplify the result.

  15. o.

    For h hidden states and n observations, give the total number of sums and products in the calculation of the elements of F. Use answers from Parts j and n.

  16. p.

    Give the complexity of the forward algorithm.

As indicated in Quick Review Question 16, for h states and n observations, the forward matrix, F, has h rows and n columns, yielding hn elements. Most of these elements employ h summands of three factors each, although elements in the first column have fewer terms. Thus, the approximate amount of work involved in creating the forward matrix is on the order of \(nh^2\). Moreover, to calculate the probability of the observed sequence, we add the h elements in the last column. In comparison to \(nh^2\), h is inconsequential, so the forward algorithm is on the order of \(nh^2\), written \(O(nh^2)\). The complexity \(O(nh^2)\) is a dramatic improvement over the exponential complexity, \(O(h^n)\), for our original solution. For example, suppose our HMM has h = 10 hidden states and we wish to know the probability of a sequence of n = 9 observations. The original approach to the solution would involve \(h^n = 10^9 = 1{,}000{,}000{,}000\), a billion calculations, while the forward algorithm would employ only about \(nh^2 = 9 \cdot 10^2 = 900\) calculations.
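The gap between the two growth rates is easy to tabulate. The short Python sketch below compares the order-of-magnitude estimates \(h^n\) and \(nh^2\) for a few problem sizes; the function names are hypothetical, and the counts are the rough estimates used in the paragraph above, not exact operation counts.

```python
def brute_force_estimate(h, n):
    """Rough work estimate for marginalizing over all hidden sequences: h**n."""
    return h ** n


def forward_estimate(h, n):
    """Rough work estimate for the forward algorithm: n * h**2."""
    return n * h ** 2


print(f"{'h':>4} {'n':>4} {'h^n':>32} {'n*h^2':>10}")
for h, n in [(2, 3), (2, 20), (10, 9), (4, 50)]:
    print(f"{h:>4} {n:>4} {brute_force_estimate(h, n):>32} {forward_estimate(h, n):>10}")
```

Python integers have arbitrary precision, so the exponential column is exact even when it far exceeds what any computer could enumerate.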

8 Probability of a Genomics Sequence

In this section, we employ the forward algorithm to solve an important biological problem: locating genes.

8.1 Biology Background

Proteins, which are the basic molecules of life, perform many essential functions, such as forming the structural components of cells and, in the case of protein enzymes, catalyzing chemical reactions. Simple proteins are chains of amino acids drawn from a set of 20 standard amino acids.

In the cell, the nucleic acid DNA (deoxyribonucleic acid) contains the encoded information for the building of all the proteins needed by a cell. DNA acts through an intermediary nucleic acid, RNA (ribonucleic acid), to synthesize proteins. RNA has one long chain of molecules, called nucleotides, while two strands of nucleotides compose DNA. Each nucleotide is made up of a sugar, a phosphate, and a nitrogen base, which can be adenine (A), guanine (G), cytosine (C), or thymine (T) in DNA or uracil (U) in RNA. In DNA, A in one strand bonds with T in the other strand, while C and G bond together. Each of these pairs of complementary bases is referred to as a base pair (bp).

Virtually every cell (except red blood cells) in the human body contains chromosomes, or sequences of very long DNA molecules. The genome is the complete set of chromosomes in a cell and contains the organism’s hereditary information. For example, a human genome has 23 pairs of chromosomes (46 total) with each pair having one chromosome from each parent. A gene is a subsequence of a chromosome that contains information for building a protein. Although lengths vary greatly, an average gene contains about 28,000 base pairs [1]. Some contiguous sections of each chromosome are not part of any gene, but some are important for the regulation of gene expression.

8.2 Locations of Genes

We can employ the forward algorithm to address an important problem in genomics, locating genes, by helping us find areas of high CG concentration. In mammalian DNA, the sequence of bases CG appears more frequently upstream of, or before, a gene than in other parts of a DNA sequence, where CG occurs much less often than expected from random occurrences of C and G. We call an area of greater concentration of CG a CpG island, where “p” in “CpG” represents a phosphate linking the bases C and G. The reason that CG occurs infrequently in much of a sequence of DNA is that the C in CG is often methylated, giving (methyl-C)G, which then tends to mutate to TG. However, this transformation from CG to TG is suppressed in the CpG islands. A DNA segment of at least 200 bases is considered a CpG island if C and G together make up at least 50% of its bases and the ratio of observed-to-expected number of CpGs is greater than 0.6 [7].
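The island criteria can be turned into a quick screening calculation. The Python sketch below is an illustration only and rests on two assumptions that are not taken from [7]: that the 50% criterion refers to the combined fraction of C and G bases, and that the expected number of CpGs in a segment of length N is (number of C) × (number of G) / N, as in the commonly used Gardiner-Garden and Frommer formulation. The function names and test sequences are hypothetical.

```python
def cpg_stats(segment):
    """Return (fraction of C and G bases, observed-to-expected CpG ratio)."""
    seq = segment.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / n if n else 0.0
    # Assumed expectation for independent bases: (#C * #G) / N
    expected = c_count * g_count / n if n else 0.0
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(segment, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three thresholds; the defaults mirror the criteria in the text."""
    gc_fraction, ratio = cpg_stats(segment)
    return len(segment) >= min_length and gc_fraction >= min_gc and ratio > min_ratio


# Hypothetical sequences, chosen only to exercise the function.
print(cpg_stats("CGCGATCG"))
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```

A threshold test like this only screens fixed windows; the HMM approach in this section instead scores a sequence under two competing models.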

Figure 9a and b contains transition matrices for sequences in and not in CpG islands, respectively. The matrices were derived from a database of human DNA sequences using 48 accepted CpG islands. In the matrix for CpG islands, the probability of the pair CG (or the probability that G occurs, given that C has just appeared) is 0.274, written as \(P(x_i = \text{G}\,|\,x_{i-1} = \text{C}) = P(\text{G}\,|\,\text{C}) = 0.274\), while in the CpG negative matrix (Fig. 9b), the probability of the sequence CG is much lower, \(P(\text{G}\,|\,\text{C}) = 0.078\). Stanke [21] uses π = (0.148, 0.334, 0.365, 0.154) as the initial state probability vector for CpG islands and π = (0.260, 0.249, 0.241, 0.251) for non-CpG islands. For each situation, we assume the emission matrix is as follows:

Project 5 employs these hidden Markov models in assessing the probabilities of output sequences inside and outside CpG islands.

Fig. 9
figure 9

Possible transition matrix for (a) samples within CpG islands and for (b) samples not within CpG islands [4]

9 Parallel Forward Algorithm (Optional)

Although the forward algorithm is much faster than the brute-force technique discussed initially, for a large number of observations and states, such as are often encountered in genetics problems, the sequential forward algorithm can still take a long time. To speed the task, we can employ parallel programming.

9.1 Communication

One obvious way to parallelize is to have different processes calculate different initial values for the first column and to have different processes compute separate summands, such as those in the middle sections of Figs. 7 and 8. In Holly’s HMM with s = 2 states, we employ 2 processes for the initialization step; and in general, for s states, each of s processes could compute a different first-column element. In subsequent steps for Holly’s HMM as in Figs. 7 and 8, each new state value is a linear combination of the previous state values, so each process calculates the new value by using the inputs from the arrows that point to that state. In theory, s processes are needed for s states to calculate the summands simultaneously. In reality, communication of appropriate forward matrix elements must occur for evaluation of the next column elements, and such communication limits the speedup of the parallel algorithm. Thus, it can often be more efficient to have fewer processes, so that each process does more work and less communication.

The heart of the forward algorithm is the matrix that holds the probabilities for the sequence of observations. From the perspective of the parallel program, each process could have its own copy of the matrix and would have to communicate the value of its calculated cell to all the other processes. Alternatively, the processes could share the same matrix, which would reduce the amount of needed communication. Such consideration of communication is important in deciding how to implement the parallel algorithm.

9.2 Implementation of the Parallel Forward Algorithm

The parallel forward algorithm, available to professors by request, is implemented using the OpenMP library. OpenMP (OMP) uses threads, which are like processes except that the threads share the same memory. Thus, threads communicate by reading and writing to the same matrix.

Communication limits speedup of the algorithm because when multiple threads have to use the same matrix cell, these lightweight processes must take turns—two threads cannot access a cell simultaneously. Thus, increasing the number of threads for any parallel program comes with a tradeoff: Each thread has less work to do, so the work can be done faster; but a thread must wait longer when multiple threads need to access the same matrix cell. Speedup of a parallel program depends on the number of threads used, and increasing the number of threads to its theoretical limit does not always improve speedup.
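The decomposition described here, one worker per element of the next column with all workers reading a shared previous column, can be sketched in Python with a thread pool. This is not the OpenMP implementation mentioned above; it is a hypothetical illustration of how the work divides. In CPython the global interpreter lock means that pure-Python threads like these are not expected to give a real speedup. The names are illustrative, and t(R, R) = 0.8 and e(M | E) = 0.8 are assumed from rows summing to 1.

```python
from concurrent.futures import ThreadPoolExecutor

# Holly's HMM; t(R, R) and e(M | E) are assumed so that rows sum to 1.
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}
STATES = ("E", "R")


def next_column(prev, obs, states=STATES, t=T, e=EMIT, workers=2):
    """Compute the next forward column; each state's cell is a separate task.

    prev maps state -> forward value in the previous column (shared, read-only).
    """
    def cell(x):
        # One task: sum over the arrows pointing into state x.
        return sum(prev[y] * t[(y, x)] for y in states) * e[(obs, x)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(cell, states))
    return dict(zip(states, values))


first = {"E": 0.06, "R": 0.63}      # initialization column for observation B
second = next_column(first, "M")    # column for the second observation, M
third = next_column(second, "M")    # column for the third observation, M
print(second)
print(third)
```

Because every task only reads the previous column and writes its own cell, no two tasks write to the same location; the synchronization point is the end of each column, which must finish before the next one starts.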

To illustrate, Table 1 gives timings and speedups and Fig. 10 depicts these timings for serial and parallel OMP implementations of the forward algorithm. One definition of speedup using n threads (or processes) is the length of time to execute the serial algorithm divided by the length of time to execute the corresponding parallel algorithm. Notice that the speedup of the parallel algorithm is a function of the number of states and the number of threads. For example, for 2048 states, the speedup is the greatest at about 4 threads:

$$\displaystyle \begin{aligned}\mbox{speedup }= 9.9255\, \mbox{s}/\, 4.0269\,\, \mbox{s} = 2.4648.\end{aligned}$$

However, with 4096 states, 8 threads are better:

$$\displaystyle \begin{aligned}\mbox{speedup }= 43.6465\, \mbox{s}/\, 11.9184\,\, \mbox{s} = 3.6621.\end{aligned}$$

Even with more states, the use of more threads eventually suppresses speedup because of the need for additional communication.
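The speedup figures quoted above follow directly from the definition of speedup as serial time divided by parallel time. A minimal Python check, using only the four timings stated in the text and a hypothetical helper name, is:

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup = serial execution time / parallel execution time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel time must be positive")
    return serial_seconds / parallel_seconds


# Timings quoted in the text: (states, threads, serial s, parallel s)
for states, threads, serial, parallel in [
    (2048, 4, 9.9255, 4.0269),
    (4096, 8, 43.6465, 11.9184),
]:
    print(f"{states} states, {threads} threads: speedup = {speedup(serial, parallel):.4f}")
```

A speedup below the number of threads, as in both cases here, reflects the communication and waiting costs discussed in this section.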

Fig. 10
figure 10

Timings for forward algorithm

Table 1 Timings and speedup for forward algorithm

10 Decoding Problem

We employ the HMM forward algorithm to solve likelihood problems, such as the likelihood of monitoring equipment registering breathing sounds (B) and then munching noises (M) the next 2 h for Holly Howler, P(v =  BMM). We use the Viterbi algorithm to solve another type of HMM problem, decoding. In this case, given a sequence of observations, such as v =  BMM, we determine the most probable sequence of underlying states, u, to yield v. Thus, we wish to determine the u with maximum P(u | v), such as the sequence of three states, u, that yields max(P(u | v =  BMM)).

10.1 Obvious Solution

In the development above, we learned by joint probability that P(u, BMM) = P(u | BMM) ⋅ P(BMM). Thus, the problem of determining the state sequence u that maximizes P(u | BMM) is equivalent to the problem of determining the u that maximizes P(u, BMM)∕P(BMM). However, because P(BMM) is a constant, the problem simplifies to finding u where P(u, BMM) is maximum. The most obvious method to solve the problem is to determine P(u, BMM) for every possible three-element state sequence in \(S^3\) = {RRR, RRE, RER, REE, ERR, ERE, EER, EEE}, using a version of the forward algorithm, and then to select the u with the largest probability. However, this solution takes exponential time because for a sequence of n observations with h = 2 states, the number of possible n-element state sequences, or the number of elements in \(S^n\), is \(h^n = 2^n\). In general, for h states, the number of n-element state sequences is \(h^n\). The much faster Viterbi algorithm is another dynamic programming algorithm, which has the forward algorithm as its base.
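For a sequence as short as BMM, the obvious solution can be carried out directly. The Python sketch below enumerates all state sequences and keeps the one with the largest joint probability P(u, v); it computes each joint probability with the product form of Eq. (13) rather than a forward-style matrix. The function names are hypothetical, and t(R, R) = 0.8 and e(M | E) = 0.8 are assumed from rows summing to 1.

```python
from itertools import product

# Holly's HMM; t(R, R) and e(M | E) are assumed so that rows sum to 1.
PI = {"E": 0.30, "R": 0.70}
T = {("E", "E"): 0.6, ("E", "R"): 0.4, ("R", "E"): 0.2, ("R", "R"): 0.8}
EMIT = {("B", "E"): 0.2, ("M", "E"): 0.8, ("B", "R"): 0.9, ("M", "R"): 0.1}
STATES = ("E", "R")


def joint_prob(u, v):
    """P(u, v) as a product of initial, transition, and emission factors."""
    p = PI[u[0]] * EMIT[(v[0], u[0])]
    for i in range(1, len(v)):
        p *= T[(u[i - 1], u[i])] * EMIT[(v[i], u[i])]
    return p


def brute_force_decode(v):
    """Return (most probable state sequence, its joint probability with v)."""
    best = max(product(STATES, repeat=len(v)), key=lambda u: joint_prob(u, v))
    return "".join(best), joint_prob(best, v)


path, prob = brute_force_decode("BMM")
print(f"Most probable hidden sequence for BMM: {path} (joint probability {prob:.6f})")
```

The call examines \(2^3 = 8\) candidates here; the exponential growth of that count with n is what the Viterbi algorithm avoids.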

10.2 Viterbi Algorithm

The key to the Viterbi algorithm is Eq. (14), repeated below, for calculating a joint probability:

$$\displaystyle \begin{aligned} P(u,v)=P(u_{1,n},v_{1,n})=\left\{ \begin{array}{cc} \pi(u_1)\cdot e(v_1|u_1),&\,\,\, \mbox{ if } n=1\\ P(u_{1,n-1}, v_{1,n-1})\cdot t(u_{n-1},u_{n})\cdot e(v_n|u_n),&\,\,\, \mbox{if } n>1 \end{array} \right. \end{aligned} $$
(20)

Beginning the Viterbi algorithm in the same way as the forward algorithm for observation sequence BMM in Holly’s HMM, we employ a 2 × 3 matrix, G, with first-column elements being \(g_{{ }_{\text{E}1}} = P(\text{B}, \text{E}) = \pi (\text{E}) \cdot e(\text{B}\,|\,\text{E})\) and \(g_{{ }_{\text{R}1}} = P(\text{B}, \text{R}) = \pi (\text{R}) \cdot e(\text{B}\,|\, \text{R})\). The initialization step is identical to that of the forward algorithm (see Fig. 6), resulting in the following initial Viterbi matrix, G:

$$\displaystyle \begin{aligned}\begin{array}{c|ccc} & \text{B} & \text{M} & \text{M}\\ \hline \text{E} & 0.06 & & \\ \text{R} & 0.63 & & \end{array}\end{aligned}$$

Quick Review Question 17

Calculate the first-column elements of the Viterbi matrix to calculate u for max(P(u | HHGH)) using the HMM in Fig. 11, which contains the following information:

$$\displaystyle \begin{aligned}S = \{\text{A}, \text{B}, \text{C}\} \mbox{ and } O = \{\text{G}, \text{H}\}\end{aligned}$$
$$\displaystyle \begin{aligned}\pi(\text{A}) = 0.2,\,\,\, \pi(\text{B}) = 0.1,\,\,\, \pi(\text{C}) = 0.7\end{aligned}$$
$$\displaystyle \begin{aligned}\begin{array}{cccc} u_k/{u_{k+1}}&\text{ A B C }\\[0.07in] T=\begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{ccc} 0.1\,\,\,\,&0.4\,\,\,\,&0.5\\ 0.2\,\,\,\,&0.2\,\,\,\,&0.6\\ 0.3\,\,\,\,&0.2\,\,\,\,&0.5\end{array}\right]\end{array} \end{aligned}$$
Fig. 11

HMM diagram for Quick Review Question 17

Computations of the second column of the Viterbi and the forward matrices also begin in the same way with calculating the product of a first-column element, a transition value, and an emission value. As in Fig. 7 for the forward algorithm, Fig. 12 of the current module for the Viterbi algorithm makes the following computations using first-column values, \(g_{{ }_{\text{E}1}}\) and \(g_{{ }_{\text{R}1}}\), that correspond to \(f_{{ }_{\text{E}1}}\) and \(f_{{ }_{\text{R}1}}\), respectively, of the forward matrix:

$$\displaystyle \begin{aligned}\begin{array}{ccccc} g_{{}_{\text{E}1}}& \cdot& t(\text{E}, \,\text{E})& \cdot& e(\text{M}\,|\,\text{E})\\ g_{{}_{\text{E}1}}& \cdot& t(\text{E}, \,\text{R})& \cdot& e(\text{M}\,|\,\text{R})\\ g_{{}_{\text{R}1}}& \cdot& t(\text{R}, \,\text{E})& \cdot& e(\text{M}\,|\,\text{E})\\ g_{{}_{\text{R}1}}& \cdot& t(\text{R}, \,\text{R})& \cdot& e(\text{M}\,|\,\text{R})\\ \end{array}\end{aligned}$$

However, instead of taking the sum of pairs of expressions (first and third transitioning to E, second and fourth transitioning to R) as we did with the forward algorithm, we take the maxima as follows:

$$\displaystyle \begin{aligned}\begin{array}{cccccc} g_{{}_{\text{E}2}} &=& \max( g_{{}_{\text{E}1}} \cdot t(\text{E}, \,\text{E}) \cdot e(\text{M}\,|\,\text{E}),\,\,\,& g_{{}_{\text{R}1}}\cdot t(\text{R}, \,\text{E}) \cdot e(\text{M}\,|\,\text{E}) )\\ g_{{}_{\text{R}2}} &=& \max( g_{{}_{\text{E}1}}\cdot t(\text{E}, \,\text{R}) \cdot e(\text{M}\,|\,\text{R}),\,\,\,& g_{{}_{\text{R}1}}\cdot t(\text{R}, \,\text{R}) \cdot e(\text{M}\,|\,\text{R}) ). \end{array} \end{aligned}$$

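Figure 12 details these computations with boldface arrows indicating the maxima. The following displays the developing Viterbi matrix, G:

$$\displaystyle \begin{aligned}\begin{array}{c|ccc} & \text{B} & \text{M} & \text{M}\\ \hline \text{E} & 0.06 & 0.1008 & \\ \text{R} & 0.63 & 0.0504 & \end{array}\end{aligned}$$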

Fig. 12

Calculation of the second column of Viterbi matrix, G
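The contrast between the two algorithms at this step can be sketched in Python. The snippet is a minimal illustration, assuming the parameter values for Holly's HMM that appear in this module's worked answers; the helper name `candidates` is hypothetical.

```python
# Holly's HMM, with the parameter values used in this module's worked answers.
STATES = "ER"
PI = {"E": 0.3, "R": 0.7}                       # PI[x] = pi(x)
T = {"E": {"E": 0.6, "R": 0.4},                 # T[y][x] = t(y, x)
     "R": {"E": 0.2, "R": 0.8}}
EMIT = {"E": {"B": 0.2, "M": 0.8},              # EMIT[x][o] = e(o | x)
        "R": {"B": 0.9, "M": 0.1}}

# First column for observation B: the same values in both algorithms.
g1 = {x: PI[x] * EMIT[x]["B"] for x in STATES}


def candidates(prev, x, obs):
    """Products prev[y] * t(y, x) * e(obs | x), one per predecessor state y."""
    return [prev[y] * T[y][x] * EMIT[x][obs] for y in STATES]


g2 = {x: max(candidates(g1, x, "M")) for x in STATES}   # Viterbi keeps the best
f2 = {x: sum(candidates(g1, x, "M")) for x in STATES}   # forward adds them up

print({x: round(p, 4) for x, p in g2.items()})   # {'E': 0.1008, 'R': 0.0504}
print({x: round(p, 4) for x, p in f2.items()})   # {'E': 0.1296, 'R': 0.0528}
```

The only difference between the two second columns is the function applied to the same list of candidate products.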

Quick Review Question 18

Suppose we wish to use the Viterbi algorithm to find the state sequence, u, with maximum P(u, HHGH) for the HMM in Quick Review Question 17. As calculated in that question, the first column of the Viterbi algorithm matrix G contains \(g_{{ }_{\text{A}1}} = 0.02,\,\, g_{{ }_{\text{B}1}} = 0.04,\mbox{ and }g_{{ }_{\text{C}1}} = 0.63\).

  1. a.

    The calculation of \(g_{{ }_{\text{B}2}}\) involves three expressions whose values are 0.0032, 0.0032, and 0.0504. Calculate \(g_{{ }_{\text{B}2}}\).

  2. b.

    Calculate \(g_{{ }_{\text{A}2}}\).

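Calculations of subsequent Viterbi matrix elements for this example proceed in a similar fashion. With observation \(v_i\), we employ the following evaluations for the elements of column i:

$$\displaystyle \begin{aligned}\begin{array}{cccccc} g_{{}_{\text{E}i}} &=& \max( g_{{}_{\text{E}(i-1)}} \cdot t(\text{E}, \,\text{E}) \cdot e(v_i\,|\,\text{E}),\,\,\,& g_{{}_{\text{R}(i-1)}}\cdot t(\text{R}, \,\text{E}) \cdot e(v_i\,|\,\text{E}) )\\ g_{{}_{\text{R}i}} &=& \max( g_{{}_{\text{E}(i-1)}}\cdot t(\text{E}, \,\text{R}) \cdot e(v_i\,|\,\text{R}),\,\,\,& g_{{}_{\text{R}(i-1)}}\cdot t(\text{R}, \,\text{R}) \cdot e(v_i\,|\,\text{R}) ). \end{array} \end{aligned}$$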

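With boldface arrows indicating maximum values, Fig. 13 illustrates the calculation of the final column of the Viterbi matrix. Note that there are two paths to R that yield the maximum, 0.004032. The completed Viterbi matrix is as follows:

$$\displaystyle \begin{aligned}\begin{array}{c|ccc} & \text{B} & \text{M} & \text{M}\\ \hline \text{E} & 0.06 & 0.1008 & 0.048384\\ \text{R} & 0.63 & 0.0504 & 0.004032 \end{array}\end{aligned}$$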

Fig. 13

Step 3 of Viterbi algorithm in calculation of P(BMM)
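As a worked check of this final step, the computation can be written out explicitly. The sketch assumes the second-column values \(g_{{}_{\text{E}2}} = 0.1008\) and \(g_{{}_{\text{R}2}} = 0.0504\) obtained by taking the maxima for the second observation, together with the transition and emission values of Holly's HMM used in this module's worked answers:

$$\displaystyle \begin{aligned}\begin{array}{lll} g_{{}_{\text{E}3}} &= \max(0.1008 \cdot 0.6 \cdot 0.8,\,\,\, 0.0504 \cdot 0.2 \cdot 0.8) &= \max(0.048384,\,\,\, 0.008064) = 0.048384\\ g_{{}_{\text{R}3}} &= \max(0.1008 \cdot 0.4 \cdot 0.1,\,\,\, 0.0504 \cdot 0.8 \cdot 0.1) &= \max(0.004032,\,\,\, 0.004032) = 0.004032 \end{array}\end{aligned}$$

The tie in the second row corresponds to the two equally probable paths to R.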

In general, using the Viterbi algorithm for any HMM with set of states S, we have the following calculation for the Viterbi matrix element in the row for state x and in column i > 1, which corresponds to observation \(v_i\):

$$\displaystyle \begin{aligned}\boldsymbol{g_{{}_{{xi}}}}=\displaystyle \max_{\boldsymbol{y\in S}}\boldsymbol{\left(g_{{}_{{y(i-1)}}}\cdot t(y,x)\cdot e(v_i\,|\,x)\right)}.\end{aligned}$$
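The general recurrence translates almost directly into code. The following Python sketch is one possible rendering, not a prescribed implementation; the function name `viterbi_matrix` and the nested-dictionary layout are assumptions made for illustration, and the sample values are those of Holly's HMM from this module's worked answers.

```python
def viterbi_matrix(states, pi, t, e, obs):
    """Return G as a dict of rows, where G[x][i] is the entry for state x
    and 0-based column i (so column 0 is the initialization column)."""
    G = {x: [pi[x] * e[x][obs[0]]] for x in states}
    for i in range(1, len(obs)):
        for x in states:
            # Best predecessor product, then the emission for this column.
            best = max(G[y][i - 1] * t[y][x] for y in states)
            G[x].append(best * e[x][obs[i]])
    return G


# Holly's HMM, with the parameter values used in this module's worked answers.
pi = {"E": 0.3, "R": 0.7}
t = {"E": {"E": 0.6, "R": 0.4}, "R": {"E": 0.2, "R": 0.8}}      # t[y][x] = t(y, x)
e = {"E": {"B": 0.2, "M": 0.8}, "R": {"B": 0.9, "M": 0.1}}      # e[x][o] = e(o | x)

G = viterbi_matrix("ER", pi, t, e, "BMM")
for x in "ER":
    print(x, [round(p, 6) for p in G[x]])
# E [0.06, 0.1008, 0.048384]
# R [0.63, 0.0504, 0.004032]
```

Each element requires a maximum over h predecessors, so filling the matrix takes on the order of \(h^2 n\) multiplications rather than the \(h^n\) sequences of the exhaustive method.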

Quick Review Question 19

Suppose we wish to use the Viterbi algorithm to find the state sequence, u, with maximum P(u, HHGH) for the HMM in Quick Review Question 17. Suppose, also, that the third column of the Viterbi matrix G contains \(g_{{ }_{\text{A}3}} = 0.0765\), \(g_{{ }_{\text{B}3}} = 0.0340\), and \(g_{{ }_{\text{C}3}} = 0.0142\). Calculate \(g_{{ }_{\text{C}4}}\) to four decimal places.

To calculate the probability of the visible sequence with the forward algorithm, we added the probabilities in the final column. However, to calculate the maximum joint probability of a hidden sequence and a given visible sequence using the Viterbi algorithm, we find the maximum of the values in the final column. Thus, for Holly’s HMM, we have the following:

$$\displaystyle \begin{aligned}\max_{u\in S^3}P(u, \,\text{BMM})=\max(0.048384,0.004032)=0.048384.\end{aligned}$$

However, we would like to calculate \(\max ( P(u \,|\, \text{BMM}))\) for this u. Recall that

$$\displaystyle \begin{aligned} P(u, \,\text{BMM}) = P(u\, |\, \text{BMM}) \cdot P(\text{BMM}). \end{aligned}$$

Dividing both sides by the factor P(BMM), we have

$$\displaystyle \begin{aligned} P(u \,|\, \text{BMM}) = P(u,\, \text{BMM}) / P(\text{BMM}). \end{aligned}$$

Moreover, using the forward algorithm, we discovered P(BMM) = 0.080064. Thus, over all three-element hidden sequences, u,

$$\displaystyle \begin{aligned} \max( P(u \,|\, \text{BMM}) ) = 0.048384 / 0.080064 \approx 0.604317. \end{aligned}$$
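The division can be checked numerically. In the Python sketch below, a single hypothetical helper, `final_column`, runs the shared column recurrence with either `sum` (forward algorithm) or `max` (Viterbi algorithm); the parameter values are assumed to be those of Holly's HMM from this module's worked answers.

```python
# Holly's HMM, with the parameter values used in this module's worked answers.
STATES = "ER"
PI = {"E": 0.3, "R": 0.7}                       # PI[x] = pi(x)
T = {"E": {"E": 0.6, "R": 0.4},                 # T[y][x] = t(y, x)
     "R": {"E": 0.2, "R": 0.8}}
EMIT = {"E": {"B": 0.2, "M": 0.8},              # EMIT[x][o] = e(o | x)
        "R": {"B": 0.9, "M": 0.1}}


def final_column(obs, combine):
    """Run the column recurrence, combining predecessor products with
    `combine` (sum for the forward matrix, max for the Viterbi matrix)."""
    col = {x: PI[x] * EMIT[x][obs[0]] for x in STATES}
    for o in obs[1:]:
        col = {x: combine(col[y] * T[y][x] for y in STATES) * EMIT[x][o]
               for x in STATES}
    return col


p_v = sum(final_column("BMM", sum).values())        # P(BMM), forward algorithm
p_joint = max(final_column("BMM", max).values())    # max P(u, BMM), Viterbi
p_cond = p_joint / p_v                              # max P(u | BMM)

print(round(p_v, 6), round(p_joint, 6), round(p_cond, 6))
# 0.080064 0.048384 0.604317
```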

More important than finding this maximum probability, we would like to discover the particular state sequence that yields this maximum. Fortunately, by backtracking through the Viterbi matrix, we can determine this hidden sequence. Figure 14 summarizes results of Figs. 12 and 13 with arrows indicating the expressions generating the maxima. To calculate the state sequence, u, that results in \(\max ( P(u \,|\, \text{BMM}))\) = 0.604317, we start by finding the maximum in the final column, 0.048384, which is in row E. Backtracking through the path indicated by the arrows, we then go to column 2, row E and finally to column 1, row R. Reading the row values from left to right, we obtain the state sequence u = REE. Thus, given observed sequence BMM, REE is the most likely state sequence, and P(REE | BMM) = 0.604317, which is over 60%.

Fig. 14

Final Viterbi matrix with arrows indicating the paths
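Backtracking can be sketched in Python by storing, for each matrix element, the predecessor state that produced its maximum (the role played by the arrows in the figure). The function name `viterbi_decode` and the data layout are assumptions for illustration; when two predecessors tie, this sketch simply keeps the first one it encounters.

```python
def viterbi_decode(states, pi, t, e, obs):
    """Return (most likely state sequence, its joint probability with obs)."""
    G = [{x: pi[x] * e[x][obs[0]] for x in states}]   # G[i][x], 0-based column i
    back = [{}]                                       # back[i][x] = best predecessor
    for i in range(1, len(obs)):
        col, ptr = {}, {}
        for x in states:
            best_prev = max(states, key=lambda y: G[i - 1][y] * t[y][x])
            col[x] = G[i - 1][best_prev] * t[best_prev][x] * e[x][obs[i]]
            ptr[x] = best_prev
        G.append(col)
        back.append(ptr)
    # Start at the maximum of the final column and follow the pointers left.
    last = max(states, key=lambda x: G[-1][x])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return "".join(path), G[-1][last]


# Holly's HMM, with the parameter values used in this module's worked answers.
pi = {"E": 0.3, "R": 0.7}
t = {"E": {"E": 0.6, "R": 0.4}, "R": {"E": 0.2, "R": 0.8}}      # t[y][x] = t(y, x)
e = {"E": {"B": 0.2, "M": 0.8}, "R": {"B": 0.9, "M": 0.1}}      # e[x][o] = e(o | x)

u, p = viterbi_decode("ER", pi, t, e, "BMM")
print(u, round(p, 6))   # REE 0.048384
```

Storing one pointer per element costs only another h × n table, so recovering the sequence adds little to the work of filling the matrix.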

Quick Review Question 20

Suppose we wish to use the Viterbi algorithm to find the state sequence, u, with maximum P(u | HHGH) for the HMM in Quick Review Question 17. Suppose, also, that P(HHGH) = 0.1028 and the Viterbi matrix, G, is in Fig. 15, with arrows indicating the direction from which maxima came.

Fig. 15

Viterbi matrix for Quick Review Question 20 with arrows indicating the direction from which maxima came

  1. a.

    Calculate the maximum P(u, HHGH) for hidden state sequence u.

  2. b.

    Calculate the maximum P(u | HHGH) for hidden state sequence u.

  3. c.

    Give the u that achieves these maxima.

10.3 Parallel Viterbi Algorithm (Optional)

As with the forward algorithm, we can use high performance computing to achieve faster results when a decoding problem involves a large number of states and/or observations. Moreover, we can parallelize the Viterbi algorithm similarly to the forward algorithm with OpenMP and threads communicating by reading and writing to the same matrix. Project 6 calculates the speedup that can be achieved using HPC with this algorithm.
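The structure that makes this parallelization possible can be illustrated in Python, although the module's projects use OpenMP. Within one column, each row's maximum depends only on the previous column, so the rows can be computed independently, while the columns themselves must be processed in order. The sketch below uses the standard-library `ThreadPoolExecutor` purely to show that dependency pattern; in standard CPython, threads are not expected to speed up this kind of arithmetic, so it should not be read as a performance technique. The names `cell` and `viterbi_final_column_parallel` are hypothetical, and the parameter values are those of Holly's HMM from this module's worked answers.

```python
from concurrent.futures import ThreadPoolExecutor

# Holly's HMM, with the parameter values used in this module's worked answers.
STATES = "ER"
PI = {"E": 0.3, "R": 0.7}                       # PI[x] = pi(x)
T = {"E": {"E": 0.6, "R": 0.4},                 # T[y][x] = t(y, x)
     "R": {"E": 0.2, "R": 0.8}}
EMIT = {"E": {"B": 0.2, "M": 0.8},              # EMIT[x][o] = e(o | x)
        "R": {"B": 0.9, "M": 0.1}}


def cell(prev, x, o):
    """One Viterbi element: reads only the previous column, writes nothing shared."""
    return max(prev[y] * T[y][x] for y in STATES) * EMIT[x][o]


def viterbi_final_column_parallel(obs, workers=2):
    """Compute the last Viterbi column, one task per row within each column."""
    col = {x: PI[x] * EMIT[x][obs[0]] for x in STATES}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for o in obs[1:]:                       # columns stay sequential
            values = list(pool.map(lambda x: cell(col, x, o), STATES))
            col = dict(zip(STATES, values))     # all rows done before moving on
    return col


col = viterbi_final_column_parallel("BMM")
print({x: round(p, 6) for x, p in col.items()})   # {'E': 0.048384, 'R': 0.004032}
```

Waiting for every row of a column before starting the next column plays the role of the synchronization point that a shared-memory version would need between columns.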

11 Detecting CpG Islands

One example of an HMM decoding problem with a solution employing the Viterbi algorithm involves detecting genes. As discussed in section “Probability of a Genomics Sequence,” an area of greater frequency of the base sequence CG can be an indicator that a gene is to follow. The section presented initial probabilities, emission matrices, and possible transition matrices for samples within and not within such CpG islands, called positive and negative areas, respectively [4]. Suppose we also have transition probabilities from bases in positive areas (\(\text{A}^+\), \(\text{C}^+\), \(\text{T}^+\), \(\text{G}^+\)) to bases in negative areas (\(\text{A}^-\), \(\text{C}^-\), \(\text{T}^-\), \(\text{G}^-\), respectively) and vice versa. Then, using the Viterbi algorithm, for a given observed sequence of bases from {A, C, T, G}, we can compute the most likely hidden sequence from the set of states, S = {begin/end, \(\text{A}^+\), \(\text{C}^+\), \(\text{T}^+\), \(\text{G}^+\), \(\text{A}^-\), \(\text{C}^-\), \(\text{T}^-\), \(\text{G}^-\)}, where the sign indicates whether the base is probably in a CpG island or not. Project 7 considers such a decoding problem, where we can decode areas of high CpG concentration, containing bases \(\text{A}^+\), \(\text{C}^+\), \(\text{T}^+\), and \(\text{G}^+\).

12 Exercises

  1. 1

    As we did in Quick Review Question 3 using the Venn diagram in Fig. 4, provide an intuitive justification for the first version of Bayes’ Theorem in Eq. (6).

  2. 2

    Use a Venn diagram to justify the second version of Bayes’ Theorem in Eq. (7).

  3. 3

    This exercise relates to the HMM in Quick Review Question 14.

    1. a.

      Calculate \(f_{{ }_{\text{A}2}}\).

    2. b.

      Calculate \(f_{{ }_{\text{C}2}}\).

    3. c.

      Calculate P(HH).

    4. d.

      Calculate P(HHG).

13 Projects

  1. 1

    Write a sequential program to calculate the probability of a state given an observation using Eq. (8).

  2. 2
    1. a.

      Develop a sequential program to calculate the probability of a sequence of states generating a sequence of observations using Eq. (9).

    2. b.

      Develop a parallel version of this program, having each thread responsible for a portion of the factors and gather the results into a final product.

    3. c.

      For large sequence length, time the parallel version for increasing numbers of threads. Produce a graph of the speedup versus the number of threads similar to Fig. 10.

  3. 3
    1. a.

      Develop a sequential program to calculate the probability of a state sequence and an observation sequence using Eq. (13).

    2. b.

      Develop a parallel version of this program, having each thread responsible for a portion of the factors and gather the results into a final product.

    3. c.

      For large sequence length, time the parallel version for increasing numbers of threads. Produce a graph of the speedup versus the number of threads similar to Fig. 10.

  4. 4
    1. a.

      Using the forward algorithm, develop a sequential program to calculate the probability of a sequence of observations.

    2. b.

      Develop a parallel version of this program.

    3. c.

      For large sequence length, time the parallel version for increasing numbers of threads. Produce a graph of the speedup versus the number of threads similar to Fig. 10.

  5. 5

    Download ProbabilitiesHumanPN.txt [15], which stores the transition matrices from Fig. 9, and “Accessing Chromosome 19 Data” (AccessingChr19Data.pdf) [14], which describes how to access gene locations and subsequences from chromosome 19 of the human genome. Using the UCSC Genome Browser [23], select several subsequences of about 50 bases that occur in a CpG island and several that do not. Using a sequential or parallel forward algorithm program that you develop, determine the probability of each subsequence twice, once for each hidden Markov model in section “Probability of a Genomics Sequence,” and calculate how many more times likely the subsequence is to be in a CpG island than not. Do your results concur with those of the UCSC Genome Browser?

  6. 6
    1. a.

      Using the Viterbi algorithm, develop a sequential program to determine the most likely sequence of hidden states corresponding to a sequence of observations.

    2. b.

      Develop a parallel version of this program.

    3. c.

      For a large sequence length, time the parallel version for increasing numbers of threads. Produce a graph of the speedup versus the number of threads similar to Fig. 10.

  7. 7

    Download ProbabilitiesHumanV.txt [16], which contains the transition matrix and other data described in the section “Detecting CpG Islands” from [8], and “Accessing Chromosome 19 Data” (AccessingChr19Data.pdf) [14], which describes how to access gene locations and subsequences from chromosome 19 of the human genome. Using the UCSC Genome Browser [23], select several subsequences of about 50 bases that overlap CpG islands. Using a sequential or parallel Viterbi algorithm program that you develop, for each downloaded sequence, determine the most likely hidden sequence of states from

    $$\displaystyle \begin{aligned}S = \{\mbox{begin/end, } \text{A}^+,\, \text{C}^+,\, \text{T}^+,\, \text{G}^+,\, \text{A}^-,\, \text{C}^-,\, \text{T}^-,\, \text{G}^-\}\end{aligned}$$

    and its probability. Do your results concur with those of the UCSC Genome Browser?

  8. 8

    Scientists have used telemetry data, such as from radio tags or collars, to monitor wildlife movement and HMMs to infer an animal’s hidden behavior [9, 11]. Download telemetry.nlogo [17], a NetLogo program to simulate using telemetry to follow an animal and interpret its actions, and “Using a NetLogo Program” (UsingNetLogo.pdf) [18], a brief description of how to interact with a NetLogo simulation [12]. With NetLogo, which is free to download, we can generate agent-based models that have autonomous, decision-making agents, which have states and behaviors. In the simulation telemetry.nlogo, an animal wanders at random unless thirsty. When thirsty, the animal moves to water, in the center of the world, and stays in the water until satiated. The user can observe, through this simulated telemetry, if the animal is in water (W), facing water but not in water (F), or not facing water and not in water (A). The state of the animal is thirsty, either on way to water or in water (T); or not thirsty, not on way to water and not in water (N).

    1. a.

      Read the description of the model under the Info tab for the program. Estimate the parameters for an HMM to model the scenario as described in the section “Things to Try.”

    2. b.

      Using this HMM, take part of an observation sequence from file output, say of about 50 observations; and with your forward algorithm program, determine the probability of that subsequence.

    3. c.

      Using your Viterbi algorithm program and the subsequence of observations from Part b, determine the most likely corresponding state sequence. Calculate the percentage of the sequence derived using the Viterbi algorithm that agrees with the actual state sequence generated by the NetLogo program.

    4. d.

      Revise the NetLogo program, telemetry.nlogo, to save to a file at least 100 observation sequences of length 50 and to store in another file the corresponding hidden state sequences. Revise your Viterbi program to read the file of observation sequences and to produce another file of derived state sequences. Write another program to read the latter output file of derived state sequences and the file of state sequences from your NetLogo program and to calculate the fraction of time the two files agree. That is, what is the probability that your Viterbi program will accurately generate the underlying state sequence?

    5. e.

      Extend the model in telemetry.nlogo as suggested under the Info tab in the section “Extending the Model.” Tutorials on programming with NetLogo can be obtained from [20] and [12].

  9. 9

    [9] described using telemetry to obtain movement data of bison, which could be in underlying, hidden states of “encamped” or “exploratory.” Their paper discussed several types of HMMs to use observations to infer a bison’s states. Observations were bivariate, involving step lengths and turning angles, or directions. In an exploratory state, the animals were observed to have numerous long steps and few turns; while in an encamped state, bison were seen having short steps with more frequent reversals.

    1. a.

      Similar to telemetry.nlogo, available on the website containing this module and discussed in the previous project, develop a NetLogo simulation of bison movement that outputs important totals and writes to one file a sequence of hidden states and to another file the corresponding sequence of observations [17]. In Part b of this project, we will use the output data to estimate an HMM. Then, using this HMM, we will take a sequence of observations and attempt to derive the underlying state sequence. We can compare our derivation with the state sequence from the NetLogo simulation. “Using a NetLogo Program” and tutorials on programming with NetLogo can be obtained from [18] and [20], respectively. Note that we could model the step lengths as being random numbers taken from different exponential distributions, each with step lengths ranging from 0.0 to 0.6 km/d. The turning angles for an exploring and for an encamping animal can be modeled as random numbers from normal distributions with means of 0 and 180 degrees/d, respectively. ([9] employed Weibull distributions for step lengths and wrapped Cauchy distributions for turning angles.)

    2. b.

      Estimate the parameters for an HMM to model the scenario.

    3. c.

      Using this HMM, take part of an observation sequence from file output, say of about 50 observations; and with your forward algorithm program, determine the probability of that subsequence.

    4. d.

      Using this HMM, take part of an observation sequence from file output, say of about 50 observations; and with your Viterbi algorithm program, derive the most likely corresponding subsequence of states. Calculate the percentage of the sequence derived using the Viterbi algorithm that agrees with the actual state sequence generated by the NetLogo program.

    5. e.

      Revise your NetLogo program to save to a file at least 100 observation sequences of length 50 and to store in another file the corresponding hidden state sequences. Revise your Viterbi program to read the file of observation sequences and to produce a file of derived state sequences. Develop another program to take the latter output file of derived state sequences and the file of state sequences from your NetLogo program and to calculate the fraction of time the two files agree. That is, what is the probability that your Viterbi program will accurately generate the underlying state sequence?

  10. 10
    1. a.

      In the programming language of your choice (such as Python, R, MATLAB, Mathematica, or C++), develop a program that uses the probabilities from Holly’s HMM of this module to generate a given number of states and corresponding observations. Save the sequences to separate files. For example, the initial state should be selected at random using the initial state probabilities. Have the program select the corresponding observation at random with probabilities indicated by the emission matrix. The next state should be picked at random using probabilities from the transition (Markov) matrix, and so forth.

    2. b.

      Repeat Parts c–e from Project 9 using this program.

  11. 11
    1. a.

      Repeat Project 10 a for a generalized situation. That is, develop a procedure with input of a vector of state names or their abbreviations, a vector of observation names or their abbreviations, an HMM (initial vector, transition matrix, and emission matrix), and a number (n) of states/observations for an output sequence. Then, using the HMM’s probabilities, have the procedure generate and return a sequence of n states and corresponding sequence of n observations.

    2. b.

      Develop the two HMMs from Project 5. Using your procedure from Part a, generate subsequences of length 50 for the positive and the negative models. Using a forward algorithm program that you develop, determine the probability of each subsequence twice, once for each hidden Markov model. For the positive subsequence, calculate how many more times likely the subsequence is to be in a CpG island than not. For the negative subsequence, calculate how many more times likely the subsequence is not to be in a CpG island than to be in a CpG island.

    3. c.

      A gene in DNA and the corresponding subsequence of RNA have exons, which contain part of the encoding information, such as for proteins, separated by introns, or non-coding segments. RNA splicing removes the introns and reassembles the exons so that translation to the protein can eventually occur, and a splice is the location of a cut between an intron and an exon. In DNA and RNA, each non-terminal nucleotide attaches to the #5 carbon, written 5’ carbon, of a sugar at one end and the #3 carbon, 3’ carbon, of its neighbor at the other end. Thus, we consider a nucleotide chain to have a specific 5’-3’ orientation. Develop the “Toy HMM” for a 5’ splice recognition problem in [5], and using your procedure from Part a, repeat Project 9, Parts c–e.

    For each of the following HMMs, using your procedure from Part a, repeat Project 9, Parts c–e:

    1. d.

      Project 7

    2. e.

      The unfair-casino HMM in Fig. 1 of [3] or other sources

    3. f.

      The weather HMM on the fourth slide of [10]

    4. g.

      The weather HMM in Sections 1 and 2 of [6]

    5. h.

      The light-dark chocolate candy HMM in Section 5, “Exercises,” of [6]

    6. i.

      The tree-growth rings HMM in Section 1, “A Simple Example,” of [13].

  12. 12
    1. a.

      Study the Baum-Welch (or forward-backward) algorithm for solving the training (or learning) problem mentioned in sections “Introduction” and “Example Model,” and develop a program to solve the problem.

    2. b.

      Using the NetLogo program telemetry.nlogo, available on the website containing this module and discussed in Project 8, generate a collection of state and corresponding observation sequences [17]. Using these and your program from Part a, determine the parameters for an HMM.

    3. c.

      With your procedure from Project 10 a, generate a collection of state and corresponding observation sequences. Using these and your program from Part a, determine the parameters for a trained HMM. Compare this HMM to Holly’s HMM.

      Do the same process as described in Part c for the following HMMs:

    4. d.

      The two HMMs of Project 11 b

    5. e.

      Project 11 c

    6. f.

      Project 11 d

    7. g.

      Project 11 e

    8. h.

      Project 11 f

    9. i.

      Project 11 g

    10. j.

      Project 11 h

    11. k.

      Project 11 i

14 Answers to Quick Review Questions

  1. 1
    1. a.

      {A, B, C}

    2. b.

      {G, H}

    3. c.

      0.6

    4. d.

      0.2

    5. e.
      $$\displaystyle \begin{aligned}\begin{array}{cccc} u_k/{u_{k+1}}&\text{A B C}\\[0.07in] T=\begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{ccc} 0.1\,\,\,\,&0.4\,\,\,\,&0.5\\ 0.2\,\,\,\,&0.2\,\,\,\,&0.6\\ 0.3\,\,\,\,&0.2\,\,\,\,&0.5\end{array}\right]\end{array} \end{aligned}$$
    6. f.

      0.2

    7. g.

      0.7 = 1 − π(A) − π(B)

    8. h.

      0.6

    9. i.

      0.9

    10. j.
      $$\displaystyle \begin{aligned}\begin{array}{ccc} hidden/observable&\text{G H}\\[0.07in] \begin{array}{c} \text{A}\\ \text{B}\\ \text{C} \end{array}\,\,&\left[ \begin{array}{cc} 0.9\,\,&0.1\\ 0.6\,\,&0.4\\ 0.1\,\,&0.9\end{array}\right]\end{array} \end{aligned}$$
  2. 2
    1. a.

      P(E, B) = P(B | E) ⋅ P(E) = 0.2 ⋅ 0.3 = 0.06

    2. b.

      P(R, M) = P(M | R) ⋅ P(R) = 0.1 ⋅ 0.7 = 0.07

    3. c.

      P(R, B) = P(B | R) ⋅ P(R) = 0.9 ⋅ 0.7 = 0.63

    4. d.

      1.00 = 0.24 + 0.06 + 0.07 + 0.63

    5. e.

      0.30 = 0.24 + 0.06

    6. f.

      0.70 = P(R, M) + P(R, B) = 0.07 + 0.63

    7. g.

      1.00

    8. h.

      0.31 = P(E, M) + P(R, M) = 0.24 + 0.07

    9. i.

      0.69 = P(E, B) + P(R, B) = 0.06 + 0.63

    10. j.

      1.00

  3. 3
  4. 4
    1. a.

      0.43 = 0.05 + 0.18 + 0.20

    2. b.

      0.57 = 0.36 + 0.11 + 0.10

    3. c.

      {F, G, H}

    4. d.

      0.42 because

      $$\displaystyle \begin{aligned}P(\text{G}, \,\text{J}) = P(\text{G}\,|\,\text{J})\cdot P(\text{J}),\mbox{ so }P(\text{G}\,|\,\text{J}) = P(\text{G}, \,\text{J}) / P(\text{J}) = 0.18 / 0.43 \approx 0.42\end{aligned}$$
    5. e.

      0.19 because

      $$\displaystyle \begin{aligned}P(\text{G}, \,\text{K}) {=} P(\text{G}\,|\,\text{K})\cdot P(\text{K}),\mbox{ so }P(\text{G}\,|\,\text{K}) {=} P(\text{G}, \,\text{K}) / P(\text{K}) {=} 0.11 / 0.57{\approx} 0.19\end{aligned}$$
    6. f.

      \(\sum_{x \in \{\text{F}, \text{G}, \text{H}\}} P(\text{K}, x)\) or \(\sum_{x \in \{\text{F}, \text{G}, \text{H}\}} P(x, \text{K})\)

    7. g.

      0.41 = 0.05 + 0.36

    8. h.

      0.12 because

      $$\displaystyle \begin{aligned}P(\text{J}, \,\text{F}) = P(\text{J}\,|\,\text{F})\cdot P(\text{F}),\mbox{ so }P(\text{J}\,|\,\text{F}) = P(\text{J}, \text{F}) / P(\text{F}) = 0.05 / 0.41\approx 0.12\end{aligned}$$
    9. i.

      0.88 because

      $$\displaystyle \begin{aligned} P(\text{K}, \,\text{F}) {=} P(\text{K}\,|\,\text{F})\cdot P(\text{F}), \mbox{ so }P(\text{K}\,|\,\text{F}) {=} P(\text{K}, \,\text{F}) / P(\text{F}) {=} 0.36 / 0.41\approx 0.88 \end{aligned}$$
    10. j.

      0.29 = 0.18 + 0.11

    11. k.

      {J, K}

    12. l.

      \(\sum_{x \in \{\text{J}, \text{K}\}} P(\text{G}, x)\) or \(\sum_{x \in \{\text{J}, \text{K}\}} P(x, \text{G})\)

    13. m.

      0.30 = 0.20 + 0.10

  5. 5
    1. a.

      0.62 because P(J | G) = P(G | J) ⋅ P(J)∕P(G) = 0.42 ⋅ 0.43∕0.29 ≈ 0.62

    2. b.

      0.69 because P(K | G) = P(G | K) ⋅ P(K)∕P(G) = 0.35 ⋅ 0.57∕0.29 ≈ 0.69

    3. c.

      0.11 because P(F | J) = P(J | F) ⋅ P(F)∕P(J) = 0.12 ⋅ 0.41∕0.43 ≈ 0.11

    4. d.

      0.63 because P(F | K) = P(K | F) ⋅ P(F)∕P(K) = 0.88 ⋅ 0.41∕0.57 ≈ 0.63

  6. 6
    1. a.

      0.23 ≈ 1 − P(E | M) ≈ 1 − 0.77

    2. b.

      0.91 because P(R | B) = P(B | R) ⋅ P(R)∕(P(B | E) ⋅ P(E) + P(B | R) ⋅ P(R)) = 0.9 ⋅ 0.7∕(0.2 ⋅ 0.3 + 0.9 ⋅ 0.7) = 0.9 ⋅ 0.7∕0.69 ≈ 0.91

    3. c.

      0.09 ≈ 1 − P(R | B) ≈ 1 − 0.91; alternatively, P(E | B) = P(B | E) ⋅ P(E)∕(P(B | E) ⋅ P(E) + P(B | R) ⋅ P(R)) = 0.2 ⋅ 0.3∕0.69 ≈ 0.09

  7. 7

    0.16 ≈ P(F | J) = P(J | F) ⋅ P(F)∕[P(J | F) ⋅ P(F) + P(J | G) ⋅ P(G) + P(J | H) ⋅ P(H)] = 0.12 ⋅ 0.41∕[0.12 ⋅ 0.41 + 0.62 ⋅ 0.29 + 0.26 ⋅ 0.30]

  8. 8
    1. a.

      0.0576 = P(MMMB | ERER) = e(M | E) ⋅ e(M | R) ⋅ e(M | E) ⋅ e(B | R) = 0.8 ⋅ 0.1 ⋅ 0.8 ⋅ 0.9

    2. b.

      0.53144 ≈ P(BBBBBB | RRRRRR) = \(e(\text{B}\,|\,\text{R})^6 = 0.9^6\)

  9. 9
    1. a.

      0.00055 ≈ P(MMMB, ERER) = P(MMMB | ERER) ⋅ P(ERER) = 0.0576 ⋅ π(E) ⋅ t(E, R) ⋅ t(R, E) ⋅ t(E, R) = 0.0576 ⋅ 0.3 ⋅ 0.4 ⋅ 0.2 ⋅ 0.4, where P(MMMB | ERER) = 0.0576 is from Quick Review Question 8 a.

    2. b.

      0.1219 ≈ P(BBBBBB, RRRRRR) = P(BBBBBB | RRRRRR) ⋅ P(RRRRRR) = 0.53144 ⋅ π(R) ⋅ \(t(\text{R}, \text{R})^5\) = 0.53144 ⋅ 0.70 ⋅ \(0.8^5\), where P(BBBBBB | RRRRRR) ≈ 0.53144 is from Quick Review Question 8 b.

  10. 10
    1. a.

      64 = \(4^3\)

    2. b.

      1,048,576 = \(4^{10}\)

    3. c.

      1,099,511,627,776 = \(4^{20}\)

    4. d.

      4,398,046,511,104 = \(4^{21}\)

  11. 11
    1. a.

      3 × 4

    2. b.

      0.02 = π(A) ⋅ e(H | A) = 0.2 ⋅ 0.1

    3. c.

      P(A, H)

    4. d.

      0.04 = π(B) ⋅ e(H | B) = 0.1 ⋅ 0.4

    5. e.

      P(B, H)

    6. f.

      0.63 = π(C) ⋅ e(H | C) = 0.7 ⋅ 0.9

    7. g.

      P(C, H)

    8. h.

      0.69 = P(A, H) + P(B, H) + P(C, H) \(= f_{{ }_{\text{A}1}} + f_{{ }_{\text{B}1}} + f_{{ }_{\text{C}1}} = 0.02 + 0.04 + 0.63\)

  12. 12

    0.0504 = P(BM, RR) = P(B, R) ⋅ t(R, R) ⋅ e(M | R) \(= f_{{ }_{\text{R}1}}\) ⋅ t(R, R) ⋅ e(M | R) = 0.63 ⋅ 0.8 ⋅ 0.1

  13. 13

    0.1296 = P(BM, EE) + P(BM, RE) = 0.0288 + 0.1008

  14. 14
    1. a.

      3

    2. b.

      0.0032 \(= f_{{ }_{\text{A}1}}\) ⋅ t(A, B) ⋅ e(H | B) = 0.02 ⋅ 0.4 ⋅ 0.4

    3. c.

      0.0032 \(= f_{{ }_{\text{B}1}}\) ⋅ t(B, B) ⋅ e(H | B) = 0.04 ⋅ 0.2 ⋅ 0.4

    4. d.

      0.0504 \(= f_{{ }_{\text{C}1}}\) ⋅ t(C, B) ⋅ e(H | B) = 0.63 ⋅ 0.2 ⋅ 0.4

    5. e.

      0.0568 = 0.0032 + 0.0032 + 0.0504

  15. 15
    1. a.

      \(P(\text{BMM})=\sum _{x\in S^3}P(\text{BMM}, x)\)

    2. b.

      {EEE, EER, ERE, ERR, REE, RER, RRE, RRR}

    3. c.

      P(BMM) = [P(BMM, EEE) + P(BMM, ERE) + P(BMM, REE) + P(BMM, RRE)] + [P(BMM, EER) + P(BMM, ERR) + P(BMM, RER) + P(BMM, RRR)] \(= f_{{ }_{\text{E}3}} + f_{{ }_{\text{R}3}}\)

    4. d.

      P(BMM, ERE) + P(BMM, RRE) = P(BM, ER) ⋅ t(R, E) ⋅ e(M | E) + P(BM, RR) ⋅ t(R, E) ⋅ e(M | E) = [P(BM, ER) + P(BM, RR)] ⋅ t(R, E) ⋅ e(M | E) \(= f_{{ }_{\text{R}2}}\) ⋅ t(R, E) ⋅ e(M | E)

    5. e.

      P(BMM, EER) + P(BMM, RER) \(= f_{{ }_{\text{E}2}}\) ⋅ t(E, R) ⋅ e(M | R)

    6. f.

      P(BMM, ERR) + P(BMM, RRR) \(= f_{{ }_{\text{R}2}}\) ⋅ t(R, R) ⋅ e(M | R)

    7. g.

      0.062208 = 0.1296 ⋅ 0.6 ⋅ 0.8

    8. h.

      0.005184 = 0.1296 ⋅ 0.4 ⋅ 0.1

    9. i.

      0.008448 = 0.0528 ⋅ 0.2 ⋅ 0.8

    10. j.

      0.004224 = 0.0528 ⋅ 0.8 ⋅ 0.1

    11. k.

      0.070656 = 0.062208 + 0.008448

    12. l.

      0.009408 = 0.005184 + 0.004224

    13. m.

      0.080064 = 0.070656 + 0.009408

  16. 16
    1. a.

      5

    2. b.

      5

    3. c.

      h

    4. d.

      5

    5. e.

      2

    6. f.

      2

    7. g.

      95 = 5 ⋅ 19

    8. h.

      h ⋅ (n − 1)

    9. i.

      195 = 5 + 95 ⋅ 2

    10. j.

      2hn − h because h + h ⋅ (n − 1) ⋅ 2 = h + 2hn − 2h = −h + 2hn

    11. k.

      0

    12. l.

      4

    13. m.

      h − 1

    14. n.

      \(h^2 n - h^2 - hn + h\) or \((h^2 - h)(n - 1)\) because \(0 + (h - 1) \cdot h \cdot (n - 1) = (h^2 - h)(n - 1) = h^2 n - hn - h^2 + h\)

    15. o.

      \(h^2 n - h^2 + hn\) because \((2hn - h) + (h^2 n - h^2 - hn + h) = h^2 n - h^2 + hn\)

    16. p.

      \(O(h^2 n)\)

  17. 17

    \(g_{{ }_{\text{A}1}} = 0.02, g_{{ }_{\text{B}1}} = 0.04\), and \(g_{{ }_{\text{C}1}} = 0.63\)

  18. 18
    1. a.

      0.0504 = max(0.0032, 0.0032, 0.0504)

    2. b.

      0.0189 = max(0.0002, 0.0008, 0.0189)

  19. 19

    0.0344 because of the following: \(g_{{ }_{\text{A}3}}\) ⋅ t(A, C) ⋅ e(H | C) = 0.034425; \(g_{{ }_{\text{B}3}}\) ⋅ t(B, C) ⋅ e(H | C) = 0.01836; \(g_{{ }_{\text{C}3}}\) ⋅ t(C, C) ⋅ e(H | C) = 0.00639; and the maximum of these expressions is 0.034425.

  20. 20
    1. a.

      0.0344, the maximum in the final column

    2. b.

      0.33463 = 0.0344∕0.1028

    3. c.

      CCAC

15 Further Reading

  • Baldi, P., Chauvin, Y., Hunkapiller, T. and McClure, M.A.: Hidden Markov models of biological primary sequence information. Proc. of the Natl. Academy of Sciences, 91(3), 1059–1063 (1994)

  • Gales, M., Young, S.: The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1(3), 195–304 (2007)

  • Kamal, M.S., Chowdhury, L., Khan, M.I., Ashour, A.S., Tavares, J.M.R., Dey, N.: Hidden Markov model and Chapman Kolmogrov for protein structures prediction from images. Computational Biology and Chemistry, 68, 231–244 (2017)

  • Krogh, A., Brown, M., Mian, I.S., Sjölander, K., Haussler, D.: Hidden Markov models in computational biology: Applications to protein modeling. J. of Molecular Biology, 235(5), 1501–1531 (1994)

  • Manogaran, G., Vijayakumar, V., Varatharajan, R., Kumar, P.M., Sundarasekar, R., Hsu, C.H.: Machine learning based big data processing framework for cancer diagnosis using hidden Markov model and GM clustering. Wireless Personal Communications, 102(3), 2099–2116 (2018)

  • McGibbon, R.T., Ramsundar, B., Sultan, M.M., Kiss, G., Pande, V.S.: Understanding protein dynamics with L1-regularized reversible hidden Markov models. arXiv preprint arXiv:1405.1444 (2014)

  • Petersen, B.K., Mayhew, M.B., Ogbuefi, K.O., Liu, V.X., Greene, J.D., Ray, P.: Modeling sepsis disease progression using hidden Markov models (No. LLNL-CONF-740757). Lawrence Livermore National Lab.(LLNL), Livermore, CA (U. S.) (2017)

  • Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE 77(2): 257–286 (1989)

  • Ramanathan, N.: Applications of hidden Markov models.

    http://www.cs.umd.edu/~djacobs/CMSC828/ApplicationsHMMs.pdf

  • Sharp, C., Bray, J., Housden, N.G., Maiden, M.C., Kleanthous, C.: Diversity and distribution of nuclease bacteriocins in bacterial genomes revealed using Hidden Markov Models. PLoS Computational Biology, 13(7), p.e1005652 (2017)

  • Williams, J.P., Storlie, C.B., Therneau, T.M., Jack Jr, C.R., Hannig, J.: A Bayesian approach to multi-state hidden Markov models: application to dementia progression. arXiv preprint arXiv:1802.02691 (2018)

  • Yoon, B.: Hidden Markov models and their applications in biological sequence analysis. Curr. Genomics 10(6), 402–415 (2009)