Introduction

Earthquakes are among the most devastating natural phenomena. The randomness of their occurrence, as well as their size, makes them potentially life-threatening disasters that can affect thousands of people and structures (Kannan 2014). Analysis of earthquake data is therefore important for understanding earthquake behavior and creating mechanisms to counter its probable effects (Yip et al. 2017).

Stochastic modeling has been a traditional approach to studying earthquake occurrences. Models such as the Poisson (Utsu 1969; Lomnitz 1974; Dionysiou and Papadopoulos 1992) and negative binomial (Dionysiou and Papadopoulos 1992; Rao and Kaila 2010) models have been used to describe earthquake occurrences. These models assume that previous occurrences do not affect the time at which the next one occurs. However, earthquakes of large magnitude can cause other large earthquakes to occur consecutively in a short span of time (Kannan 2014). This means that the occurrences are self-exciting, i.e., previous occurrences make future occurrences more likely to happen. Some studies on modeling seismicity, such as Bansal et al. (2012) and Spassiani and Sebastiani (2016), used the epidemic-type aftershock sequence (ETAS) model. The ETAS model is a point process model of seismicity based on three assumptions: (i) the background seismicity follows a Poisson distribution, (ii) the number of aftershocks is proportional to exp(αM), and (iii) the decrease in the number of aftershocks follows the modified Omori law (Bansal et al. 2012). The ETAS model is impractical to apply to the available local data since it is difficult to determine which of the earthquake occurrences are aftershocks and which are mainshocks. Hence, a Markov model is a suitable alternative since it uses information from the immediate past.

A hidden Markov model (HMM) is a stochastic process that involves random variables characterized as either an observed process or a hidden process. The hidden process is a sequence of unobservable events that directly affects the observed process; it is assumed to be a Markov process that governs the distribution of the observed process. One of the early works on the application of HMMs was by Rabiner (1989), who used the model in advanced speech recognition problems. Related studies also show how HMMs can be used in molecular biology (Krogh et al. 2001), genetics (Pachter et al. 2002), engineering (Goh et al. 2012) and, notably, seismology (Doganer and Calik 2013; Yip et al. 2017). Yip et al. (2017) developed a novel HMM for modeling and predicting earthquakes. They used an HMM recognizing that while earthquake occurrences are observable, the underlying underground dynamics, which involve the stress level around faults, are not. Through their model, they simultaneously predicted the arrival time and magnitude of future earthquakes using data from the Southern California earthquake catalogues from 1981 to 2015. Meanwhile, Doganer and Calik (2013) focused on the use of an HMM with the forward algorithm in estimating the epicenter of the many earthquakes occurring in the East Anatolian Fault Zone. The use of an HMM helped them account for times of seismic inactivity. Their results showed a 0.73 chance of earthquake occurrence in Sincik-Lake Hazar. Different distributions can be used as the state-dependent probability distribution since HMMs can be applied to both discrete and continuous data. This paper employs the Poisson distribution because of the discrete nature of the seismic data. Moreover, using a Poisson hidden Markov model (PHMM) addresses the problem of over-dispersion, which is typically the case in earthquake data (Can et al. 2014).

Recent research on PHMMs includes applications to video traffic (Rossi et al. 2015), infrastructure deterioration (Le Thanh et al. 2015), and insurance (Paroli et al. 2002). Orfanogiannaki et al. (2011) introduced the PHMM in modeling temporal seismicity changes. According to them, a PHMM can reveal unknown attributes of the earthquake mechanisms that produced the seismic data by providing a way to estimate the underlying hidden states of the system. Using a PHMM, they modeled the earthquake frequencies with local magnitude ML > 3.2 in the seismogenic area of Killini, Ionian Sea, Greece, in the period 1990-2006. The model allowed them to capture short-term precursory seismicity changes preceding strong mainshocks, which traditional analysis failed to recognize in the 1997 mainshock. Meanwhile, Can et al. (2014) applied a PHMM to predict earthquake hazards in Bilecik, NW Turkey. They considered the annual frequencies of earthquakes occurring around the area from January 1900 to December 2012, with magnitude M > 4, and forecasted earthquake hazards for the years 2013-2047. Orfanogiannaki et al. (2018) considered two main earthquakes that occurred between the Indo-Australian and the southeastern Eurasian plates and used a PHMM to identify the temporal patterns in the time series of those two earthquakes. Their results showed low seismicity in the region 400 days prior to the first earthquake, and a shift from low to high seismicity between the two main earthquakes. Orfanogiannaki and Karlis (2018) introduced the use of multivariate Poisson hidden Markov models in modeling earthquake occurrences; each state of the multivariate model is associated with a different multivariate discrete distribution. They applied their model to the seismicity with magnitude M > 5 in three seismogenic subregions of the North Aegean Sea from 1981 to 2008. Their results demonstrated the migration of seismicity between adjacent subregions that share similar seismotectonic features.

The Philippines contains one of the most complex regions of plate interaction in the circum-Pacific belt (Hopkins et al. 1991). This results in the country's high seismic activity. Notable recent instances include the successive destructive earthquakes of magnitude greater than 6 that occurred in Mindanao and other parts of the country in 2019 (Rappler.com 2019a, 2019b; PHIVOLCS 2019a, 2019b), the earthquake swarm in Batangas in 2017 (PHIVOLCS 2017), and the 2013 Bohol earthquake that broke roads and damaged buildings (Rappler.com 2013). While there have been numerous studies on the use of PHMMs in understanding seismic activity, none have used seismic data from the Philippines. This paper applies the PHMM to examine earthquake occurrences using seismic data of Metro Manila.

The paper is organized as follows: in “Poisson hidden Markov model”, we discuss the PHMM and how the expectation-maximization (EM) algorithm is used to estimate the model’s parameters. In “Numerical implementation”, we present the results of the parameter estimation for PHMMs with different numbers of states and use these results to determine which PHMM best describes the data. In “Benchmarking”, we perform short-term forecasting using the PHMM and an ARIMA model and compare the results. Lastly, in “Conclusion and recommendation”, we present our conclusions and recommendations.

Poisson hidden Markov model

Let \(({\Omega }, \mathcal {F}, P)\) be a probability space on which zk is a Markov chain in discrete time k = 0,1,2,.... The Markov chain evolves according to the dynamics

$$ \mathbf{z}_{k} = A\mathbf{z}_{k-1}+\mathbf{v}_{k} $$
(1)

where vk is a martingale increment, that is,

$$ E[\mathbf{v}_{k} | \mathcal{F}_{k}^{\mathbf{z}}] =0 $$
(2)

where \(\mathcal {F}_{k}^{\mathbf {z}}\) is the filtration generated by {z0,z1,...,zk}. The Markov chain zk represents the state, so if we are working with m states, then \(\mathbf {z}_{k}\in \mathbb {R}^{m}\). In addition, zk takes values in the set {e1,e2,...,em}, the canonical basis of \(\mathbb {R}^{m}\), where \(\mathbf {e}_{i}=(0, ..., 0, 1, 0, ..., 0)^{\top }\in \mathbb {R}^{m}\) is the vector with 1 in the ith component and 0 elsewhere. A is the transition probability matrix whose entry aij, i,j = 1,2,...,m, is the probability of transition from state i to state j from time k − 1 to time k, that is,

$$ a_{ij} = P(\mathbf{z}_{k}=\mathbf{e}_{j} | \mathbf{z}_{k-1}=\mathbf{e}_{i})=P(\mathbf{z}_{1}=\mathbf{e}_{j} | \mathbf{z}_{0}=\mathbf{e}_{i}). $$
(3)

It is clear that \(\sum \limits _{j=1}^{m}a_{ij}=1\) for i = 1,...,m. Let ξ = (ξ1,...,ξm) be the distribution of the initial state z0, that is, ξi = P(z0 = ei), so that \(\sum \limits _{i=1}^{m}{\xi }_{i}=1\).

Furthermore, let Yk, k = 0,1,2,... be the observed process, which depends only on zk. In other words, {Yk} is a sequence of conditionally independent random variables given the state process {zk}. In this study, we assume that Yk given zk is a Poisson random variable, hence the term Poisson hidden Markov model. The state zk determines the parameter of the Poisson distribution used to generate Yk. See Paroli et al. (2002) for a comprehensive discussion of the PHMM.
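
To make the setup concrete, the following sketch simulates a small PHMM: a state chain driven by a transition matrix, emitting a Poisson count at each step. The two-state matrix and rates are illustrative assumptions, not the fitted values reported later.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's method: multiply uniforms until the product drops below e^{-lam}
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def draw(dist, rng):
    # sample an index from a discrete distribution given as a list of probabilities
    u, c = rng.random(), 0.0
    for i, prob in enumerate(dist):
        c += prob
        if u < c:
            return i
    return len(dist) - 1

def simulate_phmm(A, lam, xi, K, seed=0):
    """Simulate K+1 steps of a PHMM: z_k follows the Markov chain with
    transition matrix A and initial distribution xi; Y_k ~ Poisson(lam[z_k])."""
    rng = random.Random(seed)
    z = draw(xi, rng)
    states, obs = [], []
    for _ in range(K + 1):
        states.append(z)
        obs.append(sample_poisson(lam[z], rng))
        z = draw(A[z], rng)
    return states, obs

# illustrative two-state example (not the fitted Metro Manila parameters)
A = [[0.95, 0.05], [0.50, 0.50]]
lam = [1.0, 20.0]
states, obs = simulate_phmm(A, lam, [1.0, 0.0], K=718)
```

Each row of `A` sums to one, mirroring the constraint on aij above; the quiet state emits small counts and the active state large ones.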

Let λ = (λ1,...,λm) be the parameter vector for the observed process Yk, that is, λi is the Poisson parameter at state i for i = 1,...,m. Let ηyi be the conditional probability of Yk = y given that the process is in state i, that is,

$$ \eta_{yi}=P(Y_{k}=y | \mathbf{z}_{k}=\mathbf{e}_{i})=e^{-\lambda_{i}}\frac{{\lambda_{i}^{y}}}{y!} $$
(4)

Note also here that \(\sum \limits _{y=0}^{\infty }\eta _{yi}=1\) for i = 1,...,m.

The complete model is the joint process {(zk,Yk)}. The Markov chain zk is the latent process with the semi-martingale representation given in (1), and the process Yk depends on zk, as depicted in Figure 1. We assume that {zk} and {Yk} are stationary, so each Yk has the same distribution for every k. The probability mass function of Yk is then

$$ \begin{array}{@{}rcl@{}} P(Y_{k} = y) & =& \sum\limits_{i=1}^{m}P(Y_{k}, \mathbf{z}_{k}=\mathbf{e}_{i})\\ & = &\sum\limits_{i=1}^{m} P(Y_{k} = y | \mathbf{z}_{k}=\mathbf{e}_{i}) P(\mathbf{z}_{k} = \mathbf{e}_{i})\\ & =& \sum\limits_{i=1}^{m} \xi_{i}\eta_{yi} \end{array} $$

We can also observe that E[Yk] = 〈ξ,λ〉 where 〈⋅,⋅〉 is the usual inner product.
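
Under stationarity, the marginal law of Yk is thus a Poisson mixture whose mean is the inner product above. A quick numerical check, with made-up ξ and λ:

```python
import math

def poisson_pmf(y, lam):
    # e^{-lam} * lam^y / y!, computed iteratively to avoid large factorials
    p = math.exp(-lam)
    for k in range(1, y + 1):
        p *= lam / k
    return p

def mixture_pmf(y, xi, lam):
    # P(Y_k = y) = sum_i xi_i * eta_{y i}, as in the display above
    return sum(x * poisson_pmf(y, l) for x, l in zip(xi, lam))

xi, lam = [0.9, 0.1], [1.0, 20.0]
total = sum(mixture_pmf(y, xi, lam) for y in range(150))     # pmf sums to 1
mean = sum(y * mixture_pmf(y, xi, lam) for y in range(150))  # <xi, lam> = 2.9
```

Truncating the sums at 150 loses only negligible tail mass for these rates.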

Fig. 1

Hidden Markov process

Let Θ be the parameter space containing the set of plausible parameters for the process {(zk,Yk)}. A parameter vector 𝜃 ∈Θ for this process has the form

$$ \theta = (a_{11}, a_{12}, ...., a_{mm}, \lambda_{1}, \lambda_{2}, ..., \lambda_{m})^{\top}. $$
(5)

𝜃 contains the entries of the transition probability matrix and the parameters of the Poisson distributions. The transition probability matrix has m2 entries, but since \(\sum \limits _{j=1}^{m} a_{ij} = 1\) for i = 1,...,m, only m2 − m entries of A need to be estimated. We set each diagonal entry of A to depend on the other entries of its row, that is, \(a_{ii}=1-\sum \limits _{j=1, j\ne i}^{m} a_{ij}\). Thus, we can reduce 𝜃 to

$$ \theta = (a_{12}, a_{13}, ..., a_{m, m-1}, \lambda_{1}, \lambda_{2}, ..., \lambda_{m} )^{\top} $$
(6)

which contains m2 elements.

Suppose we have the observed process {y0,y1,...,yK} up to time K, and let {z0,z1,...,zK} be the state process up to time K. The set {z0,Y0,z1,Y1,...,zK,YK} is the complete data. The likelihood function for this set, denoted by Lc(Y ;Θ), is given by

$$ \begin{array}{@{}rcl@{}} L^{c}(Y;{\Theta}) & =& P(Y_{0}=y_{0}, Y_{1}=y_{1}, ..., Y_{K}=y_{K}, \mathbf{z}_{0}=\mathbf{e}_{i_{0}}, ..., \mathbf{z}_{K}=\mathbf{e}_{i_{K}}) \\ & =& \xi_{i_{0}}\eta_{y_{0}, i_{0}}\prod\limits_{k=1}^{K} a_{i_{k-1}, i_{k}}\eta_{y_{k}, i_{k}} \end{array} $$
(7)

Since the Markov process is latent, summing over all possible state sequences yields the likelihood function of the incomplete data,

$$ L(Y ; {\Theta}) = \sum\limits_{i_{0}=1}^{m}\sum\limits_{i_{1}=1}^{m}\cdots\sum\limits_{i_{K}=1}^{m} \xi_{i_{0}}\eta_{y_{0}, i_{0}}\prod\limits_{k=1}^{K} a_{i_{k-1}, i_{k}}\eta_{y_{k}, i_{k}} $$
(8)

where \(\eta _{y_{k}, i_{k}}\) is the state-dependent probability of yk conditioned on state zk given by

$$ \eta_{y_{k}, i_{k}} = e^{-\lambda_{i_{k}}}\frac{\lambda_{i_{k}}^{y_{k}}}{y_{k}!}. $$
(9)
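
Equation (8) can be evaluated directly for toy cases by enumerating all m^(K+1) state sequences. This costs exponential time in K and only serves to illustrate the definition; the forward recursion introduced later in this section computes the same quantity efficiently. All inputs below are illustrative.

```python
import itertools
import math

def poisson_pmf(y, lam):
    # e^{-lam} * lam^y / y!, computed iteratively
    p = math.exp(-lam)
    for k in range(1, y + 1):
        p *= lam / k
    return p

def likelihood_brute_force(y, A, lam, xi):
    """Direct evaluation of (8): sum the complete-data likelihood (7)
    over every state path (i_0, ..., i_K)."""
    m, K = len(lam), len(y) - 1
    total = 0.0
    for path in itertools.product(range(m), repeat=K + 1):
        term = xi[path[0]] * poisson_pmf(y[0], lam[path[0]])
        for k in range(1, K + 1):
            term *= A[path[k - 1]][path[k]] * poisson_pmf(y[k], lam[path[k]])
        total += term
    return total

# tiny illustration
A = [[0.7, 0.3], [0.4, 0.6]]
lam, xi = [1.0, 2.0], [0.5, 0.5]
lik = likelihood_brute_force([0, 3, 1], A, lam, xi)
```

With K = 0 the sum collapses to the Poisson mixture of the previous section, which gives a quick sanity check of the implementation.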

Solving for maximum likelihood estimates of the parameters is finding 𝜃 such that

$$ \theta = \text{argmax}_{\theta\in{\Theta}} L(Y;{\Theta}) $$
(10)

Optimizing (8) analytically is intractable, so we estimate the parameters numerically via the expectation-maximization (EM) algorithm; see Ryden (1996) for a comprehensive discussion. The EM algorithm is an iterative process that involves two main steps: the E-step (expectation) and the M-step (maximization). Let

$$ Q(\theta; \hat{\theta})=E_{\hat{\theta}}[\log {L_{K}^{c}}(Y ; {\Theta}) | \mathbf{y}] $$
(11)

where \(\hat {\theta }\in {\Theta }\) and y is the vector of realized values of Y. The EM algorithm is described as follows:

  1. Choose \(\hat {\theta }_{0}\in {\Theta }\) such that \(\hat {\theta }_{0}\) is a good initial approximation of the parameters.

  2. E-step: compute \(Q(\theta ; \hat {\theta }_{n})\) as defined in (11).

  3. M-step: find \(\hat {\theta }_{n+1}\) such that \(Q(\hat {\theta }_{n+1} ; \hat {\theta }_{n})\ge Q(\theta ; \hat {\theta }_{n})\) for all \(\theta \in {\Theta }\), that is,

     $$ \hat{\theta}_{n+1} = \text{argmax}_{\theta\in{\Theta}} Q(\theta; \hat{\theta}_{n}). $$
     (12)

  4. The E and M steps are repeated alternately until an optimal estimate is achieved, that is, until

     $$ \left|\log L_{K}(\hat{\theta}_{n+1}) - \log L_{K}(\hat{\theta}_{n})\right| < \epsilon $$
     (13)

     for some error tolerance 𝜖.

Thus, we conclude that

$$ \hat{\theta}_{n+1} = \left( a_{12}^{(n+1)}, a_{13}^{(n+1)}, ..., a_{m,m-1}^{(n+1)}, \lambda_{1}^{(n+1)}, \lambda_{2}^{(n+1)}, ..., \lambda_{m}^{(n+1)}\right)^{\top} $$
(14)

is an optimal estimate for the process {(zk,Yk)}.

The EM algorithm can be simplified by using forward and backward probabilities, in what is commonly known as the Baum-Welch algorithm (Baum et al. 1970; Baum and Petrie 1966). Let αk(i) denote the forward probability, the joint probability of the observations up to time k and the current state, given by

$$ \alpha_{k}(i) = P(Y_{0}=y_{0}, Y_{1}=y_{1}, ..., Y_{k}=y_{k}, \mathbf{z}_{k}=\mathbf{e}_{i}) $$
(15)

and let βk(i) denote the backward probability, the probability of the future observations conditioned on the current state, that is,

$$ {\upbeta}_{k}(i)=P(Y_{k+1}=y_{k+1}, ..., Y_{K}=y_{K} | \mathbf{z}_{k}=\mathbf{e}_{i}). $$
(16)

It can be shown that the forward probabilities may be derived recursively as

$$ \begin{array}{@{}rcl@{}} \alpha_{0}(i) & =&\xi_{i}\eta_{y_{0}, i} {\kern72pt} \text{ for } i=1, ..., m \\ \alpha_{k}(j) & = &\sum\limits_{i=1}^{m}\alpha_{k-1}(i)a_{ij}\eta_{y_{k}, j} \qquad\text{ for } k=1, ..., K \text{ and } j=1, ..., m \end{array} $$
(17)

and backward probabilities as

$$ \begin{array}{@{}rcl@{}} {\upbeta}_{K}(i) & = & 1 {\kern102pt} \text{ for } i=1, ..., m \\ {\upbeta}_{k}(i) & = & \sum\limits_{j=1}^{m} \eta_{y_{k+1}, j}{\upbeta}_{k+1}(j)a_{ij} \qquad\text{ for }k = K - 1, ..., 0 \text{ and } i = 1, ..., m \end{array} $$
(18)

Evaluating (11) at the nth iteration of the parameters, \(\hat {\theta }_{n}\), we get

$$ \begin{array}{@{}rcl@{}} Q(\theta; \hat{\theta}_{n}) &= &\sum\limits_{i=1}^{m}\frac{\alpha_{0}^{(n)}(i){\upbeta}_{0}^{(n)}(i)}{\sum\limits_{l=1}^{m}\alpha_{0}^{(n)}(l){\upbeta}_{0}^{(n)}(l)}\log\xi_{i} \\ &&+ \sum\limits_{i=1}^{m}\sum\limits_{j=1}^{m}\frac{\sum\limits_{k=0}^{K-1}\alpha_{k}^{(n)}(i)a_{ij}^{(n)}\eta_{y_{k+1},j}^{(n)}{\upbeta}_{k+1}^{(n)}(j)}{\sum\limits_{l=1}^{m}\alpha_{0}^{(n)}(l){\upbeta}_{0}^{(n)}(l)}\log a_{ij} \\ &&+ \sum\limits_{i=1}^{m}\sum\limits_{k=0}^{K}\frac{\alpha_{k}^{(n)}(i){\upbeta}_{k}^{(n)}(i)}{\sum\limits_{l=1}^{m}\alpha_{k}^{(n)}(l){\upbeta}_{k}^{(n)}(l)}\log\eta_{y_{k}, i} \end{array} $$

where \(\eta _{y_{k}, i}^{(n)}, \alpha _{k}^{(n)}(i)\) and \({\upbeta }_{k}^{(n)}(i)\) are computed from \(\hat {\theta }_{n}\), the estimate obtained at the nth iteration of the EM algorithm. Thus, the maximum likelihood estimate of the entries of the transition probability matrix at the (n + 1)th iteration is given by

$$ a_{ij}^{(n+1)} = \frac{\sum\limits_{k=0}^{K-1}\alpha_{k}^{(n)}(i)a_{ij}^{(n)}\eta_{y_{k+1},j}^{(n)}{\upbeta}_{k+1}^{(n)}(j)}{\sum\limits_{k=0}^{K-1}\alpha_{k}^{(n)}(i){\upbeta}_{k}^{(n)}(i)} \text{ for }i,j=1, ..., m $$
(19)

and the parameter for the Poisson process is given by

$$ \lambda_{i}^{(n+1)} = \frac{\sum\limits_{k=0}^{K}\alpha_{k}^{(n)}(i){\upbeta}_{k}^{(n)}(i)y_{k}}{\sum\limits_{k=0}^{K}\alpha_{k}^{(n)}(i){\upbeta}_{k}^{(n)}(i)} \text{ for } i=1, ..., m. $$
(20)

See Elliott et al. (1995) for a comprehensive discussion of parameter estimation.
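
Putting the pieces together, the recursions (17)-(18) and the updates (19)-(20) give one EM iteration. The sketch below uses unscaled probabilities, adequate only for short series (production code rescales or works in logs), synthetic data, and our own naming; ξ is held fixed, matching the parameter vector (6).

```python
import math

def poisson_pmf(y, lam):
    # e^{-lam} * lam^y / y!, computed iteratively
    p = math.exp(-lam)
    for k in range(1, y + 1):
        p *= lam / k
    return p

def baum_welch_step(y, A, lam, xi):
    """One EM (Baum-Welch) iteration: forward/backward recursions (17)-(18),
    then the transition update (19) and the Poisson-rate update (20)."""
    K, m = len(y) - 1, len(lam)
    eta = [[poisson_pmf(y[k], lam[i]) for i in range(m)] for k in range(K + 1)]
    # forward recursion (17)
    alpha = [[xi[i] * eta[0][i] for i in range(m)]]
    for k in range(1, K + 1):
        alpha.append([sum(alpha[k - 1][i] * A[i][j] for i in range(m)) * eta[k][j]
                      for j in range(m)])
    # backward recursion (18)
    beta = [[1.0] * m for _ in range(K + 1)]
    for k in range(K - 1, -1, -1):
        beta[k] = [sum(A[i][j] * eta[k + 1][j] * beta[k + 1][j] for j in range(m))
                   for i in range(m)]
    # transition update (19); each new row sums to 1 by construction
    A_new = []
    for i in range(m):
        denom = sum(alpha[k][i] * beta[k][i] for k in range(K))
        A_new.append([sum(alpha[k][i] * A[i][j] * eta[k + 1][j] * beta[k + 1][j]
                          for k in range(K)) / denom for j in range(m)])
    # Poisson rate update (20): weighted mean of the observations per state
    lam_new = [sum(alpha[k][i] * beta[k][i] * y[k] for k in range(K + 1)) /
               sum(alpha[k][i] * beta[k][i] for k in range(K + 1)) for i in range(m)]
    loglik = math.log(sum(alpha[K][i] for i in range(m)))
    return A_new, lam_new, loglik

# iterate until the log-likelihood change falls below a tolerance, as in (13)
y = [0, 1, 0, 2, 19, 24, 1, 0, 0, 3]        # synthetic two-regime counts
A = [[0.9, 0.1], [0.5, 0.5]]
lam, xi = [1.0, 15.0], [0.5, 0.5]
prev = -float("inf")
for _ in range(200):
    A, lam, ll = baum_welch_step(y, A, lam, xi)
    if abs(ll - prev) < 1e-8:
        break
    prev = ll
```

Because each M-step is a closed-form ratio of forward-backward quantities, every iteration preserves row-stochasticity of A and cannot decrease the likelihood.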

Numerical implementation

We examine the earthquake data of Metro Manila (12°-17°N latitude, 119°-123°E longitude) obtained from the DOST Philippine Institute of Volcanology and Seismology (DOST-PHIVOLCS). In particular, we consider the earthquake occurrences of magnitude greater than or equal to 4. It should be noted, however, that the magnitude entries in the data are in the local magnitude scale \(\left (M_{l}\right )\), the body wave magnitude scale \(\left (M_{b}\right )\), or the surface wave magnitude scale \(\left (M_{s}\right )\). We convert all Ml and Mb entries to Ms using the formulas \(M_{s}=\frac {M_{b}-2.5}{0.63}\) and Ms = − 3.2 + 1.45Ml (Tobyás and Mittag 1991).
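
A small helper applying these two conversions might look as follows; the function name and interface are our own.

```python
def to_surface_wave(mag, scale):
    """Convert a magnitude to Ms using the conversions quoted in the text
    (Tobyas and Mittag 1991): Ms = (Mb - 2.5)/0.63 and Ms = -3.2 + 1.45*Ml."""
    if scale == "Ms":
        return mag                    # already surface wave magnitude
    if scale == "Mb":
        return (mag - 2.5) / 0.63     # body wave -> surface wave
    if scale == "Ml":
        return -3.2 + 1.45 * mag      # local -> surface wave
    raise ValueError("unknown magnitude scale: " + scale)
```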

We determine the frequencies of earthquake occurrences over 30-day intervals from January 1, 1960 to January 20, 2019, and record 719 observed values, summarized in Figure 2. The data exhibit over-dispersion: the standard deviation of the 30-day counts, 5.57309 (variance ≈ 31.06), greatly exceeds the mean number of occurrences, 1.64534, whereas a Poisson distribution would have variance equal to its mean.
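
The over-dispersion check amounts to comparing the sample variance to the sample mean; the counts below are synthetic stand-ins for illustration, not the PHIVOLCS series.

```python
def dispersion_summary(counts):
    # sample mean and (unbiased) variance of 30-day counts; a Poisson model
    # implies variance = mean, so variance >> mean signals over-dispersion
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var, var / mean

# synthetic counts: mostly quiet intervals plus one burst
mean, var, index = dispersion_summary([0, 0, 1, 2, 0, 21])
```

A dispersion index (variance/mean) well above 1, as here, is the pattern the PHMM is meant to capture.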

Fig. 2

30-day frequencies of earthquake occurrences from January 1, 1960 to January 20, 2019

The parameters of the model were estimated by maximum likelihood as described in the previous section, implementing the EM algorithm for each fixed number of states. Table 1 shows the estimated parameters, with the corresponding transition probability matrices, from the (one-state) Poisson process up to the 6-state PHMM.

Table 1 Estimated parameters and associated transition probability matrices

Based on the table, the Poisson process gives a mean of 1.645341 earthquakes per 30-day interval. In the two-state model, the estimated parameters are 1.064042 and 21.212017, two average occurrence rates that are relatively far from each other. Clearly, the estimated λ2 accounts for the 30-day periods with a high number of earthquake occurrences. Figure 3 shows the distribution of states under the two-state regime. Based on the transition probability matrix, there is a minute chance that the number of occurrences in the next 30-day period is high. From the 4-state to the 6-state model, notice that one state has an average of 87 earthquake occurrences. In these three models, the probability that the next 30-day period stays in the same state is 0.5, and the probability that it shifts to a lower state is also 0.5. The summary of state distributions under the two-state up to the 6-state regimes is shown in the Appendix.

Fig. 3

State distribution under 2-state PHMM

To choose the best model, we use two metrics: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). AIC estimates the quality of a model relative to the other proposed models (Akaike 1974). BIC similarly uses the likelihood function to choose the most suitable model for a given data set (Schwarz 1978). The formulas for AIC and BIC are

$$ AIC = 2p-2\log L $$
$$ BIC = p\log n - 2\log L $$

where p is the number of parameters estimated in the model, n is the number of data points, and L is the likelihood value.
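
In code, the two criteria are one-liners; recall that an m-state PHMM as parameterized above has p = m² parameters, and n = 719 thirty-day intervals here.

```python
import math

def aic(loglik, p):
    # AIC = 2p - 2 log L
    return 2 * p - 2 * loglik

def bic(loglik, p, n):
    # BIC = p log n - 2 log L
    return p * math.log(n) - 2 * loglik

# e.g., a 5-state PHMM has p = 5**2 = 25 parameters
aic_5 = aic(-1002.291, 25)
bic_5 = bic(-1002.291, 25, 719)
```

Lower values are better under both criteria; BIC's p log n term penalizes extra states more heavily than AIC's 2p, which is why the two can disagree, as they do below.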

Table 2 shows the AIC and BIC for the models implemented in this paper. AIC suggests that the 5-state PHMM is the most suitable model, while BIC favors the 4-state PHMM.

Table 2 Values of AIC and BIC

Since the two metrics do not agree on model selection, we use a likelihood ratio test to determine the better model. Let L0 and L1 be the maximum log-likelihoods of the 4-state and 5-state PHMMs, respectively. We obtain L0 = − 1029.098 and L1 = − 1002.291. To perform the likelihood ratio test, we consider the following hypotheses:

H0: The 4-state PHMM is better than the 5-state PHMM.

Ha: The 4-state PHMM is not better than the 5-state PHMM.

We calculate the χ2 test statistic as follows:

$$ \chi^{2}=-2\left( L_{0} - L_{1} \right) = 53.614 $$

We choose α = 0.05 and set the degrees of freedom (df) to 9, the difference in the number of parameters of the two models (25 − 16 = 9). With α = 0.05 and df = 9, the critical value from the chi-square distribution table is 16.919. Since χ2 = 53.614 > 16.919, we reject H0. Thus, at the 0.05 significance level, we conclude that the 5-state PHMM is better than the 4-state PHMM.
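
The test reduces to a few lines; the critical value is the one quoted above from the chi-square table.

```python
# Likelihood ratio test between the nested 4- and 5-state PHMMs,
# using the log-likelihoods reported in the text
L0, L1 = -1029.098, -1002.291
chi2 = -2 * (L0 - L1)       # test statistic, 53.614
df = 5**2 - 4**2            # difference in parameter counts: 25 - 16 = 9
critical = 16.919           # chi-square 0.95 quantile at 9 df (table value)
reject_h0 = chi2 > critical
```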

Acceptability of the 5-state PHMM parameter estimates

To investigate the acceptability of the 5-state PHMM parameters, we perform bootstrapping. We partition the original data set into subintervals of 30 points and resample with replacement from each subinterval. From the resampled data points, we estimate the parameters using the algorithm presented above. We draw 10,000 bootstrap samples and then calculate the average value of each parameter estimate (Xiong and Mamon 2017). Table 3 summarizes the values of each parameter obtained from bootstrapping, with the corresponding confidence intervals. The mean, std dev, 95% lower cl, and 95% upper cl represent the average, standard deviation, 2.5% quantile, and 97.5% quantile of the 10,000 bootstrap samples, respectively.
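
The resampling scheme can be sketched as follows; the full procedure then re-fits the PHMM on each of the 10,000 resampled series and averages the estimates. Function name and defaults are ours.

```python
import random

def block_bootstrap(data, block_len=30, seed=0):
    """Resample with replacement within consecutive blocks of the series,
    following the scheme described in the text: the series is cut into
    subintervals of `block_len` points and each is resampled independently."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(data), block_len):
        block = data[start:start + block_len]
        out.extend(rng.choice(block) for _ in block)
    return out
```

Resampling within blocks, rather than across the whole series, keeps each resampled point in the same epoch of the catalogue it came from.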

Table 3 Results from bootstrap sampling

Notice that as the estimated parameter increases, the corresponding standard deviation from the bootstrap sample increases as well. This indicates that states with higher parameters have larger fluctuations in the number of earthquake occurrences over the 30-day period. Also, all the parameters lie inside their corresponding confidence intervals, which means that they are acceptable estimates of the average 30-day earthquake occurrences.

5-state PHMM on the Metro Manila earthquake data

Figure 4 shows different portions of the graph of the 30-day earthquake occurrences in Metro Manila. The graphs also show the major earthquakes and the 30-day intervals they fall in. Recall that \(\hat {\lambda }_{1}=0.174485100\), \(\hat {\lambda }_{2}=0.851076600\), \(\hat {\lambda }_{3}=2.463019200\), \(\hat {\lambda }_{4}=13.948613400\), and \(\hat {\lambda }_{5}=87\).

Fig. 4

State distribution under 5-state PHMM of some portions of the Metro Manila data

We observe that most of the major earthquakes fall in 30-day intervals that are in state 4, i.e., intervals with an average of 13.95 earthquake occurrences. This may be due to the occurrence of foreshocks and aftershocks: a large earthquake is causally preceded by foreshock or multiple-shock activities (Fukao and Furumoto 1975), and the stress increase caused by a major earthquake can result in subsequent minor earthquakes (King et al. 1994). We also notice that the July 22 - August 21, 1990 interval has the highest number of earthquake occurrences, 123, and the next interval has the second highest, 51. These are the only intervals in state 5, i.e., the state with an average of 87 earthquake occurrences. The major Luzon earthquake of magnitude 7.8 that happened on July 16, 1990 could have triggered many aftershocks; this major earthquake is also considered possibly related to the eruption of Mt. Pinatubo in July 1991 (Bautista et al. 1996). Lastly, we observe that whenever a 30-day interval is in state 4, whether once or for consecutive intervals, the next interval is in state 2 or 3, meaning that its average number of earthquake occurrences is 0.85 or 2.46, respectively. This is consistent with the explanation in Yip et al. (2017) that the underground stress returns to normal after it builds up, reaches a certain threshold, and is released in the form of an earthquake.

Benchmarking

A common model for forecasting time series is the autoregressive integrated moving average (ARIMA) model (Fattah et al. 2018). Several ARIMA models of varying time steps have been used to forecast large-scale earthquake occurrences by fitting the model to, and smoothing the data with, an empirical recurrence rates (ERR) time series (Amei et al. 2012; Ho and Bhaduri 2015). In the study of Ho and Bhaduri (2015), historical earthquake data of Parkfield were modeled by applying ARIMA techniques to an ERR time series. Similar applications have been made to time series data sets from hurricanes and volcanoes (Ho and Bhaduri 2017; Bhaduri and Ho 2019). For our earthquake time series, we apply ARIMA to forecast future occurrences.

Using the forecast package in R (Hyndman and Khandakar 2008), a best-fit ARIMA model was used for one-step-ahead forecasting. We make 18 one-step-ahead forecasts, with each training data set growing by one point as we include the actual datum from the previously forecasted time interval. We then compare these forecasts with the one-step-ahead forecasts from the PHMM and the actual January 2019 - June 2020 earthquake data, as shown in Figure 5. The order of the best ARIMA model that fits the data is (2,3,1).
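
The expanding-window scheme itself is model-agnostic; a sketch with a placeholder forecaster (the study plugs the auto-selected ARIMA, or the PHMM, into this slot):

```python
def rolling_one_step(series, n_forecasts, fit_predict):
    """Expanding-window one-step-ahead forecasting as described in the text:
    each forecast is made from all data up to that point, then the actual
    value is appended to the training window for the next forecast.
    `fit_predict` is any forecaster taking a history and returning one value."""
    forecasts = []
    split = len(series) - n_forecasts
    for t in range(split, len(series)):
        forecasts.append(fit_predict(series[:t]))
    return forecasts

# stand-in forecaster for illustration: repeat the last observed value
naive = lambda hist: hist[-1]
```

Calling `rolling_one_step(counts, 18, model)` with the full count series reproduces the 18-forecast comparison described above, for whichever `model` is supplied.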

Fig. 5

Plot of the 18 one-step-ahead forecasts of earthquakes of magnitude greater than or equal to 4 in Metro Manila from January 2019 - June 2020 for the 5-state PHMM and the ARIMA(2, 3, 1) model vs the actual data

Notice that the 5-state PHMM and ARIMA(2, 3, 1) forecast values are relatively close to each other, with the 5-state PHMM giving the better estimate in most forecasts in terms of closeness to the actual value. There are three peaks in the actual data (9/17/2019, 1/15/2020, and 5/14/2020) that both models underestimate and fail to capture. In the first two peaks, the ARIMA(2, 3, 1) forecasts have a slight edge over those of the 5-state PHMM, while for the last peak (and most of the other data points), the 5-state PHMM forecasts are closer to the actual data than those of ARIMA(2, 3, 1).

We also analyze the deviations of the forecast values of both models from the actual data using the unscaled mean bounded relative absolute error (UMBRAE) developed in Chen et al. (2017). We find that the 5-state PHMM performs roughly 15.35% better than the ARIMA(2, 3, 1) model.
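
For reference, UMBRAE can be computed as below; this follows our reading of Chen et al. (2017), where each absolute error is bounded against the benchmark's error, averaged, then unscaled, and the zero-error tie-breaking is our own choice. A value below 1 favors the forecast over the benchmark.

```python
def umbrae(actual, forecast, benchmark):
    """Unscaled mean bounded relative absolute error (Chen et al. 2017):
    BRAE_t = |e_t| / (|e_t| + |e*_t|), MBRAE = mean of BRAE,
    UMBRAE = MBRAE / (1 - MBRAE)."""
    braes = []
    for a, f, b in zip(actual, forecast, benchmark):
        e, e_star = abs(a - f), abs(a - b)
        # tie-break: if both errors are zero, count the point as a draw (0.5 -> 0.0 here is
        # an assumption; the paper's convention may differ)
        braes.append(e / (e + e_star) if e + e_star > 0 else 0.0)
    mbrae = sum(braes) / len(braes)
    return mbrae / (1 - mbrae)
```

When the forecast and benchmark errors match everywhere, UMBRAE equals 1; a perfect forecast against an imperfect benchmark gives 0.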

Conclusion and recommendation

In this study, we modeled the earthquake activity in Metro Manila using a Poisson hidden Markov model. We considered the 30-day earthquake occurrence data (with magnitude 4 or greater) of Metro Manila from January 1960 to January 2019. We identified the 5-state Poisson hidden Markov model, with parameters λ1 = 0.174485100, λ2 = 0.851076600, λ3 = 2.463019200, λ4 = 13.948613400, and λ5 = 87, as the best fit for the earthquake data. In addition, we investigated the forecasting capability of this model by comparing its 18 one-step-ahead forecasts to those of an ARIMA model. By the error analysis performed, the 5-state PHMM gave closer forecast values.

Our study has shown that the number of earthquake occurrences in Metro Manila can be modeled using a PHMM. The model can help researchers further understand the seismic behavior in the area. In particular, it can provide insights into how earthquakes of magnitude 4 or greater behave through the patterns of the states that the 30-day intervals are in. We recommend the use of the PHMM on earthquake occurrences in other areas of the country. The model can also be used to develop an early warning signal by identifying trigger points that suggest an upcoming period with a high number of occurrences or an occurrence of a major earthquake; such a signal could alert disaster risk management agencies so they can prepare for a possible calamity. We also recommend modifications to the PHMM, such as through empirical recurrence rate relations similar to Bhaduri (2020). If possible, future work should use earthquake data in moment magnitude (Mw), a more accurate measure of earthquake size, and benchmark against other methods depending on the data set. Such data, however, are not available in the earthquake catalogue that the DOST-PHIVOLCS can provide.