Abstract
In this study, non-homogeneous Poisson processes (NHPP) are used to analyze climate data. The data were collected over a certain period of time and consist of the yearly average precipitation, yearly average temperature and yearly average maximum temperature for some regions of the world. Different existing parametric forms depending on time and on unknown parameters are assumed for the intensity/rate function \(\lambda (t), t \ge 0\) of the NHPP. In the present context, the Poisson events of interest are the years in which a climate variable measurement exceeded a given threshold of interest. The threshold corresponds to the overall average of the measurements of each climate variable taken into account here. Two versions of the NHPP model are considered in the study, one version without change points and one version including a change point. The parameters included in the model are estimated under a Bayesian approach using standard Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampling and Metropolis–Hastings algorithms. The models are applied to climate data from Kazakhstan and Uzbekistan, in Central Asia, and from the USA, obtained over several years.
1 Introduction
Changes in climate, either at the global or at the local level, are monitored by following the behavior of climate variables such as precipitation volume and temperature as well as ocean levels, among others, paying special attention to the occurrence of events that deviate from the expected behavior (for instance, precipitation volume and/or temperature that are either higher or lower than the average value). Changes in climate have been observed throughout the world since the end of the nineteenth century (see, for instance, [1] and https://climate.nasa.gov/evidence/). It is possible to observe a significant increase in the average world temperature, especially since the 1950s (see https://climate.nasa.gov/evidence/). For instance, nowadays we have an increase of 1.5 degrees Celsius in the average world temperature when compared to the pre-industrial era [1]. Additionally, climate events that deviate from the expected behavior are registered more and more frequently around the globe. One of the possible reasons for these changes could be the increase in human-made emissions of carbon dioxide and other pollutants into the atmosphere (see, for instance, https://www.ncdc.noaa.gov/monitoring-references/faq/indicators.php).
In addition to affecting the occurrence of climate events deviating from the average behavior, climate change also has a serious impact on ecosystems around the world, some of them very fragile. For instance, [2] presents several works studying the impact of climate change on forests in the USA, [3] presents the effects of climate change on marine systems, and [4] reviews the impacts of climate change on diversity in British Columbia. In particular, changes in the temperature and/or precipitation patterns may have serious consequences for food production. Too little or too much rain, high or low temperatures and dry spells at the wrong time may affect crop production and meat production. Additionally, there may be economic and human consequences due to possible forest fires, which might reach households, cause economic losses and put human lives at risk.
Therefore, due to its serious consequences for life on the planet, it is very important to study climate change and the occurrence of climate events deviating from the expected behavior. In this direction, many statistical models dealing in some way with the problems related to this change have been introduced in the literature. Among these models, [5] introduces a stochastic model, assuming different scenarios, which is used to estimate, for example, the probabilities of daily changes in precipitation. In another study, [6] considers stochastic differential models to study climate change problems and [7] presents a non-parametric renewal model for modeling daily precipitation data. A Markov model is also used by [8] to analyze precipitation data and [9] provides an overview of stochastic climate theory from the point of view of applied mathematics. Many other statistical models to infer the years in which climatic anomalies occur are also presented in the literature [10,11,12,13].
Despite the very large number of papers introduced in the literature in recent years that use different statistical models to study climate change, other statistical modeling alternatives may be of interest in the study of changes in precipitation and temperature patterns. Thus, as the main goal of this study, we consider statistical models in the presence of change-points, assumed to be unknown parameters that should be estimated from the climate data in order to accurately detect these changes. The presence of one or more change points is very common in time series data derived from many areas of application, for instance, epidemiology, finance, environment, medicine and many others, see, for instance, [14,15,16,17,18]. In general, we have change-points either when there is an intervention on the experiment whose results are being recorded or when a natural event related to the data abruptly changes the behavior of the time series. This change in behavior may occur in any type of continuous or discrete data. In the case of discrete measurements, a common type of change is in the count of occurrences of events, for instance, changes in the occurrence of threshold exceedances and in the number of hospitalizations, among others.
In addition to estimating specific change points, we may also be interested in the number of times a climate variable deviates from its average behavior over a fixed period of time. The change-points, as pointed out above, can be estimated by considering models that include parameters denoting the change-points. If our interest is the number of times the average annual amount of precipitation and temperature exceeds their corresponding mean values over a given time interval, a natural choice for statistical modeling is given by counting processes and, in particular, non-homogeneous Poisson processes (NHPP). Even though an assumption of independent inter-occurrence times is implicit in the NHPP formulation, this type of model has provided a good approximation for many problems of a similar nature to those considered here, while allowing the calculation of specific quantities of interest such as the probability of a given number of occurrences in a given time interval.
NHPP have been used in several areas where problems of a similar nature to those considered here are analyzed, for instance, in reliability theory [19,20,21], counting marine species [22], air pollution [14, 16], community noise pollution [23] and medicine [15], among many other areas. One common feature of those models is the assumption of several forms for the rate function associated with the NHPP. These rate functions may depend not only on time, but also on some parameters that need to be estimated.
Different approaches have been considered in order to estimate the parameters of the proposed models, including the location of the change-point. Many works presented in the literature address this problem, especially under a Bayesian approach. This inference approach has some computational advantages when compared with standard maximum likelihood methods, especially when dealing with non-homogeneous Poisson processes and, in particular, in the presence of change-points, since the likelihood function may have a very complex form. Another advantage of the Bayesian approach is that we can incorporate prior opinions of experts, which may lead to more accurate inference.
Bayesian inference for either homogeneous or non-homogeneous Poisson processes has been discussed by many authors such as [21, 24,25,26,27]. Those processes have also been used to obtain inference for change-point models [25, 28,29,30]. Raftery and Akman [31] consider a Bayesian analysis for homogeneous Poisson processes in the presence of a change-point. Ruggeri and Sivaganesan [32] and [33] consider a Bayesian analysis dealing with a random number of change-points. In the former work, NHPPs with the so-called power law process (PLP) as rate function are considered, and in the latter a stepwise constant rate function is used.
Within the Bayesian framework the estimation of the parameters may be performed using MCMC methods [34,35,36]. We follow this path when we develop a Bayesian analysis assuming different parametric structures for the rates in non-homogeneous Poisson processes either in the presence or in the absence of change-points for the climate time series considered here. The OpenBugs software [37] is used to simulate the MCMC samples.
The paper is organized as follows. Section 2 introduces the methodology where the NHPP models are presented in two situations: models with and without change-points. In Sect. 3 we give the results assuming different climate data sets. Section 4 presents some discussion on the obtained results and concluding remarks. Finally, an Appendix is included after the list of references where we give the OpenBugs code used to generate the samples to estimate the parameters of interest as well as the data sets used in each of the applications.
2 Methodology
2.1 Non-Homogeneous Poisson Models
As mentioned before, NHPP are used in many applications where, as in the present work, occurrences of events are of interest. Here, we use this type of model to estimate the probability that a climate variable (in the present case, either precipitation or temperature) exceeds a predetermined threshold a given number of times in a time interval of interest. The threshold to which the climate variable measurement is compared is the overall average measurement of the variable of interest, taking into account the entire observed period indicated by [0, T], \(T \ge 0\).
Hence, let \(M_t \ge 0\) record the number of times the climate variable is above the threshold (represented by the overall average of the measurements) in the time interval [0, t), \(t \in [0, T]\). We assume that \(M = \{M_t : t \ge 0\}\) follows a non-homogeneous Poisson process with rate and mean functions given, respectively, by \(\lambda (t) > 0\) and
\[ m(t) = \int _0^t \lambda (s)\, ds. \tag{1} \]
Recall that the rate function \(\lambda (t)\) dictates the behavior of the Poisson process.
Different parametric forms may be considered for the rate function. Due to the nature of the questions asked when using NHPP models, these functions may be borrowed from studies made in reliability theory. Hence, if \(\theta\) is the vector of parameters present in the rate and mean functions, then in order to explicitly indicate this dependence, we will use, in the description of the models, \(\lambda (t \mid \theta )\) and \(m(t \mid \theta )\) to denote the rate and mean functions, respectively. Note that \(m(t \mid \theta )\) denotes the expected number of events registered by \(M_t\) up to time t. A non-homogeneous Poisson process of this type is characterized by the functional form of \(m(t \mid \theta )\), or equivalently, of its intensity function \(\lambda (t \mid \theta )\), given by the first derivative of \(m(t \mid \theta )\), that is, \(\lambda (t \mid \theta ) = dm(t \mid \theta )/dt\). For the analysis of climate data, it is of interest to have a rate function \(\lambda (t \mid \theta ), t \ge 0\) that can show different behaviors over time, such as decreasing or increasing.
Different formulations of NHPP could be used in climate data analysis as well as in other types of analysis. One of these formulations, usually used in software reliability studies and denoted as NHPP-I, assumes that the mean value function is given by \(m(t) = \alpha F(t)\), where F(t) is the cumulative function of a specified probability distribution and \(\alpha\) is an unknown parameter that should be estimated [21]. Another formulation, also used in software reliability studies and denoted as NHPP-II, is given by taking \(m(t) = - \log (1 - F(t))\), where F(t) is the cumulative function of a probability distribution also usually used in reliability and software reliability applications [38, 39].
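The two formulations can be sketched in a few lines of Python. This is only an illustration: the helper names (`nhpp1_mean`, `nhpp2_mean`, `exponential_cdf`, `weibull_cdf`) and all numerical parameter values are assumptions made for the example and do not come from the study. The Weibull distribution is taken in its standard form \(F(t) = 1 - \exp \{-(t/\sigma )^{\alpha }\}\).

```python
import math


def nhpp1_mean(alpha, cdf):
    """NHPP-I mean value function: m(t) = alpha * F(t)."""
    return lambda t: alpha * cdf(t)


def nhpp2_mean(cdf):
    """NHPP-II mean value function: m(t) = -log(1 - F(t))."""
    return lambda t: -math.log(1.0 - cdf(t))


def exponential_cdf(beta):
    """Exponential CDF, F(t) = 1 - exp(-beta * t)."""
    return lambda t: 1.0 - math.exp(-beta * t)


def weibull_cdf(alpha, sigma):
    """Standard Weibull CDF, F(t) = 1 - exp(-(t / sigma) ** alpha)."""
    return lambda t: 1.0 - math.exp(-((t / sigma) ** alpha))


# Hypothetical parameter values, chosen only for illustration.
m_nhpp1 = nhpp1_mean(60.0, exponential_cdf(0.02))  # bounded above by alpha = 60
m_nhpp2 = nhpp2_mean(weibull_cdf(1.3, 5.0))        # reduces to (t / 5) ** 1.3

print(m_nhpp1(10.0), m_nhpp2(10.0))
```

Under NHPP-I the expected number of events is bounded by \(\alpha\), whereas under NHPP-II it grows without bound as F(t) approaches 1; with a Weibull F the NHPP-II form collapses to a simple power of t.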
For the climate data, we consider five parametric structures: the power law process (PLP) [40, 41]; the Musa–Okumoto process (MOP) [42]; the Goel–Okumoto process (GOP) [43]; a generalized form of Goel–Okumoto (GGOP); and the exponentiated-Weibull process (GPLP) [21, 44, 45], which generalizes the PLP model. The PLP, MOP and GPLP models are defined as special cases of the mean function \(m(t) = - \log (1 - F(t))\), that is, in the class NHPP-II, where F(t) is the cumulative function of a Weibull distribution [46] given by \(F(t) = 1 - \exp \{-(t/\sigma )^{\alpha }\}, t > 0\) for the PLP; the cumulative function of a Lomax or Pareto type II distribution [47, 48] given by \(F(t) = 1 - (1 + t/\alpha )^{-\beta } , t > 0\) for the MOP; and the cumulative function of an exponentiated-Weibull distribution given by \(F(t) = \{1- \exp [-(t /\sigma )^{\alpha }]\}^{\beta }, t > 0\) for the GPLP. The GOP and the GGOP are obtained from the formulation \(m(t) = \alpha F(t)\), that is, in the class NHPP-I, where F(t) is the cumulative function of an exponential distribution, \(F(t) = 1- \exp (-\beta t)\), for the GOP model and the cumulative function of a Weibull distribution, \(F(t) = 1- \exp (- \beta t^{\gamma })\), for the GGOP model. Thus, the mean value functions considered in the present work are given by
\[
\begin{aligned}
m_{PLP}(t \mid \theta ) &= (t/\sigma )^{\alpha }, \\
m_{MOP}(t \mid \theta ) &= \beta \log (1 + t/\alpha ), \\
m_{GOP}(t \mid \theta ) &= \alpha [1 - \exp (-\beta t)], \\
m_{GGOP}(t \mid \theta ) &= \alpha [1 - \exp (-\beta t^{\gamma })], \\
m_{GPLP}(t \mid \theta ) &= -\log [1 - F_{EW}(t)],
\end{aligned}
\tag{2}
\]
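with \(F_{EW}(t) = \{1- \exp [-(t /\sigma )^{\alpha }]\}^{\beta }\). The corresponding intensity/rate functions \(\lambda (t \mid \theta ) = dm(t \mid \theta )/dt\) for the mean functions (2) are given by
\[
\begin{aligned}
\lambda _{PLP}(t \mid \theta ) &= (\alpha /\sigma )(t/\sigma )^{\alpha -1}, \\
\lambda _{MOP}(t \mid \theta ) &= \beta /(t + \alpha ), \\
\lambda _{GOP}(t \mid \theta ) &= \alpha \beta \exp (-\beta t), \\
\lambda _{GGOP}(t \mid \theta ) &= \alpha \beta \gamma t^{\gamma -1} \exp (-\beta t^{\gamma }), \\
\lambda _{GPLP}(t \mid \theta ) &= G(t)/[1 - F_{EW}(t)],
\end{aligned}
\tag{3}
\]
where \(G(t) = \alpha \beta \sigma ^{-1} \{1 - \exp [-(t /\sigma )^{\alpha }]\}^{\beta -1} \exp [-(t/\sigma )^{\alpha }] (t/\sigma )^{\alpha -1}\) and \(F_{EW}(t)\) is defined as in (2).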
The intensity functions given by (3) define the hazard rates of the times between occurrences of events in the respective models. The several expressions for the rate functions cover a wide range of behaviors of the occurrences of the events of interest as a function of time. For instance, the intensity function \(\lambda _{PLP}(t \mid \theta )\) presents different behaviors depending on the value of \(\alpha\): it is constant, decreasing or increasing depending on whether \(\alpha = 1, \alpha < 1\) or \(\alpha > 1\), respectively. The intensity functions \(\lambda _{MOP}(t\mid \theta )\) and \(\lambda _{GOP}(t\mid \theta )\) are decreasing functions of t, and \(\lambda _{GGOP}(t \mid \theta )\) describes the situation where the intensity increases slightly at the beginning and then begins to decrease with t. Moreover, for the rate \(\lambda _{GPLP}(t \mid \theta )\) we observe that: if \(\alpha \ge 1\) and \(\alpha \beta \ge 1, \lambda (t)\) is an increasing function of t; if \(\alpha \le 1\) and \(\alpha \beta \le 1, \lambda (t)\) is a decreasing function of t; if \(\alpha >1\) and \(\alpha \beta < 1, \lambda (t)\) has a bathtub form; if \(\alpha <1\) and \(\alpha \beta > 1, \lambda (t)\) is unimodal.
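As a rough numerical companion to the five models, the sketch below codes each mean value function together with its rate function and checks, by a central finite difference, that the rate is the derivative of the mean. It assumes the standard Weibull CDF \(1 - \exp \{-(t/\sigma )^{\alpha }\}\) and the standard Lomax CDF \(1 - (1 + t/\alpha )^{-\beta }\); the function names and the parameter values used in the check are hypothetical.

```python
import math


def m_plp(t, alpha, sigma):
    return (t / sigma) ** alpha

def lam_plp(t, alpha, sigma):
    return (alpha / sigma) * (t / sigma) ** (alpha - 1.0)

def m_mop(t, alpha, beta):
    return beta * math.log(1.0 + t / alpha)

def lam_mop(t, alpha, beta):
    return beta / (t + alpha)

def m_gop(t, alpha, beta):
    return alpha * (1.0 - math.exp(-beta * t))

def lam_gop(t, alpha, beta):
    return alpha * beta * math.exp(-beta * t)

def m_ggop(t, alpha, beta, gamma):
    return alpha * (1.0 - math.exp(-beta * t ** gamma))

def lam_ggop(t, alpha, beta, gamma):
    return alpha * beta * gamma * t ** (gamma - 1.0) * math.exp(-beta * t ** gamma)

def f_ew(t, alpha, sigma, beta):
    """Exponentiated-Weibull CDF."""
    return (1.0 - math.exp(-((t / sigma) ** alpha))) ** beta

def m_gplp(t, alpha, sigma, beta):
    return -math.log(1.0 - f_ew(t, alpha, sigma, beta))

def lam_gplp(t, alpha, sigma, beta):
    z = (t / sigma) ** alpha
    # g is the exponentiated-Weibull density, i.e. the derivative of f_ew.
    g = ((alpha * beta / sigma) * (1.0 - math.exp(-z)) ** (beta - 1.0)
         * math.exp(-z) * (t / sigma) ** (alpha - 1.0))
    return g / (1.0 - f_ew(t, alpha, sigma, beta))

def numeric_derivative(f, t, h=1e-5):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


# Hypothetical parameter values, used only to compare rate and d(mean)/dt.
checks = [
    ("PLP", m_plp, lam_plp, (1.3, 5.0)),
    ("MOP", m_mop, lam_mop, (30.0, 25.0)),
    ("GOP", m_gop, lam_gop, (80.0, 0.01)),
    ("GGOP", m_ggop, lam_ggop, (80.0, 0.01, 1.2)),
    ("GPLP", m_gplp, lam_gplp, (1.2, 20.0, 1.5)),
]
for name, mean_fn, rate_fn, params in checks:
    approx = numeric_derivative(lambda t: mean_fn(t, *params), 10.0)
    print(name, rate_fn(10.0, *params), approx)
```

Evaluating `lam_plp` on a grid of t values for \(\alpha < 1\), \(\alpha = 1\) and \(\alpha > 1\) is a quick way to see the decreasing, constant and increasing regimes described above.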
Two versions of the models will be considered depending on the rate function used. In one version we assume that no change-points are present and in the other we assume the presence of a change-point. Since the Bayesian point of view will be used to estimate the parameters and the OpenBugs software will be used to program the MCMC algorithm, we only need to specify the likelihood function of the model as well as the prior distributions of the parameters involved. We start with the likelihood function.
In order to specify the likelihood function, we need to describe the information actually used in it. In the present case, this information consists of the years where exceedances of the corresponding threshold for each data set occurred. Hence, let n be the number of observed times where these events of interest occurred in the time interval [0, T], \(T \ge 0\) and let \(0< t_1< t_2< \ldots< t_n < T\) denote these times. Thus, the set of observed values is \(D_T = \{n; t_1, \ldots , t_n; T\}\).
The likelihood functions of the two versions of the models considered here are given as follows.
2.2 Likelihood Function Without the Presence of Change-Points
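Suppose that there are no change-points. Then the likelihood function of the model is given by [49]
\[ L(\theta \mid D_T) = \left[ \prod _{i=1}^{n} \lambda (t_i \mid \theta ) \right] \exp [-m(T \mid \theta )]. \tag{4} \]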
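A minimal sketch of the corresponding log-likelihood, assuming the standard NHPP form in which each observed time contributes \(\log \lambda (t_i \mid \theta )\) and the observation window contributes \(-m(T \mid \theta )\). The function name `nhpp_log_likelihood`, the event times and the PLP parameter values are hypothetical.

```python
import math


def nhpp_log_likelihood(times, T, rate, mean):
    """log L = sum_i log(rate(t_i)) - mean(T) for event times in [0, T]."""
    return sum(math.log(rate(t)) for t in times) - mean(T)


# Illustration with a PLP rate; alpha and sigma are made-up values.
alpha, sigma = 1.3, 5.0
plp_rate = lambda t: (alpha / sigma) * (t / sigma) ** (alpha - 1.0)
plp_mean = lambda t: (t / sigma) ** alpha

event_times = [2.0, 5.0, 9.0, 14.0]  # hypothetical exceedance years
print(nhpp_log_likelihood(event_times, 20.0, plp_rate, plp_mean))
```

Passing the rate and mean functions as arguments keeps the same routine usable for any of the five parametric forms.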
2.3 Likelihood Function in the Presence of a Change-Point
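When we have a single change-point \(\tau\) marking a transition between two NHPP models of the same type but with different parameters, the intensity function of the overall process is given by [14]
\[
\lambda (t \mid \theta ) =
\begin{cases}
\lambda (t \mid \theta _1), & 0 \le t < \tau , \\
\lambda (t \mid \theta _2), & t \ge \tau ,
\end{cases}
\tag{5}
\]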
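where \(\lambda (t \mid \theta _j ), j = 1, 2\) are intensity functions of the forms defined in (3) and \(\theta _j , j = 1, 2\) are the parameters associated with the NHPP before and after the change-point. In terms of the corresponding mean value functions \(m(t \mid \theta _j ), j = 1, 2\), the mean value function of the overall process is given by
\[
m(t \mid \theta ) =
\begin{cases}
m(t \mid \theta _1), & 0 \le t < \tau , \\
m(\tau \mid \theta _1) + m(t \mid \theta _2) - m(\tau \mid \theta _2), & t \ge \tau .
\end{cases}
\tag{6}
\]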
Hence, if n exceedances occurred in the time interval [0, T] with the occurrence times given by \(D_{T}\), then we may rewrite this set as \(D_T = \{n; t_1, \ldots ,t_{N_\tau }; t_{N_{\tau }+1}, \ldots ,t_n; T\}\), where \(t_k\), \(k = 1, 2, \ldots , n\) is the time of occurrence of the kth event (in the present case, the kth exceedance of the climate threshold), \(\tau\) is the change-point, and \(N_{\tau }\) is the number of times the event occurred before the change-point. Therefore, when one change-point is allowed, the likelihood function of the model is given by
\[
L(\theta \mid D_T) = \left[ \prod _{i=1}^{N_{\tau }} \lambda (t_i \mid \theta _1) \right] \exp [-m(\tau \mid \theta _1)] \left[ \prod _{i=N_{\tau }+1}^{n} \lambda (t_i \mid \theta _2) \right] \exp \{-[m(T \mid \theta _2) - m(\tau \mid \theta _2)]\}. \tag{7}
\]
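As a special case, for the PLP model in the presence of a change-point, the intensity function (5) is given by
\[
\lambda (t \mid \theta ) =
\begin{cases}
(\alpha _1/\sigma _1)(t/\sigma _1)^{\alpha _1 - 1}, & 0 \le t < \tau , \\
(\alpha _2/\sigma _2)(t/\sigma _2)^{\alpha _2 - 1}, & t \ge \tau ,
\end{cases}
\]
with the corresponding mean value function given by
\[
m(t \mid \theta ) =
\begin{cases}
(t/\sigma _1)^{\alpha _1}, & 0 \le t < \tau , \\
(\tau /\sigma _1)^{\alpha _1} + (t/\sigma _2)^{\alpha _2} - (\tau /\sigma _2)^{\alpha _2}, & t \ge \tau .
\end{cases}
\tag{8}
\]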
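The PLP case with one change-point can be sketched as follows. The code assumes the usual piecewise construction: events before \(\tau\) use the first parameter pair, events from \(\tau\) onward use the second, and the exponent terms are \(m(\tau \mid \theta _1)\) and \(m(T \mid \theta _2) - m(\tau \mid \theta _2)\). The function names, event times and parameter values are hypothetical.

```python
import math


def plp_rate(t, alpha, sigma):
    return (alpha / sigma) * (t / sigma) ** (alpha - 1.0)


def plp_mean(t, alpha, sigma):
    return (t / sigma) ** alpha


def cp_mean(t, tau, theta1, theta2):
    """Mean value function of a PLP with a single change-point at tau."""
    if t < tau:
        return plp_mean(t, *theta1)
    return plp_mean(tau, *theta1) + plp_mean(t, *theta2) - plp_mean(tau, *theta2)


def cp_log_likelihood(times, T, tau, theta1, theta2):
    """Log-likelihood of a PLP with one change-point, observed on [0, T]."""
    before = [t for t in times if t < tau]
    after = [t for t in times if t >= tau]
    ll = sum(math.log(plp_rate(t, *theta1)) for t in before)
    ll -= plp_mean(tau, *theta1)
    ll += sum(math.log(plp_rate(t, *theta2)) for t in after)
    ll -= plp_mean(T, *theta2) - plp_mean(tau, *theta2)
    return ll


# Hypothetical event times and parameters (alpha_j, sigma_j), j = 1, 2.
times = [2.0, 5.0, 9.0, 14.0]
print(cp_log_likelihood(times, 20.0, 7.0, (1.3, 5.0), (1.1, 6.0)))
```

A useful sanity check is that setting both parameter pairs equal recovers the log-likelihood of the model without change-points, whatever value of \(\tau\) is used.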
In a similar way, we obtain the rate and mean functions for the MOP, GOP, GGOP and GPLP models in the presence of one change-point. Hence, we may consider a Bayesian approach entirely based on the marginal posterior densities. However, deriving analytical expressions for these densities is infeasible, mainly due to the complexity of the associated log-likelihood function. In our case, we used Markov chain Monte Carlo (MCMC) methods based on the Gibbs sampling algorithm to simulate samples from the joint posterior distributions of interest [34, 35]. A brief description of the Gibbs sampling algorithm is presented as follows:
- Suppose \(\pi (\theta \mid y)\) is a joint posterior distribution, where \(\theta = (\theta _1,\ldots ,\theta _k)\), about which we want to make inferences.
- For this, we simulate random quantities from the full conditional distributions \(\pi (\theta _i \mid y,\theta _{(i)})\), where \(\theta _{(i)} = (\theta _1,\ldots ,\theta _{i-1},\theta _{i+1},\ldots ,\theta _k)\).
- Consider the initial (arbitrary) values \(\theta ^{(0)}=(\theta _{1}^{(0)},\theta _{2}^{(0)},\ldots ,\theta _{k}^{(0)})\).
- Generate \(\theta _{1}^{(1)}\) from \(\pi (\theta _1 \mid y,\theta _{2}^{(0)},\ldots ,\theta _{k}^{(0)})\),
- Generate \(\theta _{2}^{(1)}\) from \(\pi (\theta _2 \mid y,\theta _{1}^{(1)},\theta _{3}^{(0)},\ldots ,\theta _{k}^{(0)})\),
- \(\vdots\)
- Generate \(\theta _{k}^{(1)}\) from \(\pi (\theta _k \mid y,\theta _{1}^{(1)},\ldots ,\theta _{k-1}^{(1)})\).
- Replace the initial values by \(\theta ^{(1)}=(\theta _{1}^{(1)},\theta _{2}^{(1)},\ldots ,\theta _{k}^{(1)})\) and repeat the previous steps.
- The values \(\theta _{1}^{(z)},\theta _{2}^{(z)},\ldots ,\theta _{k}^{(z)}\), for z sufficiently large, converge in distribution to a draw from \(\pi (\theta \mid y)\).
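The steps above can be illustrated with a toy target for which the full conditionals are known in closed form. The sketch below is not the sampler used for the NHPP models (those were run in OpenBugs); it assumes a standard bivariate normal "posterior" with correlation `rho`, for which each full conditional is normal with mean `rho` times the other component and variance `1 - rho**2`. The function name and settings are hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, seed=0, start=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    theta1, theta2 = start  # arbitrary initial values theta^(0)
    draws = []
    for _ in range(n_iter):
        # Draw theta_1 from its full conditional given the current theta_2.
        theta1 = rng.gauss(rho * theta2, cond_sd)
        # Draw theta_2 from its full conditional given the updated theta_1.
        theta2 = rng.gauss(rho * theta1, cond_sd)
        draws.append((theta1, theta2))
    return draws


draws = gibbs_bivariate_normal(rho=0.6, n_iter=20000, seed=1)
kept = draws[1000:]  # discard an initial stretch of the chain
mean_theta1 = sum(d[0] for d in kept) / len(kept)
print(mean_theta1)
```

Each component is always updated conditionally on the most recent values of the others, which is the defining feature of the scheme described in the list.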
For each generated sample, a chain with \(N=200,000\) values was generated for each component of the parameter vector of the model, considering a burn-in period of 5% of the chain's size. To obtain pseudo-independent samples from the joint posterior distribution, one out of every 100 generated values was kept, resulting in chains of size 2,000 for each parameter. We assumed independent uniform U(a, b) or Gamma(c, d) prior distributions for the parameters of the proposed models for all data sets considered, where the hyperparameters a, b, c and d are known and Gamma(c, d) denotes a gamma distribution with mean c/d and variance \(c/d^2\).
The discrimination of the models was made by comparing plots of the empirical accumulated number of climate violations with the estimated mean value functions versus time of occurrence. The Bayesian analysis for all models was made using the OpenBugs software [37]. Convergence of the Gibbs sampling algorithm was monitored by usual time series plots for the simulated samples. In the selection of the best model we also used the deviance information criterion (DIC) [50] which is an approximation for the Bayes factor (smaller values of DIC indicate better models). However, the results were very similar for all assumed models. That makes it difficult to choose the best model using this criterion.
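For reference, the DIC can be computed from posterior draws as in the sketch below. It assumes the usual definition, \(DIC = \bar{D} + p_D\) with \(p_D = \bar{D} - D(\bar{\theta })\) and deviance \(D(\theta ) = -2 \log L(\theta \mid D_T)\); the function name `dic` and the toy log-likelihood are hypothetical, and OpenBugs reports this quantity directly.

```python
def dic(log_lik, draws):
    """Return (DIC, p_D) from posterior draws given as tuples of parameters."""
    n = len(draws)
    k = len(draws[0])
    # Posterior mean of the deviance D(theta) = -2 * log-likelihood.
    d_bar = sum(-2.0 * log_lik(theta) for theta in draws) / n
    # Deviance evaluated at the posterior mean of the parameters.
    theta_bar = tuple(sum(theta[i] for theta in draws) / n for i in range(k))
    d_hat = -2.0 * log_lik(theta_bar)
    p_d = d_bar - d_hat  # effective number of parameters
    return d_bar + p_d, p_d


# Toy one-parameter example with a made-up quadratic log-likelihood.
toy_log_lik = lambda theta: -0.5 * (theta[0] - 1.0) ** 2
print(dic(toy_log_lik, [(0.0,), (2.0,)]))
```

In practice `log_lik` would be the NHPP log-likelihood of the model under comparison and `draws` the thinned MCMC output for its parameters.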
3 Results
The data sets considered in the present work are the yearly averaged amount of rainfall (in mm) from 1879 to 2002 (\(T = 124\)) and the yearly average of the maximum temperatures (in degrees Celsius - \(^{\circ }\)C) collected from 1915 to 2003 (\(T = 88\)) in a climate station in Almaty, Kazakhstan [51]; the yearly maximum temperature averages collected from 1894 to 2003 (\(T = 110\)) reported at a climate station in Tashkent, Uzbekistan; and the yearly average temperature (in degrees Fahrenheit - \(^{\circ }\)F) collected from 1895 to 2019 (\(T = 125\)) in the USA (https://www.ncdc.noaa.gov/cag/).
The threshold considered in each case is the corresponding overall average measurement. For instance, the threshold for the rainfall data from Kazakhstan is the average of the 124 values belonging to this data set, and in the case of temperature, it is the average of the corresponding 88 measurements.
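Extracting the event times from a yearly series can be sketched as follows. The function name `exceedance_times`, the 1-based year index (year 1 is the first observed year) and the five data values are assumptions made for the illustration; the actual series are those listed in the Appendix tables.

```python
def exceedance_times(values, threshold=None):
    """Return the threshold and the 1-based years whose value exceeds it.

    If no threshold is supplied, the overall average of the series is used.
    """
    if threshold is None:
        threshold = sum(values) / len(values)
    times = [year for year, v in enumerate(values, start=1) if v > threshold]
    return threshold, times


# Made-up yearly averages, for illustration only.
series = [48.0, 55.0, 50.0, 60.0, 47.0]
threshold, times = exceedance_times(series)
print(threshold, times, len(times))
```

The list `times` plays the role of \(t_1, \ldots , t_n\), its length is n, and the length of the series is T.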
In the next three subsections we present the application of the NHPP model considered in the earlier sections to each of these data sets.
3.1 Kazakhstan Climate Data
First, we consider data from Almaty, Kazakhstan. Hence, we have the yearly precipitation averages (in mm) from 1879 to 2002 (giving an observation period of \(T = 124\) years) and the yearly average of the maximum temperatures (in \(^{\circ }\)C) measured from 1915 to 2003 (giving an observation period of \(T = 88\) years) in a climate station in Almaty, Kazakhstan [51]. Figure 1 shows the plots of these two time series. The plot on the left corresponds to the yearly precipitation averages and that on the right corresponds to the yearly averages of maximum temperatures.
The overall precipitation average for the period of 124 years (1879 to 2002) is equal to 50.88 (data set in Table 5, in Appendix 1), and the overall average of averages of the maximum temperatures for the period of 88 years (1915 to 2003) is equal to 14.939 (data set in Table 6, in Appendix 1). Taking this into account, it is possible to observe an increase in trend, which is above average, in the precipitation as well as temperature data after the years 1920 and 1950, respectively. That could indicate a possible presence of a change-point in the time series since after those years a change in the behavior of the data may be observed.
3.1.1 Yearly Rainfall Averages
Since the overall precipitation average for the period of 124 years is equal to 50.88, we assume for this data analysis a threshold for the precipitation average equal to 51. That gives us \(n = 57\) exceedances, that is, there were 57 years where precipitation averages were above 51 during the follow-up period of \(T = 124\) years.
Under the NHPP setting, we assume that, in the case where no change-points are present, the prior distributions of the parameters appearing in the rate functions given by (2) are the following,
- PLP: \(\alpha \sim U(0,5); \sigma \sim U(0,10000)\).
- MOP: \(\alpha \sim \text{ Gamma }(0.01,0.01); \beta \sim \text{ Gamma }(0.01,0.01)\).
- GOP: \(\alpha \sim \text{ Gamma }(0.01,0.01); \beta \sim \text{ Gamma }(0.01,0.01)\).
- GGOP: \(\alpha \sim \text{ Gamma }(0.01,0.01); \beta \sim \text{ Gamma }(0.01,0.01); \gamma \sim U(0,5)\).
- GPLP: \(\alpha \sim \text{ Gamma }(0.1,0.1); \sigma \sim \text{ Gamma }(0.1,0.1); \beta \sim U(0,100)\).
Observe that we are using non-informative prior distributions for the parameters of the proposed models. Estimation of the parameters was performed using a sample of size 1000 taken every 100th simulated Gibbs sample after a burn-in period of 11000 iterations.
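The burn-in and thinning scheme just described amounts to a simple slicing operation on the stored chain. In the sketch below the function name `burn_and_thin` is hypothetical, and the total of 111,000 iterations is an inference from the stated numbers (11,000 burn-in iterations plus 1,000 retained values at every 100th iteration), not a figure given in the text.

```python
def burn_and_thin(chain, burn_in, step):
    """Drop the first `burn_in` values, then keep every `step`-th value."""
    return chain[burn_in::step]


# Stand-in chain: iteration indices instead of real parameter draws.
chain = list(range(111000))
sample = burn_and_thin(chain, burn_in=11000, step=100)
print(len(sample), sample[0], sample[-1])
```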
Table 1 shows the posterior means, standard deviations and 95% credible intervals for the parameters of each model considering the yearly precipitation averages from Kazakhstan where the threshold 51 was used.
From Table 1, assuming a squared error loss function, the Bayesian estimator of the change-point is the estimated posterior mean, which is equal to 42.53 (or year 43, which corresponds to the year 1921). Considering instead the estimated posterior median (32.63), the change-point occurred earlier, in year 33, corresponding to the year 1911.
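Posterior summaries of this kind are computed directly from the retained MCMC draws. The sketch below uses a simple equal-tailed empirical interval; the function name `posterior_summary`, the quantile rule and the example draws are assumptions made for the illustration and are not the values behind Table 1.

```python
import statistics


def posterior_summary(draws, level=0.95):
    """Posterior mean, median, standard deviation and equal-tailed interval."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[round((1.0 - level) / 2.0 * (n - 1))]
    upper = ordered[round((1.0 + level) / 2.0 * (n - 1))]
    return {
        "mean": statistics.mean(ordered),      # estimator under squared error loss
        "median": statistics.median(ordered),  # estimator under absolute error loss
        "sd": statistics.stdev(ordered),
        "interval": (lower, upper),
    }


# Made-up, right-skewed draws for a change-point, for illustration only.
tau_draws = [20.0, 25.0, 28.0, 30.0, 31.0, 33.0, 35.0, 40.0, 75.0, 110.0]
print(posterior_summary(tau_draws))
```

When the posterior of the change-point is right-skewed, as in this toy sample, the posterior mean lies above the median, which is the pattern reported for the estimates quoted above.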
In Fig. 2, leftmost figure, we have the plots of the accumulated average precipitation exceedances (values above the threshold 51) as well as the estimated mean value function assuming each one of the five proposed rate functions when no change-points are present. It is possible to observe that the best fitted model is given by the PLP. Even though the GPLP model has three parameters and showed good convergence of the MCMC algorithm, it has not produced a better fit for this data set. Note that this model possibly has some identifiability problems, and for practical use very informative prior distributions, based on the opinion of climate experts, would be necessary.
Also in Fig. 2, middle figure, we have the plots of the empirical mean value function m(t) and of the one estimated using the PLP model (the best fitted model when no change-points are present) when a change-point is allowed and when we assume the following prior distributions: \(\alpha _j \sim U(1.3,1)\); \(\sigma _j \sim U(0,10)\) and \(\tau \sim U(1,124), j = 1,2\). Observe that when specifying the prior distributions in the case of one change-point, we use some prior information obtained from the Bayesian estimators assuming the PLP model without change-points, especially for the parameters \(\alpha _1\) and \(\alpha _2\), i.e., we use an empirical Bayesian analysis [52]. Figure 2 also shows the histogram of the generated Gibbs samples for the change-point \(\tau\) assuming the PLP model in the presence of a change-point (rightmost figure).
Table 1 also shows the posterior summaries of interest for the PLP model in the presence of a change-point. With the obtained Bayesian estimate for the change-point \(\tau = 42.63\), that is, \(\tau = 43\) which corresponds to the year 1921, from (8) we have \(m(t) = (t/5.082)^{1.354}\) if \(t \le 43\), and \(m(t) = (43/5.082)^{1.354} + (t/6.012)^{1.307} - (43/6.012)^{1.307}\) if \(t > 43\).
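As a quick check of the fitted mean value function, the sketch below plugs in the point estimates quoted in this paragraph and also evaluates Poisson probabilities for the number of exceedances in an interval, using the NHPP property that the count in (s, t] is Poisson with mean m(t) - m(s). The function names and the choice of the interval (114, 124] are assumptions made for the illustration.

```python
import math

# Point estimates quoted in the text for the Kazakhstan precipitation data.
TAU = 43.0
ALPHA1, SIGMA1 = 1.354, 5.082
ALPHA2, SIGMA2 = 1.307, 6.012


def fitted_mean(t):
    """Fitted PLP mean value function with one change-point at TAU."""
    if t <= TAU:
        return (t / SIGMA1) ** ALPHA1
    return ((TAU / SIGMA1) ** ALPHA1
            + (t / SIGMA2) ** ALPHA2
            - (TAU / SIGMA2) ** ALPHA2)


def prob_count(k, s, t):
    """P(exactly k exceedances in (s, t]) under the fitted NHPP."""
    mu = fitted_mean(t) - fitted_mean(s)
    return math.exp(-mu) * mu ** k / math.factorial(k)


print(fitted_mean(124.0))  # expected number of exceedances over [0, 124]
print(prob_count(5, 114.0, 124.0))
```

The expected number of exceedances over the whole observation period comes out close to the \(n = 57\) exceedances actually observed, which is consistent with the good fit reported for this model.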
3.1.2 Yearly Maximum Temperature Averages
When we consider the temperature data, we have seen that the overall average for the period of 88 years is 14.939. Hence, we take 15 as the threshold of the Kazakhstan temperature data. Using this threshold, there were \(n = 43\) years where exceedances occurred, that is, there were 43 years with values above the threshold 15 in the follow-up period of \(T = 88\) years.
Consider now the NHPP formulation with parametric forms for the intensity function given by (2). In the present case, we use the same prior distributions (9) assumed for the Kazakhstan precipitation data as well as the same MCMC simulation procedure.
Table 2 shows the posterior means, standard deviations and 95% credible intervals for the parameters of each model taking into account the yearly averages of the maximum temperatures and the threshold 15.
In Fig. 3, leftmost figure, we have the plots of the accumulated exceedances of averages of the maximum temperatures (values above the threshold 15) and the estimated mean value functions assuming each one of the five proposed models and with no change-points present.
Looking at Fig. 3, it is observed that the best fitted model, when no change-points are allowed, is again given by the PLP model. Also in Fig. 3, middle figure, we have the plots of the empirical m(t) and of the one fitted by a PLP in the presence of a change-point when we use the following prior distributions: \(\alpha _j \sim U(1.3,1)\); \(\sigma _j \sim U(0,10)\) and \(\tau \sim U(1,88), j = 1,2\).
The posterior summaries of interest for the PLP model in the presence of a change-point are also shown in Table 2. The estimated change-point \(\tau = 33\) (approximation for the estimated value 32.56) corresponds to the year 1948. Hence, from (8), we have \(m(t) = (t/3.884)^{1.09}\) if \(t \le 33\) and \(m(t) = (33/3.884)^{1.09} + (t/6.24)^{1.407}- (33/6.24)^{1.307}\) if \(t > 33\).
Figure 3 also shows the histogram of the simulated Gibbs samples for the change-point \(\tau\) (rightmost plot) assuming the PLP model with one change-point. Moreover, from Table 2, we observe that, assuming a squared error loss function, the Bayesian estimator of the change-point, given by the posterior mean, is equal to 32.56, but considering the posterior median (23.505), the estimated change-point occurred earlier, in year 23, corresponding to the year 1938. Hence, the changes in the climate variable corresponding to temperature have been observed for quite a while.
3.2 Uzbekistan Maximum Temperature Data
We now consider the Uzbekistan temperature (\(^{\circ }\)C) data (data set in Table 7, in Appendix 1). Hence, we have the yearly averages of the maximum temperatures collected from 1894 to 2003 (giving an observed period of \(T =110\) years) in a climate station in Tashkent, Uzbekistan, another country in Central Asia [51]. Figure 4 shows the plot of this time series.
The overall average for the period of 110 years is 20.7395. Thus, we assume in the application of the model a threshold equal to 21. That gives us \(n = 50\) exceedances, that is, there were 50 years in which the yearly average of the maximum temperatures was above 21 during the follow-up period of \(T = 110\) years.
Hence, under the NHPP formulation we consider the parametric forms for the intensity function given by (2) and the prior distributions given in (9). Table 3 shows the posterior means, standard deviations and 95% credible intervals for the parameters of each model considering the yearly averages of the maximum temperatures when the threshold equals 21. The same MCMC scheme considered in Sect. 3.1 is used here as well.
Figure 5, leftmost figure, shows the plots of the accumulated maximum temperature averages exceedances (values above the threshold 21) and the estimated mean value functions assuming each of the five proposed forms for the rate function and when no change-points are present.
It is possible to observe, by looking at Fig. 5, that, in the case of no change-points, the best fit is again given by the PLP model. Figure 5 also shows, middle figure, the plots of the empirical and fitted m(t) when the PLP model in the presence of a change-point is used and when the following prior distributions are considered: \(\alpha _j \sim U(1.5,1); \sigma _j \sim U(0,30)\) and \(\tau \sim U(1,110), j = 1,2\).
From Table 3, we may observe that assuming a squared error loss function, the Bayesian estimator for the change-point, given by the posterior mean, is equal to 43.20 corresponding to a change-point \(\tau = 43\). Using this change-point, from (8) we have that, \(m(t) = (t/6.752)^{1.264}\) if \(t \le 43\) and \(m(t) = (43/6.752)^{1.264} + (t/18.12)^{2.103} - (43/18.12)^{2.103}\) if \(t > 43\).
3.3 US Climate Data
Now, consider the USA yearly average temperature (\(^{\circ }\)F) data from 1895 to 2019, giving an observed period of \(T = 125\) years. The overall average of the measurements for the 125 consecutive years is 52.16 (data set given in Table 8, in Appendix 1). The data published in August 2020 (see https://www.ncdc.noaa.gov/cag/) are presented in Fig. 6.
Looking at Fig. 6, we may also see that, apparently, from the year 1982 there is an increasing trend in the values of the average temperature until the year 2019.
Considering the five rate functions, described in Sect. 2, in the model without the presence of change-points, we assume the same prior distributions (9) for the parameters of the models, except for the parameter \(\gamma\) of the GGOP model where we assume a prior \(\gamma \sim U(0,100)\) instead of \(\gamma \sim U(0,5)\). The same MCMC scheme used for the climate data of Kazakhstan using the OpenBugs software is applied to the present data set. Since the mean temperature during the observed period is 52.16, for the present data set, we take 52 as the threshold value. That gives \(n = 60\) exceedances, that is, there were 60 years in which the temperature averages were above 52 during the follow-up period of \(T = 125\) years.
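The construction of the exceedance data from a yearly series and a threshold can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function names are hypothetical, and `toy_temps` is a made-up six-year series, not the data of Table 8.

```python
from itertools import accumulate


def exceedance_years(values, threshold):
    """Return the 1-based follow-up years in which the value is above the threshold."""
    return [year for year, value in enumerate(values, start=1) if value > threshold]


def empirical_mean_value(values, threshold):
    """Return the accumulated number of exceedances up to each year t = 1, ..., T."""
    return list(accumulate(1 if value > threshold else 0 for value in values))


# Hypothetical toy series used only to illustrate the bookkeeping.
toy_temps = [51.2, 52.4, 51.9, 53.1, 52.0, 52.7]
threshold = 52

if __name__ == "__main__":
    years = exceedance_years(toy_temps, threshold)
    print("exceedance years:", years)
    print("n =", len(years))
    print("empirical m(t):", empirical_mean_value(toy_temps, threshold))
```

The accumulated counts play the role of the empirical m(t) that is compared with the fitted mean value functions in the figures.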
Table 4 shows the posterior means, standard deviations and 95% credible intervals for the parameters of each model when we take into account the temperature averages in USA with a threshold equals to 52.
Figure 7, leftmost figure, shows the plots of the accumulated average precipitation exceedances (values above the threshold 52) and the estimated mean value functions assuming each one of the five proposed models when no change-points are allowed.
Looking at the leftmost plots in Fig. 7, we may see that the best fit is again given by the PLP. In the same Fig. 7, middle figure, we have the plots of the empirical m(t) and fitted by a PLP model in the presence of a change-point where the following prior distributions were assumed: \(\alpha _j \sim U(1.7,1); \sigma _j \sim U(0,30)\) and \(\tau \sim U(1,124), j = 1,2\).
In Table 4 we also have the posterior summaries of interest for the PLP model in the presence of a change-point. The estimated change-point \(\tau = 55\) (approximation to the Bayesian estimate \(\tau = 54.67\)) corresponds to the year 1949. Hence, from (8), we have that \(m(t) = (t/11.5)^{1.646}\) if \(t \le 55\) and \(m(t) = (55/11.5)^{1.646} + (t/16.51)^{1.978} - (55/16.51)^{1.978}\) if \(t > 55\).
In Fig. 7, rightmost figure, we also have the histogram of the generated Gibbs samples for the change-point \(\tau\) assuming the PLP model in the presence of a change-point. In this case, the posterior mean equal to 54.67 (or year 55) is close to the posterior median which is 56.38.
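Posterior summaries such as the mean, median and a 95% interval are computed from the simulated draws. The sketch below shows one way to do this in Python with the standard library only. It is not the OpenBUGS output: the function name is hypothetical, the interval is an equal-tailed empirical interval (an assumption, since the text does not state which type of credible interval is reported), and the draws are synthetic stand-ins for Gibbs samples of \(\tau\).

```python
import random
import statistics


def posterior_summary(samples, level=0.95):
    """Summarize MCMC draws: mean, median, standard deviation, equal-tailed interval."""
    ordered = sorted(samples)
    n = len(ordered)
    # Indices of the empirical lower and upper quantiles (truncated, no interpolation).
    lower_idx = int((1 - level) / 2 * (n - 1))
    upper_idx = int((1 + level) / 2 * (n - 1))
    return {
        "mean": statistics.fmean(ordered),      # Bayes estimator under squared error loss
        "median": statistics.median(ordered),
        "sd": statistics.stdev(ordered),
        "interval": (ordered[lower_idx], ordered[upper_idx]),
    }


if __name__ == "__main__":
    # Synthetic draws, purely illustrative; they are NOT the samples behind Table 4.
    rng = random.Random(2020)
    draws = [rng.gauss(55, 8) for _ in range(5000)]
    for name, value in posterior_summary(draws).items():
        print(name, value)
```

A posterior mean close to the posterior median, as reported for \(\tau\), suggests that the posterior distribution is not strongly skewed.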
4 Discussion and Concluding Remarks
The NHPP-based methodology with one change-point considered in this study could be applied to any climate data to detect the year in which a possible change in climate might have occurred. The proposed methodology could also be generalized to situations with more than one change-point. However, we do not take that into account here. One reason is that changes in climate are, in general, a very slow process, so the occurrence of an additional change-point during the period of time considered in the present work may be regarded as a rare event. Nevertheless, this might be the subject of a future study if an extended data set is obtained. The Bayesian inferences of interest (point estimators and credible intervals for the parameters of the models) are obtained using MCMC simulation methods, for which existing free software such as OpenBUGS may be used at low computational cost. The data analysis yielded important results indicating significant climate changes in the applications considered in the study. They are listed as follows.
In the first application, i.e., where the rainfall data from Kazakhstan (1879 to 2002) are used, there are 11 years before the estimated change-point (year 43, corresponding to the year 1921) and 46 years after it with precipitation averages above the overall average. This might be considered an indication that the average rainfall has been increasing since the year 1921. Moreover, in the second application, i.e., where the temperature data from Kazakhstan (1915 to 2003) are used, there are 13 years before the estimated change-point (year 33, which corresponds to the year 1948) and 30 years after it with maximum temperature averages above the overall average, an indication that the average maximum temperature has been increasing since the year 1948.
In the third application, i.e., where the temperature data from Uzbekistan (1894 to 2003) are used, there are 11 years before the estimated change-point (year 43, which corresponds to the year 1936) and 99 years after it with maximum temperature averages above the overall average. That is also an indication that the average maximum temperature has been increasing since the year 1936.
Finally, when the average temperature data from the USA (1895 to 2019) are used, there are 18 years before the estimated change-point (year 55, which corresponds to the year 1949) and 106 years after it with average temperatures above the overall average, also an indication that the average temperature has been increasing since the year 1949.
Another possible way of studying the type of problem considered here is to use the so-called Hawkes process. In this case, the intensity function ruling the occurrences of the events under study is given by \(\lambda ^{*}(t) = f^{*}(t)/(1- F^{*}(t))\), \(t \ge 0\), where \(f^{*}(\cdot )\) and \(F^{*}(\cdot )\) are the density and distribution functions of the occurrence times conditional on the past occurrence times. This type of model may be useful in cases where exceedances occur in small clusters. However, this path is not followed here and is left as the subject of future studies. For a review of the Hawkes process see, for instance, [53].
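To illustrate how a Hawkes process produces clusters, the sketch below evaluates a conditional intensity with an exponential excitation kernel, \(\lambda ^{*}(t) = \mu + \sum _{t_i < t} \alpha e^{-\beta (t - t_i)}\). This kernel is a common textbook choice and is an assumption made here; it is not specified in the present study. The function name and the parameter values `mu`, `alpha` and `beta` are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with exponential kernel.

    lambda*(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    The exponential kernel is an illustrative assumption.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in event_times if t_i < t
    )


if __name__ == "__main__":
    # Hypothetical past exceedance times (in years of follow-up) and parameters.
    past_events = [3.0, 4.0, 20.0]
    for t in (2.0, 4.5, 10.0, 20.5):
        rate = hawkes_intensity(t, past_events, mu=0.4, alpha=0.3, beta=1.0)
        print(f"t = {t:5.1f}  intensity = {rate:.4f}")
```

Each past event raises the intensity temporarily, which makes further events more likely shortly afterwards; this is the self-exciting behavior that may suit exceedances occurring in small clusters.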
Data Availability
See Appendix 1.
Code Availability
See Appendix 2.
References
Intergovernmental Panel on Climate Change. (2019). IPCC, 2018: Global Warming of 1.5\(^{\circ }\)C. An IPCC Special Report on the impacts of global warming of 1.5\(^{\circ }\)C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty. V. Masson-Delmotte, P. Zhai, H.-O. Pörtner, D. Roberts, J. Skea, P.R. Shukla, A. Pirani, W. Moufouma-Okia, C. Péan, R. Pidcock, S. Connors, J.B.R. Matthews, Y. Chen, X. Zhou, M.I. Gomis, E. Lonnoy, T. Maycock, M. Tignor, and T. Waterfield (Eds.).
Joyce, L. A., & Birdsey, R. (2000). The impact of climate change on America’s forests: A technical document supporting the 2000 USDA Forest Service RPA Assessment. In L.A. Joyce, R. Birdsey (Eds.) Gen. Tech. Rep RMRS-GTR-59. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station.
Doney, S. C., Ruckelshaus, M., Duffy, J. E., et al. (2012). Climate change impacts on marine ecosystems. Annual Review of Marine Science, 4, 11–37.
Gayton, D. V. (2008). Impacts of climate change on British Columbia’s biodiversity: A literature review. Extended Abstract. BC Journal of Ecosystems and Management, 9, 26–30.
Katz, R. W. (1996). Use of conditional stochastic models to generate climate change scenarios. Climatic Change, 32, 237–255.
Majda, A. J., Timofeyev, I., & Vanden Eijnden, E. (2001). A mathematical framework for stochastic climate models. Communications on Pure and Applied Mathematics, 54, 891–974.
Rajagopalan, B., Lall, U., & Tarboton, D. G. (1984). A non parametric renewal model for modeling daily precipitation. In K. W. Hipel, A. I. McLeod, U. S. Panu, V. P. Singh (Eds.) Stochastic and Statistical Methods in Hydrology and Environmental Engineering, 4, 47–59.
Outayek, S. E., & Nguyen, V. -T. -V. (2019) Stochastic modeling of daily rainfall process in the context of climate change. In: CSCE Annual Conference.
Franzke, C. L. E., O’Kane, T. J., Berner, J., Williams, P. D., & Lucarini, V. (2015). Stochastic climate theory and modeling. WIREs Climate Change, 6, 63–78.
Alexander, L. V., Zhang, X., Peterson, T. C., Caesar, J., Gleason, B., Klein Tank, A., Haylock, M., Collins, D., Trewin, B., Rahimzadeh, F., et al. (2006). Global observed changes in daily climate extremes of temperature and precipitation. Journal of Geophysical Research: Atmospheres, 111(D5).
Foley, A. (2010). Uncertainty in regional climate modelling: A review. Progress in Physical Geography, 34(5), 647–670.
Powell, B. (2016). Chapter 22 - Statistical modelling of climate change. In: Climate Change. Elsevier, 341–354.
Richards, G. R. (1993). Change in global temperature: A statistical analysis. Journal of Climate, 6(3), 546–559.
Achcar, J. A., Rodrigues, E. R., Paulino, C. D., & Soares, P. (2010). Non-homogeneous Poisson models with a change-point: An application to ozone peaks in Mexico City. Environmental and Ecological Statistics, 17(4), 521–541.
Achcar, J. A., & Sicchieri, M. P. L. (2010). da Silva: Comparação de modelos para os excessos de internações hospitalares diárias por pneumonia na cidade de São Paulo. (In Portuguese) Revista Brasileira de Biometria, 28, 87-103.
Achcar, J. A., Rodrigues, E. R., & Tzintzun, G. (2011). Using non-homogeneous Poisson models with multiple change-points to estimate the number of ozone exceedances in Mexico City. Environmetrics, 22(1), 1–12.
Coelho-Barros, E. A., Achcar, J. A., Martinez, E. Z., Davarzani, N., & Grabsh, H. I. (2019). Bayesian inference for segmented Weibull distribution. Revista Colombiana de Estadística, 42, 225–243.
Puziol de Oliveira, R., Chen, C., & Achcar, J. A. (2018). Bayesian estimation for change-points in counting time series: a unified approach using linear regression model and non-homogeneous Poisson processes. In: Workshop de Bioestatística, 05-07 December 2018, Maringá-PR, Brazil.
Achcar, J. A. (2001). Ch. 29. Bayesian analysis for software reliability data. Handbook of Statistics, 20, 733–748.
Achcar, J. A., Dey, D. K., & Niverthi, M. (1998). A Bayesian approach using nonhomogeneous Poisson process for software reliability models. In S. K. Basu & S. Mukhopadhyay (Eds.), Frontiers in Reliability. Series on Quality, Reliability and Engineering Statistics 4. Calcutta University, India.
Cid, J. E. R., & Achcar, J. A. (1999). Bayesian inference for nonhomogeneous Poisson processes in software reliability models assuming nonmonotonic intensity functions. Computational Statistics & Data Analysis, 32(2), 147–159.
Wilson, S. P., & Costello, M. J. (2005). Predicting future discoveries of European marine species by using a non-homogeneous renewal process. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(5), 897–918.
Guarnaccia, C., Quartieri, J., Tepedino, C., & Rodrigues, E. R. (2015). An analysis of airport noise data using a non-homogeneous Poisson model with a change-point. Applied Acoustics, 91, 33–39.
Achcar, J., Martinez, E., Ruffino-Netto, A., Paulino, C., & Soares, P. (2008). A statistical model investigating the prevalence of tuberculosis in New York City using counting processes with two change-points. Epidemiology & Infection, 136(12), 1599–1605.
Achcar, J. A., & Loibel, S. (1998). Constant hazard function models with a change point: A Bayesian analysis using Markov chain Monte Carlo methods. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 40(5), 543–555.
Kuo, L., & Yang, T. Y. (1996). Bayesian computation for nonhomogeneous Poisson processes in software reliability. Journal of the American Statistical Association, 91(434), 763–773.
Pievatolo, A., & Ruggeri, F. (2004). Bayesian reliability analysis of complex repairable systems. Applied Stochastic Models in Business and Industry, 20(3), 253–264.
Achcar, J. A., & Bolfarine, H. (1989). Constant hazard against a change-point alternative: A Bayesian approach with censored data. Communications in Statistics-Theory and Methods, 18(10), 3801–3819.
Carlin, B. P., Gelfand, A. E., & Smith, A. F. (1992). Hierarchical Bayesian analysis of changepoint problems. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2), 389–405.
Dey, D. K., & Purkayastha, S. (1997). Bayesian approach to change point problems. Communications in Statistics-Theory and Methods, 26(8), 2035–2047.
Raftery, A. E., & Akman, V. (1986). Bayesian analysis of a Poisson process with a change-point. Biometrika, 85–89.
Ruggeri, F., & Sivaganesan, S. (2005). On modeling change points in non-homogeneous Poisson processes. Statistical Inference for Stochastic Processes, 8(3), 311–329.
Gyarmati-Szabó, J., Bogachev, L. V., & Chen, H. (2011). Modelling threshold violations of air pollution concentrations using multiple change-points Poisson process. Atmospheric Environment, 45, 5493–5503.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4), 327–335.
Gelfand, A. E., & Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398–409.
Smith, A. F., & Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological), 55(1), 3–23.
Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS version 1.4 user manual. MRC Biostatistics Unit, Cambridge. URL: http://www.mrc-bsu.cam.ac.uk/bugs.
Oliveira, R. P., Achcar, J. A., Mazucheli, J., & Bertoli, W. (2021). A new class of bivariate Lindley distributions based on stress and shock models and some of their reliability properties. Reliability Engineering & System Safety, 211, 107528.
Yang, T. Y. (1994). Computational approaches to Bayesian inference for software reliability. PhD Thesis, Department of Statistics, University of Connecticut, Storrs, USA.
Crow, L. H. (1974). Reliability analysis for complex repairable systems. In: Proschan, F., Serfling, R. J. (Eds.). Reliability and Biometry, 379–410.
Crow, L. H. (1982). Confidence interval procedures for the Weibull process with applications to reliability growth. Technometrics, 24(1), 67–72.
Musa, J. D., & Okumoto, K. (1984). A logarithmic Poisson execution time model for software reliability measurement. In: Proceedings of the 7th International Conference on Software Engineering. Citeseer, 230–238.
Goel, A., & Okumoto, K. (1978). An analysis of recurrent software failures in real-time control system. In: Proceedings of ACM Conference. Washington, 496–500.
Cancho, V., Bolfarine, H., & Achcar, J. (1999). A Bayesian analysis for the exponentiated-Weibull distribution. Journal of Applied Statistical Science, 8(4), 227–242.
Mudholkar, G. S., Srivastava, D. K., & Freimer, M. (1995). The exponentiated Weibull family: A reanalysis of the bus-motor-failure data. Technometrics, 37(4), 436–445.
Weibull, W., et al. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18(3), 293–297.
Kotz, S., Balakrishnan, N., & Johnson, N. L. (2004). Pareto distributions. Continuous multivariate distributions, Volume 1: Models and applications. John Wiley & Sons.
Lomax, K. S. (1954). Business failures: Another example of the analysis of failure data. Journal of the American Statistical Association, 49(268), 847–852.
Cox, D. R., & Lewis, P. A. (1966). Statistical analysis of series of events. U.K.: Methuen.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583–639.
Williams, M., & Konovalov, V. (2008). Central Asia temperature and precipitation data, 1879–2003. Boulder, Colorado: USA National Snow and Ice Data Center. https://doi.org/10.7265/N5NK3BZ8. [Accessed in 10/20/2020].
Carlin, B. P., & Louis, T. A. (2000). Bayes and empirical Bayes methods for data analysis. Chapman & Hall/CRC.
Laub, P. J., Taimre, T., & Pollett, P. K. (2015). Hawkes processes. In arXiv:1507.02822v1.
Acknowledgements
The authors are grateful to an anonymous reviewer for the comments and suggestions which helped to improve the presentation of the results and also for suggesting the work about Hawkes processes. The authors are also grateful to Dr Eliane R. Rodrigues for a review of the manuscript and important comments that led to the great improvement of the article.
Funding
Not applicable.
Contributions
Jorge Alberto Achcar and Ricardo Puziol de Oliveira contributed to revision, writing, analysis and methodology.
Ethics declarations
Conflicts of Interest/Competing Interests
There is no conflict of interest.
Appendices
Appendix 1
Appendix 2
Achcar, J.A., de Oliveira, R.P. Climate Change: Use of Non-Homogeneous Poisson Processes for Climate Data in Presence of a Change-Point. Environ Model Assess 27, 385–398 (2022). https://doi.org/10.1007/s10666-021-09797-z
DOI: https://doi.org/10.1007/s10666-021-09797-z