Abstract
Sequences of events are a common type of data in various scientific and business applications, e.g. telecommunication network management, study of web access logs, biostatistics and epidemiology. A natural approach to modelling event sequences is using time-dependent intensity functions, indicating the expected number of events per time unit. In Bayesian modelling, piecewise constant functions can be utilized to model continuous intensities, if the number of segments is a model parameter. The reversible jump Markov chain Monte Carlo (RJMCMC) methods can be exploited in the data analysis. With very large quantities, these approaches may be too slow. We study dynamic programming algorithms for finding the best fitting piecewise constant intensity function, given a number of pieces. We introduce simple heuristics for pruning the number of the potential change points of the functions. Empirical evidence from trials on real and artificial data sets is provided, showing that the developed methods yield high performance and they can be applied to very large data sets. We also compare the RJMCMC and dynamic programming approaches and show that the results correspond closely. The methods are applied to fault-alarm sequences produced by large telecommunication networks.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Arjas E (1989) Survival models and Martingale dynamics. Scand J Stat 16:177–225
Bernardo J, Smith A (1994) Bayesian Theory. Wiley, Chichester
Brooks S, Giudici P (1999) Diagnosing convergence of reversible jump MCMC algorithms. In: Bernardo J, Berger J, Dawid A, Smith A (eds) Bayesian Statistics 6. Oxford University Press, Oxford, pp 733–742
Chib S, Greenberg E (1995) Understanding the Metropolis–Hastings algorithm. Am Stat 49:327–335
Eerola M, Mannila H, Salmenkivi M (1998) Frailty factors and time-dependent hazards in modeling ear infections in children using Bassist. In: Proceedings of XIII symposium on computational statistics (COMPSTAT ’98). Physica-Verlag, Heidelberg, pp 287–292
Gamerman D (1997) Markov chain Monte Carlo. Stochastic simulation for Bayesian inference. Texts in Statistical Science. Chapman and Hall, London
Gelman A, Carlin J, Stein H, Rubin D (1995) Bayesian data analysis. Texts in Statistical Science. Chapman, New York
Green P (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732
Guralnik V, Srivastava J (1999) Event detection from time series data. In: Proceedings of the Fifth ACM SIGKDD international conference on knowledge discovery and data mining (KDD-1999). San Diego, CA, USA, pp 33–42
Guttorp P (1995) Stochastic modeling of scientific data. Stochastic modeling series. Chapman and Hall, London
Hawkins D (1976) Point estimation of parameters of piecewise regression models. Journal Roy Stat Soc Ser C 25:51–57
Hätönen K, Klemettinen M, Mannila H et al (1996) TASA: Telecommunication alarm sequence analyzer, or “how to enjoy faults in your network.” In: Proceedings of the 1996 IEEE network operations and management symposium (NOMS’96). Kyoto, Japan, pp 520–529
Mannila H, Salmenkivi M (2001) Finding simple intensity descriptions from event sequence data. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD-2001). San Francisco, CA, USA, pp 341–346
Tierney L (1994) Markov chains for exploring posterior distributions. An Stat 22:1701–1728.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Salmenkivi, M., Mannila, H. Using Markov chain Monte Carlo and dynamic programming for event sequence data. Knowl Inf Syst 7, 267–288 (2005). https://doi.org/10.1007/s10115-004-0157-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-004-0157-6