Abstract
In this paper we present an initial analysis of job arrivals in a production data-intensive Grid and investigate several traffic models to characterize the interarrival time processes. Our analysis focuses on the heavy-tail behavior and autocorrelation structures, and the modeling is carried out at three different levels: Grid, Virtual Organization (VO), and region. A set of m-state Markov modulated Poisson processes (MMPP) is investigated, while Poisson processes and hyperexponential renewal processes are evaluated for comparison studies. We apply the transportation distance metric from dynamical systems theory to further characterize the differences between the data trace and the simulated time series, and estimate errors by bootstrapping. The experimental results show that MMPPs with a certain number of states are successful to a certain extent in simulating the job traffic at different levels, fitting both the interarrival time distribution and the autocorrelation function. However, MMPPs are not able to match the autocorrelations for certain VOs, in which strong deterministic semi-periodic patterns are observed. These patterns are further characterized using different representations. Future work is needed to model both deterministic and stochastic components in order to better capture the correlation structure in the series.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Asmussen, S., Nerman, O., Olsson, M.: Fitting phase-type distribution via the EM algorithm. Scand. J. Statist. 23, 419–441 (1996)
Barabasi, A.-L.: The origin of bursts and heavy tails in human dynamics. Nature 435, 207–211 (2005)
Basu, S., Foufoula-Georgiou, E.: Detection of nonlinearity and chaoticity in time series using the transportation distance function. Physics Letters A 301, 413–423 (2002)
Beran, J.: Statistics for Long Memory Processes. Chapman and Hall, New York (1994)
Brémaud, P.: Markov Chains. Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York (2001)
Chapin, S.J., et al.: Benchmarks and standards for the evaluation of parallel job schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999, IPPS-WS 1999, and SPDP-WS 1999. LNCS, vol. 1659, pp. 67–90. Springer, Heidelberg (1999)
Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: IEEE 4th Annual Workshop on Workload Characterization, IEEE Computer Society Press, Los Alamitos (2001)
Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge (1997)
Downey, A.B., Feitelson, D.G.: The elusive goal of workload characterization. Performance Evaluation Review 26(4), 14–29 (1999)
Dumitrescu, C., Raicu, I., Foster, I.: DI-GRUBER: A Distributed Approach to Grid Resource Brokering. In: Proceedings of Supercomputing ’05, ACM Press, New York (2005)
Workload Management in EGEE and gLite. http://lxmi.mi.infn.it/egee-jra1-wm/
EMpht program. http://home.imf.au.dk/asmus/
Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002)
Fischer, W., Meier-Hellstern, K.: The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation 18(2), 149–171 (1993)
Li, H.: Tools for Workload Modeling in the Grid. http://www.liacs.nl/home/hli/gwm/
Heffes, H., Lucantoni, D.M.: A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance. IEEE J. on Sel. Areas in Comm. 4(6), 856–868 (1986)
Jagerman, D.L., Melamed, B., Willinger, W.: Stochastic modeling of traffic processes. In: Frontiers in Queueing: Models, Methods and Problems, CRC Press, Boca Raton (1996)
Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis. Cambridge University Press, Cambridge (2003)
Karagiannis, T., Faloutsos, M.: SELFIS: A Tool For Self-Similarity and Long-Range Dependence Analysis. In: 1st Workshop on Fractals and Self-Similarity in Data Mining: Issues and Approaches, Canada (2002)
Künsch, H.R.: The jackknife and bootstrap for general stationary observations. The Annals of Statistics 17, 1217–1241 (1989)
The Worldwide LHC Computing Grid project. http://lcg.web.cern.ch/LCG/
Leland, W., et al.: On the self-similar nature of ethernet traffic (extended version). IEEE/ACM Trans. on Networking 2(1), 1–15 (1994)
Li, H., Groep, D., Wolters, L.: Workload Characteristics of a Multi-cluster Supercomputer. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 176–193. Springer, Heidelberg (2005)
lp_solve 5.5.0.7. http://lpsolve.sourceforge.net/5.5/
Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Para. and Dist. Comput. 63(11), 1105–1122 (2003)
Löbel, A.: Solving large-scale real-world minimum-cost flow problems by a network simplex method. Technical Report SC 96-7, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB) (February 1996), Software available at http://www.zib.de/Optimization/Software/Mcf/
Medernach, E.: Workload analysis of a cluster in a Grid environment. In: Feitelson, D.G., et al. (eds.) JSSPP 2005. LNCS, vol. 3834, Springer, Heidelberg (2005)
Moeckel, R., Murray, B.: Measuring the distance between timeseries. Physica D 102, 187–194 (1997)
Muskulus, M., et al.: Estimating differences between probability densities and time series (In preparation), Software available at http://www.math.leidenuniv.nl/~muskulus/
Neuts, M.F.: Structured Stochastic Matrices of M/G/1-type and their Applications. Marcel Dekker, New York (1989)
Nabrzyski, J., Schopf, J.M., Weglarz, J.: Grid Resource Management: State of the Art and Future Trends. International Series in Operations Research & Management Science. Springer, Heidelberg (2003)
Politis, D.N.: The Impact of Bootstrap Methods on Time Series Analysis. Statistical Science 18(2), 219–230 (2003)
Parallel Workload Archive. http://www.cs.huji.ac.il/labs/parallel/workload/
Riska, A.: Aggregate Matrix-analytic Techniques and their Applications. PhD thesis, Department of Computer Science, College of William and Mary (2002)
Roberts, W.J.J., Ephraim, Y., Dieguez, E.: On Ryden’s EM algorithm for estimating MMPP’s. IEEE Sig. Proc. Let. (to appear)
The LCG Real Time Monitor. http://gridportal.hep.ph.ic.ac.uk/rtm/
Ryden, T.: Parameter estimation for Markov modulated Poisson processes. Communications in Statistics - Stochastic Models 10(4), 795–829 (1994)
Ryden, T.: An EM algorithm for estimation in Markov-modulated Poisson processes. Comp. Stat. and Data Analysis 21, 431–447 (1996)
Schrijver, A.: Theory of Linear and Integer Programming. Wiley, Chichester (1998)
Scott, S.L.: Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. J. Am. Stat. Assoc. 97(457), 337–351 (2002)
Song, B., Ernemann, C., Yahyapour, R.: Parallel Computer Workload Modeling with Markov Chains. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 47–62. Springer, Heidelberg (2005)
Squillante, M.S., Yao, D.D., Zhang, L.: The impact of job arrival patterns on parallel scheduling. ACM SIGMETRICS Performance Evaluation Review 26(4), 52–59 (1999)
Takeuchi, J.I., Yamanishi, K.: A Unified Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering 18(4), 482–492 (2006)
Willinger, W., Taqqu, M.S., Erramilli, A.: A Bibliographical Guide to Self-Similar Traffic and Performance Modeling for Modern High-Speed Networks. In: Stochastic Networks: Theory and Applications, pp. 339–366. Oxford University Press, Oxford (1996)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Li, H., Muskulus, M., Wolters, L. (2007). Modeling Job Arrivals in a Data-Intensive Grid. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2006. Lecture Notes in Computer Science, vol 4376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71035-6_11
Download citation
DOI: https://doi.org/10.1007/978-3-540-71035-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71034-9
Online ISBN: 978-3-540-71035-6
eBook Packages: Computer ScienceComputer Science (R0)