Abstract
The growing number of scientific computation-intensive applications calls for an efficient utilization of large-scale, potentially interoperable distributed infrastructures. Parameter sweep applications represent a large body of workflows. While the principle of workflows is easy to conceive, their execution is very complex and no universally accepted solution exists. In this paper we focus on the resource allocation challenges of parameter study jobs in distributed computing infrastructures. To cope with this NP-hard problem and the high uncertainty present in these systems, we propose a series of job allocation models that helps refining and simplifying the problem complexity. In this way we present some special cases that are polynomial and show how more complex scenarios can be reduced to these models. It is known from practice that a small number of job sizes improves the result of job allocation, therefore we state a hypothesis relying on this fact in one of our models. Unfortunately, the reduction of the general problem (using K-means clustering) did not help, and thus the hypothesis has proved to be false. In the future, we shall look for clustering techniques which fit this goal better.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Gil, Y., Deelman, E., Ellisman, M., Fahringer, T., Fox, G., Gannon, D., Goble, C., Livny, M., Moreau, L., Myers, J.: Examining the challenges of scientific workflows. IEEE Comput. 40(12), 26–34 (2007)
Bacsó, G., Visegrádi, A., Kertesz, A., Németh, Z.: On efficiency of multi-job grid allocation based on statistical trace data. J. Grid Comput. (2013)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York (1979)
Lee, C.B., Schwartzman, Y., Hardy, J., Snavely, A.: Are User Runtime Estimates Inherently Inaccurate? vol. 3277, pp 253–263. Springer LNCS (2005)
Ramírez-Alcaraz, J.M., Tchernykh, A., Yahyapour, R., Schwiegelshohn, U., Quezada-Pina, A., Gonzalez-García, J.L., Hirales-Carbajal, A.: Job allocation strategies with user run time estimates for online scheduling in hierarchical grids. J. Grid Comput. 9(1), 95–116 (2011)
Schwiegelshohn, U., Tchernykh, A., Yahyapour, R.: Online scheduling in grids. In: 22nd IEEE International Symposium on Parallel and Distributed Processing (IPDPS), vol. 2008, pp 1–10 (2008)
Oprescu, A., Kielmann, T.: Bag-of-Tasks Scheduling under Budget Constraints. CloudCom, pp. 351–359 (2010)
Sgall, J.: On-line Scheduling - A Survey. Dagstuhl Seminar on On-Line Algorithms (Schloss Dagstuhl, Wadern, Germany, June 24-28, 1996), LNCS. Springer (1998)
Schuurman, P., Woeginger, G.: Approximation schemes – a tutorial. In: Möhring, R.H., Potts, C.N., Schulz, A.S., Woeginger, G.J., Wolsey, L.A. (eds.) Preliminary version of a chapter of the book ”Lectures on Scheduling” (to appear)
Arndt, O., Freisleben, B., Kielmann, T., Thilo, F.: A comparative study of on-line scheduling algorithms for networks of workstation. Clust. Comput. 3(2), 95–112 (2000)
Braun, T.D., Siegel, H.J., Beck, N., Bölöni, L.L., Maheswaran, M., Reuther, A.I., Robertson, J.P., et al.: A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. J. Parallel Distrib. Comput. 61(6), 810–837 (2001)
Ullman, J.D.: NP-complete scheduling problems. J. Comput. Syst. Sci. 10(3), 384–393 (1975)
Xu, C., Lau, F.C.M.: Load Balancing in Parallel Computers: Theory and Practice. Springer Science & Business Media (1997)
Graham, R.L.: Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math. 17(2), 416–429 (1969)
Smith, W.E.: Various optimizers for single-stage production. Nav. Res. Logist. Q. 3, 5917 (1956)
Schrage, L.: A proof of the shortest remaining processing time processing discipline. Oper. Res. 16, 687170 (1968)
Hirales-Carbajal, A., Tchernykh, A., Yahyapour, R., Gonzalez-Garcia, J.L., Roblitz, T., Ramirez-Alcaraz, J.M.: Multiple workflow scheduling strategies with user run time estimates on a grid. J. Grid Comput. (2012)
Rényi, A.: Probability Theory. Elsevier (1970)
Silberstein, M., Sharov, A., Geiger, D., Schuster, A.: GridBot, execution of bags of tasks in multiple grids. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC ’09) (2009)
Saha, D., Menasce, D., Porto, S.: Static and dynamic processor scheduling disciplines in heterogeneous parallel architectures. J. Parallel Distrib. Comput. 28.1, 1–18 (1995)
Maheswaran, M., Ali, S., Siegal, H.J., Hensgen, D., Freund, R.F.: Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In: Proceedings of Heterogeneous Computing Workshop, 1999. (HCW’99), pp 30–44. IEEE (1999)
Casanova, H., et al.: Heuristics for scheduling parameter sweep applications in grid environments. In: Proceedings of 9th Heterogeneous Computing Workshop (HCW 2000). IEEE (2000)
Cirne, W., Paranhos, D., Costa, L., Santos-Neto, E., Brasileiro, F., Sauve, J., Silva, F.A.B., Barros, C.O., Silveira, C.: Running bag-of-tasks applications on computational grids: The mygrid approach, pp 407–416. IEEE (2003)
Da Silva, D.P., Cirne, W., Vilar Brasileiro, F.: Trading cycles for information: Using replication to schedule bag-of-tasks applications on computational grids. Euro-Par 2003 Parallel Processing, pp 169–180. Springer Berlin Heidelberg (2003)
Nielsen, F.: Chernoff information of exponential families. CoRR, arXiv: 1102.2684(2011)
SZTAKI Desktop Grid. http://szdg.lpds.sztaki.hu/szdg/ (2014)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Bacsó, G., Kis, T., Visegrádi, Á. et al. A Set of Successive Job Allocation Models in Distributed Computing Infrastructures. J Grid Computing 14, 347–358 (2016). https://doi.org/10.1007/s10723-015-9347-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10723-015-9347-6