Abstract
Traditionally, PVM and MPI programs run on message-passing systems, ranging from clusters of non-dedicated workstations to MPP machines. The performance of a parallel program in such an environment is usually determined by its slowest task. In a homogeneous, stable environment, such as an MPP machine, this can only be remedied by improving the workload balance between the individual tasks. In a cluster of workstations, differences in the performance of individual nodes and network components can be an important additional cause of imbalance. Moreover, these differences vary over time, because the load generated by other users plays an important role. Worse yet, nodes may be dynamically removed from the pool of available workstations. In such a dynamically changing environment, redistributing tasks over the available nodes can help to maintain the performance of individual programs and of the pool as a whole. Condor [1] solves this task-migration problem for sequential programs. However, migrating the tasks of a parallel program poses a number of additional challenges, both for the migrator and for the scheduler. For PVM programs, several solutions exist, including Dynamite [2]; Hector [3] was designed to migrate MPI tasks and to checkpoint complete MPI programs. The latter capability is very desirable for long-running programs in an unreliable environment.
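The argument above (the slowest task sets the pace, and moving tasks off loaded or withdrawn nodes restores performance) can be illustrated with a toy model. The following is a minimal Python sketch under loudly stated assumptions: the program advances in synchronous steps, each node's CPU is shared fairly among our tasks and the other users' runnable processes, and tasks are moved by a simple greedy rule. The names `Node`, `task_time`, `step_time` and `rebalance`, and all speeds and loads, are hypothetical; this is not the scheduler used by Dynamite, Hector or Condor.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float          # hypothetical: work units per second on an idle node
    background: int = 0   # hypothetical: runnable processes of other users
    available: bool = True


def task_time(work, node, n_tasks):
    """Time for one task if the node's CPU is shared fairly among our
    n_tasks tasks and the node's background processes (an assumption)."""
    return work * (n_tasks + node.background) / node.speed


def load(placement, node_name):
    """Number of our tasks currently placed on node_name."""
    return sum(1 for n in placement.values() if n == node_name)


def step_time(placement, work, nodes):
    """Duration of one synchronous step: the slowest task determines it."""
    counts = Counter(placement.values())
    return max(task_time(work[t], nodes[n], counts[n]) for t, n in placement.items())


def rebalance(placement, work, nodes):
    """Greedy sketch: evacuate withdrawn nodes, then move one task at a time
    while doing so strictly shortens the step. Assumes >= 1 usable node."""
    placement = dict(placement)
    usable = [n for n, node in nodes.items() if node.available]

    # 1. Tasks on nodes that have left the pool must move regardless of cost;
    #    each goes where it would itself finish soonest.
    for t, n in list(placement.items()):
        if not nodes[n].available:
            placement[t] = min(
                usable,
                key=lambda m: task_time(work[t], nodes[m], load(placement, m) + 1),
            )

    # 2. Improve by single-task migrations; the step time strictly decreases
    #    on every move, so the loop terminates.
    while True:
        current = step_time(placement, work, nodes)
        best = None  # (new step time, task, target node)
        for t in placement:
            for m in usable:
                if m == placement[t]:
                    continue
                cost = step_time({**placement, t: m}, work, nodes)
                if cost < current and (best is None or cost < best[0]):
                    best = (cost, t, m)
        if best is None:
            return placement
        placement[best[1]] = best[2]


# Usage: three equal tasks, one of them on a node busy with other users' jobs.
nodes = {
    "a": Node("a", speed=1.0),
    "b": Node("b", speed=1.0),
    "c": Node("c", speed=1.0, background=2),
}
work = {"t0": 1.0, "t1": 1.0, "t2": 1.0}
placement = {"t0": "a", "t1": "b", "t2": "c"}
print(step_time(placement, work, nodes))  # 3.0: t2 on busy node c dominates

balanced = rebalance(placement, work, nodes)
print(step_time(balanced, work, nodes))   # 2.0 after migrating t2 off node c

nodes["b"].available = False              # node b is withdrawn from the pool
after = rebalance(balanced, work, nodes)
print(after)                              # no task remains on node b
```

The greedy rule was chosen only because it is short and obviously terminates; a real migrating scheduler would also have to weigh the cost of the migration itself (checkpointing and transferring process state, re-routing messages), which this sketch ignores.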
This brings us to the Grid, where both the performance and the availability of resources vary dynamically and where reliability is an important issue. Once again, Livny and co-workers provide a solution for sequential programs with Condor-G [4], including provisions for fault tolerance. In the Polder Metacomputer Project, building on our experience with Dynamite, we are currently investigating the additional challenges of creating a task-migration and checkpointing capability for the Grid environment. These include, among others, the handling of shared resources, such as open files, and differences between administrative domains. Eventually, the migration of parallel programs will allow large parallel applications to surf the Grid and ride the waves in this highly dynamic environment.
References
[1] M. Litzkow, T. Tannenbaum, J. Basney, M. Livny, Checkpoint and migration of Unix processes in the Condor distributed processing system, Technical Report 1346, University of Wisconsin, WI, USA, 1997.
[2] K. A. Iskra, F. van der Linden, Z. W. Hendrikse, G. D. van Albada, B. J. Overeinder, P. M. A. Sloot, The implementation of Dynamite: an environment for migrating PVM tasks, Operating Systems Review, vol. 34, no. 3, pp. 40–55, Association for Computing Machinery, Special Interest Group on Operating Systems, July 2000.
[3] J. Robinson, S. H. Russ, B. Flachs, B. Heckel, A task migration implementation of the Message Passing Interface, Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing, pp. 61–68, 1996.
[4] J. Frey, T. Tannenbaum, I. Foster, M. Livny, S. Tuecke, Condor-G: A Computation Management Agent for Multi-Institutional Grids, Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10), San Francisco, California, August 7–9, 2001.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
van Albada, D., Sloot, P. (2002). Surfing the Grid - Dynamic Task Migration in the Polder Metacomputer Project. In: Kranzlmüller, D., Volkert, J., Kacsuk, P., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2002. Lecture Notes in Computer Science, vol 2474. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45825-5_3
DOI: https://doi.org/10.1007/3-540-45825-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44296-7
Online ISBN: 978-3-540-45825-8
eBook Packages: Springer Book Archive