Abstract
We present a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising a thousand processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of MPICH jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. MPD is implemented and freely distributed with MPICH.
This work was supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Butler, R., Gropp, W., Lusk, E. (2000). A Scalable Process-Management Environment for Parallel Programs. In: Dongarra, J., Kacsuk, P., Podhorszki, N. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2000. Lecture Notes in Computer Science, vol 1908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45255-9_25
DOI: https://doi.org/10.1007/3-540-45255-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41010-2
Online ISBN: 978-3-540-45255-3
eBook Packages: Springer Book Archive