Abstract
This article is written in the context of running a suite of time-critical operational numerical weather prediction batch jobs, along with a substantial number of research batch jobs on a large IBM Cluster 1600 system. The batch subsystem used is IBM’s LoadLeveler incorporating a little known feature called Resource Reservation.
The article describes how the mixture of operational and research parallel batch jobs are scheduled to run on the 117 nodes provided, and how Resource Reservation for operational jobs is performed without reference to job class. Where research parallel batch jobs are jobs requesting more than 1 CPU and must run consistently to ensure resources are released predictably. Note – information is given explaining how consistent runtimes are achieved.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holt, G. (2005). Time-Critical Scheduling on a Well Utilised HPC System at ECMWF Using Loadleveler with Resource Reservation. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2004. Lecture Notes in Computer Science, vol 3277. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11407522_6
Download citation
DOI: https://doi.org/10.1007/11407522_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25330-3
Online ISBN: 978-3-540-31795-1
eBook Packages: Computer ScienceComputer Science (R0)