Abstract
We present a study of fault management in a grid middleware, our home-grown software P2P-MPI. This framework is MPJ compliant, lets users execute message-passing parallel programs, and aims to support environments built from commodity hardware. Program executions are therefore failure-prone, and particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. Fault tolerance deals with the program execution: P2P-MPI provides a transparent fault-tolerance facility based on replication of computations. Fault detection concerns the monitoring of the program execution by the system, which is done through a distributed set of modules called failure detectors. The contribution of this paper is twofold. The first contribution is an evaluation of the failure probability of an application as a function of the replication degree. Because the failure probability depends on the execution length, we propose a model to evaluate the duration of a replicated parallel program. We then give an expression for the replication degree required to keep the failure probability of an execution under a given threshold. The second contribution is a study of the advantages and drawbacks of several fault detection systems found in the literature. Our evaluation criteria are the reliability of the failure detection service and the failure detection speed. We retain the binary round-robin protocol for its failure detection speed, and we propose a variant of this protocol that is, in all cases, more reliable than the application execution itself. Experiments involving up to 256 processes, carried out on Grid’5000, show that the measured detection times closely match the predictions.
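To make the link between replication degree and failure probability concrete, the following is a minimal sketch and not the model derived in the paper. It assumes that hosts fail independently with an exponentially distributed lifetime of constant rate, that every logical process has the same number of replicas, that the application fails as soon as one logical process has lost all of its replicas, and that the execution duration is fixed (the paper, in contrast, models the duration as depending on the replication degree). The function and parameter names are hypothetical.

```python
import math


def app_failure_probability(n_procs, replication, rate, duration):
    """Failure probability of a replicated run under the simplifying
    assumptions stated above (independent, exponential host failures)."""
    # Probability that one given host fails before the run completes:
    # 1 - exp(-rate * duration), computed with expm1 for accuracy.
    p_host = -math.expm1(-rate * duration)
    # A logical process is lost only if all of its replicas fail.
    p_logical = p_host ** replication
    # The application fails if at least one logical process is lost.
    return 1.0 - (1.0 - p_logical) ** n_procs


def min_replication(n_procs, rate, duration, threshold, max_degree=64):
    """Smallest replication degree whose failure probability does not
    exceed `threshold`, or None if no degree up to max_degree suffices."""
    for degree in range(1, max_degree + 1):
        p_fail = app_failure_probability(n_procs, degree, rate, duration)
        if p_fail <= threshold:
            return degree
    return None


if __name__ == "__main__":
    # Illustrative values only: 256 processes, one failure per 1000 hours
    # per host on average, a one-hour run, and a 0.1% failure budget.
    print(min_replication(n_procs=256, rate=0.001, duration=1.0, threshold=1e-3))
```

Under these assumptions the failure probability falls roughly geometrically with each added replica, which is why a small replication degree can be enough even for a few hundred processes. A faithful model would also account for the extra execution time that replication itself introduces.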
Cite this article
Genaud, S., Jeannot, E. & Rattanapoka, C. Fault-Management in P2P-MPI. Int J Parallel Prog 37, 433–461 (2009). https://doi.org/10.1007/s10766-009-0115-8
DOI: https://doi.org/10.1007/s10766-009-0115-8