Abstract
Checkpointing uses stable storage available in the distributed system to save consistent states of processes, to which they can roll back at recovery time. However, checkpointing techniques designed for wired and cellular mobile systems are not trivially applicable to ad hoc networks, because these networks have limited stable storage and low-bandwidth wireless links. Moreover, if synchronous checkpointing is employed, the processes contend for these limited resources at the time of checkpointing. This paper addresses the application of checkpointing to ad hoc networks and proposes a staggered approach to avoid simultaneous contention for resources. Staggering causes events that would normally happen at the same time to start or happen at different times. The proposed protocol does not need FIFO channels and logs a minimum number of messages. It supports concurrent checkpoint initiation and successfully handles overlapping failures in ad hoc networks.
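The effect of staggering on resource contention can be illustrated with a small sketch. This is not the protocol proposed in the paper, whose details the abstract does not give; it only contrasts a synchronous schedule, where every process writes its checkpoint at once, with a simple fixed-slot staggered schedule. The names `Slot`, `synchronous_schedule`, `staggered_schedule`, and `peak_contention`, and the assumption of a uniform write time per process, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    process: str
    start: float  # time at which the process begins writing its checkpoint
    end: float    # time at which the write to stable storage completes


def synchronous_schedule(processes, write_time):
    """All processes start writing their checkpoints at time 0."""
    return [Slot(p, 0.0, write_time) for p in processes]


def staggered_schedule(processes, write_time):
    """Assumed fixed-slot staggering: process i starts when process i-1 finishes."""
    return [
        Slot(p, i * write_time, (i + 1) * write_time)
        for i, p in enumerate(processes)
    ]


def peak_contention(schedule):
    """Maximum number of processes writing to stable storage at the same instant."""
    events = []
    for slot in schedule:
        events.append((slot.start, 1))   # a write begins
        events.append((slot.end, -1))    # a write ends
    # At equal timestamps, handle ends (-1) before starts (+1) so that
    # back-to-back slots are not counted as overlapping.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    procs = ["p1", "p2", "p3", "p4"]
    print("synchronous peak:", peak_contention(synchronous_schedule(procs, 2.0)))
    print("staggered peak:", peak_contention(staggered_schedule(procs, 2.0)))
```

Under these assumptions, the synchronous schedule has all four processes competing for storage and bandwidth at once, while the staggered schedule never has more than one writer active. The trade-off is that the staggered checkpoint as a whole takes longer to complete.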
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jaggi, P.K., Singh, A.K. (2011). Staggered Checkpointing and Recovery in Cluster Based Mobile Ad Hoc Networks. In: Nagamalai, D., Renault, E., Dhanuskodi, M. (eds) Advances in Parallel Distributed Computing. PDCTA 2011. Communications in Computer and Information Science, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24037-9_13
DOI: https://doi.org/10.1007/978-3-642-24037-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24036-2
Online ISBN: 978-3-642-24037-9
eBook Packages: Computer Science (R0)