Abstract
The well-known coordinated snapshot algorithm of mutable checkpointing [7,8,9] is studied. We equip it with a concise formal model and analyze its operational behavior via an invariant characterizing the snapshot computation. By this we obtain a clear understanding of the intermediate behavior and a correctness proof of the final snapshot based on a strong notion of consistency (reachability within the partial order representing the underlying computation). The formal model further enables a comparison with the blocking queue algorithm [13] introduced for the same scenario and with the same objective.
From a broader perspective, we advocate the use of formal semantics to formulate and prove correctness of distributed algorithms.
Chapter PDF
Similar content being viewed by others
References
Agbaria, A., Friedman, R.: Model-based performance evaluation of distributed checkpointing protocols. Performance Evaluation 65 (2008)
Aggarwal, D., Kiehn, A.: Analyzing Mutable Checkpointing via Invariants (full version). Technical Report IIIT Delhi, No IIITD-TR-2015-008 (2015)
Andriamiarina, M.B., Mery, D., Singh, N.K.: Revisiting snapshot algorithms by refinement-based techniques. Computer Science and Information Systems 11 (2014)
Babaoglu, Ö., Marzullo, K.: Consistent global states of distributed systems: Fundamental concepts and mechanisms. In: Distributed Systems. ACM Press/Addison-Wesley (1993)
Boutellier, A., Lemarinier, P., Krawezik, G., Cappello, F.: Coordinated checkpoint versus message log for fault tolerant MPI. In: IEEE Inernational Conference on Cluster Computing, Cluster 2013 (2012)
Cansell, D., Méry, D.: The Event-B modelling method - concepts and case studies. In: Bjorner, D., Henson, M. (eds.) Logics of Specification Languages, Springer, Heidelberg (2008)
Cao, G., Singhal, M.: On coordinated checkpointing in distributed systems. IEEE Transactions on Parallel and Distributed Systems 9(12), 1213–1225 (1998)
Cao, G., Singhal, M.: Mutable checkpoints: a new checkpointing approach for mobile computing systems. IEEE Transactions on Parallel and Distributed Systems 12(2), 157–172 (2001)
Cao, G., Singhal, M.: Checkpointing with mutable checkpoints. Theoretical Computer Science 290(2), 1127–1148 (2003)
Chandy, K.M., Lamport, L.: Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS) 3(1), 63–75 (1985)
Elliot, J., Kharbas, K., Fiala, D., Mueller, F., Ferreira, K., Engelmann, C.: Combining partial redundancy and checkpointing for HPC. In: IEEE Distributed Computing Systems, ICDCS 2012 (2012)
Jiang, Q., Luo, Y., Manivannan, D.: An optimistic checkpointing and message logging approach for consistent global checkpoint collection in distributed systems. Journal of Parallel Distributed Computing, 68 (2008)
Kiehn, A., Raj, P., Singh, P.: A causal checkpointing algorithm for mobile computing environments. In: Chatterjee, M., Cao, J.-n., Kothapalli, K., Rajsbaum, S. (eds.) ICDCN 2014. LNCS, vol. 8314, pp. 134–148. Springer, Heidelberg (2014)
Kshemkalyani, A.D., Singhal, M.: Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press (2008)
Lai, T.H., Tao, H.: Yang.: On distributed snapshots. Information Processing Letters 25 (1987)
Ljubuncic, I., Giri, R., Rozenfeld, A., Goldis, A.: Be kind, rewind: checkpoint & restore capability for improving reliability of large-scale semiconductor design. In: IEEE high Performance Extreme Computing Conference, HPEC 2014 (2014)
Raynal, M.: Distributed Algorithms for Message-Passing Systems. Springer (2013)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 IFIP International Federation for Information Processing
About this paper
Cite this paper
Aggarwal, D., Kiehn, A. (2015). Analyzing Mutable Checkpointing via Invariants. In: Dastani, M., Sirjani, M. (eds) Fundamentals of Software Engineering. FSEN 2015. Lecture Notes in Computer Science(), vol 9392. Springer, Cham. https://doi.org/10.1007/978-3-319-24644-4_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-24644-4_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24643-7
Online ISBN: 978-3-319-24644-4
eBook Packages: Computer ScienceComputer Science (R0)