A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations
Santa Fe, New Mexico April 26-April 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IPDPS.2004.1303242
18th International Parallel and Distr ...
Christine Morin, IRISA/INRIA
Ramamurthy Badrinath, Hewlett-Packard ISO
Code coupling applications can be divided into communicating modules that may be executed on different clusters in a cluster federation. Because a cluster federation comprises a large number of nodes, the probability of a node failure is high. We propose a hierarchical checkpointing protocol that combines a synchronized checkpointing technique inside clusters with a communication-induced technique between clusters. This protocol fits the characteristics of a cluster federation (a large number of nodes, and high-latency, low-bandwidth networking technologies between clusters). A preliminary performance evaluation, performed using a discrete event simulator, shows that the protocol is suitable for code coupling applications.
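The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the `Cluster` class, the per-cluster sequence number, and the rule "force a checkpoint when an incoming message carries a sequence number higher than the last one seen from that sender" are assumptions chosen as one common way to realize communication-induced checkpointing. The synchronized step inside a cluster is reduced to a single method call.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one cluster in a federation (illustrative only)."""
    name: str
    seq: int = 0                               # cluster-wide checkpoints taken so far
    seen: dict = field(default_factory=dict)   # highest seq seen from each peer cluster
    log: list = field(default_factory=list)    # (seq, reason) for each checkpoint

    def checkpoint(self, reason: str) -> None:
        # Synchronized step (assumed): in a real system every node of the
        # cluster would coordinate before saving its state; here it is one call.
        self.seq += 1
        self.log.append((self.seq, reason))

    def send(self) -> dict:
        # Inter-cluster messages piggyback the sender's checkpoint number.
        return {"src": self.name, "seq": self.seq}

    def receive(self, msg: dict) -> bool:
        # Communication-induced step (assumed rule): take a forced checkpoint
        # before delivery if the sender has checkpointed since its last
        # message to this cluster. Returns True when a checkpoint was forced.
        last = self.seen.get(msg["src"], 0)
        forced = msg["seq"] > last
        if forced:
            self.checkpoint("forced by " + msg["src"])
            self.seen[msg["src"]] = msg["seq"]
        return forced


a = Cluster("A")
b = Cluster("B")
a.checkpoint("timer")          # A takes a local synchronized checkpoint
first = b.receive(a.send())    # B sees a new sequence number from A
second = b.receive(a.send())   # same sequence number, no new checkpoint
```

In this sketch only the first message after a sender's checkpoint forces work at the receiver, so clusters exchange no dedicated coordination messages over the slow inter-cluster links; the checkpoint information travels on application messages.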
Index Terms:
Cluster Federation, Checkpointing and Recovery, Fault-tolerance, Parallel Application, Code Coupling
Citation:
Sébastien Monnet, Christine Morin, Ramamurthy Badrinath, "A Hierarchical Checkpointing Protocol for Parallel Applications in Cluster Federations," ipdps, vol. 12, pp.211a, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 11, 2004