Cloning-Based Checkpoint for Localized Recovery
Las Vegas, Nevada, USA December 07-December 09
8th International Symposium on Parall ...
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ISPAN.2005.26
Zunce Wei, Concordia University
Hon F. Li, Concordia University
Dhrubajyoti Goswami, Concordia University
This paper studies the use of process clones to localize recovery in large-scale distributed systems. A clone is a virtual recovery process with a limited lifetime, and is useful for decoupling recovery dependencies among checkpoints. A generic Checkpoint Dependency Graph (CDG) model is used to capture the dependency relations among checkpoints. A Non-atomic Group Checkpoint (NGC) protocol is presented. It is proved that the protocol can result in localized recovery involving a single group when clones are employed. To limit recovery spread, the size of a group should be limited. This paper presents two results in this respect: (i) there is no embedded protocol for atomic group formation with a bounded group size (k-bounded protocol); (ii) a k-bounded atomic group checkpoint protocol requires at least m-1 explicit messages for checkpoint synchronization in a system consisting of m processes. Lastly, a simple k-bounded atomic group checkpoint protocol is presented and proved.
Citation:
Zunce Wei, Hon F. Li, Dhrubajyoti Goswami, "Cloning-Based Checkpoint for Localized Recovery," 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05), pp. 174-181, 2005