Application-Driven Coordination-Free Distributed Checkpointing
Columbus, Ohio, USA, June 6-10, 2005
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDCS.2005.14
25th IEEE International Conference on ...
Adnan Agbaria, University of Illinois at Urbana-Champaign
William H. Sanders, University of Illinois at Urbana-Champaign
Distributed checkpointing is an important concept in providing fault tolerance in distributed systems. In today's applications, e.g., grid and massively parallel applications, the imposed overhead of taking a distributed checkpoint using the known approaches can often outweigh its benefits due to coordination and other overhead from the processes. This paper presents an innovative approach for distributed checkpointing. In this approach, the checkpoints are obtained using offline analysis based on the application level. During execution, no coordination is required. After presenting our approach, we prove its safety and present a performance analysis of it using stochastic models.
Citation:
Adnan Agbaria, William H. Sanders, "Application-Driven Coordination-Free Distributed Checkpointing," 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05), pp. 177-186, 2005