Extending a Cluster SSI OS for Transparently Checkpointing Message-Passing Parallel Application
Las Vegas, Nevada, USA December 07-December 09
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ISPAN.2005.46
8th International Symposium on Parall ...
Matthieu Fertre, PARIS project team IRISA/INRIA, France
Christine Morin, PARIS project team IRISA/INRIA, France
Nowadays, clusters are widely used to execute scientific applications. These applications are often message-passing parallel applications with long execution times. As the number of nodes in clusters grows, faults become more frequent, so an application's execution time may exceed the mean time between failures (MTBF) of the cluster. To avoid restarting an application from the beginning, it is desirable that cluster systems provide fault-tolerance mechanisms such as checkpoint/restart. One efficient approach is to implement this mechanism directly in the application or in the communication library. Another approach is to implement it in an operating system dedicated to clusters. This is more complex, but it allows any message-passing application to be checkpointed and restarted, whatever communication library it uses. This paper presents basic mechanisms for system-initiated checkpointing of message-passing parallel applications running on top of a cluster. Performance results obtained from a prototype implemented in the KERRIGHED Single System Image cluster operating system, based on LINUX, are presented.
Index Terms:
single system image, checkpointing, parallel application, global coordination.
Citation:
Matthieu Fertre, Christine Morin, "Extending a Cluster SSI OS for Transparently Checkpointing Message-Passing Parallel Application," ispan, pp. 364-369, 8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05), 2005