Adaptive and Fault Tolerant Simulation of Relativistic Particle Transport with Data-Level Checkpointing
July 16-July 18
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CSE.2008.54
2008 11th IEEE International Conferen ...
Many scientific applications place high demands on memory and computing capability. Recent improvements in commodity processors and networks make it possible to support such applications within an everyday computing infrastructure. To run well there, applications must cope with constantly changing environments, so adaptability and fault tolerance are essential. Based on a simulation of relativistic particle transport, this paper proposes a data-level checkpointing scheme for common scientific applications. The scheme takes advantage of the regular program layout, dominant computing loops, and fine-grained iterations. Instead of handling stack and heap segments directly, it saves and restores only application data as the computation state. The checkpointing interval can be adjusted dynamically to satisfy sensitivity and efficiency requirements for feasible fault tolerance. With this periodic but fixed-location checkpointing scheme, the MPI-based simulation system can be reconfigured by shutting it down and then restarting it on the same or a different computer cluster, with application data redistributed for the new configuration. Experimental results demonstrate the scheme's efficiency and effectiveness.
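As a rough illustration of the idea in the abstract, the following sketch saves only application data at one fixed point in the main loop and redistributes it when the run is restarted with a different number of processes. It is not the paper's implementation: it uses plain Python with no MPI, the "ranks" are simulated as list blocks, the update rule is a stand-in for the transport kernel, the interval is fixed per run rather than adjusted dynamically, and every name (`save_checkpoint`, `load_checkpoint`, `partition`, `simulate`, `stop_after`) is hypothetical.

```python
import os
import pickle


def save_checkpoint(path, state):
    """Write application data atomically so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved application data, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def partition(particles, nranks):
    """Split the global particle list into nranks contiguous, near-equal blocks."""
    base, extra = divmod(len(particles), nranks)
    blocks, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        blocks.append(particles[start:start + size])
        start += size
    return blocks


def simulate(path, nranks, total_steps, interval, stop_after=None):
    """Advance toy particles, checkpointing only at the top of the main loop."""
    state = load_checkpoint(path)
    if state is None:
        # Assumed initial condition: ten particles at positions 0.0 .. 9.0.
        state = {"step": 0, "particles": [float(i) for i in range(10)]}
    # Redistribute the saved data for the current (possibly new) rank count.
    blocks = partition(state["particles"], nranks)
    step = state["step"]
    while step < total_steps:
        # Fixed checkpoint location: application data is consistent here,
        # so no stack or heap contents need to be captured.
        if step % interval == 0:
            flat = [p for b in blocks for p in b]
            save_checkpoint(path, {"step": step, "particles": flat})
        if stop_after is not None and step >= stop_after:
            return None  # simulated shutdown or failure
        # Stand-in for one fine-grained iteration of the transport kernel.
        blocks = [[p + 0.5 for p in b] for b in blocks]
        step += 1
    return [p for b in blocks for p in b]
```

Calling `simulate(path, 4, 20, 5, stop_after=12)` and then `simulate(path, 3, 20, 5)` on the same file resumes from the checkpoint taken at step 10 with three blocks instead of four. Because the saved state is just the step counter and the particle data, the restart does not depend on the process layout that wrote it.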
Index Terms:
Simulation, Fault Tolerance, Reconfiguration, Checkpointing, Relativistic Particle Transport
Citation:
Ruipeng Li, Hai Jiang, Hung-Chi Su, Bin Zhang, Jeff Jenness, "Adaptive and Fault Tolerant Simulation of Relativistic Particle Transport with Data-Level Checkpointing," cse, pp.345-352, 2008 11th IEEE International Conference on Computational Science and Engineering, 2008