Probabilistic QoS Guarantees for Supercomputing Systems
Yokohama, Japan June 28-July 01
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DSN.2005.802005 International Conference on Depe ...
A. J. Oliner, Massachusetts Institute of Technology
L. Rudolph, Massachusetts Institute of Technology
R. K. Sahoo, IBM T. J. Watson Research Center
J. E. Moreira, IBM T. J. Watson Research Center
M. Gupta, IBM T. J. Watson Research Center
Supercomputing systems must be able to complete their assigned workloads reliably and efficiently, even in the presence of failures. This paper proposes a scheme in which the system and its users negotiate a mutually desirable risk strategy. To accomplish this, the system makes probabilistic guarantees on quality of service (QoS) of the form "Job j can be completed by deadline d with probability p." To make such guarantees, the system uses event prediction (forecasting) in conjunction with fault-aware job scheduling and cooperative checkpointing strategies. Using job logs and failure traces from actual high-performance computing systems, we employ trace-based simulations to assess the effects of the prediction accuracy (a) and the user risk strategy (U) on a variety of performance metrics. Compared to a system that does not use event prediction, high forecasting accuracy resulted in QoS and utilization improvements of as much as 6%, along with an 89% reduction in the amount of lost work. Our results show that a system that makes probabilistic QoS guarantees using a market-based scheduling approach can increase both system performance and reliability.
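The negotiation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the system presents a list of (deadline d, probability p) offers for a job, and that the user risk strategy U acts as a minimum acceptable probability, so the user takes the earliest deadline whose promise meets it. The names `Offer` and `negotiate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """Hypothetical guarantee: 'the job completes by `deadline` with probability `p`'."""
    deadline: float  # candidate deadline d, in hours from submission
    p: float         # promised probability of meeting that deadline


def negotiate(offers: List[Offer], user_risk: float) -> Optional[Offer]:
    """Pick the earliest offer whose promised probability is at least `user_risk`.

    Assumption: `user_risk` (U in the abstract) is read here as a minimum
    acceptable probability of success. Returns None if no offer qualifies.
    """
    acceptable = [o for o in offers if o.p >= user_risk]
    if not acceptable:
        return None
    return min(acceptable, key=lambda o: o.deadline)


# Illustrative offers only; these numbers are made up, not taken from the paper.
offers = [Offer(10.0, 0.60), Offer(12.0, 0.90), Offer(20.0, 0.99)]
choice = negotiate(offers, user_risk=0.85)
```

Under this reading, a risk-tolerant user (low U) gets an earlier deadline with a weaker promise, while a risk-averse user (high U) accepts a later deadline in exchange for a stronger one.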
Citation:
A. J. Oliner, L. Rudolph, R. K. Sahoo, J. E. Moreira, M. Gupta, "Probabilistic QoS Guarantees for Supercomputing Systems," 2005 International Conference on Dependable Systems and Networks (DSN'05), pp. 634-643, 2005.