Ensembles of Models for Automated Diagnosis of System Performance Problems
Yokohama, Japan June 28-July 01
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DSN.2005.442005 International Conference on Depe ...
Steve Zhang, Stanford University
Ira Cohen, Hewlett Packard Research Labs
Moises Goldszmidt, Hewlett Packard Research Labs
Julie Symons, Hewlett Packard Research Labs
Armando Fox, Stanford University
Violations of service level objectives (SLO) in Internet services are urgent conditions requiring immediate attention. Previously we explored [1] an approach for identifying which low-level system properties were correlated to high-level SLO violations (the metric attribution problem). The approach is based on automatically inducing models from data using pattern recognition and probability modeling techniques. In this paper we extend our approach to adapt to changing workloads and external disturbances by maintaining an ensemble of probabilistic models, adding new models when existing ones do not accurately capture current system behavior. Using realistic workloads on an implemented prototype system, we show that the ensemble of models captures the performance behavior of the system accurately under changing workloads and conditions. We fuse information from the models in the ensemble to identify likely causes of the performance problem, with results comparable to those produced by an oracle that continuously changes the model based on advance knowledge of the workload. The cost of inducing new models and managing the ensembles is negligible, making our approach both immediately practical and theoretically appealing.
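The ensemble-maintenance idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a single-metric threshold rule (`StumpModel`) stands in for the probabilistic models, and the names `induce`, `Ensemble`, and the `min_accuracy` value of 0.9 are assumptions made for illustration only. A new model is induced only when no existing model classifies the current window of data accurately enough.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# One observation: (low-level metric values, whether the SLO was violated).
Sample = Tuple[Sequence[float], bool]


@dataclass
class StumpModel:
    """Toy stand-in for a probabilistic model: one metric, one threshold."""
    metric: int
    threshold: float

    def predict(self, metrics: Sequence[float]) -> bool:
        return metrics[self.metric] > self.threshold

    def accuracy(self, window: List[Sample]) -> float:
        hits = sum(self.predict(m) == violated for m, violated in window)
        return hits / len(window)


def induce(window: List[Sample]) -> StumpModel:
    """Pick the metric/threshold pair that best separates violations."""
    n_metrics = len(window[0][0])
    candidates = [
        StumpModel(i, m[i]) for m, _ in window for i in range(n_metrics)
    ]
    return max(candidates, key=lambda model: model.accuracy(window))


class Ensemble:
    def __init__(self, min_accuracy: float = 0.9):
        self.models: List[StumpModel] = []
        self.min_accuracy = min_accuracy  # hypothetical accuracy threshold

    def update(self, window: List[Sample]) -> StumpModel:
        """Return the best model for this window, inducing one if needed."""
        best: Optional[StumpModel] = max(
            self.models, key=lambda m: m.accuracy(window), default=None
        )
        if best is None or best.accuracy(window) < self.min_accuracy:
            best = induce(window)
            self.models.append(best)
        return best


# Two synthetic workload phases: violations track metric 0, then metric 1.
phase_a = [((x, 0.5), x > 5) for x in range(10)]
phase_b = [((0.5, y), y > 3) for y in range(10)]

ensemble = Ensemble()
model_a = ensemble.update(phase_a)        # no models yet, so one is induced
model_b = ensemble.update(phase_b)        # model_a fits poorly, so a second is added
model_a_again = ensemble.update(phase_a)  # the earlier model is reused
```

In this toy setting, the metric chosen by the selected model plays the role of the attributed cause. When the first workload phase returns, the ensemble reuses the model it already holds instead of inducing a new one.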
Index Terms:
Automated diagnosis, self-healing and self-monitoring systems, statistical induction and Bayesian model management
Citation:
Steve Zhang, Ira Cohen, Moises Goldszmidt, Julie Symons, Armando Fox, "Ensembles of Models for Automated Diagnosis of System Performance Problems," dsn, pp.644-653, 2005 International Conference on Dependable Systems and Networks (DSN'05), 2005