Hui Li, Leiden University, The Netherlands
Juan Chen, Leiden University, The Netherlands
Ying Tao, Leiden University, The Netherlands
David Gro, National Institute for Nuclear and High Energy Physics (NIKHEF), The Netherlands
Local learning has been proposed as a common framework for predicting both application run times and queue wait times from workload traces [8]. Queue wait times are more difficult and expensive to predict because the distance calculations typically involve not only job attributes but also resource states. This paper investigates methods and algorithms to improve both the accuracy and the performance of queue wait time prediction. First, "local tuning" is adopted: parameters are tuned separately for each training subset, where the subsets are formed by dividing the training set on a pivot attribute (e.g., group or queue name). A bias-variance analysis of the error is conducted for local tuning and for its global counterpart, which tunes parameters on the whole training set. A method is then developed to select the tuning type adaptively, based on the generalization error and the bias-variance decomposition. Second, an efficient search tree structure called "M-Tree" is integrated into the algorithm to speed up k-nearest neighbor search. The proposed methods and algorithms are evaluated experimentally on real-world workload traces collected from the NIKHEF production cluster on the LHC Computing Grid and from Blue Horizon at the San Diego Supercomputer Center (SDSC). The results show that adaptive tuning reduces the average prediction error by 3 to 10 percent compared to global tuning, and that M-Tree nearest neighbor search is up to 8 times faster than sequential search.
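The choice between local and global tuning described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation, and it simplifies heavily: each job has a single numeric attribute, the only tuned parameter is k, and the tuning type is chosen by comparing mean absolute error on a held-out validation set instead of using a bias-variance decomposition. The field names `x`, `wait` and `queue`, the function names, and the candidate values of k are all hypothetical.

```python
from statistics import mean


def knn_predict(train, query, k):
    """Predict a wait time as the mean over the k nearest training jobs."""
    # Assumption: distance is the absolute difference of one job attribute.
    nearest = sorted(train, key=lambda job: abs(job["x"] - query["x"]))[:k]
    return mean(job["wait"] for job in nearest)


def validation_error(train, valid, k):
    """Mean absolute prediction error on a held-out validation set."""
    return mean(abs(knn_predict(train, q, k) - q["wait"]) for q in valid)


def tune_k(train, valid, candidates=(1, 3, 5)):
    """Pick the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))


def adaptive_tuning(train, valid, pivot="queue", candidates=(1, 3, 5)):
    """For each pivot value, keep whichever tuning type validates better."""
    # Global tuning: one k chosen on the whole training set.
    global_k = tune_k(train, valid, candidates)
    chosen = {}
    for group in sorted({job[pivot] for job in train}):
        sub_train = [j for j in train if j[pivot] == group]
        sub_valid = [j for j in valid if j[pivot] == group]
        if not sub_valid:
            # Nothing to validate against, so fall back to the global setting.
            chosen[group] = ("global", global_k)
            continue
        # Local tuning: k chosen on this subset only.
        local_k = tune_k(sub_train, sub_valid, candidates)
        local_err = validation_error(sub_train, sub_valid, local_k)
        global_err = validation_error(train, sub_valid, global_k)
        if local_err < global_err:
            chosen[group] = ("local", local_k)
        else:
            chosen[group] = ("global", global_k)
    return chosen
```

Calling `adaptive_tuning(train, valid)` with lists of job dictionaries returns, for each queue name, a pair giving the selected tuning type and its k. The sorted scan in `knn_predict` corresponds to the sequential search baseline; an index such as the M-Tree would replace that scan while leaving the tuning logic unchanged.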
Citation:
Hui Li, Juan Chen, Ying Tao, David Gro, Lex Wolters, "Improving a Local Learning Technique for Queue Wait Time Predictions," ccgrid, pp. 335-342, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), 2006