We present a Markov model for analyzing the performance of parallel -distributed processors that execute a job consisting of N independent tasks in parallel using P processors. The model is a Markov Chain with states representing service and failure rates with k (0 < k <= P) active processors. The task-times and processor failures are both exponentially distributed. We derive a number of expressions to determine the mean execution time, probability of success, work, and other measurable quantities, all conditioned on the job finishing successfully. A prototype, implemented using an extended version of ACMPI, is used for actual experiments that are based on simulated task-times and processor failures. We present our results comparing the analytic model with the prototype for a range of values of processor failure rates. We also discuss extensions of the model and issues related to communication costs, approximations and effect of task-time distributions.
Index Terms:
Parallel computing, distributed computing, Markov Model, Performance, Processor Failures
Citation:
Gehan Weerasinghe, Imad Antonios, Lester Lipsky, "An Analytic Performance Model of Parallel Systems that Perform N Tasks Using P Processors That Can Fail," nca, pp.0310, IEEE International Symposium on Network Computing and Applications (NCA'01), 2001