As grid computing systems gain momentum, deploying support for integrated scheduling and fault-tolerant approaches becomes of paramount importance. Unfortunately, fault tolerance has not been factored into the design of most existing grid scheduling strategies. To this end, we propose a fault-tolerant scheduling policy that loosely couples job scheduling with a job replication scheme so that jobs are executed efficiently and reliably. We present a performance evaluation of the proposed fault-tolerant scheduler against a non-fault-tolerant scheduling policy and show that the proposed policy performs reasonably well in the presence of various types of failures.
Index Terms:
Fault-tolerance, grid computing, fault-detection, grid scheduler, reconfigurable infrastructure
Citation:
J. H. Abawajy, "Fault-Tolerant Scheduling Policy for Grid Computing Systems," IPDPS, vol. 14, p. 238b, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 13, 2004