This paper presents the design and analysis of scheduling techniques that cope with the inherent unreliability and instability of worker nodes in large-scale donation-based distributed infrastructures such as P2P and Grid systems. In particular, we focus on nodes that execute tasks using donated computational resources and that may behave erratically or maliciously. We present a model in which reliability is not a binary property but a statistical one, based on a node's prior performance and behavior. We use this model to construct several reputation-based scheduling algorithms that use estimated reliability ratings of worker nodes to allocate tasks efficiently. Through simulation of a BOINC-like distributed computing infrastructure, we demonstrate that our algorithms can significantly improve throughput while maintaining a very high task-completion success rate.
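The abstract does not spell out the algorithms. The following is therefore only a minimal Python sketch of the general idea of treating reliability as a statistical estimate, and it rests on assumptions that are ours rather than the authors'.

First, a worker's reliability is estimated as its Laplace-smoothed fraction of correctly completed tasks. Second, a task is replicated across the highest-rated workers until a majority vote among the replicas is estimated to be correct with probability at least `target`.

The names `Worker`, `majority_success_prob`, `pick_replicas`, `target` and `max_group` are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical worker record: reliability is inferred from history,
    # not declared as a yes/no property (an assumption for illustration).
    name: str
    successes: int = 0  # tasks whose results were accepted as correct
    failures: int = 0   # tasks that were wrong, late, or never returned

    def reliability(self) -> float:
        # Laplace-smoothed success rate; a worker with no history rates 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)

    def record(self, ok: bool) -> None:
        # Update the worker's history after a task outcome is known.
        if ok:
            self.successes += 1
        else:
            self.failures += 1


def majority_success_prob(rels: list[float]) -> float:
    # Probability that a strict majority of independent replicas succeed,
    # computed by dynamic programming over the number of successes.
    dist = [1.0]
    for r in rels:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1 - r)   # this replica fails
            new[k + 1] += p * r     # this replica succeeds
        dist = new
    n = len(rels)
    return sum(dist[k] for k in range(n // 2 + 1, n + 1))


def pick_replicas(workers: list[Worker], target: float = 0.95,
                  max_group: int = 7) -> list[Worker]:
    # Hypothetical greedy policy: add workers in descending order of
    # estimated reliability until the majority-vote success estimate
    # reaches `target`. May return a group below target if capped.
    ranked = sorted(workers, key=lambda w: w.reliability(), reverse=True)
    group: list[Worker] = []
    for w in ranked[:max_group]:
        group.append(w)
        if majority_success_prob([g.reliability() for g in group]) >= target:
            break
    return group


if __name__ == "__main__":
    pool = [Worker("a", 95, 5), Worker("b", 60, 40), Worker("c")]
    print([w.name for w in pick_replicas(pool, target=0.9)])  # prints ['a']
```

In this sketch, a worker with a long record of correct results can handle a task alone. Workers with weaker or unknown histories must be grouped together, so the scheduler trades replication cost against the estimated chance of a correct outcome.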
Citation:
Jason Sonnek, Mukesh Nathan, Abhishek Chandra, Jon Weissman, "Reputation-Based Scheduling on Unreliable Distributed Infrastructures," 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06), p. 30, 2006.