loading...
Early Stopping in Global Data Computation
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TPDS.2003.1233713September 2003 (vol. 14 no. 9) pp. 909-921
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   

Abstract—The Global Data Computation problem consists of providing each process with the same vector (with one entry per process) such that each entry is filled by a value provided by the corresponding process. This paper presents a protocol that solves this problem in an asynchronous distributed system where processes can crash, but equipped with a perfect failure detector. This protocol requires that processes execute asynchronous computation rounds. The number of rounds is upper bounded by min(f + 2, t + 1, n), where n, t, and f represent the total number of processes, the maximum number of processes that can crash, and the number of processes that actually crash, respectively. This value is a lower bound for the number of rounds when t < n - 1. To our knowledge, this protocol is the first to achieve this lower bound. Interestingly, this protocol meets the same lower bound as the one required in synchronous systems.

[1] 909 H. Attiya and J. Welch, Distributed Computing: Fundamentals, Simulations and Advanced Topics. McGraw-Hill, 1998.
[2] R.A. Bazzi and G. Neiger, Simplifying Fault-Tolerance: Providing the Abstraction of Crash Failures J. ACM, vol. 48, no. 3, pp. 499-554, 2001.
[3] R. Brand, Iso-Ethernet: Bridging the Gap from WAN to LAN Data Comm., 1995.
[4] T. Chandra and S. Toueg, Unreliable Failure Detectors for Reliable Distributed Systems J. ACM, vol. 43, no. 2, pp. 225-267, 1996.
[5] B. Charron-Bost, R. Guerraoui, and A. Schiper, Synchronous System and Perfect Failure Detector: Solvability and Efficiency Issues Proc. Int'l IEEE Conf. Dependable Systems and Networks, pp. 523-532, 2000.
[6] C. Delporte-Gallet, H. Fauconnier, J.M. Hélary, and M. Raynal, Early Stopping in Global Data Computation Research Report PI-1436, IRISA, Jan. 2002. ftp://ftp.irisa.fr/techreports/2002PI-1436.ps.gz
[7] D. Dolev, R. Reischuk, and R. Strong, Early Stopping in Byzantine Agreement J. ACM, vol. 37, no. 4, pp. 720-741, 1990.
[8] H. Fauconnier, Fault-Tolerant Round-Based Distributed Computing (in French) Mémoire d'Habilitation, UniversitéParis 7-Denis Diderot, 2001.
[9] M.J. Fischer, N. Lynch, and M.S. Paterson, Impossibility of Distributed Consensus with One Faulty Process J. ACM, vol. 32, no. 2, pp. 374-382, 1985.
[10] E. Gafni, Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony Proc. 17th Symp. Principles of Distributed Computing (PODC '98), pp. 143-152, 1998.
[11] S. Gopal and S. Toueg, Inconsistency and Contamination Proc. 10th ACM Symp. Principles of Distributed Computing (PODC '91), pp. 257-272, 1991.
[12] V. Hadzilacos, Byzantine Agreement under Restricted Types of Failures (Not Telling the Truth Is Different from Telling Lies) Technical Report 18-83, Dept. Computer Science, Harvard Univ., 1983.
[13] V. Hadzilacos and S. Toueg, Reliable Broadcast and Related Problems Distributed Systems, S. Mullender ed., pp. 97-145, New York: ACM Press, 1993.
[14] J.M. Hélary, M. Hurfin, A. Mostefaoui, M. Raynal, and F. Tronel, Computing Global Functions in Asynchronous Distributed Systems with Perfect Failure Detectors IEEE Trans. Parallel and Distributed Systems, vol. 11, no. 9, pp. 897-909, Sept. 2000.
[15] L. Lamport, Time, Clocks, and the Ordering of Events in a Distributed System Comm. ACM, vol. 7, no. 21, pp. 558-565, 1978.
[16] N. Lynch, Distributed Algorithms. Morgan Kaufmann, 1996.
[17] L. Pease, R. Shostak, and L. Lamport, Reaching Agreement in Presence of Faults J. ACM, vol. 27, no. 2, pp. 228-234, 1980.
[18] G. Neiger and S. Toueg, Automatically Increasing the Fault-Tolerance of Distributed Algorithms J. Algorithms, vol. 11, no. 3, pp. 374-419, 1990.
[19] M. de Prycker, Asynchronous Transfer Mode: Solution for Broadband ISDN. Prentice Hall, 1995.
[20] M.-C. Rosu, Early-Stopping Terminating Reliable Broadcast Protocol for General-Omission Failures Brief Announcement, Proc. 15th ACM Symp. Principles of Distributed Computing (PODC '96) p. 209, 1996
[21] J. Rufino, P. Veríssimo, G. Arroz, C. Almeida, and L. Rodrigues, Fault-Tolerant Broadcasts in CAN Proc. 28th Symp. Fault-Tolerant Computing, pp. 150-159, June 1998.
[22] S. Toueg, K.J. Perry, and T.K. Srikanth, Fast Distributed Agreement SIAM J. Computing vol. 16, no. 3, pp. 445-457, 1987.
[23] P. Veríssimo, Real-Time Communication Distributed Systems, second ed., chapter 17, pp. 447-490, S. Mullender ed., Addison-Wesley/ACM Press, 1993.

Index Terms:
Asynchronous distributed systems, crash failure, distributed computing, early stopping, fault tolerance, global data, perfect failure detector, reliability.
Citation:
Carole Delporte-Gallet, Hugues Fauconnier, Jean-Michel H?lary, Michel Raynal, "Early Stopping in Global Data Computation," IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 9, pp. 909-921, Sept. 2003, doi:10.1109/TPDS.2003.1233713
Usage of this product signifies your acceptance of the Terms of Use.