Over the past decade, the use of distributed algorithms for simulation has increased considerably, with the aim of gaining speedup over traditional sequential simulation. There has also been much interest in using networks of inexpensive, powerful workstations with high-speed interconnections instead of expensive parallel computers. In this paper, we briefly present the kernel of a distributed system (PV²M) implemented on top of PVM routines, in which synchronization is based on the concept of Virtual Time. Special emphasis is given to the fault-tolerant mechanisms it provides. PV²M implements a checkpoint-restart mechanism for processes located on non-master hosts, making the system 1-resilient with respect to failures of these hosts.
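The checkpoint-restart idea can be illustrated with a minimal sketch. It is not the PV²M kernel and uses no PVM calls: the names `LogicalProcess`, `run_with_one_failure`, `checkpoint_every` and `fail_after` are hypothetical, the "master" is simply a variable holding the last snapshot, and the single failure is simulated in-process. Under those assumptions, it shows a process that advances a local virtual time, saves its state periodically, and is restarted from the last saved state after one failure.

```python
import copy
import heapq


class LogicalProcess:
    """Toy simulation process that advances a local virtual time (LVT)."""

    def __init__(self):
        self.lvt = 0           # local virtual time
        self.state = 0         # toy simulation state: a running sum
        self.events_done = 0   # number of events processed so far
        self.pending = []      # min-heap of (timestamp, value) events

    def schedule(self, timestamp, value):
        heapq.heappush(self.pending, (timestamp, value))

    def step(self):
        """Process the earliest pending event and advance the LVT."""
        timestamp, value = heapq.heappop(self.pending)
        self.lvt = timestamp
        self.state += value
        self.events_done += 1

    def checkpoint(self):
        """Return an independent snapshot of the whole process state."""
        return copy.deepcopy(self.__dict__)

    @classmethod
    def restart(cls, snapshot):
        """Build a fresh process from a snapshot (assumed kept on the master)."""
        lp = cls()
        lp.__dict__.update(copy.deepcopy(snapshot))
        return lp


def run_with_one_failure(events, checkpoint_every, fail_after=None):
    """Run the simulation, tolerating one simulated failure.

    fail_after: number of processed events after which the (hypothetical)
    non-master host is assumed to crash; None means no failure.
    """
    lp = LogicalProcess()
    for timestamp, value in events:
        lp.schedule(timestamp, value)

    saved = lp.checkpoint()    # initial checkpoint, held by the master
    failed = False
    while lp.pending:
        if not failed and lp.events_done == fail_after:
            # Simulated crash: work done since the last checkpoint is lost
            # and the process is restarted from the saved snapshot.
            failed = True
            lp = LogicalProcess.restart(saved)
            continue
        lp.step()
        if lp.events_done % checkpoint_every == 0:
            saved = lp.checkpoint()
    return lp.lvt, lp.state


events = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
clean = run_with_one_failure(events, checkpoint_every=2)
recovered = run_with_one_failure(events, checkpoint_every=2, fail_after=3)
```

In this sketch the recovered run repeats the event processed after the last checkpoint and ends with the same virtual time and state as the failure-free run. Only one failure is handled, which is the sense of "1-resilient" assumed here; a real system would also need failure detection and handling of messages in transit.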
Index Terms:
distributed algorithms; fault tolerant computing; synchronisation; message passing; distributed dependable simulation system; simulations modelling; high speed interconnection; PVM routines; Virtual Time; fault tolerant mechanisms; checkpoint-restart mechanism
Citation:
V. Gianuzzi, F. Merani, "Using PVM to implement a distributed dependable simulation system," pdp, pp.529, 3rd Euromicro Workshop on Parallel and Distributed Processing, 1995