Abstract: This paper presents CPVM, a library that supports the implementation of non-blocking, global checkpoint-restart algorithms for applications written using PVM, thereby achieving fault tolerance. A salient feature of CPVM is that, on the basis of a simple set of new PVM primitives alone, it provides several advanced facilities useful for solving different problems. CPVM can also be used as a platform to implement algorithms that detect stable properties such as deadlock and termination, and to support job swapping and migration in an environment that previously offered neither.
Index Terms:
parallel programming; software libraries; software fault tolerance; software tools; concurrency control; software portability; CPVM; PVM; consistent checkpointing; software library; nonblocking; global checkpoint-restart algorithms; software fault-tolerance; deadlocks; termination; job-swapping; migration; Parallel Virtual Machine
Citation:
A. Clematis, V. Gianuzzi, "CPVM -- Extending PVM for Consistent Checkpointing," 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), p. 67, 1996