Large-scale distributed systems are now being developed. Because they are costly to construct, their lifetimes tend to be prolonged. It is therefore important to make distributed systems flexible by building into the system software a mechanism for updating application software. Conventional updating methods cannot keep the system highly available, because multiple processes have to be suspended simultaneously. This paper discusses a new method that allows each process to invoke the updating procedure asynchronously. The key idea is that multiple versions of a process can temporarily coexist. Each pair of an old-version process and its new-version counterpart is managed as a process group. The group communication protocol proposed in this paper supports message transmission among process groups. Moreover, the protocol uses checkpointing, timeouts, and rollback recovery to detect and resolve protocol errors caused by the mixture of process versions.
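The process-group idea can be illustrated with a minimal Python sketch. It is not the protocol from the paper: every name here (Process, ProcessGroup, send) is hypothetical, routing by the sender's version number is an assumed policy, and a failed delivery merely stands in for the timeout that would signal a protocol error in a real system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Process:
    """One version of an application process (hypothetical model)."""
    version: int
    state: List[str] = field(default_factory=list)
    checkpoint: List[str] = field(default_factory=list)

    def take_checkpoint(self) -> None:
        # Save a copy of the current state so it can be restored later.
        self.checkpoint = list(self.state)

    def rollback(self) -> None:
        # Restore the state saved by the last checkpoint.
        self.state = list(self.checkpoint)

    def handle(self, payload: str) -> None:
        self.state.append(payload)


@dataclass
class ProcessGroup:
    """Holds the old-version process and, during an update, the new one."""
    name: str
    members: Dict[int, Process] = field(default_factory=dict)

    def add(self, proc: Process) -> None:
        self.members[proc.version] = proc

    def deliver(self, sender_version: int, payload: str) -> bool:
        # Assumed policy: route to the member whose version matches the sender.
        target = self.members.get(sender_version)
        if target is None:
            return False  # stands in for a timeout / protocol error
        target.handle(payload)
        return True


def send(sender: Process, group: ProcessGroup, payload: str) -> bool:
    """Checkpoint, send, and roll the sender back if delivery fails."""
    sender.take_checkpoint()
    sender.state.append("sent:" + payload)
    ok = group.deliver(sender.version, payload)
    if not ok:
        sender.rollback()
    return ok


if __name__ == "__main__":
    server = ProcessGroup("server")
    server.add(Process(version=1))        # only the old version exists so far
    new_client = Process(version=2)       # this client has already updated
    print(send(new_client, server, "req"))  # no matching member yet
    server.add(Process(version=2))        # server updates asynchronously
    print(send(new_client, server, "req"))  # both versions now coexist
```

In this sketch the client and the server update at different times without either being suspended. The first send fails because the server group has no version-2 member yet, so the sender rolls back to its checkpoint; the same send succeeds once the new-version server process has joined the group.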
Index Terms:
distributed processing; parallel programming; groupware; flexible distributed systems; distributed systems; updating application software; multiple processes; protocol; message transmission; process group; protocol errors; checkpointing; timeout; rollback recovery
Citation:
H. Kigaki, M. Takizawa, "Group communication approach for flexible distributed systems," 1996 International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN '96), pp. 154, 1996