Fault-tolerance in a Distributed Management System: a Case Study
Portland, Oregon May 03-May 10
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICSE.2003.1201225
25th International Conference on Soft ...
Robert Smeikal, Vienna University of Technology
Karl M. Goeschka, Frequentis Nachrichtentechnik GmbH
Our case study presents the most important conceptual lessons learned from the implementation of a Distributed Telecommunication Management System (DTMS), which controls a networked voice communication system. The major requirements for the DTMS are fault tolerance against site or network failures, transactional safety, and reliable persistence. To provide distribution and persistence in a way that is both transparent and fault-tolerant, we introduce a two-layer architecture that facilitates an asynchronous replication algorithm. Among the lessons learned are: component-based software engineering imposes a significant initial overhead but is worth it in the long term; a fault-tolerant naming service is a key requirement for fail-safe distribution; the reasonable granularity for persistence and concurrency control is one whole object; asynchronous replication on the database layer is superior to synchronous replication on the instance level in terms of robustness and consistency; semi-structured persistence with XML has drawbacks regarding consistency, performance, and convenience; in contrast to an arbitrarily meshed object model, an accentuated hierarchical structure is more robust and feasible; a query engine has to provide a means of navigating through the object model; finally, the propagation of deletion operations becomes more complex in an object-oriented model. By incorporating these lessons learned, we are well underway to providing a highly available, distributed platform for persistent object systems.
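As a reading aid, two of the lessons above (whole-object granularity, and asynchronous replication on the database layer) can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not describe. The names `Replica`, `Primary`, `commit`, and `flush`, the per-object version counter, and the in-memory queue are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical database-layer store: object id -> (version, state)."""
    store: dict = field(default_factory=dict)

    def apply(self, obj_id, version, state):
        # Whole-object granularity (assumed): replace the complete state
        # of one object, or change nothing at all.
        current_version = self.store.get(obj_id, (0, None))[0]
        if version > current_version:
            self.store[obj_id] = (version, dict(state))
            return True
        return False  # stale or duplicate update is ignored


@dataclass
class Primary:
    """Hypothetical site that commits locally and replicates later."""
    local: Replica = field(default_factory=Replica)
    outbox: deque = field(default_factory=deque)

    def commit(self, obj_id, state):
        # The local commit never waits for a remote site.
        version = self.local.store.get(obj_id, (0, None))[0] + 1
        self.local.apply(obj_id, version, state)
        self.outbox.append((obj_id, version, dict(state)))
        return version

    def flush(self, replica, reachable=True):
        # Asynchronous step, run independently of commit(). While the
        # remote site is unreachable, updates simply stay queued.
        sent = 0
        while reachable and self.outbox:
            replica.apply(*self.outbox.popleft())
            sent += 1
        return sent
```

In this sketch a site or network failure only delays propagation: commits keep succeeding locally, and the queued whole-object updates are applied in order once the remote replica is reachable again.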
Citation:
Robert Smeikal, Karl M. Goeschka, "Fault-tolerance in a Distributed Management System: a Case Study," icse, pp.478, 25th International Conference on Software Engineering (ICSE'03), 2003