In this paper, we present the SNMP-FD service, a novel failure detection service entirely based on the Simple Network Management Protocol (SNMP). This approach promises better interoperability with external tools and failure information sources, including network equipment and cluster management tools. We first show how the SNMP standard can be used to build a failure detection service. We describe the already standardized interfaces that can be reused and introduce the interfaces that need to be added. SNMP is used extensively in the service: for messaging, process status description, configuration, service statistics, and delivery of failure detection information to applications. We then present our implementation and an evaluation of its performance and quality of service.
Citation:
Matthias Wiesmann, Peter Urban, Xavier Defago, "An SNMP based failure detection service," 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06), pp. 365-376, 2006.