Abstract: A network management system must be fault-tolerant in order to provide the required fault management functionality. It is often useful to examine MIB objects of a faulty agent in order to determine why it is faulty. This paper presents a new framework for replicating of SNMP management objects in local area networks. The framework is based on groups of agents that communicate with each other using reliable multicast. A group of agents provides fault-tolerant object functionality. A SNMP service is proposed that allows replicated MIB objects of a faulty agent of a given group to be accessed through fault-free agents of that group. The presented framework allows the dynamic definition of agent groups, and management objects to be replicated in each group. A practical fault-tolerant tool for local area network fault management was implemented and is presented. The system employs SNMP agents that interact with a group communication tool. As an example, we show how the examination of TCP-related objects of faulty agents have been use d in the fault diagnosis process. The impact of replication on network performance is evaluated as well as a probabilistic analysis of replicated object consistency.
Citation:
E. Duarte, Jr., A.L. dos Santos, "Network Fault Management Based on SNMP Agent Groups," icdcsw, pp.0051, 21st International Conference on Distributed Computing Systems Workshops (ICDCSW '01), 2001