Achieving performance, reliability, and scalability presents a unique set of challenges for large distributed storage. To identify problem areas, developers need a comprehensive view of the entire storage system. That is, users must be able to understand both node-specific behavior and complex relationships between nodes. We present a distributed file system profiling method that supports such analysis. Our approach is based on combining node-specific metrics into a single cohesive system image. This affords users two views of the storage system: a micro, per-node view and a macro, multinode view, allowing both node-specific and complex internodal problems to be debugged. We visualize the storage system by displaying nodes and intuitively animating their metrics and behavior, allowing easy analysis of complex problems.
Citation:
Andrew W. Leung, Eric Lalonde, Jacob Telleen, James Davis, Carlos Maltzahn, "Using Comprehensive Analysis for Performance Debugging in Distributed Storage Systems," 24th IEEE Conference on Mass Storage Systems and Technologies (MSST 2007), pp. 281-286, 2007.