Many parallel programs employ regular topological structures to support their computation. This topological information is exploitable in the debugging process. Communications not normally part of a topology, ones that are either missing or unexpected, are immediately recognizable. Furthermore, animations used to assist the debugging may be enhanced by arranging representations of the executing tasks with reference to the program's topology.

However, direct topology support is lacking in many environments, including workstation clusters, where popular language extensions such as the Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI) are common. Programmers are required to implement topology support themselves. Moreover, debugger support that exploits topological information is lacking; without explicit knowledge, determining a program's topology is difficult.

This paper presents a methodology to identify program topologies using only standard trace facilities. This methodology uses the concept of distance between graphs. To demonstrate the feasibility of the approach, several generic algorithms are implemented, and results on five different types of topologies reported.
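
To make the idea of a distance between graphs concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes that a trace has already been reduced to an undirected communication graph over task numbers 0..n-1, that the candidate topology has the same number of tasks, and that distance is measured as one minus the size of the maximal common edge subgraph divided by the larger edge count. The function names (ring, star, max_common_edges, graph_distance) are hypothetical, and the exhaustive search over task relabelings is only practical for a handful of tasks.

```python
from itertools import permutations


def ring(n):
    """Edge set of an n-task ring (hypothetical template topology)."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def star(n):
    """Edge set of an n-task star centred on task 0 (hypothetical template)."""
    return {frozenset((0, i)) for i in range(1, n)}


def max_common_edges(edges_a, edges_b, n):
    """Largest number of edges shared under any relabeling of the n tasks.

    Brute force over all n! relabelings, so this is an illustration only.
    """
    best = 0
    for perm in permutations(range(n)):
        # Relabel every edge of graph A and count how many also occur in B.
        mapped = {frozenset(perm[v] for v in edge) for edge in edges_a}
        best = max(best, len(mapped & edges_b))
    return best


def graph_distance(edges_a, edges_b, n):
    """Assumed normalization: 0.0 for identical structure, 1.0 for no overlap."""
    larger = max(len(edges_a), len(edges_b))
    if larger == 0:
        return 0.0
    return 1.0 - max_common_edges(edges_a, edges_b, n) / larger


# Observed communications: a 5-task ring with one link missing,
# and with task numbers that do not follow the ring order.
observed = {frozenset(e) for e in [(0, 2), (2, 4), (4, 1), (1, 3)]}

candidates = {"ring": ring(5), "star": star(5)}
closest = min(candidates, key=lambda name: graph_distance(observed, candidates[name], 5))
print(closest)
```

Under these assumptions the observed graph lies closer to the ring than to the star, and the ring edge left unmatched corresponds to the kind of missing communication that the abstract describes as immediately recognizable once the topology is known.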
Index Terms:
topology, parallel program debugging, graph distance, maximal common subgraph
Citation:
Simon Huband, Chris McDonald, "Debugging Parallel Programs Using Incomplete Information," iwcc, pp.278, 1st IEEE Computer Society International Workshop on Cluster Computing, 1999