Reachability-Based Fault-Tolerant Routing
Minneapolis, Minnesota, July 12-July 15
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPADS.2006.89
12th International Conference on Para ...
J. M. Montañana, Universidad Politécnica de Valencia, Spain
J. Flich, Universidad Politécnica de Valencia, Spain
A. Robles, Universidad Politécnica de Valencia, Spain
J. Duato, Universidad Politécnica de Valencia, Spain
Clusters of PCs are currently used as a cost-effective alternative to large parallel computers. In most of these systems, it is critical to keep the system running even in the presence of faults. As the number of nodes grows, the interconnection network grows accordingly, and with more components the probability of faults increases dramatically. Thus, fault tolerance plays a key role in the system as a whole and in the interconnection network in particular.

An interesting approach to providing fault tolerance consists of migrating the paths affected by a failure to new fault-free paths on the fly.
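Migrating paths on the fly starts with knowing which paths a failure actually touches. The sketch below is illustrative only: it assumes each precomputed path is stored as a list of switch IDs keyed by (source, destination) and that a link is an unordered pair of adjacent switches. The names `path_links` and `affected_paths` are hypothetical and do not come from the paper.

```python
# Assumed representation (hypothetical): each routing path is a list of switch IDs,
# keyed by (source, destination); a link is an unordered pair of adjacent switches.

def path_links(path):
    """Return the set of undirected links traversed by a path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def affected_paths(paths, failed_link):
    """Return the sorted (src, dst) keys whose path crosses the failed link."""
    failed = frozenset(failed_link)
    return sorted(key for key, path in paths.items() if failed in path_links(path))


# Usage: three precomputed paths on a 4-switch ring 0-1-2-3-0.
paths = {
    (0, 2): [0, 1, 2],
    (1, 3): [1, 2, 3],
    (3, 1): [3, 0, 1],
}
print(affected_paths(paths, (1, 2)))  # [(0, 2), (1, 3)]
```

Only the returned source/destination pairs need new paths; all other entries can stay as they are.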

In this paper, we propose a simple and effective fault-tolerant routing methodology, referred to as Reachability-Based Fault-Tolerant Routing (RFTR), that can be applied to any topology. RFTR builds new alternative paths by joining subpaths extracted from the set of already computed paths, which makes it time-efficient. To avoid deadlocks, RFTR performs, if required, a virtual channel transition at the point where the subpaths are joined.
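The subpath-joining idea can be sketched as follows. This is a minimal illustration, not the authors' algorithm. It assumes that paths are lists of switch IDs, that a single link has failed, and that a new path is formed by joining one fault-free subpath that starts at the source with one that ends at the destination. The function names are hypothetical. The sketch also makes a conservative simplification: it always moves packets to the next virtual channel at the join switch, whereas RFTR performs the transition only if required. The paper's actual deadlock check is not reproduced here.

```python
# Hypothetical sketch: build a replacement path by joining two fault-free
# subpaths taken from already computed paths (lists of switch IDs).

def subpaths_from(paths, node, failed):
    """Map each switch reachable from `node` to the shortest fault-free segment node -> switch."""
    reach = {}
    for p in paths:
        if node in p:
            start = p.index(node)
            for end in range(start + 1, len(p)):
                if frozenset((p[end - 1], p[end])) == failed:
                    break  # longer segments would also cross the failed link
                seg = p[start:end + 1]
                if seg[-1] not in reach or len(seg) < len(reach[seg[-1]]):
                    reach[seg[-1]] = seg
    return reach


def subpaths_to(paths, node, failed):
    """Map each switch that reaches `node` to the shortest fault-free segment switch -> node."""
    reach = {}
    for p in paths:
        if node in p:
            end = p.index(node)
            for start in range(end - 1, -1, -1):
                if frozenset((p[start], p[start + 1])) == failed:
                    break
                seg = p[start:end + 1]
                if seg[0] not in reach or len(seg) < len(reach[seg[0]]):
                    reach[seg[0]] = seg
    return reach


def reroute(paths, src, dst, failed_link):
    """Return (new_path, vc_index) or None if no join of two subpaths works.

    vc_index is the position in new_path of the join switch, where this sketch
    (conservatively) moves packets to the next virtual channel; it is None when
    a single fault-free subpath already connects src to dst.
    """
    failed = frozenset(failed_link)
    fwd = subpaths_from(paths, src, failed)
    if dst in fwd:
        return fwd[dst], None
    bwd = subpaths_to(paths, dst, failed)
    best = None
    for mid, head in fwd.items():
        if mid in bwd:
            joined = head + bwd[mid][1:]
            if len(set(joined)) != len(joined):
                continue  # skip joins that revisit a switch
            if best is None or len(joined) < len(best[0]):
                best = (joined, len(head) - 1)
    return best


# Usage: ring 0-1-2-3-0 with precomputed paths; link 1-2 fails.
ring_paths = [[0, 1, 2], [0, 3], [3, 2], [1, 2, 3]]
print(reroute(ring_paths, 0, 2, (1, 2)))  # ([0, 3, 2], 1)
```

Because the search only scans segments of paths that already exist, its cost grows with the size of the routing information rather than with a full route recomputation. This is one way to read the abstract's time-efficiency claim.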

To illustrate its applicability, in this paper we apply RFTR to InfiniBand. Evaluation results on tori show that RFTR has a low computation cost and does not significantly degrade performance.

Citation:
J. M. Montañana, J. Flich, A. Robles, J. Duato, "Reachability-Based Fault-Tolerant Routing," icpads, vol. 1, pp. 515-524, 12th International Conference on Parallel and Distributed Systems - Volume 1 (ICPADS'06), 2006