Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection
San Jose, California March 11-March 14
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CGO.2007.7
International Symposium on Code Gener ...
Cheng Wang, Intel Corporation
Ho-seop Kim, Intel Corporation
Youfeng Wu, Intel Corporation
Victor Ying, Intel Corporation
As transistors become smaller and faster with tighter noise margins, modern processors are becoming increasingly susceptible to transient hardware faults. Existing Hardware-based Redundant Multi-Threading (HRMT) approaches rely mostly on special-purpose hardware to replicate the program into redundant execution threads and compare their computation results. In this paper, we present a Software-based Redundant Multi-Threading (SRMT) approach for transient fault detection. Our SRMT technique uses the compiler to automatically generate redundant threads so that they can run on general-purpose chip multi-processors (CMPs). We exploit high-level program information available at compile time to optimize data communication between redundant threads. Furthermore, our software-based technique provides a flexible program execution environment in which legacy binary code and reliability-enhanced code can co-exist in a mix-and-match fashion, depending on the desired level of reliability and software compatibility. Our experimental results show that compiler analysis and optimization techniques can reduce the data communication requirement by up to 88% compared with HRMT. With general-purpose intra-chip communication mechanisms in a CMP machine, SRMT overhead can be as low as 19%. Moreover, the SRMT technique achieves error coverage rates of 99.98% and 99.6% for the SPEC CPU2000 integer and floating-point benchmarks, respectively. These results demonstrate that SRMT is competitive with HRMT approaches.
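The general idea of redundant multi-threading described in the abstract can be illustrated with a minimal sketch. This is not the authors' compiler-generated code: the function names (`compute`, `leading_thread`, `trailing_thread`, `run_redundant`), the toy computation, the use of a Python `queue.Queue` as the communication channel, and the `inject_fault_at` parameter are all assumptions made for illustration. A leading thread runs the computation and sends each output value to a trailing thread, which repeats the same computation and flags any mismatch as a detected fault.

```python
import queue
import threading


def compute(values):
    """Toy stand-in for the protected program: a running sum of squares."""
    total = 0
    for v in values:
        total += v * v
        yield total


def leading_thread(values, channel, results, inject_fault_at=None):
    """Run the computation and send every output value to the trailing thread."""
    for i, out in enumerate(compute(values)):
        if i == inject_fault_at:
            out ^= 1 << 3  # hypothetical transient fault: flip one bit of the output
        channel.put(out)
        results.append(out)


def trailing_thread(values, channel, errors):
    """Recompute the same values and compare them with what the leader sent."""
    for i, expected in enumerate(compute(values)):
        received = channel.get()
        if received != expected:
            errors.append(i)  # mismatch: a fault was detected at output i


def run_redundant(values, inject_fault_at=None):
    """Run both redundant threads; return the leader's outputs and detected faults."""
    channel = queue.Queue(maxsize=8)  # bounded buffer between the two threads
    results, errors = [], []
    lead = threading.Thread(
        target=leading_thread, args=(values, channel, results, inject_fault_at)
    )
    trail = threading.Thread(target=trailing_thread, args=(values, channel, errors))
    lead.start()
    trail.start()
    lead.join()
    trail.join()
    return results, errors


if __name__ == "__main__":
    print(run_redundant([1, 2, 3, 4]))
    print(run_redundant([1, 2, 3, 4], inject_fault_at=2))
```

In this sketch every output value crosses the channel. The abstract's point about compiler optimization is that compile-time information can reduce how much data the redundant threads need to exchange; the sketch does not attempt that.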
Citation:
Cheng Wang, Ho-seop Kim, Youfeng Wu, Victor Ying, "Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection," International Symposium on Code Generation and Optimization (CGO'07), pp. 244-258, 2007