Argus: Low-Cost, Comprehensive Error Detection in Simple Cores
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MM.2008.3
January/February 2008 (vol. 28, no. 1), pp. 52-59
Albert Meixner, Duke University
Michael E. Bauer, Duke University
Daniel J. Sorin, Duke University
Argus, a novel approach for detecting errors in simple processor cores, dynamically verifies the correctness of the four tasks performed by a von Neumann core: control flow, data flow, computation, and memory access. Argus detects transient and permanent errors, with far lower impact on performance and chip area than previous techniques.


Index Terms:
microarchitecture, error detection, dependability, fault tolerance
Citation:
Albert Meixner, Michael E. Bauer, Daniel J. Sorin, "Argus: Low-Cost, Comprehensive Error Detection in Simple Cores," IEEE Micro, vol. 28, no. 1, pp. 52-59, Jan./Feb. 2008, doi:10.1109/MM.2008.3