loading...
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TC.2007.17January 2007 (vol. 56 no. 1) pp. 18-31
 This Article 
 
PDF
HTML
IEEE Xplore Subscribers
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
In order to bridge the gap of the growing speed disparity between processors and their memory subsystems, aggressive prefetch mechanisms, either hardware-based or compiler-assisted, are employed to hide memory latencies. As the first-level cache gets smaller in deep submicron processor design for fast cache accesses, data cache pollution caused by overly aggressive prefetch mechanisms will become a major performance concern. Ineffective prefetches not only offset the benefits of benign prefetches due to pollution but also throttle bus bandwidth, leading to an overall performance degradation. In this paper, we propose and analyze a number of hardware-based prefetch pollution filtering mechanisms to differentiate good and bad prefetches dynamically based on history information. We designed three prefetch pollution filters organized as a one-level, two-level, or gshare style. In addition, we examine two table indexing schemes: Per-Address (PA) based and Program Counter (PC) based. Our prefetch pollution filters work in tandem with both hardware and software prefetchers. As our analysis shows, the cache pollution filters can reduce the ineffective prefetches by more than 90 percent and alleviate the excessive memory bandwidth induced by them. Also, the performance can be improved by up to 16 percent when our filtering mechanism is incorporated with aggressive prefetch filters as a result of reduced cache pollution and less competition for the limited number of cache ports. In addition, a number of sensitivity studies are performed to provide more understandings of the prefetch pollution filter design.

[1] 18 K.K. Chan, C.C. Hay, J.R. Keller, G.P. Kurpanek, F.X. Schumacher, and J. Zheng, “Design of the HP PA 7200 CPU,” Hewlett-Packard J., vol. 47, no. 1, 1996.
[2] S. Przybylski, “The Performance Impact of Block Sizes and Fetch Strategies,” Proc. 17th Int'l Symp. Computer Architecture, 1990.
[3] A.J. Smith, “Cache Memories,” Computing Surveys, vol. 14, no. 3, 1982.
[4] F. Dahlgren, M. Dubois, and P. Stenstrom, “Fixed and Adaptive Sequential Prefetching in Shared-Memory Multiprocessors,” Proc. 1993 Int'l Conf. Parallel Processing, 1993.
[5] J.W.C. Fu, J.H. Patel, and B.L. Janssens, “Stride Directed Prefetching in Scalar Processors,” Proc. 25th Int'l Symp. Microarchitecture, 1992.
[6] T.-F. Chen and J.-L. Baer, “Reducing Memory Latency via Non-Blocking and Prefetching Caches,” Proc. Fifth Int'l Conf. Architectural Support for Programming Languages and Operating Systems, 1992.
[7] T.-F. Chen and J.-L. Baer, “Effective Hardware-Based Data Prefetching for High Performance Processors,” IEEE Trans. Computers, vol. 44, no. 5, May 1995.
[8] M.J. Charney and A.P. Reeves, “Generalized Correlation Based Hardware Prefetching,” Technical Report EE-CEG-95-1, Cornell Univ., 1995.
[9] D. Joseph and D. Grunwald, “Prefetching Using Markov Predictors,” Proc. 24th Ann. Int'l Symp. Computer Architecture, pp.252-263, 1997.
[10] Y. Solihin, J. Lee, and J. Torrellas, “Using a User-Level Memory Thread for Correlation Prefetching,” Proc. 29th Ann. Int'l Symp. Computer Architecture, pp. 171-182, 2002.
[11] K.J. Nesbit, A.S. Dhodapkar, and J.E. Smith, “AC/DC: An Adaptive Data Cache Prefetcher,” Proc. Int'l Conf. Parallel Architectures and Compiler Techniques, pp. 135-145, 2004.
[12] K.J. Nesbit and J.E. Smith, “Data Cache Prefetching Using a Global History Buffer,” Proc. Int'l Symp. High-Performance Computer Architecture, pp. 96-105, 2004.
[13] D.G. Perez, G. Mouchard, and O. Temam, “MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms,” Proc. 37th Int'l Symp. Microarchitecture, pp. 43-54, 2004.
[14] A.K. Porterfield, “Software Methods for Improvement of Cache Performance on Supercomputer Application,” PhD dissertation, Rice Univ., 1989.
[15] IA-32 Intel Architecture Software Developer's Manual Volume 2B, http://developer.intel.com/design/Pentium4/ manuals25366714.pdf, Intel Corporation, 2004.
[16] Alpha Architecture Handbook. Compaq Computer Corp., Oct. 1998.
[17] C.-K. Luk and T.C. Mowry, “Automatic Compiler-Inserted Prefetching for Pointer-Based Applications,” IEEE Trans. Computers, vol. 48, no. 2, Feb. 1999.
[18] V. Srinivasan, E.S. Davidson, and G.S. Tyson, “A Prefetch Taxonomy,” IEEE Trans. Computers, vol. 53, no. 2, pp. 126-140, Feb. 2004.
[19] T.R. Puzak, A. Hartstein, P.G. Emma, and V. Srinivasan, “When Prefetching Improves/Degrades Performance,” Proc. Second Conf. Computing Frontiers (CF '05), pp. 342-352, 2005.
[20] Z. Wang, K.S. McKinley, A.L. Rosenberg, and C.C. Weems, “Using the Compiler to Improve Cache Replacement Decisions,” Proc. Int'l Conf. Parallel Architectures and Compiler Techniques, 2002.
[21] A. Lai, C. Fide, and B. Falsafi, “Dead-Block Prediction and Dead-Block Correlating Prefetchers,” Proc. 28th Int'l Symp. Computer Architecture, 2001.
[22] O. Mutlu, H. Kim, D.N. Armstrong, and Y.N. Patt, “Cache Filtering Techniques to Reduce the Negative Impact of Useless Speculative Memory References on Processor Performance,” Proc. 16th Symp. Computer Architecture and High Performance Computing, 2004.
[23] Z. Wang, D. Burger, K.S. McKinley, S.K. Reinhardt, and C.C. Weems, “Guided Region Prefetching: A Cooperative Hardware/Software Approach,” Proc. 30th Ann. Int'l Symp. Computer Architecture, pp. 388-398, 2003.
[24] W.Y. Chen, S.A. Mahlke, P.P. Chang, and W.-M.W. Hwu, “Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching,” Proc. Int'l Symp. Microarchitecture, 1991.
[25] W.-F. Lin, S.K. Reinhardt, D. Burger, and T.R. Puzak, “Filtering Superfluous Prefetches Using Density Vectors,” Proc. 19th Int'l Conf. Computer Design, pp. 124-132, 2001.
[26] V. Srinivasan, G.S. Tyson, and E.S. Davidson, “A Static Filter for Reducing Prefetch Traffic,” Technical Report CSE-TR-400-99, Univ. of Michigan, 1999.
[27] J. Pomerene, T. Puzak, R. Rechtschaffen, and F. Sparacio, “Prefetching System for a Cache Having a Second Directory for Sequentially Accessed Block,” US patent 4,807,110, Feb. 1989.
[28] T.-Y. Yeh and Y.N. Patt, “Two-Level Adaptive Training Branch Prediction,” Proc. 24th Int'l Symp. Microarchitecture, 1991.
[29] T.-Y. Yeh and Y.N. Patt, “Alternative Implementations of Two-Level Adaptive Training Branch Prediction,” Proc. 19th Int'l Symp. Computer Architecture, 1992.
[30] T.-Y. Yeh and Y.N. Patt, “A Comparison of Dynamic Branch Predictors that Use Two Levels of Branch History,” Proc. 20th Int'l Symp. Computer Architecture, 1993.
[31] S. McFarling, “Combining Branch Predictors,” WRL Technical Note TN-36, Digital Equipment Corp., June 1993.
[32] G. Hinton, D. Sagar, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel, “The Microarchitecture of the Pentium 4 Processor,” Intel Technology J., Q1 Issue, 2001.
[33] M. Carlisle, “Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines,” PhD dissertation, Dept. of Computer Science, Princeton Univ., 1996.

Index Terms:
Prefetch, memory subsystems, microarchitecture.
Citation:
Xiaotong Zhuang, Hsien-Hsin S. Lee, "Reducing Cache Pollution via Dynamic Data Prefetch Filtering," IEEE Transactions on Computers, vol. 56, no. 1, pp. 18-31, Jan. 2007, doi:10.1109/TC.2007.17
Usage of this product signifies your acceptance of the Terms of Use.