A Flexible Heterogeneous Multi-Core Architecture
Brasov, Romania September 15-September 19
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/PACT.2007.5
16th International Conference on Para ...
Miquel Pericas, Universitat Politecnica de Catalunya, Spain; Barcelona Supercomputing Center, Spain
Adrian Cristal, Barcelona Supercomputing Center, Spain
Francisco J. Cazorla, Barcelona Supercomputing Center, Spain
Ruben Gonzalez, Universitat Politecnica de Catalunya, Spain
Daniel A. Jimenez, The University of Texas at San Antonio, USA
Mateo Valero, Universitat Politecnica de Catalunya, Spain; Barcelona Supercomputing Center, Spain
Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel applications as well as high performance for more general workloads. Heterogeneous architectures are a first step in this direction, but partitioning remains static and only roughly fits application requirements.

This paper proposes the Flexible Heterogeneous MultiCore processor (FMC), the first dynamic heterogeneous multi-core architecture capable of reconfiguring itself to fit application requirements without programmer intervention. The basic building block of this microarchitecture is a scalable, variable-size window microarchitecture that exploits the concept of Execution Locality to provide large-window capabilities. This allows it to overcome the memory wall for applications with high memory-level parallelism (MLP). The microarchitecture contains a set of small and fast cache processors that execute high-locality code and a network of small in-order memory engines that together exploit low-locality code. Single-threaded applications can use the entire network of cores while multi-threaded applications can efficiently share the resources. The sizing of critical structures remains small enough to handle current power envelopes.

In single-threaded mode this processor is able to outperform previous state-of-the-art high-performance processor research by 12% on SpecFP. We show how in a quad-threaded/quad-core environment the processor outperforms a statically allocated configuration in both throughput and harmonic mean, two commonly used metrics to evaluate SMT performance, by around 2-4%. This is achieved while using a very simple sharing algorithm.
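
The abstract names throughput and harmonic mean as its multi-threaded metrics but does not define them. As a minimal sketch, assuming the definitions commonly used in SMT studies (throughput as the sum of per-thread IPCs, and harmonic mean taken over each thread's IPC relative to running alone), the two could be computed as follows. The function names and all IPC values are hypothetical illustrations, not figures or definitions from the paper.

```python
from statistics import harmonic_mean


def throughput(ipc_shared):
    """Sum of per-thread IPCs when the threads run together (assumed definition)."""
    return sum(ipc_shared)


def hmean_relative_ipc(ipc_shared, ipc_alone):
    """Harmonic mean of each thread's shared-mode IPC relative to its
    stand-alone IPC (assumed definition; it penalizes starving any thread)."""
    relative = [shared / alone for shared, alone in zip(ipc_shared, ipc_alone)]
    return harmonic_mean(relative)


# Hypothetical IPC values for four threads, NOT taken from the paper.
ipc_alone = [2.0, 1.0, 4.0, 2.0]    # each thread running by itself
ipc_shared = [1.0, 0.5, 1.0, 1.0]   # the same threads sharing the processor

total_ipc = throughput(ipc_shared)
fairness = hmean_relative_ipc(ipc_shared, ipc_alone)
print(total_ipc, fairness)
```

Throughput alone rewards a configuration that favors fast threads, whereas the harmonic mean drops sharply when any one thread is slowed disproportionately, which is presumably why both are reported together.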

Citation:
Miquel Pericas, Adrian Cristal, Francisco J. Cazorla, Ruben Gonzalez, Daniel A. Jimenez, Mateo Valero, "A Flexible Heterogeneous Multi-Core Architecture," 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp. 13-24, 2007