Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications
San Jose, CA April 25-April 27
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ISPASS.2007.363751
2007 IEEE International Symposium on ...
D. Jimenez-Gonzalez, Dept. of Comput. Archit., Univ. Politecnica de Catalunya, Barcelona
X. Martorell, Dept. of Comput. Archit., Univ. Politecnica de Catalunya, Barcelona
A. Ramirez, Dept. of Comput. Archit., Univ. Politecnica de Catalunya, Barcelona
The Cell Broadband Engine (CBE) is designed to be a general-purpose platform offering very high arithmetic performance through its eight SIMD-only synergistic processor elements (SPEs), which together can reach a peak of 134.4 GFLOPS (16.8 GFLOPS * 8) at 2.1 GHz, and a 64-bit Power processor element (PPE). Each SPE has a 256 KB non-coherent local memory and communicates with other SPEs and main memory through its DMA controller. CBE main memory is connected to all the CBE processor elements (PPE and SPEs) through the element interconnect bus (EIB), which runs at half the processor speed and has a peak bandwidth of 134.4 GB/s. The CBE platform is therefore well suited to applications that use MPI and streaming programming models, with a high potential peak performance. In this paper we focus on the communication part of those applications and measure the actual memory bandwidth that each of the CBE processor components can sustain. We have measured the sustained bandwidth between the PPE and memory; between an SPE and memory; between two individual SPEs, to determine whether this bandwidth depends on their physical location; between pairs of SPEs, to achieve maximum bandwidth in nearly ideal conditions; and around a cycle of SPEs, representing a streaming kind of computation. Our results on a real machine show that, when some strict programming rules are followed, individual SPE-to-SPE communication almost achieves the peak bandwidth when the DMA controllers transfer memory chunks of at least 1024 bytes. In addition, SPE-to-memory bandwidth should be taken into account in streaming programming. For instance, implementing two data streams using 4 SPEs each can be more efficient than having a single data stream using all 8 SPEs.
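The peak figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the per-cycle values it derives (floating-point operations per SPE cycle, bytes per EIB bus cycle) are inferred from the abstract's numbers rather than stated in it.

```python
# Back-of-the-envelope check of the peak figures quoted in the abstract.
# All constants come from the abstract; the function names are illustrative.

CLOCK_GHZ = 2.1         # processor clock
NUM_SPES = 8            # SPEs per CBE
GFLOPS_PER_SPE = 16.8   # peak arithmetic rate of one SPE
EIB_PEAK_GBS = 134.4    # peak EIB bandwidth


def peak_gflops(num_spes=NUM_SPES, per_spe=GFLOPS_PER_SPE):
    """Aggregate SPE peak: number of SPEs times the per-SPE peak."""
    return num_spes * per_spe


def flops_per_cycle_per_spe(per_spe=GFLOPS_PER_SPE, clock_ghz=CLOCK_GHZ):
    """Floating-point operations per clock cycle implied for one SPE."""
    return per_spe / clock_ghz


def eib_bytes_per_bus_cycle(eib_gbs=EIB_PEAK_GBS, clock_ghz=CLOCK_GHZ):
    """Bytes moved per EIB cycle, with the bus at half the processor clock."""
    bus_clock_ghz = clock_ghz / 2.0
    return eib_gbs / bus_clock_ghz


if __name__ == "__main__":
    print(f"aggregate SPE peak: {peak_gflops():.1f} GFLOPS")
    print(f"per-SPE operations per cycle: {flops_per_cycle_per_spe():.1f}")
    print(f"EIB bytes per bus cycle: {eib_bytes_per_bus_cycle():.1f}")
```

Under these assumptions the aggregate matches the quoted 134.4 GFLOPS, and the same arithmetic implies about 8 floating-point operations per cycle per SPE and about 128 bytes per EIB bus cycle.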
Index Terms:
single instruction multiple data, arithmetic performance analysis, cell broadband engine, memory bandwidth application, synergistic processor element, DMA controller, direct memory access, element interconnect bus, bandwidth performance peak, processor speed, message passing interface, streaming programming model, processor component, data stream
Citation:
D. Jimenez-Gonzalez, X. Martorell, A. Ramirez, "Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications," ISPASS, pp. 210-219, 2007 IEEE International Symposium on Performance Analysis of Systems & Software, 2007