16-Bit FP Sub-Word Parallelism to Facilitate Compiler Vectorization and Improve Performance of Image and Media Processing
Montreal, Quebec, Canada August 15-August 18
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICPP.2004.1327964
2004 International Conference on Para ...
Daniel Etiemble, University of Paris Sud
Lionel Lacassagne, University of Paris Sud
We consider the implementation of 16-bit floating point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks that use these new, simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up comes mainly from the wider SIMD instructions.
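The two sources of speed-up named in the abstract can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes 128-bit SIMD registers (the width of SSE2 on the Pentium 4 and of AltiVec on the PowerPC G5), and the 512x512 single-channel image is a hypothetical example chosen only to show the footprint ratio between byte storage and 32-bit FP storage.

```python
# Illustrative arithmetic only; none of these figures come from the paper's benchmarks.

SIMD_REGISTER_BITS = 128  # assumed register width (SSE2 / AltiVec)


def lanes_per_register(element_bits: int, register_bits: int = SIMD_REGISTER_BITS) -> int:
    """Number of elements that fit in one SIMD register."""
    return register_bits // element_bits


def image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Memory footprint of a single-channel image."""
    return width * height * bytes_per_pixel


# Operations per SIMD instruction: 32-bit FP versus 16-bit FP elements.
lanes_f32 = lanes_per_register(32)  # 4 elements per register
lanes_f16 = lanes_per_register(16)  # 8 elements per register

# Storage footprint for a hypothetical 512x512 image.
bytes_as_u8 = image_bytes(512, 512, 1)   # byte storage
bytes_as_f32 = image_bytes(512, 512, 4)  # 32-bit FP storage

print(f"lanes: f32={lanes_f32}, f16={lanes_f16}")
print(f"footprint: bytes={bytes_as_u8}, f32={bytes_as_f32}")
```

Under these assumptions, a 16-bit FP SIMD instruction processes twice as many elements as its 32-bit counterpart, and the byte-stored image occupies a quarter of the memory of the 32-bit FP version, which is the kind of difference that affects cache behavior.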
Citation:
Daniel Etiemble, Lionel Lacassagne, "16-Bit FP Sub-Word Parallelism to Facilitate Compiler Vectorization and Improve Performance of Image and Media Processing," icpp, pp.540-547, 2004 International Conference on Parallel Processing (ICPP'04), 2004