loading...
Matrix Bidiagonalization on the Trident Processor
Nice, France April 22-April 26
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IPDPS.2003.1213467International Parallel and Distribute ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Mostafa I. Soliman, University of Aizu
Stanislav G. Sedukhin, University of Aizu
This paper discusses the implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector, matrix-vector, and matrix operations needed for bidiagonalizing a matrix. The number of clock cycles per FLOP is used as a metric to evaluate the performance of the Trident processor. Our results show that increasing the number of the Trident lanes proportionally decreases the number of cycles needed per FLOP. On a 32K?32K matrix and 128 Trident lanes, the speedup of using matrix-vector operations on the standard Golub and Kahan algorithm is around 1.5 times over using vector operations. However, using matrix operations on the _GEBRD subroutine gives speedup around 3 times over vector operations, and 2 times over using matrix-vector operations on the standard Golub and Kahan algorithm.
Index Terms:
Bidiagonal form, block Householder transformation, BLAS, scalable architecture
Citation:
Mostafa I. Soliman, Stanislav G. Sedukhin, "Matrix Bidiagonalization on the Trident Processor," ipdps, pp.257b, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003
Usage of this product signifies your acceptance of the Terms of Use.