High performance LU factorization for non-dedicated clusters
Chicago, IL, USA April 19-April 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CCGrid.2004.1336698
Fourth IEEE International Symposium o ...
T. Endo, Univ. of Tokyo, Japan
K. Kaneda, Univ. of Tokyo, Japan
K. Taura, Univ. of Tokyo, Japan
A. Yonezawa, Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA
This paper describes an implementation of parallel LU factorization that targets high performance on non-dedicated clusters, where the number of available computing resources may be arbitrary and may change dynamically. We accommodate joining and leaving processes by describing the algorithm in the Phoenix programming model. We achieve high performance in this setting through a combination of techniques, including latency-tolerant communication and a data partitioning scheme that achieves both load balance and small communication volume for an arbitrary and dynamically changing number of processors. We observed 130 GFlops with 128 processes on a 70-node dual 2.4 GHz Xeon cluster at a matrix size of 46080. This performance is comparable to that of High Performance Linpack (HPL). When cluster nodes are loaded by background processes, our implementation surpasses HPL.
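For readers unfamiliar with the kernel being parallelized, the following is a minimal serial sketch of LU factorization with partial pivoting, together with the leading-order operation count commonly used to convert run time into a flop rate. It is an illustration only and not the authors' implementation: it contains none of the Phoenix-based process management, latency-tolerant communication, or data partitioning described in the abstract. The function names `lu_factor` and `lu_flops` are hypothetical.

```python
def lu_factor(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Returns (lu, perm). lu holds U on and above the diagonal and the
    multipliers of L below it (the unit diagonal of L is implied).
    perm[i] is the original index of the row now in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy, leave the input intact
    perm = list(range(n))
    for k in range(n):
        # Choose the row with the largest magnitude in column k as pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Store the multipliers and update the trailing submatrix.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def lu_flops(n):
    """Leading-order floating-point operation count, (2/3) * n**3."""
    return 2.0 * n ** 3 / 3.0
```

As a rough consistency check, and assuming the reported 130 GFlops is the average rate over the whole run, `lu_flops(46080) / 130e9` suggests a factorization time of about 500 seconds for the matrix size quoted above. Lower-order terms in the operation count are ignored in this estimate.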
Citation:
T. Endo, K. Kaneda, K. Taura, A. Yonezawa, "High performance LU factorization for non-dedicated clusters," ccgrid, pp.678-685, Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04), 2004