We designed and implemented a software distributed shared memory (DSM) system, SCASH-MPI, by using MPI as the communication layer of the SCASH DSM. With MPI as the communication layer, we could use high-speed networks with several clusters and high portability. Furthermore, SCASH-MPI can use high-speed networks with MPI, which is the most commonly available communication library. On the other hand, existing software DSM systems usually use a dedicated communication layer, TCP, or UDP-Ethernet. SCASH-MPI avoids the need for a large amount of pin-down memory for shared memory use that has limited the applications of the original SCASH. In SCASH-MPI, a thread is created to support remote memory communication using MPI. An experiment on a 4-node Itanium cluster showed that the Laplace Solver benchmark using SCASH-MPI achieves a performance comparable to the original SCASH. Performance degradation is only 6.3% in the NPB BT benchmark Class B test. In SCASH-MPI, page transfer does not start until a page fault is detected. To hide the latency of page transmission, we implemented a prefetch function. The latency in BT Class B was reduced by 64% when the prefetch function was used.
Citation:
Yoshinori Ojima, Mitsuhisa Sato, Taisuke Boku, Daisuke Takahashi, "Design of a Software Distributed Shared Memory System using an MPI communication layer," ispan, pp.220-229, 8th International Symposium on Parallel Architectures,Algorithms and Networks (ISPAN'05), 2005