This paper presents Clusterfile, a parallel file system that provides parallel file access on a cluster of computers. Existing parallel file systems offer little control over matching the I/O access patterns and file data layout. Without this matching the applications may face the following problems: contention at I/O nodes, fragmentation of file data, false sharing, small network messages, high overhead of scattering/gathering the data. Clusterfile addresses some of these inefficiencies. Parallel applications can physically partition a file in arbitrary patterns. They can also set arbitrary views on a file. Views hide the parallel structure of the file and ease the programmer's burden of computing complex access indices. The intersections between views and layouts are computed by a memory redistribution algorithm. Read and write operations are optimized by pre-computing the direct mapping between access patterns and disks. Clusterfile uses the same data representation for file layouts, access patterns, and the mappings between each other.
Index Terms:
parallel I/O, file physical and logical partitioning, array regular distributions
Citation:
Florin Isaila, Walter F. Tichy, "Clusterfile: A Flexible Physical Layout Parallel File System," cluster, pp.37, Third IEEE International Conference on Cluster Computing (CLUSTER'01), 2001