The purpose of our work is to better understand the dynamic data sharing behavior of certain classes of grid end users. Toward this end, we study a large-scale data repository and end users' access behavior toward this repository. Interesting insights from the study include that (1) the use of parallel methods for data downloads, via download accelerators, is common, despite qualms expressed by the community about the impact of such behavior on wide area data distribution networks, (2) such data movements are highly bursty, as has also been observed for scientific or populist data repositories and web sites (e.g., space imagery, sports events), and (3) a large number of remote data retrievals are by single clients for single files. Finally, we discuss the impact of our observations on grid applications in general.
Citation:
Mohamed Mansour, Mathew Wolf, Karsten Schwan, "Dynamic Data Access to the GT/CERCS Linux Mirror Site," IPDPS, vol. 18, p. 273b, 18th International Parallel and Distributed Processing Symposium (IPDPS'04) - Workshop 17, 2004.