Clustered Workflow Execution of Retargeted Data Analysis Scripts
May 19-May 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CCGRID.2008.69
2008 Eighth IEEE International Sympos ...
Supercomputing advances have enabled computational science data volumes to grow at ever-increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server, where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and providing end-users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts of scientists' most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geoscience analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist's desktop.
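As a rough illustration of the dataflow idea in the abstract, the sketch below groups the commands of a script into stages whose members share no file dependencies and could therefore run in parallel. It is not the authors' implementation. The tool names (avg, diff), the file names, and the convention that the last non-option argument is the output file are all assumptions made for this example, and only read-after-write dependencies are tracked, on the further assumption that each file is written once.

```python
import shlex
from collections import defaultdict


def parse_command(line):
    """Split a command line into input files and one output file.

    Assumed convention (hypothetical): every non-option argument is a file,
    and the last one is the output.
    """
    tokens = shlex.split(line)
    files = [t for t in tokens[1:] if not t.startswith("-")]
    return {"cmd": line, "inputs": set(files[:-1]), "output": files[-1]}


def schedule(lines):
    """Group commands into stages; commands in one stage are independent."""
    cmds = [parse_command(line) for line in lines]
    producer = {}  # file name -> index of the command that writes it
    level = []     # stage number assigned to each command
    for i, c in enumerate(cmds):
        # A command depends on whichever earlier commands wrote its inputs.
        deps = [producer[f] for f in c["inputs"] if f in producer]
        level.append(1 + max((level[d] for d in deps), default=-1))
        producer[c["output"]] = i
    stages = defaultdict(list)
    for i, lvl in enumerate(level):
        stages[lvl].append(cmds[i]["cmd"])
    return [stages[k] for k in sorted(stages)]


# Hypothetical three-line analysis script.
script = [
    "avg jan.nc feb.nc winter.nc",
    "avg jun.nc jul.nc summer.nc",
    "diff summer.nc winter.nc seasonal.nc",
]
for number, stage in enumerate(schedule(script)):
    print(number, stage)
```

Here the two avg commands read only original data, so they land in the first stage, while diff must wait for both of their outputs. A real system would also need to handle files that are overwritten or read after a later write, which this sketch ignores.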
Index Terms:
cluster, scripting, scientific computing, service, data analysis, parallelism, compilation
Citation:
Daniel L. Wang, Charles S. Zender, Stephen F. Jenks, "Clustered Workflow Execution of Retargeted Data Analysis Scripts," ccgrid, pp.449-458, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008