Grid-based Indexing of a Newswire Corpus
Pittsburgh, PA, November 8, 2004
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/GRID.2004.34
Fifth IEEE/ACM International Workshop ...
Baden Hughes, The University of Melbourne, Australia
Srikumar Venugopal, The University of Melbourne, Australia
Rajkumar Buyya, The University of Melbourne, Australia
In this paper we report our experience using computational grids in the domain of natural language processing, particularly in the area of information extraction, to create query indices for information retrieval tasks. Because large corpora are prevalent in natural language processing, computational grids offer significant utility to researchers in the domain who are reaching the bounds of computational efficiency. We leverage the affinity between the segmented data sources common in natural language processing and the parallelisation model of the grid domain. The experiment reported here is a large-scale newswire corpus indexing task whose goal is to efficiently create a queryable index of the entire corpus. By parallelising the indexing task and executing it on an Australian computational grid, we observe a 2.26x speedup over the same experiment on a single computational node. In addition to reporting the raw performance impact, we reflect on a number of interesting points discovered while running the experiments and propose several new requirements for grid middleware.
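The segment-and-parallelise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`index_segment`, `merge_indices`, `build_index`, `speedup`), the toy corpus, and the use of a local thread pool as a stand-in for grid nodes are all assumptions made for illustration. Each corpus segment is indexed independently, the partial indices are merged into one queryable index, and speedup is taken as single-node time divided by parallel time.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_segment(segment):
    """Build an inverted index (term -> set of doc ids) for one corpus segment."""
    index = defaultdict(set)
    for doc_id, text in segment:
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def merge_indices(partials):
    """Merge per-segment indices into a single queryable index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


def build_index(segments, workers=4):
    """Index segments independently (a local pool stands in for grid nodes)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(index_segment, segments))
    return merge_indices(partials)


def speedup(serial_seconds, parallel_seconds):
    """Speedup = single-node wall time / parallel wall time."""
    return serial_seconds / parallel_seconds


# Hypothetical toy corpus: two segments of (doc_id, text) pairs.
segments = [
    [("d1", "grid computing for language processing"),
     ("d2", "newswire corpus indexing")],
    [("d3", "indexing on a computational grid")],
]
index = build_index(segments)
print(sorted(index["grid"]))      # ['d1', 'd3']
print(sorted(index["indexing"]))  # ['d2', 'd3']
```

Because segments share no state until the merge step, the per-segment work can in principle be dispatched to separate nodes; the merge is the only sequential part in this sketch.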
Citation:
Baden Hughes, Srikumar Venugopal, Rajkumar Buyya, "Grid-based Indexing of a Newswire Corpus," Fifth IEEE/ACM International Workshop on Grid Computing (GRID'04), pp. 320-327, 2004