This article compares several search strategies for Web search engines and presents bucket algorithms that improve the efficiency of a classical index data structure for parallel textual databases. We use inverted files as the data structure and the vector space model to rank documents. Our main interest is the parallel processing of queries on a cluster of PCs, so the paper focuses on optimizing communication and synchronization. The query-processing server is designed on top of the Bulk Synchronous Parallel (BSP) model of parallel computing, which lets us study how the index organization affects query performance.
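As a rough illustration of the two building blocks named above, the sketch below builds an inverted file and ranks documents with a simplified tf-idf form of the vector space model. It is a sequential, single-machine example and does not reproduce the paper's bucket algorithms or its BSP server; the function names `build_inverted_index` and `rank`, the whitespace tokenizer, and the omission of document-length normalization are all assumptions made for brevity.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase whitespace tokenization, no stemming or stop words.
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index


def rank(query, index, num_docs):
    """Score documents by the summed tf-idf weights of the query terms.

    Simplification: scores are not normalized by document length.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = [
    "web search engine",
    "parallel web index",
    "parallel parallel query",
]
index = build_inverted_index(docs)
ranking = rank("parallel query", index, len(docs))
print([doc_id for doc_id, _ in ranking])
```

In a parallel setting, posting lists such as these would be partitioned across the processors of the cluster, which is where the communication and synchronization costs discussed in the abstract arise.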
Citation:
V. Gil Costa, A. M. Printista, M. Marin, "Improving Web Searches with Distributed Buckets Structures," la-web, pp. 119-126, Fourth Latin American Web Congress (LA-WEB'06), 2006.