Classification of search queries is a complex and computationally challenging task. Search queries are typically short and reveal very few features per query, which makes them a weak source for traditional machine learning. In this paper, we present a method that combines limited manual labeling, computational linguistics and information retrieval to classify a large collection of web search queries. A short list of manually chosen terms, known a priori to be of interest to a particular class, is used to cull a small number of actual queries from a commercial search engine log. These queries are then submitted to a commercial search engine, and the returned search results are used to find more class-related terms. We examine the classification proficiency of the proposed method on a large web search engine query log and show that up to 48% of the unlabeled set could be classified using this method. We discuss the results of this research and its implications for the advancement of short text classification.
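The pipeline described in the abstract (seed terms, culled queries, search results, expanded class terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how class-related terms are selected from the search results, so simple frequency counting is assumed here. The search engine is replaced by a hypothetical in-memory stub (`fake_search`), and the query log, seed terms, stopword list and all function names are invented for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cull_queries(query_log, terms):
    """Return the log queries that contain at least one of the given terms."""
    wanted = set(terms)
    return [q for q in query_log if wanted & set(tokenize(q))]


def expand_terms(queries, search, seed_terms, top_k):
    """Count tokens in the search results for each query and return the
    top_k most frequent ones that are neither stopwords nor seed terms.
    Frequency ranking is an assumption, not the paper's stated method."""
    seeds = set(seed_terms)
    counts = Counter()
    for query in queries:
        for snippet in search(query):
            counts.update(
                t for t in tokenize(snippet)
                if t not in STOPWORDS and t not in seeds
            )
    return [term for term, _ in counts.most_common(top_k)]


# Hypothetical toy data standing in for a commercial query log.
query_log = [
    "cheap flights to paris",
    "hotel deals rome",
    "python list sort",
    "paris airfare",
    "rome airfare booking",
]
seed_terms = ["flights", "hotel"]  # manually chosen terms for a "travel" class

# Hypothetical stub standing in for a commercial search engine.
_RESULTS = {
    "cheap flights to paris": [
        "Compare airfare and book flights to Paris",
        "Low airfare booking",
    ],
    "hotel deals rome": [
        "Hotel booking in Rome",
        "Best airfare and hotel packages",
    ],
}


def fake_search(query):
    """Return canned result snippets for a query (empty list if unknown)."""
    return _RESULTS.get(query, [])


seed_queries = cull_queries(query_log, seed_terms)
new_terms = expand_terms(seed_queries, fake_search, seed_terms, top_k=2)
classified = cull_queries(query_log, seed_terms + new_terms)
```

In this toy log, the seed terms alone match two queries, while the expanded term list also matches the two queries that contain none of the seed terms. This is the effect the method relies on: short queries share few features, so terms harvested from search results widen the coverage of a small manual list.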
Index Terms:
web search logs, machine learning, short text classification, labeled sets
Citation:
Isak Taksa, Sarah Zelikovitz, Amanda Spink, "Using Web Search Logs to Identify Query Classification Terms," International Conference on Information Technology (ITNG'07), pp. 469-474, 2007