Keyword search over dynamic categorized information

Bhide, M. ; Chakaravarthy, V. T. ; Ramamritham, K. ; Roy, P. (2009) Keyword search over dynamic categorized information Proceedings of 25th IEEE International Conference on Data Engineering . pp. 258-269.

Full text not available from this repository.

Official URL: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnum...

Related URL: http://dx.doi.org/10.1109/ICDE.2009.91

Abstract

Consider an information repository whose content is categorized. A data item (in the repository) can belong to multiple categories and new data is continuously added to the system. In this paper, we describe a system, CS*, which takes a keyword query and returns the relevant top-K categories. In contrast, traditional keyword search returns the top-K documents (i.e., data items) relevant to a user query. The need to dynamically categorize new data and also update the meta-data required for fast responses to user queries poses interesting challenges. The brute force approach of updating the meta-data by comparing each new data item with all the categories is impractical due to (i) the large cost involved in finding the categories associated with a data item and (ii) the high rate of arrival of new data items. We show that a sampling based approach which provides statistical guarantees on the reported results is also impracticable. We hence develop the CS* approach whose effectiveness results from its ability to focus on a strategically chosen subset of categories on the one hand and a subset of new data on the other. Given a query, CS* finds the top-K categories with high accuracy even in time-constrained situations. An experimental evaluation of the CS* system using real world data shows that it can easily achieve accuracy in excess of 90%, whereas other approaches demand at least 57% more resources (i.e., processing power), for providing similar results. Our experimental results also show that, contrary to expectations, if the rate of arrival of data items doubles, whereas CS* continues to provide high accuracy without a significant increase in resources, other approaches require more than double the number of resources.

Item Type:Article
Source:Copyright of this article belongs to IEEE.
ID Code:94227
Deposited On:24 Aug 2012 12:14
Last Modified:24 Aug 2012 12:14

Repository Staff Only: item control page