Pandey, Sandeep ; Ramamritham, Krithi ; Chakrabarti, Soumen (2003) Monitoring the dynamic web to respond to continuous queries Proceedings of the 12th international conference on World Wide Web . pp. 659-668.
Full text not available from this repository.
Official URL: http://www.acm.org/publications
Abstract
Continuous queries are queries for which responses given to users must be continuously updated, as the sources of interest get updated. Such queries occur, for instance, during on-line decision making, e.g., traffic flow control, weather monitoring, etc. The problem of keeping the responses current reduces to the problem of deciding how often to visit a source to determine if and how it has been modified, in order to update earlier responses accordingly. On the surface, this seems to be similar to the crawling problem since crawlers attempt to keep indexes up-to-date as pages change and users pose search queries. We show that this is not the case, both due to the inherent differences between the nature of the two problems as well as the performance metric. We propose, develop and evaluate a novel multi-phase (Continuous Adaptive Monitoring) (CAM) solution to the problem of maintaining the currency of query results. Some of the important phases are: The tracking phase, in which changes, to an initially identified set of relevant pages, are tracked. From the observed change characteristics of these pages, a probabilistic model of their change behavior is formulated and weights are assigned to pages to denote their importance for the current queries. During the next phase, the resource allocation phase, based on these statistics, resources, needed to continuously monitor these pages for changes, are allocated. Given these resource allocations, the scheduling phase produces an optimal achievable schedule for the monitoring tasks. An experimental evaluation of our approach compared to prior approaches for crawling dynamic web pages shows the effectiveness of CAM for monitoring dynamic changes. For example, by monitoring just 5% of the page changes, CAM is able to return 90% of the changed information to the users. The experiments also produce some interesting observations pertaining to the differences between the two problems of crawling--to build an index--and the problem of change tracking--to respond to continuous queries.
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to Association for Computing Machinery. |
ID Code: | 94237 |
Deposited On: | 24 Aug 2012 09:33 |
Last Modified: | 24 Aug 2012 09:33 |
Repository Staff Only: item control page