Whittle index policy for crawling ephemeral content

Avrachenkov, Konstantin E. ; Borkar, Vivek S. (2015) Whittle index policy for crawling ephemeral content In: 2015 54th IEEE Conference on Decision and Control (CDC), Osaka, Japan, 15-18 December 2015.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/CDC.2015.7403283

Related URL: http://dx.doi.org/10.1109/CDC.2015.7403283

Abstract

We consider the task of scheduling a crawler to retrieve from several sites their ephemeral content. This is content, such as news or posts at social network groups, for which a user typically loses interest after some days or hours. Thus development of a timely crawling policy for ephemeral information sources is very important. We first formulate this problem as an optimal control problem with average reward. The reward can be measured in terms of the number of clicks or relevant search requests. The problem in its exact formulation suffers from the curse of dimensionality and quickly becomes intractable even with moderate number of information sources. Fortunately, this problem admits a Whittle index, a celebrated heuristics which leads to problem decomposition and to a very simple and efficient crawling policy. We derive the Whittle index and provide its theoretical justification.

Item Type:Conference or Workshop Item (Other)
Source:Copyright of this article belongs to IEEE.
ID Code:135196
Deposited On:20 Jan 2023 05:57
Last Modified:20 Jan 2023 05:57

Repository Staff Only: item control page