QWI: Q-learning with Whittle Index

Robledo, Francisco; Borkar, Vivek; Ayesta, Urtzi; Avrachenkov, Konstantin (2022). QWI: Q-learning with Whittle Index. ACM SIGMETRICS Performance Evaluation Review, 49 (2), pp. 47-50.

Full text not available from this repository.

Official URL: https://doi.org/10.1145/3512798.3512816

Abstract

The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, an algorithm based on Q-learning, to learn the Whittle indices. The key feature is the deployment of two timescales: a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical computations show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network based approximate Q-learning.
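
The two-timescale idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: a single arm with a discounted reward criterion, uniformly random exploration, the step sizes `alpha` and `beta`, and the function name `qwi_sketch`. It relies only on the standard characterisation of the Whittle index of a state x as the passive-action subsidy at which the active and passive actions are equally attractive in x. For each reference state x the sketch keeps a Q-table, updated with the faster step size, and a subsidy estimate `lam[x]`, nudged with the slower step size toward the point where the two Q-values in state x coincide.

```python
import random


def alpha(n):
    """Fast step size for the Q-function update (assumed schedule)."""
    return 1.0 / (1 + n) ** 0.6


def beta(n):
    """Slow step size for the index update (assumed schedule).

    beta(n) / alpha(n) tends to 0, which is what separates the timescales.
    """
    return 1.0 / (1 + n)


def qwi_sketch(P, R, gamma=0.9, steps=20000, seed=0):
    """Estimate one index per state of a single arm (illustrative only).

    P[a][s] is the list of next-state probabilities and R[a][s] the reward
    for action a in state s, with a = 0 passive and a = 1 active.
    """
    rng = random.Random(seed)
    n_states = len(P[0])
    # Q[x][s][a]: Q-values of the problem in which the passive action
    # earns the extra subsidy lam[x] attached to reference state x.
    Q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_states)]
    lam = [0.0] * n_states
    s = 0
    for n in range(steps):
        a = rng.randrange(2)  # uniform exploration (an assumption)
        s_next = rng.choices(range(n_states), weights=P[a][s])[0]
        for x in range(n_states):
            # Fast timescale: standard Q-learning step with the subsidy.
            reward = R[a][s] + (lam[x] if a == 0 else 0.0)
            target = reward + gamma * max(Q[x][s_next])
            Q[x][s][a] += alpha(n) * (target - Q[x][s][a])
            # Slow timescale: move lam[x] toward indifference in state x.
            lam[x] += beta(n) * (Q[x][x][1] - Q[x][x][0])
        s = s_next
    return lam


if __name__ == "__main__":
    # Hypothetical two-state arm, used only to exercise the sketch.
    P = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]]]
    R = [[0.0, 0.0], [1.0, 2.0]]
    print(qwi_sketch(P, R))
```

Because the subsidy moves slowly, the Q-update sees it as almost constant, while the subsidy update sees Q-values that have almost settled for the current subsidy. The sketch makes no claim about the convergence conditions or step-size schedules used in the paper.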

Item Type: Article
Source: Copyright of this article belongs to ACM, Inc.
ID Code: 135131
Deposited On: 19 Jan 2023 07:45
Last Modified: 19 Jan 2023 07:45