Avrachenkov, Konstantin E.; Borkar, Vivek S. (2022) Whittle index based Q-learning for restless bandits with average reward. Automatica, 139, p. 110186. ISSN 0005-1098
Full text not available from this repository.
Official URL: http://doi.org/10.1016/j.automatica.2022.110186
Related URL: http://dx.doi.org/10.1016/j.automatica.2022.110186
Abstract
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
| Item Type: | Article |
|---|---|
| Source: | Copyright of this article belongs to Elsevier Science. |
| ID Code: | 135123 |
| Deposited On: | 19 Jan 2023 07:18 |
| Last Modified: | 19 Jan 2023 07:18 |