Multi-armed bandits based on a variant of Simulated Annealing

Abdulla, Mohammed Shahid ; Bhatnagar, Shalabh (2016) Multi-armed bandits based on a variant of Simulated Annealing Indian Journal of Pure and Applied Mathematics, 47 (2). pp. 195-212. ISSN 0019-5588

Full text not available from this repository.

Official URL: http://doi.org/10.1007/s13226-016-0184-5

Related URL: http://dx.doi.org/10.1007/s13226-016-0184-5

Abstract

A variant of Simulated Annealing termed Simulated Annealing with Multiplicative Weights (SAMW) has been proposed in the literature. However, convergence was dependent on a parameter β(T), which was calculated a-priori based on the total iterations T the algorithm would run for. We first show the convergence of SAMW even when a diminishing stepsize β k → 1 is used, where k is the index of iteration. Using this SAMW as a kernel, a stochastic multi-armed bandit (SMAB) algorithm called SOFTMIX can be improved to obtain the minimum-possible log regret, as compared to log2 regret of the original. Another modification of SOFTMIX is proposed which avoids the need for a parameter that is dependent on the reward distribution of the arms. Further, a variant of SOFTMIX that uses a comparison term drawn from another popular SMAB algorithm called UCB1 is then described. It is also shown why the proposed scheme is computationally more efficient over UCB1, and an alternative to this algorithm with simpler stepsizes is also proposed. Numerical simulations for all the proposed algorithms are then presented.

Item Type:Article
Source:Copyright of this article belongs to Springer Nature.
Keywords:Stochastic Processes; Applied Probability; Statistics; Discrete Optimization.
ID Code:116495
Deposited On:12 Apr 2021 06:02
Last Modified:12 Apr 2021 06:02

Repository Staff Only: item control page