Bhatnagar, Shalabh ; Babu, K. Mohan (2008) New algorithms of the Q-learning type. Automatica, 44 (4). pp. 1111-1119. ISSN 0005-1098
Full text not available from this repository.
Official URL: http://doi.org/10.1016/j.automatica.2007.09.009
Related URL: http://dx.doi.org/10.1016/j.automatica.2007.09.009
Abstract
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates the Q-values of all feasible state–action pairs at each instant, while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments applying the proposed algorithms to routing in communication networks are presented for a few different settings.
| Item Type: | Article |
|---|---|
| Source: | Copyright of this article belongs to Elsevier B.V. |
| ID Code: | 116558 |
| Deposited On: | 12 Apr 2021 06:48 |
| Last Modified: | 12 Apr 2021 06:48 |