Borkar, V. S. (2002). Q-learning for risk-sensitive control. Mathematics of Operations Research, 27(2), pp. 294-311. ISSN 0364-765X
Full text not available from this repository.
Official URL: http://mor.journal.informs.org/content/27/2/294.ab...
Related URL: http://dx.doi.org/10.1287/moor.27.2.294.324
Abstract
We propose, for risk-sensitive control of finite Markov chains, a counterpart of the popular Q-learning algorithm for classical Markov decision processes. The algorithm is shown to converge with probability one to the desired solution. The proof technique is an adaptation of the o.d.e. approach for the analysis of stochastic approximation algorithms, with most of the effort devoted to the analysis of the specific o.d.e.s that arise.
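The abstract describes a stochastic-approximation scheme of the Q-learning type for risk-sensitive (multiplicative/exponential-cost) control. As a rough illustration of that flavour of update, the sketch below shows a generic tabular recursion with an exponential one-stage cost, diminishing step sizes, and a crude normalisation; it is an assumption-laden sketch, not the exact recursion or normalisation of Borkar (2002), and all names (`P`, `cost`, `risk_factor`, `risk_sensitive_q_learning`) are hypothetical placeholders.

```python
import numpy as np

def risk_sensitive_q_learning(P, cost, risk_factor=1.0, n_steps=50_000, seed=0):
    """Illustrative sketch only, not the paper's exact algorithm.

    P[a][s, s'] : transition matrix for action a (list of arrays)
    cost[s, a]  : one-stage cost
    """
    rng = np.random.default_rng(seed)
    n_actions = len(P)
    n_states = P[0].shape[0]
    Q = np.ones((n_states, n_actions))        # multiplicative value, initialised at 1
    visits = np.zeros((n_states, n_actions))  # per-pair visit counts for step sizes
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)               # uniform exploration (assumption)
        s_next = rng.choice(n_states, p=P[a][s])  # simulate one transition
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                # diminishing step size
        # Exponential-cost target with minimisation over next actions
        target = np.exp(risk_factor * cost[s, a]) * Q[s_next].min()
        Q[s, a] += alpha * (target - Q[s, a])
        Q /= Q[0, 0]                              # crude normalisation at a reference pair (assumption)
        s = s_next
    return Q
```

In the o.d.e. approach referenced in the abstract, such an iteration is analysed by showing that its interpolated trajectory tracks an associated ordinary differential equation whose equilibria correspond to the desired solution.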
| Item Type: | Article |
|---|---|
| Source: | Copyright of this article belongs to INFORMS. |
| Keywords: | Markov Decision Processes; Risk-sensitive Control; Reinforcement Learning; Q-learning; Stochastic Approximation; Dynamic Programming |
| ID Code: | 81452 |
| Deposited On: | 06 Feb 2012 05:04 |
| Last Modified: | 06 Feb 2012 05:04 |