Konda, Vijaymohan R.; Borkar, Vivek S. (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38 (1), pp. 94-123. ISSN 0363-0129
Full text not available from this repository.
Official URL: http://link.aip.org/link/?SJCODC/38/94/1
Related URL: http://dx.doi.org/10.1137/S036301299731669X
Abstract
Algorithms for learning the optimal policy of a Markov decision process (MDP) from simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm from the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis relies on two-timescale stochastic approximation.
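The two-timescale structure mentioned in the abstract — a critic estimating values on a fast timescale while an actor adjusts the policy on a slower one — can be illustrated with a generic tabular actor-critic sketch. This is not the paper's exact algorithm; the toy MDP, step-size schedules, and softmax parameterization below are illustrative assumptions.

```python
import math
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9  # discount factor (assumed for this toy example)

def step(s, a):
    """Toy MDP: the chosen action usually determines the next state;
    reaching state 1 yields reward 1. Mildly stochastic transitions."""
    s_next = a if random.random() < 0.8 else 1 - a
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

V = [0.0] * N_STATES                                   # critic: state values
theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: preferences

def policy(s):
    """Softmax policy over action preferences for state s."""
    z = [math.exp(t) for t in theta[s]]
    tot = sum(z)
    return [p / tot for p in z]

s = 0
for n in range(1, 50001):
    alpha = 1.0 / (n ** 0.6)  # fast critic step size
    beta = 1.0 / n            # slow actor step size; beta/alpha -> 0
    probs = policy(s)
    a = 0 if random.random() < probs[0] else 1
    s_next, r = step(s, a)
    delta = r + GAMMA * V[s_next] - V[s]  # TD error
    V[s] += alpha * delta                 # critic update (fast timescale)
    for b in range(N_ACTIONS):            # actor update (slow timescale)
        grad = (1.0 if b == a else 0.0) - probs[b]  # grad of log-softmax
        theta[s][b] += beta * delta * grad
    s = s_next

# After training, the actor should prefer action 1 (which leads to
# the rewarding state) and the critic should hold positive values.
```

Because the actor's step size shrinks faster than the critic's, the critic effectively tracks the value function of the current policy, which is the key structural feature the two-timescale analysis exploits.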
| Item Type: | Article |
|---|---|
| Source: | Copyright of this article belongs to Society for Industrial & Applied Mathematics. |
| Keywords: | Reinforcement Learning; Markov Decision Processes; Actor-critic Algorithms; Stochastic Approximation; Asynchronous Iterations |
| ID Code: | 5304 |
| Deposited On: | 18 Oct 2010 08:32 |
| Last Modified: | 20 May 2011 09:09 |