Actor-critic-type learning algorithms for Markov decision processes

Konda, Vijaymohan R.; Borkar, Vivek S. (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38 (1). pp. 94-123. ISSN 0363-0129

Full text not available from this repository.

Official URL: http://link.aip.org/link/?SJCODC/38/94/1

Related URL: http://dx.doi.org/10.1137/S036301299731669X

Abstract

Algorithms for learning the optimal policy of a Markov decision process (MDP) from simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm from the artificial intelligence literature. Distributed asynchronous implementations are considered. The analysis relies on two-time-scale stochastic approximation.
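To illustrate the two-time-scale idea described in the abstract, the following is a minimal tabular sketch, not the paper's exact algorithm: a critic (value estimates) is updated with a faster-decaying step size than an actor (softmax policy parameters), so the critic effectively tracks the slowly varying policy. All names, step-size schedules, and the toy MDP below are illustrative assumptions.

```python
import numpy as np

def actor_critic(P, R, gamma=0.9, steps=20000, seed=0):
    """Illustrative two-time-scale actor-critic on a small tabular MDP.

    P[s, a] : next-state distribution; R[s, a] : expected reward.
    The critic step size (alpha) decays more slowly than the actor
    step size (beta), giving the critic the faster time scale --
    a sketch of the structure analyzed in the paper, not its algorithm.
    """
    rng = np.random.default_rng(seed)
    nS, nA = R.shape
    theta = np.zeros((nS, nA))   # actor: softmax policy parameters
    V = np.zeros(nS)             # critic: state-value estimates
    s = 0
    for t in range(1, steps + 1):
        alpha = 1.0 / t**0.6     # critic step size (fast scale)
        beta = 1.0 / t           # actor step size (slow scale)
        # sample an action from the softmax policy at state s
        pref = np.exp(theta[s] - theta[s].max())
        pi = pref / pref.sum()
        a = rng.choice(nA, p=pi)
        s2 = rng.choice(nS, p=P[s, a])
        # temporal-difference error drives both updates
        delta = R[s, a] + gamma * V[s2] - V[s]
        V[s] += alpha * delta                # critic update (fast)
        grad = -pi
        grad[a] += 1.0                       # d log pi(a|s) / d theta[s]
        theta[s] += beta * delta * grad      # actor update (slow)
        s = s2
    return theta, V
```

On a toy two-state MDP where one action always pays reward 1 and the other 0, the learned policy parameters come to favor the rewarding action in every state.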

Item Type:Article
Source:Copyright of this article belongs to Society for Industrial & Applied Mathematics.
Keywords:Reinforcement Learning; Markov Decision Processes; Actor-critic Algorithms; Stochastic Approximation; Asynchronous Iterations
ID Code:5304
Deposited On:18 Oct 2010 08:32
Last Modified:20 May 2011 09:09