A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes

Bhatnagar, S.; Kumar, S. (2004). A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes. IEEE Transactions on Automatic Control, 49 (4), pp. 592–598. ISSN 0018-9286

Full text not available from this repository.

Official URL: http://doi.org/10.1109/TAC.2004.825622


Abstract

A two-timescale simulation-based actor–critic algorithm is proposed for solving infinite-horizon Markov decision processes with finite state and compact action spaces under the discounted-cost criterion. The algorithm performs a gradient search on the slower timescale in the space of deterministic policies, using simultaneous perturbation stochastic approximation (SPSA)-based gradient estimates. On the faster timescale, the value function corresponding to a given stationary policy is updated and averaged over a fixed number of epochs for enhanced performance. A proof of convergence to a locally optimal policy is presented. Finally, numerical experiments applying the proposed algorithm to flow control in a bottleneck link, using a continuous-time queueing model, are presented.
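The key ingredient of the slower-timescale update is the SPSA gradient estimate: a random simultaneous perturbation of all policy parameters lets two cost evaluations approximate the full gradient, regardless of dimension. The sketch below illustrates plain SPSA on a toy quadratic cost; the step-size and perturbation schedules (`a0`, `c0`, the decay exponents) and the test function are illustrative assumptions, not the paper's actor–critic algorithm or its specific gain sequences.

```python
import numpy as np

def spsa_minimize(cost, theta0, n_iters=3000, a0=0.5, c0=0.1, seed=0):
    """Minimize a scalar cost via SPSA (illustrative parameter choices).

    Each iteration draws a Rademacher (+/-1) perturbation vector and uses
    just two cost evaluations to form an unbiased gradient estimate.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iters + 1):
        a_k = a0 / (k + 10)      # step-size gain, decays like 1/k
        c_k = c0 / k ** 0.25     # perturbation size, decays more slowly
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random directions
        # Two evaluations estimate every gradient component at once
        g_hat = (cost(theta + c_k * delta) - cost(theta - c_k * delta)) / (2.0 * c_k * delta)
        theta = theta - a_k * g_hat
    return theta

# Toy usage: minimize a quadratic with known minimizer (hypothetical target)
target = np.array([1.0, -2.0])
theta_star = spsa_minimize(lambda th: np.sum((th - target) ** 2), np.zeros(2))
```

In the paper's setting, `cost` would itself come from the faster-timescale simulation (the averaged value-function estimate for the perturbed policies), which is what makes the two-timescale structure necessary.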

Item Type: Article
Source: Copyright of this article belongs to the Institute of Electrical and Electronics Engineers.
ID Code: 116580
Deposited On: 12 Apr 2021 06:54
Last Modified: 12 Apr 2021 06:54
