An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes

Bhatnagar, Shalabh (2010) An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes. Systems & Control Letters, 59 (12), pp. 760-766. ISSN 0167-6911

Full text not available from this repository.

Official URL: http://doi.org/10.1016/j.sysconle.2010.08.013

Abstract

In this article, we develop the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework, in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We handle the inequality constraints via the Lagrange multiplier method. Our algorithm uses multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that performs a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy.
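
The abstract outlines the overall structure: a Lagrangian relaxation of the constrained problem, a TD critic with function approximation on the fastest timescale, an SPSA-based actor on a slower timescale, and a Lagrange multiplier update on the slowest timescale. The sketch below is only a minimal illustration of that general pattern, not the paper's algorithm: the toy MDP (`step`), the one-hot features, the softmax policy, the rollout-based two-sided SPSA estimate, the per-step splitting of the constraint bound, and all step-size schedules are hypothetical choices made here to give a runnable example.

```python
import numpy as np

# Hypothetical toy constrained MDP: states 0..N-1, two actions, a per-step
# objective cost c(s, a) and a per-step constraint cost g(s, a).
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 5, 2
GAMMA = 0.95            # discount factor
CONSTRAINT_BOUND = 2.0  # upper bound on the expected discounted constraint cost


def step(s, a):
    """Toy transition/cost model (stand-in for the MDP simulator)."""
    s_next = (s + a + rng.integers(0, 2)) % N_STATES
    cost = 1.0 + 0.5 * s   # objective sample cost
    g_cost = float(a)      # constraint sample cost
    return s_next, cost, g_cost


def features(s):
    """One-hot state features for the linear TD critic."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi


def policy_probs(theta, s):
    """Softmax policy over actions, parameterised by theta (one row per state)."""
    prefs = theta[s]
    e = np.exp(prefs - prefs.max())
    return e / e.sum()


def rollout_cost(theta, lam, s0, horizon=200):
    """Monte Carlo estimate of the discounted Lagrangian cost from state s0."""
    s, total, disc = s0, 0.0, 1.0
    for _ in range(horizon):
        a = rng.choice(N_ACTIONS, p=policy_probs(theta, s))
        s, c, g = step(s, a)
        total += disc * (c + lam * (g - (1 - GAMMA) * CONSTRAINT_BOUND))
        disc *= GAMMA
    return total


# Parameters: critic weights v, actor parameters theta, Lagrange multiplier lam.
v = np.zeros(N_STATES)
theta = np.zeros((N_STATES, N_ACTIONS))
lam = 0.0
s = 0

for k in range(2000):
    # Three diminishing step sizes: critic fastest, actor slower, multiplier slowest.
    a_critic = 0.1 / (1 + k) ** 0.6
    a_actor = 0.05 / (1 + k) ** 0.8
    a_lambda = 0.01 / (1 + k)

    # --- Critic: TD(0) update for the value of the Lagrangian cost ---
    a = rng.choice(N_ACTIONS, p=policy_probs(theta, s))
    s_next, c, g = step(s, a)
    lagr_cost = c + lam * (g - (1 - GAMMA) * CONSTRAINT_BOUND)
    td_error = lagr_cost + GAMMA * v @ features(s_next) - v @ features(s)
    v += a_critic * td_error * features(s)

    # --- Actor: SPSA-style gradient estimate in policy-parameter space ---
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher perturbation
    eps = 0.1
    j_plus = rollout_cost(theta + eps * delta, lam, s)
    j_minus = rollout_cost(theta - eps * delta, lam, s)
    grad_est = (j_plus - j_minus) / (2 * eps) / delta    # two-sided SPSA estimate
    theta -= a_actor * grad_est                          # descend the Lagrangian

    # --- Lagrange multiplier: ascend on a crude one-step surrogate of the
    #     discounted constraint violation, projected onto the nonnegative reals ---
    lam = max(0.0, lam + a_lambda * (g - (1 - GAMMA) * CONSTRAINT_BOUND))

    s = s_next

print("lambda:", round(lam, 3))
print("greedy actions:", [int(np.argmax(policy_probs(theta, st))) for st in range(N_STATES)])
```

The three step-size schedules mimic the multi-timescale structure described in the abstract; the paper's SPSA construction, critic recursion, and convergence analysis are considerably more refined than this rollout-based surrogate.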

Item Type: Article
Source: Copyright of this article belongs to Elsevier B.V.
Keywords: Constrained Markov Decision Processes; Infinite Horizon Discounted Cost Criterion; Function Approximation; Actor–Critic Algorithm; Simultaneous Perturbation Stochastic Approximation.
ID Code: 116542
Deposited On: 12 Apr 2021 06:46
Last Modified: 12 Apr 2021 06:46
