Parametrized Actor-Critic Algorithms for Finite-Horizon MDPs

Abdulla, Mohammed Shahid; Bhatnagar, Shalabh (2007) Parametrized Actor-Critic Algorithms for Finite-Horizon MDPs. In: American Control Conference, 9-13 July 2007, New York, NY, USA.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/ACC.2007.4282587

Related URL: http://dx.doi.org/10.1109/ACC.2007.4282587

Abstract

Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probability transition matrix per stage, so the curse of dimensionality affects FH-MDPs more severely than infinite-horizon MDPs. We propose two parametrized 'actor-critic' algorithms to compute optimal policies for FH-MDPs. Both algorithms use the two-timescale stochastic approximation technique, simultaneously performing gradient search in the parametrized policy space (the 'actor') on a slower timescale and learning the policy gradient (the 'critic') via a faster recursion. This is in contrast to methods where the critic recursions learn the cost-to-go proper. We show convergence with probability 1 to a set satisfying the necessary condition for constrained optima. The proposed parametrization is for FH-MDPs with compact action sets, although certain exceptions can be handled. Further, a third algorithm for stochastic control of stopping-time processes is presented. We explain why current policy evaluation methods do not work as the critic for the proposed actor recursion. Simulation results from flow control in communication networks attest to the performance advantages of all three algorithms.
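The paper's exact recursions are not reproduced on this page, so the following is only a minimal sketch of the two-timescale idea summarized in the abstract: a faster recursion tracks a policy-gradient estimate (playing the role of the 'critic'), while a slower, projected recursion updates the policy parameters (the 'actor') over a compact action set. The toy finite-horizon MDP, the SPSA-style gradient estimate, and every constant and step-size choice below are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite-horizon MDP used only for illustration: T stages,
# S states, and a compact action set [0, 1] (assumed, not from the paper).
T, S = 5, 4

def step(h, s, a):
    """One stage transition with stage-dependent toy cost and dynamics."""
    cost = (s / (S - 1) - a) ** 2 + 0.1 * a * (h + 1) / T
    p_up = 0.3 + 0.4 * a                      # larger actions push the state up
    s_next = min(s + 1, S - 1) if rng.random() < p_up else max(s - 1, 0)
    return s_next, cost

def rollout(theta):
    """Simulate one trajectory under the deterministic parametrized policy."""
    s, total = 0, 0.0
    for h in range(T):
        a = float(np.clip(theta[h, s], 0.0, 1.0))
        s, c = step(h, s, a)
        total += c
    return total

# Two-timescale recursions: g (the 'critic' here) tracks an SPSA-style
# estimate of the policy gradient on the faster timescale, while theta
# (the 'actor') descends along g on the slower timescale.
theta = np.full((T, S), 0.5)      # one policy parameter per (stage, state)
g = np.zeros_like(theta)
c_spsa = 0.1                      # perturbation size for the gradient estimate

for n in range(1, 5001):
    a_n = 1.0 / n                 # slow (actor) step size
    b_n = 1.0 / n ** 0.6          # fast (critic) step size: a_n / b_n -> 0

    delta = rng.choice([-1.0, 1.0], size=theta.shape)     # SPSA perturbation
    j_plus = rollout(np.clip(theta + c_spsa * delta, 0.0, 1.0))
    j_minus = rollout(np.clip(theta - c_spsa * delta, 0.0, 1.0))
    grad_est = (j_plus - j_minus) / (2.0 * c_spsa) * delta

    g += b_n * (grad_est - g)                              # faster recursion
    theta = np.clip(theta - a_n * g, 0.0, 1.0)             # projected actor update

print("estimated cost after tuning:",
      np.mean([rollout(theta) for _ in range(200)]))
```

In this sketch the projection onto [0, 1] stands in for the projection onto the compact action set, and the ratio of step sizes (a_n / b_n -> 0) is what makes the actor move on the slower timescale relative to the gradient-tracking recursion.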

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Institute of Electrical and Electronics Engineers.
Keywords: Finite-Horizon Markov Decision Processes; Reinforcement Learning; Two-Timescale Stochastic Approximation; Actor-Critic Algorithms.
ID Code: 116725
Deposited On: 12 Apr 2021 07:28
Last Modified: 12 Apr 2021 07:28
