A novel Q-learning algorithm with function approximation for constrained Markov decision processes

Lakshmanan, K.; Bhatnagar, Shalabh (2012) A novel Q-learning algorithm with function approximation for constrained Markov decision processes. In: 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1-5 Oct. 2012, Monticello, IL, USA.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/Allerton.2012.6483246

Abstract

We present a novel multi-timescale Q-learning algorithm for average-cost control in a Markov decision process subject to multiple inequality constraints. We formulate a relaxed version of this problem via the Lagrange multiplier method. Unlike standard Q-learning, our algorithm updates two parameters: a Q-value parameter and a policy parameter. The Q-value parameter is updated on a slower timescale than the policy parameter. Whereas Q-learning with function approximation can diverge in some cases, our algorithm is observed to converge as a result of this timescale separation. In experiments on a constrained routing problem in a multistage queueing network, the algorithm exhibits good performance and the inequality constraints are satisfied upon convergence.
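
To make the two-timescale structure concrete, the following is a minimal sketch of how such an update might look: a toy finite MDP, a linear Q-value approximation, a softmax policy parameter updated on the faster timescale, and a Lagrange multiplier performing projected dual ascent on the constraint. The relaxed objective has the form L(theta, lambda) = J(theta) + sum_i lambda_i (G_i(theta) - c_i), where J is the long-run average cost and the G_i are average constraint costs. The MDP, feature map, step sizes, and exact update rules below are illustrative assumptions, not the authors' algorithm.

    import numpy as np

    rng = np.random.default_rng(0)

    n_states, n_actions, dim = 5, 2, 6
    phi = rng.standard_normal((n_states, n_actions, dim))             # feature map phi(s, a)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transition kernel
    cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # running cost
    g = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # constraint cost
    bound = 0.5  # inequality constraint: long-run average of g <= bound

    theta = np.zeros(dim)                # Q-value parameter (slower timescale)
    w = np.zeros((n_states, n_actions))  # policy parameter (faster timescale)
    lam = 0.0                            # Lagrange multiplier (slowest timescale)
    avg = 0.0                            # running estimate of average Lagrangian cost

    def policy(s):
        """Softmax policy over the policy parameter w (low w preferred, as we minimise cost)."""
        z = np.exp(-(w[s] - w[s].min()))
        return z / z.sum()

    s = 0
    for t in range(1, 100_001):
        a_fast = 1.0 / t ** 0.6  # policy parameter step size (faster)
        a_slow = 1.0 / t ** 0.8  # Q-value parameter step size (slower)
        a_dual = 1.0 / t         # Lagrange multiplier step size (slowest)

        pi = policy(s)
        a = rng.choice(n_actions, p=pi)
        s2 = rng.choice(n_states, p=P[s, a])

        # One-step Lagrangian cost: running cost plus penalised constraint excess.
        c = cost[s, a] + lam * (g[s, a] - bound)

        q_sa = phi[s, a] @ theta
        q_next = min(phi[s2, b] @ theta for b in range(n_actions))

        # Average-cost temporal-difference error (relative value iteration style).
        delta = c - avg + q_next - q_sa

        theta += a_slow * delta * phi[s, a]  # slower: Q-value parameter
        avg += a_slow * (c - avg)

        w[s, a] += a_fast * (q_sa - w[s, a])  # faster: policy parameter tracks Q-values

        lam = max(0.0, lam + a_dual * (g[s, a] - bound))  # dual ascent, projected to >= 0

        s = s2

    print(f"lambda = {lam:.3f}, avg Lagrangian cost = {avg:.3f}")

The step-size exponents encode the timescale separation: the policy parameter sees the larger (faster) step size, the Q-value parameter a smaller one, and the multiplier the smallest, in the spirit of multi-timescale stochastic approximation.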

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Institute of Electrical and Electronics Engineers.
Keywords: Q-Learning With Linear Function Approximation; Constrained MDP; Lagrange Multiplier Method; Reinforcement Learning; Multi-Stage Stochastic Shortest Path Problem.
ID Code: 116677
Deposited On: 12 Apr 2021 07:22
Last Modified: 12 Apr 2021 07:22
