Q-learning for Markov decision processes with a satisfiability criterion

Shah, Suhail M.; Borkar, Vivek S. (2018). Q-learning for Markov decision processes with a satisfiability criterion. Systems & Control Letters, 113, pp. 45-51. ISSN 0167-6911

Full text not available from this repository.

Official URL: http://doi.org/10.1016/j.sysconle.2018.01.003

Abstract

A reinforcement learning algorithm is proposed to solve a multi-criterion Markov decision process (MDP), i.e., an MDP with a vector running cost. Specifically, it combines a Q-learning scheme for a weighted linear combination of the prescribed running costs with an incremental version of replicator dynamics that updates the weights. The objective is for the time-averaged vector cost to meet prescribed asymptotic bounds. Under mild assumptions, the scheme is shown to achieve this objective.
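The coupling described in the abstract can be illustrated with a rough sketch. It is not the authors' scheme, whose exact form is not given in this record. It assumes an average-cost (relative value iteration) variant of Q-learning, a replicator payoff equal to the running average cost minus its bound, a faster step size for Q than for the weights, and a randomly generated toy MDP. All names (`replicator_step`, `rvi_q_step`, `run_sketch`), the bounds, and the step-size schedules are hypothetical.

```python
import numpy as np


def replicator_step(weights, payoffs, step):
    """One incremental (Euler) replicator step on the probability simplex."""
    weights = np.asarray(weights, dtype=float)
    payoffs = np.asarray(payoffs, dtype=float)
    mean_payoff = weights @ payoffs
    new = weights + step * weights * (payoffs - mean_payoff)
    new = np.clip(new, 1e-12, None)  # guard against numerical drift off the simplex
    return new / new.sum()


def rvi_q_step(Q, s, a, cost, s_next, step, ref=(0, 0)):
    """In-place relative value iteration Q-learning update (average cost).

    Assumption: the reference entry Q[ref] is subtracted to keep iterates bounded.
    """
    target = cost + Q[s_next].min() - Q[ref]
    Q[s, a] += step * (target - Q[s, a])


def run_sketch(n_states=3, n_actions=2, n_costs=2, n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical toy MDP: random transition kernel and random vector running cost.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # shape (S, A, S)
    costs = rng.uniform(0.0, 1.0, size=(n_costs, n_states, n_actions))
    bounds = np.full(n_costs, 0.6)  # assumed bounds on the time-averaged costs

    Q = np.zeros((n_states, n_actions))
    w = np.full(n_costs, 1.0 / n_costs)  # weights start at the simplex centre
    avg = np.zeros(n_costs)              # running time average of the vector cost
    s = 0
    for n in range(1, n_steps + 1):
        # Epsilon-greedy action with respect to the current weighted-cost Q-values.
        if rng.random() < 0.1:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmin())
        s_next = int(rng.choice(n_states, p=P[s, a]))
        c = costs[:, s, a]

        # Faster timescale: Q-learning on the weighted linear combination of costs.
        rvi_q_step(Q, s, a, float(w @ c), s_next, step=1.0 / n ** 0.6)

        # Slower timescale: shift weight toward the costs furthest above their bounds.
        avg += (c - avg) / n
        w = replicator_step(w, avg - bounds, step=1.0 / n)
        s = s_next
    return Q, w, avg
```

Calling `run_sketch()` returns the learned Q-table, the final weights, and the running average of the vector cost. Whether the averages end up within the bounds depends on the toy MDP and on the assumptions above, so this run illustrates the mechanics only and does not reproduce the paper's guarantee.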

Item Type: Article
Source: Copyright of this article belongs to Elsevier Science.
ID Code: 135156
Deposited On: 19 Jan 2023 10:32
Last Modified: 19 Jan 2023 10:32
