Generalized Second-Order Value Iteration in Markov Decision Processes

Kamanchi, Chandramouli; Diddigi, Raghuram Bharadwaj; Bhatnagar, Shalabh (2022). Generalized Second-Order Value Iteration in Markov Decision Processes. IEEE Transactions on Automatic Control, 67 (8), pp. 4241-4247. ISSN 0018-9286

Full text not available from this repository.

Official URL: http://doi.org/10.1109/TAC.2021.3112851

Related URL: http://dx.doi.org/10.1109/TAC.2021.3112851

Abstract

Value iteration is a fixed-point iteration technique used to obtain the optimal value function and policy in a discounted-reward Markov decision process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and may therefore take a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique for solving fixed-point equations. It has been shown in the literature that, under a special structure of the MDP, the successive overrelaxation technique computes the optimal value function faster than standard value iteration. In this article, we propose a second-order value iteration procedure obtained by applying the Newton–Raphson method to the successive relaxation value iteration scheme. We prove that our algorithm converges globally and asymptotically to the optimal solution, and show that the convergence is second-order. Through experiments, we demonstrate the effectiveness of the proposed approach.
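As an illustration of the first-order baseline the abstract refers to, the following is a minimal sketch of standard value iteration and a successive relaxation variant on a toy MDP. It is not the second-order algorithm proposed in the article. The function name `relaxed_value_iteration`, the array layout of `P` and `R`, the toy numbers, and the form of the relaxed update (a weighted combination of the Bellman backup and the current value, with relaxation parameter `w` bounded by `1 / (1 - gamma * min self-transition probability)`) are assumptions made for this example, based on the usual presentation of successive overrelaxation for MDPs.

```python
import numpy as np


def relaxed_value_iteration(P, R, gamma, w=1.0, tol=1e-10, max_iter=100_000):
    """Value iteration with relaxation parameter w (w = 1 is standard VI).

    P: array of shape (A, S, S), P[a, s, j] = Pr(next state j | state s, action a)
    R: array of shape (S, A), expected one-step reward
    Returns the final value estimate and the number of iterations used.
    """
    V = np.zeros(R.shape[0])
    for k in range(1, max_iter + 1):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_j P[a, s, j] * V[j]
        Q = R + gamma * np.einsum("asj,j->sa", P, V)
        # Relaxed update: blend the backup with the current value, then maximize
        V_new = np.max(w * Q + (1.0 - w) * V[:, None], axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, k
        V = V_new
    return V, max_iter


# Hypothetical 2-state, 2-action MDP in which every self-transition
# probability is positive (the "special structure" assumed here).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.6, 0.4], [0.3, 0.7]],  # action 1
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9

# Largest relaxation parameter permitted under the assumed bound
p_self_min = min(P[a, s, s] for a in range(P.shape[0]) for s in range(P.shape[1]))
w_star = 1.0 / (1.0 - gamma * p_self_min)

V_vi, k_vi = relaxed_value_iteration(P, R, gamma, w=1.0)
V_sor, k_sor = relaxed_value_iteration(P, R, gamma, w=w_star)
print("standard VI iterations:", k_vi)
print("relaxed VI iterations: ", k_sor)
```

For any `w > 0` the relaxed update has the same fixed point as the standard Bellman operator, so both runs should agree on the value function; under the assumptions above, the relaxed run is expected to need fewer iterations because its contraction factor is `1 - w * (1 - gamma)` rather than `gamma`.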

Item Type: Article
Source: Copyright of this article belongs to Institute of Electrical and Electronics Engineers.
ID Code: 133780
Deposited On: 30 Dec 2022 07:25
Last Modified: 30 Dec 2022 07:25
