A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games

Diddigi, Raghuram Bharadwaj; Kamanchi, Chandramouli; Bhatnagar, Shalabh (2022) A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games. IEEE Transactions on Automatic Control, 67 (9). pp. 4816-4823. ISSN 0018-9286

Full text not available from this repository.

Official URL: http://doi.org/10.1109/TAC.2022.3159453

Related URL: http://dx.doi.org/10.1109/TAC.2022.3159453

Abstract

We consider the problem of two-player zero-sum games. This problem is formulated as a min–max Markov game in this article. The solution of this game, which is the min–max payoff, starting from a given state is called the min–max value of the state. In this article, we compute the solution of the two-player zero-sum game using the technique of successive relaxation, which has been successfully applied in the literature to obtain a faster value iteration algorithm in the context of Markov decision processes. We extend the concept of successive relaxation to the setting of two-player zero-sum games. We show that, under a special structure on the game, this technique facilitates faster computation of the min–max value of the states. We then derive a generalized minimax Q-learning algorithm, which computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm utilizing stochastic approximation techniques, under an assumption on the boundedness of iterates. Through experiments, we demonstrate the
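The successive relaxation idea summarized in the abstract can be illustrated with a minimal sketch of relaxed value iteration for a known-model zero-sum game. Everything concrete here is an assumption made for illustration and is not taken from the article: the two-state game (R, P), the discount factor ALPHA, the function names, the reading of the "special structure" as strictly positive self-transition probabilities, and the upper relaxation limit 1 / (1 - ALPHA * min self-transition probability). The sketch covers only the model-based iteration, not the Q-learning algorithm, and it solves each stage game with the closed form for 2x2 matrix games rather than a general linear program.

```python
# Hypothetical two-state zero-sum game; each player has two actions.
# R[i][u][v]: payoff to the maximizer in state i under actions (u, v).
# P[i][u][v][j]: probability of moving from state i to state j.
ALPHA = 0.9  # discount factor (assumed)
R = [[[1.0, -1.0], [-1.0, 1.0]],
     [[2.0, 0.0], [0.0, 1.0]]]
P = [[[[0.6, 0.4], [0.7, 0.3]], [[0.8, 0.2], [0.6, 0.4]]],
     [[[0.3, 0.7], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]]


def matrix_game_value_2x2(A):
    """Min-max value of a 2x2 matrix game (row player maximizes)."""
    (a, b), (c, d) = A
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:
        # A pure-strategy saddle point exists.
        return maxmin
    # Otherwise both players mix; closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def w_star():
    """Assumed upper limit of the relaxation parameter."""
    min_self = min(P[i][u][v][i]
                   for i in range(2) for u in range(2) for v in range(2))
    return 1.0 / (1.0 - ALPHA * min_self)


def relaxed_value_iteration(w, tol=1e-10, max_iters=100000):
    """Iterate V <- w * T(V) + (1 - w) * V, where T is the min-max
    Bellman operator. w = 1 recovers standard value iteration."""
    V = [0.0, 0.0]
    for n in range(1, max_iters + 1):
        new_V = []
        for i in range(2):
            # Stage game: immediate payoff plus discounted continuation.
            A = [[R[i][u][v]
                  + ALPHA * sum(P[i][u][v][j] * V[j] for j in range(2))
                  for v in range(2)] for u in range(2)]
            new_V.append(w * matrix_game_value_2x2(A) + (1.0 - w) * V[i])
        diff = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if diff < tol:
            return V, n
    return V, max_iters


if __name__ == "__main__":
    for w in (1.0, w_star()):
        values, iters = relaxed_value_iteration(w)
        print(f"w = {w:.4f}: values = {values}, iterations = {iters}")
```

Under these assumptions the relaxed operator has the same fixed point as the standard one for any w > 0, so both runs should report the same min-max values; the iteration counts printed for w = 1 and the larger w can then be compared directly.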

Item Type: Article
Source: Copyright of this article belongs to Institute of Electrical and Electronics Engineers.
ID Code: 133781
Deposited On: 30 Dec 2022 07:28
Last Modified: 30 Dec 2022 07:28
