Two-timescale Algorithms For Learning Nash Equilibria In General-sum Stochastic Games

H.L., Prasad ; L.A., Prashanth ; Bhatnagar, Shalabh (2015) Two-timescale Algorithms For Learning Nash Equilibria In General-sum Stochastic Games In: Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), May 4-8, 2015, Istanbul, Turkey.

Full text not available from this repository.

Official URL: https://dl.acm.org/doi/10.5555/2772879.2773328

Abstract

We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from [9] to a general Nplayer game setting. Next, we break down the optimization problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game and for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions. Using these conditions, we develop two provably convergent algorithms. The first algorithm - OFF-SGSP - is centralized and model-based, i.e., it assumes complete information of the game. The second algorithm - ON-SGSP - is an online model-free algorithm. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single state non-generic game [12] as well as on a synthetic two-player game setup with 810, 000 states, we establish that ON-SGSP consistently outperforms NashQ [16] and FFQ [21] algorithms.

Item Type:Conference or Workshop Item (Paper)
Source:Copyright of this article belongs to International Foundation for Autonomous Agents and Multiagent Systems.
Keywords:Stochastic Games; Multi Agent Reinforcement Learning; Nash Equilibrium; Two Timescale Stochastic Approximation.
ID Code:116655
Deposited On:12 Apr 2021 07:18
Last Modified:12 Apr 2021 07:18

Repository Staff Only: item control page