Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

Bhatnagar, Shalabh; Precup, Doina; Silver, David; Sutton, Richard S.; Maei, Hamid; Szepesvári, Csaba (2009) Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. In: NIPS'09: Proceedings of the 22nd International Conference on Neural Information Processing Systems, December 2009.

Full text not available from this repository.

Official URL: https://papers.nips.cc/paper/2009/file/3a15c7d0bbe...

Abstract

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning, and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a, 2009b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient descent on this function. These methods can be viewed as natural generalizations of previous TD methods, as they converge to the same limit points when used with linear function approximation. We generalize this work to nonlinear function approximation. We present a Bellman error objective function and two gradient-descent TD algorithms that optimize it. We prove the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution. The algorithms are incremental and the computational complexity per time step scales linearly with the number of parameters of the approximator. Empirical results obtained in the game of Go demonstrate the algorithms' effectiveness.
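The two gradient-descent TD algorithms referred to above (nonlinear GTD2 and nonlinear TDC) maintain a second weight vector alongside the value-function parameters and add a Hessian-vector-product correction term to the usual TD update. Below is a minimal sketch of a single GTD2-style update in JAX, assuming a small tanh network as the smooth approximator; the network sizes, step sizes, and function names are illustrative choices rather than details from the paper, and the projection onto a compact parameter set used in the convergence analysis is omitted.

import jax
import jax.numpy as jnp

S_DIM, H_DIM = 4, 8                        # illustrative state / hidden sizes
N_PARAMS = H_DIM * S_DIM + 2 * H_DIM + 1   # flat parameter count

def v(theta, s):
    # Smooth value approximator V_theta(s); theta is one flat vector.
    W1 = theta[:H_DIM * S_DIM].reshape(H_DIM, S_DIM)
    b1 = theta[H_DIM * S_DIM : H_DIM * S_DIM + H_DIM]
    w2 = theta[H_DIM * S_DIM + H_DIM : H_DIM * S_DIM + 2 * H_DIM]
    b2 = theta[-1]
    return jnp.dot(w2, jnp.tanh(W1 @ s + b1)) + b2

grad_v = jax.grad(v)                       # phi(s) = gradient of V_theta(s) w.r.t. theta

def hess_vec(theta, s, w):
    # Hessian-vector product (d^2 V_theta(s) / d theta^2) w, via forward-over-reverse autodiff.
    return jax.jvp(lambda t: grad_v(t, s), (theta,), (w,))[1]

def gtd2_step(theta, w, s, r, s_next, gamma=0.99, alpha=1e-3, beta=1e-2):
    # One GTD2-style update on a single transition (s, r, s_next).
    phi, phi_next = grad_v(theta, s), grad_v(theta, s_next)
    delta = r + gamma * v(theta, s_next) - v(theta, s)      # TD error
    phi_w = jnp.dot(phi, w)
    h = (delta - phi_w) * hess_vec(theta, s, w)             # curvature correction term
    theta = theta + alpha * ((phi - gamma * phi_next) * phi_w - h)
    w = w + beta * (delta - phi_w) * phi                    # secondary weight vector
    return theta, w

The TDC variant described in the paper differs only in the primary update, using delta * phi - gamma * phi_next * phi_w - h in place of (phi - gamma * phi_next) * phi_w - h; the secondary-weight update is unchanged.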

Item Type: Conference or Workshop Item (Paper)
Source: Copyright by the author(s)/owner(s).
ID Code: 116701
Deposited On: 12 Apr 2021 07:24
Last Modified: 12 Apr 2021 07:24
