Sutton, Richard S.; Maei, Hamid Reza; Precup, Doina; Bhatnagar, Shalabh; Silver, David; Szepesvári, Csaba; Wiewiora, Eric (2009) Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: 26th International Conference on Machine Learning, June 14-18, 2009, Montreal, Canada.
Official URL: http://doi.org/10.1145/1553374.1553501
Abstract
Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.
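To make the abstract's description of TDC concrete, below is a minimal Python sketch of a single TDC update, following the update rules given in the paper: the conventional linear-TD update plus a correction term that is zero at initialization (since the secondary weights start at zero). The function and parameter names (`tdc_update`, `alpha`, `beta`) and the usage values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC step on a single transition (phi -> phi_next, reward).

    theta: main weight vector; w: secondary weight vector. Both start
    at zero, so the correction term below is initially zero, as the
    abstract notes.
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    # Conventional linear-TD term plus the gradient-correction term
    theta = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
    # Secondary weights track the expected TD error as a function of phi
    w = w + beta * (delta - phi @ w) * phi
    return theta, w

# Illustrative usage with random features (sizes and step sizes are assumed)
n = 4
theta, w = np.zeros(n), np.zeros(n)
phi, phi_next = np.random.rand(n), np.random.rand(n)
theta, w = tdc_update(theta, w, phi, phi_next, reward=1.0,
                      gamma=0.9, alpha=0.1, beta=0.05)
```

Each update is a handful of O(n) vector operations, which is consistent with the abstract's claim that TDC extends linear TD to off-policy learning while only doubling computational requirements.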
| Item Type: | Conference or Workshop Item (Paper) |
| --- | --- |
| Source: | Copyright 2009 by the author(s)/owner(s). |
| ID Code: | 116711 |
| Deposited On: | 12 Apr 2021 07:25 |
| Last Modified: | 12 Apr 2021 07:25 |