Bhatnagar, Shalabh; Borkar, Vivek S.; Prashanth, L. A. (2013) Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning. In: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 23. John Wiley & Sons, Inc., pp. 517-534. ISBN 9781118453988
Full text not available from this repository.
Official URL: http://doi.org/10.1002/9781118453988.ch23
Related URL: http://dx.doi.org/10.1002/9781118453988.ch23
Abstract
This chapter presents a novel feature adaptation scheme based on temporal difference (TD) learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors is picked. The chapter applies the algorithm to a prediction problem in traffic signal control and observes good performance over two different network settings. As future work, the chapter considers applying the TD learning algorithm together with other schemes such as least squares temporal difference (LSTD) learning and least squares policy evaluation (LSPE).
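The two replacement steps described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not say how the "worst" basis vector is identified, so the sketch assumes columns are ranked by the magnitude of their TD weight (smallest first). The function names `td0_weights` and `adapt_features`, the Gaussian random replacement column, and the plain TD(0) update on a simulated Markov chain are likewise assumptions made for illustration.

```python
import numpy as np


def td0_weights(Phi, P, r, gamma, theta, n_steps, alpha, rng):
    """Run TD(0) with linear features Phi on a simulated Markov chain.

    Phi: (n_states, n_features) feature matrix, P: transition matrix,
    r: per-state reward (or cost) vector. All are illustrative inputs.
    """
    n_states = Phi.shape[0]
    s = rng.integers(n_states)
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal difference error for the transition s -> s_next.
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]
        s = s_next
    return theta


def adapt_features(Phi, theta, rng):
    """One feature adaptation step (hypothetical ranking criterion).

    ASSUMPTION: the "worst" column is the one with the smallest |theta_j|.
    """
    order = np.argsort(np.abs(theta))
    worst, second_worst = order[0], order[1]

    new_Phi = Phi.copy()

    # (a) Exploitation: replace the worst column with the normalized
    # current value-function estimate Phi @ theta.
    v = Phi @ theta
    v_norm = np.linalg.norm(v)
    if v_norm > 0:
        new_Phi[:, worst] = v / v_norm

    # (b) Exploration: replace the second-worst column with a random
    # unit vector (ASSUMPTION: Gaussian direction), changing the subspace.
    rand_col = rng.standard_normal(Phi.shape[0])
    new_Phi[:, second_worst] = rand_col / np.linalg.norm(rand_col)

    return new_Phi, worst, second_worst
```

In such a sketch, one would alternate between running `td0_weights` for a fixed number of steps and calling `adapt_features`. How the weights are carried over after each replacement is left open here, since the abstract does not specify it.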
Item Type: | Book |
---|---|
Source: | Copyright of this article belongs to John Wiley & Sons, Inc. |
Keywords: | Convergence Analysis; Feature Adaptation Scheme; Online Feature Adaptation; Reinforcement Learning; Temporal Difference (TD) Learning; Traffic Signal Control. |
ID Code: | 116477 |
Deposited On: | 12 Apr 2021 05:47 |
Last Modified: | 12 Apr 2021 05:47 |