Bhatnagar, S.; Kumar, S. (2005) A reinforcement learning based algorithm for Markov decision processes In: International Conference on Intelligent Sensing and Information Processing, 4-7 Jan. 2005, Chennai, India.
Full text not available from this repository.
Official URL: http://doi.org/10.1109/ICISIP.2005.1529448
Related URL: http://dx.doi.org/10.1109/ICISIP.2005.1529448
Abstract
A variant of a recently proposed two-timescale, reinforcement-learning-based actor-critic algorithm is proposed for infinite-horizon discounted-cost Markov decision processes with finite state and compact action spaces. On the faster timescale, the value function corresponding to a given stationary deterministic policy is updated and averaged, while the policy itself is updated on the slower timescale. The slower recursion uses the sign of the gradient estimate instead of the estimate itself. A potential advantage of using the sign function is significantly reduced computation and communication overhead in applications such as congestion control in communication networks and distributed computation. The convergence analysis of the algorithm is briefly sketched, and numerical experiments on a congestion control problem are presented.
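The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simultaneous-perturbation style gradient estimate, a hypothetical three-state toy MDP with actions in [0, 1], and illustrative step sizes; the names `sign_policy_step` and `run_sketch` are invented for this example. The faster timescale is approximated by several critic sweeps with a constant step size per policy update, and the slower policy update uses only the sign of the gradient estimate.

```python
import numpy as np


def sign_policy_step(theta, grad_est, step, low=0.0, high=1.0):
    """Slower-timescale update: move each component by a fixed step against
    the sign of the gradient estimate, then project back onto [low, high]."""
    return np.clip(theta - step * np.sign(grad_est), low, high)


def run_sketch(num_iters=200, inner_sweeps=20, gamma=0.9, c=0.1,
               critic_step=0.1, seed=0):
    """Illustrative two-timescale loop on a hypothetical toy MDP (assumption)."""
    rng = np.random.default_rng(seed)
    n_states = 3
    # Assumed toy model: the action a in [0, 1] mixes two transition matrices.
    P0 = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
    P1 = np.array([[0.1, 0.45, 0.45], [0.45, 0.1, 0.45], [0.45, 0.45, 0.1]])
    target = np.array([0.2, 0.5, 0.8])  # assumed per-state cheapest action

    def cost(s, a):
        return (a - target[s]) ** 2 + 0.1 * s

    def next_state(s, a):
        p = (1.0 - a) * P0[s] + a * P1[s]
        return rng.choice(n_states, p=p)

    theta = np.full(n_states, 0.5)  # deterministic policy: one action per state
    v_plus = np.zeros(n_states)     # value estimate under theta + c * delta
    v_minus = np.zeros(n_states)    # value estimate under theta - c * delta

    for n in range(1, num_iters + 1):
        # Random +/-1 perturbation, held fixed during the critic sweeps.
        delta = rng.choice([-1.0, 1.0], size=n_states)
        th_plus = np.clip(theta + c * delta, 0.0, 1.0)
        th_minus = np.clip(theta - c * delta, 0.0, 1.0)

        # Faster timescale (stand-in): averaged value updates from simulated
        # transitions for each of the two perturbed policies.
        for _ in range(inner_sweeps):
            for s in range(n_states):
                for v, th in ((v_plus, th_plus), (v_minus, th_minus)):
                    s_next = next_state(s, th[s])
                    td_error = cost(s, th[s]) + gamma * v[s_next] - v[s]
                    v[s] += critic_step * td_error

        # Slower timescale: only the sign of the gradient estimate is used.
        grad_est = (v_plus - v_minus) / (2.0 * c * delta)
        theta = sign_policy_step(theta, grad_est, step=0.1 / n)

    return theta, v_plus


if __name__ == "__main__":
    theta, values = run_sketch()
    print("policy:", theta)
    print("value estimate:", values)
```

Because each policy component moves by a fixed step in a direction given by one bit of information, a distributed implementation would only need to communicate the sign rather than a real-valued gradient, which is the kind of overhead reduction the abstract refers to.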
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Source: | Copyright of this article belongs to Institute of Electrical and Electronics Engineers. |
ID Code: | 116735 |
Deposited On: | 12 Apr 2021 07:29 |
Last Modified: | 12 Apr 2021 07:29 |