Bounds for off-policy prediction in reinforcement learning

Joseph, Ajin George; Bhatnagar, Shalabh (2017) Bounds for off-policy prediction in reinforcement learning. In: International Joint Conference on Neural Networks (IJCNN), 14-19 May 2017, Anchorage, AK.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/IJCNN.2017.7966359

Abstract

In this paper, we provide, for the first time, error bounds for off-policy prediction in reinforcement learning. The primary objective in off-policy prediction is to estimate the value function of a given target policy under a linear function approximation architecture, using a sample trajectory generated by a behaviour policy that is possibly different from the target policy. The stability of off-policy prediction was an open question for a long time; only recently did Yu provide a general proof, which makes our results more relevant to the reinforcement learning community. Off-policy prediction is useful in complex reinforcement learning settings where a trajectory under the target policy is hard to obtain and one has to rely on the observed behaviour of the system under an arbitrary policy. We provide here an error bound on the solution of the off-policy prediction problem in terms of a closeness measure between the target and the behaviour policies.
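
To make the setting concrete, the following is a minimal sketch of off-policy TD(0) with linear function approximation and per-step importance sampling: a trajectory is generated by the behaviour policy mu, while the weights theta estimate the value function of the target policy pi. All names and the random MDP (n_states, phi, P, R, alpha, gamma) are illustrative assumptions and not taken from the paper; the paper's bounds concern the quality of the solution this kind of scheme converges to, not this particular update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_features = 5, 2, 3
gamma, alpha = 0.9, 0.05

# Random feature matrix: one feature vector per state (assumed for the example).
phi = rng.normal(size=(n_states, n_features))

# Target policy pi and behaviour policy mu (rows are action distributions).
pi = rng.dirichlet(np.ones(n_actions), size=n_states)
mu = rng.dirichlet(np.ones(n_actions), size=n_states)

# Simple random MDP dynamics and rewards (assumed for the example).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

theta = np.zeros(n_features)  # linear value-function weights
s = 0
for _ in range(10_000):
    a = rng.choice(n_actions, p=mu[s])        # act with the behaviour policy
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]
    rho = pi[s, a] / mu[s, a]                 # importance-sampling ratio
    td_error = r + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * rho * td_error * phi[s]  # off-policy TD(0) update
    s = s_next

print("estimated values of the target policy:", phi @ theta)
```

The stability question mentioned in the abstract arises exactly here: with linear function approximation, such importance-weighted TD updates are not guaranteed to converge in general, which is why the recent convergence results and the error bounds discussed in the paper matter.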

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to the Institute of Electrical and Electronics Engineers (IEEE).
ID Code: 116647
Deposited On: 12 Apr 2021 07:17
Last Modified: 12 Apr 2021 07:17