Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

Karmakar, Prasenjit; Bhatnagar, Shalabh (2018). Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. Mathematics of Operations Research, 43 (1), pp. 130-151. ISSN 0364-765X


Official URL: http://doi.org/10.1287/moor.2017.0855


Abstract

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by “controlled” Markov noise. In particular, the faster and slower recursions have nonadditive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation, using our results.
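The abstract's two time-scale structure, a fast recursion and a slow recursion with different step sizes, is the basis of gradient-style off-policy TD algorithms such as TDC/GTD. The sketch below is an illustrative instance of that scheme, not the paper's construction: the two-state MDP, the target/behavior policies, the tabular (one-hot linear) features, and the step sizes are all assumptions chosen so the fixed point is easy to check by hand.

```python
import numpy as np

# A minimal sketch of off-policy TD learning with linear function
# approximation via two time-scale updates (TDC/GTD-style).
# MDP, policies, and step sizes are illustrative assumptions.

rng = np.random.default_rng(0)
gamma = 0.9
n_states = 2

# Action a deterministically moves to state a; reward is 1 iff the
# next state is 1.
pi = 0.9   # target policy: P(a = 1 | s)
mu = 0.5   # behavior policy: P(a = 1 | s)

def phi(s):
    """One-hot features: a special case of linear function approximation."""
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

theta = np.zeros(n_states)  # slow iterate: value-function weights
w = np.zeros(n_states)      # fast iterate: gradient-correction weights
alpha, beta = 0.01, 0.1     # slow and fast step sizes (beta >> alpha)

s = 0
for t in range(100_000):
    a = int(rng.random() < mu)  # sample the action from the behavior policy
    rho = (pi if a == 1 else 1.0 - pi) / (mu if a == 1 else 1.0 - mu)
    s_next = a                  # deterministic transition
    r = float(s_next == 1)
    f, f_next = phi(s), phi(s_next)
    delta = r + gamma * (theta @ f_next) - theta @ f  # TD error
    # Slow recursion: value weights, with the gradient-correction term.
    theta += alpha * rho * (delta * f - gamma * f_next * (w @ f))
    # Fast recursion: tracks the expected TD error per state.
    w += beta * rho * (delta - w @ f) * f
    s = s_next

# Under the target policy, both states have true value v = 0.9 + 0.9 v,
# i.e. v = 9.0, which theta should approach.
print(theta)
```

With tabular features the TD fixed point equals the true value function, so both components of `theta` settle near 9.0; the fast iterate `w` converges on its own (faster) time scale, which is what justifies treating `theta` as quasi-static in the analysis.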

Item Type: Article
Source: Copyright of this article belongs to The Institute for Operations Research and the Management Sciences.
Keywords: Markov Noise; Two Time-Scale Stochastic Approximation; Asymptotic Convergence; Temporal-Difference Learning.
ID Code: 116465
Deposited On: 12 Apr 2021 05:58
Last Modified: 12 Apr 2021 05:58
