Yao, Hengshuai ; Bhatnagar, Shalabh ; Diao, Dongcui ; Sutton, Richard S ; Szepesvári, Csaba (2009) Multi-Step Dyna Planning for Policy Evaluation and Control. In: Advances in Neural Information Processing Systems, Dec. 7-11, Bangalore, India.
Full text not available from this repository.
Official URL: https://proceedings.neurips.cc/paper/2009/file/c52...
Abstract
We extend the Dyna planning architecture for policy evaluation and control in two significant respects. First, we introduce multi-step Dyna planning, which projects the simulated state/feature many steps into the future. Our multi-step Dyna is based on a multi-step model, which we call the λ-model. The λ-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. Second, for Dyna control we use a dynamic multi-step model that can predict the results of a sequence of greedy actions and track the optimal policy in the long run. Experimental results show that Dyna using the multi-step model evaluates a policy faster than Dyna using single-step models; that Dyna control algorithms using the dynamic tracking model are much faster than model-free algorithms; and that multi-step Dyna control algorithms enable the policy and value function to converge much faster to their optima than single-step Dyna algorithms.
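The idea of interpolating between a one-step and a long-horizon model can be pictured with a small sketch. This is not the paper's λ-model or its learning rule; it is a minimal illustration that assumes a known tabular transition matrix and reward vector (the hypothetical `P` and `R` below) and uses the standard λ-weighted mixture of n-step backups, truncated at an assumed `horizon`. With `lam=0` it reduces to the one-step backup; with `lam=1` all weight falls on the longest backup.

```python
"""Illustrative sketch only: lambda-weighted multi-step backups on a toy chain.

All names (P, R, GAMMA, horizon) are assumptions made for this example,
not quantities taken from the paper.
"""


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of lists) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def n_step_backups(P, r, v, gamma, horizon):
    """Yield the 1-step, 2-step, ..., horizon-step backups of v.

    Each backup applies v <- r + gamma * P v once more to the previous one.
    """
    current = list(v)
    for _ in range(horizon):
        projected = matvec(P, current)
        current = [ri + gamma * pi for ri, pi in zip(r, projected)]
        yield current


def lambda_backup(P, r, v, gamma, lam, horizon=50):
    """Mix the n-step backups with weights (1 - lam) * lam**(n - 1).

    The weight left over after truncation, lam**(horizon - 1), is placed on
    the longest backup so that the weights always sum to one.
    """
    mixed = [0.0] * len(v)
    backups = n_step_backups(P, r, v, gamma, horizon)
    for n, backup in enumerate(backups, start=1):
        if n < horizon:
            weight = (1.0 - lam) * lam ** (n - 1)
        else:
            weight = lam ** (horizon - 1)
        mixed = [m + weight * b for m, b in zip(mixed, backup)]
    return mixed


def evaluate(P, r, gamma, lam, horizon=50, tol=1e-10, max_sweeps=10_000):
    """Repeat the lambda backup from v = 0 until the largest change < tol.

    Returns the value estimate and the number of sweeps performed.
    """
    v = [0.0] * len(r)
    for sweep in range(1, max_sweeps + 1):
        new_v = lambda_backup(P, r, v, gamma, lam, horizon)
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change < tol:
            return v, sweep
    return v, max_sweeps


# Hypothetical 3-state chain under a fixed policy; reward only in state 2.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [0.0, 0.0, 1.0]
GAMMA = 0.9

if __name__ == "__main__":
    for lam in (0.0, 0.5, 0.9, 1.0):
        values, sweeps = evaluate(P, R, GAMMA, lam)
        print(f"lam={lam}: sweeps={sweeps}, values={values}")
```

In this toy setting every choice of `lam` converges to the same value estimate, and larger `lam` needs fewer sweeps because each sweep looks further ahead. Each sweep also costs more computation, which is the trade-off a multi-step model has to manage.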
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Source: | Copyright 2010 by the author(s)/owner(s). |
| ID Code: | 116688 |
| Deposited On: | 12 Apr 2021 07:24 |
| Last Modified: | 12 Apr 2021 07:24 |