Multi-Step Dyna Planning for Policy Evaluation and Control

Yao, Hengshuai ; Bhatnagar, Shalabh ; Diao, Dongcui ; Sutton, Richard S ; Szepesvári, Csaba (2009) Multi-Step Dyna Planning for Policy Evaluation and Control In: Advances in Neural Information Processing Systems, Dec. 7-11, Bangalore, India.

Full text not available from this repository.

Official URL: https://proceedings.neurips.cc/paper/2009/file/c52...

Abstract

We extend the Dyna planning architecture for policy evaluation and control in two significant respects. First, we introduce multi-step Dyna planning, which projects the simulated state/feature many steps into the future. Our multi-step Dyna is based on a multi-step model, which we call the λ-model. The λ-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. Second, for Dyna control we use a dynamic multi-step model that is able to predict the results of a sequence of greedy actions and track the optimal policy in the long run. Experimental results show that Dyna using the multi-step model evaluates a policy faster than Dyna using single-step models; Dyna control algorithms using the dynamic tracking model are much faster than model-free algorithms; further, multi-step Dyna control algorithms enable the policy and value function to converge much faster to their optima than single-step Dyna algorithms.
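
As a rough illustration of what "interpolating between a one-step and an infinite-step model" can mean, the sketch below mixes the k-step powers of a small transition matrix with geometric weights (1 - lam) * lam**(k-1). This weighting is an assumption borrowed from the familiar λ-return and is not taken from the paper; the names matmul and lambda_mixture, the truncation horizon, and the example matrix are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lambda_mixture(F, lam, horizon=200):
    """Truncated sum over k = 1..horizon of (1 - lam) * lam**(k-1) * F**k.

    ASSUMPTION: geometric weighting chosen for illustration only;
    this is not necessarily the paper's definition of the lambda-model.
    """
    n = len(F)
    mix = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in F]   # holds F**k, starting at k = 1
    weight = 1.0 - lam              # holds (1 - lam) * lam**(k-1)
    for _ in range(horizon):
        for i in range(n):
            for j in range(n):
                mix[i][j] += weight * power[i][j]
        power = matmul(power, F)
        weight *= lam
    return mix


# Hypothetical two-state chain in which state 1 is absorbing.
F = [[0.5, 0.5],
     [0.0, 1.0]]
one_step = lambda_mixture(F, 0.0)    # lam = 0 recovers the one-step model
long_range = lambda_mixture(F, 0.9)  # larger lam weights longer horizons
```

With lam = 0 the mixture is exactly the one-step matrix; as lam approaches 1, more weight falls on long-horizon predictions, so the projected state moves closer to where the chain ends up in the long run.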

Item Type:	Conference or Workshop Item (Paper)
Source:	Copyright 2010 by the author(s)/owner(s).
ID Code:	116688
Deposited On:	12 Apr 2021 07:24
Last Modified:	12 Apr 2021 07:24