Universal Option Models

Yao, Hengshuai; Szepesvari, Csaba; Sutton, Rich; Modayil, Joseph; Bhatnagar, Shalabh (2014) Universal Option Models. In: Advances in Neural Information Processing Systems (NIPS), Dec. 8-11, Montreal, Canada.

Full text not available from this repository.

Abstract

We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be efficiently computed. We introduce a new model for an option that is independent of any reward function, called the universal option model (UOM). We prove that the UOM of an option can construct a traditional option model given a reward function, and also supports efficient computation of the option-conditional return. We extend the UOM to linear function approximation, and we show the UOM gives the TD solution of option returns and the value function of a policy over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency. We demonstrate our method in two domains. The first domain is a real-time strategy game, where the controller must select the best game unit to accomplish a dynamically-specified task. The second domain is article recommendation, where each user query defines a new reward function and an article’s relevance is the expected return from following a policy that follows the citations between articles. Our experiments show that UOMs are substantially more efficient than previously known methods for evaluating option returns and policies over options.
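
The abstract states that a UOM is independent of any reward function and can yield option returns once a reward function is supplied, but it does not spell out the construction. As a rough illustration of that idea only, the sketch below assumes a small tabular setting in which the option's policy induces a known transition matrix `P`, the option terminates with probability `beta[s']` on arriving in state `s'`, and rewards depend only on the current state. Under those assumptions, a discounted state-occupancy matrix is one reward-independent quantity from which the return for any reward vector follows by a single matrix-vector product. All names here (`P`, `beta`, `gamma`, `discounted_occupancy`, `option_return`) are hypothetical and chosen for illustration; this is not the paper's learning algorithm, which works incrementally from data.

```python
import numpy as np


def discounted_occupancy(P, beta, gamma):
    """Reward-independent model (illustrative assumption, tabular case).

    U[s, s2] is the expected discounted number of visits to s2 when the
    option starts in s, counted until the option terminates.
    """
    n = P.shape[0]
    # Probability of moving s -> s2 and NOT terminating on arrival in s2.
    P_continue = P * (1.0 - beta)[np.newaxis, :]
    # Sum over k of (gamma * P_continue)^k, written in closed form.
    return np.linalg.inv(np.eye(n) - gamma * P_continue)


def option_return(U, r):
    """Expected discounted return of the option for state-reward vector r."""
    return U @ r


# Hypothetical 3-state chain: 0 -> 1 -> 2, option terminates on reaching 2.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
beta = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# Computed once, with no reward function in sight.
U = discounted_occupancy(P, beta, gamma)

# Two reward functions specified later reuse the same U.
r_a = np.array([1.0, 2.0, 5.0])
r_b = np.array([0.0, 1.0, 0.0])
returns_a = option_return(U, r_a)
returns_b = option_return(U, r_b)
```

The point of the sketch is the split in cost: the matrix inverse is paid once per option, while each newly specified reward function costs only one matrix-vector product.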

Item Type: Conference or Workshop Item (Paper)
ID Code: 116664
Deposited On: 12 Apr 2021 07:20
Last Modified: 12 Apr 2021 07:20
