A time aggregation approach to Markov decision processes


Cao, Xi-Ren; Ren, Zhiyuan; Bhatnagar, Shalabh; Fu, Michael; Marcus, Steven (2002). A time aggregation approach to Markov decision processes. Automatica, 38 (6), pp. 929-943. ISSN 0005-1098

Full text not available from this repository.

Official URL: http://doi.org/10.1016/S0005-1098(01)00282-5

Related URL: http://dx.doi.org/10.1016/S0005-1098(01)00282-5

Abstract

We propose a time aggregation approach for the solution of infinite-horizon average-cost Markov decision processes via policy iteration. In this approach, the policy is updated only when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may substantially reduce computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented on-line for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.
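
The claim that the Markov property is preserved can be illustrated with a minimal sketch. It is not taken from the paper and does not implement its policy iteration or sample-path algorithms. It only computes, for a fixed policy, the transition matrix of the chain observed at its visits to a chosen subset, using the standard censored-chain formula P_SS + P_SR (I - P_RR)^(-1) P_RS, where R is the complement of the subset. The 4-state matrix `P`, the subset `[0, 3]`, and the helper names `solve` and `embedded_chain` are all hypothetical choices made for illustration.

```python
from fractions import Fraction as F


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def embedded_chain(P, subset):
    """Transition matrix of the chain observed only at visits to `subset`.

    Assumes the subset is reached with probability 1 from every other state,
    so that (I - P_RR) is invertible.
    """
    rest = [i for i in range(len(P)) if i not in subset]
    P_ss = [[P[i][j] for j in subset] for i in subset]
    if not rest:
        return P_ss
    P_sr = [[P[i][j] for j in rest] for i in subset]
    P_rs = [[P[i][j] for j in subset] for i in rest]
    I_minus_P_rr = [[(1 if i == j else 0) - P[i][j] for j in rest] for i in rest]
    # H[k][b]: probability that, starting from the k-th outside state,
    # the first subset state reached is the b-th one.
    H = solve(I_minus_P_rr, P_rs)
    return [
        [P_ss[a][b] + sum(P_sr[a][k] * H[k][b] for k in range(len(rest)))
         for b in range(len(subset))]
        for a in range(len(subset))
    ]


# Hypothetical 4-state chain under some fixed policy (exact rational arithmetic).
P = [
    [F(1, 2), F(1, 2), F(0), F(0)],
    [F(0), F(1, 4), F(1, 2), F(1, 4)],
    [F(1, 3), F(0), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2), F(0)],
]

Q = embedded_chain(P, [0, 3])
for row in Q:
    print([str(x) for x in row])
# ['2/3', '1/3']
# ['3/4', '1/4']
```

Each row of `Q` sums to exactly 1, so the process sampled at its visits to the subset is again a Markov chain on the smaller state space, with no approximation introduced by the reduction.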

Item Type:	Article
Source:	Copyright of this article belongs to Elsevier B.V.
ID Code:	116583
Deposited On:	12 Apr 2021 06:55
Last Modified:	12 Apr 2021 06:55
