Bhatnagar, Shalabh ; Borkar, Vivek S. ; Akarapu, Madhukar (2006) A simulation-based algorithm for ergodic control of Markov chains conditioned on rare events Journal of Machine Learning Research, 7 . pp. 1937-1962. ISSN 1532-4435
|
PDF
- Publisher Version
271kB |
Official URL: http://portal.acm.org/citation.cfm?id=1248617
Abstract
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest timescale, a gradient search algorithm for policy updates that is based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates is used. Deterministic perturbation sequences obtained from appropriate normalized Hadamard matrices are used here. The fast timescale recursions compute the conditional transition probabilities of an associated chain by obtaining solutions to the multiplicative Poisson equation (for a given policy estimate). Further, the risk parameter associated with the value function for a given policy estimate is updated on a timescale that lies in between the two scales above. We briefly sketch the convergence analysis of our algorithm and present a numerical application in the setting of routing multiple flows in communication networks.
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to Journal of Machine Learning Research. |
ID Code: | 5319 |
Deposited On: | 18 Oct 2010 08:39 |
Last Modified: | 16 May 2016 15:50 |
Repository Staff Only: item control page