Natural Gradient Policy for Average Cost SMDP Problem
Paris, France October 29-October 31
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICTAI.2007.12
19th IEEE International Conference on ...
Semi-Markov decision processes (SMDPs) are continuous-time generalizations of discrete-time Markov decision processes. A number of value iteration and policy iteration algorithms have been developed for solving SMDP problems. However, these methods require prior knowledge of the deterministic kernels and suffer from the curse of dimensionality. In this paper, we present the steepest descent direction based on a family of parameterized policies to overcome these limitations. The update rule is based on stochastic policy gradients and employs Amari's natural gradient approach, which moves toward choosing a greedy optimal action. We then show considerable performance improvements of this method on a simple two-state SMDP problem and on a more complex SMDP for call admission control.
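The idea of a natural gradient step on the average cost of a parameterized SMDP policy can be illustrated on a toy problem. The Python sketch below is not the authors' algorithm. It assumes a hypothetical two-state, two-action SMDP with a known model, a logistic policy with one parameter per state, and an exact average cost computed as the renewal-reward ratio rho = sum_s d(s) c_pi(s) / sum_s d(s) tau_pi(s), where d is the stationary distribution of the embedded chain, c_pi(s) the expected cost per decision and tau_pi(s) the expected sojourn time. The vanilla gradient is taken by finite differences instead of being estimated from sample paths, and weighting the Fisher matrix by d(s) is likewise an assumption. All names (P, C, TAU, natural_gradient_step, and so on) are hypothetical.

```python
import math

# Hypothetical two-state, two-action SMDP (illustrative numbers, not from the paper).
# P[s][a][s2]: embedded-chain transition probability, C[s][a]: expected cost per
# decision, TAU[s][a]: expected sojourn time until the next decision epoch.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.3, 0.7], [0.8, 0.2]]]
C = [[1.0, 2.0], [5.0, 3.0]]
TAU = [[1.0, 3.0], [1.0, 2.0]]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def policy(theta):
    # pi(a=1 | s) = sigmoid(theta[s]); one logit per state.
    return [sigmoid(t) for t in theta]


def stationary(theta):
    # Stationary distribution of the two-state embedded Markov chain under pi.
    p = policy(theta)
    p01 = (1 - p[0]) * P[0][0][1] + p[0] * P[0][1][1]
    p10 = (1 - p[1]) * P[1][0][0] + p[1] * P[1][1][0]
    return [p10 / (p01 + p10), p01 / (p01 + p10)]


def average_cost(theta):
    # Renewal-reward ratio: expected cost per decision / expected time per decision.
    p, d = policy(theta), stationary(theta)
    cost = sum(d[s] * ((1 - p[s]) * C[s][0] + p[s] * C[s][1]) for s in range(2))
    time = sum(d[s] * ((1 - p[s]) * TAU[s][0] + p[s] * TAU[s][1]) for s in range(2))
    return cost / time


def natural_gradient_step(theta, alpha=0.5, h=1e-5, eps=1e-8):
    # Vanilla gradient of the average cost by central finite differences.
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((average_cost(up) - average_cost(down)) / (2 * h))
    # For this policy class the Fisher matrix is diagonal: F_ss = d(s) p_s (1 - p_s).
    # eps is a small regularizer so the division stays finite near deterministic policies.
    p, d = policy(theta), stationary(theta)
    fisher = [d[s] * p[s] * (1 - p[s]) + eps for s in range(2)]
    # Descend along the natural gradient F^{-1} grad, since we minimize cost.
    return [theta[s] - alpha * grad[s] / fisher[s] for s in range(2)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    print("initial average cost:", round(average_cost(theta), 4))
    for _ in range(200):
        theta = natural_gradient_step(theta)
    print("final average cost:", round(average_cost(theta), 4))
    print("pi(a=1 | s):", [round(x, 3) for x in policy(theta)])
```

As a policy becomes deterministic, the Fisher entry d(s) p_s (1 - p_s) shrinks at the same rate as the vanilla gradient, so the natural step in theta does not vanish and the logits keep moving toward the greedy action instead of stalling. In this toy model the iterates drive pi(a=1 | 1) toward 1; the best deterministic policy (action 1 in both states) has average cost 1.0.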
Citation:
Ngo Anh Vien and TaeChoong Chung, "Natural Gradient Policy for Average Cost SMDP Problem," in Proc. 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), vol. 1, pp. 11-18, 2007.