loading...
Limits of Multi-Discounted Markov Decision Processes
Wroclaw, Poland July 10-July 14
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/LICS.2007.2822nd Annual IEEE Symposium on Logic i ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Hugo Gimbert, LIX, Ecole Polytechnique, France
Wieslaw Zielonka, Universite Denis Diderot, France
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The payoff received by the controller can be evaluated in different ways, depending on the payoff function the MDP is equipped with. For example a mean-payoff function evaluates average performance, whereas a discounted payoff function gives more weights to earlier performance by means of a discount factor. Another well-known example is the parity payoff function which is used to encode logical specifications [14].

Surprisingly, parity and mean-payoff MDPs share two non-trivial properties: they both have pure stationary optimal strategies [4, 15] and they both are approximable by discounted MDPs with multiple discount factors (multi- discounted MDPs) [5, 15].

In this paper we unify and generalize these results. We introduce a new class of payoff functions called the priority weighted payoff functions, which are generalization of both parity and mean-payoff functions. We prove that priority weighted MDPs admit optimal strategies that are pure and stationary, and that the priority weighted value of an MDP is the limit of the multi-discounted value when discount factors tend to 0 simultaneously at various speeds.

Citation:
Hugo Gimbert, Wieslaw Zielonka, "Limits of Multi-Discounted Markov Decision Processes," lics, pp.89-98, 22nd Annual IEEE Symposium on Logic in Computer Science (LICS 2007), 2007
Usage of this product signifies your acceptance of the Terms of Use.


Suggestions