Approximate receding horizon approach for Markov decision processes: average reward case
[Abstract] We consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards. The scheme uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which we call approximate receding horizon control. We first analyze the performance of the approximate receding horizon control for infinite-horizon average reward under an ergodicity assumption; this analysis also generalizes the result obtained by White (J. Oper. Res. Soc. 33 (1982) 253-259). We then study two examples of the approximate receding horizon control that use lower bounds on the exact solution of the sub-MDP. The first control policy is based on a finite-horizon approximation of Howard's policy improvement of a single policy, and the second is based on a generalization of the single-policy improvement to multiple policies. Along the way, we also provide a simple alternative proof of policy improvement for countable state space. We finally discuss practical implementations of these schemes via simulation. © 2003 Elsevier Inc. All rights reserved.
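To make the receding horizon idea concrete, here is a minimal sketch on a toy two-state, two-action MDP. It is not the paper's method: it solves the finite-horizon sub-MDP exactly by backward induction, whereas the abstract concerns approximate solutions. The function name `receding_horizon_policy`, the array layout `P[a][s][t]` and `R[s][a]`, and all numbers are assumptions made up for illustration.

```python
def receding_horizon_policy(P, R, horizon):
    """Return a stationary policy (one action index per state).

    P[a][s][t] is the probability of moving from state s to t under action a,
    R[s][a] is the one-step reward. Both layouts are illustrative assumptions.
    The policy picks, in every state, the first-stage action that is optimal
    for the `horizon`-step total-reward problem with zero terminal value.
    """
    n_actions = len(P)
    n_states = len(P[0])

    def q_value(s, a, V):
        # One-step reward plus expected value of the next state.
        return R[s][a] + sum(P[a][s][t] * V[t] for t in range(n_states))

    # Backward induction: after k backups, V holds the optimal k-step value.
    V = [0.0] * n_states
    for _ in range(horizon - 1):
        V = [max(q_value(s, a, V) for a in range(n_actions))
             for s in range(n_states)]

    # Greedy first-stage action with (horizon - 1) steps still to go.
    return [max(range(n_actions), key=lambda a: q_value(s, a, V))
            for s in range(n_states)]


# Hypothetical toy MDP: in state 0, action 0 stays and earns 1,
# while action 1 earns nothing now but moves to the richer state 1.
P = [
    [[1.0, 0.0], [0.5, 0.5]],  # action 0
    [[0.0, 1.0], [1.0, 0.0]],  # action 1
]
R = [
    [1.0, 0.0],  # rewards in state 0 for actions 0 and 1
    [3.0, 0.0],  # rewards in state 1 for actions 0 and 1
]

myopic = receding_horizon_policy(P, R, horizon=1)
lookahead = receding_horizon_policy(P, R, horizon=2)
```

In this toy example the one-step policy stays in state 0, while a two-step horizon already prefers moving to state 1. The resulting policy is stationary because the same lookahead is applied in every state at every time step.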
[Publication date] 2003-10-15
[Keywords] Markov decision process; receding horizon control; infinite-horizon average reward; policy improvement; rollout; ergodicity