已收录 268917 条政策
 政策提纲
  • 暂无提纲
An ϵ -Greedy Multiarmed Bandit Approach to Markov Decision Processes
[摘要] We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of theϵ -greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2 10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original”ϵ -greedy algorithm that commonly appears in the literature.
[发布日期]  [发布机构] 
[效力级别]  [学科分类] 农艺学与作物科学
[关键词] multiarmed bandits;epsilon-greedy method;Markov decision process (MDP);sampling;optimization under uncertainties [时效性] 
   浏览次数:8      统一登录查看全文      激活码登录查看全文