Exact solution of the Bellman equation for a β-discounted reward in a two-armed bandit with switching arms
[Abstract] We consider the symmetric Poissonian two-armed bandit problem. For the case of switching arms, only one of which creates reward, we explicitly solve the Bellman equation for a β-discounted reward and prove that a myopic policy is optimal.
[Subject classification] Applied mathematics