Optimising learning with transferable prior information
[Abstract] This thesis addresses the problem of how to incorporate user knowledge about an environment, or information acquired during previous learning in that environment or a similar one, so as to make future learning more effective. The problem is tackled within the framework of learning from rewards while acting in a Markov Decision Process (MDP). Appropriately incorporating user knowledge and prior experience into learning should lead to better performance during learning (the exploration-exploitation trade-off) and to a better solution at the end of the learning period. We work in a Bayesian setting and consider two main types of transferable information: historical data, and constraints involving absolute and relative restrictions on process dynamics. We present new algorithms for reasoning with transition constraints and show how to revise beliefs about the MDP transition matrix using constraints and prior knowledge. We also show how to use the resulting beliefs to control exploration. Finally, we demonstrate the benefits of historical information via power priors, and by using process templates to transfer information from one environment to a second with related local process dynamics. We present results showing that incorporating historical data and constraints on state transitions in uncertain environments, either separately or jointly, can improve learning performance.
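To make the abstract's two ingredients concrete, here is a minimal sketch (not the thesis's actual algorithms) of how a power prior can down-weight historical transition counts and how a relative constraint on the dynamics can be enforced when sampling beliefs about a transition row. The Dirichlet model over next-state probabilities, the weight a0, the counts, and the constraint are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 3

# Dirichlet pseudo-counts for one (state, action) row: uniform base prior.
base_prior = np.ones(n_states)

# Historical transition counts from a related environment, discounted by a
# power-prior weight a0 in [0, 1] (a0 = 0 ignores history, a0 = 1 pools fully).
historical_counts = np.array([8.0, 1.0, 1.0])
a0 = 0.5

# Fresh counts observed while acting in the current environment.
new_counts = np.array([2.0, 0.0, 1.0])

# Conjugate update: posterior pseudo-counts combine prior, weighted history,
# and new data.
posterior = base_prior + a0 * historical_counts + new_counts

def sample_constrained_row(alpha, constraint, max_tries=10_000):
    """Draw a transition row from Dirichlet(alpha), keeping only draws that
    satisfy the constraint (simple rejection sampling)."""
    for _ in range(max_tries):
        p = rng.dirichlet(alpha)
        if constraint(p):
            return p
    raise RuntimeError("constraint region has very low posterior mass")

# Example of a relative restriction on the dynamics: staying in state 0 is
# at least as likely as moving to state 1.
row = sample_constrained_row(posterior, lambda p: p[0] >= p[1])
print("posterior pseudo-counts:", posterior)
print("one constrained posterior draw:", row)
```

Samples drawn this way could then drive exploration (e.g. via posterior sampling over plausible transition matrices); the rejection step here is only the simplest way to respect a constraint and stands in for the belief-revision algorithms the thesis develops.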
[Date of Publication] [Institution] University: University of Birmingham; Department: School of Computer Science
[Keywords] Q Science; QA Mathematics; QA75 Electronic computers. Computer science