Learning and acting in unknown and uncertain worlds

[Abstract] This dissertation addresses the problem of learning to act in an unknown and uncertain world. This is a difficult problem. Even if a world model is available (an assumption not made here), finding an optimal policy for controlling behaviour is known to be intractable (Littman 1996). Without a known world model, there are two approaches: model-free learning, which attempts to learn to act without a model of the environment, and model learning, which attempts to learn a model of the environment from interactions with the world. Most earlier approaches make a priori assumptions about the complexity of the required model or policy, with the result that the agent has a fixed amount of memory. It is well known that in a noisy environment, of the type assumed here, the amount of memory needed to act optimally depends on the environment. Fixing memory capacity before any interaction has occurred is thus a limiting assumption. The theme of this dissertation is that representing multiple policies or environment models of varying size enables us to address this problem. Both model-free learning and model learning are investigated. For the former, I present a policy search method, usable with a wide range of algorithms, that maintains a population of policies of varying size. I show that by sharing information between policies it can learn near-optimal policies for a variety of challenging problems, and that its performance is significantly better than that obtained with the same amount of computation but no information sharing. I investigate two approaches to model learning. The first is a variational Bayesian method for learning POMDPs. I show that it achieves better results than the Bayes-adaptive algorithm (Ross, Chaib-draa and Pineau 2007) under their experimental setup. However, this setup makes strong assumptions about prior information, and I show that weakening these assumptions leads to poor performance.
I then address model learning for a simpler model, a topological map. I develop a novel non-parametric Bayesian map that sets no limit on the model size, and show experimentally that maps can be learned from robot data with weak prior knowledge.

[Publication date] [Publishing institution] University: University of Birmingham; Department: School of Computer Science

[Keywords] Q Science; QA Mathematics; QA75 Electronic computers. Computer science
