Mind-theoretic planning for social robots
[Abstract] As robots move off factory floors and into human environments, out of safely barricaded workstations and into close proximity with people, they will increasingly be expected to understand and coordinate with basic aspects of human behavior. If they are to become useful and productive participants in human-robot teams, they will require effective methods of modeling their human counterparts in order to better coordinate and cooperate with them. Theory of Mind (ToM) is defined as people's ability to reason about others' behavior in terms of their internal states, such as beliefs and desires. Having a ToM allows an individual to understand the observed behavior of others based not only on directly observable perceptual features but also on an understanding of underlying mental states; this understanding allows the individual to anticipate and better react to future actions. In this thesis, a Mind-Theoretic Planning (MTP) system is presented which attempts to provide robots with some of the basic ToM abilities that people rely on for coordinating and interacting with others. The MTP system frames the problem of mind-theoretic reasoning as a planning problem with mixed observability. A predictive forward model of others' behavior is computed by creating a set of mental state situations (MSS), each composed of stacks of Markov Decision Process (MDP) models whose solutions provide approximations of the anticipated rational actions and reactions of that agent. This forward model, together with a perceptual-range-limiting observation function, is combined into a Partially Observable MDP (POMDP). The presented MTP approach increases computational efficiency by taking advantage of approximation methods offered by a novel POMDP solver, B3RTDP, and by leveraging value functions at various levels of the MSS as heuristics for value functions at higher levels.
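The idea that an MDP solution approximates another agent's anticipated rational actions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the one-dimensional corridor world, the step cost of -1, the discount factor, and the hypothetical function name `solve_corridor_mdp`. It solves a single MDP per candidate goal with plain value iteration and does not model the stacked, nested structure of the MSS.

```python
from typing import Dict

# Hypothetical action set for a 1-D corridor: move one cell left or right.
ACTIONS = {"left": -1, "right": +1}


def solve_corridor_mdp(n_states: int, goal: int,
                       gamma: float = 0.95, sweeps: int = 100) -> Dict[int, str]:
    """Run value iteration on a deterministic corridor and return the greedy policy.

    The policy maps each non-goal cell to the action a rational agent
    pursuing `goal` would be expected to take there.
    """
    values = [0.0] * n_states  # the goal cell is absorbing and keeps value 0

    def step(state: int, move: int) -> int:
        # Walls clamp the agent inside the corridor.
        return min(max(state + move, 0), n_states - 1)

    def q(state: int, move: int) -> float:
        nxt = step(state, move)
        reward = 0.0 if nxt == goal else -1.0  # assumed cost of -1 per step
        return reward + gamma * values[nxt]

    for _ in range(sweeps):
        for s in range(n_states):
            if s != goal:
                values[s] = max(q(s, m) for m in ACTIONS.values())

    return {s: max(ACTIONS, key=lambda a: q(s, ACTIONS[a]))
            for s in range(n_states) if s != goal}


# One predicted policy per hypothesis about what the other agent wants.
policy_to_far_end = solve_corridor_mdp(5, goal=4)
policy_to_near_end = solve_corridor_mdp(5, goal=0)
```

An observer holding both policies can compare an agent's observed move in the middle cell against each prediction, which is the kind of forward model a planner could then fold into its own decision problem.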
To make the MTP system efficient, a novel general-purpose online POMDP solver, B3RTDP, was developed. This planner extends the Real-Time Dynamic Programming (RTDP) approach to solving POMDPs. By using a bounded value function representation, we are able to apply a novel approach to pruning the belief-action search graph and to maintain a Convergence Frontier, a novel mechanism for taking advantage of early action convergence, which can greatly improve RTDP's search time. Lastly, an online video game was developed for the purpose of evaluating the MTP system by having people complete tasks in a virtual environment with a simulated robotic assistant. A human subject study was performed to assess both the objective behavioral differences in the performance of the human-robot teams and the subjective attitudinal differences in how people perceived agents with varying MTP capabilities. We demonstrate that providing agents with mind-theoretic capabilities can significantly improve the efficiency of human-robot teamwork in certain domains and suggest that it may also positively influence humans' subjective perception of their robotic teammates.
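How a bounded value function can support action pruning may be easier to see in a small sketch. The rule below is a generic, deterministic dominance test for a reward-maximizing planner and is an assumption for illustration; the abstract does not specify the pruning criterion B3RTDP actually uses. The function names and the example bounds are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each action carries a (lower, upper) bound on its Q-value at some belief.
Bounds = Dict[str, Tuple[float, float]]


def prune_dominated_actions(q_bounds: Bounds) -> List[str]:
    """Keep only actions that could still be optimal.

    An action is dropped when even its upper bound falls below the
    best lower bound achieved by some other action.
    """
    best_lower = max(lower for lower, _ in q_bounds.values())
    return [action for action, (_, upper) in q_bounds.items()
            if upper >= best_lower]


def converged_action(q_bounds: Bounds) -> Optional[str]:
    """Return the single surviving action, or None if several remain."""
    survivors = prune_dominated_actions(q_bounds)
    return survivors[0] if len(survivors) == 1 else None


# Made-up bounds for three hypothetical actions at one belief state.
bounds = {"wait": (-5.0, -2.0), "fetch": (-1.5, -0.5), "ask": (-3.0, -1.0)}
surviving = prune_dominated_actions(bounds)  # "wait" is dominated by "fetch"

# After further (imagined) bound tightening only one action is left.
tightened = {"fetch": (-1.0, -0.8), "ask": (-2.5, -1.2)}
chosen = converged_action(tightened)
```

Once only one action survives at a belief, the search no longer needs to refine the alternatives there, which is one plausible reading of what early action convergence buys a planner.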
[Publishing institution] Massachusetts Institute of Technology