Collaborative filtering with low regret
[Abstract] Collaborative filtering (CF) is a widely used technique in recommendation systems in which recommendations are made in a content-agnostic manner. Neighborhood-based CF has two main paradigms: user-user and item-item. To recommend to a user in the user-user paradigm, one first looks for similar users and then recommends items liked by those similar users. In the item-item paradigm, in contrast, items similar to those liked by the user are found and subsequently recommended. Much empirical evidence exists for the success of the item-item paradigm (Linden et al., 2003; Koren and Bell, 2011). Motivated to understand the reasons behind this success, in this thesis we study its theoretical performance and prove guarantees. We work under a generic model in which the population of items is represented by a distribution over {-1, +1}^N, with a binary string of length N associated with each item to represent which of the N users like (+1) or dislike (-1) the item. As the first main result, we show that a simple algorithm following the item-item paradigm achieves a regret (which captures the number of poor recommendations over T time steps) that is sublinear and scales as ... , where d is the doubling dimension of the item space. As the second main result, we show that the cold-start time (the first time after which quality recommendations can be given) of this algorithm is ... , where v is the typical fraction of items that users like. This thesis advances the state of the art on many fronts. First, our cold-start bound differs from that of Bresler et al. (2014) for the user-user paradigm, where the cold-start time increases with the number of items. Second, our regret bound is similar to those obtained in multi-armed bandits (surveyed in Bubeck and Cesa-Bianchi (2012)) when the arms belong to general spaces (Kleinberg et al., 2013; Bubeck et al., 2011).
This is despite two notable differences in our setting: (a) recommending the same item twice to a given user is not allowed, unlike in bandits, where arms can be pulled more than once; and (b) the distance function for the underlying metric space is not known. Finally, our mixture assumptions differ from earlier works (cf. Kleinberg and Sandler, 2004; Dabeer, 2013; Bresler et al., 2014) that assume a "gap" between mixture components. We circumvent gap conditions by instead using the doubling dimension of the item space.
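To make the item-item paradigm described in the abstract concrete, the following is a minimal sketch of a generic neighborhood-based item-item recommender. It is not the algorithm analyzed in the thesis, which the abstract does not specify. The function names `item_similarity` and `recommend`, the use of cosine similarity, and the convention that 0 marks an unobserved rating are all assumptions made for illustration. The sketch does respect the constraint noted in the abstract that an item is never recommended twice to the same user.

```python
import numpy as np


def item_similarity(ratings):
    """Cosine similarity between item columns.

    ratings: (num_users, num_items) array with entries in {-1, 0, +1},
    where 0 means "not yet rated" (an assumption of this sketch).
    """
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero for unrated items
    normalized = ratings / norms
    sim = normalized.T @ normalized    # (num_items, num_items)
    np.fill_diagonal(sim, 0.0)         # an item is not its own neighbor
    return sim


def recommend(ratings, user):
    """Return the index of the unrated item most similar to what `user` liked.

    Returns None if the user has already rated every item.
    """
    sim = item_similarity(ratings)
    # Items similar to liked items gain score; items similar to disliked ones lose score.
    scores = sim @ ratings[user]
    # Never recommend an item the user has already been shown.
    scores[ratings[user] != 0] = -np.inf
    if not np.isfinite(scores).any():
        return None
    return int(np.argmax(scores))


# Hypothetical toy data: 4 users (rows) by 4 items (columns).
ratings = np.array([
    [ 1,  1, -1,  0],
    [ 1,  1, -1, -1],
    [-1, -1,  1,  1],
    [ 1,  0,  0,  0],   # user 3 has only rated item 0, and liked it
])
choice = recommend(ratings, user=3)
```

In this toy data, item 1 is rated the same way as item 0 by every user who rated both, so it is the natural recommendation for user 3. Note that the sketch only exploits observed ratings; it contains no exploration step and no handling of the cold-start phase.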
[Publication date] [Publishing institution] Massachusetts Institute of Technology