STREAMING CORESETS FOR HIGH-DIMENSIONAL GEOMETRY
[摘要] This thesis studies clustering problems on data streams, specifically with applications to metric spaces that are either non-Euclidean or of high-dimensionality. The algorithms are introduced to approximate minima of the k-median function, although they can be used to find approximate minima of many other functions are shown in Part I.In Part I, we introduce the first analysis of the effect of stream order on the space required to solve k-median. We use this to forge a connection to random-order streams and provide an optimal O(nk)-time algorithm for k-median in the RAM model. In Part II, we provide a nearly optimal algorithm to maintain a coreset for k-median. Specifically for the case of k-means, we introduce the first coreset to be simultaneously independent of both the dimension and the input size. This gives the first sublinear size coreset for high- dimensional sparse data. In Part III, we apply our techniques to streams where points may be deleted as well as inserted, and construct the first coreset for high-dimensional spaces.
[发布日期] [发布机构] Johns Hopkins University
[效力级别] coresets [学科分类]
[关键词] streaming;coresets;Mathematics [时效性]