Approximate k-means clustering through random projections
[摘要] Using random row projections, we show how to approximate a data matrix A with a much smaller sketch à that can be used to solve a general class of constrained k-rank approximation problems to within (1 + [epsilon]) error. Importantly, this class of problems includes k-means clustering. By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ [epsilon]) relative error results for random row projections which improve on the (2 + [epsilon]) prior known constant factor approximation associated with this sketching technique, while preserving the number of dimensions. For k-means clustering, we show how to achieve a (9 + [epsilon]) approximation by Johnson-Lindenstrauss projecting data points to just 0(log k/[epsilon]2 ) dimensions. This gives the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
[发布日期] [发布机构] Massachusetts Institute of Technology
[效力级别] [学科分类]
[关键词] [时效性]