Generalized Dirichlet-process-means for f-separable distortion measures
[Abstract] DP-means clustering was obtained as an extension of K-means clustering. Although it is implemented with a simple and efficient algorithm, it estimates the number of clusters at the same time as the cluster assignments. However, DP-means is specifically designed for the average distortion measure. It is therefore vulnerable to outliers in the data and can produce clusters with large maximum distortion. In this work, we extend the objective function of DP-means to f-separable distortion measures and propose a unified learning algorithm that overcomes these problems through the choice of the function f. Further, the influence function of the estimated cluster center is analyzed to evaluate robustness against outliers. We demonstrate the performance of the generalized method by numerical experiments using real datasets. (c) 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
[Publication date] 2021-10-11 [Publishing institution]
[Validity level] [Subject classification]
[Keywords] Clustering; Dirichlet-process-means; f-separable distortion measures; Bregman divergence; Influence function; Maximum distortion [Timeliness]
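
For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of standard DP-means with the squared Euclidean distance (the average distortion case), not the generalized f-separable algorithm proposed in the paper. The function name `dp_means`, the penalty parameter `lam`, and the choice to initialize with the global mean are assumptions made for this example.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Baseline DP-means sketch (squared Euclidean distance).

    A point opens a new cluster when its squared distance to every
    existing center exceeds the penalty `lam` (hypothetical name).
    """
    X = np.asarray(X, dtype=float)
    centers = [X.mean(axis=0)]          # assumption: start from the global mean
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            k = int(np.argmin(d2))
            if d2[k] > lam:
                # No center is close enough: open a new cluster at x.
                centers.append(x.copy())
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centers as cluster means and drop empty clusters.
        used = np.unique(labels)
        centers = [X[labels == k].mean(axis=0) for k in used]
        labels = np.searchsorted(used, labels)  # relabel to 0..K-1
        if not changed:
            break
    return np.array(centers), labels
```

Because each center is a plain mean, a single distant point can pull it far from the bulk of its cluster, which is the sensitivity to outliers that the abstract attributes to the average distortion measure.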