Dependence of Clustering Algorithm Performance on Clustered-ness of Data
[Abstract] Intuitively, clustering algorithms should work better on datasets that have well-separated clusters. We found the contrary for center-based clustering algorithms, including K-Means, K-Harmonic Means, and EM. We generated 1200 synthetic datasets with a varying ratio of inter-cluster variance to within-cluster variance, which we call the clustered-ness of the dataset. We ran K-Means, K-Harmonic Means, and EM on these datasets and found that the ratio of the performance to the global optimum grows with increasing clustered-ness. The dependence of clustering algorithm performance on other parameters (quality of initialization and dimensionality of data) is also demonstrated. 12 pages
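The clustered-ness ratio described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula or the data generator, so everything here is an assumption: the ratio is taken as the size-weighted squared distance of cluster centers from the overall mean, divided by the summed squared distance of points from their own cluster center, and `clusteredness` and `make_dataset` are hypothetical names, not functions from the report.

```python
import numpy as np


def clusteredness(points, labels):
    """Inter-cluster variance over within-cluster variance (assumed definition)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    overall_mean = points.mean(axis=0)
    between = 0.0
    within = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        center = members.mean(axis=0)
        # Spread of this cluster's center around the overall mean, weighted by size.
        between += len(members) * np.sum((center - overall_mean) ** 2)
        # Spread of the cluster's points around their own center.
        within += np.sum((members - center) ** 2)
    return between / within


def make_dataset(n_clusters, n_per_cluster, dim, center_spread, cluster_std, seed=0):
    """Hypothetical generator: Gaussian clusters around Gaussian-placed centers."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, center_spread, size=(n_clusters, dim))
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    noise = rng.normal(0.0, cluster_std, size=(len(labels), dim))
    return centers[labels] + noise, labels


# Shrinking cluster_std while keeping the centers fixed raises the ratio.
tight_points, tight_labels = make_dataset(5, 100, 2, center_spread=5.0, cluster_std=0.1)
loose_points, loose_labels = make_dataset(5, 100, 2, center_spread=5.0, cluster_std=2.0)
tight_ratio = clusteredness(tight_points, tight_labels)
loose_ratio = clusteredness(loose_points, loose_labels)
```

Under these assumptions, sweeping `cluster_std` (or `center_spread`) over a range of values would give a family of datasets with varying clustered-ness, similar in spirit to the 1200 synthetic datasets the abstract mentions.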
[Publication date]  [Publisher] HP Development Company
[Validity level]  [Subject classification] Computer Science (General)
[Keywords] clustering;K-Means;K-Harmonic Means;EM;Data Mining [Timeliness] 