Neighborhood Graphs for Estimation of Density Functionals.
[摘要] Functionals of densities play a fundamental role in statistics, signal processing, machine learning, information theory and related fields.This class of functionals includes divergence measures of densities, intrinsic dimension of the density support, and minimum volume sets. k-nearest neighbor (k-NN) graph based estimators are widely used for the estimation of these functionals. While several consistent k-NN estimators have been previously proposed for estimating these functionals, general results on rates of convergence of these estimators and confidence intervals on the estimated functional are not available. In this thesis, a new class of estimators based on bipartite k-NN graphs is proposed for estimating functionals of probability density functions. This class includes divergence estimators, intrinsic dimension estimators and minimum volume set estimators. For this class of estimators, large sample theory is used to characterize performance of the estimators. Specifically, large sample expressions for estimator bias and variance is derived and a central limit theorem for the distribution of the estimators is established. This theory is applied to accurately estimate functionals of interest by optimizing the mean squared error over free parameters, e.g. the number of neighbors k, and obtaining confidence intervals on the estimated functional by invoking the central limit theorem. Furthermore, this theory provides significant insight into the statistical behavior of these bipartite k-NN estimators, leading to the development of modified k-NN estimators with faster rates of convergence. In particular, a weighted ensemble of bipartite k-NN estimators for functional estimation is proposed, and it is shown using this theory, that the weighted ensemble estimator outperforms the state-of-the-art. Using this theory, the thesis develops performance-driven algorithms in several applications. First, the theory is applied todetermine entropy with confidence to facilitate anomaly detection at desired false alarm rates in wireless sensor networks.Second, the theory is applied to determine complexity of high-dimensional data lying on a manifold, and subsequently applied to fusion and segmentation applications. Finally, the thesis introduces an efficient anomaly detection algorithm based on estimation of p-values of membership in training-sample minimum volume sets using bipartite k-NN graphs.
[发布日期] [发布机构] University of Michigan
[效力级别] K-NN Graphs [学科分类]
[关键词] K Nearest Neighbor;K-NN Graphs;K-NN Estimators;Ensemble Estimators;Entropy Estimation;Dimension Estimation;Electrical Engineering;Engineering;Electrical Engineering-Systems [时效性]