Variable Weight Kernel Density Estimation
[Abstract] Nonparametric density estimation is a common and important task in many machine learning problems. It consists of estimating a density function from available observations without making parametric assumptions about the generating distribution. Kernel means are nonparametric estimators composed of the average of simple functions, called kernels, centered at each data point. This work studies relatives of these kernel means that share their structure but assign different weights to each kernel unit in order to attain certain desired characteristics. In particular, we present a sparse kernel mean estimator and a consistent kernel density estimator with a fixed bandwidth parameter.

First, regarding kernel means, we study the kernel density estimator (KDE) and the kernel mean embedding. These are frequently used to represent probability distributions; unfortunately, they face scalability issues. A single point evaluation of the kernel density estimator, for example, requires computation time linear in the training sample size. To address this challenge, we present a method to efficiently construct a sparse approximation of a kernel mean. We do so by first establishing an incoherence-based bound on the approximation error. We then observe that, for any kernel with constant norm (which includes all translation-invariant kernels), the bound can be efficiently minimized by solving the k-center problem. The outcome is a linear-time construction of a sparse kernel mean, which also lends itself naturally to an automatic sparsity selection scheme. We demonstrate the computational gains of our method on several benchmark data sets, as well as on three applications involving kernel means: Euclidean embedding of distributions, class proportion estimation, and clustering with the mean-shift algorithm.

Second, we address the bandwidth selection problem in kernel density estimation. Consistency of the KDE requires that the kernel bandwidth tend to zero as the sample size grows. In this work, we investigate whether consistency is still possible when the bandwidth is fixed, if we consider a more general class of weighted KDEs. To answer this question in the affirmative, we introduce the fixed-bandwidth KDE (fbKDE), obtained by solving a quadratic program, which consistently estimates any continuous square-integrable density. Rates of convergence are also established for the fbKDE for radial kernels and the box kernel under appropriate smoothness assumptions. Furthermore, in a simulation study we demonstrate that the fbKDE compares favorably to the standard KDE and to the previously proposed variable-bandwidth KDE.
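For concreteness, the following is a minimal sketch of the kind of sparse kernel mean approximation described in the first part of the abstract: prototypes are chosen by a greedy farthest-first traversal for the k-center problem, and the kernel mean is replaced by a k-term weighted sum that costs O(k) rather than O(n) per query. The Gaussian kernel, the cluster-proportion weights, and the greedy solver are illustrative assumptions; this is not the thesis's incoherence-bound-minimizing construction.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Translation-invariant Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d = np.asarray(x) - np.asarray(y)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def kde_evaluate(x, data, sigma=1.0):
    # Plain kernel mean: one kernel evaluation per training point, O(n) per query.
    return float(np.mean([gaussian_kernel(x, xi, sigma) for xi in data]))

def greedy_k_center(data, k, seed=0):
    # Farthest-first traversal (Gonzalez), a classical 2-approximation to k-center.
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(data)))]
    dists = np.linalg.norm(data - data[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))  # farthest point from the chosen centers
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(data - data[nxt], axis=1))
    return np.array(centers)

def sparse_kernel_mean(data, k, sigma=1.0):
    # Sparse surrogate: k prototypes weighted by the fraction of points they cover.
    # (Illustrative weight choice; the thesis derives prototypes and weights from
    # an incoherence-based error bound instead.)
    idx = greedy_k_center(data, k)
    centers = data[idx]
    assign = np.argmin(
        np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2), axis=1
    )
    weights = np.bincount(assign, minlength=k) / len(data)

    def evaluate(x):
        # O(k) per query instead of O(n).
        return float(sum(w * gaussian_kernel(x, c, sigma)
                         for w, c in zip(weights, centers)))

    return evaluate

# Usage: compare the full kernel mean with its k-term approximation at one point.
data = np.random.default_rng(0).normal(size=(5000, 2))
f_sparse = sparse_kernel_mean(data, k=50, sigma=0.5)
x0 = np.zeros(2)
print(kde_evaluate(x0, data, sigma=0.5), f_sparse(x0))
```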
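The second part of the abstract, a weighted KDE with fixed bandwidth whose weights solve a quadratic program, can likewise be illustrated with a short sketch. The particular objective used here (a least-squares cross-validation surrogate for the integrated squared error) and the simplex constraint on the weights are assumptions made purely for illustration; the quadratic program that actually defines the fbKDE in the thesis may differ.

```python
import numpy as np
from scipy.optimize import minimize

def gauss(u, sigma):
    # 1-D Gaussian kernel with bandwidth sigma (integrates to one).
    return np.exp(-u ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def weighted_kde_weights(x, sigma):
    # Fit weights for a fixed-bandwidth weighted KDE by a quadratic program.
    # Objective: w^T Q w - 2 b^T w, a least-squares cross-validation surrogate
    # for the integrated squared error (illustrative stand-in, not the fbKDE's
    # exact program).
    n = len(x)
    diff = x[:, None] - x[None, :]
    # Integral of a product of two Gaussian kernels is a Gaussian with
    # bandwidth sigma * sqrt(2), giving the quadratic term in closed form.
    Q = gauss(diff, sigma * np.sqrt(2))
    # Leave-one-out estimate of E[f_hat(X)] for the linear term.
    K = gauss(diff, sigma)
    b = (K.sum(axis=1) - gauss(0.0, sigma)) / (n - 1)
    obj = lambda w: w @ Q @ w - 2 * b @ w
    grad = lambda w: 2 * Q @ w - 2 * b
    cons = {"type": "eq", "fun": lambda w: w.sum() - 1.0}
    res = minimize(obj, np.full(n, 1.0 / n), jac=grad, method="SLSQP",
                   bounds=[(0.0, None)] * n, constraints=[cons])
    return res.x

def weighted_kde(x_eval, x, w, sigma):
    # Evaluate the weighted KDE: sum_i w_i * k_sigma(x_eval - x_i).
    return gauss(x_eval[:, None] - x[None, :], sigma) @ w

# Usage: the bandwidth stays fixed; only the weights adapt to the sample.
rng = np.random.default_rng(0)
x = rng.normal(size=120)
w = weighted_kde_weights(x, sigma=0.4)
grid = np.linspace(-3, 3, 5)
print(weighted_kde(grid, x, w, sigma=0.4))
```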
[Publication Date] [Institution] University of Michigan
[Validity Level] density estimation [Subject Classification]
[Keywords] machine learning; density estimation; reproducing kernel Hilbert space; complexity; sparsity; consistency; Computer Science; Electrical Engineering; Statistics and Numeric Data; Engineering; Science; Electrical Engineering: Systems [Timeliness]