Supervised learning in probabilistic environments
[Abstract] (Note: symbols not renderable in plain ASCII are indicated by [...].)

For a wide class of learning systems and different noise models, we bound the test performance in terms of the noise level and the number of data points. We obtain O(1/N) convergence to the best hypothesis, with a rate that depends on the noise level and on the complexity of the target with respect to the learning model. Our results can be applied to estimate the limitation of a model, which we illustrate on financial market data; changes in model limitation can be used to track changes in volatility.

We analyze regularization in generalized linear models, focusing on weight decay. For a well specified linear model, the optimal regularization parameter decreases as [...]. When the data is noiseless, regularization is harmful. For a misspecified linear model, the "degree" of misspecification has an effect analogous to noise. For more general learning systems we develop EXPLOVA (explanation of variance), which also enables us to derive a condition on the learning model under which regularization helps. We emphasize the necessity of prior information for effective regularization.

By counting functions on a discretized grid, we develop a framework for incorporating prior knowledge about the target function into the learning process. Within this framework, we derive a direct connection between smoothness priors and Tikhonov regularization, in addition to the regularization terms implied by other priors.

We prove a No Free Lunch result for noise prediction: when the prior over target functions is uniform, the data set conveys no information about the noise distribution. We then consider using maximum likelihood to predict non-stationary noise variance in time series. Maximum likelihood leads to systematic errors that favor lower variance; we discuss the systematic correction of these errors.

We develop stochastic and deterministic techniques for density estimation based on approximating the distribution function, thus placing density estimation within the supervised learning framework. We prove consistency of the estimators and obtain convergence rates in L1 and L2. We also develop approaches to random variate generation based on "inverting" the density estimation procedure and on a control formulation.

Throughout, we use multilayer neural networks to illustrate our methods.
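As a concrete caricature of the weight-decay findings, the following Python snippet (a minimal sketch, not from the thesis: it assumes a well specified linear target with isotropic Gaussian inputs and noise, and all names such as `ridge_test_error` are illustrative) estimates test error as a function of the weight-decay parameter and shows the best parameter shrinking to zero as the noise vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_test_error(noise_std, lam, n_train=50, n_test=5000, d=10, trials=200):
    """Average test MSE of weight-decay (ridge) regression on a
    well specified linear target y = w.x + noise. Illustrative only."""
    err = 0.0
    for _ in range(trials):
        w = rng.standard_normal(d)
        X = rng.standard_normal((n_train, d))
        y = X @ w + noise_std * rng.standard_normal(n_train)
        # weight-decay solution: (X'X + lam I)^{-1} X'y
        w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        X_test = rng.standard_normal((n_test, d))
        err += np.mean((X_test @ (w_hat - w)) ** 2)
    return err / trials

for noise in [0.0, 0.5, 2.0]:
    errs = {lam: ridge_test_error(noise, lam) for lam in [0.0, 0.1, 1.0, 10.0]}
    best = min(errs, key=errs.get)
    print(f"noise={noise}: best lambda on this grid = {best}")
```

With noiseless data the unregularized solution recovers the target exactly, so any positive weight decay only hurts; as the noise level grows, the best parameter on the grid moves upward, consistent with the analysis summarized above.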
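The smoothness-prior/Tikhonov connection asserted above can be summarized by the standard MAP argument; the derivation below is the generic textbook version, not the thesis's grid-counting construction:

```latex
% Gaussian noise likelihood and a smoothness prior on f:
P(D \mid f) \propto \exp\!\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - f(x_i)\big)^2\Big),
\qquad
P(f) \propto \exp\!\Big(-a \int \big(f''(x)\big)^2\,dx\Big).

% Maximizing the posterior P(f \mid D) \propto P(D \mid f)\,P(f)
% is equivalent to minimizing the Tikhonov-regularized objective
E(f) = \sum_{i=1}^{N}\big(y_i - f(x_i)\big)^2
     + \lambda \int \big(f''(x)\big)^2\,dx,
\qquad \lambda = 2\sigma^2 a .
```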
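The downward bias of maximum-likelihood variance estimates is easy to exhibit numerically. The sketch below uses the simplest i.i.d. Gaussian case, where the classical correction factor N/(N-1) applies; the thesis's non-stationary time-series setting is harder, but the direction of the bias is the same:

```python
import numpy as np

rng = np.random.default_rng(1)

# Downward bias of the maximum-likelihood variance estimate
# (i.i.d. Gaussian case, unknown mean).
N, trials, true_var = 10, 100_000, 4.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, N))

ml_var = samples.var(axis=1, ddof=0)     # ML estimate, divides by N
corrected = samples.var(axis=1, ddof=1)  # Bessel-corrected, divides by N-1

print(f"true variance       : {true_var}")
print(f"mean ML estimate    : {ml_var.mean():.3f}  (low by factor (N-1)/N)")
print(f"mean corrected est. : {corrected.mean():.3f}")
```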
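The idea of casting density estimation as supervised learning of the distribution function, and of generating random variates by "inverting" it, can be sketched with plain interpolation in place of the thesis's neural-network estimators (every step below is an illustrative stand-in, not the thesis's algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)

# Supervised-learning framing: inputs are the sorted data points,
# targets are (smoothed) empirical CDF values. The thesis would fit
# a multilayer network to these pairs; we interpolate instead.
data = np.sort(rng.normal(size=500))
N = len(data)
cdf_targets = np.arange(1, N + 1) / (N + 1)

# F_hat: the fitted distribution function on an evaluation grid.
grid = np.linspace(data[0], data[-1], 400)
F_hat = np.interp(grid, data, cdf_targets)

# Density estimate by differentiating the fitted CDF.
f_hat = np.gradient(F_hat, grid)

# Random variate generation by inverting the fitted CDF:
# draw u ~ Uniform over the fitted range and map through F_hat^{-1}.
u = rng.uniform(cdf_targets[0], cdf_targets[-1], size=1000)
new_samples = np.interp(u, cdf_targets, data)

print("density estimate near 0 :", f_hat[np.argmin(np.abs(grid))])
print("generated mean / std    :", new_samples.mean(), new_samples.std())
```

The generated variates inherit the fitted distribution, which is the "inversion" approach to random variate generation mentioned in the abstract; the control formulation is a separate technique not sketched here.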
[Institution] California Institute of Technology, Department of Engineering and Applied Science
[Keywords] electrical engineering