Investigation of Smooth and Non-smooth Penalties for Regularized Model Selection in Regression
[Abstract] In this thesis, we propose new approaches for using regularized regression in model selection and characterize the circumstances in which regularized regression improves our ability to discriminate between models. First, we propose a variable selection method for regression models with interactions that uses L1 regularization to enforce heredity constraints automatically. A theoretical study shows that, under some regularity conditions, the proposed method asymptotically performs as well as it would if the true model were known in advance. Numerical results show that the method compares favorably with several other recently developed methods in terms of both prediction and variable selection. Second, we investigate regularized regression methods, including ridge regression, the Lasso, and the elastic net, in terms of their ability to rank the predictors in a regression model by the sizes of their effects. Intuitively, regularization should be most useful when strong collinearity is present; however, we find that not all models with collinearity benefit from regularization. We characterize situations in which regularization is helpful, harmful, or neutral for ranking performance, and we define a sense in which regularization improves performance more often than not. Through analytical and numerical studies, we show that L2 regularization outperforms L1 regularization for ranking, especially when the effects are weak, partly because, when univariate analysis is optimal, ridge regression approximates univariate analysis better than the Lasso does. Our results also imply that the best regression estimator for variable ranking may differ from the best estimator for prediction. This work may have implications for genetic mapping and other analyses that apply regression to weak effects and collinear regressors.
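The point that ridge regression can approximate univariate analysis better than the Lasso can be illustrated with a minimal sketch. This is not the thesis's code: the simulated design (one shared latent factor, the effect sizes, the noise level), the helper names, and the penalty values are all hypothetical choices for illustration. The underlying fact is that for large λ the ridge estimate (XᵀX + λI)⁻¹Xᵀy is approximately Xᵀy/λ, so its ranking of predictors converges to the univariate ranking by |Xᵀy|. A Lasso fit just below λ_max = max_j |x_jᵀy| keeps only the first predictor(s) to enter and leaves the rest tied at exactly zero, so it cannot rank them.

```python
import numpy as np


def make_collinear_data(n=200, seed=0):
    """Hypothetical simulated design: predictors share one latent factor
    (so they are collinear) and the true effects are weak."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([0.3, 0.2, 0.1, 0.0, 0.0])
    p = beta_true.size
    latent = rng.normal(size=(n, 1))
    X = 0.8 * latent + 0.6 * rng.normal(size=(n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    y = X @ beta_true + 3.0 * rng.normal(size=n)
    return X, y - y.mean()  # center the response


def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam*I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def lasso_coefficients(X, y, lam, n_sweeps=500):
    """Lasso for 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = np.asarray(y, dtype=float).copy()  # current residual y - X b
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]  # remove predictor j from the fit
            rho = X[:, j] @ r
            # soft-threshold the partial correlation
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]  # add the updated predictor j back
    return b


def rank_by_magnitude(beta):
    """Predictor indices ordered from largest to smallest |coefficient|."""
    return np.argsort(-np.abs(beta))


if __name__ == "__main__":
    X, y = make_collinear_data()
    # For standardized, centered data, X^T y is n times the univariate slopes.
    univariate = X.T @ y
    print("univariate ranking:", rank_by_magnitude(univariate))
    ridge = ridge_coefficients(X, y, 1e10)
    print("ridge ranking, large lambda:", rank_by_magnitude(ridge))
    lam_max = np.max(np.abs(univariate))
    print("lasso near lambda_max:", lasso_coefficients(X, y, 0.999 * lam_max))
```

The closed-form ridge solve and plain coordinate descent keep the sketch dependency-free apart from NumPy; a real ranking study would compare the methods across a range of penalties and simulated designs rather than at these two extreme settings.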
[Institution] University of Michigan
[Level] Penalized Regression
[Keywords] Regularization; Penalized Regression; Lasso; Ridge Regression; Heredity; Ranking; Statistics and Numeric Data; Science; Statistics