Statistical Analysis of Complex Data: Bayesian Model Selection and Functional Data Depth.
[Abstract] Big data of the modern era exhibit different types of complex structures. This dissertation addresses two important problems that arise in this context. The first concerns high-dimensional data, where the number of variables is much larger than the sample size. For model selection in a Bayesian framework, a novel approach using sample-size-dependent spike and slab priors is proposed. It is shown that the corresponding posterior has strong variable selection consistency even when the number of covariates grows nearly exponentially with the sample size, and that the posterior induces shrinkage similar to the shrinkage due to the L0 penalty. A new algorithm for posterior computation is proposed, which is much more scalable in memory and in computational efficiency than existing Markov chain Monte Carlo algorithms. The second concerns the analysis of functional data, for which a new notion of data depth is devised that possesses desirable properties and is especially well suited for obtaining central regions. In particular, the central regions achieve the desired simultaneous coverage probability and are useful in a wide range of applications, including boxplots and outlier detection for functional data, and simultaneous confidence bands in regression problems.
[Publication Date] [Publisher] University of Michigan
[Validity Level] Bayesian Model Selection [Subject Classification]
[Keywords] High Dimensional Data; Bayesian Model Selection; Functional Data; Data Depth; Complex Data; Bayesian Computation; Skinny Gibbs; Gene Expression; Mathematics; Science (General); Statistics and Numeric Data; Science; Statistics [Timeliness]