Topics in High-Dimensional Inference with Applications to Raman Spectroscopy.
[Abstract] Recent advances in technology have led to a demand for statistical techniques for high-dimensional data. This thesis explores dimension estimation and reduction, covariance estimation and regularization, and improving nearest neighbor graphs, with some examples in the context of Raman spectroscopy. A new technique for estimating intrinsic dimension is proposed and used to estimate the number of pure components in a chemical mixture in Raman spectroscopy applications. We show how the new method improves over existing procedures, can be adapted via smoothing to deal with high noise levels, and has future applications in detecting mixture homogeneity. Next, we consider covariance estimation and regularization in high dimensions. Regularized covariance estimators in high dimensions either depend on the ordering of variables or are completely invariant to variable permutations. We propose a new method, Isoband, which uses the unordered data to discover a suitable order for the variables and then applies methods which depend on variable ordering to improve covariance estimation for sparse covariance matrices. Our method has the additional advantage of being able to detect blocks within covariances and thus create additional sparsity and structure in the estimate. We show by simulations that when a suitable variable ordering exists, we do better by discovering it than by using a permutation-invariant method, and illustrate the new methodology on a real data example. The Isoband methodology relies on a nearest neighbor graph, and in the last chapter, we address improving the robustness of nearest neighbor graphs, which have widespread statistical applications. In our application, the nearest neighbor graph is based on the variables rather than the observations. Two new methods are proposed which improve upon the basic nearest neighbor graphs by removing spurious edges, either by bootstrapping the data or by smoothing. Both methods are competitive with existing graph perturbation methods in the literature.
[Publication date] [Publishing institution] University of Michigan
[Validity level] Intrinsic Dimension [Subject classification]
[Keywords] Covariance Estimation; Intrinsic Dimension; Graph Perturbation; Statistics and Numeric Data; Science; Statistics [Timeliness]