Topics in High-Dimensional Unsupervised Learning.
[Abstract] The first part of the dissertation introduces several new methods for estimating the structure of graphical models. First, we consider estimating graphical models with discrete variables, including nominal variables and ordinal variables. For the nominal variables, we prove the asymptotic properties of the joint neighborhood selection method proposed by Hoefling and Tibshirani (2009) and Wang et al. (2009), which is used to fit high-dimensional graphical models with binary random variables. We show that this method is consistent in terms of both parameter estimation and structure estimation and extend it to general nominal variables. For ordinal variables, we introduce a new graphical model, which assumes that the ordinal variables are generated by discretizing marginal distributions of a latent multivariate Gaussian distribution and that the relationships of these ordinal variables are described by the underlying Gaussian graphical model. We develop an EM-like algorithm to estimate the underlying latent network and apply the mean field theory to improve computational efficiency. We also consider the problem of jointly estimating multiple graphical models which share the variables but come from different categories. Compared with separate estimation for each category, the proposed joint estimation method significantly improves performance when graphical models in different categories have some similarities. We develop joint estimation methods both for Gaussian graphical models and for graphical models for categorical variables. In the second part of the dissertation, we develop two methods to improve interpretability of high-dimensional unsupervised learning methods. First, we introduce a pairwise variable selection method for high-dimensional model-based clustering. Unlike existing variable selection methods for clustering problems, the proposed method not only selects the informative variables, but also identifies which pairs of clusters are separable by each informative variable. 
We also propose a new method to identify both sparse structures and "block" structures in factor loadings in principal component analysis. This is achieved by forcing highly correlated variables to have identical factor loadings via a regularization penalty.
[Publication date] [Publishing institution] University of Michigan
[Validity level] High-dimensional Data Analysis [Subject classification]
[Keywords] Graphical Model;High-dimensional Data Analysis;Network Analysis;Unsupervised Learning;Statistics and Numeric Data;Science;Statistics [Timeliness]