Estimation and Inference in High Dimensional Networks, with Applications toBiological Systems.
[摘要] This dissertation discusses several aspects of estimation and inference for high dimensional networks, and is divided into three main parts. First, to assess the significance of arbitrary subnetworks (e.g. pathways), I propose a latent variable model that directly incorporates the network information. By formulating the problem as a (generalized) mixed linear model, I introduce a general inference procedure for testing the significance of subnetworks, that can be used to test for changes in both expression levels of the corresponding nodes (e.g. genes), as well as the structure of the network. The framework is then extended for analysis of data obtained from complex experimental designs. We also study the effect of noise in the network information, both theoretically and empirically, and show that the proposed inference procedure is robust to the presence of random noise.In the second part, we consider the problem of estimating directed graphs from observed data. The general problem of estimation of directed graphs is computationally NP-hard and direction of interactions may not be distinguishable from observations. I consider a special case of this problem, where the nodes (i.e. variables) inherit a natural ordering, and propose efficient penalized likelihood methods for estimating the underlying network structure. Consistency of the estimators in the high dimensional setting (more variables than observations) is established. I also propose an extension of the lasso penalty that results in improved estimation of graphical Granger causality from time-course observations.The last part of the dissertation is devoted to issues of dimension reduction and efficient computation in networks. I propose a dimension reduction algorithm for networks using Laplacian eigenmaps, discuss the connections of this method to principal component analysis, and formulate the inference problem using a group lasso penalty. I also address computational aspects of estimation in networks, by proposing a distributed algorithm based on block-relaxation and derive conditions required for convergence of the algorithm to the maximum likelihood estimates. Finally, I present an extension of the block-relaxation algorithm, called approximate block-relaxation, that facilitates the use of iterative algorithms in optimization problems with complex objective functions.
[发布日期] [发布机构] University of Michigan
[效力级别] Graphical Models [学科分类]
[关键词] High Dimensional Networks;Graphical Models;Biological Networks and Systems Biology;Small N Large P Asymptotics;Penalized Likelihood Estimation;Bioinformatics;Mathematics;Statistics and Numeric Data;Health Sciences;Science;Statistics [时效性]