Statistical Inference for Some Problems in Network Analysis.

[摘要] Recent advances in computing and measurement technologies have led to an explosion in the amount of data that are being collected in all areas of application.Much of these data have network or graph structures, and they are common in diverse scientific areas, such as biology, computer science, sociology and so on. This dissertation makes three contributions to inference problems in statistical network analysis.The first two parts of the dissertation focus on community analysis. In the first part, we establish general theory for checking consistency of community detection under the degree-corrected block model, a generalization of standard stochastic block model, which allows variation in node degrees within a community, thus accommodating hub nodes. We compare several community detection criteria under both the standard and the degree-corrected block models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice.The second part proposes a new framework for community identification. Most community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between. However, many networks contain nodes that do not fit in with any of the communities, and forcing every node into a community can distort results. We propose a framework that extracts one community at a time, allowing for weakly connected nodes. The proposed extraction criterion performs well on simulated and real networks. For the case of the block model, we establish asymptotic consistency of estimated node labels and propose a hypothesis test for determining the number of communities.The third part makes a contribution to the link prediction problem.In many applications, notably in genetics, a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many existing supervised learning approaches.We develop a new method which treats the observed network as a sample of the true network with different sampling rates for positive and negative examples. We obtain a relative ranking of potential links by their probabilities, utilizing information on node covariates as well as on network topology.

[发布日期] [发布机构] University of Michigan

[效力级别] Community Identification [学科分类]

[关键词] Network Analysis;Community Identification;Link Prediction;Statistics and Numeric Data;Science;Statistics [时效性]

浏览次数：30

统一登录查看全文激活码登录查看全文