Bayesian Hierarchical Modeling for Problems in Computational Biology.

[Abstract] This thesis addresses the application of Bayesian hierarchical models to the analysis of high-throughput genomic and proteomic datasets. These models offer probabilistic data mining tools for large-scale datasets and help elucidate biological features using concise summaries. First, a novel method is developed for analyzing joint profiles of DNA copy number and gene expression data. mRNA expression is the result of DNA transcription; thus gene expression data is expected to be partially correlated with copy number changes. The goal of this methodology is to identify genes showing differential expression between phenotypes associated with aberrant copy number alteration. This problem was approached by combining a change-point estimation procedure with sampling methods for a two-stage mixture model. Second, a hierarchical hidden Markov model (HHMM) is proposed as a tool to merge data from chromatin immunoprecipitation (ChIP) experiments on two different mapping platforms, ChIP-seq and ChIP-chip. In this method, inference results from individual HMMs in ChIP-seq and ChIP-chip experiments are summarized by a higher-level HMM. Simulation studies show that one can improve receiver operating characteristic (ROC) performance in transcription factor binding site (TFBS) identification by combining data from both technologies. Analysis of two well-studied transcription factors, NRSF and CTCF, also demonstrates that the HHMM outperforms identification based on either individual data source and on a simple merger of the two. Third, statistical methods for analyzing quantitative datasets generated from mass spectrometry-based proteomics experiments are presented. A model-based method, QSpec, is developed for identifying differentially expressed proteins using spectral counts. The method uses hierarchical Bayes to address the limited sample size of typical experimental data. This is followed by another model-based method, significance analysis (SAInt), for assigning significance measures to protein-protein interactions based on spectral count data obtained from large-scale affinity purification-mass spectrometry (AP-MS) experiments. The statistical models proposed in this thesis have a hierarchical structure in model parameters, allowing the inference for one gene or protein to borrow strength from others. All the examples illustrate that Bayesian hierarchical models allow for tractable model parametrization and practicable posterior sampling-based inference.

[Publication date] [Publisher] University of Michigan

[Validity level] Statistics and Numeric Data [Subject classification]

[Keywords] Bayesian Data Analysis for Computational Biology; Statistics and Numeric Data; Science; Biostatistics [Timeliness]
