Markov Chain Monte Carlo in Genetics: Subphenotyping, Linkage Disequilibrium Modeling, and Fine Mapping.
[Abstract] Advances in modern genotyping and sequencing technologies have made large-scale data available across genetic studies. Meanwhile, Markov chain Monte Carlo (MCMC) algorithms provide powerful computational tools for handling these high-dimensional genetic data. In this dissertation, I demonstrate several MCMC applications in emerging genetic studies. In Chapter 2, I propose a method to identify genetically homogeneous subphenotypes of complex diseases. I assume that different disease subtypes, caused by different risk variants, show distinct clinical characteristics (treated as covariates). I design an algorithm that identifies the covariates that define genetically homogeneous subtypes. Conditional on these covariates, the algorithm calculates each affected individual's posterior probability of belonging to each subtype. Using simulated data, I illustrate that my algorithm correctly identifies subtypes, such that affected individuals within each subtype group are likely to carry the same risk variants. I also evaluate whether stratifying on these estimated subtype memberships improves the power to detect phenotypic association at risk loci attributable to these subtypes. In Chapter 3, I introduce a novel algorithm to model the linkage disequilibrium (LD) between different genomic positions through shared genealogies. Compared to traditional hidden Markov models (HMMs), which might oversimplify the evolutionary process of sampled haplotypes, my method allows more variation in the prior probabilities that shared haplotype segments descend from particular ancestors, as well as more variation in population genetic parameters. With this more careful model, my method improves the accuracy of haplotype reconstruction. Moreover, I propose a fine-mapping algorithm based on this model to localize complex trait loci. My algorithm identifies disease-causing loci accurately when traditional mapping approaches based on single-marker tests have low power. In Chapter 4, I propose an approach to overcome the computational burden of fine mapping with my coalescent-based model. I first estimate a set of clusters of sampled haplotypes such that members within each cluster share one common ancestor. I then make inferences about the genealogies of these clusters to localize candidate regions of disease-causing mutations. Using simulated data, I illustrate that this implementation makes my fine-mapping approach feasible in large samples of several tens of thousands of individuals.
[Publication date] [Publisher] University of Michigan
[Validity level] Population Genetics [Subject classification]
[Keywords] Statistical Genetics; Population Genetics; Markov Chain Monte Carlo; Statistics and Numeric Data; Science; Biostatistics [Timeliness]