Statistical Methods and Analysis in Genome Wide Association Studies and Next-Generation Sequencing.
[Abstract] Genome-wide association studies (GWAS), which examine common genetic variants in thousands of individuals, have identified many genetic loci associated with a variety of complex diseases and phenotypes. Next-generation sequencing (NGS) technologies allow us to extend these studies to rarer variants not typically evaluated by GWAS. In this dissertation, I present novel statistical methods and software to dissect the genetic basis of complex traits in the context of both GWAS and NGS. First, I present a large-scale GWAS for age-related macular degeneration (AMD). Our studies extend the catalog of AMD-associated loci and provide clues about the underlying cellular pathways. A novel aspect of our study is a prediction method that uses all susceptibility loci to help identify individuals at high risk of disease. The prediction can be extended to the general population with a weighting scheme that combines the disease prevalence and the case-control ratio in the GWAS sample. Second, I describe an interactive package that provides graphical overviews of the results of whole-genome association studies in datasets with rich multi-dimensional phenotypic information, such as global surveys of gene expression. Third, I propose and implement an efficient Hidden Markov Model (HMM) based method for genotype calling and haplotype inference in parent-offspring trios. Our method considers both linkage disequilibrium (LD) patterns and the constraints imposed by the family structure when assigning individual genotypes and haplotypes. Using simulations and sequencing data from ongoing projects, we show that trios provide higher genotype calling accuracy across the frequency spectrum, both overall and at hard-to-call heterozygous sites. In addition, sequencing trios greatly improves haplotype phasing accuracy. Finally, I describe an efficient state space reduction method for haplotype inference and genotype calling. This method is motivated by the increasing computational burden of the HMM-based approaches used to describe haplotype sharing in GWAS and NGS data. Our method takes advantage of local similarity between haplotypes and reduces the HMM state space dynamically, while preserving the accuracy of the full state space method. Through simulation and real data analysis, we show that this method can yield substantial savings in both memory and CPU time.
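The idea of exploiting local similarity between haplotypes can be illustrated with a toy sketch. This is not the dissertation's algorithm, which the abstract does not specify; it only shows, under simplifying assumptions, why identical haplotypes within a short window can share one HMM state. The function name `collapse_window`, the fixed window width of 4 sites, and the example panel are all hypothetical.

```python
from collections import defaultdict


def collapse_window(haplotypes, start, end):
    """Group haplotypes that are identical on sites [start, end).

    Returns a dict mapping each distinct local pattern (a tuple of alleles)
    to the list of haplotype indices that carry it. In an HMM whose hidden
    states are reference haplotypes, each group could be treated as a single
    state inside this window (illustrative assumption, not the thesis method).
    """
    groups = defaultdict(list)
    for index, hap in enumerate(haplotypes):
        groups[tuple(hap[start:end])].append(index)
    return dict(groups)


# Hypothetical reference panel: 6 haplotypes, 8 biallelic sites (0 = ref, 1 = alt).
panel = [
    [0, 0, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 1],
]

WINDOW = 4  # assumed fixed width; a dynamic method would adapt this per region
for start in range(0, len(panel[0]), WINDOW):
    groups = collapse_window(panel, start, start + WINDOW)
    print(f"sites {start}-{start + WINDOW - 1}: "
          f"{len(panel)} haplotypes -> {len(groups)} distinct local states")
```

In this toy panel the six haplotypes collapse to two distinct patterns in the first window and three in the second, so any per-state computation in those windows would run over fewer states. A real method would also have to handle transitions between windows and choose window boundaries adaptively.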
[Publication date] [Publishing institution] University of Michigan
[Validity level] Age-related Macular Degeneration [Subject classification]
[Keywords] Genome-wide Association Study; Age-related Macular Degeneration; Genotype Calling and Haplotype Inference; State Space Reduction Method; Next-generation Sequencing; Genetics; Statistics and Numeric Data; Health Sciences; Science; Biostatistics [Timeliness]