Improved Analysis of Large Genetic Association Studies Using Summary Statistics
[Abstract] Genome-wide association studies (GWAS), which examine millions of genetic variants in thousands of individuals, have identified many loci associated with complex traits. As sample sizes increase, particularly through meta-analysis, the number of disease-associated loci has increased rapidly. The objective of this dissertation is to demonstrate the advantages of combining data across studies using summary statistics and to present methods that use publicly available information, such as functional annotation of the genome, to gain further insight into the genetics of human disease.
In the first project, we analyze data from 188,578 individuals using genome-wide and custom genotyping arrays to identify new loci and refine known loci for four lipid traits: low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, and total cholesterol. We identify and annotate 157 loci associated with lipid levels at P < 5×10⁻⁸, including 62 loci not previously associated with lipid levels in humans. Using dense genotyping in individuals of European, East Asian, South Asian, and African ancestry, we narrow association signals in 12 loci. We find that loci associated with blood lipids are often associated with cardiovascular and metabolic traits including coronary artery disease, type 2 diabetes, blood pressure, waist-hip ratio, and body mass index. Our results illustrate the value of genetic data from individuals of diverse ancestries and provide insights into biological mechanisms regulating blood lipids to guide future genetic, biological, and therapeutic research.
In the second project, we propose that causal variants for a trait may share certain genomic features. Importantly, we show that when these genomic features can be identified, we can use them to help pinpoint likely causal variants among many trait-associated variants. We develop a model that identifies genomic features enriched among the associated loci and uses this information to prioritize likely functional variants in each locus, leading to narrower sets of variants for follow-up. Our models work for both quantitative and case-control data and can be used with summary statistics, making them convenient to incorporate into ongoing meta-analyses of genome-wide association studies that can include hundreds of thousands of individuals.
In the third project, we consider meta-analysis where studies may have overlapping sets of participants. In such scenarios, meta-analysis methods that do not account for overlap will perform poorly and have inflated Type I error. We propose a method to identify participant overlap between GWAS using only summary statistics, estimate the degree of overlap, and correctly meta-analyze studies taking the overlap into account. Our method builds upon and extends previous methods that allow meta-analysis of GWAS with known overlap proportions. We illustrate our method using simulations and artificially created overlapping samples from real GWAS data.
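The idea behind the third project can be illustrated with a small sketch. This is not the dissertation's method, which the abstract does not specify in detail. It is a generic illustration that assumes (a) most variants are null, so the correlation of z-scores between two studies across variants mainly reflects shared participants, and (b) a sample-size-weighted z-score meta-analysis whose variance is corrected with that correlation. The function names `estimate_z_correlation` and `overlap_aware_meta_z` are hypothetical.

```python
import numpy as np


def estimate_z_correlation(z_a, z_b):
    """Estimate the correlation between two studies' z-scores across variants.

    Assumption: the vast majority of variants are null, so a nonzero
    correlation is attributed to overlapping participants.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    return float(np.corrcoef(z_a, z_b)[0, 1])


def overlap_aware_meta_z(z, n, corr):
    """Combine per-study z-scores for one variant, allowing correlated studies.

    z    : per-study z-scores, shape (k,)
    n    : per-study sample sizes, shape (k,)
    corr : k x k correlation matrix of the z-scores under the null
           (the identity matrix recovers the usual no-overlap formula)
    """
    z = np.asarray(z, dtype=float)
    w = np.sqrt(np.asarray(n, dtype=float))  # sample-size weights
    corr = np.asarray(corr, dtype=float)
    # Var(w'z) = w' R w when the z-scores have correlation matrix R.
    return float(w @ z / np.sqrt(w @ corr @ w))
```

For two equal-sized studies that each report z = 2, treating them as independent gives a combined z of about 2.83, whereas a correlation of 1 (the same participants analyzed twice) gives a combined z of 2, which shows how ignoring overlap overstates the evidence.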
[Publication date] [Publisher] University of Michigan
[Validity level] Lipids [Subject classification]
[Keywords] Genetic Association Studies using Summary Statistics; Lipids; Post GWAS analysis; Meta-analysis with Overlap; Genetics; Statistics and Numeric Data; Health Sciences; Science; Biostatistics [Timeliness]