Statistical Methods in Population Genetics for Next-Generation Sequencing Data

[Abstract] The increasing number of large-scale sequencing studies has provided unprecedented access to rare genetic variation. Rare variants are often population specific and arose recently, giving unique insight into recent population history. We require innovative population genetics methods to leverage this new information. In Chapter 2, we present a method for estimating changing migration using the distribution of rare variants among populations. We develop a likelihood function based on this distribution to obtain one estimate of the migration rate for variants with a given minor allele count. As the distribution depends only on the migration rate after the mutation-generating event, we compare migration estimates in variants with different minor allele counts to obtain evidence of changing migration. Evaluating our method on simulated data and applying the method to exome sequence data of drug target genes, we identify migration changes as recent as 20 generations in the past and estimate migration rate parameters. In Chapter 3, we develop a flexible mathematical model for population bottlenecks and genetic drift. Using binomial sampling and a stochastic process, we construct a discrete Markov chain with two transition matrices. We apply this approach to sequencing of mitochondrial DNA (mtDNA) of mother-child pairs and estimate the bottleneck size during mtDNA transmission. In a second application, we adapt this model for cell growth experiments. We determine the probability of drift, without selection, producing extreme shifts in allele frequencies during cell replication. At low probabilities, we find evidence of selection and adapt the model to incorporate and estimate a selection coefficient. In Chapter 4, we explore signals of selection in autoimmune disease genes by adapting site frequency spectrum (SFS) tests to whole genome sequencing (WGS) data. We hypothesize that loci associated with multiple autoimmune diseases were once selected for protection from pathogens. We calculate these SFS tests across the genome, generating an empirical distribution and applying a rank-based testing procedure for our genes. Our novel approach eliminates the ascertainment bias found in genome-wide association study data, while accounting for population growth and dependency across the genome. We assess the power of this approach and discuss optimal parameters for its application.
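
The drift calculation described for Chapter 3 can be illustrated with a small sketch. This is not the thesis's model: it assumes a plain Wright-Fisher-style chain with a single binomial transition matrix and a constant population size, and every function name and parameter below is hypothetical. It only shows how binomial sampling yields a discrete Markov chain and how that chain gives the probability that drift alone, without selection, shifts an allele count by a chosen amount.

```python
from math import comb


def binomial_transition_matrix(n_from, n_to):
    """Row i, column j: Pr(j mutant copies among n_to | i mutant copies among n_from).

    Assumption: each of the n_to copies is drawn independently with
    success probability i / n_from (binomial sampling).
    """
    matrix = []
    for i in range(n_from + 1):
        p = i / n_from
        row = [comb(n_to, j) * p**j * (1 - p) ** (n_to - j) for j in range(n_to + 1)]
        matrix.append(row)
    return matrix


def propagate(dist, matrix):
    """One Markov step: multiply a row vector of state probabilities by the matrix."""
    n_cols = len(matrix[0])
    return [
        sum(dist[i] * matrix[i][j] for i in range(len(dist)))
        for j in range(n_cols)
    ]


def prob_shift_at_least(start_count, n, generations, min_shift):
    """Probability that drift alone moves the allele count by at least min_shift.

    Assumption: constant size n in every generation and no selection.
    """
    matrix = binomial_transition_matrix(n, n)
    dist = [0.0] * (n + 1)
    dist[start_count] = 1.0
    for _ in range(generations):
        dist = propagate(dist, matrix)
    return sum(p for j, p in enumerate(dist) if abs(j - start_count) >= min_shift)
```

A call such as `prob_shift_at_least(10, 100, 5, 20)` would give the chance, under these assumptions, that an allele starting at 10 copies out of 100 sits at least 20 copies away after five rounds of sampling. A very small value for an observed shift would be the kind of result the abstract reads as evidence of selection.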

[Publication date] [Publishing institution] University of Michigan

[Validity level] next generation sequencing [Subject classification]

[Keywords] population genetics; next generation sequencing; Genetics; Mathematics; Science (General); Statistics and Numeric Data; Health Sciences; Science; Biostatistics [Timeliness]
