Statistical Methods for Low-frequency and Rare Genetic Variants.
[Abstract] Genetic association studies using sequencing, dense-array genotyping, or sequencing-based imputation make it possible to identify low-frequency and rare variants associated with diseases and traits, but analyzing these variants presents new statistical challenges. Single-marker tests (e.g., logistic and linear regression) and methods that combine information across studies (e.g., joint analysis and meta-analysis) may be poorly calibrated, have low power, or both. The calibration and power of aggregation tests, in which multiple rare variants are analyzed jointly, have not been evaluated for variants on the X chromosome. In my dissertation, I address three topics.
First, for case-control studies, I evaluate the calibration and power of four logistic regression tests in joint and meta-analysis of low-frequency and rare variants and demonstrate that: (a) for joint analysis, the Firth bias-corrected test is best (i.e., the most powerful among well-calibrated tests); (b) for meta-analysis of balanced studies (equal numbers of cases and controls), the score test is best, but it is less powerful than joint analysis based on the Firth test; and (c) for meta-analysis of sufficiently unbalanced studies, all four tests can be anti-conservative, particularly the score test.
Second, for quantitative trait (QT) studies, I evaluate the calibration and power of linear regression in joint and meta-analysis and demonstrate that, for normally distributed QTs, joint analysis and sample-size-weighted meta-analysis are equally well calibrated and powerful for variants with expected minor allele count E[MAC] ≥ 10, whereas inverse-variance-weighted meta-analysis is slightly anti-conservative for small studies. For non-normally distributed QTs, joint analysis and meta-analysis are equally anti-conservative for low-frequency and rare variants. Inverse-normal transformation of the QT remedies this problem, but transforming the QT reduces power regardless of its original distribution.
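The abstract does not spell out the combination rules it compares. As an illustration only, the sketch below uses common textbook forms: inverse-variance-weighted pooling of per-study effect estimates, sample-size-weighted Z-scores with weights sqrt(N_k), summation of per-study score statistics U_k and variances V_k (the form usually used for score-test meta-analysis), the expected minor allele count 2·N·MAF for an autosomal variant, and a rank-based inverse-normal transformation with Blom's offset c = 3/8. The function names, the choice of sqrt(N_k) weights, and the offset are assumptions of this sketch, not necessarily the exact choices made in the dissertation.

```python
import numpy as np
from scipy.stats import norm, rankdata


def expected_mac(n_samples, maf):
    """Expected minor allele count for an autosomal variant in N diploid samples: 2 * N * MAF."""
    return 2.0 * n_samples * maf


def ivw_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study effect estimates."""
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    w = 1.0 / ses**2                      # weight = 1 / Var(beta_k)
    beta = np.sum(w * betas) / np.sum(w)  # pooled effect estimate
    se = np.sqrt(1.0 / np.sum(w))         # standard error of the pooled effect
    z = beta / se
    return beta, se, 2.0 * norm.sf(abs(z))


def sample_size_meta(zs, ns):
    """Sample-size-weighted Z-score meta-analysis (assumed weights: sqrt(N_k))."""
    zs = np.asarray(zs, dtype=float)
    w = np.sqrt(np.asarray(ns, dtype=float))
    z = np.sum(w * zs) / np.sqrt(np.sum(w**2))
    return z, 2.0 * norm.sf(abs(z))


def score_meta(us, vs):
    """Sum per-study score statistics U_k and variances V_k (same allele coded in every study)."""
    z = np.sum(us) / np.sqrt(np.sum(vs))
    return z, 2.0 * norm.sf(abs(z))


def inverse_normal_transform(y, c=3.0 / 8.0):
    """Rank-based inverse-normal transformation; c = 3/8 is Blom's offset, ties get average ranks."""
    r = rankdata(y)
    return norm.ppf((r - c) / (len(r) - 2.0 * c + 1.0))
```

For example, a variant with MAF 0.5% in 1,000 samples has E[MAC] = 2 × 1,000 × 0.005 = 10, the threshold cited above. The inverse-variance scheme relies on each study's standard error, which can be poorly estimated in small studies, whereas the sample-size scheme uses only the Z-scores and the study sizes.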
Third, for case-control and QT studies, I evaluate the calibration and power of three aggregation tests for the X chromosome: burden, SKAT, and SKAT-O. For case-control studies, the tests are relatively well calibrated across all simulation scenarios. Power is usually slightly higher when the coding scheme for male genotypes matches the underlying model, but the power loss is small when the model is misspecified. Differences in the male:female ratio between cases and controls have little effect on power. For QTs, calibration and power results are very similar to those for binary traits.
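To make the male-coding question concrete, the sketch below implements a simple burden score test for a case-control sample on the X chromosome, with a switch between coding hemizygous males as 0/1 or 0/2 (the 0/2 coding is commonly associated with an X-inactivation model, in which a hemizygous male is treated like a homozygous female). Assumptions of this sketch: the function names are hypothetical, the burden is an (optionally weighted) sum of minor allele counts, and the null logistic model contains only an intercept and sex, so its fitted probabilities are the sex-specific case proportions. SKAT and SKAT-O need a p-value for a weighted mixture of chi-square variables (e.g., Davies' method) and are not shown.

```python
import numpy as np
from scipy.stats import chi2


def code_x_genotypes(G, is_male, male_scheme="0/2"):
    """G: n x m minor allele counts; females coded 0/1/2, hemizygous males 0/1."""
    G = np.asarray(G, dtype=float).copy()
    is_male = np.asarray(is_male, dtype=bool)
    if male_scheme == "0/2":
        G[is_male] *= 2.0  # male hemizygote counted like a female homozygote
    elif male_scheme != "0/1":
        raise ValueError("male_scheme must be '0/1' or '0/2'")
    return G


def burden_score_test_binary(y, G, is_male, male_scheme="0/2", weights=None):
    """Burden score test for a binary trait, null model: logit P(y=1) ~ intercept + sex."""
    y = np.asarray(y, dtype=float)
    is_male = np.asarray(is_male, dtype=bool)
    Gc = code_x_genotypes(G, is_male, male_scheme)
    w = np.ones(Gc.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    burden = Gc @ w  # one aggregated genotype score per individual

    # Null model is saturated in sex, so its MLE fitted values are the per-sex case proportions.
    X = np.column_stack([np.ones_like(y), is_male.astype(float)])
    p_hat = np.where(is_male, y[is_male].mean(), y[~is_male].mean())
    wd = p_hat * (1.0 - p_hat)  # diagonal of the logistic weight matrix W

    U = burden @ (y - p_hat)  # score for the burden coefficient
    XtWX = X.T @ (wd[:, None] * X)
    XtWb = X.T @ (wd * burden)
    # Efficient information: b'Wb - b'WX (X'WX)^{-1} X'Wb
    V = burden @ (wd * burden) - XtWb @ np.linalg.solve(XtWX, XtWb)
    stat = U**2 / V
    return stat, chi2.sf(stat, df=1)
```

Running the same data through both `male_scheme` values is one way to see how much the coding choice moves the test statistic; when the sample contains no carriers among males, the two codings give identical results.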
[Publication Date] [Publisher] University of Michigan
[Validity Level] Genetics [Subject Classification]
[Keywords] Statistical genetics; Genetics; Public Health; Statistics and Numeric Data; Health Sciences; Science; Biostatistics [Timeliness]