Statistical Analysis for Genomic Studies Involving Measurement Error, Multiple Populations, and Limited Sample Size.
[Abstract] Genomic studies involve various types of high-dimensional data. Study designs are often complex, and data are difficult to collect. For example, the subjects may belong to distinct populations, the number of subjects is often small, and substantial measurement error is usually present. In this thesis, we consider three important issues that arise in this research setting. The impact of measurement error on parameter estimation has been extensively studied, but its effects on predictive performance have not been. In part 1 of the thesis, we partially characterize the data-generating models that are most adversely impacted by measurement error. These results may help researchers judge whether improving data collection procedures or identifying more informative markers would have a greater impact on predictive performance. In part 2 of the thesis, we present a new approach for identifying the common and unique marker/outcome associations that are present in a genomic dataset consisting of several subpopulations. We show that the natural plug-in style estimates of overlap are biased, and we demonstrate a copula-based approach to reducing the bias. Part 3 of the thesis considers situations in which power for attributing effects to specific markers is low, but meaningful relationships between marker/outcome associations and other statistical properties of the markers can be identified.
[Publication Date] [Publishing Institution] University of Michigan
[Validity Level] Effect Size [Subject Classification]
[Keywords] Measurement Error; Effect Size; Statistics and Numeric Data; Science; Statistics [Timeliness]