Permutation Testing and Semiparametric Regression: Efficient Computation, Tests of Matrix Structure, and l1 Smoothing Penalties

[Abstract] Part I: Permutation Testing

Chapters 1 and 2: Fast Approximation of Small p-values in Permutation Tests by Partitioning the Permutations

Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when the distribution of the test statistic is analytically intractable. However, permutation tests can be computationally intensive.

We propose both an asymptotic approximation and a resampling algorithm for quickly estimating small permutation p-values (e.g. <10^-6) for the difference and ratio of means in two-sample tests. Our methods are based on the distribution of test statistics within and across partitions of the permutations, which we define. We present our methods and demonstrate their use through simulations and an application to cancer genomic data. Through simulations, we find that our resampling algorithm is more computationally efficient than another leading alternative, particularly for extremely small p-values (e.g. <10^-30). Through application to cancer genomic data, we find that our methods can successfully identify up- and down-regulated genes. While we focus on the difference and ratio of means, we speculate that our approaches may work in other settings.

Chapter 3: Tests of Matrix Structure for Construct Validation

Psychologists and other behavioral scientists are frequently interested in whether a questionnaire reliably measures a latent construct. Attempts to address this issue are referred to as construct validation. We describe nonparametric hypothesis testing procedures to assess matrix structures, which can be used for construct validation. These methods are based on a quadratic assignment framework, and can be used either by themselves or to check the robustness of other methods. We investigate the performance of these matrix structure tests through simulations, and demonstrate their use by analyzing a Big Five personality traits questionnaire administered as part of the Health and Retirement Study. We also derive the rate of convergence for our overall test to better understand its behavior.

Part II: Semiparametric Regression

Chapter 4: P-Splines with an l1 Penalty for Repeated Measures

P-splines are penalized B-splines, in which finite order differences in coefficients are typically penalized with an l2 norm. P-splines can be used for semiparametric regression and can include random effects to account for within-subject variability. In addition to l2 penalties, l1-type penalties have been used in nonparametric and semiparametric regression to achieve greater flexibility, such as in locally adaptive regression splines, l1 trend filtering, and the fused lasso additive model. However, there has been less focus on using l1 penalties in P-splines, particularly for estimating conditional means.

We demonstrate the potential benefits of using an l1 penalty in P-splines, with an emphasis on fitting non-smooth functions. We propose an estimation procedure using the alternating direction method of multipliers and cross validation, and provide degrees of freedom and approximate confidence bands based on a ridge approximation to the l1 penalized fit. We also demonstrate potential uses through simulations and an application to electrodermal activity data collected as part of a stress study.
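As a rough illustration of the contrast between l2 and l1 penalties on finite differences of spline coefficients, the following Python sketch computes both penalties for two hypothetical coefficient vectors. It is not the estimation procedure proposed in the dissertation (which relies on the alternating direction method of multipliers and cross validation); the function names, the difference order of 2, and the example coefficients are all assumptions made for illustration.

```python
def difference_matrix(n_coef, order):
    """Build the finite-difference matrix of the given order as a list of rows."""
    # Start from the identity and difference adjacent rows `order` times.
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        rows = [
            [b - a for a, b in zip(rows[i], rows[i + 1])]
            for i in range(len(rows) - 1)
        ]
    return rows


def coefficient_differences(coef, order=2):
    """Apply the order-th difference matrix to the coefficient vector."""
    matrix = difference_matrix(len(coef), order)
    return [sum(m * c for m, c in zip(row, coef)) for row in matrix]


def l2_penalty(coef, order=2):
    """Sum of squared differences (the usual P-spline penalty, up to a tuning constant)."""
    return sum(d * d for d in coefficient_differences(coef, order))


def l1_penalty(coef, order=2):
    """Sum of absolute differences (an l1-type penalty, up to a tuning constant)."""
    return sum(abs(d) for d in coefficient_differences(coef, order))


# Hypothetical coefficients: one sharp kink versus the same total bend spread out.
kink = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
spread = [0.0, 1.0, 1.5, 1.5, 1.0, 0.0]

for name, coef in (("kink", kink), ("spread", spread)):
    print(name, "l1 =", l1_penalty(coef), "l2 =", l2_penalty(coef))
```

In this toy comparison the l1 penalty assigns the same cost to a single sharp kink as to the same total curvature spread over several coefficients, whereas the l2 penalty charges the kink more heavily. This is one informal way to see why an l1 penalty may be better suited to non-smooth functions.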

[Publication date] [Publisher] University of Michigan

[Validity level] Resampling methods [Subject classification]

[Keywords] Computational efficiency;Resampling methods;Genomics;Additive models;Hubert's Gamma;Stress research;Statistics and Numeric Data;Science;Biostatistics [Timeliness]
