Assessment and Improvement of a Sequential Regression Multivariate Imputation Algorithm.
[Abstract] Sequential regression multivariate imputation (SRMI, also known as chained equations or fully conditional specification) is a popular approach for handling missing values in highly complex data structures with many types of variables, structural dependencies among the variables, and bounds on plausible imputation values. It is a Gibbs-style algorithm that iteratively draws from the posterior predictive distribution of the missing values in a given variable, conditional on all observed and imputed values of all other variables. A theoretical weakness of this approach is that the specified set of fully conditional regression models may not be compatible with any joint distribution of the variables being imputed. As a result, the convergence properties of the iterative algorithm are not well understood. This dissertation focuses on assessing and improving the SRMI algorithm.
Chapter 2 develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of generalized linear regression models. The results are established for the missing data pattern in which each subject may be missing a value on at most one variable. The results are used to develop criteria for the choice of regression models.
Chapter 3 proposes a modified block sequential regression multivariate imputation (BSRMI) approach, which divides the data into blocks for each variable based on missing data patterns and tunes the regression models through compatibility restrictions. This helps avoid divergence when the data are missing in general patterns and when it is difficult to obtain well-fitting models across all missing data patterns. Conditions for the convergence of the algorithm are established, and the repeated sampling properties of inferences are studied using several simulated data sets.
Chapter 4 extends imputation model selection to quasi-likelihood regression models in both SRMI and BSRMI, to better capture structure in the prediction model for the missing values. The performance of the modified approach is examined through simulation studies. The results show that the extension to quasi-likelihood regression models makes it easier to choose better-fitting model sequences that yield desirable repeated sampling properties of the multiple imputation estimates.
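The Gibbs-style iteration described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes every variable is continuous and imputed with a normal linear regression under a noninformative prior, and the function name `srmi_impute` and its parameters are hypothetical. Neither BSRMI blocking nor quasi-likelihood models are represented here.

```python
import numpy as np

def srmi_impute(X, n_iter=10, seed=0):
    """Illustrative SRMI pass for continuous data (NaN marks a missing value).

    Assumption: each variable is regressed on all others with a normal
    linear model; draws approximate the posterior predictive distribution.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = X.copy()

    # Starting values: observed column means.
    col_means = np.nanmean(X, axis=0)
    filled[miss] = np.take(col_means, np.nonzero(miss)[1])

    n, p = X.shape
    for _ in range(n_iter):
        for j in range(p):
            mj = miss[:, j]
            if not mj.any():
                continue
            # Predictors: intercept plus current values of all other variables.
            Z = np.column_stack([np.ones(n), np.delete(filled, j, axis=1)])
            Zo, yo = Z[~mj], filled[~mj, j]

            # Least-squares fit on rows where variable j is observed.
            ZtZ_inv = np.linalg.inv(Zo.T @ Zo)
            beta_hat = ZtZ_inv @ Zo.T @ yo
            resid = yo - Zo @ beta_hat
            df = Zo.shape[0] - Zo.shape[1]

            # Draw the residual variance, then the coefficients.
            sigma2 = resid @ resid / rng.chisquare(df)
            chol = np.linalg.cholesky(ZtZ_inv)
            beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.normal(size=Z.shape[1]))

            # Draw the missing values from the predictive distribution.
            noise = rng.normal(0.0, np.sqrt(sigma2), size=mj.sum())
            filled[mj, j] = Z[mj] @ beta + noise
    return filled
```

Calling the function several times with different seeds would give multiple completed data sets. The sketch makes no claim about convergence; as the abstract notes, that depends on whether the conditional models are compatible with a joint distribution.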
[Publication date] [Publishing institution] University of Michigan
[Validity level] Sequential Regression Multivariate Imputation [Subject classification]
[Keywords] Missing Data Multiple Imputation;Sequential Regression Multivariate Imputation;Compatible Conditional Specifications;Block-specific Sequential Regression Multivariate Imputation;Sequential Regression Multivariate Imputation by Quasi-Likelihood Regression Models;Statistics and Numeric Data;Science;Biostatistics [Timeliness]