A Statistical Framework for Using External Information in Updating Prediction Models with New Biomarker Measures
[Abstract] Prediction models are abundant in the clinical and epidemiologic literature. There are established risk prediction models for cancer, cardiovascular disease, and many other chronic diseases. The information from an existing prediction model can be available in the form of coefficient estimates (with or without measures of standard error) or individual prediction probabilities (with or without standard errors). This dissertation proposes a principled framework to incorporate such varying types of information while building a new prediction model that adds new candidate biomarkers to the existing model.
In the first chapter, we consider a situation in which rich historical data from a large study are available for the coefficients and their standard errors in a linear regression model describing the association between a continuous outcome variable Y and a set of predictors X. We would like to use this summary information to improve inference in an expanded model of interest, Y given X and B. The additional variable B is a new biomarker, measured on a small number of subjects in a new dataset. We formulate the problem in an inferential framework where the historical information is translated into nonlinear constraints on the parameter space, and we propose both frequentist and Bayesian solutions to this problem. We show that a Bayesian transformation approach proposed by Gunn and Dunson is a simple and effective computational method for approximate Bayesian inference in this constrained parameter problem. The simulation results comparing these methods indicate that historical information on Y|X can improve the efficiency of estimation and enhance the predictive power in the regression model of interest, E(Y|X, B).
We illustrate our methodology by enhancing a published prediction model for bone lead levels in terms of blood lead and other covariates, with a new biomarker defined through a genetic risk score.
In the second chapter, we further develop and evaluate the strategy of translating the external information into constraints on regression coefficients in the setting of a binary response variable Y and a logistic regression model. Borrowing from the measurement error literature, we establish an approximate relationship between the regression coefficients in the models Pr(Y = 1|X, β), Pr(Y = 1|X, B, γ) and E(B|X, θ) for a Gaussian distribution of B. For binary B we propose an alternative expression. We illustrate our methodology through simulations and by enhancing the High-grade Prostate Cancer Prevention Trial Risk Calculator with two new biomarkers, prostate cancer antigen 3 and TMPRSS2:ERG.
In the third chapter, the goal is to improve the predictive ability of a risk assessment model, Pr(Y = 1|X, B), constructed from a small dataset by incorporating external information that comes in the form of predicted outcomes from an existing model for Pr(Y = 1|X). For example, well-known risk prediction models are often converted into publicly available online tools or risk calculators that yield a predicted probability of developing the disease for an individual based on a set of risk factors X, but the exact form of the model or algorithm used to construct the predictions may not be known. We propose a constrained maximum likelihood method and an approach based on synthetic data and multiple imputation to use this information while constructing a model for Pr(Y = 1|X, B).
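To make the first-chapter idea of "constraints on the parameter space" concrete, the following is a minimal sketch, not the dissertation's estimation procedure. It assumes a linear model for E(B|X) and reuses the abstract's symbols β, γ and θ for the reduced model, the expanded model and the biomarker model; the simulated data, sample size and coefficient values are purely illustrative assumptions. Under those assumptions, averaging E(Y|X, B) over B given X implies β = γ_(0,X) + γ_B · θ, and for ordinary least squares fits this identity holds exactly in-sample, which the code checks.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y ~ design (design includes the intercept column)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Illustrative simulated data (all values below are assumptions, not from the source).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                                   # established predictors
B = 0.5 + X @ np.array([0.8, -0.3]) + rng.normal(scale=0.7, size=n)   # new biomarker
Y = 1.0 + X @ np.array([1.5, -2.0]) + 0.9 * B + rng.normal(size=n)    # continuous outcome

ones = np.ones((n, 1))
design_x = np.hstack([ones, X])                 # design for models given X only
design_xb = np.hstack([ones, X, B[:, None]])    # design for the expanded model

beta = ols(design_x, Y)      # reduced model E(Y|X)
gamma = ols(design_xb, Y)    # expanded model E(Y|X,B): (gamma_0, gamma_X, gamma_B)
theta = ols(design_x, B)     # biomarker model E(B|X)

# Constraint linking the three models: beta = gamma_(0,X) + gamma_B * theta.
gamma_b = gamma[-1]
implied_beta = gamma[:-1] + gamma_b * theta
max_gap = float(np.max(np.abs(beta - implied_beta)))

print("beta from reduced fit:", beta)
print("beta implied by expanded fit:", implied_beta)
print("largest absolute gap:", max_gap)
```

When β comes from a large external study rather than from the same small dataset, the same relationship can be imposed as a constraint on (γ, θ) during estimation. The constraint is nonlinear because it involves the product γ_B · θ.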
[Publication date] [Publisher] University of Michigan
[Level of authority] Constrained estimation [Subject classification]
[Keywords] Bayesian methods; Constrained estimation; Prediction models; Logistic regression; Statistics and Numeric Data; Science; Biostatistics [Timeliness]