Tree-Based Methods for Discovery of Association between Flow Cytometry Data and Clinical Endpoints
[Abstract] We demonstrate the application and comparative interpretations of three tree-based algorithms for the analysis of data arising from flow cytometry: classification and regression trees (CARTs), random forests (RFs), and logic regression (LR). Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1 infected persons starting antiretroviral therapy with CD4 count between 200 and 350 cells/μL. A comparison to a more standard contingency table analysis is provided. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+ emerges as an important predictor variable, while the tree-based approaches identify additional variables as potentially informative. Application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T-cell activation, may be a better predictor than any single T-cell/innate cell subset analyzed. Taken together, we show that tree-based methods can be successfully applied to flow cytometry data to better inform and discover associations that may not emerge in the context of a univariate analysis.
[Publication date] [Publishing institution]
[Validity level] [Subject classification] Biotechnology
[Keywords] [Timeliness]
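The contrast drawn in the abstract, between a one-variable-at-a-time contingency table and a tree that exposes combinations of variables, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the marker names (`marker_a_high`, `marker_b_high`, `marker_c_high`) are hypothetical, and the tree is a simplified greedy Gini-based learner in the spirit of CART (no pruning, binary features only), not the authors' implementation or data.

```python
from collections import Counter
from itertools import product

# Synthetic, hypothetical data: three binary "markers"; the outcome is 1
# only when marker_a_high AND marker_b_high are both 1 (an assumed rule).
FEATURES = ["marker_a_high", "marker_b_high", "marker_c_high"]
rows = [dict(zip(FEATURES, combo)) for combo in product((0, 1), repeat=3)]
labels = [r["marker_a_high"] & r["marker_b_high"] for r in rows]


def contingency(rows, labels, feature):
    """Count (feature value, outcome) pairs: a single-variable 2x2 table."""
    return Counter((r[feature], y) for r, y in zip(rows, labels))


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (feature, weighted child impurity) of the best split, or None."""
    best = None
    for f in features:
        left = [y for r, y in zip(rows, labels) if r[f] == 0]
        right = [y for r, y in zip(rows, labels) if r[f] == 1]
        if not left or not right:
            continue  # this feature does not separate the samples
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (f, score)
    return best


def grow_tree(rows, labels, features, depth=0, max_depth=2):
    """Greedy tree: a leaf is a class label, a node is (feature, {0: ..., 1: ...})."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels, features)
    if split is None or split[1] >= gini(labels):
        return majority  # no split reduces impurity
    f = split[0]
    remaining = [g for g in features if g != f]
    branches = {}
    for value in (0, 1):
        subset = [(r, y) for r, y in zip(rows, labels) if r[f] == value]
        branches[value] = grow_tree(
            [r for r, _ in subset], [y for _, y in subset],
            remaining, depth + 1, max_depth,
        )
    return (f, branches)


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


table_a = contingency(rows, labels, "marker_a_high")
tree = grow_tree(rows, labels, FEATURES)
print("2x2 counts for marker_a_high:", dict(table_a))
print("fitted tree:", tree)
```

On this toy data the single-variable table shows only a partial association for `marker_a_high`, whereas the fitted tree nests a split on `marker_b_high` under `marker_a_high`, making the joint condition explicit. Random forests and logic regression are not sketched here.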