Multivariate methods for the statistical analysis of hyperdimensional high-content screening data
[Abstract] In the post-genomic era, greater emphasis has been placed on understanding the function of genes at the systems level. To meet these needs, biologists are creating larger and increasingly complex datasets. In recent years, high-content screening (HCS) using RNA interference (RNAi) or other perturbation techniques in combination with automated microscopy has emerged as a promising investigative tool to explore intricate biological processes. Image-based HC screens produce massive hyperdimensional datasets. To identify novel components of the DNA damage response (DDR) after ionizing radiation, we recently performed an image-based HC RNAi screen in an osteosarcoma cell line. Robust univariate hit identification methods and manual network analysis identified an isoform of BRD4, a bromodomain and extra-terminal domain family member, as an endogenous inhibitor of DDR signaling. However, despite the plethora of data generated from our and other HC screens, little progress has been made in analyzing HC data with multivariate computational methods that exploit the full richness of hyperdimensional data. Such methods should identify more than just the most salient knockdown phenotypes, so as to give a detailed understanding of how gene products cooperate to regulate complex cellular processes. We developed a novel multivariate method using logistic regression models and least absolute shrinkage and selection operator (LASSO) regularization for analyzing hyperdimensional HC data. We applied this method to our HC screen to identify genes that exhibit subtle but consistent phenotypic changes upon knockdown that would have been missed by conventional univariate hit identification approaches. Our method automatically selects the most predictive features at the most predictive time points to facilitate the more efficient design of follow-up experiments, and it puts the identified hits in a network context using the Prize-Collecting Steiner Tree algorithm.
This method offers superior performance over the current gold standard for the analysis of HC RNAi screens. A surprising finding from our analysis is that training sets of genes involved in complex biological phenomena, when used to train predictive models, must be broken down into functionally coherent subsets in order to enhance new gene discovery. Additionally, we found that in RNAi screening, the statistical cell-to-cell variation in phenotypic responses within a well of cells targeted by a single shRNA is an important predictor of gene-dependent events.
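As a rough illustration of how LASSO-regularized logistic regression can select predictive features, the following minimal sketch fits an L1-penalized logistic model by proximal gradient descent on synthetic data. It is not the authors' implementation: the function names (`soft_threshold`, `fit_lasso_logistic`), the synthetic "wells" and "features", the penalty strength `lam`, and the optimizer settings are all assumptions made for this example. Features whose weights are shrunk exactly to zero are treated as not selected.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X: list of feature rows, y: list of 0/1 labels, lam: L1 penalty strength.
    The intercept is left unpenalized. Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean logistic loss for one sample.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic stand-in for screening data (hypothetical): 200 wells, 6 features,
# where only feature 0 carries signal about the binary label.
random.seed(0)
n_wells, n_features = 200, 6
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_wells)]
y = [1 if row[0] + 0.5 * random.gauss(0, 1) > 0 else 0 for row in X]

weights, intercept = fit_lasso_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected feature indices:", selected)
```

Increasing `lam` drives more weights to exactly zero, which is what makes the penalty act as a feature selector; in practice the penalty strength would typically be chosen by cross-validation rather than fixed by hand as it is here.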
[Publication date] [Publishing institution] Massachusetts Institute of Technology