Towards unified biomedical modeling with subgraph mining and factorization algorithms
[Abstract] This dissertation applies subgraph mining and factorization algorithms to three biomedical data modalities: clinical narrative text, ICU physiologic time series, and exonic mutations in computational genomics. These algorithms aim to build clinical models that improve both prediction accuracy and interpretability by exploring the relational information in each modality. The dissertation focuses on three concrete applications: implicating neurodevelopmentally coregulated exon clusters in phenotypes of Autism Spectrum Disorder (ASD), predicting the mortality risk of ICU patients from their physiologic measurement time series, and identifying subtypes of lymphoma patients from pathology report text. In each application, we automatically extract relational information into a graph representation and collect the subgraphs of interest. Depending on the degree of structure in the data format, the heavier machinery of factorization models becomes necessary to group important subgraphs reliably. We demonstrate that these methods yield not only improved performance but also better interpretability in each application.
[Publishing institution] Massachusetts Institute of Technology