Identifying relevant fulltext articles for GO annotation without MeSH
[Abstract] Gene Ontology (GO) is a controlled vocabulary. Given a gene product, GO enables scientists to clearly and unambiguously describe the specific molecular functions of the gene product, the specific biological processes in which it is involved, and the specific cellular components to which it is localized. In this paper, we present our approach to identifying which papers contain experimental evidence warranting annotation with GO codes. The training data set contains 375 relevant fulltext articles and 5,462 irrelevant ones, and the test data set contains 420 positive fulltext articles and 5,623 negative ones. We treated this as a binary classification problem and employed Support Vector Machines (SVMs) to distinguish positive articles from negative ones. The title, abstract, figure/table captions, and three standard sections (Results, Discussion, and Conclusion) were the targets of feature extraction. Without incorporating MeSH (Medical Subject Headings) terms as part of the features, our system achieved 0.381 in Normalized Utility
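The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state the feature representation, kernel, or training parameters, so the bag-of-words counts, the linear SVM trained by subgradient descent on the hinge loss, the dictionary layout of an article (including the `captions` key), and all function names and toy articles below are assumptions made for illustration only.

```python
import random
from collections import Counter

# Article parts named in the abstract as feature sources.
# The key names (e.g. "captions") are a hypothetical dict layout.
TARGET_PARTS = ("title", "abstract", "captions",
                "results", "discussion", "conclusion")


def extract_features(article):
    """Bag-of-words counts over the targeted parts only; other parts are ignored."""
    counts = Counter()
    for part in TARGET_PARTS:
        counts.update(article.get(part, "").lower().split())
    return counts


def score(weights, features):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(tok, 0.0) * n for tok, n in features.items())


def train_linear_svm(examples, epochs=100, lr=0.1, lam=0.01, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    examples: list of (feature Counter, label) pairs with label in {+1, -1}.
    """
    rng = random.Random(seed)
    weights = {}
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            features, label = examples[i]
            violated = label * score(weights, features) < 1.0
            # Regularization step: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - lr * lam
            # Hinge-loss step: only margin violators move the weights.
            if violated:
                for tok, n in features.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * n
    return weights


def predict(weights, article):
    """Return +1 (relevant for GO annotation) or -1 (irrelevant)."""
    return 1 if score(weights, extract_features(article)) > 0 else -1


# Toy articles invented for this example; not drawn from the paper's data.
train = [
    ({"title": "mutant phenotype analysis",
      "results": "knockout mice showed expression"}, 1),
    ({"title": "gene expression in knockout mice",
      "abstract": "mutant phenotype"}, 1),
    ({"title": "clinical trial of drug",
      "abstract": "patients received treatment"}, -1),
    ({"title": "survey of patients",
      "results": "treatment outcomes in clinical practice"}, -1),
]
examples = [(extract_features(a), y) for a, y in train]
weights = train_linear_svm(examples)
print(predict(weights, {"title": "Knockout mice", "abstract": "mutant phenotype"}))
```

The sketch ignores the strong class imbalance reported in the abstract (375 relevant versus 5,462 irrelevant training articles); a realistic system would likely need class weighting or a tuned decision threshold.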
[Publication date] [Publisher]
[Validity level] [Subject classification] Social sciences, humanities, and arts (general)
[Keywords] [Timeliness]