Concept Extraction and Synonymy Management for Biomedical Information Retrieval

[Abstract] This paper reports on work done for the Genomics Track at TREC 2004 by ConverSpeech LLC in conjunction with scientists at the Saccharomyces Genome Database (SGD), the model organism database located at Stanford University, California. The rapidly increasing number of articles in the biomedical literature has created new urgency for software tools that find information relevant to specific information needs. We focused on two challenges in this work: the problems of synonymy (several terms having the same meaning) and polysemy (a term having more than one meaning), and the problem of constructing queries from information needs stated in natural language. We investigated the use of concept extraction for the second problem, relying on the limited statements of information need as the source of textual analysis. To minimize the problem of synonymy, we investigated the use of a language-oriented biomedical ontology and MeSH (Medical Subject Headings) for term expansion. Additionally, to minimize the problem of polysemy, we used extracted concepts to analyze and rank the documents returned by a search. We submitted two sets of results to TREC for evaluation, the first one produced automatically, the second derived from the first by making specific kinds of changes in the query and ranking methods. The mean average precision (MAP) for the automatic result was lower than the median of the 37 submitted runs overall; however, desirable results were obtained for mean average precision at 10 and 100 documents for almost half the topics. The MAP for the derived result was higher than the median, a desirable result.

Background

NEED. The rapidly increasing number of articles in the biomedical literature has created new urgency for software tools that find information relevant to specific information needs. The Text Retrieval Conference (TREC), co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense, supports large-scale evaluation of text retrieval methodologies. The TREC 2004 Genomics track contained a task, called the Ad hoc information retrieval task, that consisted of 50 specific information needs collected from interviews with biomedical scientists. Documents relevant to these needs had to be located within a 10-year subset of the MEDLINE bibliographic database and the results sorted according to their estimated relevance. This paper reports on work done for the Ad hoc task by ConverSpeech LLC in conjunction with scientists at SGD, a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae located at Stanford University, California [1].
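
Because the results above are reported as mean average precision (MAP) over ranked document lists, a small illustration of that metric may help. The following is a minimal sketch of the standard MAP definition, not the official TREC evaluation software; the function names, topic identifiers, and document identifiers are hypothetical and chosen only for illustration. It assumes each ranked list contains no duplicate document identifiers.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one topic: the mean of the precision values
    taken at the rank of each relevant document that was retrieved,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all judged topics.
    A topic with no returned documents contributes 0."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical topics and document identifiers, for illustration only.
example_runs = {
    "topic1": ["d1", "d2", "d3", "d4"],
    "topic2": ["d5", "d6"],
}
example_qrels = {
    "topic1": {"d1", "d3"},
    "topic2": {"d6"},
}
example_map = mean_average_precision(example_runs, example_qrels)
```

In this toy case, the first topic scores (1/1 + 2/3) / 2 and the second scores 1/2, so the MAP is their mean, roughly 0.667.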


[Validity level] [Subject classification] Social sciences, humanities, and arts (general)

