TREC 2004 Genomics Track Experiments at UTA: The Effects of Primary Keys, Bigram Phrases and Query Expansion on Retrieval Performance
[摘要] We submitted runs for Genomics Track’s ad hoc retrieval task. The first official run (utaauto) wasan automatic run and the second (utamanu) manual. For utaauto, the main features of query formulation were the removal of performative and marginally topical words from the topics based on average term frequency statistics, the removal of stopwords, the identification of bigram phrases, the weighting of keys with low document frequency (which we call primary keys), and query expansion with MH terms. The utamanu queries were Boolean queries formulated on a basis of a concept analysis. New terms were selected from the MeSH, genome databases, and dictionaries. The mean average precision (MAP) for the utaauto queries was 0.3324 and for the utamanu queries 0.3128. In the additional experiments we studied the effects of utaauto’s main features on retrieval performance. The results showed that assigning more weight to the primary key than the other keys of a query improves retrieval performance. The use of bigram phrases in queries was also useful. The bigram phrases were identified using a collocation technique that computes an adjacency indicator for words in topics based on the joint occurrences of the words in a large and small window of document text
[发布日期] [发布机构]
[效力级别] [学科分类] 社会科学、人文和艺术(综合)
[关键词] [时效性]