UMass at TREC 2004: Novelty and HARD
[Abstract] For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tf-idf being the most successful approach. We achieve improvements of 59% over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage-level feedback can provide improvements comparable to user-supplied related text for document evaluation, and that it outperforms related text for passage evaluation. Document retrieval methods without a query expansion component show the most gains from related text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named-entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference (bpref) at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant
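The novelty approach summarized above, judging a sentence by its similarity to the sentences before it, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the UMass system: it assumes whitespace tokenization, a smoothed tf-idf weighting computed over the given sentences only, cosine similarity, and a hypothetical cutoff `threshold=0.5`. The helper names (`tfidf_vectors`, `cosine`, `novel_sentences`) are illustrative, and the named-entity emphasis mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse tf-idf vectors (dicts), treating each sentence as a document."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf (assumption): terms found in every sentence keep a small positive weight.
        vectors.append({t: c * math.log((n + 1) / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def novel_sentences(sentences, threshold=0.5):
    """Return indices of sentences whose highest similarity to any earlier sentence is below threshold."""
    vectors = tfidf_vectors(sentences)
    novel = []
    for i, vec in enumerate(vectors):
        # The first sentence has no predecessors, so its max similarity defaults to 0.
        max_sim = max((cosine(vec, vectors[j]) for j in range(i)), default=0.0)
        if max_sim < threshold:
            novel.append(i)
    return novel


if __name__ == "__main__":
    sents = [
        "the hurricane hit the coast on monday",
        "the hurricane hit the coast on monday morning",
        "officials announced new evacuation orders",
    ]
    print(novel_sentences(sents))  # → [0, 2]
```

Here the second sentence is flagged as redundant because it largely repeats the first (cosine similarity of about 0.83), while the third shares no terms with earlier sentences and is kept as novel.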
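[Publication date] [Publishing institution]
[Level of authority] [Subject classification] Social Sciences, Humanities and Arts (General)
[Keywords] [Timeliness]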