CLARIT TREC-8 Experiments in Searching Web Data
[Abstract] CLARITECH submitted two baseline content-only runs and completed two additional content+link runs in the TREC-8 Web Track. These represent our first serious attempt to deal with Web data, and our first automatic runs in several years. The first question was whether CLARIT would perform as well on Web data as on more traditional text. We found that, with extensive preprocessing of the raw data prior to indexing, the automatic retrieval system actually performed better on Web data than on Ad Hoc data. For the link runs, we implemented a version of the HITS algorithm [Kleinberg 1997], originally developed at IBM. Our version optimized HITS for the CLARIT environment, but also reflected some constraints imposed by limited resources. Unable to develop and sufficiently test our own matrix-processing library in time, we used a commercial product for the number crunching. Performance on the link runs was poor, but failure
[Subject classification] Social sciences, humanities and arts (general)