Towards Grid-Based Information Retrieval
[Abstract] The IRTools software toolkit was used in TREC 2004 for submissions to the Web track and the Terabyte track. Terabyte track results were not available by the due date for this Proceedings paper. Web track results were available, but qrels were not. Because we discovered a bug in the MySQL++ API that truncated docid numbers in our results, we will await qrels to reevaluate submitted runs and report results.

This year, the Terabyte track dictated some changes to IRTools in order to handle the 430+ GB of text (about 25M documents). The main change was to operate on chunks of the collection (272 separate chunks, each containing one of the Terabyte collection's subdirectories). Chunks were generated in parallel using the National Center for Supercomputing Applications' cluster, Mercury (dual Itanium systems). Up to about 40 systems were used simultaneously for both indexing and querying. Query merging was simplistic, based on the cosine value with Lnu.Ltc weighting.

Use of the NCSA cluster, and other experiments with commodity clusters, is part of work underway to enable information retrieval in Grid computing environments. The site http://www.girwg.org has information about Grid Information Retrieval (GIR), including links to the published Requirements document and draft Architecture document. The GIR working group is chartered by the Global Grid Forum (GGF) to develop standards and reference implementations for GIR. TREC participants are urged to consider getting involved with Grid computing.
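The score-based merge of per-chunk results mentioned above can be illustrated with a minimal sketch. This is not the IRTools code: the function name `merge_chunk_results`, the `(docid, score)` record shape, and the cutoff `k` are all hypothetical, and the sketch assumes that cosine scores from separately indexed chunks are directly comparable, which is what makes such a merge simplistic.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record shape: (docid, cosine score). Docids are kept as
# strings here purely for illustration.
Hit = Tuple[str, float]


def merge_chunk_results(chunk_results: Iterable[List[Hit]], k: int = 10) -> List[Hit]:
    """Merge per-chunk hit lists into one ranking by raw cosine score.

    Assumption: scores computed independently on each chunk are treated
    as comparable, with no per-chunk normalization.
    """
    # Flatten the hits from every chunk into one stream.
    all_hits = (hit for hits in chunk_results for hit in hits)
    # Keep the k best hits: highest score first, ties broken by docid
    # so the output order is deterministic.
    return heapq.nsmallest(k, all_hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    chunk_a = [("A-1", 0.91), ("A-2", 0.40)]
    chunk_b = [("B-1", 0.75)]
    chunk_c = []  # a chunk may return no hits for a query
    for docid, score in merge_chunk_results([chunk_a, chunk_b, chunk_c], k=2):
        print(docid, score)
```

Because each chunk can be queried independently, a merge of this kind only needs the top hits from each of the chunks, which suits running queries on many systems at once.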
Computational grids offer a very good fit for the needs of large-scale information retrieval research and practice.

This brief abstract for the proceedings will be replaced with a complete analysis of this year's submissions for the full conference paper. Meanwhile, Newby (2004) provides a profile of IRTools, which is generally applicable to this year's submissions.

References

Newby, Gregory B. 2004. "Document Structure with IRTools." In: Voorhees, Ellen (Ed.). NIST Special Publication 500-255: The Twelfth Text REtrieval Conference (TREC 2003). Gaithersburg, Maryland: NIST. pp. 568-577.

* 909 Koyukuk Dr. Fairbanks AK 99775. newby@arsc.edu or http://www.arsc.edu/~newby. The research described here was funded in part by National Science Foundation grant
[Subject classification] Social sciences, humanities, and arts (general)