Information Needs and Automatic Queries
[Abstract] Tarragon Consulting Corporation participated in the ad hoc retrieval task of the TREC 2004 Genomics Track. We used a standard deployment of the K2 search engine from Verity, Inc., in which we exploited the free-text query parser to interpret the information need statements provided in the task. The primary goal of our participation was to establish a performance baseline using “off-the-shelf” tools and then to explore how knowledge-based extensions could enhance performance. Time and resource constraints prevented us from performing the knowledge-based experiments, but our official submissions show that reasonable performance can be achieved using just our baseline strategy.

Overall Approach

In our approach, we emphasize the use of “off-the-shelf” tools to provide a baseline capability and then explore ways in which custom algorithms and technologies can be used to enhance performance. For the TREC 2004 Genomics Ad Hoc Retrieval Task, we used Verity’s K2 search engine (see http://www.verity.com/ for basic information about the K2 family of products) both to index the documents and to provide a baseline interpretation of the test topics.

To build the collection for the ad hoc experiments, we created a minimal XML variant of the PubMed records in the test set (see Figure 1) and indexed them using the standard K2 indexer.

To create the test queries, we used the standard K2 free-text query parser to convert each element of the statement of information need into a K2 query fragment, and then combined these fragments into an overall query topic that used the components in different ways. In particular, we experimented with giving different importance weights to the three elements of the topic.

For example, Topic 1 as provided by NIST is:

1
Ferroportin1 in humans
Find articles about Ferroportin1, an iron transporter, in humans.
Ferroportin1 (also known as SLC40A1; Ferroportin 1; FPN1; HFE4; IREG1; Iron regulated gene 1; Iron regulated transporter 1; MTP1; SLC11A3; and Solute carrier family 11 (proton coupled divalent metal ion transporters), member 3) may play a role in iron transport.

To process this into a K2 query, we first separate out the three topic elements (the short title, the statement of need, and the background context). Then, for each element, we: (1) remove any leading noise phrases that match entries in a library of pre-built patterns developed from an analysis of previous TREC topics; (2) add a period to the end if there is no terminating punctuation; and (3) map everything into uppercase.

For Topic 1 this gives:

FERROPORTIN1 IN HUMANS.
FERROPORTIN1, AN IRON TRANSPORTER, IN HUMANS.
FERROPORTIN1 (ALSO KNOWN AS SLC40A1; FERROPORTIN 1; FPN1; HFE4; IREG1; IRON REGULATED GENE 1; IRON REGULATED TRANSPORTER 1; MTP1; SLC11A3; AND SOLUTE CARRIER FAMILY 11 (PROTON COUPLED DIVALENT METAL ION TRANSPORTERS), MEMBER 3) MAY PLAY A ROLE IN IRON
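
The three per-element steps described above (strip a leading noise phrase, add a terminating period, map to uppercase) can be sketched in Python as follows. This is only an illustration under stated assumptions: the authors' pattern library is not reproduced in the text, so the `NOISE_PATTERNS` list, the function name `normalize_element`, and the whitespace collapsing are hypothetical. The sketch also stops before any K2 query is built.

```python
import re

# Hypothetical noise patterns. The paper mentions a library of pre-built
# patterns derived from earlier TREC topics but does not list them.
NOISE_PATTERNS = [
    r"find articles (about|on|describing)\s+",
    r"(we are|i am) (interested in|looking for)\s+",
]
_NOISE_RES = [re.compile(p, re.IGNORECASE) for p in NOISE_PATTERNS]


def normalize_element(text: str) -> str:
    """Apply the three described steps to a single topic element."""
    # Collapse runs of whitespace (an added convenience, not from the paper).
    cleaned = " ".join(text.split())

    # (1) Remove a leading noise phrase if one of the patterns matches
    #     at the very start of the element.
    for pattern in _NOISE_RES:
        match = pattern.match(cleaned)
        if match:
            cleaned = cleaned[match.end():]
            break

    # (2) Add a period when there is no terminating punctuation.
    if cleaned and cleaned[-1] not in ".?!":
        cleaned += "."

    # (3) Map everything into uppercase.
    return cleaned.upper()


if __name__ == "__main__":
    topic_1 = [
        "Ferroportin1 in humans",
        "Find articles about Ferroportin1, an iron transporter, in humans.",
    ]
    for element in topic_1:
        print(normalize_element(element))
```

With these assumed patterns, the first two elements of Topic 1 come out as the uppercase strings shown above. Matching only at the start of the element mirrors the "leading noise phrases" wording and avoids deleting the same words when they occur mid-sentence.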
[Subject classification] Social sciences, humanities, and arts (general)