Language Models for Searching in Web Corpora
[Abstract] We describe our participation in the TREC 2004 Web and Terabyte tracks. For the Web track, we employ mixture language models based on document full text, incoming anchor text, and document titles, with a range of web-centric priors. We provide a detailed analysis of how document length, URL structure, and link topology affect relevance. The resulting web-centric priors are applied to three types of topics (distillation, home page, and named page) and improve effectiveness for all topic types, as well as for the mixed query set. For the Terabyte track, we experimented with building an index based only on the document titles, or only on the incoming anchor texts. Very selective indexing leads to a compact index that is effective in terms of early precision, catering for the
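The mixture language model mentioned in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each field (full text, anchor text, title) contributes a maximum-likelihood unigram model, that these are linearly interpolated with a background collection model for smoothing, and that a document prior (for example, one proportional to document length) enters as an additive log term. The function names `ml_model` and `mixture_score`, the field names, the mixture weights, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()} if total else {}


def mixture_score(query, fields, collection, weights, log_prior=0.0):
    """Log query likelihood under a linear mixture of field models (sketch).

    fields:     field name -> token list, e.g. "text", "anchor", "title"
    collection: background model (term -> probability) used for smoothing
    weights:    field name -> lambda, plus a "collection" entry; assumed to sum to 1
    log_prior:  log P(d), e.g. a length- or URL-based prior (hypothetical here)
    """
    models = {name: ml_model(toks) for name, toks in fields.items()}
    score = log_prior
    for term in query:
        # Start with the smoothing mass from the collection model.
        p = weights["collection"] * collection.get(term, 0.0)
        # Add each field's weighted maximum-likelihood probability.
        for name, model in models.items():
            p += weights.get(name, 0.0) * model.get(term, 0.0)
        if p == 0.0:
            return float("-inf")  # term unseen in every model
        score += math.log(p)
    return score


# Toy usage with made-up data and weights (hypothetical values).
doc = {
    "text": "the university of amsterdam home page".split(),
    "anchor": "uva home".split(),
    "title": "university of amsterdam".split(),
}
collection = {"university": 0.01, "amsterdam": 0.005, "home": 0.02}
weights = {"text": 0.4, "anchor": 0.3, "title": 0.1, "collection": 0.2}

# Assumed length prior: P(d) proportional to the number of full-text tokens.
s = mixture_score(["amsterdam", "home"], doc, collection, weights,
                  log_prior=math.log(len(doc["text"])))
print(s)
```

Setting a field's weight to zero drops it from the mixture; a URL- or link-based prior would be passed through `log_prior` in the same way.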
[Subject classification] Social Sciences, Humanities and Arts (General)