已收录 268921 条政策
 政策提纲
  • 暂无提纲
An investigation of several document classification algorithms leading to the design of an autonomous software agent for locating specific, relevant information on the World Wide Web
[摘要] The goal of the research described in this thesis was to design an autonomous software agent that can locate specific, relevant information on the World Wide Web. The first chapter provides the motivation behind this projectand a brief overview of the challenges associated with it. The next chapter presents the analysis which led to the development of a new, improved version of the computer program called ITRule. The improvements consistof a new algorithm for classifying documents that outperforms the previous one, significantly enhanced support for data exploration, i.e., the process ofextracting information from raw data, and a new algorithm for quantizing numeric variables so they can be used by ITRule. The third part of this thesis compares the performances of three versions of ITRule, two versions of the Naive Bayes classifier, several neural networks, the decision tree algorithm called CART, and a linear support vector machine, in order to determine which one is best suited for selecting relevant web pages. An analysis of thetest results shows that a new ITRule classification algorithm, based on cross validation combined with the J-measure, performs best. The fourth and final part of the thesis describes how some of these results were used in the design of a user friendly, autonomous software agent called Poirot that can help World Wide Web users stay up to date on new developments in topics of interest.
[发布日期]  [发布机构] University:California Institute of Technology;Department:Engineering and Applied Science
[效力级别]  [学科分类] 
[关键词] Electrical Engineering [时效性] 
   浏览次数:3      统一登录查看全文      激活码登录查看全文