Design and analysis of rule induction systems
[Abstract] This work reviews the RULES family of inductive learning algorithms and investigates a known drawback: the variation in their generalisation performance. The investigation leads to a new data ordering method (DOM) for the RULES family. DOM is based on selecting the most representative example; it has been tested as a pre-processing stage on many data sets and has shown promising results. A second difficulty is the growing size of training data sets, which leads to long algorithm execution times and less compact rule sets. To address it, a new data sorting method (DSM) is developed for ordering the whole data set and reducing the training time. DSM is based on selecting the relevant attributes and the best possible examples to represent a data set. Finally, the order in which raw data is presented to the RULES family algorithms considerably affects the accuracy of the generated rules. This work presents a new data grouping method (DGM), based on clustering, to solve this problem. The method, in the form of an algorithm, is integrated into a data mining tool and applied to a real project; as a result, an improvement in the variation of the classification percentage and a lower number of generated rules were achieved.
[Publication date] [Publishing institution] University: University of Birmingham; Department: School of Engineering, Department of Mechanical Engineering
[Validity level] [Subject classification]
[Keywords] Q Science; QA Mathematics [Timeliness]