The role of confidence and diversity in dynamic ensemble class prediction systems

[Abstract] Classification is a data mining problem that arises in many real-world applications. A popular approach to classification problems is to use an ensemble that combines the collective knowledge of several classifiers. Most popular methods create a static ensemble, in which a single ensemble is constructed or chosen from a pool of classifiers and used for all new data instances. Two factors frequently used to construct a static ensemble are the accuracy of the individual classifiers and the diversity among them. Many studies have investigated how these factors should be combined and how much diversity is required to increase the ensemble's performance. These studies have concluded that it is not trivial to build a static ensemble that generalizes well. Recently, a different approach has been taken: dynamic ensemble construction. Using a different set of classifiers for each new data instance, rather than a single static ensemble, may increase performance because the dynamic ensemble is not required to generalize across the feature space. Most studies on dynamic ensembles focus on the classifiers' competency in the local region in which a new data instance resides, or on agreement among the classifiers. In this thesis, we propose several other approaches for dynamic class prediction. Existing methods focus on assigned labels or their correctness. We hypothesize that using the class probability estimates returned by the classifiers can improve our estimate of each classifier's competency on a prediction. We focus on how to use class prediction probabilities (confidence) along with accuracy and diversity to create dynamic ensembles, and we analyze the contribution of confidence to the system. Our results show that confidence is a significant factor in the dynamic setting.
However, it is still unclear how an accurate, diverse, and confident ensemble can best be formed to increase the prediction capability of the system. Second, we propose a system for dynamic ensemble classification based on a new distance measure between data instances. We first map data instances into a space defined by the class probability estimates from a pool of two-class classifiers. We then dynamically select classifiers (features) and the k-nearest neighbors of a new instance by minimizing the distance between the neighbors and the new instance in a two-step framework. The results of our experiments show that our measure is effective for finding similar instances and that our framework helps make more accurate predictions. Classifiers' agreement in the region
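The two-step idea described above (map instances into a space of class probability estimates, then pick both neighbors and classifiers for each new instance) can be illustrated with a minimal sketch. This is not the thesis's method: the function name `predict_dynamic`, the parameters `k` and `m`, the use of Euclidean distance in place of the proposed distance measure, and the majority vote over neighbor labels are all assumptions made for illustration.

```python
import numpy as np


def predict_dynamic(P_val, y_val, p_new, k=3, m=2):
    """Hypothetical two-step sketch in classifier-probability space.

    P_val: (n_instances, n_classifiers) positive-class probability
           estimates from a pool of two-class classifiers on held-out data.
    y_val: (n_instances,) true 0/1 labels of the held-out instances.
    p_new: (n_classifiers,) probability estimates for the new instance.
    """
    P_val = np.asarray(P_val, dtype=float)
    y_val = np.asarray(y_val)
    p_new = np.asarray(p_new, dtype=float)

    # Step 1: k nearest neighbors, using every classifier as a feature.
    # ASSUMPTION: Euclidean distance stands in for the thesis's measure.
    dist_all = np.linalg.norm(P_val - p_new, axis=1)
    neighbors = np.argsort(dist_all, kind="stable")[:k]

    # Step 2: keep the m classifiers whose estimates on those neighbors
    # are closest to their estimate on the new instance.
    per_clf_gap = np.abs(P_val[neighbors] - p_new).mean(axis=0)
    selected = np.argsort(per_clf_gap, kind="stable")[:m]

    # Re-select the neighbors in the reduced (selected-classifier) space.
    dist_sel = np.linalg.norm(P_val[:, selected] - p_new[selected], axis=1)
    neighbors = np.argsort(dist_sel, kind="stable")[:k]

    # ASSUMPTION: majority vote over neighbor labels; ties go to class 0.
    label = int(y_val[neighbors].mean() > 0.5)
    return label, selected, neighbors
```

Calling `predict_dynamic(P_val, y_val, p_new)` returns the predicted label together with the indices of the selected classifiers and neighbors, so the per-instance choice can be inspected. In practice the probability matrix would come from fitted classifiers evaluated on held-out data.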

[Publication date] [Publishing institution] University of Iowa

