Binary classification trees : a comparison with popular classification methods in statistics using different software
[Abstract] ENGLISH ABSTRACT: Consider a data set with a categorical response variable and a set of explanatory variables. The response variable can have two or more categories, and the explanatory variables can be numerical or categorical. This is a typical setup for a classification analysis, where we want to model the response based on the explanatory variables. Traditional statistical methods have been developed under certain assumptions, such as: the explanatory variables are numeric only and/or the data follow a multivariate normal distribution. In practice such assumptions are not always met. Different research fields generate data that have a mixed structure (categorical and numeric), and researchers are often interested in using all these data in the analysis. In recent years robust methods such as classification trees have become the substitute for traditional statistical methods when the above assumptions are violated. Classification trees are not only an effective classification method, but also offer many other advantages. The aim of this thesis is to highlight the advantages of classification trees. In the chapters that follow, the theory of and further developments on classification trees are discussed. This forms the foundation for the CART software, which is discussed in Chapter 5, as well as for other software in which classification tree modeling is possible. We will compare classification trees to parametric-, kernel- and k-nearest-neighbour discriminant analyses. A neural network is also compared to classification trees, and finally we draw some conclusions on classification trees and how they compare with other methods.
[Publication date] [Publisher] Stellenbosch University