Nearest hypersphere classification : a comparison with other classification techniques
[Abstract] ENGLISH ABSTRACT: Classification is a widely used statistical procedure to classify objects into two or more classes according to some rule which is based on the input variables. Examples of such techniques are Linear and Quadratic Discriminant Analysis (LDA and QDA). However, classification of objects with these methods can get complicated when the number of input variables in the data becomes too large (n ≪ p), when the assumption of normality is no longer met or when classes are not linearly separable. Vapnik et al. (1995) introduced the Support Vector Machine (SVM), a kernel-based technique, which can perform classification in cases where LDA and QDA are not valid. SVM makes use of an optimal separating hyperplane and a kernel function to derive a rule which can be used for classifying objects. Another kernel-based technique was proposed by Tax and Duin (1999) where a hypersphere is used for domain description of a single class. The idea of a hypersphere for a single class can be easily extended to classification when dealing with multiple classes by just classifying objects to the nearest hypersphere.
Although the theory of hyperspheres is well developed, not much research has gone into using hyperspheres for classification and the performance thereof compared to other classification techniques. In this thesis we will give an overview of Nearest Hypersphere Classification (NHC) as well as provide further insight regarding the performance of NHC compared to other classification techniques (LDA, QDA and SVM) under different simulation configurations.
We begin with a literature study, where the theory of the classification techniques LDA, QDA, SVM and NHC will be dealt with. In the discussion of each technique, applications in the statistical software R will also be provided. An extensive simulation study is carried out to compare the performance of LDA, QDA, SVM and NHC for the two-class case. Various data scenarios will be considered in the simulation study. This will give further insight in terms of which classification technique performs better under the different data scenarios. Finally, the thesis ends with the comparison of these techniques on real-world data.
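The nearest-hypersphere rule mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's method: instead of the kernel-based domain description of Tax and Duin, each class is enclosed by a deliberately simple sphere (centroid as center, distance to the farthest training point as radius), and "nearest" is assumed to mean the smallest signed distance to the sphere boundary. The function names `fit_hypersphere` and `nearest_hypersphere` and the toy data are hypothetical.

```python
import math


def fit_hypersphere(points):
    """Return (center, radius) for one class.

    Simplifying assumption: the center is the class centroid and the radius
    is the distance to the farthest training point. A kernel-based domain
    description would instead solve an optimisation problem.
    """
    dim = len(points[0])
    center = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def nearest_hypersphere(x, spheres):
    """Assign x to the class whose sphere boundary is nearest.

    The distance used is (distance to center) - radius, so points inside
    a sphere get a negative value. This choice is an assumption.
    """
    def boundary_distance(label):
        center, radius = spheres[label]
        return math.dist(x, center) - radius

    return min(spheres, key=boundary_distance)


# Toy two-class training data (made up for illustration only).
train = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}
spheres = {label: fit_hypersphere(pts) for label, pts in train.items()}
```

Calling `nearest_hypersphere((2.0, 2.0), spheres)` compares the point's boundary distance to each class sphere and returns the label with the smaller value. Subtracting the radius lets a tight class and a widely spread class compete on comparable terms, which a plain nearest-centroid rule would not do.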
[Publication date] [Publishing institution] Stellenbosch University