Cluster analysis and classification of process data by use of principal curves
[Abstract] ENGLISH SUMMARY: In this thesis a new method of clustering as well as a new method of classification is proposed. Cluster analysis is a statistical method used to search for natural groups in an unstructured multivariate data set. Clusters are obtained in such a way that the observations belonging to the same group are more alike than observations across groups. For instance, long data records are found in mineral processing plants, where the data can be reduced to clusters according to different ore types. Most of the existing clustering methods do not give reliable results when applied to engineering data, since these methods were mainly developed in the domains of psychology and biology.

Classification analysis can be regarded as the natural continuation of cluster analysis. In order to classify objects, two types of observations are needed. The first are those observations whose group memberships are known a priori, which can be acquired through cluster analysis. The second are those observations whose group memberships are unidentified. By means of classification these observations are allocated to one of the existing groups.

Both of the proposed techniques are based on the use of a smooth one-dimensional curve passing through the middle of the data set. To formalise this idea, principal curves were developed by Hastie and Stuetzle (1989). A principal curve summarises the data in a non-linear fashion. For clustering, the principal curve of the entire unstructured data set is extracted. This one-dimensional representation of the data set is then used to search for different clusters. For classification, a principal curve is fitted to every known group in the data set. Each observation to be assigned is then allocated to the known group closest to it.

Clustering with principal curves grouped engineering data better than most of the well-known clustering algorithms. Some shortcomings of this method were also established. Classification with principal curves gave optimal results, similar to those of some existing classification methods. This classification method can be applied to data of any distribution, unlike statistical classification techniques.
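The classification rule described in the summary (one curve per known group, each new observation allocated to the closest group) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The curve fit below is a crude stand-in, a polyline through bin means taken along the first principal component, rather than the Hastie and Stuetzle (1989) algorithm. The sketch also assumes that "closest" means the smallest Euclidean distance to a group's curve. The function names (`fit_curve`, `distance_to_curve`, `classify`), the parameter `n_knots`, and the synthetic two-group data are all hypothetical.

```python
import numpy as np


def fit_curve(points, n_knots=10):
    """Crude stand-in for a principal curve (assumption, not Hastie-Stuetzle):
    a polyline through bin means, with bins taken along the first
    principal component of the group."""
    points = np.asarray(points, dtype=float)
    centred = points - points.mean(axis=0)
    # First principal direction from the SVD of the centred data.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    order = np.argsort(centred @ vt[0])
    # Split the ordered points into roughly equal bins and average each bin.
    bins = np.array_split(order, n_knots)
    return np.array([points[idx].mean(axis=0) for idx in bins if len(idx) > 0])


def distance_to_curve(x, knots):
    """Shortest Euclidean distance from point x to the polyline `knots`."""
    x = np.asarray(x, dtype=float)
    start, end = knots[:-1], knots[1:]
    seg = end - start
    seg_len2 = np.einsum("ij,ij->i", seg, seg)
    # Projection parameter on each segment, clipped to stay on the segment.
    t = np.einsum("ij,ij->i", x - start, seg) / np.where(seg_len2 > 0, seg_len2, 1.0)
    t = np.clip(t, 0.0, 1.0)
    closest = start + t[:, None] * seg
    return np.sqrt(((closest - x) ** 2).sum(axis=1)).min()


def classify(x, curves):
    """Allocate x to the group whose fitted curve is nearest (assumed rule)."""
    return min(curves, key=lambda label: distance_to_curve(x, curves[label]))


# Synthetic illustration only: two noisy parabolic groups, vertically offset.
rng = np.random.default_rng(0)
u = np.linspace(-1.0, 1.0, 100)
group_a = np.column_stack([u, u**2]) + rng.normal(scale=0.05, size=(100, 2))
group_b = np.column_stack([u, u**2 + 2.0]) + rng.normal(scale=0.05, size=(100, 2))

curves = {"A": fit_curve(group_a), "B": fit_curve(group_b)}
label_low = classify([0.0, 0.1], curves)   # point near the lower parabola
label_high = classify([0.0, 2.1], curves)  # point near the upper parabola
```

Because the rule depends only on distances to the fitted curves, it makes no assumption about the distribution of each group, which matches the summary's remark that the method applies to data of any distribution. A proper principal-curve fit would replace `fit_curve` without changing `classify`.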
[Publisher] Stellenbosch University