Multiple outlier detection and cluster analysis of multivariate normal data
[Abstract] Outliers may be defined as observations that are sufficiently aberrant to arouse the suspicion of the analyst as to their origin. They could be the result of human error, in which case they should be corrected, but they may also be interesting exceptions that deserve further investigation. Identification of outliers typically consists of an informal inspection of a plot of the data, but this is unreliable for dimensions greater than two. A formal procedure for detecting outliers allows for consistency when classifying observations. It also enables one to automate the detection of outliers by using computers. The special case of univariate data is treated separately to introduce essential concepts, and also because it may well be of interest in its own right. We then consider techniques used for detecting multiple outliers in a multivariate normal sample, and go on to explain how these may be generalized to include cluster analysis. Multivariate outlier detection is based on the Minimum Covariance Determinant (MCD) subset, which is therefore treated in detail. Exact bivariate algorithms were refined and implemented, and the solutions were used to establish the performance of the commonly used heuristic, Fast-MCD.
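As an illustration of the MCD idea in the univariate special case, the following is a minimal sketch and not the thesis's own algorithm. It relies on the fact that, in one dimension, the subset of h observations with the smallest variance is a contiguous run of the sorted data, so an exhaustive scan over windows is exact. The function names, the default subset size `h = n // 2 + 1`, and the cutoff of 3 robust standard deviations are assumptions made for this example; no consistency correction is applied to the scale estimate.

```python
from statistics import mean, pvariance


def univariate_mcd(values, h=None):
    """Exact univariate MCD: the h-subset with the smallest variance.

    In one dimension the optimal subset is a contiguous run of the sorted
    data, so scanning the n - h + 1 windows gives the exact answer.
    """
    xs = sorted(values)
    n = len(xs)
    if h is None:
        h = n // 2 + 1  # assumed default: just over half of the sample
    windows = (xs[i:i + h] for i in range(n - h + 1))
    best = min(windows, key=pvariance)
    return mean(best), pvariance(best)


def flag_outliers(values, cutoff=3.0, h=None):
    """Return the values lying more than `cutoff` robust scales from the MCD centre.

    The cutoff is a hypothetical choice for illustration; the raw MCD scale
    is used without a consistency factor.
    """
    centre, var = univariate_mcd(values, h)
    scale = var ** 0.5
    return [x for x in values if abs(x - centre) > cutoff * scale]


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0, 30.0]
print(flag_outliers(sample))  # prints [25.0, 30.0]
```

Because the centre and scale are computed only from the tightest subset, the two aberrant values do not inflate the estimates and therefore cannot mask each other, which is the motivation for using a robust subset when several outliers may be present.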
[Publication date] [Publishing institution] Stellenbosch University