The use of classification methods for gross error detection in process data
[Abstract] ENGLISH ABSTRACT: All process measurements contain some element of error. Typically, a distinction is made between random errors, with zero expected value, and gross errors, with non-zero magnitude. Data Reconciliation (DR) and Gross Error Detection (GED) comprise a collection of techniques designed to attenuate measurement errors in process data in order to reduce the effect of the errors on subsequent use of the data. DR proceeds by finding the optimum adjustments so that reconciled measurement data satisfy imposed process constraints, such as material and energy balances. The DR solution is optimal under the assumed statistical random error model, typically Gaussian with zero mean and known covariance. The presence of outliers and gross errors in the measurements or imposed process constraints invalidates the assumptions underlying DR, so that the DR solution may become biased. GED is required to detect, identify and remove or otherwise compensate for the gross errors. Typically GED relies on formal hypothesis testing of constraint residuals or measurement adjustment-based statistics derived from the assumed random error statistical model.

Classification methodologies are methods by which observations are classified as belonging to one of several possible groups. For the GED problem, artificial neural networks (ANNs) have been applied historically to resolve the classification of a data set as either containing or not containing a gross error. The hypothesis investigated in this thesis is that classification methodologies, specifically classification trees (CT) and linear or quadratic classification functions (LCF, QCF), may provide an alternative to the classical GED techniques.

This hypothesis is tested via the modelling of a simple steady-state process unit with associated simulated process measurements. DR is performed on the simulated process measurements in order to satisfy one linear and two nonlinear material conservation constraints. Selected features from the DR procedure and process constraints are incorporated into two separate input vectors for classifier construction. The performance of the classification methodologies developed on each input vector is compared with the classical measurement test in order to address the posed hypothesis.

General trends in the results are as follows:
- The power to detect and/or identify a gross error is a strong function of the gross error magnitude as well as location, for all the classification methodologies as well as the measurement test.
- For some locations there exist large differences between the power to detect a gross error and the power to identify it correctly. This is consistent over all the classifiers and their associated measurement tests, and indicates significant smearing of gross errors.
- In general, the classification methodologies have higher power for equivalent type I error than the measurement test.
- The measurement test is superior for small magnitude gross errors, and for specific locations, depending on which classification methodology it is compared with.

There is significant scope to extend the work to more complex processes and constraints, including dynamic processes with multiple gross errors in the system. Further investigation into the optimal selection of input vector elements for the classification methodologies is also required.
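The abstract describes DR as a constrained adjustment of the measurements and the measurement test as a hypothesis test on those adjustments. A minimal sketch of both ideas, for linear constraints only, may help make this concrete. Everything specific here is an assumption for illustration and is not taken from the thesis: the three-meter serial process, the flow values, the unit variances, the function names `reconcile_linear` and `measurement_test`, and the use of a Šidák-corrected critical value. The thesis itself also handles two nonlinear constraints, which this sketch does not cover.

```python
from statistics import NormalDist

import numpy as np


def reconcile_linear(y, A, sigma):
    """Weighted least-squares reconciliation subject to A @ x = 0.

    Minimises (y - x)^T sigma^-1 (y - x) under the linear constraints and
    returns the reconciled values, the adjustments, and the covariance of
    the adjustments under the zero-mean Gaussian random error model.
    """
    S = A @ sigma @ A.T                       # covariance of constraint residuals
    gain = sigma @ A.T @ np.linalg.inv(S)
    adjustments = gain @ (A @ y)              # a = y - x_hat
    adj_cov = gain @ A @ sigma                # Cov(a) = sigma A^T S^-1 A sigma
    return y - adjustments, adjustments, adj_cov


def measurement_test(adjustments, adj_cov, alpha=0.05):
    """Standardised adjustment statistics and a two-sided critical value.

    Assumes a diagonal measurement covariance. The Sidak correction keeps
    the overall type I error near alpha across the n simultaneous tests.
    """
    n = adjustments.size
    z = np.abs(adjustments) / np.sqrt(np.diag(adj_cov))
    beta = 1.0 - (1.0 - alpha) ** (1.0 / n)
    z_crit = NormalDist().inv_cdf(1.0 - beta / 2.0)
    return z, z_crit


# Hypothetical process: one flow measured by three meters in series,
# so the material balances are x1 - x2 = 0 and x2 - x3 = 0.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
sigma = np.eye(3)                             # assumed unit variances

# Illustrative measurements: true flow near 100, gross error on meter 2.
y = np.array([100.3, 108.0, 99.6])

x_hat, a, adj_cov = reconcile_linear(y, A, sigma)
z, z_crit = measurement_test(a, adj_cov)

detected = bool(np.any(z > z_crit))           # detection: any statistic rejects
suspect = int(np.argmax(z))                   # identification: largest statistic

print("reconciled flows:", x_hat)
print("test statistics :", z, "critical value:", z_crit)
print("gross error detected:", detected, "suspect index:", suspect)
```

In this example the statistic is largest for the second meter, but the statistics for the other two meters also exceed the critical value, because the adjustment spreads the gross error over all three measurements. This is the smearing effect mentioned in the results, and it is why detecting a gross error and identifying its location can have different power.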
[Publication date] [Publishing institution] Stellenbosch University
[Validity level] [Subject classification]
[Keywords] [Timeliness]