Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming

[Abstract] ENGLISH SUMMARY: Enhancing the performance of any process requires a detailed knowledge of the unknown system, with a mathematical model being the most common means of representing this knowledge. The most frequently used statistical techniques assume that any relationships between input and output variables are linear and that the data itself is normally distributed. However, real-world systems can be highly non-linear, and linear approaches can therefore fail to predict the behaviour of the system accurately. Explicit specification of optimal structure in large non-linear models is often not practical and, as a result, non-parametric methods (kernel regression, artificial neural networks, etc.) are usually employed. Although these models allow accurate representation of complex systems, they can be very difficult to interpret.

This research project explores a novel approach to this problem of mathematical modelling, which attempts to evolve optimal parametric models based on the Darwinian mechanism of evolution. This approach, referred to as genetic programming (GP), facilitates development of explicit or implicit models, or any mix of these two extremes, as dictated by the problem and, unlike other methods, it can handle a trade-off between accuracy and interpretability with great ease.

During this research, a -commercial application (a-GP) was developed, since very few commercial systems are currently available. Some techniques were developed which improved the performance of the original algorithm considerably. For instance, memory demands were decreased by a factor of 5 by utilizing a different implementation model. Improved convergence and robustness were obtained by using a correlation-based fitness function in conjunction with a correction filter, which reduced the sum of the squared errors at the expense of a more complex model. The evaluation process was expedited by evaluating each tree-like structure as a reverse Polish expression, as opposed to a branch-node reduction technique. Additional execution speed was obtained by implementing the algorithm in C++ (an object-oriented compiled language), which is significantly faster than the original LISP (an interpreted language) implementation.

The newly improved algorithm, a-GP, was applied to four industrial data sets and the results were compared against other methods such as standard genetic programming, multilayer perceptron neural networks and linear regression. It was found that a-GP outperformed standard genetic programming on all four case studies, while improving on neural networks on half of the runs.

The evolved models tended to be complex. This could be attributed to the lack of parameter estimation, which the genetic programming algorithm tried to compensate for by evolving complex tree structures that it used to approximate the parameters.

As a data visualization tool, a-GP was applied to four benchmark data sets used extensively in the literature. The results acquired with a-GP compared favourably with those obtained by other methods, with the additional benefit that a-GP was able to evolve simple mapping functions, which clearly indicated how the variables related to the structure. Additionally, the algorithm was applied in the mapping of two industrial processes. The results showed distinct clustering tendencies within the data, indicating the different operating regimes of the processes under investigation.
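
The summary mentions two techniques that are easier to picture with a small example: evaluating a candidate model as a reverse Polish (postfix) expression, and scoring it with a correlation-based fitness followed by a correction step. The Python sketch below is illustrative only and is not the a-GP implementation (which the summary says was written in C++). The operator set, the function names, the toy data, and in particular the choice of a least-squares slope and intercept as the "correction filter" are all assumptions made here for illustration; the summary does not specify the filter's form.

```python
import operator

# Binary operators known to the postfix evaluator (an illustrative set).
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def eval_rpn(tokens, variables):
    """Evaluate a postfix (reverse Polish) token list using a single stack."""
    stack = []
    for tok in tokens:
        if tok in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[tok](left, right))
        elif tok in variables:
            stack.append(variables[tok])
        else:
            stack.append(float(tok))  # anything else is read as a constant
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def correlation_fitness(pred, target):
    """Squared Pearson correlation between predictions and targets."""
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    vp = sum((p - mp) ** 2 for p in pred)
    vt = sum((t - mt) ** 2 for t in target)
    if vp == 0 or vt == 0:
        return 0.0  # a constant output carries no correlation information
    return cov * cov / (vp * vt)


def linear_correction(pred, target):
    """Assumed correction step: least-squares slope and intercept."""
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(target) / n
    vp = sum((p - mp) ** 2 for p in pred)
    if vp == 0:
        return 0.0, mt
    slope = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / vp
    return slope, mt - slope * mp


def sse(pred, target):
    """Sum of squared errors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


# Toy data: the target is 3*x*x + 2; the candidate model is x*x in postfix.
xs = [0.0, 1.0, 2.0, 3.0]
target = [3 * x * x + 2 for x in xs]
candidate = ["x", "x", "*"]

raw = [eval_rpn(candidate, {"x": x}) for x in xs]
fitness = correlation_fitness(raw, target)
slope, intercept = linear_correction(raw, target)
corrected = [slope * r + intercept for r in raw]
```

In this toy case the candidate has the right shape but the wrong scale and offset, so its correlation with the target is high even though its raw squared error is large. The assumed correction step then supplies the two missing parameters, which lowers the squared error while adding a slope and an intercept to the final model.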

[Publication date] [Publishing institution] Stellenbosch University

