Non-linear neurocontrol of chemical processes using reinforcement learning
[Abstract] ENGLISH ABSTRACT: The difficulties of chemical process control using plain Proportional-Integral-Derivative (PID) methods include interaction between process manipulated and controlled variables as well as difficulty in tuning. One way of eliminating these problems is to use a centralized non-linear control solution such as a feed-forward neural network. While many ways exist to train such neurocontrollers, one of the promising active research areas is reinforcement learning. The biggest drawing card of the neurocontrol-using-reinforcement-learning paradigm is that no expert knowledge of the system is necessary: all control knowledge is gained by interaction with the plant model.

This work uses episodic reinforcement learning to train controllers using two types of process model: non-linear dynamic models and non-linear autoregressive models. The first was termed model-based training and the second data-based learning. By testing the controllers obtained during data-based learning on the original model, the effect of plant-model mismatch, and therefore real-world applicability, could be seen. In addition, two reinforcement learning algorithms, Policy Gradients with Parameter-based Exploration (PGPE) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), were compared to one another. Set-point tracking was facilitated by the use of integral error feedback.

Two control case studies were conducted to test the effectiveness of each type of controller and algorithm, and allowed comparison with multi-loop feedback control. The first is a ball mill grinding circuit pilot-plant model with 5 degrees of freedom, and the second a 41-stage binary distillation column with 7 degrees of freedom.

The ball mill case study showed that centralized non-linear feedback control using neural networks can improve on even highly optimized PI control methods, with the proposed integral error-feedback neural network architecture working very well at tracking the set point. CMA-ES produced better results than PGPE, being able to find up to 20% better solutions. When compared to PI control, the ball mill neurocontrol solution had a 6% higher productivity and showed more than 10% improvement in product size set-point tracking. In the case of some plant-model mismatch (88% fit), the data-based ball mill neurocontroller still achieved better set-point tracking and disturbance handling than PI control, but productivity did not improve.

The distillation case study showed less positive results. While reinforcement learning was able to learn successful controllers in the case of no plant-model mismatch and outperform LV- and (L/D)(V/B)-based PI control, the best-performing neurocontroller still performed up to 20% worse than DB-based PI control. Once again, CMA-ES showed better performance than PGPE, with the latter even failing to find feasible control solutions.

While on-line learning in the ball mill study was made impossible by stability issues, on-line adaptation in the distillation case study succeeded with the use of a partial neurocontroller. The learner was able to achieve, with a success rate of just over 50%, greater than 95% purity in both distillate and bottoms within 2,000 minutes of interacting with the plant.

Overall, reinforcement learning showed that, when there is sufficient room for improvement over existing control implementations, it can make for a very good replacement control solution even when no model is available. Future work should focus on evaluating these techniques in lab-scale control studies.
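To make the training setup described above concrete, the following is a minimal sketch (not the thesis code) of episodic reinforcement learning of a feed-forward neurocontroller with integral error feedback, with the parameter vector optimized by CMA-ES. The toy first-order plant, network sizes, gains and cost function are illustrative stand-ins, not the ball mill or distillation models used in the work.

```python
# Minimal sketch: episodic RL of a feed-forward neurocontroller with
# integral-error feedback, trained by CMA-ES. All plant dynamics and
# dimensions below are illustrative assumptions, not the thesis models.
import numpy as np
import cma  # the 'cma' package provides CMAEvolutionStrategy (pip install cma)

N_IN, N_HID, N_OUT = 3, 8, 1  # inputs: [error, integral error, measurement]

def unpack(theta):
    """Slice a flat parameter vector into the network's weights and biases."""
    i = 0
    W1 = theta[i:i + N_HID * N_IN].reshape(N_HID, N_IN); i += N_HID * N_IN
    b1 = theta[i:i + N_HID]; i += N_HID
    W2 = theta[i:i + N_OUT * N_HID].reshape(N_OUT, N_HID); i += N_OUT * N_HID
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2

def controller(theta, x):
    """Feed-forward network with bounded (tanh) control output."""
    W1, b1, W2, b2 = unpack(theta)
    return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)

def episode_cost(theta, setpoint=1.0, horizon=200, dt=0.1):
    """One closed-loop episode on a stand-in first-order plant; the cost is
    the integral of squared tracking error (lower is better)."""
    y, integ, cost = 0.0, 0.0, 0.0
    for _ in range(horizon):
        e = setpoint - y
        integ += e * dt                                   # integral error feedback input
        u = float(controller(theta, np.array([e, integ, y])))
        y += dt * (-y + u)                                # stand-in plant (thesis uses non-linear models)
        cost += e ** 2 * dt
    return cost

n_params = N_HID * N_IN + N_HID + N_OUT * N_HID + N_OUT
es = cma.CMAEvolutionStrategy(np.zeros(n_params), 0.5, {'maxiter': 200})
while not es.stop():
    candidates = es.ask()                                 # sample candidate controllers
    es.tell(candidates, [episode_cost(np.asarray(c)) for c in candidates])
    es.disp()
best_theta = es.result.xbest                              # best-found controller parameters
```

The same episodic evaluation could in principle be driven by PGPE instead of CMA-ES; only the outer parameter-search loop changes, while the controller and episode rollout stay the same.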
[Date of publication] [Publisher] Stellenbosch University