A deep reinforcement learning-based optimization method for long-running applications container deployment
[Abstract] Unlike batch jobs, which have short execution cycles, intelligent algorithmic applications typically run in the cloud as long-running containers (Long-Running Applications, LRAs). LRAs must meet strict service level objective (SLO) requirements, scale their performance to cope with peak loads, and contend with I/O dependencies as well as resource contention and interference from co-located containers. These factors greatly complicate container deployment and can easily lead to performance bottlenecks. Optimizing the deployment of LRA containers is therefore a key problem that cloud computing cannot avoid. This research uses deep reinforcement learning (DRL) to optimize the deployment of LRA containers. Rather than relying on a single generic model, the proposed approach customizes a dedicated model for each container group, which yields high-quality placements at low training complexity. In addition, the proposed batch deployment scheme can optimize scheduling objectives that existing constraint-based schedulers do not directly support, such as minimizing SLO violations. Experimental results show that the DRL deployment algorithm improves average requests per second (RPS) by 56.2% over the baseline. This indicates that the manual deployment scheme meets only the basic requirements and cannot account, from a global perspective, for the complex interactions between containers under constraints, a shortcoming that severely restricts the performance of the whole pod. Furthermore, based on previous experience, devising a single deployment scheme takes about 1 hour, whereas the DRL deployment algorithm may take less than 7.5 minutes.
[Publication date] [Publishing institution]
[Validity level] [Subject classification] Automation engineering
[Keywords] [Timeliness]