Non-Asymptotic Adaptive Control of Linear-Quadratic Systems

[Abstract] Optimal control for the canonical model of systems with linear dynamics and quadratic operating costs (known as LQ systems) is a well-studied problem in the stochastic control literature. When the true system dynamics are unknown, an adaptive policy is required to learn the model parameters and plan a control policy simultaneously. Addressing this trade-off between accurate estimation and good control is the main challenge in the area of adaptive control. Another important issue is to prevent the system from becoming destabilized (in the sense that its state grows in an uncontrolled fashion) due to lack of knowledge of the system dynamics. Asymptotically optimal approaches have been thoroughly investigated in the literature, but non-asymptotic results are few and rather incomplete. To derive such results, new concepts and technical tools need to be developed for estimation during the stabilization period of the system.
In adaptive control, system performance is measured by the regret, which is the difference between the cost of the adaptive policy and that of the optimal control designed according to the known dynamics. In this work, we establish non-asymptotic high-probability regret bounds, which are optimal up to a logarithmic factor, for different LQ systems with and without identifiability assumptions. We also provide high-probability guarantees for a stabilization algorithm based on random linear feedbacks. The results obtained are fairly general, since the only assumptions needed are: (i) stabilizability of the matrices encoding the system's dynamics, and (ii) a condition on the tail heaviness of the distribution of the noise vectors. The study also provides novel results on the estimation of the parameters of possibly unstable Vector Autoregressive (VAR) models. In the classical literature, there are hardly any results for the unstable case, especially regarding finite-sample bounds, which are the subject of this work.
Our results express the required sample size as a function of the problem dimension and of key characteristics of the true underlying transition matrix and the innovation distribution. To obtain them, we leverage appropriate concentration inequalities for random matrices and for martingale difference sequences.
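For orientation, the quantities named in the abstract can be written out in conventional LQ notation. This is only a sketch under assumed notation, not the formulation used in the thesis: the symbols $A_0$, $B_0$, $Q$, $R$, $x(t)$, $u(t)$, $w(t)$, $K$, $\pi$, $\pi^\star$ and $\mathcal{R}(T)$ are all assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the source):
% x(t): state, u(t): control input, w(t): noise vector,
% A_0, B_0: unknown true dynamics, Q, R: cost matrices.
\begin{align*}
  x(t+1) &= A_0\, x(t) + B_0\, u(t) + w(t+1)
    && \text{(linear dynamics)} \\
  c_t &= x(t)^\top Q\, x(t) + u(t)^\top R\, u(t)
    && \text{(quadratic operating cost)} \\
  \mathcal{R}(T) &= \sum_{t=0}^{T-1} \bigl( c_t^{\pi} - c_t^{\pi^\star} \bigr)
    && \text{(regret of adaptive policy } \pi \text{ against } \pi^\star \text{)}
\end{align*}
% Under a linear feedback u(t) = K x(t), the closed loop is a VAR(1) model:
\[
  x(t+1) = (A_0 + B_0 K)\, x(t) + w(t+1),
\]
% which is unstable whenever the spectral radius of A_0 + B_0 K is at least 1.
```

Here $\pi^\star$ denotes the optimal control designed with $A_0$ and $B_0$ known. The closed-loop form suggests why estimation in possibly unstable VAR models matters for the stabilization period: before a stabilizing $K$ is found, the transition matrix $A_0 + B_0 K$ need not be stable.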

[Publication date] [Publisher] University of Michigan

[Validity level] Linear Systems [Subject classification]

[Keywords] Non-Asymptotic Adaptive Control;Linear Systems;Finite Time Stabilization;Reinforcement Learning;Unstable Vector Autoregressive;Finite Sample Estimation;Computer Science;Electrical Engineering;Engineering (General);Industrial and Operations Engineering;Mathematics;Statistics and Numeric Data;Engineering;Science;Statistics [Timeliness]
