On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix

Paper title

On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix

Authors

Ingvar Ziemann, Henrik Sandberg

Abstract

This paper presents local asymptotic minimax regret lower bounds for adaptive Linear Quadratic Regulators (LQR). We consider affinely parametrized $B$-matrices and known $A$-matrices and aim to understand when logarithmic regret is impossible even in the presence of structural side information. After defining the intrinsic notion of an uninformative optimal policy in terms of a singularity condition for Fisher information, we obtain local minimax regret lower bounds for such uninformative instances of LQR by appealing to van Trees' inequality (Bayesian Cramér-Rao) and a representation of regret in terms of a quadratic form (Bellman error). It is shown that if the parametrization induces an uninformative optimal policy, logarithmic regret is impossible and the rate is at least of order square root in the time horizon. We explicitly characterize the notion of an uninformative optimal policy in terms of the nullspaces of system-theoretic quantities and the particular instance parametrization.
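
For reference, van Trees' inequality mentioned in the abstract can be stated in its standard textbook form, under the usual regularity conditions. The symbols used here (parameter $\theta$, prior density $\lambda$, estimator $\hat\theta$, Fisher information $I(\theta)$, prior information $J(\lambda)$) are notation assumed for illustration and need not match the paper's own.

$$
\mathbb{E}\left[(\hat\theta - \theta)(\hat\theta - \theta)^\top\right] \succeq \left(\mathbb{E}_{\lambda}\left[I(\theta)\right] + J(\lambda)\right)^{-1},
\qquad
J(\lambda) = \int \frac{\nabla\lambda(\theta)\,\nabla\lambda(\theta)^\top}{\lambda(\theta)}\, d\theta
$$

Read this way, when the Fisher information is singular or nearly singular along some direction, the right-hand side stays large in that direction, so no estimator can learn the parameter quickly there. This appears to be the mechanism the abstract relies on when it links a singularity condition for Fisher information to the impossibility of logarithmic regret.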
