Learning Min-norm Stabilizing Control Laws for Systems with Unknown Dynamics

Paper title

Learning Min-norm Stabilizing Control Laws for Systems with Unknown Dynamics

Authors

Tyler Westenbroek, Fernando Castaneda, Ayush Agrawal, S. Shankar Sastry, Koushil Sreenath

Abstract

This paper introduces a framework for learning a minimum-norm stabilizing controller for a system with unknown dynamics using model-free policy optimization methods. The approach begins by designing a Control Lyapunov Function (CLF) for a (possibly inaccurate) dynamics model of the system, along with a function that specifies the minimum acceptable rate of energy dissipation for the CLF at different points in the state space. Treating the energy dissipation condition as a constraint on the desired closed-loop behavior of the real-world system, we use penalty methods to formulate an unconstrained optimization problem over the parameters of a learned controller, which can be solved with model-free policy optimization algorithms using data collected from the plant. We discuss when the optimization learns a stabilizing controller for the real-world system and derive conditions on the structure of the learned controller which ensure that the optimization is strongly convex, meaning the globally optimal solution can be found reliably. We validate the approach in simulation, first for a double pendulum, and then generalize the framework to learn stable walking controllers for underactuated bipedal robots using the Hybrid Zero Dynamics framework. By encoding a large amount of structure into the learning problem, we are able to learn stabilizing controllers for both systems with only minutes or even seconds of training data.
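
The penalty formulation described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a scalar plant x' = a*x + b*u, the CLF V(x) = x^2 / 2, the dissipation requirement sigma(x) = x^2, a linear controller u = theta * x, the penalty weight `lam`, and a plain grid search standing in for a model-free policy optimization algorithm. The learner only sees sampled plant transitions and never reads `A_TRUE` or `B_TRUE`.

```python
import numpy as np

# Hypothetical "real" plant x' = a*x + b*u. The learner never reads these
# constants; it only observes transitions returned by plant_step.
A_TRUE, B_TRUE = 1.0, 2.0
DT = 1e-3  # sampling interval of the collected data


def plant_step(x, u):
    """One Euler step of the plant, standing in for measured data."""
    return x + DT * (A_TRUE * x + B_TRUE * u)


def V(x):
    """Assumed control Lyapunov function."""
    return 0.5 * x**2


def sigma(x):
    """Assumed minimum acceptable dissipation rate: require V' <= -sigma(x)."""
    return x**2


def penalty_cost(theta, xs, lam=10.0):
    """Control effort plus a penalty on violating the dissipation condition."""
    us = theta * xs
    # Finite-difference estimate of V' along sampled plant transitions.
    v_dot = (V(plant_step(xs, us)) - V(xs)) / DT
    violation = np.maximum(0.0, v_dot + sigma(xs))
    return np.mean(us**2 + lam * violation)


xs = np.linspace(-1.0, 1.0, 41)        # sampled states
thetas = np.linspace(-3.0, 1.0, 401)   # candidate controller gains
costs = np.array([penalty_cost(t, xs) for t in thetas])
theta_best = thetas[np.argmin(costs)]

print("selected gain:", theta_best)
print("closed-loop pole:", A_TRUE + B_TRUE * theta_best)
```

In this toy setting the unconstrained cost trades control effort against constraint violation, so the minimizer is the smallest-magnitude gain that still meets the dissipation requirement on the sampled states, which is the min-norm behavior the abstract refers to.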
