Paper Title

Benchmarking Constraint Inference in Inverse Reinforcement Learning

Authors

Guiliang Liu, Yudong Luo, Ashish Gaurav, Kasra Rezaee, Pascal Poupart

Abstract

When deploying Reinforcement Learning (RL) agents into a physical system, we must ensure that these agents are well aware of the underlying constraints. In many real-world problems, however, the constraints are often hard to specify mathematically and unknown to the RL agents. To tackle these issues, Inverse Constrained Reinforcement Learning (ICRL) empirically estimates constraints from expert demonstrations. As an emerging research topic, ICRL does not have common benchmarks, and previous works tested algorithms under hand-crafted environments with manually generated expert demonstrations. In this paper, we construct an ICRL benchmark in the context of RL application domains, including robot control and autonomous driving. For each environment, we design relevant constraints and train expert agents to generate demonstration data. In addition, unlike existing baselines that learn a deterministic constraint, we propose a variational ICRL method to model a posterior distribution of candidate constraints. We conduct extensive experiments on these algorithms under our benchmark and show how they can facilitate studying important research challenges for ICRL. The benchmark, including instructions for reproducing ICRL algorithms, is available at https://github.com/Guiliang/ICRL-benchmarks-public.
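To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of what "empirically estimating constraints from expert demonstrations" can look like in ICRL-style methods: a neural feasibility model is trained to assign high feasibility to expert state-action pairs and low feasibility to pairs visited by the current nominal policy, so that regions the expert systematically avoids are inferred as constrained. This is an illustrative assumption about the general technique, not the paper's exact algorithm; the names `FeasibilityNet` and `constraint_inference_step`, the PyTorch setup, and the placeholder batches are all invented for this sketch.

```python
import torch
import torch.nn as nn

class FeasibilityNet(nn.Module):
    """Hypothetical constraint model: maps a (state, action) feature vector
    to the probability that the pair is feasible (constraint-satisfying)."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def constraint_inference_step(model, optimizer, expert_batch, policy_batch):
    """One gradient step of constraint inference: expert pairs are treated
    as feasible, while pairs visited by the current nominal policy are
    pushed toward infeasibility (a discriminator-style objective)."""
    feas_expert = model(expert_batch)  # should approach 1
    feas_policy = model(policy_batch)  # pushed toward 0 where the expert never goes
    eps = 1e-6  # numerical guard for the logarithms
    loss = -(torch.log(feas_expert + eps).mean()
             + torch.log(1.0 - feas_policy + eps).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Random placeholder data standing in for demonstration and rollout
    # batches drawn from a benchmark environment.
    model = FeasibilityNet(input_dim=8)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    expert_batch = torch.randn(32, 8)
    policy_batch = torch.randn(32, 8)
    print(constraint_inference_step(model, optimizer, expert_batch, policy_batch))
```

This sketch covers only a deterministic constraint estimate; the variational method the abstract proposes would instead, roughly speaking, maintain a posterior distribution over candidate constraint models rather than a single point estimate.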
