Paper Title

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Authors

Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

Abstract

Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.