Paper Title
Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning
Paper Authors
Paper Abstract
The use of human demonstrations in reinforcement learning has proven to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while retaining the performance advantages of demonstrations by using a single human example, collected through a simple-to-use virtual reality simulation, to assist with RL training. Our method augments the single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly improve training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this significant training advantage using a single human example, requiring less than a minute of human input. Moreover, despite learning from a human example, the agent is not constrained to human-level performance, often learning a policy that is significantly different from the human demonstration.
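The abstract does not specify how the single demonstration is augmented. A common approach in demonstration-augmentation work is to perturb the recorded trajectory with small random noise to produce many plausible variants; the sketch below illustrates that idea only. The function name, the Gaussian-noise scheme, and the (state, action) trajectory format are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def augment_demonstration(demo, n_copies=100, noise_scale=0.01, seed=0):
    """Generate noisy variants of a single recorded demonstration.

    demo: a list of (state, action) pairs, each a 1-D numpy array.
    Returns a list of n_copies perturbed copies of the trajectory.
    """
    rng = np.random.default_rng(seed)
    augmented = []
    for _ in range(n_copies):
        # Jitter every state and action independently with small
        # Gaussian noise to create a "human-like" variant.
        new_demo = [
            (s + rng.normal(scale=noise_scale, size=s.shape),
             a + rng.normal(scale=noise_scale, size=a.shape))
            for s, a in demo
        ]
        augmented.append(new_demo)
    return augmented

# A toy 3-step demonstration with 4-D states and 2-D actions.
demo = [(np.zeros(4), np.zeros(2)) for _ in range(3)]
copies = augment_demonstration(demo, n_copies=100)
print(len(copies), len(copies[0]))  # 100 copies, each 3 steps long
```

In a DDPG + HER setup, such augmented trajectories would typically be inserted into the replay buffer alongside the agent's own experience, giving the learner many successful (or near-successful) examples from less than a minute of human input.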