Paper Title

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Paper Authors

Zhenghao Peng, Hao Sun, Bolei Zhou

Paper Abstract

Conventional Reinforcement Learning (RL) algorithms usually have one single agent learning to solve the task independently. As a result, the agent can only explore a limited part of the state-action space while the learned behavior is highly correlated to the agent's previous experience, making the training prone to a local minimum. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings and the experimental results show that DiCE can achieve substantial improvement over the baselines in the MuJoCo locomotion tasks.
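The abstract describes two key ingredients of DiCE: a team of agents that explore the environment in parallel while pooling their collected experiences, and a regularizer that keeps the team's behaviors diverse. Below is a minimal, hypothetical Python sketch of that idea only, not the authors' implementation: the linear placeholder policies, the `diversity_penalty` function, and the `alpha` coefficient are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of the DiCE idea (not the paper's algorithm):
# K policies act on the same observations, all transitions go into one
# shared buffer, and each policy's loss would add a diversity term that
# pushes its actions away from its teammates' actions.

rng = np.random.default_rng(0)
K, OBS_DIM, ACT_DIM = 4, 8, 2  # team size and toy dimensions (assumptions)

# Each "policy" is just a linear map obs -> action, as a stand-in for a
# real policy network.
policies = [rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM)) for _ in range(K)]
shared_buffer = []  # experiences collected by the whole team

def act(W, obs):
    """Placeholder deterministic policy: a linear map."""
    return obs @ W

def collect(obs_batch):
    """Every agent acts on the same observations; all transitions are
    appended to the single shared buffer (collaborative exploration)."""
    for k, W in enumerate(policies):
        for obs in obs_batch:
            shared_buffer.append((k, obs, act(W, obs)))

def diversity_penalty(k, obs_batch, alpha=0.1):
    """Negative mean distance between agent k's actions and its teammates'
    actions on the same observations. Adding this term to agent k's loss
    (and minimizing) keeps the team's behaviors spread apart."""
    a_k = obs_batch @ policies[k]
    dists = [np.linalg.norm(a_k - obs_batch @ policies[j])
             for j in range(K) if j != k]
    return -alpha * np.mean(dists)

# Toy usage: collect a batch of shared experience and evaluate the
# diversity term for agent 0.
obs_batch = rng.normal(size=(16, OBS_DIM))
collect(obs_batch)
print(len(shared_buffer), diversity_penalty(0, obs_batch))
```

The real method additionally trains each policy on the pooled data with an on-policy or off-policy RL objective; this sketch only illustrates the sharing and diversity-regularization structure.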
