Paper Title

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Paper Authors

Zhenghao Peng, Hao Sun, Bolei Zhou

Paper Abstract

Conventional Reinforcement Learning (RL) algorithms usually have one single agent learning to solve the task independently. As a result, the agent can only explore a limited part of the state-action space while the learned behavior is highly correlated to the agent's previous experience, making the training prone to a local minimum. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings and the experimental results show that DiCE can achieve substantial improvement over the baselines in the MuJoCo locomotion tasks.
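The abstract describes two key ingredients of DiCE: a team of agents that explore the environment in parallel while pooling their collected experiences, and a regularizer that keeps the team's behaviors diverse. Below is a minimal, hypothetical Python sketch of that idea only, not the authors' implementation: the linear placeholder policies, the `diversity_penalty` function, and the `alpha` coefficient are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of the DiCE idea (not the paper's algorithm):
# K policies act on the same observations, all transitions go into one
# shared buffer, and each policy's loss would add a diversity term that
# pushes its actions away from its teammates' actions.

rng = np.random.default_rng(0)
K, OBS_DIM, ACT_DIM = 4, 8, 2  # team size and toy dimensions (assumptions)

# Each "policy" is just a linear map obs -> action, as a stand-in for a
# real policy network.
policies = [rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM)) for _ in range(K)]
shared_buffer = []  # experiences collected by the whole team

def act(W, obs):
    """Placeholder deterministic policy: a linear map."""
    return obs @ W

def collect(obs_batch):
    """Every agent acts on the same observations; all transitions are
    appended to the single shared buffer (collaborative exploration)."""
    for k, W in enumerate(policies):
        for obs in obs_batch:
            shared_buffer.append((k, obs, act(W, obs)))

def diversity_penalty(k, obs_batch, alpha=0.1):
    """Negative mean distance between agent k's actions and its teammates'
    actions on the same observations. Adding this term to agent k's loss
    (and minimizing) keeps the team's behaviors spread apart."""
    a_k = obs_batch @ policies[k]
    dists = [np.linalg.norm(a_k - obs_batch @ policies[j])
             for j in range(K) if j != k]
    return -alpha * np.mean(dists)

# Toy usage: collect a batch of shared experience and evaluate the
# diversity term for agent 0.
obs_batch = rng.normal(size=(16, OBS_DIM))
collect(obs_batch)
print(len(shared_buffer), diversity_penalty(0, obs_batch))
```

The real method additionally trains each policy on the pooled data with an on-policy or off-policy RL objective; this sketch only illustrates the sharing and diversity-regularization structure.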
