Paper Title
MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer
Paper Authors
Paper Abstract
In this paper, we consider cooperative multi-agent reinforcement learning (MARL) with sparse rewards. To tackle this problem, we propose a novel method named MASER: MARL with subgoals generated from the experience replay buffer. Under the widely-used assumption of centralized training with decentralized execution and consistent Q-value decomposition for MARL, MASER automatically generates proper subgoals for multiple agents from the experience replay buffer by considering both the individual Q-values and the total Q-value. Then, MASER designs an individual intrinsic reward for each agent based on an actionable representation relevant to Q-learning, so that agents reach their subgoals while maximizing the joint action value. Numerical results show that MASER significantly outperforms other state-of-the-art MARL algorithms on the StarCraft II micromanagement benchmark.
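To make the two ingredients of the abstract concrete, below is a minimal, hypothetical Python sketch of (1) scoring replay-buffer states as subgoal candidates by a weighted mix of an agent's individual Q-value and the total Q-value, and (2) an intrinsic reward defined as the negative distance between the agent's current state and its subgoal in an embedding space. The function names, the mixing weight `lambda_mix`, and the distance-based reward are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_subgoal(buffer_states, q_individual, q_total, lambda_mix=0.5):
    """Pick the buffered state with the highest mixed Q-score as the subgoal.

    buffer_states: (N, state_dim) array of states sampled from the replay buffer
    q_individual:  (N,) array of the agent's own Q-values for those states
    q_total:       (N,) array of the joint (total) Q-values for those states
    lambda_mix:    assumed weight trading off individual vs. total Q-value
    """
    scores = lambda_mix * q_individual + (1.0 - lambda_mix) * q_total
    return buffer_states[np.argmax(scores)]

def intrinsic_reward(state_embedding, subgoal_embedding):
    """Reward the agent for moving its (learned) embedding closer to the subgoal's."""
    return -np.linalg.norm(state_embedding - subgoal_embedding)

# Toy usage with random data standing in for a real replay buffer and encoder.
rng = np.random.default_rng(0)
states = rng.normal(size=(128, 8))   # 128 buffered states, 8-dim features
q_i = rng.normal(size=128)           # per-agent Q-values
q_tot = rng.normal(size=128)         # total Q-values
subgoal = select_subgoal(states, q_i, q_tot)
print(subgoal.shape, intrinsic_reward(states[0], subgoal))
```

In the actual method, the embeddings would come from a representation trained jointly with Q-learning (the "actionable representation" mentioned above), and the intrinsic reward would be combined with the sparse environment reward during training.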