Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

Paper Title

Multi-Agent Reinforcement Learning for Dynamic Ocean Monitoring by a Swarm of Buoys

Authors

Maryam Kouzehgar, Malika Meghjani, Roland Bouffanais

Abstract

The autonomous marine environmental monitoring problem traditionally encompasses an area coverage problem that can be carried out effectively only by a multi-robot system. In this paper, we focus on robotic swarms, which are typically operated and controlled by means of simple swarming behaviors obtained from a subtle, yet ad hoc, combination of bio-inspired strategies. We propose a novel and structured approach to area coverage using multi-agent reinforcement learning (MARL), which effectively deals with the non-stationarity of environmental features. Specifically, we propose two dynamic area coverage approaches: (1) swarm-based MARL and (2) coverage-range-based MARL. The former is trained using the multi-agent deep deterministic policy gradient (MADDPG) approach, whereas a modified version of MADDPG is introduced for the latter, with a reward function that intrinsically leads to collective behavior. Both methods are tested and validated on regions of different geometric shapes but equal surface area (square vs. rectangle), yielding acceptable area coverage and benefiting from structured learning in non-stationary environments. Both approaches are advantageous compared to a naïve swarming method. However, coverage-range-based MARL outperforms swarm-based MARL, with stronger convergence in the learning criteria and a wider spread of agents for area coverage.
