Regularized Soft Actor-Critic for Behavior Transfer Learning

Paper Title

Regularized Soft Actor-Critic for Behavior Transfer Learning

Authors

Mingxi Tan, Andong Tian, Ludovic Denoyer

Abstract

Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while completing the main objective of a task. In this paper, we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
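
As a rough illustration of the formulation the abstract describes, a constrained problem of this kind can be sketched as below. This is a generic CMDP template, not the paper's exact objective: the imitation cost c, the threshold d, and the multiplier \lambda are assumed symbols introduced here for illustration, and discounting is omitted for brevity.

```latex
% Generic CMDP sketch (assumed notation, not taken from the paper):
%   r       main-task reward
%   \alpha  entropy temperature, as in SAC
%   c       hypothetical per-step imitation cost (distance from the demonstrated behavior)
%   d       hypothetical threshold on the allowed imitation cost
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t} r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big] \\
\text{s.t.}\quad & \mathbb{E}_{\tau\sim\pi}\Big[\sum_{t} c(s_t,a_t)\Big] \le d
\end{aligned}

% A standard Lagrangian relaxation of the problem above:
\min_{\lambda\ge 0}\;\max_{\pi}\;
\mathbb{E}_{\tau\sim\pi}\Big[\sum_{t} r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big) - \lambda\,c(s_t,a_t)\Big] + \lambda\,d
```

Under these assumptions, a smaller d would push the agent toward closer imitation, while a larger d leaves more freedom to pursue the main task, which is one way to read "partially imitate to varying degrees".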
