Paper Title
Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
Paper Authors
Paper Abstract
Human explanations (e.g., in terms of feature importance) have recently been used to extend the communication channel between human and agent in interactive machine learning. In this setting, human trainers provide not only ground-truth labels but also some form of explanation. However, this kind of human guidance has only been investigated in supervised learning tasks, and it remains unclear how to best incorporate this type of human knowledge into deep reinforcement learning. In this paper, we present the first study of using human visual explanations in human-in-the-loop reinforcement learning (HRL). We focus on the task of learning from feedback, in which the human trainer not only gives binary evaluative ("good" or "bad") feedback for queried state-action pairs, but also provides a visual explanation by annotating the relevant features in the image. We propose EXPAND (EXPlanation AugmeNted feeDback), which encourages the model to encode task-relevant features through a context-aware data augmentation that perturbs only the features marked as irrelevant in the human saliency annotations. We choose five tasks, namely Pixel-Taxi and four Atari games, to evaluate the performance and sample efficiency of this approach. We show that our method significantly outperforms both explanation-leveraging methods adapted from supervised learning and human-in-the-loop RL baselines that use only evaluative feedback.
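
To make the core idea concrete, below is a minimal sketch (not the authors' code) of the context-aware data augmentation the abstract describes: given an image observation and a human-provided binary saliency mask, perturb only the regions the human marked as irrelevant, leaving the task-relevant pixels untouched. The specific perturbations (Gaussian blur and additive noise), the function name context_aware_augment, and all parameter values are illustrative assumptions, not details taken from the paper.

# A minimal sketch, assuming a grayscale observation in [0, 1] and a binary
# saliency mask where 1 marks the human-annotated relevant pixels.
# Perturbation choices (blur / noise) and all names are hypothetical.

import numpy as np
from scipy.ndimage import gaussian_filter


def context_aware_augment(obs: np.ndarray,
                          saliency_mask: np.ndarray,
                          sigma: float = 2.0,
                          noise_std: float = 0.05) -> np.ndarray:
    """Perturb only the non-salient pixels of an observation.

    obs:            H x W image observation, values in [0, 1].
    saliency_mask:  H x W binary mask; 1 marks human-annotated relevant pixels.
    """
    # Candidate perturbations, each applied to the full image first.
    blurred = gaussian_filter(obs, sigma=sigma)
    noisy = np.clip(obs + np.random.normal(0.0, noise_std, obs.shape), 0.0, 1.0)
    perturbed = blurred if np.random.rand() < 0.5 else noisy

    # Keep the salient pixels from the original observation and replace the
    # rest with the perturbed version, so the augmentation never alters the
    # features the human marked as task-relevant.
    return np.where(saliency_mask.astype(bool), obs, perturbed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.random((84, 84))           # dummy Atari-like frame
    mask = np.zeros((84, 84))
    mask[30:50, 30:50] = 1               # human-annotated relevant region
    aug = context_aware_augment(obs, mask)
    # The salient region is unchanged; everything else is perturbed.
    assert np.allclose(aug[30:50, 30:50], obs[30:50, 30:50])

Augmented observations produced this way can be required to yield the same predictions as the original (e.g., via a consistency term in the feedback-learning loss), which pushes the model to rely on the human-salient features rather than the perturbed background; the exact training objective used in EXPAND is not specified in this abstract.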