Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities

Paper Title

Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities

Authors

Jianda Chen, Sinno Jialin Pan

Abstract

How to learn an effective reinforcement learning-based model for control tasks from high-level visual observations is a practical and challenging problem. A key to solving this problem is to learn low-dimensional state representations from observations, from which an effective policy can be learned. To boost the learning of state encodings, recent work has focused on capturing behavioral similarities between state representations or applying data augmentation to visual observations. In this paper, we propose a novel meta-learner-based framework for representation learning regarding behavioral similarities for reinforcement learning. Specifically, our framework encodes the high-dimensional observations into two decomposed embeddings regarding reward and dynamics in a Markov Decision Process (MDP). A pair of meta-learners is developed, one of which quantifies the reward similarity while the other quantifies the dynamics similarity over the correspondingly decomposed embeddings. The meta-learners are self-learned to update the state embeddings by approximating two disjoint terms in the on-policy bisimulation metric. To incorporate the reward and dynamics terms, we further develop a strategy to adaptively balance their impacts based on different tasks or environments. We empirically demonstrate that our proposed framework outperforms state-of-the-art baselines on several benchmarks, including the conventional DM Control Suite, the Distracting DM Control Suite, and the self-driving task CARLA.
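
For readers unfamiliar with the on-policy bisimulation metric mentioned in the abstract, the following is a minimal sketch of the metric itself, not of the paper's method. In its standard form the metric is the sum of a reward term, |r_i - r_j|, and a dynamics term, gamma times the Wasserstein-1 distance between the next-state distributions under the policy. The sketch assumes a tiny tabular MDP whose transitions under the policy are deterministic, so the Wasserstein term reduces to the distance between the two successor states. The function name, the toy rewards and transitions, and the fixed weights `c_r` and `c_t` are all hypothetical; the paper instead approximates the two terms with learned meta-learners and balances them adaptively.

```python
import numpy as np


def onpolicy_bisim_deterministic(rewards, next_state, gamma=0.9,
                                 c_r=1.0, c_t=1.0, tol=1e-10, max_iters=10_000):
    """Fixed-point iteration for
        d(i, j) = c_r * |r_i - r_j| + c_t * gamma * d(next(i), next(j)).

    Assumes deterministic transitions under the policy, so the Wasserstein-1
    distance between next-state distributions is just d(next(i), next(j)).
    The iteration is a contraction only when c_t * gamma < 1.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_state = np.asarray(next_state, dtype=int)

    # Reward term: pairwise absolute reward differences.
    reward_term = np.abs(rewards[:, None] - rewards[None, :])

    d = np.zeros_like(reward_term)
    for _ in range(max_iters):
        # Dynamics term: current distance between the successor states.
        dynamics_term = d[np.ix_(next_state, next_state)]
        new_d = c_r * reward_term + c_t * gamma * dynamics_term
        if np.max(np.abs(new_d - d)) < tol:
            return new_d
        d = new_d
    return d


# Toy chain (made up for illustration): 0 -> 1 -> 2, state 2 is absorbing.
# Only state 2 yields reward.
d = onpolicy_bisim_deterministic(rewards=[0.0, 0.0, 1.0],
                                 next_state=[1, 2, 2],
                                 gamma=0.5)
print(d)
```

States 0 and 1 receive the same immediate reward, yet their distance is nonzero because their successors differ. This is the sense in which the metric captures behavioral rather than purely visual similarity.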
