Paper Title
Large-Scale Retrieval for Reinforcement Learning
Paper Authors
Paper Abstract
Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm is for an agent to amortise information that helps decision making into its network weights via gradient descent on training losses. Here, we pursue an alternative approach in which agents can utilise large-scale context sensitive database lookups to support their parametric computations. This allows agents to directly learn in an end-to-end manner to utilise relevant information to inform their outputs. In addition, new information can be attended to by the agent, without retraining, by simply augmenting the retrieval dataset. We study this approach for offline RL in 9x9 Go, a challenging game for which the vast combinatorial state space privileges generalisation over direct matching to past experiences. We leverage fast, approximate nearest neighbor techniques in order to retrieve relevant data from a set of tens of millions of expert demonstration states. Attending to this information provides a significant boost to prediction accuracy and game-play performance over simply using these demonstrations as training trajectories, providing a compelling demonstration of the value of large-scale retrieval in offline RL agents.
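To make the retrieval mechanism described above concrete, the sketch below shows one way an agent could look up relevant demonstration states with fast, approximate nearest-neighbour search and feed them into its parametric computation. This is a minimal illustration, not the authors' implementation: the use of the faiss library, the 128-dimensional embeddings, the index parameters, and the simple concatenation of query and neighbours are all illustrative assumptions.

```python
"""Sketch: retrieval-augmented decision making via approximate nearest neighbours."""
import numpy as np
import faiss  # pip install faiss-cpu

EMBED_DIM = 128      # assumed embedding size for a Go board state
NUM_DEMOS = 100_000  # stand-in for "tens of millions" of demonstration states
K_NEIGHBOURS = 4     # how many retrieved neighbours the agent attends to

# --- Build the retrieval index over expert demonstration embeddings ---------
rng = np.random.default_rng(0)
demo_embeddings = rng.standard_normal((NUM_DEMOS, EMBED_DIM)).astype("float32")

# IVF index: cluster the corpus into `nlist` cells, then search only the
# `nprobe` closest cells at query time -- approximate but fast lookups.
nlist = 1024
quantizer = faiss.IndexFlatL2(EMBED_DIM)
index = faiss.IndexIVFFlat(quantizer, EMBED_DIM, nlist)
index.train(demo_embeddings)   # learn the coarse clustering
index.add(demo_embeddings)     # index all demonstration embeddings
index.nprobe = 16              # speed/accuracy trade-off at query time

# --- Retrieval step used inside the agent's forward pass --------------------
def retrieve_context(query_embedding: np.ndarray, k: int = K_NEIGHBOURS) -> np.ndarray:
    """Return the embeddings of the k nearest demonstration states."""
    query = query_embedding.reshape(1, -1).astype("float32")
    _, neighbour_ids = index.search(query, k)
    return demo_embeddings[neighbour_ids[0]]

# Example: augment the agent's input with retrieved neighbours before the
# parametric policy/value network runs (here, a plain concatenation).
current_state_embedding = rng.standard_normal(EMBED_DIM).astype("float32")
neighbours = retrieve_context(current_state_embedding)
agent_input = np.concatenate([current_state_embedding[None, :], neighbours], axis=0)
print(agent_input.shape)  # (1 + K_NEIGHBOURS, EMBED_DIM)
```

In a full agent the network would learn end-to-end how to attend to the retrieved neighbours rather than merely concatenating them, and, consistent with the abstract's claim, new information could be made available without retraining by simply adding further demonstration embeddings to the index (e.g. another `index.add(...)` call).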