Paper Title

Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics

Paper Authors

Xusen Yin, Jonathan May

Paper Abstract

Reinforcement learning algorithms such as Q-learning have shown great promise in training models to learn the optimal action to take for a given system state; a goal in applications with an exploratory or adversarial nature such as task-oriented dialogues or games. However, models that do not have direct access to their state are harder to train; when the only state access is via the medium of language, this can be particularly pronounced. We introduce a new model amenable to deep Q-learning that incorporates a Siamese neural network architecture and a novel refactoring of the Q-value function in order to better represent system state given its approximation over a language channel. We evaluate the model in the context of zero-shot text-based adventure game learning. Extrinsically, our model reaches the baseline's convergence performance point needing only 15% of its iterations, reaches a convergence performance point 15% higher than the baseline's, and is able to play unseen, unrelated games with no fine-tuning. We probe our new model's representation space to determine that intrinsically, this is due to the appropriate clustering of different linguistic mediation into the same state.
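To make the architecture concrete, below is a minimal PyTorch sketch of the general idea the abstract describes: a shared ("Siamese") text encoder maps linguistic observations to state vectors, so that different descriptions of the same underlying game state land near each other, and a Q head scores candidate actions against the encoded state. All names (`SiameseStateEncoder`, `QValueHead`) and dimensions here are hypothetical illustrations, not the authors' code, and the paper's specific refactoring of the Q-value function is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseStateEncoder(nn.Module):
    """Shared text encoder: the same weights embed every observation,
    which is what makes the architecture 'Siamese'."""
    def __init__(self, vocab_size=10000, embed_dim=64, state_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, state_dim, batch_first=True)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        _, h = self.rnn(self.embed(token_ids))
        return h.squeeze(0)                   # (batch, state_dim)

class QValueHead(nn.Module):
    """Scores each candidate action vector against the state vector."""
    def __init__(self, state_dim=128):
        super().__init__()
        self.bilinear = nn.Bilinear(state_dim, state_dim, 1)

    def forward(self, state_vec, action_vecs):
        # state_vec: (batch, state_dim); action_vecs: (batch, n_actions, state_dim)
        s = state_vec.unsqueeze(1).expand_as(action_vecs)
        return self.bilinear(s.contiguous(), action_vecs.contiguous()).squeeze(-1)

encoder = SiameseStateEncoder()
q_head = QValueHead()

# Two paraphrased textual observations of (notionally) the same game state.
obs_a = torch.randint(1, 10000, (2, 20))
obs_b = torch.randint(1, 10000, (2, 20))
state_a, state_b = encoder(obs_a), encoder(obs_b)  # shared weights: Siamese

# Auxiliary similarity signal: paraphrases of one state should align.
similarity = F.cosine_similarity(state_a, state_b)  # (batch,)

# Score candidate actions (here encoded with the same encoder) per state.
actions = encoder(torch.randint(1, 10000, (2, 20))).unsqueeze(1)
q_values = q_head(state_a, actions)          # (batch, n_actions)
```

Under this reading, weight sharing is the mechanism behind the intrinsic finding the abstract reports: paraphrased linguistic mediations of one state cluster together in representation space, which in turn lets the Q head transfer to unseen, unrelated games without fine-tuning.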
