Paper Title

Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics

Paper Authors

Xusen Yin, Jonathan May

Paper Abstract

Reinforcement learning algorithms such as Q-learning have shown great promise in training models to learn the optimal action to take for a given system state; a goal in applications with an exploratory or adversarial nature such as task-oriented dialogues or games. However, models that do not have direct access to their state are harder to train; when the only state access is via the medium of language, this can be particularly pronounced. We introduce a new model amenable to deep Q-learning that incorporates a Siamese neural network architecture and a novel refactoring of the Q-value function in order to better represent system state given its approximation over a language channel. We evaluate the model in the context of zero-shot text-based adventure game learning. Extrinsically, our model reaches the baseline's convergence performance point needing only 15% of its iterations, reaches a convergence performance point 15% higher than the baseline's, and is able to play unseen, unrelated games with no fine-tuning. We probe our new model's representation space to determine that intrinsically, this is due to the appropriate clustering of different linguistic mediation into the same state.
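To make the architecture concrete, below is a minimal PyTorch sketch of the general idea the abstract describes: a shared ("Siamese") text encoder maps linguistic observations to state vectors, so that different descriptions of the same underlying game state land near each other, and a Q head scores candidate actions against the encoded state. All names (`SiameseStateEncoder`, `QValueHead`) and dimensions here are hypothetical illustrations, not the authors' code, and the paper's specific refactoring of the Q-value function is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseStateEncoder(nn.Module):
    """Shared text encoder: the same weights embed every observation,
    which is what makes the architecture 'Siamese'."""
    def __init__(self, vocab_size=10000, embed_dim=64, state_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, state_dim, batch_first=True)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        _, h = self.rnn(self.embed(token_ids))
        return h.squeeze(0)                   # (batch, state_dim)

class QValueHead(nn.Module):
    """Scores each candidate action vector against the state vector."""
    def __init__(self, state_dim=128):
        super().__init__()
        self.bilinear = nn.Bilinear(state_dim, state_dim, 1)

    def forward(self, state_vec, action_vecs):
        # state_vec: (batch, state_dim); action_vecs: (batch, n_actions, state_dim)
        s = state_vec.unsqueeze(1).expand_as(action_vecs)
        return self.bilinear(s.contiguous(), action_vecs.contiguous()).squeeze(-1)

encoder = SiameseStateEncoder()
q_head = QValueHead()

# Two paraphrased textual observations of (notionally) the same game state.
obs_a = torch.randint(1, 10000, (2, 20))
obs_b = torch.randint(1, 10000, (2, 20))
state_a, state_b = encoder(obs_a), encoder(obs_b)  # shared weights: Siamese

# Auxiliary similarity signal: paraphrases of one state should align.
similarity = F.cosine_similarity(state_a, state_b)  # (batch,)

# Score candidate actions (here encoded with the same encoder) per state.
actions = encoder(torch.randint(1, 10000, (2, 20))).unsqueeze(1)
q_values = q_head(state_a, actions)          # (batch, n_actions)
```

Under this reading, weight sharing is the mechanism behind the intrinsic finding the abstract reports: paraphrased linguistic mediations of one state cluster together in representation space, which in turn lets the Q head transfer to unseen, unrelated games without fine-tuning.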
