Paper Title
Denoised MDPs: Learning World Models Better Than the World Itself
Paper Authors
Paper Abstract
The ability to separate signal from noise, and to reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real-world tasks without considering all possible nuisance factors. How can artificial agents do the same? What kind of information can agents safely discard as noise? In this work, we categorize information in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds of information removed by various prior work on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior work, across policy optimization control tasks as well as the non-control task of joint position regression.
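The abstract's four-way taxonomy splits information along two binary axes: controllability and reward-relevance, with only the (controllable, reward-relevant) quadrant deemed useful. A minimal sketch of that 2x2 categorization (the enum names and labels here are illustrative, not from the paper):

```python
from enum import Enum

class InfoType(Enum):
    # Hypothetical labels for the paper's 2x2 taxonomy: information is
    # split by whether the agent can control it and whether it affects reward.
    CTRL_REWARD = "controllable, reward-relevant (useful signal: keep)"
    CTRL_NOREWARD = "controllable, reward-irrelevant (noise: discard)"
    UNCTRL_REWARD = "uncontrollable, reward-relevant (noise: discard)"
    UNCTRL_NOREWARD = "uncontrollable, reward-irrelevant (noise: discard)"

def categorize(controllable: bool, reward_relevant: bool) -> InfoType:
    """Map the two binary attributes onto the four information types."""
    if controllable and reward_relevant:
        return InfoType.CTRL_REWARD
    if controllable:
        return InfoType.CTRL_NOREWARD
    if reward_relevant:
        return InfoType.UNCTRL_REWARD
    return InfoType.UNCTRL_NOREWARD

# Example: a robot arm's own joint angles are controllable and reward-relevant,
# so they fall in the quadrant the paper formulates as useful information.
print(categorize(True, True).value)
```

Under this framing, a Denoised MDP keeps a world model only over the first quadrant and factors the other three out as noise distractors.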