Paper Title
Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning
Paper Authors
Paper Abstract
Characterizing aleatoric and epistemic uncertainty on the predicted rewards can help in building reliable reinforcement learning (RL) systems. Aleatoric uncertainty results from the irreducible environment stochasticity leading to inherently risky states and actions. Epistemic uncertainty results from the limited information accumulated during learning to make informed decisions. Characterizing aleatoric and epistemic uncertainty can be used to speed up learning in a training environment, improve generalization to similar testing environments, and flag unfamiliar behavior in anomalous testing environments. In this work, we introduce a framework for disentangling aleatoric and epistemic uncertainty in RL. (1) We first define four desiderata that capture the desired behavior for aleatoric and epistemic uncertainty estimation in RL at both training and testing time. (2) We then present four RL models inspired by supervised learning (i.e., Monte Carlo dropout, ensembles, deep kernel learning, and evidential networks) to instantiate aleatoric and epistemic uncertainty. Finally, (3) we propose a practical method to evaluate uncertainty estimation in model-free RL, based on the detection of out-of-distribution environments and generalization to perturbed environments. We present theoretical and experimental evidence to validate that carefully equipping model-free RL agents with supervised learning uncertainty methods can fulfill our desiderata.
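The abstract itself contains no code; as a rough illustration of one of the named supervised-learning approaches (the ensemble), the sketch below shows a common way to separate the two kinds of uncertainty on predicted action values: each ensemble member outputs a Gaussian over the return, epistemic uncertainty is measured as the disagreement between the members' means, and aleatoric uncertainty as the average of their predicted noise variances. This is a minimal sketch under those assumptions; class and parameter names (GaussianQHead, n_heads, etc.) are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class GaussianQHead(nn.Module):
    """One ensemble member: predicts a Gaussian over the return of each action."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, n_actions)
        self.log_var = nn.Linear(hidden, n_actions)  # log of the predicted aleatoric variance

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.mean(h), self.log_var(h)

def disentangled_uncertainty(heads, obs):
    """Ensemble decomposition: epistemic = disagreement of the member means,
    aleatoric = average predicted noise variance."""
    means, variances = [], []
    for head in heads:
        mu, log_var = head(obs)
        means.append(mu)
        variances.append(log_var.exp())
    means = torch.stack(means)          # (n_heads, batch, n_actions)
    variances = torch.stack(variances)
    epistemic = means.var(dim=0)        # spread across ensemble members
    aleatoric = variances.mean(dim=0)   # irreducible noise the members agree on
    return means.mean(dim=0), epistemic, aleatoric

# Usage: a 5-member ensemble on a toy batch of observations
heads = [GaussianQHead(obs_dim=4, n_actions=2) for _ in range(5)]
obs = torch.randn(8, 4)
q_mean, epistemic, aleatoric = disentangled_uncertainty(heads, obs)
```

Under this kind of decomposition, an out-of-distribution observation would typically inflate the epistemic term (the members disagree), while the aleatoric term tracks the inherent reward noise of risky states, which is the behavior the abstract's evaluation on out-of-distribution and perturbed environments is meant to probe.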