Paper Title
CGAR: Critic Guided Action Redistribution in Reinforcement Learning
Paper Authors
Paper Abstract
Training a game-playing reinforcement learning agent requires many interactions with the environment. Uninformed random exploration can waste time and resources, so it is essential to alleviate such waste. As discussed in this paper, under the setting of off-policy actor-critic algorithms, we demonstrate that following the critic can bring expected discounted rewards greater than, or at least equal to, those of following the actor. Thus, the Q value predicted by the critic is a better signal for redistributing actions originally sampled from the policy distribution predicted by the actor. This paper introduces the novel Critic Guided Action Redistribution (CGAR) algorithm and tests it on the OpenAI MuJoCo tasks. The experimental results demonstrate that our method improves sample efficiency and achieves state-of-the-art performance. Our code can be found at https://github.com/tairanhuang/CGAR.
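To make the redistribution idea concrete, the following is a minimal sketch of how a critic's Q values could be used to reweight actions sampled from the actor's policy. It is not the paper's implementation (which is in the linked repository); the `actor`/`critic` interfaces, the candidate count, and the softmax weighting are illustrative assumptions.

```python
# Illustrative sketch of critic-guided action redistribution.
# Assumptions (not from the paper's code): the actor maps a batch of states to a
# torch.distributions object over actions, and the critic maps (state, action)
# batches to Q-value estimates of shape (N, 1).
import torch

def cgar_select_action(actor, critic, state, num_candidates=10, temperature=1.0):
    """Sample candidate actions from the actor, then redistribute the final
    choice among them according to the critic's predicted Q values."""
    # Replicate the single state so each candidate action is scored on it.
    state_batch = state.unsqueeze(0).expand(num_candidates, -1)   # (N, state_dim)

    # Draw several candidate actions from the actor's policy distribution.
    dist = actor(state_batch)
    candidate_actions = dist.sample()                             # (N, action_dim)

    # Score each candidate with the critic.
    q_values = critic(state_batch, candidate_actions).squeeze(-1) # (N,)

    # Redistribute probability mass toward higher-Q candidates via a softmax,
    # then resample the action to execute.
    weights = torch.softmax(q_values / temperature, dim=0)
    idx = torch.multinomial(weights, num_samples=1)
    return candidate_actions[idx.item()]
```

A lower `temperature` makes the selection closer to greedily taking the highest-Q candidate, while a higher one keeps the choice closer to the actor's original distribution; the paper's actual weighting scheme may differ.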