Paper Title
Factors of Influence of the Overestimation Bias of Q-Learning
Paper Authors
Paper Abstract
We study whether the learning rate $α$, the discount factor $γ$, and the reward signal $r$ influence the overestimation bias of the Q-Learning algorithm. Our preliminary results, obtained in stochastic environments that require neural networks as function approximators, show that all three parameters influence overestimation significantly. By carefully tuning $α$ and $γ$, and by using an exponential moving average of $r$ in Q-Learning's temporal difference target, we show that the algorithm can learn value estimates that are more accurate than those of several other popular model-free methods that have previously addressed its overestimation bias.
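To make the idea of an exponential moving average of $r$ in the temporal difference target concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the average is maintained, so this sketch assumes one running average per state-action pair, uses a table instead of a neural network, and introduces a hypothetical smoothing coefficient `beta` and a hypothetical function name `ema_q_update`.

```python
import numpy as np


def ema_q_update(Q, r_ema, s, a, r, s_next, done,
                 alpha=0.1, gamma=0.99, beta=0.9):
    """One tabular Q-learning step whose TD target uses a smoothed reward.

    Assumptions (not taken from the paper): r_ema[s, a] holds an
    exponential moving average of the rewards observed at (s, a), and
    beta is a hypothetical smoothing coefficient.
    """
    # Update the running reward estimate for this state-action pair.
    r_ema[s, a] = beta * r_ema[s, a] + (1.0 - beta) * r

    # Usual max-bootstrapped term; zero when the episode has ended.
    bootstrap = 0.0 if done else gamma * np.max(Q[s_next])

    # TD target with the smoothed reward in place of the raw reward.
    target = r_ema[s, a] + bootstrap

    # Standard Q-learning step with learning rate alpha.
    Q[s, a] += alpha * (target - Q[s, a])
    return target


# Usage: two states, two actions, all estimates initialised to zero.
Q = np.zeros((2, 2))
r_ema = np.zeros((2, 2))
first_target = ema_q_update(Q, r_ema, s=0, a=1, r=1.0, s_next=1,
                            done=False, alpha=0.5, gamma=0.9, beta=0.5)
```

In this sketch a noisy reward moves the target only by a fraction `1 - beta` of its deviation, which is one plausible way smoothing could limit the noise that the max operator turns into overestimation. Note that initialising `r_ema` to zero biases early targets toward zero.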