Paper Title
Q-Learning with Differential Entropy of Q-Tables
Paper Authors
Paper Abstract
It is well known that information loss can occur in the classic, simple Q-learning algorithm. Entropy-based policy search methods were introduced to replace Q-learning and to design algorithms that are more robust against information loss. We conjecture that the performance degradation observed during prolonged Q-learning training is caused by such information loss, which remains invisible when only the cumulative reward is examined and the Q-learning algorithm itself is left unchanged. We introduce the Differential Entropy of Q-Tables (DE-QT) as an external information-loss detector for the Q-learning algorithm. The behaviour of DE-QT over training episodes is analyzed to derive an appropriate stopping criterion during training. The results reveal that DE-QT can detect the most appropriate stopping point, at which the classic Q-learning algorithm strikes a balance between a high success rate and high efficiency.
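The abstract does not specify how the differential entropy of a Q-table is estimated, so the following is a minimal sketch under stated assumptions: the Q-table entries are treated as samples from a continuous distribution and their differential entropy is approximated with a Gaussian-assumption estimator, and the toy chain environment, hyperparameters, and the `de_qt` helper are hypothetical illustrations of tracking DE-QT over training episodes, not the paper's actual setup.

```python
import numpy as np

def de_qt(q_table, eps=1e-12):
    """Differential entropy of the Q-table entries under a Gaussian assumption
    (hypothetical estimator; the paper's exact method may differ):
    h = 0.5 * ln(2 * pi * e * var)."""
    values = np.asarray(q_table, dtype=float).ravel()
    var = np.var(values) + eps  # avoid log(0) when the table is still constant
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

# Minimal tabular Q-learning loop on a toy 1-D chain environment
# (hypothetical setup, used only to illustrate recording DE-QT per episode).
n_states, n_actions = 8, 2
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
entropy_trace = []

for episode in range(500):
    s = 0
    for _ in range(50):
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        # chain dynamics: action 1 moves right, action 0 moves left; reward at the far end
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # standard Q-learning update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r == 1.0:
            break
    entropy_trace.append(de_qt(Q))

# A stopping criterion could monitor entropy_trace, e.g. stop training once the
# episode-to-episode change in DE-QT stays below a chosen threshold.
```

As a usage note, the `entropy_trace` list plays the role of the external detector: it is computed outside the Q-learning update itself, so the algorithm is unchanged while its Q-table is monitored for the entropy behaviour described in the abstract.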