Paper Title
Q-Learning with Differential Entropy of Q-Tables
Paper Authors
Paper Abstract
It is well known that information loss can occur in the classic, simple Q-learning algorithm. Entropy-based policy search methods were introduced to replace Q-learning and to design algorithms that are more robust against information loss. We conjecture that the performance degradation observed during prolonged Q-learning training is caused by such information loss, which remains invisible when only the cumulative reward is examined and the Q-learning algorithm itself is left unchanged. We introduce the Differential Entropy of Q-Tables (DE-QT) as an external information-loss detector for the Q-learning algorithm. The behaviour of DE-QT over training episodes is analyzed to derive an appropriate stopping criterion during training. The results reveal that DE-QT can detect the most appropriate stopping point, at which the classic Q-learning algorithm strikes a balance between a high success rate and high efficiency.
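The abstract does not specify how the differential entropy of a Q-table is estimated, so the following is a minimal sketch under stated assumptions: the Q-table entries are treated as samples from a continuous distribution and their differential entropy is approximated with a Gaussian-assumption estimator, and the toy chain environment, hyperparameters, and the `de_qt` helper are hypothetical illustrations of tracking DE-QT over training episodes, not the paper's actual setup.

```python
import numpy as np

def de_qt(q_table, eps=1e-12):
    """Differential entropy of the Q-table entries under a Gaussian assumption
    (hypothetical estimator; the paper's exact method may differ):
    h = 0.5 * ln(2 * pi * e * var)."""
    values = np.asarray(q_table, dtype=float).ravel()
    var = np.var(values) + eps  # avoid log(0) when the table is still constant
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

# Minimal tabular Q-learning loop on a toy 1-D chain environment
# (hypothetical setup, used only to illustrate recording DE-QT per episode).
n_states, n_actions = 8, 2
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
entropy_trace = []

for episode in range(500):
    s = 0
    for _ in range(50):
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        # chain dynamics: action 1 moves right, action 0 moves left; reward at the far end
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # standard Q-learning update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
        if r == 1.0:
            break
    entropy_trace.append(de_qt(Q))

# A stopping criterion could monitor entropy_trace, e.g. stop training once the
# episode-to-episode change in DE-QT stays below a chosen threshold.
```

As a usage note, the `entropy_trace` list plays the role of the external detector: it is computed outside the Q-learning update itself, so the algorithm is unchanged while its Q-table is monitored for the entropy behaviour described in the abstract.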