Paper Title
Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms
Paper Authors
Paper Abstract
In this research, some of the issues that arise from scalarizing the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm are investigated. The paper shows how a naive scalarization can lead to overlapping gradients. Furthermore, the possibility that the entropy regularization term can be a source of uncontrolled noise is discussed. To address these issues, a technique that avoids gradient overlap while keeping the same loss formulation is proposed. Moreover, a method to avoid the uncontrolled noise, by sampling actions from distributions with a desired minimum entropy, is investigated. Pilot experiments have been carried out to show how the proposed method speeds up training. The proposed approach can be applied to any Advantage-based Reinforcement Learning algorithm.
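For context, below is a minimal sketch of the kind of naive scalarized A2C loss the abstract refers to, written in PyTorch. The function name, the coefficients value_coef and entropy_coef, and the single weighted-sum formulation are illustrative assumptions about the standard A2C setup, not the paper's own method; the point is only to show the three objectives being collapsed into one scalar, so that gradients of all terms flow jointly through any shared parameters.

```python
# Minimal sketch of a naive scalarized A2C loss (assumed standard setup,
# not the paper's proposed technique).
import torch


def a2c_scalarized_loss(log_probs, values, returns, entropies,
                        value_coef=0.5, entropy_coef=0.01):
    """Combine the three A2C objectives into a single scalar loss.

    log_probs: log pi(a_t | s_t) of the sampled actions, shape (T,)
    values:    critic estimates V(s_t), shape (T,)
    returns:   bootstrapped returns R_t, shape (T,)
    entropies: per-step policy entropies H(pi(.|s_t)), shape (T,)
    """
    advantages = returns - values
    # Policy-gradient term: the advantage is treated as a constant here.
    policy_loss = -(log_probs * advantages.detach()).mean()
    # Critic regression term.
    value_loss = advantages.pow(2).mean()
    # Entropy bonus (subtracted, so higher entropy lowers the loss).
    entropy_bonus = entropies.mean()
    # Naive scalarization: one weighted sum of all three objectives.
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```

With this formulation, a single backward pass propagates the sum of all three gradients through any parameters shared between actor and critic, which is the kind of gradient overlap the abstract argues against.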