Paper Title
Assessing and Accelerating Coverage in Deep Reinforcement Learning
Paper Authors
Paper Abstract
Current deep reinforcement learning (DRL) algorithms rely on randomness in simulation environments and assume that it yields complete coverage of the state space. However, particularly in high dimensions, relying on randomness may leave gaps in the coverage of the trained DRL neural network model, which in turn may lead to drastic and often fatal real-world situations. To the best of the authors' knowledge, an assessment of coverage for DRL is lacking in the current research literature. Therefore, in this paper, a novel measure, Approximate Pseudo-Coverage (APC), is proposed for assessing coverage in DRL applications. We propose to compute APC by projecting the high-dimensional state space onto a lower-dimensional manifold and quantifying the occupied space. Furthermore, we employ an exploration-exploitation strategy for coverage maximization using a Rapidly-exploring Random Tree (RRT). The efficacy of the assessment and the acceleration of coverage are demonstrated on standard tasks such as Cartpole and highway-env.
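As a rough illustration of the APC idea, the sketch below projects visited states onto a low-dimensional space and quantifies the occupied space as the fraction of occupied grid cells. The abstract does not specify the projection or the occupancy measure, so the PCA projection, the grid resolution, and the function name approximate_pseudo_coverage are illustrative assumptions, not the paper's exact method.

# Minimal sketch of an APC-style coverage measure. Assumptions: PCA is
# used as the projection onto a lower-dimensional manifold, and "occupied
# space" is quantified as the fraction of occupied cells in a uniform grid.
import numpy as np

def approximate_pseudo_coverage(states, n_components=2, grid_bins=50):
    """Project visited states to a lower dimension and return the
    fraction of grid cells they occupy."""
    states = np.asarray(states, dtype=float)
    # PCA via SVD on centered data (assumed projection technique).
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Quantify occupied space: count nonempty cells of a uniform grid.
    hist, _ = np.histogramdd(projected, bins=grid_bins)
    return np.count_nonzero(hist) / hist.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.normal(size=(1000, 8))  # toy high-dimensional states
    print(f"APC estimate: {approximate_pseudo_coverage(visited):.3f}")

An APC value near 1 would indicate that the projected visited states fill most of the discretized space, while a value near 0 would flag coverage gaps of the kind the abstract warns about.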
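Similarly, the following is a minimal sketch of RRT-based exploration in a generic continuous state space. The epsilon-based switch between uniform sampling (exploration) and goal-directed sampling (exploitation), and all parameter names, are assumptions for illustration rather than the paper's exact strategy.

# Minimal sketch of an RRT grown with an assumed exploration-exploitation
# schedule: with probability epsilon the tree extends toward a uniformly
# sampled target (explore), otherwise toward a fixed goal (exploit).
import numpy as np

def rrt_explore(start, bounds_lo, bounds_hi, n_iters=500, step=0.1,
                epsilon=0.8, goal=None, rng=None):
    """Grow a Rapidly-exploring Random Tree and return its nodes."""
    rng = rng or np.random.default_rng()
    nodes = [np.asarray(start, dtype=float)]
    for _ in range(n_iters):
        # Explore with probability epsilon, otherwise exploit the goal.
        if goal is None or rng.random() < epsilon:
            target = rng.uniform(bounds_lo, bounds_hi)
        else:
            target = np.asarray(goal, dtype=float)
        # Extend the nearest node a fixed step toward the target.
        nearest = min(nodes, key=lambda n: np.linalg.norm(n - target))
        direction = target - nearest
        norm = np.linalg.norm(direction)
        if norm > 0:
            nodes.append(nearest + step * direction / norm)
    return np.array(nodes)

if __name__ == "__main__":
    tree = rrt_explore(start=[0.0, 0.0], bounds_lo=[-1, -1],
                       bounds_hi=[1, 1], goal=[0.9, 0.9])
    print(f"Tree nodes: {len(tree)}")

The nodes returned by such a tree could be fed to a coverage measure like the APC sketch above to track how quickly the explored region grows.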