Paper Title
Constrained Combinatorial Optimization with Reinforcement Learning
Paper Authors
Paper Abstract
This paper presents a framework to tackle constrained combinatorial optimization problems using deep Reinforcement Learning (RL). To this end, we extend the Neural Combinatorial Optimization (NCO) theory to deal with constraints in its formulation. Notably, we propose defining constrained combinatorial problems as fully observable Constrained Markov Decision Processes (CMDPs). In that context, the solution is iteratively constructed through interactions with the environment. In addition to the reward signal, the model relies on penalty signals generated from constraint dissatisfaction to infer a policy that acts as a heuristic algorithm. Moreover, having access to the complete state representation during the optimization process allows us to rely on memory-less architectures, improving on the results obtained by previous sequence-to-sequence approaches. Experiments conducted on the constrained Job Shop and Resource Allocation problems demonstrate the superiority of the proposal for computing rapid solutions compared to classical heuristic, metaheuristic, and Constraint Programming (CP) solvers.
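To make the core idea concrete, the following is a minimal Python sketch, not the authors' implementation: it casts a toy resource-allocation task as a fully observable CMDP whose step function emits both a reward (the objective) and a penalty signal for constraint dissatisfaction, so that a policy can be evaluated or trained against the penalized return. The environment, the penalty weight `lam`, and the random rollout policy are illustrative assumptions.

```python
# Minimal sketch of the penalized-objective idea under illustrative assumptions:
# a toy resource-allocation task cast as a fully observable CMDP, where each
# step returns a reward for the objective and a separate penalty signal for
# constraint dissatisfaction.

import random


class ToyAllocationCMDP:
    """Assign each task to one of the machines; capacity limits are the constraints."""

    def __init__(self, demands, capacities):
        self.demands = demands        # resource demand of each task
        self.capacities = capacities  # capacity of each machine
        self.reset()

    def reset(self):
        self.t = 0                              # index of the next task to place
        self.load = [0] * len(self.capacities)  # current load per machine
        return self.state()

    def state(self):
        # Fully observable state: next task index plus all machine loads.
        return (self.t, tuple(self.load))

    def step(self, action):
        """Place task self.t on machine `action`; return (state, reward, penalty, done)."""
        old_makespan = max(self.load)
        self.load[action] += self.demands[self.t]
        # Objective: minimize makespan. Reward is the negative makespan increase,
        # so the summed rewards equal the negative final makespan.
        reward = -(max(self.load) - old_makespan)
        # Penalty signal: positive only when a capacity constraint is violated.
        penalty = float(max(0, self.load[action] - self.capacities[action]))
        self.t += 1
        return self.state(), reward, penalty, self.t == len(self.demands)


def rollout(env, policy):
    """Construct one solution step by step, accumulating reward and penalty."""
    state, done = env.reset(), False
    total_reward = total_penalty = 0.0
    while not done:
        state, r, p, done = env.step(policy(state))
        total_reward += r
        total_penalty += p
    return total_reward, total_penalty


if __name__ == "__main__":
    env = ToyAllocationCMDP(demands=[3, 5, 2, 4], capacities=[6, 8])
    random_policy = lambda s: random.randrange(2)  # placeholder for a learned policy
    lam = 10.0  # penalty weight (a hyperparameter in this sketch)
    reward, penalty = rollout(env, random_policy)
    print("penalized return:", reward - lam * penalty)
```

In a full system, a trained policy would replace `random_policy`, for example one optimized with a policy-gradient method on the penalized return, so that constraint dissatisfaction is discouraged alongside optimizing the reward, mirroring how the penalty signal steers the learned heuristic away from infeasible solutions.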