Title

On the convex formulations of robust Markov decision processes

Authors

Julien Grand-Clément, Marek Petrik

Abstract

Robust Markov decision processes (MDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases, RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and non-negative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
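As background for the abstract's claim that value iteration extends directly to RMDPs, the sketch below runs robust value iteration under the sa-rectangularity assumption. Here the per-(s, a) uncertainty set is taken, purely for illustration, to be a finite scenario-based set of candidate transition kernels (the paper treats general sa-rectangular sets, e.g. polyhedral, ellipsoidal, or entropy-based); nature adversarially picks the worst kernel at each state-action pair, and the agent maximizes over actions.

```python
import numpy as np

def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration for an sa-rectangular RMDP (illustrative sketch).

    rewards: (S, A) array of immediate rewards r(s, a).
    kernels: (K, S, A, S) array of K candidate transition kernels; the
             sa-rectangular uncertainty set is, per (s, a), the set of
             these K candidate distributions (a hypothetical scenario set).
    """
    v = np.zeros(rewards.shape[0])
    for _ in range(max_iter):
        # Expected next-state value under each candidate kernel: shape (K, S, A).
        ev = np.einsum('ksat,t->ksa', kernels, v)
        # Nature picks the worst kernel per (s, a); the agent maximizes over a.
        q = rewards + gamma * ev.min(axis=0)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

The robust Bellman operator applied here is a gamma-contraction, so the iteration converges to the unique robust value function; the paper's contribution is an alternative route, recasting the same fixed point as a single convex (conic) program rather than iterating.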
