论文标题
马尔可夫决策过程的时间串联
Temporal Concatenation for Markov Decision Processes
论文作者
论文摘要
我们提出和分析了一种时间串联启发式,以求解大规模有限的马尔可夫决策过程(MDP),该过程将MDP沿时间范围划分为较小的子问题,并通过简单地从这些子问题与最佳的实用点相吻合来生成整体解决方案。作为“黑匣子”架构,时间串联可与各种现有的MDP算法一起使用。我们的主要结果是与最佳解决方案相比,时间串联的遗憾。我们为一般的MDP实例提供上限,以及一个显示上限的MDP实例家族。总之,我们的结果表明了时间串联的大幅加速潜力,而牺牲了某些性能降解。
We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDP), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a "black box" architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential of substantial speed-up at the expense of some performance degradation.