Paper Title

Probabilistic Framework of Howard's Policy Iteration: BML Evaluation and Robust Convergence Analysis

Authors

Yutian Wang, Yuan-Hua Ni, Zengqiang Chen, Ji-Feng Zhang

Abstract

This paper aims to build a probabilistic framework for Howard's policy iteration algorithm using the language of forward-backward stochastic differential equations (FBSDEs). Conventional formulations are based on partial differential equations. Our FBSDE-based formulation can instead be implemented by optimizing criteria over sample data, which makes it less sensitive to the state dimension. In particular, both on-policy and off-policy evaluation methods are discussed by constructing different FBSDEs. The backward-measurability-loss (BML) criterion is then proposed for solving these equations. By choosing specific weight functions in the proposed criterion, we can recover the popular Deep BSDE method or the martingale approach for BSDEs. Convergence results are established under both ideal and practical conditions, depending on whether the optimization criteria are decreased to zero. In the ideal case, we prove that the policy sequences produced by the proposed FBSDE-based algorithms and by standard policy iteration have the same performance, and thus the same convergence rate. In the practical case, the proposed algorithm is still proved to converge robustly under mild assumptions on the optimization errors.
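The abstract compares the proposed algorithms against standard policy iteration. As a reference point, the sketch below runs Howard's policy iteration on a hypothetical scalar, deterministic linear-quadratic problem:

- Dynamics: dx = (a x + b u) dt.
- Linear feedback: u = -k x.
- Cost: the integral of (q x^2 + r u^2) dt.

In this setting the algorithm is known as Kleinman's iteration. Here the PDE-based policy-evaluation step collapses to a scalar Lyapunov equation for V(x) = P x^2. This is the conventional, model-based version of the algorithm, not the paper's FBSDE/BML method. All function names and parameters are illustrative assumptions.

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation (hypothetical helper).

    For u = -k*x, the value V(x) = p*x**2 satisfies the linear equation
    V'(x)*(a - b*k)*x + (q + r*k**2)*x**2 = 0, i.e.
    2*p*(a - b*k) + q + r*k**2 = 0 (a scalar Lyapunov equation).
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("gain k must stabilize the system (a - b*k < 0)")
    return (q + r * k**2) / (-2.0 * closed_loop)


def improve_policy(b, r, p):
    """Policy improvement (hypothetical helper).

    Minimizing V'(x)*b*u + r*u**2 with V'(x) = 2*p*x gives
    u = -(b*p/r)*x, so the new feedback gain is b*p/r.
    """
    return b * p / r


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iter=50):
    """Alternate evaluation and improvement from a stabilizing gain k0.

    Returns the last evaluated cost coefficient p, the last gain, and
    the history of evaluated p values (one per iteration).
    """
    k = k0
    history = []
    for _ in range(max_iter):
        p = evaluate_policy(a, b, q, r, k)
        history.append(p)
        k_next = improve_policy(b, r, p)
        if abs(k_next - k) < tol:
            break
        k = k_next
    return p, k_next, history


if __name__ == "__main__":
    p, k, history = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    # Compare with the algebraic Riccati solution r*(a + sqrt(a^2 + b^2*q/r))/b^2.
    print(p, k, 1.0 + math.sqrt(2.0))
```

Starting from a stabilizing gain, the evaluated costs P decrease monotonically toward the Riccati solution P* = r(a + sqrt(a^2 + b^2 q / r)) / b^2. With a = b = q = r = 1, that is 1 + sqrt(2). The paper's approach replaces this evaluation step with FBSDE-based criteria optimized over sample data. That matters when the state dimension makes solving the evaluation PDE impractical.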