Paper Title
Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion
Paper Authors
Paper Abstract
CVaR (Conditional Value at Risk) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult because the resulting problem is not a standard Markov decision process (MDP) problem and the principle of dynamic programming fails. In this paper, we study infinite-horizon discrete-time MDPs with a long-run CVaR criterion from the perspective of sensitivity-based optimization. By introducing a pseudo CVaR metric, we derive a CVaR difference formula that quantifies the difference in long-run CVaR between any two policies. We establish the optimality of deterministic policies. We obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies but only a necessary condition for globally optimal policies. We also derive a CVaR derivative formula that provides further sensitivity information. We then develop a policy-iteration-type algorithm to optimize CVaR efficiently, which is shown to converge to local optima in the mixed policy space. We further discuss extensions, including mean-CVaR optimization and CVaR maximization. Finally, we conduct numerical experiments on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity-based viewpoint.
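To make the criterion concrete, the sketch below evaluates a long-run CVaR for one fixed stationary policy. It rests on assumptions not stated in the abstract: that "long-run CVaR" means the CVaR at level alpha of the per-stage cost under the stationary distribution of the policy-induced Markov chain, and that the upper tail (losses) is the one of interest. It uses the standard Rockafellar-Uryasev representation CVaR_alpha(X) = min_y { y + E[(X - y)^+] / (1 - alpha) }. The helper `pseudo_cvar` is a hypothetical name: with y held fixed, that expression is simply the long-run average of the per-state cost y + (c(s) - y)^+ / (1 - alpha), which suggests why fixing y can bring average-cost MDP tools back into play. Whether this matches the paper's exact pseudo CVaR definition is an assumption.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an ergodic chain: solve pi P = pi with sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with a normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def pseudo_cvar(pi, cost, y, alpha):
    """Hypothetical helper: y + E_pi[(c - y)^+] / (1 - alpha) for a FIXED threshold y.

    For fixed y this is a long-run average of a per-state cost, i.e. an ordinary
    average-cost criterion.
    """
    return y + np.dot(pi, np.maximum(cost - y, 0.0)) / (1.0 - alpha)


def long_run_cvar(P, cost, alpha):
    """CVaR_alpha of the stationary cost distribution (assumed definition).

    The Rockafellar-Uryasev objective is convex and piecewise linear in y with
    breakpoints at the support points of the cost, so the minimizing y (the VaR)
    can be searched among the distinct cost values.
    """
    pi = stationary_distribution(P)
    candidates = np.unique(cost)
    values = [pseudo_cvar(pi, cost, y, alpha) for y in candidates]
    k = int(np.argmin(values))
    return values[k], candidates[k]


# Two-state chain induced by some fixed policy (illustrative numbers only).
P = np.array([[0.9, 0.1], [0.5, 0.5]])
cost = np.array([0.0, 10.0])
cvar, var = long_run_cvar(P, cost, alpha=0.5)
print(cvar, var)  # stationary pi = (5/6, 1/6), so CVaR_0.5 = 10/3 with VaR y* = 0
```

Note that the minimizing y depends on the whole stationary distribution, and hence on the policy. This coupling is one way to see why a plain Bellman recursion on the CVaR itself does not apply, and why comparing two policies calls for something like a difference formula rather than a stagewise argument.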