A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

Paper Title

A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games

Authors

Wang, Zifan, Shen, Yi, Bell, Zachary I., Nivison, Scott, Zavlanos, Michael M., Johansson, Karl H.

Abstract

We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the regret bounds of the best-known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
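A central step described above is estimating a CVaR value from sampled cost values. As a hedged illustration only (not the authors' estimator, which additionally reuses historical cost information with momentum), the sketch below computes the empirical CVaR of a batch of observed costs via the Rockafellar-Uryasev formula. The function name `empirical_cvar`, the convention that `alpha` in (0, 1] is the tail fraction (so `alpha = 1` gives the plain mean), and the Gaussian-noise demo data are all assumptions made for this example.

```python
import math
import random
from typing import Sequence


def empirical_cvar(costs: Sequence[float], alpha: float) -> float:
    """Empirical CVaR_alpha of a cost sample (higher cost = worse).

    Assumed convention (Rockafellar-Uryasev form):
        CVaR_alpha(J) = min_t { t + E[(J - t)_+] / alpha },
    where alpha in (0, 1] is the tail fraction; alpha = 1 recovers the mean.
    The minimizing t is the value at risk (VaR), i.e. the (1 - alpha)-quantile.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not costs:
        raise ValueError("need at least one cost sample")
    xs = sorted(costs)
    n = len(xs)
    # Empirical VaR: smallest sorted sample x_(i) with i / n >= 1 - alpha.
    k = max(math.ceil((1.0 - alpha) * n) - 1, 0)
    var = xs[k]
    # Average excess of the costs above VaR, scaled by 1 / alpha.
    excess = sum(max(x - var, 0.0) for x in xs) / n
    return var + excess / alpha


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical bandit feedback: noisy costs observed for one fixed action.
    samples = [2.0 + rng.gauss(0.0, 1.0) for _ in range(1000)]
    print("mean cost :", sum(samples) / len(samples))
    print("CVaR_0.1  :", empirical_cvar(samples, 0.1))
```

For a risk-averse agent, the CVaR with a small `alpha` sits well above the mean cost because it averages only the worst `alpha` fraction of outcomes; minimizing it rather than the mean is what makes the learning risk-averse.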