连续时间分配团体在线分配

论文标题

连续时间分配团体在线分配

Group-Fair Online Allocation in Continuous Time

论文作者

Cayci, Semih, Gupta, Swati, Eryilmaz, Atilla

论文摘要

离散时间学习的理论已成功应用于许多问题，这些问题涉及不确定性下的顺序决策。但是，在许多应用程序中，包括在线自由职业平台中的合同招聘和云计算系统中的服务器分配，只有在随机和动作依赖于动作的时间之后才能观察到每个动作的结果。此外，由于某些道德和经济问题，控制者可能会对完成每个任务的完成施加截止日期，并且需要在分配总时间预算$ B $的不同群体中公平。为了解决这些应用程序，我们考虑使用公平考虑的连续时间在线学习问题，并基于连续的时间实用性最大化提出了一个新颖的框架。我们表明，这种配方恢复了奖励最大化的，最大的公平公平和按比例公平的分配规则，包括特殊情况。我们表征了最佳离线策略，该策略以最佳的公平方式（由实用程序函数定义）分配了不同动作之间的总时间，并施加了截止日期以最大程度地提高时间效率。在没有任何统计知识的情况下，我们提出了一种基于时间平均的双重上升优化的新颖的在线学习算法，并证明它实现了$ \ tilde {o}（b^{ - 1/2}）$遗憾。

The theory of discrete-time online learning has been successfully applied in many problems that involve sequential decision-making under uncertainty. However, in many applications including contractual hiring in online freelancing platforms and server allocation in cloud computing systems, the outcome of each action is observed only after a random and action-dependent time. Furthermore, as a consequence of certain ethical and economic concerns, the controller may impose deadlines on the completion of each task, and require fairness across different groups in the allocation of total time budget $B$. In order to address these applications, we consider continuous-time online learning problem with fairness considerations, and present a novel framework based on continuous-time utility maximization. We show that this formulation recovers reward-maximizing, max-min fair and proportionally fair allocation rules across different groups as special cases. We characterize the optimal offline policy, which allocates the total time between different actions in an optimally fair way (as defined by the utility function), and impose deadlines to maximize time-efficiency. In the absence of any statistical knowledge, we propose a novel online learning algorithm based on dual ascent optimization for time averages, and prove that it achieves $\tilde{O}(B^{-1/2})$ regret bound.

下载PDF全文

下载文献需遵守相关版权规定

论文标题