Paper Title

Thompson Sampling for Robust Transfer in Multi-Task Bandits

Paper Authors

Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

Paper Abstract

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments. In particular, we study how a learner can improve its overall performance across multiple related tasks through robust transfer of knowledge. While an upper confidence bound (UCB)-based algorithm has recently been shown to achieve nearly-optimal performance guarantees in a setting where all tasks are solved concurrently, it remains unclear whether Thompson sampling (TS) algorithms, which have superior empirical performance in general, share similar theoretical properties. In this work, we present a TS-type algorithm for a more general online multi-task learning protocol, which extends the concurrent setting. We provide its frequentist analysis and prove that it is also nearly-optimal using a novel concentration inequality for multi-task data aggregation at random stopping times. Finally, we evaluate the algorithm on synthetic data and show that the TS-type algorithm enjoys superior empirical performance in comparison with the UCB-based algorithm and a baseline algorithm that performs TS for each individual task without transfer.
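The paper's transfer algorithm is not reproduced here, but the no-transfer baseline mentioned above, Thompson sampling run independently on each task, is standard. Below is a minimal sketch of that baseline for Bernoulli bandits with Beta(1, 1) priors; the function name and the toy task parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def thompson_sampling_per_task(task_means, horizon, rng=None):
    """Run independent Beta-Bernoulli Thompson sampling on each task.

    Illustrative no-transfer baseline only: each task is a Bernoulli
    bandit solved in isolation, with no knowledge shared across tasks.
    """
    rng = np.random.default_rng() if rng is None else rng
    regrets = []
    for means in task_means:                 # one bandit instance per task
        means = np.asarray(means, dtype=float)
        n_arms = means.shape[0]
        alpha = np.ones(n_arms)              # Beta(1, 1) prior successes
        beta = np.ones(n_arms)               # Beta(1, 1) prior failures
        best = means.max()
        regret = 0.0
        for _ in range(horizon):
            theta = rng.beta(alpha, beta)    # one posterior sample per arm
            arm = int(np.argmax(theta))      # play the arm with the largest sample
            reward = float(rng.random() < means[arm])  # Bernoulli reward
            alpha[arm] += reward             # posterior update for the played arm
            beta[arm] += 1.0 - reward
            regret += best - means[arm]      # expected (pseudo-)regret
        regrets.append(regret)
    return regrets

if __name__ == "__main__":
    # Two similar but not identical 3-armed Bernoulli tasks (toy values).
    tasks = [[0.5, 0.6, 0.7], [0.5, 0.65, 0.7]]
    print(thompson_sampling_per_task(tasks, horizon=2000))
```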
