Paper Title
Forced-exploration free Strategies for Unimodal Bandits
Paper Authors
Paper Abstract
We consider a multi-armed bandit problem specified by a set of Gaussian or Bernoulli distributions endowed with a unimodal structure. Although this problem has been addressed in the literature (Combes and Proutiere, 2014), the state-of-the-art algorithms for this structure rely on a forced-exploration mechanism. We introduce IMED-UB, the first forced-exploration free strategy that exploits the unimodal structure, by adapting to this setting the Indexed Minimum Empirical Divergence (IMED) strategy introduced by Honda and Takemura (2015). This strategy is proven optimal. We then derive KLUCB-UB, a KLUCB version of IMED-UB, which is also proven optimal. Owing to our proof technique, we are further able to provide a concise finite-time analysis of both strategies in a unified way. Numerical experiments show that IMED-UB and KLUCB-UB perform similarly in practice and outperform the state-of-the-art algorithms.
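For intuition, the sketch below shows how an IMED-style index (Honda and Takemura, 2015) can be restricted to the neighborhood of the empirical best arm under a unimodal line-graph structure. This is a minimal illustration assuming Bernoulli arms, not the paper's exact IMED-UB algorithm; the function names and the convention for unsampled arms are ours.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence kl(p, q), clipped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_ub_choice(means, counts):
    """Pick the next arm to pull.

    Among the empirical best arm and its neighbors on the line graph,
    minimize the IMED index
        I_a = N_a * kl(mu_a, mu_best) + log(N_a),
    where mu_a is the empirical mean and N_a the pull count of arm a.
    Unsampled arms get index -inf, so they are pulled first.
    """
    best = max(range(len(means)), key=lambda a: means[a])
    neighbors = [a for a in (best - 1, best, best + 1) if 0 <= a < len(means)]

    def index(a):
        if counts[a] == 0:
            return float("-inf")
        return counts[a] * kl_bernoulli(means[a], means[best]) + math.log(counts[a])

    return min(neighbors, key=index)

# Example round: 5 Bernoulli arms with unimodal empirical means peaking at arm 2.
means = [0.2, 0.5, 0.8, 0.5, 0.2]
counts = [10, 10, 10, 10, 10]
print(imed_ub_choice(means, counts))  # picks an arm in {1, 2, 3}
```

Note the role of the leader's own index, log(N_best): whenever a neighbor's index drops below it, that neighbor is sampled, which is plausibly why no separate forced-exploration schedule is needed in such index policies.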