Paper Title
A No-Free-Lunch Theorem for MultiTask Learning
Paper Authors
Paper Abstract
Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution ${\cal D}$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses result in bounds that improve at best in the number $n$ of samples per task, but most often do not improve in $N$. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses are somehow unfavorable; however, as we show, the picture happens to be more nuanced, with interestingly hard regimes that might otherwise appear favorable. In particular, we consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of $N$ and $n$. Some of our main results are as follows:

$\bullet$ We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large $N$ for fixed $n$.

$\bullet$ With a bit of additional information, namely a ranking of the tasks $\{P_t\}$ according to their distance to the target ${\cal D}$, a simple rank-based procedure can achieve near-optimal aggregations of the tasks' datasets, despite a search space exponential in $N$. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same $h^*$.
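To make the rank-based idea in the second bullet concrete, below is a minimal illustrative sketch, not the procedure analyzed in the paper. It assumes the ranking of tasks by distance to the target is given, uses a hypothetical synthetic task generator (`make_task`) in which all tasks share the same optimal classifier but differ in their marginals, a small labeled holdout from the target, and an off-the-shelf logistic-regression learner. Restricting the search to prefixes of the ranking means only $N$ candidate aggregations are evaluated rather than the $2^N$ possible subsets, and a far-away task can end up excluded even though it shares $h^*$.

```python
# Minimal, hypothetical sketch of a rank-based aggregation (NOT the paper's
# exact procedure): given source tasks ranked by closeness to the target D,
# train on each prefix of the ranking and keep the prefix that performs best
# on a small target holdout -- N candidates instead of 2^N subsets.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_task(shift, n=50):
    # Hypothetical task: every task shares the same optimal classifier
    # (label = 1 iff x[0] > 0), but the marginal over X drifts with `shift`.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, y

# N source tasks, assumed already ranked: closest to the target first.
shifts = [0.0, 0.2, 0.5, 1.0, 2.0, 4.0]
tasks = [make_task(s) for s in shifts]
X_hold, y_hold = make_task(0.0, n=30)  # small holdout drawn from the target D

best_k, best_score = 0, -np.inf
for k in range(1, len(tasks) + 1):
    # Aggregate the k closest tasks and fit a simple classifier.
    X = np.vstack([t[0] for t in tasks[:k]])
    y = np.concatenate([t[1] for t in tasks[:k]])
    clf = LogisticRegression().fit(X, y)
    score = clf.score(X_hold, y_hold)  # accuracy on the target holdout
    if score > best_score:
        best_k, best_score = k, score

print(f"selected the {best_k} closest tasks; holdout accuracy {best_score:.2f}")
```

The prefix selected by the holdout may stop short of using all $N$ tasks, which mirrors the abstract's observation that the optimal aggregation can exclude some tasks even when every task shares the same optimal classifier.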