Paper Title
A No-Free-Lunch Theorem for MultiTask Learning
Paper Authors
Paper Abstract
Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution ${\cal D}$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses result in bounds that improve at best in the number $n$ of samples per task, but most often do not improve in $N$. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses are somehow unfavorable; however, as we show, the picture happens to be more nuanced, with interestingly hard regimes that might otherwise appear favorable. In particular, we consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of $N$ and $n$. Some of our main results are as follows:

$\bullet$ We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large $N$ for fixed $n$.

$\bullet$ With a bit of additional information, namely a ranking of the tasks $\{P_t\}$ according to their distance to the target ${\cal D}$, a simple rank-based procedure can achieve near-optimal aggregations of the tasks' datasets, despite a search space exponential in $N$. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same $h^*$.
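To make the rank-based idea in the second bullet concrete, below is a minimal illustrative sketch, not the procedure analyzed in the paper. It assumes the ranking of tasks by distance to the target is given, uses a hypothetical synthetic task generator (`make_task`) in which all tasks share the same optimal classifier but differ in their marginals, a small labeled holdout from the target, and an off-the-shelf logistic-regression learner. Restricting the search to prefixes of the ranking means only $N$ candidate aggregations are evaluated rather than the $2^N$ possible subsets, and a far-away task can end up excluded even though it shares $h^*$.

```python
# Minimal, hypothetical sketch of a rank-based aggregation (NOT the paper's
# exact procedure): given source tasks ranked by closeness to the target D,
# train on each prefix of the ranking and keep the prefix that performs best
# on a small target holdout -- N candidates instead of 2^N subsets.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_task(shift, n=50):
    # Hypothetical task: every task shares the same optimal classifier
    # (label = 1 iff x[0] > 0), but the marginal over X drifts with `shift`.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, y

# N source tasks, assumed already ranked: closest to the target first.
shifts = [0.0, 0.2, 0.5, 1.0, 2.0, 4.0]
tasks = [make_task(s) for s in shifts]
X_hold, y_hold = make_task(0.0, n=30)  # small holdout drawn from the target D

best_k, best_score = 0, -np.inf
for k in range(1, len(tasks) + 1):
    # Aggregate the k closest tasks and fit a simple classifier.
    X = np.vstack([t[0] for t in tasks[:k]])
    y = np.concatenate([t[1] for t in tasks[:k]])
    clf = LogisticRegression().fit(X, y)
    score = clf.score(X_hold, y_hold)  # accuracy on the target holdout
    if score > best_score:
        best_k, best_score = k, score

print(f"selected the {best_k} closest tasks; holdout accuracy {best_score:.2f}")
```

The prefix selected by the holdout may stop short of using all $N$ tasks, which mirrors the abstract's observation that the optimal aggregation can exclude some tasks even when every task shares the same optimal classifier.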