论文标题

Wasserstein分布的核心核心优化问题

Coresets for Wasserstein Distributionally Robust Optimization Problems

论文作者

Huang, Ruomin, Huang, Jiawei, Liu, Wenjie, Ding, Hu

论文摘要

Wasserstein分布在强大的优化方面(\ TextSF {WDRO})是一个流行的模型,可通过模棱两可的数据增强机器学习的鲁棒性。但是,\ textsf {wdro}的复杂性在实践中可能是令人望而却步的,因为求解其``minimax''公式需要大量的计算。最近,已经开发了一些快速的\ textsf {WDRO}培训算法,用于某些特定的机器学习任务(例如,逻辑回归)。但是,据我们所知,关于一般大规模\ textsf {wdro}的一般大规模\ textsf {wdro}的有效算法的研究仍然非常有限。 \ textit {coreset}是压缩大数据集的重要工具,因此已广泛应用它来减少许多优化问题的计算复杂性。在本文中,我们引入了一个统一的框架,以构建一般\ textsf {wdro}问题的$ε$ -coreset。尽管由于数据的不确定性问题而获得\ textsf {wdro}的常规核心是具有挑战性的,但我们表明我们可以通过使用\ textsf {wdro}的强双重性能来计算``dual dual coreset''。同样,对于原始\ textsf {wdro}目标,可以保证双核引入的错误。为了构建双核,我们提出了一种新型的网格采样方法,该方法特别适合\ textsf {wdro}的双重公式。最后,我们实施了核心方法,并说明了在实验中的几个\ textsf {wdro}问题的有效性。

Wasserstein distributionally robust optimization (\textsf{WDRO}) is a popular model to enhance the robustness of machine learning with ambiguous data. However, the complexity of \textsf{WDRO} can be prohibitive in practice since solving its ``minimax'' formulation requires a great amount of computation. Recently, several fast \textsf{WDRO} training algorithms for some specific machine learning tasks (e.g., logistic regression) have been developed. However, the research on designing efficient algorithms for general large-scale \textsf{WDRO}s is still quite limited, to the best of our knowledge. \textit{Coreset} is an important tool for compressing large dataset, and thus it has been widely applied to reduce the computational complexities for many optimization problems. In this paper, we introduce a unified framework to construct the $ε$-coreset for the general \textsf{WDRO} problems. Though it is challenging to obtain a conventional coreset for \textsf{WDRO} due to the uncertainty issue of ambiguous data, we show that we can compute a ``dual coreset'' by using the strong duality property of \textsf{WDRO}. Also, the error introduced by the dual coreset can be theoretically guaranteed for the original \textsf{WDRO} objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suitable for the dual formulation of \textsf{WDRO}. Finally, we implement our coreset approach and illustrate its effectiveness for several \textsf{WDRO} problems in the experiments.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源