Paper Title
Optimizing Variational Representations of Divergences and Accelerating their Statistical Estimation
Paper Authors
Abstract
Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) they can be estimated from data as statistical averages; 2) such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and thereby accelerating statistical learning and estimation from data, is currently lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, computing the functional Hessian of these objective functionals reveals the local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between the different variational representations. Finally, numerical simulations utilizing neural network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
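As background for the abstract's claims (an illustration of the general setting, not the paper's specific construction), two classical variational representations of the Kullback-Leibler divergence are the Donsker-Varadhan (DV) and Nguyen-Wainwright-Jordan (NWJ) formulas,

\mathrm{KL}(P \,\|\, Q) \;=\; \sup_{g}\,\Big\{ \mathbb{E}_P[g] - \log \mathbb{E}_Q[e^{g}] \Big\} \quad \text{(DV)}, \qquad
\mathrm{KL}(P \,\|\, Q) \;=\; \sup_{g}\,\Big\{ \mathbb{E}_P[g] - \mathbb{E}_Q[e^{g-1}] \Big\} \quad \text{(NWJ)}.

Both attain the divergence at the same optimizer g* = log(dP/dQ) (up to an additive constant), but since log x <= x/e for every x > 0, the DV objective dominates the NWJ objective at every fixed g, making DV the tighter of the two; tightness of exactly this kind is what the abstract proposes to systematically improve and order. Because the expectations are statistical averages, either objective can be maximized over a neural network family from samples alone. A minimal PyTorch sketch along these lines follows; the Critic architecture, sample sizes, and learning rate are illustrative assumptions, not the paper's setup.

# Hypothetical sketch (not the paper's code): estimating KL(P || Q) by
# maximizing the Donsker-Varadhan objective over a small neural network.
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Neural test function g: R^d -> R."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def dv_objective(g, x_p, x_q):
    # E_P[g] - log E_Q[e^g], each expectation replaced by a sample average;
    # logsumexp keeps the empirical log E_Q[e^g] numerically stable.
    return g(x_p).mean() - (torch.logsumexp(g(x_q), dim=0) - math.log(len(x_q)))

# Toy example: P = N(mu * 1, I), Q = N(0, I) in d dimensions,
# so the true value is KL(P || Q) = d * mu^2 / 2.
dim, mu = 2, 1.0
critic = Critic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for step in range(2000):
    x_p = torch.randn(512, dim) + mu
    x_q = torch.randn(512, dim)
    loss = -dv_objective(critic, x_p, x_q)  # ascend the DV lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    est = dv_objective(critic, torch.randn(10000, dim) + mu,
                       torch.randn(10000, dim))
print(f"DV estimate: {est.item():.3f} (true KL = {dim * mu**2 / 2:.3f})")

Swapping dv_objective for the looser NWJ form (replace the log-mean-exp term with the sample average of exp(g - 1)) leaves the optimizer unchanged but, per the tightness ordering above, typically converges more slowly, which is the kind of effect the paper's experiments quantify.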