Paper Title

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Paper Authors

Weilin Cong, Rana Forsati, Mahmut Kandemir, Mehrdad Mahdavi

Paper Abstract

Sampling methods (e.g., node-wise, layer-wise, or subgraph sampling) have become an indispensable strategy for speeding up the training of large-scale Graph Neural Networks (GNNs). However, existing sampling methods are mostly based on graph structural information and ignore the dynamics of optimization, which leads to high variance in estimating the stochastic gradients. The high-variance issue can be very pronounced in extremely large graphs, where it results in slow convergence and poor generalization. In this paper, we theoretically analyze the variance of sampling methods and show that, due to the composite structure of the empirical risk, the variance of any sampling method can be decomposed into \textit{embedding approximation variance} in the forward stage and \textit{stochastic gradient variance} in the backward stage, which necessitates mitigating both types of variance to obtain a faster convergence rate. We propose a decoupled variance reduction strategy that employs (approximate) gradient information to adaptively sample nodes with minimal variance, and explicitly reduces the variance introduced by embedding approximation. We show theoretically and empirically that the proposed method, even with smaller mini-batch sizes, enjoys a faster convergence rate and entails better generalization compared to the existing methods.
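To make the two variance sources in the abstract concrete, the following sketch uses generic notation that is not taken from the paper. Suppose the empirical risk is the composite $\mathcal{L}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\ell\big(h_i(\theta)\big)$, where $h_i$ is the exact embedding of node $i$ and $\tilde h_i$ its sampled approximation. A mini-batch gradient estimator then deviates from the full gradient through exactly two terms, one from the forward-stage embedding approximation and one from backward-stage node sampling:

$$\frac{1}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\nabla\ell\big(\tilde h_i(\theta)\big)-\nabla\mathcal{L}(\theta)
=\underbrace{\frac{1}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\Big(\nabla\ell\big(\tilde h_i(\theta)\big)-\nabla\ell\big(h_i(\theta)\big)\Big)}_{\text{embedding approximation (forward)}}
+\underbrace{\frac{1}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\nabla\ell\big(h_i(\theta)\big)-\nabla\mathcal{L}(\theta)}_{\text{stochastic gradient (backward)}}.$$

For the backward-stage term, the classical minimal-variance importance-sampling result states that, among all distributions $p$ that keep the reweighted estimator $\nabla\ell(h_i)/(N p_i)$ unbiased, variance is minimized by sampling nodes in proportion to their gradient norms, $p_i^{*}\propto\|\nabla\ell(h_i(\theta))\|$. The NumPy sketch below illustrates only this generic idea on toy per-node gradients; it is an assumption-labeled sketch, not the authors' implementation, which additionally handles embedding approximation and uses approximate gradient information.

```python
import numpy as np

def minimal_variance_probs(grad_norms, eps=1e-12):
    """Sampling probabilities proportional to per-node gradient norms,
    the classical variance-minimizing choice for importance sampling."""
    norms = np.asarray(grad_norms, dtype=float) + eps
    return norms / norms.sum()

def importance_sampled_gradient(per_node_grads, batch_size, seed=None):
    """Unbiased estimate of the full mean gradient via importance sampling.

    `per_node_grads` is a toy stand-in for the per-node gradients
    grad ell(h_i(theta)); in a real GNN these are not all available,
    so practical methods rely on stale or approximate gradient norms.
    """
    rng = np.random.default_rng(seed)
    grads = np.asarray(per_node_grads, dtype=float)   # shape (N, d)
    n = grads.shape[0]
    probs = minimal_variance_probs(np.linalg.norm(grads, axis=1))
    idx = rng.choice(n, size=batch_size, replace=True, p=probs)
    weights = 1.0 / (n * probs[idx])                  # reweighting keeps the estimator unbiased
    return (weights[:, None] * grads[idx]).mean(axis=0)

# Toy check: the estimator tracks the full mean gradient.
grads = np.random.default_rng(0).normal(size=(1000, 16))
full = grads.mean(axis=0)
est = importance_sampled_gradient(grads, batch_size=64, seed=1)
print(np.linalg.norm(est - full))
```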
