Paper Title
Minibatch vs Local SGD for Heterogeneous Distributed Learning
Paper Authors
Paper Abstract
We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective; the goal is to optimize w.r.t. the average objective; and machines can only communicate intermittently. We argue that (i) Minibatch SGD (even without acceleration) dominates all existing analyses of Local SGD in this setting, (ii) accelerated Minibatch SGD is optimal when the heterogeneity is high, and (iii) we present the first upper bound for Local SGD that improves over Minibatch SGD in a non-homogeneous regime.
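To make the comparison concrete, here is a minimal NumPy sketch (not from the paper; all problem sizes, step sizes, and helper names such as `stoch_grad` and `suboptimality` are illustrative assumptions) of the two algorithms on a toy heterogeneous problem: each machine m holds a quadratic objective f_m(x) = ½‖x − b_m‖², and the goal is to minimize their average using R communication rounds with K stochastic gradients per machine per round.

```python
import numpy as np

rng = np.random.default_rng(0)

# Heterogeneous setting: machine m's objective is f_m(x) = 0.5 * ||x - b_m||^2,
# so the average objective F(x) = (1/M) * sum_m f_m(x) is minimized at mean(b_m).
M, d = 8, 5                      # number of machines, dimension
K, R = 10, 50                    # local steps per round, communication rounds
sigma = 0.5                      # stochastic-gradient noise level
b = 3.0 * rng.normal(size=(M, d))  # machine-specific optima (the heterogeneity)
x_star = b.mean(axis=0)            # minimizer of the average objective

def stoch_grad(m, x):
    """Unbiased stochastic gradient of f_m at x."""
    return (x - b[m]) + sigma * rng.normal(size=d)

def minibatch_sgd(lr=0.1):
    # One step per round on the shared iterate, averaging all M*K gradients.
    x = np.zeros(d)
    for _ in range(R):
        g = np.mean([stoch_grad(m, x) for m in range(M) for _ in range(K)], axis=0)
        x -= lr * g
    return x

def local_sgd(lr=0.1):
    # Each machine runs K local SGD steps per round; iterates are then averaged.
    x = np.zeros(d)
    for _ in range(R):
        local_iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(K):
                xm -= lr * stoch_grad(m, xm)
            local_iterates.append(xm)
        x = np.mean(local_iterates, axis=0)
    return x

def suboptimality(x):
    """F(x) - F(x_star) for the average objective F."""
    F = lambda z: 0.5 * np.mean(np.sum((z - b) ** 2, axis=1))
    return F(x) - F(x_star)

print("Minibatch SGD suboptimality:", suboptimality(minibatch_sgd()))
print("Local SGD suboptimality:   ", suboptimality(local_sgd()))
```

With machine-specific optima b_m, the local steps of Local SGD drift toward each machine's own minimizer between communications, which is the source of the heterogeneity-dependent terms the paper studies; Minibatch SGD avoids this drift by always querying gradients at the shared iterate.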