Paper Title

Strong Lottery Ticket Hypothesis with $\varepsilon$--perturbation

Paper Authors

Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis

Paper Abstract

The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need for training. We extend the theoretical guarantees of the strong LTH literature to a scenario more similar to the original LTH, by generalizing the weight change in the pre-training step to some perturbation around initialization. In particular, we focus on the following open questions: By allowing an $\varepsilon$-scale perturbation on the random initial weights, can we reduce the over-parameterization requirement for the candidate network in the strong LTH? Furthermore, does the weight change produced by SGD coincide with a good set of such perturbations? We answer the first question by extending the theoretical result on subset sum to allow perturbation of the candidates. Applying this result to the neural network setting, we show that such an $\varepsilon$-perturbation reduces the over-parameterization requirement of the strong LTH. To answer the second question, we show via experiments that the perturbed weights obtained by projected SGD perform better under strong-LTH pruning.
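
To make the subset-sum ingredient concrete, below is a minimal numerical sketch, not the paper's construction or experimental setup: a brute-force search over subsets of random candidates, where each selected candidate may additionally be moved by at most a per-candidate budget `delta` (standing in for the $\varepsilon$-perturbation). The function `best_subset_error` and the parameters `delta`, `eps`, and `n` are illustrative assumptions introduced here for the example.

```python
import itertools
import numpy as np

def best_subset_error(target, candidates, delta=0.0):
    """Brute-force the best subset-sum approximation error for `target`.

    Each chosen candidate may additionally be perturbed by at most `delta`
    in absolute value, so a subset S of size r can realize any value in
    [sum(S) - r*delta, sum(S) + r*delta].
    """
    best = abs(target)  # the empty subset approximates target by 0
    n = len(candidates)
    for r in range(1, n + 1):
        for subset in itertools.combinations(candidates, r):
            gap = abs(target - sum(subset))
            # the perturbation budget can close up to r * delta of the gap
            best = min(best, max(0.0, gap - r * delta))
    return best

rng = np.random.default_rng(0)
n, trials, eps = 8, 200, 0.01  # illustrative sizes, not values from the paper
for delta in [0.0, 0.01, 0.05]:
    errors = []
    for _ in range(trials):
        target = rng.uniform(-0.5, 0.5)
        candidates = rng.uniform(-1.0, 1.0, size=n)
        errors.append(best_subset_error(target, candidates, delta))
    hit_rate = np.mean([e <= eps for e in errors])
    print(f"delta={delta:.2f}: fraction of targets approximated within {eps}: {hit_rate:.2f}")
```

In this toy model, a small per-candidate budget lets the same number of random candidates cover targets to a tighter tolerance (equivalently, fewer candidates suffice for a fixed tolerance), which mirrors the reduced over-parameterization requirement described in the abstract.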
