Paper Title
De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
Paper Authors
Paper Abstract
In spite of several notable efforts, explaining the generalization of deterministic non-smooth deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such deep nets, but such bounds are quite large and may even increase with the training set size, yielding vacuous generalization bounds. In this paper, we present a new family of de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. Unlike PAC-Bayes, which applies to Bayesian predictors, the de-randomized bounds apply to deterministic predictors like ReLU-nets. A specific instantiation of the bound depends on a trade-off between the (weighted) distance of the trained weights from the initialization and the effective curvature (`flatness') of the trained predictor. To arrive at these bounds, we first develop a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs), which connects the performance of the deterministic predictor with that of a Bayesian predictor. We then consider non-smooth predictors which, for any given input, are realized as a smooth predictor, e.g., a ReLU-net acts as an LDN for any given input, but the realized smooth predictor can differ across inputs. For such non-smooth predictors, we introduce a new PAC-Bayes analysis that takes advantage of the smoothness of the realized predictor, e.g., an LDN, for a given input, and avoids any dependence on the Lipschitz constant of the non-smooth predictor. After careful de-randomization, we obtain a bound for the deterministic non-smooth predictor. We also establish non-uniform sample complexity results based on these bounds. Finally, we present extensive empirical results for our bounds as the training set size and the amount of label randomness vary.
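The idea that a non-smooth predictor is realized as a smooth one for each fixed input can be illustrated concretely. Below is a minimal NumPy sketch, not the paper's construction; the layer widths, random weights, and input are illustrative assumptions. It checks that, for a fixed input, a ReLU-net's output coincides with that of a linear deep network whose layers interleave the original weight matrices with input-dependent 0/1 diagonal gating matrices; a different input generally yields a different realized LDN.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative two-hidden-layer ReLU-net with assumed widths 8 -> 16 -> 16 -> 4.
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((16, 16))
W3 = rng.standard_normal((4, 16))
x = rng.standard_normal(8)

# Standard ReLU-net forward pass.
h1 = np.maximum(W1 @ x, 0.0)
h2 = np.maximum(W2 @ h1, 0.0)
relu_out = W3 @ h2

# Realized linear deep network for this particular x: replace each ReLU by the
# diagonal 0/1 matrix of its activation pattern at x.
D1 = np.diag((W1 @ x > 0).astype(float))
D2 = np.diag((W2 @ (D1 @ W1 @ x) > 0).astype(float))
ldn_out = W3 @ D2 @ W2 @ D1 @ W1 @ x

# Same output on this input; the matrices D1, D2 (and hence the LDN) depend on x.
assert np.allclose(relu_out, ldn_out)
```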