Paper Title
Bayesian sparse learning with preconditioned stochastic gradient MCMC and its applications
Paper Authors
Paper Abstract
In this work, we propose a Bayesian sparse deep learning algorithm. The algorithm places a set of spike-and-slab priors on the parameters of the deep neural network. The hierarchical Bayesian mixture is trained with an adaptive empirical method: one alternately samples from the posterior using preconditioned stochastic gradient Langevin dynamics (PSGLD) and optimizes the latent variables via stochastic approximation. Sparsity of the network is achieved by adaptively searching for and penalizing the hyperparameters during optimization. A popular SG-MCMC approach is stochastic gradient Langevin dynamics (SGLD); however, given the complex geometry of the model parameter space in non-convex learning, updating every component of the parameters with a universal step size, as in SGLD, may cause slow mixing. To address this issue, we apply a computationally manageable preconditioner in the update rule, which provides a step-size parameter adapted to local geometric properties. Moreover, by smoothly optimizing the hyperparameter in the preconditioning matrix, the proposed algorithm ensures a decreasing bias, which is introduced by ignoring the correction term in preconditioned SGLD. Within the existing theoretical framework, we show that the proposed algorithm converges asymptotically to the correct distribution with a controllable bias under mild conditions. Numerical tests on both synthetic regression problems and learning the solutions of elliptic PDEs demonstrate the accuracy and efficiency of the present work.
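To make the prior concrete, here is a minimal sketch of one standard Gaussian spike-and-slab formulation; the choice of Gaussian spike and slab densities and the Bernoulli inclusion rate lambda are illustrative assumptions, not necessarily the exact specification used in the paper:

\pi(w_i \mid \gamma_i) = \gamma_i \, \mathcal{N}(w_i; 0, \sigma_1^2) + (1 - \gamma_i) \, \mathcal{N}(w_i; 0, \sigma_0^2), \qquad \gamma_i \sim \mathrm{Bernoulli}(\lambda), \quad \sigma_0^2 \ll \sigma_1^2,

where the latent inclusion variables gamma_i are the quantities optimized via stochastic approximation, and weights assigned to the narrow spike component are effectively pruned, yielding a sparse network.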
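As a concrete illustration of the preconditioned update, below is a minimal NumPy sketch of a single pSGLD step with a diagonal RMSProp-style preconditioner in the spirit of Li et al. (2016); the function and argument names (psgld_step, grad_log_post) are hypothetical, and the correction term Gamma(theta) is omitted, which is precisely the source of the bias discussed in the abstract:

import numpy as np

def psgld_step(theta, grad_log_post, v, step_size,
               alpha=0.99, lam=1e-5, rng=None):
    """One pSGLD step with a diagonal RMSProp-style preconditioner.

    theta         -- current network parameters (flat array)
    grad_log_post -- minibatch estimate of the gradient of the log-posterior
    v             -- running second-moment estimate of the gradient
    Note: the correction term Gamma(theta) is dropped here, which
    introduces the bias the abstract refers to.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Update the element-wise second-moment estimate of the gradient.
    v = alpha * v + (1.0 - alpha) * grad_log_post**2
    # Diagonal preconditioner: coordinates with large gradient magnitude
    # receive smaller effective step sizes, adapting to local geometry.
    precond = 1.0 / (lam + np.sqrt(v))
    # Preconditioned Langevin drift plus preconditioned Gaussian noise.
    noise = rng.standard_normal(theta.shape) * np.sqrt(step_size * precond)
    theta = theta + 0.5 * step_size * precond * grad_log_post + noise
    return theta, v

The running second-moment estimate v gives each coordinate its own effective step size, which is what mitigates the slow mixing caused by a universal step size in plain SGLD.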