Paper Title

Model Compression Using Optimal Transport

Paper Authors

Suhas Lohit, Michael Jones

Paper Abstract

Model compression methods are important to allow for easier deployment of deep learning models in compute-, memory-, and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used for training a student network; they encourage learning student network parameters that help bring the distribution of student features closer to that of the teacher features. We present image classification results on CIFAR-100, SVHN, and ImageNet, and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
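
To make the idea concrete, below is a minimal sketch (not the authors' released code) of how an entropic-regularized optimal transport (Sinkhorn) loss could compare a batch of student features against the corresponding teacher features. The cosine cost, the `eps` regularization value, the iteration count, and the assumption that student features have already been projected to the teacher's feature dimension are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def sinkhorn_ot_loss(student_feats, teacher_feats, eps=0.1, n_iters=50):
    """Approximate OT cost between a batch of student and teacher features.

    student_feats: (B, D) tensor (assumed already projected to the teacher's dim D).
    teacher_feats: (B, D) tensor of teacher features (treated as fixed targets).
    """
    # Cosine cost between every student/teacher feature pair.
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    cost = 1.0 - s @ t.t()                      # (B, B) cost matrix, values in [0, 2]

    b = cost.size(0)
    mu = torch.full((b,), 1.0 / b, dtype=cost.dtype, device=cost.device)  # uniform source marginal
    nu = torch.full((b,), 1.0 / b, dtype=cost.dtype, device=cost.device)  # uniform target marginal

    # Sinkhorn fixed-point iterations on the Gibbs kernel.
    # (A log-domain implementation is more numerically stable for small eps.)
    K = torch.exp(-cost / eps)
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.t() @ u)
        u = mu / (K @ v)

    plan = u.unsqueeze(1) * K * v.unsqueeze(0)  # approximate transport plan
    return (plan * cost).sum()                  # entropic OT cost used as the loss term

# Example: 128 student and teacher features of dimension 256
# (a learned linear projection would be needed if the dimensions differ).
s_feat = torch.randn(128, 256, requires_grad=True)
t_feat = torch.randn(128, 256)
loss = sinkhorn_ot_loss(s_feat, t_feat)
loss.backward()
```

In a full distillation setup, such an OT term would typically be added to the usual cross-entropy (and possibly a standard distillation) loss with a weighting coefficient, so the student is trained on labels and feature-distribution matching jointly.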
