Paper Title

Learning with Differentiable Algorithms

Paper Author

Petersen, Felix

Paper Abstract

Classic algorithms and machine learning systems like neural networks are both abundant in everyday life. While classic computer science algorithms are suitable for precise execution of exactly defined tasks such as finding the shortest path in a large graph, neural networks allow learning from data to predict the most likely answer in more complex tasks such as image classification, which cannot be reduced to an exact algorithm. To get the best of both worlds, this thesis explores combining both concepts, leading to more robust, better-performing, more interpretable, more computationally efficient, and more data-efficient architectures. The thesis formalizes the idea of algorithmic supervision, which allows a neural network to learn from or in conjunction with an algorithm. When integrating an algorithm into a neural architecture, it is important that the algorithm is differentiable such that the architecture can be trained end-to-end and gradients can be propagated back through the algorithm in a meaningful way. To make algorithms differentiable, this thesis proposes a general method for continuously relaxing algorithms by perturbing variables and approximating the expectation value in closed form, i.e., without sampling. In addition, this thesis proposes differentiable algorithms, such as differentiable sorting networks, differentiable renderers, and differentiable logic gate networks. Finally, this thesis presents alternative training strategies for learning with algorithms.
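To make the relaxation idea above concrete, here is a minimal illustrative sketch, not code from the thesis: perturbing the input of a hard comparison with logistic noise has a closed-form expectation, the sigmoid, and chaining the resulting soft conditional swaps yields a tiny differentiable sorting network. The helper names (relaxed_step, soft_swap, odd_even_sort) and the odd-even network layout are assumptions chosen for illustration.

```python
import torch

def relaxed_step(x, beta=1.0):
    # Hard step 1[x > 0], relaxed by perturbing x with logistic noise:
    # for eps ~ Logistic(0, beta), E[1[x + eps > 0]] = sigmoid(x / beta),
    # a closed-form expectation, i.e., no sampling is needed.
    return torch.sigmoid(x / beta)

def soft_swap(a, b, beta=1.0):
    # Hypothetical helper: a differentiable conditional swap. The relaxed
    # comparison blends the inputs into a soft (min, max); as beta -> 0
    # this recovers the exact compare-and-swap of a hard sorting network.
    s = relaxed_step(b - a, beta)  # ~1 if already ordered (a <= b)
    soft_min = s * a + (1 - s) * b
    soft_max = s * b + (1 - s) * a
    return soft_min, soft_max

def odd_even_sort(x, beta=1.0):
    # An odd-even transposition sorting network with soft swaps: n layers
    # of compare-and-swap on alternating neighbor pairs. Every operation
    # is smooth, so gradients flow through the whole sort.
    x = list(x.unbind(-1))
    n = len(x)
    for layer in range(n):
        for i in range(layer % 2, n - 1, 2):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], beta)
    return torch.stack(x, dim=-1)

scores = torch.tensor([0.3, -1.2, 2.0, 0.7], requires_grad=True)
sorted_soft = odd_even_sort(scores, beta=0.1)
sorted_soft.sum().backward()  # gradients propagate back through the sort
```

As beta approaches 0 the soft swaps approach exact compare-and-swap operations and the classic sorting network is recovered; for beta > 0 the sort is differentiable, so sorting or ranking supervision can train the upstream network that produced the scores end-to-end.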
