Paper Title

Neural Path Features and Neural Path Kernel : Understanding the role of gates in deep learning

Paper Authors

Chandrashekar Lakshminarayanan, Amit Vikram Singh

Paper Abstract

Rectified linear unit (ReLU) activations can also be thought of as 'gates', which either pass or stop their pre-activation input depending on whether they are 'on' (pre-activation input positive) or 'off' (pre-activation input negative). A deep neural network (DNN) with ReLU activations has many gates, and the on/off status of each gate changes across input examples as well as with the network weights. For a given input example, only a subset of the gates are 'active', i.e., on, and the sub-network of weights connected to these active gates is responsible for producing the output. At randomised initialisation, the active sub-network corresponding to a given input example is random. During training, as the weights are learnt, the active sub-networks are also learnt, and they potentially hold very valuable information. In this paper, we analytically characterise the role of active sub-networks in deep learning. To this end, we encode the on/off states of the gates for a given input in a novel 'neural path feature' (NPF) and the weights of the DNN in a novel 'neural path value' (NPV). Further, we show that the output of the network is the inner product of the NPF and the NPV. The main result of the paper shows that the 'neural path kernel' associated with the NPF is a fundamental quantity that characterises the information stored in the gates of a DNN. We show via experiments (on MNIST and CIFAR-10) that, in standard DNNs with ReLU activations, NPFs are learnt during training, and that such learning is key for generalisation. Furthermore, NPFs and NPVs can be learnt in two separate networks, and such learning also generalises well in experiments.
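As a concrete illustration of the NPF/NPV decomposition described in the abstract, the following minimal NumPy sketch (not from the paper; the network size, weights, and variable names are illustrative assumptions) checks that, for a small bias-free one-hidden-layer ReLU network, the output equals the inner product of the neural path features and the neural path values.

import numpy as np

# Minimal sketch, assuming a bias-free network with one hidden layer and a
# scalar output; sizes and weights below are illustrative, not from the paper.
rng = np.random.default_rng(0)
d_in, d_hidden = 3, 4
W1 = rng.normal(size=(d_hidden, d_in))   # input  -> hidden weights
w2 = rng.normal(size=(d_hidden,))        # hidden -> output weights
x = rng.normal(size=(d_in,))             # a single input example

# Standard forward pass with ReLU expressed as explicit gating.
pre = W1 @ x
gates = (pre > 0).astype(float)          # on/off state of each ReLU gate
output = w2 @ (gates * pre)

# Path-based view: each path goes input i -> hidden unit h -> output.
# NPF_p = x[i] * (gate activity along the path); NPV_p = W1[h, i] * w2[h].
npf = np.array([x[i] * gates[h] for h in range(d_hidden) for i in range(d_in)])
npv = np.array([W1[h, i] * w2[h] for h in range(d_hidden) for i in range(d_in)])

# Output of the network = <NPF, NPV>.
assert np.allclose(output, npf @ npv)
print(output, npf @ npv)

The same bookkeeping extends to deeper networks: a path picks one unit per layer, its NPF multiplies the input coordinate by the gate activity of every unit on the path, and its NPV multiplies the weights along the path.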
