Paper Title
The Curious Case of Convex Neural Networks
Paper Authors
Paper Abstract
In this paper, we investigate a constrained formulation of neural networks in which the output is a convex function of the input. We show that the convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures. The convexity constraints consist of restricting the weights of all but the first layer to be non-negative and using a non-decreasing convex activation function. Although simple, these constraints have profound implications for the generalization abilities of the network. We draw three valuable insights: (a) Input Output Convex Neural Networks (IOC-NNs) self-regularize and reduce the problem of overfitting; (b) although heavily constrained, they outperform the base multi-layer perceptrons and achieve performance comparable to the base convolutional architectures; and (c) IOC-NNs show robustness to noise in training labels. We demonstrate the efficacy of the proposed ideas through thorough experiments and ablation studies on standard image classification datasets with three different neural network architectures.
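The convexity recipe stated in the abstract is concrete enough to sketch in code. Below is a minimal PyTorch illustration of an input-output convex MLP under those constraints; the class name IOCMLP and the projection-after-step mechanism for enforcing non-negativity are illustrative assumptions, not the paper's stated implementation.

```python
import torch
import torch.nn as nn

class IOCMLP(nn.Module):
    """Minimal sketch of an input-output convex MLP.

    Convexity of the output in the input follows from:
    (a) leaving the first layer's weights unconstrained,
    (b) constraining every subsequent layer's weights to be non-negative, and
    (c) using a non-decreasing convex activation (here, ReLU).
    """

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.first = nn.Linear(in_dim, hidden_dim)       # unconstrained weights
        self.hidden = nn.Linear(hidden_dim, hidden_dim)  # weights kept >= 0
        self.out = nn.Linear(hidden_dim, out_dim)        # weights kept >= 0
        self.act = nn.ReLU()  # non-decreasing convex activation

    def clamp_weights(self) -> None:
        # Hypothetical enforcement: project the non-first-layer weights onto
        # the non-negative orthant after each optimizer step (biases stay free).
        with torch.no_grad():
            self.hidden.weight.clamp_(min=0.0)
            self.out.weight.clamp_(min=0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.first(x))
        h = self.act(self.hidden(h))
        return self.out(h)
```

In a training loop, one would call model.clamp_weights() after each optimizer.step() so that the weights of all but the first layer remain non-negative throughout training; reparametrizing the weights (e.g., through an exponential or softplus map) is an alternative way to enforce the same constraint.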