Paper title
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Paper authors
Abstract
This paper focuses on improving the mathematical interpretability of convolutional neural networks (CNNs) in the context of image classification. Specifically, we tackle the instability issue arising in their first layer, which tends to learn parameters that closely resemble oriented band-pass filters when trained on datasets like ImageNet. Subsampled convolutions with such Gabor-like filters are prone to aliasing, causing sensitivity to small input shifts. In this context, we establish conditions under which the max pooling operator approximates a complex modulus, which is nearly shift invariant. We then derive a measure of shift invariance for subsampled convolutions followed by max pooling. In particular, we highlight the crucial role played by the filter's frequency and orientation in achieving stability. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree complex wavelet packet transform, a particular case of discrete Gabor-like decomposition.
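The comparison described in the abstract can be illustrated with a small 1D toy experiment. This is only a sketch under stated assumptions, not the paper's method: the paper works with 2D images and a dual-tree complex wavelet packet transform, whereas here the filter is a hand-built complex Gabor-like filter, and all names and parameter values (`gabor_filter`, `feature_maps`, `omega`, `sigma`, `stride`, `pool`) are hypothetical choices made for illustration. The script shifts a random signal by one sample and reports how much two feature maps change: the real part of a subsampled convolution followed by max pooling, and the complex modulus of the same subsampled convolution.

```python
import numpy as np

def gabor_filter(size=31, omega=np.pi / 4, sigma=5.0):
    """Complex 1D Gabor-like filter: Gaussian window times a complex exponential.
    size, omega and sigma are illustrative values, not taken from the paper."""
    n = np.arange(size) - size // 2
    return np.exp(-0.5 * (n / sigma) ** 2) * np.exp(1j * omega * n)

def subsampled_response(x, w, stride=2):
    """Complex convolution followed by subsampling with the given stride."""
    return np.convolve(x, w, mode="same")[::stride]

def max_pool_1d(v, pool):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    usable = (len(v) // pool) * pool
    return v[:usable].reshape(-1, pool).max(axis=1)

def relative_change(a, b):
    """Relative Euclidean distance between two feature vectors."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

def feature_maps(x, w, stride=2, pool=4):
    """Return (max-pooled real part, subsampled complex modulus) on the same output grid."""
    y = subsampled_response(x, w, stride)
    real_maxpool = max_pool_1d(y.real, pool)
    modulus = np.abs(y)[::pool][: len(real_maxpool)]
    return real_maxpool, modulus

rng = np.random.default_rng(0)
x = rng.standard_normal(512)   # toy input signal
x_shifted = np.roll(x, 1)      # same signal, circularly shifted by one sample
w = gabor_filter()

rmax, mod = feature_maps(x, w)
rmax_s, mod_s = feature_maps(x_shifted, w)

# Smaller values mean the feature map is more stable under the shift.
print("real part + max pooling:", relative_change(rmax, rmax_s))
print("complex modulus:        ", relative_change(mod, mod_s))
```

Varying `omega` while keeping `stride` and `pool` fixed gives a rough feel for the role the abstract attributes to the filter's frequency. The toy setup has no notion of orientation, so that part of the claim is not covered here.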