Teaching CNNs to mimic Human Visual Cognitive Process & regularise Texture-Shape bias

Paper title

Teaching CNNs to mimic Human Visual Cognitive Process & regularise Texture-Shape bias

Authors

Satyam Mohla, Anshul Nasery, Biplab Banerjee

Abstract

Recent experiments in computer vision identify texture bias as the primary reason for the strong results of models employing Convolutional Neural Networks (CNNs), contradicting earlier work claiming that these networks identify objects using shape. It is believed that the cost function forces the CNN to take a greedy approach and develop a proclivity for local information such as texture to increase accuracy, thus failing to explore global statistics. We propose CognitiveCNN, a new intuitive architecture inspired by feature integration theory in psychology, which utilises human-interpretable features such as shape, texture, and edges to reconstruct and classify the image. We define novel metrics that use attention maps to quantify the "relevance" of the "abstract information" present in these modalities. We further introduce a regularisation method which ensures that each modality (shape, texture, etc.) has proportionate influence in a given task, as it does for reconstruction. We perform experiments that show the resulting boost in accuracy and robustness, while also imparting explainability to these CNNs for superior performance in object recognition.
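
The abstract does not give the form of the regulariser or of the relevance metrics, so the following is only an illustrative sketch of the general idea of "proportionate influence", not the authors' method. It assumes each modality already has a non-negative relevance score per task (for example, some summary of an attention map), and it penalises the squared difference between each modality's share in classification and its share in reconstruction. The function names, the squared-difference penalty, and the example scores are all hypothetical.

```python
def normalise(scores):
    """Turn non-negative per-modality scores into proportions that sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have positive total mass")
    return {name: value / total for name, value in scores.items()}


def proportion_penalty(classification_scores, reconstruction_scores):
    """Hypothetical regulariser (an assumption, not taken from the paper).

    Returns the sum of squared differences between each modality's share of
    influence in classification and its share in reconstruction. The value is
    zero when the two tasks weight the modalities in the same proportions.
    """
    p = normalise(classification_scores)
    q = normalise(reconstruction_scores)
    if p.keys() != q.keys():
        raise ValueError("both tasks must score the same modalities")
    return sum((p[name] - q[name]) ** 2 for name in p)


# Made-up relevance scores, for illustration only.
classification = {"shape": 1.0, "texture": 8.0, "edges": 1.0}
reconstruction = {"shape": 4.0, "texture": 4.0, "edges": 2.0}

penalty = proportion_penalty(classification, reconstruction)
balanced = proportion_penalty(reconstruction, reconstruction)
```

In this toy case the classifier leans heavily on texture, so `penalty` is positive, while `balanced` is zero because the two sets of proportions match. In a training loop a term like this would typically be added to the task loss with a weighting coefficient, which is again an assumption about usage rather than something the abstract states.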

Paper title