Title
Emergent Properties of Foveated Perceptual Systems
Authors
Abstract
The goal of this work is to characterize the representational impact that foveation operations have for machine vision systems, inspired by the foveated human visual system, which has higher acuity at the center of gaze and texture-like encoding in the periphery. To do so, we introduce models consisting of a first-stage \textit{fixed} image transform followed by a second-stage \textit{learnable} convolutional neural network, and we vary the first-stage component. The primary model has a foveated-textural input stage, which we compare to a model with foveated-blurred input and a model with spatially-uniform blurred input (both matched for perceptual compression), and a final reference model with minimal input-based compression. We find that: 1) the foveated-texture model shows similar scene classification accuracy as the reference model despite its compressed input, with greater i.i.d. generalization than the other models; 2) the foveated-texture model has greater sensitivity to high-spatial-frequency information and greater robustness to occlusion, w.r.t. the comparison models; 3) both foveated systems show a stronger center image-bias relative to the spatially-uniform systems, even with a weight-sharing constraint. Critically, these results are preserved over different classical CNN architectures throughout their learning dynamics. Altogether, this suggests that foveation with peripheral texture-based computations yields an efficient, distinct, and robust representational format of scene information, and provides symbiotic computational insight into the representational consequences that texture-based peripheral encoding may have for processing in the human visual system, while also potentially inspiring the next generation of computer vision models via spatially-adaptive computation. Code + Data available here: https://github.com/ArturoDeza/EmergentProperties
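The two-stage design described in the abstract (a fixed first-stage image transform feeding a learnable second-stage network) can be sketched minimally as below. This is an illustrative assumption, not the authors' implementation: the class names are hypothetical, a simple box blur stands in for the paper's foveated-texture, foveated-blur, and uniform-blur transforms, and a single linear map stands in for the learnable CNN.

```python
import numpy as np

def uniform_blur(img, k=3):
    """Fixed first-stage transform: a spatially-uniform box blur, used here
    as a simple stand-in for the paper's perceptual-compression transforms."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

class TwoStageModel:
    """Hypothetical sketch: a fixed (non-trained) input transform followed by
    a learnable stage, stubbed as one linear classifier over pixels."""
    def __init__(self, transform, n_classes=2, img_shape=(8, 8), seed=0):
        self.transform = transform  # stage 1: fixed, never updated
        rng = np.random.default_rng(seed)
        # stage 2: the only learnable parameters in this sketch
        self.W = rng.normal(size=(n_classes, img_shape[0] * img_shape[1]))

    def forward(self, img):
        x = self.transform(img)      # fixed input-based compression
        return self.W @ x.ravel()    # learnable read-out to class logits

model = TwoStageModel(uniform_blur)
logits = model.forward(np.ones((8, 8)))
print(logits.shape)  # (2,)
```

Swapping `uniform_blur` for a foveated transform (blur or texture synthesis increasing with eccentricity) while keeping stage 2 fixed is the comparison the paper performs across its model variants.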