论文标题
零拍学习的新颖观点:通过语义特征扩展建立歧管结构的对齐
A Novel Perspective to Zero-shot Learning: Towards an Alignment of Manifold Structures via Semantic Feature Expansion
论文作者
论文摘要
零射门学习旨在认识到看不见的课程(没有培训示例),并从可见的课程中转移了知识。这通常是通过利用可见和看不见类共享的语义特征空间(即属性或词向量)作为桥梁来实现的。零射门学习中的一种常见做法是在视觉和语义特征空间之间训练带有标记的类示例的投影。当推断时,该学习的投影将应用于看不见的类,并通过某些指标识别类标签。但是,视觉和语义特征空间是相互独立的,并且具有完全不同的歧管结构。在这样的范式下,大多数现有的方法很容易遭受域移位问题的困扰,并削弱了零照片识别的性能。为了解决这个问题,我们提出了一个名为AMS-SFE的新型模型。它通过语义特征扩展来考虑歧管结构的对齐。具体来说,我们基于基于自动编码器的模型,以扩展视觉输入中的语义功能。此外,扩展是由从数据的视觉特征空间中提取的嵌入式歧管共同指导的。我们的模型是通过扩展语义特征并获得两个好处来对齐两个特征空间的首次尝试:首先,我们扩展了一些辅助功能,以增强语义特征空间;其次,更重要的是,我们隐含地对齐视觉和语义特征空间之间的歧管结构。因此,可以更好地训练投影并减轻域移位问题。广泛的实验显示出显着的性能改善,这验证了我们模型的有效性。
Zero-shot learning aims at recognizing unseen classes (no training example) with knowledge transferred from seen classes. This is typically achieved by exploiting a semantic feature space shared by both seen and unseen classes, i.e., attribute or word vector, as the bridge. One common practice in zero-shot learning is to train a projection between the visual and semantic feature spaces with labeled seen classes examples. When inferring, this learned projection is applied to unseen classes and recognizes the class labels by some metrics. However, the visual and semantic feature spaces are mutually independent and have quite different manifold structures. Under such a paradigm, most existing methods easily suffer from the domain shift problem and weaken the performance of zero-shot recognition. To address this issue, we propose a novel model called AMS-SFE. It considers the alignment of manifold structures by semantic feature expansion. Specifically, we build upon an autoencoder-based model to expand the semantic features from the visual inputs. Additionally, the expansion is jointly guided by an embedded manifold extracted from the visual feature space of the data. Our model is the first attempt to align both feature spaces by expanding semantic features and derives two benefits: first, we expand some auxiliary features that enhance the semantic feature space; second and more importantly, we implicitly align the manifold structures between the visual and semantic feature spaces; thus, the projection can be better trained and mitigate the domain shift problem. Extensive experiments show significant performance improvement, which verifies the effectiveness of our model.