Paper Title
Hyperbolic Contrastive Learning for Visual Representations beyond Objects
Paper Authors
Paper Abstract
Although self-/un-supervised methods have led to rapid progress in visual representation learning, these methods generally treat objects and scenes using the same lens. In this paper, we focus on learning representations for objects and scenes that preserve the structure among them. Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure based on their compositionality. To exploit such a structure, we propose a contrastive learning framework where a Euclidean loss is used to learn object representations and a hyperbolic loss is used to encourage representations of scenes to lie close to representations of their constituent objects in a hyperbolic space. This novel hyperbolic objective encourages the scene-object hypernymy among the representations by optimizing the magnitude of their norms. We show that when pretraining on the COCO and OpenImages datasets, the hyperbolic loss improves downstream performance of several baselines across multiple datasets and tasks, including image classification, object detection, and semantic segmentation. We also show that the properties of the learned representations allow us to solve various vision tasks that involve the interaction between scenes and objects in a zero-shot fashion. Our code can be found at \url{https://github.com/shlokk/HCL/tree/main/HCL}.
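Below is a minimal sketch of how a hyperbolic scene-object contrastive objective of this kind could look, assuming a Poincaré-ball model with curvature -1 and an InfoNCE-style formulation where each scene embedding is pulled toward its matched object crop via negative geodesic distance. This is not the authors' implementation; the function names (`clip_to_ball`, `poincare_distance`, `hyperbolic_nce`) and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_to_ball(x, eps=1e-5):
    """Rescale embeddings so their norm stays strictly below 1,
    i.e. inside the Poincare ball (illustrative assumption)."""
    norm = x.norm(dim=-1, keepdim=True)
    scale = torch.clamp((1.0 - eps) / norm, max=1.0)
    return x * scale

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance on the Poincare ball with curvature -1."""
    sq_diff = (u - v).pow(2).sum(dim=-1)
    denom = (1 - u.pow(2).sum(dim=-1)) * (1 - v.pow(2).sum(dim=-1))
    arg = 1 + 2 * sq_diff / denom.clamp_min(eps)
    return torch.acosh(arg.clamp_min(1 + eps))

def hyperbolic_nce(scene_emb, object_emb, temperature=0.2):
    """InfoNCE-style loss: each scene is attracted to its own object crop
    (matched by batch index) and repelled from objects of other scenes,
    using negative hyperbolic distance as the similarity score."""
    s = clip_to_ball(scene_emb)   # (B, D) scene embeddings
    o = clip_to_ball(object_emb)  # (B, D) object embeddings
    # Pairwise scene-object distances across the batch.
    d = poincare_distance(s.unsqueeze(1), o.unsqueeze(0))  # (B, B)
    logits = -d / temperature
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

# Usage with random embeddings standing in for encoder outputs:
scene = torch.randn(8, 128) * 0.1
obj = torch.randn(8, 128) * 0.1
loss = hyperbolic_nce(scene, obj)
```

In a full pipeline, a loss like this would be added to a standard Euclidean contrastive loss over object crops, so that distance from the origin (the norm the abstract refers to) can encode the scene-object hierarchy while visual similarity is still captured in the Euclidean branch.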