Visual Aware Hierarchy Based Food Recognition

Paper Title

Visual Aware Hierarchy Based Food Recognition

Authors

Mao, Runyu; He, Jiangpeng; Shao, Zeman; Yarlagadda, Sri Kalyan; Zhu, Fengqing

Abstract

Food recognition is one of the most important components of image-based dietary assessment. However, due to the varying complexity of food images and the inter-class similarity of food categories, it is challenging for an image-based food recognition system to achieve high accuracy across a variety of publicly available datasets. In this work, we propose a new two-step food recognition system that consists of food localization and hierarchical food classification, using Convolutional Neural Networks (CNNs) as the backbone architecture. The food localization step is based on an implementation of the Faster R-CNN method to identify food regions. In the food classification step, visually similar food categories are clustered together automatically to generate a hierarchical structure that represents the semantic visual relations among food categories; a multi-task CNN model is then proposed to perform the classification task based on this visual aware hierarchical structure. Since the size and quality of a dataset are key components of data-driven methods, we introduce a new food image dataset, the VIPER-FoodNet (VFN) dataset, which consists of 82 food categories with 15k images based on the most commonly consumed foods in the United States. A semi-automatic crowdsourcing tool is used to provide the ground-truth information for this dataset, including food object bounding boxes and food object labels. Experimental results demonstrate that our system can significantly improve both classification and recognition performance on 4 publicly available datasets and the new VFN dataset.
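The abstract does not specify how category similarity is measured, how the clusters are formed, or how the tasks are weighted. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: it assumes inter-class similarity comes from a trained classifier's confusion matrix, groups categories with average-linkage agglomerative clustering (a swapped-in choice), builds only a two-level hierarchy, and trains on a weighted sum of fine-level and coarse-level cross-entropy. The function names, `threshold`, and `weight` are hypothetical.

```python
# Hypothetical sketch (not the authors' code): a two-level, visually aware
# category hierarchy built from a confusion matrix, plus a multi-task loss.
import math
from typing import List, Sequence

Matrix = List[List[float]]


def category_similarity(confusion: Sequence[Sequence[int]]) -> Matrix:
    """Assumed similarity: symmetric, row-normalised confusion rate between classes."""
    n = len(confusion)
    totals = [max(sum(row), 1) for row in confusion]  # avoid division by zero
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = 0.5 * (confusion[i][j] / totals[i] + confusion[j][i] / totals[j])
    return sim


def cluster_categories(sim: Matrix, threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering (a swapped-in technique).

    Repeatedly merges the two most similar clusters until no pair has an
    average similarity of at least `threshold`; returns a coarse label per class.
    """
    clusters: List[List[int]] = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                links = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(links) / len(links)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])  # b > a, so deleting b keeps index a valid
        del clusters[b]
    coarse = [0] * len(sim)
    for label, members in enumerate(clusters):
        for cls in members:
            coarse[cls] = label
    return coarse


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(fine_logits: Sequence[float], coarse_logits: Sequence[float],
                   fine_target: int, coarse_of: Sequence[int], weight: float = 0.5) -> float:
    """Fine-level loss plus a weighted coarse-level loss; the coarse target is
    looked up from the fine label via the hierarchy (the weight is an assumption)."""
    return (cross_entropy(fine_logits, fine_target)
            + weight * cross_entropy(coarse_logits, coarse_of[fine_target]))


if __name__ == "__main__":
    # Toy confusion matrix: classes 0/1 and 2/3 are often confused with each other.
    confusion = [[80, 15, 3, 2], [20, 75, 2, 3], [1, 2, 85, 12], [2, 1, 10, 87]]
    coarse = cluster_categories(category_similarity(confusion), threshold=0.05)
    print(coarse)  # → [0, 0, 1, 1]
    print(multitask_loss([2.0, 0.5, 0.1, 0.1], [1.5, 0.2], 0, coarse))
```

Raising `threshold` allows fewer merges and therefore yields more, smaller coarse groups. In a real system the confusion matrix would come from a classifier evaluated on held-out images, and the two sets of logits would be produced by separate heads sharing one CNN backbone.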
