Paper Title
ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network
Paper Authors
Paper Abstract
Food recognition has received increasing attention in the multimedia community because of its real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed, both for developing advanced large-scale food recognition algorithms and for providing a benchmark dataset for them. To encourage further progress in food recognition, we introduce ISIA Food-500, a dataset with 500 categories drawn from the list in Wikipedia and 399,726 images. It is a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume. Furthermore, we propose a stacked global-local attention network, which consists of two sub-networks for food recognition. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into a global-level representation (e.g., texture and shape information about food). The other generates attentional regions (e.g., ingredient-relevant regions) via cascaded spatial transformers, and further aggregates these multi-scale regional features from different layers into a local-level representation. The two types of features are finally fused into a comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of the proposed method, which can thus be considered a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
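The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names are hypothetical, the attention gates are parameter-free sigmoids rather than learned layers, a fixed crop stands in for a learned spatial transformer, and fusion is assumed to be plain concatenation. It only shows how a globally attended descriptor and a region-level descriptor might be combined into one feature vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    # fmap: list of C channels, each an H x W nested list.
    # Assumption: gate each channel by the sigmoid of its mean activation.
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out

def spatial_attention(fmap):
    # Assumption: gate each location by the sigmoid of its cross-channel mean.
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
              for j in range(w)] for i in range(h)]
    return [[[fmap[k][i][j] * gates[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]

def global_pool(fmap):
    # Average-pool each channel to a single number.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def crop(fmap, top, left, size):
    # Stand-in for a spatial transformer: a fixed square crop, not a learned one.
    return [[row[left:left + size] for row in ch[top:top + size]] for ch in fmap]

def fuse(global_feat, local_feat):
    # Assumption: fusion by concatenation of the two descriptors.
    return global_feat + local_feat

# Toy 2-channel, 2x2 feature map.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 0.0]]]

global_feat = global_pool(spatial_attention(channel_attention(fmap)))
local_feat = global_pool(crop(fmap, top=0, left=0, size=1))
fused = fuse(global_feat, local_feat)
```

Here `fused` holds one value per channel from each branch. In a real network both branches would operate on multi-scale features from several layers and every gate would be learned.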