ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Cited by: 56
Authors
Min, Weiqing [1,2]
Liu, Linhu [1,2]
Wang, Zhiling [1,2]
Luo, Zhengdong [1,2]
Wei, Xiaoming [3]
Wei, Xiaolin [3]
Jiang, Shuqiang [1,2]
Affiliations
[1] Chinese Academy of Sciences, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Beijing 100190, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
[3] Meituan Dianping Group, Hong Kong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Food Recognition; Food Datasets; Benchmark; Deep Learning
DOI
10.1145/3394171.3414031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Food recognition has received increasing attention in the multimedia community because of its many real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed, both for developing advanced large-scale food recognition algorithms and as a benchmark dataset for evaluating them. To encourage further progress in food recognition, we introduce ISIA Food-500, a dataset with 500 categories drawn from a list in Wikipedia and 399,726 images, which surpasses existing popular benchmark datasets in both category coverage and data volume. Furthermore, we propose a stacked global-local attention network consisting of two sub-networks for food recognition. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features and then aggregates these multi-scale discriminative features from multiple layers into a global-level representation (e.g., texture and shape information about the food). The other generates attentional regions (e.g., ingredient-relevant regions) from different regions via cascaded spatial transformers and further aggregates these multi-scale regional features from different layers into a local-level representation. The two types of features are finally fused into a comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and two other popular benchmark datasets demonstrate the effectiveness of the proposed method, which can thus serve as a strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html.
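The abstract describes the two-branch architecture only at a high level. The following is a minimal NumPy sketch of the data flow, not the authors' implementation (their released code is at the URL in the abstract). It rests on several assumptions that are not stated in this record: channel attention is squeeze-and-excitation style and is followed by a parameter-free spatial attention map (a CBAM-like ordering); each layer is summarized by global average pooling; the cascaded spatial transformers use fixed, hypothetical 2x3 affine matrices instead of learned localization networks, with nearest-neighbour rather than bilinear sampling; and the global and local representations are fused by plain concatenation. All function names, shapes, and weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style channel reweighting (assumption); feat is (C, H, W)."""
    squeezed = feat.mean(axis=(1, 2))          # (C,) global average pool per channel
    hidden = np.maximum(0.0, w1 @ squeezed)    # (C // r,) ReLU bottleneck
    weights = sigmoid(w2 @ hidden)             # (C,) channel weights in (0, 1)
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Parameter-free spatial map from channel-wise mean and max (assumption)."""
    pooled = feat.mean(axis=0) + feat.max(axis=0)   # (H, W)
    return feat * sigmoid(pooled)[None, :, :]

def hybrid_attention(feat, w1, w2):
    # Channel attention first, then spatial attention (CBAM-like ordering, an assumption).
    return spatial_attention(channel_attention(feat, w1, w2))

def affine_crop(feat, theta, out_hw):
    """Spatial-transformer-style sampling: map a normalized output grid in [-1, 1]
    through a 2x3 affine matrix theta and sample the input by nearest neighbour."""
    c, h, w = feat.shape
    oh, ow = out_hw
    ys, xs = np.meshgrid(np.linspace(-1, 1, oh), np.linspace(-1, 1, ow), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(oh * ow)])   # (3, oh*ow)
    src = theta @ grid                                            # (2, oh*ow): x then y
    col = np.clip(np.rint((src[0] + 1) * (w - 1) / 2), 0, w - 1).astype(int)
    row = np.clip(np.rint((src[1] + 1) * (h - 1) / 2), 0, h - 1).astype(int)
    return feat[:, row, col].reshape(c, oh, ow)

def global_local_descriptor(layer_feats, params, thetas, out_hw=(4, 4)):
    """Hypothetical fusion of a global (attention) branch and a local (cropping) branch."""
    global_parts, local_parts = [], []
    for feat, (w1, w2) in zip(layer_feats, params):
        # Global branch: attended features pooled to one vector per layer.
        global_parts.append(hybrid_attention(feat, w1, w2).mean(axis=(1, 2)))
        # Local branch: each transformer in the cascade resamples the previous region.
        region = feat
        for theta in thetas:
            region = affine_crop(region, theta, out_hw)
        local_parts.append(region.mean(axis=(1, 2)))
    global_repr = np.concatenate(global_parts)   # multi-scale global-level representation
    local_repr = np.concatenate(local_parts)     # multi-scale local-level representation
    return np.concatenate([global_repr, local_repr])   # fusion by concatenation (assumption)

# Usage with two hypothetical backbone layers of different depth and resolution.
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8)]
feats = [rng.standard_normal(s) for s in shapes]
params = [(rng.standard_normal((c // 4, c)), rng.standard_normal((c, c // 4)))
          for c, _, _ in shapes]
zoom = np.array([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]])   # centered 2x zoom
desc = global_local_descriptor(feats, params, [zoom, zoom])
print(desc.shape)   # (48,)
```

The descriptor length is the sum of channel counts over layers, counted once for each branch (8 + 16 for global, 8 + 16 for local). In a trained network, the affine matrices would be predicted per image rather than fixed, so each crop could follow ingredient-relevant regions.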
Pages: 393-401
Number of pages: 9