Zero-Shot Food Image Detection Based on Transformer

Cited by: 0
Authors
Song, Jingru [1]
Min, Weiqing [2,3]
Zhou, Pengfei [2,3]
Rao, Quanrui [1]
Sheng, Guorui [1]
Yang, Yancun [1]
Wang, Lili [1]
Jiang, Shuqiang [2,3]
Affiliations
[1] School of Information and Electrical Engineering, Ludong University, Yantai 264025, China
[2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
[3] Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China
Keywords
Food chemistry - Food ingredients
DOI
10.13386/j.issn1002-0306.2024030027
Abstract
As a fundamental task in food computing, food detection plays a crucial role in locating and identifying food items in input images, particularly in applications such as intelligent canteen settlement and dietary health management. However, food categories are constantly updated in practical scenarios, making it difficult for food detectors trained on a fixed set of categories to accurately detect previously unseen ones. To address this issue, this paper proposes a zero-shot food image detection method. First, a Transformer-based food primitive generator is constructed, where each primitive contains fine-grained attributes relevant to food categories. These primitives can be selectively assembled according to food characteristics to synthesize new food features. Second, a visual feature disentanglement enhancement component is proposed to impose more constraints on the visual features of unseen food categories. It decomposes the visual features of food images into semantically related and semantically unrelated features, so that semantic knowledge of food categories transfers better to their visual features. The proposed method was evaluated on the ZSFooD and UEC-FOOD256 datasets through extensive experiments and ablation studies. Under the zero-shot detection (ZSD) setting, the best average precision on unseen classes reached 4.9% and 24.1%, respectively, demonstrating the effectiveness of the proposed approach. Under the generalized zero-shot detection (GZSD) setting, the harmonic mean over seen and unseen classes reached 5.8% and 22.0%, respectively, further validating the method. © The Author(s) 2024.
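The harmonic mean reported for the GZSD setting is conventionally computed from the seen-class and unseen-class scores as HM = 2·S·U / (S + U). The record does not give the per-split scores or the paper's exact evaluation code, so the following is only a minimal sketch of that standard formula; the function name and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean(seen_score: float, unseen_score: float) -> float:
    """Harmonic mean of a seen-class score and an unseen-class score.

    Assumption: both inputs are on the same scale (e.g. average precision in %).
    """
    total = seen_score + unseen_score
    if total == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * seen_score * unseen_score / total


# Hypothetical inputs for illustration only (not values from the paper).
hm = harmonic_mean(40.0, 10.0)
print(hm)  # 16.0
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot score well on it by performing well on seen classes alone.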
Pages: 18-26