Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Cited by: 0
Authors
Zou, Zhuoyang [1 ]
Zhu, Xinghui [1 ]
Zhu, Qinying [1 ]
Zhang, Hongyan [1 ]
Zhu, Lei [1 ]
Affiliations
[1] Hunan Agricultural University, College of Information and Intelligence, Changsha 410128, People's Republic of China
Funding
National Natural Science Foundation of China
关键词
cross-modal recipe retrieval; multi-modal alignment; food image ambiguity; deep learning; TRANSFORMER;
DOI
10.3390/foods13111628
CLC Classification Number
TS2 [Food Industry]
Discipline Code
0832
Abstract
As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, existing solutions lack intra-modal alignment, so the semantic alignment between food images and recipes cannot be further improved. In addition, a critical issue, food image ambiguity, is overlooked, which disrupts model convergence. To address these problems, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment jointly, the method measures the similarity of ambiguous food images under the guidance of their corresponding recipes. We further enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which effectively supports the food image similarity measurement. Experiments on the challenging public dataset Recipe1M show that our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
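The abstract outlines two mechanisms: a cross-attention module between ingredients and instructions for recipe representation, and recipe-guided similarity for ambiguous food images (joint inter- and intra-modal alignment). The sketch below is a minimal, hypothetical PyTorch illustration of those two ideas, not the authors' implementation; the module names, embedding dimension, mean pooling, and MSE-based guidance loss are assumptions made only for illustration.

# Minimal sketch (assumptions throughout; not the MMACMR code).
# (1) Ingredient tokens attend to instruction tokens to form a recipe embedding.
# (2) Image-image similarities are guided by recipe-recipe similarities, so
#     visually ambiguous dishes are pulled together or apart by their recipes.

import torch
import torch.nn as nn
import torch.nn.functional as F


class IngredientInstructionCrossAttention(nn.Module):
    """Fuse ingredient and instruction token embeddings with cross-attention."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, ingredients: torch.Tensor, instructions: torch.Tensor) -> torch.Tensor:
        # Queries: ingredient tokens; keys/values: instruction tokens.
        fused, _ = self.attn(ingredients, instructions, instructions)
        # Mean-pool the fused tokens into a single L2-normalized recipe embedding.
        return F.normalize(self.proj(fused.mean(dim=1)), dim=-1)


def recipe_guided_image_alignment_loss(img_emb: torch.Tensor, rec_emb: torch.Tensor) -> torch.Tensor:
    """Encourage the intra-modal image similarity matrix to match recipe similarity."""
    img = F.normalize(img_emb, dim=-1)
    rec = F.normalize(rec_emb, dim=-1)
    sim_img = img @ img.T              # image-image similarities (intra-modal)
    sim_rec = rec @ rec.T              # recipe-recipe similarities as guidance target
    return F.mse_loss(sim_img, sim_rec.detach())


if __name__ == "__main__":
    B, n_ing, n_ins, d = 4, 10, 20, 512   # illustrative batch and sequence sizes
    fuser = IngredientInstructionCrossAttention(dim=d)
    recipes = fuser(torch.randn(B, n_ing, d), torch.randn(B, n_ins, d))
    images = torch.randn(B, d)             # stand-in for image-encoder outputs
    print(recipe_guided_image_alignment_loss(images, recipes))

In practice such a guidance term would be combined with the usual inter-modal (image-recipe) contrastive loss; the MSE form above is only one plausible way to couple the two similarity structures.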
Pages: 16
Related Papers
50 records in total
  • [1] CA_DeepSC: Cross-Modal Alignment for Multi-Modal Semantic Communications
    Wang, Wenjun
    Liu, Minghao
    Chen, Mingkai
    [J]. IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 5871 - 5876
  • [2] Token Embeddings Alignment for Cross-Modal Retrieval
    Xie, Chen-Wei
    Wu, Jianmin
    Zheng, Yun
    Pan, Pan
    Hua, Xian-Sheng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4555 - 4563
  • [3] Robust cross-modal retrieval with alignment refurbishment
    Guo, Jinyi
    Ding, Jieyu
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2023, 24 (10) : 1403 - 1415
  • [4] Adequate alignment and interaction for cross-modal retrieval
    Wang, Mingkang
    Meng, Min
    Liu, Jigang
    Wu, Jigang
    [J]. VIRTUAL REALITY & INTELLIGENT HARDWARE, 2023, 5 (06) : 509 - 522
  • [5] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    [J]. 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [6] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    [J]. COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [7] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    [J]. NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [8] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123
  • [9] Multi-Modal Pulmonary Mass Segmentation Network Based on Cross-Modal Spatial Alignment
    Li, Jiaxin
    Chen, Houjin
    Peng, Yahui
    Li, Yanfeng
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (01) : 11 - 17