Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

Citations: 0
Authors
Zou, Zhuoyang [1]
Zhu, Xinghui [1]
Zhu, Qinying [1]
Zhang, Hongyan [1]
Zhu, Lei [1]
Affiliations
[1] Hunan Agricultural University, College of Information and Intelligence, Changsha 410128, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
cross-modal recipe retrieval; multi-modal alignment; food image ambiguity; deep learning; transformer
DOI
10.3390/foods13111628
Chinese Library Classification number
TS2 [Food industry]
Discipline classification code
0832
Abstract
As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, existing solutions lack intra-modal alignment, which limits further improvement of the semantic alignment between food images and recipes. In addition, a critical issue, food image ambiguity, has been overlooked, and it disrupts model convergence. To address these problems, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To consider inter-modal and intra-modal alignment jointly, the method measures the similarity of ambiguous food images under the guidance of their corresponding recipes. We also enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which effectively supports food image similarity measurement. We conduct experiments on the challenging public dataset Recipe1M, where our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
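The abstract mentions a cross-attention module between ingredients and instructions without giving its details. As a rough illustration of the general technique only, the sketch below applies plain scaled dot-product cross-attention in both directions between two sets of token embeddings. It is not the MMACMR implementation: the function names, the embedding sizes, the random inputs, the absence of learned projection matrices, and the mean-pooling step are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention (hypothetical helper).

    Each row of `queries` attends over all rows of `keys_values`.
    No learned projections are used; this is a simplification.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_q, n_kv)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values, weights


# Random stand-ins for encoder outputs (assumed sizes, not from the paper).
rng = np.random.default_rng(0)
ingredients = rng.normal(size=(5, 8))   # 5 ingredient token embeddings
instructions = rng.normal(size=(7, 8))  # 7 instruction token embeddings

# Ingredients attend to instructions, and instructions to ingredients.
ing_ctx, ing_w = cross_attention(ingredients, instructions)
ins_ctx, ins_w = cross_attention(instructions, ingredients)

# Assumed pooling: average each side, then concatenate into one vector.
recipe_embedding = np.concatenate([ing_ctx.mean(axis=0), ins_ctx.mean(axis=0)])
```

In this sketch each ingredient embedding becomes a weighted mixture of instruction embeddings and vice versa, so the pooled recipe vector carries information from both components. How the paper combines the two streams is not stated in the abstract.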
Pages: 16