Adequate alignment and interaction for cross-modal retrieval

Authors
Mingkang WANG [1]
Min MENG [1]
Jigang LIU [2]
Jigang WU [1]
Affiliations
[1] School of Computer Science and Technology, Guangdong University of Technology
[2] Ping An Life Insurance of China
Funding
National Science Foundation (United States); National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP391.3 [Retrieval machines]
Discipline classification codes
081203; 0835
Abstract
Background Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, particularly image-text retrieval in the fields of computer vision and natural language processing. Recently, visual-semantic embedding (VSE) learning has shown promising improvements in image-text retrieval tasks. Most existing VSE models employ two independent encoders to extract features and then use complex methods to contextualize and aggregate these features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: (1) because they do not consider intermediate interactions and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of their representations; and (2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and degrade the quality of the final aggregated features.
Methods To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn the adequate alignment and interaction of aggregated features and thereby effectively bridge the modality gap.
Results Experiments on the Microsoft COCO and Flickr30k datasets demonstrated the superiority of our model over state-of-the-art methods.
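The general VSE pipeline that the abstract refers to (pool local features into one holistic embedding using pooling coefficients, then rank candidates by similarity) can be illustrated with a minimal sketch. This is not the authors' model: the softmax pooling, the cosine ranking, the function names, and the toy vectors are all assumptions chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into pooling coefficients that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, scores):
    """Pool local features into one holistic embedding (weighted sum)."""
    coeffs = softmax(scores)
    dim = len(features[0])
    return [sum(c * f[d] for c, f in zip(coeffs, features)) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)

# Toy data (hypothetical): three 2-D region features; the third plays the
# role of a noisy region and receives a low pooling score.
image_regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
region_scores = [2.0, 2.0, -2.0]
image_emb = aggregate(image_regions, region_scores)

# Two hypothetical caption embeddings; the second is close to the image.
caption_embs = [[0.0, 1.0], [1.0, 0.05]]
ranking = rank(image_emb, caption_embs)
print(ranking)
```

In this toy setting, raising the score of the noisy third region shifts the pooled embedding toward it, which is the kind of sensitivity to pooling coefficients that the abstract describes as a limitation of existing aggregators.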
Pages: 509-522
Number of pages: 14