Dual Semantic Relationship Attention Network for Image-Text Matching

Cited by: 0
Authors
Wen, Keyu [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cross-modal; retrieval; attention; semantic relationship;
DOI
10.1109/ijcnn48605.2020.9206782
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text matching is a major task in cross-modal information processing. The main challenge is to learn unified vision and language representations. Previous methods that perform well on this task primarily focus on the region features in images that correspond to the words in sentences. However, this causes the regional features to lose contact with the global context, leading to mismatches with the non-object words in some sentences. To alleviate this problem, this work proposes a novel Dual Semantic Relationship Attention Network, which mainly consists of two modules: the separate semantic relationship module and the joint semantic relationship module. With these two modules, different hierarchies of semantic relationships are learned simultaneously, thus promoting the image-text matching process. Quantitative experiments have been performed on MS-COCO and Flickr-30K, and our method outperforms previous approaches by a large margin owing to the effectiveness of the dual semantic relationship attention scheme.
Pages: 7
Related papers
50 in total
  • [41] Region Reinforcement Network With Topic Constraint for Image-Text Matching
    Wu, Jie
    Wu, Chunlei
    Lu, Jing
    Wang, Leiquan
    Cui, Xuerong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 388 - 397
  • [42] A Multiview Text Imagination Network Based on Latent Alignment for Image-Text Matching
    Shang, Heng
    Zhao, Guoshuai
    Shi, Jing
    Qian, Xueming
    IEEE INTELLIGENT SYSTEMS, 2023, 38 (03) : 41 - 50
  • [43] Context-Aware Attention Network for Image-Text Retrieval
    Zhang, Qi
    Lei, Zhen
    Zhang, Zhaoxiang
    Li, Stan Z.
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3533 - 3542
  • [44] Image-text matching algorithm based on multi-level semantic alignment
    Li Y.
    Yao T.
    Zhang L.
    Sun Y.
    Fu H.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (02): : 551 - 558
  • [45] An end-to-end image-text matching approach considering semantic uncertainty
    Tuerhong, Gulanbaier
    Dai, Xin
    Tian, Liwei
    Wushouer, Mairidan
    NEUROCOMPUTING, 2024, 607
  • [46] Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching
    Zhang, Kun
    Zhang, Lei
    Hu, Bo
    Zhu, Mengxiao
    Mao, Zhendong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4828 - 4837
  • [47] Multi-scale motivated neural network for image-text matching
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 4383 - 4407
  • [48] Cross-modal Semantically Augmented Network for Image-text Matching
    Yao, Tao
    Li, Yiru
    Li, Ying
    Zhu, Yingying
    Wang, Gang
    Yue, Jun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [49] Multi-scale motivated neural network for image-text matching
    Xueyang Qin
    Lishuang Li
    Guangyao Pang
    Multimedia Tools and Applications, 2024, 83 : 4383 - 4407
  • [50] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)