Learning Semantic Relationship among Instances for Image-Text Matching

Cited by: 16
Authors
Fu, Zheren [1 ]
Mao, Zhendong [1 ,2 ]
Song, Yan [1 ]
Zhang, Yongdong [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
10.1109/CVPR52729.2023.01455
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image-text matching, a bridge connecting image and language, is an important task that generally learns a holistic cross-modal embedding to achieve high-quality semantic alignment between the two modalities. However, previous studies focus only on capturing fragment-level relations within a sample from a particular modality, e.g., salient regions in an image or words in a sentence, and usually pay less attention to instance-level interactions among samples and modalities, e.g., multiple images and texts. In this paper, we argue that sample relations could help learn subtle differences for hard negative instances and transfer shared knowledge to infrequent samples, which should be promising for obtaining better holistic embeddings. Therefore, we propose a novel hierarchical relation modeling framework (HREM), which explicitly captures both fragment- and instance-level relations to learn discriminative and robust cross-modal embeddings. Extensive experiments on Flickr30K and MS-COCO show that our proposed method outperforms state-of-the-art methods by 4%-10% in terms of rSum. Our code is available at https://github.com/CrossmodalGroup/HREM.
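The abstract reports gains in rSum without defining it. In the image-text retrieval literature, rSum is commonly taken to be the sum of Recall@1, Recall@5, and Recall@10 over both retrieval directions (image-to-text and text-to-image), giving a maximum of 600. The sketch below illustrates that convention only; it is not taken from the HREM code. The function names `recall_at_k` and `rsum` are hypothetical, and the one-match-per-query pairing is a simplifying assumption (the benchmark datasets pair each image with several captions).

```python
import numpy as np


def recall_at_k(sims, ks=(1, 5, 10)):
    """Recall@K in percent when each row queries over the columns.

    Assumption for illustration: the ground-truth match of row i is
    column i, i.e. exactly one correct candidate per query.
    """
    n = sims.shape[0]
    idx = np.arange(n)
    # Score of the true match for every query, kept as a column vector.
    true_scores = sims[idx, idx][:, None]
    # Rank of the true match = number of candidates scored strictly higher.
    ranks = (sims > true_scores).sum(axis=1)
    return [100.0 * float(np.mean(ranks < k)) for k in ks]


def rsum(sims, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions."""
    image_to_text = recall_at_k(sims, ks)
    text_to_image = recall_at_k(sims.T, ks)
    return sum(image_to_text) + sum(text_to_image)


if __name__ == "__main__":
    # Toy similarity matrix: rows are images, columns are texts.
    sims = np.array([[0.9, 0.1],
                     [0.8, 0.2]])
    print(rsum(sims))
```

In the toy matrix, the second image scores the wrong text higher than its own, so image-to-text Recall@1 drops to 50% while every other recall value stays at 100%.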
Pages: 15159-15168
Page count: 10
Related papers
50 in total
  • [1] Dual Semantic Relationship Attention Network for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [2] Enhanced Semantic Similarity Learning Framework for Image-Text Matching
    Zhang, Kun
    Hu, Bo
    Zhang, Huatian
    Li, Zhe
    Mao, Zhendong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2973 - 2988
  • [3] Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image-Text Matching
    Liu, Xin
    He, Yi
    Cheung, Yiu-Ming
    Xu, Xing
    Wang, Nannan
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (02) : 948 - 961
  • [4] Visual Semantic Reasoning for Image-Text Matching
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4653 - 4661
  • [5] IMAGE-TEXT MATCHING WITH SHARED SEMANTIC CONCEPTS
    Miao Lanxin
    [J]. 2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [6] Learning Dual Semantic Relations With Graph Attention for Image-Text Matching
    Wen, Keyu
    Gu, Xiaodong
    Cheng, Qingrong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (07) : 2866 - 2879
  • [7] Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching
    Liu, Yang
    Liu, Hong
    Wang, Huaqiu
    Liu, Mengyuan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1332 - 1336
  • [8] Knowledge Aware Semantic Concept Expansion for Image-Text Matching
    Shi, Botian
    Ji, Lei
    Lu, Pan
    Niu, Zhendong
    Duan, Nan
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5182 - 5189
  • [9] A Survey on Deep Learning Based Image-Text Matching
    Liu, Meng
    Qi, Meng-Jin
    Zhan, Zhen-Yu
    Qu, Lei-Gang
    Nie, Xiu-Shan
    Nie, Li-Qiang
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (11) : 2370 - 2399
  • [10] Learning hierarchical embedding space for image-text matching
    Sun, Hao
    Qin, Xiaolin
    Liu, Xiaojing
    [J]. INTELLIGENT DATA ANALYSIS, 2024, 28 (03) : 647 - 665