Image-Text Cross-Modal Retrieval with Instance Contrastive Embedding

Cited by: 0
Authors
Zeng, Ruigeng [1 ]
Ma, Wentao [2 ]
Wu, Xiaoqian [2 ]
Liu, Wei [3 ]
Liu, Jie [1 ]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc Lab, Changsha 410073, Peoples R China
[2] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Peoples R China
[3] Anhui Univ Finance & Econ, Sch Management Sci & Engn, Bengbu 233030, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cross-modal retrieval; semantic gap; two-stage training strategy;
DOI
10.3390/electronics13020300
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Image-text cross-modal retrieval aims to bridge the semantic gap between modalities, allowing images to be searched by textual description and vice versa. Existing efforts in this field concentrate on coarse-grained feature representations and then apply a pairwise ranking loss to pull positive image-text pairs closer while pushing negative ones apart. However, applying a pairwise ranking loss directly to coarse-grained representations is unreliable, since it disregards fine-grained information and thus struggles to narrow the semantic gap between image and text. To this end, we propose an Instance Contrastive Embedding (IConE) method for image-text cross-modal retrieval. Specifically, we first transfer a multi-modal pre-training model to the cross-modal retrieval task to leverage the interactive information between image and text, thereby enhancing the model's representational capability. Then, to comprehensively account for intra- and inter-modality feature distributions, we design a novel two-stage training strategy that combines an instance loss and a contrastive loss, dedicated to extracting fine-grained representations within instances and bridging the semantic gap between modalities. Extensive experiments on two public benchmark datasets, Flickr30k and MS-COCO, demonstrate that our IConE outperforms several state-of-the-art (SoTA) baseline methods and achieves competitive performance.
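The two objectives named in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general idea only, not the paper's implementation: `contrastive_loss` is a standard symmetric InfoNCE-style loss over matched image-text pairs, and `instance_loss` treats each image-text instance as its own class under a shared linear classifier. The function names, array shapes, and temperature value are all assumptions for illustration.

```python
import numpy as np

def contrastive_loss(img, txt, temperature=0.07):
    # Symmetric InfoNCE-style contrastive loss over a batch of image/text
    # embeddings (shape: [batch, dim]); matched pairs share a row index.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # pairwise cosine similarities
    labels = np.arange(len(img))                # positives lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image retrieval directions.
    return 0.5 * (xent(logits) + xent(logits.T))

def instance_loss(emb, instance_ids, W):
    # Instance loss: classify each embedding into its own instance id
    # using a shared linear classifier W (shape: [dim, num_instances]).
    logits = emb @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(emb)), instance_ids].mean()
```

In a two-stage strategy of the kind the abstract describes, one would typically optimize the instance loss first to learn discriminative within-instance representations, then add the contrastive loss to align the two modalities; the weighting and schedule here are left unspecified, as the record does not give them.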
Pages: 18
Related Papers
50 items in total
  • [11] Joint feature approach for image-text cross-modal retrieval
    Gao, Dihui
    Sheng, Lijie
    Xu, Xiaodong
    Miao, Qiguang
    Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2024, 51 (04): : 128 - 138
  • [12] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [13] Probability Distribution Representation Learning for Image-Text Cross-Modal Retrieval
    Yang C.
    Liu L.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2022, 34 (05): : 751 - 759
  • [14] Heterogeneous Graph Fusion Network for cross-modal image-text retrieval
    Qin, Xueyang
    Li, Lishuang
    Pang, Guangyao
    Hao, Fei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [15] SAM: cross-modal semantic alignments module for image-text retrieval
    Park, Pilseo
    Jang, Soojin
    Cho, Yunsung
    Kim, Youngbin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (04) : 12363 - 12377
  • [16] Image-text bidirectional learning network based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Gu, Guanghua
    NEUROCOMPUTING, 2022, 483 : 148 - 159
  • [17] RICH: A rapid method for image-text cross-modal hash retrieval
    Li, Bo
    Yao, Dan
    Li, Zhixin
    DISPLAYS, 2023, 79
  • [19] An Enhanced Feature Extraction Framework for Cross-Modal Image-Text Retrieval
    Zhang, Jinzhi
    Wang, Luyao
    Zheng, Fuzhong
    Wang, Xu
    Zhang, Haisu
    REMOTE SENSING, 2024, 16 (12)
  • [20] Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
    Zeng, Sheng
    Liu, Changhong
    Zhou, Jun
    Chen, Yong
    Jiang, Aiwen
    Li, Hanxi
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 239 - 248