Cross-Modal Image-Text Retrieval with Semantic Consistency

被引:47
|
作者
Chen, Hui [1 ,2 ]
Ding, Guiguang [1 ,2 ]
Lin, Zijin [3 ]
Zhao, Sicheng [4 ]
Han, Jungong [5 ]
机构
[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China
[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China
[3] Microsoft Res, Beijing, Peoples R China
[4] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[5] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England
基金
中国国家自然科学基金; 国家重点研发计划;
关键词
Cross-modal; image-text retrieval; semantic consistency;
D O I
10.1145/3343031.3351055
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Cross-modal image-text retrieval has been a long-standing challenge in the multimedia community. Existing methods explore various complicated embedding spaces to assess the semantic similarity between a given image-text pair, but consider no/little about the consistency across them. To remedy this situation, we introduce the idea of semantic consistency for learning various embedding spaces jointly. Specifically, similar to the previous works, we start by constructing two different embedding spaces, namely the image-grounded embedding space and the text-grounded embedding space. However, instead of learning these two embedding spaces separately, we incorporate a semantic consistency constraint in the common ranking objective function such that both embedding spaces can be learned simultaneously and benefit from each other to gain performance improvement. We conduct extensive experiments on three benchmark datasets, i.e., Flickr8k, Flickr30k and MS COCO. Results show that our model outperforms the state-of-the-art models on all three datasets, which can well demonstrate the effectiveness and superiority of the introduction of semantic consistency. Our source code is released at: https://github.com/HuiChen24/SemanticConsistency.
引用
收藏
页码:1749 / 1757
页数:9
相关论文
共 50 条
  • [1] Image-Text Retrieval With Cross-Modal Semantic Importance Consistency
    Liu, Zejun
    Chen, Fanglin
    Xu, Jun
    Pei, Wenjie
    Lu, Guangming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (05) : 2465 - 2476
  • [2] SAM: cross-modal semantic alignments module for image-text retrieval
    Pilseo Park
    Soojin Jang
    Yunsung Cho
    Youngbin Kim
    [J]. Multimedia Tools and Applications, 2024, 83 : 12363 - 12377
  • [3] SAM: cross-modal semantic alignments module for image-text retrieval
    Park, Pilseo
    Jang, Soojin
    Cho, Yunsung
    Kim, Youngbin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (04) : 12363 - 12377
  • [4] Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval
    Zeng, Sheng
    Liu, Changhong
    Zhou, Jun
    Chen, Yong
    Jiang, Aiwen
    Li, Hanxi
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 239 - 248
  • [5] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    [J]. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2022, 2022-May : 4828 - 4832
  • [6] Visual Contextual Semantic Reasoning for Cross-Modal Drone Image-Text Retrieval
    Huang, Jinghao
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [7] DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
    Liu, Xiaoqing
    Zeng, Huanqiang
    Shi, Yifan
    Zhu, Jianqing
    Ma, Kai-Kuang
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4828 - 4832
  • [8] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2309 - 2312
  • [9] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Chen, Weijing
    Yao, Linli
    Jin, Qin
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1241 - 1251
  • [10] A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing
    Cheng, Qimin
    Zhou, Yuzhuo
    Fu, Peng
    Xu, Yuan
    Zhang, Liang
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 4284 - 4297