Cross-Modal Image-Text Retrieval with Semantic Consistency

Cited by: 47

Authors:

Chen, Hui [1,2]

Ding, Guiguang [1,2]

Lin, Zijin [3]

Zhao, Sicheng [4]

Han, Jungong [5]

Affiliations:

[1] Tsinghua Univ, Sch Software, Beijing, Peoples R China

[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China

[3] Microsoft Res, Beijing, Peoples R China

[4] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[5] Univ Warwick, WMG Data Sci, Coventry, W Midlands, England

Source:

PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Cross-modal; image-text retrieval; semantic consistency;

DOI:

10.1145/3343031.3351055

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203; 0835;

Abstract:

Cross-modal image-text retrieval has been a long-standing challenge in the multimedia community. Existing methods explore various complicated embedding spaces to assess the semantic similarity between a given image-text pair, but pay little or no attention to the consistency across these spaces. To remedy this, we introduce the idea of semantic consistency for learning multiple embedding spaces jointly. Specifically, as in previous work, we start by constructing two different embedding spaces: the image-grounded embedding space and the text-grounded embedding space. However, instead of learning these two embedding spaces separately, we incorporate a semantic consistency constraint into the common ranking objective function, so that both embedding spaces are learned simultaneously and benefit from each other. We conduct extensive experiments on three benchmark datasets: Flickr8k, Flickr30k and MS COCO. Results show that our model outperforms the state-of-the-art models on all three datasets, demonstrating the effectiveness of introducing semantic consistency. Our source code is released at: https://github.com/HuiChen24/SemanticConsistency.
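The abstract does not give the exact form of the ranking objective or of the consistency constraint, so the following is only an illustrative sketch of the general idea. It assumes a sum-of-hinges bidirectional ranking loss for each embedding space and a mean squared difference between the two similarity matrices as the consistency term. The function names, the margin, and the weight `lam` are hypothetical and are not taken from the paper or its released code.

```python
def ranking_loss(sim, margin=0.2):
    """Bidirectional sum-of-hinges ranking loss (assumed form).

    sim[i][j] is the similarity of image i and text j in one embedding
    space; matched image-text pairs lie on the diagonal.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i as the query, text j as a negative.
            loss += max(0.0, margin - pos + sim[i][j])
            # Text i as the query, image j as a negative.
            loss += max(0.0, margin - pos + sim[j][i])
    return loss


def consistency_penalty(sim_a, sim_b):
    """Mean squared disagreement between two similarity matrices
    (an assumed stand-in for the paper's consistency constraint)."""
    n = len(sim_a)
    total = sum(
        (sim_a[i][j] - sim_b[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)


def joint_loss(sim_img, sim_txt, margin=0.2, lam=1.0):
    """Ranking loss in both spaces plus a weighted consistency term.

    sim_img: similarities in the image-grounded space.
    sim_txt: similarities in the text-grounded space.
    lam: hypothetical weight of the consistency term.
    """
    return (
        ranking_loss(sim_img, margin)
        + ranking_loss(sim_txt, margin)
        + lam * consistency_penalty(sim_img, sim_txt)
    )
```

Because the consistency term appears in the same objective as both ranking terms, a gradient step on `joint_loss` would update the two spaces together rather than separately, which is the behavior the abstract describes.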

Citation

Pages: 1749-1757

Number of pages: 9
