Deep Cross-Modal Projection Learning for Image-Text Matching

Cited by: 192

Authors:

Zhang, Ying [1]

Lu, Huchuan [1]

Affiliations:

[1] Dalian Univ Technol, Dalian, Peoples R China

Source:

COMPUTER VISION - ECCV 2018, PT I | 2018 / Vol. 11205

Keywords:

Image-text matching; Cross-modal projection; Joint embedding learning; Deep learning;

DOI:

10.1007/978-3-030-01246-5_42

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The key point of image-text matching is how to accurately measure the similarity between visual and textual inputs. Despite the great progress of associating the deep cross-modal embeddings with the bi-directional ranking loss, developing the strategies for mining useful triplets and selecting appropriate margins remains a challenge in real applications. In this paper, we propose a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss for learning discriminative image-text embeddings. The CMPM loss minimizes the KL divergence between the projection compatibility distributions and the normalized matching distributions defined with all the positive and negative samples in a mini-batch. The CMPC loss attempts to categorize the vector projection of representations from one modality onto another with the improved norm-softmax loss, for further enhancing the feature compactness of each class. Extensive analysis and experiments on multiple datasets demonstrate the superiority of the proposed approach.
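The CMPM objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the details here are assumptions: the compatibility distribution is taken to be a softmax over scalar projections of one modality's embedding onto the L2-normalized embeddings of the other modality, the matching distribution is the row-normalized same-label indicator, and `eps` is a small constant added for numerical stability. All function names are hypothetical, and this is not the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cmpm_one_direction(src, tgt, labels, eps=1e-8):
    """Mean KL(p || q) over a mini-batch, projecting src onto normalized tgt.

    src[i] and tgt[i] are assumed to be a matched pair sharing labels[i].
    """
    tgt_unit = [l2_normalize(t) for t in tgt]
    n = len(src)
    loss = 0.0
    for i in range(n):
        # Scalar projection of src[i] onto every normalized target in the batch.
        scores = [sum(a * b for a, b in zip(src[i], t)) for t in tgt_unit]
        p = softmax(scores)
        # Normalized matching distribution: uniform over same-label samples.
        match = [1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
        total = sum(match)
        q = [m / total for m in match]
        # KL divergence; terms with p == 0 contribute nothing.
        loss += sum(pj * math.log(pj / (qj + eps))
                    for pj, qj in zip(p, q) if pj > 0.0)
    return loss / n


def cmpm_loss(images, texts, labels):
    """Sum of the image-to-text and text-to-image terms."""
    return (cmpm_one_direction(images, texts, labels)
            + cmpm_one_direction(texts, images, labels))


if __name__ == "__main__":
    images = [[5.0, 0.0], [0.0, 5.0]]
    texts = [[4.0, 0.0], [0.0, 4.0]]
    print(cmpm_loss(images, texts, labels=[0, 1]))
```

Because every sample in the mini-batch contributes to both distributions, this formulation needs neither triplet mining nor a margin hyperparameter, which is the difficulty with ranking losses that the abstract points to.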

Pages: 707-723

Number of pages: 17
