Deep Cross-Modal Projection Learning for Image-Text Matching

Times cited: 192
Authors
Zhang, Ying [1]
Lu, Huchuan [1]
Affiliation
[1] Dalian Univ Technol, Dalian, Peoples R China
Source
Keywords
Image-text matching; Cross-modal projection; Joint embedding learning; Deep learning
DOI
10.1007/978-3-030-01246-5_42
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The key problem in image-text matching is how to accurately measure the similarity between visual and textual inputs. Despite the great progress made by combining deep cross-modal embeddings with the bidirectional ranking loss, developing strategies for mining useful triplets and selecting appropriate margins remains a challenge in real applications. In this paper, we propose a cross-modal projection matching (CMPM) loss and a cross-modal projection classification (CMPC) loss for learning discriminative image-text embeddings. The CMPM loss minimizes the KL divergence between the projection compatibility distributions and the normalized matching distributions defined over all the positive and negative samples in a mini-batch. The CMPC loss classifies the vector projection of the representation from one modality onto the other with an improved norm-softmax loss, to further enhance the compactness of features within each class. Extensive analysis and experiments on multiple datasets demonstrate the superiority of the proposed approach.
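The two losses named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: features are equal-length float lists, pair i of a mini-batch carries the identity label `labels[i]` (so y_ij = 1 when two pairs share a label), a small `eps` (assumed 1e-8) keeps log(p / q) finite where q is 0, the KL direction is taken as KL(p || q), and CMPC is assumed to use L2-normalized class weights with no bias term. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch; eps values, KL direction and bias-free normalized
# class weights are assumptions, not taken from the paper's code.
Vec = List[float]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vec:
    """Scale v to unit length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cmpm_direction(queries, keys, labels, eps: float = 1e-8) -> float:
    """Mean over the batch of KL(p_i || q_i) for one retrieval direction.

    p_i[j]: softmax over j of queries[i] . unit(keys[j])  (projection compatibility)
    q_i[j]: y_ij / sum_k y_ik, y_ij = 1 if labels[i] == labels[j]  (normalized matching)
    """
    keys_unit = [l2_normalize(k) for k in keys]
    total = 0.0
    for i, x in enumerate(queries):
        p = softmax([dot(x, z) for z in keys_unit])
        y = [1.0 if labels[i] == labels[j] else 0.0 for j in range(len(keys))]
        s = sum(y)  # s >= 1 because y_ii = 1
        q = [v / s for v in y]
        # eps keeps the log finite where q is 0; terms with p == 0 contribute 0
        total += sum(pj * math.log(pj / (qj + eps)) for pj, qj in zip(p, q) if pj > 0)
    return total / len(queries)


def cmpm_loss(img, txt, labels) -> float:
    """Image-to-text term plus text-to-image term."""
    return cmpm_direction(img, txt, labels) + cmpm_direction(txt, img, labels)


def cmpc_loss(img, txt, labels, class_weights) -> float:
    """Norm-softmax cross-entropy on cross-modal vector projections.

    Each feature is projected onto the unit direction of its paired feature in
    the other modality, then classified with L2-normalized, bias-free weights.
    """
    w_unit = [l2_normalize(w) for w in class_weights]
    total = 0.0
    for x, z, c in zip(img, txt, labels):
        for a, b in ((x, z), (z, x)):  # image onto text, then text onto image
            b_unit = l2_normalize(b)
            scale = dot(a, b_unit)  # scalar projection of a onto b
            proj = [scale * t for t in b_unit]
            p = softmax([dot(w, proj) for w in w_unit])
            total -= math.log(max(p[c], 1e-12))
    return total / len(img)


if __name__ == "__main__":
    img = [[5.0, 0.0], [0.0, 5.0]]
    txt = [[1.0, 0.0], [0.0, 1.0]]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    print(cmpm_loss(img, txt, [0, 1]), cmpc_loss(img, txt, [0, 1], weights))
```

Because every sample in the mini-batch enters both the compatibility and the matching distributions, this formulation needs no triplet mining and no margin hyperparameter, which is the difficulty the abstract attributes to the bidirectional ranking loss.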
Pages: 707-723
Number of pages: 17
Related papers
50 in total
  • [1] Cross-Modal Image-Text Matching via Coupled Projection Learning Hashing
    Zhao, Huan
    Wang, Haoqian
    Zha, Xupeng
    Wang, Song
    [J]. 2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022 : 367 - 376
  • [2] Cross-modal Image-Text Retrieval with Multitask Learning
    Luo, Junyu
    Shen, Ying
    Ao, Xiang
    Zhao, Zhou
    Yang, Min
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019 : 2309 - 2312
  • [3] Cross-modal Semantically Augmented Network for Image-text Matching
    Yao, Tao
    Li, Yiru
    Li, Ying
    Zhu, Yingying
    Wang, Gang
    Yue, Jun
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [4] Cross-Modal Attention With Semantic Consistence for Image-Text Matching
    Xu, Xing
    Wang, Tan
    Yang, Yang
    Zuo, Lin
    Shen, Fumin
    Shen, Heng Tao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5412 - 5425
  • [5] Cross-modal Semantic Interference Suppression for image-text matching
    Yao, Tao
    Peng, Shouyong
    Sun, Yujuan
    Sheng, Guorui
    Fu, Haiyan
    Kong, Xiangwei
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [6] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [7] Improving Image-Text Matching With Bidirectional Consistency of Cross-Modal Alignment
    Li, Zhe
    Zhang, Lei
    Zhang, Kun
    Zhang, Yongdong
    Mao, Zhendong
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6590 - 6607
  • [8] Cross-modal multi-relationship aware reasoning for image-text matching
    Zhang, Jin
    He, Xiaohai
    Qing, Linbo
    Liu, Luping
    Luo, Xiaodong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (09) : 12005 - 12027