Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Cited by: 1
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
Keywords
Cross-modal retrieval; Semantic enhanced; Erasing; Metric learning
DOI
10.1007/s13735-022-00237-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval requires retrieving text given an image query and vice versa. Most existing methods use attention mechanisms to build advanced encoding networks and use ranking losses to reduce the modality gap. Although these methods have achieved remarkable performance, they still suffer from drawbacks that hinder the model from learning discriminative semantic embeddings. For example, the attention mechanism may assign larger weights to irrelevant parts than to relevant parts, which prevents the model from learning a discriminative attention distribution. In addition, traditional ranking losses may disregard relatively discriminative information because they lack appropriate hardest-negative mining and information weighting schemes. To alleviate these issues, this paper proposes a novel semantic-enhanced discriminative embedding learning method that enhances the discriminative ability of the model and consists mainly of three modules. The attention-guided erasing module enables the attention model to pay more attention to the relevant parts and reduces the interference of irrelevant parts by erasing non-attention parts. The large-scale negative sampling module leverages momentum-updated memory banks to expand the number of negative samples, which increases the probability that the hardest negative is sampled. Moreover, the weighted InfoNCE loss module uses a weighting scheme that assigns a larger weight to a harder pair. We evaluate the proposed modules by integrating them into three existing cross-modal retrieval models. Extensive experiments demonstrate that integrating each proposed module into the existing models steadily improves the performance of all of them.
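The abstract does not give the formula for the weighted InfoNCE loss, so the following is only a minimal sketch of the general idea of giving harder negatives a larger weight. The function name `weighted_info_nce`, the exponential weighting, and the parameters `temperature` and `beta` are illustrative assumptions, not the authors' formulation. The negatives passed in could come from the current batch or from a memory bank.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, temperature=0.1, beta=2.0):
    """Illustrative weighted InfoNCE loss for a single query.

    pos_sim  -- similarity between the query and its matched item
    neg_sims -- similarities between the query and negative items
    beta     -- hypothetical hardness exponent; beta=0 gives plain InfoNCE
    """
    # Assumed weighting scheme: a negative with higher similarity to the
    # query (a harder negative) receives a larger weight.
    raw = [math.exp(beta * s) for s in neg_sims]
    mean_raw = sum(raw) / len(raw)
    weights = [r / mean_raw for r in raw]  # weights average to 1

    pos_term = math.exp(pos_sim / temperature)
    neg_term = sum(
        w * math.exp(s / temperature) for w, s in zip(weights, neg_sims)
    )
    # Negative log-probability of the positive among all candidates.
    return -math.log(pos_term / (pos_term + neg_term))


# Usage: the same pair scored without and with hardness weighting.
plain_loss = weighted_info_nce(0.8, [0.1, 0.5, 0.7], beta=0.0)
weighted_loss = weighted_info_nce(0.8, [0.1, 0.5, 0.7], beta=2.0)
```

Because the weights are normalised to average 1, setting `beta=0` recovers the unweighted loss, and any positive `beta` shifts the emphasis toward negatives that are most similar to the query.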
Pages: 369-382
Number of pages: 14
Related papers
50 in total
  • [1] Semantic-enhanced discriminative embedding learning for cross-modal retrieval
    Hao Pan
    Jun Huang
    [J]. International Journal of Multimedia Information Retrieval, 2022, 11 : 369 - 382
  • [2] Latent semantic-enhanced discrete hashing for cross-modal retrieval
    Liu, Yun
    Ji, Shujuan
    Fu, Qiang
    Zhao, Jianli
    Zhao, Zhongying
    Gong, Maoguo
    [J]. APPLIED INTELLIGENCE, 2022, 52 (14) : 16004 - 16020
  • [3] Latent semantic-enhanced discrete hashing for cross-modal retrieval
    Yun Liu
    Shujuan Ji
    Qiang Fu
    Jianli Zhao
    Zhongying Zhao
    Maoguo Gong
    [J]. Applied Intelligence, 2022, 52 : 16004 - 16020
  • [4] Scalable semantic-enhanced supervised hashing for cross-modal retrieval
    Yang, Fan
    Ding, Xiaojian
    Liu, Yufeng
    Ma, Fumin
    Cao, Jie
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 251
  • [5] Multi-Modal Medical Image Matching Based on Multi-Task Learning and Semantic-Enhanced Cross-Modal Retrieval
    Zhang, Yilin
    [J]. TRAITEMENT DU SIGNAL, 2023, 40 (05) : 2041 - 2049
  • [6] Learning discriminative common alignments for cross-modal retrieval
    Liu, Hui
    Chen, Xiao-Ping
    Hong, Rui
    Zhou, Yan
    Wan, Tian-Cai
    Bai, Tai-Li
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)
  • [7] Discriminative semantic transitive consistency for cross-modal learning
    Parida, Kranti Kumar
    Sharma, Gaurav
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 219
  • [8] Semantic-Enhanced Cross-Modal Fusion for Improved Unsupervised Image Captioning
    Xiang, Nan
    Chen, Ling
    Liang, Leiyan
    Rao, Xingdi
    Gong, Zehao
    [J]. ELECTRONICS, 2023, 12 (17)
  • [9] Discrete semantic embedding hashing for scalable cross-modal retrieval
    Liu, Junjie
    Fei, Lunke
    Jia, Wei
    Zhao, Shuping
    Wen, Jie
    Teng, Shaohua
    Zhang, Wei
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 1461 - 1467
  • [10] Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
    Song, Yale
    Soleymani, Mohammad
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1979 - 1988