Semantic-enhanced discriminative embedding learning for cross-modal retrieval

Cited by: 1
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
Keywords
Cross-modal retrieval; Semantic enhanced; Erasing; Metric learning;
DOI
10.1007/s13735-022-00237-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval requires retrieving text given an image query and vice versa. Most existing methods leverage attention mechanisms to build advanced encoding networks and use ranking losses to reduce the modality gap. Although these methods achieve remarkable performance, they still suffer from drawbacks that hinder the model from learning discriminative semantic embeddings. For example, the attention mechanism may assign larger weights to irrelevant parts than to relevant ones, which prevents the model from learning a discriminative attention distribution. In addition, traditional ranking losses can disregard relatively discriminative information because they lack appropriate hardest-negative mining and information weighting schemes. To alleviate these issues, this paper proposes a novel semantic-enhanced discriminative embedding learning method that enhances the discriminative ability of the model and consists mainly of three modules. The attention-guided erasing module enables the attention model to pay more attention to relevant parts and reduces interference from irrelevant parts by erasing non-attention parts. The large-scale negative sampling module leverages momentum-updated memory banks to expand the number of negative samples, which increases the probability that the hardest negative is sampled. Moreover, the weighted InfoNCE loss module uses a weighting scheme that assigns a larger weight to a harder pair. We evaluate the proposed modules by integrating them into three existing cross-modal retrieval models. Extensive experiments demonstrate that integrating each proposed module into the existing models steadily improves the performance of all models.
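The abstract does not give the formula for the weighted InfoNCE loss, so the following is only a minimal sketch of the general idea (harder negatives receive larger weights), not the authors' method. The function name `weighted_info_nce`, the softmax-based hardness weights, and the parameters `temperature` and `beta` are all assumptions made for illustration. The negative similarities could, for instance, come from a large memory bank of the kind the abstract mentions.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, temperature=0.1, beta=2.0):
    """Hypothetical weighted InfoNCE loss for a single query.

    pos_sim:  similarity between the query and its matched item
    neg_sims: similarities between the query and negative items
    beta:     assumed sharpness of the hardness weights;
              beta = 0 gives every negative weight 1 (plain InfoNCE)
    """
    # Assumed weighting scheme: softmax over beta * similarity, rescaled so
    # the weights average to 1. A more similar (harder) negative gets a
    # larger weight.
    top = max(neg_sims)
    raw = [math.exp(beta * (s - top)) for s in neg_sims]
    total = sum(raw)
    weights = [len(neg_sims) * r / total for r in raw]

    pos_term = math.exp(pos_sim / temperature)
    neg_term = sum(
        w * math.exp(s / temperature) for w, s in zip(weights, neg_sims)
    )
    return -math.log(pos_term / (pos_term + neg_term))


# Toy cosine similarities: one matched pair and three negatives.
plain_loss = weighted_info_nce(0.8, [0.7, 0.1, -0.3], beta=0.0)
weighted_loss = weighted_info_nce(0.8, [0.7, 0.1, -0.3], beta=2.0)
```

Under this assumed scheme, the weighted loss is never smaller than the plain one, because the hardest negative (similarity 0.7 here) contributes more to the denominator.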
Pages: 369 - 382
Page count: 14
Related papers
50 in total
  • [41] Semantic consistency hashing for cross-modal retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    [J]. NEUROCOMPUTING, 2016, 193 : 250 - 259
  • [42] Towards learning a semantic-consistent subspace for cross-modal retrieval
    Xu, Meixiang
    Zhu, Zhenfeng
    Zhao, Yao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01) : 389 - 412
  • [43] Analyzing semantic correlation for cross-modal retrieval
    Xie, Liang
    Pan, Peng
    Lu, Yansheng
    [J]. MULTIMEDIA SYSTEMS, 2015, 21 (06) : 525 - 539
  • [44] Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 825 - 833
  • [45] Towards learning a semantic-consistent subspace for cross-modal retrieval
    Meixiang Xu
    Zhenfeng Zhu
    Yao Zhao
    [J]. Multimedia Tools and Applications, 2019, 78 : 389 - 412
  • [46] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [47] Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes With Semantic Consistency and Attention Mechanism
    Wang, Hao
    Sahoo, Doyen
    Liu, Chenghao
    Shu, Ke
    Achananuparp, Palakorn
    Lim, Ee-peng
    Hoi, Steven C. H.
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2515 - 2525
  • [48] Multi-modal semantic autoencoder for cross-modal retrieval
    Wu, Yiling
    Wang, Shuhui
    Huang, Qingming
    [J]. NEUROCOMPUTING, 2019, 331 : 165 - 175
  • [49] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [50] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633