Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval

Cited by: 25
Authors
Qin, Yang [1 ]
Peng, Dezhong [1 ,2 ]
Peng, Xi [1 ]
Wang, Xu [1 ]
Hu, Peng [1 ]
Affiliations
[1] Sichuan Univ, Chengdu, Peoples R China
[2] Chengdu Ruibei Yingte Informat Technol Co Ltd, Chengdu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Image-text matching; Evidential learning; Noisy correspondence;
DOI
10.1145/3503161.3547922
CLC classification
TP39 [Computer Applications];
Discipline codes
081203; 0835;
Abstract
Cross-modal retrieval has been a compelling topic in the multimodal community. Recently, to mitigate the high cost of data collection, co-occurring pairs (e.g., image and text) have been harvested from the Internet to build large-scale cross-modal datasets, e.g., Conceptual Captions. However, this unavoidably introduces noise (i.e., mismatched pairs) into the training data, dubbed noisy correspondence. Such noise makes the supervision unreliable/uncertain and remarkably degrades retrieval performance. Moreover, most existing methods focus training on hard negatives, which further amplifies the unreliability caused by the noise. To address these issues, we propose a generalized Deep Evidential Cross-modal Learning framework (DECL), which integrates a novel Cross-modal Evidential Learning paradigm (CEL) and a Robust Dynamic Hinge loss (RDH) with positive and negative learning. CEL captures and learns the uncertainty brought by the noise to improve the robustness and reliability of cross-modal retrieval. Specifically, the bidirectional evidence based on cross-modal similarity is first modeled and parameterized into a Dirichlet distribution, which not only provides accurate uncertainty estimation but also imparts resilience against noisy correspondence. To address the amplification problem, RDH smoothly increases the hardness of the negatives it focuses on, thus remaining robust under high noise rates. Extensive experiments on three image-text benchmark datasets, i.e., Flickr30K, MS-COCO, and Conceptual Captions, verify the effectiveness and efficiency of the proposed method. The code is available at https://github.com/QinYang79/DECL.
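The abstract describes two mechanisms: CEL turns cross-modal similarities into evidence for a Dirichlet distribution whose total strength yields an uncertainty estimate, and RDH anneals how hard the mined negatives are during training. The PyTorch sketch below illustrates both ideas under loud assumptions: the ReLU evidence transform, the temperature schedule, and all hyperparameters are illustrative guesses, not DECL's actual formulation (see the linked repository for that).

```python
# Hedged sketch of the two components named in the abstract; it follows the
# standard evidential-learning recipe (evidence e >= 0, alpha = e + 1,
# uncertainty u = K / sum(alpha)) rather than DECL's exact equations.
import torch
import torch.nn.functional as F

def dirichlet_uncertainty(sim, tau=0.1):
    """Map one direction of a similarity matrix to Dirichlet parameters.

    The non-negative evidence transform (ReLU over scaled similarities)
    is an assumption made here for illustration.
    """
    evidence = F.relu(sim / tau)              # non-negative evidence per candidate
    alpha = evidence + 1.0                    # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True) # total Dirichlet strength S
    k = sim.size(1)                           # number of candidates
    uncertainty = k / strength                # high when total evidence is low
    belief = evidence / strength              # per-candidate belief masses
    return alpha, belief, uncertainty

def dynamic_hinge_loss(sim, margin=0.2, hardness=0.0):
    """Hinge loss whose focus anneals from average to hard negatives.

    hardness in [0, 1]: 0 averages over all negatives (gentle under noise),
    1 approaches the hardest negative (VSE++-style hard mining). The smooth
    schedule is one plausible reading of RDH, not the paper's exact loss;
    only the image-to-text direction is shown for brevity.
    """
    pos = sim.diag().unsqueeze(1)             # matched-pair scores
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0.0)
    temp = 1e-3 + hardness * 10.0             # assumed annealing schedule
    scores = (cost * temp).masked_fill(mask, float('-inf'))
    weights = torch.softmax(scores, dim=1)    # mean -> max as temp grows
    return (weights * cost).sum(dim=1).mean()

# Toy usage: a batch of 4 image/text embeddings.
img = F.normalize(torch.randn(4, 128), dim=1)
txt = F.normalize(torch.randn(4, 128), dim=1)
sim = img @ txt.t()                                   # cosine similarities
alpha_i2t, _, u_i2t = dirichlet_uncertainty(sim)      # image -> text evidence
alpha_t2i, _, u_t2i = dirichlet_uncertainty(sim.t())  # text -> image evidence
loss = dynamic_hinge_loss(sim, hardness=0.5)          # mid-training hardness
```

At hardness 0 the softmax temperature is near zero, so the loss weights all negatives roughly equally; as hardness grows, the weighting concentrates on the costliest negative, which matches the abstract's claim that RDH "smoothly increases the hardness of negatives focused on".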
Pages: 4948-4956
Number of pages: 9