Learning with Noisy Correspondence for Cross-modal Matching

Cited by: 0
Authors
Huang, Zhenyu [1, 2]
Niu, Guocheng [2]
Liu, Xiao [3]
Ding, Wenbiao [3]
Xiao, Xinyan [2]
Wu, Hua [2]
Peng, Xi [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
[2] Baidu Inc, Beijing, Peoples R China
[3] TAL Educ Grp, Beijing, Peoples R China
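Funding
National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract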
Cross-modal matching, which aims to establish the correspondence between two different modalities, is fundamental to a variety of tasks such as cross-modal retrieval and vision-and-language understanding. Although a huge number of cross-modal matching methods have been proposed and have achieved remarkable progress in recent years, almost all of them implicitly assume that the multimodal training data are correctly aligned. In practice, however, such an assumption is extremely expensive or even impossible to satisfy. Based on this observation, we reveal and study a latent and challenging direction in cross-modal matching, named noisy correspondence, which can be regarded as a new paradigm of noisy labels. Unlike traditional noisy labels, which mainly refer to errors in category labels, noisy correspondence refers to mismatched paired samples. To solve this new problem, we propose a novel method for learning with noisy correspondence, named Noisy Correspondence Rectifier (NCR). In brief, NCR divides the data into clean and noisy partitions based on the memorization effect of neural networks and then rectifies the correspondence via an adaptive prediction model in a co-teaching manner. We use image-text matching as a showcase, and extensive experiments on Flickr30K, MS-COCO, and Conceptual Captions verify the effectiveness of our method. The code can be accessed from www.pengxi.me.
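The abstract does not say how the clean and noisy partitions are computed. As a rough illustration only, the sketch below assumes one common way of exploiting the memorization effect: pairs that a network fits early tend to have small loss, so a two-component Gaussian mixture is fitted to per-sample losses and the low-mean component is treated as "clean". The function name `clean_probabilities`, the 0.5 threshold, and the toy loss values are hypothetical and are not taken from the paper or its released code.

```python
import math


def clean_probabilities(losses, iters=50):
    """Fit a 1-D two-component Gaussian mixture to per-sample losses with EM.

    Returns, for each loss, the posterior probability that it belongs to the
    low-mean component (assumed here to be the clean partition).
    """
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    means = [lo, hi]                    # start the components at the extremes
    variances = [span ** 2 / 4.0] * 2
    weights = [0.5, 0.5]
    eps = 1e-8                          # keeps responsibilities well defined

    def pdf(x, mean, var):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [weights[k] * pdf(x, means[k], variances[k]) + eps for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate mixture weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)   # floor to avoid a collapsed component
            weights[k] = nk / len(losses)

    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]


# Toy per-sample losses: five well-fitted pairs and three poorly fitted ones.
toy_losses = [0.05, 0.08, 0.10, 0.12, 0.15, 1.9, 2.0, 2.1]
is_clean = [p > 0.5 for p in clean_probabilities(toy_losses)]
```

In a co-teaching setup, each of two networks would typically use the partition produced by its peer rather than its own, so that one network's selection errors are not reinforced by itself; that step, and the rectification of the noisy pairs, are outside this sketch.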
Pages: 14
Related papers
50 items in total
  • [1] Learning From Noisy Correspondence With Tri-Partition for Cross-Modal Matching
    Feng, Zerun
    Zeng, Zhimin
    Guo, Caili
    Li, Zheng
    Hu, Lin
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 3884 - 3896
  • [2] UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching
    Zha, Quanxing
    Liu, Xin
    Cheung, Yiu-ming
    Xu, Xing
    Wang, Nannan
    Cao, Jianjia
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 852 - 861
  • [3] Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval
    Qin, Yang
    Peng, Dezhong
    Peng, Xi
    Wang, Xu
    Hu, Peng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4948 - 4956
  • [4] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [5] Negative Pre-aware for Noisy Cross-Modal Matching
    Zhang, Xu
    Li, Hao
    Ye, Mang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024, : 7341 - 7349
  • [6] Quaternion Representation Learning for cross-modal matching
    Wang, Zheng
    Xu, Xing
    Wei, Jiwei
    Xie, Ning
    Shao, Jie
    Yang, Yang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 270
  • [7] Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining
    Ma, Xinran
    Yang, Mouxing
    Li, Yunfan
    Hu, Peng
    Lv, Jiancheng
    Peng, Xi
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2587 - 2598
  • [8] Disentangled Representation Learning for Cross-Modal Biometric Matching
    Ning, Hailong
    Zheng, Xiangtao
    Lu, Xiaoqiang
    Yuan, Yuan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1763 - 1774
  • [9] Learning Coupled Feature Spaces for Cross-modal Matching
    Wang, Kaiye
    He, Ran
    Wang, Wei
    Wang, Liang
    Tan, Tieniu
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 2088 - 2095
  • [10] Neighborhood Learning from Noisy Labels for Cross-Modal Retrieval
    Li, Runhao
    Weng, Zhenyu
    Zhuang, Huiping
    Chen, Yongming
    Lin, Zhiping
    [J]. 2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023