Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching

Cited by: 0
Authors
Shi, Haitao [1]
Liu, Meng [2]
Mu, Xiaoxuan [1]
Song, Xuemeng [3]
Hu, Yupeng [1]
Nie, Liqiang [4]
Affiliations
[1] Shandong Univ, Sch Software, Jinan, Peoples R China
[2] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan, Peoples R China
[3] Shandong Univ, Sch Comp Sci & Technol, Qingdao, Peoples R China
[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; image-text matching; noisy correspondence; similarity distribution modeling;
DOI
10.1145/3662732
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Unleashing the power of image-text matching in real-world applications is hampered by noisy correspondence. Manually curating high-quality datasets is expensive and time-consuming, and datasets generated using diffusion models are not sufficiently well aligned. The most promising approach is to collect image-text pairs from the Internet, but this inevitably introduces noisy correspondence. To reduce its negative impact, we propose a novel model that first transforms the noisy correspondence filtering problem into a similarity distribution modeling problem by exploiting the powerful capabilities of pre-trained models. Specifically, we use a Gaussian mixture model to model the similarities obtained by CLIP as a clean distribution and a noisy distribution, and thereby filter out most of the noisy correspondence in the dataset. Afterward, we use the relatively clean data to fine-tune the model. To further reduce the negative impact of the unfiltered noisy correspondence, i.e., the small portion where the two distributions intersect, we propose a distribution-sensitive dynamic margin ranking loss for the fine-tuning process, which further increases the distance between the two distributions. Through continuous iteration, the noisy correspondence gradually decreases and the model performance gradually improves. Our extensive experiments demonstrate the effectiveness and robustness of our model even under high noise rates.
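The filtering step described in the abstract, fitting a two-component Gaussian mixture to image-text similarity scores and keeping the pairs that most likely belong to the higher-similarity component, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_two_gaussians`, `clean_probability`, `filter_pairs`), the 0.5 posterior threshold, and the synthetic similarity values standing in for CLIP scores are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(xs, iters=200):
    """Fit a 2-component 1-D Gaussian mixture with EM (hypothetical helper).

    Returns (weights, means, variances), each a list of length 2.
    """
    ordered = sorted(xs)
    n = len(ordered)
    # Initialise the means at the lower and upper quartiles of the scores.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(xs) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def clean_probability(x, params):
    """Posterior probability that score x comes from the higher-mean component."""
    weights, means, variances = params
    clean = 0 if means[0] > means[1] else 1  # assumption: clean pairs score higher
    p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
    total = (p[0] + p[1]) or 1e-300
    return p[clean] / total


def filter_pairs(similarities, threshold=0.5):
    """Return indices of pairs whose clean posterior exceeds the threshold."""
    params = fit_two_gaussians(similarities)
    return [i for i, s in enumerate(similarities)
            if clean_probability(s, params) > threshold]


# Synthetic stand-in for CLIP similarities: 600 "clean" and 400 "noisy" pairs.
rng = random.Random(0)
sims = [rng.gauss(0.32, 0.03) for _ in range(600)] + \
       [rng.gauss(0.15, 0.03) for _ in range(400)]
kept = filter_pairs(sims)
print(len(kept), "of", len(sims), "pairs kept")
```

In this sketch, scores near the point where the two fitted Gaussians cross receive posteriors close to 0.5; these correspond to the small overlapping portion that the abstract says is handled during fine-tuning by the dynamic margin ranking loss, which is not reproduced here.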
Pages: 26
Related papers
50 in total
  • [1] More Grounded Image Captioning by Distilling Image-Text Matching Model
    Zhou, Yuanen
    Wang, Meng
    Liu, Daqing
    Hu, Zhenzhen
    Zhang, Hanwang
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 4776 - 4785
  • [2] Location Attention Knowledge Embedding Model for Image-Text Matching
    Xu, Guoqing
    Hu, Min
    Wang, Xiaohua
    Yang, Jiaoyun
    Li, Nan
    Zhang, Qingyu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 408 - 421
  • [3] Image-Text Matching Model Based on CLIP Bimodal Encoding
    Zhu, Yihuan
    Xu, Honghua
    Du, Ailin
    Wang, Bin
    APPLIED SCIENCES-BASEL, 2024, 14 (22):
  • [4] Similarity Reasoning and Filtration for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Ma, Lin
    Lu, Huchuan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226
  • [5] Asymmetric Polysemous Reasoning for Image-Text Matching
    Zhang, Hongping
    Yang, Ming
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1013 - 1022
  • [6] Visual Semantic Reasoning for Image-Text Matching
    Li, Kunpeng
    Zhang, Yulun
    Li, Kai
    Li, Yuanyuan
    Fu, Yun
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4653 - 4661
  • [7] IMAGE-TEXT MATCHING WITH SHARED SEMANTIC CONCEPTS
    Miao Lanxin
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022,
  • [8] Fusion layer attention for image-text matching
    Wang, Depeng
    Wang, Liejun
    Song, Shiji
    Huang, Gao
    Guo, Yuchen
    Cheng, Shuli
    Ao, Naixiang
    Du, Anyu
    NEUROCOMPUTING, 2021, 442 : 249 - 259
  • [9] Stacked Cross Attention for Image-Text Matching
    Lee, Kuang-Huei
    Chen, Xi
    Hua, Gang
    Hu, Houdong
    He, Xiaodong
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 212 - 228
  • [10] Giving Text More Imagination Space for Image-text Matching
    Dong, Xinfeng
    Han, Longfei
    Zhang, Dingwen
    Liu, Li
    Han, Junwei
    Zhang, Huaxiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6359 - 6368