Integrating Language Guidance Into Image-Text Matching for Correcting False Negatives

Cited by: 3
Authors
Li, Zheng [1 ]
Guo, Caili [1 ]
Feng, Zerun [1 ]
Hwang, Jenq-Neng [2 ]
Du, Zhongtian [3 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[2] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA
[3] China Telecom Digital Intelligence Technol Co Ltd, Beijing 100035, Peoples R China
Keywords
Correcting false negatives; image-text matching; language guidance; relevance feedback; representation
DOI
10.1109/TMM.2023.3261443
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image-Text Matching (ITM) aims to establish the correspondence between images and sentences and is fundamental to many vision-and-language understanding tasks. However, the way existing ITM benchmarks are constructed has a limitation. A benchmark collects image-sentence pairs during construction, so only the samples paired at collection time are annotated as positive, and all other samples are annotated as negative. Many true correspondences are therefore missed among the samples annotated as negative. For example, a sentence is paired with only one image at collection time, so only that image is annotated as positive for the sentence and all other images are annotated as negative, even though some of them may also match the sentence. These mislabeled samples are called false negatives. Existing ITM models are optimized on annotations that contain such mislabels, which can introduce noise during training. In this paper, we propose an ITM framework that integrates Language Guidance (LG) to correct false negatives. A pre-trained language model is introduced into the ITM framework to identify false negatives. To correct them, we propose a language guidance loss, which adaptively corrects the locations of false negatives in the visual-semantic embedding space. Extensive experiments on two ITM benchmarks show that our method can improve the performance of existing ITM models. To verify how well it corrects false negatives, we conduct further experiments on ECCV Caption, a verified dataset in which false negatives in the annotations have been corrected. The experimental results show that our method can recall more relevant false negatives.
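The abstract does not give the form of the language guidance loss, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a hinge-based triplet loss over in-batch negatives in which the margin for each negative caption shrinks as its language similarity to the annotated caption grows, so that likely false negatives are pushed away less. The function name `language_guided_triplet_loss`, the `lang_sim` matrix (assumed to come from some pre-trained language model, with values in [0, 1]), and the margin rule are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def language_guided_triplet_loss(img_emb, txt_emb, lang_sim, margin=0.2):
    """Hinge loss over in-batch negatives with a per-pair margin (hypothetical).

    img_emb, txt_emb: lists of n embedding vectors; index i in both lists
        is an annotated positive pair.
    lang_sim: n x n caption-to-caption similarities in [0, 1], assumed to be
        produced by a pre-trained language model (not computed here).
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        pos = dot(img_emb[i], txt_emb[i])      # score of the annotated pair
        for j in range(n):
            if j == i:
                continue
            neg = dot(img_emb[i], txt_emb[j])  # score of an annotated negative
            # Assumption: captions that are close in language space get a
            # smaller margin, since they may be false negatives.
            adaptive_margin = margin * (1.0 - lang_sim[i][j])
            total += max(0.0, adaptive_margin + neg - pos)
    return total / n


# Toy batch: two image-caption pairs with 2-D unit-norm embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[1.0, 0.0], [0.6, 0.8]]

no_guidance = [[0.0, 0.0], [0.0, 0.0]]    # every negative treated as a true negative
with_guidance = [[1.0, 0.9], [0.9, 1.0]]  # the two captions are near-paraphrases

plain = language_guided_triplet_loss(images, captions, no_guidance, margin=0.5)
guided = language_guided_triplet_loss(images, captions, with_guidance, margin=0.5)
```

In this toy batch the guided loss is no larger than the plain one, because the penalty on the caption that the assumed language model rates as a near-paraphrase is relaxed.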
Pages: 103-116
Number of pages: 14