Synthesizing Counterfactual Samples for Effective Image-Text Matching

Cited by: 2
Authors
Wei, Hao [1 ,2 ]
Wang, Shuhui [1 ,3 ]
Han, Xinzhe [1 ,2 ]
Xue, Zhe [4 ]
Ma, Bin [5 ]
Wei, Xiaoming [5 ]
Wei, Xiaolin [5 ]
Affiliations
[1] Chinese Acad Sci, Inst Comput Tech, Key Lab Intell Info Proc, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] BUPT, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
[5] Meituan Inc, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Image-Text Matching; Hard Negative Mining; Causal Effects; Counterfactual Reasoning; SIMILARITY;
DOI
10.1145/3503161.3547814
Chinese Library Classification
TP39 [Computer Applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Image-text matching is a fundamental research topic bridging vision and language. Recent works use hard negative mining to capture the multiple correspondences between the visual and textual domains. Unfortunately, truly informative negative samples are quite sparse in the training data and are hard to obtain from a randomly sampled mini-batch alone. Motivated by causal inference, we aim to overcome this shortcoming by carefully analyzing the analogy between hard negative mining and the optimization of causal effects. Further, we propose the Counterfactual Matching (CFM) framework for more effective image-text correspondence mining. CFM contains three major components: Gradient-Guided Feature Selection for automatic causal factor identification, Self-Exploration for causal factor completeness, and Self-Adjustment for counterfactual sample synthesis. Compared with traditional hard negative mining, our method largely alleviates over-fitting and effectively captures the fine-grained correlations between the image and text modalities. We evaluate CFM in combination with three state-of-the-art image-text matching architectures. Quantitative and qualitative experiments conducted on two publicly available datasets demonstrate its strong generality and effectiveness. Code is available at https://github.com/weihao20/cfm.
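For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of conventional in-batch hard negative mining with a max-of-hinges triplet loss. It is not the CFM method and is not taken from the authors' repository; the function name `hardest_negative_triplet_loss`, the margin value, and the use of cosine similarity are illustrative assumptions.

```python
import numpy as np


def hardest_negative_triplet_loss(img, txt, margin=0.2):
    """Hypothetical baseline: hinge loss on the hardest in-batch negative.

    img[i] and txt[i] are assumed to be embeddings of a matched pair;
    every other row in the batch is treated as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)

    sim = img @ txt.T            # sim[i, j]: image i vs. caption j
    pos = np.diag(sim)           # similarity of each matched pair

    # Hinge cost of every negative caption (per image) and negative image (per caption).
    cost_i2t = np.clip(margin + sim - pos[:, None], 0.0, None)
    cost_t2i = np.clip(margin + sim - pos[None, :], 0.0, None)

    # The matched pair itself is not a negative.
    diag = np.eye(sim.shape[0], dtype=bool)
    cost_i2t[diag] = 0.0
    cost_t2i[diag] = 0.0

    # Keep only the hardest negative per query, which is why the loss
    # depends so heavily on what the mini-batch happens to contain.
    return float(cost_i2t.max(axis=1).sum() + cost_t2i.max(axis=0).sum())
```

Because the hardest negative is chosen only among the items that happen to share a mini-batch, a randomly sampled batch rarely contains a truly informative one, which is the sparsity problem the abstract describes.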
Pages: 4355-4364
Number of pages: 10