Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching

Cited by: 0
Authors
Zhang, Huatian [1]
Zhang, Lei [1]
Zhang, Kun [1]
Mao, Zhendong [1]
Affiliation
[1] University of Science and Technology of China, Hefei, People's Republic of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text matching bridges vision and language and is a fundamental task in multimodal intelligence. Its key challenge lies in capturing visual-semantic relevance. Fine-grained semantic interactions arise from fragment alignments between image regions and text words. However, not all fragments contribute to image-text relevance, and many existing methods are devoted to mining the vital ones in order to measure relevance accurately. How well an image and a text relate depends on the degree of semantic sharing between them. Treating this degree as an effect and the fragments as its possible causes, we define the causes that are indispensable for generating the degree as necessary undertakers: if any one of them did not occur, the relevance would no longer hold. In this paper, we revisit image-text matching from a causal view and uncover inherent causal properties of relevance generation. We then propose a novel theoretical prototype for estimating the probability of necessity of fragments, PNf, for the degree of semantic sharing by means of causal inference, and further design a Necessary Undertaker Identification Framework (NUIF) for image-text matching, which explicitly formalizes each fragment's contribution to image-text relevance by modeling PNf in two ways. Extensive experiments show that our method achieves state-of-the-art performance on the Flickr30K and MSCOCO benchmarks.
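The abstract does not say how PNf is computed. The following is only a hypothetical sketch of the general notion of necessity it builds on, and it is not NUIF. pn_monotonic applies Pearl's standard identification of the probability of necessity, PN = (P(y|x) - P(y|x')) / P(y|x), which is valid only under the assumptions of exogeneity and monotonicity. necessary_words runs a deterministic, per-pair "but-for" test in which a word counts as necessary if deleting it pushes relevance below a threshold. Several choices here are assumptions made purely for illustration: the relevance function (mean over words of each word's best cosine similarity to any region), the threshold, and the use of deletion to stand in for the counterfactual "did not occur".

```python
import numpy as np


def pn_monotonic(p_y_given_x: float, p_y_given_not_x: float) -> float:
    """Pearl's probability of necessity, identified under exogeneity and
    monotonicity: PN = (P(y|x) - P(y|x')) / P(y|x)."""
    if p_y_given_x <= 0:
        raise ValueError("P(y|x) must be positive")
    if p_y_given_not_x > p_y_given_x:
        raise ValueError("monotonicity requires P(y|x') <= P(y|x)")
    return (p_y_given_x - p_y_given_not_x) / p_y_given_x


def relevance(regions: np.ndarray, words: np.ndarray) -> float:
    """Assumed toy relevance: each word's best cosine similarity to any
    region, averaged over words. Not the paper's scoring function."""
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    sim = w @ r.T  # shape (n_words, n_regions)
    return float(sim.max(axis=1).mean())


def necessary_words(regions: np.ndarray, words: np.ndarray,
                    threshold: float) -> list:
    """Indices of words whose deletion drops relevance below `threshold`
    (a deterministic, unit-level necessity test; threshold is assumed)."""
    if relevance(regions, words) < threshold:
        return []  # relevance is not established, so nothing is "necessary"
    necessary = []
    for i in range(len(words)):
        rest = np.delete(words, i, axis=0)  # counterfactual: word i absent
        # An empty text is treated as carrying no relevance at all.
        if len(rest) == 0 or relevance(regions, rest) < threshold:
            necessary.append(i)
    return necessary
```

For example, with two unit-vector regions and the words [1, 0], [0, 1] and [-1, 0], the baseline relevance is 2/3. At a threshold of 0.6, only the first two words are flagged as necessary. Deleting the third word, which matches no region, actually raises the mean.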
Pages: 7105-7114
Page count: 10
Related papers
50 in total
  • [21] Characterization and classification of semantic image-text relations
    Otto, Christian
    Springstein, Matthias
    Anand, Avishek
    Ewerth, Ralph
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2020, 9 (01) : 31 - 45
  • [22] Bi-Directional Spatial-Semantic Attention Networks for Image-Text Matching
    Huang, Feiran
    Zhang, Xiaoming
    Zhao, Zhonghua
    Li, Zhoujun
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) : 2008 - 2020
  • [23] Multi-level Local Alignment and Semantic Matching Network for Image-Text Retrieval
    Jiang, Zhukai
    Lian, Zhichao
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 212 - 224
  • [25] Similarity Reasoning and Filtration for Image-Text Matching
    Diao, Haiwen
    Zhang, Ying
    Ma, Lin
    Lu, Huchuan
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1218 - 1226
  • [26] Asymmetric Polysemous Reasoning for Image-Text Matching
    Zhang, Hongping
    Yang, Ming
    [J]. 2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1013 - 1022
  • [27] Fusion layer attention for image-text matching
    Wang, Depeng
    Wang, Liejun
    Song, Shiji
    Huang, Gao
    Guo, Yuchen
    Cheng, Shuli
    Ao, Naixiang
    Du, Anyu
    [J]. NEUROCOMPUTING, 2021, 442 : 249 - 259
  • [28] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [29] Stacked Cross Attention for Image-Text Matching
    Lee, Kuang-Huei
    Chen, Xi
    Hua, Gang
    Hu, Houdong
    He, Xiaodong
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 212 - 228
  • [30] Context-Aware Multi-View Summarization Network for Image-Text Matching
    Qu, Leigang
    Liu, Meng
    Cao, Da
    Nie, Liqiang
    Tian, Qi
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1047 - 1055