ECENet: Explainable and Context-Enhanced Network for Multi-modal Fact Verification

Cited by: 1
Authors
Zhang, Fanrui [1 ]
Liu, Jiawei [1 ]
Zhang, Qiang [1 ]
Sun, Esther [2 ]
Xie, Jingyi [1 ]
Zha, Zheng-Jun [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Univ Toronto, Toronto, ON, Canada
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Multi-modal fact verification; Attention mechanism; Deep reinforcement learning; Interpretability;
DOI
10.1145/3581783.3612183
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recently, falsified claims incorporating both text and images have been disseminated more effectively than those containing text alone, raising significant concerns for multi-modal fact verification. Existing research has contributed to multi-modal feature extraction and interaction, but fails to fully exploit and enhance the valuable and intricate semantic relationships between distinct features. Moreover, most detectors merely provide a single outcome judgment and lack an inference process or explanation. Taking these factors into account, we propose a novel Explainable and Context-Enhanced Network (ECENet) for multi-modal fact verification, making the first attempt to integrate multi-clue feature extraction, multi-level feature reasoning, and justification (explanation) generation within a unified framework. Specifically, we propose an Improved Coarse- and Fine-grained Attention Network, equipped with two types of level-grained attention mechanisms, to facilitate a comprehensive understanding of contextual information. Furthermore, we propose a novel justification generation module via deep reinforcement learning that does not require additional labels. In this module, a sentence extractor agent measures the relevance between the query claim and each document sentence at each time step, selecting a suitable number of high-scoring sentences to be rewritten as the explanation of the model. Extensive experiments demonstrate the effectiveness of the proposed method.
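To make the sentence-selection step described above concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: an extractor scores each document sentence against the query claim and keeps the top-k sentences as candidates for the rewritten explanation. The class name `SentenceExtractor`, the bilinear scoring head, and the fixed top-k selection are assumptions introduced for illustration; the reinforcement-learning training loop and the rewriting step mentioned in the abstract are omitted.

```python
# Illustrative sketch only (not ECENet's actual code): score claim-sentence
# relevance and pick the top-k sentences as raw material for an explanation.
import torch
import torch.nn as nn


class SentenceExtractor(nn.Module):
    """Scores claim-sentence pairs; higher score = more useful as evidence."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Bilinear relevance scorer over pre-computed claim/sentence embeddings
        # (an assumed design choice, not taken from the paper).
        self.scorer = nn.Bilinear(dim, dim, 1)

    def forward(self, claim_emb: torch.Tensor, sent_embs: torch.Tensor) -> torch.Tensor:
        # claim_emb: (dim,), sent_embs: (num_sentences, dim)
        claim = claim_emb.unsqueeze(0).expand_as(sent_embs).contiguous()
        return self.scorer(claim, sent_embs).squeeze(-1)  # (num_sentences,) scores


def select_evidence(scores: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return indices of the k highest-scoring sentences."""
    k = min(k, scores.numel())
    return torch.topk(scores, k).indices


if __name__ == "__main__":
    extractor = SentenceExtractor(dim=256)
    claim = torch.randn(256)            # stand-in claim embedding
    sentences = torch.randn(12, 256)    # stand-in document sentence embeddings
    scores = extractor(claim, sentences)
    print(select_evidence(scores, k=3))  # sentences to rewrite as the explanation
```

In the paper's setting the selection policy is trained with deep reinforcement learning and no extra labels; a sketch like the one above would correspond only to the scoring and greedy-selection portion of that module.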
Pages: 1231-1240
Number of pages: 10
Related Papers
50 records in total
  • [41] Event-Enhanced Multi-Modal Spiking Neural Network for Dynamic Obstacle Avoidance
    Wang, Yang
    Dong, Bo
    Zhang, Yuji
    Zhou, Yunduo
    Mei, Haiyang
    Wei, Ziqi
    Yang, Xin
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3138 - 3148
  • [42] GRNet: a graph reasoning network for enhanced multi-modal learning in scene text recognition
    Jia, Zeguang
    Wang, Jianming
    Jin, Rize
    COMPUTER JOURNAL, 2024,
  • [43] GCMR-Net: A Global Context-Enhanced Multi-scale Residual Network for medical image segmentation
    Shi, Anqi
    Shu, Xin
    Xu, Dan
    Wang, Fang
    Multimedia Systems, 2025, 31 (01)
  • [44] An Explainable Multi-Modal Hierarchical Attention Model for Developing Phishing Threat Intelligence
    Chai, Yidong
    Zhou, Yonghang
    Li, Weifeng
    Jiang, Yuanchun
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2022, 19 (02) : 790 - 803
  • [45] On Enhancing Usability of Hindi ATM Banking with Multi-Modal UI and Explainable UX
    Dept. of CSE, Graphic Era University, Dehradun, India
    World Conf. Commun. Comput., WCONF,
  • [46] CrowdGraph: A Crowdsourcing Multi-modal Knowledge Graph Approach to Explainable Fauxtography Detection
    Kou Z.
    Zhang Y.
    Zhang D.
    Wang D.
    Proceedings of the ACM on Human-Computer Interaction, 2022, 6 (CSCW2)
  • [47] An explainable deep learning pipeline for multi-modal multi-organ medical image segmentation
    Mylona, E.
    Zaridis, D.
    Grigoriadis, G.
    Tachos, N.
    Fotiadis, D. I.
    RADIOTHERAPY AND ONCOLOGY, 2022, 170 : S275 - S276
  • [48] Societal context-dependent multi-modal transportation network augmentation in Johannesburg, South Africa
    Moyo, Thembani
    Kibangou, Alain Y.
    Musakwa, Walter
    PLOS ONE, 2021, 16 (04):
  • [49] Joint learning of video scene detection and annotation via multi-modal adaptive context network
    Xu, Yifei
    Pan, Litong
    Sang, Weiguang
    Luo, Hailun
    Li, Li
    Wei, Pingping
    Zhu, Li
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [50] Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
    Huang, Hantao
    Han, Tao
    Han, Wei
    Yap, Deep
    Chiang, Cheng-Ming
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 1173 - 1180