DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

Cited by: 0
Authors
Wang, Yaxian [1]
Wei, Bifan [1]
Liu, Jun [1]
Zhang, Lingling [2]
Wang, Jiaxin [2]
Wang, Qianying [3]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Shaanxi, Peoples R China
[3] Lenovo Res, Beijing 100094, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cognition; Visualization; Task analysis; Question answering (information retrieval); Semantics; Geometry; Routing; Diagram understanding; visual reasoning; diagram question answering;
DOI
10.1109/TIP.2023.3306910
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Diagram Question Answering (DQA) aims to correctly answer questions about given diagrams, which demands an interplay of good diagram understanding and effective reasoning. However, the same appearance of objects in diagrams can express different semantics. This kind of visual semantic ambiguity problem makes it challenging to represent diagrams sufficiently for better understanding. Moreover, since there are questions about diagrams from different perspectives, it is also crucial to perform flexible and adaptive reasoning on content-rich diagrams. In this paper, we propose a Disentangled Adaptive Visual Reasoning Network for DQA, named DisAVR, to jointly optimize the dual-process of representation and reasoning. DisAVR mainly comprises three modules: improved region feature learning, question parsing, and disentangled adaptive reasoning. Specifically, the improved region feature learning module is designed to first learn robust diagram representation by integrating detail-aware patch features and semantically-explicit text features with region features. Subsequently, the question parsing module decomposes the question into three types of question guidance including region, spatial relation and semantic relation guidance to dynamically guide subsequent reasoning. Next, the disentangled adaptive reasoning module decomposes the whole reasoning process by employing three visual reasoning cells to construct a soft fully-connected multi-layer stacked routing space. These three cells in each layer reason over object regions, semantic and spatial relations in the diagram under the corresponding question guidance. Moreover, an adaptive routing mechanism is designed to flexibly explore more optimal reasoning paths for specific diagram-question pairs. Extensive experiments on three DQA datasets demonstrate the superiority of our DisAVR.
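The soft, fully connected routing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cell function `toy_cell`, the dot-product routing score, the vector sizes, and all names below are assumptions made for illustration only. It shows three cells per layer (region, spatial relation, semantic relation), each reading a softmax-weighted mix of every cell output from the previous layer, with the weights depending on the question guidance.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def toy_cell(features, guidance):
    # Hypothetical stand-in for a reasoning cell: gate features by guidance.
    return [math.tanh(f * g) for f, g in zip(features, guidance)]

def route_layer(prev_outputs, guidances):
    # One soft fully connected layer: each cell mixes ALL previous outputs.
    new_outputs, layer_weights = [], []
    for g in guidances:
        # Assumed routing score: similarity between a previous output and
        # this cell's question guidance.
        w = softmax([dot(out, g) for out in prev_outputs])
        mixed = [sum(wi * out[d] for wi, out in zip(w, prev_outputs))
                 for d in range(len(g))]
        new_outputs.append(toy_cell(mixed, g))
        layer_weights.append(w)
    return new_outputs, layer_weights

def reason(diagram_features, guidances, num_layers=2):
    # Every cell starts from the same diagram representation.
    outputs = [list(diagram_features) for _ in guidances]
    all_weights = []
    for _ in range(num_layers):
        outputs, w = route_layer(outputs, guidances)
        all_weights.append(w)
    return outputs, all_weights

# Illustrative guidance vectors: region, spatial relation, semantic relation.
guidances = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
outputs, weights = reason([0.2, -0.4, 0.9], guidances)
```

Because the routing weights are computed from the inputs rather than fixed, different diagram-question pairs would follow different soft paths through the stacked layers, which is the intuition behind the adaptive routing mechanism named in the abstract.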
Pages: 4812-4827
Page count: 16