DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering

被引:0
|
作者
Wang, Yaxian [1 ]
Wei, Bifan [1 ]
Liu, Jun [1 ]
Zhang, Lingling [2 ]
Wang, Jiaxin [2 ]
Wang, Qianying [3 ]
机构
[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Natl Engn Lab Big Data Analyt, Xian 710049, Shaanxi, Peoples R China
[2] Xi An Jiao Tong Univ, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Shaanxi, Peoples R China
[3] Lenovo Res, Beijing 100094, Peoples R China
基金
中国国家自然科学基金;
关键词
Cognition; Visualization; Task analysis; Question answering (information retrieval); Semantics; Geometry; Routing; Diagram understanding; visual reasoning; diagram question answering;
D O I
10.1109/TIP.2023.3306910
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Diagram Question Answering (DQA) aims to correctly answer questions about given diagrams, which demands an interplay of good diagram understanding and effective reasoning. However, the same appearance of objects in diagrams can express different semantics. This kind of visual semantic ambiguity problem makes it challenging to represent diagrams sufficiently for better understanding. Moreover, since there are questions about diagrams from different perspectives, it is also crucial to perform flexible and adaptive reasoning on content-rich diagrams. In this paper, we propose a Disentangled Adaptive Visual Reasoning Network for DQA, named DisAVR, to jointly optimize the dual-process of representation and reasoning. DisAVR mainly comprises three modules: improved region feature learning, question parsing, and disentangled adaptive reasoning. Specifically, the improved region feature learning module is designed to first learn robust diagram representation by integrating detail-aware patch features and semantically-explicit text features with region features. Subsequently, the question parsing module decomposes the question into three types of question guidance including region, spatial relation and semantic relation guidance to dynamically guide subsequent reasoning. Next, the disentangled adaptive reasoning module decomposes the whole reasoning process by employing three visual reasoning cells to construct a soft fully-connected multi-layer stacked routing space. These three cells in each layer reason over object regions, semantic and spatial relations in the diagram under the corresponding question guidance. Moreover, an adaptive routing mechanism is designed to flexibly explore more optimal reasoning paths for specific diagram-question pairs. Extensive experiments on three DQA datasets demonstrate the superiority of our DisAVR.
引用
收藏
页码:4812 / 4827
页数:16
相关论文
共 50 条
  • [1] Disentangled Retrieval and Reasoning for Implicit Question Answering
    Liu, Qian
    Geng, Xiubo
    Wang, Yu
    Cambria, Erik
    Jiang, Daxin
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (06) : 7804 - 7815
  • [2] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    [J]. APPLIED INTELLIGENCE, 2024, 54 (06) : 5062 - 5080
  • [3] Semantic Relation Graph Reasoning Network for Visual Question Answering
    Lan, Hong
    Zhang, Pufen
    [J]. TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719
  • [4] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002
  • [5] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    [J]. PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
  • [6] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [7] A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering
    Zhang, Zixiao
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Chen, Puhua
    Liu, Fang
    Li, Yuxuan
    Guo, Zhicheng
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [8] Cascade Reasoning Network for Text-based Visual Question Answering
    Liu, Fen
    Xu, Guanghui
    Wu, Qi
    Du, Qing
    Jia, Wei
    Tan, Mingkui
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4060 - 4069
  • [9] Cross-modal Relational Reasoning Network for Visual Question Answering
    Chen, Hongyu
    Liu, Ruifang
    Peng, Bo
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3939 - 3948
  • [10] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
    Yang, Zhuoqian
    Qin, Zengchang
    Yu, Jing
    Wan, Tao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415