Lightweight Visual Question Answering using Scene Graphs

Cited by: 12
Authors
Nuthalapati, Sai Vidyaranya [1]
Chandradevan, Ramraj [2]
Giunchiglia, Eleonora [1]
Li, Bowen [1]
Kayser, Maxime [1]
Lukasiewicz, Thomas [1]
Yang, Carl [2]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Emory Univ, Dept Comp Sci, Atlanta, GA USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
visual question answering; scene graphs; graph neural networks
DOI
10.1145/3459637.3482218
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual question answering (VQA) is a challenging problem in machine perception that requires a deep joint understanding of both visual and textual data. Recent research has advanced the automatic generation of high-quality scene graphs from images, while graph neural networks (GNNs) have proven effective at reasoning over graph-structured data. In this work, we propose to bridge the gap between scene graph generation and VQA by leveraging GNNs. In particular, we design a new model called Conditional Enhanced Graph ATtention network (CE-GAT) to encode pairs of visual and semantic scene graphs with both node and edge features. The model is integrated with a textual question encoder and generates answers through question-graph conditioning. Moreover, to alleviate the training difficulties of CE-GAT for VQA, we enforce more useful inductive biases in the scene graphs through novel question-guided graph enriching and pruning. Finally, we evaluate the framework on one of the largest available VQA datasets (namely, GQA) with ground-truth scene graphs, achieving an accuracy of 77.87%, compared with 63.17% for the state of the art (namely, the neural state machine (NSM)). Notably, by leveraging existing scene graphs, our framework is much lighter than end-to-end VQA methods (e.g., about 95.3% fewer parameters than a typical NSM).
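The abstract does not spell out how question-guided pruning works, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a node is relevant when its object name appears verbatim in the question, and that neighbours within a fixed number of hops are kept as context. The function name `prune_scene_graph`, the `hops` parameter, and the toy graph are all hypothetical.

```python
from collections import deque


def prune_scene_graph(nodes, edges, question, hops=1):
    """Keep nodes named in the question plus neighbours within `hops` edges.

    nodes: dict mapping node id -> object name (e.g. "man")
    edges: list of (src_id, relation, dst_id) triples
    Illustrative assumption only; the paper's actual pruning rule may differ.
    """
    # Crude tokenisation: lowercase, drop the question mark, split on spaces.
    tokens = set(question.lower().replace("?", " ").split())
    seeds = {n for n, name in nodes.items() if name.lower() in tokens}

    # Treat edges as undirected when collecting context around seed nodes.
    adjacency = {n: set() for n in nodes}
    for src, _, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    # Multi-source breadth-first search, stopping at depth `hops`.
    kept = set(seeds)
    frontier = deque((n, 0) for n in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for neighbour in adjacency[node]:
            if neighbour not in kept:
                kept.add(neighbour)
                frontier.append((neighbour, depth + 1))

    kept_nodes = {n: name for n, name in nodes.items() if n in kept}
    kept_edges = [(s, r, d) for s, r, d in edges if s in kept and d in kept]
    return kept_nodes, kept_edges


# Toy scene graph: a man wearing a hat, and an unrelated dog under a tree.
nodes = {0: "man", 1: "hat", 2: "dog", 3: "tree"}
edges = [(0, "wearing", 1), (2, "under", 3)]
pruned_nodes, pruned_edges = prune_scene_graph(
    nodes, edges, "What is the man wearing?"
)
```

In this toy case the dog and tree are dropped because neither is mentioned in the question nor adjacent to a mentioned object. A smaller graph of this kind is what a downstream graph encoder would then consume.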
Pages: 3353-3357
Number of pages: 5