Visual Question Answering based on Formal Logic

Cited by: 3
Authors
Sethuraman, Muralikrishnna G. [1]
Payani, Ali [2]
Fekri, Faramarz [1]
Kerce, J. Clayton [3]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Cisco, Placer Cty, CA USA
[3] Georgia Inst Technol, Georgia Tech Res Inst, Atlanta, GA 30332 USA
Keywords
Visual Question Answering; formal logic; transformers; interpretable learning
DOI
10.1109/ICMLA52953.2021.00157
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Visual question answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenges of understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answer. To achieve this, we take a symbolic-reasoning approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showing that formal logic is a viable tool for tackling visual question answering. Our model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on just 10% of the training data.
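The three-step pipeline in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fact encoding, the predicate names (`red`, `cube`, `left_of`), the object identifiers, and the helpers `groundings`, `exists`, and `count` are all assumptions made for illustration. Steps (i) and (ii) are replaced by hand-written facts and a hand-written clause, and step (iii) is done by brute-force grounding rather than by any particular solver.

```python
from itertools import product

# Step (i), assumed output: background facts as they might be read off a scene graph.
# Each fact is a tuple (predicate, arg1[, arg2]); all names here are hypothetical.
FACTS = {
    ("cube", "o1"),
    ("red", "o1"),
    ("sphere", "o2"),
    ("blue", "o2"),
    ("left_of", "o1", "o2"),
}
OBJECTS = ["o1", "o2"]


def is_var(term):
    """Convention for this sketch: terms starting with an uppercase letter are variables."""
    return term[:1].isupper()


def groundings(clause, facts=FACTS, objects=OBJECTS):
    """Return every variable binding under which all atoms of the clause are known facts."""
    variables = sorted({t for atom in clause for t in atom[1:] if is_var(t)})
    results = []
    for values in product(objects, repeat=len(variables)):
        binding = dict(zip(variables, values))
        # Substitute bound values for variables; constants pass through unchanged.
        grounded = [(atom[0], *(binding.get(t, t) for t in atom[1:])) for atom in clause]
        if all(g in facts for g in grounded):
            results.append(binding)
    return results


def exists(clause):
    """Yes/no question: is the conjunctive clause satisfiable against the facts?"""
    return bool(groundings(clause))


def count(clause, var):
    """Counting question: number of distinct objects that can fill `var`."""
    return len({b[var] for b in groundings(clause)})


# Step (ii), assumed output: "Is there a red cube to the left of a sphere?"
QUESTION = [("red", "X"), ("cube", "X"), ("left_of", "X", "Y"), ("sphere", "Y")]

if __name__ == "__main__":
    print(exists(QUESTION))       # step (iii): satisfiability check
    print(groundings(QUESTION))   # the bindings that justify the answer
```

Because the answer is derived from explicit bindings, each result can be traced back to the facts that support it, which is the kind of step-by-step inspection the abstract refers to as interpretability. Exhaustive enumeration is exponential in the number of variables and is used here only to keep the sketch short.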
Pages: 952-957
Page count: 6
Related papers
50 in total
  • [31] A Transformer-based Medical Visual Question Answering Model
    Liu, Lei
    Su, Xiangdong
    Guo, Hui
    Zhu, Daobin
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1712 - 1718
  • [32] Visual Question Answering based on multimodal triplet knowledge accumuation
    Wang, Fengjuan
    An, Gaoyun
    [J]. 2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 81 - 84
  • [33] Counting Attention Based on Classification Confidence for Visual Question Answering
    Chen, Mingqin
    Wang, Yilei
    Chen, Shan
    Wu, Yingjie
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1173 - 1179
  • [34] Research on Visual Question Answering Based on GAT Relational Reasoning
    Yalin Miao
    Wenfang Cheng
    Shuyun He
    Hui Jiang
    [J]. Neural Processing Letters, 2022, 54 : 1435 - 1448
  • [35] Research on Visual Question Answering Based on GAT Relational Reasoning
    Miao, Yalin
    Cheng, Wenfang
    He, Shuyun
    Jiang, Hui
    [J]. NEURAL PROCESSING LETTERS, 2022, 54 (02) : 1435 - 1448
  • [36] Detection-Based Intermediate Supervision For Visual Question Answering
    Liu, Yuhang
    Peng, Daowan
    Wei, Wei
    Fu, Yuanyuan
    Xie, Wenfeng
    Chen, Dangyang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 14061 - 14068
  • [37] Extending a Logic-Based Question Answering System for Administrative Texts
    Gloeckner, Ingo
    Pelzer, Bjoern
    [J]. MULTILINGUAL INFORMATION ACCESS EVALUATION I: TEXT RETRIEVAL EXPERIMENTS, 2010, 6241 : 265 - +
  • [38] Towards logic-based question answering under time constraints
    Gloeckner, Ingo
    [J]. IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008, : 13 - 18
  • [39] Visual Question Answering for Cultural Heritage
    Bongini, Pietro
    Becattini, Federico
    Bagdanov, Andrew D.
    Del Bimbo, Alberto
    [J]. INTERNATIONAL CONFERENCE FLORENCE HERI-TECH: THE FUTURE OF HERITAGE SCIENCE AND TECHNOLOGIES, 2020, 949
  • [40] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662