Visual Question Answering based on Formal Logic

Cited by: 3
Authors
Sethuraman, Muralikrishnna G. [1]
Payani, Ali [2]
Fekri, Faramarz [1]
Kerce, J. Clayton [3]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[2] Cisco, Placer Cty, CA USA
[3] Georgia Inst Technol, Georgia Tech Res Inst, Atlanta, GA 30332 USA
Keywords
Visual Question Answering; formal logic; transformers; interpretable learning
DOI
10.1109/ICMLA52953.2021.00157
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual question answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenge of understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answers. To achieve this, we take a symbolic-reasoning approach using the framework of formal logic: the image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of the predicate clauses, to obtain the answer. The proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve a near-perfect accuracy of 99.6% on CLEVR, comparable to state-of-the-art models, showing that formal logic is a viable tool for visual question answering. Our model is also data efficient, achieving 99.1% accuracy on CLEVR when trained on just 10% of the training data.
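The three-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scene-graph facts, the predicate names, and the helper `satisfying_groundings` are all hypothetical, and the clause that a transformer would produce in step (ii) is written by hand here. The sketch only shows how a question expressed as a conjunction of predicates can be answered by enumerating groundings against background facts.

```python
from itertools import product

# Step (i), assumed output: a hypothetical scene graph written as ground facts.
# Unary facts are (predicate, object); binary facts are (predicate, subject, object).
FACTS = {
    ("cube", "o1"), ("red", "o1"),
    ("sphere", "o2"), ("blue", "o2"),
    ("cylinder", "o3"), ("red", "o3"),
    ("left_of", "o1", "o2"),
    ("left_of", "o3", "o2"),
}
OBJECTS = sorted({arg for fact in FACTS for arg in fact[1:]})


def satisfying_groundings(clause, variables):
    """Return every variable-to-object binding under which all atoms of the clause are facts."""
    results = []
    for values in product(OBJECTS, repeat=len(variables)):
        binding = dict(zip(variables, values))
        # Substitute bound variables; terms without a binding are kept as constants.
        grounded = [(atom[0], *(binding.get(t, t) for t in atom[1:])) for atom in clause]
        if all(g in FACTS for g in grounded):
            results.append(binding)
    return results


# Step (ii), written by hand instead of produced by a transformer:
# "Is there a red object to the left of a sphere?"
# exists X, Y: red(X) and left_of(X, Y) and sphere(Y)
exists_clause = [("red", "X"), ("left_of", "X", "Y"), ("sphere", "Y")]

# Step (iii): the question is satisfiable if at least one grounding exists.
groundings = satisfying_groundings(exists_clause, ["X", "Y"])
answer_exists = "yes" if groundings else "no"

# A counting question, "How many red objects are there?", counts the groundings of red(X).
answer_count = len(satisfying_groundings([("red", "X")], ["X"]))

print(answer_exists, answer_count)
```

Because the answer is derived from explicit groundings, each binding in `groundings` can be inspected to see which objects justify the answer, which is the kind of step-by-step interpretability the abstract refers to. Brute-force enumeration is used only for brevity; it grows exponentially with the number of variables.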
Pages: 952-957
Number of pages: 6