Visual Question Answering based on Formal Logic

Cited by: 3

Authors:

Sethuraman, Muralikrishnna G. [1]

Payani, Ali [2]

Fekri, Faramarz [1]

Kerce, J. Clayton [3]

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

[2] Cisco, Placer Cty, CA USA

[3] Georgia Inst Technol, Georgia Tech Res Inst, Atlanta, GA 30332 USA

Source:

20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021) | 2021

Keywords:

Visual Question Answering; formal logic; transformers; interpretable learning

DOI:

10.1109/ICMLA52953.2021.00157

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Visual question answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenges of understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answer. To achieve this, we take a symbolic-reasoning approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showing that formal logic is a viable tool for tackling visual question answering. Our model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on just 10% of the training data.
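
The three-step pipeline in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scene graph, the predicate names, and the helper functions `to_facts` and `groundings` are all hypothetical, and the question clause is written by hand here, whereas the paper obtains it from a transformer model. The sketch only shows how a conjunctive first-order query might be checked against background facts by enumerating groundings.

```python
from itertools import product

# Hypothetical toy scene graph: objects with attributes, plus binary relations.
scene_graph = {
    "objects": {
        "o1": {"color": "red", "shape": "cube"},
        "o2": {"color": "blue", "shape": "sphere"},
        "o3": {"color": "red", "shape": "sphere"},
    },
    "relations": [("left_of", "o1", "o2"), ("left_of", "o2", "o3")],
}


def to_facts(graph):
    """Step (i): flatten a scene graph into a set of ground facts (tuples)."""
    facts = set()
    for obj, attrs in graph["objects"].items():
        for value in attrs.values():
            facts.add((value, obj))            # e.g. ("red", "o1")
    for name, subj, obj in graph["relations"]:
        facts.add((name, subj, obj))           # e.g. ("left_of", "o1", "o2")
    return facts


def groundings(clause, facts, objects):
    """Step (iii): return every variable binding under which all atoms hold.

    Assumption of this sketch: every argument in the clause is a variable.
    """
    variables = sorted({arg for atom in clause for arg in atom[1:]})
    satisfying = []
    for combo in product(objects, repeat=len(variables)):
        binding = dict(zip(variables, combo))
        grounded = [(atom[0], *(binding[a] for a in atom[1:])) for atom in clause]
        if all(g in facts for g in grounded):
            satisfying.append(binding)
    return satisfying


facts = to_facts(scene_graph)
objects = sorted(scene_graph["objects"])

# Step (ii), written by hand: "Is there a red cube left of a blue sphere?"
exists_query = [("red", "X"), ("cube", "X"), ("left_of", "X", "Y"),
                ("blue", "Y"), ("sphere", "Y")]
# "How many red objects are there?"
count_query = [("red", "X")]

exists_answer = len(groundings(exists_query, facts, objects)) > 0
count_answer = len(groundings(count_query, facts, objects))
```

In this sketch a yes/no question is answered by whether any satisfying grounding exists, and a counting question by the number of satisfying groundings. Because every intermediate object (the facts, the clause, the bindings) is plain data, each step can be inspected directly, which is the kind of interpretability the abstract describes.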

Pages: 952-957

Page count: 6
