Visual Question Answering based on Formal Logic

Cited by: 3

Authors:

Sethuraman, Muralikrishnna G. [1]

Payani, Ali [2]

Fekri, Faramarz [1]

Kerce, J. Clayton [3]

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA

[2] Cisco, Placer Cty, CA USA

[3] Georgia Inst Technol, Georgia Tech Res Inst, Atlanta, GA 30332 USA

Source:

20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021) | 2021

Keywords:

Visual Question Answering; formal logic; transformers; interpretable learning;

DOI:

10.1109/ICMLA52953.2021.00157

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual question answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenges of understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answer. To achieve this, we take a symbolic-reasoning approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted to logical background facts with the help of scene graphs, (ii) questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showing that formal logic is a viable tool for tackling visual question answering. Our model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on just 10% of the training data.
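
The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scene graph, the predicate names, and the helper functions `scene_graph_to_facts` and `is_satisfiable` are all hypothetical, the question clause is written by hand rather than produced by a transformer, and satisfiability is checked by brute-force enumeration of groundings.

```python
from itertools import product

# Hypothetical scene graph for one image: objects with attributes, plus relations.
scene_graph = {
    "objects": {
        "o1": {"shape": "cube", "color": "red"},
        "o2": {"shape": "sphere", "color": "blue"},
    },
    "relations": [("left_of", "o1", "o2")],
}

def scene_graph_to_facts(graph):
    """Step (i): flatten a scene graph into a set of ground atoms (background facts)."""
    facts = set()
    for obj, attrs in graph["objects"].items():
        for value in attrs.values():
            facts.add((value, obj))       # e.g. ("red", "o1")
    for name, subj, obj in graph["relations"]:
        facts.add((name, subj, obj))      # e.g. ("left_of", "o1", "o2")
    return facts

def is_satisfiable(clause, facts, constants):
    """Step (iii): True if some grounding of the variables makes every atom a fact.

    Assumed convention: upper-case terms ("X", "Y") are variables,
    everything else is a constant.
    """
    variables = sorted({t for atom in clause for t in atom[1:] if t.isupper()})
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded = [(a[0], *(binding.get(t, t) for t in a[1:])) for a in clause]
        if all(atom in facts for atom in grounded):
            return True
    return False

facts = scene_graph_to_facts(scene_graph)
constants = list(scene_graph["objects"])

# Step (ii), written by hand here: "Is there a red cube to the left of a sphere?"
question = [("red", "X"), ("cube", "X"), ("left_of", "X", "Y"), ("sphere", "Y")]
answer = is_satisfiable(question, facts, constants)
```

Because the facts and the clause are both plain data, each intermediate result can be printed and inspected, which is the kind of step-by-step interpretability the abstract describes.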


Pages: 952-957

Page count: 6
