Learning to Reason: End-to-End Module Networks for Visual Question Answering

Cited by: 252
Authors
Hu, Ronghang [1]
Andreas, Jacob [1]
Rohrbach, Marcus [1, 2]
Darrell, Trevor [1]
Saenko, Kate [3]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Facebook AI Res, New York, NY USA
[3] Boston Univ, Boston, MA 02215 USA
DOI
10.1109/ICCV.2017.93
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer "is there an equal number of balls and boxes?" we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture [3, 2] implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each solve one subtask. However, existing NMN implementations rely on brittle off-the-shelf parsers, and are restricted to the module configurations proposed by these parsers rather than learning them from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser. Our model learns to generate network structures (by imitating expert demonstrations) while simultaneously learning network parameters (using the downstream task loss). Experimental results on the new CLEVR dataset targeted at compositional question answering show that N2NMNs achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches, while discovering interpretable network architectures specialized for each question.
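The decomposition in the abstract's example question (find balls, find boxes, count each, compare) can be illustrated with a small symbolic sketch. This is only a toy under stated assumptions: the module names `find`, `count`, and `equal`, the postfix layout encoding, and the `run_layout` helper are all hypothetical, and the modules here are plain Python operations on a list of object labels rather than the learned neural modules the paper describes.

```python
from typing import List, Optional, Tuple

# A scene is assumed to be a flat list of object labels (hypothetical encoding).
Scene = List[str]
# A layout is a postfix sequence of (module name, optional argument) steps.
Layout = List[Tuple[str, Optional[str]]]


def run_layout(layout: Layout, scene: Scene):
    """Execute a postfix layout of toy modules with a stack."""
    stack = []
    for name, arg in layout:
        if name == "find":
            # Select the objects in the scene matching the given label.
            stack.append([obj for obj in scene if obj == arg])
        elif name == "count":
            # Replace a set of objects with its size.
            stack.append(len(stack.pop()))
        elif name == "equal":
            # Compare the two most recent results.
            right = stack.pop()
            left = stack.pop()
            stack.append(left == right)
        else:
            raise ValueError(f"unknown module: {name}")
    if len(stack) != 1:
        raise ValueError("layout did not reduce to a single answer")
    return stack[0]


# Layout for "is there an equal number of balls and boxes?"
EQUAL_BALLS_BOXES: Layout = [
    ("find", "ball"),
    ("count", None),
    ("find", "box"),
    ("count", None),
    ("equal", None),
]

if __name__ == "__main__":
    print(run_layout(EQUAL_BALLS_BOXES, ["ball", "box", "ball", "box"]))
    print(run_layout(EQUAL_BALLS_BOXES, ["ball", "box", "box"]))
```

In this sketch the layout is written by hand. In the approach the abstract describes, the layout would instead be predicted per question, and each module would be a small network trained with the downstream task loss.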
Pages: 804-813
Number of pages: 10