Question Modifiers in Visual Question Answering

Cited by: 0
Authors
Britton, William [1 ]
Sarkhel, Somdeb [2 ]
Venugopal, Deepak [1 ]
Affiliations
[1] Univ Memphis, Memphis, TN 38152 USA
[2] Adobe Res, Bangalore, Karnataka, India
Funding
U.S. National Science Foundation;
Keywords
visual question answering; modifiers; deep models; perception;
DOI
Not available
Chinese Library Classification
TP39 [Computer Applications];
Discipline Code
081203; 0835;
Abstract
Visual Question Answering (VQA) is a challenging problem that can advance AI by integrating several important sub-disciplines, including natural language understanding and computer vision. Large, publicly available VQA datasets for training and evaluation have driven the growth of VQA models that achieve increasingly higher accuracy scores. However, it is also important to understand how well a model grasps the details provided in a question. For example, studies in psychology have shown that syntactic complexity places a greater cognitive load on humans. Analogously, we want to understand whether models have the perceptual capability to handle modifications to questions. Therefore, we develop a new dataset using Amazon Mechanical Turk, where we asked workers to add modifiers to questions based on object properties and spatial relationships. We evaluate this data on LXMERT, a state-of-the-art VQA model that focuses extensively on language processing. Our results indicate that the model's performance degrades significantly when questions are modified to include more detailed information.
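To make the evaluation setup concrete, below is a minimal illustrative sketch (not the authors' code) of how one might probe LXMERT's VQA head with an original question versus a modifier-augmented variant, using the publicly available Hugging Face checkpoint unc-nlp/lxmert-vqa-uncased. The image inputs here are random placeholder region features; a faithful evaluation would instead supply Faster R-CNN region features and box coordinates, and the example questions are hypothetical.

```python
# Sketch: compare LXMERT's answer scores for a base question and a modified question.
# Assumes the Hugging Face "transformers" LXMERT implementation; visual inputs are
# dummy placeholders (real runs use Faster R-CNN features: 36 regions x 2048 dims).
import torch
from transformers import LxmertTokenizer, LxmertForQuestionAnswering

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-vqa-uncased")
model = LxmertForQuestionAnswering.from_pretrained("unc-nlp/lxmert-vqa-uncased")
model.eval()

# Placeholder visual inputs: 36 region features (2048-d) and their box coordinates (4-d).
visual_feats = torch.randn(1, 36, 2048)
visual_pos = torch.rand(1, 36, 4)

def answer_scores(question: str) -> torch.Tensor:
    """Return the model's scores over the VQA answer vocabulary for one question."""
    enc = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        out = model(
            input_ids=enc["input_ids"],
            attention_mask=enc["attention_mask"],
            visual_feats=visual_feats,
            visual_pos=visual_pos,
        )
    return out.question_answering_score  # shape: (1, num_answer_labels)

# Hypothetical base question vs. a version with added object-property / spatial modifiers.
base = "What color is the ball?"
modified = "What color is the small ball to the left of the red chair?"
for q in (base, modified):
    print(q, "-> top answer index:", answer_scores(q).argmax(dim=-1).item())
```

Comparing the predicted answers (and their scores) across the base and modified forms of the same question is one simple way to quantify the performance drop described in the abstract.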
Pages: 1472 - 1479
Page count: 8