The meaning of "most" for visual question answering models

Citations: 0
Authors: Kuhnle, Alexander [1]; Copestake, Ann [1]
Affiliations: [1] Univ Cambridge, Dept Comp Sci & Technol, Cambridge, England
Funding: UK Engineering and Physical Sciences Research Council (EPSRC)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
The correct interpretation of quantifier statements in the context of a visual scene requires non-trivial inference mechanisms. For the example of "most", we discuss two strategies which rely on fundamentally different cognitive concepts. Our aim is to identify what strategy deep learning models for visual question answering learn when trained on such questions. To this end, we carefully design data to replicate experiments from psycholinguistics where the same question was investigated for humans. Focusing on the FiLM visual question answering model, our experiments indicate that a form of approximate number system emerges whose performance declines with more difficult scenes as predicted by Weber's law. Moreover, we identify confounding factors, like spatial arrangement of the scene, which impede the effectiveness of this system.
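As context for the mechanism the abstract invokes: an approximate number system (ANS) predicts that accuracy in judging which of two sets is larger depends on their ratio, per Weber's law. The sketch below is a minimal illustration of the standard Gaussian ANS model, not code from the paper; the Weber fraction `w = 0.15` and the function name are assumptions chosen for demonstration.

```python
import math

def p_correct(n_a, n_b, w=0.15):
    """Probability of correctly judging which of two numerosities is larger
    under a standard Gaussian approximate-number-system model.

    Each set size n is represented as a Gaussian with mean n and standard
    deviation w * n, so discriminability depends only on the ratio n_a/n_b.
    The Weber fraction w = 0.15 is an illustrative value, not fitted data.
    """
    if n_a == n_b:
        return 0.5  # indistinguishable: chance performance
    # Signal-to-noise ratio of the difference between the two noisy estimates
    z = abs(n_a - n_b) / (w * math.sqrt(n_a**2 + n_b**2))
    # Gaussian CDF of z: chance the noisy estimates come out in the right order
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Accuracy declines as the ratio approaches 1 ("more difficult scenes"):
easy = p_correct(10, 5)   # ratio 2:1 -> near-ceiling accuracy
hard = p_correct(10, 9)   # ratio 10:9 -> well below ceiling
```

Note the ratio dependence: `p_correct(10, 5)` equals `p_correct(20, 10)`, which is exactly the scale-invariant signature the paper tests for in the FiLM model's responses to "most" questions.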
Pages: 46-55 (10 pages)