Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering

Cited by: 0
Authors
Naik, Nandita [1 ]
Potts, Christopher [1 ]
Kreiss, Elisa [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
Keywords
DOI
10.1109/ICCVW60793.2023.00301
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low vision prefer image explanations that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To further motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on social media and news garner 2.8 and 1.8 times more "Who?" questions than the average. We also find that context effects are especially important when participants can't see the image. These results demonstrate that context affects the types of questions asked and that VQA models should be context-sensitive to better meet people's needs, especially in accessibility settings.
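
As a rough illustration of the statistics the abstract reports, the sketch below computes how over-represented a question type is in one context relative to its average rate across all contexts. This is a hypothetical reconstruction, not the authors' released code: the sample records, the wh-word heuristic, and the names question_type and ratio_vs_average are all assumptions made for the example.

    from collections import Counter

    # Hypothetical Context-VQA-style records: each question is paired with
    # the type of website on which its image appeared.
    data = [
        {"context": "travel", "question": "Where was this photo taken?"},
        {"context": "travel", "question": "Where is this beach?"},
        {"context": "shopping", "question": "What color is the jacket?"},
        {"context": "news", "question": "Who is speaking at the podium?"},
        {"context": "social media", "question": "Who is in this picture?"},
        {"context": "social media", "question": "Who took this photo?"},
    ]

    def question_type(question: str) -> str:
        """Crude heuristic: classify a question by its leading wh-word."""
        first_word = question.split()[0].strip("?").lower()
        wh_words = {"who", "what", "where", "when", "why", "how"}
        return first_word if first_word in wh_words else "other"

    def ratio_vs_average(records, context: str, qtype: str) -> float:
        """How much more often `qtype` occurs in `context` than on average."""
        in_context = [question_type(r["question"])
                      for r in records if r["context"] == context]
        overall = [question_type(r["question"]) for r in records]
        rate_in_context = Counter(in_context)[qtype] / len(in_context)
        rate_overall = Counter(overall)[qtype] / len(overall)
        return rate_in_context / rate_overall

    print(ratio_vs_average(data, "travel", "where"))  # 3.0 on this toy data

On this toy data the travel context yields a "Where?" ratio of 3.0; the abstract reports a ratio of about 2 on the real dataset.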
Pages: 2813-2817
Page count: 5