Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language

Cited by: 0
Authors
Salewski, Leonard [1 ]
Koepke, A. Sophia [1 ]
Lensch, Hendrik P. A. [1 ]
Akata, Zeynep [1 ,2 ]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Intelligent Syst, Tubingen, Germany
Keywords
Zero-Shot Translation of Attention Patterns; VQA;
DOI
10.1007/978-3-031-54605-1_25
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, question, and predicted answer as inputs. The LLM is guided to select tokens that describe the regions in the input image that the VQA model attended to. Crucially, we determine this similarity by exploiting the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows the drop-in replacement of different guiding sources (e.g. attribution instead of attention maps) or language models. We evaluate this novel task on textual explanation datasets for VQA, achieving state-of-the-art performance in the zero-shot setting on GQA-REX and VQA-X. Our code is available here.
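The guided, training-free decoding the abstract describes can be sketched in miniature: at each step, candidate next tokens from a language model are re-ranked by an image-text matching score, steering generation toward tokens that describe attended image regions. The toy `lm_next_token_probs` and `image_text_score` below are illustrative stand-ins, not the paper's actual LLM or VQA matcher.

```python
def lm_next_token_probs(prefix):
    """Toy LM: a uniform distribution over a tiny vocabulary
    (stand-in for a real pre-trained LLM)."""
    vocab = ["a", "red", "ball", "dog", "on", "grass", "."]
    p = 1.0 / len(vocab)
    return {tok: p for tok in vocab}

def image_text_score(text):
    """Toy image-text matcher: counts how many attended concepts the
    text covers (stand-in for the VQA model's text-image matching)."""
    attended = {"red", "ball", "grass"}  # hypothetical attended regions
    return float(len(attended & set(text.split())))

def guided_decode(prompt, steps=4, alpha=1.0):
    """At each step, pick the token maximising LM probability plus an
    alpha-weighted matching score of the extended text. No training
    is involved; only forward scoring of candidates."""
    text = prompt
    for _ in range(steps):
        probs = lm_next_token_probs(text)
        best = max(
            probs,
            key=lambda t: probs[t] + alpha * image_text_score(text + " " + t),
        )
        text += " " + best
    return text

print(guided_decode("it is"))
```

With these stand-ins, tokens covering the attended concepts ("red", "ball", "grass") are selected first, illustrating how the matching signal biases an otherwise indifferent language model.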
Pages: 378-393
Page count: 16
Related Papers
50 records in total
  • [1] Exploring Question Decomposition for Zero-Shot VQA
    Khan, Zaid
    Kumar, Vijay B. G.
    Schulter, Samuel
    Chandraker, Manmohan
    Fu, Yun
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Zero-shot Natural Language Video Localization
    Nam, Jinwoo
    Ahn, Daechul
    Kang, Dongyeop
    Ha, Seong Jong
    Choi, Jonghyun
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1450 - 1459
  • [3] Large Language Models are Zero-Shot Reasoners
    Kojima, Takeshi
    Gu, Shixiang Shane
    Reid, Machel
    Matsuo, Yutaka
    Iwasawa, Yusuke
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [4] Language Models as Zero-Shot Trajectory Generators
    Kwon, Teyun
    Di Palo, Norman
    Johns, Edward
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07): : 6728 - 6735
  • [5] Improving Zero-shot Translation with Language-Independent Constraints
    Pham, Ngoc-Quan
    Niehues, Jan
    Ha, Thanh-Le
    Waibel, Alex
    [J]. FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 1: RESEARCH PAPERS, 2019, : 13 - 23
  • [6] Language Tags Matter for Zero-Shot Neural Machine Translation
    Wu, Liwei
    Cheng, Shanbo
    Wang, Mingxuan
    Li, Lei
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3001 - 3007
  • [7] Large Language Models as Zero-Shot Conversational Recommenders
    He, Zhankui
    Xie, Zhouhang
    Jha, Rahul
    Steck, Harald
    Liang, Dawen
    Feng, Yesu
    Majumder, Bodhisattwa Prasad
    Kallus, Nathan
    McAuley, Julian
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 720 - 730
  • [8] Extensible Prompts for Language Models on Zero-shot Language Style Customization
    Ge, Tao
    Hu, Jing
    Dong, Li
    Mao, Shaoguang
    Xia, Yan
    Wang, Xun
    Chen, Si-Qing
    Wei, Furu
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Zero-Shot Grounding of Objects from Natural Language Queries
    Sadhu, Arka
    Chen, Kan
    Nevatia, Ram
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4693 - 4702
  • [10] Towards Zero-Shot Knowledge Distillation for Natural Language Processing
    Rashid, Ahmad
    Lioutas, Vasileios
    Ghaddar, Abbas
    Rezagholizadeh, Mehdi
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6551 - 6561