Fine-tuning your answers: a bag of tricks for improving VQA models

Cited: 0
Authors
Arroyo, Roberto [1 ]
Alvarez, Sergio [1 ]
Aller, Aitor [1 ]
Bergasa, Luis M. [2 ]
Ortiz, Miguel E. [2 ]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Univ Alcala UAH, Elect Dept, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
CLC Classification Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural language answers to questions that a user may ask about an image: 1) Computer Vision (CV), 2) Natural Language Processing (NLP) and 3) Knowledge Representation & Reasoning (KR&R). First, a review of the state of the art in VQA is presented, together with our contributions to it. We then build upon the ideas of Pythia, one of the most outstanding approaches, and study its architecture in order to present varied enhancements with respect to the original proposal and to fine-tune models using a bag of tricks. Several training strategies are compared to increase global accuracy and to understand the limitations associated with VQA models. Extended results assess the impact of the different tricks on our enhanced architecture, together with additional qualitative results.
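The abstract describes the fine-tuning approach only at a high level. As a rough, self-contained sketch (not the authors' code from the paper), the Python snippet below illustrates what fine-tuning a Pythia-style VQA model with a few common "tricks" (discriminative learning rates, linear warmup, and gradient clipping) can look like in PyTorch. The model class, dimensions, and data here are hypothetical placeholders, not the architecture or hyperparameters reported in the article.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a Pythia-style VQA model: a vision branch, a
# question branch, and a fusion head over a fixed answer vocabulary.
# Dimensions are arbitrary and for illustration only.
class PythiaStyleVQA(nn.Module):
    def __init__(self, img_dim=2048, q_vocab=10000, hidden=512, n_answers=3129):
        super().__init__()
        self.vision_encoder = nn.Linear(img_dim, hidden)
        self.question_encoder = nn.Embedding(q_vocab, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.fusion_head = nn.Linear(hidden, n_answers)

    def forward(self, image_feats, question_tokens):
        v = self.vision_encoder(image_feats)            # (B, hidden)
        q_emb = self.question_encoder(question_tokens)  # (B, T, hidden)
        _, q = self.gru(q_emb)                          # q: (1, B, hidden)
        fused = v * q.squeeze(0)                        # elementwise fusion
        return self.fusion_head(fused)                  # answer logits

model = PythiaStyleVQA()

# Trick 1: discriminative learning rates, i.e. a lower LR for the (typically
# pretrained) encoders and a higher LR for the freshly initialized head.
optimizer = torch.optim.AdamW([
    {"params": model.vision_encoder.parameters(), "lr": 1e-5},
    {"params": list(model.question_encoder.parameters())
             + list(model.gru.parameters()), "lr": 1e-5},
    {"params": model.fusion_head.parameters(), "lr": 1e-4},
])

# Trick 2: linear warmup of the learning rate, a common stabilizer when
# fine-tuning multimodal models.
warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

# VQA is usually trained against soft answer scores, hence BCE over logits.
criterion = nn.BCEWithLogitsLoss()

# Dummy batch standing in for a real VQA data loader.
image_feats = torch.randn(8, 2048)
question_tokens = torch.randint(0, 10000, (8, 14))
answer_targets = torch.rand(8, 3129)  # soft answer scores in [0, 1]

model.train()
optimizer.zero_grad()
loss = criterion(model(image_feats, question_tokens), answer_targets)
loss.backward()
# Trick 3: gradient clipping guards against unstable updates.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)
optimizer.step()
scheduler.step()
```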
Pages: 26889-26913
Page count: 25