Fine-tuning your answers: a bag of tricks for improving VQA models

Cited by: 0
Authors
Arroyo, Roberto [1 ]
Alvarez, Sergio [1 ]
Aller, Aitor [1 ]
Bergasa, Luis M. [2 ]
Ortiz, Miguel E. [2 ]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Univ Alcala UAH, Elect Dept, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural language answers to questions that a user asks about an image: 1) Computer Vision (CV), 2) Natural Language Processing (NLP), and 3) Knowledge Representation & Reasoning (KR&R). First, the state of the art in VQA is reviewed, together with our contributions to it. Then, we build upon the ideas provided by Pythia, one of the most outstanding approaches. To this end, a study of Pythia's architecture is carried out in order to present varied enhancements with respect to the original proposal, fine-tuning models using a bag of tricks. Several training strategies are compared to increase global accuracy and to understand the limitations associated with VQA models. Extended results evaluate the impact of the different tricks on our enhanced architecture, jointly with additional qualitative results.
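The abstract stays at a high level, so the following is a minimal, hypothetical sketch of what fine-tuning a fused vision-and-language VQA model can look like in PyTorch. The ToyVQAModel, its dimensions, the elementwise fusion, and the AdamW settings are illustrative assumptions for this record, not Pythia's architecture or the training recipe from the paper.

```python
# Minimal, hypothetical VQA fine-tuning sketch (PyTorch).
# All names, dimensions, and hyperparameters below are illustrative
# assumptions, not the architecture or recipe from the paper.
import torch
import torch.nn as nn

class ToyVQAModel(nn.Module):
    """Fuses precomputed image features with a bag-of-words question encoding."""
    def __init__(self, img_dim=2048, q_vocab=10000, emb_dim=300, n_answers=3000):
        super().__init__()
        self.q_emb = nn.EmbeddingBag(q_vocab, emb_dim)   # mean-pooled question embedding
        self.img_proj = nn.Linear(img_dim, emb_dim)      # project image features
        self.classifier = nn.Linear(emb_dim, n_answers)  # logits over candidate answers

    def forward(self, img_feats, question_ids):
        q = self.q_emb(question_ids)   # (batch, emb_dim)
        v = self.img_proj(img_feats)   # (batch, emb_dim)
        return self.classifier(q * v)  # elementwise fusion, then classification

model = ToyVQAModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on random stand-in data.
img_feats = torch.randn(8, 2048)              # stand-in for CNN image features
questions = torch.randint(0, 10000, (8, 12))  # stand-in question token ids
answers = torch.randint(0, 3000, (8,))        # stand-in answer labels

logits = model(img_feats, questions)
loss = loss_fn(logits, answers)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"toy loss: {loss.item():.4f}")
```

In studies of this kind, the "tricks" being compared typically enter through the optimizer, learning-rate schedule, and data pipeline rather than through the model skeleton sketched above.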
Pages: 26889-26913 (25 pages)
Related papers
50 items in total
  • [21] Yang, Zijian Gyozo; Ligeti-Nagy, Noemi. Improve Performance of Fine-tuning Language Models with Prompting. INFOCOMMUNICATIONS JOURNAL, 2023, 15: 62-68.
  • [22] Zhang, Jie; Wen, Hui; Deng, Liting; Xin, Mingfeng; Li, Zhi; Li, Lun; Zhu, Hongsong; Sun, Limin. HackMentor: Fine-Tuning Large Language Models for Cybersecurity. 2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM/BIGDATASE/CSE/EUC/ISCI 2023), 2024: 452-461.
  • [23] Wortsman, Mitchell; Ilharco, Gabriel; Kim, Jong Wook; Li, Mike; Kornblith, Simon; Roelofs, Rebecca; Lopes, Raphael Gontijo; Hajishirzi, Hannaneh; Farhadi, Ali; Namkoong, Hongseok; Schmidt, Ludwig. Robust fine-tuning of zero-shot models. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 7949-7961.
  • [24] Vulic, Ivan; Su, Pei-Hao; Coope, Sam; Gerz, Daniela; Budzianowski, Pawel; Casanueva, Inigo; Mrksic, Nikola; Wen, Tsung-Hsien. CONVFIT: Conversational Fine-Tuning of Pretrained Language Models. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 1151-1168.
  • [25] Chada, Rakesh. Simultaneous paraphrasing and translation by fine-tuning Transformer models. NEURAL GENERATION AND TRANSLATION, 2020: 198-203.
  • [26] Borzunov, Alexander; Baranchuk, Dmitry; Dettmers, Tim; Ryabinin, Max; Belkada, Younes; Chumachenko, Artem; Samygin, Pavel; Raffel, Colin. PETALS: Collaborative Inference and Fine-tuning of Large Models. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL-DEMO 2023), VOL 3, 2023: 558-568.
  • [27] Abe, Hiroyuki; Kobayashi, Tatsuo; Omura, Yuji. Relaxed fine-tuning in models with nonuniversal gaugino masses. PHYSICAL REVIEW D, 2007, 76(01).
  • [28] Cici, Ali; Un, Cem Salih; Kirca, Zerrin. Stop Search With Acceptable Fine-Tuning in SUSY Models. PROCEEDINGS OF THE TURKISH PHYSICAL SOCIETY 32ND INTERNATIONAL PHYSICS CONGRESS (TPS32), 2017, 1815.
  • [29] Roussinov, Dmitri; Sharoff, Serge; Puchnina, Nadezhda. Fine-tuning language models to recognize semantic relations. LANGUAGE RESOURCES AND EVALUATION, 2023, 57(04): 1463-1486.