Fine-tuning your answers: a bag of tricks for improving VQA models

Authors
Arroyo, Roberto [1]
Alvarez, Sergio [1]
Aller, Aitor [1]
Bergasa, Luis M. [2]
Ortiz, Miguel E. [2]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Univ Alcala UAH, Elect Dept, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural language answers to questions that a user can ask about an image. These fields are: 1) Computer Vision (CV), 2) Natural Language Processing (NLP) and 3) Knowledge Representation & Reasoning (KR&R). First, a review of the state of the art in VQA and our contributions to it are discussed. Then, we build upon the ideas provided by Pythia, which is one of the most outstanding approaches. A study of Pythia's architecture is carried out with the aim of presenting varied enhancements with respect to the original proposal in order to fine-tune models using a bag of tricks. Several training strategies are compared to increase the global accuracy and understand the limitations associated with VQA models. Extended results assess the impact of the different tricks on our enhanced architecture, together with additional qualitative results.
Pages: 26889-26913
Number of pages: 25