Fine-tuning your answers: a bag of tricks for improving VQA models

Cited by: 0
Authors
Arroyo, Roberto [1]
Alvarez, Sergio [1]
Aller, Aitor [1]
Bergasa, Luis M. [2]
Ortiz, Miguel E. [2]
Affiliations
[1] NielsenIQ, Madrid, Spain
[2] Univ Alcala UAH, Elect Dept, Madrid, Spain
Keywords
Computer vision; Natural language processing; Knowledge representation & reasoning; Visual question answering; Artificial intelligence
DOI
10.1007/s11042-021-11546-z
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, one of the most novel topics in Deep Learning (DL) is explored: Visual Question Answering (VQA). This research area combines three of the most important fields in Artificial Intelligence (AI) to automatically provide natural-language answers to questions that a user asks about an image. These fields are: 1) Computer Vision (CV), 2) Natural Language Processing (NLP) and 3) Knowledge Representation & Reasoning (KR&R). First, the state of the art in VQA is reviewed and our contributions to it are discussed. Then, we build upon the ideas provided by Pythia, one of the most outstanding approaches. A study of Pythia's architecture is carried out with the aim of presenting varied enhancements with respect to the original proposal in order to fine-tune models using a bag of tricks. Several training strategies are compared to increase the global accuracy and to understand the limitations associated with VQA models. Extended results assess the impact of the different tricks on our enhanced architecture, together with additional qualitative results.
Pages: 26889-26913
Number of pages: 25
Related papers
50 results in total
  • [1] Fine-tuning your answers: a bag of tricks for improving VQA models
    Roberto Arroyo
    Sergio Álvarez
    Aitor Aller
    Luis M. Bergasa
    Miguel E. Ortiz
    Multimedia Tools and Applications, 2022, 81 : 26889 - 26913
  • [2] Improving fine-tuning in composite Higgs models
    Banerjee, Avik
    Bhattacharyya, Gautam
    Ray, Tirtha Sankar
    PHYSICAL REVIEW D, 2017, 96 (03)
  • [3] FINE-TUNING YOUR BUSINESS MEETINGS
    AUGER, B
    CHEMICAL ENGINEERING, 1980, 87 (12) : 133 - &
  • [4] Improving CLIP Fine-tuning Performance
    Wei, Yixuan
    Hu, Han
    Xie, Zhenda
    Liu, Ze
    Zhang, Zheng
    Cao, Yue
    Bao, Jianmin
    Chen, Dong
    Guo, Baining
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5416 - 5426
  • [5] Improving fine-tuning of self-supervised models with Contrastive Initialization
    Pan, Haolin
    Guo, Yong
    Deng, Qinyi
    Yang, Haomin
    Chen, Jian
    Chen, Yiqun
    NEURAL NETWORKS, 2023, 159 : 198 - 207
  • [6] Fine-tuning constraints on supergravity models
    Bastero-Gil, M
    Kane, GL
    King, SF
    PHYSICS LETTERS B, 2000, 474 (1-2) : 103 - 112
  • [7] GO BEYOND PLAIN FINE-TUNING: IMPROVING PRETRAINED MODELS FOR SOCIAL COMMONSENSE
    Chang, Ting-Yun
    Liu, Yang
    Gopalakrishnan, Karthik
    Hedayatnia, Behnam
    Zhou, Pei
    Hakkani-Tur, Dilek
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 1028 - 1035
  • [8] MISS: A Generative Pre-training and Fine-Tuning Approach for Med-VQA
    Chen, Jiawei
    Yang, Dingkang
    Jiang, Yue
    Lei, Yuxuan
    Zhang, Lihua
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT VIII, 2024, 15023 : 299 - 313
  • [9] Fine-tuning your quality system after registration
    Kerkstra, B
    ASQC'S 51ST ANNUAL QUALITY CONGRESS PROCEEDINGS, 1997, : 680 - 685
  • [10] FINE-TUNING OF ECONOMIC-FORECASTING MODELS
    HUJER, R
    CREMER, R
    KNEPEL, H
    JAHRBUCHER FUR NATIONALOKONOMIE UND STATISTIK, 1979, 194 (01): : 41 - 70