Improving Visual Question Answering with Pre-trained Language Modeling

Cited: 0
Authors
Wu, Yue [1 ,2 ]
Gao, Huiyi [2 ]
Chen, Lei [2 ]
Affiliations
[1] Anhui University, Institute of Physical Science and Information Technology, Hefei 230601, China
[2] Chinese Academy of Sciences, Institute of Intelligent Machines, Hefei 230031, China
Funding
National Natural Science Foundation of China
Keywords
Visual question answering; pre-training; language modeling
DOI
10.1117/12.2574575
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Visual question answering (VQA) is a task of significant importance for artificial intelligence research. However, most studies use simple gated recurrent units (GRUs) to extract high-level question or image features, which is not sufficient for strong performance. In this paper, we propose two improvements to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model with a pre-trained language model. Second, we replace the GRU in the input fusion layer of the input module with a new module. Experimental results demonstrate the effectiveness of our method, which improves on the baseline by 1.52% on the Visual Question Answering V2 dataset.
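The abstract does not name the pre-trained language model or the module that replaces the GRU, so the following is only a minimal sketch of the first improvement, assuming a BERT encoder loaded via Hugging Face Transformers. The class name PretrainedQuestionModule, the choice of bert-base-uncased, and the 512-dimensional memory size are illustrative assumptions, not details from the paper.

import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PretrainedQuestionModule(nn.Module):
    # DMN-style question module initialized from a pre-trained language
    # model instead of a randomly initialized GRU. The paper only says
    # "pre-trained language model"; BERT is assumed here for illustration.
    def __init__(self, model_name="bert-base-uncased", memory_dim=512):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.encoder = AutoModel.from_pretrained(model_name)
        # Project the encoder's hidden size to the DMN memory size,
        # mirroring the role of the GRU's final hidden state.
        self.proj = nn.Linear(self.encoder.config.hidden_size, memory_dim)

    def forward(self, questions):
        batch = self.tokenizer(questions, padding=True, return_tensors="pt")
        out = self.encoder(**batch)
        # Use the [CLS] token representation as the question summary
        # that conditions the DMN's episodic memory.
        return self.proj(out.last_hidden_state[:, 0])

q_module = PretrainedQuestionModule()
q_vec = q_module(["What color is the umbrella?"])
print(q_vec.shape)  # torch.Size([1, 512])

The design point is simply that the question summary now starts from pre-trained contextual representations rather than being learned from scratch, while the linear projection keeps the rest of the DMN unchanged.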
Pages: 5
Related Papers
50 records in total
  • [41] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [42] Probing for Hyperbole in Pre-Trained Language Models
    Schneidermann, Nina Skovgaard
    Hershcovich, Daniel
    Pedersen, Bolette Sandford
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023: 200-211
  • [43] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Qiang, Jipeng
    Zhang, Feng
    Li, Yun
    Yuan, Yunhao
    Zhu, Yi
    Wu, Xindong
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (01)
  • [44] Recommending metamodel concepts during modeling activities with pre-trained language models
    Weyssow, Martin
    Sahraoui, Houari
    Syriani, Eugene
    [J]. SOFTWARE AND SYSTEMS MODELING, 2022, 21 (03): 1071-1089
  • [45] On solving textual ambiguities and semantic vagueness in MRC based question answering using generative pre-trained transformers
    Ahmed, Muzamil
    Khan, Hikmat
    Iqbal, Tassawar
    Alarfaj, Fawaz Khaled
    Alomair, Abdullah
    Almusallam, Naif
    [J]. PEERJ COMPUTER SCIENCE, 2023, 9
  • [49] Language and visual relations encoding for visual question answering
    Liu, Fei
    Liu, Jing
    Fang, Zhiwei
    Lu, Hanqing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019: 3307-3311
  • [50] Surgicberta: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    [J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01): 69-81