Improving Visual Question Answering with Pre-trained Language Modeling

Cited: 0
Authors
Wu, Yue [1 ,2 ]
Gao, Huiyi [2 ]
Chen, Lei [2 ]
Affiliations
[1] Anhui Univ, Inst Phys Sci & Informat Technol, Hefei 230601, Peoples R China
[2] Chinese Acad Sci, Inst Intelligent Machines, Hefei 230031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; pre-training; language modeling;
DOI
10.1117/12.2574575
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Visual question answering (VQA) is a task of significant importance for research in artificial intelligence. However, most studies rely on simple gated recurrent units (GRUs) to extract high-level question or image features, which is insufficient for achieving strong performance. In this paper, we propose two improvements to a general VQA model based on the dynamic memory network (DMN). First, we initialize the question module of our model with a pre-trained language model. Second, we replace the GRU in the input fusion layer of the input module with a new module. Experimental results demonstrate the effectiveness of our method, with an improvement of 1.52% over the baseline on the Visual Question Answering V2 dataset.
Pages: 5
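
The abstract names two changes to a DMN-based VQA model but gives no implementation detail. Below is a minimal, hypothetical PyTorch sketch of what they could look like: a question module whose embedding layer is initialized from pre-trained language-model weights, and an input fusion layer in which a self-attention block stands in for the GRU. The class names, dimensions, and the choice of self-attention as the replacement module are illustrative assumptions, not the authors' actual design.

import torch
import torch.nn as nn

class QuestionModule(nn.Module):
    # Question encoder; the embedding table is initialized from pre-trained
    # language-model weights instead of random values (a hypothetical stand-in
    # for the paper's pre-trained language model initialization).
    def __init__(self, pretrained_emb: torch.Tensor, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(pretrained_emb, freeze=False)
        self.gru = nn.GRU(pretrained_emb.size(1), hidden_dim, batch_first=True)

    def forward(self, question_ids):          # (B, T) token ids
        emb = self.embed(question_ids)        # (B, T, E)
        _, h = self.gru(emb)                  # h: (1, B, H)
        return h.squeeze(0)                   # (B, H) question vector

class InputFusionLayer(nn.Module):
    # Fuses regional image features into "facts"; a self-attention block is
    # used here where the DMN baseline uses a GRU (the abstract only says
    # "a new module", so self-attention is an assumption).
    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 512, heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, img_feats):             # (B, N, feat_dim) region features
        x = self.proj(img_feats)              # (B, N, H)
        attended, _ = self.attn(x, x, x)      # contextualize regions
        return self.norm(x + attended)        # (B, N, H) fused facts

# Toy usage: a 1000-word vocabulary with 300-d placeholder "pre-trained" vectors.
pretrained_emb = torch.randn(1000, 300)       # real weights would come from an LM
question = QuestionModule(pretrained_emb)(torch.randint(0, 1000, (2, 12)))
facts = InputFusionLayer()(torch.randn(2, 36, 2048))
print(question.shape, facts.shape)            # torch.Size([2, 512]) torch.Size([2, 36, 512])

Under these assumptions, the fused facts and the question vector would then feed the DMN's episodic memory and answer modules, which the two modifications leave unchanged.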
Related Papers
50 records in total
  • [1] Pre-trained Language Model for Biomedical Question Answering
    Yoon, Wonjin
    Lee, Jinhyuk
    Kim, Donghyeon
    Jeong, Minbyul
    Kang, Jaewoo
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 727 - 740
  • [2] Question-answering Forestry Pre-trained Language Model: ForestBERT
    Tan, Jingwei
    Zhang, Huaiqing
    Liu, Yang
    Yang, Jie
    Zheng, Dongping
    [J]. Linye Kexue/Scientia Silvae Sinicae, 2024, 60 (09): 99 - 110
  • [3] A Pre-trained Language Model for Medical Question Answering Based on Domain Adaption
    Liu, Lang
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Cheng, Zhen
    Wang, Sibo
    [J]. NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 216 - 227
  • [4] An empirical study of pre-trained language models in simple knowledge graph question answering
    Hu, Nan
    Wu, Yike
    Qi, Guilin
    Min, Dehai
    Chen, Jiaoyan
    Pan, Jeff Z.
    Ali, Zafar
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (05): 2855 - 2886
  • [5] Question Answering based Clinical Text Structuring Using Pre-trained Language Model
    Qiu, Jiahui
    Zhou, Yangming
    Ma, Zhiyuan
    Ruan, Tong
    Liu, Jinlin
    Sun, Jing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019: 1596 - 1600
  • [6] ReLMKG: reasoning with pre-trained language models and knowledge graphs for complex question answering
    Cao, Xing
    Liu, Yun
    [J]. APPLIED INTELLIGENCE, 2023, 53 (10): 12032 - 12046
  • [7] UniRaG: Unification, Retrieval, and Generation for Multimodal Question Answering With Pre-Trained Language Models
    Lim, Qi Zhi
    Lee, Chin Poo
    Lim, Kian Ming
    Samingan, Ahmad Kamsani
    [J]. IEEE ACCESS, 2024, 12 : 71505 - 71519
  • [8] VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models
    Yin, Ziyi
    Ye, Muchao
    Zhang, Tianrong
    Wang, Jiaqi
    Liu, Han
    Chen, Jinghui
    Wang, Ting
    Ma, Fenglong
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024: 6755 - 6763