An Enhanced Term Weighted Question Embedding for Visual Question Answering

Cited by: 1
Authors
Manmadhan, Sruthy [1, 2]
Kovoor, Binsu C. [1]
Affiliations
[1] Cochin Univ Sci & Technol, Div Informat Technol, Cochin 682022, Kerala, India
[2] NSS Coll Engn, Dept CSE, Akathethara, Kerala, India
Keywords
Text classification; semantic similarity; supervised term weighting; visual question answering
DOI
10.1142/S0219649222500289
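Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract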
Visual Question Answering (VQA) is a multi-modal, AI-complete task of answering natural language questions about images. The literature typically addresses VQA with a three-phase pipeline: image and question featurisation, multi-modal feature fusion, and answer generation or prediction. Most works have focused on the second phase, where the multi-modal features are combined, and have ignored the effect of the individual input features. This work investigates the natural language question embedding phase of VQA by proposing a new question featurisation framework based on Supervised Term Weighting (STW) schemes. In addition, two new STW schemes that integrate text semantics, qf.cos and tf.rf.sim, are introduced to boost the framework's performance. A series of tests on the DAQUAR VQA dataset compares the new system with conventional pre-trained word embeddings. STW schemes have also been commonly used in text classification research over the past few years. In light of this, further tests are carried out to verify the effectiveness of the two newly proposed STW schemes on the general text classification task.
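For readers unfamiliar with supervised term weighting, the following is a minimal sketch of the widely used tf.rf baseline (term frequency times relevance frequency), which the name tf.rf.sim suggests the paper builds on. It is not the paper's qf.cos or tf.rf.sim scheme, whose definitions are not given in this record. The function names `rf_weights` and `tf_rf`, the toy questions, and the treatment of answer types as categories are all illustrative assumptions.

```python
import math
from collections import Counter


def rf_weights(docs, labels, positive):
    """Relevance frequency of each term for one positive category.

    rf(t) = log2(2 + a / max(1, c)), where a is the number of
    positive-category documents containing t and c is the number of
    documents from all other categories containing t.
    """
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive else neg_df
        target.update(set(tokens))  # document frequency: count each term once
    vocab = set(pos_df) | set(neg_df)
    return {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocab}


def tf_rf(tokens, rf):
    """tf.rf weight of each term in one tokenised text.

    Terms unseen during training get rf = log2(2 + 0) = 1.
    """
    tf = Counter(tokens)
    return {t: n * rf.get(t, 1.0) for t, n in tf.items()}


# Toy data (hypothetical): questions labelled by answer type.
questions = [
    ["what", "color", "is", "the", "chair"],
    ["what", "color", "is", "the", "wall"],
    ["how", "many", "chairs"],
    ["how", "many", "lamps", "are", "there"],
]
answer_types = ["color", "color", "count", "count"]

rf = rf_weights(questions, answer_types, positive="color")
weights = tf_rf(["what", "color", "color"], rf)
```

In this toy setting, terms that occur only in "color" questions receive a higher rf than terms that occur only in "count" questions, so the weighted question vector emphasises the words that discriminate the category.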
Pages: 23