An Enhanced Term Weighted Question Embedding for Visual Question Answering

Authors
Manmadhan, Sruthy [1,2]
Kovoor, Binsu C. [1]
Affiliations
[1] Cochin University of Science and Technology, Division of Information Technology, Cochin 682022, Kerala, India
[2] NSS College of Engineering, Department of Computer Science and Engineering, Akathethara, Kerala, India
Keywords
Text classification; semantic similarity; supervised term weighting; visual question answering
DOI
10.1142/S0219649222500289
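Chinese Library Classification (CLC) number
G25 [Library science, library work]; G35 [Information science, information work]
Subject classification code
1205; 120501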
Abstract
Visual Question Answering (VQA) is a multi-modal, AI-complete task of answering natural language questions about images. The literature has addressed VQA with a three-phase pipeline: image and question featurisation, multi-modal feature fusion, and answer generation or prediction. Most works have focused on the second phase, in which the multi-modal features are combined, while ignoring the effect of the individual input features. This work investigates the natural language question embedding phase of VQA by proposing a new question featurisation framework based on Supervised Term Weighting (STW) schemes. In addition, two new STW schemes that integrate text semantics, qf.cos and tf.rf.sim, are introduced to boost the framework's performance. A series of experiments on the DAQUAR VQA dataset compares the proposed framework with conventional pre-trained word embeddings. Because STW schemes have been widely used in text classification research over the past few years, further experiments verify the effectiveness of the two newly proposed STW schemes on the general text classification task.
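The abstract names qf.cos and tf.rf.sim without defining them, so the sketch below does not attempt to reproduce either. It instead shows the classic tf.rf supervised term weighting scheme of Lan et al., which the name tf.rf.sim suggests it builds on: each term's raw frequency in a question is multiplied by a relevance-frequency factor learned from labelled data. The one-vs-rest setup, the toy questions and their labels, and the function names `relevance_frequency` and `tf_rf_vector` are illustrative assumptions, not the authors' framework.

```python
import math
from collections import Counter


def relevance_frequency(docs, labels, positive):
    """Classic relevance frequency (Lan et al.), computed one-vs-rest.

    rf(t) = log2(2 + a / max(1, c)), where
      a = number of `positive`-class documents containing term t,
      c = number of documents of any other class containing t.
    """
    pos_df = Counter()  # document frequency within the positive class
    neg_df = Counter()  # document frequency within all other classes
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive else neg_df
        target.update(set(tokens))  # count each term once per document
    vocab = set(pos_df) | set(neg_df)
    return {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocab}


def tf_rf_vector(tokens, rf, vocab):
    """Weight raw term frequencies by rf; an unseen term has a = 0, so rf = log2(2) = 1."""
    tf = Counter(tokens)
    return [tf[t] * rf.get(t, 1.0) for t in vocab]


# Toy questions tagged with a hypothetical question type (not DAQUAR data).
docs = [
    ["what", "color", "is", "the", "chair"],
    ["what", "color", "is", "the", "table"],
    ["how", "many", "chairs", "are", "there"],
]
labels = ["color", "color", "count"]

rf = relevance_frequency(docs, labels, positive="color")
vocab = sorted(rf)
print(tf_rf_vector(["what", "color", "is", "the", "chair"], rf, vocab))
```

The `max(1, c)` term keeps rf finite when a term never occurs outside the positive class. In a multi-class setting such as answer prediction, one rf table would be computed per class under this one-vs-rest assumption.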
Pages: 23