An Enhanced Term Weighted Question Embedding for Visual Question Answering

Cited by: 1
Authors
Manmadhan, Sruthy [1, 2]
Kovoor, Binsu C. [1]
Affiliations
[1] Cochin Univ Sci & Technol, Div Informat Technol, Cochin 682022, Kerala, India
[2] NSS Coll Engn, Dept CSE, Akathethara, Kerala, India
Keywords
Text classification; semantic similarity; supervised term weighting; visual question answering
DOI
10.1142/S0219649222500289
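Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract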
Visual Question Answering (VQA) is a multi-modal, AI-complete task of answering natural language questions about images. The literature typically solves VQA with a three-phase pipeline: image and question featurisation, multi-modal feature fusion, and answer generation or prediction. Most works have focused on the second phase, where the multi-modal features are combined, and have ignored the effect of the individual input features. This work investigates the natural language question embedding phase of VQA by proposing a new question featurisation framework based on Supervised Term Weighting (STW) schemes. In addition, two new STW schemes that integrate text semantics, qf.cos and tf.rf.sim, are introduced to boost the framework's performance. A series of tests on the DAQUAR VQA dataset compares the new system with conventional pre-trained word embeddings. Because STW schemes have been widely used in text classification research over the past few years, further tests verify the effectiveness of the two newly proposed STW schemes on the general text classification task.
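To make the idea of supervised term weighting concrete, here is a minimal sketch of the standard tf.rf (term frequency times relevance frequency) scheme from the text classification literature, applied to toy questions grouped by answer category. It is not the paper's qf.cos or tf.rf.sim, whose definitions are not given in the abstract. The function names `rf_weights` and `tf_rf` and the toy questions are assumptions made for illustration only.

```python
import math
from collections import Counter

def rf_weights(docs, labels, positive):
    """Relevance frequency per term: log2(2 + a / max(1, c)), where
    a = number of positive-category documents containing the term and
    c = number of other documents containing the term."""
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive else neg_df
        target.update(set(tokens))  # document frequency: count once per document
    vocab = set(pos_df) | set(neg_df)
    return {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocab}

def tf_rf(tokens, rf):
    """Weight each term of one question by raw term frequency times rf.
    Unseen terms get rf = 1.0, which equals log2(2 + 0 / 1)."""
    tf = Counter(tokens)
    return {t: tf[t] * rf.get(t, 1.0) for t in tf}

# Toy questions (hypothetical), labelled by answer category.
docs = [
    "what color is the chair".split(),
    "what color is the wall".split(),
    "how many chairs are in the room".split(),
    "how many lamps are on the table".split(),
]
labels = ["color", "color", "count", "count"]

rf = rf_weights(docs, labels, positive="color")
weights = tf_rf("what color is the lamp".split(), rf)
```

In this toy setup, a term concentrated in the "color" category (such as "color") receives a higher weight than a term spread across both categories (such as "the"), which is the behaviour supervised weighting is meant to capture.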
Pages: 23