Semantically Guided Visual Question Answering

Cited by: 5
Authors
Zhao, Handong [1]
Fan, Quanfu [2]
Gutfreund, Dan [2]
Fu, Yun [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
DOI
10.1109/WACV.2018.00205
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel approach to enhance the challenging task of Visual Question Answering (VQA) by incorporating and enriching semantic knowledge in a VQA model. We first apply Multiple Instance Learning (MIL) to extract a richer visual representation addressing concepts beyond objects such as actions and colors. Motivated by the observation that semantically related answers often appear together in prediction, we further develop a new semantically-guided loss function for model learning which has the potential to drive weakly-scored but correct answers to the top while suppressing wrong answers. We show that these two ideas contribute to performance improvement in a complementary way. We demonstrate competitive results comparable to the state of the art on two VQA benchmark datasets.
Pages: 1852-1860
Number of pages: 9