Prior Visual Relationship Reasoning for Visual Question Answering

Cited by: 0
Authors
Yang, Zhuoqian [1, 2]
Qin, Zengchang [2, 3]
Yu, Jing [4]
Wan, Tao [5]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[3] Codemao, AI Res, Shenzhen, Peoples R China
[4] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[5] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
Keywords
VQA; GCN; Attention Mechanism
DOI
Not available
CLC classification number
TB8 [Photographic technology]
Subject classification code
0804
Abstract
Visual Question Answering (VQA) is a representative cross-modal reasoning task: given an image and a free-form natural-language question, the correct answer must be determined from both visual and textual information. A key issue in VQA is reasoning over semantic clues in the visual content under the guidance of the question. In this paper, we propose the Scene Graph Convolutional Network (SceneGCN), which jointly reasons about object properties and their semantic relations to arrive at the correct answer. Visual relationships are projected into a deeply learned semantic space constrained by visual context and language priors. Comprehensive experiments on two challenging datasets, GQA and VQA 2.0, demonstrate the effectiveness and interpretability of the new model.
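The abstract does not give SceneGCN's equations. As a rough illustration only, the following is a minimal, hypothetical sketch of one question-guided message-passing step over a scene graph in plain Python. The edge format `(subject, object, relation embedding)`, the additive message `W(h_subject + r)`, the dot-product attention against a question vector, and the names `question_guided_gcn_layer`, `matvec`, and `softmax` are all assumptions for illustration, not the authors' formulation; the learned semantic space and the language-prior constraints described in the abstract are not modelled.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical edge format: (subject node index, object node index, relation embedding).
Edge = Tuple[int, int, Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: List[Vector], v: Vector) -> Vector:
    # Multiply a d x d weight matrix (list of rows) by a length-d vector.
    return [dot(row, v) for row in w]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_gcn_layer(
    nodes: List[Vector], edges: List[Edge], question: Vector, w: List[Vector]
) -> List[Vector]:
    """One hypothetical question-guided message-passing step over a scene graph."""
    # Collect the messages each node receives along its incoming edges.
    incoming: Dict[int, List[Vector]] = {i: [] for i in range(len(nodes))}
    for src, dst, rel in edges:
        # Message = W (h_subject + r_relation): fuses object and relation features.
        message = matvec(w, [h + r for h, r in zip(nodes[src], rel)])
        incoming[dst].append(message)

    updated: List[Vector] = []
    for i, h in enumerate(nodes):
        msgs = incoming[i]
        if not msgs:
            # No incoming edges: keep the node's own features (after ReLU).
            updated.append([max(0.0, x) for x in h])
            continue
        # Attention: weight each message by its similarity to the question.
        alphas = softmax([dot(question, m) for m in msgs])
        agg = [sum(a * m[k] for a, m in zip(alphas, msgs)) for k in range(len(h))]
        # Residual connection followed by ReLU.
        updated.append([max(0.0, x + y) for x, y in zip(h, agg)])
    return updated
```

With a question vector aligned to one feature dimension, incoming messages that point along that dimension receive a larger attention weight, which is the sense in which the question "guides" which relations a node listens to in this sketch. A real model would learn `w`, the relation embeddings, and the question encoding end to end.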
Pages: 1411-1415
Number of pages: 5
Related papers (50 in total)
  • [1] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    [J]. PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018: 410 - 415
  • [2] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [3] Improving reasoning with contrastive visual information for visual question answering
    Long, Yu
    Tang, Pengjie
    Wang, Hanli
    Yu, Jian
    [J]. ELECTRONICS LETTERS, 2021, 57 (20) : 758 - 760
  • [4] Visual question answering by pattern matching and reasoning
    Zhan, Huayi
    Xiong, Peixi
    Wang, Xin
    Yang, Lan
    [J]. NEUROCOMPUTING, 2022, 467 : 323 - 336
  • [5] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [6] Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    Su, Zhenqiang
    Gou, Gang
    [J]. Computer Engineering and Applications, 2024, 60 (05) : 95 - 102
  • [7] Coarse-to-Fine Reasoning for Visual Question Answering
    Nguyen, Binh X.
    Tuong Do
    Huy Tran
    Tjiputra, Erman
    Tran, Quang D.
    Anh Nguyen
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 4557 - 4565
  • [8] Medical Visual Question Answering via Conditional Reasoning
    Zhan, Li-Ming
    Liu, Bo
    Fan, Lu
    Chen, Jiaxin
    Wu, Xiao-Ming
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 2345 - 2354
  • [9] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    [J]. APPLIED INTELLIGENCE, 2024, 54 (06) : 5062 - 5080
  • [10] Multimodal Knowledge Reasoning for Enhanced Visual Question Answering
    Hussain, Afzaal
    Maqsood, Ifrah
    Shahzad, Muhammad
    Fraz, Muhammad Moazam
    [J]. 2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022: 224 - 230