PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING

Cited by: 0
Authors
Yang, Zhuoqian [1 ,2 ]
Qin, Zengchang [2 ,3 ]
Yu, Jing [4 ]
Wan, Tao [5 ]
Affiliations
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[3] Codemao, AI Res, Shenzhen, Peoples R China
[4] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[5] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
Keywords
VQA; GCN; Attention Mechanism
DOI
Not available
Chinese Library Classification
TB8 [Photographic technology]
Discipline Code
0804
Abstract
Visual Question Answering (VQA) is a representative cross-modal reasoning task in which an image and a free-form natural-language question are presented, and the correct answer must be determined from both visual and textual information. A key challenge in VQA is reasoning over semantic clues in the visual content under the guidance of the question. In this paper, we propose the Scene Graph Convolutional Network (SceneGCN), which jointly reasons about object properties and their semantic relations to infer the correct answer. Visual relationships are projected into a deep, learned semantic space constrained by visual context and language priors. Through comprehensive experiments on two challenging datasets, GQA and VQA 2.0, we demonstrate the effectiveness and interpretability of the new model.
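The record does not give SceneGCN's exact formulation, so the following is only a minimal sketch of the kind of question-guided graph convolution the abstract describes: object region features as nodes, relation embeddings in a learned semantic space as edges, and a question vector steering the message passing. All names and dimensions here (SceneGraphConvLayer, obj_dim, rel_dim, and so on) are hypothetical, not the authors' implementation.

```python
# A minimal, assumption-laden sketch of a question-guided scene-graph
# convolution in PyTorch; not the paper's actual SceneGCN equations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SceneGraphConvLayer(nn.Module):
    def __init__(self, obj_dim: int, rel_dim: int, q_dim: int, hid_dim: int):
        super().__init__()
        # Project object, relation, and question features into a shared space.
        self.obj_proj = nn.Linear(obj_dim, hid_dim)
        self.rel_proj = nn.Linear(rel_dim, hid_dim)
        self.q_proj = nn.Linear(q_dim, hid_dim)
        self.att = nn.Linear(hid_dim, 1)        # scores each relational message
        self.out = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, obj_feats, rel_feats, edges, q_feat):
        """
        obj_feats: (N, obj_dim)  region features of N detected objects
        rel_feats: (E, rel_dim)  semantic embeddings of E visual relations
        edges:     (E, 2) long   (subject_idx, object_idx) per relation
        q_feat:    (q_dim,)      encoded question vector
        """
        h = self.obj_proj(obj_feats)            # (N, hid)
        r = self.rel_proj(rel_feats)            # (E, hid)
        q = self.q_proj(q_feat)                 # (hid,), broadcast below

        src, dst = edges[:, 0], edges[:, 1]
        # Question-guided message along each subject -> object edge.
        msg = torch.tanh(h[src] + r + q)        # (E, hid)
        # Global softmax over edges; a per-node softmax is more standard but
        # would need a scatter-softmax, omitted here for brevity.
        alpha = torch.softmax(self.att(msg).squeeze(-1), dim=0)  # (E,)

        # Accumulate weighted messages at each destination object.
        agg = torch.zeros_like(h)
        agg.index_add_(0, dst, alpha.unsqueeze(-1) * msg)
        return F.relu(self.out(torch.cat([h, agg], dim=-1)))
```

In a full model of this kind, one would presumably stack a few such layers, pool the resulting node features with question-guided attention, and feed the pooled vector to an answer classifier.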
Pages: 1411-1415
Page count: 5
Related papers
50 items in total
  • [31] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024, : 6 - 10
  • [32] Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
    Guo, Yangyang
    Cheng, Zhiyong
    Nie, Liqiang
    Liu, Yibing
    Wang, Yinglong
    Kankanhalli, Mohan
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 75 - 84
  • [33] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
    Ben-Younes, Hedi
    Cadene, Remi
    Thome, Nicolas
    Cord, Matthieu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8102 - 8109
  • [34] Reasoning on the Relation: Enhancing Visual Representation for Visual Question Answering and Cross-Modal Retrieval
    Yu, Jing
    Zhang, Weifeng
    Lu, Yuhang
    Qin, Zengchang
    Hu, Yue
    Tan, Jianlong
    Wu, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (12) : 3196 - 3209
  • [35] ViCLEVR: a visual reasoning dataset and hybrid multimodal fusion model for visual question answering in Vietnamese
    Tran, Khiem Vinh
    Phan, Hao Phu
    Van Nguyen, Kiet
    Nguyen, Ngan Luu Thuy
    MULTIMEDIA SYSTEMS, 2024, 30 (04)
  • [36] Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering
    Zakari, Rufai Yusuf
    Owusu, Jim Wilson
    Qin, Ke
    He, Tao
    Luo, Guangchun
    BIG DATA MINING AND ANALYTICS, 2025, 8 (02) : 458 - 478
  • [37] ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
    Masry, Ahmed
    Long, Do Xuan
    Tan, Jia Qing
    Joty, Shafiq
    Hoque, Enamul
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 2263 - 2279
  • [38] Visual Question Generation as Dual Task of Visual Question Answering
    Li, Yikang
    Duan, Nan
    Zhou, Bolei
    Chu, Xiao
    Ouyang, Wanli
    Wang, Xiaogang
    Zhou, Ming
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6116 - 6124
  • [39] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    INFORMATION FUSION, 2020, 55 : 116 - 126
  • [40] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479