Learning neighbor-enhanced region representations and question-guided visual representations for visual question answering

Cited by: 1
Authors
Gao, Ling [1]
Zhang, Hongda [2]
Sheng, Nan [1]
Shi, Lida [2]
Xu, Hao [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Sch Artificial Intelligence, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Deep learning; Feature graph; Attention mechanism; Random walk;
DOI
10.1016/j.eswa.2023.122239
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Great strides have been made in the field of visual question answering (VQA), driven by the application and development of deep learning in related research fields. Existing models in this field focus on learning and fusing visual and textual features. However, it is also crucial for VQA tasks to capture the associations between image regions and to use question information to enhance key features. In this paper, we propose a method for mining and integrating neighbor-enhanced region representations and question-guided visual representations. Specifically, a region feature graph is first constructed to integrate the features of all regions and the relationships between them. Second, a random walk-based method is presented to acquire the neighbor-enhanced region representations, which incorporate the topological relationships of all region nodes in the graph. A question-guided vertical and horizontal dual attention mechanism is then proposed to enhance the region representations at the region level and the feature level, respectively. Finally, the enhanced region representations and the question representation are adaptively integrated to predict the answer. Experiments show that our method outperforms prior state-of-the-art methods on two competitive benchmarks, VQA v1 and VQA v2.
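The abstract names the pipeline stages but not their equations, so the following is only a rough sketch of how such a pipeline might look, not the authors' implementation. Everything concrete here is an assumption made for illustration: cosine similarity as the edge weight of the region feature graph, a random walk with restart as the "random walk-based method", softmax weights over regions for the vertical (region-level) attention, a sigmoid gate over feature dimensions for the horizontal (feature-level) attention, and the hypothetical names `neighbor_enhance`, `dual_attention`, `w_region`, and `w_feature`.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def neighbor_enhance(regions, steps=3, restart=0.5):
    """Mix each region feature with those of its graph neighbors.

    regions: (n, d) array of region features.
    Assumption: edges are weighted by non-negative cosine similarity and
    the walk is a random walk with restart; the paper may differ.
    Returns the enhanced features (n, d) and the visiting probabilities (n, n).
    """
    unit = regions / (np.linalg.norm(regions, axis=1, keepdims=True) + 1e-8)
    sim = np.clip(unit @ unit.T, 0.0, None)  # non-negative edge weights
    np.fill_diagonal(sim, 0.0)               # no self-loops
    trans = sim / (sim.sum(axis=1, keepdims=True) + 1e-8)  # row-normalized

    n = regions.shape[0]
    start = np.eye(n)   # each walker starts at its own region node
    probs = start.copy()
    for _ in range(steps):
        probs = (1.0 - restart) * (probs @ trans) + restart * start
    return probs @ regions, probs


def dual_attention(regions, question, w_region, w_feature):
    """Question-guided attention at the region level and the feature level.

    regions: (n, d), question: (q,), w_region and w_feature: (d, q).
    Both projection matrices are hypothetical learned parameters.
    """
    # Vertical (region-level): one softmax weight per region.
    region_weights = softmax((regions @ w_region) @ question)
    # Horizontal (feature-level): one sigmoid gate per feature dimension.
    feature_gate = 1.0 / (1.0 + np.exp(-(w_feature @ question)))
    attended = regions * region_weights[:, None] * feature_gate[None, :]
    return attended, region_weights, feature_gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.random((5, 8))   # 5 regions, 8-dimensional features
    question = rng.random(6)       # 6-dimensional question embedding
    enhanced, _ = neighbor_enhance(regions)
    attended, region_weights, _ = dual_attention(
        enhanced, question, rng.random((8, 6)), rng.random((8, 6))
    )
    print(attended.shape, region_weights.round(3))
```

In this sketch, `restart=1.0` leaves the region features unchanged, and smaller values pull in progressively more information from neighboring regions. The final adaptive fusion with the question representation is omitted because the abstract gives no detail about it.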
Pages: 12