Visual question answering model based on visual relationship detection

Cited by: 68
Authors
Xi, Yuling [1]
Zhang, Yanning [1]
Ding, Songtao [2]
Wan, Shaohua [3, 4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710121, Shaanxi, Peoples R China
[3] Zhongnan Univ Econ & Law, Sch Informat & Safety Engn, Wuhan 430073, Hubei, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
Keywords
Visual question answering; Appearance features; Relationship predicate; Word vector similarity
DOI
10.1016/j.image.2019.115648
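Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract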
Visual question answering (VQA) is a learning task that spans two major fields, computer vision and natural language processing. The development of deep learning technology has contributed to the advancement of this research area. Although research on question answering models has made great progress, the accuracy of VQA models remains low, mainly because current model structures are relatively simple, their attention mechanisms deviate from human attention, and they lack higher-level logical reasoning ability. In response to these problems, we propose a VQA model based on multi-objective visual relationship detection. First, appearance features are used to replace the image features of the original object, and the appearance model is extended by the principle of word vector similarity. The appearance features and relationship predicates are then fed into the word vector space and represented by a fixed-length vector. Finally, the image features and the question vector are combined through concatenation and fed into the classifier to generate an output answer. Our method is benchmarked on the DAQUAR data set and evaluated by accuracy (Acc), WUPS@0.0, and WUPS@0.9.
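The evaluation measures named in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes single-word answers, uses a tiny hand-made taxonomy (`PARENT`, hypothetical) in place of WordNet, and follows the commonly used WUPS convention in which a Wu-Palmer score below the threshold is scaled by 0.1. All function names here are illustrative.

```python
# Toy taxonomy, child -> parent. Hypothetical stand-in for WordNet.
PARENT = {
    "entity": None,
    "furniture": "entity",
    "chair": "furniture",
    "table": "furniture",
    "appliance": "entity",
    "lamp": "appliance",
}


def path_to_root(word):
    """Return the list [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(word))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first shared node on the way up from `a` is the least common subsumer.
    lcs = next(node for node in path_to_root(a) if node in ancestors_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def thresholded_wup(a, b, threshold):
    """Down-weight scores below the threshold by a factor of 0.1 (assumed convention)."""
    score = wup(a, b)
    return score if score >= threshold else 0.1 * score


def wups(predictions, truths, threshold):
    """Mean thresholded Wu-Palmer score over single-word answer pairs."""
    scores = [thresholded_wup(p, t, threshold) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)


def accuracy(predictions, truths):
    """Fraction of exact matches."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


predictions = ["chair", "table", "lamp"]
truths = ["chair", "chair", "chair"]
print("Acc      :", accuracy(predictions, truths))
print("WUPS@0.0 :", wups(predictions, truths, 0.0))
print("WUPS@0.9 :", wups(predictions, truths, 0.9))
```

In this toy setting, WUPS@0.0 gives partial credit to semantically close answers such as "table" for "chair", while WUPS@0.9 behaves almost like exact-match accuracy. The full metric also handles multi-word answer sets, which this sketch leaves out.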
Pages: 8
Related papers
50 in total
  • [1] BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
    Ben-Younes, Hedi
    Cadene, Remi
    Thome, Nicolas
    Cord, Matthieu
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8102 - 8109
  • [2] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
    Yang, Zhuoqian
    Qin, Zengchang
    Yu, Jing
    Wan, Tao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415
  • [3] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    [J]. Multimedia Systems, 2024, 30 (06)
  • [4] Detection-Based Intermediate Supervision For Visual Question Answering
    Liu, Yuhang
    Peng, Daowan
    Wei, Wei
    Fu, Yuanyuan
    Xie, Wenfeng
    Chen, Dangyang
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 14061 - 14068
  • [5] Change Detection Meets Visual Question Answering
    Yuan, Zhenghang
    Mou, Lichao
    Xiong, Zhitong
    Zhu, Xiao Xiang
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [6] A Transformer-based Medical Visual Question Answering Model
    Liu, Lei
    Su, Xiangdong
    Guo, Hui
    Zhu, Daobin
    [J]. 2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1712 - 1718
  • [7] Robust Visual Question Answering Based on Counterfactual Samples and Relationship Perception
    Qin, Hong
    An, Gaoyun
    Ruan, Qiuqi
    [J]. IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS, IGTA 2021, 2021, 1480 : 145 - 158
  • [8] Vector Semiotic Model for Visual Question Answering
    Kovalev, Alexey K.
    Shaban, Makhmud
    Osipov, Evgeny
    Panov, Aleksandr I.
    [J]. COGNITIVE SYSTEMS RESEARCH, 2022, 71 : 52 - 63
  • [9] Visual Question Answering Based on Position Alignment
    Xia, Qihao
    Yu, Chao
    Peng, Pingping
    Gu, Henghao
    Zheng, Zhengqi
    Zhao, Kun
    [J]. 2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021), 2021,
  • [10] Visual Question Answering based on Formal Logic
    Sethuraman, Muralikrishnna G.
    Payani, Ali
    Fekri, Faramarz
    Kerce, J. Clayton
    [J]. 20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 952 - 957