Inner Knowledge-based Img2Doc Scheme for Visual Question Answering

被引:9
|
作者
Li, Qun [1 ]
Xiao, Fu [1 ]
Bhanu, Bir [2 ]
Sheng, Biyun [1 ]
Hong, Richang [3 ]
机构
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, 9 Wenyuan Rd, Nanjing 210023, Peoples R China
[2] Univ Calif Riverside, Ctr Res Intelligent Syst, 900 Univ Ave, Riverside, CA 92521 USA
[3] Hefei Univ Technol, Sch Comp Sci & Informat Engn, 193 Tunxi Rd, Hefei 230009, Peoples R China
基金
中国国家自然科学基金;
关键词
VQA; dense image captioning; Doc2Vec; inner knowledge-based; attribute network;
D O I
10.1145/3489142
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Visual Question Answering (VQA) is a research topic of significant interest at the intersection of computer vision and natural language understanding. Recent research indicates that attributes and knowledge can effectively improve performance for both image captioning and VQA. In this article, an inner knowledge-based Img2Doc algorithm for VQA is presented. The inner knowledge is characterized as the inner attribute relationship in visual images. In addition to using an attribute network for inner knowledge-based image representation, VQA scheme is associated with a question-guided Doc2Vec method for question-answering. The attribute network generates inner knowledge-based features for visual images, while a novel question-guided Doc2Vec method aims at converting natural language text to vector features. After the vector features are extracted, they are combined with visual image features into a classifier to provide an answer. Based on our model, the VQA problem is resolved by textual question answering. The experimental results demonstrate that the proposed method achieves superior performance on multiple benchmark datasets.
引用
收藏
页数:21
相关论文
共 50 条
  • [1] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [2] Knowledge-based question answering
    Rinaldi, F
    Dowdall, J
    Hess, M
    Mollá, D
    Schwitter, R
    Kaljurand, K
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 785 - 792
  • [3] Knowledge-based question answering
    Hermjakob, U
    Hovy, EH
    Lin, CY
    [J]. 6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL XVI, PROCEEDINGS: COMPUTER SCIENCE III, 2002, : 66 - 71
  • [4] Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering
    Zhang, Liyang
    Liu, Shuaicheng
    Liu, Donghao
    Zeng, Pengpeng
    Li, Xiangpeng
    Song, Jingkuan
    Gao, Lianli
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) : 4362 - 4373
  • [5] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Zhenqiang Su
    Gang Gou
    [J]. Knowledge and Information Systems, 2024, 66 : 2193 - 2208
  • [6] Knowledge enhancement and scene understanding for knowledge-based visual question answering
    Su, Zhenqiang
    Gou, Gang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2193 - 2208
  • [7] Cross-modal knowledge reasoning for knowledge-based visual question answering
    Yu, Jing
    Zhu, Zihao
    Wang, Yujing
    Zhang, Weifeng
    Hu, Yue
    Tan, Jianlong
    [J]. PATTERN RECOGNITION, 2020, 108
  • [8] MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
    Ding, Yang
    Yu, Jing
    Liu, Bang
    Hu, Yue
    Cui, Mingxin
    Wu, Qi
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 5079 - 5088
  • [9] Knowledge-Based Embodied Question Answering
    Tan, Sinan
    Ge, Mengmeng
    Guo, Di
    Liu, Huaping
    Sun, Fuchun
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 11948 - 11960
  • [10] Answering knowledge-based visual questions via the exploration of Question Purpose
    Song, Lingyun
    Li, Jianao
    Liu, Jun
    Yang, Yang
    Shang, Xuequn
    Sun, Mingxuan
    [J]. PATTERN RECOGNITION, 2023, 133