ConceptBert: Concept-Aware Representation for Visual Question Answering

Cited by: 0
Authors
Garderes, Francois [1, 3]
Ziaeefard, Maryam [2]
Abeloos, Baptiste [3]
Lecue, Freddy [3, 4]
Affiliations
[1] Ecole Polytech, Paris, France
[2] McGill Univ, Montreal, PQ, Canada
[3] Thales, Montreal, PQ, Canada
[4] INRIA, Paris, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Current works in VQA focus on questions that are answerable by direct analysis of the question and image alone. We present a concept-aware algorithm, ConceptBert, for questions that require common sense or basic factual knowledge from external structured content. Given an image and a question in natural language, ConceptBert requires visual elements of the image and a Knowledge Graph (KG) to infer the correct answer. We introduce a multi-modal representation which learns a joint Concept-Vision-Language embedding. We exploit the ConceptNet KG for encoding the common sense knowledge and evaluate our methodology on the Outside Knowledge-VQA (OK-VQA) and VQA datasets. Our code is available at https://github.com/ZiaMaryam/ConceptBERT
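The role of the knowledge graph described in the abstract can be illustrated with a toy sketch. This is not the authors' method: ConceptBert learns a joint embedding, whereas the snippet below only does a symbolic lookup of ConceptNet-style (head, relation, tail) triples for entities found in the question or the image. The triples in `TOY_KG`, the helper `retrieve_triples`, and the detected-object list are all hypothetical and exist only for illustration.

```python
# Toy illustration only: a hand-written stand-in for a ConceptNet-style graph.
# These triples and helper names are assumptions, not part of the ConceptBert code.
TOY_KG = [
    ("umbrella", "UsedFor", "rain protection"),
    ("umbrella", "AtLocation", "closet"),
    ("banana", "IsA", "fruit"),
    ("banana", "HasProperty", "yellow"),
]


def retrieve_triples(question, detected_objects, kg=TOY_KG):
    """Return triples whose head is mentioned in the question or detected in the image."""
    question_tokens = set(question.lower().replace("?", "").split())
    mentioned = question_tokens | {obj.lower() for obj in detected_objects}
    return [triple for triple in kg if triple[0] in mentioned]


# The question alone never names the object; the (assumed) visual detections do.
facts = retrieve_triples(
    "What is the object she holds used for?", ["umbrella", "person"]
)
for head, relation, tail in facts:
    print(head, relation, tail)
```

In this sketch the answer-relevant fact is reachable only by combining the image content with the graph, which is the kind of question the abstract says cannot be answered from the question and image alone.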
Pages: 489-498
Number of pages: 10
Related papers
  • [1] Visual Question Answering with Question Representation Update (QRU)
    Li, Ruiyu
    Jia, Jiaya
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [2] CHANGE-AWARE VISUAL QUESTION ANSWERING
    Yuan, Zhenghang
    Mou, Lichao
    Zhu, Xiao Xiang
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 227 - 230
  • [3] Mood-aware visual question answering
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Gou, Jianping
    Dong, Ming
    [J]. NEUROCOMPUTING, 2019, 330 : 305 - 316
  • [4] Question-aware dynamic scene graph of local semantic representation learning for visual question answering
    Wu, Jinmeng
    Ge, Fulin
    Hong, Hanyu
    Shi, Yu
    Hao, Yanbin
    Ma, Lei
    [J]. PATTERN RECOGNITION LETTERS, 2023, 170 : 93 - 99
  • [5] See and Learn More: Dense Caption-Aware Representation for Visual Question Answering
    Bi, Yandong
    Jiang, Huajie
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 1135 - 1146
  • [6] A Survey on Representation Learning in Visual Question Answering
    Sahani, Manish
    Singh, Priyadarshan
    Jangpangi, Sachin
    Kumar, Shailender
    [J]. MACHINE LEARNING AND BIG DATA ANALYTICS (PROCEEDINGS OF INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND BIG DATA ANALYTICS (ICMLBDA) 2021), 2022, 256 : 326 - 336
  • [7] STRUCTURED SEMANTIC REPRESENTATION FOR VISUAL QUESTION ANSWERING
    Yu, Dongchen
    Gao, Xing
    Xiong, Hongkai
    [J]. 2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2286 - 2290
  • [8] K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering
    Naseem, U.
    Khushi, M.
    Dunn, A. G.
    Kim, J.
    [J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28 (04) : 1886 - 1895
  • [9] Knowledge-aware image understanding with multi-level visual representation enhancement for visual question answering
    Feng Yan
    Zhe Li
    Wushour Silamu
    Yanbing Li
    [J]. Machine Learning, 2024, 113 : 3789 - 3805
  • [10] KVQA: Knowledge-Aware Visual Question Answering
    Shah, Sanket
    Mishra, Anand
    Yadati, Naganand
    Talukdar, Partha Pratim
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8876 - 8884