PGCL: Prompt guidance and self-supervised contrastive learning-based method for Visual Question Answering

Citations: 0
Authors
Gao, Ling [1 ]
Zhang, Hongda [2 ]
Liu, Yiming [3 ]
Sheng, Nan [1 ]
Feng, Haotian [1 ]
Xu, Hao [1 ,2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, Sch Artificial Intelligence, Changchun 130012, Peoples R China
[3] Jilin Univ, Coll Software, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Prompt; Contrastive learning; Transformer;
DOI
10.1016/j.eswa.2024.124011
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent works have demonstrated the efficacy of Chain-of-Thought (CoT) reasoning, which incorporates multimodal information, across multiple complex reasoning tasks. CoT, which involves multiple stages of reasoning, has also been applied to Visual Question Answering (VQA) for scientific questions. Existing research on CoT in science-oriented VQA concentrates primarily on the extraction and integration of visual and textual information. However, it overlooks the fact that image-question pairs, categorized by different attributes (such as subject, topic, category, skill, grade, and difficulty), emphasize distinct textual information, visual information, and reasoning capabilities. This work therefore proposes a novel VQA method, termed PGCL, founded on a prompt guidance strategy and self-supervised contrastive learning. PGCL strategically mines and integrates textual and visual information based on attribute information. Specifically, two prompt templates are first crafted; they are then combined with the attribute information and the interference information of image-question pairs to generate a series of prompt-positive and prompt-negative samples, respectively. The mining of visual and textual representations is guided by the constructed prompts. These prompt-guided representations are integrated and enhanced via a transformer architecture and self-supervised contrastive learning, and the fused features are finally used to predict answers for VQA. Extensive experiments substantiate both the individual contributions of the components within PGCL and its overall performance.
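The self-supervised contrastive objective described in the abstract, which pulls a representation toward its prompt-positive sample and pushes it away from prompt-negative samples, is commonly realized as an InfoNCE-style loss. The sketch below is an illustrative reconstruction under that assumption, not the paper's actual implementation; the function names and the temperature value are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style contrastive loss (a sketch of the kind of objective
    PGCL's self-supervised contrastive learning could use): the anchor
    representation is attracted to the prompt-positive sample and repelled
    from the prompt-negative samples."""
    # Scaled similarities: positive first, then all negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable softmax cross-entropy with the positive as target.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))
```

When the anchor aligns with the prompt-positive sample and is orthogonal to the negatives, the loss is near zero; when it aligns with a negative instead, the loss grows large, which is the gradient signal that drives the representations apart.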
Pages: 12
Related Papers
50 records in total
  • [1] Simple contrastive learning in a self-supervised manner for robust visual question answering
    Yang, Shuwen
    Xiao, Luwei
    Wu, Xingjiao
    Xu, Junjie
    Wang, Linlin
    He, Liang
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
  • [2] Self-supervised Graph Contrastive Learning for Video Question Answering
    Yao, Xuan
    Gao, Jun-Yu
    Xu, Chang-Sheng
    [J]. Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): : 2083 - 2100
  • [3] Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
    Zhu, Xi
    Mao, Zhendong
    Liu, Chunxiao
    Zhang, Peng
    Wang, Bin
    Zhang, Yongdong
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1083 - 1089
  • [4] Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
    You, Chenyu
    Chen, Nuo
    Zou, Yuexian
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 28 - 39
  • [5] ASCL: Adaptive self-supervised counterfactual learning for robust visual question answering
    Shu, Xinyao
    Yan, Shiyang
    Yang, Xu
    Wu, Ziheng
    Chen, Zhongfeng
    Lu, Zhenyu
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 248
  • [6] A multi-scale self-supervised hypergraph contrastive learning framework for video question answering
    Wang, Zheng
    Wu, Bin
    Ota, Kaoru
    Dong, Mianxiong
    Li, He
    [J]. NEURAL NETWORKS, 2023, 168 : 272 - 286
  • [7] elBERto: Self-supervised commonsense learning for question answering
    Zhan, Xunlin
    Li, Yuan
    Dong, Xiao
    Liang, Xiaodan
    Hu, Zhiting
    Carin, Lawrence
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [8] Self-supervised Visual Feature Learning and Classification Framework: Based on Contrastive Learning
    Wang, Zhibo
    Yan, Shen
    Zhang, Xiaoyu
    Lobo, Niels Da Vitoria
    [J]. 16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020, : 719 - 725
  • [9] Self-supervised Dialogue Learning for Spoken Conversational Question Answering
    Chen, Nuo
    You, Chenyu
    Zou, Yuexian
    [J]. INTERSPEECH 2021, 2021, : 231 - 235
  • [10] QASAR: Self-Supervised Learning Framework for Extractive Question Answering
    Assem, Haytham
    Sarkar, Rajdeep
    Dutta, Sourav
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 1797 - 1808