SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Cited by: 0
Authors
Chen, Yi-Syuan [1 ]
Song, Yun-Zhu [1 ]
Yeo, Cheng Yu [1 ]
Liu, Bei [2 ]
Fu, Jianlong [2 ]
Shuai, Hong-Han [1 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
[2] Microsoft Res Asia, Beijing, Peoples R China
DOI: 10.1109/ICCV51070.2023.01415
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Large pre-trained Transformers exhibit an intriguing capacity for in-context learning. Without gradient updates, these models can rapidly construct new predictors from demonstrations presented in the input. Recent works promote this ability in the vision-language domain by incorporating visual information into large language models that can already make in-context predictions. However, these methods may inherit issues from the language domain, such as template sensitivity and hallucination. Moreover, the scale of these language models imposes significant computational demands, making training and operating them resource-intensive. To this end, we raise a question: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?". To answer it, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), which introduces a meta-model that learns on self-supervised prompts consisting of tailored demonstrations. The learned models can be transferred to downstream tasks to make in-context predictions on the fly. Extensive experiments show that SINC outperforms gradient-based methods on various vision-language tasks under few-shot settings. Furthermore, the designs of SINC help us investigate the benefits of in-context learning across different tasks, and the analysis further reveals the essential components for the emergence of in-context learning in the vision-language domain.
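The abstract above describes making predictions from demonstrations presented in the input. As a minimal illustration of the general in-context format it refers to (a generic few-shot prompt sketch, not SINC's actual self-supervised demonstrations or meta-model; all names and the template are assumptions for illustration), demonstrations and a query might be assembled as:

```python
def build_icl_prompt(demonstrations, query):
    """Assemble a few-shot in-context prompt.

    Each demonstration is a (context, question, answer) triple;
    the query supplies only a context and a question, leaving the
    final answer slot for the model to complete.
    """
    parts = []
    for context, question, answer in demonstrations:
        parts.append(f"Context: {context}\nQ: {question}\nA: {answer}")
    q_context, q_question = query
    # The query ends with an empty answer slot for the model to fill.
    parts.append(f"Context: {q_context}\nQ: {q_question}\nA:")
    return "\n\n".join(parts)

# Hypothetical visual contexts rendered as text descriptions.
demos = [
    ("a dog sleeping on a couch", "What animal is shown?", "a dog"),
    ("two cups on a wooden table", "How many cups are there?", "two"),
]
prompt = build_icl_prompt(demos, ("a red bus on a street", "What color is the bus?"))
```

No gradient updates are involved: the "learning" happens entirely through the demonstrations placed in the input.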
Pages: 15384-15396 (13 pages)
Related Papers (50 total)
  • [31] Uncertainty-Aware Self-Supervised Learning of Spatial Perception Tasks
    Nava, Mirko
    Paolillo, Antonio
    Guzzi, Jerome
    Gambardella, Luca Maria
    Giusti, Alessandro
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (04) : 6693 - 6700
  • [32] Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP
    Bansal, Trapit
    Gunasekaran, Karthick
    Wang, Tong
    Munkhdalai, Tsendsuren
    McCallum, Andrew
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5812 - 5824
  • [33] Pretext Tasks Selection for Multitask Self-Supervised Audio Representation Learning
    Zaiem, Salah
    Parcollet, Titouan
    Essid, Slim
    Heba, Abdelwahab
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1439 - 1453
  • [34] Self-supervised Video Representation Learning by Context and Motion Decoupling
    Huang, Lianghua
    Liu, Yu
    Wang, Bin
    Pan, Pan
    Xu, Yinghui
    Jin, Rong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 13881 - 13890
  • [35] NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
    Sammani, Fawaz
    Mukherjee, Tanmoy
    Deligiannis, Nikos
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8312 - 8322
  • [36] Efficient Self-Supervised Learning Representations for Spoken Language Identification
    Liu, Hexin
    Perera, Leibny Paola Garcia
    Khong, Andy W. H.
    Chng, Eng Siong
    Styles, Suzy J.
    Khudanpur, Sanjeev
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1296 - 1307
  • [37] Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues
    Xu, Ruijian
    Tao, Chongyang
    Jiang, Daxin
    Zhao, Xueliang
    Zhao, Dongyan
    Yan, Rui
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14158 - 14166
  • [38] Supervised Pretraining Can Learn In-Context Reinforcement Learning
    Lee, Jonathan N.
    Xie, Annie
    Pacchiano, Aldo
    Chandak, Yash
    Finn, Chelsea
    Nachum, Ofir
    Brunskill, Emma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [39] In-context language control with production tasks in bilinguals: An fMRI study
    Zhang, Yong
    Huang, Peiyu
    Song, Zhe
    Fang, Liang
    Shen, Tong
    Li, Yan
    Gong, Qiyong
    Xie, Peng
    BRAIN RESEARCH, 2014, 1585 : 131 - 140
  • [40] Survey on Self-Supervised Learning: Auxiliary Pretext Tasks and Contrastive Learning Methods in Imaging
    Albelwi, Saleh
    ENTROPY, 2022, 24 (04)