VL-Meta: Vision-Language Models for Multimodal Meta-Learning

Cited by: 3
Authors
Ma, Han [1 ]
Fan, Baoyu [1 ]
Ng, Benjamin K. [1 ]
Lam, Chan-Tong [1 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Taipa 999078, Macao, Peoples R China
Keywords
vision-language models; multimodal learning; meta-learning; token-level training; visual question answering;
DOI
10.3390/math12020286
Chinese Library Classification (CLC)
O1 [Mathematics];
Discipline Codes
0701; 070101;
Abstract
Multimodal learning is a promising area of artificial intelligence (AI) that enables a model to understand different kinds of data. Existing works typically re-train a new model on top of pre-trained models, which requires large amounts of data, computation, and time and is therefore difficult in low-resource or small-sample settings. We therefore propose VL-Meta, Vision-Language Models for Multimodal Meta-Learning. VL-Meta (1) introduces a vision-language mapper and a multimodal fusion mapper, two lightweight structures that reuse existing pre-trained models by mapping image features into the language feature space, saving training data, computation, and time; (2) constructs a meta-task pool that builds sufficient training tasks from only a small amount of data, improving the model's generalization over both data knowledge and task knowledge; (3) proposes token-level training, which aligns inputs with outputs during training to improve performance; and (4) adopts a multi-task fusion loss so the model learns multiple abilities jointly. VL-Meta achieves good performance on the Visual Question Answering (VQA) task, demonstrating the feasibility and effectiveness of the approach. Such a solution can also help blind or visually impaired individuals obtain visual information.
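The abstract's point (1) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the "vision-language mapper" is a small learned projection that turns one frozen vision-encoder feature vector into a short sequence of pseudo-token embeddings that a frozen language model could consume as a prefix, and that the multi-task fusion loss is a weighted sum of per-task losses. All dimensions, names, and weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VISION_DIM = 512   # assumed feature size of the frozen vision encoder
LANG_DIM = 768     # assumed embedding size of the frozen language model
NUM_PREFIX = 4     # assumed number of pseudo-tokens produced by the mapper

# The only trainable parameters in this sketch: one linear projection that
# maps a single image feature vector to NUM_PREFIX language-space embeddings.
W = rng.normal(scale=0.02, size=(VISION_DIM, NUM_PREFIX * LANG_DIM))
b = np.zeros(NUM_PREFIX * LANG_DIM)

def vision_language_mapper(image_feat: np.ndarray) -> np.ndarray:
    """Map a (VISION_DIM,) image feature to (NUM_PREFIX, LANG_DIM) prefix embeddings."""
    return (image_feat @ W + b).reshape(NUM_PREFIX, LANG_DIM)

def multi_task_fusion_loss(task_losses: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Assumed form of the fusion loss: a weighted sum over per-task losses."""
    return sum(weights[name] * loss for name, loss in task_losses.items())

image_feat = rng.normal(size=VISION_DIM)        # stand-in for encoder output
prefix = vision_language_mapper(image_feat)
print(prefix.shape)                              # (NUM_PREFIX, LANG_DIM)

fused = multi_task_fusion_loss(
    {"vqa": 1.2, "captioning": 0.8},
    {"vqa": 0.7, "captioning": 0.3},
)
print(round(fused, 2))
```

The design point being sketched is that only the mapper's parameters would be trained, while both pre-trained encoders stay frozen, which is what keeps the data, computation, and time requirements low.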
Pages: 16