Self-Improving Teacher Cultivates Better Student: Distillation Calibration for Multimodal Large Language Models

Times cited: 0
Authors
Li, Xinwei [1 ]
Lin, Li [1 ]
Wang, Shuai [1 ]
Qian, Chen [2 ]
Affiliations
[1] Southeast Univ, Nanjing, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
multimodal reasoning; knowledge distillation; large language models
DOI
10.1145/3626772.3657692
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal content generation, which leverages visual information to enhance cross-modal understanding, plays a critical role in Multimodal Information Retrieval. With the development of large language models (LLMs), recent research has adopted visual instruction tuning to inject the knowledge of LLMs into downstream multimodal tasks. The high complexity and resource demands of these models urge researchers to study efficient distillation solutions that transfer knowledge from pre-trained multimodal models (teachers) to more compact student models. However, instruction tuning for knowledge distillation in multimodal LLMs is resource-intensive and capability-restricted, and the student's comprehension relies heavily on the teacher model. To address this issue, we propose a novel Multimodal Distillation Calibration framework (MmDC). The main idea is to generate high-quality training instances that challenge the student model's comprehension and prompt the teacher to calibrate the knowledge it transfers, ultimately cultivating a better student model for downstream tasks. The framework comprises two stages: (1) multimodal alignment and (2) knowledge distillation calibration. In the first stage, parameter-efficient fine-tuning is used to enhance feature alignment between different modalities. In the second stage, we develop a calibration strategy that assesses the student model's capability and generates high-quality instances to calibrate knowledge distillation from teacher to student. Experiments on diverse datasets show that our framework efficiently improves the student model's capabilities. After three iterations of distillation calibration, our 7B student model outperforms the current state-of-the-art LLaVA-13B model on the ScienceQA and LLaVA Test datasets and also exceeds other strong baselines in a zero-shot setting.
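
A minimal sketch of the distillation-calibration round described in the abstract, assuming a simple interface in which the student, the teacher, and the fine-tuning routine are supplied as callables; the confidence heuristic, the threshold, and all names below are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the paper's code): instances the student handles
    # poorly are re-labelled by the teacher, and the student is then fine-tuned
    # on those calibrated instances only.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Instance:
        image_id: str
        question: str
        teacher_answer: str = ""   # filled in when the teacher calibrates

    def calibration_round(
        train_set: List[Instance],
        student_answer: Callable[[Instance], str],        # student's current response
        student_confidence: Callable[[Instance], float],  # assumed capability score in [0, 1]
        teacher_calibrate: Callable[[Instance, str], str],  # teacher rewrites the supervision
        finetune_student: Callable[[List[Instance]], None],
        threshold: float = 0.5,                           # hypothetical cut-off
    ) -> List[Instance]:
        hard_instances: List[Instance] = []
        for inst in train_set:
            draft = student_answer(inst)
            if student_confidence(inst) < threshold:
                # The teacher sees the student's weak draft and produces a calibrated target.
                inst.teacher_answer = teacher_calibrate(inst, draft)
                hard_instances.append(inst)
        finetune_student(hard_instances)   # distill only on the calibrated instances
        return hard_instances

    # The abstract reports three iterations of calibration, so a driver would call
    # calibration_round three times, updating the student between rounds.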
Pages: 882-892
Number of pages: 11
Related papers
11 records in total
  • [1] Large Language Models are Better Reasoners with Self-Verification
    Weng, Yixuan
    Zhu, Minjun
    Xia, Fei
    Li, Bin
    He, Shizhu
    Liu, Shengping
    Sun, Bin
    Liu, Kang
    Zhao, Jun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2550 - 2575
  • [2] Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents
    Betz, Gregor
    Richardson, Kyle
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 5
  • [3] Multimodal prediction of student performance: A fusion of signed graph neural networks and large language models
    Wang, Sijie
    Ni, Lin
    Zhang, Zeyu
    Li, Xiaoxuan
    Zheng, Xianda
    Liu, Jiamou
    PATTERN RECOGNITION LETTERS, 2024, 181 : 1 - 8
  • [4] Improving VR Accessibility Through Automatic 360 Scene Description Using Multimodal Large Language Models
    Masasi de Oliveira, Elisa Ayumi
    Costa Silva, Diogo Fernandes
    Galvao Filho, Arlindo Rodrigues
    PROCEEDINGS OF 26TH SYMPOSIUM ON VIRTUAL AND AUGMENTED REALITY, SVR 2024, 2024, : 289 - 293
  • [5] Self-Para-Consistency: Improving Reasoning Tasks at Low Cost for Large Language Models
    Chen, Wenqing
    Wang, Weicheng
    Chu, Zhixuan
    Ren, Kui
    Zheng, Zibin
    Lu, Zhichao
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 14162 - 14167
  • [6] Self-chats from Large Language Models Make Small Emotional Support Chatbot Better
    Zheng, Zhonghua
    Liao, Lizi
    Deng, Yang
    Qin, Libo
    Nie, Liqiang
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11325 - 11345
  • [7] Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages
    Zhang, Yuanchi
    Wang, Yile
    Liu, Zijun
    Wang, Shuo
    Wang, Xiaolong
    Li, Peng
    Sun, Maosong
    Liu, Yang
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11189 - 11204
  • [8] Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
    Lahoti, Preethi
    Blumm, Nicholas
    Ma, Xiao
    Kotikalapudi, Raghavendra
    Potluri, Sahitya
    Tan, Qijun
    Srinivasan, Hansa
    Packer, Ben
    Beirami, Ahmad
    Beutel, Alex
    Chen, Jilin
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 10383 - 10405
  • [9] Cache me if you Can: an Online Cost-aware Teacher-Student Framework to Reduce the Calls to Large Language Models
    Stogiannidis, Ilias
    Vassos, Stavros
    Malakasiotis, Prodromos
    Androutsopoulos, Ion
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 14999 - 15008
  • [10] Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models
    Jeong, Minbyul
    Sohn, Jiwoong
    Sung, Mujeen
    Kang, Jaewoo
    BIOINFORMATICS, 2024, 40 : i119 - i129