Self-Improving Teacher Cultivates Better Student: Distillation Calibration for Multimodal Large Language Models

Times cited: 0
Authors
Li, Xinwei [1 ]
Lin, Li [1 ]
Wang, Shuai [1 ]
Qian, Chen [2 ]
Affiliations
[1] Southeast Univ, Nanjing, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
Keywords
multimodal reasoning; knowledge distillation; large language models
DOI
10.1145/3626772.3657692
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal content generation, which leverages visual information to enhance cross-modal understanding, plays a critical role in Multimodal Information Retrieval. With the development of large language models (LLMs), recent research has adopted visual instruction tuning to inject the knowledge of LLMs into downstream multimodal tasks. The high complexity and resource demands of these models urge researchers to study efficient distillation solutions that transfer knowledge from pre-trained multimodal models (teachers) to more compact student models. However, instruction tuning for knowledge distillation in multimodal LLMs is resource-intensive and capability-restricted, and the student's comprehension relies heavily on the teacher model. To address this issue, we propose a novel Multimodal Distillation Calibration framework (MmDC). The main idea is to generate high-quality training instances that challenge the student model's comprehension and prompt the teacher to calibrate the knowledge it transfers, ultimately cultivating a better student model for downstream tasks. The framework comprises two stages: (1) multimodal alignment and (2) knowledge distillation calibration. In the first stage, parameter-efficient fine-tuning is used to enhance feature alignment between different modalities. In the second stage, we develop a calibration strategy that assesses the student model's capability and generates high-quality instances to calibrate knowledge distillation from teacher to student. Experiments on diverse datasets show that our framework efficiently improves the student model's capabilities. After three iterations of distillation calibration, our 7B student model outperforms the current state-of-the-art LLaVA-13B model on the ScienceQA and LLaVA Test datasets and also exceeds other strong baselines in a zero-shot setting.
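
A minimal sketch of the distillation-calibration round described in the abstract, assuming a simple interface in which the student, the teacher, and the fine-tuning routine are supplied as callables; the confidence heuristic, the threshold, and all names below are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the paper's code): instances the student handles
    # poorly are re-labelled by the teacher, and the student is then fine-tuned
    # on those calibrated instances only.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Instance:
        image_id: str
        question: str
        teacher_answer: str = ""   # filled in when the teacher calibrates

    def calibration_round(
        train_set: List[Instance],
        student_answer: Callable[[Instance], str],        # student's current response
        student_confidence: Callable[[Instance], float],  # assumed capability score in [0, 1]
        teacher_calibrate: Callable[[Instance, str], str],  # teacher rewrites the supervision
        finetune_student: Callable[[List[Instance]], None],
        threshold: float = 0.5,                           # hypothetical cut-off
    ) -> List[Instance]:
        hard_instances: List[Instance] = []
        for inst in train_set:
            draft = student_answer(inst)
            if student_confidence(inst) < threshold:
                # The teacher sees the student's weak draft and produces a calibrated target.
                inst.teacher_answer = teacher_calibrate(inst, draft)
                hard_instances.append(inst)
        finetune_student(hard_instances)   # distill only on the calibrated instances
        return hard_instances

    # The abstract reports three iterations of calibration, so a driver would call
    # calibration_round three times, updating the student between rounds.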
Pages: 882-892
Number of pages: 11
Related papers
11 records in total
  • [1] Large Language Models are Better Reasoners with Self-Verification
    Weng, Yixuan
    Zhu, Minjun
    Xia, Fei
    Li, Bin
    He, Shizhu
    Liu, Shengping
    Sun, Bin
    Liu, Kang
    Zhao, Jun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2550 - 2575
  • [2] Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents
    Betz, Gregor
    Richardson, Kyle
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 5
  • [3] Multimodal prediction of student performance: A fusion of signed graph neural networks and large language models
    Wang, Sijie
    Ni, Lin
    Zhang, Zeyu
    Li, Xiaoxuan
    Zheng, Xianda
    Liu, Jiamou
    PATTERN RECOGNITION LETTERS, 2024, 181 : 1 - 8
  • [4] Improving VR Accessibility Through Automatic 360 Scene Description Using Multimodal Large Language Models
    Masasi de Oliveira, Elisa Ayumi
    Costa Silva, Diogo Fernandes
    Galvao Filho, Arlindo Rodrigues
    PROCEEDINGS OF 26TH SYMPOSIUM ON VIRTUAL AND AUGMENTED REALITY, SVR 2024, 2024, : 289 - 293
  • [5] Self-Para-Consistency: Improving Reasoning Tasks at Low Cost for Large Language Models
    Chen, Wenqing
    Wang, Weicheng
    Chu, Zhixuan
    Ren, Kui
    Zheng, Zibin
    Lu, Zhichao
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 14162 - 14167
  • [6] Self-chats from Large Language Models Make Small Emotional Support Chatbot Better
    Zheng, Zhonghua
    Liao, Lizi
    Deng, Yang
    Qin, Libo
    Nie, Liqiang
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11325 - 11345
  • [7] Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages
    Zhang, Yuanchi
    Wang, Yile
    Liu, Zijun
    Wang, Shuo
    Wang, Xiaolong
    Li, Peng
    Sun, Maosong
    Liu, Yang
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 11189 - 11204
  • [8] Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
    Lahoti, Preethi
    Blumm, Nicholas
    Ma, Xiao
    Kotikalapudi, Raghavendra
    Potluri, Sahitya
    Tan, Qijun
    Srinivasan, Hansa
    Packer, Ben
    Beirami, Ahmad
    Beutel, Alex
    Chen, Jilin
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 10383 - 10405
  • [9] Cache me if you Can: an Online Cost-aware Teacher-Student Framework to Reduce the Calls to Large Language Models
    Stogiannidis, Ilias
    Vassos, Stavros
    Malakasiotis, Prodromos
    Androutsopoulos, Ion
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 14999 - 15008
  • [10] Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models
    Jeong, Minbyul
    Sohn, Jiwoong
    Sung, Mujeen
    Kang, Jaewoo
    BIOINFORMATICS, 2024, 40 : i119 - i129