Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited by: 0
Authors
Li, Mingsheng [1 ]
Zhang, Lin [1 ]
Zhu, Mingzhen [1 ]
Huang, Zilong [2 ]
Yu, Gang [2 ]
Fan, Jiayuan [3 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics bank;
DOI
10.1109/TMM.2024.3410532
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This paper studies the problem of pre-training small models, which is essential for many mobile devices. Current state-of-the-art methods transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, during distillation, existing approaches fail to fully extract the knowledge that is crucial for discerning categories in downstream tasks. In this paper, we introduce language guidance to the distillation process for the first time and propose a new method, the Language-Guided Distillation (LGD) system, which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and the student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, also named the Visual Semantics Bank (VSB). The task-related knowledge is transferred by driving the student encoder to mimic the similarity score distributions inferred by the teacher over the TSB and VSB. Experimental results show that, compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance, validated on various downstream tasks including classification, detection, and segmentation.
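To make the transfer mechanism concrete, below is a minimal PyTorch sketch of the bank-matching idea described in the abstract: a student encoder is trained to reproduce the teacher's similarity score distribution over a bank of category-name embeddings. The function name, the temperature value, and the KL-divergence formulation are illustrative assumptions rather than the authors' released implementation, and the Visual Semantics Bank and LGKA module are omitted.

    # Hypothetical sketch of language-guided distillation; not the paper's official code.
    import torch
    import torch.nn.functional as F

    def language_guided_distillation_loss(student_feat, teacher_feat, text_bank, tau=0.1):
        """Match the student's and teacher's similarity distributions over a text bank.

        student_feat, teacher_feat: (B, D) image embeddings from the student and
            teacher encoders, assumed already projected into the text embedding space.
        text_bank: (K, D) embeddings of downstream category names from a frozen
            pre-trained text encoder (the Textual Semantics Bank).
        tau: softmax temperature over similarity scores (assumed value).
        """
        # Normalize so that dot products become cosine similarities.
        s = F.normalize(student_feat, dim=-1)
        t = F.normalize(teacher_feat, dim=-1)
        bank = F.normalize(text_bank, dim=-1)

        # Similarity score distributions over the K bank entries.
        with torch.no_grad():
            p_teacher = F.softmax(t @ bank.T / tau, dim=-1)       # teacher's target distribution
        log_p_student = F.log_softmax(s @ bank.T / tau, dim=-1)   # student's prediction

        # Drive the student to mimic the teacher (KL divergence, averaged over the batch).
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

In practice, the textual semantics bank would be precomputed once from the frozen text encoder using the downstream category names, and an analogous loss would be applied over the visual semantics bank; those details are not given in the abstract and are left out of this sketch.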
Pages: 10720-10730
Page count: 11
Related Papers
50 items in total
  • [1] Legal judgment prediction based on pre-training model and knowledge distillation
    Pan, R.-D.; Kong, W.-J.; Qi, J.
    Kongzhi yu Juece/Control and Decision, 2021, 37 (01): 67-76
  • [2] MindLLM: Lightweight large language model pre-training, evaluation and domain application
    Yang, Yizhe; Sun, Huashan; Li, Jiawei; Liu, Runheng; Li, Yinghao; Liu, Yuhang; Gao, Yang; Huang, Heyan
    AI Open, 2024, 5: 155-180
  • [3] Knowledge distilled pre-training model for vision-language-navigation
    Huang, Bo; Zhang, Shuai; Huang, Jitao; Yu, Yijun; Shi, Zhicai; Xiong, Yujie
    Applied Intelligence, 2023, 53: 5607-5619
  • [4] Knowledge distilled pre-training model for vision-language-navigation
    Huang, Bo; Zhang, Shuai; Huang, Jitao; Yu, Yijun; Shi, Zhicai; Xiong, Yujie
    Applied Intelligence, 2023, 53 (05): 5607-5619
  • [5] Self-Influence Guided Data Reweighting for Language Model Pre-training
    Thakkar, Megh; Bolukbasi, Tolga; Ganapathy, Sriram; Vashishth, Shikhar; Chandar, Sarath; Talukdar, Partha
    2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023: 2033-2045
  • [6] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei; Liu, Kang; Wang, Yequan
    ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, 23 (04)
  • [7] Knowledge Enhanced Pre-Training Model for Vision-Language-Navigation Task
    Huang, Jitao; Zeng, Guohui; Huang, Bo; Gao, Yongbin; Liu, Jin; Shi, Zhicai
    Wuhan University Journal of Natural Sciences, 2021, 26 (02): 147-155
  • [8] Graph Structure Enhanced Pre-Training Language Model for Knowledge Graph Completion
    Zhu, Huashi; Xu, Dexuan; Huang, Yu; Jin, Zhi; Ding, Weiping; Tong, Jiahui; Chong, Guoshuang
    IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8 (04): 2697-2708
  • [9] Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression
    Yang, Zhao; Zhang, Yuanzhe; Sui, Dianbo; Ju, Yiming; Zhao, Jun; Liu, Kang
    ACM Transactions on Asian and Low-Resource Language Information Processing, 2024, 23 (02)
  • [10] Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
    Radenovic, Filip; Dubey, Abhimanyu; Kadian, Abhishek; Mihaylov, Todor; Vandenhende, Simon; Patel, Yash; Wen, Yi; Ramanathan, Vignesh; Mahajan, Dhruv
    2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 6967-6977