Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited: 0
|
Authors
Li, Mingsheng [1 ]
Zhang, Lin [1 ]
Zhu, Mingzhen [1 ]
Huang, Zilong [2 ]
Yu, Gang [2 ]
Fan, Jiayuan [3 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics banks;
DOI
10.1109/TMM.2024.3410532
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
This paper studies pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods transfer the representational knowledge of a large network (the Teacher) into a smaller model (the Student) via self-supervised distillation, improving the small model's performance on downstream tasks. However, existing approaches fall short in extracting, during distillation, the knowledge that is crucial for discerning categories in downstream tasks. In this paper, we introduce language guidance into the distillation process for the first time and propose a new method named Language-Guided Distillation (LGD), which uses the category names of the target downstream task to help refine the knowledge transferred between the teacher and the student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, named the Visual Semantics Bank (VSB). Task-related knowledge is transferred by driving the student encoder to mimic the similarity score distributions that the teacher infers over the TSB and VSB. Experimental results show that, compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance, validated on various downstream tasks including classification, detection, and segmentation.
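To make the transfer objective described in the abstract concrete, below is a minimal sketch (not the authors' released code) of distillation over a textual semantics bank: a frozen text encoder is assumed to have produced one embedding per downstream category name, and the student is trained to match the teacher's similarity score distribution over that bank. Names such as `lgd_distillation_loss`, `text_bank`, and the temperature `tau` are illustrative assumptions; the paper's LGKA module and the visual semantics bank (VSB) would contribute an analogous term over visual prototypes, which is omitted here.

```python
import torch
import torch.nn.functional as F


def lgd_distillation_loss(student_feat, teacher_feat, text_bank, tau=0.1):
    """Illustrative sketch of language-guided distillation:
    the student mimics the teacher's similarity distribution
    over a bank of category text embeddings (the TSB).

    student_feat: (B, D) student image embeddings
    teacher_feat: (B, D) teacher image embeddings (treated as fixed targets)
    text_bank:    (K, D) frozen text embeddings of downstream category names
    """
    # L2-normalize so dot products become cosine similarities
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    bank = F.normalize(text_bank, dim=-1)

    # Similarity score distributions over the textual semantics bank
    p_teacher = F.softmax(t @ bank.T / tau, dim=-1)          # soft targets from the teacher
    log_p_student = F.log_softmax(s @ bank.T / tau, dim=-1)  # student predictions

    # KL divergence drives the student to match the teacher's distribution
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")


if __name__ == "__main__":
    B, D, K = 8, 256, 100                      # batch size, embedding dim, number of category names
    student_feat = torch.randn(B, D, requires_grad=True)
    teacher_feat = torch.randn(B, D)           # would come from a frozen teacher encoder
    text_bank = torch.randn(K, D)              # would come from a frozen text encoder
    loss = lgd_distillation_loss(student_feat, teacher_feat, text_bank)
    loss.backward()
    print(float(loss))
```

In a full training loop the same kind of KL term would presumably also be computed over the visual semantics bank and summed with the TSB term; the temperature `tau` controls how sharply the teacher's category preferences are imprinted on the student.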
Pages: 10720-10730
Number of pages: 11
Related papers
50 records in total
  • [31] oLMpics-On What Language Model Pre-training Captures
    Talmor, Alon
    Elazar, Yanai
    Goldberg, Yoav
    Berant, Jonathan
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 (08) : 743 - 758
  • [32] Pre-training A Prompt Pool for Vision-Language Model
    Liu, Jun
    Gu, Yang
    Yang, Zhaohua
    Guo, Shuai
    Liu, Huaqiu
    Chen, Yiqiang
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [33] REALM: Retrieval-Augmented Language Model Pre-Training
    Guu, Kelvin
    Lee, Kenton
    Tung, Zora
    Pasupat, Panupong
    Chang, Ming-Wei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [34] Conditional Embedding Pre-Training Language Model for Image Captioning
    Li, Pengfei
    Zhang, Min
    Lin, Peijie
    Wan, Jian
    Jiang, Ming
    NEURAL PROCESSING LETTERS, 2022, 54 : 4987 - 5003
  • [35] Pre-training and Evaluation of Numeracy-oriented Language Model
    Feng, Fuli
    Rui, Xilin
    Wang, Wenjie
    Cao, Yixin
    Chua, Tat-Seng
    ICAIF 2021: THE SECOND ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, 2021,
  • [36] Position-guided Text Prompt for Vision-Language Pre-training
    Wang, Jinpeng
    Zhou, Pan
    Shou, Mike Zheng
    Yan, Shuicheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23242 - 23251
  • [37] Subset selection for domain adaptive pre-training of language model
    Hwang, Junha
    Lee, Seungdong
    Kim, Haneul
    Jeong, Young-Seob
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [38] Gradual Syntactic Label Replacement for Language Model Pre-Training
    Wang, Yile
    Zhang, Yue
    Li, Peng
    Liu, Yang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 486 - 496
  • [39] Analysing The Impact of Sequence Composition on Language Model Pre-Training
    Zhao, Yu
    Qu, Yuanbin
    Staniszewski, Konrad
    Tworkowski, Szymon
    Liu, Wei
    Milos, Piotr
    Wu, Yuxiang
    Minervini, Pasquale
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 7897 - 7912
  • [40] Learning Better Masking for Better Language Model Pre-training
    Yang, Dongjie
    Zhang, Zhuosheng
    Zhao, Hai
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7255 - 7267