Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited by: 0
Authors
Li, Mingsheng [1 ]
Zhang, Lin [1 ]
Zhu, Mingzhen [1 ]
Huang, Zilong [2 ]
Yu, Gang [2 ]
Fan, Jiayuan [3 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics banks;
DOI
10.1109/TMM.2024.3410532
CLC Number
TP [Automation and Computer Technology];
Discipline Classification Code
0812;
Abstract
This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods on this problem transfer the representational knowledge of a large network (as a Teacher) into a smaller model (as a Student) using self-supervised distillation, improving the performance of the small model on downstream tasks. However, during distillation, existing approaches fail to adequately extract the knowledge that is crucial for discerning categories in downstream tasks. In this paper, for the first time, we introduce language guidance to the distillation process and propose a new method named Language-Guided Distillation (LGD), which uses category names of the target downstream task to help refine the knowledge transferred between the teacher and student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, also named the Visual Semantics Bank (VSB). The task-related knowledge is transferred by driving the student encoder to mimic the similarity score distribution inferred by the teacher over the TSB and VSB. Experimental results show that, compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, the lightweight model distilled with the proposed LGD method achieves state-of-the-art performance on various downstream tasks, including classification, detection, and segmentation, where it is validated.
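The core mechanism described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): the student is trained to match the teacher's similarity-score distribution over a fixed bank of text embeddings built from downstream category names. The function name lgd_distillation_loss, the temperature tau, the tensor shapes, and the KL-divergence objective below are illustrative assumptions, not details confirmed by the paper.

# Minimal sketch of language-guided distillation over a textual semantics bank.
# All shapes, names, and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F

def lgd_distillation_loss(student_feat, teacher_feat, text_bank, tau=0.07):
    """KL divergence between student and teacher similarity distributions
    over a bank of K text embeddings.

    student_feat: (B, D) features from the lightweight student encoder
    teacher_feat: (B, D) features from the frozen teacher encoder
    text_bank:    (K, D) embeddings of downstream category names produced
                  by a pre-trained text encoder (the TSB in the paper)
    """
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    bank = F.normalize(text_bank, dim=-1)

    # Cosine-similarity scores against every bank entry, scaled by temperature.
    s_logits = s @ bank.t() / tau          # (B, K)
    t_logits = t @ bank.t() / tau          # (B, K)

    # The student mimics the teacher's soft distribution over the bank.
    t_prob = F.softmax(t_logits, dim=-1).detach()
    s_logprob = F.log_softmax(s_logits, dim=-1)
    return F.kl_div(s_logprob, t_prob, reduction="batchmean")

# Toy usage with random tensors standing in for real encoder outputs.
if __name__ == "__main__":
    B, D, K = 8, 256, 100                  # batch size, feature dim, #category names
    loss = lgd_distillation_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(K, D))
    print(loss.item())

In the paper the same mimicry is also applied over the visual semantics bank produced by the LGKA module; the sketch above covers only the textual-bank case.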
Pages: 10720-10730
Number of pages: 11