Lightweight Model Pre-Training via Language Guided Knowledge Distillation

Cited by: 0
Authors
Li, Mingsheng [1 ]
Zhang, Lin [1 ]
Zhu, Mingzhen [1 ]
Huang, Zilong [2 ]
Yu, Gang [2 ]
Fan, Jiayuan [3 ]
Chen, Tao [1 ]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Tencent GY Lab, Shanghai 200000, Peoples R China
[3] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Visualization; Semantics; Task analysis; Feature extraction; Training; Computational modeling; Image segmentation; Lightweight model pre-training; language-guided distillation; textual semantics bank; visual semantics banks;
DOI
10.1109/TMM.2024.3410532
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
This paper studies the problem of pre-training for small models, which is essential for many mobile devices. Current state-of-the-art methods for this problem transfer the representational knowledge of a large network (the teacher) into a smaller model (the student) via self-supervised distillation, improving the performance of the small model on downstream tasks. However, existing approaches are insufficient at extracting, during distillation, the knowledge that is crucial for discerning categories in downstream tasks. In this paper, we introduce, for the first time, language guidance into the distillation process and propose a new method named Language-Guided Distillation (LGD), which uses the category names of the target downstream task to help refine the knowledge transferred between teacher and student. To this end, we utilize a pre-trained text encoder to extract semantic embeddings from language and construct a textual semantic space called the Textual Semantics Bank (TSB). Furthermore, we design a Language-Guided Knowledge Aggregation (LGKA) module to construct the visual semantic space, also named the Visual Semantics Bank (VSB). Task-related knowledge is transferred by driving the student encoder to mimic the similarity score distributions inferred by the teacher over the TSB and VSB. Experimental results show that, compared with other small models obtained by either ImageNet pre-training or self-supervised distillation, the lightweight model distilled with the proposed LGD achieves state-of-the-art performance, validated on various downstream tasks including classification, detection, and segmentation.
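The core mechanism sketched in the abstract, matching student and teacher similarity-score distributions over a semantics bank built from category names, can be illustrated with a short example. The following PyTorch snippet is a minimal sketch under assumed names and shapes (language_guided_distill_loss, tsb, temperature are illustrative, not the authors' released implementation), and it omits the LGKA module and the Visual Semantics Bank for brevity.

```python
# Minimal sketch of language-guided distillation: the student is trained to
# mimic the teacher's similarity-score distribution over a Textual Semantics
# Bank (TSB) built from downstream category names. Illustrative only.
import torch
import torch.nn.functional as F

def language_guided_distill_loss(student_feat, teacher_feat, tsb, temperature=0.1):
    """Match student/teacher similarity distributions over the TSB.

    student_feat: (B, D) student image embeddings, projected to the TSB dimension
    teacher_feat: (B, D) teacher image embeddings, projected to the TSB dimension
    tsb:          (K, D) text embeddings of the K downstream category names
    """
    # L2-normalize so dot products become cosine similarities
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    bank = F.normalize(tsb, dim=-1)

    # Similarity-score distributions over the semantics bank
    log_p_student = F.log_softmax(s @ bank.t() / temperature, dim=-1)  # (B, K)
    p_teacher = F.softmax(t @ bank.t() / temperature, dim=-1)          # (B, K)

    # KL divergence drives the student to mimic the teacher's distribution
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# Toy usage with random features and a 10-category textual semantics bank
student_feat = torch.randn(4, 256)
teacher_feat = torch.randn(4, 256)
tsb = torch.randn(10, 256)
loss = language_guided_distill_loss(student_feat, teacher_feat, tsb)
```

In such a sketch, the TSB would be built offline by encoding the downstream category names with a frozen pre-trained text encoder, and the same distribution-matching loss could in principle be applied over a visual semantics bank as well.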
Pages: 10720-10730
Number of pages: 11
Related Papers (50 in total)
  • [21] Unified Language Model Pre-training for Natural Language Understanding and Generation
    Dong, Li
    Yang, Nan
    Wang, Wenhui
    Wei, Furu
    Liu, Xiaodong
    Wang, Yu
    Gao, Jianfeng
    Zhou, Ming
    Hon, Hsiao-Wuen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [22] Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect
    Zeng, Zheni
    Xiao, Chaojun
    Yao, Yuan
    Xie, Ruobing
    Liu, Zhiyuan
    Lin, Fen
    Lin, Leyu
    Sun, Maosong
    FRONTIERS IN BIG DATA, 2021, 4
  • [23] Improving Knowledge Tracing via Pre-training Question Embeddings
    Liu, Yunfei
    Yang, Yang
    Chen, Xianyu
    Shen, Jian
    Zhang, Haifeng
    Yu, Yong
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1577 - 1583
  • [24] Simultaneously Training and Compressing Vision-and-Language Pre-Training Model
    Qi, Qiaosong
    Zhang, Aixi
    Liao, Yue
    Sun, Wenyu
    Wang, Yongliang
    Li, Xiaobo
    Liu, Si
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8194 - 8203
  • [25] Improving Knowledge Tracing via Pre-training Question Embeddings
    Liu, Yunfei
    Yang, Yang
    Chen, Xianyu
    Shen, Jian
    Zhang, Haifeng
    Yu, Yong
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1556 - 1562
  • [26] ARCHICLIP Enhanced Contrastive Language-Image Pre-training Model With Architectural Prior Knowledge
    Xia, Shengtao
    Cheng, Yiming
    Tian, Runjia
    PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE OF THE ASSOCIATION FOR COMPUTER-AIDED ARCHITECTURAL DESIGN RESEARCH IN ASIA, CAADRIA 2024, VOL 1, 2024, : 69 - 78
  • [27] Retrieval-based Knowledge Augmented Vision Language Pre-training
    Rao, Jiahua
    Shan, Zifei
    Liu, Longpo
    Zhou, Yao
    Yang, Yuedong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5399 - 5409
  • [28] Pre-training language model incorporating domain-specific heterogeneous knowledge into a unified representation
    Zhu, Hongyin
    Peng, Hao
    Lyu, Zhiheng
    Hou, Lei
    Li, Juanzi
    Xiao, Jinghui
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 215
  • [29] Pre-training via Paraphrasing
    Lewis, Mike
    Ghazvininejad, Marjan
    Ghosh, Gargi
    Aghajanyan, Armen
    Wang, Sida
    Zettlemoyer, Luke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [30] Conditional Embedding Pre-Training Language Model for Image Captioning
    Li, Pengfei
    Zhang, Min
    Lin, Peijie
    Wan, Jian
    Jiang, Ming
    NEURAL PROCESSING LETTERS, 2022, 54 (06) : 4987 - 5003