XtremeDistil: Multi-stage Distillation for Massive Multilingual Models

Authors
Mukherjee, Subhabrata [1]
Awadallah, Ahmed Hassan [1]
Affiliations
[1] Microsoft Res AI, Redmond, WA 98052 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep, large pre-trained language models are the state of the art for various natural language processing tasks. However, the huge size of these models can deter their use in practice. Some recent works use knowledge distillation to compress these huge models into shallow ones. In this work, we study knowledge distillation with a focus on multilingual Named Entity Recognition (NER). In particular, we study several distillation strategies and propose a stage-wise optimization scheme that leverages the teacher's internal representations and is agnostic of teacher architecture, and we show that it outperforms strategies employed in prior works. Additionally, we investigate the role of several factors, including the amount of unlabeled data, annotation resources, model architecture, and inference latency. We show that our approach leads to massive compression of teacher models like mBERT, by up to 35x in terms of parameters and 51x in terms of latency for batch inference, while retaining 95% of its F1-score for NER over 41 languages.
Pages: 2221-2234
Page count: 14