Improving Neural Topic Models with Wasserstein Knowledge Distillation

Cited by: 1
Authors
Adhya, Suman [1 ]
Sanyal, Debarshi Kumar [1 ]
Affiliations
[1] Indian Assoc Cultivat Sci, Jadavpur 700032, India
Keywords
Topic modeling; Knowledge distillation; Wasserstein distance; Contextualized topic model; Variational autoencoder;
DOI
10.1007/978-3-031-28238-6_21
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topic model without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters. The distilled model also outperforms several other competitive topic models on topic coherence.
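The objective described above combines two terms: the cross-entropy between teacher and student soft labels, and the squared 2-Wasserstein distance between the latent distributions. For VAE-based topic models whose latent posteriors are diagonal Gaussians, the squared 2-Wasserstein distance has a closed form: ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2. The sketch below illustrates these two terms in NumPy; the function names, the weighting factor `lam`, and the implementation details are our own illustration, not code from the paper.

```python
import numpy as np

def w2_squared_diag_gaussians(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, diag(sigma1^2))
    and N(mu2, diag(sigma2^2)).

    For diagonal Gaussians the closed form reduces to
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2,
    where sigma1, sigma2 are vectors of standard deviations.
    """
    mu1, sigma1, mu2, sigma2 = map(np.asarray, (mu1, sigma1, mu2, sigma2))
    return float(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

def soft_label_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student's soft labels under the teacher's,
    for a single example; eps guards against log(0)."""
    t = np.asarray(teacher_probs)
    s = np.asarray(student_probs)
    return float(-np.sum(t * np.log(s + eps)))

def distillation_loss(teacher_probs, student_probs,
                      mu_t, sigma_t, mu_s, sigma_s, lam=1.0):
    """Illustrative combined objective: soft-label cross-entropy plus a
    weighted squared 2-Wasserstein term (lam is a hypothetical weight)."""
    ce = soft_label_cross_entropy(teacher_probs, student_probs)
    w2 = w2_squared_diag_gaussians(mu_t, sigma_t, mu_s, sigma_s)
    return ce + lam * w2
```

As a sanity check, the Wasserstein term vanishes when teacher and student posteriors coincide, so in that case the loss reduces to the soft-label cross-entropy alone.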
Pages: 321-330
Page count: 10
Related papers
50 records in total
  • [41] Improving Knowledge Distillation via Head and Tail Categories
    Xu, Liuchi
    Ren, Jin
    Huang, Zhenhua
    Zheng, Weishi
    Chen, Yunwen
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3465 - 3480
  • [42] Improving the accuracy of pruned network using knowledge distillation
    Prakosa, Setya Widyawan
    Leu, Jenq-Shiou
    Chen, Zhao-Hong
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (02) : 819 - 830
  • [44] Benchmarking Neural Topic Models: An Empirical Study
    Thanh-Nam Doan
    Tuan-Anh Hoang
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4363 - 4368
  • [45] Knowledge Distillation: Bad Models Can Be Good Role Models
    Kaplun, Gal
    Malach, Eran
    Nakkiran, Preetum
    Shalev-Shwartz, Shai
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [46] Improving the accuracy of mechanistic models for dynamic batch distillation enabled by neural network: An industrial plant case
    Xiaoyu Zhou
    Xiangyi Gao
    Mingmei Wang
    Erwei Song
    Erqiang Wang
    [J]. Chinese Journal of Chemical Engineering, 2024, 73 (09) : 290 - 300
  • [48] Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings
    Wang, Weixuan
    Peng, Wei
    Zhang, Meng
    Liu, Qun
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3197 - 3202
  • [49] Translating with Bilingual Topic Knowledge for Neural Machine Translation
    Wei, Xiangpeng
    Hu, Yue
    Xing, Luxi
    Wang, Yipeng
    Gao, Li
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 7257 - 7264
  • [50] On using neural networks models for distillation control
    Munsif, HP
    Riggs, JB
    [J]. DISTILLATION AND ABSORPTION '97, VOLS 1 AND 2, 1997, (142): : 259 - 268