Efficient language model adaptation with Noise Contrastive Estimation and Kullback-Leibler regularization

Cited by: 5
Authors
Andres-Ferrer, Jesus [1 ]
Bodenstab, Nathan [1 ]
Vozila, Paul [1 ]
Affiliations
[1] Nuance Commun, Burlington, MA 01803 USA
Keywords
speech recognition; NCE; KLD; language modeling; adaptation;
DOI
10.21437/Interspeech.2018-1345
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Many language modeling (LM) tasks have limited in-domain data for training. Exploiting out-of-domain data while retaining the relevant in-domain statistics is a desired property in these scenarios. Kullback-Leibler Divergence (KLD) regularization is a popular method for acoustic model (AM) adaptation. KLD regularization assumes that the last layer is a softmax that fully activates the targets of both the in-domain and out-of-domain models. Unfortunately, this softmax activation is computationally prohibitive for language modeling, where the number of output classes is large, typically 50K to 100K, and may exceed 800K in some cases. The computational bottleneck of the softmax during LM training can be reduced by an order of magnitude using techniques such as noise contrastive estimation (NCE), which replaces the cross-entropy loss function with a binary classification problem between the target output and random noise samples. In this work, we combine NCE and KLD regularization to offer a fast domain-adaptation method for LM training while retaining important attributes of the original NCE, such as self-normalization. On a medical domain-adaptation task, our method improves perplexity by 10.1% relative to a strong LSTM baseline.
Pages: 3368-3372
Page count: 5
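
The abstract describes replacing the full-softmax cross-entropy with NCE's binary classification while still regularizing toward the out-of-domain (seed) model. Below is a minimal illustrative sketch in PyTorch of one way such a combination could look; the function nce_kld_loss, the interpolation weight rho, and the choice to evaluate the seed model only on the sampled target and noise words are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible way to blend NCE with a KLD-style
# pull toward an out-of-domain ("seed") model. NOT the paper's exact method.
import math

import torch
import torch.nn.functional as F


def nce_kld_loss(s_target, s_noise,          # adapted-model scores: (B,), (B, K)
                 s_out_target, s_out_noise,  # seed-model scores:    (B,), (B, K)
                 logq_target, logq_noise,    # log noise probs q(w): (B,), (B, K)
                 k, rho=0.5):
    """NCE binary classification with targets interpolated toward the seed model.

    rho = 0 recovers plain NCE on the in-domain data; rho = 1 ignores the
    in-domain labels and only matches the seed model's sampled posteriors.
    """
    log_k = math.log(k)

    # NCE logits: s(w|h) - log(k * q(w)) for the true next word and each noise word.
    logit_target = s_target - (log_k + logq_target)   # (B,)
    logit_noise = s_noise - (log_k + logq_noise)      # (B, K)

    # Soft binary targets: interpolate the hard NCE labels (1 = data, 0 = noise)
    # with the seed model's posterior of "being a data sample" on the same words
    # (an assumption made here for illustration).
    with torch.no_grad():
        p_out_target = torch.sigmoid(s_out_target - (log_k + logq_target))
        p_out_noise = torch.sigmoid(s_out_noise - (log_k + logq_noise))
    t_target = (1.0 - rho) + rho * p_out_target
    t_noise = rho * p_out_noise

    # Mean over the batch for the true-word term; the noise term is averaged over
    # B*K elements, so scale by k to approximate the usual sum over noise samples.
    loss_data = F.binary_cross_entropy_with_logits(logit_target, t_target)
    loss_noise = k * F.binary_cross_entropy_with_logits(logit_noise, t_noise)
    return loss_data + loss_noise
```

In this sketch, the s_out_* scores would come from a frozen copy of the seed model evaluated on the same target and noise word ids, so the regularizer adds only K+1 extra score evaluations per position rather than a full softmax over the vocabulary.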