Efficient language model adaptation with Noise Contrastive Estimation and Kullback-Leibler regularization

Cited by: 5
Authors
Andres-Ferrer, Jesus [1 ]
Bodenstab, Nathan [1 ]
Vozila, Paul [1 ]
Affiliations
[1] Nuance Commun, Burlington, MA 01803 USA
Keywords
speech recognition; NCE; KLD; language modeling; adaptation;
DOI
10.21437/Interspeech.2018-1345
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Many language modeling (LM) tasks have limited in-domain data for training. Exploiting out-of-domain data while retaining the relevant in-domain statistics is a desired property in these scenarios. Kullback-Leibler Divergence (KLD) regularization is a popular method for acoustic model (AM) adaptation. KLD regularization assumes that the last layer is a softmax that fully activates the targets of both the in-domain and out-of-domain models. Unfortunately, this softmax activation is computationally prohibitive for language modeling, where the number of output classes is large, typically 50K to 100K, and may exceed 800K in some cases. The computational bottleneck of the softmax during LM training can be reduced by an order of magnitude using techniques such as noise contrastive estimation (NCE), which replaces the cross-entropy loss function with a binary classification problem between the target output and random noise samples. In this work, we combine NCE and KLD regularization to offer a fast domain-adaptation method for LM training while retaining important attributes of the original NCE, such as self-normalization. On a medical domain-adaptation task, our method improves perplexity by 10.1% relative to a strong LSTM baseline.
Pages: 3368-3372
Page count: 5
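
The abstract describes replacing the full-softmax cross-entropy with NCE's binary classification while still regularizing toward the out-of-domain (seed) model. Below is a minimal illustrative sketch in PyTorch of one way such a combination could look; the function nce_kld_loss, the interpolation weight rho, and the choice to evaluate the seed model only on the sampled target and noise words are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible way to blend NCE with a KLD-style
# pull toward an out-of-domain ("seed") model. NOT the paper's exact method.
import math

import torch
import torch.nn.functional as F


def nce_kld_loss(s_target, s_noise,          # adapted-model scores: (B,), (B, K)
                 s_out_target, s_out_noise,  # seed-model scores:    (B,), (B, K)
                 logq_target, logq_noise,    # log noise probs q(w): (B,), (B, K)
                 k, rho=0.5):
    """NCE binary classification with targets interpolated toward the seed model.

    rho = 0 recovers plain NCE on the in-domain data; rho = 1 ignores the
    in-domain labels and only matches the seed model's sampled posteriors.
    """
    log_k = math.log(k)

    # NCE logits: s(w|h) - log(k * q(w)) for the true next word and each noise word.
    logit_target = s_target - (log_k + logq_target)   # (B,)
    logit_noise = s_noise - (log_k + logq_noise)      # (B, K)

    # Soft binary targets: interpolate the hard NCE labels (1 = data, 0 = noise)
    # with the seed model's posterior of "being a data sample" on the same words
    # (an assumption made here for illustration).
    with torch.no_grad():
        p_out_target = torch.sigmoid(s_out_target - (log_k + logq_target))
        p_out_noise = torch.sigmoid(s_out_noise - (log_k + logq_noise))
    t_target = (1.0 - rho) + rho * p_out_target
    t_noise = rho * p_out_noise

    # Mean over the batch for the true-word term; the noise term is averaged over
    # B*K elements, so scale by k to approximate the usual sum over noise samples.
    loss_data = F.binary_cross_entropy_with_logits(logit_target, t_target)
    loss_noise = k * F.binary_cross_entropy_with_logits(logit_noise, t_noise)
    return loss_data + loss_noise
```

In this sketch, the s_out_* scores would come from a frozen copy of the seed model evaluated on the same target and noise word ids, so the regularizer adds only K+1 extra score evaluations per position rather than a full softmax over the vocabulary.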