Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Cited by: 7
Authors
Masumura, Ryo [1 ]
Asami, Taichi [1 ]
Oba, Takanobu [1 ]
Sakauchi, Sumitaka [1 ]
Ito, Akinori [2 ]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Yokosuka, Kanagawa 2390847, Japan
[2] Tohoku Univ, Grad Sch Engn, Sendai, Miyagi 9808579, Japan
Keywords
latent words recurrent neural network language models; n-gram approximation; Viterbi approximation; automatic speech recognition; mixture; gram
DOI
10.1587/transinf.2018EDP7242
CLC Classification Number
TP [Automation technology; computer technology]
Discipline Classification Code
0812
Abstract
This paper demonstrates latent words recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent words language models (LW-LMs). RNN-LMs capture long-range context information and offer strong performance, while LW-LMs are robust to out-of-domain tasks owing to their latent word space modeling. However, RNN-LMs cannot explicitly capture hidden relationships behind observed words because they have no concept of a latent variable space, and LW-LMs cannot take long-range relationships between latent words into account. Our idea is to combine an RNN-LM and an LW-LM so that each compensates for the other's disadvantage. LW-RNN-LMs thus simultaneously support latent variable space modeling, as in LW-LMs, and long-range relationship modeling, as in RNN-LMs. From the viewpoint of RNN-LMs, an LW-RNN-LM can be regarded as a soft-class RNN-LM with a vast latent variable space; from the viewpoint of LW-LMs, it can be regarded as an LW-LM that uses an RNN structure instead of an n-gram structure for latent variable modeling. This paper also details a parameter inference method and two implementation methods, an n-gram approximation and a Viterbi approximation, for introducing LW-RNN-LMs to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.
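To make the generative structure in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an LSTM over latent words plays the role of the RNN-based latent variable model (the transition), and a per-latent-word distribution over observed words plays the role of the LW-LM emission. All names, layer sizes, and the marginalization routine (LWRNNLM, latent_size, next_word_distribution) are illustrative assumptions; the paper's parameter inference method and its n-gram and Viterbi approximations are not shown.

```python
import torch
import torch.nn as nn


class LWRNNLM(nn.Module):
    """Sketch of an LW-RNN-LM's generative structure.

    Transition: an LSTM models P(l_t | l_1..l_{t-1}) over latent words,
    giving the long-range context modeling of RNN-LMs.
    Emission: each latent word l emits an observed word via P(w | l),
    giving the latent word space modeling of LW-LMs.
    """

    def __init__(self, vocab_size, latent_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(latent_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.transition = nn.Linear(hidden_dim, latent_size)  # state -> next latent word
        self.emission = nn.Linear(embed_dim, vocab_size)      # latent word -> observed word

    def next_word_distribution(self, latent_history):
        """Exact marginal P(w_t | history) = sum_l P(w_t | l) P(l | history).

        In reality the latent words are hidden and must be inferred
        (the paper's parameter inference); here a latent history is
        given directly for illustration.
        """
        e = self.embed(latent_history)                  # (1, t-1, embed_dim)
        _, (h, _) = self.rnn(e)                         # h: (1, 1, hidden_dim)
        p_latent = torch.softmax(self.transition(h[-1]), dim=-1)  # (1, latent_size)
        # Emission distribution for every latent word: (latent_size, vocab_size)
        p_word_given_latent = torch.softmax(self.emission(self.embed.weight), dim=-1)
        return p_latent @ p_word_given_latent           # (1, vocab_size)


model = LWRNNLM(vocab_size=10000, latent_size=5000)
history = torch.tensor([[3, 17, 42]])                   # latent word ids observed so far
p_next = model.next_word_distribution(history)
print(p_next.shape, float(p_next.sum()))                # torch.Size([1, 10000]), ~1.0
```

The exact marginal sums over the entire latent vocabulary at every step, which is what makes direct use in ASR decoding expensive. Roughly speaking, and consistent with the abstract, the Viterbi approximation would track only the most probable latent word sequence instead of the full sum, while the n-gram approximation would compile the model into a conventional n-gram LM that a standard ASR decoder can use directly.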
Pages: 2557-2567
Number of pages: 11