Integrating Prosodic Information into Recurrent Neural Network Language Model For Speech Recognition

Cited by: 0

Authors:

Fu, Tong [1]

Han, Yang [1]

Li, Xiangang [1]

Liu, Yi [1]

Wu, Xihong [1]

Affiliation:

[1] Peking Univ, Minist Educ, Key Lab Machine Percept, Speech & Hearing Res Ctr, Beijing 100871, Peoples R China

Source:

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2015

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification code:

0808; 0809

Abstract:

Prosody provides cues that are critical to human speech perception and comprehension, so it is plausible that integrating prosodic information could benefit machine speech recognition. However, because prosody is supra-segmental, it is hard to integrate prosodic information with conventional acoustic features. Recently, recurrent neural network language models (RNNLMs) have been shown to be state-of-the-art language models in many tasks. We therefore attempt to integrate prosodic information into RNNLMs to improve speech recognition performance using a rescoring strategy. First, three word-level prosodic features are extracted from speech and passed to RNNLMs separately, so that the RNNLMs predict the next word based on both prosodic features and word history. Experiments conducted on the LibriSpeech corpus show that the word error rate decreases from 8.07% to 7.96%. Second, prosodic information is combined at the feature level and the model level for further improvement, yielding a relative word error rate reduction of 4.71%.
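The abstract mixes absolute word error rates (8.07% and 7.96%) with a relative reduction (4.71%). The short sketch below shows how the two kinds of figures relate. It assumes, since the abstract does not say so explicitly, that the 4.71% relative reduction is measured against the same 8.07% baseline; the function names are illustrative, and the implied value is only arithmetic, not a result reported in the record.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (in percent) of new_wer with respect to baseline_wer."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def wer_after_relative_reduction(baseline_wer: float, rel_pct: float) -> float:
    """WER implied by applying a relative reduction (in percent) to a baseline WER."""
    return baseline_wer * (1.0 - rel_pct / 100.0)


# Figures quoted in the abstract.
baseline = 8.07          # WER before adding prosodic features (%)
single_feature = 7.96    # WER with word-level prosodic features (%)
combined_rel = 4.71      # relative reduction after feature/model combination (%)

# Relative reduction for the first experiment: roughly 1.36%.
first_stage = relative_reduction(baseline, single_feature)

# ASSUMPTION: the 4.71% figure is relative to the 8.07% baseline.
# Under that assumption the combined system would sit near 7.69% WER.
implied_combined = wer_after_relative_reduction(baseline, combined_rel)

print(f"first-stage relative reduction: {first_stage:.2f}%")
print(f"implied combined WER (assumed baseline): {implied_combined:.2f}%")
```

Read this way, the 0.11-point absolute drop in the first experiment corresponds to a relative reduction of about 1.36%, which makes the gap to the 4.71% combined figure easier to compare.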


Pages: 1194-1197

Page count: 4
