Integrating Prosodic Information into Recurrent Neural Network Language Model For Speech Recognition

被引:0
|
作者
Fu, Tong [1 ]
Han, Yang [1 ]
Li, Xiangang [1 ]
Liu, Yi [1 ]
Wu, Xihong [1 ]
机构
[1] Peking Univ, Minist Educ, Key Lab Machine Percept, Speech & Hearing Res Ctr, Beijing 100871, Peoples R China
关键词
D O I
暂无
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Prosody is a kind of cues that are critical to human speech perception and comprehension, so it is plausible to integrate prosodic information into machine speech recognition. However, as a result of the supra-segmental nature, it is hard to integrate prosodic information with conventional acoustic features. Recently, RNNLMs have shown to be the state-of-the-art language model in many tasks. We thus attempt to integrate prosodic information into RNNLMs for improving speech recognition performance based on rescoring strategy. Firstly, three word-level prosodic features are extracted from speech and then passed to RNNLMs separately. Therefore RNNLMs predict the next word based on prosodic features and word history. Experiments conducted on LibriSpeech Corpus show that the word error rate decreases from 8.07% to 7.96%. Secondly, prosodic information is combined on feature-level and model-level for further improvements and word error rate decreases 4.71% relatively.
引用
收藏
页码:1194 / 1197
页数:4
相关论文
共 50 条
  • [1] Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
    Li, Ke
    Xu, Hainan
    Wang, Yiming
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3373 - 3377
  • [2] Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition
    Gong, Caixia
    Li, Xiangang
    Wu, Xihong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 459 - 463
  • [3] RECURRENT NEURAL NETWORK LANGUAGE MODEL WITH STRUCTURED WORD EMBEDDINGS FOR SPEECH RECOGNITION
    He, Tianxing
    Xiang, Xu
    Qian, Yanmin
    Yu, Kai
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5396 - 5400
  • [4] ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM
    Lee, Kyungmin
    Park, Chiyoun
    Kim, Namhoon
    Lee, Jaewon
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5904 - 5908
  • [5] RECURRENT NEURAL NETWORK LANGUAGE MODEL TRAINING WITH NOISE CONTRASTIVE ESTIMATION FOR SPEECH RECOGNITION
    Chen, X.
    Liu, X.
    Gales, M. J. E.
    Woodland, P. C.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5411 - 5415
  • [6] Multi-Domain Recurrent Neural Network Language Model for Medical Speech Recognition
    Tilk, Ottokar
    Alumaee, Tanel
    [J]. HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2014, 2014, 268 : 149 - +
  • [7] Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition
    Chen, X.
    Ragni, A.
    Liu, X.
    Gales, M. J. F.
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 269 - 273
  • [8] A Speech Recognition System for Bengali Language using Recurrent Neural Network
    Islam, Jahirul
    Mubassira, Masiath
    Islam, Md. Rakibul
    Das, Amit Kumar
    [J]. 2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 73 - 76
  • [9] BIDIRECTIONAL RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR AUTOMATIC SPEECH RECOGNITION
    Arisoy, Ebru
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    Chen, Stanley
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5421 - 5425
  • [10] Integrating meta-information into recurrent neural network language models
    Shi, Yangyang
    Larson, Martha
    Pelemans, Joris
    Jonker, Catholijn M.
    Wambacq, Patrick
    Wiggers, Pascal
    Demuynck, Kris
    [J]. SPEECH COMMUNICATION, 2015, 73 : 64 - 80