Long Short-Term Memory Based Language Model for Indonesian Spontaneous Speech Recognition

被引:0
|
作者
Putri, Fanda Yuliana [1 ]
Lestari, Dessi Puji [1 ]
Widyantoro, Dwi Hendratmo [1 ]
机构
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung, Indonesia
关键词
speech recognition system; ASR; spontaneous; language model; perplexity; LSTM; n-gram;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
A robust recognition performance in daily or spontaneous conversation becomes necessary for a speech recognizer when deployed in real world applications. Meanwhile, the Indonesian speech recognition system (ASR) still has poor performance compared to dictated speech. In this work, we used deep neural networks approach, focused primarily on using long short-term memory (LSTM) to improve the language model performance as it has been successfully applied to many long context-dependent problems including language modeling. We tried different architectures and parameters to get the optimal combination, including deep LSTMs and LSTM with projection layer (LSTMP). Thereafter, different type of corpus was employed to enrich the language model linguistically. All our LSTM language models achieved significant improvement in terms of perplexity and word error rate (%WER) compared to n-gram as the baseline. The perplexity improvement was up to 50.6% and best WER reduction was 3.61% as evaluated with Triphone GMM- HMM acoustic model. The optimal architecture combination we got is deep LSTMP with L2 regularization.
引用
收藏
页码:44 / 48
页数:5
相关论文
共 50 条
  • [21] LATTICE RESCORING STRATEGIES FOR LONG SHORT TERM MEMORY LANGUAGE MODELS IN SPEECH RECOGNITION
    Kumar, Shankar
    Nirschl, Michael
    Holtmann-Rice, Daniel
    Liao, Hank
    Suresh, Ananda Theertha
    Yu, Felix
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 165 - 172
  • [22] Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3219 - 3223
  • [23] Electroencephalography-based imagined speech recognition using deep long short-term memory network
    Agarwal, Prabhakar
    Kumar, Sandeep
    [J]. ETRI JOURNAL, 2022, 44 (04) : 672 - 685
  • [24] Named Entity Recognition on Indonesian Twitter Posts Using Long Short-Term Memory Networks
    Rachman, Valdi
    Savitri, Septiviana
    Augustianti, Fithriannisa
    Mahendra, Rahmad
    [J]. 2017 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2017, : 228 - 232
  • [25] MINIMUM WORD ERROR TRAINING OF LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK LANGUAGE MODELS FOR SPEECH RECOGNITION
    Hori, Takaaki
    Hori, Chiori
    Watanabe, Shinji
    Hershey, John R.
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5990 - 5994
  • [26] Language Modeling Using Part-of-speech and Long Short-Term Memory Networks
    Norouzi, Sanaz Saki
    Akbari, Ahmad
    Nasersharif, Babak
    [J]. 2019 9TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE 2019), 2019, : 182 - 187
  • [27] BIDIRECTIONAL QUATERNION LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORKS FOR SPEECH RECOGNITION
    Parcollet, Titouan
    Morchid, Mohamed
    Linares, Georges
    De Mori, Renato
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8519 - 8523
  • [28] Speech Dereverberation Using Long Short-Term Memory
    Mimura, Masato
    Sakai, Shinsuke
    Kawahara, Tatsuya
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2435 - 2439
  • [29] A Speech Recognition Method Using Long Short-Term Memory Network in Low Resources
    Shu, Fan
    Qu, Dan
    Zhang, Wenlin
    Zhou, Lili
    Guo, Wu
    [J]. Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2017, 51 (10): : 120 - 127
  • [30] Modeling Speaker Variability Using Long Short-Term Memory Networks for Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1086 - 1090