Long Short-Term Memory Based Language Model for Indonesian Spontaneous Speech Recognition

Authors
Putri, Fanda Yuliana [1]
Lestari, Dessi Puji [1]
Widyantoro, Dwi Hendratmo [1]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung, Indonesia
Keywords
speech recognition system; ASR; spontaneous; language model; perplexity; LSTM; n-gram
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Robust recognition of everyday, spontaneous conversation is necessary for a speech recognizer deployed in real-world applications. However, the Indonesian automatic speech recognition (ASR) system still performs poorly on spontaneous speech compared with dictated speech. In this work, we used a deep neural network approach, focusing primarily on long short-term memory (LSTM) to improve language model performance, since LSTM has been applied successfully to many problems with long context dependencies, including language modeling. We tried different architectures and parameters to find the optimal combination, including deep LSTMs and LSTM with a projection layer (LSTMP). Thereafter, different types of corpora were employed to enrich the language model linguistically. All our LSTM language models achieved significant improvements in perplexity and word error rate (WER) compared with the n-gram baseline. The perplexity improvement was up to 50.6%, and the best WER reduction was 3.61%, as evaluated with a triphone GMM-HMM acoustic model. The optimal architecture combination we found is a deep LSTMP with L2 regularization.
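The abstract reports its gains as a perplexity improvement over an n-gram baseline. As a minimal sketch of how such a figure is typically derived, the Python below computes perplexity from per-word probabilities and the relative reduction between two models. The function names and all probabilities are hypothetical illustrations, not values or code from the paper, and the record does not state whether the reported percentages are relative or absolute.

```python
import math


def perplexity(log_probs):
    """Perplexity of a word sequence from per-word natural-log probabilities.

    Defined as exp(-(1/N) * sum(log p_i)); lower is better.
    """
    if not log_probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline, improved):
    """Relative reduction in percent of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical per-word probabilities assigned to the same test sentence
# by a baseline n-gram model and by an LSTM model (illustrative only).
ngram_probs = [0.05, 0.10, 0.02, 0.08]
lstm_probs = [0.10, 0.20, 0.04, 0.16]

ppl_ngram = perplexity([math.log(p) for p in ngram_probs])
ppl_lstm = perplexity([math.log(p) for p in lstm_probs])

print(f"n-gram perplexity: {ppl_ngram:.2f}")
print(f"LSTM perplexity:   {ppl_lstm:.2f}")
print(f"relative reduction: {relative_reduction(ppl_ngram, ppl_lstm):.1f}%")
```

The same `relative_reduction` helper applies to two WER values if the relative convention is wanted; an absolute reduction is simply the difference between the two percentages.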
Pages: 44-48
Page count: 5