Speech Dereverberation Using Long Short-Term Memory

Authors
Mimura, Masato [1]
Sakai, Shinsuke [1]
Kawahara, Tatsuya [1]
Affiliation
[1] Kyoto University, Academic Center for Computing and Media Studies, Sakyo-ku, Kyoto 606-8501, Japan
Keywords
Speech dereverberation; Long Short-Term Memory (LSTM); Deep Autoencoder (DAE); Neural networks; Recognition; Algorithm
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, neural networks have been used not only for phone recognition but also for denoising and dereverberation. However, the conventional denoising deep autoencoder (DAE), which is based on a feed-forward structure, cannot handle the long span of speech frames affected by reverberation. A long short-term memory (LSTM) network, by contrast, can be trained effectively to reduce the average error between the enhanced signal and the original clean signal while accounting for the effect of long spans of past frames. In this paper, we demonstrate that it is effective to consider a context as long as the maximum reverberation time in the database. Since the effect of reverberation varies with the phone classes across the whole speech context, we augment the input of the autoencoder with phone-class information for the past frames as well as the current frame, and call this version of the LSTM autoencoder pLSTM. In speech recognition experiments on the REVERB Challenge 2014 data set, the LSTM front-end reduced the word error rate (WER) of the multi-condition DNN-HMM by 14.5%, and the phone-class feature used in pLSTM yielded a further improvement of 7.5%. The performance of pLSTM is comparable to that of pDAE, while pLSTM uses only 1/25 to 1/8 as many parameters.
Pages: 2435-2439
Number of pages: 5