Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Cited by: 5
Authors
Purushothaman, Anurenjan [1]
Sreeram, Anirudh [1]
Kumar, Rohit [1]
Ganapathy, Sriram [1]
Affiliations
[1] Indian Institute of Science, Electrical Engineering, Learning and Extraction of Acoustic Patterns (LEAP) Lab, Bangalore, Karnataka, India
Source
INTERSPEECH 2020
Keywords
Automatic speech recognition; Frequency domain linear prediction (FDLP); Dereverberation; Neural speech enhancement; DOMAIN
DOI
10.21437/Interspeech.2020-2283
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Automatic speech recognition in reverberant conditions is challenging because the long-term envelopes of reverberant speech are temporally smeared. In this paper, we propose a neural model that enhances sub-band temporal envelopes to dereverberate speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The proposed neural enhancement model performs envelope-gain-based enhancement of the temporal envelopes and consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset and the CHiME-3 dataset. In these experiments, the proposed approach provides significant improvements over a baseline ASR system with beamformed audio: average relative word error rate improvements of 21% on the development set and about 11% on the evaluation set of the REVERB challenge dataset.
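The abstract says the temporal envelopes come from FDLP, that is, linear prediction applied to a frequency-domain (DCT) representation of the signal so that the resulting autoregressive "spectrum" traces the envelope over time. The following is a minimal sketch of that general idea for a single full-band frame, not the authors' implementation. The function names `dct_ii` and `fdlp_envelope`, the model order of 20, the small diagonal regularization, and the energy normalization are all assumptions chosen for illustration; the paper's sub-band decomposition and neural gain model are not shown.

```python
import numpy as np


def dct_ii(x):
    """Unnormalized DCT-II via an explicit cosine matrix (O(N^2), fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]   # DCT coefficient index
    t = np.arange(n)[None, :]   # time sample index
    return np.cos(np.pi * k * (2 * t + 1) / (2 * n)) @ x


def fdlp_envelope(x, order=20):
    """Estimate a temporal envelope by linear prediction on DCT coefficients.

    Illustrative assumptions: autocorrelation-method LPC, a 1e-6 relative
    diagonal load for numerical stability, and scaling the envelope so that
    it sums to the frame energy.
    """
    x = np.asarray(x, dtype=float)
    c = dct_ii(x)
    n = len(c)

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - m], c[m:]) for m in range(order + 1)])

    # Yule-Walker equations R a = r[1:], with R a symmetric Toeplitz matrix.
    idx = np.arange(order)
    R = r[np.abs(idx[:, None] - idx[None, :])]
    R += 1e-6 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])

    # Evaluate A(w) = 1 - sum_k a_k exp(-j w k) on the angles that the DCT-II
    # associates with each time sample: w_t = pi * (2 t + 1) / (2 N).
    omega = np.pi * (2 * np.arange(n) + 1) / (2 * n)
    lags = np.arange(1, order + 1)
    A = 1.0 - np.exp(-1j * np.outer(omega, lags)) @ a

    env = 1.0 / np.abs(A) ** 2
    # Scale so the envelope sums to the signal energy (a normalization choice).
    return env * (np.sum(x ** 2) / np.sum(env))


if __name__ == "__main__":
    # Hypothetical test signal: a tone burst centered at sample 128.
    t = np.arange(512)
    burst = np.exp(-0.5 * ((t - 128) / 8.0) ** 2) * np.cos(0.3 * np.pi * t)
    envelope = fdlp_envelope(burst)
    print("envelope peak at sample", int(np.argmax(envelope)))
```

In a sub-band setting, the same prediction step would be applied to windowed segments of the DCT sequence, giving one envelope per band. An enhancement model of the kind described in the abstract would then operate on those envelopes.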
Pages: 1688-1692
Number of pages: 5