Strategies for distant speech recognition in reverberant environments

Cited by: 46
Authors
Delcroix, Marc [1]
Yoshioka, Takuya [1]
Ogawa, Atsunori [1]
Kubo, Yotaro [1]
Fujimoto, Masakiyo [1]
Ito, Nobutaka [1]
Kinoshita, Keisuke [1]
Espi, Miquel [1]
Araki, Shoko [1]
Hori, Takaaki [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
Keywords
Reverberant speech recognition; Robust speech recognition; REVERB challenge; Dereverberation; Noise reduction; Deep neural network; Blind separation; Noise; Adaptation; Mixtures; Features; Domain
DOI
10.1186/s13634-015-0245-7
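Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract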
Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
Pages: 15