ROBUST SPEECH RECOGNITION IN UNKNOWN REVERBERANT AND NOISY CONDITIONS

Citations: 0
|
Authors
Hsiao, Roger [1 ]
Ma, Jeff [1 ]
Hartmann, William [1 ]
Karafiat, Martin [2 ]
Grezl, Frantisek [2 ]
Burget, Lukas [2 ]
Szoke, Igor [2 ]
Cernocky, Jan Honza [2 ]
Watanabe, Shinji [3 ]
Chen, Zhuo [3 ]
Mallidi, Sri Harish [4 ]
Hermansky, Hynek [4 ]
Tsakalidis, Stavros [1 ]
Schwartz, Richard [1 ]
Affiliations
[1] Raytheon BBN Technol, Cambridge, MA 02138 USA
[2] Brno Univ Technol, Speech FIT & Ctr Excellence IT4I, CS-61090 Brno, Czech Republic
[3] Mitsubishi Elect Res Labs, Cambridge, MA USA
[4] Johns Hopkins Univ, Baltimore, MD USA
Keywords
ASpIRE challenge; robust speech recognition;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we describe our work on the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge, which aims to assess the robustness of automatic speech recognition (ASR) systems. The main characteristic of the challenge is developing a high-performance system without access to matched training and development data. While the evaluation data are recorded with far-field microphones in noisy and reverberant rooms, the training data consist of close-talking telephone speech. Our approach to this challenge includes speech enhancement, neural network methods, and acoustic model adaptation. We show that these techniques can successfully alleviate the performance degradation caused by noisy audio and data mismatch.
Pages: 533 - 538
Number of pages: 6
Related papers
50 records in total
  • [1] Techniques for robust speech recognition in noisy and reverberant conditions
    Brown, GJ
    Palomäki, KJ
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 213 - 220
  • [2] OPTIMIZING SPECTRAL SUBTRACTION AND WIENER FILTERING FOR ROBUST SPEECH RECOGNITION IN REVERBERANT AND NOISY CONDITIONS
    Gomez, Randy
    Kawahara, Tatsuya
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4566 - 4569
  • [3] Speech Enhancement and Recognition of Compressed Speech Signal in Noisy Reverberant Conditions
    Suman, Maloji
    Khan, Habibulla
    Latha, M. Madhavi
    Kumari, Devarakonda Aruna
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 379 - +
  • [4] Feature Transformations for Robust Speech Recognition in Reverberant Conditions
    Yuliani, Asri R.
    Sustika, Rika
    Yuwana, Raden S.
    Pardede, Hilman F.
    2017 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL, INFORMATICS AND ITS APPLICATIONS (IC3INA), 2017, : 57 - 62
  • [5] DISTANT SPEECH RECOGNITION IN REVERBERANT NOISY CONDITIONS EMPLOYING A MICROPHONE ARRAY
    Morales-Cordovilla, Juan A.
    Hagmueller, Martin
    Pessentheiner, Hannes
    Kubin, Gernot
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 2380 - 2384
  • [6] ROBUST RECOGNITION OF REVERBERANT AND NOISY SPEECH USING COHERENCE-BASED PROCESSING
    Menon, Anjali
    Kim, Chanwoo
    Stern, Richard M.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6775 - 6779
  • [7] Speech Emotion Recognition in Noisy and Reverberant Environments
    Heracleous, Panikos
    Yasuda, Keiji
    Sugaya, Fumiaki
    Yoneyama, Akio
    Hashimoto, Masayuki
    2017 SEVENTH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2017, : 262 - 266
  • [8] Speech Intelligibility Enhancement in Noisy Reverberant Conditions
    Li, Junfeng
    Xia, Risheng
    Fang, Qiang
    Li, Aijun
    Yan, Yonghong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [9] AMPLITUDE MODULATION SPECTROGRAM BASED FEATURES FOR ROBUST SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS
    Moritz, Niko
    Anemueller, Joern
    Kollmeier, Birger
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5492 - 5495
  • [10] Robust Speaker Identification in Noisy and Reverberant Conditions
    Zhao, Xiaojia
    Wang, Yuxuan
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 836 - 845