An Overview of Noise-Robust Automatic Speech Recognition

Cited by: 369

Authors:

Li, Jinyu [1]

Deng, Li [1]

Gong, Yifan [1]

Haeb-Umbach, Reinhold [2]

Affiliations:

[1] Microsoft Corp, Redmond, WA 98052 USA

[2] Univ Paderborn, Dept Commun Engn, D-33098 Paderborn, Germany

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / Issue 04

Keywords:

Speech recognition; noise; robustness; distortion modeling; compensation; uncertainty processing; joint model training; NONNEGATIVE MATRIX FACTORIZATION; PREDICTIVE CLASSIFICATION APPROACH; MAXIMUM-LIKELIHOOD-ESTIMATION; RAPID SPEAKER ADAPTATION; HISTOGRAM EQUALIZATION; LINEAR-REGRESSION; MASK ESTIMATION; FEATURE ENHANCEMENT; JOINT COMPENSATION; ENVIRONMENT MODEL;

DOI:

10.1109/TASLP.2014.2304637

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

New waves of consumer-centric applications, such as voice search and voice interaction with mobile devices and home entertainment systems, increasingly require automatic speech recognition (ASR) to be robust to the full range of real-world noise and other acoustic distorting conditions. Despite its practical importance, however, the inherent links between and distinctions among the myriad of methods for noise-robust ASR have yet to be carefully studied in order to advance the field further. To this end, it is critical to establish a solid, consistent, and common mathematical foundation for noise-robust ASR, which is lacking at present. This article is intended to fill this gap and to provide a thorough overview of modern noise-robust techniques for ASR developed over the past 30 years. We emphasize methods that are proven to be successful and that are likely to sustain or expand their future applicability. We distill key insights from our comprehensive overview in this field and take a fresh look at a few old problems, which nevertheless are still highly relevant today. Specifically, we have analyzed and categorized a wide range of noise-robust techniques using five different criteria: 1) feature-domain vs. model-domain processing, 2) the use of prior knowledge about the acoustic environment distortion, 3) the use of explicit environment-distortion models, 4) deterministic vs. uncertainty processing, and 5) the use of acoustic models trained jointly with the same feature enhancement or model adaptation process used in the testing stage. With this taxonomy-oriented review, we equip the reader with the insight to choose among techniques and with the awareness of the performance-complexity tradeoffs. The pros and cons of using different noise-robust ASR techniques in practical application scenarios are provided as a guide to interested practitioners. The current challenges and future research directions in this field are also carefully analyzed.

Pages: 745-777

Page count: 33
