Speech recognition by humans and machines under conditions with severe channel variability and noise

Cited by: 1

Authors:

Lippmann, RP

Carlson, BA

Affiliation:

Source:

APPLICATIONS AND SCIENCE OF ARTIFICIAL NEURAL NETWORKS III | 1997 / Vol. 3077

Keywords:

speech recognition; speech perception; missing features; filtering; noise; robust; neural network

DOI:

10.1117/12.271525

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz. Machines trained with wide-band speech, however, degrade dramatically under these conditions. An approach to compensate for variable unknown sharp filtering and noise is presented which uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and uses missing feature theory to dynamically modify the probability computations performed using Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model (HMM) recognizers. The approach was successfully demonstrated using a talker-independent digit recognition task. It was found that recognition accuracy across many conditions rises from below 50% to above 95% with this approach. These promising results suggest future work to dynamically estimate SNRs and to explore the dynamics of human adaptation to channel and noise variability.
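
The abstract describes using per-filter SNR estimates to modify Gaussian mixture probability computations under missing feature theory. A minimal sketch of one common variant, marginalization over unreliable features with diagonal covariances, is given below. The abstract does not say which variant the authors used, so the marginalization choice, the function names (reliability_mask, masked_gmm_loglik), and the 0 dB threshold are all assumptions made for illustration.

```python
import math


def reliability_mask(snr_db, threshold_db=0.0):
    """Mark a filter-bank channel as reliable when its estimated SNR
    meets a threshold. The 0 dB default is a hypothetical value."""
    return [s >= threshold_db for s in snr_db]


def masked_gmm_loglik(x, weights, means, variances, reliable):
    """Log-likelihood of feature vector x under a diagonal-covariance
    Gaussian mixture, evaluated only over reliable dimensions.

    With diagonal covariances, integrating out an unreliable dimension
    is the same as dropping its term from each component's density.
    """
    dims = [d for d, ok in enumerate(reliable) if ok]
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for d in dims:
            lp -= 0.5 * (math.log(2.0 * math.pi * var[d])
                         + (x[d] - mu[d]) ** 2 / var[d])
        comp_logs.append(lp)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(lp - m) for lp in comp_logs))


# Two-component mixture over two filter-bank channels (toy numbers).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

x = [0.0, 100.0]          # second channel is badly corrupted
snr_db = [10.0, -20.0]    # hypothetical per-channel SNR estimates
mask = reliability_mask(snr_db)
score = masked_gmm_loglik(x, weights, means, variances, mask)
```

In this toy case the corrupted second channel is excluded, so the score depends only on the first channel. Inside an HMM recognizer, a score of this kind would stand in for the usual per-state observation log-likelihood.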

Pages: 46 - 57

Page count: 12

50 items in total

[21] Robust distributed speech recognition in noise and packet loss conditions
Flynn, Ronan
Jones, Edward
[J]. DIGITAL SIGNAL PROCESSING, 2010, 20 (06) : 1559 - 1571
[22] Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition
Hansen, JHL
[J]. SPEECH COMMUNICATION, 1996, 20 (1-2) : 151 - 173
[23] What's the difference? Comparing humans and machines on the Aurora 2 speech recognition task
Meyer, Bernd T.
[J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013 : 2633 - 2637
[24] Temporal summation under masking conditions and speech recognition
Rimskaya-Korsakova L.K.
[J]. Human Physiology, 2013, 39 (4) : 355 - 363
[25] Discriminative learning of additive noise and channel distortions for robust speech recognition
Han, JQ
Han, MS
Park, GB
Park, J
Gao, W
Hwang, D
[J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998 : 81 - 84
[26] Maximum likelihood joint estimation of channel and noise for robust speech recognition
Zhao, YX
[J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000 : 1109 - 1112
[27] Variability of Lombard effects under different noise conditions
Wakao, A
Takeda, K
Itakura, F
[J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996 : 2009 - 2012
[28] Listening benefits in speech-in-speech recognition are altered under reverberant conditions
Viswanathan, Navin
Kokkinakis, Kostas
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (05) : EL348 - EL353
[29] Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions
Eckert, Mark A.
Teubner-Rhodes, Susan
Vaden, Kenneth I., Jr.
[J]. EAR AND HEARING, 2016, 37 : 101S - 110S
[30] Noise robust in-domain children speech enhancement for automatic Punjabi recognition system under mismatched conditions
Bawa, Puneet
Kadyan, Virender
[J]. APPLIED ACOUSTICS, 2021, 175
