Speech recognition by humans and machines under conditions with severe channel variability and noise

Cited by: 1
Authors
Lippmann, RP
Carlson, BA
Keywords
speech recognition; speech perception; missing features; filtering; noise; robust; neural network
DOI
10.1117/12.271525
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered below 3 kHz or high-pass filtered above 1 kHz. Machines trained with wide-band speech, however, degrade dramatically under these conditions. An approach to compensate for variable unknown sharp filtering and noise is presented which uses mel-filter-bank magnitudes as input features, estimates the signal-to-noise ratio (SNR) for each filter, and uses missing feature theory to dynamically modify the probability computations performed using Gaussian Mixture or Radial Basis Function neural network classifiers embedded within Hidden Markov Model (HMM) recognizers. The approach was successfully demonstrated using a talker-independent digit recognition task. It was found that recognition accuracy across many conditions rises from below 50% to above 95% with this approach. These promising results suggest future work to dynamically estimate SNRs and to explore the dynamics of human adaptation to channel and noise variability.
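
The abstract does not spell out how the probability computation is modified, so the following is only an illustrative sketch of one common form of missing feature theory: marginalization in a diagonal-covariance Gaussian mixture, where filter-bank channels judged unreliable are dropped from the likelihood. The function names, the 0 dB SNR threshold, and the hard reliable/unreliable mask are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def reliable_mask(snr_db, threshold_db=0.0):
    """Flag a channel as reliable when its estimated SNR exceeds the threshold.

    The 0 dB default is an assumption for illustration only.
    """
    return np.asarray(snr_db, dtype=float) > threshold_db


def gmm_log_likelihood_marginal(x, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal-covariance GMM,
    computed over the reliable channels only.

    With diagonal covariances, integrating out an unreliable channel
    simply removes its factor from each mixture component.

    x:         (D,)   filter-bank feature vector
    mask:      (D,)   boolean, True where the channel is reliable
    weights:   (K,)   mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) component variances
    """
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)[:, mask]
    var = np.asarray(variances, dtype=float)[:, mask]
    xr = x[mask]

    # Per-component Gaussian log-density over the reliable channels.
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (xr - mu) ** 2 / var, axis=1)

    # Numerically stable log-sum-exp over the mixture components.
    log_terms = np.log(weights) + log_comp
    peak = log_terms.max()
    return peak + np.log(np.sum(np.exp(log_terms - peak)))


# Usage: the third channel is heavily corrupted and has a low estimated SNR,
# so it is excluded and no longer dominates the score.
frame = np.array([0.0, 0.0, 100.0])
mask = reliable_mask([12.0, 8.0, -15.0])
score = gmm_log_likelihood_marginal(
    frame, mask,
    weights=np.array([1.0]),
    means=np.zeros((1, 3)),
    variances=np.ones((1, 3)),
)
```

In an HMM recognizer, a score of this kind would stand in for the usual state observation log-likelihood on each frame. Per the abstract, the paper applies the idea with both Gaussian Mixture and Radial Basis Function classifiers; only the Gaussian mixture case is sketched here.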
Pages: 46-57
Page count: 12