Modeling human auditory perception for noise-robust speech recognition

Cited by: 0
Authors
Lee, SY [1 ]
Institution
[1] Korea Adv Inst Sci & Technol, Dept BioSyst, Taejon 305701, South Korea
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Several bio-inspired models of human auditory perception are reported for robust speech recognition in real-world noisy environments. The developed mathematical models of the human auditory pathway are integrated into a speech recognition system with three components: (1) a nonlinear feature-extraction model from the cochlea to the auditory cortex, (2) a binaural processing model at the superior olivary complex, and (3) a top-down attention model from the higher brain to the cochlea. Unsupervised Independent Component Analysis (ICA) shows that some auditory feature-extraction and binaural processing mechanisms follow information theory with sparse representation. The ICA-based features resemble frequency-limited features extracted at the cochlea, as well as more complex time-frequency features from the inferior colliculus and auditory cortex. The top-down attention model shows how pre-acquired knowledge in the brain filters out irrelevant features, or fills in missing features, in the sensory data. Both the top-down attention and the bottom-up binaural processing are combined into a single system for highly noisy cases. This auditory model requires extensive computing, and several VLSI implementations have been developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
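The ICA-based feature extraction mentioned in the abstract can be illustrated with a minimal sketch. Below is a small FastICA implementation (tanh nonlinearity, symmetric decorrelation) in plain NumPy, run on two synthetic sources through a hypothetical mixing matrix; the signals, mixing weights, and iteration count are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic "auditory" sources (illustrative, not from the paper):
# a pure tone and a slow pulse-train-like signal.
t = np.linspace(0.0, 1.0, 4000)
s1 = np.sin(2 * np.pi * 440 * t)
s2 = np.sign(np.sin(2 * np.pi * 7 * t))
S = np.vstack([s1, s2])                      # true sources, shape (2, n)

A = np.array([[0.7, 0.3], [0.4, 0.6]])       # hypothetical mixing matrix
X = A @ S                                    # observed mixtures

# Whitening: zero mean, identity covariance.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / X.shape[1])
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# FastICA fixed-point iteration with tanh nonlinearity.
W = rng.standard_normal((2, 2))
for _ in range(200):
    g = np.tanh(W @ Xw)
    W_new = (g @ Xw.T) / Xw.shape[1] - np.diag((1 - g**2).mean(axis=1)) @ W
    # Symmetric decorrelation: W <- (W W^T)^(-1/2) W keeps rows orthonormal.
    dw, Ew = np.linalg.eigh(W_new @ W_new.T)
    W = Ew @ np.diag(dw ** -0.5) @ Ew.T @ W_new

Y = W @ Xw                                   # recovered independent components

# Each true source should correlate strongly with one recovered component.
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(corr.max(axis=1))                      # best-match correlations
```

On this toy mixture the recovered components match the original tone and pulse train up to sign and permutation, which is the usual ICA ambiguity; the paper's contribution is the observation that such sparse, information-theoretic decompositions resemble features measured along the auditory pathway.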
Pages: PL72 / PL74
Page count: 3
Related Papers
50 records in total
  • [1] Modeling auditory perception to improve robust speech recognition
    Strope, B
    Alwan, A
    [J]. THIRTY-FIRST ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2, 1998, : 1056 - 1060
  • [2] Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition
    Fazel, Amin
    Chakrabartty, Shantanu
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (04): : 1362 - 1371
  • [3] Fusion Feature Extraction Based on Auditory and Energy for Noise-Robust Speech Recognition
    Shi, Yanyan
    Bai, Jing
    Xue, Peiyun
    Shi, Dianxi
    [J]. IEEE ACCESS, 2019, 7 : 81911 - 81922
  • [4] Noise-Robust speech recognition of Conversational Telephone Speech
    Chen, Gang
    Tolba, Hesham
    O'Shaughnessy, Douglas
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1101 - 1104
  • [5] Modeling sub-band correlation for noise-robust speech recognition
    McAuley, J
    Ming, J
    Hanna, P
    Stewart, D
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1017 - 1020
  • [6] EXTENDED VTS FOR NOISE-ROBUST SPEECH RECOGNITION
    van Dalen, R. C.
    Gales, M. J. F.
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3829 - 3832
  • [7] Covariance Modelling for Noise-Robust Speech Recognition
    van Dalen, R. C.
    Gales, M. J. F.
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2000 - 2003
  • [8] An Overview of Noise-Robust Automatic Speech Recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 745 - 777
  • [9] Extended VTS for Noise-Robust Speech Recognition
    van Dalen, Rogier C.
    Gales, Mark J. F.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 733 - 743
  • [10] Frame decorrelation for noise-robust speech recognition
    Jung, HY
    Kim, DY
    Un, CK
    [J]. ELECTRONICS LETTERS, 1996, 32 (13) : 1163 - 1164