Computational speech segregation based on an auditory-inspired modulation analysis

Cited: 15
Authors
May, Tobias [1 ]
Dau, Torsten [1 ]
Affiliations
[1] Tech Univ Denmark, Dept Elect Engn, Ctr Appl Hearing Res, DK-2800 Lyngby, Denmark
Source
THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136
Keywords
INNER HAIR-CELL; AMPLITUDE-MODULATION; FREQUENCY-SELECTIVITY; NOISE; INTELLIGIBILITY; MODEL; HEARING; RECOGNITION; PERCEPTION; MASKING;
DOI
10.1121/1.4901711
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired modulation filterbank with logarithmically scaled filters is employed. To reduce the dependency of the AMS features on the overall background noise level, a feature normalization stage is applied. In addition, a spectro-temporal integration stage is incorporated in order to exploit the context information about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range of SNRs up to 20 dB. A systematic evaluation of the system demonstrates that auditory-inspired modulation processing can substantially improve the mask estimation accuracy in the presence of stationary and fluctuating interferers. (C) 2014 Acoustical Society of America.
Pages: 3350 - 3359
Number of pages: 10
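
To make the processing chain in the abstract concrete, the following is a minimal Python sketch of AMS-style feature extraction with a logarithmically scaled modulation filterbank and per-channel normalization. It is an illustrative assumption, not the authors' implementation: an STFT magnitude stands in for an auditory (gammatone) front end, the modulation filters are generic second-order Butterworth band-passes, the normalization is simple mean/variance scaling, and all function names and parameter values (log_spaced_modulation_filters, a nominal 2-256 Hz centre-frequency range clamped to the envelope Nyquist rate, nine filters) are chosen for illustration only.

# Minimal sketch (not the paper's implementation) of AMS feature extraction
# with a logarithmically scaled modulation filterbank.
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft

def log_spaced_modulation_filters(env_rate, fmin=2.0, fmax=256.0, n_filters=9, q=1.0):
    """Band-pass filters with logarithmically spaced centre frequencies (assumed values)."""
    fmax = min(fmax, 0.8 * env_rate / 2)        # keep filters below the envelope Nyquist rate
    centres = np.geomspace(fmin, fmax, n_filters)
    sos_bank = []
    for fc in centres:
        bw = fc / q                             # constant relative bandwidth (constant Q)
        lo = max(fc - bw / 2, 0.5)
        hi = min(fc + bw / 2, 0.95 * env_rate / 2)
        sos_bank.append(butter(2, [lo, hi], btype="bandpass", fs=env_rate, output="sos"))
    return centres, sos_bank

def ams_features(x, fs, n_fft=512, hop=128):
    """AMS features with shape (time frames, frequency channels, modulation channels)."""
    _, _, spec = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    env = np.abs(spec)                          # sub-band envelopes, shape (freq, frames)
    env_rate = fs / hop                         # sampling rate of the envelope signals
    _, sos_bank = log_spaced_modulation_filters(env_rate)
    feats = np.stack([np.abs(sosfiltfilt(sos, env, axis=1)) for sos in sos_bank], axis=-1)
    feats = np.log(feats + 1e-8)                # log-compressed modulation magnitudes
    # Normalize each (frequency, modulation) channel across time to reduce the
    # dependency of the features on the overall background noise level.
    feats -= feats.mean(axis=1, keepdims=True)
    feats /= feats.std(axis=1, keepdims=True) + 1e-8
    return feats.transpose(1, 0, 2)

# Example: AMS features for one second of white noise at 16 kHz.
features = ams_features(np.random.randn(16000), fs=16000)
print(features.shape)                           # (time frames, 257 frequency bins, 9 modulation channels)

In the actual system, features of this kind would be passed, together with neighboring time-frequency units for spectro-temporal integration, to a supervised classifier that estimates the ideal binary mask; that stage is omitted here.
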
Related papers
50 records in total
  • [1] Computational speech segregation based on an auditory-inspired modulation analysis
    May, Tobias
    Dau, Torsten
    [J]. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136: 3350 - 3359
  • [2] Whispered Speech Detection in Noise Using Auditory-Inspired Modulation Spectrum Features
    Sarria-Paja, Milton
    Falk, Tiago H.
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2013, 20 (08) : 783 - 786
  • [3] Improved monaural speech segregation based on computational auditory scene analysis
    Wang Yu
    Lin Jiajun
    Chen Ning
    Yuan Wenhao
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2013
  • [4] Auditory-Inspired Morphological Processing of Speech Spectrograms: Applications in Automatic Speech Recognition and Speech Enhancement
    Cadore, Joyner
    Valverde-Albacete, Francisco J.
    Gallardo-Antolin, Ascension
    Pelaez-Moreno, Carmen
    [J]. COGNITIVE COMPUTATION, 2013, 5 (04) : 426 - 441
  • [5] A computational auditory scene analysis system for speech segregation and robust speech recognition
    Shao, Yang
    Srinivasan, Soundararajan
    Jin, Zhaozhang
    Wang, DeLiang
    [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (1): 77 - 93
  • [6] Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario
    Biesmans, Wouter
    Das, Neetha
    Francart, Tom
    Bertrand, Alexander
    [J]. IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2017, 25 (05) : 402 - 412
  • [7] Auditory-inspired sparse representation of audio signals
    Pichevar, Ramin
    Najaf-Zadeh, Hossein
    Thibault, Louis
    Lahdili, Hassan
    [J]. SPEECH COMMUNICATION, 2011, 53 (05) : 643 - 657
  • [8] Auditory-Inspired Heart Sound Temporal Analysis for Patent Ductus Arteriosus
    Sung, Po-Hsun
    Wang, Jieh-Neng
    Chen, Bo-Wei
    Jang, Ling-Sheng
    Wang, Jhing-Fa
    [J]. 1ST INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT 2013), 2013: 231 - 234