Binaural Classification-Based Speech Segregation and Robust Speaker Recognition System

Authors
R. Venkatesan
A. Balaji Ganesh
Affiliation
[1] Velammal Engineering College, Electronic System Design Laboratory, Department of Electrical and Electronics Engineering
Keywords
Binaural cues; Computational auditory scene analysis; Automatic speaker recognition; Gabor Hilbert envelope features; Deep recurrent neural network; Soft time–frequency masking
DOI
Not available
Abstract
The paper presents an auditory scene analyser that comprises two joint, simultaneously operating modules: binaural speech segregation and speaker recognition. Binaural speech segregation is realized by feeding interaural time and level differences, interaural phase difference and interaural coherence, along with the direct-to-reverberant ratio, into a deep recurrent neural network. The performance of the deep recurrent network-based speech segregation is evaluated in terms of source-to-interference ratio, source-to-distortion ratio and source-to-artifacts ratio, and compared with existing architectures, including a deep neural network. The performance of a conventional deep recurrent neural network is observed to improve further when discriminative objectives are used together with soft time–frequency masking as a layer in the network structure. The paper also proposes a spectro-temporal feature extractor, referred to as Gabor–Hilbert envelope coefficients (GHEC). This monaural feature extracts discriminative acoustic information from the segregated speech sources. The performance of GHEC is validated under various noisy and reverberant environments, and the results are compared with existing monaural features. Binaural speech segregation yields a signal-to-noise ratio that is better than that of the baseline algorithms by 0.7 dB on average, even at a high reverberation time of 0.89 s.
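The abstract does not give the masking formula, so the following is only a minimal sketch of one common formulation of soft time–frequency masking, assumed here for illustration: each source's mask is its estimated magnitude divided by the sum of both estimates, and the mask is multiplied element-wise with the mixture spectrogram. The function names, the `eps` constant and the toy magnitudes are hypothetical and are not taken from the paper; a practical implementation would operate on spectrogram arrays inside the network rather than on nested lists.

```python
def soft_mask(mag_a, mag_b, eps=1e-8):
    """Soft mask for source A: |A| / (|A| + |B|), per time-frequency bin.

    eps is an assumed small constant that avoids division by zero.
    """
    return [
        [a / (a + b + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mag_a, mag_b)
    ]


def apply_mask(mask, mixture):
    """Multiply a mask with the mixture magnitudes, bin by bin."""
    return [
        [m * x for m, x in zip(row_m, row_x)]
        for row_m, row_x in zip(mask, mixture)
    ]


# Hypothetical magnitude estimates: 2 time frames x 3 frequency bins.
est_a = [[3.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
est_b = [[1.0, 1.0, 4.0], [0.0, 2.0, 3.0]]
mixture = [[4.0, 2.0, 4.0], [2.0, 4.0, 4.0]]

mask_a = soft_mask(est_a, est_b)
mask_b = soft_mask(est_b, est_a)

# The two masks sum to (almost exactly) one in every bin, so the
# masked outputs together account for the whole mixture.
source_a = apply_mask(mask_a, mixture)
source_b = apply_mask(mask_b, mixture)
```

Unlike a binary mask, which assigns each bin wholly to one source, the soft mask splits the mixture energy in proportion to the two estimates. Under this formulation the separated outputs add back up to the mixture.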
Pages: 3383 - 3411
Number of pages: 28
Related papers
50 in total
  • [1] Binaural Classification-Based Speech Segregation and Robust Speaker Recognition System
    Venkatesan, R.
    Ganesh, A. Balaji
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2018, 37 (08) : 3383 - 3411
  • [2] Exploring Monaural Features for Classification-Based Speech Segregation
    Wang, Yuxuan
    Han, Kun
    Wang, DeLiang
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02) : 270 - 279
  • [3] Emotional Speech Clustering based Robust Speaker Recognition System
    Li, Dongdong
    Yang, Yingchun
    [J]. PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4576 - +
  • [4] Robust Speech Recognition Based on Binaural Auditory Processing
    Menon, Anjali
    Kim, Chanwoo
    Stern, Richard M.
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3872 - 3876
  • [5] AUTOMATED SPEECH RECOGNITION SYSTEM FOR SPEAKER EMOTION CLASSIFICATION
    Anithadevi, N.
    Gokul, P.
    Nandan, S. Muhil
    Magesh, R.
    Shiddharth, S.
    [J]. PROCEEDINGS OF THE 2020 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS-2020), 2020,
  • [6] BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
    Menon, Anjali
    Kim, Chanwoo
    Kurokawa, Umpei
    Stern, Richard M.
    [J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 24 - 31
  • [7] Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition
    Malcangi, Mario
    [J]. COMPUTATIONAL METHODS IN SCIENCE AND ENGINEERING, VOL 2: ADVANCES IN COMPUTATIONAL SCIENCE, 2009, 1148 : 872 - 875
  • [8] Robust speech recognition using signal processing based on binaural perception
    Stern, RM
    Sullivan, TM
    [J]. ACUSTICA, 1996, 82 : S92 - S92
  • [9] Robust Speaker Identification Based on Binaural Masks
    Ghalamiosgouei, Sina
    Geravanchizadeh, Masoud
    [J]. SPEECH COMMUNICATION, 2021, 132 : 1 - 9
  • [10] A computational auditory scene analysis system for speech segregation and robust speech recognition
    Shao, Yang
    Srinivasan, Soundararajan
    Jin, Zhaozhang
    Wang, DeLiang
    [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) : 77 - 93