Binaural Classification-Based Speech Segregation and Robust Speaker Recognition System

Cited by: 0

Authors:

R. Venkatesan

A. Balaji Ganesh

Affiliation:

[1] Velammal Engineering College, Electronic System Design Laboratory, Department of Electrical and Electronics Engineering

Source:

Circuits, Systems, and Signal Processing | 2018 / Vol. 37

Keywords:

Binaural cues; Computational auditory scene analysis; Automatic speaker recognition; Gabor Hilbert envelope features; Deep recurrent neural network; Soft time–frequency masking

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

The paper presents an auditory scene analyser that comprises two modules operating jointly and simultaneously: binaural speech segregation and speaker recognition. Binaural speech segregation is realized by feeding interaural time and level differences, interaural phase difference and interaural coherence, along with the direct-to-reverberant ratio, into a deep recurrent neural network. The performance of the deep recurrent network-based speech segregation is validated in terms of source-to-interference ratio, source-to-distortion ratio and source-to-artifacts ratio, and is compared with existing architectures, including a deep neural network. The performance of the conventional deep recurrent neural network is observed to improve further when discriminative objectives are used together with soft time–frequency masking as a layer in the network structure. The paper also proposes a spectro-temporal feature extractor, referred to as Gabor–Hilbert envelope coefficients (GHEC). The proposed monaural feature extracts discriminative acoustic information from the segregated speech sources. The performance of GHEC is validated under various noisy and reverberant environments, and the results are compared with existing monaural features. The binaural speech segregation results show a signal-to-noise ratio better than that of the other baseline algorithms by 0.7 dB on average, even at a high reverberation time of 0.89 s.
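The soft time–frequency masking layer mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common ratio-mask formulation, in which two network output estimates are normalised against each other and applied to the mixture magnitude spectrogram. The abstract does not give the paper's exact formulation, so the function name `soft_mask_layer`, the `eps` guard and the list-of-lists spectrogram layout are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def soft_mask_layer(
    est_a: Matrix, est_b: Matrix, mixture: Matrix, eps: float = 1e-12
) -> Tuple[Matrix, Matrix]:
    """Apply a soft (ratio) time-frequency mask to a mixture spectrogram.

    est_a, est_b: hypothetical magnitude estimates of two sources
                  (for example, raw outputs of a separation network).
    mixture:      magnitude spectrogram of the observed mixture.
    eps:          small constant (an assumption) that avoids division by zero.
    """
    out_a: Matrix = []
    out_b: Matrix = []
    for row_a, row_b, row_x in zip(est_a, est_b, mixture):
        masked_a: List[float] = []
        masked_b: List[float] = []
        for a, b, x in zip(row_a, row_b, row_x):
            # Mask value in [0, 1]: share of this bin assigned to source A.
            mask = abs(a) / (abs(a) + abs(b) + eps)
            masked_a.append(mask * x)
            # The complementary share goes to source B.
            masked_b.append((1.0 - mask) * x)
        out_a.append(masked_a)
        out_b.append(masked_b)
    return out_a, out_b


if __name__ == "__main__":
    src_a, src_b = soft_mask_layer([[3.0, 1.0]], [[1.0, 1.0]], [[4.0, 2.0]])
    print(src_a, src_b)
```

Because the two masks sum to one in every bin, the two masked outputs add back up to the mixture, which is the property that makes this kind of layer convenient to train jointly with the rest of a network.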


Pages: 3383-3411

Number of pages: 28

50 items in total

[1] Binaural Classification-Based Speech Segregation and Robust Speaker Recognition System
Venkatesan, R.
Ganesh, A. Balaji
[J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2018, 37 (08) : 3383 - 3411
[2] Exploring Monaural Features for Classification-Based Speech Segregation
Wang, Yuxuan
Han, Kun
Wang, DeLiang
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (02) : 270 - 279
[3] Emotional Speech Clustering based Robust Speaker Recognition System
Li, Dongdong
Yang, Yingchun
[J]. PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4576 - +
[4] Robust Speech Recognition Based on Binaural Auditory Processing
Menon, Anjali
Kim, Chanwoo
Stern, Richard M.
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3872 - 3876
[5] AUTOMATED SPEECH RECOGNITION SYSTEM FOR SPEAKER EMOTION CLASSIFICATION
Anithadevi, N.
Gokul, P.
Nandan, S. Muhil
Magesh, R.
Shiddharth, S.
[J]. PROCEEDINGS OF THE 2020 5TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND SECURITY (ICCCS-2020), 2020,
[6] BINAURAL PROCESSING FOR ROBUST RECOGNITION OF DEGRADED SPEECH
Menon, Anjali
Kim, Chanwoo
Kurokawa, Umpei
Stern, Richard M.
[J]. 2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 24 - 31
[7] Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition
Malcangi, Mario
[J]. COMPUTATIONAL METHODS IN SCIENCE AND ENGINEERING, VOL 2: ADVANCES IN COMPUTATIONAL SCIENCE, 2009, 1148 : 872 - 875
[8] Robust speech recognition using signal processing based on binaural perception
Stern, RM
Sullivan, TM
[J]. ACUSTICA, 1996, 82 : S92 - S92
[9] Robust Speaker Identification Based on Binaural Masks
Ghalamiosgouei, Sina
Geravanchizadeh, Masoud
[J]. SPEECH COMMUNICATION, 2021, 132 : 1 - 9
[10] A computational auditory scene analysis system for speech segregation and robust speech recognition
Shao, Yang
Srinivasan, Soundararajan
Jin, Zhaozhang
Wang, DeLiang
[J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) : 77 - 93
