ITERATIVE DEEP NEURAL NETWORKS FOR SPEAKER-INDEPENDENT BINAURAL BLIND SPEECH SEPARATION

Cited by: 0

Authors:

Liu, Qingju ^{[1]}

Xu, Yong ^{[1]}

Jackson, Philip J. B. ^{[1]}

Wang, Wenwu ^{[1]}

Coleman, Philip ^{[2]}

Affiliations:

[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England

[2] Univ Surrey, Inst Sound Recording, Guildford, Surrey, England

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Deep neural network; binaural blind speech separation; spectral and spatial; iterative DNN;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we propose an iterative deep neural network (DNN)-based binaural source separation scheme for recovering two concurrent speech signals in a room environment. Besides the commonly used spectral features, the DNN also takes non-linearly wrapped binaural spatial features as input, which are refined iteratively using parameters estimated from the DNN output via a feedback loop. Different DNN structures have been tested, including a classic multilayer perceptron regression architecture as well as a new hybrid network with both convolutional and densely-connected layers. Objective evaluations in terms of PESQ and STOI showed consistent improvement over baseline methods using traditional binaural features, especially when the hybrid DNN architecture was employed. In addition, our proposed scheme is robust to mismatches between the training and testing data.


Pages: 541 - 545

Page count: 5

50 items in total

[41] Acoustic-phonetic speech parameters for speaker-independent speech recognition
Deshmukh, O
Espy-Wilson, CY
Juneja, A
2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002 : 593 - 596
[42] Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition
Wang, Jun
Samal, Ashok
Green, Jordan R.
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014 : 1179 - 1183
[43] Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
Ephrat, Ariel
Mosseri, Inbar
Lang, Oran
Dekel, Tali
Wilson, Kevin
Hassidim, Avinatan
Freeman, William T.
Rubinstein, Michael
ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04)
[44] Binaural Speech Intelligibility Estimation Using Deep Neural Networks
Kondo, Kazuhiro
Taira, Kazuya
Kobayashi, Yosuke
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 1858 - 1862
[45] Speaker-independent speech recognition based on tree-structured speaker clustering
Kosaka, T
Matsunaga, S
Sagayama, S
COMPUTER SPEECH AND LANGUAGE, 1996, 10 (01) : 55 - 74
[46] Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach
Shahamiri, Seyed Reza
Salim, Siti Salwah Binti
ADVANCED ENGINEERING INFORMATICS, 2014, 28 (01) : 102 - 110
[47] An automatic speech recognition system with speaker-independent identification support
Caranica, Alexandru
Burileanu, Corneliu
ADVANCED TOPICS IN OPTOELECTRONICS, MICROELECTRONICS, AND NANOTECHNOLOGIES VII, 2015, 9258
[48] ON LARGE-VOCABULARY SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION
LEE, KF
SPEECH COMMUNICATION, 1988, 7 (04) : 375 - 379
[49] Speaker-independent telephone speech recognition system: the VCS TeleRec
Hunt, Alan
Speech technology, 1988, 4 (02) : 80 - 82
[50] Speaker-Independent Spectral Enhancement for Bone-Conducted Speech
Cheng, Liangliang
Dou, Yunfeng
Zhou, Jian
Wang, Huabin
Tao, Liang
ALGORITHMS, 2023, 16 (03)
