ITERATIVE DEEP NEURAL NETWORKS FOR SPEAKER-INDEPENDENT BINAURAL BLIND SPEECH SEPARATION

Authors
Liu, Qingju [1]
Xu, Yong [1]
Jackson, Philip J. B. [1]
Wang, Wenwu [1]
Coleman, Philip [2]
Affiliations
[1] University of Surrey, Centre for Vision, Speech and Signal Processing, Guildford, Surrey, England
[2] University of Surrey, Institute of Sound Recording, Guildford, Surrey, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Deep neural network; binaural blind speech separation; spectral and spatial; iterative DNN;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose an iterative deep neural network (DNN)-based binaural source separation scheme for recovering two concurrent speech signals in a room environment. Besides the commonly used spectral features, the DNN also takes non-linearly wrapped binaural spatial features as input, which are refined iteratively using parameters estimated from the DNN output via a feedback loop. Different DNN structures have been tested, including a classic multilayer perceptron regression architecture as well as a new hybrid network with both convolutional and densely-connected layers. Objective evaluations in terms of PESQ and STOI showed consistent improvement over baseline methods using traditional binaural features, especially when the hybrid DNN architecture was employed. In addition, our proposed scheme is robust to mismatches between the training and testing data.
Pages: 541-545
Page count: 5