Binaural speech enhancement algorithm based on attention and deep learning

Cited by: 0
Authors
Li R. [1]
Li Q. [1]
Zhao F. [1]
Liu S. [1]
Affiliation
[1] Faculty of Information Technology, Beijing University of Technology, Beijing
Keywords
attention mechanism; binaural cues; binaural speech enhancement; convolutional recurrent neural network; Gammatone filter
DOI
10.13245/j.hust.238536
Abstract
To reduce the influence of noise and reverberation on binaural speech and to improve speech quality and intelligibility, a binaural speech enhancement algorithm based on an attention mechanism and an improved convolutional recurrent neural network (CRNN) was proposed. In this algorithm, the spectral features and binaural cues of binaural speech were first extracted. Channel attention was applied to the spectral features to obtain reliable spectral features, and spatial attention was applied to the binaural cues to obtain reliable binaural cues; the two feature sets were then combined as the input of the neural network. In the proposed network structure, attention modules serve as the skip connections between the encoder and decoder layers of the CRNN, and a bidirectional long short-term memory (BLSTM) network was used to capture temporal information. Experimental results show that the proposed algorithm achieves better performance under various noise and reverberation conditions. © 2023 Huazhong University of Science and Technology. All rights reserved.
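The feature-side attention and the attention-based skip connection described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the binaural cues are assumed to be ILD and IPD computed from complex STFT-like spectra (the abstract does not name the cues, and the keywords suggest the paper's spectral features come from a Gammatone filterbank rather than an STFT); channel attention follows the squeeze-and-excitation pattern; spatial attention follows CBAM but mixes the channel-average and channel-max maps with two scalar weights instead of a 7x7 convolution; and the skip connection uses an additive attention gate in the style of Attention U-Net. All function names, shapes, and the random (untrained) weights are hypothetical.

```python
import numpy as np

# Hypothetical sketch: cue choice (ILD/IPD), SE channel attention, simplified
# CBAM spatial attention and an additive attention gate are all assumptions,
# not the paper's exact design. Weights are random, not trained.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binaural_cues(spec_l, spec_r, eps=1e-8):
    """ILD in dB plus IPD as (cos, sin) from complex (T, F) left/right spectra."""
    ild = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    ipd = np.angle(spec_l * np.conj(spec_r))
    # cos/sin encoding avoids the 2*pi wrap-around of the raw phase difference
    return np.stack([ild, np.cos(ipd), np.sin(ipd)])  # (3, T, F)


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation: x is (C, T, F), w1 is (H, C), w2 is (C, H)."""
    s = x.mean(axis=(1, 2))                    # squeeze: one statistic per channel
    a = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excitation: weights in (0, 1)
    return x * a[:, None, None]                # rescale each channel


def spatial_attention(x, w_avg, w_max, b=0.0):
    """CBAM-style mask over (T, F); a 1x1 mix replaces CBAM's 7x7 conv."""
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    mask = sigmoid(w_avg * avg + w_max * mx + b)  # (T, F), values in (0, 1)
    return x * mask[None, :, :]


def attention_gate(skip, gate, wx, wg, psi):
    """Additive gate on an encoder skip connection (Attention U-Net style).
    skip: (Cx, T, F) encoder output; gate: (Cg, T, F) decoder features at the
    same resolution; wx: (Ci, Cx); wg: (Ci, Cg); psi: (Ci,)."""
    q = np.maximum(np.einsum("ic,ctf->itf", wx, skip)
                   + np.einsum("ig,gtf->itf", wg, gate), 0.0)
    alpha = sigmoid(np.einsum("i,itf->tf", psi, q))  # (T, F) gate in (0, 1)
    return skip * alpha[None, :, :]


# Usage with random stand-in spectra (T frames x F frequency bins)
rng = np.random.default_rng(0)
T, F = 10, 64
spec_l = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
spec_r = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

spectral = np.log1p(np.stack([np.abs(spec_l), np.abs(spec_r)]))  # (2, T, F)
cues = binaural_cues(spec_l, spec_r)                             # (3, T, F)

spectral_att = channel_attention(spectral, rng.standard_normal((1, 2)),
                                 rng.standard_normal((2, 1)))
cues_att = spatial_attention(cues, 0.5, 0.5)
features = np.concatenate([spectral_att, cues_att], axis=0)      # (5, T, F)

decoder_feats = rng.standard_normal((4, T, F))
gated = attention_gate(features, decoder_feats, rng.standard_normal((3, 5)),
                       rng.standard_normal((3, 4)), rng.standard_normal(3))
print(features.shape, gated.shape)  # (5, 10, 64) (5, 10, 64)
```

In a full model the concatenated `features` tensor would feed the CRNN encoder, a gate like `attention_gate` would be applied at each encoder/decoder skip, and a BLSTM would sit between encoder and decoder; those trained layers are omitted here. Because every attention map passes through a sigmoid, each module can only attenuate features, which is one simple way to read "reliable" features as unreliable time-frequency units being down-weighted.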
Pages: 125-131, 166