Real-time single-channel speech enhancement based on causal attention mechanism

Cited by: 6
Authors
Fan, Junyi [1]
Yang, Jibin [2]
Zhang, Xiongwei [2]
Yao, Yao [1]
Affiliations
[1] Army Engn Univ, Grad Sch, Nanjing 210007, Jiangsu, Peoples R China
[2] Army Engn Univ, Coll Command & Control Engn, Nanjing 210007, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Attention mechanism; Causality; Single-channel; Single-side relative position representation; Speech enhancement; SELF-ATTENTION; NEURAL-NETWORK; FREQUENCY; DOMAIN; CNN;
DOI
10.1016/j.apacoust.2022.109084
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
To achieve real-time single-channel speech enhancement, i.e., enhancement with no or low latency, this paper proposes a causal speech enhancement model with a Transformer-based attention mechanism. The model uses a causal codec with a U-Net-like structure as the backbone network, which is improved with an upper triangular mask matrix and a single-side relative position representation while preserving causality. The mask matrix keeps the attention focused on historical global information, and the single-side relative position representation focuses more on the parts of the local information that need attention. In addition, a weighted loss function in both the time and frequency domains is used to guide the optimization direction during training. Exhaustive comparison experiments are conducted on the Voice-Bank Demand dataset, and the experimental results show that the proposed causal model, compared with existing real-time single-channel speech enhancement models, not only achieves better enhancement results but also trains faster and has fewer trainable parameters. (c) 2022 Elsevier Ltd. All rights reserved.
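As a rough illustration of the causal masking idea mentioned in the abstract, the following is a minimal NumPy sketch of scaled dot-product attention in which each frame attends only to itself and earlier frames. It is an assumption-laden sketch, not the authors' implementation: the function name `causal_attention`, the single-head layout, and the toy shapes are hypothetical, and the single-side relative position representation and the U-Net-like codec are not modeled here.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention; frame t sees only frames <= t."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (T, T) query-key similarities
    # True strictly above the diagonal: these positions are future frames.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # hide the future
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked entries become exactly 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy input: 6 frames with 4 features each (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
out, w = causal_attention(x, x, x)
```

Because the masked weights are zero, changing a later frame leaves the outputs for all earlier frames unchanged, which is the property a low-latency model needs.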
Pages: 10