Adaptive selection of local and non-local attention mechanisms for speech enhancement

Cited by: 0
Authors
Xu, Xinmeng [1]
Tu, Weiping [1, 2, 3]
Yang, Yuhong [1, 3]
Affiliations
[1] Wuhan Univ, Natl Engn Res Ctr Multimedia Software, Sch Comp Sci, Wuhan, Peoples R China
[2] Hubei Luojia Lab, Wuhan, Peoples R China
[3] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
Keywords
Speech enhancement; Local and non-local attention; Adaptive selection; Reinforcement learning; Difficulty-adjusted reward; DOMAIN; NOISE;
DOI
10.1016/j.neunet.2024.106236
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use the attention mechanisms and which to use for the task. We train the adaptive selection network using reinforcement learning with a developed difficulty-adjusted reward that is related to the performance, complexity, and difficulty of target speech estimation from the noisy mixtures. Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with the innovative dynamic block that consists of an adaptive selection network and a local and non-local attention based speech enhancement network. In particular, the ASSENet incorporates both local and non-local attention and develops the attention mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves comparable and superior performance to existing approaches with attractive computational costs.
Pages: 11
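The abstract describes a selector that picks a route (no attention, local attention, or non-local attention) and is trained by reinforcement learning with a reward tied to performance, complexity, and difficulty. The sketch below is only an illustration of that general idea, not the authors' method: the route names, the cost values, the reward formula in `difficulty_adjusted_reward`, and the plain REINFORCE update are all assumptions made for this example, since the abstract does not give the actual definitions.

```python
import math
import random

# Hypothetical route set: skip attention, use local, or use non-local attention.
ROUTES = ("skip", "local", "non_local")
# Assumed relative compute cost per route (illustrative numbers, not from the paper).
ROUTE_COST = (0.0, 0.3, 1.0)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_route(logits, rng):
    """Sample a route index from the selector's categorical policy."""
    return rng.choices(range(len(ROUTES)), weights=softmax(logits), k=1)[0]


def difficulty_adjusted_reward(quality_gain, cost, difficulty, cost_weight=0.5):
    """Assumed reward form: gain counts more on harder inputs, compute cost is penalized."""
    return (1.0 + difficulty) * quality_gain - cost_weight * cost


def reinforce_step(logits, route, reward, baseline=0.0, lr=0.1):
    """One REINFORCE update of the selector logits for the sampled route.

    Uses d log pi(route) / d logit_i = 1[i == route] - pi_i.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if i == route else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    route = select_route(logits, rng)
    # quality_gain and difficulty would come from an enhancement metric and a
    # noise-level estimate; the values here are placeholders.
    reward = difficulty_adjusted_reward(0.4, ROUTE_COST[route], difficulty=0.5)
    logits = reinforce_step(logits, route, reward)
    print(ROUTES[route], reward, logits)
```

In this sketch, a positive reward raises the probability of the route that was sampled, so routes that yield good enhancement at low cost on a given difficulty level are chosen more often over time.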