Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Cited by: 0
Authors
Jannu, Chaitanya [1 ]
Vanambathina, Sunny Dayal [1 ]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati, India
Keywords
Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network; NEURAL-NETWORK; DILATED CONVOLUTIONS; RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments, where the received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from transformers, in which long-term dependencies are modelled more efficiently by multi-head attention (MHA) using sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models on many tasks while allowing parallel processing. In this paper, we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer captures global information while supporting parallel computation, which helps reduce long-term noise. Unlike the two-stage transformer neural network (TSTNN), the proposed work uses different transformer structures for the intra- and inter-transformers to extract the local and global features of noisy speech, respectively. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, exploiting the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
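The architecture described in the abstract can be summarised with a minimal sketch: the noisy-speech feature sequence is split into chunks, an intra-chunk transformer models local dependencies, an inter-chunk transformer models global dependencies across chunks, and a convolution branch inside each transformer block captures short-term structure. The sketch below assumes PyTorch; the layer sizes (dim, heads, chunk), the depthwise-convolution placement, and all class names are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the two-stage convolutional transformer idea (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTransformerBlock(nn.Module):
    """Transformer encoder block with an extra depthwise-Conv1d branch,
    so local (short-term) structure is modelled alongside global MHA context."""

    def __init__(self, dim=64, heads=4, ff_dim=256, kernel_size=3):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        # Depthwise convolution along the sequence axis (local feature extractor).
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(),
                                 nn.Linear(ff_dim, dim))
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        attn, _ = self.mha(x, x, x)                # global context via MHA
        x = self.norm1(x + attn)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm2(x + local)                  # local context via conv
        return self.norm3(x + self.ffn(x))


class TwoStageConvTransformer(nn.Module):
    """Splits the frame sequence into chunks; the intra block attends within
    each chunk (local features), the inter block attends across chunks
    (global features), mirroring the two-stage idea in the abstract."""

    def __init__(self, dim=64, chunk=100):
        super().__init__()
        self.chunk = chunk
        self.intra = ConvTransformerBlock(dim)
        self.inter = ConvTransformerBlock(dim)

    def forward(self, x):                          # x: (batch, frames, dim)
        b, t, d = x.shape
        pad = (-t) % self.chunk                    # pad so frames divide evenly
        x = F.pad(x, (0, 0, 0, pad))
        n = x.shape[1] // self.chunk
        x = x.reshape(b, n, self.chunk, d)
        # Stage 1: intra-chunk (local) modelling.
        x = self.intra(x.reshape(b * n, self.chunk, d)).reshape(b, n, self.chunk, d)
        # Stage 2: inter-chunk (global) modelling over corresponding chunk positions.
        x = x.transpose(1, 2).reshape(b * self.chunk, n, d)
        x = self.inter(x).reshape(b, self.chunk, n, d).transpose(1, 2)
        return x.reshape(b, n * self.chunk, d)[:, :t]


if __name__ == "__main__":
    net = TwoStageConvTransformer(dim=64, chunk=100)
    noisy_feats = torch.randn(2, 480, 64)          # (batch, frames, feature dim)
    print(net(noisy_feats).shape)                  # torch.Size([2, 480, 64])

Chunking keeps the attention cost of each stage roughly linear in the utterance length, which is the usual motivation for such dual-path/two-stage designs.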
Pages: 731-743 (13 pages)
Related papers (50 in total)
  • [21] A Convolutional Neural Network with Non-Local Module for Speech Enhancement. Li, Xiaoqi; Li, Yaxing; Li, Meng; Xu, Shan; Dong, Yuanjie; Sun, Xinrong; Xiong, Shengwu. INTERSPEECH 2019, 2019: 1796-1800.
  • [22] Speech Enhancement based on Deep Convolutional Neural Network. Nuthakki, Ramesh; Masanta, Payel; Yukta, T. N. PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021: 770-775.
  • [23] SETransformer: Speech Enhancement Transformer. Yu, Weiwei; Zhou, Jian; Wang, HuaBin; Tao, Liang. COGNITIVE COMPUTATION, 2022, 14 (03): 1152-1158.
  • [25] ds-FCRN: three-dimensional dual-stream fully convolutional residual networks and transformer-based global-local feature learning for brain age prediction. Wu, Yutong; Zhang, Chen; Ma, Xiangge; Zhu, Xinyu; Lin, Lan; Tian, Miao. BRAIN STRUCTURE & FUNCTION, 2025, 230 (02).
  • [26] UTran-DSR: a novel transformer-based model using feature enhancement for dysarthric speech recognition. Irshad, Usama; Mahum, Rabbia; Ganiyu, Ismaila; Butt, Faisal Shafique; Hidri, Lotfi; Ali, Tamer G.; El-Sherbeeny, Ahmed M. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01).
  • [27] Disentangled Feature Learning for Noise-Invariant Speech Enhancement. Bae, Soo Hyun; Choi, Inkyu; Kim, Nam Soo. APPLIED SCIENCES-BASEL, 2019, 9 (11).
  • [28] Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition. Zhao, Huan; Gao, Yingxue; Xiao, Yufeng. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713: 118-130.
  • [29] TF-LOCOFORMER: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement. Saijo, Kohei; Wichern, Gordon; Germain, Francois G.; Pan, Zexu; Le Roux, Jonathan. 2024 18TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT, IWAENC 2024, 2024: 205-209.
  • [30] Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning. Kim, Boeun; Chang, Hyung Jin; Kim, Jungho; Choi, Jin Young. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, 13664 LNCS: 209-225.