Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Cited by: 0
Authors
Jannu, Chaitanya [1]
Vanambathina, Sunny Dayal [1]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati, India
Keywords
Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network; NEURAL-NETWORK; DILATED CONVOLUTIONS; RECOGNITION;
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments where received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from transformers, in which multi-head attention (MHA) models long-term dependencies more efficiently by using sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models in many tasks while also allowing parallel processing. In this paper, we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer captures global information and supports parallel computing, resulting in a reduction of long-term noise. Unlike the two-stage transformer neural network (TSTNN), the proposed model uses different transformer structures for the intra- and inter-transformers to extract the local as well as global features of noisy speech. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, based on the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms other existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
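The intra/inter idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes plain single-head scaled dot-product self-attention with identity query/key/value projections, uses the same attention for both stages (the paper states that the two stages use different structures), and omits the CNN module, residual connections, and normalization. The names `self_attention`, `two_stage_block`, and `split_into_chunks` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    # Scaled dot-product self-attention over a list of feature vectors.
    # Assumption: identity Q/K/V projections, single head (no learned weights).
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def split_into_chunks(frames, chunk_len):
    # Cut a frame sequence into consecutive, non-overlapping chunks.
    return [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]


def two_stage_block(chunks):
    # chunks[c][t] is the feature vector at position t of chunk c.
    # Stage 1 (intra): attention inside each chunk mixes nearby frames (local).
    intra = [self_attention(chunk) for chunk in chunks]
    # Stage 2 (inter): attention across chunks at a fixed position (global).
    num_chunks, chunk_len = len(intra), len(intra[0])
    out = [[None] * chunk_len for _ in range(num_chunks)]
    for t in range(chunk_len):
        column = [intra[c][t] for c in range(num_chunks)]
        mixed = self_attention(column)
        for c in range(num_chunks):
            out[c][t] = mixed[c]
    return out


# Toy input: 8 frames with 2 features each, split into 2 chunks of 4 frames.
frames = [[math.sin(0.3 * n), math.cos(0.3 * n)] for n in range(8)]
chunks = split_into_chunks(frames, chunk_len=4)
enhanced = two_stage_block(chunks)
```

After both stages, every output frame depends on every input frame, although each attention call only sees `chunk_len` or `num_chunks` vectors at a time. That is the usual motivation for two-stage (dual-path) processing of long time-domain sequences.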
Pages: 731 - 743
Page count: 13