Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Cited: 0
Authors
Jannu, Chaitanya [1 ]
Vanambathina, Sunny Dayal [1 ]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati, India
Keywords
Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network; neural network; dilated convolutions; recognition
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments, where the received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from transformers, whose multi-head attention (MHA) models long-term dependencies efficiently by exploiting sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models on many tasks while allowing parallel processing. In this paper, we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer captures global information and supports parallel computation, which helps suppress long-term noise. Unlike the two-stage transformer neural network (TSTNN), the proposed model uses different transformer structures for the intra- and inter-transformers to extract both the local and the global features of noisy speech. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, exploiting the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
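As a rough illustration of the kind of block the abstract describes, the PyTorch sketch below combines an MHA layer (global, long-term features), a depthwise convolution branch (local, short-term features), and a position-wise feed-forward network with residual connections. This is a minimal sketch under our own assumptions; the layer ordering, dimensions, and the depthwise-convolution design are illustrative and are not taken from the paper.

    # Minimal sketch (not the authors' code) of one convolutional transformer
    # block: MHA for global (long-term) context plus a depthwise Conv1d branch
    # for local (short-term) structure. Sizes and ordering are assumptions.
    import torch
    import torch.nn as nn

    class ConvTransformerBlock(nn.Module):
        def __init__(self, dim: int = 64, n_heads: int = 4, kernel_size: int = 3):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            # Depthwise convolution captures local context along the time axis.
            self.conv = nn.Sequential(
                nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
                nn.BatchNorm1d(dim),
                nn.ReLU(),
            )
            self.norm2 = nn.LayerNorm(dim)
            # Position-wise feed-forward network, as in a standard transformer.
            self.ffn = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim)
            )
            self.norm3 = nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, dim)
            a, _ = self.attn(x, x, x)                          # global features via MHA
            x = self.norm1(x + a)
            c = self.conv(x.transpose(1, 2)).transpose(1, 2)   # local features via CNN
            x = self.norm2(x + c)
            return self.norm3(x + self.ffn(x))                 # FFN + residual

    # Example: a batch of 2 framed waveforms (100 frames, 64-dim features).
    block = ConvTransformerBlock()
    y = block(torch.randn(2, 100, 64))
    print(y.shape)  # torch.Size([2, 100, 64])

In the paper's two-stage design, blocks of this kind would be applied alternately within frames (intra) and across frames (inter); the sketch shows only a single block.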
Pages: 731-743
Page count: 13
Related Papers
50 items in total
  • [1] LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching
    Zhong, Wenhao
    Jiang, Jie
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 270
  • [2] Learning Local to Global Feature Aggregation for Speech Emotion Recognition
    Lu, Cheng
    Lian, Hailun
    Zheng, Wenming
    Zong, Yuan
    Zhao, Yan
    Li, Sunan
    INTERSPEECH 2023, 2023, : 1908 - 1912
  • [3] Global Enhancement but Local Suppression in Feature Based Attention
    Mueller, Matthias M.
    Forschack, Norman
    Andersen, Soeren
    PERCEPTION, 2016, 45 : 196 - 196
  • [4] Transformer-Based Visual Object Tracking with Global Feature Enhancement
    Wang, Shuai
    Fang, Genwen
    Liu, Lei
    Wang, Jun
    Zhu, Kongfen
    Melo, Silas N.
APPLIED SCIENCES-BASEL, 2023, 13 (23)
  • [5] Global-local feature learning for fine-grained food classification based on Swin Transformer
    Kim, Jun-Hwa
    Kim, Namho
    Won, Chee Sun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [6] LGCNet: Feature Enhancement and Consistency Learning Based on Local and Global Coherence Network for Correspondence Selection
    Wu, Tzu-Han
    Chen, Kuan-Wen
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 6182 - 6188
  • [7] Global and local feature extraction based on convolutional neural network residual learning for MR image denoising
    Li, Meng
    Yun, Juntong
    Liu, Dingxi
    Jiang, Daixiang
    Xiong, Hanlin
    Jiang, Du
    Hu, Shunbo
    Liu, Rong
    Li, Gongfa
PHYSICS IN MEDICINE AND BIOLOGY, 2024, 69 (20)
  • [8] Global Enhancement but Local Suppression in Feature-based Attention
    Forschack, Norman
    Andersen, Soren K.
    Mueller, Matthias M.
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2017, 29 (04) : 619 - 627
  • [9] VHF Speech Enhancement Based on Transformer
    Han, Xue
    Pan, Mingyang
    Li, Zhengzhong
    Ge, Haipeng
    Liu, Zongying
    IEEE OPEN JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 3 : 146 - 152
  • [10] Fully Convolutional Transformer with Local-Global Attention
    Lee, Sihaeng
    Yi, Eojindl
    Lee, Janghyeon
    Yoo, Jinsu
    Lee, Honglak
    Kim, Seung Hwan
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 552 - 559