Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Cited by: 0

Authors:

Jannu, Chaitanya [1]

Vanambathina, Sunny Dayal [1]

Affiliations:

[1] VIT AP Univ, Sch Elect Engn, Amaravati, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2023 / Vol. 14 / No. 01

Keywords:

Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network; NEURAL-NETWORK; DILATED CONVOLUTIONS; RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments where received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from the use of transformers, in which long-term dependencies can be modelled more efficiently with multi-head attention (MHA) by using sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models in many tasks while utilizing parallel processing. In this paper we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer considers global information as well as parallel computing, resulting in a reduction of long-term noise. In the proposed work, unlike the two-stage transformer neural network (TSTNN), different transformer structures are used for the intra- and inter-transformers to extract the local as well as the global features of noisy speech. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, based on the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms the other existing models in terms of STOI (short-time objective intelligibility) and PESQ (perceptual evaluation of speech quality).
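The intra/inter arrangement mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes single-head scaled dot-product attention with identity projections, applies the same attention function in both stages (the abstract says the paper uses different structures for the two), and omits the CNN module. The names `self_attention`, `dual_path_pass`, and `chunk_len` are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections), so no learned weights are involved.
    seq is a list of equal-length feature vectors.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        # Weighted average of all frames.
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out


def dual_path_pass(frames, chunk_len):
    """Apply attention within chunks (local), then across chunks (global).

    Assumption: len(frames) is a multiple of chunk_len.
    """
    chunks = [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]
    # Intra stage: each chunk attends only to its own frames.
    chunks = [self_attention(c) for c in chunks]
    # Inter stage: frames at the same position in every chunk attend to each other.
    for pos in range(chunk_len):
        column = self_attention([c[pos] for c in chunks])
        for c, vec in zip(chunks, column):
            c[pos] = vec
    # Flatten back to one sequence of frames.
    return [vec for c in chunks for vec in c]


frames = [[float(i), 1.0] for i in range(8)]
enhanced = dual_path_pass(frames, chunk_len=4)
```

In this toy setup the intra stage only mixes frames that are close in time, while the inter stage links frames that are `chunk_len` steps apart, which is one way to read the "local as well as global features" phrasing.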


Pages: 731-743

Page count: 13

50 records in total

[31] Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning
Kim, Boeun
Chang, Hyung Jin
Kim, Jungho
Choi, Jin Young
COMPUTER VISION - ECCV 2022, PT IV, 2022, 13664 : 209 - 225
[32] WaterFormer: A Global–Local Transformer for Underwater Image Enhancement With Environment Adaptor
Wen, Junjie
Cui, Jinqiang
Yang, Guidong
Zhao, Benyun
Zhai, Yu
Gao, Zhi
Dou, Lihua
Chen, Ben M.
IEEE ROBOTICS & AUTOMATION MAGAZINE, 2024, 31 (01) : 29 - 40
[33] Frequency transformer with local feature enhancement for improved vehicle re-identification
Xiang, Honglin
Wang, Jiahao
Sun, Yulong
Ye, Ming
JOURNAL OF SUPERCOMPUTING, 2025, 81 (04):
[34] An Exploration of Length Generalization in Transformer-Based Speech Enhancement
Zhang, Qiquan
Zhu, Hongxu
Qian, Xinyuan
Ambikairajah, Eliathamby
Li, Haizhou
INTERSPEECH 2024, 2024, : 1725 - 1729
[35] Local-Global Feature-Aware Transformer Based Residual Network for Hyperspectral Image Denoising
Wang, Fengfeng
Li, Jie
Yuan, Qiangqiang
Zhang, Liangpei
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
[36] Gait recognition with global-local feature fusion based on swin transformer-3DCNN
Wang, Ting
Zhou, Guanghang
Pu, Yanfeng
Moreno, Ramon
Yang, Guoping
SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
[37] CONVOLUTIONAL NEURAL NETWORKS CONSIDERING LOCAL AND GLOBAL FEATURES FOR IMAGE ENHANCEMENT
Kinoshita, Yuma
Kiya, Hitoshi
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 2110 - 2114
[38] NSE-CATNet: Deep Neural Speech Enhancement Using Convolutional Attention Transformer Network
Saleem, Nasir
Gunawan, Teddy Surya
Kartiwi, Mira
Nugroho, Bambang Setia
Wijayanto, Inung
IEEE ACCESS, 2023, 11 : 66979 - 66994
[39] A parallel convolutional neural network-transformer model for underwater target recognition based on multimodal feature learning
Cui, Xuerong
Zheng, Qingqing
Li, Juan
Jiang, Bin
Li, Shibao
Liu, Jianhang
PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART M-JOURNAL OF ENGINEERING FOR THE MARITIME ENVIRONMENT, 2024, 238 (04) : 943 - 953
[40] Regression-Based Speech Enhancement by Convolutional Neural Network
Erseven, Mustafa
Bolat, Bulent
2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
