Convolutional Transformer based Local and Global Feature Learning for Speech Enhancement

Cited by: 0
Authors
Jannu, Chaitanya [1 ]
Vanambathina, Sunny Dayal [1 ]
Affiliations
[1] VIT AP Univ, Sch Elect Engn, Amaravati, India
Keywords
Convolutional neural network; recurrent neural network; speech enhancement; multi-head attention; two-stage convolutional transformer; feed-forward network; NEURAL-NETWORK; DILATED CONVOLUTIONS; RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
Speech enhancement (SE) is an important method for improving speech quality and intelligibility in noisy environments, where the received speech is severely distorted by noise. An efficient speech enhancement system relies on accurately modelling the long-term dependencies of noisy speech. Deep learning has benefited greatly from transformers, in which long-term dependencies are modelled more efficiently by multi-head attention (MHA) using sequence similarity. Transformers frequently outperform recurrent neural network (RNN) and convolutional neural network (CNN) models on many tasks while allowing parallel processing. In this paper, we propose a two-stage convolutional transformer for speech enhancement in the time domain. The transformer captures global information while supporting parallel computation, which helps reduce long-term noise. Unlike the two-stage transformer neural network (TSTNN), the proposed work uses different transformer structures for the intra- and inter-transformers to extract the local and global features of noisy speech, respectively. Moreover, a CNN module is added to the transformer so that short-term noise can be reduced more effectively, exploiting the ability of CNNs to extract local information. The experimental findings demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
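The architecture described in the abstract can be summarised with a minimal sketch: the noisy-speech feature sequence is split into chunks, an intra-chunk transformer models local dependencies, an inter-chunk transformer models global dependencies across chunks, and a convolution branch inside each transformer block captures short-term structure. The sketch below assumes PyTorch; the layer sizes (dim, heads, chunk), the depthwise-convolution placement, and all class names are illustrative assumptions, not the authors' implementation.

# Minimal sketch of the two-stage convolutional transformer idea (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTransformerBlock(nn.Module):
    """Transformer encoder block with an extra depthwise-Conv1d branch,
    so local (short-term) structure is modelled alongside global MHA context."""

    def __init__(self, dim=64, heads=4, ff_dim=256, kernel_size=3):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        # Depthwise convolution along the sequence axis (local feature extractor).
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(),
                                 nn.Linear(ff_dim, dim))
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        attn, _ = self.mha(x, x, x)                # global context via MHA
        x = self.norm1(x + attn)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm2(x + local)                  # local context via conv
        return self.norm3(x + self.ffn(x))


class TwoStageConvTransformer(nn.Module):
    """Splits the frame sequence into chunks; the intra block attends within
    each chunk (local features), the inter block attends across chunks
    (global features), mirroring the two-stage idea in the abstract."""

    def __init__(self, dim=64, chunk=100):
        super().__init__()
        self.chunk = chunk
        self.intra = ConvTransformerBlock(dim)
        self.inter = ConvTransformerBlock(dim)

    def forward(self, x):                          # x: (batch, frames, dim)
        b, t, d = x.shape
        pad = (-t) % self.chunk                    # pad so frames divide evenly
        x = F.pad(x, (0, 0, 0, pad))
        n = x.shape[1] // self.chunk
        x = x.reshape(b, n, self.chunk, d)
        # Stage 1: intra-chunk (local) modelling.
        x = self.intra(x.reshape(b * n, self.chunk, d)).reshape(b, n, self.chunk, d)
        # Stage 2: inter-chunk (global) modelling over corresponding chunk positions.
        x = x.transpose(1, 2).reshape(b * self.chunk, n, d)
        x = self.inter(x).reshape(b, self.chunk, n, d).transpose(1, 2)
        return x.reshape(b, n * self.chunk, d)[:, :t]


if __name__ == "__main__":
    net = TwoStageConvTransformer(dim=64, chunk=100)
    noisy_feats = torch.randn(2, 480, 64)          # (batch, frames, feature dim)
    print(net(noisy_feats).shape)                  # torch.Size([2, 480, 64])

Chunking keeps the attention cost of each stage roughly linear in the utterance length, which is the usual motivation for such dual-path/two-stage designs.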
Pages: 731-743 (13 pages)
Related papers (50 in total)
  • [21] A Convolutional Neural Network with Non-Local Module for Speech Enhancement. Li, Xiaoqi; Li, Yaxing; Li, Meng; Xu, Shan; Dong, Yuanjie; Sun, Xinrong; Xiong, Shengwu. INTERSPEECH 2019, 2019: 1796-1800.
  • [22] Speech Enhancement based on Deep Convolutional Neural Network. Nuthakki, Ramesh; Masanta, Payel; Yukta, T. N. PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021: 770-775.
  • [23] SETransformer: Speech Enhancement Transformer. Yu, Weiwei; Zhou, Jian; Wang, HuaBin; Tao, Liang. COGNITIVE COMPUTATION, 2022, 14 (03): 1152-1158.
  • [25] ds-FCRN: three-dimensional dual-stream fully convolutional residual networks and transformer-based global-local feature learning for brain age prediction. Wu, Yutong; Zhang, Chen; Ma, Xiangge; Zhu, Xinyu; Lin, Lan; Tian, Miao. BRAIN STRUCTURE & FUNCTION, 2025, 230 (02).
  • [26] UTran-DSR: a novel transformer-based model using feature enhancement for dysarthric speech recognition. Irshad, Usama; Mahum, Rabbia; Ganiyu, Ismaila; Butt, Faisal Shafique; Hidri, Lotfi; Ali, Tamer G.; El-Sherbeeny, Ahmed M. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01).
  • [27] Disentangled Feature Learning for Noise-Invariant Speech Enhancement. Bae, Soo Hyun; Choi, Inkyu; Kim, Nam Soo. APPLIED SCIENCES-BASEL, 2019, 9 (11).
  • [28] Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition. Zhao, Huan; Gao, Yingxue; Xiao, Yufeng. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713: 118-130.
  • [29] TF-LOCOFORMER: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement. Saijo, Kohei; Wichern, Gordon; Germain, Francois G.; Pan, Zexu; Le Roux, Jonathan. 2024 18TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT, IWAENC 2024, 2024: 205-209.
  • [30] Global-Local Motion Transformer for Unsupervised Skeleton-Based Action Learning. Kim, Boeun; Chang, Hyung Jin; Kim, Jungho; Choi, Jin Young. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022, 13664 LNCS: 209-225.