AN INTRA- AND INTER-FRAME SEQUENCE MODEL WITH DISCRETE COSINE TRANSFORM FOR STREAMING SPEECH ENHANCEMENT

Times cited: 0

Authors:

Zhang, Yuewei [1]

Zhuo, Huanbin [2]

Zhu, Jie [1]

Affiliations:

[1] Shanghai Jiao Tong University, Department of Electronic Engineering, Shanghai, People's Republic of China

[2] Tencent Video Cloud, Shanghai, People's Republic of China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024 | 2024

Keywords:

Speech enhancement; dual sequence modeling; discrete cosine transform; causal convolution

DOI:

10.1109/ICMEW63481.2024.10645392

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

To improve speech enhancement performance, many recent methods attempt to reconstruct the target magnitude and phase spectra simultaneously. They usually process the complex short-time Fourier transform (STFT) spectrum, which leads to high model complexity. In this paper, we use the short-time discrete cosine transform (STDCT) instead of the STFT. Because the STDCT is a lossless, real-valued transform in which phase information is implicit, our method achieves excellent performance with lower complexity. In addition, we adopt a convolutional recurrent network (CRN) as the backbone and design a dual sequence modeling block that simultaneously captures the intra-frame correlation among frequency bins and the inter-frame context along the time dimension; hence we name our model IICRN. Experimental results indicate that IICRN outperforms previous advanced methods.
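The abstract's central claim is that the STDCT is a lossless, real-valued alternative to the STFT. The sketch below illustrates that property under assumptions this record does not state: an orthonormal DCT-II per frame, a square-root periodic Hann window applied at both analysis and synthesis, and 50% overlap, which together give perfect reconstruction by overlap-add. The helper names (`stdct`, `istdct`, `dct2`, `idct2`, `sqrt_hann`) are hypothetical, and the O(N²) DCT is written out in plain Python for readability. It is not the paper's implementation, whose frame length, window, and normalization are not given here.

```python
import math
from typing import List

# Assumptions (not from the paper): orthonormal DCT-II, square-root periodic
# Hann window at analysis and synthesis, hop = frame_len / 2.


def sqrt_hann(n: int) -> List[float]:
    """Square root of a periodic Hann window; its square sums to 1 at 50% overlap."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)) for i in range(n)]


def dct2(frame: List[float]) -> List[float]:
    """Orthonormal DCT-II of one frame (direct O(N^2) sum, for clarity only)."""
    n = len(frame)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def idct2(coeffs: List[float]) -> List[float]:
    """Inverse of dct2: the DCT-II matrix is orthogonal, so apply its transpose."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def stdct(signal: List[float], frame_len: int) -> List[List[float]]:
    """Hypothetical STDCT: window each 50%-overlapping frame, then take its DCT-II."""
    if frame_len % 2:
        raise ValueError("frame_len must be even")
    hop = frame_len // 2
    # Pad so (padded length - frame_len) is a multiple of hop and every
    # original sample is covered by exactly two frames.
    extra = (-len(signal)) % hop
    padded = [0.0] * hop + list(signal) + [0.0] * (hop + extra)
    win = sqrt_hann(frame_len)
    return [
        dct2([padded[start + i] * win[i] for i in range(frame_len)])
        for start in range(0, len(padded) - frame_len + 1, hop)
    ]


def istdct(frames: List[List[float]], length: int) -> List[float]:
    """Inverse DCT per frame, synthesis window, overlap-add, then trim the padding."""
    frame_len = len(frames[0])
    hop = frame_len // 2
    win = sqrt_hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, coeffs in enumerate(frames):
        for i, x in enumerate(idct2(coeffs)):
            out[m * hop + i] += x * win[i]
    return out[hop:hop + length]


if __name__ == "__main__":
    signal = [math.sin(0.3 * t) + 0.1 * ((t * 7) % 5) for t in range(50)]
    frames = stdct(signal, 16)
    recon = istdct(frames, len(signal))
    print(len(frames), len(frames[0]))  # 8 frames of 16 real coefficients each
    print(max(abs(a - b) for a, b in zip(signal, recon)) < 1e-9)  # True
```

With a 16-sample frame, each STDCT frame holds 16 real coefficients, whereas a 16-point STFT frame yields 9 complex bins (18 real numbers) that a network must handle as magnitude and phase or as real and imaginary parts. This is one concrete sense in which a real-valued transform can simplify the model's input and output. A practical implementation would use an FFT-based DCT rather than the direct sums above.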

Pages: 4
