AN INTRA- AND INTER-FRAME SEQUENCE MODEL WITH DISCRETE COSINE TRANSFORM FOR STREAMING SPEECH ENHANCEMENT

Cited by: 0
Authors
Zhang, Yuewei [1]
Zhuo, Huanbin [2]
Zhu, Jie [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Elect Engn, Shanghai, Peoples R China
[2] Tencent Video Cloud, Shanghai, Peoples R China
Keywords
Speech enhancement; dual sequence modeling; discrete cosine transform; causal convolution
DOI
10.1109/ICMEW63481.2024.10645392
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To improve speech enhancement performance, many recent methods attempt to reconstruct the target magnitude and phase spectra simultaneously. They usually process the complex short-time Fourier transform (STFT) spectrum, which leads to high model complexity. In this paper, we use the short-time discrete cosine transform (STDCT) instead of the STFT. Because the STDCT is a lossless, real-valued transform with implicit phase, our method achieves excellent performance with lower complexity. In addition, we take a convolutional recurrent network (CRN) as the network backbone and design a dual sequence modeling block that simultaneously captures the intra-frame correlation among frequency bins and the inter-frame context along the time dimension; we therefore name our model IICRN. Experimental results indicate that IICRN outperforms previous advanced methods.
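The claim that the STDCT is lossless and real-valued can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dct_ii`, `idct_ii`, `stdct`, `istdct`), the rectangular window, and the frame length and hop size are all assumptions chosen for illustration. The sign of each real coefficient plays the role that an explicit phase plays in the STFT.

```python
import math

def dct_ii(frame):
    """Orthonormal DCT-II of one real-valued frame (list of floats)."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_ii(coeffs):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * coeffs[k]
                 * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def stdct(signal, frame_len, hop):
    """Frame the signal (rectangular window, an assumption) and DCT each frame."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [dct_ii(signal[s:s + frame_len]) for s in starts]

def istdct(spec, frame_len, hop):
    """Invert each frame, then overlap-add and average overlapping samples."""
    length = hop * (len(spec) - 1) + frame_len
    out = [0.0] * length
    weight = [0.0] * length
    for t, coeffs in enumerate(spec):
        for i, v in enumerate(idct_ii(coeffs)):
            out[t * hop + i] += v
            weight[t * hop + i] += 1.0
    return [o / w for o, w in zip(out, weight)]

# Hypothetical test signal: 32 samples, frame length 8, hop 4.
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
spec = stdct(signal, frame_len=8, hop=4)
recon = istdct(spec, frame_len=8, hop=4)
max_err = max(abs(a - b) for a, b in zip(signal, recon))
```

Every entry of `spec` is a plain real number, so a network operating on it needs no complex arithmetic, and `max_err` stays at floating-point rounding level because the orthonormal DCT is exactly invertible.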
Pages: 4
Related papers
4 items in total
  • [1] Intra- and Inter-frame Features for Automatic Speech Recognition
    Lee, Sung Joo
    Kang, Byung Ok
    Chung, Hoon
    Lee, Yunkeun
    ETRI JOURNAL, 2014, 36 (03) : 514 - 517
  • [2] Intra- and inter-frame prediction in bandwidth scalable coding of wideband speech
    Song, G. -B.
    IET SIGNAL PROCESSING, 2011, 5 (02) : 164 - 170
  • [3] Accuracy Enhancement in Intra- and Inter-Frame Example Search for Lossless Video Coding
    Nemoto, Koji
    Kameda, Yusuke
    Matsuda, Ichiro
    Itoh, Susumu
    Unno, Kyohei
    Naito, Sei
    INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY (IWAIT) 2020, 2020, 11515
  • [4] Speech enhancement based on Laplacian-Gaussian model and simplified phase discrimination in Discrete Cosine Transform domain
    School of Electronics and Information, Suzhou University, Suzhou 215021, China
    Unknown
    Shengxue Xuebao, 2008, 3 (244-251):