Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization

Cited by: 0
Authors:
Zhao, Xiao-Ying [1]
Zhu, Qiu-Shi [2 ]
Zhang, Jie [2 ]
Affiliations:
[1] Univ Sci & Technol China USTC, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China USTC, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
NOISE;
DOI:
Not available
CLC number:
TP18 [Artificial Intelligence Theory];
Discipline classification codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the development of deep learning, neural-network-based speech enhancement (SE) models have shown excellent performance. Meanwhile, self-supervised pre-trained models have been shown to benefit a wide range of downstream tasks. In this paper, we consider applying a pre-trained model to the real-time SE problem. Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized with the self-supervised pre-trained WavLM model, the convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask. In addition, since discretizing the noisy speech representations is beneficial for denoising, we use a quantization module to discretize the representations output by the bottleneck layer, which are then fed into the decoder to reconstruct the clean speech waveform. Experimental results on the Valentini dataset and an internal dataset show that the pre-trained-model-based initialization improves SE performance and that the discretization operation suppresses the noise component in the representations to some extent, which further improves performance.
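
The abstract describes three architectural changes to DEMUCS: causal convolutions in the encoder, a causal attention mask in the transformer bottleneck, and vector quantization of the bottleneck representations. The following PyTorch sketch is a hypothetical illustration of these three building blocks, not the authors' implementation; all module names, the codebook size, and the layer dimensions are assumptions made for the example.

    # Hypothetical sketch (not the authors' code) of the three modifications
    # named in the abstract: causal 1-D convolution, a causal attention mask
    # for the transformer bottleneck, and straight-through vector quantization
    # of the bottleneck representations before decoding.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class CausalConv1d(nn.Module):
        """1-D convolution that only sees past samples (left padding only)."""
        def __init__(self, in_ch, out_ch, kernel_size, stride=1):
            super().__init__()
            self.left_pad = kernel_size - 1
            self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, stride=stride)

        def forward(self, x):                     # x: (batch, channels, time)
            x = F.pad(x, (self.left_pad, 0))      # pad left so no future leakage
            return self.conv(x)


    def causal_attention_mask(seq_len, device=None):
        """Float mask with -inf above the diagonal, blocking future frames."""
        full = torch.full((seq_len, seq_len), float("-inf"), device=device)
        return torch.triu(full, diagonal=1)


    class VectorQuantizer(nn.Module):
        """Plain VQ layer with a straight-through gradient estimator."""
        def __init__(self, num_codes, dim):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, dim)

        def forward(self, z):                     # z: (batch, time, dim)
            # squared distances to every codebook entry: (batch, time, num_codes)
            dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
            idx = dist.argmin(dim=-1)
            z_q = self.codebook(idx)
            # straight-through: gradients flow from z_q back to z
            z_q = z + (z_q - z).detach()
            return z_q, idx


    if __name__ == "__main__":
        wav = torch.randn(2, 1, 4000)             # 0.25 s of 16 kHz noisy audio
        enc = CausalConv1d(1, 64, kernel_size=8, stride=4)
        h = enc(wav).transpose(1, 2)              # (batch, frames, 64)
        attn = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
        h = attn(h, src_mask=causal_attention_mask(h.size(1)))
        vq = VectorQuantizer(num_codes=320, dim=64)
        h_q, codes = vq(h)                        # discretized bottleneck output
        print(h_q.shape, codes.shape)

The straight-through estimator copies gradients from the quantized output back to the continuous bottleneck features, which is the standard trick that keeps a VQ layer trainable end to end; how the paper itself trains the quantizer is not specified in the abstract.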
Pages: 330-334
Number of pages: 5