Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization

Cited by: 0
Authors
Zhao, Xiao-Ying [1 ]
Zhu, Qiu-Shi [2 ]
Zhang, Jie [2 ]
Affiliations
[1] Univ Sci & Technol China USTC, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China USTC, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
NOISE;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With the development of deep learning, neural network-based speech enhancement (SE) models have achieved excellent performance. Meanwhile, self-supervised pre-trained models have proven effective on a wide range of downstream tasks. In this paper, we apply a self-supervised pre-trained model to the real-time SE problem. Specifically, the encoder and bottleneck layer of the DEMUCS model are initialized with the self-supervised pre-trained WavLM model, the convolutions in the encoder are replaced by causal convolutions, and the transformer encoder in the bottleneck layer uses a causal attention mask. In addition, since discretizing the noisy speech representations is beneficial for denoising, we use a quantization module to discretize the representation output by the bottleneck layer, which is then fed into the decoder to reconstruct the clean speech waveform. Experimental results on the Valentini dataset and an internal dataset show that initialization from the pre-trained model improves SE performance, and that the discretization operation suppresses the noise component in the representations to some extent, further improving performance.
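The three mechanisms the abstract names can be sketched in isolation: a causal attention mask for the bottleneck transformer, causal convolution via left-padding for the encoder, and nearest-neighbor vector quantization for the discretization step. This is an illustrative NumPy sketch of the general techniques, not the authors' implementation; all function names and shapes are hypothetical.

```python
import numpy as np

def causal_attention_mask(seq_len):
    """Lower-triangular boolean mask: position t may attend only to
    positions <= t, as required for streaming (real-time) SE."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def causal_conv1d(x, kernel):
    """1-D convolution made causal by left-padding with zeros, so
    output y[t] depends only on inputs x[0..t]."""
    pad = np.zeros(len(kernel) - 1)
    return np.convolve(np.concatenate([pad, x]), kernel, mode="valid")

def vector_quantize(reps, codebook):
    """Replace each D-dim representation with its nearest codebook
    entry (Euclidean distance), discretizing the bottleneck output.
    reps: (T, D), codebook: (K, D) -> (quantized (T, D), indices (T,))."""
    dists = ((reps[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx
```

In the paper's pipeline the quantized representations, rather than the continuous ones, are passed to the decoder; the sketch above shows only the discretization step itself.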
Pages: 330-334
Page count: 5