Zero-shot test time adaptation via knowledge distillation for personalized speech denoising and dereverberation

Cited by: 2
Authors
Kim, Sunwoo [1]
Athi, Mrudula [1]
Shi, Guangji [1]
Kim, Minje [1,2]
Kristjansson, Trausti [1]
Affiliations
[1] Amazon Lab126, Sunnyvale, CA 94089 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
U.S. National Science Foundation
Keywords
DOMAIN ADAPTATION; ENHANCEMENT; NOISE
DOI
10.1121/10.0024621
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
A personalization framework that adapts compact models to test-time environments and improves their speech enhancement (SE) performance in noisy and reverberant conditions is proposed. The target use cases are those in which the end-user device encounters only one or a few speakers and noise types that tend to recur in its specific acoustic environment. Hence, a small personalized model is postulated to be sufficient for this focused subset of the original universal SE problem. The study addresses a major data shortage issue: although the goal is to learn from a specific user's speech signals and test-time environment, the target clean speech is unavailable for model training due to privacy concerns and the technical difficulty of recording noise- and reverberation-free voice signals. The proposed zero-shot personalization method uses no clean speech target. Instead, it employs the knowledge distillation framework, in which the more advanced denoising results of an overly large teacher model serve as pseudo-targets for training a small student model. Evaluation on various test-time conditions suggests that the proposed personalization approach can significantly enhance the compact student model's test-time performance. Personalized models outperform larger non-personalized baseline models, demonstrating that personalization achieves model compression with no loss in dereverberation and denoising performance.
Pages: 1353-1367
Page count: 15
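
To make the adaptation procedure described in the abstract concrete, below is a minimal sketch of the zero-shot distillation loop in PyTorch. The model objects, the noisy-recording loader, the MSE distillation loss, and all hyperparameters are illustrative assumptions, not the paper's exact recipe.

    import torch
    import torch.nn.functional as F

    def personalize_student(teacher, student, noisy_loader, epochs=5, lr=1e-4):
        # Zero-shot personalization: no clean speech targets are used.
        # The large pretrained teacher denoises each user recording, and its
        # output serves as the pseudo-target for the compact student.
        teacher.eval()  # the teacher is frozen; only the student is updated
        optimizer = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for noisy in noisy_loader:  # user-specific noisy/reverberant signals
                with torch.no_grad():
                    pseudo_target = teacher(noisy)  # teacher's enhanced estimate
                estimate = student(noisy)
                # MSE stands in for whatever distillation loss the paper uses.
                loss = F.mse_loss(estimate, pseudo_target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return student

After adaptation, only the small student needs to run on the device; the teacher is required solely during the personalization pass.
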
Related Papers
50 records in total
  • [1] TEST-TIME ADAPTATION TOWARD PERSONALIZED SPEECH ENHANCEMENT: ZERO-SHOT LEARNING WITH KNOWLEDGE DISTILLATION
    Kim, Sunwoo
    Kim, Minje
    2021 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2021: 176-180
  • [2] Zero-Shot Knowledge Distillation in Deep Networks
    Nayak, Gaurav Kumar
    Mopuri, Konda Reddy
    Shaj, Vaisakh
    Babu, R. Venkatesh
    Chakraborty, Anirban
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [3] Zero-Shot Text Normalization via Cross-Lingual Knowledge Distillation
    Wang, Linqin
    Huang, Xiang
    Yu, Zhengtao
    Peng, Hao
    Gao, Shengxiang
    Mao, Cunli
    Huang, Yuxin
    Dong, Ling
    Yu, Philip S.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 4631-4646
  • [4] Improving Zero-Shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
    Mistretta, Marco
    Baldrati, Alberto
    Bertini, Marco
    Bagdanov, Andrew D.
    COMPUTER VISION - ECCV 2024, PT LXXXIV, 2025, 15142: 459-477
  • [5] Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network
    Gul, Sania
    Khan, Muhammad Salman
    Ur-Rehman, Ata
    PLOS ONE, 2024, 19 (07)
  • [6] Robust Test-Time Adaptation for Zero-Shot Prompt Tuning
    Zhang, Ding-Chu
    Zhou, Zhi
    Li, Yu-Feng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024: 16714-16722
  • [7] Zero-Shot Cross-Lingual Knowledge Transfer in VQA via Multimodal Distillation
    Weng, Yu
    Dong, Jun
    He, Wenbin
    Chaomurilige
    Liu, Xuan
    Liu, Zheng
    Gao, Honghao
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024: 1-11
  • [8] Denoising Knowledge Transfer Model for Zero-Shot MRI Reconstruction
    Hou, Ruizhi
    Li, Fang
    IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2025, 11: 52-64
  • [9] Zero-Shot Visual Sentiment Prediction via Cross-Domain Knowledge Distillation
    Moroto, Yuya
    Ye, Yingrui
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5: 177-185
  • [10] Towards Zero-Shot Knowledge Distillation for Natural Language Processing
    Rashid, Ahmad
    Lioutas, Vasileios
    Ghaddar, Abbas
    Rezagholizadeh, Mehdi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021: 6551-6561