A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

Authors
Tal, Or [1]
Mandel, Moshe [1]
Kreuk, Felix [2]
Adi, Yossi [1, 2]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Meta AI Res, New York, NY USA
Source
INTERSPEECH 2022
Keywords
speech enhancement; phonetic-models; self-supervised learning; automatic speech recognition
DOI
10.21437/Interspeech.2022-695
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech enhancement has seen great improvement in recent years using end-to-end neural networks. However, most models are agnostic to the spoken phonetic content. Recently, several studies suggested phonetic-aware speech enhancement, mostly using perceptual supervision. Yet, injecting phonetic features during model optimization can take additional forms (e.g., model conditioning). In this paper, we conduct a systematic comparison between different methods of incorporating phonetic information in a speech enhancement model. By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models. Specifically, we evaluate three settings for injecting phonetic information, namely: i) feature conditioning; ii) perceptual supervision; and iii) regularization. Phonetic features are obtained from an intermediate layer of either a supervised pre-trained Automatic Speech Recognition (ASR) model or a pre-trained Self-Supervised Learning (SSL) model. We further observe the effect of choosing different embedding layers on performance, considering both manual and learned configurations. Results suggest that using an SSL model to extract phonetic features outperforms using the ASR model in most cases. Interestingly, the conditioning setting performs best among the evaluated configurations. Code is available on the following repository.
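The three injection settings named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify architectures or losses, so every name below (`phonetic_features`, `mix_layers`, `condition_input`, `perceptual_loss`, `regularization_loss`) is hypothetical, the "pre-trained" encoder is a fixed random projection standing in for an ASR or SSL model, and the L1/MSE distances are assumptions chosen for simplicity.

```python
import numpy as np

def phonetic_features(wave, n_layers=4, dim=8, frame=160):
    """Hypothetical stand-in for a frozen pre-trained ASR/SSL encoder.

    Returns per-layer frame features of shape (n_layers, n_frames, dim).
    """
    n_frames = len(wave) // frame
    h = wave[: n_frames * frame].reshape(n_frames, frame)
    proj_rng = np.random.default_rng(1234)  # fixed seed: weights are "frozen"
    layers = []
    for _ in range(n_layers):
        w = proj_rng.standard_normal((h.shape[1], dim)) / np.sqrt(h.shape[1])
        h = np.tanh(h @ w)
        layers.append(h)
    return np.stack(layers)

def mix_layers(feats, layer_logits):
    """Learned layer configuration (assumed form): softmax-weighted layer sum."""
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()
    return np.tensordot(w, feats, axes=1)  # (n_frames, dim)

def condition_input(noisy_frames, noisy_feats):
    """i) Feature conditioning: feed phonetic features to the enhancer as input."""
    return np.concatenate([noisy_frames, noisy_feats], axis=-1)

def perceptual_loss(enhanced, clean, layer=-1):
    """ii) Perceptual supervision: compare phonetic features of both signals."""
    diff = phonetic_features(enhanced)[layer] - phonetic_features(clean)[layer]
    return float(np.mean(np.abs(diff)))

def regularization_loss(enhancer_hidden, clean_feats, proj):
    """iii) Regularization: pull projected hidden states toward clean features."""
    return float(np.mean((enhancer_hidden @ proj - clean_feats) ** 2))

# Toy usage with random signals (no real speech involved).
rng = np.random.default_rng(0)
clean = rng.standard_normal(1600)
noisy = clean + 0.3 * rng.standard_normal(1600)

noisy_feats = mix_layers(phonetic_features(noisy), np.zeros(4))
cond = condition_input(noisy.reshape(10, 160), noisy_feats)

hidden = rng.standard_normal((10, 16))  # pretend enhancer hidden states
proj = rng.standard_normal((16, 8))     # pretend auxiliary projection head
reg = regularization_loss(hidden, phonetic_features(clean)[-1], proj)
```

In this sketch, conditioning changes what the enhancer sees at its input, while the other two settings only add loss terms during training. Setting `layer_logits` to a one-hot-like vector corresponds to a manual layer choice, and training the logits corresponds to a learned one.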
Pages: 1193-1197
Page count: 5
Related papers
50 in total
  • [1] Incorporating Broad Phonetic Information for Speech Enhancement
    Lu, Yen-Ju
    Liao, Chien-Feng
    Lu, Xugang
    Hung, Jeih-weih
    Tsao, Yu
    [J]. INTERSPEECH 2020, 2020, : 2417 - 2421
  • [2] IMPROVING SPEECH ENHANCEMENT WITH PHONETIC EMBEDDING FEATURES
    Wu, Bo
    Yu, Meng
    Chen, Lianwu
    Jin, Mingjie
    Su, Dan
    Yu, Dong
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 645 - 651
  • [3] PHONETIC FEEDBACK FOR SPEECH ENHANCEMENT WITH AND WITHOUT PARALLEL SPEECH DATA
    Plantinga, Peter
    Bagchi, Deblin
    Fosler-Lussier, Eric
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6679 - 6683
  • [4] Phonetic enhancement of sibilants in infant-directed speech
    Cristia, Alejandrina
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (01): : 424 - 434
  • [5] Phonetic Features Enhancement for Bangla Automatic Speech Recognition
    Kabir, Sharif M. Rasel
    Hassan, Foyzul
    Ahamed, Foysal
    Mamun, Khondokar
    Huda, Mohammad Nurul
    Nusrat, Fariha
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION ENGINEERING (ICCIE), 2015, : 25 - 28
  • [6] Constrained Iterative Speech Enhancement Using Phonetic Classes
    Das, Amit
    Hansen, John H. L.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (06): : 1869 - 1883
  • [7] Clustering techniques for acoustic-phonetic speech classification
    Pohjalainen, J
    [J]. NORSIG 2004: PROCEEDINGS OF THE 6TH NORDIC SIGNAL PROCESSING SYMPOSIUM, 2004, 46 : 348 - 351
  • [8] RESEARCH TECHNIQUES FOR PHONETIC COMPARISON OF LANGUAGES
    DELATTRE, P
    [J]. IRAL-INTERNATIONAL REVIEW OF APPLIED LINGUISTICS IN LANGUAGE TEACHING, 1963, 1 (02): : 85 - 97
  • [9] Improved phase aware speech enhancement using bio-inspired and ANN techniques
    Tusar Kanti Dash
    Sandeep Singh Solanki
    Ganapati Panda
    [J]. Analog Integrated Circuits and Signal Processing, 2020, 102 : 465 - 477
  • [10] Improved phase aware speech enhancement using bio-inspired and ANN techniques
    Dash, Tusar Kanti
    Solanki, Sandeep Singh
    Panda, Ganapati
    [J]. ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING, 2020, 102 (03) : 465 - 477