A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

Cited by: 0
Authors
Tal, Or [1]
Mandel, Moshe [1]
Kreuk, Felix [2]
Adi, Yossi [1, 2]
Affiliations
[1] Hebrew Univ Jerusalem, Jerusalem, Israel
[2] Meta AI Res, New York, NY USA
Source
Keywords
speech enhancement; phonetic-models; self-supervised learning; automatic speech recognition
DOI
10.21437/Interspeech.2022-695
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech enhancement has seen great improvement in recent years using end-to-end neural networks. However, most models are agnostic to the spoken phonetic content. Recently, several studies suggested phonetic-aware speech enhancement, mostly using perceptual supervision. Yet, injecting phonetic features during model optimization can take additional forms (e.g., model conditioning). In this paper, we conduct a systematic comparison between different methods of incorporating phonetic information in a speech enhancement model. By conducting a series of controlled experiments, we observe the influence of different phonetic content models as well as various feature-injection techniques on enhancement performance, considering both causal and non-causal models. Specifically, we evaluate three settings for injecting phonetic information, namely: i) feature conditioning; ii) perceptual supervision; and iii) regularization. Phonetic features are obtained from an intermediate layer of either a supervised pre-trained Automatic Speech Recognition (ASR) model or a pre-trained Self-Supervised Learning (SSL) model. We further observe the effect of choosing different embedding layers on performance, considering both manual and learned configurations. Results suggest that phonetic features from an SSL model outperform those from the ASR model in most cases. Interestingly, the conditioning setting performs best among the evaluated configurations. Code is available on the following repository.
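The three injection settings named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names, the array shapes, the L1 and L2 distances, and the softmax layer weighting used for the "learned" layer configuration are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical sketch; names, shapes and distance choices are assumptions,
# not taken from the paper or its repository.

def layer_mix(layers, logits):
    """Learned layer selection (assumed form): softmax-weighted sum of
    per-layer phonetic features. layers: (L, T, D), logits: (L,)."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return np.tensordot(w, layers, axes=1)  # result has shape (T, D)

def condition(noisy_feats, phonetic_feats):
    """Feature conditioning: concatenate phonetic features to the
    enhancer's input features along the feature axis."""
    return np.concatenate([noisy_feats, phonetic_feats], axis=-1)

def perceptual_loss(phon_enhanced, phon_clean):
    """Perceptual supervision: L1 distance between phonetic features
    of the enhanced signal and of the clean reference."""
    return float(np.mean(np.abs(phon_enhanced - phon_clean)))

def regularization_loss(latent, phon_clean):
    """Regularization (assumed form): L2 penalty pulling the enhancer's
    latent representation toward the clean phonetic features."""
    return float(np.mean((latent - phon_clean) ** 2))

# Toy usage: two feature layers, 4 frames, 3 feature dimensions.
layers = np.stack([np.zeros((4, 3)), np.ones((4, 3))])
phon = layer_mix(layers, np.zeros(2))        # equal weights on both layers
cond_in = condition(np.zeros((4, 5)), phon)  # conditioned input, shape (4, 8)
```

In the manual layer configuration a single layer would be picked by index instead of calling `layer_mix`; in a real system the losses would be added to the enhancement objective with a weighting factor that this sketch leaves out.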
Pages: 1193-1197
Page count: 5