HIGH FIDELITY SPEECH REGENERATION WITH APPLICATION TO SPEECH ENHANCEMENT

Cited by: 9
Authors
Polyak, Adam [1, 2]
Wolf, Lior [1, 2]
Adi, Yossi [1]
Kabeli, Ori [1]
Taigman, Yaniv [1]
Affiliations
[1] Facebook AI Res, Menlo Pk, CA 94025 USA
[2] Tel Aviv Univ, Sch Comp Sci, Tel Aviv, Israel
Keywords
speech enhancement; audio generation
DOI
10.1109/ICASSP39728.2021.9414853
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech enhancement has seen great improvement in recent years, mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio. To enhance speech beyond the limitations of the original signal, we take a regeneration approach, in which we recreate the speech from its essence, including the semi-recognized speech, prosody features, and identity. We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time and that utilizes a compact speech representation, composed of ASR and identity features, to achieve a higher level of intelligibility. Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source using an auxiliary identity network. Perceptual acoustic metrics and subjective tests show that the method obtains valuable improvements over recent baselines.
Pages: 7143-7147
Number of pages: 5