HIGH FIDELITY SPEECH REGENERATION WITH APPLICATION TO SPEECH ENHANCEMENT

Times cited: 9

Authors:

Polyak, Adam [1,2]

Wolf, Lior [1,2]

Adi, Yossi [1]

Kabeli, Ori [1]

Taigman, Yaniv [1]

Affiliations:

[1] Facebook AI Research, Menlo Park, CA 94025, USA

[2] Tel Aviv University, School of Computer Science, Tel Aviv, Israel

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

speech enhancement; audio generation

DOI:

10.1109/ICASSP39728.2021.9414853

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Speech enhancement has seen great improvement in recent years, mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio. To enhance speech beyond the limitations of the original signal, we take a regeneration approach, in which we recreate the speech from its essence, including the semi-recognized speech, prosody features, and identity. We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time and that utilizes a compact speech representation, composed of ASR and identity features, to achieve a higher level of intelligibility. Inspired by voice conversion methods, we train the model to augment the speech characteristics while preserving the identity of the source speaker, using an auxiliary identity network. Perceptual acoustic metrics and subjective tests show that the method obtains valuable improvements over recent baselines.
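The abstract describes regenerating speech from a compact representation (recognized content, prosody, and identity) and producing 24 kHz output in real time. The sketch below is a minimal, hypothetical illustration of that kind of conditioning input, not the authors' implementation: it assumes frame-level ASR feature vectors, one F0 value per frame as the prosody feature, and a single utterance-level identity embedding repeated for every frame. The 50 Hz frame rate and all function names are assumptions introduced for illustration. It also shows the real-time factor arithmetic (compute time divided by audio duration) used to judge whether generation keeps up with 24 kHz playback.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 24_000  # output sample rate stated in the abstract
FRAME_RATE_HZ = 50       # hypothetical frame rate of the ASR/F0 feature streams


def build_conditioning(
    asr_frames: Sequence[Sequence[float]],
    f0_frames: Sequence[float],
    identity: Sequence[float],
) -> List[List[float]]:
    """Hypothetical helper: concatenate per-frame ASR and F0 features with a
    per-utterance identity embedding.

    The identity vector is repeated for every frame, so each frame's
    conditioning vector has len(asr_frame) + 1 + len(identity) entries.
    """
    if len(asr_frames) != len(f0_frames):
        raise ValueError("ASR and F0 streams must have the same number of frames")
    return [list(a) + [f0] + list(identity) for a, f0 in zip(asr_frames, f0_frames)]


def output_samples(num_frames: int) -> int:
    """Number of 24 kHz samples a generator must emit for num_frames frames."""
    hop = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # 480 samples per frame under the assumed rate
    return num_frames * hop


def real_time_factor(compute_seconds: float, num_samples: int) -> float:
    """Compute time divided by audio duration; below 1.0 is faster than real time."""
    duration_seconds = num_samples / SAMPLE_RATE_HZ
    return compute_seconds / duration_seconds


if __name__ == "__main__":
    asr = [[0.1, 0.2], [0.3, 0.4]]  # two frames of toy 2-dim ASR features
    cond = build_conditioning(asr, [120.0, 0.0], [0.5, 0.5, 0.5])
    print(len(cond), len(cond[0]))            # frames, features per frame
    print(output_samples(len(cond)))          # samples to generate at 24 kHz
    print(real_time_factor(0.02, output_samples(len(cond))))
```

Repeating the identity embedding per frame is one simple way to let a frame-synchronous generator see speaker information at every step; the upsampling from frame rate to 24 kHz is left to the generator.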

Pages: 7143-7147

Number of pages: 5
