THE SOUND OF MY VOICE: SPEAKER REPRESENTATION LOSS FOR TARGET VOICE SEPARATION

Citations: 0

Authors:

Mun, Seongkyu [1]

Choe, Soyeon [1]

Huh, Jaesung [1]

Chung, Joon Son [1]

Affiliations:

[1] Naver Corp, Gyeonggi-do, South Korea

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

Source separation; speaker recognition; triplet loss; speaker representation

DOI:

10.1109/icassp40776.2020.9053521

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function for audio source separation that uses a speaker content representation, which we call the speaker representation loss. The objective is to extract the target speaker's voice from the noisy input and also to remove it from the residual components. Compared to conventional spectral reconstruction, our proposed framework maximizes the use of target speaker information by minimizing the distance between the speaker representations of the reference and the source separation output. We also propose a triplet speaker representation loss as an additional criterion to remove the target speaker information from the residual spectrogram output. The VoiceFilter framework is adopted to evaluate source separation performance on the VCTK database, and we achieve improved performance compared to the baseline loss function without any additional network parameters.
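The two criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the margin, or the embedding network, so everything below is an assumption made for illustration: embeddings are taken to be precomputed fixed-length vectors, the distance is squared Euclidean distance on L2-normalised vectors, and the function names and the `margin` parameter are hypothetical rather than taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each embedding (last axis) to approximately unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def speaker_representation_loss(ref_emb, est_emb):
    """Assumed form: mean squared distance between the reference speaker
    embedding and the embedding of the separated output."""
    ref, est = l2_normalize(ref_emb), l2_normalize(est_emb)
    return float(np.mean(np.sum((ref - est) ** 2, axis=-1)))


def triplet_speaker_representation_loss(ref_emb, est_emb, res_emb, margin=1.0):
    """Assumed form: hinge triplet loss with the reference as anchor,
    the separated output as positive and the residual as negative.
    `margin` is a hypothetical hyperparameter."""
    anchor, pos, neg = (l2_normalize(e) for e in (ref_emb, est_emb, res_emb))
    d_pos = np.sum((anchor - pos) ** 2, axis=-1)  # pull output toward reference
    d_neg = np.sum((anchor - neg) ** 2, axis=-1)  # push residual away from it
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, batch size 1 (illustrative values only).
    reference = np.array([[1.0, 0.0]])
    separated = np.array([[1.0, 0.0]])
    residual = np.array([[0.0, 1.0]])
    print(speaker_representation_loss(reference, separated))
    print(triplet_speaker_representation_loss(reference, separated, residual))
```

In this sketch the triplet term falls to zero once the residual embedding is at least `margin` farther from the reference than the separated output is, which corresponds to the stated aim of removing target speaker information from the residual.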


Pages: 7289-7293

Page count: 5

