THE SOUND OF MY VOICE: SPEAKER REPRESENTATION LOSS FOR TARGET VOICE SEPARATION

Authors
Mun, Seongkyu [1]
Choe, Soyeon [1]
Huh, Jaesung [1]
Chung, Joon Son [1]
Affiliation
[1] Naver Corp, Gyeonggi-do, South Korea
Keywords
Source separation; speaker recognition; triplet loss; speaker representation
DOI
10.1109/icassp40776.2020.9053521
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function for audio source separation based on speaker content representations, which we call the speaker representation loss. The objective is to extract the target speaker's voice from the noisy input while also removing it from the residual components. Compared with conventional spectral reconstruction, the proposed framework makes fuller use of target speaker information by minimizing the distance between the speaker representations of the reference and of the source separation output. We also propose a triplet speaker representation loss as an additional criterion for removing target speaker information from the residual spectrogram output. The VoiceFilter framework is adopted to evaluate source separation performance on the VCTK database, and the proposed losses improve performance over the baseline loss function without any additional network parameters.
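The two losses described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the distance metric, the speaker encoder, the triplet margin, or the loss weights, so cosine distance, margin=0.2, the weights alpha and beta, and every function name below are hypothetical choices. Speaker embeddings are assumed to come from a pretrained speaker-recognition network applied to the reference utterance, the separated output, and the residual (mixture minus output); here they are passed in as plain vectors. The triplet term assumes the reference as anchor, the separated output as positive, and the residual as negative, and plain MSE stands in for the VoiceFilter spectral reconstruction loss.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def speaker_representation_loss(ref_emb: Vector, est_emb: Vector) -> float:
    """Pull the separated output's speaker embedding toward the reference embedding."""
    return cosine_distance(ref_emb, est_emb)


def triplet_speaker_loss(ref_emb: Vector, est_emb: Vector, res_emb: Vector,
                         margin: float = 0.2) -> float:
    """Hinge triplet loss with assumed roles: anchor = reference,
    positive = separated output, negative = residual. It is zero once the
    residual is at least `margin` farther from the reference than the output is."""
    d_pos = cosine_distance(ref_emb, est_emb)
    d_neg = cosine_distance(ref_emb, res_emb)
    return max(0.0, d_pos - d_neg + margin)


def spectral_mse(est_spec: Vector, clean_spec: Vector) -> float:
    """Plain MSE over flattened spectrogram bins, a stand-in for the
    conventional spectral reconstruction loss."""
    if len(est_spec) != len(clean_spec) or not est_spec:
        raise ValueError("spectrograms must be non-empty and the same size")
    return sum((e - c) ** 2 for e, c in zip(est_spec, clean_spec)) / len(est_spec)


def total_loss(est_spec: Vector, clean_spec: Vector, ref_emb: Vector,
               est_emb: Vector, res_emb: Vector, alpha: float = 1.0,
               beta: float = 1.0, margin: float = 0.2) -> float:
    """Reconstruction loss plus the two speaker terms; alpha and beta are
    hypothetical weights, not values from the paper."""
    return (spectral_mse(est_spec, clean_spec)
            + alpha * speaker_representation_loss(ref_emb, est_emb)
            + beta * triplet_speaker_loss(ref_emb, est_emb, res_emb, margin))


if __name__ == "__main__":
    ref = [1.0, 0.0]  # embedding of the target speaker's reference utterance
    est = [0.9, 0.1]  # embedding of the separated output
    res = [0.2, 0.8]  # embedding of the residual (mixture minus output)
    print(total_loss([1.0, 2.0], [1.0, 0.0], ref, est, res))
```

In actual training these terms would be computed inside an autodiff framework so that gradients flow through the speaker encoder back into the separation network. Keeping the encoder's weights fixed would be consistent with the abstract's statement that no network parameters are added, but that reading is an assumption.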
Pages: 7289-7293
Number of pages: 5