THE SOUND OF MY VOICE: SPEAKER REPRESENTATION LOSS FOR TARGET VOICE SEPARATION

Authors
Mun, Seongkyu [1]
Choe, Soyeon [1]
Huh, Jaesung [1]
Chung, Joon Son [1]
Affiliation
[1] Naver Corp, Gyeoggi Do, South Korea
Keywords
Source separation; speaker recognition; triplet loss; speaker representation
DOI
10.1109/icassp40776.2020.9053521
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function for audio source separation that uses a speaker content representation, which we call speaker representation loss. The objective is to extract the target speaker's voice from the noisy input and also to remove it from the residual components. Compared to conventional spectral reconstruction, the proposed framework maximizes the use of target speaker information by minimizing the distance between the speaker representations of the reference and the source separation output. We also propose a triplet speaker representation loss as an additional criterion to remove target speaker information from the residual spectrogram output. The VoiceFilter framework is adopted to evaluate source separation performance on the VCTK database, and the proposed losses improve performance over the baseline loss function without any additional network parameters.
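The two criteria described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance measure, the margin, or the embedding network, so everything below is an assumption made for illustration: embeddings are taken as already computed by some hypothetical speaker encoder, the distance is squared Euclidean between L2-normalized vectors, and the function names and `margin=1.0` are invented here, not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each embedding (last axis) to approximately unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def speaker_representation_loss(ref_emb, est_emb):
    """Assumed form: mean squared Euclidean distance between the speaker
    embeddings of the reference and the separated output."""
    ref, est = l2_normalize(ref_emb), l2_normalize(est_emb)
    return float(np.mean(np.sum((ref - est) ** 2, axis=-1)))


def triplet_speaker_representation_loss(ref_emb, est_emb, res_emb, margin=1.0):
    """Assumed form: standard triplet hinge with the reference as anchor,
    the separated output as positive, and the residual as negative."""
    anchor = l2_normalize(ref_emb)
    positive = l2_normalize(est_emb)
    negative = l2_normalize(res_emb)
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # Penalize unless the residual is at least `margin` farther from the
    # reference than the separated output is.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy 2-D embeddings (hypothetical values, batch size 1).
ref = np.array([[1.0, 0.0]])
est = np.array([[1.0, 0.0]])   # output matches the target speaker
res = np.array([[-1.0, 0.0]])  # residual carries no target speaker

rep_loss = speaker_representation_loss(ref, est)
tri_loss = triplet_speaker_representation_loss(ref, est, res)
```

In this sketch both terms would be added to the usual spectral reconstruction loss with some weighting. Because they act only on the training objective, the separation network itself gains no parameters, which is consistent with the abstract's claim.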
Pages: 7289-7293
Page count: 5