THE SOUND OF MY VOICE: SPEAKER REPRESENTATION LOSS FOR TARGET VOICE SEPARATION

Cited by: 0
Authors
Mun, Seongkyu [1 ]
Choe, Soyeon [1 ]
Huh, Jaesung [1 ]
Chung, Joon Son [1 ]
Affiliations
[1] Naver Corp, Gyeonggi-do, South Korea
Keywords
Source separation; speaker recognition; triplet loss; speaker representation
DOI
10.1109/icassp40776.2020.9053521
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function using speaker content representation for audio source separation, and we call it speaker representation loss. The objective is to extract the target speaker voice from the noisy input and also remove it from the residual components. Compared to the conventional spectral reconstruction, our proposed framework maximizes the use of target speaker information by minimizing the distance between the speaker representations of reference and source separation output. We also propose triplet speaker representation loss as an additional criterion to remove the target speaker information from residual spectrogram output. VoiceFilter framework is adopted to evaluate source separation performance using the VCTK database, and we achieved improved performances compared to the baseline loss function without any additional network parameters.
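The two criteria described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of L2-normalized embeddings with squared Euclidean distance, and the margin value are all assumptions made for illustration, and the speaker embeddings are assumed to come from some pretrained speaker encoder that is not shown here.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each embedding (last axis) to approximately unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def speaker_representation_loss(ref_emb, out_emb):
    """Mean squared distance between reference and separated-output embeddings.

    Assumption: distance is squared Euclidean on unit-norm embeddings.
    """
    ref = l2_normalize(ref_emb)
    out = l2_normalize(out_emb)
    return float(np.mean(np.sum((ref - out) ** 2, axis=-1)))


def triplet_speaker_representation_loss(ref_emb, out_emb, residual_emb, margin=0.5):
    """Hinge-style triplet loss (hypothetical form, margin chosen arbitrarily).

    Anchor: reference embedding. Positive: separated output.
    Negative: residual output, which should carry no target-speaker information.
    """
    anchor = l2_normalize(ref_emb)
    positive = l2_normalize(out_emb)
    negative = l2_normalize(residual_emb)
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

In this sketch the first loss falls toward zero as the separated output's embedding approaches the reference, and the triplet loss reaches zero once the residual's embedding is farther from the reference than the output's embedding by at least the margin. Neither function introduces trainable parameters, which is in line with the abstract's remark that no additional network parameters are needed.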
Pages: 7289 - 7293
Page count: 5
Related papers
50 in total
  • [11] Target speaker filtration by mask estimation for source speaker traceability in voice conversion
    Zhang, Junfei
    Zhang, Xiongwei
    Sun, Meng
    Zou, Xia
    Jia, Chong
    Li, Yihao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 136
  • [12] WITH VAULTED VOICE VERIFICATION MY VOICE IS MY KEY
    Johnson, R. C.
    Boult, Terrance E.
    2013 IEEE INTERNATIONAL CONFERENCE ON TECHNOLOGIES FOR HOMELAND SECURITY (HST), 2013, : 453 - 459
  • [13] Comprehensive source-target speaker voice conversion analysis
    1600, UK Simulation Society, Clifton Lane, Nottingham, NG11 8NS, United Kingdom (15):
  • [14] 'MY VOICE'
    MIEZELAITIS, E
    SOVIET LITERATURE, 1977, (07): : 3 - 6
  • [15] Spoofing Speaker Verification With Voice Style Transfer And Reconstruction Loss
    Thebaud, Thomas
    Le Lan, Gael
    Larcher, Anthony
    2021 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY (WIFS), 2021, : 7 - 13
  • [16] The Little Voice in My Head
    Jennifer A. Ross
    Journal of General Internal Medicine, 2024, 39 (14) : 2864 - 2865
  • [17] Unsupervised Interpretable Representation Learning for Singing Voice Separation
    Mimilakis, Stylianos I.
    Drossos, Konstantinos
    Schuller, Gerald
    28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 1412 - 1416
  • [18] My Face and My Voice
    Barks, Coleman
    GEORGIA REVIEW, 2009, 63 (03): : 470 - 474
  • [19] The voice of water, the voice that leads us to sound
    Zbudilova, Helena
    PENSAMIENTO Y CULTURA, 2008, 11 (01): : 131 - 138
  • [20] VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
    Wang, Quan
    Muckenhirn, Hannah
    Wilson, Kevin
    Sridhar, Prashant
    Wu, Zelin
    Hershey, John R.
    Saurous, Rif A.
    Weiss, Ron J.
    Jia, Ye
    Moreno, Ignacio Lopez
    INTERSPEECH 2019, 2019, : 2728 - 2732