Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

Cited by: 1
Authors
Benaroya, Laurent [1]
Obin, Nicolas [1]
Roebel, Axel [1]
Affiliations
[1] Sorbonne Univ, Anal Synth Team, STMS, IRCAM, CNRS, French Minist Culture, F-75004 Paris, France
Keywords
voice conversion; attribute manipulation; representation learning; information disentanglement; adversarial learning; cross-entropy; CONVERSION
DOI
10.3390/e25020375
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Voice conversion (VC) consists of digitally altering the voice of an individual to manipulate part of its content, primarily its identity, while leaving the rest unchanged. Research in neural VC has achieved considerable breakthroughs, making it possible to falsify a voice identity from a small amount of data with highly realistic rendering. This paper goes beyond voice identity manipulation and presents an original neural architecture that allows the manipulation of voice attributes (e.g., gender and age). The proposed architecture is inspired by the fader network, transferring the same ideas to voice manipulation. The information conveyed by the speech signal is disentangled into interpretable voice attributes by minimizing an adversarial loss that makes the encoded information mutually independent while preserving the capacity to generate a speech signal from the disentangled codes. During inference for voice conversion, the disentangled voice attributes can be manipulated and the speech signal can be generated accordingly. For experimental evaluation, the proposed method is applied to the task of voice gender conversion using the freely available VCTK dataset. Quantitative measurements of mutual information between the variables of speaker identity and speaker gender show that the proposed architecture can learn a gender-independent representation of speakers. Additional measurements of speaker recognition indicate that speaker identity can be recognized accurately from the gender-independent representation. Finally, a subjective experiment conducted on the task of voice gender manipulation shows that the proposed architecture can convert voice gender with very high efficiency and good naturalness.
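The abstract reports mutual information measurements between speaker identity and speaker gender, but this record does not say which estimator was used. As a rough illustration only, the sketch below computes a plug-in estimate of mutual information between two discrete variables. It assumes the speaker codes have already been discretized into cluster indices, which is a hypothetical preprocessing step and not something the record confirms; the function name `mutual_information` and the toy labels are likewise illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples.

    Assumption: xs are discretized speaker codes (e.g., cluster indices)
    and ys are gender labels. This is an illustrative estimator, not
    necessarily the one used in the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data: codes that fully reveal gender vs. codes that reveal nothing.
leaky_codes = [0, 0, 1, 1]
independent_codes = [0, 1, 0, 1]
genders = ["F", "F", "M", "M"]

leaky_mi = mutual_information(leaky_codes, genders)
independent_mi = mutual_information(independent_codes, genders)
```

Under this reading, a representation that is independent of gender would yield an estimate close to zero bits, while one that fully encodes a balanced binary gender label would yield one bit.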
Pages: 16
Related papers
  • [31] Learning Disentangled Representations for Natural Language Definitions
    Carvalho, Danilo S.
    Mercatali, Giangiacomo
    Zhang, Yingji
    Freitas, Andre
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1371 - 1384
  • [32] Black-box adversarial attacks by manipulating image attributes
    Wei, Xingxing
    Guo, Ying
    Li, Bo
    INFORMATION SCIENCES, 2021, 550 : 285 - 296
  • [33] Learning Disentangled Joint Continuous and Discrete Representations
    Dupont, Emilien
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [35] Artifacts-Disentangled Adversarial Learning for Deepfake Detection
    Li, Xin
    Ni, Rongrong
    Yang, Pengpeng
    Fu, Zhiqiang
    Zhao, Yao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (04) : 1658 - 1670
  • [36] Speaker-Independent Emotional Voice Conversion via Disentangled Representations
    Chen, Xunquan
    Xu, Xuexin
    Chen, Jinhui
    Zhang, Zhizhong
    Takiguchi, Tetsuya
    Hancock, Edwin R.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7480 - 7493
  • [37] Cross-lingual Voice Conversion with Disentangled Universal Linguistic Representations
    Yang, Zhenchuan
    Zhang, Weibin
    Liu, Yufei
    Xing, Xiaofen
    INTERSPEECH 2021, 2021, : 1604 - 1608
  • [38] Hierarchical disentangled representation learning for singing voice conversion
    Takahashi, Naoya
    Singh, Mayank Kumar
    Mitsufuji, Yuki
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [39] Learning structured representations
    Shastri, L
    Wendelken, C
    NEUROCOMPUTING, 2003, 52-4 : 363 - 370
  • [40] KNOWLEDGE ROUTER: Learning Disentangled Representations for Knowledge Graphs
    Zhang, Shuai
    Rao, Xi
    Tay, Yi
    Zhang, Ce
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1 - 10