Diverse style oriented many-to-many emotional voice conversion

Cited by: 0
Authors
Zhou, Jian [1 ]
Luo, Xiangyu [1 ]
Wang, Huabin [1 ]
Zheng, Wenming [2 ]
Tao, Liang [1 ]
Affiliations
[1] Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China
[2] Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Source
Shengxue Xuebao/Acta Acustica | 2024, Vol. 49, No. 6
Keywords
Network coding; Speech enhancement
DOI
10.12395/0371-0025.2023192
Abstract
To address the insufficient emotion separation and the lack of diversity in emotional expression of existing generative adversarial network (GAN)-based emotional voice conversion methods, this paper proposes a many-to-many emotional voice conversion method aimed at style diversification. The method builds on a GAN with a dual-generator structure, in which a consistency loss is imposed on the latent representations of the two generators to preserve speech content and speaker characteristics, thereby improving the similarity between the converted emotion and the target emotion. In addition, an emotion mapping network and an emotion feature encoder supply the generators with diversified representations of the same emotion category. Experimental results show that the proposed method produces converted speech whose emotion is closer to the target and exhibits a richer variety of emotional styles. © 2024 Science Press. All rights reserved.
Pages: 1297-1303
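
To make the dual-generator consistency idea summarized in the abstract concrete, below is a minimal, hypothetical PyTorch sketch. Everything in it is an illustrative assumption rather than the authors' published implementation: the encoder/decoder layers and dimensions, the mapping-network design, and the choice of an L1 distance between the two generators' latent representations are all stand-ins for the unspecified details of the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    # Encoder-decoder generator: encodes a mel-spectrogram into a latent
    # representation (content + speaker information), then decodes it
    # conditioned on an emotion style code.
    def __init__(self, n_mels=80, latent_dim=256, style_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, latent_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=5, padding=2),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(latent_dim + style_dim, latent_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(latent_dim, n_mels, kernel_size=5, padding=2),
        )

    def forward(self, mel, style):
        z = self.encoder(mel)                                # (B, latent_dim, T)
        s = style.unsqueeze(-1).expand(-1, -1, z.size(-1))   # broadcast style over time
        return self.decoder(torch.cat([z, s], dim=1)), z

class MappingNetwork(nn.Module):
    # Maps a random noise vector plus an emotion label to a style code, so a
    # single emotion category yields many distinct style codes (the source of
    # the style diversity described in the abstract).
    def __init__(self, n_emotions=5, noise_dim=16, style_dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_emotions, noise_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * noise_dim, 128),
            nn.ReLU(),
            nn.Linear(128, style_dim),
        )

    def forward(self, noise, emotion_id):
        return self.mlp(torch.cat([noise, self.embed(emotion_id)], dim=-1))

def latent_consistency_loss(gen_a, gen_b, mel, style):
    # Penalize disagreement between the two generators' latents for the same
    # input, encouraging both to keep the same content/speaker representation.
    _, z_a = gen_a(mel, style)
    _, z_b = gen_b(mel, style)
    return F.l1_loss(z_a, z_b)

# Toy usage: batch of 4 utterances, 80 mel bins, 128 frames.
gen_a, gen_b = Generator(), Generator()
mapper = MappingNetwork()
mel = torch.randn(4, 80, 128)
noise = torch.randn(4, 16)
emotion = torch.randint(0, 5, (4,))        # emotion category labels
style = mapper(noise, emotion)             # (4, 64) diversified style codes
loss = latent_consistency_loss(gen_a, gen_b, mel, style)
loss.backward()

The last few lines carry the key point: both generators see the same utterance and style code, and the consistency loss ties their latent representations together, which is the mechanism the abstract credits with preserving content and speaker identity, while the mapping network turns one emotion label plus random noise into many distinct style codes.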