Diverse style oriented many-to-many emotional voice conversion

Cited by: 0
Authors
Zhou, Jian [1 ]
Luo, Xiangyu [1 ]
Wang, Huabin [1 ]
Zheng, Wenming [2 ]
Tao, Liang [1 ]
Affiliations
[1] Key Laboratory of Intelligent Computing and Signal Processing, Anhui University, Hefei 230601, China
[2] Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Source
Shengxue Xuebao/Acta Acustica | 2024, Vol. 49, No. 6
Keywords
Network coding; Speech enhancement
DOI
10.12395/0371-0025.2023192
Abstract
To address the issues of insufficient emotional separation and lack of diversity in emotional expression in existing generative adversarial network (GAN)-based emotional voice conversion methods, this paper proposes a many-to-many emotional voice conversion method aimed at style diversification. The method is based on a GAN model with a dual-generator structure, where a consistency loss is applied to the latent representations of the two generators to ensure the consistency of speech content and speaker characteristics, thereby improving the similarity between the converted speech emotion and the target emotion. In addition, the method uses an emotion mapping network and an emotion feature encoder to provide the generators with diversified emotional representations of the same emotion category. Experimental results show that the proposed method yields converted speech whose emotion is closer to the target emotion, with a richer variety of emotional styles. © 2024 Science Press. All rights reserved.
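The abstract describes two reusable ideas: a dual-generator GAN whose generators are tied by a consistency loss on their latent representations, and an emotion mapping network that turns a random latent vector plus an emotion label into diverse style codes. The sketch below (PyTorch) illustrates only these two ideas under stated assumptions; all module names, dimensions, and the L1 form of the consistency loss are illustrative, not the authors' implementation, and the emotion feature encoder (which would extract a style code from reference speech) is omitted.

# Minimal sketch of the abstract's components; names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionMappingNetwork(nn.Module):
    """Maps a random latent vector and an emotion label to a style code,
    so one emotion category can yield many different style codes."""
    def __init__(self, latent_dim=16, style_dim=64, num_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(num_emotions, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, style_dim),
        )
    def forward(self, z, emotion_id):
        return self.net(torch.cat([z, self.embed(emotion_id)], dim=-1))

class Generator(nn.Module):
    """Encoder-decoder generator; the encoder output serves as the latent
    representation on which the cross-generator consistency loss is applied."""
    def __init__(self, feat_dim=80, hidden=256, style_dim=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden + style_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)
    def forward(self, mel, style):
        latent, _ = self.encoder(mel)                          # (B, T, hidden)
        style_t = style.unsqueeze(1).expand(-1, mel.size(1), -1)
        dec, _ = self.decoder(torch.cat([latent, style_t], dim=-1))
        return self.out(dec), latent

# Hypothetical training-step fragment: the consistency term ties the latent
# representations of the two generators so content and speaker information match.
g1, g2 = Generator(), Generator()
mapper = EmotionMappingNetwork()

mel = torch.randn(8, 120, 80)          # batch of source mel-spectrograms
z = torch.randn(8, 16)                 # random latent, the source of style diversity
emotion = torch.randint(0, 4, (8,))    # target emotion labels

style = mapper(z, emotion)
fake1, latent1 = g1(mel, style)
fake2, latent2 = g2(mel, style)

consistency_loss = F.l1_loss(latent1, latent2)   # cross-generator latent consistency

In a full training loop this term would be added to the usual adversarial and reconstruction losses; the abstract does not specify the loss weighting, so any weights here would likewise be assumptions.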
Pages: 1297-1303
Related papers
50 items in total
  • [21] NON-PARALLEL TRAINING FOR MANY-TO-MANY EIGENVOICE CONVERSION
    Ohtani, Yamato
    Toda, Tomoki
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4822 - 4825
  • [22] Many-to-Many Voice Conversion based on Bottleneck Features with Variational Autoencoder for Non-parallel Training Data
    Li, Yanping
    Lee, Kong Aik
    Yuan, Yougen
    Li, Haizhou
    Yang, Zhen
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 829 - 833
  • [23] Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
    Zhao, Shengkui
    Nguyen, Trung Hieu
    Wang, Hao
    Ma, Bin
    INTERSPEECH 2019, 2019, : 689 - 693
  • [24] F0-CONSISTENT MANY-TO-MANY NON-PARALLEL VOICE CONVERSION VIA CONDITIONAL AUTOENCODER
    Qian, Kaizhi
    Jin, Zeyu
    Hasegawa-Johnson, Mark
    Mysore, Gautham J.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6284 - 6288
  • [25] NON-PARALLEL MANY-TO-MANY VOICE CONVERSION BY KNOWLEDGE TRANSFER FROM A TEXT-TO-SPEECH MODEL
    Yu, Xinyuan
    Mak, Brian
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5924 - 5928
  • [26] GLGAN-VC: A Guided Loss-Based Generative Adversarial Network for Many-to-Many Voice Conversion
    Dhar, Sandipan
    Jana, Nanda Dulal
    Das, Swagatam
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 1813 - 1826
  • [28] Many-to-Many Singing Performance Style Transfer on Pitch and Energy Contours
    Hsu, Yu-Teng
    Wang, Jun-You
    Jang, Jyh-Shing Roger
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 166 - 170
  • [29] Toward learning a unified many-to-many mapping for diverse image translation
    Xu, Wenju
    Shawn, Keshmiri
    Wang, Guanghui
    PATTERN RECOGNITION, 2019, 93 : 570 - 580
  • [30] Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification
    Saito, Yuki
    Nakamura, Taiki
    Ijima, Yusuke
    Nishida, Kyosuke
    Takamichi, Shinnosuke
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2021, 42 (01) : 1 - 11