NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION

Cited by: 9
|
Authors
Hayashi, Tomoki [1 ,2 ]
Huang, Wen-Chin [2 ]
Kobayashi, Kazuhiro [1 ,2 ]
Toda, Tomoki [2 ]
Affiliations
[1] TARVO Inc, Nagoya, Aichi, Japan
[2] Nagoya Univ, Nagoya, Aichi, Japan
Keywords
Voice conversion; non-autoregressive; sequence-to-sequence; Transformer; Conformer;
DOI
10.1109/ICASSP39728.2021.9413973
CLC Classification
O42 [Acoustics];
Subject Classification
070206 ; 082403 ;
Abstract
This paper proposes a novel voice conversion (VC) method based on non-autoregressive sequence-to-sequence (NAR-S2S) models. Inspired by the great success of NAR-S2S models such as FastSpeech in text-to-speech (TTS), we extend the FastSpeech2 model for the VC problem. We introduce the convolution-augmented Transformer (Conformer) instead of the Transformer, making it possible to capture both local and global context information from the input sequence. Furthermore, we extend variance predictors to variance converters to explicitly convert the source speaker's prosody components, such as pitch and energy, into those of the target speaker. The experimental evaluation with a Japanese speaker dataset, which consists of a male and a female speaker with 1,000 utterances each, demonstrates that the proposed model enables more stable, faster, and better conversion than autoregressive S2S (AR-S2S) models such as Tacotron2 and Transformer.
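The variance-converter idea described above can be illustrated with the classical log-F0 mean-variance transformation that such learned converters generalize: source pitch is normalized by the source speaker's statistics and rescaled to the target speaker's. The sketch below is illustrative only and is not the paper's learned converter; the function name `convert_pitch` and the statistics format are assumptions for this example.

```python
import math

def convert_pitch(src_f0, src_stats, tgt_stats):
    """Map source F0 values onto a target speaker's log-F0 statistics.

    src_stats / tgt_stats are (mean, std) of log-F0 for each speaker.
    Unvoiced frames (f0 <= 0) are passed through unchanged.
    """
    src_mu, src_sigma = src_stats
    tgt_mu, tgt_sigma = tgt_stats
    out = []
    for f0 in src_f0:
        if f0 <= 0:
            # Unvoiced frame: no pitch to convert.
            out.append(0.0)
            continue
        # Normalize in the log-F0 domain, then rescale to the target speaker.
        z = (math.log(f0) - src_mu) / src_sigma
        out.append(math.exp(tgt_mu + tgt_sigma * z))
    return out
```

For example, a 100 Hz frame from a source speaker whose mean log-F0 corresponds to 100 Hz maps to the target speaker's mean pitch. The paper's variance converters replace this fixed statistical mapping with trainable modules that predict target-speaker pitch and energy directly.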
Pages: 7068 - 7072
Page count: 5
Related Papers
50 records
  • [1] AN INVESTIGATION OF STREAMING NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6802 - 6806
  • [3] Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach
    Lu, Jiachen
    Peng, Renyuan
    Cai, Xinyue
    Xu, Hang
    Li, Hongyang
    Wen, Feng
    Zhang, Wei
    Zhang, Li
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 23 - 33
  • [4] Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer
    Tokarchuk, Evgeniia
    Rosendahl, Jan
    Wang, Weiyue
    Petrushkov, Pavel
    Lancewicki, Tomer
    Khadivi, Shahram
    Ney, Hermann
    [J]. IWSLT 2021: THE 18TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION, 2021, : 276 - 286
  • [5] Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation
    Guo, Junliang
    Xu, Linli
    Chen, Enhong
    [J]. 58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 376 - 385
  • [6] Non-parallel Sequence-to-Sequence Voice Conversion for Arbitrary Speakers
    Zhang, Ying
    Che, Hao
    Wang, Xiaorui
    [J]. 2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [7] Sequence-to-Sequence Acoustic Modeling for Voice Conversion
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Liu, Li-Juan
    Jiang, Yuan
    Dai, Li-Rong
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 631 - 644
  • [8] Pretraining Techniques for Sequence-to-Sequence Voice Conversion
    Huang, Wen-Chin
    Hayashi, Tomoki
    Wu, Yi-Chiao
    Kameoka, Hirokazu
    Toda, Tomoki
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 745 - 755
  • [9] An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
    Yang, Zijiang
    Jing, Xin
    Triantafyllopoulos, Andreas
    Song, Meishu
    Aslan, Ilhan
    Schuller, Bjoern W.
    [J]. INTERSPEECH 2022, 2022, : 4915 - 4919
  • [10] Sequence-to-Sequence Emotional Voice Conversion With Strength Control
    Choi, Heejin
    Hahn, Minsoo
    [J]. IEEE ACCESS, 2021, 9 : 42674 - 42687