IMPROVING SEQUENCE-TO-SEQUENCE VOICE CONVERSION BY ADDING TEXT-SUPERVISION

Times cited: 0
Authors
Zhang, Jing-Xuan [1]
Ling, Zhen-Hua [1]
Jiang, Yuan [2]
Liu, Li-Juan [2]
Liang, Chen [3]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] iFLYTEK Co Ltd, iFLYTEK Res, Hefei, Anhui, Peoples R China
[3] Anhui Sci & Technol Res Inst, Hefei, Anhui, Peoples R China
Keywords
sequence-to-sequence; neural network; voice conversion; text-supervision; deep neural networks
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed that adds auxiliary classifiers to the middle layers of the seq2seq model to predict linguistic labels as a secondary task. Second, a data-augmentation method is proposed that utilizes text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate the proposed methods with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available.
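The multi-task structure described in the abstract trains the seq2seq model on a combined objective: the primary acoustic-prediction loss plus a secondary loss from auxiliary classifiers that predict linguistic labels from middle-layer outputs. The abstract does not state the exact loss functions, so the following Python sketch assumes a mean-squared-error acoustic loss and a weighted softmax cross-entropy term per auxiliary classifier. The function names `mse`, `cross_entropy`, and `multitask_loss`, the per-step integer labels (for example, phone identities), and the default `weight` of 0.1 are hypothetical illustrations, not the authors' reported settings.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: the loss choices (MSE, cross-entropy) and the default
# weight are assumptions for illustration, not the paper's reported settings.


def mse(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all values of two frame sequences of equal length."""
    if len(pred) != len(target):
        raise ValueError("predicted and target sequences must have equal length")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average softmax cross-entropy of per-step logits against integer labels."""
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[label]
    return total / len(labels)


def multitask_loss(
    pred_frames: Sequence[Sequence[float]],
    target_frames: Sequence[Sequence[float]],
    aux_outputs: List[Tuple[Sequence[Sequence[float]], Sequence[int]]],
    weight: float = 0.1,
) -> float:
    """Primary acoustic loss plus the weighted losses of auxiliary classifiers.

    aux_outputs holds one (logits, labels) pair per auxiliary classifier
    attached to a middle layer of the seq2seq model.
    """
    main = mse(pred_frames, target_frames)
    aux = sum(cross_entropy(logits, labels) for logits, labels in aux_outputs)
    return main + weight * aux


# Usage: perfect acoustic prediction, one classifier with uninformative logits.
loss = multitask_loss([[1.0, 2.0]], [[1.0, 2.0]], [([[0.0, 0.0]], [1])], weight=0.5)
print(round(loss, 4))  # prints 0.3466 (that is, 0.5 * ln 2)
```

Setting `weight=0` reduces the objective to plain seq2seq training, which is a natural baseline for comparison. In a real system these losses would be computed on tensors inside a deep learning framework rather than on Python lists.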
Pages: 6785-6789
Number of pages: 5
Related papers (50 in total)
  • [1] Sequence-to-Sequence Acoustic Modeling for Voice Conversion
    Zhang, Jing-Xuan
    Ling, Zhen-Hua
    Liu, Li-Juan
    Jiang, Yuan
    Dai, Li-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 631 - 644
  • [2] Pretraining Techniques for Sequence-to-Sequence Voice Conversion
    Huang, Wen-Chin
    Hayashi, Tomoki
    Wu, Yi-Chiao
    Kameoka, Hirokazu
    Toda, Tomoki
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 745 - 755
  • [3] Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining
    Huang, Wen-Chin
    Hayashi, Tomoki
    Wu, Yi-Chiao
    Kameoka, Hirokazu
    Toda, Tomoki
    INTERSPEECH 2020, 2020 : 4676 - 4680
  • [4] NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION
    Hayashi, Tomoki
    Huang, Wen-Chin
    Kobayashi, Kazuhiro
    Toda, Tomoki
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 7068 - 7072
  • [5] Sequence-to-Sequence Emotional Voice Conversion With Strength Control
    Choi, Heejin
    Hahn, Minsoo
    IEEE ACCESS, 2021, 9 : 42674 - 42687
  • [6] An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion
    Yang, Zijiang
    Jing, Xin
    Triantafyllopoulos, Andreas
    Song, Meishu
    Aslan, Ilhan
    Schuller, Bjoern W.
    INTERSPEECH 2022, 2022 : 4915 - 4919
  • [7] DISTILLING SEQUENCE-TO-SEQUENCE VOICE CONVERSION MODELS FOR STREAMING CONVERSION APPLICATIONS
    Tanaka, Kou
    Kameoka, Hirokazu
    Kaneko, Takuhiro
    Seki, Shogo
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022 : 1022 - 1028
  • [8] MANDARIN ELECTROLARYNGEAL SPEECH VOICE CONVERSION WITH SEQUENCE-TO-SEQUENCE MODELING
    Yen, Ming-Chi
    Huang, Wen-Chin
    Kobayashi, Kazuhiro
    Peng, Yu-Huai
    Tsai, Shu-Wei
    Tsao, Yu
    Toda, Tomoki
    Jang, Jyh-Shing Roger
    Wang, Hsin-Min
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021 : 650 - 657
  • [9] Non-parallel Sequence-to-Sequence Voice Conversion for Arbitrary Speakers
    Zhang, Ying
    Che, Hao
    Wang, Xiaorui
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021
  • [10] AN INVESTIGATION OF STREAMING NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION
    Hayashi, Tomoki
    Kobayashi, Kazuhiro
    Toda, Tomoki
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 6802 - 6806