Many-to-Many Voice Transformer Network

Cited by: 20
Authors
Kameoka, Hirokazu [1]
Huang, Wen-Chin [2]
Tanaka, Kou [1]
Kaneko, Takuhiro [1]
Hojo, Nobukatsu [1]
Toda, Tomoki [2]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Atsugi, Kanagawa 2430198, Japan
[2] Nagoya Univ, Nagoya, Aichi 4648601, Japan
Keywords
Training; Acoustics; Computational modeling; Decoding; Data models; Training data; Computer architecture; Attention; many-to-many VC; sequence-to-sequence learning; voice conversion (VC); transformer network; CONVOLUTIONAL SEQUENCE; CONVERSION; SPEECH
DOI
10.1109/TASLP.2020.3047262
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. Here, the main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension, called the many-to-many VTN, enables us to fully use available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss to ensure that the input feature sequence will remain unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, can handle any-to-many conversion tasks reasonably well.
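The identity mapping loss described in the abstract can be illustrated with a minimal sketch. The abstract states only that the input feature sequence should remain unchanged when the source and target speaker indices are the same; everything else here is an assumption made for illustration, including the L1 distance, the function names `identity_mapping_loss` and `toy_convert`, the `convert(features, source, target)` call signature, and the feature shape. This is not the authors' implementation.

```python
import numpy as np


def identity_mapping_loss(convert, features, speaker):
    """Mean absolute error between the input features and the output of a
    same-speaker conversion (source index == target index).

    The L1 distance is an assumption; the abstract does not name the metric.
    """
    reconstructed = convert(features, speaker, speaker)
    return float(np.mean(np.abs(reconstructed - features)))


def toy_convert(features, source, target):
    # Hypothetical stand-in for a trained many-to-many conversion model:
    # it shifts the features in proportion to the speaker-index difference,
    # so it leaves them untouched when source == target.
    return features + 0.5 * (target - source)


# Illustrative input: 100 frames of 80-dimensional acoustic features.
x = np.random.default_rng(0).normal(size=(100, 80))

# A model that behaves as an identity map for matching speaker indices
# incurs zero loss.
print(identity_mapping_loss(toy_convert, x, speaker=3))

# A model that alters the input even for matching indices is penalized.
print(identity_mapping_loss(lambda f, s, t: f + 1.0, x, speaker=3))
```

In training, a term like this would be added to the main conversion loss, so that the model is penalized whenever a same-speaker conversion alters the input.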
Pages: 656-670
Number of pages: 15