Many-to-Many Voice Transformer Network

Cited by: 20
Authors
Kameoka, Hirokazu [1]
Huang, Wen-Chin [2]
Tanaka, Kou [1]
Kaneko, Takuhiro [1]
Hojo, Nobukatsu [1]
Toda, Tomoki [2]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Atsugi, Kanagawa 2430198, Japan
[2] Nagoya Univ, Nagoya, Aichi 4648601, Japan
Keywords
Training; Acoustics; Computational modeling; Decoding; Data models; Training data; Computer architecture; Attention; many-to-many VC; sequence-to-sequence learning; voice conversion (VC); transformer network; convolutional sequence; conversion; speech
DOI
10.1109/TASLP.2020.3047262
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. Here, the main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension, called the many-to-many VTN, enables us to fully use available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss to ensure that the input feature sequence will remain unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, can handle any-to-many conversion tasks reasonably well.
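The identity mapping loss described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical interface `model(frames, src, tgt)` that returns a converted feature sequence, and it measures the mismatch with a mean absolute (L1) distance; both choices are assumptions for illustration, not the authors' implementation. The helper `make_toy_model` is likewise hypothetical: it shifts every feature by per-speaker biases, so the loss is zero only when a speaker's encoding and decoding biases agree.

```python
from typing import Callable, List, Sequence

Frames = List[List[float]]  # T frames x D feature dimensions (e.g. mel-cepstra)


def identity_mapping_loss(
    model: Callable[[Frames, int, int], Frames],
    x: Frames,
    speaker: int,
) -> float:
    """Mean absolute error between x and model(x, k, k).

    Hypothetical interface: model(frames, src, tgt) returns converted frames.
    The L1 distance is an assumption made for this sketch.
    """
    if not x:
        raise ValueError("input sequence must contain at least one frame")
    y = model(x, speaker, speaker)  # source index == target index
    # Assumption: when k == k', the output is compared frame by frame,
    # so it must have the same number of frames as the input.
    if len(y) != len(x):
        raise ValueError("identity mapping should preserve the sequence length")
    total = 0.0
    count = 0
    for x_t, y_t in zip(x, y):
        for a, b in zip(x_t, y_t):
            total += abs(a - b)
            count += 1
    return total / count


def make_toy_model(enc_bias: Sequence[float], dec_bias: Sequence[float]):
    """Hypothetical speaker-conditioned model: subtract the source speaker's
    encoding bias, then add the target speaker's decoding bias."""
    def model(x: Frames, src: int, tgt: int) -> Frames:
        return [[v - enc_bias[src] + dec_bias[tgt] for v in frame] for frame in x]
    return model


x = [[1.0, 2.0], [3.0, 4.0]]
consistent = make_toy_model([0.5, -1.0], [0.5, -1.0])  # biases agree per speaker
untrained = make_toy_model([0.5, -1.0], [0.0, 0.0])    # biases disagree
print(identity_mapping_loss(consistent, x, 1))  # 0.0
print(identity_mapping_loss(untrained, x, 1))   # 1.0
```

In training, a term of this kind would presumably be added to the conversion loss computed on pairs of different speakers, pushing the shared model to leave the input unchanged when the source and target indices coincide.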
Pages: 656-670
Page count: 15