Many-to-Many Voice Transformer Network

Cited by: 20

Authors:

Kameoka, Hirokazu [1]

Huang, Wen-Chin [2]

Tanaka, Kou [1]

Kaneko, Takuhiro [1]

Hojo, Nobukatsu [1]

Toda, Tomoki [2]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Atsugi, Kanagawa 2430198, Japan

[2] Nagoya Univ, Nagoya, Aichi 4648601, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29

Keywords:

Training; Acoustics; Computational modeling; Decoding; Data models; Training data; Computer architecture; Attention; many-to-many VC; sequence-to-sequence learning; voice conversion (VC); transformer network; CONVOLUTIONAL SEQUENCE; CONVERSION; SPEECH;

DOI:

10.1109/TASLP.2020.3047262

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. Here, the main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension, called the many-to-many VTN, enables us to fully use available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss to ensure that the input feature sequence will remain unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, can handle any-to-many conversion tasks reasonably well.
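The identity mapping loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `convert(x, src, tgt)` callable, the toy converters, and the use of a mean L1 distance are all assumptions made here for illustration, since the abstract only states that the input should remain unchanged when the source and target speaker indices are the same.

```python
import numpy as np

def identity_mapping_loss(convert, features, speaker_ids):
    """Mean L1 distance between each utterance and its self-conversion.

    `convert(x, src, tgt)` is a hypothetical stand-in for a many-to-many
    converter; the L1 distance is an assumption, not taken from the abstract.
    """
    losses = []
    for x, k in zip(features, speaker_ids):
        y = convert(x, k, k)  # source and target speaker index are the same
        losses.append(np.mean(np.abs(y - x)))
    return float(np.mean(losses))

# Toy converter (hypothetical): shifts features by the difference of two
# per-speaker offsets, so converting a speaker to itself changes nothing.
offsets = np.array([0.0, 1.5, -2.0])

def toy_convert(x, src, tgt):
    return x + (offsets[tgt] - offsets[src])

def leaky_convert(x, src, tgt):
    # A flawed converter that always adds a bias, even when src == tgt.
    return toy_convert(x, src, tgt) + 0.25

rng = np.random.default_rng(0)
features = [rng.normal(size=(t, 4)) for t in (5, 7, 6)]  # (frames, dims)
speaker_ids = [0, 1, 2]

loss_ok = identity_mapping_loss(toy_convert, features, speaker_ids)
loss_bad = identity_mapping_loss(leaky_convert, features, speaker_ids)
```

In this sketch the loss is zero for a converter that leaves same-speaker inputs untouched and positive for one that alters them, which is the behavior such a training term would penalize.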


Pages: 656-670

Page count: 15
