Exemplar-based voice conversion using joint nonnegative matrix factorization

Authors
Zhizheng Wu
Eng Siong Chng
Haizhou Li
Affiliations
[1] Nanyang Technological University, School of Computer Engineering
[2] University of Edinburgh, Centre for Speech Technology Research
[3] Nanyang Technological University, School of Computer Engineering
[4] Nanyang Technological University, Human Language Technology Department, Institute for Infocomm Research, School of Computer Engineering
Keywords
Speech synthesis; Voice conversion; Exemplar; Sparse representation; Nonnegative matrix factorization; Joint nonnegative matrix factorization
DOI
Not available
Abstract
Exemplar-based sparse representation is a nonparametric framework for voice conversion. In this framework, a target spectrum is generated as a weighted linear combination of basis spectra, called exemplars, extracted from the training data. The framework uses coupled source-target dictionaries of acoustically aligned source-target exemplars and assumes that the two dictionaries share the same activation matrix. At runtime, a source spectrogram is factorized as the product of the source dictionary and the common activation matrix, and that activation matrix is then applied to the target dictionary to generate the target spectrogram. In practice, the source dictionary uses either low-resolution mel-scale filter bank energies or high-resolution spectra. Low-resolution features can capture temporal information without significantly increasing computational cost or memory usage, while high-resolution spectra contain significant spectral detail. In this paper, we propose a joint nonnegative matrix factorization technique that estimates the common activation matrix from low- and high-resolution features at the same time, so that the activation matrix benefits from both feature types directly. We conducted experiments on the VOICES database to evaluate the proposed method. Both objective and subjective evaluations confirmed its effectiveness.
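The runtime procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the joint factorization can be approximated by stacking the high- and low-resolution source features (and their dictionaries) row-wise, and it swaps in the standard Euclidean multiplicative update with an L1 sparsity penalty, since the abstract does not state the objective function. The names `estimate_activations`, `convert`, `weight`, and `sparsity` are hypothetical.

```python
import numpy as np


def estimate_activations(X, A, n_iter=200, sparsity=0.0, eps=1e-12, seed=0):
    """Estimate H >= 0 such that X ~= A @ H, keeping the dictionary A fixed.

    Uses the multiplicative update for 0.5 * ||X - A H||_F^2 + sparsity * sum(H).
    This objective is an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Random positive start: multiplicative updates keep H nonnegative.
    H = rng.random((A.shape[1], X.shape[1])) + eps
    AtX = A.T @ X
    AtA = A.T @ A
    for _ in range(n_iter):
        H *= AtX / (AtA @ H + sparsity + eps)
    return H


def convert(X_high, X_low, A_high, A_low, B_target, weight=1.0, **kwargs):
    """Estimate one activation matrix from both feature types, then apply it
    to the target dictionary.

    X_high, X_low : source features, shapes (d_high, frames) and (d_low, frames)
    A_high, A_low : source dictionaries, shapes (d_high, K) and (d_low, K)
    B_target      : target dictionary aligned with the source exemplars, (d_target, K)
    weight        : hypothetical balance between the two feature types
    """
    # Stacking ties both reconstructions to the same activation matrix H.
    X = np.vstack([X_high, weight * X_low])
    A = np.vstack([A_high, weight * A_low])
    H = estimate_activations(X, A, **kwargs)
    return B_target @ H, H
```

Calling `convert` with nonnegative arrays of matching shapes returns the converted spectrogram `B_target @ H` together with the shared activations `H`. Setting `weight` to 0 reduces the sketch to the single-dictionary, high-resolution case.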
Pages: 9943-9958
Page count: 15
Source
Wu, Zhizheng; Chng, Eng Siong; Li, Haizhou. Multimedia Tools and Applications, 2015, 74 (22): 9943-9958