HMM-based speech synthesis using sub-band basis spectrum model

Cited by: 0
Authors
Ohtani, Yamato [1]
Tamura, Masatsune [1]
Morita, Masahiro [1]
Kagoshima, Takehiko [1]
Akamine, Masami [1]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Knowledge Media Lab, Tokyo, Japan
Keywords
speech synthesis; hidden Markov model; sub-band basis spectrum model; phase feature; SYNTHESIS SYSTEM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose HMM-based text-to-speech (TTS) using a sub-band basis spectrum model (SBM). SBM can represent vocal tract spectra and phase characteristics as a linear combination of sub-band basis vectors. Some reports suggest that analysis-synthesized speech based on SBM is close to natural speech and that SBM can perform effectively in TTS. The SBM framework is therefore expected to improve speech quality in HMM-based TTS. Subjective experimental results show that the proposed method improves speech quality under some conditions.
Pages: 1438-1441
Page count: 4
Related papers
  • [1] Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model
    Ohtani, Yamato
    Tamura, Masatsune
    Morita, Masahiro
    Akamine, Masami
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (10): 2481 - 2489
  • [2] Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis
    Wen, Zhengqi
    Tao, Jianhua
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 1426 - 1429
  • [3] Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    Tamura, M
    Masuko, T
    Tokuda, K
    Kobayashi, T
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001: 805 - 808
  • [4] An acoustic model adaptation using hmm-based speech synthesis
    Tanaka, K
    Kuroiwa, S
    Tsuge, S
    Ren, F
    [J]. 2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003: 368 - 373
  • [5] Pitch-Scaled Spectrum based Excitation Model for HMM-based Speech Synthesis
    Wen, Zhengqi
    Tao, Jianhua
    Hain, Horst-Udo
    [J]. PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012: 609 - +
  • [6] Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis
    Wen, Zhengqi
    Tao, Jianhua
    Pan, Shifeng
    Wang, Yang
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2014, 74 (03): 423 - 435
  • [7] Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis
    Zhengqi Wen
    Jianhua Tao
    Shifeng Pan
    Yang Wang
    [J]. Journal of Signal Processing Systems, 2014, 74: 423 - 435
  • [8] Multi-resolution sub-band features and models for HMM-based phonetic modelling
    McCourt, PM
    Vaseghi, SV
    Doherty, B
    [J]. COMPUTER SPEECH AND LANGUAGE, 2000, 14 (03): 241 - 259
  • [9] Two-band excitation for HMM-based speech synthesis
    Kim, Sang-Jin
    Hahn, Minsoo
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01): 378 - 381
  • [10] HMM-based emotional speech synthesis using average emotion model
    Qin, Long
    Ling, Zhen-Hua
    Wu, Yi-Jian
    Zhang, Bu-Fan
    Wang, Ren-Hua
    [J]. CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274: 233 - +