COMBINING VOCAL TRACT LENGTH NORMALIZATION WITH HIERARCHIAL LINEAR TRANSFORMATIONS

Cited by: 0
Authors
Saheer, Lakshmi [1 ,3 ]
Yamagishi, Junichi [2 ,4 ]
Garner, Philip N. [1 ]
Dines, John [1 ]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
[3] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
[4] Natl Inst Informat, Tokyo 1018430, Japan
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Statistical parametric speech synthesis; hidden Markov models; speaker adaptation; vocal tract length normalization; constrained structural maximum a posteriori linear regression
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. However, with only a single parameter, VTLN captures very few speaker-specific characteristics compared with linear-transform-based adaptation techniques. This paper proposes that the merits of VTLN can be combined with those of linear-transform-based adaptation in a hierarchical Bayesian framework, where VTLN is used as the prior information. A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation improves speech quality, with better naturalness, intelligibility and speaker similarity.
Pages: 4493-4496
Number of pages: 4
Related papers
50 in total
  • [1] Combining Vocal Tract Length Normalization With Hierarchical Linear Transformations
    Saheer, Lakshmi
    Yamagishi, Junichi
    Garner, Philip N.
    Dines, John
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) : 262 - 272
  • [2] A parametric approach to vocal tract length normalization
    Eide, E
    Gish, H
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 346 - 348
  • [3] Time domain vocal tract length normalization
    Sündermann, D
    Bonafonte, A
    Ney, H
    Hoge, H
    [J]. Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004, : 191 - 194
  • [4] Parameter optimization for Vocal Tract Length Normalization
    Dognin, P
    El-Jaroudi, A
    Billa, J
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1767 - 1770
  • [5] An Approach to Vocal Tract Length Normalization by Robust Formant
    Kabir, A.
    Barker, J.
    Giurgiu, M.
    [J]. RECENT ADVANCES IN CIRCUITS, SYSTEMS AND SIGNALS, 2010, : 345 - +
  • [6] Vocal Tract Length Normalization Features for Audio Search
    Madhavi, Maulik C.
    Sharma, Shubham
    Patil, Hemant A.
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2015), 2015, 9302 : 387 - 395
  • [7] A bilinear transform approach for vocal tract length normalization
    Xu, W
    Wang, BX
    Ding, Q
    [J]. 2004 8TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION, VOLS 1-3, 2004, : 547 - 551
  • [8] The ΔF method of vocal tract length normalization for vowels
    Johnson, Keith
    [J]. LABORATORY PHONOLOGY, 2020, 11 (01):
  • [9] A frequency warping approach for vocal tract length normalization
    Ding, Q
    Xu, W
    Wang, BX
    [J]. 2004 7TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS 1-3, 2004, : 691 - 694
  • [10] Region-Based Vocal Tract Length Normalization for ASR
    Maragakis, Michail G.
    Potamianos, Alexandros
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1365 - 1368