Measuring the gap between HMM-based ASR and TTS

Cited by: 0
Authors
Dines, John [1]
Yamagishi, Junichi [2]
King, Simon [2]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
speech synthesis; speech recognition; unified models
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The EMIME European project is conducting research into the development of technologies for mobile, personalised speech-to-speech translation systems. The hidden Markov model (HMM) is being used as the underlying technology in both the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components; thus, the investigation of unified statistical modelling approaches has become an implicit goal of our research. As one of the first steps towards this goal, we have been investigating commonalities and differences between HMM-based ASR and TTS. In this paper we present results and analysis of a series of experiments conducted on English ASR and TTS systems, measuring their performance with respect to phone set and lexicon, acoustic feature type and dimensionality, and HMM topology. Our results show that, although the fundamental statistical model may be essentially the same, optimal ASR and TTS performance often demands diametrically opposed system designs. This represents a major challenge to be addressed in the investigation of such unified modelling approaches.
Pages: 1411 / +
Page count: 2