Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0
Authors
Paulo, S [1 ]
Oliveira, LC [1 ]
Affiliation
[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
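The synthesis-based alignment idea in the abstract can be illustrated with a small sketch. The abstract does not say which alignment algorithm or features the authors use, so this example assumes plain dynamic time warping (DTW) with Euclidean distance over per-frame feature vectors. The function names `dtw_path` and `map_boundaries` and the toy one-dimensional features are hypothetical, chosen only for illustration. Because the segment boundaries of the synthesized signal are known, they can be projected onto the natural utterance through the warping path.

```python
import math


def dtw_path(ref, target):
    """Return a minimum-cost DTW path as (ref_frame, target_frame) pairs.

    Assumption: Euclidean frame distance and the three standard step
    directions; the paper's actual alignment setup is not given here.
    Both sequences must be non-empty.
    """
    n, m = len(ref), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def map_boundaries(path, synth_boundaries):
    """Map boundary frames of the synthetic signal to the first natural frame aligned with each."""
    first_match = {}
    for s, t in path:
        first_match.setdefault(s, t)
    return [first_match[b] for b in synth_boundaries]


# Toy data: two "segments" (feature value 0.0, then 1.0).
# The natural utterance is slower than the synthetic one.
synth = [[0.0], [0.0], [1.0], [1.0]]
natural = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]

path = dtw_path(synth, natural)
print(map_boundaries(path, [2]))  # prints [3]
```

In this framing, choosing features per pair of phonetic segment classes would change which components of the frame vectors enter the distance computation near each boundary; how the paper performs that selection is not described in the abstract and is not modeled here.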
Pages: 31-39
Page count: 9
Related papers
50 in total
  • [41] SOME FEATURES OF ACOUSTIC-PHONETIC SEGMENTS OF VOICELESS PLOSIVES AND THEIR RELATION TO SPEECH CONTEXT.
    Hayamizu, Satoru
    Tanaka, Kazuyo
    Ohta, Kozo
    [J]. Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory, 1988, 52 (03): 38 - 42
  • [42] Acoustic Features for Classification Based Speech Separation
    Wang, Yuxuan
    Han, Kun
    Wang, DeLiang
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 1530 - 1533
  • [43] Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network
    Bhangale, Kishor
    Kothandaraman, Mohanaprasad
    [J]. ELECTRONICS, 2023, 12 (04)
  • [44] EFFECTIVENESS OF PLP-BASED PHONETIC SEGMENTATION FOR SPEECH SYNTHESIS
    Shah, Nirmesh J.
    Vachhani, Bhavik B.
    Sailor, Hardik B.
    Patil, Hemant A.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [45] Design of an Urdu Speech Recognizer based upon acoustic phonetic modeling approach
    Akram, MU
    Arif, M
    [J]. INMIC 2004: 8TH INTERNATIONAL MULTITOPIC CONFERENCE, PROCEEDINGS, 2004: 91 - 96
  • [46] Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech
    Lee, Jung-Won
    Choi, Jeung-Yoon
    Kang, Hong-Goo
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 131 (02): 1536 - 1546
  • [47] Improving multiple sequence alignment biological accuracy through genetic algorithms
    Orobitg, Miquel
    Cores, Fernando
    Guirado, Fernando
    Roig, Concepcio
    Notredame, Cedric
    [J]. JOURNAL OF SUPERCOMPUTING, 2013, 65 (03): 1076 - 1088
  • [48] Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction
    Campos-Soberanis, Mario
    Campos-Sobrino, Diego
    Viana-Camara, Rafael
    [J]. ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II, 2021, 13068 : 46 - 58
  • [49] Improving multiple sequence alignment biological accuracy through genetic algorithms
    Miquel Orobitg
    Fernando Cores
    Fernando Guirado
    Concepció Roig
    Cedric Notredame
    [J]. The Journal of Supercomputing, 2013, 65 : 1076 - 1088
  • [50] Improving Speech Understanding Accuracy with Limited Training Data Using Multiple Language Models and Multiple Understanding Models
    Katsumaru, Masaki
    Nakano, Mikio
    Komatani, Kazunori
    Funakoshi, Kotaro
    Ogata, Tetsuya
    Okuno, Hiroshi G.
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 2699 - +