Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0
Authors
Paulo, S [1]
Oliveira, LC [1]
Affiliations
[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal
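Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract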
The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
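The synthesis-based alignment idea in the abstract can be illustrated with a small sketch. The abstract does not name the alignment algorithm, so dynamic time warping (DTW) is assumed here as a common choice for this task; the function names `dtw_path` and `map_boundaries`, the Euclidean frame distance, and the toy one-dimensional features are all hypothetical. The per-class-pair feature selection that the paper proposes is not reproduced.

```python
from math import inf

def dtw_path(synth, spoken):
    """Return a DTW warping path as (synth_frame, spoken_frame) pairs.

    Assumption: frames are equal-length feature vectors compared with
    Euclidean distance; the paper's actual features are not given here.
    """
    n, m = len(synth), len(spoken)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # cost[i][j] = minimal accumulated cost aligning the first i and j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(synth[i - 1], spoken[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both signals to recover the path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def map_boundaries(path, synth_boundaries):
    """Map each synthetic boundary frame to the first spoken frame aligned to it."""
    return [min(j for i, j in path if i == b) for b in synth_boundaries]

# Toy example: the synthetic signal has a segment boundary at frame 2
# (known from the synthesizer); the spoken signal is a slower rendition.
synth_feats = [[0.0], [0.0], [1.0], [1.0]]
spoken_feats = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
path = dtw_path(synth_feats, spoken_feats)
spoken_boundaries = map_boundaries(path, [2])
```

Because the synthesizer knows where each phonetic segment starts in its own output, the warping path is enough to transfer those boundaries onto the spoken utterance. The choice of features fed to the distance function is the part the paper identifies as critical.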
Pages: 31-39
Number of pages: 9