Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0

Authors:

Paulo, S [1]

Oliveira, LC [1]

Affiliations:

[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal

Source:

COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003 / Vol. 2721

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
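
As a rough illustration of the synthesize-and-align idea described in the abstract, the sketch below aligns two sequences of feature frames with dynamic time warping (DTW) and transfers a known segment boundary from the synthetic signal to the natural one. The abstract does not say which alignment algorithm or distance the authors use, so DTW, the Euclidean frame distance, and the names `dtw_path` and `map_boundaries` are all assumptions made for this example. It does not implement the paper's per-class feature selection.

```python
from math import inf


def euclidean(a, b):
    """Euclidean distance between two feature frames (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw_path(ref, target, dist=euclidean):
    """Return a minimum-cost warping path as a list of (ref_idx, target_idx)."""
    n, m = len(ref), len(target)
    # cost[i][j] = best cumulative cost aligning ref[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # ref advances
                                 cost[i][j - 1],      # target advances
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def map_boundaries(path, ref_boundaries):
    """Map each boundary frame of the synthetic (ref) signal to the first
    natural (target) frame that the warping path pairs with it."""
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[b] for b in ref_boundaries]


# Toy data: one-dimensional "features"; the synthetic segment boundary is at
# frame 2, and the natural utterance holds the first segment one frame longer.
synthetic = [[0.0], [0.0], [1.0], [1.0]]
natural = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
path = dtw_path(synthetic, natural)
print(map_boundaries(path, [2]))  # prints [3]
```

Passing a different `dist` function is where a choice of acoustic features would enter such a sketch. How that choice is made per pair of phonetic segment classes is the subject of the paper and is not reproduced here.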


Pages: 31-39

Page count: 9


[41] SOME FEATURES OF ACOUSTIC-PHONETIC SEGMENTS OF VOICELESS PLOSIVES AND THEIR RELATION TO SPEECH CONTEXT.
Hayamizu, Satoru
Tanaka, Kazuyo
Ohta, Kozo
[J]. Denshi Gijutsu Sogo Kenkyusho Iho/Bulletin of the Electrotechnical Laboratory, 1988, 52 (03): 38 - 42
[42] Acoustic Features for Classification Based Speech Separation
Wang, Yuxuan
Han, Kun
Wang, DeLiang
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 1530 - 1533
[43] Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network
Bhangale, Kishor
Kothandaraman, Mohanaprasad
[J]. ELECTRONICS, 2023, 12 (04)
[44] EFFECTIVENESS OF PLP-BASED PHONETIC SEGMENTATION FOR SPEECH SYNTHESIS
Shah, Nirmesh J.
Vachhani, Bhavik B.
Sailor, Hardik B.
Patil, Hemant A.
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[45] Design of an Urdu Speech Recognizer based upon acoustic phonetic modeling approach
Akram, MU
Arif, M
[J]. INMIC 2004: 8TH INTERNATIONAL MULTITOPIC CONFERENCE, PROCEEDINGS, 2004: 91 - 96
[46] Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech
Lee, Jung-Won
Choi, Jeung-Yoon
Kang, Hong-Goo
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 131 (02): 1536 - 1546
[47] Improving multiple sequence alignment biological accuracy through genetic algorithms
Orobitg, Miquel
Cores, Fernando
Guirado, Fernando
Roig, Concepcio
Notredame, Cedric
[J]. JOURNAL OF SUPERCOMPUTING, 2013, 65 (03): 1076 - 1088
[48] Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction
Campos-Soberanis, Mario
Campos-Sobrino, Diego
Viana-Camara, Rafael
[J]. ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II, 2021, 13068 : 46 - 58
[49] Improving multiple sequence alignment biological accuracy through genetic algorithms
Miquel Orobitg
Fernando Cores
Fernando Guirado
Concepció Roig
Cedric Notredame
[J]. The Journal of Supercomputing, 2013, 65 : 1076 - 1088
[50] Improving Speech Understanding Accuracy with Limited Training Data Using Multiple Language Models and Multiple Understanding Models
Katsumaru, Masaki
Nakano, Mikio
Komatani, Kazunori
Funakoshi, Kotaro
Ogata, Tetsuya
Okuno, Hiroshi G.
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 2699 - +
