Phonetic analysis and automatic prediction of vowel duration in Hungarian spontaneous speech

Cited by: 0
Authors
Beke, Andras [1 ]
Gosy, Maria [1 ]
Affiliations
[1] Hungarian Acad Sci Phonet, Res Inst Linguist, Budapest, Hungary
Source
Keywords
DOI
10.3233/IDT-140198
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large body of research in phonetics and phonology has analyzed segmental durations, focusing on the factors and interactions that determine them. The results often play an important role in language technology applications, for example in text-to-speech synthesis (TTS) and automatic speech recognition (ASR), and are widely used in infocommunications today. Both the theory behind this topic and its applications in infocommunications increasingly rely on results from the cognitive sciences. This encourages scientists to discuss these results within the field of cognitive infocommunications (CogInfoCom) as supporting elements of novel kinds of cognitive capabilities of infocommunication systems. Speech sound duration depends on various factors such as phonetic quality, phonological context, phonological position in the word or in the utterance, speech style, etc. This multifactorial dependence of vowel duration is more complex in languages where vowel length is a distinctive feature, as in Hungarian. The main goal of the present research was to analyze the physical durations of pairs of vowels in spontaneous speech that exhibit a phonological length opposition. In addition, we intended to develop an algorithm for the automatic classification of short and long vowels occurring in spontaneous speech. On the basis of these findings, we intended to predict vowel durations automatically using three different methods. The measured data confirmed our hypothesis that phonologically short and long vowels differ significantly in their physical durations in spontaneous speech. The results of the automatic vowel length classification also supported this finding. The third aspect of our investigation was to use different supervised learning methods to predict vowel duration, based on different feature vectors consisting of characteristic and spectral features. The best result was obtained when the combined features were used with a feed-forward neural network (FFNN). The correlation between the target and the predicted vowel duration was 0.79, while the RMSE was 25 ms. The results support the complexity of the features affecting vowel duration, on the one hand, and indicate the temporal complexity of segments in spontaneous speech, as has been reported for Lithuanian, Czech, Hindi, Telugu and Korean, on the other.
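The two evaluation figures quoted in the abstract, the correlation between target and predicted durations and the RMSE in milliseconds, can be illustrated with a minimal sketch. It assumes the correlation is the Pearson coefficient, which the abstract does not state explicitly. The function names and the duration values are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(target, predicted):
    """Pearson correlation between two equal-length sequences."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predicted) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, predicted))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_t * var_p)


def rmse(target, predicted):
    """Root-mean-square error, in the same unit as the inputs."""
    n = len(target)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, predicted)) / n)


# Hypothetical vowel durations in milliseconds (illustration only).
target_ms = [60.0, 85.0, 120.0, 95.0, 150.0]
predicted_ms = [70.0, 80.0, 100.0, 110.0, 140.0]

print(f"r = {pearson_r(target_ms, predicted_ms):.2f}")
print(f"RMSE = {rmse(target_ms, predicted_ms):.1f} ms")
```

The two measures are complementary: the correlation shows how well the predictions track the relative ordering and spread of the durations, while the RMSE expresses the typical size of the error in milliseconds.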
Pages: 301-314
Page count: 14
Related papers
50 in total
  • [1] Characteristics and Spectral Features Used in Automatic Prediction of Vowel Duration in Spontaneous Speech
    Beke, A.
    Gosy, M.
    3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012), 2012: 65-70
  • [2] Predicting Vowel Duration in Spontaneous Canadian French Speech
    Williams, Darcie
    Poire, Francois
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 1953-1956
  • [3] Phonetic correlates of phonological vowel quantity in Yakut read and spontaneous speech
    Vasilyeva, Lena
    Arnhold, Anja
    Jarvikivi, Juhani
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 139 (05): 2541-2550
  • [4] Automatic measurement of vowel duration via structured prediction
    Adi, Yossi
    Keshet, Joseph
    Cibelli, Emily
    Gustafson, Erin
    Clopper, Cynthia
    Goldrick, Matthew
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2016, 140 (06): 4517-4527
  • [5] AUTOMATIC RECOGNITION OF SCHWA VARIANTS IN SPONTANEOUS HUNGARIAN SPEECH
    Andras Beke
    Gyoergy Szaszak
    ACTA LINGUISTICA HUNGARICA, 2010, 57 (2-3): 329-353
  • [6] ANALYSIS OF 1500 PHONETIC ERRORS IN SPONTANEOUS SPEECH
    SHATTUCKHUFNAGEL, SR
    KLATT, DH
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1975, 58: S62
  • [7] Automatic Analysis of Phonetic Speech Style Dimensions
    Ryant, Neville
    Liberman, Mark
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 77-81
  • [8] SPECTROGRAPHIC ANALYSIS OF VOWEL AND WORD DURATION IN APRAXIA OF SPEECH
    COLLINS, M
    ROSENBEK, JC
    WERTZ, RT
    JOURNAL OF SPEECH AND HEARING RESEARCH, 1983, 26 (02): 224-230
  • [9] VOWEL DURATION CHARACTERISTICS OF ESOPHAGEAL SPEECH
    CHRISTENSEN, JM
    WEINBERG, B
    JOURNAL OF SPEECH AND HEARING RESEARCH, 1976, 19 (04): 678-689
  • [10] VOWEL DURATION IN WHISPERED AND IN NORMAL SPEECH
    SHARF, DJ
    LANGUAGE AND SPEECH, 1964, 7 (02): 89-97