Phonetic analysis and automatic prediction of vowel duration in Hungarian spontaneous speech

Cited by: 0
Authors
Beke, Andras [1]
Gosy, Maria [1]
Affiliations
[1] Hungarian Acad Sci Phonet, Res Inst Linguist, Budapest, Hungary
DOI
10.3233/IDT-140198
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A large number of research papers in phonetics and phonology have analyzed segmental durations, focusing on the factors and interactions that determine them. The results often play an important role in language technology applications, for example in text-to-speech synthesis (TTS) and automatic speech recognition (ASR), and are widely used in infocommunications today. Both the research theory behind this topic and its applications in infocommunications increasingly rely on the results of the cognitive sciences. This fact encourages scientists to discuss these results within the field of cognitive infocommunications (CogInfoCom) as supporting elements of novel kinds of cognitive capabilities of infocommunication systems. Speech sound duration depends on various factors such as phonetic quality, phonological context, phonological position in the word or in the utterance, and speech style. The multifactorial dependence of vowel duration is more complex in languages where vowel length is a distinctive feature, as in Hungarian. The main goal of the present research was to analyze the physical durations of pairs of vowels in spontaneous speech that exhibit a phonological length opposition. In addition, we intended to develop an algorithm for the automatic classification of the short and long vowels occurring in spontaneous speech. On the basis of these findings, we intended to predict vowel durations automatically using three different methods. The measured data confirmed our hypothesis that phonologically short vs. long vowels would differ significantly in their physical durations in spontaneous speech. The results of the automatic vowel length classification also supported this finding. The third aspect of our investigations was to use different supervised learning methods to predict vowel duration, based on different feature vectors consisting of characteristic and spectral features. The best result was obtained when the combined features and a feed-forward neural network (FFNN) were used. The correlation between the target and the predicted vowel durations was 0.79, while the root-mean-square error (RMSE) was 25 ms. On the one hand, the results confirm the complexity of the features affecting vowel duration; on the other hand, they indicate the temporal complexity of segments in spontaneous speech, as has been reported for Lithuanian, Czech, Hindi, Telugu and Korean.
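The two evaluation figures reported in the abstract, a correlation and an RMSE between target and predicted vowel durations, can be illustrated with a minimal Python sketch. The abstract does not specify the correlation type, so the Pearson coefficient is assumed here. The function names and the duration values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
import math


def rmse(target, predicted):
    """Root-mean-square error between two equal-length sequences (same unit as the input)."""
    if len(target) != len(predicted) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(target, predicted)]
    return math.sqrt(sum(squared_errors) / len(target))


def pearson_r(target, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(target) != len(predicted) or len(target) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predicted) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, predicted))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical vowel durations in milliseconds (illustrative values only).
target_ms = [60.0, 80.0, 100.0, 120.0]
predicted_ms = [70.0, 70.0, 110.0, 110.0]

print(f"RMSE: {rmse(target_ms, predicted_ms):.1f} ms")
print(f"Pearson r: {pearson_r(target_ms, predicted_ms):.3f}")
```

RMSE is expressed in the unit of the durations (milliseconds here), whereas the correlation is unitless, which is why the two figures are usually reported together.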
Pages: 301-314
Number of pages: 14