Automatic voice onset time detection for unvoiced stops (/p/,/t/,/k/) with application to accent classification

Cited by: 28
Authors
Hansen, John H. L. [1]
Gray, Sharmistha S. [1]
Kim, Wooil [1]
Affiliations
[1] Univ Texas Dallas, CRSS, Erik Jonsson Sch Engn & Comp Sci, Dept Elect Engn, Richardson, TX 75080 USA
Keywords
Voice Onset Time (VOT); Voice Onset Region (VOR); Teager Energy Operator (TEO); Accent classification; SPEECH; ENGLISH; TRANSFORMS; PERCEPTION; INVARIANT; ROTATION; FRENCH;
DOI
10.1016/j.specom.2010.05.004
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Articulation characteristics of particular phonemes can provide cues to distinguish accents in spoken English. For example, as shown in Arslan and Hansen (1996, 1997), Voice Onset Time (VOT) can be used to classify Mandarin, Turkish, German and American accented English. Our goal in this study is to develop an automatic system that classifies accents using VOT in unvoiced stops. VOT is an important temporal feature that is often overlooked in speech perception, speech recognition, and accent detection. Fixed-length frame-based speech processing inherently ignores VOT. In this paper, a more effective VOT detection scheme is introduced for unvoiced stops (/p/, /t/ and /k/), using the non-linear energy tracking algorithm Teager Energy Operator (TEO) across a sub-frequency band partition. The proposed VOT detection algorithm also incorporates spectral differences between the Voice Onset Region (VOR) and the succeeding vowel of a given stop-vowel sequence to classify speakers having accents due to different ethnic origin. The spectral cues are enhanced using one of four types of feature parameter extraction: Discrete Mellin Transform (DMT), Discrete Mellin Fourier Transform (DMFT), and Discrete Wavelet Transform using the lowest and the highest frequency resolutions (DWTlfr and DWThfr). A Hidden Markov Model (HMM) classifier is employed with these extracted parameters and applied to the problem of accent classification. Three different language groups (American English, Chinese, and Indian) are used from the CU-Accent database. VOT is detected with less than 10% error relative to manually detected VOT, with success rates of 79.90%, 87.32% and 47.73% for English, Chinese and Indian speakers, respectively (the Indian figure includes atypical cases). It is noted that the DMT and DWTlfr features are good for parameterizing speech samples that exhibit substitution of the succeeding vowel after the stop in accented speech. The successful accent classification rates of the DMT and DWTlfr features are 66.13% and 71.67%, for /p/ and /t/ respectively, for pairwise accent detection. Alternatively, the DMFT feature works on all accent-sensitive words considered, with a success rate of 70.63%. This study shows that effective VOT detection can be achieved using integrated TEO processing with spectral difference analysis in the VOR, and that it can be employed for accent classification. (C) 2010 Elsevier B.V. All rights reserved.
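The abstract names the Teager Energy Operator without defining it. As a minimal sketch, assuming the commonly used discrete form psi[n] = x[n]^2 - x[n-1]*x[n+1], the operator can be computed as below. The function name `teager_energy` and the test tone are illustrative assumptions; the paper's sub-frequency band partition and its VOT decision rule are not reproduced here.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator (assumed standard form):
    psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the two endpoints have no
    neighbour on one side and are dropped.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Sanity check on a pure tone (hypothetical test signal): for
# x[n] = A * cos(omega * n) the operator output is constant and
# equals A^2 * sin(omega)^2.
A, omega = 0.5, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)
expected = (A * math.sin(omega)) ** 2
```

Because the output depends on both amplitude and frequency, an operator of this kind responds sharply to abrupt changes in a signal, which is presumably why it suits tracking the transition from stop burst to voicing.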
Pages: 777-789
Number of pages: 13
Related papers
11 in total
  • [1] Detection of Voice Onset Time (VOT) for unvoiced stops (|p|, |t|, |k|) using the Teager Energy Operator (TEO) for automatic detection of accented English
    Das, S
    Hansen, JHL
    [J]. NORSIG 2004: PROCEEDINGS OF THE 6TH NORDIC SIGNAL PROCESSING SYMPOSIUM, 2004, 46 : 344 - 347
  • [2] Automatic estimation of voice onset time for word-initial stops by applying random forest to onset detection
    Lin, Chi-Yueh
    Wang, Hsiao-Chuan
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2011, 130 (01): : 514 - 525
  • [3] AUTOMATIC DETECTION OF VOICE ONSET TIME IN DYSARTHRIC SPEECH
    Novotny, Michal
    Pospisil, Jakub
    Cmejla, Roman
    Rusz, Jan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4340 - 4344
  • [4] AUTOMATIC DETECTION OF VOICE ONSET TIME CONTRASTS FOR USE IN PRONUNCIATION ASSESSMENT
    Kazemzadeh, Abe
    Tepperman, Joseph
    Silva, Jorge
    You, Hong
    Lee, Sungbok
    Alwan, Abeer
    Narayanan, Shrikanth
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 721 - +
  • [5] Detection of Voice Onset Time (VOT) for unvoiced stop sound in Modern Standard Arabic (MSA) based on power signal
    AlDahri, Sulaiman S.
    Alhakami, Hazem A.
    [J]. PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016, : 551 - 556
  • [6] Automatic detection of Voice Onset Time in voiceless plosives using gated recurrent units
    Arias-Vergara, T.
    Arguello-Velez, P.
    Vasquez-Correa, J. C.
    Noeth, E.
    Schuster, M.
    Gonzalez-Rativa, M. C.
    Orozco-Arroyave, J. R.
    [J]. DIGITAL SIGNAL PROCESSING, 2020, 104
  • [7] Voice-onset-time in p-t-k and b-d-g in the Spanish spoken in Valdivia: an acoustic analysis
    Roldan, Y
    SotoBarba, J
    [J]. ESTUDIOS FILOLOGICOS, 1997, (32): : 27 - 33
  • [8] Wearable ECG for Real Time Complex P-QRS-T Detection and Classification of Various Arrhythmias
    Mishra, Biswajit
    Arora, Neha
    Vora, Yash
    [J]. 2019 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS), 2019, : 870 - 875
  • [9] Enhancing manual P-phase arrival detection and automatic onset time picking in a noisy microseismic data in underground mines
    Mborah, Charles
    Ge, Maochen
    [J]. INTERNATIONAL JOURNAL OF MINING SCIENCE AND TECHNOLOGY, 2018, 28 (04) : 691 - 699