Comparing measurement errors for formants in synthetic and natural vowels

Cited by: 33
Authors
Shadle, Christine H. [1]
Nam, Hosung [1,2]
Whalen, D. H. [1,3]
Affiliations
[1] Haskins Labs Inc, 300 George St, New Haven, CT 06511 USA
[2] Korea Univ, Dept English Language & Literature, 145 Anam Ro, Seoul 136701, South Korea
[3] CUNY, Grad Ctr, Program Speech Language Hearing Sci, 365 Fifth Ave, New York, NY 10016 USA
Funding
National Institutes of Health (USA);
Keywords
VOCAL-TRACT RESONANCES; AREA FUNCTIONS; ACCURACY; SPEECH;
DOI
10.1121/1.4940665
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
The measurement of formant frequencies of vowels is among the most common measurements in speech studies, but measurements are known to be biased by the particular fundamental frequency (F0) exciting the formants. Approaches to reducing the errors were assessed in two experiments. In the first, synthetic vowels were constructed with five different first formant (F1) values and nine different F0 values; formant bandwidths and higher formant frequencies were held constant. Input formant values were compared to manual measurements and to automatic measures obtained using the linear prediction coding-Burg algorithm, linear prediction closed-phase covariance, the weighted linear prediction-attenuated main excitation (WLP-AME) algorithm [Alku, Pohjalainen, Vainio, Laukkanen, and Story (2013). J. Acoust. Soc. Am. 134(2), 1295-1313], and spectra smoothed cepstrally and by averaging repeated discrete Fourier transforms. Formants were also measured manually from pruned reassigned spectrograms (RSs) [Fulop (2011). Speech Spectrum Analysis (Springer, Berlin)]. All but WLP-AME and RS had large errors in the direction of the strongest harmonic; the smallest errors occurred with WLP-AME and RS. In the second experiment, these methods were used on vowels in isolated words spoken by four speakers. Results for the natural speech show that F0 bias affects all automatic methods, including WLP-AME; only the formants measured manually from RS appeared to be accurate. In addition, RS coped better with weaker formants and glottal fry. © 2016 Acoustical Society of America.
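The bias "in the direction of the strongest harmonic" can be illustrated with a toy calculation. The sketch below is not the authors' procedure and reproduces none of the methods compared in the paper. It assumes a single two-pole digital resonator standing in for F1, a spectrally flat harmonic source, an 80 Hz bandwidth, and a 10 kHz sampling rate, all chosen only for illustration. The function names (`resonator_gain`, `strongest_harmonic`, `harmonic_bias`) are hypothetical. It shows how far a measurement that simply picked the strongest harmonic would land from the true formant as F0 changes.

```python
import cmath
import math


def resonator_gain(freq_hz, formant_hz, bandwidth_hz, fs_hz):
    """Magnitude response of a two-pole digital resonator at freq_hz.

    Assumption: one resonator models F1 in isolation; higher formants
    and the glottal source tilt are ignored.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * formant_hz * t))
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    z_inv = cmath.exp(-2j * math.pi * freq_hz * t)
    return abs(a / (1.0 - b * z_inv - c * z_inv ** 2))


def strongest_harmonic(f0_hz, formant_hz, bandwidth_hz=80.0, fs_hz=10000.0):
    """Frequency of the harmonic of f0_hz that the resonator amplifies most."""
    n_harmonics = int((fs_hz / 2.0) // f0_hz)
    harmonics = [k * f0_hz for k in range(1, n_harmonics + 1)]
    return max(
        harmonics,
        key=lambda f: resonator_gain(f, formant_hz, bandwidth_hz, fs_hz),
    )


def harmonic_bias(f0_hz, formant_hz, **kwargs):
    """Error (Hz) of a 'pick the strongest harmonic' estimate of the formant."""
    return strongest_harmonic(f0_hz, formant_hz, **kwargs) - formant_hz


if __name__ == "__main__":
    true_f1 = 500.0  # illustrative value, not taken from the paper
    for f0 in (100.0, 220.0, 300.0):
        print(f0, harmonic_bias(f0, true_f1))
```

Under these assumptions the error is zero only when a harmonic happens to coincide with the formant, and it grows as the harmonics become more widely spaced at higher F0. This is one plausible reading of why methods that follow the harmonic structure are biased by F0.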
Pages: 713-727
Number of pages: 15