Analysis and recognition of whispered speech

Cited by: 121
Authors
Ito, T [1]
Takeda, K [1]
Itakura, F [1]
Affiliations
[1] Nagoya Univ, Grad Sch Engn, Nagoya, Aichi 4648603, Japan
Keywords
speech recognition; whispered speech; telephone handset; noise robustness
DOI
10.1016/j.specom.2003.10.005
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This study examines the acoustic characteristics of whispered speech and addresses issues in recognizing whispered speech used for communication over a mobile phone in a noisy environment. Acoustic analysis shows that vowel formant frequencies shift upward in whispered speech relative to normal speech. Voiced consonants in whispered speech have lower energy at low frequencies, up to 1.5 kHz, and greater spectral flatness than in normal speech. In the whispered speech recognition experiments, adapting the whispered speech models with a small amount of whispered speech data from the target speaker proved effective. In a noisy environment, recognition accuracy decreases significantly for whispered speech compared with normally spoken versions of the same utterances. Covering the mouth with a hand, a method for increasing the SNR, gives higher recognition accuracy for whispered speech, which is frequently used for private communication in noisy environments. (C) 2004 Elsevier B.V. All rights reserved.
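The two spectral measures named in the abstract, low-frequency energy up to 1.5 kHz and spectral flatness, can be illustrated with a minimal Python sketch. This is not the authors' analysis procedure: the function names are hypothetical, the 16 kHz sample rate and 1024-sample frame are assumptions, and synthetic signals stand in for real speech (a sum of low-frequency tones as a rough proxy for a voiced frame, white noise as a rough proxy for a whispered frame).

```python
import numpy as np

def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, power) for one Hann-windowed frame."""
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, power

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    power = power + eps  # avoid log(0) on empty bins
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def low_band_energy_ratio(freqs, power, cutoff_hz=1500.0):
    """Fraction of total spectral energy at or below cutoff_hz."""
    return float(power[freqs <= cutoff_hz].sum() / power.sum())

# Assumed analysis settings (not taken from the paper).
sample_rate = 16000
n_samples = 1024
t = np.arange(n_samples) / sample_rate

# Synthetic stand-ins: harmonic tones for "voiced", white noise for "whispered".
voiced_like = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
whisper_like = np.random.default_rng(0).standard_normal(n_samples)

f_v, p_v = power_spectrum(voiced_like, sample_rate)
f_w, p_w = power_spectrum(whisper_like, sample_rate)

flatness_voiced = spectral_flatness(p_v)
flatness_whisper = spectral_flatness(p_w)
low_ratio_voiced = low_band_energy_ratio(f_v, p_v)
low_ratio_whisper = low_band_energy_ratio(f_w, p_w)

print("flatness (voiced-like, whisper-like):", flatness_voiced, flatness_whisper)
print("energy <= 1.5 kHz (voiced-like, whisper-like):", low_ratio_voiced, low_ratio_whisper)
```

On these synthetic frames the noise-like signal is flatter and carries a smaller share of its energy below 1.5 kHz, which is the direction of the difference the abstract reports for voiced consonants. Real recordings would need framing, segmentation by phoneme, and averaging over many frames.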
Pages: 139-152
Page count: 14
Related papers
50 in total
  • [21] VISUAL-ONLY RECOGNITION OF NORMAL, WHISPERED AND SILENT SPEECH
    Petridis, Stavros
    Shen, Jie
    Cetin, Doruk
    Pantic, Maja
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6219 - 6223
  • [22] Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition
    Lim, Boon Pang
    Wong, Faith
    Li, Yuyao
    Bay, Jia Wei
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1578 - 1582
  • [23] Whispered speech recognition based on gammatone filterbank cepstral coefficients
    B. Marković
    J. Galić
    Đ. Grozdić
    S. T. Jovičić
    M. Mijić
    Journal of Communications Technology and Electronics, 2017, 62 : 1255 - 1261
  • [24] ACOUSTIC ANALYSIS FOR SPEAKER IDENTIFICATION OF WHISPERED SPEECH
    Fan, Xing
    Hansen, John H. L.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5046 - 5049
  • [25] UT-VOCAL EFFORT II: ANALYSIS AND CONSTRAINED-LEXICON RECOGNITION OF WHISPERED SPEECH
    Ghaffarzadegan, Shabnam
    Boril, Hynek
    Hansen, John H. L.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [26] Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition
    Ghaffarzadegan, Shabnam
    Boril, Hynek
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (10) : 1705 - 1720
  • [27] Significance of parametric spectral ratio methods in detection and recognition of whispered speech
    Mathur, Arpit
    Reddy, Shankar M.
    Hegde, Rajesh M.
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012,
  • [28] Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering
    Galic, Jovan
    Markovic, Branko
    Grozdic, Dorde
    Popovic, Branislav
    Sajic, Slavko
    APPLIED SCIENCES-BASEL, 2024, 14 (18):
  • [29] Significance of parametric spectral ratio methods in detection and recognition of whispered speech
    Arpit Mathur
    Shankar M Reddy
    Rajesh M Hegde
    EURASIP Journal on Advances in Signal Processing, 2012
  • [30] Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
    Grozdic, Dorde T.
    Jovicic, Slobodan T.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12) : 2313 - 2322