Analysis and recognition of whispered speech

Cited by: 121

Authors:

Ito, T [1]

Takeda, K [1]

Itakura, F [1]

Affiliations:

[1] Nagoya Univ, Grad Sch Engn, Nagoya, Aichi 4648603, Japan

Source:

SPEECH COMMUNICATION | 2005 / Vol. 45 / Issue 02

Keywords:

speech recognition; whispered speech; telephone handset; noise robustness

DOI:

10.1016/j.specom.2003.10.005

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this study, we examine the acoustic characteristics of whispered speech and address some of the issues involved in recognizing whispered speech used for communication over a mobile phone in a noisy environment. The acoustic analysis shows an upward shift in the formant frequencies of vowels in the whispered speech data compared with the normal speech data. Voiced consonants in whispered speech have lower energy at low frequencies up to 1.5 kHz, and their spectral flatness is greater than in normal speech. In the recognition experiments, adapting the whispered speech models with a small amount of whispered speech data from a target speaker proved effective for recognizing whispered speech. In a noisy environment, recognition accuracy decreases significantly for whispered speech compared with the same utterances spoken normally. Covering the mouth with a hand to increase the SNR is shown to give higher recognition accuracy for whispered speech, which is frequently used for private communication in a noisy environment. (C) 2004 Elsevier B.V. All rights reserved.

Pages: 139-152

Page count: 14

50 items in total

[21] VISUAL-ONLY RECOGNITION OF NORMAL, WHISPERED AND SILENT SPEECH
Petridis, Stavros
Shen, Jie
Cetin, Doruk
Pantic, Maja
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 6219 - 6223
[22] Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition
Lim, Boon Pang
Wong, Faith
Li, Yuyao
Bay, Jia Wei
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 1578 - 1582
[23] Whispered speech recognition based on gammatone filterbank cepstral coefficients
B. Marković
J. Galić
Ð. Grozdić
S. T. Jovičić
M. Mijić
Journal of Communications Technology and Electronics, 2017, 62 : 1255 - 1261
[24] ACOUSTIC ANALYSIS FOR SPEAKER IDENTIFICATION OF WHISPERED SPEECH
Fan, Xing
Hansen, John H. L.
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 5046 - 5049
[25] UT-VOCAL EFFORT II: ANALYSIS AND CONSTRAINED-LEXICON RECOGNITION OF WHISPERED SPEECH
Ghaffarzadegan, Shabnam
Boril, Hynek
Hansen, John H. L.
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
[26] Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition
Ghaffarzadegan, Shabnam
Boril, Hynek
Hansen, John H. L.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (10) : 1705 - 1720
[27] Significance of parametric spectral ratio methods in detection and recognition of whispered speech
Mathur, Arpit
Reddy, Shankar M.
Hegde, Rajesh M.
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012
[28] Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering
Galic, Jovan
Markovic, Branko
Grozdic, Dorde
Popovic, Branislav
Sajic, Slavko
APPLIED SCIENCES-BASEL, 2024, 14 (18)
[29] Significance of parametric spectral ratio methods in detection and recognition of whispered speech
Arpit Mathur
Shankar M Reddy
Rajesh M Hegde
EURASIP Journal on Advances in Signal Processing, 2012
[30] Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
Grozdic, Dorde T.
Jovicic, Slobodan T.
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12) : 2313 - 2322
