ROBUST SPEAKING RATE ESTIMATION USING BROAD PHONETIC CLASS RECOGNITION

Cited by: 21

Authors:

Yuan, Jiahong [1]

Liberman, Mark [1]

Affiliations:

[1] Univ Penn, Philadelphia, PA 19104 USA

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

Speaking rate estimation; syllable detection; robustness; broad phonetic class; SPEECH;

DOI:

10.1109/ICASSP.2010.5495686

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Robust speaking rate estimation can be useful in automatic speech recognition and speaker identification, and accurate, automatic measures of speaking rate are also relevant for research in linguistics, psychology, and social sciences. In this study we built a broad phonetic class recognizer for speaking rate estimation. We tested the recognizer on a variety of data sets, including laboratory speech, telephone conversations, foreign accented speech, and speech in different languages, and we found that the recognizer's estimates are robust under these sources of variation. We also found that the acoustic models of the broad phonetic classes are more robust than those of the monophones for syllable detection.


Pages: 4222-4225

Number of pages: 4

50 entries in total

[31] Introduction of the speaking rate in the model of speech recognition
Yousfi, A
Meziane, A
[J]. INTERNATIONAL CONFERENCE ON PARALLEL COMPUTING IN ELECTRICAL ENGINEERING - PARELEC 2000, PROCEEDINGS, 2000, : 64 - 66
[32] Pre-recognition measures of speaking rate
Samudravijaya, K
Singh, SK
Rao, PVS
[J]. SPEECH COMMUNICATION, 1998, 24 (01) : 73 - 84
[33] Information theoretic optimal vocal tract region selection from real time magnetic resonance images for broad phonetic class recognition
Prasad, Abhay
Ghosh, Prasanta Kumar
[J]. COMPUTER SPEECH AND LANGUAGE, 2016, 39 : 108 - 128
[34] Probabilistic Class Histogram Equalization Based on Posterior Mean Estimation for Robust Speech Recognition
Suh, Youngjoo
Kim, Hoirin
[J]. IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (12) : 2421 - 2424
[35] Perceptual speech processing and phonetic feature mapping for robust vowel recognition
Bu, LK
Chiueh, TD
[J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (02) : 105 - 114
[36] Dialect Recognition using Adapted Phonetic Models
Shen, Wade
Chen, Nancy
Reynolds, Douglas
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 763 - 766
[37] Island-Driven Search Using Broad Phonetic Classes
Sainath, Tara N.
[J]. 2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 287 - 292
[38] ROBUST PARTIAL FACE RECOGNITION USING INSTANCE-TO-CLASS DISTANCE
Hu, Junlin
Lu, Jiwen
Tan, Yap-Peng
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP 2013), 2013,
[39] Convex Weighting Criteria for Speaking Rate Estimation
Jiao, Yishan
Berisha, Visar
Tu, Ming
Liss, Julie
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (09) : 1421 - 1430
[40] PHONETIC PROTOTYPES - INFLUENCE OF PLACE OF ARTICULATION AND SPEAKING RATE ON THE INTERNAL STRUCTURE OF VOICING CATEGORIES
VOLAITIS, LE
MILLER, JL
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1992, 92 (02) : 723 - 735
