Robust Word Recognition Using Articulatory Trajectories and Gestures

Cited by: 0
Authors:
Mitra, Vikramjit [1 ]
Nam, Hosung [2 ]
Espy-Wilson, Carol [1 ]
Saltzman, Elliot [2 ,3 ]
Goldstein, Louis [2 ,4 ]
Affiliations:
[1] Univ Maryland, Dept Elect & Comp Eng, Syst Res Inst, College Pk, MD 20742 USA
[2] Haskins Labs Inc, New Haven, CT USA
[3] Boston Univ, Dept Phys Therapy & Athlet Training, Boston, MA USA
[4] Univ Southern Calif, Dept Linguist, Los Angeles, CA USA
Keywords:
Noise Robust Speech Recognition; Articulatory Phonology; Speech Gestures; Tract Variables; TADA Model; Neural Networks; Speech Inversion
DOI: not available
Chinese Library Classification: TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification: 0808; 0809
Abstract:
Articulatory Phonology views speech as an ensemble of constricting events, or gestures (e.g., narrowing the lips, raising the tongue tip), at distinct organs (lips, tongue tip, tongue body, velum, and glottis) along the vocal tract. This study shows that articulatory information in the form of gestures and their output trajectories (tract variable time functions, or TVs) can help to improve the performance of automatic speech recognition systems. The lack of any natural speech database containing such articulatory information prompted us to use a synthetic speech dataset (obtained from the Haskins Laboratories TAsk Dynamic model of speech production) that contains the acoustic waveform for a given utterance along with its corresponding gestures and TVs. First, we propose neural-network-based models to recognize the gestures and estimate the TVs from acoustic information. Second, the "synthetic-data trained" articulatory models were applied to the natural speech utterances in the Aurora-2 corpus to estimate their gestures and TVs. Finally, we show that the estimated articulatory information helps to improve the noise robustness of a word recognition system when used alongside cepstral features.
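The front-end pipeline the abstract describes can be illustrated with a minimal sketch: a feed-forward network maps a context window of cepstral frames to tract variable (TV) estimates, and the estimated TVs are concatenated with the cepstra before recognition. Everything below is an assumption for illustration, not the authors' trained system: the weights are random stand-ins for the synthetic-data-trained model, and the sizes (13 cepstra, 9-frame context, 64 hidden units, 8 TVs) are plausible choices, not values from the paper.

```python
import numpy as np

# Illustrative sketch (NOT the authors' trained model): a single-hidden-layer
# network mapping a window of cepstral frames to 8 tract variables (TVs),
# whose outputs are appended to the cepstra as extra recognition features.
rng = np.random.default_rng(0)

N_CEPSTRA = 13   # cepstral coefficients per frame (assumed)
CONTEXT = 9      # frames of acoustic context per estimate (assumed)
N_TVS = 8        # tract variables produced by the TADA model
N_HIDDEN = 64    # hidden-layer size (assumed)

# Random weights stand in for the "synthetic-data trained" model parameters.
W1 = rng.standard_normal((N_CEPSTRA * CONTEXT, N_HIDDEN)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_TVS)) * 0.1
b2 = np.zeros(N_TVS)

def estimate_tvs(cepstra: np.ndarray) -> np.ndarray:
    """Estimate TVs for each frame from a window of surrounding cepstra."""
    n_frames = cepstra.shape[0]
    half = CONTEXT // 2
    # Edge-pad so every frame, including the first and last, has full context.
    padded = np.pad(cepstra, ((half, half), (0, 0)), mode="edge")
    windows = np.stack(
        [padded[t:t + CONTEXT].ravel() for t in range(n_frames)]
    )
    hidden = np.tanh(windows @ W1 + b1)
    return hidden @ W2 + b2                      # shape (n_frames, N_TVS)

def augment_features(cepstra: np.ndarray) -> np.ndarray:
    """Concatenate cepstra with estimated TVs for the recognizer front end."""
    return np.hstack([cepstra, estimate_tvs(cepstra)])

frames = rng.standard_normal((100, N_CEPSTRA))   # stand-in utterance
features = augment_features(frames)
print(features.shape)  # (100, 21): 13 cepstra + 8 estimated TVs per frame
```

In the paper's setting the network would be trained on the TADA-generated synthetic corpus (acoustics paired with ground-truth TVs) and then applied unchanged to natural Aurora-2 speech; the sketch only shows the inference-time feature augmentation.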
Pages: 2038+
Page count: 2