Robust Word Recognition using Articulatory Trajectories and Gestures

Cited by: 0
Authors
Mitra, Vikramjit [1]
Nam, Hosung [2]
Espy-Wilson, Carol [1]
Saltzman, Elliot [2, 3]
Goldstein, Louis [2, 4]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Eng, Syst Res Inst, College Pk, MD 20742 USA
[2] Haskins Labs Inc, New Haven, CT USA
[3] Boston Univ, Dept Phys Therapy & Athlet Training, Boston, MA USA
[4] Univ Southern Calif, Dept Linguist, Los Angeles, CA USA
Keywords
Noise Robust Speech Recognition; Articulatory Phonology; Speech Gestures; Tract Variables; TADA Model; Neural Networks; Speech Inversion
DOI
Not available
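Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809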
Abstract
Articulatory Phonology views speech as an ensemble of constricting events, or gestures (e.g., narrowing the lips, raising the tongue tip), at distinct organs (lips, tongue tip, tongue body, velum, and glottis) along the vocal tract. This study shows that articulatory information in the form of gestures and their output trajectories (tract variable time functions, or TVs) can help improve the performance of automatic speech recognition systems. The lack of any natural speech database containing such articulatory information prompted us to use a synthetic speech dataset (obtained from Haskins Laboratories' TAsk Dynamic model of speech production) that contains the acoustic waveform for a given utterance and its corresponding gestures and TVs. First, we propose neural-network-based models to recognize the gestures and estimate the TVs from acoustic information. Second, the "synthetic-data trained" articulatory models were applied to the natural speech utterances in the Aurora-2 corpus to estimate their gestures and TVs. Finally, we show that the estimated articulatory information helps improve the noise robustness of a word recognition system when used along with the cepstral features.
Pages: 2038 / +
Number of pages: 2
Related papers
50 in total
  • [31] Handwritten Devanagari Word Recognition using Robust Invariant Feature Transforms
    Guruprasad, Prathima
    Guruprasad
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2015, : 327 - 330
  • [32] Speech errors and articulatory gestures: an electropalatographic investigation
    Liker, Marko
    Zoric, Ana Vidovic
    SUVREMENA LINGVISTIKA, 2020, 46 (90): : 205 - 222
  • [33] Typology of mixing articulatory gestures in phonetics and phonology
    Recasens, Daniel
    LOQUENS, 2019, 6 (01):
  • [34] THE TIMING OF ARTICULATORY GESTURES - EVIDENCE FOR RELATIONAL INVARIANTS
    TULLER, B
    KELSO, JAS
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1984, 76 (04): : 1030 - 1036
  • [35] Shared processing of planning articulatory gestures and grasping
    Vainio, L.
    Tiainen, M.
    Tiippana, K.
    Vainio, M.
    EXPERIMENTAL BRAIN RESEARCH, 2014, 232 (07) : 2359 - 2368
  • [36] Shared processing of planning articulatory gestures and grasping
    L. Vainio
    M. Tiainen
    K. Tiippana
    M. Vainio
    Experimental Brain Research, 2014, 232 : 2359 - 2368
  • [37] Using articulatory likelihoods in the recognition of dysarthric speech
    Rudzicz, Frank
    SPEECH COMMUNICATION, 2012, 54 (03) : 430 - 444
  • [38] Robust Modeling and Recognition of Hand Gestures with Dynamic Bayesian Network
    Suk, Heung-Il
    Sin, Bong-Kee
    Lee, Seong-Whan
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3994 - 3997
  • [39] Speech recognition using cepstral articulatory features
    Najnin, Shamima
    Banerjee, Bonny
    SPEECH COMMUNICATION, 2019, 107 : 26 - 37
  • [40] Whole-Word Recognition from Articulatory Movements for Silent Speech Interfaces
    Wang, Jun
    Samal, Ashok
    Green, Jordan R.
    Rudzicz, Frank
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1326 - 1329