DISCRIMINATIVELY ESTIMATED JOINT ACOUSTIC, DURATION, AND LANGUAGE MODEL FOR SPEECH RECOGNITION

Cited by: 13
Authors
Lehr, Maider [1 ]
Shafran, Izhak [1 ]
Affiliations
[1] Oregon Hlth & Sci Univ, Ctr Spoken Language Understanding, Portland, OR 97201 USA
Keywords
discriminative modeling; language modeling; acoustic modeling; duration modeling
DOI
10.1109/ICASSP.2010.5495227
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
We introduce a discriminative model for speech recognition that integrates acoustic, duration, and language components. In the framework of finite-state machines, a general model for speech recognition G is a finite-state transduction from acoustic state sequences to word sequences (e.g., the search graph in many speech recognizers). The lattices from a baseline recognizer can be viewed as an a posteriori version of G after an utterance has been observed. So far, discriminative language models have been proposed to correct only the output side of G and are applied to the lattices. The acoustic state sequences on the input side of these lattices can also be exploited to improve the choice of the best hypothesis through the lattice. Taking this view, the model proposed in this paper jointly estimates the parameters of the acoustic and language components in a discriminative setting. The resulting model can be factored as corrections for the input and output sides of the general model G. This formulation allows us to incorporate duration cues seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model improves word error rate substantially, with a gain of 1.6% absolute. Through a series of experiments we analyze the contributions from, and interactions between, the acoustic, duration, and language components, and find that duration cues play an important role in the Arabic task.
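To make the idea of jointly correcting the input (acoustic/duration) and output (language) sides concrete, here is a minimal toy sketch: a structured-perceptron reranker over n-best hypotheses whose single weight vector mixes word-bigram features with per-word duration-bin features. All function names, features, and data below are illustrative assumptions, not the authors' lattice-based implementation.

```python
# Toy stand-in for the paper's joint discriminative model: one weight
# vector scores hypotheses with both language-side (word bigram) and
# duration-side (word, duration-bin) features, trained by a structured
# perceptron over n-best lists. Hypothetical sketch, not the real system.
from collections import defaultdict

def features(words, durations):
    """Joint feature map: word bigrams plus (word, duration-bin) pairs."""
    f = defaultdict(float)
    prev = "<s>"
    for w, d in zip(words, durations):
        f[("lm", prev, w)] += 1.0                   # language-side feature
        f[("dur", w, min(int(d * 10), 9))] += 1.0   # coarse duration bin
        prev = w
    f[("lm", prev, "</s>")] += 1.0
    return f

def score(weights, feats):
    return sum(weights[k] * v for k, v in feats.items())

def perceptron_train(nbest_lists, references, epochs=5):
    """Jointly estimate one weight vector over both feature 'sides'."""
    w = defaultdict(float)
    for _ in range(epochs):
        for hyps, ref in zip(nbest_lists, references):
            # Current best hypothesis under the joint model.
            best = max(hyps, key=lambda h: score(w, features(*h)))
            if best[0] != ref[0]:
                # Standard perceptron update: reward reference features,
                # penalize the wrongly chosen hypothesis's features.
                for k, v in features(*ref).items():
                    w[k] += v
                for k, v in features(*best).items():
                    w[k] -= v
    return w
```

In the paper the same joint estimate is then factored into separate corrections for the input and output sides of G; here the factoring would amount to splitting the learned weights by their `"lm"` and `"dur"` feature prefixes.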
Pages: 5542-5545
Page count: 4
Related papers
50 records total
  • [1] DISCRIMINATIVELY ESTIMATED DISCRETE, PARAMETRIC AND SMOOTHED-DISCRETE DURATION MODELS FOR SPEECH RECOGNITION
    Lehr, Maider
    Shafran, Izhak
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5340 - 5343
  • [2] Joint acoustic and language modeling for speech recognition
    Chien, Jen-Tzung
    Chueh, Chuang-Hua
    [J]. SPEECH COMMUNICATION, 2010, 52 (03) : 223 - 235
  • [3] Discriminatively Trained Dependency Language Modeling for Conversational Speech Recognition
    Lambert, Benjamin
    Raj, Bhiksha
    Singh, Rita
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3381 - 3385
  • [4] Speech recognition based on unified model of acoustic and language aspects of speech
    [J]. 1600, Nippon Telegraph and Telephone Corp. (11):
  • [5] Discriminatively trained continuous Hindi speech recognition using integrated acoustic features and recurrent neural network language modeling
    Kumar, A.
    Aggarwal, R. K.
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2021, 30 (01) : 165 - 179
  • [6] Joint Training of Speech Separation, Filterbank and Acoustic Model for Robust Automatic Speech Recognition
    Wang, Zhong-Qiu
    Wang, DeLiang
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2839 - 2843
  • [7] FMPE: Discriminatively trained features for speech recognition
    Povey, D
    Kingsbury, B
    Mangu, L
    Saon, G
    Soltau, H
    Zweig, G
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 961 - 964
  • [8] ACOUSTIC AND LANGUAGE PROCESSING TECHNOLOGY FOR SPEECH RECOGNITION
    MATSUOKA, T
    MINAMI, Y
    [J]. NTT REVIEW, 1995, 7 (02): : 30 - 39
  • [9] A novel duration model for speech recognition
    Yuan, Lichi
    Wan, Changxuan
    [J]. PROCEEDINGS OF THE FOURTH IASTED INTERNATIONAL CONFERENCE ON CIRCUITS, SIGNALS, AND SYSTEMS, 2006, : 279 - +
  • [10] Acoustic Model Adaptation for Speech Recognition
    Shinoda, Koichi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2348 - 2362