Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units

Citations: 0
Authors:
Sethy, A [1]
Narayanan, S [1]
Affiliation:
[1] Univ So Calif, Speech Anal & Interpretat Lab, Integrated Media Syst Ctr, Dept Elect Engn Syst, Los Angeles, CA 90089 USA
Keywords:
DOI: not available
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract:
Most speech recognition systems, especially LVCSR, use context-dependent phones as the basic acoustic unit for recognition. The primary motive for this is the relative ease with which phone-based systems can be trained robustly with small amounts of data. However, as recent research indicates, significant improvements in recognition accuracy can be gained by using acoustic units of longer duration, such as syllables. Syllable and other longer-length units provide an efficient way to model long-term temporal dependencies in speech, which are difficult to cover in a phoneme-based recognition framework. However, these longer-duration units suffer from a training data sparsity problem, since a large number of units in the lexicon will have little or no acoustic training data. In this paper we present a two-step approach to address the training data sparsity problem. First, we use CD phones to initialize the higher-level units in a manner that minimizes the impact of training data sparsity. Subsequently, we present methods to split the lexicon into units of different acoustic lengths based on an analysis of the training data. We present results showing that a 25-30% improvement in word error rate can be achieved by using CD phone initialization and variable-length unit selection on a medium-vocabulary continuous speech recognition task.
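The lexicon-splitting step described in the abstract can be illustrated with a minimal sketch: keep a syllable-level pronunciation for a word only when every syllable unit in it has enough acoustic training examples, and otherwise back off to the phone decomposition. All names, data structures, and the example threshold below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of split-lexicon unit selection (hypothetical names and
# threshold; the paper's actual selection criteria may differ).

def split_lexicon(lexicon, unit_counts, min_examples=50):
    """Map each word to syllable units if well trained, else phones.

    lexicon     -- {word: {"syllables": [...], "phones": [...]}}
    unit_counts -- {unit: number of training tokens observed}
    """
    split = {}
    for word, prons in lexicon.items():
        syllables = prons["syllables"]
        # A syllable pronunciation is usable only if every syllable in it
        # has at least min_examples tokens in the acoustic training data.
        if all(unit_counts.get(s, 0) >= min_examples for s in syllables):
            split[word] = syllables
        else:
            split[word] = prons["phones"]  # back off to CD phones
    return split

# Toy example: "hello" has well-trained syllables; the rare word falls
# back to its phone sequence.
lexicon = {
    "hello": {"syllables": ["hh_ah", "l_ow"],
              "phones": ["hh", "ah", "l", "ow"]},
    "zyzzyva": {"syllables": ["z_ih", "z_ih", "v_ah"],
                "phones": ["z", "ih", "z", "ih", "v", "ah"]},
}
counts = {"hh_ah": 120, "l_ow": 300, "z_ih": 2, "v_ah": 1}

units = split_lexicon(lexicon, counts)
```

Here `units["hello"]` stays at the syllable level while `units["zyzzyva"]` backs off to phones, mirroring the paper's goal of using longer units only where training data supports them.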
Pages: 772-775
Page count: 4