Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units

Cited: 0
|
Authors
Sethy, A [1 ]
Narayanan, S [1 ]
Institutions
[1] Univ So Calif, Speech Anal & Interpretat Lab, Integrated Media Syst Ctr, Dept Elect Engn Syst, Los Angeles, CA 90089 USA
Keywords
DOI
Not available
CLC Classification
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Most speech recognition systems, especially LVCSR systems, use context-dependent (CD) phones as the basic acoustic unit for recognition. The primary motive for this is the relative ease with which phone-based systems can be trained robustly with small amounts of data. However, as recent research indicates, significant improvements in recognition accuracy can be gained by using acoustic units of longer duration, such as syllables. Syllables and other longer units provide an efficient way to model long-term temporal dependencies in speech, which are difficult to capture in a phoneme-based recognition framework. However, these longer-duration units suffer from a training data sparsity problem, since a large number of units in the lexicon will have little or no acoustic training data. In this paper we present a two-step approach to address the training data sparsity problem. First, we use CD phones to initialize the higher-level units in a manner that minimizes the impact of training data sparsity. Subsequently, we present methods to split the lexicon into units of different acoustic lengths based on an analysis of the training data. We present results showing that a 25-30% improvement in terms of word error rate can be achieved by using CD phone initialization and variable-length unit selection on a medium-vocabulary continuous speech recognition task.
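The lexicon-splitting idea in the abstract, i.e. deciding per word whether syllable-level or phone-level units are trainable, can be sketched with a simple coverage heuristic. This is an illustrative reading, not the authors' actual method: the `split_lexicon` function, the syllable inventory, and the `min_count` threshold are all hypothetical, standing in for whatever training-data analysis the paper performs.

```python
from collections import Counter

def split_lexicon(word_syllables, training_words, min_count=50):
    """Assign each word syllable- or phone-level units by training coverage.

    word_syllables: dict mapping each word to its list of syllable units.
    training_words: iterable of word tokens from the training transcripts.
    min_count: hypothetical threshold below which a syllable is considered
               too sparse to train robustly, so the word backs off to
               CD-phone units.
    """
    # Count how often each syllable unit occurs in the training data.
    syl_counts = Counter(
        syl for w in training_words for syl in word_syllables.get(w, [])
    )
    assignment = {}
    for word, syls in word_syllables.items():
        # Keep syllable units only if every syllable of the word is
        # well covered; otherwise fall back to phone-level modeling.
        if syls and all(syl_counts[s] >= min_count for s in syls):
            assignment[word] = "syllable"
        else:
            assignment[word] = "phone"
    return assignment

# Toy usage with a made-up two-word lexicon:
lexicon = {"hello": ["hh_ah", "l_ow"], "rare": ["r_eh_r"]}
train = ["hello"] * 60 + ["rare"] * 2
print(split_lexicon(lexicon, train))  # → {'hello': 'syllable', 'rare': 'phone'}
```

The choice of a per-syllable count threshold is one of several plausible criteria; the paper's analysis could equally weigh total frame counts or unit confusability.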
Pages: 772 - 775
Page count: 4
Related Papers
50 records in total
  • [1] Investigating The Use Of Syllable Acoustic Units For Amharic Speech Recognition
    Dribssa, Adey Edessa
    Tachbelie, Martha Yifiru
    [J]. PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON), 2015,
  • [2] Speech recognition using syllable-like units
    Hu, ZH
    Schalkwyk, J
    Barnard, E
    Cole, R
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 1117 - 1120
  • [3] A neural network using acoustic sub-word units for continuous speech recognition
    Yu, HJ
    Oh, YH
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 506 - 509
  • [4] Boosting Thai Syllable Speech Recognition Using Acoustic Models Combination
    Tangwongsan, Supachai
    Phoophuangpairoj, Rong
    [J]. ICCEE 2008: PROCEEDINGS OF THE 2008 INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, 2008, : 568 - 572
  • [5] Syllable-Based Speech Recognition Using EMG
    Lopez-Larraz, Eduardo
    Mozos, Oscar M.
    Antelis, Javier M.
    Minguez, Javier
    [J]. 2010 ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2010, : 4699 - 4702
  • [6] Dichotic Speech Recognition Using CVC Word and Nonsense CVC Syllable Stimuli
    Findlen, Ursula M.
    Roup, Christina M.
    [J]. JOURNAL OF THE AMERICAN ACADEMY OF AUDIOLOGY, 2011, 22 (01) : 13 - 22
  • [7] Using Syllables as Acoustic Units for Spontaneous Speech Recognition
    Hejtmanek, Jan
    [J]. TEXT, SPEECH AND DIALOGUE, 2010, 6231 : 299 - 305
  • [8] Speech emotion recognition based on syllable-level feature extraction
    Rehman, Abdul
    Liu, Zhen-Tao
    Wu, Min
    Cao, Wei-Hua
    Jiang, Cheng-Shan
    [J]. APPLIED ACOUSTICS, 2023, 211
  • [9] Deep Neural Networks for Syllable based Acoustic Modeling in Chinese Speech Recognition
    Li, Xiangang
    Hong, Caifu
    Yang, Yuning
    Wu, Xihong
    [J]. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013,
  • [10] Unsupervised word discovery from speech using automatic segmentation into syllable-like units
    Rasanen, Okko
    Doyle, Gabriel
    Frank, Michael C.
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3204 - 3208