A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system

Cited by: 4
Authors
Ri, Hyok-Chol [1]
Affiliations
[1] KIM IL SUNG Univ, Coll Informat Sci, Pyongyang, North Korea
Keywords
Recognition unit; Language model; Morpheme; Syllable
DOI
10.1007/s10772-019-09637-2
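Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract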
In large vocabulary continuous speech recognition (LVCSR), choosing an appropriate recognition unit is important for system performance. In Korean continuous speech recognition, the morph rather than the word is typically used as the recognition unit because Korean is agglutinative, and good performance is obtained by combining high-frequency morph sequences; however, this increases the vocabulary size and leads to a high out-of-vocabulary (OOV) rate. Sub-lexical units such as syllables and graphones are widely used for inflectional languages, but they have not been introduced successfully into Korean speech recognition because they carry little linguistic information. In this paper, we investigate the use of the syllable unit to resolve the mismatch between recognition unit and vocabulary size that frequently occurs in Korean large vocabulary speech recognition. We apply local segmentation into syllables based on morphological statistics and perform experiments with a language model (LM) built from mixed unit types: morphemes, combined morphemes, and syllables. Compared with a traditional LM consisting of morphemes and combined morphemes, the proposed model yields an absolute reduction of around 0.4% in word error rate (WER).
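The idea of a mixed-unit LM vocabulary can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify which morphological statistics drive the segmentation, so the sketch assumes a simple frequency threshold (the hypothetical `min_count` parameter) and a hypothetical helper `to_mixed_units`. The toy corpus and its morpheme segmentation are also made up for illustration. The only language fact relied on is that each precomposed Hangul character is one syllable block, so splitting a morpheme into characters gives its syllables.

```python
from collections import Counter


def to_mixed_units(morphemes, counts, min_count=2):
    """Keep frequent morphemes whole; split rare ones into syllables.

    Assumption: a plain frequency threshold stands in for the
    morphological statistics mentioned in the abstract.
    """
    units = []
    for morpheme in morphemes:
        if counts[morpheme] >= min_count:
            units.append(morpheme)
        else:
            # Each precomposed Hangul character is one syllable block.
            units.extend(list(morpheme))
    return units


# Toy, hand-segmented corpus (hypothetical data).
corpus = [
    ["학교", "에", "가", "다"],
    ["학교", "에서", "공부", "하", "다"],
]

# Morpheme frequencies over the whole corpus.
counts = Counter(m for sentence in corpus for m in sentence)

# LM training text rewritten in mixed morpheme/syllable units.
mixed = [to_mixed_units(sentence, counts) for sentence in corpus]

for sentence in mixed:
    print(" ".join(sentence))
```

In this sketch the rare morphemes fall back to syllables, so an unseen morpheme at test time could still be covered by syllable units, which is the kind of OOV relief the abstract is after. A real system would also need a way to rejoin syllable units into words after decoding.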
Pages: 971-977
Number of pages: 7
Related papers
50 in total
  • [1] A usage of the syllable unit based on morphological statistics in Korean large vocabulary continuous speech recognition system
    Hyok-Chol Ri
    [J]. International Journal of Speech Technology, 2019, 22 : 971 - 977
  • [2] Syllable-based large vocabulary continuous speech recognition
    Ganapathiraju, A
    Hamaker, J
    Picone, J
    Ordowski, M
    Doddington, GR
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (04) : 358 - 366
  • [3] Syllable based language model for large vocabulary continuous speech recognition of Uyghur
    [J]. Silamu, W. (wushour@xju.edu.cn), 1600, Tsinghua University (53):
  • [4] Syllable Based Language Model for Large Vocabulary Continuous Speech Recognition of Polish
    Majewski, Piotr
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 397 - 401
  • [5] Korean large vocabulary continuous speech recognition with morpheme-based recognition units
    Kwon, OW
    Park, J
    [J]. SPEECH COMMUNICATION, 2003, 39 (3-4) : 287 - 300
  • [6] The RWTH large vocabulary continuous speech recognition system
    Ney, H
    Welling, L
    Ortmanns, S
    Beulen, K
    Wessel, F
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998 : 853 - 856
  • [7] A Myanmar Large Vocabulary Continuous Speech Recognition System
    Naing, Hay Mar Soe
    Hlaing, Aye Mya
    Pa, Win Pa
    Hu, Xinhui
    Thu, Ye Kyaw
    Hori, Chiori
    Kawai, Hisashi
    [J]. 2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015 : 320 - 327
  • [8] Morpheme-based modeling of pronunciation variation for large vocabulary continuous speech recognition in Korean
    Lee, Kyong-Nim
    Chung, Minhwa
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (07) : 1063 - 1072
  • [9] A large vocabulary continuous speech recognition system for Persian language
    Sameti, Hossein
    Veisi, Hadi
    Bahrani, Mohammad
    Babaali, Bagher
    Hosseinzadeh, Khosro
    [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011 : 1 - 12
  • [10] A LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION SYSTEM WITH HIGH PREDICTABILITY
    SHIGENAGA, M
    SEKIGUCHI, Y
    YAMAGUCHI, T
    MASUDA, R
    [J]. IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, 1991, 74 (07) : 1817 - 1825