An Unsupervised Learning and Statistical Approach for Vietnamese Word Recognition and Segmentation

Cited by: 0
Authors
Trung, Hieu Le [1 ]
Vu Le Anh [2 ]
Trung, Kien Le [3 ]
Affiliations
[1] St Petersburg State Univ, St Petersburg, Russia
[2] Hoa Sen Univ, Ho Chi Minh City, Vietnam
[3] Ernst Moritz Arndt Univ Greifswald, Inst Math, Greifswald, Germany
Source
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, PROCEEDINGS | 2010 / Vol. 5991
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses two main topics: (i) recognizing Vietnamese words and segmenting sentences into words using probabilistic models; (ii) constructing the optimum probabilistic model through an unsupervised learning process. For each probabilistic model, new words are recognized and their syllables are linked together. The syllable-linking process improves the accuracy of the statistical functions, which in turn improves the recognition of new words. Hence, the probabilistic model converges to the optimum one. Our experimental corpus is generated from about 250,000 online news articles, which consist of about 19,000,000 sentences. The accuracy of the segmentation algorithm is over 90%. Our Vietnamese word and phrase dictionary contains more than 150,000 entries.
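The loop described in the abstract (score syllable pairs, link the strong ones, then re-estimate the statistics on the linked text) can be illustrated with a minimal sketch. The abstract does not say which statistical functions the authors use, so pointwise mutual information (PMI) over adjacent tokens is an assumption here, as are the names `link_pass` and `segment` and the thresholds `min_count` and `min_pmi`. This is not the authors' implementation.

```python
import math
from collections import Counter


def link_pass(sentences, min_count=2, min_pmi=1.0):
    """One pass: score adjacent token pairs by PMI and link the strong ones.

    PMI is an assumed stand-in for the paper's unspecified statistical functions.
    """
    unigrams = Counter(tok for s in sentences for tok in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    def pmi(pair):
        p_xy = bigrams[pair] / total_bi
        p_x = unigrams[pair[0]] / total_uni
        p_y = unigrams[pair[1]] / total_uni
        return math.log2(p_xy / (p_x * p_y))

    # Pairs that are both frequent enough and strongly associated.
    strong = {p for p, c in bigrams.items()
              if c >= min_count and pmi(p) >= min_pmi}

    linked = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            # Greedy left-to-right linking of a strong pair into one token.
            if i + 1 < len(s) and (s[i], s[i + 1]) in strong:
                out.append(s[i] + "_" + s[i + 1])
                i += 2
            else:
                out.append(s[i])
                i += 1
        linked.append(out)
    return linked, strong


def segment(sentences, max_iters=5, **thresholds):
    """Repeat linking until a pass finds no new pairs (hypothetical stopping rule)."""
    for _ in range(max_iters):
        sentences, strong = link_pass(sentences, **thresholds)
        if not strong:
            break
    return sentences
```

Because each pass recounts statistics over already-linked tokens, a later pass can join a linked unit with a neighbouring syllable, which is one way to read the abstract's claim that linking improves the statistics and the statistics improve recognition.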
Pages: 195 - +
Number of pages: 2
Related papers
50 items in total
  • [21] The link between statistical segmentation and word learning in adults
    Mirman, Daniel
    Magnuson, James S.
    Estes, Katharine Graf
    Dixon, James A.
    COGNITION, 2008, 108 (01) : 271 - 280
  • [22] Predicting Reaction Times in Word Recognition by Unsupervised Learning of Morphology
    Virpioja, Sami
    Lehtonen, Minna
    Hulten, Annika
    Salmelin, Riitta
    Lagus, Krista
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I, 2011, 6791 : 275 - +
  • [23] Chinese Word Segmentation of Ideological and Political Education Based on Unsupervised Learning
    Zang, Wen-jing
    Yang, Xing-hai
    Liu, Zi-zhao
    Zhang, Yu-lin
    PROCEEDINGS OF 2019 2ND INTERNATIONAL CONFERENCE ON BIG DATA TECHNOLOGIES (ICBDT 2019), 2019, : 109 - 113
  • [24] A Lexicon-Corpus-based Unsupervised Chinese Word Segmentation Approach
    Lu Pengyu
    Pu Jingchuan
    Du Mingming
    Lou Xiaojuan
    Jin Lijun
    INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS, 2014, 7 (01): : 263 - 282
  • [25] Unsupervised microstructure segmentation by mimicking metallurgists’ approach to pattern recognition
    Hoheok Kim
    Junya Inoue
    Tadashi Kasuya
    Scientific Reports, 10
  • [26] Unsupervised microstructure segmentation by mimicking metallurgists' approach to pattern recognition
    Kim, Hoheok
    Inoue, Junya
    Kasuya, Tadashi
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [27] An unsupervised statistical representation learning method for human activity recognition
    Abdi, Mohammad Foad
    BabaAli, Bagher
    Momeni, Saleh
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (10) : 7041 - 7052
  • [28] Word Segmentation as Unsupervised Constituency Parsing
    Alhama, Raquel G.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4103 - 4112
  • [29] Contextual Dependencies in Unsupervised Word Segmentation
    Goldwater, Sharon
    Griffiths, Thomas L.
    Johnson, Mark
    COLING/ACL 2006, VOLS 1 AND 2, PROCEEDINGS OF THE CONFERENCE, 2006, : 673 - 680
  • [30] Word segmentation of Vietnamese texts: a comparison of approaches
    Dinh Quang Thang
    Le Hong Phuong
    Nguyen Thi Minh Huyen
    Nguyen Cam Tu
    Rossignol, Mathias
    Vu Xuan Luong
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1933 - 1936