Mongolian word segmentation system based on unsupervised statistical model

Authors
Wang, Siriguleng [1]
Bao, Meirong [1]
Arong [1]
Affiliation
[1] Inner Mongolia Normal University, College of Computer and Information Engineering, Hohhot, Inner Mongolia, People's Republic of China
Keywords
Mongolian word segmentation; unsupervised statistical model; machine translation
DOI
10.2495/ISME20130911
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This article describes an attempt to build a word segmentation system that combines an affix library with an unsupervised statistical model. The unsupervised model on its own caused over-segmentation and incomplete segmentation; we addressed both problems by post-processing the segmented Mongolian words with Mongolian reduction rules, which improved accuracy markedly. On a 500-sentence test set, system accuracy increased from 62.6% to 76.1%. The biggest advantage of this segmentation system is that it can be trained directly on a raw corpus.
Pages: 707-714
Page count: 8