An improved Dijkstra algorithm in Chinese Word Segmentation

Cited by: 0
Authors
Zhang Xueyan [1]
Xue Xiao
Yang Shenggang
Zhao Limei [1]
Affiliation
[1] Ningbo TV & Radio Univ, Dept Informat Technol, Ningbo, Zhejiang, Peoples R China
Source
ITESS: 2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES, PT 2 | 2008
Keywords
Chinese word segmentation; APSP; precision; recall
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents a pragmatic approach to Chinese word segmentation that applies an improved Dijkstra algorithm to a stream of Chinese characters. The input stream is processed in three steps. First, all possible candidate words are found by dictionary lookup. Second, an improved version of Dijkstra's APSP (all-pairs shortest path) method finds the word path with the least cost, by computing each candidate word's best left neighbor and its aggregate cost. The key step in this procedure is to select, from all candidate words that can end the input, the one with the minimum aggregate cost. Third, runs of consecutive single-character candidate words in the result of step two are checked to see whether they form an entity name. The third step is the main difference from most previous work; it improves both the efficiency of recognizing entity names and the precision of recognizing new words. The approach has been implemented in a search engine demo system based on Lucene. It consists of lexicon word processing and OOV (out-of-vocabulary) word recognition. Experiments show that the algorithm works well.
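The abstract does not state the cost function or what the improvement to Dijkstra's algorithm consists of, so the following is only a minimal sketch of the three steps under assumed details, not the authors' implementation. It runs a plain priority-queue Dijkstra over character boundaries, with each candidate word as an edge. The names `find_candidates`, `least_cost_path`, `single_char_runs`, and `UNKNOWN_CHAR_COST`, the `max_len` limit, and the toy lexicon costs are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

UNKNOWN_CHAR_COST = 10.0  # assumed penalty for a character missing from the lexicon


def find_candidates(text: str, lexicon: Dict[str, float],
                    max_len: int = 4) -> Dict[int, List[Tuple[int, float]]]:
    """Step 1: map each start position to the (end, cost) of every candidate word."""
    edges: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(text))}
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                edges[start].append((end, lexicon[word]))
            elif end == start + 1:
                # Keep unknown single characters so a complete path always exists.
                edges[start].append((end, UNKNOWN_CHAR_COST))
    return edges


def least_cost_path(text: str, lexicon: Dict[str, float]) -> List[str]:
    """Step 2: Dijkstra over boundaries 0..len(text); returns the cheapest word path."""
    n = len(text)
    edges = find_candidates(text, lexicon)
    cost = [float("inf")] * (n + 1)  # least aggregate cost of reaching each boundary
    left = [-1] * (n + 1)            # best left neighbor: start of the word ending here
    cost[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        c, pos = heapq.heappop(heap)
        if pos == n:
            break                    # end of input reached with minimum aggregate cost
        if c > cost[pos]:
            continue                 # stale heap entry
        for end, w in edges[pos]:
            if c + w < cost[end]:
                cost[end] = c + w
                left[end] = pos
                heapq.heappush(heap, (c + w, end))
    # Walk the best-left-neighbor links back from the end of the input.
    words: List[str] = []
    pos = n
    while pos > 0:
        words.append(text[left[pos]:pos])
        pos = left[pos]
    return words[::-1]


def single_char_runs(words: List[str]) -> List[str]:
    """Step 3: join runs of two or more single-character words as entity-name candidates."""
    runs: List[str] = []
    current: List[str] = []
    for w in words + [""]:           # the empty sentinel flushes the final run
        if len(w) == 1:
            current.append(w)
        else:
            if len(current) >= 2:
                runs.append("".join(current))
            current = []
    return runs


# Toy lexicon with made-up costs (lower means more preferred).
toy_lexicon = {"研究": 1.0, "研究生": 1.0, "生命": 1.0, "命": 2.0, "的": 1.0, "起源": 1.0}
print(least_cost_path("研究生命的起源", toy_lexicon))
```

With these toy costs the path 研究 / 生命 / 的 / 起源 has aggregate cost 4, which beats 研究生 / 命 / 的 / 起源 at cost 5. How a run returned by `single_char_runs` would then be validated as an entity name is not described in the abstract and is left out of the sketch.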
Pages: 909-914
Page count: 6