An efficient Chinese word segmentation algorithm for Chinese information processing on the Internet

Cited by: 0
Authors
Wong, PK [1]
Affiliation
[1] Hong Kong Inst Vocat Educ Sha Tin, Dept Comp Studies, Hong Kong Vocat Training Council, Hong Kong, Hong Kong, Peoples R China
Source
INTERNET APPLICATIONS | 1999 / Vol. 1749
Keywords
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper proposes a Chinese word segmentation algorithm based on forward maximum matching and word binding force. To support the algorithm, a text corpus of over 63 million characters is used to enrich an 80,000-word lexicon in terms of its word entries and word binding forces. In its current state, given a line of text as input, the word segmentor processes on average 210,000 characters per second when running on an IBM RISC System/6000 3BT workstation, with a correct word identification rate of 99.74%. The proposed algorithm can be applied to process the huge amount of Chinese information on the Internet.
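For readers unfamiliar with forward maximum matching, the sketch below illustrates the general technique named in the abstract: scan the text left to right and, at each position, take the longest lexicon entry that matches. It is a minimal illustration only, not the paper's implementation. The function name, the toy lexicon, and the single-character fallback are assumptions made for this example, and the word-binding-force step is omitted because the abstract does not describe how it is computed.

```python
def forward_maximum_match(text, lexicon, max_word_len=None):
    """Segment text greedily, left to right, preferring the longest lexicon match.

    Hypothetical helper for illustration; the paper's actual data structures
    and its word-binding-force scoring are not reproduced here.
    """
    if max_word_len is None:
        # Longest entry in the lexicon bounds the candidate length.
        max_word_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # Assumption: an unmatched single character is emitted as its own word.
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon, invented for this example (not taken from the paper).
toy_lexicon = {"中文", "信息", "处理", "信息处理", "互联网"}
print(forward_maximum_match("中文信息处理", toy_lexicon))
# prints ['中文', '信息处理']
```

Because the longest match is taken first, "信息处理" is kept as one word rather than split into "信息" and "处理". Pure greedy matching can mis-segment ambiguous strings, which is presumably where a measure such as word binding force would come in.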
Pages: 427-432
Page count: 6
Related papers
  • [1] Word segmentation in Chinese language processing
    Shu, Xinxin
    Wang, Junhui
    Shen, Xiaotong
    Qu, Annie
    [J]. Statistics and Its Interface, 2017, 10 (02) : 165 - 173
  • [2] Models and algorithm of Chinese word segmentation
    Wang, X
    Fu, G
    Yeung, DS
    Liu, JNK
    Luk, R
    [J]. IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 1279 - 1284
  • [3] The role of semantic information in Chinese word segmentation
    Chen, Ruqi
    Huang, Linjieqiong
    Perea, Manuel
    Li, Xingshan
    [J]. LANGUAGE COGNITION AND NEUROSCIENCE, 2024,
  • [4] Improved fast algorithm for Chinese word segmentation
    Chen, Guilin
    Wang, Yongcheng
    Han, Kesong
    Wang, Gang
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2000, 37 (04) : 418 - 424
  • [5] Improved fast algorithm for Chinese word segmentation
    Chen, Guilin
    Wang, Yongcheng
    Han, Kesong
    Wang, Gang
    [J]. 2000, Sci Press (37):
  • [6] Maximum likelihood algorithm on Chinese word segmentation
    Lo, WS
    Wong, PF
    Siu, MH
    [J]. 2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 468 - 471
  • [7] An improved Dijkstra algorithm in Chinese Word Segmentation
    Zhang Xueyan
    Xue Xiao
    Yang Shenggang
    Zhao Limei
    [J]. ITESS: 2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES, PT 2, 2008, : 909 - 914
  • [8] Chinese readers utilize emotion information for word segmentation
    Huang, Linjieqiong
    Zhang, Xiangyang
    Li, Xingshan
    [J]. PSYCHONOMIC BULLETIN & REVIEW, 2024, 31 (04) : 1548 - 1557
  • [9] Chinese word segmentation and its effect on information retrieval
    Foo, S
    Li, H
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2004, 40 (01) : 161 - 190
  • [10] Research for Chinese word segmentation algorithm on GPU platform
    Liu, Yong
    Luo, Liping
    Wang, Zongshui
    Huang, Dongping
    Huang, Jingxing
    Jia, Lianyin
    [J]. Journal of Computational Information Systems, 2012, 8 (15) : 6515 - 6522