Chinese word segmentation and its effect on information retrieval

Cited by: 51
Authors
Foo, S [1]
Li, H [1]
Affiliations
[1] Nanyang Technol Univ, Sch Commun & Informat, Div Informat Studies, Singapore 637718, Singapore
Keywords
Chinese; information retrieval; word segmentation; retrieval effectiveness
DOI
10.1016/S0306-4573(02)00079-1
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A set of IR experiments was carried out at the Division of Information Studies, Nanyang Technological University, Singapore, to study Chinese word segmentation and its effect on information retrieval (IR). A total of four automatic character-based segmentation approaches and a manual word segmentation approach were first carried out to obtain the word segments for indexing and to evaluate the segmentation accuracy of the automatic approaches. The IR experiments study both the influence of different document segmentation approaches on IR effectiveness and the methods used for query segmentation. Traditional recall and precision measures were used to gauge IR effectiveness. A number of queries were selected and subjected to detailed analysis to further explore the influence of word segmentation on IR. The findings reveal that the segmentation approach has an effect on IR effectiveness. Better IR results are obtained by using the same method for query and document processing, as this increases the probability of a query-document match. The recognition of a higher number of 2-character words generally contributes to the improvement of IR effectiveness. However, manual segmentation does not always work better than character-based segmentation, as a result of the existence of longer words with more than two characters. No evidence is found that ambiguous words resulting from the segmentation process significantly affect IR. © 2002 Elsevier Ltd. All rights reserved.
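As a rough illustration of the ideas in the abstract, the sketch below segments text into overlapping character n-grams, applies the same segmentation to query and documents, and scores the result with precision and recall. The abstract does not name the four character-based approaches, so the choice of unigrams and bigrams, the function names, and the three toy documents are all assumptions made for illustration and are not taken from the paper.

```python
def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (n=1 gives single characters)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < n:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def retrieve(query, docs, n=2):
    """Return ids of documents that share at least one n-gram with the query.

    Query and documents are segmented with the same n (an assumed stand-in
    for "the same method for query and document processing").
    """
    query_terms = set(char_ngrams(query, n))
    return {
        doc_id
        for doc_id, text in docs.items()
        if query_terms & set(char_ngrams(text, n))
    }


def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection and relevance judgments (not from the paper).
docs = {
    "d1": "信息检索系统",  # information retrieval system
    "d2": "中文信件分词",  # shares only the single character 信 with the query
    "d3": "检索效果评估",  # retrieval effectiveness evaluation
}
relevant = {"d1", "d3"}
query = "信息检索"  # information retrieval

bigram_hits = retrieve(query, docs, n=2)
unigram_hits = retrieve(query, docs, n=1)

print("bigram :", sorted(bigram_hits), precision_recall(bigram_hits, relevant))
print("unigram:", sorted(unigram_hits), precision_recall(unigram_hits, relevant))
```

In this toy collection, single-character matching also retrieves d2 through the shared character 信, which lowers precision, while bigram matching retrieves only d1 and d3. This says nothing about the sizes of the effects measured in the study.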
Pages: 161-190
Page count: 30