Research on Enhancing the Effectiveness of the Chinese Text Automatic Categorization Based on ICTCLAS Segmentation Method

Cited by: 0
Authors
Li, Xiangdong [1]
Zhang, Cheng [2]
Affiliations
[1] Wuhan Univ, Sch Informat Management, Ctr Studies Informat Resources, Wuhan 430072, Peoples R China
[2] Wuhan Univ, Sch Informat Management, Wuhan 430072, Peoples R China
Keywords
Chinese segmentation; text automatic categorization; classification effect; mix; high information
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This article proposes a method for improving the classification performance of the ICTCLAS segmentation method: feature items with low category-discrimination capacity in the ICTCLAS segmentation result are replaced with feature items of higher category-discrimination capacity drawn from the 2-gram segmentation result. Experiments with the KNN categorization algorithm and the Naive Bayes text categorization method show that the approach works well on the Fudan University corpus. The article also analyzes, through testing, why the method is relatively ineffective on the Sogou Laboratory corpus.
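The feature-mixing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how category-discrimination capacity is measured, so the score used here (a term's largest share of document frequency within a single category) is an assumption, as are all function names. ICTCLAS itself is not called; its output is assumed to arrive as pre-segmented token lists.

```python
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return the overlapping 2-character grams of a string (2-gram segmentation)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def discrimination_scores(docs, labels):
    """Score each term by its largest per-category share of document frequency.

    Hypothetical measure chosen for illustration: 1.0 means the term occurs
    in documents of only one category; lower values mean it is spread out.
    docs is a list of token lists, labels the matching category names.
    """
    df = Counter()
    df_by_cat = defaultdict(Counter)
    for terms, label in zip(docs, labels):
        for term in set(terms):
            df[term] += 1
            df_by_cat[label][term] += 1
    return {t: max(c[t] for c in df_by_cat.values()) / df[t] for t in df}


def mix_features(word_scores, bigram_scores, k):
    """Swap the k lowest-scoring word features for the k best unused bigrams."""
    ranked_words = sorted(word_scores, key=word_scores.get, reverse=True)
    kept = ranked_words[:max(len(ranked_words) - k, 0)]
    kept_set = set(kept)
    ranked_bigrams = sorted(bigram_scores, key=bigram_scores.get, reverse=True)
    candidates = [b for b in ranked_bigrams if b not in kept_set]
    return kept + candidates[:k]


if __name__ == "__main__":
    # Toy data: word tokens stand in for ICTCLAS output (assumed input).
    word_docs = [["球", "赛"], ["球", "股"]]
    labels = ["sports", "finance"]
    raw_texts = ["球赛开始", "球股上涨"]
    bigram_docs = [char_bigrams(t) for t in raw_texts]
    words = discrimination_scores(word_docs, labels)
    bigrams = discrimination_scores(bigram_docs, labels)
    print(mix_features(words, bigrams, k=1))
```

The resulting mixed vocabulary would then feed a classifier such as KNN or Naive Bayes; the number of replaced features `k` is a free parameter in this sketch.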
Pages: 267-270
Page count: 4
Related papers
50 in total
  • [21] Chinese text categorization based on CCIPCA and SMO
    Li, Xin-Fu
    He, Hai-Bin
    Zhao, Lei-Lei
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2514 - 2518
  • [22] Research of automatic Chinese word segmentation
    Liu, KY
    Zheng, JH
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 805 - 809
  • [23] Chinese text classification without automatic word segmentation
    Liu, Wei
    Allison, Ben
    Guthrie, David
    Guthrie, Louise
    ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, 2007, : 45 - +
  • [24] A Concept-based Model for Enhancing Text Categorization
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 629 - 637
  • [25] Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization
    Hsin-Chang Yang
    Chung-Hong Lee
    Journal of Intelligent Information Systems, 2005, 25 : 47 - 67
  • [26] Automatic category theme identification and hierarchy generation for Chinese text categorization
    Yang, HC
    Lee, CH
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2005, 25 (01) : 47 - 67
  • [27] A method for automatic determination of the feature vector size for text categorization
    Fragoso, Rogerio C. P.
    Pinheiro, Roberto H. W.
    Cavalcanti, George D. C.
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 259 - 264
  • [28] A method for automatic text categorization using word sense disambiguation
    Montes Rendon, Azucena
    Vargas A., Rocio
    Estrada Esquivel, Hugo
    Gonzalez Serna, Juan G.
    Ruiz Ascencio, Jose
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2008, PT 2, PROCEEDINGS, 2008, 5073 : 1158 - 1169
  • [29] Integration of manual and automatic text categorization. A categorization workbench for text-based email and spam
    Sun, Q
    Schommer, C
    Lang, A
    KI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3238 : 156 - 167
  • [30] Using LSA and text segmentation to improve automatic Chinese dialogue text summarization
    Chuan-han Liu
    Yong-cheng Wang
    Fei Zheng
    De-rong Liu
    Journal of Zhejiang University-SCIENCE A, 2007, 8 : 79 - 87