Automatic Chinese Text Categorization System Based on Mutual Information

Cited by: 0
Authors
Lu, Zhimao [1]
Shi, Hong [1]
Zhang, Qi [1]
Yuan, Chaoyue [1]
Affiliation
[1] College of Information and Communication Engineering, Harbin Engineering University, Harbin, Heilongjiang Province, People's Republic of China
Keywords
Automatic Text Categorization; Feature Selection; Mutual Information; KNN; SVM
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Feature selection is a key step in an automatic text categorization system and has a significant impact on classification results. This paper studies mutual information (MI), one of the basic feature selection methods. First, by analyzing the MI formula theoretically and systematically, we identify three main problems of MI: MI loss, the information difference among categories, and excessive emphasis on low-frequency terms. Then, to solve these three problems, we propose an improved feature selection method that calculates the absolute values of MI and the difference between the maximum and the average of MI. Finally, we test the method with a K-Nearest Neighbor (KNN) classifier and a Support Vector Machine (SVM) classifier, and compare it with the original method on a Chinese corpus. The results demonstrate the effectiveness and feasibility of the proposed method.
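The abstract names the ingredients of the improved score but gives no formulas. As a hedged illustration only, the Python sketch below uses the common document-frequency estimate MI(t, c) = log(P(t|c) / P(t)) and computes three per-term scores: the prior-weighted average of signed MI (the usual MI_avg), the average of absolute MI, and the difference between the maximum and the average MI. The add-one smoothing, the toy corpus, and the helper names (`mi_table`, `aggregate`, `score_terms`) are our own assumptions; the abstract does not say how the authors combine the absolute values with the max-minus-average difference, so this is not the paper's method.

```python
import math
from collections import Counter


def mi_table(docs, labels):
    """Return {term: {category: MI(t, c)}} with MI(t, c) = log(P(t|c) / P(t)).

    Probabilities come from document frequencies with add-one smoothing
    (an assumption, so that a term absent from a category does not give log(0)).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]          # term presence per document
    n_c = Counter(labels)                      # documents per category
    df = Counter(t for s in doc_sets for t in s)                            # docs containing t
    df_c = Counter((t, c) for s, c in zip(doc_sets, labels) for t in s)     # docs in c containing t
    table = {}
    for t in df:
        p_t = (df[t] + 1) / (n + 2)
        table[t] = {
            c: math.log(((df_c[(t, c)] + 1) / (n_c[c] + 2)) / p_t)
            for c in n_c
        }
    return table


def aggregate(mi_row, priors):
    """Combine one term's per-category MI values (definitions are assumptions).

    avg           : prior-weighted average of signed MI; positive and
                    negative values can cancel each other out.
    abs_avg       : the same average taken over |MI|.
    max_minus_avg : largest MI minus the signed average.
    """
    avg = sum(priors[c] * v for c, v in mi_row.items())
    abs_avg = sum(priors[c] * abs(v) for c, v in mi_row.items())
    return {"avg": avg, "abs_avg": abs_avg, "max_minus_avg": max(mi_row.values()) - avg}


def score_terms(docs, labels):
    """Score every term in the corpus; category priors are P(c) = n_c / n."""
    n = len(labels)
    priors = {c: k / n for c, k in Counter(labels).items()}
    return {t: aggregate(row, priors) for t, row in mi_table(docs, labels).items()}


if __name__ == "__main__":
    # Hypothetical, already word-segmented toy corpus (not the paper's data).
    docs = [
        ["football", "match", "the"], ["football", "goal", "the"], ["match", "team", "the"],
        ["stock", "market", "the"], ["stock", "price", "the"], ["market", "bank", "the"],
    ]
    labels = ["sports"] * 3 + ["finance"] * 3
    scores = score_terms(docs, labels)
    for term in ("football", "goal", "the"):
        print(term, {k: round(v, 3) for k, v in scores[term].items()})
```

On this toy corpus the signed average ranks the rarer term `goal` above `football`, because the negative MI of `football` for the finance category cancels its positive MI for sports; both `abs_avg` and `max_minus_avg` rank `football` higher, and the term `the`, which occurs in every document, gets a max-minus-average score of zero.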
Pages: 4986-4990
Number of pages: 5