An enhanced text categorization method based on improved text frequency approach and mutual information algorithm

Cited by: 0
Authors
Pei Zhili [1 ,2 ]
Shi Xiaohu [1 ]
Marchese, Maurizio [3 ]
Liang Yanchun [1 ,3 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
[2] Natl Univ Inner Mongolia, Coll Math & Comp Sci, Tongliao 028043, Peoples R China
[3] Univ Trent, Dept Informat & Commun Technol, I-38050 Povo, TN, Italy
Keywords
text categorization; mutual information; feature selection; characteristic weights; classifier
DOI
Not available
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Text categorization plays an important role in data mining, and feature selection is its most important step. Focusing on feature selection, we present an improved text frequency method that filters out low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf-idf method for evaluating characteristic weights. The proposed method is applied to the benchmark test set Reuters-21578 Top10 to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
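The abstract does not state the improved formulas, so the following is only a minimal sketch of the conventional three-stage pipeline it builds on: document-frequency filtering, pointwise mutual information scoring, and tf-idf weighting, plus the precision, recall, and F1 measures it reports. The function names, the `min_df` threshold, and the toy corpus are all illustrative assumptions, not the authors' method or data.

```python
import math
from collections import Counter


def filter_low_frequency(docs, min_df=2):
    """Drop terms that occur in fewer than min_df documents (baseline filter)."""
    df = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


def mutual_information(docs, labels):
    """Score each term by its maximum pointwise MI over all classes."""
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()
    joint = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            term_count[term] += 1
            joint[(term, label)] += 1
    scores = {}
    for (term, label), n_tc in joint.items():
        # MI(t, c) = log( P(t, c) / (P(t) * P(c)) ), estimated from document counts
        mi = math.log(n_tc * n / (term_count[term] * class_count[label]))
        scores[term] = max(scores.get(term, float("-inf")), mi)
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def tf_idf(docs, vocabulary):
    """Standard tf-idf weights, tf * log(N / df), restricted to the vocabulary."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc) if t in vocabulary)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocabulary)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy corpus: tokenized documents with one class label each.
docs = [
    ["wheat", "grain", "export", "price"],
    ["wheat", "grain", "harvest"],
    ["oil", "crude", "price", "barrel"],
    ["oil", "crude", "export"],
]
labels = ["grain", "grain", "crude", "crude"]

filtered = filter_low_frequency(docs, min_df=2)
scores = mutual_information(filtered, labels)
selected = select_features(scores, k=4)
vectors = tf_idf(filtered, selected)
```

In this toy corpus the terms shared by both classes ("export", "price") get a mutual information score of zero and are not selected, while the class-specific terms are kept. The resulting sparse vectors could then be passed to any classifier.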
Pages: 1494 - 1500
Page count: 7
Related papers
50 in total
  • [31] An optimal Text categorization algorithm based on SVM
    Wang, Ziqiang
    Sun, Xia
    Zhang, Dexian
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS PROCEEDINGS, VOLS 1-4: VOL 1: SIGNAL PROCESSING, 2006, : 2137 - +
  • [32] Algorithm of Text Categorization based on Cloud Computing
    Huang, Liqin
    Lin, Liqun
    Liu, Yanhuang
    [J]. INFORMATION, COMMUNICATION AND ENGINEERING, 2013, 311 : 158 - +
  • [33] The study on Web product reviews mining based on an improved text categorization algorithm
    Hu, Dongbin
    Luo, Lixia
    Xu, Lihua
    [J]. ELECTRONIC-BUSINESS INTELLIGENCE: FOR CORPORATE COMPETITIVE ADVANTAGES IN THE AGE OF EMERGING TECHNOLOGIES & GLOBALIZATION, 2010, 14 : 449 - 455
  • [34] An improved co-training text categorization algorithm based on diversity measures
    Tang, Huan-Ling
    Lin, Zheng-Kui
    Lu, Ming-Yu
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2008, 36 (SUPPL.) : 138 - 143
  • [35] An algorithm for text categorization with SVM
    Hu, J
    Huang, HK
    [J]. 2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 47 - 50
  • [36] An Improved Algorithm for Multiclass Text Categorization with Support Vector Machine
    Shao, Fubo
    He, Guoping
    Zhang, Xin
    [J]. PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008, : 336 - 339
  • [37] An improved sine cosine algorithm to select features for text categorization
    Belazzoug, Mouhoub
    Touahria, Mohamed
    Nouioua, Farid
    Brahimi, Mohammed
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2020, 32 (04) : 454 - 464
  • [38] An improved K-nearest-neighbor algorithm for text categorization
    Jiang, Shengyi
    Pang, Guansong
    Wu, Meiling
    Kuang, Limin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) : 1503 - 1509
  • [39] Text categorization method based on Extension Theory
    Yi, Y
    Zheng, Y
    He, ZS
    Wu, ZF
    [J]. 2003 INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, PROCEEDINGS, 2003, : 646 - 649
  • [40] Text categorization based on frequent patterns with term frequency
    Chen, XY
    Chen, Y
    Wang, L
    Hu, YF
    [J]. PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 1610 - 1615