An enhanced text categorization method based on improved text frequency approach and mutual information algorithm

Cited by: 0
Authors
Maurizio Marchese
Affiliations
[1] Department of Information and Communication Technology, University of Trento, Via Sommarive 14, 38050 Povo (TN), Italy
Funding
National Natural Science Foundation of China
Keywords
text categorization; mutual information; feature selection; characteristic weights; classifier;
DOI
Not available
Chinese Library Classification
TP301.6 [Algorithm theory]
Discipline classification code
081202
Abstract
Text categorization plays an important role in data mining, and feature selection is its most important step. Focusing on feature selection, we present an improved text frequency method that filters low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf.idf method for evaluating characteristic weights. The proposed method is applied to the benchmark test set Reuters-21578 Top10 to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
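This record does not give the formulas for the improved methods, so the sketch below shows only the conventional baselines for the three steps named in the abstract: filtering by document frequency, pointwise mutual information between a term and a category, and tf.idf weighting. All function names, the `min_df` threshold, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count how many documents contain each term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def filter_low_frequency(docs, min_df=2):
    """Drop terms that occur in fewer than min_df documents (assumed threshold)."""
    df = document_frequency(docs)
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs]


def mutual_information(docs, labels, term, category):
    """Conventional pointwise MI: log(P(t, c) / (P(t) * P(c))).

    Returns -inf when the term never occurs in the category.
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    if n_tc == 0:
        return float("-inf")
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(docs):
    """Conventional tf.idf weight: raw term count times log(N / df)."""
    n = len(docs)
    df = document_frequency(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


# Toy corpus (invented for illustration, not from Reuters-21578).
docs = [["oil", "price", "rise"], ["oil", "export"],
        ["wheat", "price"], ["wheat", "harvest"]]
labels = ["crude", "crude", "grain", "grain"]

filtered = filter_low_frequency(docs, min_df=2)
mi_oil = mutual_information(filtered, labels, "oil", "crude")
mi_price = mutual_information(filtered, labels, "price", "crude")
weights = tf_idf(filtered)
```

On this toy corpus, "oil" appears only in the "crude" documents and so scores higher than "price", which is split evenly across both categories. The paper's improved variants presumably modify these baselines, but how they do so is not stated here.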
Pages: 1494-1500
Page count: 7
Related papers
50 in total
  • [1] An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
    Pei Zhili
    Shi Xiaohu
    Marchese, Maurizio
    Liang Yanchun
    [J]. PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2007, 17 (12) : 1494 - 1500
  • [2] Text Categorization Method Based on Improved Mutual Information and Characteristic Weights Evaluation Algorithms
    Pei, Zhili
    Shi, Xiaohu
    Marchese, Maurizio
    Liang, Yanchun
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 4, PROCEEDINGS, 2007, : 87 - +
  • [3] The Improvement Research of Mutual Information Algorithm for Text Categorization
    Kai, Lu
    Li, Chen
    [J]. KNOWLEDGE ENGINEERING AND MANAGEMENT, ISKE 2013, 2014, 278 : 225 - 232
  • [4] An improved text categorization algorithm based on VSM
    Geng, Ji
    Lu, Yunling
    Chen, Wei
    Qin, Zhiguang
    [J]. 2014 IEEE 17TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, : 1701 - 1706
  • [5] Feature selection algorithm for text classification based on improved mutual information
    丛帅
    张积宾
    徐志明
    王宇颖
    [J]. Journal of Harbin Institute of Technology(New series), 2011, (03) : 144 - 148
  • [6] Feature selection algorithm for text classification based on improved mutual information
    丛帅
    张积宾
    徐志明
    王宇颖
    [J]. Journal of Harbin Institute of Technology., 2011, 18 (03) - 148
  • [7] Automatic Chinese Text Categorization System Based on Mutual Information
    Lu, Zhimao
    Shi, Hong
    Zhang, Qi
    Yuan, Chaoyue
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, VOLS 1-7, CONFERENCE PROCEEDINGS, 2009, : 4986 - 4990
  • [8] Item Categorization Algorithm Based on Improved Text Representation
    Zhenchao, Tu
    Jing, Ma
    [J]. Data Analysis and Knowledge Discovery, 2022, 6 (05) : 34 - 43
  • [9] Improved Mutual Information Method For Text Feature Selection
    Ding Xiaoming
    Tang Yan
    [J]. PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013, : 163 - 166
  • [10] An Improved Parallel Algorithm for Text Categorization
    Yang, Wenchuan
    Fu, Yimin
    Zhang, Dong
    [J]. 2016 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C), 2016, : 451 - 454