A Novel Feature Weight Algorithm for Text Categorization

被引:0
|
作者
Shang, Wenqian
Dong, Hongbin
Zhu, Haibin
Wang, Yongbin
机构
关键词
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
With the development of the web, large numbers of documents are put onto the Internet. More and more digital libraries, news sources and inner data of companies are available. Automatic text categorization becomes more and more important for dealing with massive data. However, text preprocessing is still the bottleneck of text categorization based on Vector Space Model (VSM). The result of text preprocessing directly affects the performance and precision of categorization. Moreover, feature selection and feature weight become the major obstacles of text preprocessing. In this paper, we mainly focus on feature weight. We present a novel feature weight algorithm-TF-Gini that can improve the categorization peformance significantly. The experiment results verify the effectiveness of this algorithm.
引用
收藏
页码:269 / 275
页数:7
相关论文
共 50 条
  • [1] A novel feature selection algorithm for text categorization
    Shang, Wenqian
    Huang, Houkuan
    Zhu, Haibin
    Lin, Yongmin
    Qu, Youli
    Wang, Zhihai
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) : 1 - 5
  • [2] Novel feature selection algorithm for Chinese text categorization based on CHI
    Cai Zhenliang
    Wang Jian
    Liu Jiqiang
    [J]. PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016, : 1035 - 1039
  • [3] A comparative study on feature weight in text categorization
    Deng, ZH
    Tang, SW
    Yang, DQ
    Zhang, M
    Li, LY
    Xie, KQ
    [J]. ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 588 - 597
  • [4] Oscillating feature subset search algorithm for text categorization
    Novovicova, Jana
    Somol, Petr
    Pudil, Pavel
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2006, 4225 : 578 - 587
  • [5] An Improved Strategy of the Feature Selection Algorithm for the Text Categorization
    Yang, Jieming
    Lu, Yixin
    Liu, Zhiying
    [J]. 2019 20TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2019, : 3 - 7
  • [6] CHINESE TEXT CATEGORIZATION STUDY BASED ON FEATURE WEIGHT LEARNING
    Zhan, Yan
    Chen, Hao
    Zhang, Su-Fang
    Zheng, Mei
    [J]. PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 1723 - +
  • [7] GU metric - A new feature selection algorithm for text categorization
    Uchyigit, Gulden
    Clark, Keith
    [J]. ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2007, : 399 - 402
  • [8] Class-dependent feature selection algorithm for text categorization
    Fragoso, Rogerio C. P.
    Pinheiro, Roberto H. W.
    Cavalcanti, George D. C.
    [J]. 2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3508 - 3515
  • [9] A novel approach for text categorization by applying hybrid genetic bat algorithm through feature extraction and feature selection methods
    Eliguzel, Nazmiye
    Cetinkaya, Cihan
    Dereli, Tuerkay
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 202
  • [10] Lazy learner text categorization algorithm based on embedded feature selection
    Yan Peng
    Zheng Xuefeng
    Zhu Jianyong
    Xiao Yunhong
    [J]. JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2009, 20 (03) : 651 - 659