Study On Feature Selection And Weighting Based On Synonym Merge In Text Categorization

Cited by: 3
Authors
Lu, Zhenyu [1 ]
Lin, Yongmin [1 ]
Zhao, Shuang [1 ]
Chen, Xuebin [2 ]
Affiliations
[1] Hebei Polytech Univ, Coll Econ & Management, Tangshan, Peoples R China
[2] Hebei Polytech Univ, Coll Sci, Tangshan, Peoples R China
Keywords
text categorization; feature selection; feature weighting; entropy; TongYiCi CiLin; synonym merge
DOI
10.1109/ICFN.2010.70
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on a thesaurus named TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of a feature term in the classifier according to the term's strength. Experiments show that our method performs much better than several traditional feature selection methods and improves the performance of text categorization systems.
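The abstract does not give the paper's weight function, so the following is only a rough sketch of the two steps it names: merging synonyms into shared concepts, then weighting each feature by term frequency and entropy. The toy thesaurus `SYNONYM_TO_CONCEPT`, the function names, and the specific formula (log-scaled term frequency multiplied by one minus the normalized class entropy) are all assumptions made for illustration, not the authors' method. A real system would look up concept codes in TongYiCi CiLin instead of the small English dictionary used here.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: word -> concept code.
# Stands in for a TongYiCi CiLin lookup; the entries are invented.
SYNONYM_TO_CONCEPT = {
    "car": "C_vehicle",
    "automobile": "C_vehicle",
    "happy": "C_joy",
    "glad": "C_joy",
}


def merge_synonyms(tokens):
    """Replace each token by its concept code; unknown tokens are kept as-is."""
    return [SYNONYM_TO_CONCEPT.get(t, t) for t in tokens]


def class_entropy(feature, docs, labels):
    """Normalized entropy (0..1) of a feature's occurrences across classes.

    0 means the feature occurs in a single class only; 1 means it is
    spread evenly over all classes (or never occurs at all).
    """
    counts = Counter()
    for doc, label in zip(docs, labels):
        counts[label] += doc.count(feature)
    total = sum(counts.values())
    n_classes = len(set(labels))
    if total == 0 or n_classes < 2:
        return 1.0  # treat as uninformative
    h = -sum((c / total) * math.log(c / total)
             for c in counts.values() if c > 0)
    return h / math.log(n_classes)


def entropy_weight(feature, doc, docs, labels):
    """Assumed weight: log(1 + tf) * (1 - normalized class entropy)."""
    tf = doc.count(feature)
    return math.log(1 + tf) * (1.0 - class_entropy(feature, docs, labels))


# Tiny labeled corpus (invented), merged into concept space first.
raw_docs = [
    ["car", "fast", "happy"],
    ["automobile", "fast"],
    ["glad", "fast"],
    ["happy", "fast"],
]
labels = ["auto", "auto", "mood", "mood"]
docs = [merge_synonyms(d) for d in raw_docs]

w_vehicle = entropy_weight("C_vehicle", docs[0], docs, labels)
w_fast = entropy_weight("fast", docs[0], docs, labels)
```

In this toy corpus, "car" and "automobile" collapse into one feature that appears only in the "auto" class, so it keeps its full term-frequency weight, while "fast" is spread evenly across both classes and its weight drops to zero. This illustrates how merging reduces sparseness and how an entropy term can damp noisy features under the assumed formula.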
Pages: 105-109
Number of pages: 5