Study On Feature Selection And Weighting Based On Synonym Merge In Text Categorization

Cited by: 3
Authors
Lu, Zhenyu [1 ]
Lin, Yongmin [1 ]
Zhao, Shuang [1 ]
Chen, Xuebin [2 ]
Affiliations
[1] Hebei Polytech Univ, Coll Econ & Management, Tangshan, Peoples R China
[2] Hebei Polytech Univ, Coll Sci, Tangshan, Peoples R China
Keywords
text categorization; feature selection; feature weighting; entropy; TongYiCi CiLin; synonym merge
DOI
10.1109/ICFN.2010.70
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of each feature term in the classifier according to the term's strength. Experiments show that our method performs much better than several traditional feature selection methods and that it improves the performance of text categorization systems.
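The abstract does not give the weight function itself, so the following is only a minimal sketch of the general idea, under stated assumptions: synonyms are merged into concept codes through a small hypothetical dictionary (`SYNONYM_TO_CONCEPT`, standing in for TongYiCi CiLin), and term frequency is scaled by one minus the normalized entropy of the feature's distribution over classes. The function names and the exact formula are illustrative choices, not the paper's definition.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: maps each word to a concept code.
# A real system would load this mapping from TongYiCi CiLin instead.
SYNONYM_TO_CONCEPT = {
    "car": "C1", "automobile": "C1",
    "price": "C2", "cost": "C2",
}


def merge_synonyms(tokens):
    """Replace each token by its concept code; unknown tokens are kept as-is."""
    return [SYNONYM_TO_CONCEPT.get(t, t) for t in tokens]


def entropy_weights(docs_by_class):
    """Return {feature: weight}, with weight = 1 - H / log(number of classes).

    H is the entropy of the feature's occurrence distribution over classes.
    This formula is an assumption made for illustration.
    """
    class_counts = defaultdict(Counter)  # feature -> Counter(class -> count)
    for label, docs in docs_by_class.items():
        for tokens in docs:
            for feat in merge_synonyms(tokens):
                class_counts[feat][label] += 1

    n_classes = len(docs_by_class)
    weights = {}
    for feat, counts in class_counts.items():
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        # A feature concentrated in one class has H = 0 and weight 1;
        # a feature spread evenly over all classes has weight 0.
        weights[feat] = 1.0 - h / math.log(n_classes) if n_classes > 1 else 1.0
    return weights


def weight_document(tokens, weights):
    """Weight a document as term frequency times the entropy-based weight."""
    tf = Counter(merge_synonyms(tokens))
    return {feat: count * weights.get(feat, 0.0) for feat, count in tf.items()}


if __name__ == "__main__":
    training = {
        "auto": [["car", "price"], ["automobile"]],
        "finance": [["cost", "market"]],
    }
    w = entropy_weights(training)
    print(weight_document(["car", "automobile", "cost"], w))
```

In this toy data, "car" and "automobile" collapse into one feature that occurs only in the "auto" class, so it keeps its full term frequency, while the merged "price"/"cost" feature is split evenly across both classes and is weighted down to roughly zero.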
Pages: 105-109
Page count: 5
Related papers
50 in total
  • [1] Feature Selection for Text Classification Based on Part of Speech Filter and Synonym Merge
    Qin, Sijun
    Song, Jia
    Zhang, Pengzhou
    Tan, Yue
    [J]. 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2015, : 681 - 685
  • [2] A study on feature weighting in Chinese text categorization
    Xue, DJ
    Sun, MS
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 592 - 601
  • [3] A feature weighting scheme for text categorization based on feature importance
    Liu, He
    Liu, Dayou
    Pei, Zhili
    Gao, Ying
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2009, 46 (10): : 1693 - 1703
  • [4] Text Categorization Based on Clustering Feature Selection
    Zhou, Xiaofei
    Hu, Yue
    Guo, Li
    [J]. 2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 398 - 405
  • [5] Study on Feature Selection in Finance Text Categorization
    Sun, Changqiu
    Wang, Xiaolong
    Xu, Jun
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 5077 - 5082
  • [6] Study on constraints for feature selection in text categorization
    Xu, Yan
    Li, Jintao
    Wang, Bin
    Sun, Chunming
    Zhang, Sen
    [J]. 2008, Science Press, 18,Shuangqing Street,Haidian, Beijing, 100085, China (45):
  • [7] Feature selection based on feature interactions with application to text categorization
    Tang, Xiaochuan
    Dai, Yuanshun
    Xiang, Yanping
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 120 : 207 - 216
  • [8] Study on mutual information-based feature selection for text categorization
    Xu, Yan
    Jones, Gareth
    Li, Jintao
    Wang, Bin
    Sun, Chunming
    [J]. Journal of Computational Information Systems, 2007, 3 (03): : 1007 - 1012
  • [9] An empirical study of feature selection for text categorization based on term weightage
    How, BC
    Narayanan, K
    [J]. IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 599 - 602
  • [10] Feature subset selection in SOM based text categorization
    Bassiouny, S
    Nagi, M
    Hussein, MF
    [J]. IC-AI '04 & MLMTA'04 , VOL 1 AND 2, PROCEEDINGS, 2004, : 860 - 866