Feature Selection and Term Weighting

Cited by: 2
Authors
Algarni, Abdulmohsen [1 ]
Tairan, Nasser [1 ]
Affiliation
[1] King Khalid Univ, Coll Comp Sci, Abha 61411, Saudi Arabia
Keywords
INFORMATION;
DOI
10.1109/WI-IAT.2014.53
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Term-based approaches can extract many features from text documents, but most of them include noise. Many popular text-mining techniques have been adapted to reduce the noise in extracted features, yet the results still contain some noisy features. However, the noisy features are extracted from the same training documents as the good features. The main problem, therefore, is that some training documents contain a large amount of noisy data. Reducing the noisy data in the training documents would help reduce the noise in the extracted features. Moreover, we believe that removing some of the training documents (those that contain more noisy data than useful data) can help improve the effectiveness of the classifier. Exploiting the advantages of clustering can help reduce the effect of noisy data. The main problem of clustering is defined as finding groups of similar objects in the data. In this paper we introduce a methodology that uses a clustering algorithm to group the training data before it is used. We also test our hypothesis that not all training documents are useful for training the classifier.
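The abstract does not say which clustering algorithm is used or how noisy documents are identified, so the following is only a minimal sketch of the general idea (group the training documents first, then discard those that do not fit any sizeable group). It assumes a single-pass leader clustering over bag-of-words vectors and treats documents in very small clusters as noisy. The function names, `sim_threshold`, and `min_cluster_size` are hypothetical choices for illustration, not the authors' method.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def leader_clusters(docs, sim_threshold=0.3):
    """Single-pass leader clustering: a document joins the first cluster
    whose leader is similar enough, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices
    vectors = [tf_vector(d) for d in docs]
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= sim_threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def filter_training_docs(docs, sim_threshold=0.3, min_cluster_size=2):
    """Keep documents that fall in clusters of at least min_cluster_size.
    Assumption: documents in tiny clusters are treated as noisy."""
    kept = []
    for cluster in leader_clusters(docs, sim_threshold):
        if len(cluster) >= min_cluster_size:
            kept.extend(cluster)
    return [docs[i] for i in sorted(kept)]


docs = [
    "feature selection for text classification",
    "term weighting and feature selection in text mining",
    "text classification with feature weighting",
    "cheap flights hotel booking discount",
]
clean_docs = filter_training_docs(docs)
print(clean_docs)
```

In this toy example the first three documents share enough terms to form one cluster, while the fourth ends up alone and is dropped before any classifier would be trained. A real pipeline would likely use a stronger term weighting (for example TF-IDF) and cluster within each class label, but those details are not given in the abstract.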
Pages: 336-339
Number of pages: 4