Feature Selection by Using Heuristic Methods for Text Classification

Authors
Sel, Ilhami [1]
Yeroglu, Celalettin [1]
Hanbay, Davut [1]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Whale Optimization; Grey Wolf Optimization; Chi-Square
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection can be defined as choosing the subset of features that best represents a data set in machine learning applications; in other words, removing unnecessary data that has no effect on the result. In classification problems, reducing the dimension through feature selection can increase the efficiency and accuracy of the system. In this study, a text classification application is performed using the "20 News Group" data set released in Reuters News Agent. The pre-processed news data were converted to vectors with the Doc2Vec method, and the resulting data set was classified with the Naive Bayes method. Subsequently, a subset of the data set was formed for feature selection by using nature-inspired heuristic methods (the Whale and Grey Wolf Optimization algorithms) and the Chi-square method. The classification was then repeated and the results were compared. The system achieved a success rate of 0.9214 with 600 features before feature selection, while the 100-feature models created afterwards scored higher (0.94095, 0.93833, and 0.93619).
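The wrapper-style selection step described in the abstract can be illustrated with a minimal binary Grey Wolf Optimization sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the helper names (make_toy_data, centroid_accuracy, fitness, binary_gwo) are hypothetical, synthetic data stands in for the Doc2Vec vectors, a nearest-centroid classifier stands in for Naive Bayes, a sigmoid transfer function is one of several possible ways to binarize wolf positions, and a per-feature penalty replaces the fixed 100-feature target reported in the abstract.

```python
import math
import random


def make_toy_data(rng, n_samples=120, n_features=12, n_informative=3):
    """Two-class toy data: only the first n_informative features carry signal."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y)


def fitness(X, y, mask, size_penalty=0.01):
    """Higher is better: accuracy minus a small cost per selected feature."""
    return centroid_accuracy(X, y, mask) - size_penalty * sum(mask)


def binary_gwo(X, y, n_wolves=8, n_iter=20, seed=0):
    """Return (best_score, best_mask) found by a binary Grey Wolf search."""
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best-so-far (score, mask) pairs: alpha, beta, delta
    for it in range(n_iter):
        for w in wolves:
            if not any(w):  # repair empty subsets so every mask is usable
                w[rng.randrange(dim)] = 1
        scored = [(fitness(X, y, w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda p: p[0], reverse=True)[:3]
        a = 2.0 * (1 - it / n_iter)  # exploration factor shrinks towards 0
        for w in wolves:
            for d in range(dim):
                pos = 0.0
                for _, lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    pos += lead[d] - A * abs(C * lead[d] - w[d])
                pos /= len(leaders)  # average pull of the three leaders
                # Assumed sigmoid transfer: map the position to a bit probability.
                prob = 1 / (1 + math.exp(-10 * (pos - 0.5)))
                w[d] = 1 if rng.random() < prob else 0
    return leaders[0]


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(1))
    score, mask = binary_gwo(X, y)
    print("selected features:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

In a setting closer to the one in the abstract, the fitness function would presumably be replaced by held-out Naive Bayes accuracy on the 600-dimensional document vectors, and the subset size would be constrained to the desired number of features.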
Pages: 6
Related papers
50 in total
  • [1] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
  • [2] Comparison on Feature Selection Methods for Text Classification
    Liu, Wenkai
    Xiao, Jiongen
    Hong, Ming
    2020 THE 4TH INTERNATIONAL CONFERENCE ON MANAGEMENT ENGINEERING, SOFTWARE ENGINEERING AND SERVICE SCIENCES (ICMSS 2020), 2020, : 82 - 86
  • [3] Efficient Text Classification Using Best Feature Selection and Combination of Methods
    Srinivas, M.
    Supreethi, K. P.
    Prasad, E. V.
    Kumari, S. Anitha
    HUMAN INTERFACE AND THE MANAGEMENT OF INFORMATION: DESIGNING INFORMATION ENVIRONMENTS, PT I, 2009, 5617 : 437 - +
  • [4] A modified multi objective heuristic for effective feature selection in text classification
    Thiyagarajan, D.
    Shanthi, N.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 : 10625 - 10635
  • [5] Filter feature selection methods for text classification: a review
    Ming, Hong
    Heyong, Wang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1) : 2053 - 2091
  • [6] Filter feature selection methods for text classification: a review
    Hong Ming
    Wang Heyong
    Multimedia Tools and Applications, 2024, 83 : 2053 - 2091
  • [7] Comparison of feature selection methods in Kurdish text classification
    Ari M. Saeed
    Soran Badawi
    Sara A. Ahmed
    Diyari A. Hassan
    Iran Journal of Computer Science, 2024, 7 (1) : 55 - 64
  • [8] An Experimental Study of Feature Selection Methods for Text Classification
    Uchyigit, Gulden
    Clark, Keith
    PERSONALIZATION TECHNIQUES AND RECOMMENDER SYSTEMS, 2008, : 303 - 320
  • [9] On Two-Stage Feature Selection Methods for Text Classification
    Uysal, Alper Kursat
    IEEE ACCESS, 2018, 6 : 43233 - 43251
  • [10] Feature selection methods for text classification: a systematic literature review
    Pintas, Julliano Trindade
    Fernandes, Leandro A. F.
    Garcia, Ana Cristina Bicharra
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (08) : 6149 - 6200