Feature Selection by Using Heuristic Methods for Text Classification

Cited by: 0
Authors
Sel, Ilhami [1]
Yeroglu, Celalettin [1]
Hanbay, Davut [1]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Whale Optimization; Grey Wolf Optimization; Chi-Square;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection can be defined as choosing the subset of features that best represents a data set in machine learning applications, in other words, removing unnecessary data that has no effect on the result. In classification problems, reducing the dimension through feature selection can increase the efficiency and accuracy of the system. In this study, a text classification application is performed using the "20 News Group" data set released by the Reuters news agency. The pre-processed news data were converted to vectors with the Doc2Vec method, and the resulting data set was classified with the Naive Bayes method. Subsequently, feature subsets were formed using nature-inspired heuristic methods (the Whale and Grey Wolf Optimization algorithms) and the Chi-square method. Classification was then repeated on the reduced data and the results were compared. While the success of the system with 600 features before feature selection was 0.9214, the performance of the 100-feature models created afterwards was higher (0.94095, 0.93833, and 0.93619).
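
The abstract gives no implementation details, so the following is only a rough sketch of how wrapper-style feature selection with Grey Wolf Optimization might look, using the Python standard library alone. Everything in it is an assumption made for illustration: the function names (`make_toy_data`, `to_mask`, `centroid_accuracy`, `fitness`, `binary_gwo`) are hypothetical, the synthetic two-class data stands in for Doc2Vec vectors, a nearest-centroid classifier stands in for Naive Bayes, and thresholding continuous positions at 0.5 is a simplified way to obtain a binary feature mask. It is not the authors' method or parameter setting.

```python
import random


def make_toy_data(rng, n_samples=120, n_informative=3, n_noise=7):
    """Two-class toy data: the first columns carry signal, the rest are noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        center = 2.0 if label else -2.0
        row = [rng.gauss(center, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def to_mask(position):
    """Threshold a continuous wolf position into a 0/1 feature mask."""
    return [1 if p > 0.5 else 0 for p in position]


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the kept columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x, label):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[label]))

    hits = sum(min(centroids, key=lambda c: sq_dist(x, c)) == t for x, t in zip(X, y))
    return hits / len(y)


def fitness(X, y, mask, alpha=0.99):
    """Mostly accuracy, plus a small bonus for keeping fewer features."""
    return alpha * centroid_accuracy(X, y, mask) + (1 - alpha) * (1 - sum(mask) / len(mask))


def binary_gwo(X, y, n_wolves=8, n_iter=30, seed=0):
    """Simplified Grey Wolf Optimization over feature masks (illustrative only)."""
    rng = random.Random(seed)
    dim = len(X[0])
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_mask, best_fit = None, -1.0
    for it in range(n_iter):
        ranked = sorted(wolves, key=lambda w: fitness(X, y, to_mask(w)), reverse=True)
        leaders = [list(w) for w in ranked[:3]]  # copies of alpha, beta, delta
        top_fit = fitness(X, y, to_mask(leaders[0]))
        if top_fit > best_fit:  # remember the best mask evaluated so far
            best_mask, best_fit = to_mask(leaders[0]), top_fit
        a = 2.0 * (1 - it / n_iter)  # exploration factor shrinks from 2 towards 0
        for w in wolves:
            for j in range(dim):
                pulls = []
                for leader in leaders:
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    pulls.append(leader[j] - A * abs(C * leader[j] - w[j]))
                # Average the three leader-guided moves and clip to [0, 1].
                w[j] = min(1.0, max(0.0, sum(pulls) / 3))
    return best_mask, best_fit


X, y = make_toy_data(random.Random(1))
mask, score = binary_gwo(X, y)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
print("fitness:", round(score, 4))
```

In this sketch the fitness is measured on the training data for brevity; a real comparison like the one reported in the abstract would score each candidate subset on held-out documents.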
Pages: 6