Helmholtz principle based supervised and unsupervised feature selection methods for text mining

Cited by: 31
Authors
Tutkan, Melike [1]
Ganiz, Murat Can [2]
Akyokus, Selim [1]
Affiliations
[1] Dogus Univ, Dept Comp Engn, Istanbul, Turkey
[2] Marmara Univ, Dept Comp Engn, Istanbul, Turkey
Keywords
Feature selection; Attribute selection; Machine learning; Text mining; Text classification; Helmholtz principle; SEMANTIC SMOOTHING METHOD; ALGORITHM
DOI
10.1016/j.ipm.2016.03.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One of the important problems in text classification is the high dimensionality of the feature space. Feature selection methods reduce this dimensionality by selecting the features that are most valuable for classification. Beyond reducing dimensionality, feature selection has the potential to improve text classifiers' performance in terms of both accuracy and running time. It also helps to build simpler, and therefore more comprehensible, models. In this study we propose new methods for feature selection from textual data, called Meaning Based Feature Selection (MBFS), which are based on the Helmholtz principle from the Gestalt theory of human perception, a principle that has been used in image processing. The proposed approaches are extensively evaluated by their effect on the classification performance of two well-known classifiers on several datasets, and are compared with several feature selection algorithms commonly used in text mining. Our results demonstrate the value of the MBFS methods in terms of classification accuracy and execution time. (C) 2016 Elsevier Ltd. All rights reserved.
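The abstract does not state the scoring formula behind MBFS. As a rough illustration only, the Python sketch below computes the number-of-false-alarms (NFA) "meaning" score used in earlier Helmholtz-principle work on text. A word that occurs m times in a part P and K times in the whole collection gets NFA = C(K, m) / N^(m-1), where N is the collection length divided by the length of P, and its meaning is -(1/m) log NFA. Treating the concatenated documents of each class as one part, scoring a word by its maximum meaning over all parts, and every function name here (`meaning`, `score_features`, `select_top_k`, `parts_by_class`) are assumptions made for this sketch. It is not the authors' exact MBFS procedure.

```python
import math
from collections import Counter, defaultdict


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed with lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def meaning(m: int, K: int, N: float) -> float:
    """Meaning of a word occurring m times in a part P and K times overall.

    N is the collection length divided by the part length (L / B).
    NFA = C(K, m) / N**(m - 1); meaning = -(1 / m) * log(NFA).
    """
    log_nfa = log_binom(K, m) - (m - 1) * math.log(N)
    return -log_nfa / m


def score_features(parts: list[list[str]]) -> dict[str, float]:
    """Score each word by its largest meaning value over all parts.

    ASSUMPTION (hypothetical aggregation): max over parts.
    """
    total = Counter(tok for part in parts for tok in part)  # K for every word
    L = sum(total.values())  # length of the whole collection
    scores: dict[str, float] = {}
    for part in parts:
        if not part:
            continue  # an empty part has no words and no length ratio
        N = L / len(part)
        for word, m in Counter(part).items():
            s = meaning(m, total[word], N)
            scores[word] = max(s, scores.get(word, float("-inf")))
    return scores


def select_top_k(parts: list[list[str]], k: int) -> list[str]:
    """Return the k highest-scoring words (ties keep first-seen order)."""
    scores = score_features(parts)
    return sorted(scores, key=lambda w: scores[w], reverse=True)[:k]


def parts_by_class(docs: list[list[str]], labels: list[str]) -> list[list[str]]:
    """ASSUMPTION (supervised setting): concatenate all documents of a class."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc, label in zip(docs, labels):
        grouped[label].extend(doc)
    return list(grouped.values())


docs = [["ball", "goal", "match", "ball", "ball"], ["vote", "party", "vote", "ball"]]
labels = ["sport", "politics"]
print(select_top_k(parts_by_class(docs, labels), 1))
```

Here `select_top_k(parts_by_class(docs, labels), k)` covers a supervised setting. For an unsupervised variant one could, as a further assumption, pass each document as its own part (`select_top_k(docs, k)`), since no labels are then needed. The log binomial is computed with `math.lgamma` rather than `math.comb` so that it stays cheap and finite for large corpus counts.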
Pages: 885-910
Number of pages: 26
Related papers (50 in total)
  • [1] Comparison between Supervised and Unsupervised Feature Selection Methods
    Haar, Lilli
    Anding, Katharina
    Trambitckii, Konstantin
    Notni, Gunther
    [J]. ICPRAM: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2019, : 582 - 589
  • [2] A comparative study on unsupervised feature selection methods for text clustering
    Liu, LY
    Kang, JC
    Yu, J
    Wang, ZL
    [J]. PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 597 - 601
  • [3] Ensemble feature selection using distance-based supervised and unsupervised methods in binary classification
    Hallajian, Bita
    Motameni, Homayun
    Akbari, Ebrahim
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 200
  • [4] Unsupervised feature selection for text data
    Wiratunga, Nirmalie
    Lothian, Rob
    Massie, Stewart
    [J]. ADVANCES IN CASE-BASED REASONING, PROCEEDINGS, 2006, 4106 : 340 - 354
  • [5] Supervised Hebb rule based feature selection for text classification
    Heyong, Wang
    Ming, Hong
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (01) : 167 - 191
  • [6] Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection
    Ang, Jun Chin
    Mirzal, Andri
    Haron, Habibollah
    Hamed, Haza Nuzly Abdull
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2016, 13 (05) : 971 - 989
  • [7] Feature selection methods for event detection in Twitter: a text mining approach
    Hossny, Ahmad Hany
    Mitchell, Lewis
    Lothian, Nick
    Osborne, Grant
    [J]. SOCIAL NETWORK ANALYSIS AND MINING, 2020, 10 (01)
  • [8] Rough Set Based Feature Selection Approach for Text Mining
    Sailaja, N. Venkata
    Sree, L. Padma
    Mangathayaru, N.
    [J]. PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016, : 40 - 45
  • [9] A review of unsupervised feature selection methods
    Solorio-Fernández, Saúl
    Carrasco-Ochoa, J. Ariel
    Martínez-Trinidad, José Fco.
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2020, 53 : 907 - 948