A Comparative Study on Feature Selection Techniques for Multi-cluster Text Data

Cited by: 1
Authors
Gupta, Ananya [1]
Begum, Shahin Ara [1]
Affiliation
[1] Assam Univ, Dept Comp Sci, Silchar 788011, India
Keywords
Feature selection; Multi-Cluster feature selection; Tf-Idf; Clustering; Text data;
DOI
10.1007/978-981-13-0761-4_21
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text clustering involves very high-dimensional data. Feature selection techniques find subsets of relevant features in the original feature space that make clustering more efficient and effective. Selecting features purely by ranking scores, without considering the correlation among them, degrades clustering performance. An efficient feature selection technique should preserve the multi-cluster structure of the data. The purpose of the present work is to demonstrate that feature selection techniques that take the correlation among features into account in a multi-cluster scenario yield better clustering results than techniques that simply rank features independently of each other. To this end, the paper compares two feature selection techniques: the traditional Tf-Idf and the Multi-Cluster Feature Selection (MCFS) technique. Experimental results on the TDT2 and Reuters-21578 datasets show that MCFS yields better clustering results than traditional Tf-Idf.
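The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the code follows the commonly described MCFS recipe (spectral embedding of a nearest-neighbour graph, one L1-regularized regression per embedding dimension, and a per-feature score equal to the largest absolute coefficient) under stated assumptions. The function names, the smoothed idf variant, the 0/1 graph weights, the centering step, and the default parameters are all hypothetical choices made for illustration.

```python
import numpy as np

def rank_by_tfidf(counts, k):
    """Score each term independently by its mean Tf-Idf weight; keep the top k."""
    counts = np.asarray(counts, dtype=float)   # rows: documents, columns: terms
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)      # document frequency per term
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0  # smoothed idf (assumed variant)
    scores = (counts * idf).mean(axis=0)
    return np.argsort(-scores)[:k]

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * w[j]            # partial residual without feature j
            rho = X[:, j] @ resid / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * w[j]
    return w

def mcfs_select(X, n_clusters, k, n_neighbors=5, alpha=0.01):
    """MCFS-style selection: features are scored jointly through sparse regression."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Symmetric k-nearest-neighbour graph with 0/1 weights (assumed weighting).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)               # a point is not its own neighbour
    nbrs = np.argsort(sq, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Solve L y = lambda D y through the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, :n_clusters]
    # One sparse regression per embedding dimension, on centered data.
    Xc = X - X.mean(axis=0)
    coefs = np.array([lasso_cd(Xc, y - y.mean(), alpha) for y in Y.T])
    scores = np.abs(coefs).max(axis=0)         # MCFS score: max |coefficient|
    return np.argsort(-scores)[:k]
```

The design difference is visible in the two selectors: `rank_by_tfidf` scores every term in isolation, whereas `mcfs_select` fits all features together against the cluster embedding, so a feature that adds nothing beyond an already selected one receives a zero coefficient.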
Pages: 203-215
Page count: 13
    [J]. 2012 INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING (ISISE), 2012, : 44 - 47