An Enhanced Feature Selection for Text Documents

Cited by: 1
Authors
Thatha, Venkata Nagaraju [1]
Babu, A. Sudhir [2]
Haritha, D. [1]
Affiliations
[1] JNTUK Univ, Dept Comp Sci & Engn, Kakinada, India
[2] PVPSIT, Dept Comp Sci & Engn, Vijayawada, India
Keywords
Text mining; Bag of Words; Stop word removal; Stemming; Feature selection
DOI
10.1007/978-981-32-9690-9_3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the current digital world, a vast amount of information is recorded in a variety of forms, such as pictures, data, video, and audio. Such information is generally available in large volumes but is not organized in a manner suited to text processing. Text mining is a subfield of data mining that aims to extract useful information from these recorded resources. Document clustering helps users navigate, review, and classify text documents into meaningful clusters, which in turn helps handle the enormous volume of text involved in text mining. Preprocessing and feature selection are of great importance in document clustering. The preprocessing techniques applied to documents in document clustering are Bag of Words (BOW), stop word removal, and Porter stemming. In this paper, we propose an easy-to-use framework for preprocessing and an Enhanced Term Frequency-Inverse Document Frequency (Enhanced TF-IDF) method for feature selection.
Pages: 21-29
Page count: 9
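
The preprocessing steps named in the abstract (Bag of Words, stop word removal, stemming) followed by TF-IDF weighting can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper: the abstract does not define the Enhanced TF-IDF method, so the code uses the standard TF-IDF formula as a baseline; `crude_stem` is a simplified suffix stripper standing in for Porter stemming, not the Porter algorithm itself; and the stop-word list, function names, and the max-weight ranking in `top_features` are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(token):
    """Strip one common suffix. A stand-in for Porter stemming, not the real algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document using standard TF-IDF."""
    bags = [Counter(preprocess(d)) for d in docs]  # Bag of Words per document
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


def top_features(docs, k):
    """Rank terms by their highest TF-IDF weight in any document; keep the top k."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, w in doc_weights.items():
            best[term] = max(best.get(term, 0.0), w)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

Under this baseline, a term that occurs in every document receives weight zero, so it is never preferred as a feature. Any enhancement to the weighting would replace the expression inside `tf_idf` and leave the preprocessing unchanged.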