An Enhanced Feature Selection for Text Documents

Cited by: 1
Authors
Thatha, Venkata Nagaraju [1]
Babu, A. Sudhir [2]
Haritha, D. [1]
Affiliations
[1] JNTUK Univ, Dept Comp Sci & Engn, Kakinada, India
[2] PVPSIT, Dept Comp Sci & Engn, Vijayawada, India
Keywords
Text mining; Bag of Words; Stop word removal; Stemming; Feature selection;
DOI
10.1007/978-981-32-9690-9_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In today's digital world, vast amounts of data are recorded in many forms, such as pictures, data, video, and audio. Although this information is available in large volumes, it is generally not organized in a way that suits text processing. Text mining is a subfield of data mining that aims to extract useful information from recorded resources. Document clustering helps users navigate, review, and classify text documents into meaningful clusters, which in turn helps them handle the enormous volume of text involved in text mining. Preprocessing and feature selection are of great importance in document clustering. The preprocessing techniques applied to the documents are Bag of Words (BOW), stop word removal, and Porter stemming. In this paper, we propose an easy-to-use framework for preprocessing and an Enhanced Term Frequency-Inverse Document Frequency (Enhanced TF-IDF) method for feature selection.
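The pipeline named in the abstract can be illustrated with a minimal sketch. It assumes the standard TF-IDF weighting (term frequency times log(N / document frequency)), because the abstract does not define the Enhanced TF-IDF variant. The stop word list, the function names, and the "keep the top k terms" selection rule are all hypothetical choices made for illustration, and Porter stemming is left out to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Map each term to its raw count in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """Standard TF-IDF per document: (count / doc length) * log(N / df)."""
    bags = [bag_of_words(preprocess(d)) for d in docs]
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()}
        )
    return weights


def select_features(docs, k):
    """Keep the k terms with the highest maximum TF-IDF weight (hypothetical rule)."""
    best = {}
    for doc_weights in tf_idf(docs):
        for term, value in doc_weights.items():
            best[term] = max(best.get(term, 0.0), value)
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

A term that occurs in every document gets weight zero under this scheme, so it is never preferred by the selection step. A stemmer such as NLTK's `PorterStemmer` could be applied to each token inside `preprocess` if that library is available.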
Pages: 21-29
Page count: 9