An Enhanced Feature Selection for Text Documents

Cited by: 1
Authors
Thatha, Venkata Nagaraju [1 ]
Babu, A. Sudhir [2 ]
Haritha, D. [1 ]
Affiliations
[1] JNTUK Univ, Dept Comp Sci & Engn, Kakinada, India
[2] PVPSIT, Dept Comp Sci & Engn, Vijayawada, India
Keywords
Text mining; Bag of Words; Stop word removal; Stemming; Feature selection
DOI
10.1007/978-981-32-9690-9_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the current digital world, a vast amount of data is recorded in a variety of forms, such as pictures, data, video, and audio. Although such information is available in large volumes, it is generally not organized in a manner suitable for text processing. Text mining is a subfield of data mining that aims to extract useful information from recorded resources. Document clustering helps users navigate, review, and classify text documents into meaningful clusters, which in turn helps in handling the enormous amount of text involved in text mining. Preprocessing and feature selection are of great importance in document clustering. The preprocessing techniques applied to the documents are Bag of Words (BOW), stop word removal, and Porter stemming. In this paper, we propose an easy-to-use framework for preprocessing and an Enhanced Term Frequency-Inverse Document Frequency (Enhanced TF-IDF) method for feature selection.
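The pipeline named in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the abstract does not define Enhanced TF-IDF, so the code below uses the standard TF-IDF weighting (term frequency times log(N / document frequency)). The stop word list, the `naive_stem` suffix stripper (a crude stand-in for Porter stemming), and the `select_features` helper are all hypothetical names introduced only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for Porter stemming, not the real algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(text):
    """Bag of Words: map each preprocessed term to its count in the text."""
    return Counter(preprocess(text))


def tf_idf(documents):
    """Standard TF-IDF per document: (count / doc length) * log(N / doc frequency)."""
    bags = [bag_of_words(d) for d in documents]
    n_docs = len(bags)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


def select_features(documents, k):
    """Keep the k terms with the highest TF-IDF score in any document."""
    best = {}
    for doc_weights in tf_idf(documents):
        for term, score in doc_weights.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

Calling `select_features(docs, k)` on a list of raw strings returns the `k` highest-scoring terms, which could then serve as the feature vocabulary for a clustering step. A term that occurs in every document receives weight 0 under this weighting, so it is never preferred over rarer terms.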
Pages: 21-29
Page count: 9