An Enhanced Feature Selection for Text Documents

Cited by: 1

Authors:

Thatha, Venkata Nagaraju [1]

Babu, A. Sudhir [2]

Haritha, D. [1]

Affiliations:

[1] JNTUK Univ, Dept Comp Sci & Engn, Kakinada, India

[2] PVPSIT, Dept Comp Sci & Engn, Vijayawada, India

Source:

SMART INTELLIGENT COMPUTING AND APPLICATIONS, VOL 2 | 2020 / Vol. 160

Keywords:

Text mining; Bag of Words; Stop word removal; Stemming; Feature selection

DOI:

10.1007/978-981-32-9690-9_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In the current digital world, a vast amount of information is recorded in a variety of forms, such as pictures, data, video, and audio. Such information is generally available in large volumes, but it is not organized in a manner appropriate for text processing. Text mining, a subfield of data mining, aims to extract useful information from these recorded resources. Document clustering helps users effectively navigate, review, and classify text documents into meaningful clusters, knowledge that helps them handle the enormous volume of text to be mined. Preprocessing and feature selection are of great importance in document clustering. The preprocessing techniques applied to the documents in document clustering are Bag of Words (BOW), stop word removal, and Porter stemming. In this paper, we propose an easy-to-use framework for preprocessing and an Enhanced Term Frequency-Inverse Document Frequency (Enhanced TF-IDF) method for feature selection.
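The abstract does not specify how the Enhanced TF-IDF differs from the standard scheme, so the sketch below shows only the conventional pipeline it builds on: tokenization, stop word removal, stemming, a Bag of Words count per document, standard TF-IDF weighting, and top-k feature selection. Several parts are illustrative assumptions rather than the authors' method: the tiny stop word list, the crude suffix stripper `naive_stem` (which is NOT the Porter stemmer), the rule of ranking a term by its maximum TF-IDF score across documents, and all function names.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (assumption; real systems use much larger lists).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is",
              "are", "for", "with", "from", "this", "that"}

def naive_stem(word):
    """Crude suffix stripping: a hypothetical stand-in, NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """One term-count Counter per document."""
    return [Counter(preprocess(d)) for d in docs]

def tf_idf(bows):
    """Standard TF-IDF: (count / doc length) * log(N / document frequency)."""
    n = len(bows)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # each term counted once per document
    scores = []
    for bow in bows:
        total = sum(bow.values())
        if total == 0:  # document contained only stop words
            scores.append({})
            continue
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in bow.items()})
    return scores

def select_features(scores, k):
    """Keep the k terms with the highest TF-IDF score in any document (assumed rule)."""
    best = Counter()
    for doc_scores in scores:
        for term, s in doc_scores.items():
            best[term] = max(best[term], s)
    return [term for term, _ in best.most_common(k)]

docs = [
    "Text mining extracts useful information from documents",
    "Document clustering groups similar documents",
    "Stemming and stop word removal clean the documents",
]
scores = tf_idf(bag_of_words(docs))
print(select_features(scores, 3))  # → ['cluster', 'group', 'similar']
```

Because "document" occurs in every document, its inverse document frequency is log(3/3) = 0, so it is never selected; this is the discriminative effect TF-IDF-based feature selection relies on. In practice, a real Porter stemmer such as NLTK's `PorterStemmer().stem(word)` could replace `naive_stem`.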

Pages: 21-29

Page count: 9