An Enhanced Feature Selection for Text Documents

Cited by: 1

Authors:

Thatha, Venkata Nagaraju [1]

Babu, A. Sudhir [2]

Haritha, D. [1]

Affiliations:

[1] JNTUK Univ, Dept Comp Sci & Engn, Kakinada, India

[2] PVPSIT, Dept Comp Sci & Engn, Vijayawada, India

Source:

SMART INTELLIGENT COMPUTING AND APPLICATIONS, VOL 2 | 2020 / Vol. 160

Keywords:

Text mining; Bag of Words; Stop word removal; Stemming; Feature selection;

DOI:

10.1007/978-981-32-9690-9_3

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the current digital world, a vast amount of data is recorded in a variety of forms such as pictures, data, video, and audio. Information available in such large volumes is generally not organized in a manner suitable for text processing. Text mining is a subfield of data mining that aims to extract useful information from recorded resources. Document clustering helps users effectively navigate, review, and classify text documents into meaningful clusters, which helps in handling the enormous volume of text to be mined. Preprocessing and feature selection are of great importance in document clustering. The preprocessing techniques applied to documents in document clustering are Bag of Words (BOW), stop word removal, and Porter stemming. In this paper, we propose an easy-to-use framework for preprocessing and an Enhanced Term Frequency-Inverse Document Frequency (Enhanced TF-IDF) method for feature selection.
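The pipeline named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the Enhanced TF-IDF method, so the code below uses the standard TF-IDF weighting instead, and it is not the authors' implementation. The stop word list, the `crude_stem` suffix stripper (a simplified stand-in for Porter stemming, not the real algorithm), and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical, tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def crude_stem(token):
    """Strip a few common suffixes (a stand-in for Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the stem.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, then stem."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document using standard TF-IDF."""
    bags = [Counter(preprocess(doc)) for doc in documents]  # bag of words
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


def top_terms(weight, k):
    """Select the k highest-weighted terms of one document as features."""
    return sorted(weight, key=weight.get, reverse=True)[:k]
```

With this weighting, a term that occurs in every document receives a weight of zero, so feature selection by `top_terms` favors terms that distinguish one document from the others.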


Pages: 21-29

Number of pages: 9

50 items in total

[31] Feature Selection Methods for Text Classification
Dasgupta, Anirban
Drineas, Petros
Harb, Boulos
Josifovski, Vanja
Mahoney, Michael W.
[J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
[32] Unsupervised feature selection for text data
Wiratunga, Nirmalie
Lothian, Rob
Massie, Stewart
[J]. ADVANCES IN CASE-BASED REASONING, PROCEEDINGS, 2006, 4106 : 340 - 354
[33] Feature selection for text classification: A review
Xuelian Deng
Yuqing Li
Jian Weng
Jilian Zhang
[J]. Multimedia Tools and Applications, 2019, 78 : 3797 - 3816
[34] Feature Selection for Ordinal Text Classification
Baccianella, Stefano
Esuli, Andrea
Sebastiani, Fabrizio
[J]. NEURAL COMPUTATION, 2014, 26 (03) : 557 - 591
[35] New methods for text categorization based on a new feature selection method and a new similarity measure between documents
Lee, Li-Wei
Chen, Shyi-Ming
[J]. ADVANCES IN APPLIED ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4031 : 1280 - 1289
[36] Summarizing text documents: Sentence selection and evaluation metrics
Goldstein, J
Kantrowitz, M
Mittal, V
Carbonell, J
[J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 121 - 128
[37] Feature Selection and Feature Weight Estimate in Web Text Mining
Pei, Zhili
Qi, Jianhong
Zhang, Xinhong
Zhou, Yuxin
Bai, Mingyu
Wang, Qinghu
Liu, Lisha
Fan, Xiaojing
Jiang, Mingyang
[J]. 2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY FOR EDUCATION (ICTE 2015), 2015, : 316 - 320
[38] Feature selection based on feature interactions with application to text categorization
Tang, Xiaochuan
Dai, Yuanshun
Xiang, Yanping
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2019, 120 : 207 - 216
[39] Hierarchical approach to select feature vectors for classification of text documents
Kapalavayi, Nagesh
Murthy, S. N. Jayaram
Hu, Gongzhu
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1-3, 2006, : 1179 - +
[40] An Evolutionary Algorithm for Feature Selective Double Clustering of Text Documents
Nourashrafeddin, S. N.
Milios, Evangelos
Arnold, Dirk V.
[J]. 2013 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2013, : 446 - 453
