An approach to text classification using dimensionality reduction and combination of classifiers

Cited by: 10
Authors
Jain, G [1]
Ginwala, A [1]
Aslandogan, YA [1]
Affiliation
[1] Univ Texas, Dept Comp Sci & Engn, Arlington, TX 76019 USA
DOI
10.1109/IRI.2004.1431521
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification assigns predetermined categories to textual resources. Its applications include recommendation systems, personalization, help desk automation, content filtering and routing, selective alerting, and text mining. This paper describes an experiment in improving classification accuracy on a large text corpus through dimensionality reduction and multiple-classifier combination. Three classifiers are used: Naive Bayes, the J48 decision tree, and a decision table. Their outputs are combined using techniques such as simple voting, weighted voting, and probability-based voting. Classification accuracy is further improved by a dimensionality reduction method based on concept indexing. Experiments on the Reuters-21578 dataset indicate that the combination approach provides an improved and scalable method for text classification, and that concept indexing improves classification accuracy as well as efficiency and scalability.
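The three combination schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the function names, the use of per-classifier weights (for example, validation accuracy), the summing of class probabilities, and the sample numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def simple_vote(predictions):
    """Return the label predicted by the most classifiers.

    predictions: one predicted label per classifier.
    Ties go to the label that appears first in the list.
    """
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def weighted_vote(predictions, weights):
    """Return the label with the largest total classifier weight.

    weights: one non-negative weight per classifier (assumed here to be
    something like held-out accuracy; the paper may define it differently).
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


def probability_vote(distributions):
    """Return the label with the largest summed class probability.

    distributions: one dict per classifier mapping label -> probability.
    """
    totals = defaultdict(float)
    for dist in distributions:
        for label, prob in dist.items():
            totals[label] += prob
    return max(totals, key=totals.get)


# Hypothetical outputs of three classifiers for a single document.
labels = ["earn", "acq", "earn"]
dists = [
    {"earn": 0.60, "acq": 0.40},
    {"earn": 0.20, "acq": 0.80},
    {"earn": 0.55, "acq": 0.45},
]

print(simple_vote(labels))
print(weighted_vote(["earn", "acq", "acq"], [0.9, 0.3, 0.4]))
print(probability_vote(dists))
```

In this made-up example the schemes can disagree: two of three classifiers pick "earn", but the second classifier is confident enough in "acq" that summing probabilities favors "acq". That difference is the kind of effect a comparison of voting schemes is meant to expose.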
Pages: 564-569
Page count: 6