Text Document Classification with PCA and One-Class SVM

Cited by: 5
Authors
Kumar, B. Shravan [1, 2]
Ravi, Vadlamani [1]
Affiliations
[1] Inst Dev & Res Banking Technol, Ctr Excellence Analyt, Castle Hills Rd 1, Hyderabad 500057, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Andhra Pradesh, India
Keywords
Text mining; Dimensionality reduction; Document classification; Principal component analysis; One-class support vector machine; Dimension reduction; Selection
DOI
10.1007/978-981-10-3153-3_11
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a document classifier based on principal component analysis (PCA) and a one-class support vector machine (OCSVM), in which PCA performs dimensionality reduction and the OCSVM performs classification. First, PCA is applied to the document-term matrix and the top few principal components are retained. The OCSVM is then trained on the records of the matrix corresponding to the negative class and tested on the records corresponding to the positive class. We demonstrate the effectiveness of the proposed model on several popular datasets, viz. 20NG, malware, and Syskill & Webert, and on customer feedback from a bank. The hybrid yielded very high accuracies on all datasets.
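The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, TF-IDF weighting for the document-term matrix, an RBF kernel, and a tiny made-up corpus. The function name `pca_ocsvm_predict`, the parameter values, and the example documents are all hypothetical; the abstract does not specify any of them.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Hypothetical toy corpus; the paper's datasets are not reproduced here.
negative_docs = [
    "the bank approved the loan quickly",
    "loan interest rates at the bank are low",
    "the bank branch staff were helpful",
    "opening a savings account at the bank was easy",
    "the bank card arrived on time",
]
positive_docs = [
    "the football match ended in a draw",
    "the team scored twice in the second half",
    "the striker missed a penalty kick",
]


def pca_ocsvm_predict(train_docs, test_docs, n_components=2, nu=0.1):
    """Train a one-class SVM on train_docs and label test_docs.

    Returns an array of +1 (resembles the training class) or -1 (outlier).
    """
    # Build one document-term matrix over all documents (TF-IDF is an assumption).
    dtm = TfidfVectorizer().fit_transform(train_docs + test_docs).toarray()

    # Apply PCA to the whole matrix and keep the top components,
    # following the order of steps given in the abstract.
    reduced = PCA(n_components=n_components, random_state=0).fit_transform(dtm)

    # Train only on the rows of one class, then score the remaining rows.
    n_train = len(train_docs)
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(reduced[:n_train])
    return model.predict(reduced[n_train:])


preds = pca_ocsvm_predict(negative_docs, positive_docs)
outlier_fraction = float((preds == -1).mean())
print("fraction of test documents flagged as outliers:", outlier_fraction)
```

In this sketch a test document labelled -1 is one the model considers unlike the training class, so the fraction of -1 labels plays the role of detection rate for the held-out class. Fitting PCA on the full matrix mirrors the abstract's wording; fitting it on the training rows only would be a stricter alternative.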
Pages: 107-115
Number of pages: 9