A Modified Support Vector Clustering Method for Document Categorization

Cited by: 0
Authors
Harish, B. S. [1]
Revanasiddappa, M. B. [1]
Kumar, S. V. Aruna [1]
Affiliation
[1] JSS Science and Technology University, Department of Information Science and Engineering, Mysuru, Karnataka, India
Keywords
text categorization; support vector clustering; fuzzy C-Means; term document matrix
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this paper, we propose a novel text categorization method based on a modified Support Vector Clustering (SVC) algorithm. SVC is a density-based clustering approach that handles arbitrarily shaped clusters effectively. The main drawback of traditional SVC is that it treats unclassified documents as outliers. To overcome this problem, we employ Fuzzy C-Means (FCM) to cluster the unclassified documents. The resulting modified SVC (SVC-FCM) is applied to categorize text documents. The proposed method consists of three steps. In the first step, Regularized Locality Preserving Indexing (RLPI) is applied to the Term Document Matrix (TDM) to reduce the dimensionality of the features. In the second step, SVC is used to find the base-cluster centers of the documents. Finally, FCM is used to cluster the unclassified documents. To evaluate the performance of the proposed method, we conducted experiments on the standard 20 Newsgroups dataset.
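The final step, clustering the documents that SVC leaves unassigned with Fuzzy C-Means, can be illustrated with a minimal sketch of standard FCM. This is not the authors' implementation. It assumes the documents have already been reduced to dense vectors (for example by RLPI) and that the initial centers stand in for the SVC base-cluster centers; the function names, the fuzzifier m = 2, and the toy 2-D data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_memberships(points: Sequence[Vector], centers: Sequence[Vector],
                    m: float = 2.0, eps: float = 1e-12) -> List[Vector]:
    """Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        d = [math.sqrt(_sq_dist(x, c)) for c in centers]
        hits = [i for i, di in enumerate(d) if di < eps]
        if hits:
            # The point sits on one or more centers: split membership among them.
            row = [1.0 / len(hits) if i in hits else 0.0 for i in range(len(d))]
        else:
            row = [1.0 / sum((d[i] / dj) ** exponent for dj in d)
                   for i in range(len(d))]
        rows.append(row)
    return rows


def fcm_centers(points: Sequence[Vector], memberships: Sequence[Vector],
                old_centers: Sequence[Vector], m: float = 2.0) -> List[Vector]:
    """Center update: c_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    new = []
    for i, old in enumerate(old_centers):
        w = [row[i] ** m for row in memberships]
        total = sum(w)
        if total == 0.0:
            new.append(list(old))  # no support at all: keep the previous center
            continue
        new.append([sum(wk * x[t] for wk, x in zip(w, points)) / total
                    for t in range(len(old))])
    return new


def fuzzy_c_means(points: Sequence[Vector], init_centers: Sequence[Vector],
                  m: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-6) -> Tuple[List[Vector], List[Vector]]:
    """Alternate membership and center updates until the centers stop moving."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, centers, m)
        # Largest squared movement of any center in this iteration.
        shift = max(_sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol ** 2:
            break
    return centers, fcm_memberships(points, centers, m)


# Toy example: 2-D vectors stand in for dimensionality-reduced documents, and
# the initial centers stand in for hypothetical SVC base-cluster centers.
unclassified_docs = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0]]
centers, u = fuzzy_c_means(unclassified_docs, init_centers=[[0.0, 0.0], [5.0, 5.0]])
labels = [max(range(len(row)), key=row.__getitem__) for row in u]
print(labels)  # [0, 0, 1, 1]
```

Seeding FCM with externally supplied centers, rather than random ones, mirrors the abstract's idea of reusing the SVC base clusters. Each unclassified document receives a membership degree in every cluster, and taking the largest membership yields a hard category.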
Pages: 1-5
Number of pages: 5