Web Document Categorization by Support Vector Clustering

Cited by: 0
Authors
Shi, Daming [1]
Tsui, Ming Hei [1]
Liu, Jigang [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Keywords
Document clustering; support vector clustering; simulated annealing
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Search engines have proven effective for retrieving information from the World Wide Web. Traditionally, search results are presented as an ordered list ranked by popularity and relevance. However, the enormous number of matched web pages makes it inefficient for users to locate the most relevant ones. Proper organization of the search results is therefore important for improving the browsability of web search. In this paper, we propose performing Support Vector Clustering (SVC) on the search results to reorganize them into groups of similar context, facilitating effective browsing by users. SVC is a non-parametric clustering algorithm that can identify clusters of arbitrary shape without requiring the number of clusters to be specified. It is a kernel clustering method that maps the data via a nonlinear function to a high-dimensional feature space. To obtain the optimal clustering result, choosing accurate parameters (kernel width and penalty coefficient) for SVC is crucial. This paper proposes an automatic tuning method for the SVC parameters to obtain the optimal result. The experimental results demonstrate the effectiveness and usefulness of the proposed method, whose performance is comparable to that of other popular clustering techniques.
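The SVC idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (gaussian_kernel, fit_sphere, radius_sq, svc_labels), the Frank-Wolfe solver, the toy 2-D points standing in for document vectors, and the fixed penalty coefficient C = 1 (no point treated as an outlier) are all assumptions made for illustration. The kernel width is represented by the parameter q, and the automatic parameter tuning described in the abstract is not shown.

```python
import math

def gaussian_kernel(a, b, q):
    """Gaussian kernel exp(-q * ||a - b||^2); q sets the kernel width."""
    return math.exp(-q * sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_sphere(points, q, iters=200):
    """Approximate the smallest enclosing sphere in feature space.

    Minimises beta^T K beta over the probability simplex with Frank-Wolfe.
    Assumption: penalty coefficient C = 1, so no outliers are allowed.
    """
    n = len(points)
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = [1.0 / n] * n
    for t in range(iters):
        grad = [2.0 * sum(K[i][j] * beta[j] for j in range(n)) for i in range(n)]
        i_min = min(range(n), key=grad.__getitem__)  # best simplex vertex
        step = 2.0 / (t + 2.0)
        beta = [(1.0 - step) * b for b in beta]
        beta[i_min] += step
    const = sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))
    return beta, const

def radius_sq(y, points, beta, const, q):
    """Squared feature-space distance from the image of y to the sphere centre."""
    cross = sum(b * gaussian_kernel(p, y, q) for p, b in zip(points, beta))
    return 1.0 - 2.0 * cross + const  # K(y, y) = 1 for the Gaussian kernel

def svc_labels(points, q, samples=10):
    """Label points: two points share a cluster if the straight segment
    between them stays inside the sphere (checked at sampled positions)."""
    beta, const = fit_sphere(points, q)
    r_sq = max(radius_sq(p, points, beta, const, q) for p in points)
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            inside = True
            for s in range(1, samples):
                t = s / samples
                y = [a + t * (b - a) for a, b in zip(points[i], points[j])]
                if radius_sq(y, points, beta, const, q) > r_sq + 1e-9:
                    inside = False
                    break
            if inside:
                parent[find(i)] = find(j)  # merge connected components

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Toy usage: two tight groups of 2-D points (hypothetical data).
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
print(svc_labels(toy, q=1.0))
```

Note that the number of clusters is never passed in; it emerges from which segments stay inside the sphere, and it changes with q. That sensitivity is why the choice of kernel width and penalty coefficient matters for the clustering result.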
Pages: 1482-1487
Page count: 6
Related papers
50 in total
  • [1] A Modified Support Vector Clustering Method for Document Categorization
    Harish, B. S.
    Revanasiddappa, M. B.
    Kumar, S. V. Aruna
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE ENGINEERING AND APPLICATIONS (ICKEA 2016), 2016, : 1 - 5
  • [2] Document categorization using support vector machines
    Villasana, Sergio
    Seijas, Cesar
    Caralli, Antonino
    Jimenez, Jesus
    Pacheco, Jose
    [J]. INGENIERIA UC, 2008, 15 (03) : 45 - 52
  • [3] Partitioning-based clustering for Web document categorization
    Boley, D
    Gini, M
    Gross, R
    Han, EH
    Hastings, K
    Karypis, G
    Kumar, V
    Mobasher, B
    Moore, J
    [J]. DECISION SUPPORT SYSTEMS, 1999, 27 (03) : 329 - 341
  • [4] Hierarchically SVM classification based on support vector clustering method and its application to document categorization
    Hao, Pei-Yi
    Chiang, Jung-Hsien
    Tu, Yi-Kun
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (03) : 627 - 635
  • [5] Improving Support Vector Data Description for Document Clustering
    Wang, Ziqiang
    Sun, Xia
    [J]. ADVANCES IN FUTURE COMPUTER AND CONTROL SYSTEMS, VOL 2, 2012, 160 : 271 - 276
  • [6] Application for Web Text Categorization Based on Support Vector Machine
    Pan Hao
    Duan Ying
    Tan Longyuan
    [J]. 2009 INTERNATIONAL FORUM ON COMPUTER SCIENCE-TECHNOLOGY AND APPLICATIONS, VOL 2, PROCEEDINGS, 2009, : 42 - 45
  • [7] Web Document Classification using Support Vector Machine
    Shinde, Sharmila
    Joeg, Prasanna
    Vanjale, Sandeep
    [J]. 2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 688 - 691
  • [8] Incremental fuzzy clustering for document categorization
    Mei, Jian-Ping
    Wang, Yangtao
    Chen, Lihui
    Miao, Chunyan
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2014, : 1518 - 1525
  • [9] Document clustering method using dimension reduction and support vector clustering to overcome sparseness
    Jun, Sunghae
    Park, Sang-Sung
    Jang, Dong-Sik
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (07) : 3204 - 3212
  • [10] Document clustering using locality preserving indexing and support vector machines
    Chengfu Yang
    Zhang Yi
    [J]. Soft Computing, 2008, 12 : 677 - 683