An enhanced Support Vector Machine classification framework by using Euclidean distance function for text document categorization

Cited by: 113
Authors
Lee, Lam Hong [1]
Wan, Chin Heng [1]
Rajkumar, Rajprasad [2]
Isa, Dino [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Nottingham, Fac Engn, Intelligent Syst Res Grp, Semenyih 43500, Selangor, Malaysia
Keywords
Text document classification; Support Vector Machine; Euclidean distance function; Kernel function; Soft margin parameter; KERNEL PARAMETERS; LEARNING-METHODS;
DOI
10.1007/s10489-011-0314-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a new text document classification framework, called Euclidean-SVM, that uses the Support Vector Machine (SVM) approach in the training phase and the Euclidean distance function in the classification phase. The SVM constructs a classifier by generating a decision surface, the optimal separating hyper-plane, that partitions different categories of data points in the vector space. The concept of the optimal separating hyper-plane can be generalized to non-linearly separable cases by introducing kernel functions that map the data points from the input space into a high-dimensional feature space, where they can be separated by a linear hyper-plane. As a result, the choice of kernel function has a high impact on the classification accuracy of the SVM. Besides the kernel function, the value of the soft margin parameter C is another critical factor in the performance of the SVM classifier. Hence, one of the critical problems of the conventional SVM classification framework is the need to determine the appropriate kernel function and the appropriate value of C for datasets of varying characteristics, in order to guarantee high accuracy of the classifier. In this paper, we introduce a distance measurement technique that uses the Euclidean distance function to replace the optimal separating hyper-plane as the classification decision function in the SVM. In our approach, the support vectors for each category are identified from the training data points during the training phase using the SVM. In the classification phase, when a new data point is mapped into the original vector space, the average distances between the new data point and the support vectors of each category are measured using the Euclidean distance function. The classification decision is based on the category whose support vectors have the lowest average distance to the new data point, which makes the decision independent of the efficacy of the hyper-plane formed by a particular kernel function and soft margin parameter. We tested the proposed framework on several text datasets. The experimental results show that, with this approach, the choice of kernel function and soft margin parameter C has a low impact on the accuracy of the Euclidean-SVM text classifier.
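The classification phase described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_distance` and `classify`, the category labels, and the two-dimensional support vectors are all hypothetical, and the sketch assumes the support vectors for each category have already been identified by an SVM training step that is not shown here.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def average_distance(point: Vector, support_vectors: List[Vector]) -> float:
    """Mean Euclidean distance from `point` to one category's support vectors."""
    return sum(math.dist(point, sv) for sv in support_vectors) / len(support_vectors)


def classify(point: Vector, support_vectors_by_category: Dict[str, List[Vector]]) -> str:
    """Return the category whose support vectors are closest to `point` on average."""
    return min(
        support_vectors_by_category,
        key=lambda category: average_distance(
            point, support_vectors_by_category[category]
        ),
    )


# Hypothetical support vectors, assumed to come from a prior SVM training phase.
support_vectors = {
    "sports": [(1.0, 0.0), (0.8, 0.2)],
    "finance": [(0.0, 1.0), (0.1, 0.9)],
}

label = classify((0.9, 0.1), support_vectors)
print(label)
```

Because the decision uses only distances to the stored support vectors, the separating hyper-plane itself is never evaluated at classification time, which is the property the abstract relies on.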
Pages: 80-99
Number of pages: 20
Related papers
50 in total
  • [1] An enhanced Support Vector Machine classification framework by using Euclidean distance function for text document categorization
    Lam Hong Lee
    Chin Heng Wan
    Rajprasad Rajkumar
    Dino Isa
    [J]. Applied Intelligence, 2012, 37 : 80 - 99
  • [2] Automated text categorization using support vector machine
    Kwok, JTY
    [J]. ICONIP'98: THE FIFTH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING JOINTLY WITH JNNS'98: THE 1998 ANNUAL CONFERENCE OF THE JAPANESE NEURAL NETWORK SOCIETY - PROCEEDINGS, VOLS 1-3, 1998, : 347 - 351
  • [3] Text document preprocessing with the Bayes formula for classification using the Support Vector Machine
    Isa, Dino
    Lee, Lam Hong
    Kallimani, V. P.
    RajKumar, R.
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (09) : 1264 - 1272
  • [4] Web Document Classification using Support Vector Machine
    Shinde, Sharmila
    Joeg, Prasanna
    Vanjale, Sandeep
    [J]. 2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 688 - 691
  • [5] Document categorization using support vector machines
    Villasana, Sergio
    Seijas, Cesar
    Caralli, Antonino
    Jimenez, Jesus
    Pacheco, Jose
    [J]. INGENIERIA UC, 2008, 15 (03): : 45 - 52
  • [6] An improved incremental learning algorithm for text categorization using support vector machine
    Cao, Jianfang
    Wang, Hongbin
    [J]. Journal of Chemical and Pharmaceutical Research, 2014, 6 (06) : 210 - 217
  • [7] Document classification based on support vector machine using a concept vector model
    Deng, Shuang
    Peng, Hong
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 473 - +
  • [8] Exploring Feature Selection and Support Vector Machine in Text Categorization
    Abdul-Rahman, Shuzlina
    Mutalib, Sofianita
    Khanafi, Nur Amira
    Ali, Azliza Mohd
    [J]. 2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013, : 1101 - 1104
  • [9] A new transductive support vector machine approach to text categorization
    Sun, F
    Sun, MS
    [J]. PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 631 - 635
  • [10] An Improved Algorithm for Multiclass Text Categorization with Support Vector Machine
    Shao, Fubo
    He, Guoping
    Zhang, Xin
    [J]. PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008, : 336 - 339