An enhanced Support Vector Machine classification framework by using Euclidean distance function for text document categorization

Times cited: 113
Authors
Lee, Lam Hong [1]
Wan, Chin Heng [1]
Rajkumar, Rajprasad [2]
Isa, Dino [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Nottingham, Fac Engn, Intelligent Syst Res Grp, Semenyih 43500, Selangor, Malaysia
Keywords
Text document classification; Support Vector Machine; Euclidean distance function; Kernel function; Soft margin parameter; Kernel parameters; Learning methods
DOI
10.1007/s10489-011-0314-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents the implementation of a new text document classification framework, called Euclidean-SVM, that uses the Support Vector Machine (SVM) approach in the training phase and the Euclidean distance function in the classification phase. The SVM constructs a classifier by generating a decision surface, namely the optimal separating hyper-plane, that partitions data points of different categories in the vector space. The concept of the optimal separating hyper-plane can be generalized to non-linearly separable cases by introducing kernel functions, which map the data points from the input space into a high-dimensional feature space where they can be separated by a linear hyper-plane. As a result, the choice of kernel function has a strong impact on the classification accuracy of the SVM. Besides the kernel function, the value of the soft margin parameter C is another critical factor in the performance of the SVM classifier. Hence, a critical problem of the conventional SVM classification framework is that an appropriate kernel function and an appropriate value of C must be determined for each dataset, because datasets vary in their characteristics, in order to guarantee high classifier accuracy. In this paper, we introduce a distance measurement technique that uses the Euclidean distance function, instead of the optimal separating hyper-plane, as the classification decision function of the SVM. In our approach, the support vectors of each category are identified from the training data points during the training phase using the SVM. In the classification phase, when a new data point is mapped into the original vector space, the average distance between the new data point and the support vectors of each category is measured using the Euclidean distance function.
The new data point is assigned to the category whose support vectors have the lowest average distance to it, which makes the classification decision independent of the efficacy of the hyper-plane formed by applying a particular kernel function and soft margin parameter. We tested the proposed framework on several text datasets. The experimental results show that, with this approach, the accuracy of the Euclidean-SVM text classifier is only weakly affected by the choice of kernel function and the value of the soft margin parameter C.
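The classification phase described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes the SVM training phase has already produced the support vectors and their category labels, the toy two-dimensional vectors and the labels `sports` and `politics` are invented for illustration, and the helper names `group_support_vectors`, `average_distances`, and `euclidean_svm_predict` are hypothetical. Distances are computed in the original input space with the standard-library `math.dist` (Python 3.8+).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def group_support_vectors(support_vectors: List[Vector],
                          labels: List[str]) -> Dict[str, List[Vector]]:
    """Group the support vectors found during SVM training by category."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for sv, label in zip(support_vectors, labels):
        groups[label].append(sv)
    return dict(groups)


def average_distances(x: Vector,
                      groups: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Mean Euclidean distance from x to the support vectors of each category."""
    return {
        label: sum(math.dist(x, sv) for sv in svs) / len(svs)
        for label, svs in groups.items()
    }


def euclidean_svm_predict(x: Vector, groups: Dict[str, List[Vector]]) -> str:
    """Return the category whose support vectors are closest to x on average.

    The separating hyper-plane is never consulted; ties are broken by
    dictionary order (an arbitrary choice made only for this sketch).
    """
    avg = average_distances(x, groups)
    return min(avg, key=avg.get)


if __name__ == "__main__":
    # Hypothetical support vectors assumed to come from a prior SVM run (toy data).
    svs = [(0.9, 0.1), (0.7, 0.3), (0.2, 0.8), (0.4, 0.6)]
    labels = ["sports", "sports", "politics", "politics"]
    groups = group_support_vectors(svs, labels)
    print(euclidean_svm_predict((0.8, 0.2), groups))  # sports
    print(euclidean_svm_predict((0.3, 0.7), groups))  # politics
```

In this sketch, the kernel function and C can affect a prediction only through which training points are selected as support vectors. Any SVM trainer could supply them; for example, scikit-learn's `SVC` exposes the fitted support vectors as `support_vectors_` and their training-set indices as `support_`.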
Pages: 80-99
Page count: 20
Related papers (50 in total)
  • [11] Application for Web Text Categorization Based on Support Vector Machine
    Pan Hao
    Duan Ying
    Tan Longyuan
    2009 International Forum on Computer Science-Technology and Applications, Vol 2, Proceedings, 2009: 42-45
  • [13] Solving multi-label text categorization problem using support vector machine approach with membership function
    Wang, Tai-Yue
    Chiang, Huei-Min
    Neurocomputing, 2011, 74(17): 3682-3689
  • [14] Support vector machines for text categorization in Chinese question classification
    Lin, Xu-Dong
    Peng, Hong
    Liu, Bo
    2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings), 2006: 334+
  • [15] Research of Support Vector Machine in Text Classification
    Shan, Chen
    Future Computer, Communication, Control and Automation, 2011, 119: 567-573
  • [16] Online Support Vector Machine Based on Minimum Euclidean Distance
    Dahiya, Kalpana
    Chauhan, Vinod Kumar
    Sharma, Anuj
    Proceedings of International Conference on Computer Vision and Image Processing, CVIP 2016, Vol 1, 2017, 459: 89-99
  • [17] An Expert Framework for Effective Document Classification Using Support Vector Machines
    Shahbaz, Muhammad
    Ahmed, Qanita
    Guergachi, Aziz
    International Journal of Innovative Computing Information and Control, 2013, 9(04): 1523-1537
  • [18] Efficient text categorization using a min-max modular support vector machine
    Liu, FY
    Wang, KA
    Lu, BL
    Utiyama, M
    Isahara, H
    Human Interaction with Machines, 2006: 13+
  • [19] Text Classification on Customer Review Dataset Using Support Vector Machine
    Bamgboye, Pelumi O.
    Adebiyi, Marion O.
    Adebiyi, Abayomi A.
    Osang, Francis B.
    Adebiyi, Ayodele A.
    Enwere, Miracle Nmesomachi
    Shekari, Abednego
    Intelligent Sustainable Systems, WorldS4 2022, Vol 2, 2023, 579: 407-415
  • [20] Robustified distance based fuzzy membership function for support vector machine classification
    Mohammadi, M.
    Sarmad, M.
    Iranian Journal of Fuzzy Systems, 2019, 16(06): 191-204