A fast classification strategy for SVM on the large-scale high-dimensional datasets

Cited by: 0
Authors
I-Jing Li
Jiunn-Lin Wu
Chih-Hung Yeh
Affiliations
[1] National Taichung University of Science and Technology, Department of Applied Statistics
[2] National Chung Hsing University, Department of Computer Science and Engineering
Source
Pattern Analysis and Applications, 2018, 21 (04)
Keywords
Profile support vector machine; Large-scale datasets; High-dimensional data; MagKmeans algorithm; Fast condensed nearest neighbor rule
DOI
Not available
Abstract
The classification of large-scale, high-dimensional datasets poses three challenges: (1) the training and classification phases carry a heavy computational burden; (2) storing the many training samples requires a large amount of storage; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it is prone to overfitting, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem. On the premise that local learning is superior to global learning, PSVM trains multiple linear SVM models to achieve performance similar to that of a single nonlinear SVM model. However, its training phase is inefficient. In this paper, we propose a fast classification strategy for PSVM that reduces both training time and classification time. We first select the border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets by the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The resulting clusters are used to train multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. In the experimental results, the proposed method avoids both overfitting and underfitting, and the strategy is effective and efficient.
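The sample-reduction step can be illustrated with a minimal sketch. It implements Hart's classic condensed nearest neighbor rule, which is a simpler relative of the fast condensed nearest neighbor rule named in the keywords and is not the authors' implementation. The function names `nearest_label` and `condense` and the toy data are assumptions made for illustration, and the sketch assumes that no two samples share the same features with different labels.

```python
import math


def nearest_label(point, store):
    """Return the label of the stored sample closest to `point` (1-NN)."""
    closest = min(store, key=lambda sample: math.dist(point, sample[0]))
    return closest[1]


def condense(samples):
    """Hart's condensed nearest neighbor rule (illustrative stand-in).

    `samples` is a list of (features, label) pairs. The returned subset
    classifies every original sample correctly under the 1-NN rule.
    Assumes identical feature vectors never carry conflicting labels.
    """
    store = [samples[0]]
    changed = True
    while changed:
        changed = False
        for features, label in samples:
            # Keep only the samples that the current subset gets wrong;
            # these tend to lie near the boundary between classes.
            if nearest_label(features, store) != label:
                store.append((features, label))
                changed = True
    return store


# Hypothetical toy data: two well-separated classes in the plane.
data = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((1.0, 1.0), 0), ((2.0, 0.5), 0),
    ((4.0, 0.5), 1), ((5.0, 0.0), 1), ((5.0, 1.0), 1),
    ((6.0, 0.0), 1), ((6.0, 1.0), 1),
]
kept = condense(data)
print(len(data), "samples reduced to", len(kept))
```

In a pipeline like the one the abstract describes, the retained subset would then be clustered and each cluster used to train a linear model; those later steps are omitted here because the abstract does not specify them in enough detail to sketch faithfully.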
Pages: 1023-1038
Page count: 15
Related papers
50 in total
  • [1] A fast classification strategy for SVM on the large-scale high-dimensional datasets
    Li, I-Jing
    Wu, Jiunn-Lin
    Yeh, Chih-Hung
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2018, 21 (04) : 1023 - 1038
  • [2] Parallel algorithms for clustering high-dimensional large-scale datasets
    Nagesh, H
    Goil, S
    Choudhary, A
    [J]. DATA MINING FOR SCIENTIFIC AND ENGINEERING APPLICATIONS, 2001, 2 : 335 - 356
  • [3] LARGE-SCALE HIGH-DIMENSIONAL CLUSTERING WITH FAST SKETCHING
    Chatalic, Antoine
    Gribonval, Remi
    Keriven, Nicolas
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4714 - 4718
  • [4] Latent-lSVM classification of very high-dimensional and large-scale multi-class datasets
    Thanh-Nghi Do
    Poulet, Francois
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (02):
  • [5] High-Dimensional Signature Compression for Large-Scale Image Classification
    Sanchez, Jorge
    Perronnin, Florent
    [J]. 2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011, : 1665 - 1672
  • [6] Fast SVM classifier for large-scale classification problems
    Wang, Huajun
    Li, Genghui
    Wang, Zhenkun
    [J]. INFORMATION SCIENCES, 2023, 642
  • [7] An iterative SVM approach to feature selection and classification in high-dimensional datasets
    Liu, Dehua
    Qian, Hui
    Dai, Guang
    Zhang, Zhihua
    [J]. PATTERN RECOGNITION, 2013, 46 (09) : 2531 - 2537
  • [8] Visualizing Large-scale and High-dimensional Data
    Tang, Jian
    Liu, Jingzhou
    Zhang, Ming
    Mei, Qiaozhu
    [J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016, : 287 - 297
  • [9] RANSAC-SVM for Large-Scale Datasets
    Nishida, Kenji
    Kurita, Takio
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3767 - 3770
  • [10] Supervised Papers Classification on Large-Scale High-Dimensional Data with Apache Spark
    Akritidis, Leonidas
    Bozanis, Panayiotis
    Fevgas, Athanasios
    [J]. 2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 987 - 994