A fast classification strategy for SVM on the large-scale high-dimensional datasets

Authors
I-Jing Li
Jiunn-Lin Wu
Chih-Hung Yeh
Affiliations
[1] National Taichung University of Science and Technology, Department of Applied Statistics
[2] National Chung Hsing University, Department of Computer Science and Engineering
Keywords
Profile support vector machine; Large-scale datasets; High-dimensional data; MagKmeans algorithm; Fast condensed nearest neighbor rule
Abstract
Classifying large-scale, high-dimensional datasets poses three challenges: (1) the training and classification phases are computationally expensive; (2) storing many training samples requires a large amount of memory; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem. Because local learning is superior to global learning, PSVM trains multiple linear SVM models to achieve performance similar to that of a single nonlinear SVM model. However, its training phase is inefficient. In this paper, we propose a fast classification strategy for PSVM that reduces both the training time and the classification time. We first select border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets with the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The resulting clusters are used to train the multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. In the experiments, the proposed method avoids both overfitting and underfitting, and it is effective and efficient.
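The first step described in the abstract, shrinking the training set to samples that matter for the decision boundary, can be illustrated with a minimal Python sketch. It uses Hart's classic condensed nearest neighbor rule rather than the fast variant named in the keywords, and the function names (`squared_distance`, `nearest_label`, `condensed_nearest_neighbor`) are assumptions made for illustration, not the authors' implementation.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(point, store):
    """Label of the stored (features, label) pair closest to `point` (1-NN)."""
    return min(store, key=lambda pair: squared_distance(point, pair[0]))[1]


def condensed_nearest_neighbor(samples, labels, seed=0):
    """Hart's condensed nearest neighbor rule (illustrative stand-in).

    Returns sorted indices of a subset of the training data that still
    classifies every training sample correctly with a 1-NN rule, assuming
    no two identical samples carry different labels.
    """
    if not samples:
        return []
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # visiting order affects the subset

    first = order[0]
    kept = {first}
    store = [(samples[first], labels[first])]

    changed = True
    while changed:  # repeat passes until no sample is added
        changed = False
        for i in order:
            if i in kept:
                continue
            # A sample the current subset misclassifies is added to it.
            if nearest_label(samples[i], store) != labels[i]:
                kept.add(i)
                store.append((samples[i], labels[i]))
                changed = True
    return sorted(kept)
```

Calling `condensed_nearest_neighbor(samples, labels)` on a list of feature tuples and a matching list of class labels returns the indices to keep. The retained subset would then be passed to the clustering and local linear SVM training steps, which this sketch does not cover.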
Pages: 1023-1038
Page count: 15