A fast classification strategy for SVM on the large-scale high-dimensional datasets

Cited by: 0
Authors
I-Jing Li
Jiunn-Lin Wu
Chih-Hung Yeh
Affiliations
[1] National Taichung University of Science and Technology, Department of Applied Statistics
[2] National Chung Hsing University, Department of Computer Science and Engineering
Keywords
Profile support vector machine; Large-scale datasets; High-dimensional data; MagKmeans algorithm; Fast condensed nearest neighbor rule
DOI
Not available
Abstract
Classifying large-scale, high-dimensional datasets poses three challenges: (1) a heavy computational burden in both the training phase and the classification phase; (2) a large storage requirement for keeping many training samples; and (3) the difficulty of determining decision rules in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily leads to overfitting, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem. Because local learning is superior to global learning, PSVM trains multiple linear SVM models to obtain performance similar to that of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both the training time and the classification time. We first select border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets with the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The clusters are then used to learn multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. The experimental results show that the proposed method avoids both overfitting and underfitting, and that the proposed strategy is effective and efficient.
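The pipeline outlined in the abstract runs in three steps: select border samples, cluster the reduced set, and learn linear models on the local subsets. It can be illustrated with the minimal pure-Python sketch below. This is not the authors' implementation, and it rests on several assumptions. Labels are taken to be -1/+1. Border samples are selected with a simplified version of the FCNN1 condensation rule, which recomputes nearest neighbours from scratch in each round. Plain Lloyd's k-means stands in for MagKmeans, and the paper's fast search method is not reproduced. Each local linear SVM is trained with a Pegasos-style stochastic subgradient solver. All function names (`fcnn_condense`, `kmeans`, `train_linear_svm`, `fit_local_svms`, `predict`) are hypothetical.

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fcnn_condense(X: List[Sequence[float]], y: List[int]) -> List[int]:
    """Simplified FCNN1-style condensation (assumption: nearest neighbours are
    recomputed each round). Returns indices of a subset S such that 1-NN over S
    labels every training point correctly; S tends to lie near class borders."""
    n, dim = len(X), len(X[0])
    delta = []
    for label in sorted(set(y)):
        # Seed S with the training point closest to each class centroid.
        members = [i for i in range(n) if y[i] == label]
        centroid = [sum(X[i][d] for i in members) / len(members) for d in range(dim)]
        delta.append(min(members, key=lambda i: sq_dist(X[i], centroid)))
    S: List[int] = []
    while delta:
        S.extend(delta)
        in_s = set(S)
        rep = {}  # rep[q]: misclassified point in q's Voronoi cell closest to q
        for p in range(n):
            if p in in_s:
                continue
            q = min(S, key=lambda s: sq_dist(X[p], X[s]))
            if y[p] != y[q] and (q not in rep or sq_dist(X[p], X[q]) < sq_dist(X[rep[q]], X[q])):
                rep[q] = p
        delta = list(rep.values())  # empty once S is consistent with the data
    return S


def kmeans(X, k: int, iters: int = 50, seed: int = 0):
    """Plain Lloyd's k-means, a stand-in for MagKmeans (not reproduced here)."""
    rng = random.Random(seed)
    k = min(k, len(X))
    centers = [list(X[i]) for i in rng.sample(range(len(X)), k)]

    def nearest(x):
        return min(range(k), key=lambda c: sq_dist(x, centers[c]))

    for _ in range(iters):
        assign = [nearest(x) for x in X]
        for c in range(k):
            members = [X[i] for i in range(len(X)) if assign[i] == c]
            if members:  # an emptied cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(x) for x in X]


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularised hinge loss. A constant 1 feature
    is appended, so the last weight acts as a (regularised) bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink step (regularisation)
            if margin < 1:  # hinge-loss subgradient step
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def fit_local_svms(X, y, k: int = 4, seed: int = 0):
    """Hypothetical pipeline: condense, cluster, then one local model per cluster."""
    keep = fcnn_condense(X, y)
    Xr, yr = [X[i] for i in keep], [y[i] for i in keep]
    centers, assign = kmeans(Xr, k, seed=seed)
    majority = max(set(yr), key=yr.count)
    models = []
    for c in range(len(centers)):
        idx = [i for i, a in enumerate(assign) if a == c]
        labels = {yr[i] for i in idx}
        if len(labels) == 2:
            models.append(("svm", train_linear_svm([Xr[i] for i in idx], [yr[i] for i in idx], seed=seed)))
        else:  # single-class (or empty) cluster: constant prediction
            models.append(("const", labels.pop() if labels else majority))
    return centers, models


def predict(centers, models, x) -> int:
    """Route x to its nearest cluster centre and apply that cluster's model."""
    c = min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
    kind, param = models[c]
    if kind == "const":
        return param
    score = sum(wi * xi for wi, xi in zip(param, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

Typical usage is `centers, models = fit_local_svms(X, y, k=4)` followed by `predict(centers, models, x)` for each query. A cluster that contains only one class gets a constant predictor, because a linear SVM needs both classes. Each query is compared against the cluster centres, so the routing cost at test time grows with the number of clusters rather than with the number of training samples.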
Pages: 1023-1038
Page count: 15