Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Institution
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI
10.1007/978-3-319-01595-8_10
CLC Classification: TP [Automation & Computer Technology]
Discipline Code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. Still, their high computational cost, due to cubic runtime complexity, is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM. It is a simple, stepwise procedure in which the SVM is iteratively trained on subsets of the original data set, and the support vectors of the resulting models are combined to create new training sets. The general idea is to bound the size of all considered training sets and thereby obtain a significant speedup. Another relevant advantage is that this approach can easily be parallelized, because several independent models must be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping criterion that selects the number of stages for improved accuracy.
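The core cascade idea described in the abstract (train SVMs on independent subsets, keep only their support vectors, merge, and retrain) can be sketched in a few lines. This is a minimal single-merge sketch, not the authors' multi-stage implementation: the function name `cascade_svm`, the use of scikit-learn's `SVC`, and the single merge stage are illustrative assumptions; Graf et al.'s cascade uses multiple layers with feedback, and the subset fits below could run in parallel.

```python
# Single-stage Cascade SVM sketch (assumption: one merge layer only;
# the original cascade iterates this over several layers with feedback).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

def cascade_svm(X, y, n_splits=4, **svm_params):
    """Fit one SVM per disjoint subset (these fits are independent and
    could run in parallel), collect each model's support vectors, then
    fit a final SVM on the merged, much smaller support-vector set."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    sv_idx = []
    for part in np.array_split(idx, n_splits):
        model = SVC(**svm_params).fit(X[part], y[part])
        # model.support_ indexes into the subset; map back to original rows
        sv_idx.extend(part[model.support_])
    sv_idx = np.asarray(sv_idx)
    return SVC(**svm_params).fit(X[sv_idx], y[sv_idx])

model = cascade_svm(X, y, n_splits=4, kernel="rbf", C=1.0)
print(model.score(X, y))
```

The speedup comes from the bound on training-set size: each subset fit and the final fit see far fewer points than the full data set, which matters given the roughly cubic training cost of a standard SVM.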
Pages: 87 - 95
Page count: 9
Related Papers
(50 records in total)
  • [21] Classifying data sets using posterior probability for multiclass support vector machines
    Wang, Hongmei
    Zeng, Yuan
    Zhao, Zheng
    Wang, Chengshan
    Journal of Computational Information Systems, 2008, 4 (02): : 541 - 546
  • [22] Parallel Computing of Support Vector Machines: A Survey
    Tavara, Shirin
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [23] Core vector machines: Fast SVM training on very large data sets
    Tsang, IW
    Kwok, JT
    Cheung, PM
    JOURNAL OF MACHINE LEARNING RESEARCH, 2005, 6 : 363 - 392
  • [24] Using support vector machines for classifying large sets of multi-represented objects
    Kriegel, HP
    Kröger, P
    Pryakhin, A
    Schubert, M
    PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2004, : 102 - 113
  • [25] Data condensation in large databases by incremental learning with support vector machines
    Mitra, P
    Murthy, CA
    Pal, SK
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000, : 708 - 711
  • [26] Feature analysis: Support vector machines approaches
    Shi, Y
    Zhang, TX
    IMAGE EXTRACTION, SEGMENTATION, AND RECOGNITION, 2001, 4550 : 245 - 251
  • [27] Goal programming approaches to support vector machines
    Nakayama, H
    Yun, Y
    Asada, T
    Yoon, M
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS, 2003, 2773 : 356 - 363
  • [28] A simple decomposition method for support vector machines
    Hsu, CW
    Lin, CJ
    MACHINE LEARNING, 2002, 46 (1-3) : 291 - 314
  • [30] Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets
    Carbon-Mangels, Miriam
    Hutter, Michael C.
    MOLECULAR INFORMATICS, 2011, 30 (10) : 885 - 895