Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited by: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliations
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI
10.1007/978-3-319-01595-8_10
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. Still, the high computational cost due to the cubic runtime complexity is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM: a simple, stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of all considered training sets and thereby obtain a significant speedup. Another relevant advantage is that the approach is easily parallelized, because a number of independent models have to be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and contains an adaptive stopping criterion to select the number of stages for improved accuracy.
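The cascade procedure described in the abstract can be sketched compactly. Below is a minimal, single-machine sketch in Python with scikit-learn, assuming a binary-tree cascade over a power-of-two number of partitions; the function names (cascade_svm, fit_and_keep_svs), the default of eight partitions, and the use of SVC are illustrative assumptions, not the authors' implementation. The per-partition fits within each stage are mutually independent, which is exactly where parallel execution (e.g. with joblib) would enter.

# Minimal sketch of a Cascade SVM (Graf et al., 2005) as described in the
# abstract above. Function names, the partition count, and the SVC settings
# are illustrative assumptions, not the authors' implementation.
import numpy as np
from sklearn.svm import SVC

def fit_and_keep_svs(X, y, **svc_params):
    """Fit an SVM on one training set and return only its support vectors."""
    model = SVC(**svc_params).fit(X, y)
    return X[model.support_], y[model.support_]

def cascade_svm(X, y, n_partitions=8, random_state=0, **svc_params):
    """Binary-tree cascade: split the data, fit SVMs, merge the support
    vectors pairwise, and repeat until a single training set remains.
    n_partitions is assumed to be a power of two."""
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    parts = [(X[i], y[i]) for i in np.array_split(idx, n_partitions)]
    while len(parts) > 1:
        # These fits are mutually independent and could run in parallel.
        svs = [fit_and_keep_svs(Xp, yp, **svc_params) for Xp, yp in parts]
        # Merge the support-vector sets pairwise for the next cascade stage.
        parts = [(np.vstack((a[0], b[0])), np.concatenate((a[1], b[1])))
                 for a, b in zip(svs[0::2], svs[1::2])]
    X_last, y_last = parts[0]
    # Final model, trained only on the surviving support vectors.
    return SVC(**svc_params).fit(X_last, y_last)

For instance, calling cascade_svm(X, y, n_partitions=8, C=1.0, kernel="rbf") on a large training set fits eight small SVMs in the first stage, four in the second, and so on, so that no single fit ever sees the full data set.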
Pages: 87-95
Number of pages: 9
Related Papers
50 records in total
  • [1] Training Support Vector Machines on Large Sets of Image Data
    Kukenys, Ignas
    McCane, Brendan
    Neumegen, Tim
    COMPUTER VISION - ACCV 2009, PT III, 2010, 5996 : 331 - 340
  • [2] Using the Leader Algorithm with Support Vector Machines for Large Data Sets
    Romero, Enrique
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2011, PT I, 2011, 6791 : 225 - 232
  • [3] Using support vector machines for mining regression classes in large data sets
    Sun, ZH
    Gao, LX
    Sun, YX
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 89 - 92
  • [4] Parallel decomposition approaches for training support vector machines
    Serafini, T
    Zanghirati, G
    Zanni, L
    PARALLEL COMPUTING: SOFTWARE TECHNOLOGY, ALGORITHMS, ARCHITECTURES AND APPLICATIONS, 2004, 13 : 259 - 266
  • [5] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
  • [6] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2008, 4994 : 38 - 47
  • [7] One-class support vector machines for large-scale data sets
    Wang, H.
    2013, Southeast University (43)
  • [8] Data mining with parallel support vector machines for classification
    Eitrich, Tatjana
    Lang, Bruno
    ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, 2006, 4243 : 197 - 206
  • [9] Parallel tuning of support vector machine learning parameters for large and unbalanced data sets
    Eitrich, T
    Lang, B
    COMPUTATIONAL LIFE SCIENCES, PROCEEDINGS, 2005, 3695 : 253 - 264