Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Cited: 13
Authors
Meyer, Oliver [1 ]
Bischl, Bernd [1 ]
Weihs, Claus [1 ]
Affiliations
[1] TU Dortmund, Dept Stat, Chair Computat Stat, Dortmund, Germany
DOI: 10.1007/978-3-319-01595-8_10
Chinese Library Classification (CLC): TP [Automation and computer technology]
Discipline code: 0812
Abstract
Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification, but their cubic training-time complexity makes them costly on larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521-528, 2005) proposed the Cascade SVM: a simple, stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of every training set considered and thereby obtain a significant speedup. A further advantage is that the approach is easy to parallelize, since a number of independent models must be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce the computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping criterion that selects the number of stages for improved accuracy.
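The cascade described in the abstract can be sketched in a few lines: split the active training set into subsets, fit an SVM on each (these fits are independent, hence parallelizable), keep only the support vectors, merge, and repeat with fewer subsets. The sketch below is a minimal single-pass illustration, not the authors' implementation; `fit_support` is a hypothetical caller-supplied routine that trains an SVM on a subset and returns the subset-local indices of its support vectors (with scikit-learn this could be `lambda Xs, ys: SVC().fit(Xs, ys).support_`).

```python
import random

def cascade_svm_fit(X, y, fit_support, n_subsets=4, n_stages=3, seed=0):
    """One simple Cascade-SVM pass (sketch, after Graf et al., 2005).

    fit_support(X_sub, y_sub) -> indices into X_sub that are support
    vectors of an SVM trained on that subset (pluggable, e.g. sklearn).
    Returns the indices of the final, reduced training set.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))          # active training indices
    rng.shuffle(idx)
    for _ in range(n_stages):
        k = max(1, n_subsets)
        size = -(-len(idx) // k)       # ceil(len(idx) / k)
        parts = [idx[i:i + size] for i in range(0, len(idx), size)]
        survivors = []
        for part in parts:             # independent fits: parallelizable
            Xs = [X[i] for i in part]
            ys = [y[i] for i in part]
            # keep only this subset's support vectors
            survivors.extend(part[j] for j in fit_support(Xs, ys))
        idx = survivors
        n_subsets = max(1, n_subsets // 2)  # merge subsets as cascade narrows
    return idx
```

Because each stage's subset fits share no state, they can be handed to e.g. `concurrent.futures.ProcessPoolExecutor` directly; the sequential loop above is kept only for clarity.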
Pages: 87-95 (9 pages)